From 43bcedea1955e711a10389a6b8b213be35c19e62 Mon Sep 17 00:00:00 2001
From: xuhcc
Beancount, Ledger and HLedger all differ in how they represent their numbers internally, and in how they handle the precision of balance checks for a transaction’s postings.
First, about how numbers are represented: Ledger uses rational numbers in an attempt to maintain the full precision of numbers resulting from mathematical operations. This works, but I believe this is perhaps not the most appropriate choice. The great majority of the cases where operations occur involve the conversion from a number of units and a price or a cost to the total value of an account’s posted change (e.g., units x cost = total cost). Our task in representing transactional information is the replication of operations that take place mostly in institutions. These operations always involve the rounding of numbers for units and currencies (banks do apply stochastic rounding), and the correct numbers to be used from the perspective of these institutions, and from the perspective of the government, are indeed the rounded numbers themselves. It is not a question of mathematical purity, but one of practicality, and our system should do the same as banks do. Therefore, I think that we should always post the rounded numbers to accounts. Using rational numbers is not a limitation in that sense, but we must be careful to store rounded numbers where it matters. I think the approach implemented by Ledger is to keep as much of the original precision as possible.
Beancount chooses a decimal number representation to store the numbers parsed from the input with the same precision they are written as. This method suffers from the same problem as using rational numbers does, in that the result of mathematical operations between the decimal numbers will currently be stored with their full precision (albeit in decimal). Admittedly, I have yet to apply explicit quantization where required, which would be the correct thing to do.
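The difference between keeping full precision and posting a rounded number can be sketched with Python's standard `fractions` and `decimal` modules. This is only an illustration under stated assumptions: the unit and cost values are made up, and the choice of two fractional digits with `ROUND_HALF_EVEN` is an assumption for the example, not a rule taken from any of the three systems.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from fractions import Fraction

# Made-up inputs for illustration: 10.337 units bought at a cost of 23.45 each.
units = "10.337"
cost = "23.45"

# A rational representation keeps the exact product of units x cost.
exact = Fraction(units) * Fraction(cost)

# A decimal representation also keeps every digit of the product
# (here 5 fractional digits), unless it is explicitly quantized.
full = Decimal(units) * Decimal(cost)

# Explicit quantization to the precision an institution would post.
# Two digits and banker's rounding are assumptions made for this sketch.
posted = full.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(exact)   # exact rational value
print(full)    # full-precision decimal value
print(posted)  # rounded value that would be posted to the account
```

The point of the sketch is that both `exact` and `full` carry more digits than the amount an institution would actually record; only `posted` matches that amount.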
A scheme has to be devised to infer suitable precisions for automatically quantizing the numbers after operations. The decimal representation provides natural opportunities for rounding after operations, and it is a suitable choice for this; implementations typically even provide a context that controls the precision of operations. Also note that it will never be required to store numbers to an infinite precision: the institutions never do it themselves.
HLedger, oddly enough, selects a “double” fractional binary representation for its prices. This is an unfortunate choice, a worse one than using a precise representation: fractional decimal numbers input by the user are never represented precisely by their corresponding binary form. So all the numbers are incorrect but “close enough” that it works overall, and the only way to display a clean final result is by rounding to a suitable number of digits at the time of rendering a report. One could argue that the large number of digits provided by a 64-bit double representation is unlikely to cause significant errors given the quantity of operations we make… but binary rounding error could potentially accumulate, and the numbers are basically all incorrectly stored internally, rounded to their closest binary relative. Given that our task is accounting, why not just represent them correctly?
Hledger (since 2014) stores numbers internally as decimals, allowing "unlimited" integral digits and up to 255 decimal digits.
Secondly, when checking that the postings of a transaction balance to zero, with all three systems it is necessary to allow for some tolerance on those amounts. This need is clear when you consider that inputting numbers in a text file implies a limited decimal representation.
For example, if you’re going to multiply a number of units and a cost, say both written down with 2 fractional digits, you might end up with a number that has 4 fractional digits, and then you need to compare that result with a cash amount that would typically be entered with only 2 fractional digits. You need to allow for some looseness somehow.
The systems differ in how they choose that tolerance:
Ledger attempts to automatically derive the precision to use for its balance checks by using recently parsed context (in file order). The precision to be used is that of the last value parsed for the particular commodity under consideration. This can be problematic: it can lead to unnecessary side-effects between transactions which can be difficult to debug.
HLedger, on the other hand, uses global precision settings. The whole file is processed first, then the precisions are derived from the most precise numbers seen in the entire input file.
Hledger, on the other hand, uses the global display precisions of each commodity. The whole file is processed first, detecting an inferred or configured display precision for each commodity. Then a transaction is considered balanced if its sum appears zero when displayed with those precisions. (In future Hledger might balance using only local precisions inferred from the current journal entry, for better locality and robustness.)
At the moment, Beancount uses a constant value for the tolerance used in its balance checking algorithm (0.005 of any unit). This is weak and should, at the very least, be commodity-dependent, if not also dependent on the particular account in which the commodity is used. Ultimately, it depends on the number of digits used to represent the particular postings. We have a proposal en route to fix this.
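A balance check with a tolerance can be sketched as follows. This is a minimal illustration, not Beancount's actual code: the function names are hypothetical, the amounts are made up, and `tolerance_from_digits` is just one possible way to derive a tolerance from the digits written in a posting (half of the last digit), offered as an assumption rather than as the proposal mentioned above.

```python
from decimal import Decimal

# The constant tolerance described in the text: 0.005 of any unit.
DEFAULT_TOLERANCE = Decimal("0.005")


def residual(weights):
    """Sum the signed amounts that the postings contribute to the balance."""
    return sum(weights, Decimal(0))


def is_balanced(weights, tolerance=DEFAULT_TOLERANCE):
    """Return True if the postings sum to zero within the tolerance.

    Using <= rather than < is a choice made for this sketch.
    """
    return abs(residual(weights)) <= tolerance


def tolerance_from_digits(number):
    """Hypothetical digit-dependent tolerance: half of the last digit written."""
    return Decimal(1).scaleb(number.as_tuple().exponent) / 2


# Units and cost both written with 2 fractional digits give 4 after multiplying.
units, cost = Decimal("10.33"), Decimal("23.45")
cash = Decimal("-242.24")  # cash leg entered with only 2 fractional digits
weights = [units * cost, cash]

print(residual(weights))            # small non-zero residual
print(is_balanced(weights))         # accepted under the constant tolerance
print(tolerance_from_digits(cash))  # tolerance implied by the cash posting
```

A digit-dependent tolerance of this kind would adapt automatically to commodities quoted with more or fewer fractional digits, which is the weakness of a single constant noted above.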
@@ -367,7 +367,7 @@
Numbers and Precision of Operations
-
@@ -7622,7 +7622,7 @@ beancount.core.amount.Amount.to_string(self, dformat=<beancount.core.display_context.DisplayFormatter object at 0x749446ea74d0>)
+beancount.core.amount.Amount.to_string(self, dformat=<beancount.core.display_context.DisplayFormatter object at 0x77a204a6f200>)
Returns:
@@ -14009,7 +14009,7 @@
-
<function NamedTuple at 0x749447011120>
– A type object for the new directive type.
<function NamedTuple at 0x77a204bd51c0>
– A type object for the new directive type.
-
@@ -15847,7 +15847,7 @@ beancount.core.inventory.Inventory.to_string(self, dformat=<beancount.core.display_context.DisplayFormatter object at 0x749446ea74d0>, parens=True)
+beancount.core.inventory.Inventory.to_string(self, dformat=<beancount.core.display_context.DisplayFormatter object at 0x77a204a6f200>, parens=True)
-
@@ -16254,7 +16254,7 @@ beancount.core.position.Position.to_string(self, dformat=<beancount.core.display_context.DisplayFormatter object at 0x749446ea74d0>, detail=True)
+beancount.core.position.Position.to_string(self, dformat=<beancount.core.display_context.DisplayFormatter object at 0x77a204a6f200>, detail=True)
-
diff --git a/api_reference/beancount.parser.html b/api_reference/beancount.parser.html
index fa631af..b07fb36 100644
--- a/api_reference/beancount.parser.html
+++ b/api_reference/beancount.parser.html
@@ -3785,7 +3785,7 @@ beancount.core.position.to_string(pos, dformat=<beancount.core.display_context.DisplayFormatter object at 0x749446ea74d0>, detail=True)
+beancount.core.position.to_string(pos, dformat=<beancount.core.display_context.DisplayFormatter object at 0x77a204a6f200>, detail=True)
-
@@ -3886,7 +3886,7 @@ beancount.parser.cmptest.assertEqualEntries(expected_entries, actual_entries, failfunc=<function fail at 0x74944608c540>, allow_incomplete=False)
+beancount.parser.cmptest.assertEqualEntries(expected_entries, actual_entries, failfunc=<function fail at 0x77a203e78220>, allow_incomplete=False)
-
@@ -3978,7 +3978,7 @@ beancount.parser.cmptest.assertExcludesEntries(subset_entries, entries, failfunc=<function fail at 0x74944608c540>, allow_incomplete=False)
+beancount.parser.cmptest.assertExcludesEntries(subset_entries, entries, failfunc=<function fail at 0x77a203e78220>, allow_incomplete=False)
-
@@ -9662,7 +9662,7 @@ beancount.parser.cmptest.assertIncludesEntries(subset_entries, entries, failfunc=<function fail at 0x74944608c540>, allow_incomplete=False)
+beancount.parser.cmptest.assertIncludesEntries(subset_entries, entries, failfunc=<function fail at 0x77a203e78220>, allow_incomplete=False)
-
diff --git a/api_reference/beancount.tools.html b/api_reference/beancount.tools.html
index f33a393..303d557 100644
--- a/api_reference/beancount.tools.html
+++ b/api_reference/beancount.tools.html
@@ -480,7 +480,7 @@ beancount.parser.options.Opt(name, default_value, example_value=<object object at 0x7494475894f0>, converter=None, deprecated=False, alias=None)
+beancount.parser.options.Opt(name, default_value, example_value=<object object at 0x77a2051893e0>, converter=None, deprecated=False, alias=None)
-
diff --git a/api_reference/beancount.utils.html b/api_reference/beancount.utils.html
index 2b44264..03c355f 100644
--- a/api_reference/beancount.utils.html
+++ b/api_reference/beancount.utils.html
@@ -4422,7 +4422,7 @@ beancount.tools.treeify.dump_tree(node, file=<_io.StringIO object at 0x749444f21780>, prefix='')
+beancount.tools.treeify.dump_tree(node, file=<_io.StringIO object at 0x77a202d1da80>, prefix='')
-
diff --git a/external_contributions.html b/external_contributions.html
index 903c349..3678678 100644
--- a/external_contributions.html
+++ b/external_contributions.html
@@ -404,6 +404,7 @@ beancount.utils.misc_utils.is_sorted(iterable, key=<function <lambda> at 0x749446c30cc0>, cmp=<function <lambda> at 0x749446c30d60>)
+beancount.utils.misc_utils.is_sorted(iterable, key=<function <lambda> at 0x77a20480cae0>, cmp=<function <lambda> at 0x77a20480cb80>)
Interfaces / Web
autobean/refactor (Archimedes Smith): Tooling to programmatically edit one's ledger, including formatting, sorting, refactoring, rearranging accounts, optimizing via plugins, migration from v2, inserting transactions in a ledger on import, and more.
seltzered/beancolage (Vivek Gani): An Eclipse Theia (vendor-agnostic vscode) app that tries to bundle existing beancount-based packages such as vscode-beancount and Fava.
aaronstacy.com/personal-finances-dashboard : HTML + D3.js visualization dashboard for Beancount data.
+https://github.com/aleyoscar/beancount-pulsar : A Pulsar package for Beancount - Plain Text Accounting, with syntax highlighting, toggling comments, snippets for some directives and automatic indentation. Pulsar package: https://web.pulsar-edit.dev/packages/beancount-pulsar
Beancount Mobile App (Kirill Goncharov): A mobile data entry app for Beancount. (Currently only Android is supported.) Repo: https://github.com/xuhcc/beancount-mobile (Announcement).
http://costflow.io: Plain Text Accounting in WeChat. "Send a message to our bot in Telegram, Facebook Messenger, Whatsapp, LINE, WeChat, etc. Costflow will transform your message into Beancount / Ledger / hledger format transaction magically. Append the transaction to the file in your Dropbox / Google Drive. With the help of their apps, the file will be synced to your computer."
diff --git a/importing_external_data.html b/importing_external_data.html
index 3ff99ad..ae603a7 100644
--- a/importing_external_data.html
+++ b/importing_external_data.html
@@ -93,9 +93,9 @@
I manually log into the various websites with my usernames & passwords and click the right buttons to generate the downloaded files I need. These files are recognized automatically by the importers and extracting transactions and filing the documents in a well-organized directory hierarchy is automated using the tools described in this document.
While I’m not scripting the fetching, I think it’s possible to do so on some sites. That work is left for you to implement where you think it’s worth the time.
-However, today, thanks to the open banking project, we have universal APIs that allow for the quick and reliable download of transactions from multiple bank accounts.
+Here’s a description of the typical kinds of files involved; this describes my use case and what I’ve managed to do. This should give you a qualitative sense of what’s involved.
I've made some headway toward converting data from PDF files, which is a common need, but it's incomplete; it turns out that fully automating table extraction from PDF isn't easy in the general case. I have some code that is close to working and will release it when the time is right. Otherwise, the best FOSS solution I’ve found for this is a tool called TabulaPDF but you still need to manually identify where the tables of data are located on the page; you may be able to automate some fetching using its sister project tabula-java.
Nevertheless, I usually have good success with my importers grepping around PDF statements converted to ugly text in order to identify what institution they are for and extracting the date of issuance of the document.
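The kind of grepping described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the institution names, the regular expressions, the "Statement date:" label, and the sample text are all hypothetical, and the text is assumed to have already been extracted from the PDF by whatever conversion tool works for that document.

```python
import re
from datetime import date

# Hypothetical patterns; real statements need patterns tuned to their own text.
INSTITUTION_PATTERNS = {
    "examplebank": r"Example\s+Bank",
    "samplebroker": r"Sample\s+Brokerage",
}

# Hypothetical label and date layout for the date of issuance.
DATE_PATTERN = re.compile(
    r"Statement\s+date:\s*(\d{4})-(\d{2})-(\d{2})", re.IGNORECASE
)


def identify_institution(text, patterns=INSTITUTION_PATTERNS):
    """Return the first institution key whose pattern occurs in the text."""
    for name, pattern in patterns.items():
        if re.search(pattern, text, re.IGNORECASE):
            return name
    return None


def extract_issue_date(text):
    """Return the date of issuance found in the text, or None if absent."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


# Stand-in for the "ugly text" produced by a PDF-to-text conversion.
sample_text = "EXAMPLE   BANK\nAccount summary\nStatement date: 2014-03-31\n"

print(identify_institution(sample_text))
print(extract_issue_date(sample_text))
```

Matching on loose whitespace and ignoring case makes the patterns more forgiving of the irregular spacing that converted PDF text tends to have.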
Finally, there are a number of different tools used to extract text from PDF documents, such as PDFMiner, LibreOffice, the xpdf library, the poppler library3 and more... but none of them works consistently on all input documents; you will likely end up installing many and relying on different ones for different input files. For this reason, I’m not requiring a dependency on PDF conversion tools from within Beancount. You should test what works on your specific documents and invoke those tools from your importer implementations.
+The open-source accounting software Firefly III already integrates with some free open banking APIs. For more information, you can visit Firefly III Documentation. The Beancount ecosystem is still lagging in this domain.
+An example of an open banking aggregator that could be interesting is GoCardless. GoCardless supports many PSD2-compliant banks in the EU and the UK, and it is free to use.
There are three Beancount tools provided to orchestrate the three stages of importing: