;doc: update command help

Simon Michael 2024-03-24 14:22:37 -10:00
parent be24d6505f
commit 70b75e4921

@@ -21,48 +21,49 @@ hledger import bank.csv or perhaps hledger import *.csv.
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
-Deduplication
+"Deduplication"
-import does time-based deduplication, to detect only the new
-transactions since the last successful import. (This does not mean
-"ignore transactions that look the same", but rather "ignore
-transactions that have been seen before".) This is intended for when you
-are periodically importing downloaded data, which may overlap with
-previous downloads. Eg if every week (or every day) you download a
-bank's last three months of CSV data, you can safely run
-hledger import thebank.csv each time and only new transactions will be
-imported.
+import tries to import only the transactions which are new since the
+last import. So if your bank's CSV includes the last three months of
+data, you can download and import it every month (or week, or day) and
+only the new transactions will be imported each time.
-Since the items being read (CSV records, eg) often do not come with
-unique identifiers, hledger detects new transactions by date, assuming
-that:
+It works as follows. For each imported FILE (usually a CSV file): - It
+tries to find the latest date seen previously, by reading it from a
+hidden .latest.FILE in the same directory. - Then it processes FILE,
+ignoring any transactions on or before the "latest seen" date.
+And after a successful import, it updates the .latest.FILE(s) for next
+time (unless --dry-run was used).
+This is simple but fairly effective. It assumes:
 1. new items always have the newest dates
-2. item dates do not change across reads
-3. and items with the same date remain in the same relative order
-across reads.
+2. item dates are stable across successive CSV downloads
+3. the order of same-date items is stable across CSV downloads
-These are often true of CSV files representing transactions, or true
-enough so that it works pretty well in practice. 1 is important, but
-violations of 2 and 3 amongst the old transactions won't matter (and if
-you import often, the new transactions will be few, so less likely to be
-the ones affected).
+These are true of most CSV files representing transactions, or true
+enough. If you have a bank whose CSV dates or ordering occasionally
+changes, you can reduce the chance of this happening in new transactions
+by importing more often (and in old transactions it doesn't matter).
-hledger remembers the latest date processed in each input file by saving
-a hidden ".latest.FILE" file in FILE's directory (after a succesful
-import).
+Note, import avoids reprocessing the same dates across successive runs,
+but it does not detect transactions that are duplicated within a single
+run. So eg if you downloaded but did not import bank.1.csv, and later
+downloaded bank.2.csv with overlapping data, you should not import both
+of them in a single run (hledger import bank.1.csv bank.2.csv); instead,
+import them one at a time (hledger import bank.1.csv, then
+hledger import bank.2.csv).
-Eg when reading finance/bank.csv, it will look for and update the
-finance/.latest.bank.csv state file. The format is simple: one or more
-lines containing the same ISO-format date (YYYY-MM-DD), meaning "I have
-processed transactions up to this date, and this many of them on that
-date." Normally you won't see or manipulate these state files yourself.
-But if needed, you can delete them to reset the state (making all
-transactions "new"), or you can construct them to "catch up" to a
-certain date.
+Normally you can ignore the .latest.* files, but if needed, you can
+delete them (to make all transactions unseen), or construct/modify them
+(to catch up to a certain date). The format is just a single ISO-format
+date (YYYY-MM-DD), possibly repeated on multiple lines. It means "I have
+seen transactions up to this date, and this many of them occurring on
+that date".
-Note deduplication (and updating of state files) can also be done by
-print --new, but this is less often used.
+(hledger print --new also uses and updates these .latest.* files, but it
+is not often used.)
 Related: CSV > Working with CSV > Deduplicating, importing.
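
Reviewer note: the date-based scheme described in the new text can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not hledger's actual (Haskell) implementation. The function names `parse_latest`, `new_items` and `next_latest` are hypothetical, and the sketch assumes each record has already been reduced to a `(date, description)` pair. It reads the "on or before the latest seen date" rule together with the "this many of them occurring on that date" state format: records dated before the latest date are skipped, and on the latest date itself the first N records are skipped, where N is the number of lines in the state file.

```python
from datetime import date


def parse_latest(text):
    """Parse the contents of a .latest.FILE state file.

    The documented format is the same ISO date repeated on one or more lines.
    Returns (latest_date, count_seen_on_that_date), or (None, 0) if empty.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None, 0
    return date.fromisoformat(lines[0]), len(lines)


def new_items(items, latest, seen_on_latest):
    """Return the items not yet seen, given the saved state.

    items: list of (date, description) pairs in file order.
    Relies on the three documented assumptions: new items have the newest
    dates, dates are stable, and same-date order is stable.
    """
    if latest is None:
        return list(items)
    ordered = sorted(items, key=lambda it: it[0])  # stable: keeps same-date order
    result = []
    skipped_on_latest = 0
    for d, desc in ordered:
        if d < latest:
            continue  # strictly older than the latest seen date
        if d == latest and skipped_on_latest < seen_on_latest:
            skipped_on_latest += 1  # one of the N already seen on that date
            continue
        result.append((d, desc))
    return result


def next_latest(items):
    """Build new state file contents: the newest date, once per item on it."""
    newest = max(d for d, _ in items)
    count = sum(1 for d, _ in items if d == newest)
    return "\n".join([newest.isoformat()] * count) + "\n"


if __name__ == "__main__":
    records = [
        (date(2024, 3, 1), "coffee"),
        (date(2024, 3, 2), "rent"),
        (date(2024, 3, 2), "groceries"),
        (date(2024, 3, 3), "salary"),
    ]
    latest, seen = parse_latest("2024-03-02\n")
    for d, desc in new_items(records, latest, seen):
        print(d.isoformat(), desc)
    print(next_latest(records), end="")
```

With a state file containing `2024-03-02` once, the sketch treats "rent" as already seen and reports "groceries" and "salary" as new. It also shows why the caveat about importing overlapping files in a single run matters: the state is tracked per file and only consulted at the start of a run, so nothing here compares one file's records against another's.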