;doc: update command help

Simon Michael 2024-03-24 14:51:25 -10:00
parent eb6b94ad5a
commit 2889bb6efb

@ -21,22 +21,27 @@ hledger import bank.csv or perhaps hledger import *.csv.
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
-"Deduplication"
+Skipping
 import tries to import only the transactions which are new since the
-last import. So if your bank's CSV includes the last three months of
-data, you can download and import it every month (or week, or day) and
-only the new transactions will be imported each time.
+last import, "skipping over" any that it saw last time. So if your
+bank's CSV includes the last three months of data, you can download and
+import it every month (or week, or day) and only the new transactions
+will be imported each time.
-It works as follows. For each imported FILE (usually a CSV file): - It
-tries to find the latest date seen previously, by reading it from a
-hidden .latest.FILE in the same directory. - Then it processes FILE,
-ignoring any transactions on or before the "latest seen" date.
+It works as follows. For each imported FILE:
+- It tries to find the latest date seen previously, by reading it from
+a hidden .latest.FILE in the same directory.
+- Then it processes FILE, ignoring any transactions on or before the
+"latest seen" date.
 And after a successful import, it updates the .latest.FILE(s) for next
 time (unless --dry-run was used).
-This is simple but fairly effective. It assumes:
+This is simple system that works fairly well for transaction data
+(usually CSV, but it could be any of hledger's input formats). It
+assumes:
 1. new items always have the newest dates
 2. item dates are stable across successive CSV downloads
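The skipping procedure described in the new help text can be sketched in Python as below. This is a minimal illustration under stated assumptions, not hledger's implementation (hledger is written in Haskell): the helpers `latest_path`, `read_latest` and `import_new` are hypothetical, transactions are assumed to be already-parsed (date, description) tuples, and .latest.FILE is assumed to hold ISO dates, one per line.

```python
# Minimal sketch of the "skipping" steps, NOT hledger's actual code.
# Assumptions: transactions are (date, description) tuples parsed from FILE,
# and the hidden .latest.FILE holds ISO dates (YYYY-MM-DD), one per line.
from datetime import date
from pathlib import Path
from typing import List, Optional, Tuple

Txn = Tuple[date, str]


def latest_path(file: Path) -> Path:
    # The hidden state file sits next to FILE: bank.csv -> .latest.bank.csv
    return file.with_name(".latest." + file.name)


def read_latest(file: Path) -> Optional[date]:
    p = latest_path(file)
    if not p.exists():
        return None  # first import: nothing has been seen yet
    dates = [date.fromisoformat(s) for s in p.read_text().split()]
    return max(dates) if dates else None


def import_new(file: Path, txns: List[Txn], dry_run: bool = False) -> List[Txn]:
    latest = read_latest(file)
    # Ignore any transaction dated on or before the "latest seen" date.
    new = [t for t in txns if latest is None or t[0] > latest]
    # After a successful import, record the newest date for next time,
    # unless this is a dry run (then the state file is left untouched).
    if new and not dry_run:
        latest_path(file).write_text(max(d for d, _ in new).isoformat() + "\n")
    return new
```

In this simplified sketch, a transaction that shows up in a later download but is dated on or before the recorded date is silently skipped, which is why assumption 1 (new items always have the newest dates) matters.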
@ -49,11 +54,15 @@ by importing more often (and in old transactions it doesn't matter).
 Note, import avoids reprocessing the same dates across successive runs,
 but it does not detect transactions that are duplicated within a single
-run. So eg if you downloaded but did not import bank.1.csv, and later
-downloaded bank.2.csv with overlapping data, you should not import both
-of them in a single run (hledger import bank.1.csv bank.2.csv); instead,
-import them one at a time (hledger import bank.1.csv, then
-hledger import bank.2.csv).
+run. I'll call these "skipping" and "deduplication".
+So for example, say you downloaded but did not import bank.1.csv, and
+later downloaded bank.2.csv with overlapping data. Then you should not
+import both of them at once (hledger import bank.1.csv bank.2.csv), as
+the overlapping data would appear twice and not be deduplicated.
+Instead, import them one at a time
+(hledger import bank.1.csv; hledger import bank.2.csv), and the second
+import will skip the overlapping data.
 Normally you can ignore the .latest.* files, but if needed, you can
 delete them (to make all transactions unseen), or construct/modify them
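The bank.1.csv / bank.2.csv scenario in this hunk, and the difference between "skipping" and "deduplication", can be modelled with a small Python sketch. It assumes transactions are (date, description) tuples and, to mirror the help text's example, that a single latest-seen date is carried from one run to the next; `run_import` is a hypothetical helper, not part of hledger.

```python
# Minimal sketch contrasting skipping (across runs) with deduplication
# (within a run). Assumptions: (date, description) tuples, and one
# latest-seen date carried between runs. Not hledger's implementation.
from datetime import date
from typing import List, Optional, Tuple

Txn = Tuple[date, str]


def run_import(txns: List[Txn], latest: Optional[date]) -> Tuple[List[Txn], Optional[date]]:
    """One import run: skip items dated on or before `latest`.

    Returns the imported items and the latest date to remember next time.
    Note there is no duplicate check among the items of this run.
    """
    imported = [t for t in txns if latest is None or t[0] > latest]
    new_latest = max([d for d, _ in imported], default=latest)
    return imported, new_latest


bank1 = [(date(2024, 3, 1), "coffee"), (date(2024, 3, 2), "rent")]
bank2 = [(date(2024, 3, 2), "rent"), (date(2024, 3, 3), "groceries")]  # overlaps bank1

# Both files in a single run: nothing was seen before, so the overlapping
# "rent" item is imported twice. Skipping does not deduplicate.
together, _ = run_import(bank1 + bank2, None)

# One file per run: the second run skips everything up to 2024-03-02.
first, latest = run_import(bank1, None)
second, _ = run_import(bank2, latest)

print(len(together))  # 4
print(second)         # [(datetime.date(2024, 3, 3), 'groceries')]
```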
@ -63,7 +72,7 @@ seen transactions up to this date, and this many of them occurring on
 that date".
 (hledger print --new also uses and updates these .latest.* files, but it
-is not often used.)
+is less often used.)
 Related: CSV > Working with CSV > Deduplicating, importing.