;doc: update command help
parent be24d6505f
commit 70b75e4921
@@ -21,48 +21,49 @@ hledger import bank.csv or perhaps hledger import *.csv.
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
 
-Deduplication
+"Deduplication"
 
-import does time-based deduplication, to detect only the new
-transactions since the last successful import. (This does not mean
-"ignore transactions that look the same", but rather "ignore
-transactions that have been seen before".) This is intended for when you
-are periodically importing downloaded data, which may overlap with
-previous downloads. Eg if every week (or every day) you download a
-bank's last three months of CSV data, you can safely run
-hledger import thebank.csv each time and only new transactions will be
-imported.
+import tries to import only the transactions which are new since the
+last import. So if your bank's CSV includes the last three months of
+data, you can download and import it every month (or week, or day) and
+only the new transactions will be imported each time.
 
-Since the items being read (CSV records, eg) often do not come with
-unique identifiers, hledger detects new transactions by date, assuming
-that:
+It works as follows. For each imported FILE (usually a CSV file): - It
+tries to find the latest date seen previously, by reading it from a
+hidden .latest.FILE in the same directory. - Then it processes FILE,
+ignoring any transactions on or before the "latest seen" date.
 
+And after a successful import, it updates the .latest.FILE(s) for next
+time (unless --dry-run was used).
+
+This is simple but fairly effective. It assumes:
+
 1. new items always have the newest dates
-2. item dates do not change across reads
-3. and items with the same date remain in the same relative order
-across reads.
+2. item dates are stable across successive CSV downloads
+3. the order of same-date items is stable across CSV downloads
 
-These are often true of CSV files representing transactions, or true
-enough so that it works pretty well in practice. 1 is important, but
-violations of 2 and 3 amongst the old transactions won't matter (and if
-you import often, the new transactions will be few, so less likely to be
-the ones affected).
+These are true of most CSV files representing transactions, or true
+enough. If you have a bank whose CSV dates or ordering occasionally
+changes, you can reduce the chance of this happening in new transactions
+by importing more often (and in old transactions it doesn't matter).
 
-hledger remembers the latest date processed in each input file by saving
-a hidden ".latest.FILE" file in FILE's directory (after a succesful
-import).
+Note, import avoids reprocessing the same dates across successive runs,
+but it does not detect transactions that are duplicated within a single
+run. So eg if you downloaded but did not import bank.1.csv, and later
+downloaded bank.2.csv with overlapping data, you should not import both
+of them in a single run (hledger import bank.1.csv bank.2.csv); instead,
+import them one at a time (hledger import bank.1.csv, then
+hledger import bank.2.csv).
 
-Eg when reading finance/bank.csv, it will look for and update the
-finance/.latest.bank.csv state file. The format is simple: one or more
-lines containing the same ISO-format date (YYYY-MM-DD), meaning "I have
-processed transactions up to this date, and this many of them on that
-date." Normally you won't see or manipulate these state files yourself.
-But if needed, you can delete them to reset the state (making all
-transactions "new"), or you can construct them to "catch up" to a
-certain date.
+Normally you can ignore the .latest.* files, but if needed, you can
+delete them (to make all transactions unseen), or construct/modify them
+(to catch up to a certain date). The format is just a single ISO-format
+date (YYYY-MM-DD), possibly repeated on multiple lines. It means "I have
+seen transactions up to this date, and this many of them occurring on
+that date".
 
-Note deduplication (and updating of state files) can also be done by
-print --new, but this is less often used.
+(hledger print --new also uses and updates these .latest.* files, but it
+is not often used.)
 
 Related: CSV > Working with CSV > Deduplicating, importing.
 
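The selection step in the new help text is described only in prose. Below is a minimal Python sketch of it, assuming each CSV record has already been parsed into a `(date, line)` pair in file order. `select_new`, `next_state` and the `Record` shape are hypothetical names for this illustration, not hledger's API (hledger itself is written in Haskell). The prose says transactions "on or before" the latest seen date are ignored, while the state-file description ("this many of them occurring on that date") suggests only that many same-date transactions are skipped; the sketch assumes the latter reading. Reading and writing the .latest.FILE, and honouring --dry-run, are left out.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical record shape: (transaction date, raw CSV line), in file order.
Record = Tuple[date, str]


def select_new(records: List[Record], latest: Optional[date], seen: int) -> List[Record]:
    """Return the records considered new, preserving file order.

    `latest` is the date from the state file (None if there is none) and
    `seen` is how many transactions on that date were already imported.
    """
    if latest is None:
        return list(records)  # no saved state: every record is new
    new: List[Record] = []
    same_date = 0
    for d, line in records:
        if d < latest:
            continue  # older than the saved date: assumed already imported
        if d == latest:
            same_date += 1
            if same_date <= seen:
                continue  # one of the `seen` records already imported on that date
        new.append((d, line))
    return new


def next_state(records: List[Record], latest: Optional[date], seen: int) -> Tuple[Optional[date], int]:
    """Return the (date, count) pair to save after a successful import."""
    candidates = [d for d, _ in records if latest is None or d >= latest]
    if not candidates:
        return latest, seen  # nothing at or after the saved date: keep the old state
    newest = max(candidates)
    return newest, candidates.count(newest)


if __name__ == "__main__":
    records = [
        (date(2024, 2, 28), "a"),
        (date(2024, 3, 1), "b"),
        (date(2024, 3, 1), "c"),
        (date(2024, 3, 1), "d"),
        (date(2024, 3, 2), "e"),
    ]
    # Saved state: seen up to 2024-03-01, two transactions on that date.
    print([line for _, line in select_new(records, date(2024, 3, 1), 2)])  # ['d', 'e']
    print(next_state(records, date(2024, 3, 1), 2))  # (datetime.date(2024, 3, 2), 1)
```

Skipping the first `seen` records on the saved date is where assumption 3 bites in this sketch: if the bank reorders same-date items between downloads, a different record from the one actually imported would be skipped.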
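The .latest.FILE format described in the new text (a single ISO date, repeated once per transaction seen on that date) can be read and written with two small helpers. This is a hedged Python sketch: `parse_latest` and `render_latest` are hypothetical names, and raising an error on mixed dates is this sketch's own choice rather than documented hledger behaviour.

```python
from datetime import date
from typing import Optional, Tuple


def parse_latest(text: str) -> Tuple[Optional[date], int]:
    """Parse a .latest.FILE body into (date, number of transactions seen on it)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None, 0  # empty state: nothing seen yet
    dates = {date.fromisoformat(ln) for ln in lines}
    if len(dates) != 1:
        # This sketch's own choice; the format expects one date repeated.
        raise ValueError(f"expected a single repeated date, got {sorted(dates)}")
    return dates.pop(), len(lines)


def render_latest(latest: date, count: int) -> str:
    """Render state-file text: the ISO date on `count` lines."""
    return f"{latest.isoformat()}\n" * count


if __name__ == "__main__":
    print(parse_latest("2024-03-01\n2024-03-01\n"))  # (datetime.date(2024, 3, 1), 2)
    print(render_latest(date(2024, 3, 2), 1), end="")  # 2024-03-02
```

Under this reading, constructing a file by hand to catch up to a certain date amounts to writing that date once per transaction already imported on it.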