;import, print: better deduplication docs
parent
f7bbb39a77
commit
554f7a59fd
@@ -6,19 +6,64 @@ transactions as imported, without actually importing any.
|
|||||||
|
|
||||||
_FLAGS
|
_FLAGS
|
||||||
|
|
||||||
The input files are specified as arguments - no need to write -f before each one.
|
Unlike other hledger commands, with `import` the journal file is an output file,
|
||||||
So eg to add new transactions from all CSV files to the main journal, it's just:
|
and will be modified, though only by appending (existing data will not be changed).
|
||||||
`hledger import *.csv`
|
The input files are specified as arguments, so to import one or more
|
||||||
|
CSV files to your main journal, you will run `hledger import bank.csv`
|
||||||
|
or perhaps `hledger import *.csv`.
|
||||||
|
|
||||||
New transactions are detected in the same way as print --new:
|
Note you can import from any file format, though CSV files are the
|
||||||
by assuming transactions are always added to the input files in increasing date order,
|
most common import source, and these docs focus on that case.
|
||||||
and by saving `.latest.FILE` state files.
|
|
||||||
|
|
||||||
The --dry-run output is in journal format, so you can filter it, eg
|
### Deduplication
|
||||||
to see only uncategorised transactions:
|
|
||||||
|
As a convenience `import` does *deduplication* while reading transactions.
|
||||||
|
This does not mean "ignore transactions that look the same",
|
||||||
|
but rather "ignore transactions that have been seen before".
|
||||||
|
This is intended for when you are periodically importing foreign data
|
||||||
|
which may contain already-imported transactions.
|
||||||
|
So eg, if every day you download bank CSV files containing redundant data,
|
||||||
|
you can safely run `hledger import bank.csv` and only new transactions will be imported.
|
||||||
|
(`import` is idempotent.)
|
||||||
|
|
||||||
|
Since the items being read (CSV records, eg) often do not come with
|
||||||
|
unique identifiers, hledger detects new transactions by date, assuming
|
||||||
|
that:
|
||||||
|
|
||||||
|
1. new items always have the newest dates
|
||||||
|
2. item dates do not change across reads
|
||||||
|
3. and items with the same date remain in the same relative order across reads.
|
||||||
|
|
||||||
|
These are often true of CSV files representing transactions, or true
|
||||||
|
enough so that it works pretty well in practice. 1 is important, but
|
||||||
|
violations of 2 and 3 amongst the old transactions won't matter (and
|
||||||
|
if you import often, the new transactions will be few, so less likely
|
||||||
|
to be the ones affected).
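
The date-based detection described above can be sketched roughly as follows. This is only an illustration, not hledger's implementation: the `new_rows` function name is hypothetical, and it assumes a header-less CSV whose first field is an ISO date (YYYY-MM-DD) and whose rows are in increasing date order.

```shell
# Sketch only, not hledger's actual code.
# usage: new_rows LATEST_DATE SEEN_COUNT FILE
#   LATEST_DATE: latest date processed on the previous read (YYYY-MM-DD)
#   SEEN_COUNT:  how many items on that date were already processed
new_rows() {
  awk -F, -v latest="$1" -v seen="$2" '
    $1 > latest { print; next }                # newer date: always new
    $1 == latest && ++count > seen { print }   # same date: skip the items already seen
  ' "$3"
}
```

Eg `new_rows 2024-01-02 1 bank.csv` would print any rows dated after 2024-01-02, plus all but the first row dated 2024-01-02. ISO dates compare correctly as plain strings, which is why no date parsing is needed here.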
|
||||||
|
|
||||||
|
hledger remembers the latest date processed in each input file by
|
||||||
|
saving a hidden ".latest" state file in the same directory. Eg when
|
||||||
|
reading `finance/bank.csv`, it will look for and update the
|
||||||
|
`finance/.latest.bank.csv` state file.
|
||||||
|
The format is simple: one or more lines containing the
|
||||||
|
same ISO-format date (YYYY-MM-DD), meaning "I have processed
|
||||||
|
transactions up to this date, and this many of them on that date."
|
||||||
|
Normally you won't see or manipulate these state files yourself.
|
||||||
|
But if needed, you can delete them to reset the state (making all
|
||||||
|
transactions "new"), or you can construct them to "catch up" to a
|
||||||
|
certain date.
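
As a rough sketch of constructing a state file by hand, following the format described above (the `finance/bank.csv` path is the example used in these docs; the date and count are made up for illustration):

```shell
# Sketch only: the date and the number of lines are illustrative.
mkdir -p finance
# Two lines with the same date, meaning "processed up to 2024-01-15,
# and two transactions on that date".
printf '%s\n' 2024-01-15 2024-01-15 > finance/.latest.bank.csv
# To reset instead (making all transactions "new"), delete the state file:
# rm finance/.latest.bank.csv
```

After this, the next read of `finance/bank.csv` should treat only later transactions (and any beyond the first two on 2024-01-15) as new.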
|
||||||
|
|
||||||
|
Note deduplication (and updating of state files) can also be done by
|
||||||
|
[`print --new`](#print), but this is less often used.
|
||||||
|
|
||||||
|
### Import testing
|
||||||
|
|
||||||
|
With `--dry-run`, the transactions that will be imported are printed
|
||||||
|
to the terminal, without affecting your journal.
|
||||||
|
The output is in journal format, so you can re-parse it.
|
||||||
|
Eg, to see any importable transactions which CSV rules have not categorised:
|
||||||
|
|
||||||
```shell
|
```shell
|
||||||
$ hledger import --dry ... | hledger -f- print unknown --ignore-assertions
|
$ hledger import --dry bank.csv | hledger -f- -I print unknown
|
||||||
```
|
```
|
||||||
|
|
||||||
### Importing balance assignments
|
### Importing balance assignments
|
||||||
|
|||||||
@@ -79,21 +79,9 @@ With `-m`/`--match` and a STR argument, print will show at most one transaction:
|
|||||||
one whose description is most similar to STR, and is most recent. STR should contain at
|
one whose description is most similar to STR, and is most recent. STR should contain at
|
||||||
least two characters. If there is no similar-enough match, no transaction will be shown.
|
least two characters. If there is no similar-enough match, no transaction will be shown.
|
||||||
|
|
||||||
With `--new`, for each FILE being read, hledger reads (and writes) a special
|
With `--new`, hledger prints only transactions it has not seen on a previous run.
|
||||||
state file (`.latest.FILE` in the same directory), containing the latest transaction date(s)
|
This uses the same deduplication system as the [`import`](#import) command.
|
||||||
that were seen last time FILE was read. When this file is found, only transactions
|
(See import's docs for details.)
|
||||||
with newer dates (and new transactions on the latest date) are printed.
|
|
||||||
This is useful for ignoring already-seen entries in import data, such as downloaded CSV files.
|
|
||||||
Eg:
|
|
||||||
|
|
||||||
```shell
|
|
||||||
$ hledger -f bank1.csv print --new
|
|
||||||
(shows transactions added since last print --new on this file)
|
|
||||||
```
|
|
||||||
|
|
||||||
This assumes that transactions added to FILE always have same or increasing dates,
|
|
||||||
and that transactions on the same day do not get reordered.
|
|
||||||
See also the [import](#import) command.
|
|
||||||
|
|
||||||
This command also supports the
|
This command also supports the
|
||||||
[output destination](hledger.html#output-destination) and
|
[output destination](hledger.html#output-destination) and
|
||||||
|
|||||||