;import, print: better deduplication docs

2021-02-18 18:35:06 -08:00
commit 554f7a59fd
parent f7bbb39a77
2 changed files with 58 additions and 25 deletions
--- a/hledger/Hledger/Cli/Commands/Import.md
+++ b/hledger/Hledger/Cli/Commands/Import.md
@@ -6,19 +6,64 @@ transactions as imported, without actually importing any.
 _FLAGS
-The input files are specified as arguments - no need to write -f before each one.
-So eg to add new transactions from all CSV files to the main journal, it's just:
-`hledger import *.csv`
-New transactions are detected in the same way as print --new:
-by assuming transactions are always added to the input files in increasing date order,
-and by saving `.latest.FILE` state files.
-The --dry-run output is in journal format, so you can filter it, eg
-to see only uncategorised transactions:
+Unlike other hledger commands, with `import` the journal file is an output file,
+and will be modified, though only by appending (existing data will not be changed).
+The input files are specified as arguments, so to import one or more
+CSV files to your main journal, you will run `hledger import bank.csv`
+or perhaps `hledger import *.csv`.
+Note you can import from any file format, though CSV files are the
+most common import source, and these docs focus on that case.
+### Deduplication
+
+As a convenience `import` does *deduplication* while reading transactions.
+This does not mean "ignore transactions that look the same",
+but rather "ignore transactions that have been seen before".
+This is intended for when you are periodically importing foreign data
+which may contain already-imported transactions.
+So eg, if every day you download bank CSV files containing redundant data,
+you can safely run `hledger import bank.csv` and only new transactions will be imported.
+(`import` is idempotent.)
+Since the items being read (CSV records, eg) often do not come with
+unique identifiers, hledger detects new transactions by date, assuming
+that:
+1. new items always have the newest dates
+2. item dates do not change across reads
+3. and items with the same date remain in the same relative order across reads.
+These are often true of CSV files representing transactions, or true
+enough so that it works pretty well in practice. 1 is important, but
+violations of 2 and 3 amongst the old transactions won't matter (and
+if you import often, the new transactions will be few, so less likely
+to be the ones affected).
+hledger remembers the latest date processed in each input file by
+saving a hidden ".latest" state file in the same directory. Eg when
+reading `finance/bank.csv`, it will look for and update the
+`finance/.latest.bank.csv` state file.
+The format is simple: one or more lines containing the
+same ISO-format date (YYYY-MM-DD), meaning "I have processed
+transactions up to this date, and this many of them on that date."
+Normally you won't see or manipulate these state files yourself.
+But if needed, you can delete them to reset the state (making all
+transactions "new"), or you can construct them to "catch up" to a
+certain date.
+Note deduplication (and updating of state files) can also be done by
+[`print --new`](#print), but this is less often used.
+### Import testing
+With `--dry-run`, the transactions that will be imported are printed
+to the terminal, without affecting your journal.
+The output is in journal format, so you can re-parse it.
+Eg, to see any importable transactions which CSV rules have not categorised:
 ```shell
-$ hledger import --dry ... | hledger -f- print unknown --ignore-assertions
+$ hledger import --dry bank.csv | hledger -f- -I print unknown
 ```
 ### Importing balance assignments
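
For readers of this diff, the date-based deduplication described in the new Deduplication section can be illustrated with a small Python sketch. This is not hledger's implementation (hledger is written in Haskell); the function names `parse_state` and `new_items` and the sample data are hypothetical, and the sketch only assumes what the docs state: the state file holds one or more copies of the latest date, and the number of copies is how many items on that date were already processed.

```python
from datetime import date


def parse_state(text):
    """Parse state-file text into (latest_date, count_seen_on_that_date).

    Returns None when the text is empty, i.e. nothing has been seen yet.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None
    # Every line is assumed to hold the same ISO date (YYYY-MM-DD);
    # the number of lines is the count already processed on that date.
    return date.fromisoformat(lines[0]), len(lines)


def new_items(items, state):
    """Return the items not covered by the state, in date order.

    `items` is a list of (date, description) pairs in file order.
    """
    if state is None:
        return list(items)  # no state file: everything is new
    latest, seen = state
    result = []
    same_day = 0
    # sorted() is stable, so items sharing a date keep their file order.
    for d, desc in sorted(items, key=lambda it: it[0]):
        if d > latest:
            result.append((d, desc))
        elif d == latest:
            same_day += 1
            if same_day > seen:  # skip the first `seen` items on the latest date
                result.append((d, desc))
    return result


# Hypothetical sample: two items on 2021-02-17 were processed earlier.
state = parse_state("2021-02-17\n2021-02-17\n")
items = [
    (date(2021, 2, 16), "coffee"),
    (date(2021, 2, 17), "rent"),
    (date(2021, 2, 17), "groceries"),
    (date(2021, 2, 17), "books"),
    (date(2021, 2, 18), "salary"),
]
print([desc for _, desc in new_items(items, state)])  # ['books', 'salary']
```

The sketch shows why assumption 1 matters most: an item that appears later with a date older than the recorded one is silently skipped, whereas reordering among items older than the recorded date has no effect.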
--- a/hledger/Hledger/Cli/Commands/Print.md
+++ b/hledger/Hledger/Cli/Commands/Print.md
@@ -79,21 +79,9 @@ With `-m`/`--match` and a STR argument, print will show at most one transaction:
 one whose description is most similar to STR, and is most recent. STR should contain at
 least two characters. If there is no similar-enough match, no transaction will be shown.
-With `--new`, for each FILE being read, hledger reads (and writes) a special
-state file (`.latest.FILE` in the same directory), containing the latest transaction date(s)
-that were seen last time FILE was read. When this file is found, only transactions
-with newer dates (and new transactions on the latest date) are printed.
-This is useful for ignoring already-seen entries in import data, such as downloaded CSV files.
-Eg:
-```shell
-$ hledger -f bank1.csv print --new
-(shows transactions added since last print --new on this file)
-```
-This assumes that transactions added to FILE always have same or increasing dates,
-and that transactions on the same day do not get reordered.
-See also the [import](#import) command.
+With `--new`, hledger prints only transactions it has not seen on a previous run.
+This uses the same deduplication system as the [`import`](#import) command.
+(See import's docs for details.)
 This command also supports the
 [output destination](hledger.html#output-destination) and