;import, print: better deduplication docs
parent f7bbb39a77
commit 554f7a59fd
@@ -6,19 +6,64 @@ transactions as imported, without actually importing any.

_FLAGS

The input files are specified as arguments - no need to write -f before each one.
So eg to add new transactions from all CSV files to the main journal, it's just:
`hledger import *.csv`
Unlike other hledger commands, with `import` the journal file is an output file,
and will be modified, though only by appending (existing data will not be changed).
The input files are specified as arguments, so to import one or more
CSV files to your main journal, you will run `hledger import bank.csv`
or perhaps `hledger import *.csv`.

New transactions are detected in the same way as print --new:
by assuming transactions are always added to the input files in increasing date order,
and by saving `.latest.FILE` state files.
Note you can import from any file format, though CSV files are the
most common import source, and these docs focus on that case.

The --dry-run output is in journal format, so you can filter it, eg
to see only uncategorised transactions:
### Deduplication

As a convenience `import` does *deduplication* while reading transactions.
This does not mean "ignore transactions that look the same",
but rather "ignore transactions that have been seen before".
This is intended for when you are periodically importing foreign data
which may contain already-imported transactions.
So eg, if every day you download bank CSV files containing redundant data,
you can safely run `hledger import bank.csv` and only new transactions will be imported.
(`import` is idempotent.)

Since the items being read (CSV records, eg) often do not come with
unique identifiers, hledger detects new transactions by date, assuming
that:

1. new items always have the newest dates
2. item dates do not change across reads
3. and items with the same date remain in the same relative order across reads.

These are often true of CSV files representing transactions, or true
enough so that it works pretty well in practice. 1 is important, but
violations of 2 and 3 amongst the old transactions won't matter (and
if you import often, the new transactions will be few, so less likely
to be the ones affected).
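
To make the date-based rule concrete, here is a rough sketch of how such a filter could behave. It is not hledger's implementation: the `new_items` function, the `DATE,description` record layout and the sample dates are all invented for illustration, and it simply relies on ISO dates sorting correctly as strings.

```shell
# Sketch only: a hypothetical filter, not hledger's actual code.
# Usage: new_items LATEST_DATE SEEN_COUNT
# Reads "DATE,description" records on stdin (oldest first) and prints
# the ones that would count as new under the three assumptions above.
new_items() {
  awk -F, -v latest="$1" -v seen="$2" '
    $1 > latest  { print; next }            # strictly newer date: always new
    $1 == latest { if (++n > seen) print }  # same date: skip the first SEEN_COUNT
  '
}

# Made-up sample data.
records='2024-03-14,coffee
2024-03-15,rent
2024-03-15,groceries
2024-03-16,salary'

# Suppose an earlier run had processed everything up to 2024-03-15,
# including one item on that date.
result=$(printf '%s\n' "$records" | new_items 2024-03-15 1)
printf '%s\n' "$result"
```

With this sample data the sketch prints the second 2024-03-15 record and the 2024-03-16 record. It also shows why assumption 3 matters: if the two records dated 2024-03-15 swapped places between reads, the wrong one would be skipped.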

hledger remembers the latest date processed in each input file by
saving a hidden ".latest" state file in the same directory. Eg when
reading `finance/bank.csv`, it will look for and update the
`finance/.latest.bank.csv` state file.
The format is simple: one or more lines containing the
same ISO-format date (YYYY-MM-DD), meaning "I have processed
transactions up to this date, and this many of them on that date."
Normally you won't see or manipulate these state files yourself.
But if needed, you can delete them to reset the state (making all
transactions "new"), or you can construct them to "catch up" to a
certain date.
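
As an illustration of the state file format described above, the following sketch builds one by hand in a scratch directory and then removes it again. It does not run hledger; the `state_file_for` helper, the scratch directory and the date 2024-03-15 are assumptions made up for this example.

```shell
# Hypothetical helper: map an input file to its state file path,
# following the finance/bank.csv -> finance/.latest.bank.csv example.
state_file_for() {
  printf '%s/.latest.%s\n' "$(dirname "$1")" "$(basename "$1")"
}

workdir=$(mktemp -d)            # scratch directory, so no real data is touched
mkdir -p "$workdir/finance"
state=$(state_file_for "$workdir/finance/bank.csv")

# "Catch up" by hand: record that two transactions were processed on
# 2024-03-15 (an illustrative date) by writing that date on two lines.
printf '%s\n' 2024-03-15 2024-03-15 > "$state"

latest_date=$(sort -u "$state")   # the single date the file repeats
seen_count=$(grep -c . "$state")  # number of lines = items seen on that date
echo "processed up to $latest_date, $seen_count transaction(s) on that date"

# Reset: with the state file gone, every transaction counts as new again.
rm "$state"
```

Before editing a real state file this way, it is probably wise to check the result with `--dry-run`.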

Note deduplication (and updating of state files) can also be done by
[`print --new`](#print), but this is less often used.

### Import testing

With `--dry-run`, the transactions that will be imported are printed
to the terminal, without affecting your journal.
The output is in journal format, so you can re-parse it.
Eg, to see any importable transactions which CSV rules have not categorised:

```shell
$ hledger import --dry ... | hledger -f- print unknown --ignore-assertions
$ hledger import --dry bank.csv | hledger -f- -I print unknown
```

### Importing balance assignments
@@ -41,4 +86,4 @@ please test it and send a pull request.)
### Commodity display styles

Imported amounts will be formatted according to the canonical [commodity styles](hledger.html#commodity-display-style)
(declared or inferred) in the main journal file.
(declared or inferred) in the main journal file.

@@ -79,21 +79,9 @@ With `-m`/`--match` and a STR argument, print will show at most one transaction:
one whose description is most similar to STR, and is most recent. STR should contain at
least two characters. If there is no similar-enough match, no transaction will be shown.

With `--new`, for each FILE being read, hledger reads (and writes) a special
state file (`.latest.FILE` in the same directory), containing the latest transaction date(s)
that were seen last time FILE was read. When this file is found, only transactions
with newer dates (and new transactions on the latest date) are printed.
This is useful for ignoring already-seen entries in import data, such as downloaded CSV files.
Eg:

```shell
$ hledger -f bank1.csv print --new
(shows transactions added since last print --new on this file)
```

This assumes that transactions added to FILE always have same or increasing dates,
and that transactions on the same day do not get reordered.
See also the [import](#import) command.
With `--new`, hledger prints only transactions it has not seen on a previous run.
This uses the same deduplication system as the [`import`](#import) command.
(See import's docs for details.)

This command also supports the
[output destination](hledger.html#output-destination) and