;import, print: better deduplication docs
parent f7bbb39a77
commit 554f7a59fd
				| @ -6,19 +6,64 @@ transactions as imported, without actually importing any. | |||||||
| 
 | 
 | ||||||
| _FLAGS | _FLAGS | ||||||
| 
 | 
 | ||||||
| The input files are specified as arguments - no need to write -f before each one. | Unlike other hledger commands, with `import` the journal file is an output file, | ||||||
| So eg to add new transactions from all CSV files to the main journal, it's just:  | and will be modified, though only by appending (existing data will not be changed). | ||||||
| `hledger import *.csv` | The input files are specified as arguments, so to import one or more | ||||||
|  | CSV files to your main journal, you will run `hledger import bank.csv` | ||||||
|  | or perhaps `hledger import *.csv`. | ||||||
| 
 | 
 | ||||||
| New transactions are detected in the same way as print --new:  | Note you can import from any file format, though CSV files are the | ||||||
| by assuming transactions are always added to the input files in increasing date order, | most common import source, and these docs focus on that case. | ||||||
| and by saving `.latest.FILE` state files. |  | ||||||
| 
 | 
 | ||||||
| The --dry-run output is in journal format, so you can filter it, eg  | ### Deduplication | ||||||
| to see only uncategorised transactions:  | 
 | ||||||
|  | As a convenience `import` does *deduplication* while reading transactions. | ||||||
|  | This does not mean "ignore transactions that look the same", | ||||||
|  | but rather "ignore transactions that have been seen before". | ||||||
|  | This is intended for when you are periodically importing foreign data | ||||||
|  | which may contain already-imported transactions. | ||||||
|  | So eg, if every day you download bank CSV files containing redundant data, | ||||||
|  | you can safely run `hledger import bank.csv` and only new transactions will be imported. | ||||||
|  | (`import` is idempotent.) | ||||||
|  | 
 | ||||||
|  | Since the items being read (CSV records, eg) often do not come with | ||||||
|  | unique identifiers, hledger detects new transactions by date, assuming | ||||||
|  | that: | ||||||
|  | 
 | ||||||
|  | 1. new items always have the newest dates | ||||||
|  | 2. item dates do not change across reads | ||||||
|  | 3. and items with the same date remain in the same relative order across reads. | ||||||
|  | 
 | ||||||
|  | These are often true of CSV files representing transactions, or true | ||||||
|  | enough so that it works pretty well in practice. 1 is important, but | ||||||
|  | violations of 2 and 3 amongst the old transactions won't matter (and | ||||||
|  | if you import often, the new transactions will be few, so less likely | ||||||
|  | to be the ones affected). | ||||||
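The date-based detection described above can be sketched in a few lines of shell. This is only an illustration of the idea, not hledger's actual implementation: the sample CSV records, the `latest` and `seen` variables, and the use of `awk` are all assumptions made for the example. It presumes we remember the latest date already processed and how many records on that date were processed, and that the file is in date order.

```shell
# Hypothetical input: date-ordered CSV records, as a bank might export them.
csv=$(mktemp)
cat > "$csv" <<'EOF'
2020-01-30,coffee,-3.00
2020-01-31,rent,-900.00
2020-01-31,salary,2500.00
2020-02-01,groceries,-42.50
EOF

# Remembered state (assumed for this sketch): the latest date processed,
# and how many records on that date had already been processed.
latest=2020-01-31
seen=1

# Keep records dated after $latest, plus records dated $latest beyond the
# first $seen of them. Appending "" forces string comparison, which orders
# ISO dates (YYYY-MM-DD) correctly.
new=$(awk -F, -v latest="$latest" -v seen="$seen" '
  ($1 "") >  (latest "") { print; next }
  ($1 "") == (latest "") { if (++n > seen) print }
' "$csv")

printf '%s\n' "$new"
rm -f "$csv"
```

Here the `rent` record is skipped because one record dated 2020-01-31 was already processed, while `salary` and `groceries` count as new. The sketch relies on the three assumptions listed above: if an old record were re-dated or reordered on the latest date, the wrong record could be skipped.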
|  | 
 | ||||||
|  | hledger remembers the latest date processed in each input file by | ||||||
|  | saving a hidden ".latest" state file in the same directory. Eg when | ||||||
|  | reading `finance/bank.csv`, it will look for and update the | ||||||
|  | `finance/.latest.bank.csv` state file.  | ||||||
|  | The format is simple: one or more lines containing the | ||||||
|  | same ISO-format date (YYYY-MM-DD), meaning "I have processed | ||||||
|  | transactions up to this date, and this many of them on that date." | ||||||
|  | Normally you won't see or manipulate these state files yourself. | ||||||
|  | But if needed, you can delete them to reset the state (making all | ||||||
|  | transactions "new"), or you can construct them to "catch up" to a | ||||||
|  | certain date.  | ||||||
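As a hedged illustration of resetting and "catching up", the following sketch builds and removes a state file by hand in a scratch directory. The directory, the date, and the count of two transactions are made up for the example; the file name and format follow the description above (one line per transaction already processed on the latest date).

```shell
# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/finance"
state="$workdir/finance/.latest.bank.csv"

# "Catch up": claim that everything up to 2020-01-31 has been processed,
# including two transactions dated 2020-01-31 (one line per transaction).
printf '2020-01-31\n2020-01-31\n' > "$state"
saved=$(cat "$state")
printf '%s\n' "$saved"

# Reset: delete the state file, so that the next import of bank.csv
# would treat every transaction as new.
rm -f "$state"
```

With real data you would do this next to the actual CSV file, and it is probably wise to check the effect with `hledger import --dry-run` before importing.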
|  | 
 | ||||||
|  | Note deduplication (and updating of state files) can also be done by | ||||||
|  | [`print --new`](#print), but this is less often used. | ||||||
|  | 
 | ||||||
|  | ### Import testing | ||||||
|  | 
 | ||||||
|  | With `--dry-run`, the transactions that will be imported are printed | ||||||
|  | to the terminal, without affecting your journal. | ||||||
|  | The output is in journal format, so you can re-parse it. | ||||||
|  | Eg, to see any importable transactions which CSV rules have not categorised: | ||||||
| 
 | 
 | ||||||
| ```shell | ```shell | ||||||
| $ hledger import --dry ... | hledger -f- print unknown --ignore-assertions | $ hledger import --dry bank.csv | hledger -f- -I print unknown | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| ### Importing balance assignments | ### Importing balance assignments | ||||||
| @ -41,4 +86,4 @@ please test it and send a pull request.) | |||||||
| ### Commodity display styles | ### Commodity display styles | ||||||
| 
 | 
 | ||||||
| Imported amounts will be formatted according to the canonical [commodity styles](hledger.html#commodity-display-style) | Imported amounts will be formatted according to the canonical [commodity styles](hledger.html#commodity-display-style) | ||||||
| (declared or inferred) in the main journal file. | (declared or inferred) in the main journal file. | ||||||
|  | |||||||
| @ -79,21 +79,9 @@ With `-m`/`--match` and a STR argument, print will show at most one transaction: | |||||||
| one whose description is most similar to STR, and is most recent. STR should contain at | one whose description is most similar to STR, and is most recent. STR should contain at | ||||||
| least two characters. If there is no similar-enough match, no transaction will be shown. | least two characters. If there is no similar-enough match, no transaction will be shown. | ||||||
| 
 | 
 | ||||||
| With `--new`, for each FILE being read, hledger reads (and writes) a special  | With `--new`, hledger prints only transactions it has not seen on a previous run. | ||||||
| state file (`.latest.FILE` in the same directory), containing the latest transaction date(s) | This uses the same deduplication system as the [`import`](#import) command. | ||||||
| that were seen last time FILE was read. When this file is found, only transactions  | (See import's docs for details.) | ||||||
| with newer dates (and new transactions on the latest date) are printed. |  | ||||||
| This is useful for ignoring already-seen entries in import data, such as downloaded CSV files. |  | ||||||
| Eg: |  | ||||||
| 
 |  | ||||||
| ```shell |  | ||||||
| $ hledger -f bank1.csv print --new |  | ||||||
| (shows transactions added since last print --new on this file) |  | ||||||
| ``` |  | ||||||
| 
 |  | ||||||
| This assumes that transactions added to FILE always have same or increasing dates,  |  | ||||||
| and that transactions on the same day do not get reordered. |  | ||||||
| See also the [import](#import) command.     |  | ||||||
| 
 | 
 | ||||||
| This command also supports the | This command also supports the | ||||||
| [output destination](hledger.html#output-destination) and | [output destination](hledger.html#output-destination) and | ||||||
|  | |||||||