hledger/Import.md at 00eb0aa16b7be068ff7f7cb0bf29bb73413c8312

Simon Michael c787286844 imp: doc: show flags help in manuals

Each CMD.md file now contains a snapshot of the flags help as rendered
by --help. For now these must be updated manually.

2024-06-07 06:55:33 -07:00


import

Import new transactions from one or more data files to the main journal.

Flags:
     --catchup              just mark all transactions as already imported
     --dry-run              just show the transactions to be imported

This command detects new transactions in each FILE argument since it was last run, and appends them to the main journal.

Or with --dry-run, it just prints the transactions that would be added.

Or with --catchup, it just marks all of the FILEs’ current transactions as already imported.

This is one of the few hledger commands that writes to the journal file (see also add). It only appends; existing data will not be changed.

The input files are specified as arguments, so to import one or more CSV files to your main journal, you would run hledger import bank.csv or perhaps hledger import *.csv.

Note you can import from any file format, though CSV files are the most common import source, and these docs focus on that case. The target file (main journal) should be in journal format.

Date skipping

import tries to import only the transactions which are new since the last import, ignoring any that it has seen in previous runs. So if your bank’s CSV includes the last three months of data, you can download and import it every month (or week, or day) and only the new transactions will be imported each time.

It works as follows: for each imported FILE,

It tries to read the latest date previously seen, from .latest.FILE in the same directory
Then it processes FILE, ignoring transactions on or before that date

And after a successful import, unless --dry-run was used, it updates the .latest.FILE(s) for next time.

This is a simple system that works for most real-world CSV files; it assumes the following are true, or true enough:

the name of the input file is stable across successive downloads
new items always have the newest dates
item dates are stable across downloads
the order of same-date items is stable across downloads.
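
The date-skipping steps above can be pictured with a small shell sketch. It is not hledger's implementation, only an illustration under assumptions: the CSV layout, the sample data and the variable names are invented, and the skipping rule follows the .latest file format described later in this section (skip earlier dates, plus the already-seen number of items on the latest date).

```shell
# Illustration only: hledger does this internally; nothing here calls hledger.
cd "$(mktemp -d)" || exit 1

# An invented CSV download: date,description,amount (oldest first).
cat > bank.csv <<'EOF'
2024-05-01,coffee,-3.00
2024-05-02,rent,-900.00
2024-05-02,groceries,-45.10
2024-05-03,salary,2000.00
EOF

# State from an imagined earlier run: one item dated 2024-05-02 was seen.
printf '2024-05-02\n' > .latest.bank.csv

latest_date=$(head -n 1 .latest.bank.csv)      # the latest date seen
seen_on_latest=$(grep -c . .latest.bank.csv)   # how many items seen on that date

# Keep rows dated after latest_date, and rows on latest_date beyond
# the ones already seen. ISO dates compare correctly as strings.
new_rows=$(awk -F, -v d="$latest_date" -v n="$seen_on_latest" '
  $1 > d  { print; next }
  $1 == d { if (++k > n) print }
' bank.csv)

printf '%s\n' "$new_rows"
```

Here the second 2024-05-02 row and the 2024-05-03 row are treated as new. The sketch also shows why the assumptions matter: if the bank reordered the two 2024-05-02 rows between downloads, the wrong one would be skipped.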

Tips:

To help ensure a stable file name, remember you can use a CSV rules file as an input file.
If you have a bank whose CSV dates or ordering occasionally change, you can reduce the chance of this happening in new transactions by importing more often. (If it happens in old transactions, that’s harmless.)

Note this is just one kind of “deduplication”: not reprocessing the same dates across successive runs. import doesn’t detect other kinds of duplication, such as the same transaction appearing multiple times within a single run, or a new transaction that looks identical to a transaction already in the journal. (Because these can happen legitimately in real-world data.)

Here’s a situation where you need to run import with care: say you download but forget to import bank.1.csv, and a week later you download bank.2.csv with some overlapping data. You should not process both of these as a single import (hledger import bank.1.csv bank.2.csv), because the overlapping transactions would not be deduplicated. Instead, import one file at a time, using the same filename each time:

$ mv bank.1.csv bank.csv; hledger import bank.csv
$ mv bank.2.csv bank.csv; hledger import bank.csv

Normally you don’t need to think about .latest.* files, but you can create or modify them to catch up to a certain date, or delete them to mark all transactions as new. Their format is a single ISO-format YYYY-MM-DD date, optionally repeated on multiple lines, meaning “I have seen the transactions before this date, and this many of them on this date”.
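
As a minimal sketch of editing these state files by hand, the following writes and then removes a .latest file for a hypothetical input file named bank.csv. The dates are invented, and it runs in a scratch directory so no real state is touched.

```shell
# Work in a scratch directory; adapt the path for real use.
cd "$(mktemp -d)" || exit 1

# Catch up: record that everything before 2024-06-01, plus two
# transactions dated 2024-06-01, has already been seen in bank.csv.
printf '2024-06-01\n2024-06-01\n' > .latest.bank.csv

latest_date=$(head -n 1 .latest.bank.csv)   # the date repeated on each line
seen_count=$(grep -c . .latest.bank.csv)    # number of lines = items seen on that date
echo "latest date: $latest_date, seen on that date: $seen_count"

# Reset: with no state file, the next import treats every transaction as new.
rm -f .latest.bank.csv
```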

hledger print --new also uses and updates these .latest.* files, but it is less often used.

Import testing

With --dry-run, the transactions that will be imported are printed to the terminal, without updating your journal or state files. The output is valid journal format, like the print command, so you can re-parse it. Eg, to see any importable transactions which CSV rules have not categorised:

$ hledger import --dry-run bank.csv | hledger -f- -I print unknown

or (live updating):

$ ls bank.csv* | entr bash -c 'echo ====; hledger import --dry-run bank.csv | hledger -f- -I print unknown'

Note: when importing from multiple files at once, it’s currently possible for some .latest files to be updated successfully, while the actual import fails because of a problem in one of the files, leaving them out of sync (and causing some transactions to be missed). To prevent this, do a --dry-run first and fix any problems before the real import.
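
One way to make the dry-run-first habit automatic is a small wrapper function. This is a sketch, not part of hledger: the function name safe_import and the HLEDGER override variable are hypothetical, and it assumes hledger exits with a non-zero status when a file cannot be read or parsed.

```shell
# Hypothetical helper: dry-run first, import for real only if that succeeds.
# HLEDGER can point at another executable; it defaults to "hledger".
safe_import() {
  "${HLEDGER:-hledger}" import --dry-run "$@" >/dev/null || {
    echo "dry run failed; nothing imported" >&2
    return 1
  }
  "${HLEDGER:-hledger}" import "$@"
}

# Usage (not run here): safe_import bank1.csv bank2.csv
```

The dry run's output is discarded because only its exit status matters here; drop the redirection if you want to review the transactions before they are appended.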

Importing balance assignments

Entries added by import will have their posting amounts made explicit (like hledger print -x). This means that any balance assignments in imported files must be evaluated; but, imported files don’t get to see the main file’s account balances. As a result, importing entries with balance assignments (eg from an institution that provides only balances and not posting amounts) will probably generate incorrect posting amounts. To avoid this problem, use print instead of import:

$ hledger print IMPORTFILE [--new] >> $LEDGER_FILE

(If you think import should leave amounts implicit like print does, please test it and send a pull request.)

Import and commodity styles

Amounts in entries added by import will be formatted according to the journal’s canonical commodity styles, as declared by commodity directives or inferred from the journal’s amounts.

Related: CSV > Amount decimal places.
