|
|
|
|
@@ -6,123 +6,156 @@ Flags:
|
|
|
|
|
--catchup just mark all transactions as already imported
|
|
|
|
|
--dry-run just show the transactions to be imported
|
|
|
|
|
|
|
|
|
|
This command detects new transactions in each FILE argument since it was
|
|
|
|
|
last run, and appends them to the main journal.
|
|
|
|
|
This command detects new transactions in one or more data files
|
|
|
|
|
specified as arguments, and appends them to the main journal.
|
|
|
|
|
|
|
|
|
|
Or with --dry-run, it just prints the transactions that would be added.
|
|
|
|
|
You can import from any input file format hledger supports, but
|
|
|
|
|
CSV/SSV/TSV files, downloaded from financial institutions, are the most
|
|
|
|
|
common import source.
|
|
|
|
|
|
|
|
|
|
Or with --catchup, it just marks all of the FILEs' current transactions
|
|
|
|
|
as already imported.
|
|
|
|
|
The import destination is the default journal file, or another specified
|
|
|
|
|
in the usual way with $LEDGER_FILE or -f/--file. It should be in journal
|
|
|
|
|
format.
|
|
|
|
|
|
|
|
|
|
This is one of the few hledger commands that writes to the journal file
|
|
|
|
|
(see also add). It only appends; existing data will not be changed.
|
|
|
|
|
Examples:
|
|
|
|
|
|
|
|
|
|
The input files are specified as arguments, so to import one or more CSV
|
|
|
|
|
files to your main journal, you would run hledger import bank.csv or
|
|
|
|
|
perhaps hledger import *.csv.
|
|
|
|
|
$ hledger import bank1-checking.csv bank1-savings.csv
|
|
|
|
|
|
|
|
|
|
Note you can import from any file format, though CSV files are the most
|
|
|
|
|
common import source, and these docs focus on that case. The target file
|
|
|
|
|
(main journal) should be in journal format.
|
|
|
|
|
$ hledger import *.csv
|
|
|
|
|
|
|
|
|
|
Date skipping
|
|
|
|
|
Import preview
|
|
|
|
|
|
|
|
|
|
import tries to import only the transactions which are new since the
|
|
|
|
|
last import, ignoring any that it has seen in previous runs. So if your
|
|
|
|
|
bank's CSV includes the last three months of data, you can download and
|
|
|
|
|
import it every month (or week, or day) and only the new transactions
|
|
|
|
|
will be imported each time.
|
|
|
|
|
It's useful to preview the import by running first with --dry-run, to
|
|
|
|
|
sanity check the range of dates being imported, and to check the effect
|
|
|
|
|
of your conversion rules if converting from CSV. Eg:
|
|
|
|
|
|
|
|
|
|
It works as follows: for each imported FILE,
|
|
|
|
|
$ hledger import bank.csv --dry-run
|
|
|
|
|
|
|
|
|
|
- It tries to read the latest date previously seen, from .latest.FILE
|
|
|
|
|
in the same directory
|
|
|
|
|
- Then it processes FILE, ignoring transactions on or before that date
|
|
|
|
|
The dry run output is valid journal format, so hledger can re-parse it.
|
|
|
|
|
If the output is large, you could show just the uncategorised
|
|
|
|
|
transactions like so:
|
|
|
|
|
|
|
|
|
|
And after a successful import, unless --dry-run was used, it updates the
|
|
|
|
|
.latest.FILE(s) for next time. This is a simple system that works for
|
|
|
|
|
most real-world CSV files; it assumes the following are true, or true
|
|
|
|
|
enough:
|
|
|
|
|
$ hledger import --dry-run bank.csv | hledger -f- -I print unknown
|
|
|
|
|
|
|
|
|
|
1. the name of the input file is stable across successive downloads
|
|
|
|
|
2. new items always have the newest dates
|
|
|
|
|
3. item dates are stable across downloads
|
|
|
|
|
4. the order of same-date items is stable across downloads.
|
|
|
|
|
You could also run this repeatedly to see the effect of edits to your
|
|
|
|
|
conversion rules:
|
|
|
|
|
|
|
|
|
|
Tips:
|
|
|
|
|
$ watchexec -- 'hledger import --dry-run bank.csv | hledger -f- -I print unknown'
|
|
|
|
|
|
|
|
|
|
- To help ensure a stable file name, remember you can use a CSV rules
|
|
|
|
|
file as an input file.
|
|
|
|
|
Once the conversion and dates look good enough to import to your
|
|
|
|
|
journal, perhaps with some manual fixups to follow, you would do the
|
|
|
|
|
actual import:
|
|
|
|
|
|
|
|
|
|
- If you have a bank whose CSV dates or ordering occasionally change,
|
|
|
|
|
you can reduce the chance of this happening in new transactions by
|
|
|
|
|
importing more often. (If it happens in old transactions, that's
|
|
|
|
|
harmless.)
|
|
|
|
|
$ hledger import bank.csv
|
|
|
|
|
|
|
|
|
|
Note this is just one kind of "deduplication": not reprocessing the same
|
|
|
|
|
dates across successive runs. import doesn't detect other kinds of
|
|
|
|
|
duplication, such as the same transaction appearing multiple times
|
|
|
|
|
within a single run, or a new transaction that looks identical to a
|
|
|
|
|
transaction already in the journal. (Because these can happen
|
|
|
|
|
legitimately in real-world data.)
|
|
|
|
|
Overlap detection
|
|
|
|
|
|
|
|
|
|
Here's a situation where you need to run import with care: say you
|
|
|
|
|
download but forget to import bank.1.csv, and a week later you download
|
|
|
|
|
bank.2.csv with some overlapping data. You should not process both of
|
|
|
|
|
these as a single import (hledger import bank.1.csv bank.2.csv), because
|
|
|
|
|
the overlapping transactions would not be deduplicated. Instead, import
|
|
|
|
|
one file at a time, using the same filename each time:
|
|
|
|
|
Reading CSV files is built in to hledger, and not specific to import; so
|
|
|
|
|
you could also import by doing hledger -f bank.csv print >>$LEDGER_FILE.
|
|
|
|
|
|
|
|
|
|
$ mv bank.1.csv bank.csv; hledger import bank.csv
|
|
|
|
|
$ mv bank.2.csv bank.csv; hledger import bank.csv
|
|
|
|
|
But import is easier and provides some advantages. The main one is that
|
|
|
|
|
it avoids re-importing transactions it has seen on previous runs. This
|
|
|
|
|
means you don't have to worry about overlapping data in successive
|
|
|
|
|
downloads of your bank CSV; just download and import as often as you
|
|
|
|
|
like, and only the new transactions will be imported each time.
|
|
|
|
|
|
|
|
|
|
Normally you don't need to think about .latest.* files, but you can
|
|
|
|
|
create or modify them to catch up to a certain date, or delete them to
|
|
|
|
|
mark all transactions as new. Their format is a single ISO-format
|
|
|
|
|
YYYY-MM-DD date, optionally repeated on multiple lines, meaning "I have
|
|
|
|
|
seen the transactions before this date, and this many of them on this
|
|
|
|
|
date".
|
|
|
|
|
We don't call this "deduplication", as it's generally not possible to
|
|
|
|
|
reliably detect duplicates in bank CSV. Instead, import remembers the
|
|
|
|
|
latest date processed previously in each CSV file (saving it in a hidden
|
|
|
|
|
file), and skips any records prior to that date. This works well for
|
|
|
|
|
most real-world CSV, where:
|
|
|
|
|
|
|
|
|
|
hledger print --new also uses and updates these .latest.* files, but it
|
|
|
|
|
is less often used.
|
|
|
|
|
1. the data file name is stable (does not change) across imports
|
|
|
|
|
2. the item dates are stable across imports
|
|
|
|
|
3. the order of same-date items is stable across imports
|
|
|
|
|
4. the newest items have the newest dates
|
|
|
|
|
|
|
|
|
|
Related: CSV > Working with CSV > Deduplicating, importing.
|
|
|
|
|
(Occasional violations of 2-4 are often harmless; you can reduce the
|
|
|
|
|
chance of disruption by downloading and importing more often.)
|
|
|
|
|
|
|
|
|
|
Import testing
|
|
|
|
|
Overlap detection is automatic, and shouldn't require much attention
|
|
|
|
|
from you, except perhaps at first import (see below). But here's how it
|
|
|
|
|
works:
|
|
|
|
|
|
|
|
|
|
With --dry-run, the transactions that will be imported are printed to
|
|
|
|
|
the terminal, without updating your journal or state files. The output
|
|
|
|
|
is valid journal format, like the print command, so you can re-parse it.
|
|
|
|
|
Eg, to see any importable transactions which CSV rules have not
|
|
|
|
|
categorised:
|
|
|
|
|
- For each FILE being imported from:
|
|
|
|
|
|
|
|
|
|
$ hledger import --dry bank.csv | hledger -f- -I print unknown
|
|
|
|
|
1. hledger reads a file named .latest.FILE in the same
|
|
|
|
|
directory, if any. This file contains the latest record date
|
|
|
|
|
previously imported from FILE, in YYYY-MM-DD format. If multiple
|
|
|
|
|
records with that date were imported, the date is repeated on N
|
|
|
|
|
lines.
|
|
|
|
|
|
|
|
|
|
or (live updating):
|
|
|
|
|
2. hledger reads records from FILE. If a latest date was found in
|
|
|
|
|
step 1, any records before that date, and the first N records on
|
|
|
|
|
that date, are skipped.
|
|
|
|
|
|
|
|
|
|
$ ls bank.csv* | entr bash -c 'echo ====; hledger import --dry bank.csv | hledger -f- -I print unknown'
|
|
|
|
|
- After a successful import from all FILEs, without error and without
|
|
|
|
|
--dry-run, hledger updates each FILE's .latest.FILE for next time.
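
The steps above can be sketched in shell. This only illustrates the skipping rule as described here and is not hledger's actual implementation. new_records is a hypothetical helper, and it assumes a CSV with no header line, an ISO date in the first column, and records sorted oldest first.

```shell
# Hypothetical helper (not part of hledger): print the records of CSV_FILE
# that are newer than the state saved in LATEST_FILE.
# Assumptions: no header line, ISO date in column 1, oldest records first.
new_records() {
  latest_file=$1
  csv_file=$2
  # No state file (or an empty one): every record counts as new.
  if [ ! -s "$latest_file" ]; then
    cat "$csv_file"
    return
  fi
  awk -F, '
    # First file (.latest): remember the date and how many times it repeats.
    NR == FNR { if ($0 != "") { latest = $1; n++ }; next }
    # Second file (CSV): skip records dated before the latest date...
    $1 < latest { next }
    # ...and the first n records on the latest date itself.
    $1 == latest && seen < n { seen++; next }
    { print }
  ' "$latest_file" "$csv_file"
}
```

For example, new_records .latest.bank.csv bank.csv would print only the records that a later import is expected to pick up, under the assumptions listed above.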
|
|
|
|
|
|
|
|
|
|
Note: when importing from multiple files at once, it's currently
|
|
|
|
|
possible for some .latest files to be updated successfully, while the
|
|
|
|
|
actual import fails because of a problem in one of the files, leaving
|
|
|
|
|
them out of sync (and causing some transactions to be missed). To
|
|
|
|
|
prevent this, do a --dry-run first and fix any problems before the real
|
|
|
|
|
import.
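
One way to follow this advice is a small wrapper that runs the real import only if the preview succeeds. This is a minimal sketch: safe_import is a hypothetical name, and it assumes hledger exits with a non-zero status when --dry-run hits a problem in any of the files.

```shell
# Hypothetical wrapper (not part of hledger): preview first, import only
# if the preview of all files succeeded.
safe_import() {
  # The dry run reads every file but writes nothing; discard its output.
  hledger import --dry-run "$@" >/dev/null &&
    # Reached only when the dry run exited with status 0.
    hledger import "$@"
}
```

Usage would look like safe_import bank1-checking.csv bank1-savings.csv. To review the preview rather than discard it, run the --dry-run command by hand first.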
|
|
|
|
|
If this goes wrong, it's relatively easy to repair:
|
|
|
|
|
|
|
|
|
|
- You'll notice it before import when you preview with
|
|
|
|
|
import --dry-run.
|
|
|
|
|
- Or after import when you try to reconcile your hledger account
|
|
|
|
|
balances with your bank.
|
|
|
|
|
- hledger print -f FILE.csv will show all recently downloaded
|
|
|
|
|
transactions. Compare these with your journal. Copy/paste if needed.
|
|
|
|
|
- Update your conversion rules and print again, if needed.
|
|
|
|
|
- You can manually update or remove the .latest file, or use
|
|
|
|
|
import --catchup FILE.
|
|
|
|
|
- Download and import more often, eg twice a week, at least while you
|
|
|
|
|
are learning. It's easier to review and troubleshoot when there are
|
|
|
|
|
fewer transactions.
|
|
|
|
|
|
|
|
|
|
First import
|
|
|
|
|
|
|
|
|
|
The first time you import from a file, when no corresponding .latest
|
|
|
|
|
file has been created yet, all of the records will be imported.
|
|
|
|
|
|
|
|
|
|
But perhaps you have been entering the data manually, so you know that
|
|
|
|
|
all of these transactions are already recorded in the journal. In this
|
|
|
|
|
case you can run hledger import --catchup once. This will create a
|
|
|
|
|
.latest file containing the latest CSV record date, so that none of
|
|
|
|
|
those records will be re-imported.
|
|
|
|
|
|
|
|
|
|
Or, if you know that some but not all of the transactions are in the
|
|
|
|
|
journal, you can create the .latest file yourself. Eg, let's say you
|
|
|
|
|
previously recorded foobank transactions up to 2024-10-31 in the
|
|
|
|
|
journal. Then in the directory where you'll be saving foobank.csv, you
|
|
|
|
|
would create a .latest.foobank.csv file containing
|
|
|
|
|
|
|
|
|
|
2024-10-31
|
|
|
|
|
|
|
|
|
|
Or if you had three foobank transactions recorded with that date, you
|
|
|
|
|
would repeat the date that many times:
|
|
|
|
|
|
|
|
|
|
2024-10-31
|
|
|
|
|
2024-10-31
|
|
|
|
|
2024-10-31
|
|
|
|
|
|
|
|
|
|
Then hledger import foobank.csv [--dry-run] will import only the newer
|
|
|
|
|
records.
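
As a sketch, the three-line state file from the example above could be created like this, run from the directory where foobank.csv will be saved:

```shell
# Write one line per foobank transaction already recorded on 2024-10-31.
# This overwrites any existing .latest.foobank.csv in the current directory.
printf '2024-10-31\n2024-10-31\n2024-10-31\n' > .latest.foobank.csv
```

Previewing afterwards with hledger import foobank.csv --dry-run should then show only the newer records.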
|
|
|
|
|
|
|
|
|
|
Importing balance assignments
|
|
|
|
|
|
|
|
|
|
Entries added by import will have their posting amounts made explicit
|
|
|
|
|
(like hledger print -x). This means that any balance assignments in
|
|
|
|
|
imported files must be evaluated; but imported files don't get to see
|
|
|
|
|
the main file's account balances. As a result, importing entries with
|
|
|
|
|
balance assignments (eg from an institution that provides only balances
|
|
|
|
|
and not posting amounts) will probably generate incorrect posting
|
|
|
|
|
amounts. To avoid this problem, use print instead of import:
|
|
|
|
|
Journal entries added by import will have all posting amounts made
|
|
|
|
|
explicit (like print -x).
|
|
|
|
|
|
|
|
|
|
$ hledger print IMPORTFILE [--new] >> $LEDGER_FILE
|
|
|
|
|
This means that any balance assignments in the imported entries would
|
|
|
|
|
need to be evaluated. But this generally isn't possible, as the main
|
|
|
|
|
file's account balances are not visible during import. So try to avoid
|
|
|
|
|
generating balance assignments with your CSV rules, or importing from a
|
|
|
|
|
journal that contains balance assignments. (Balance assignments are best
|
|
|
|
|
avoided anyway.)
|
|
|
|
|
|
|
|
|
|
(If you think import should leave amounts implicit like print does,
|
|
|
|
|
please test it and send a pull request.)
|
|
|
|
|
But if you must use them, eg because your CSV includes only balances,
|
|
|
|
|
you can import with print, which leaves implicit amounts implicit.
|
|
|
|
|
(print can also do overlap detection like import, with the --new flag):
|
|
|
|
|
|
|
|
|
|
$ hledger print --new -f bank.csv >> $LEDGER_FILE
|
|
|
|
|
|
|
|
|
|
(If you think import should preserve implicit balances, please test that
|
|
|
|
|
and send a pull request.)
|
|
|
|
|
|
|
|
|
|
Import and commodity styles
|
|
|
|
|
|
|
|
|
|
@@ -131,3 +164,31 @@ journal's canonical commodity styles, as declared by commodity
|
|
|
|
|
directives or inferred from the journal's amounts.
|
|
|
|
|
|
|
|
|
|
Related: CSV > Amount decimal places.
|
|
|
|
|
|
|
|
|
|
Import special cases
|
|
|
|
|
|
|
|
|
|
If you have a download whose file name varies, you could rename it to a
|
|
|
|
|
fixed name after each download. Or you could use a CSV source rule with
|
|
|
|
|
a suitable glob pattern, and import from the .rules file instead of the
|
|
|
|
|
data file.
|
|
|
|
|
|
|
|
|
|
Here's a situation where you would need to run import with care: say you
|
|
|
|
|
download bank.csv, but forget to import it or delete it. And next month
|
|
|
|
|
you download it again. This time your web browser may save it as
|
|
|
|
|
bank (2).csv. So now each of these may have data not included in the
|
|
|
|
|
other. And a source rule with a glob pattern would match only the most
|
|
|
|
|
recent file. So in this case you should import from each one in turn, in
|
|
|
|
|
the correct order, taking care to use the same filename each time:
|
|
|
|
|
|
|
|
|
|
$ hledger import bank.csv
|
|
|
|
|
$ mv 'bank (2).csv' bank.csv
|
|
|
|
|
$ hledger import bank.csv
|
|
|
|
|
|
|
|
|
|
Here are two kinds of "deduplication" which import does not handle (and
|
|
|
|
|
generally should not, since these can happen legitimately in financial
|
|
|
|
|
data):
|
|
|
|
|
|
|
|
|
|
- Two or more of the new CSV records are identical, and generate
|
|
|
|
|
identical new journal entries.
|
|
|
|
|
- A new CSV record generates a journal entry identical to one(s)
|
|
|
|
|
already in the journal.
|
|
|
|
|
|