469 lines
18 KiB
Plaintext
469 lines
18 KiB
Plaintext
|
|
hledger_csv(5) hledger User Manuals hledger_csv(5)
|
|
|
|
|
|
|
|
NAME
|
|
CSV - how hledger reads CSV data, and the CSV rules file format
|
|
|
|
DESCRIPTION
|
|
hledger can read CSV (comma-separated value) files as if they were
|
|
journal files, automatically converting each CSV record into a transac-
|
|
tion. (To learn about writing CSV, see CSV output.)
|
|
|
|
Converting CSV to transactions requires some special conversion rules.
|
|
These do several things:
|
|
|
|
o they describe the layout and format of the CSV data
|
|
|
|
o they can customize the generated journal entries (transactions) using
|
|
a simple templating language
|
|
|
|
o they can add refinements based on patterns in the CSV data, eg cate-
|
|
gorizing transactions with more detailed account names.
|
|
|
|
When reading a CSV file named FILE.csv, hledger looks for a conversion
|
|
rules file named FILE.csv.rules in the same directory. You can over-
|
|
ride this with the --rules-file option. If the rules file does not ex-
|
|
ist, hledger will auto-create one with some example rules, which you'll
|
|
need to adjust.
|
|
|
|
At minimum, the rules file must identify the date and amount fields.
|
|
It's often necessary to specify the date format, and the number of
|
|
header lines to skip, also. Eg:
|
|
|
|
fields date, _, _, amount
|
|
date-format %d/%m/%Y
|
|
skip 1
|
|
|
|
More examples in the EXAMPLES section below.
|
|
|
|
CSV RULES
|
|
The following kinds of rule can appear in the rules file, in any order
|
|
(except for end which can appear only inside a conditional block).
|
|
Blank lines and lines beginning with # or ; are ignored.
|
|
|
|
skip
|
|
skip N
|
|
|
|
The word "skip" followed by a number (or no number, meaning 1) tells
|
|
hledger to ignore this many non-empty lines preceding the CSV data.
|
|
(Empty/blank lines are skipped automatically.) You'll need this when-
|
|
ever your CSV data contains header lines.
|
|
|
|
It also has a second purpose: it can be used to ignore certain CSV
|
|
records, see conditional blocks below.
|
|
|
|
fields
|
|
fields FIELDNAME1, FIELDNAME2, ...
|
|
|
|
A fields list ("fields" followed by one or more comma-separated field
|
|
names) is the quick way to assign CSV field values to hledger fields.
|
|
It (a) names the CSV fields, in order (names may not contain white-
|
|
space; fields you don't care about can be left unnamed), and (b) as-
|
|
signs them to hledger fields if you use standard hledger field names.
|
|
Here's an example:
|
|
|
|
# use the 1st, 2nd and 4th CSV fields as the transaction's date, description and amount,
|
|
# ignore the 3rd, 5th and 6th fields,
|
|
# and name the 7th and 8th fields for later reference:
|
|
# 1 2 3 4 5 6 7 8
|
|
|
|
fields date, description, , amount1, , , somefield, anotherfield
|
|
|
|
Here are the standard hledger field names:
|
|
|
|
Transaction fields
|
|
date, date2, status, code, description, comment can be used to form the
|
|
transaction's first line. Only date is required. (See also date-for-
|
|
mat below.)
|
|
|
|
Posting fields
|
|
accountN, where N is 1 to 9, sets the Nth posting's account name. Most
|
|
often there are two postings, so you'll want to set account1 and ac-
|
|
count2.
|
|
|
|
A number of field/pseudo-field names are available for setting posting
|
|
amounts:
|
|
|
|
o amountN sets posting N's amount
|
|
|
|
o amountN-in and amountN-out can be used instead, if the CSV has sepa-
|
|
rate fields for debits and credits
|
|
|
|
o currencyN sets a currency symbol to be left-prefixed to the amount,
|
|
useful if the CSV provides that as a separate field
|
|
|
|
o balanceN sets a (separate) balance assertion amount (or when no post-
|
|
ing amount is set, a balance assignment)
|
|
|
|
If you write these with no number (amount, amount-in, amount-out, cur-
|
|
rency, balance), it means posting 1. Also, if you set an amount for
|
|
posting 1 only, a second posting that balances the transaction will be
|
|
generated automatically. This helps support CSV rules created before
|
|
hledger 1.16.
|
|
|
|
Finally, commentN sets a comment on the Nth posting. Comments can of
|
|
course contain tags.
|
|
|
|
(field assignment)
|
|
HLEDGERFIELDNAME FIELDVALUE
|
|
|
|
Instead of or in addition to a fields list, you can assign a value to a
|
|
hledger field by writing its name (any of the standard names above)
|
|
followed by a text value. The value may contain interpolated CSV
|
|
fields, referenced by their 1-based position in the CSV record (%N), or
|
|
by the name they were given in the fields list (%CSVFIELDNAME). Eg:
|
|
|
|
# set the amount to the 4th CSV field, with " USD" appended
|
|
amount %4 USD
|
|
|
|
# combine three fields to make a comment, containing note: and date: tags
|
|
comment note: %somefield - %anotherfield, date: %1
|
|
|
|
Interpolation strips any outer whitespace, so a CSV value like " 1 "
|
|
becomes 1 when interpolated (#1051). Note you can only interpolate CSV
|
|
fields, not the hledger fields being assigned to; for more on this, see
|
|
TIPS.
|
|
|
|
date-format
|
|
date-format DATEFMT
|
|
|
|
This is a helper for the date (and date2) fields. If your CSV dates
|
|
are not formatted like YYYY-MM-DD, YYYY/MM/DD or YYYY.MM.DD, you'll
|
|
need to specify the format by writing "date-format" followed by a strp-
|
|
time-like date parsing pattern, which must parse the date field values
|
|
completely. Examples:
|
|
|
|
# for dates like "11/06/2013":
|
|
date-format %m/%d/%Y
|
|
|
|
# for dates like "6/11/2013". The - allows leading zeros to be optional.
|
|
date-format %-d/%-m/%Y
|
|
|
|
# for dates like "2013-Nov-06":
|
|
date-format %Y-%h-%d
|
|
|
|
# for dates like "11/6/2013 11:32 PM":
|
|
date-format %-m/%-d/%Y %l:%M %p
|
|
|
|
if
|
|
if PATTERN
|
|
RULE
|
|
|
|
if
|
|
PATTERN
|
|
PATTERN
|
|
PATTERN
|
|
RULE
|
|
RULE
|
|
|
|
Conditional blocks apply one or more rules to CSV records which are
|
|
matched by any of the PATTERNs. This allows transactions to be cus-
|
|
tomised or categorised based on patterns in the data.
|
|
|
|
A single pattern can be written on the same line as the "if"; or multi-
|
|
ple patterns can be written on the following lines, non-indented.
|
|
|
|
Patterns are case-insensitive regular expressions which try to match
|
|
any part of the whole CSV record. It's not yet possible to match
|
|
within a specific field. Note the CSV record they see is close but not
|
|
identical to the one in the CSV file; eg double quotes are removed, and
|
|
the separator character becomes comma.
|
|
|
|
After the patterns, there should be one or more rules to apply, all in-
|
|
dented by at least one space. Three kinds of rule are allowed in con-
|
|
ditional blocks:
|
|
|
|
o field assignments (to set a field's value)
|
|
|
|
o skip (to skip the matched CSV record)
|
|
|
|
o end (to skip all remaining CSV records).
|
|
|
|
Examples:
|
|
|
|
# if the CSV record contains "groceries", set account2 to "expenses:groceries"
|
|
if groceries
|
|
account2 expenses:groceries
|
|
|
|
# if the CSV record contains any of these patterns, set account2 and comment as shown
|
|
if
|
|
monthly service fee
|
|
atm transaction fee
|
|
banking thru software
|
|
account2 expenses:business:banking
|
|
comment XXX deductible ? check it
|
|
|
|
end
|
|
As mentioned above, this rule can be used inside conditional blocks
|
|
(only) to cause hledger to stop reading CSV records and proceed with
|
|
command execution. Eg:
|
|
|
|
# ignore everything following the first empty record
|
|
if ,,,,
|
|
end
|
|
|
|
include
|
|
include RULESFILE
|
|
|
|
Include another CSV rules file at this point, as if it were written in-
|
|
line. RULESFILE is an absolute file path or a path relative to the
|
|
current file's directory.
|
|
|
|
This can be useful eg for reusing common rules in several rules files:
|
|
|
|
# someaccount.csv.rules
|
|
|
|
## someaccount-specific rules
|
|
fields date,description,amount
|
|
account1 some:account
|
|
account2 some:misc
|
|
|
|
## common rules
|
|
include categorisation.rules
|
|
|
|
newest-first
|
|
hledger always sorts the generated transactions by date. Transactions
|
|
on the same date should appear in the same order as their CSV records,
|
|
as hledger can usually auto-detect whether the CSV's normal order is
|
|
oldest first or newest first. But if all of the following are true:
|
|
|
|
o the CSV might sometimes contain just one day of data (all records
|
|
having the same date)
|
|
|
|
o the CSV records are normally in reverse chronological order (newest
|
|
first)
|
|
|
|
o and you care about preserving the order of same-day transactions
|
|
|
|
you should add the newest-first rule as a hint. Eg:
|
|
|
|
# tell hledger explicitly that the CSV is normally newest-first
|
|
newest-first
|
|
|
|
EXAMPLES
|
|
A more complete example, generating three-posting transactions:
|
|
|
|
# hledger CSV rules for amazon.com order history
|
|
|
|
# sample:
|
|
# "Date","Type","To/From","Name","Status","Amount","Fees","Transaction ID"
|
|
# "Jul 29, 2012","Payment","To","Adapteva, Inc.","Completed","$25.00","$0.00","17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL"
|
|
|
|
# skip one header line
|
|
skip 1
|
|
|
|
# name the csv fields (and assign the transaction's date, amount and code)
|
|
fields date, _, toorfrom, name, amzstatus, amount1, fees, code
|
|
|
|
# how to parse the date
|
|
date-format %b %-d, %Y
|
|
|
|
# combine two fields to make the description
|
|
description %toorfrom %name
|
|
|
|
# save these fields as tags
|
|
comment status:%amzstatus
|
|
|
|
# set the base account for all transactions
|
|
account1 assets:amazon
|
|
|
|
# flip the sign on the amount
|
|
amount -%amount
|
|
|
|
# Put fees in a separate posting
|
|
amount3 %fees
|
|
comment3 fees
|
|
|
|
For more examples, see Convert CSV files.
|
|
|
|
TIPS
|
|
Reading multiple CSV files
|
|
You can read multiple CSV files at once using multiple -f arguments on
|
|
the command line. hledger will look for a correspondingly-named rules
|
|
file for each CSV file. If you use the --rules-file option, that rules
|
|
file will be used for all the CSV files.
|
|
|
|
Deduplicating, importing
|
|
When you download a CSV file repeatedly, eg to get your latest bank
|
|
transactions, the new file may contain some of the same records as the
|
|
old one. The print --new command is one simple way to detect just the
|
|
new transactions. Or better still, the import command appends those
|
|
new transactions to your main journal. This is the easiest way to im-
|
|
port CSV data. Eg, after downloading your latest CSV files:
|
|
|
|
$ hledger import *.csv [--dry]
|
|
|
|
Other import methods
|
|
A number of other tools and workflows, hledger-specific and otherwise,
|
|
exist for converting, deduplicating, classifying and managing CSV data.
|
|
See:
|
|
|
|
o https://hledger.org -> sidebar -> real world setups
|
|
|
|
o https://plaintextaccounting.org -> data import/conversion
|
|
|
|
Valid CSV
|
|
hledger accepts CSV conforming to RFC 4180. Some things to note when
|
|
values are enclosed in quotes:
|
|
|
|
o you must use double quotes (not single quotes)
|
|
|
|
o spaces outside the quotes are not allowed
|
|
|
|
Other separator characters
|
|
With the --separator 'CHAR' option, hledger will expect the separator
|
|
to be CHAR instead of a comma. Ie it will read other "Character Sepa-
|
|
rated Values" formats, such as TSV (Tab Separated Values). Note: on
|
|
the command line, use a real tab character in quotes, not Eg:
|
|
|
|
$ hledger -f foo.tsv --separator ' ' print
|
|
|
|
(Experimental.)
|
|
|
|
Setting amounts
|
|
A posting amount can be set in one of these ways:
|
|
|
|
o by assigning (with a fields list or field assigment) to amountN
|
|
(posting N's amount) or amount (posting 1's amount)
|
|
|
|
o by assigning to amountN-in and amountN-out (or amount-in and amount-
|
|
out). For each CSV record, whichever of these has a non-zero value
|
|
will be used, with appropriate sign. If both contain a non-zero
|
|
value, this may not work.
|
|
|
|
o by assigning to balanceN (or balance) instead of the above, setting
|
|
the amount indirectly via a balance assignment.
|
|
|
|
There is some special handling for sign in amounts:
|
|
|
|
o If an amount value is parenthesised, it will be de-parenthesised and
|
|
sign-flipped.
|
|
|
|
o If an amount value begins with a double minus sign, those cancel out
|
|
and are removed.
|
|
|
|
If the currency/commodity symbol is provided as a separate CSV field,
|
|
you can assign it to currency (affects all posting amounts) or curren-
|
|
cyN (affects just posting N's amount). The symbol will be prepended to
|
|
the amount. Or for more control, you can set both currency symbol and
|
|
amount with a field assignment, eg:
|
|
|
|
fields date,description,currency,amount
|
|
# add currency symbol on the right:
|
|
amount %amount %currency
|
|
|
|
Referencing other fields
|
|
In field assignments, you can interpolate only CSV fields, not hledger
|
|
fields. In the example below, there's both a CSV field and a hledger
|
|
field named amount1, but %amount1 always means the CSV field, not the
|
|
hledger field:
|
|
|
|
# Name the third CSV field "amount1"
|
|
fields date,description,amount1
|
|
|
|
# Set hledger's amount1 to the CSV amount1 field followed by USD
|
|
amount1 %amount1 USD
|
|
|
|
# Set comment to the CSV amount1 (not the amount1 assigned above)
|
|
comment %amount1
|
|
|
|
Here, since there's no CSV amount1 field, %amount1 will produce a lit-
|
|
eral "amount1":
|
|
|
|
fields date,description,csvamount
|
|
amount1 %csvamount USD
|
|
# Can't interpolate amount1 here
|
|
comment %amount1
|
|
|
|
When there are multiple field assignments to the same hledger field,
|
|
only the last one takes effect. Here, comment's value will be be B, or
|
|
C if "something" is matched, but never A:
|
|
|
|
comment A
|
|
comment B
|
|
if something
|
|
comment C
|
|
|
|
How CSV rules are evaluated
|
|
Here's how to think of CSV rules being evaluated (if you really need
|
|
to). First,
|
|
|
|
o include - all includes are inlined, from top to bottom, depth first.
|
|
(At each include point the file is inlined and scanned for further
|
|
includes, before proceeding.)
|
|
|
|
Then "global" rules are evaluated, top to bottom. If a rule is re-
|
|
peated, the last one wins:
|
|
|
|
o skip (at top level)
|
|
|
|
o date-format
|
|
|
|
o newest-first
|
|
|
|
o fields - names the CSV fields, optionally sets up initial assignments
|
|
to hledger fields
|
|
|
|
Then for each CSV record in turn:
|
|
|
|
o test all if blocks. If any of them contain a end rule, skip all re-
|
|
maining CSV records. Otherwise if any of them contain a skip rule,
|
|
skip that many CSV records. If there are multiple matched skip
|
|
rules, the first one wins.
|
|
|
|
o collect all field assignments at top level and in matched if blocks.
|
|
When there are multiple assignments for a field, keep only the last
|
|
one.
|
|
|
|
o compute a value for each hledger field - either the one that was as-
|
|
signed to it (and interpolate the %CSVFIELDNAME references), or a de-
|
|
fault
|
|
|
|
o generate a synthetic hledger transaction from these values, which be-
|
|
comes part of the input to the hledger command that has been selected
|
|
|
|
Valid transactions
|
|
hledger currently does not post-process and validate transactions gen-
|
|
erated from CSV as thoroughly as transactions read from a journal file.
|
|
This means that if your rules are wrong, you can generate invalid
|
|
transactions. Or, amounts may not be displayed with a canonical dis-
|
|
play style.
|
|
|
|
So when setting up or adjusting CSV rules, you should check your re-
|
|
sults visually with the print command. You can pipe print's output
|
|
through hledger once more to validate and canonicalise fully. Eg:
|
|
|
|
$ hledger -f some.csv print | hledger -f- print -I
|
|
|
|
(The -I/--ignore-assertions flag disables balance assertion checks,
|
|
usually needed when re-parsing print output.)
|
|
|
|
|
|
|
|
REPORTING BUGS
|
|
Report bugs at http://bugs.hledger.org (or on the #hledger IRC channel
|
|
or hledger mail list)
|
|
|
|
|
|
AUTHORS
|
|
Simon Michael <simon@joyful.com> and contributors
|
|
|
|
|
|
COPYRIGHT
|
|
Copyright (C) 2007-2019 Simon Michael.
|
|
Released under GNU GPL v3 or later.
|
|
|
|
|
|
SEE ALSO
|
|
hledger(1), hledger-ui(1), hledger-web(1), hledger-api(1),
|
|
hledger_csv(5), hledger_journal(5), hledger_timeclock(5), hledger_time-
|
|
dot(5), ledger(1)
|
|
|
|
http://hledger.org
|
|
|
|
|
|
|
|
hledger 1.15.99 September 2019 hledger_csv(5)
|