hledger/hledger-lib/hledger_csv.m4.md
2019-11-05 21:16:42 +00:00


author date title
author
monthyear hledger_csv(5) hledger version

web({{ docversionlinks({{csv}}) }}) man({{

NAME

CSV - how hledger reads CSV data, and the CSV rules file format

DESCRIPTION

}})

hledger can read CSV (comma-separated value) files as if they were journal files, automatically converting each CSV record into a transaction. (To learn about writing CSV, see CSV output.)

Converting CSV to transactions requires some special conversion rules. These do several things:

  • they describe the layout and format of the CSV data
  • they can customize the generated journal entries using a simple templating language
  • they can add refinements based on patterns in the CSV data, eg categorizing transactions with more detailed account names.

When reading a CSV file named FILE.csv, hledger looks for a conversion rules file named FILE.csv.rules in the same directory. You can override this with the --rules-file option. If the rules file does not exist, hledger will auto-create one with some example rules, which you'll need to adjust.
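The lookup described above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not hledger's own code; the function name `rules_file_for` and its `rules_file_option` parameter (standing in for the value of --rules-file) are hypothetical.

```python
from pathlib import Path
from typing import Optional


def rules_file_for(csv_path: str, rules_file_option: Optional[str] = None) -> Path:
    """Return the rules file path this sketch would use for csv_path.

    rules_file_option stands in for the value of --rules-file, if given.
    """
    if rules_file_option is not None:
        return Path(rules_file_option)
    # Default: same directory, same file name, with ".rules" appended.
    p = Path(csv_path)
    return p.with_name(p.name + ".rules")


print(rules_file_for("statements/FILE.csv"))
print(rules_file_for("statements/FILE.csv", "my.rules"))
```

Note that ".rules" is appended to the full file name, so the ".csv" extension is kept.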

At minimum, the rules file must identify the date and amount fields. It's often necessary to specify the date format and the number of header lines to skip as well. Eg:

fields date, _, _, amount1
date-format  %d/%m/%Y
skip 1

A more complete example:

# hledger CSV rules for amazon.com order history

# sample:
# "Date","Type","To/From","Name","Status","Amount","Fees","Transaction ID"
# "Jul 29, 2012","Payment","To","Adapteva, Inc.","Completed","$25.00","$0.00","17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL"

# skip one header line
skip 1

# name the csv fields (and assign the transaction's date, amount and code)
fields date, _, toorfrom, name, amzstatus, amount1, fees, code

# how to parse the date
date-format %b %-d, %Y

# combine two fields to make the description
description %toorfrom %name

# save these fields as tags
comment     status:%amzstatus

# set the base account for all transactions
account1    assets:amazon

# flip the sign on the amount
amount1     -%amount1

# Put fees in a separate posting
amount3     %fees
comment3    fees

For more examples, see Convert CSV files.

CSV RULES

The following seven kinds of rule can appear in the rules file, in any order. Blank lines and lines beginning with # or ; are ignored.

skip

skip N

Skip this many non-empty lines preceding the CSV data. (Empty/blank lines are skipped automatically.) You'll need this whenever your CSV data contains header lines. This can also be used in a conditional block to ignore certain CSV records. Eg:

# ignore the first CSV line
skip 1

date-format

date-format DATEFMT

When your CSV date fields are not formatted like YYYY/MM/DD (or YYYY-MM-DD or YYYY.MM.DD), you'll need to specify the format. DATEFMT is a strptime-like date parsing pattern, which must parse the date field values completely. Examples:

# for dates like "11/06/2013":
date-format %m/%d/%Y
# for dates like "6/11/2013" (note the - to make leading zeros optional):
date-format %-d/%-m/%Y
# for dates like "2013-Nov-06":
date-format %Y-%h-%d
# for dates like "11/6/2013 11:32 PM":
date-format %-m/%-d/%Y %l:%M %p
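To get a feel for what "must parse the date field values completely" means, the patterns above can be tried out with Python's `datetime.strptime`, which is a close but not identical analogue. The sketch below is an approximation under stated assumptions: the translation table and the name `parse_csv_date` are hypothetical, Python spells a few directives differently, and month names and AM/PM are assumed to be parsed in an English locale.

```python
from datetime import datetime

# Hypothetical translation table: directives used in the examples above that
# Python's strptime spells differently or does not accept.
_TRANSLATE = {
    "%-d": "%d",  # Python's %d already accepts "6" as well as "06"
    "%-m": "%m",
    "%h": "%b",   # abbreviated month name
    "%l": "%I",   # hour on a 12-hour clock
}


def parse_csv_date(value: str, datefmt: str) -> datetime:
    """Parse value with a rules-style pattern; raise ValueError on any mismatch."""
    for theirs, ours in _TRANSLATE.items():
        datefmt = datefmt.replace(theirs, ours)
    # strptime raises ValueError if the pattern does not consume the whole string.
    return datetime.strptime(value, datefmt)


print(parse_csv_date("11/06/2013", "%m/%d/%Y").date())
print(parse_csv_date("6/11/2013", "%-d/%-m/%Y").date())
print(parse_csv_date("2013-Nov-06", "%Y-%h-%d").date())
print(parse_csv_date("11/6/2013 11:32 PM", "%-m/%-d/%Y %l:%M %p"))
```

In this sketch, a pattern that covers only part of the value, such as `%-m/%-d/%Y` applied to "11/6/2013 11:32 PM", is rejected with a `ValueError` rather than silently ignoring the time.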

field list

fields FIELDNAME1, FIELDNAME2

This (a) names the CSV fields, in order (names may not contain whitespace; uninteresting names may be left blank), and (b) assigns them to journal entry fields if you use any of these standard field names:

The fields date, date2, status, code and description are used to form the transaction's description line.

An assignment to any of accountN, amountN, amountN-in, amountN-out, balanceN or currencyN will generate a posting (though it's your responsibility to ensure it is a well-formed one). Normally the Ns are consecutive starting from 1, but this is not required. One posting will be generated for each unique N. If you wish to supply a comment for the posting, use commentN, though a comment on its own will not cause a posting to be generated.

The fields amount, amount-in, amount-out, currency, balance and comment are treated as aliases for amount1, and so on. If your rules file gives an alias and the field it stands for different values, hledger will raise an error.

You need to provide enough information to create at least one posting.

Eg:

# use the 1st, 2nd and 4th CSV fields as the entry's date, description and amount,
# and give the 7th and 8th fields meaningful names for later reference:
#
# CSV field:
#      1     2            3 4       5 6 7          8
# entry field:
fields date, description, , amount1, , , somefield, anotherfield

field assignment

ENTRYFIELDNAME FIELDVALUE

This sets a journal entry field (one of the standard names above) to the given text value, which can include CSV field values interpolated by name (%CSVFIELDNAME) or 1-based position (%N). Eg:

# set the amount to the 4th CSV field with "USD " prepended
amount USD %4
# combine three fields to make a comment (containing two tags)
comment note: %somefield - %anotherfield, date: %1

Field assignments can be used instead of or in addition to a field list.

Note, interpolation strips any outer whitespace, so a CSV value like " 1 " becomes 1 when interpolated (#1051).
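The interpolation described in this section can be modelled with a short Python sketch. It is an illustration under assumptions, not hledger's implementation: the function name `interpolate`, the regular expression used to recognise references, and the choice to leave unknown references untouched are all hypothetical.

```python
import re
from typing import List

# %NAME or %N. Field names may not contain whitespace, so a word-like token is assumed.
_REF = re.compile(r"%([A-Za-z_][A-Za-z0-9_-]*|[0-9]+)")


def interpolate(template: str, field_names: List[str], record: List[str]) -> str:
    """Replace %NAME and %N references in template with values from record."""

    def lookup(match):
        ref = match.group(1)
        if ref.isdigit():
            index = int(ref) - 1           # %N is 1-based
        elif ref in field_names:
            index = field_names.index(ref)
        else:
            return match.group(0)          # unknown reference: left as-is in this sketch
        if not 0 <= index < len(record):
            return match.group(0)
        return record[index].strip()       # outer whitespace is stripped

    return _REF.sub(lookup, template)


names = ["date", "description", "", "amount1", "", "", "somefield", "anotherfield"]
row = ["2019-11-05", "coffee", "x", " 4.50 ", "", "", "work", "team"]
print(interpolate("USD %4", names, row))
print(interpolate("note: %somefield - %anotherfield, date: %1", names, row))
```

The second CSV value above, " 4.50 ", is interpolated as 4.50, mirroring the whitespace stripping noted above.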

conditional block

if PATTERN
    FIELDASSIGNMENTS

if
PATTERN
PATTERN
    FIELDASSIGNMENTS

if PATTERN
PATTERN
    skip N

if PATTERN
PATTERN
    skip end

This applies one or more field assignments, only to those CSV records matched by one of the PATTERNs. The patterns are case-insensitive regular expressions which match anywhere within the whole CSV record (it's not yet possible to match within a specific field). When there are multiple patterns they can be written on separate lines, unindented. The field assignments are on separate lines indented by at least one space.

Instead of field assignments you can specify skip N to skip the next N records (including the one that matched) or skip end to skip the rest of the file.

Examples:

# if the CSV record contains "groceries", set account2 to "expenses:groceries"
if groceries
 account2 expenses:groceries
# if the CSV record contains any of these patterns, set account2 and comment as shown
if
monthly service fee
atm transaction fee
banking thru software
 account2 expenses:business:banking
 comment  XXX deductible ? check it
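The matching behaviour of a conditional block (any pattern, anywhere in the record, ignoring case) can be sketched as follows. This is a rough model only: the name `block_applies` is hypothetical, the record is assumed to be available as its raw text line, and Python's regular expression dialect may differ in details from the one hledger uses.

```python
import re
from typing import Iterable


def block_applies(patterns: Iterable[str], record_text: str) -> bool:
    """True if any pattern matches anywhere in the record, ignoring case."""
    return any(re.search(p, record_text, re.IGNORECASE) for p in patterns)


fee_patterns = ["monthly service fee", "atm transaction fee", "banking thru software"]
print(block_applies(["groceries"], '2019-11-05,"WHOLE FOODS GROCERIES",-42.10'))
print(block_applies(fee_patterns, '2019-11-06,"ATM Transaction Fee",-2.00'))
print(block_applies(fee_patterns, '2019-11-07,"Coffee",-3.00'))
```

Because the whole record is searched, a pattern such as `groceries` would also match if that word appeared in a field other than the description.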

include

include RULESFILE

Include another rules file at this point. RULESFILE is either an absolute file path or a path relative to the current file's directory. Eg:

# rules reused with several CSV files
include common.rules

newest-first

newest-first

Consider adding this rule if all of the following are true: you might be processing just one day of data, your CSV records are in reverse chronological order (newest first), and you care about preserving the order of same-day transactions. It usually isn't needed, because hledger autodetects the CSV order, but when all CSV records have the same date it will assume they are oldest first.

CSV TIPS

CSV ordering

The generated journal entries will be sorted by date. The order of same-day entries will be preserved (except in the special case where you might need newest-first, see above).
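One way to picture the ordering behaviour is the Python sketch below. It is an assumed model, not hledger's code: the detection heuristic (first date later than last date), the `Entry` type and the function name `order_entries` are all hypothetical. What it does show is why a stable sort preserves same-day order, and why a file containing a single date cannot be autodetected.

```python
from datetime import date
from typing import List, Tuple

Entry = Tuple[date, str]  # (date, description); a stand-in for a journal entry


def order_entries(entries: List[Entry], newest_first_rule: bool = False) -> List[Entry]:
    """Return entries sorted by date, keeping same-day entries in chronological order."""
    if not entries:
        return []
    # Assumed detection: the file is newest-first if its first date is later than its last.
    # When every date is equal this cannot be detected, hence the newest-first rule.
    reversed_input = newest_first_rule or entries[0][0] > entries[-1][0]
    chronological = list(reversed(entries)) if reversed_input else list(entries)
    # sorted() is stable, so entries with equal dates keep their relative order.
    return sorted(chronological, key=lambda e: e[0])


d = date(2019, 11, 5)
one_day = [(d, "evening purchase"), (d, "morning purchase")]
print(order_entries(one_day))                          # treated as oldest first
print(order_entries(one_day, newest_first_rule=True))  # order reversed
```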

CSV accounts

Each journal entry will have at least two postings, to account1 and some other account (usually account2). It's conventional and recommended to use account1 for the account whose CSV we are reading.

CSV amounts

A posting amount could be set in one of these ways:

  • with an amountN field assignment, which sets the Nth posting's amount

  • (When the CSV has debit and credit amounts in separate fields:)
    with field assignments for the amountN-in and amountN-out pseudo fields (both of them). Whichever one has a value will be used, with appropriate sign. If both contain a value, the result may not be what you expect.

  • with a balanceN field assignment, which creates a balance assignment (see below).

There is some special handling for sign in amounts:

  • If an amount value is parenthesised, it will be de-parenthesised and sign-flipped.
  • If an amount value begins with a double minus sign, those will cancel out and be removed.
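The two sign rules above can be sketched as a small string transformation. This is a minimal illustration, assuming the rules are applied in the order listed; the function name `normalize_sign` is hypothetical and the code is not taken from hledger.

```python
def normalize_sign(amount: str) -> str:
    """Apply the two sign rules above to a raw amount string."""
    s = amount.strip()
    if s.startswith("(") and s.endswith(")"):
        # Parenthesised: drop the parentheses and flip the sign.
        s = "-" + s[1:-1].strip()
    if s.startswith("--"):
        # A double minus sign cancels out.
        s = s[2:]
    return s


for raw in ["(10.00)", "--10.00", "(-10.00)", "10.00", "-10.00"]:
    print(repr(raw), "->", repr(normalize_sign(raw)))
```

Applying the rules in this order means a parenthesised negative such as "(-10.00)" first becomes "--10.00" and then "10.00".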

If the currency/commodity symbol is provided as a separate CSV field, assign it to the currency pseudo field (applicable to the whole transaction) or to currencyN (applicable to the Nth posting only); the symbol will be prepended to the amount (TODO: when there is an amount). Or, you can use an amountN field assignment for more control, eg:

fields date,description,currency,amount1
amount1 %amount1 %currency

CSV balance assertions/assignments

If the CSV includes a running balance, you can assign that to one of the pseudo fields balance (or balance1), balance2, … up to balance9. This will generate a balance assertion (or if the amount is left empty, a balance assignment), on the appropriate posting, whenever the running balance field is non-empty.

References to other fields and evaluation order

Field assignments can include references to other fields, or even to the same field you are trying to assign:

fields date,description,currency,amount1

amount1 %amount1 USD
amount1 %amount1 EUR
amount1 %amount1 %currency

if SOME_REGEXP
    amount1 %amount1 GBP

This is how the file would be evaluated.

First, parts of the CSV record are assigned according to the fields directive.

Then all other field assignments, whether written at top level or inside if blocks, are considered to see if they should be applied. They are checked in the order they are written, with later assignments overwriting earlier ones.

Once the full set of field assignments to apply is known, their values are computed, and this is when all %<fieldname> references are evaluated.

So for a particular row of the CSV file, the value from the fourth column is assigned to amount1.

Then hledger decides that amount1 should be amended to %amount1 USD, but this does not happen immediately. That choice is replaced by the decision to rewrite amount1 to %amount1 EUR, which is in turn thrown away in favor of %amount1 %currency. If the if block's condition matches the row, amount1 is set to %amount1 GBP instead.

Overall, we end up with one of two alternatives for amount1: either %amount1 %currency or %amount1 GBP.

Now all referenced values are substituted, using the current values of %amount1 and %currency, which were provided by the fields directive.
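The three-step evaluation described in this section can be modelled with the Python sketch below. It is a simplified illustration under assumptions, not hledger's implementation: the `Rule` tuple layout, the function name `evaluate`, and joining the record with commas to obtain the text that patterns are matched against are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# (pattern, or None for a top-level assignment; field to assign; value template), in file order
Rule = Tuple[Optional[str], str, str]

RULES: List[Rule] = [
    (None, "amount1", "%amount1 USD"),
    (None, "amount1", "%amount1 EUR"),
    (None, "amount1", "%amount1 %currency"),
    ("SOME_REGEXP", "amount1", "%amount1 GBP"),
]


def evaluate(field_names: List[str], record: List[str], rules: List[Rule]) -> Dict[str, str]:
    # Step 1: the fields directive assigns CSV values to names.
    csv_values = {n: v.strip() for n, v in zip(field_names, record) if n}
    # Step 2: pick the applicable assignments; later ones overwrite earlier ones.
    record_text = ",".join(record)
    chosen: Dict[str, str] = {}
    for pattern, field, template in rules:
        if pattern is None or re.search(pattern, record_text, re.IGNORECASE):
            chosen[field] = template
    # Step 3: only now expand %name references, using the values from step 1.
    result = dict(csv_values)
    for field, template in chosen.items():
        result[field] = re.sub(
            r"%([A-Za-z_][A-Za-z0-9_-]*)",
            lambda m: csv_values.get(m.group(1), m.group(0)),
            template,
        )
    return result


names = ["date", "description", "currency", "amount1"]
print(evaluate(names, ["2019-11-05", "coffee", "CHF", "4.50"], RULES)["amount1"])
print(evaluate(names, ["2019-11-05", "SOME_REGEXP shop", "CHF", "4.50"], RULES)["amount1"])
```

With these sample rows the sketch prints `4.50 CHF` for the row that does not match the pattern and `4.50 GBP` for the one that does; the USD and EUR templates are never expanded because they were overwritten before substitution took place.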

Reading multiple CSV files

You can read multiple CSV files at once using multiple -f arguments on the command line, and hledger will look for a correspondingly-named rules file for each. Note that if you use the --rules-file option, that one rules file will be used for all the CSV files being read.

Valid CSV

hledger follows RFC 4180, with the addition of a customisable separator character.

Some things to note:

When quoting fields,

  • you must use double quotes, not single quotes
  • spaces outside the quotes are not allowed.