diff --git a/hledger-lib/hledger_csv.5 b/hledger-lib/hledger_csv.5 index 45f6b889e..9562e6974 100644 --- a/hledger-lib/hledger_csv.5 +++ b/hledger-lib/hledger_csv.5 @@ -18,8 +18,8 @@ These do several things: .IP \[bu] 2 they describe the layout and format of the CSV data .IP \[bu] 2 -they can customize the generated journal entries using a simple -templating language +they can customize the generated journal entries (transactions) using a +simple templating language .IP \[bu] 2 they can add refinements based on patterns in the CSV data, eg categorizing transactions with more detailed account names. @@ -44,70 +44,142 @@ skip 1 \f[R] .fi .PP -A more complete example: +More examples in the EXAMPLES section below. +.SH CSV RULES +.PP +The following kinds of rule can appear in the rules file, in any order +(except for \f[C]end\f[R] which can appear only inside a conditional +block). +Blank lines and lines beginning with \f[C]#\f[R] or \f[C];\f[R] are +ignored. +.SS \f[C]skip\f[R] .IP .nf \f[C] -# hledger CSV rules for amazon.com order history - -# sample: -# \[dq]Date\[dq],\[dq]Type\[dq],\[dq]To/From\[dq],\[dq]Name\[dq],\[dq]Status\[dq],\[dq]Amount\[dq],\[dq]Fees\[dq],\[dq]Transaction ID\[dq] -# \[dq]Jul 29, 2012\[dq],\[dq]Payment\[dq],\[dq]To\[dq],\[dq]Adapteva, Inc.\[dq],\[dq]Completed\[dq],\[dq]$25.00\[dq],\[dq]$0.00\[dq],\[dq]17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL\[dq] - -# skip one header line -skip 1 - -# name the csv fields (and assign the transaction\[aq]s date, amount and code) -fields date, _, toorfrom, name, amzstatus, amount, fees, code - -# how to parse the date -date-format %b %-d, %Y - -# combine two fields to make the description -description %toorfrom %name - -# save these fields as tags -comment status:%amzstatus, fees:%fees - -# set the base account for all transactions -account1 assets:amazon - -# flip the sign on the amount -amount -%amount +skip N \f[R] .fi .PP -For more examples, see Convert CSV files. 
-.SH CSV RULES -.PP -The following seven kinds of rule can appear in the rules file, in any -order. -Blank lines and lines beginning with \f[C]#\f[R] or \f[C];\f[R] are -ignored. -.SS skip -.PP -\f[C]skip\f[R]\f[I]\f[CI]N\f[I]\f[R] -.PP -Skip this many non-empty lines preceding the CSV data. +The word \[dq]skip\[dq] followed by a number (or no number, meaning 1) +tells hledger to ignore this many non-empty lines preceding the CSV +data. (Empty/blank lines are skipped automatically.) You\[aq]ll need this whenever your CSV data contains header lines. +.PP +It also has a second purpose: it can be used to ignore certain CSV +records, see conditional blocks below. +.SS \f[C]fields\f[R] +.IP +.nf +\f[C] +fields FIELDNAME1, FIELDNAME2, ... +\f[R] +.fi +.PP +A fields list (\[dq]fields\[dq] followed by one or more comma-separated +field names) is the quick way to assign CSV field values to hledger +fields. +It (a) names the CSV fields, in order (names may not contain whitespace; +fields you don\[aq]t care about can be left unnamed), and (b) assigns +them to hledger fields if you use standard hledger field names. +Here\[aq]s an example: +.IP +.nf +\f[C] +# use the 1st, 2nd and 4th CSV fields as the transaction\[aq]s date, description and amount, +# ignore the 3rd, 5th and 6th fields, +# and name the 7th and 8th fields for later reference: +# 1 2 3 4 5 6 7 8 + +fields date, description, , amount1, , , somefield, anotherfield +\f[R] +.fi +.PP +Here are the standard hledger field names: +.SS Transaction fields +.PP +\f[C]date\f[R], \f[C]date2\f[R], \f[C]status\f[R], \f[C]code\f[R], +\f[C]description\f[R], \f[C]comment\f[R] can be used to form the +transaction\[aq]s first line. +Only \f[C]date\f[R] is required. +(See also date-format below.) +.SS Posting fields +.PP +\f[C]accountN\f[R], where N is 1 to 9, sets the Nth posting\[aq]s +account name. +Most often there are two postings, so you\[aq]ll want to set +\f[C]account1\f[R] and \f[C]account2\f[R]. 
+.PP +A number of field/pseudo-field names are available for setting posting +amounts: +.IP \[bu] 2 +\f[C]amountN\f[R] sets posting N\[aq]s amount +.IP \[bu] 2 +\f[C]amountN-in\f[R] and \f[C]amountN-out\f[R] can be used instead, if +the CSV has separate fields for debits and credits +.IP \[bu] 2 +\f[C]currencyN\f[R] sets a currency symbol to be left-prefixed to the +amount, useful if the CSV provides that as a separate field +.IP \[bu] 2 +\f[C]balanceN\f[R] sets a (separate) balance assertion amount (or when +no posting amount is set, a balance assignment) +.PP +If you write these with no number (\f[C]amount\f[R], +\f[C]amount-in\f[R], \f[C]amount-out\f[R], \f[C]currency\f[R], +\f[C]balance\f[R]), it means posting 1. +Also, if you set an amount for posting 1 only, a second posting that +balances the transaction will be generated automatically. +This helps support CSV rules created before hledger 1.16. +.PP +Finally, \f[C]commentN\f[R] sets a comment on the Nth posting. +Comments can of course contain tags. +.SS \f[C](field assignment)\f[R] +.IP +.nf +\f[C] +HLEDGERFIELDNAME FIELDVALUE +\f[R] +.fi +.PP +Instead of or in addition to a fields list, you can assign a value to a +hledger field by writing its name (any of the standard names above) +followed by a text value. +The value may contain interpolated CSV fields, referenced by their +1-based position in the CSV record (\f[C]%N\f[R]), or by the name they +were given in the fields list (\f[C]%CSVFIELDNAME\f[R]). Eg: .IP .nf \f[C] -# ignore the first CSV line -skip 1 +# set the amount to the 4th CSV field, with \[dq] USD\[dq] appended +amount %4 USD +\f[R] +.fi +.IP +.nf +\f[C] +# combine three fields to make a comment, containing note: and date: tags +comment note: %somefield - %anotherfield, date: %1 \f[R] .fi -.SS date-format .PP -\f[C]date-format\f[R]\f[I]\f[CI]DATEFMT\f[I]\f[R] +Interpolation strips any outer whitespace, so a CSV value like +\f[C]\[dq] 1 \[dq]\f[R] becomes \f[C]1\f[R] when interpolated (#1051). 
+Note you can only interpolate CSV fields, not the hledger fields being +assigned to; for more on this, see TIPS. +.SS \f[C]date-format\f[R] +.IP +.nf +\f[C] +date-format DATEFMT +\f[R] +.fi .PP -When your CSV date fields are not formatted like \f[C]YYYY/MM/DD\f[R] -(or \f[C]YYYY-MM-DD\f[R] or \f[C]YYYY.MM.DD\f[R]), you\[aq]ll need to -specify the format. -DATEFMT is a strptime-like date parsing pattern, which must parse the -date field values completely. +This is a helper for the \f[C]date\f[R] (and \f[C]date2\f[R]) fields. +If your CSV dates are not formatted like \f[C]YYYY-MM-DD\f[R], +\f[C]YYYY/MM/DD\f[R] or \f[C]YYYY.MM.DD\f[R], you\[aq]ll need to specify +the format by writing \[dq]date-format\[dq] followed by a strptime-like +date parsing pattern, which must parse the date field values completely. Examples: .IP .nf @@ -119,7 +191,7 @@ date-format %m/%d/%Y .IP .nf \f[C] -# for dates like \[dq]6/11/2013\[dq] (note the - to make leading zeros optional): +# for dates like \[dq]6/11/2013\[dq]. The - allows leading zeros to be optional. date-format %-d/%-m/%Y \f[R] .fi @@ -137,90 +209,47 @@ date-format %Y-%h-%d date-format %-m/%-d/%Y %l:%M %p \f[R] .fi -.SS field list -.PP -\f[C]fields\f[R]\f[I]\f[CI]FIELDNAME1\f[I]\f[R], -\f[I]\f[CI]FIELDNAME2\f[I]\f[R]... -.PP -This (a) names the CSV fields, in order (names may not contain -whitespace; uninteresting names may be left blank), and (b) assigns them -to journal entry fields if you use any of these standard field names: -\f[C]date\f[R], \f[C]date2\f[R], \f[C]status\f[R], \f[C]code\f[R], -\f[C]description\f[R], \f[C]comment\f[R], \f[C]account1\f[R], -\f[C]account2\f[R], \f[C]amount\f[R], \f[C]amount-in\f[R], -\f[C]amount-out\f[R], \f[C]currency\f[R], \f[C]balance\f[R], -\f[C]balance1\f[R], \f[C]balance2\f[R]. 
-Eg: +.SS \f[C]if\f[R] .IP .nf \f[C] -# use the 1st, 2nd and 4th CSV fields as the entry\[aq]s date, description and amount, -# and give the 7th and 8th fields meaningful names for later reference: -# -# CSV field: -# 1 2 3 4 5 6 7 8 -# entry field: -fields date, description, , amount, , , somefield, anotherfield -\f[R] -.fi -.SS field assignment -.PP -\f[I]\f[CI]ENTRYFIELDNAME\f[I]\f[R] \f[I]\f[CI]FIELDVALUE\f[I]\f[R] -.PP -This sets a journal entry field (one of the standard names above) to the -given text value, which can include CSV field values interpolated by -name (\f[C]%CSVFIELDNAME\f[R]) or 1-based position (\f[C]%N\f[R]). -Eg: -.IP -.nf -\f[C] -# set the amount to the 4th CSV field with \[dq]USD \[dq] prepended -amount USD %4 -\f[R] -.fi -.IP -.nf -\f[C] -# combine three fields to make a comment (containing two tags) -comment note: %somefield - %anotherfield, date: %1 +if PATTERN + RULE + +if +PATTERN +PATTERN +PATTERN + RULE + RULE \f[R] .fi .PP -Field assignments can be used instead of or in addition to a field list. +Conditional blocks apply one or more rules to CSV records which are +matched by any of the PATTERNs. +This allows transactions to be customised or categorised based on +patterns in the data. .PP -Note, interpolation strips any outer whitespace, so a CSV value like -\f[C]\[dq] 1 \[dq]\f[R] becomes \f[C]1\f[R] when interpolated (#1051). -.SS conditional block +A single pattern can be written on the same line as the \[dq]if\[dq]; or +multiple patterns can be written on the following lines, non-indented. .PP -\f[C]if\f[R] \f[I]\f[CI]PATTERN\f[I]\f[R] -.PD 0 -.P -.PD -\ \ \ \ \f[I]\f[CI]FIELDASSIGNMENTS\f[I]\f[R]... +Patterns are case-insensitive regular expressions which try to match any +part of the whole CSV record. +It\[aq]s not yet possible to match within a specific field. +Note the CSV record they see is close but not identical to the one in +the CSV file; eg double quotes are removed, and the separator character +becomes comma. 
.PP -\f[C]if\f[R] -.PD 0 -.P -.PD -\f[I]\f[CI]PATTERN\f[I]\f[R] -.PD 0 -.P -.PD -\f[I]\f[CI]PATTERN\f[I]\f[R]... -.PD 0 -.P -.PD -\ \ \ \ \f[I]\f[CI]FIELDASSIGNMENTS\f[I]\f[R]... +After the patterns, there should be one or more rules to apply, all +indented by at least one space. +Three kinds of rule are allowed in conditional blocks: +.IP \[bu] 2 +field assignments (to set a field\[aq]s value) +.IP \[bu] 2 +skip (to skip the matched CSV record) +.IP \[bu] 2 +end (to skip all remaining CSV records). .PP -This applies one or more field assignments, only to those CSV records -matched by one of the PATTERNs. -The patterns are case-insensitive regular expressions which match -anywhere within the whole CSV record (it\[aq]s not yet possible to match -within a specific field). -When there are multiple patterns they can be written on separate lines, -unindented. -The field assignments are on separate lines indented by at least one -space. Examples: .IP .nf @@ -242,112 +271,319 @@ banking thru software comment XXX deductible ? check it \f[R] .fi -.SS include +.SS \f[C]end\f[R] .PP -\f[C]include\f[R]\f[I]\f[CI]RULESFILE\f[I]\f[R] -.PP -Include another rules file at this point. -\f[C]RULESFILE\f[R] is either an absolute file path or a path relative -to the current file\[aq]s directory. +As mentioned above, this rule can be used inside conditional blocks +(only) to cause hledger to stop reading CSV records and proceed with +command execution. Eg: .IP .nf \f[C] -# rules reused with several CSV files -include common.rules +# ignore everything following the first empty record +if ,,,, + end +\f[R] +.fi +.SS \f[C]include\f[R] +.IP +.nf +\f[C] +include RULESFILE \f[R] .fi -.SS newest-first .PP -\f[C]newest-first\f[R] +Include another CSV rules file at this point, as if it were written +inline. +\f[C]RULESFILE\f[R] is an absolute file path or a path relative to the +current file\[aq]s directory. 
.PP -Consider adding this rule if all of the following are true: you might be -processing just one day of data, your CSV records are in reverse -chronological order (newest first), and you care about preserving the -order of same-day transactions. -It usually isn\[aq]t needed, because hledger autodetects the CSV order, -but when all CSV records have the same date it will assume they are -oldest first. -.SH CSV TIPS -.SS CSV ordering +This can be useful eg for reusing common rules in several rules files: +.IP +.nf +\f[C] +# someaccount.csv.rules + +## someaccount-specific rules +fields date,description,amount +account1 some:account +account2 some:misc + +## common rules +include categorisation.rules +\f[R] +.fi +.SS \f[C]newest-first\f[R] .PP -The generated journal entries will be sorted by date. -The order of same-day entries will be preserved (except in the special -case where you might need \f[C]newest-first\f[R], see above). -.SS CSV accounts -.PP -Each journal entry will have two postings, to \f[C]account1\f[R] and -\f[C]account2\f[R] respectively. -It\[aq]s not yet possible to generate entries with more than two -postings. -It\[aq]s conventional and recommended to use \f[C]account1\f[R] for the -account whose CSV we are reading. -.SS CSV amounts -.PP -A transaction amount must be set, in one of these ways: +hledger always sorts the generated transactions by date. +Transactions on the same date should appear in the same order as their +CSV records, as hledger can usually auto-detect whether the CSV\[aq]s +normal order is oldest first or newest first. 
+But if all of the following are true: .IP \[bu] 2 -with an \f[C]amount\f[R] field assignment, which sets the first -posting\[aq]s amount +the CSV might sometimes contain just one day of data (all records having +the same date) .IP \[bu] 2 -(When the CSV has debit and credit amounts in separate fields:) -.PD 0 -.P -.PD -with field assignments for the \f[C]amount-in\f[R] and -\f[C]amount-out\f[R] pseudo fields (both of them). -Whichever one has a value will be used, with appropriate sign. -If both contain a value, it might not work so well. +the CSV records are normally in reverse chronological order (newest +first) .IP \[bu] 2 -or implicitly by means of a balance assignment (see below). +and you care about preserving the order of same-day transactions +.PP +you should add the \f[C]newest-first\f[R] rule as a hint. +Eg: +.IP +.nf +\f[C] +# tell hledger explicitly that the CSV is normally newest-first +newest-first +\f[R] +.fi +.SH EXAMPLES +.PP +A more complete example, generating three-posting transactions: +.IP +.nf +\f[C] +# hledger CSV rules for amazon.com order history + +# sample: +# \[dq]Date\[dq],\[dq]Type\[dq],\[dq]To/From\[dq],\[dq]Name\[dq],\[dq]Status\[dq],\[dq]Amount\[dq],\[dq]Fees\[dq],\[dq]Transaction ID\[dq] +# \[dq]Jul 29, 2012\[dq],\[dq]Payment\[dq],\[dq]To\[dq],\[dq]Adapteva, Inc.\[dq],\[dq]Completed\[dq],\[dq]$25.00\[dq],\[dq]$0.00\[dq],\[dq]17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL\[dq] + +# skip one header line +skip 1 + +# name the csv fields (and assign the transaction\[aq]s date, amount and code) +fields date, _, toorfrom, name, amzstatus, amount1, fees, code + +# how to parse the date +date-format %b %-d, %Y + +# combine two fields to make the description +description %toorfrom %name + +# save these fields as tags +comment status:%amzstatus + +# set the base account for all transactions +account1 assets:amazon + +# flip the sign on the amount +amount -%amount + +# Put fees in a separate posting +amount3 %fees +comment3 fees +\f[R] +.fi +.PP +For 
more examples, see Convert CSV files. +.SH TIPS +.SS Reading multiple CSV files +.PP +You can read multiple CSV files at once using multiple \f[C]-f\f[R] +arguments on the command line. +hledger will look for a correspondingly-named rules file for each CSV +file. +If you use the \f[C]--rules-file\f[R] option, that rules file will be +used for all the CSV files. +.SS Deduplicating, importing +.PP +When you download a CSV file repeatedly, eg to get your latest bank +transactions, the new file may contain some of the same records as the +old one. +The print --new command is one simple way to detect just the new +transactions. +Or better still, the import command appends those new transactions to +your main journal. +This is the easiest way to import CSV data. +Eg, after downloading your latest CSV files: +.IP +.nf +\f[C] +$ hledger import *.csv [--dry] +\f[R] +.fi +.SS Other import methods +.PP +A number of other tools and workflows, hledger-specific and otherwise, +exist for converting, deduplicating, classifying and managing CSV data. +See: +.IP \[bu] 2 +https://hledger.org -> sidebar -> real world setups +.IP \[bu] 2 +https://plaintextaccounting.org -> data import/conversion +.SS Valid CSV +.PP +hledger accepts CSV conforming to RFC 4180. +Some things to note when values are enclosed in quotes: +.IP \[bu] 2 +you must use double quotes (not single quotes) +.IP \[bu] 2 +spaces outside the quotes are not allowed +.SS Other separator characters +.PP +With the \f[C]--separator \[aq]CHAR\[aq]\f[R] option, hledger will +expect the separator to be CHAR instead of a comma. +Ie it will read other \[dq]Character Separated Values\[dq] formats, such +as TSV (Tab Separated Values). +Note: on the command line, use a real tab character in quotes, not Eg: +.IP +.nf +\f[C] +$ hledger -f foo.tsv --separator \[aq] \[aq] print +\f[R] +.fi +.PP +(Experimental.) 
+.SS Setting amounts +.PP +A posting amount can be set in one of these ways: +.IP \[bu] 2 +by assigning (with a fields list or field assignment) to +\f[C]amountN\f[R] (posting N\[aq]s amount) or \f[C]amount\f[R] (posting +1\[aq]s amount) +.IP \[bu] 2 +by assigning to \f[C]amountN-in\f[R] and \f[C]amountN-out\f[R] (or +\f[C]amount-in\f[R] and \f[C]amount-out\f[R]). +For each CSV record, whichever of these has a non-zero value will be +used, with appropriate sign. +If both contain a non-zero value, this may not work. +.IP \[bu] 2 +by assigning to \f[C]balanceN\f[R] (or \f[C]balance\f[R]) instead of the +above, setting the amount indirectly via a balance assignment. .PP There is some special handling for sign in amounts: .IP \[bu] 2 If an amount value is parenthesised, it will be de-parenthesised and sign-flipped. .IP \[bu] 2 -If an amount value begins with a double minus sign, those will cancel -out and be removed. +If an amount value begins with a double minus sign, those cancel out and +are removed. .PP If the currency/commodity symbol is provided as a separate CSV field, -assign it to the \f[C]currency\f[R] pseudo field; the symbol will be -prepended to the amount (TODO: when there is an amount). -Or, you can use an \f[C]amount\f[R] field assignment for more control, -eg: +you can assign it to \f[C]currency\f[R] (affects all posting amounts) or +\f[C]currencyN\f[R] (affects just posting N\[aq]s amount). +The symbol will be prepended to the amount. +Or for more control, you can set both currency symbol and amount with a +field assignment, eg: .IP .nf \f[C] fields date,description,currency,amount +# add currency symbol on the right: amount %amount %currency \f[R] .fi -.SS CSV balance assertions/assignments +.SS Referencing other fields .PP -If the CSV includes a running balance, you can assign that to one of the -pseudo fields \f[C]balance\f[R] (or \f[C]balance1\f[R]) or -\f[C]balance2\f[R]. 
-This will generate a balance assertion (or if the amount is left empty, -a balance assignment), on the first or second posting, whenever the -running balance field is non-empty. -(TODO: #1000) -.SS Reading multiple CSV files +In field assignments, you can interpolate only CSV fields, not hledger +fields. +In the example below, there\[aq]s both a CSV field and a hledger field +named amount1, but %amount1 always means the CSV field, not the hledger +field: +.IP +.nf +\f[C] +# Name the third CSV field \[dq]amount1\[dq] +fields date,description,amount1 + +# Set hledger\[aq]s amount1 to the CSV amount1 field followed by USD +amount1 %amount1 USD + +# Set comment to the CSV amount1 (not the amount1 assigned above) +comment %amount1 +\f[R] +.fi .PP -You can read multiple CSV files at once using multiple \f[C]-f\f[R] -arguments on the command line, and hledger will look for a -correspondingly-named rules file for each. -Note if you use the \f[C]--rules-file\f[R] option, this one rules file -will be used for all the CSV files being read. -.SS Valid CSV +Here, since there\[aq]s no CSV amount1 field, %amount1 will produce a +literal \[dq]amount1\[dq]: +.IP +.nf +\f[C] +fields date,description,csvamount +amount1 %csvamount USD +# Can\[aq]t interpolate amount1 here +comment %amount1 +\f[R] +.fi .PP -hledger follows RFC 4180, with the addition of a customisable separator -character. +When there are multiple field assignments to the same hledger field, +only the last one takes effect. +Here, comment\[aq]s value will be B, or C if \[dq]something\[dq] is +matched, but never A: +.IP +.nf +\f[C] +comment A +comment B +if something + comment C +\f[R] +.fi +.SS How CSV rules are evaluated .PP -Some things to note: -.PP -When quoting fields, +Here\[aq]s how to think of CSV rules being evaluated (if you really need +to). +First, .IP \[bu] 2 -you must use double quotes, not single quotes +include - all includes are inlined, from top to bottom, depth first. 
+(At each include point the file is inlined and scanned for further +includes, before proceeding.) +.PP +Then \[dq]global\[dq] rules are evaluated, top to bottom. +If a rule is repeated, the last one wins: .IP \[bu] 2 -spaces outside the quotes are not allowed. +skip (at top level) +.IP \[bu] 2 +date-format +.IP \[bu] 2 +newest-first +.IP \[bu] 2 +fields - names the CSV fields, optionally sets up initial assignments to +hledger fields +.PP +Then for each CSV record in turn: +.IP \[bu] 2 +test all \f[C]if\f[R] blocks. +If any of them contain an \f[C]end\f[R] rule, skip all remaining CSV +records. +Otherwise if any of them contain a \f[C]skip\f[R] rule, skip that many +CSV records. +If there are multiple matched skip rules, the first one wins. +.IP \[bu] 2 +collect all field assignments at top level and in matched if blocks. +When there are multiple assignments for a field, keep only the last one. +.IP \[bu] 2 +compute a value for each hledger field - either the one that was +assigned to it (and interpolate the %CSVFIELDNAME references), or a +default +.IP \[bu] 2 +generate a synthetic hledger transaction from these values, which +becomes part of the input to the hledger command that has been selected +.SS Valid transactions +.PP +hledger currently does not post-process and validate transactions +generated from CSV as thoroughly as transactions read from a journal +file. +This means that if your rules are wrong, you can generate invalid +transactions. +Or, amounts may not be displayed with a canonical display style. +.PP +So when setting up or adjusting CSV rules, you should check your results +visually with the print command. +You can pipe print\[aq]s output through hledger once more to validate +and canonicalise fully. +Eg: +.IP +.nf +\f[C] +$ hledger -f some.csv print | hledger -f- print -I +\f[R] +.fi +.PP +(The -I/--ignore-assertions flag disables balance assertion checks, +usually needed when re-parsing print output.) 
.SH "REPORTING BUGS" diff --git a/hledger-lib/hledger_csv.info b/hledger-lib/hledger_csv.info index 8e288f965..bac63c42b 100644 --- a/hledger-lib/hledger_csv.info +++ b/hledger-lib/hledger_csv.info @@ -14,8 +14,8 @@ transaction. (To learn about _writing_ CSV, see CSV output.) rules. These do several things: * they describe the layout and format of the CSV data - * they can customize the generated journal entries using a simple - templating language + * they can customize the generated journal entries (transactions) + using a simple templating language * they can add refinements based on patterns in the CSV data, eg categorizing transactions with more detailed account names. @@ -33,93 +33,164 @@ fields date, _, _, amount date-format %d/%m/%Y skip 1 - A more complete example: - -# hledger CSV rules for amazon.com order history - -# sample: -# "Date","Type","To/From","Name","Status","Amount","Fees","Transaction ID" -# "Jul 29, 2012","Payment","To","Adapteva, Inc.","Completed","$25.00","$0.00","17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL" - -# skip one header line -skip 1 - -# name the csv fields (and assign the transaction's date, amount and code) -fields date, _, toorfrom, name, amzstatus, amount, fees, code - -# how to parse the date -date-format %b %-d, %Y - -# combine two fields to make the description -description %toorfrom %name - -# save these fields as tags -comment status:%amzstatus, fees:%fees - -# set the base account for all transactions -account1 assets:amazon - -# flip the sign on the amount -amount -%amount - - For more examples, see Convert CSV files. + More examples in the EXAMPLES section below. * Menu: * CSV RULES:: -* CSV TIPS:: +* EXAMPLES:: +* TIPS::  -File: hledger_csv.info, Node: CSV RULES, Next: CSV TIPS, Prev: Top, Up: Top +File: hledger_csv.info, Node: CSV RULES, Next: EXAMPLES, Prev: Top, Up: Top 1 CSV RULES *********** -The following seven kinds of rule can appear in the rules file, in any -order. 
Blank lines and lines beginning with '#' or ';' are ignored. +The following kinds of rule can appear in the rules file, in any order +(except for 'end' which can appear only inside a conditional block). +Blank lines and lines beginning with '#' or ';' are ignored. * Menu: * skip:: -* date-format:: -* field list:: +* fields:: * field assignment:: -* conditional block:: +* date-format:: +* if:: +* end:: * include:: * newest-first::  -File: hledger_csv.info, Node: skip, Next: date-format, Up: CSV RULES +File: hledger_csv.info, Node: skip, Next: fields, Up: CSV RULES -1.1 skip -======== +1.1 'skip' +========== -'skip'_'N'_ +skip N - Skip this many non-empty lines preceding the CSV data. (Empty/blank -lines are skipped automatically.) You'll need this whenever your CSV -data contains header lines. Eg: + The word "skip" followed by a number (or no number, meaning 1) tells +hledger to ignore this many non-empty lines preceding the CSV data. +(Empty/blank lines are skipped automatically.) You'll need this +whenever your CSV data contains header lines. -# ignore the first CSV line -skip 1 + It also has a second purpose: it can be used to ignore certain CSV +records, see conditional blocks below.  -File: hledger_csv.info, Node: date-format, Next: field list, Prev: skip, Up: CSV RULES +File: hledger_csv.info, Node: fields, Next: field assignment, Prev: skip, Up: CSV RULES -1.2 date-format -=============== +1.2 'fields' +============ -'date-format'_'DATEFMT'_ +fields FIELDNAME1, FIELDNAME2, ... - When your CSV date fields are not formatted like 'YYYY/MM/DD' (or -'YYYY-MM-DD' or 'YYYY.MM.DD'), you'll need to specify the format. -DATEFMT is a strptime-like date parsing pattern, which must parse the -date field values completely. Examples: + A fields list ("fields" followed by one or more comma-separated field +names) is the quick way to assign CSV field values to hledger fields. 
+It (a) names the CSV fields, in order (names may not contain whitespace; +fields you don't care about can be left unnamed), and (b) assigns them +to hledger fields if you use standard hledger field names. Here's an +example: + +# use the 1st, 2nd and 4th CSV fields as the transaction's date, description and amount, +# ignore the 3rd, 5th and 6th fields, +# and name the 7th and 8th fields for later reference: +# 1 2 3 4 5 6 7 8 + +fields date, description, , amount1, , , somefield, anotherfield + + Here are the standard hledger field names: + +* Menu: + +* Transaction fields:: +* Posting fields:: + + +File: hledger_csv.info, Node: Transaction fields, Next: Posting fields, Up: fields + +1.2.1 Transaction fields +------------------------ + +'date', 'date2', 'status', 'code', 'description', 'comment' can be used +to form the transaction's first line. Only 'date' is required. (See +also date-format below.) + + +File: hledger_csv.info, Node: Posting fields, Prev: Transaction fields, Up: fields + +1.2.2 Posting fields +-------------------- + +'accountN', where N is 1 to 9, sets the Nth posting's account name. +Most often there are two postings, so you'll want to set 'account1' and +'account2'. + + A number of field/pseudo-field names are available for setting +posting amounts: + + * 'amountN' sets posting N's amount + * 'amountN-in' and 'amountN-out' can be used instead, if the CSV has + separate fields for debits and credits + * 'currencyN' sets a currency symbol to be left-prefixed to the + amount, useful if the CSV provides that as a separate field + * 'balanceN' sets a (separate) balance assertion amount (or when no + posting amount is set, a balance assignment) + + If you write these with no number ('amount', 'amount-in', +'amount-out', 'currency', 'balance'), it means posting 1. Also, if you +set an amount for posting 1 only, a second posting that balances the +transaction will be generated automatically. This helps support CSV +rules created before hledger 1.16. 
+ + Finally, 'commentN' sets a comment on the Nth posting. Comments can +of course contain tags. + + +File: hledger_csv.info, Node: field assignment, Next: date-format, Prev: fields, Up: CSV RULES + +1.3 '(field assignment)' +======================== + +HLEDGERFIELDNAME FIELDVALUE + + Instead of or in addition to a fields list, you can assign a value to +a hledger field by writing its name (any of the standard names above) +followed by a text value. The value may contain interpolated CSV +fields, referenced by their 1-based position in the CSV record ('%N'), +or by the name they were given in the fields list ('%CSVFIELDNAME'). +Eg: + +# set the amount to the 4th CSV field, with " USD" appended +amount %4 USD + +# combine three fields to make a comment, containing note: and date: tags +comment note: %somefield - %anotherfield, date: %1 + + Interpolation strips any outer whitespace, so a CSV value like '" 1 +"' becomes '1' when interpolated (#1051). Note you can only interpolate +CSV fields, not the hledger fields being assigned to; for more on this, +see TIPS. + + +File: hledger_csv.info, Node: date-format, Next: if, Prev: field assignment, Up: CSV RULES + +1.4 'date-format' +================= + +date-format DATEFMT + + This is a helper for the 'date' (and 'date2') fields. If your CSV +dates are not formatted like 'YYYY-MM-DD', 'YYYY/MM/DD' or 'YYYY.MM.DD', +you'll need to specify the format by writing "date-format" followed by a +strptime-like date parsing pattern, which must parse the date field +values completely. Examples: # for dates like "11/06/2013": date-format %m/%d/%Y -# for dates like "6/11/2013" (note the - to make leading zeros optional): +# for dates like "6/11/2013". The - allows leading zeros to be optional. 
date-format %-d/%-m/%Y # for dates like "2013-Nov-06": @@ -129,73 +200,43 @@ date-format %Y-%h-%d date-format %-m/%-d/%Y %l:%M %p  -File: hledger_csv.info, Node: field list, Next: field assignment, Prev: date-format, Up: CSV RULES +File: hledger_csv.info, Node: if, Next: end, Prev: date-format, Up: CSV RULES -1.3 field list -============== +1.5 'if' +======== -'fields'_'FIELDNAME1'_, _'FIELDNAME2'_... +if PATTERN + RULE - This (a) names the CSV fields, in order (names may not contain -whitespace; uninteresting names may be left blank), and (b) assigns them -to journal entry fields if you use any of these standard field names: -'date', 'date2', 'status', 'code', 'description', 'comment', 'account1', -'account2', 'amount', 'amount-in', 'amount-out', 'currency', 'balance', -'balance1', 'balance2'. Eg: +if +PATTERN +PATTERN +PATTERN + RULE + RULE -# use the 1st, 2nd and 4th CSV fields as the entry's date, description and amount, -# and give the 7th and 8th fields meaningful names for later reference: -# -# CSV field: -# 1 2 3 4 5 6 7 8 -# entry field: -fields date, description, , amount, , , somefield, anotherfield + Conditional blocks apply one or more rules to CSV records which are +matched by any of the PATTERNs. This allows transactions to be +customised or categorised based on patterns in the data. - -File: hledger_csv.info, Node: field assignment, Next: conditional block, Prev: field list, Up: CSV RULES + A single pattern can be written on the same line as the "if"; or +multiple patterns can be written on the following lines, non-indented. -1.4 field assignment -==================== + Patterns are case-insensitive regular expressions which try to match +any part of the whole CSV record. It's not yet possible to match within +a specific field. Note the CSV record they see is close but not +identical to the one in the CSV file; eg double quotes are removed, and +the separator character becomes comma. 
-_'ENTRYFIELDNAME'_ _'FIELDVALUE'_ + After the patterns, there should be one or more rules to apply, all +indented by at least one space. Three kinds of rule are allowed in +conditional blocks: - This sets a journal entry field (one of the standard names above) to -the given text value, which can include CSV field values interpolated by -name ('%CSVFIELDNAME') or 1-based position ('%N'). Eg: + * field assignments (to set a field's value) + * skip (to skip the matched CSV record) + * end (to skip all remaining CSV records). -# set the amount to the 4th CSV field with "USD " prepended -amount USD %4 - -# combine three fields to make a comment (containing two tags) -comment note: %somefield - %anotherfield, date: %1 - - Field assignments can be used instead of or in addition to a field -list. - - Note, interpolation strips any outer whitespace, so a CSV value like -'" 1 "' becomes '1' when interpolated (#1051). - - -File: hledger_csv.info, Node: conditional block, Next: include, Prev: field assignment, Up: CSV RULES - -1.5 conditional block -===================== - -'if' _'PATTERN'_ - _'FIELDASSIGNMENTS'_... - - 'if' -_'PATTERN'_ -_'PATTERN'_... - _'FIELDASSIGNMENTS'_... - - This applies one or more field assignments, only to those CSV records -matched by one of the PATTERNs. The patterns are case-insensitive -regular expressions which match anywhere within the whole CSV record -(it's not yet possible to match within a specific field). When there -are multiple patterns they can be written on separate lines, unindented. -The field assignments are on separate lines indented by at least one -space. Examples: + Examples: # if the CSV record contains "groceries", set account2 to "expenses:groceries" if groceries @@ -210,176 +251,369 @@ banking thru software comment XXX deductible ? 
check it  -File: hledger_csv.info, Node: include, Next: newest-first, Prev: conditional block, Up: CSV RULES +File: hledger_csv.info, Node: end, Next: include, Prev: if, Up: CSV RULES -1.6 include -=========== +1.6 'end' +========= -'include'_'RULESFILE'_ +As mentioned above, this rule can be used inside conditional blocks +(only) to cause hledger to stop reading CSV records and proceed with +command execution. Eg: - Include another rules file at this point. 'RULESFILE' is either an -absolute file path or a path relative to the current file's directory. -Eg: +# ignore everything following the first empty record +if ,,,, + end -# rules reused with several CSV files -include common.rules + +File: hledger_csv.info, Node: include, Next: newest-first, Prev: end, Up: CSV RULES + +1.7 'include' +============= + +include RULESFILE + + Include another CSV rules file at this point, as if it were written +inline. 'RULESFILE' is an absolute file path or a path relative to the +current file's directory. + + This can be useful eg for reusing common rules in several rules +files: + +# someaccount.csv.rules + +## someaccount-specific rules +fields date,description,amount +account1 some:account +account2 some:misc + +## common rules +include categorisation.rules  File: hledger_csv.info, Node: newest-first, Prev: include, Up: CSV RULES -1.7 newest-first -================ +1.8 'newest-first' +================== -'newest-first' +hledger always sorts the generated transactions by date. Transactions +on the same date should appear in the same order as their CSV records, +as hledger can usually auto-detect whether the CSV's normal order is +oldest first or newest first. But if all of the following are true: - Consider adding this rule if all of the following are true: you might -be processing just one day of data, your CSV records are in reverse -chronological order (newest first), and you care about preserving the -order of same-day transactions. 
It usually isn't needed, because -hledger autodetects the CSV order, but when all CSV records have the -same date it will assume they are oldest first. + * the CSV might sometimes contain just one day of data (all records + having the same date) + * the CSV records are normally in reverse chronological order (newest + first) + * and you care about preserving the order of same-day transactions + + you should add the 'newest-first' rule as a hint. Eg: + +# tell hledger explicitly that the CSV is normally newest-first +newest-first  -File: hledger_csv.info, Node: CSV TIPS, Prev: CSV RULES, Up: Top +File: hledger_csv.info, Node: EXAMPLES, Next: TIPS, Prev: CSV RULES, Up: Top -2 CSV TIPS +2 EXAMPLES ********** +A more complete example, generating three-posting transactions: + +# hledger CSV rules for amazon.com order history + +# sample: +# "Date","Type","To/From","Name","Status","Amount","Fees","Transaction ID" +# "Jul 29, 2012","Payment","To","Adapteva, Inc.","Completed","$25.00","$0.00","17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL" + +# skip one header line +skip 1 + +# name the csv fields (and assign the transaction's date, amount and code) +fields date, _, toorfrom, name, amzstatus, amount1, fees, code + +# how to parse the date +date-format %b %-d, %Y + +# combine two fields to make the description +description %toorfrom %name + +# save these fields as tags +comment status:%amzstatus + +# set the base account for all transactions +account1 assets:amazon + +# flip the sign on the amount +amount -%amount + +# Put fees in a separate posting +amount3 %fees +comment3 fees + + For more examples, see Convert CSV files. 
+ + +File: hledger_csv.info, Node: TIPS, Prev: EXAMPLES, Up: Top + +3 TIPS +****** + * Menu: -* CSV ordering:: -* CSV accounts:: -* CSV amounts:: -* CSV balance assertions/assignments:: * Reading multiple CSV files:: +* Deduplicating importing:: +* Other import methods:: * Valid CSV:: +* Other separator characters:: +* Setting amounts:: +* Referencing other fields:: +* How CSV rules are evaluated:: +* Valid transactions::  -File: hledger_csv.info, Node: CSV ordering, Next: CSV accounts, Up: CSV TIPS +File: hledger_csv.info, Node: Reading multiple CSV files, Next: Deduplicating importing, Up: TIPS -2.1 CSV ordering -================ +3.1 Reading multiple CSV files +============================== -The generated journal entries will be sorted by date. The order of -same-day entries will be preserved (except in the special case where you -might need 'newest-first', see above). +You can read multiple CSV files at once using multiple '-f' arguments on +the command line. hledger will look for a correspondingly-named rules +file for each CSV file. If you use the '--rules-file' option, that +rules file will be used for all the CSV files.  -File: hledger_csv.info, Node: CSV accounts, Next: CSV amounts, Prev: CSV ordering, Up: CSV TIPS +File: hledger_csv.info, Node: Deduplicating importing, Next: Other import methods, Prev: Reading multiple CSV files, Up: TIPS -2.2 CSV accounts -================ +3.2 Deduplicating, importing +============================ -Each journal entry will have two postings, to 'account1' and 'account2' -respectively. It's not yet possible to generate entries with more than -two postings. It's conventional and recommended to use 'account1' for -the account whose CSV we are reading. +When you download a CSV file repeatedly, eg to get your latest bank +transactions, the new file may contain some of the same records as the +old one. The print --new command is one simple way to detect just the +new transactions.
Or better still, the import command appends those new +transactions to your main journal. This is the easiest way to import +CSV data. Eg, after downloading your latest CSV files: + +$ hledger import *.csv [--dry]  -File: hledger_csv.info, Node: CSV amounts, Next: CSV balance assertions/assignments, Prev: CSV accounts, Up: CSV TIPS +File: hledger_csv.info, Node: Other import methods, Next: Valid CSV, Prev: Deduplicating importing, Up: TIPS -2.3 CSV amounts -=============== +3.3 Other import methods +======================== -A transaction amount must be set, in one of these ways: +A number of other tools and workflows, hledger-specific and otherwise, +exist for converting, deduplicating, classifying and managing CSV data. +See: - * with an 'amount' field assignment, which sets the first posting's - amount + * https://hledger.org -> sidebar -> real world setups + * https://plaintextaccounting.org -> data import/conversion - * (When the CSV has debit and credit amounts in separate fields:) - with field assignments for the 'amount-in' and 'amount-out' pseudo - fields (both of them). Whichever one has a value will be used, - with appropriate sign. If both contain a value, it might not work - so well. + +File: hledger_csv.info, Node: Valid CSV, Next: Other separator characters, Prev: Other import methods, Up: TIPS - * or implicitly by means of a balance assignment (see below). +3.4 Valid CSV +============= + +hledger accepts CSV conforming to RFC 4180. Some things to note when +values are enclosed in quotes: + + * you must use double quotes (not single quotes) + * spaces outside the quotes are not allowed + + +File: hledger_csv.info, Node: Other separator characters, Next: Setting amounts, Prev: Valid CSV, Up: TIPS + +3.5 Other separator characters +============================== + +With the '--separator 'CHAR'' option, hledger will expect the separator +to be CHAR instead of a comma. 
Ie it will read other "Character +Separated Values" formats, such as TSV (Tab Separated Values). Note: on +the command line, use a real tab character in quotes, not + +$ hledger -f foo.tsv --separator ' ' print + + (Experimental.) + + +File: hledger_csv.info, Node: Setting amounts, Next: Referencing other fields, Prev: Other separator characters, Up: TIPS + +3.6 Setting amounts +=================== + +A posting amount can be set in one of these ways: + + * by assigning (with a fields list or field assignment) to 'amountN' + (posting N's amount) or 'amount' (posting 1's amount) + + * by assigning to 'amountN-in' and 'amountN-out' (or 'amount-in' and + 'amount-out'). For each CSV record, whichever of these has a + non-zero value will be used, with appropriate sign. If both + contain a non-zero value, this may not work. + + * by assigning to 'balanceN' (or 'balance') instead of the above, + setting the amount indirectly via a balance assignment. There is some special handling for sign in amounts: * If an amount value is parenthesised, it will be de-parenthesised and sign-flipped. - * If an amount value begins with a double minus sign, those will - cancel out and be removed. + * If an amount value begins with a double minus sign, those cancel + out and are removed. If the currency/commodity symbol is provided as a separate CSV field, -assign it to the 'currency' pseudo field; the symbol will be prepended -to the amount (TODO: when there is an amount). Or, you can use an -'amount' field assignment for more control, eg: +you can assign it to 'currency' (affects all posting amounts) or +'currencyN' (affects just posting N's amount). The symbol will be +prepended to the amount.
Or for more control, you can set both currency +symbol and amount with a field assignment, eg: fields date,description,currency,amount +# add currency symbol on the right: amount %amount %currency  -File: hledger_csv.info, Node: CSV balance assertions/assignments, Next: Reading multiple CSV files, Prev: CSV amounts, Up: CSV TIPS +File: hledger_csv.info, Node: Referencing other fields, Next: How CSV rules are evaluated, Prev: Setting amounts, Up: TIPS -2.4 CSV balance assertions/assignments -====================================== +3.7 Referencing other fields +============================ -If the CSV includes a running balance, you can assign that to one of the -pseudo fields 'balance' (or 'balance1') or 'balance2'. This will -generate a balance assertion (or if the amount is left empty, a balance -assignment), on the first or second posting, whenever the running -balance field is non-empty. (TODO: #1000) +In field assignments, you can interpolate only CSV fields, not hledger +fields. In the example below, there's both a CSV field and a hledger +field named amount1, but %amount1 always means the CSV field, not the +hledger field: + +# Name the third CSV field "amount1" +fields date,description,amount1 + +# Set hledger's amount1 to the CSV amount1 field followed by USD +amount1 %amount1 USD + +# Set comment to the CSV amount1 (not the amount1 assigned above) +comment %amount1 + + Here, since there's no CSV amount1 field, %amount1 will produce a +literal "amount1": + +fields date,description,csvamount +amount1 %csvamount USD +# Can't interpolate amount1 here +comment %amount1 + + When there are multiple field assignments to the same hledger field, +only the last one takes effect. 
Here, comment's value will be B, or +C if "something" is matched, but never A: + +comment A +comment B +if something + comment C  -File: hledger_csv.info, Node: Reading multiple CSV files, Next: Valid CSV, Prev: CSV balance assertions/assignments, Up: CSV TIPS +File: hledger_csv.info, Node: How CSV rules are evaluated, Next: Valid transactions, Prev: Referencing other fields, Up: TIPS -2.5 Reading multiple CSV files -============================== +3.8 How CSV rules are evaluated +=============================== -You can read multiple CSV files at once using multiple '-f' arguments on -the command line, and hledger will look for a correspondingly-named -rules file for each. Note if you use the '--rules-file' option, this -one rules file will be used for all the CSV files being read. +Here's how to think of CSV rules being evaluated (if you really need +to). First, + + * include - all includes are inlined, from top to bottom, depth + first. (At each include point the file is inlined and scanned for + further includes, before proceeding.) + + Then "global" rules are evaluated, top to bottom. If a rule is +repeated, the last one wins: + + * skip (at top level) + * date-format + * newest-first + * fields - names the CSV fields, optionally sets up initial + assignments to hledger fields + + Then for each CSV record in turn: + + * test all 'if' blocks. If any of them contain an 'end' rule, skip + all remaining CSV records. Otherwise if any of them contain a + 'skip' rule, skip that many CSV records. If there are multiple + matched skip rules, the first one wins. + * collect all field assignments at top level and in matched if + blocks. When there are multiple assignments for a field, keep only + the last one.
+ * compute a value for each hledger field - either the one that was + assigned to it (and interpolate the %CSVFIELDNAME references), or a + default + * generate a synthetic hledger transaction from these values, which + becomes part of the input to the hledger command that has been + selected  -File: hledger_csv.info, Node: Valid CSV, Prev: Reading multiple CSV files, Up: CSV TIPS +File: hledger_csv.info, Node: Valid transactions, Prev: How CSV rules are evaluated, Up: TIPS -2.6 Valid CSV -============= +3.9 Valid transactions +====================== -hledger follows RFC 4180, with the addition of a customisable separator -character. +hledger currently does not post-process and validate transactions +generated from CSV as thoroughly as transactions read from a journal +file. This means that if your rules are wrong, you can generate invalid +transactions. Or, amounts may not be displayed with a canonical display +style. - Some things to note: + So when setting up or adjusting CSV rules, you should check your +results visually with the print command. You can pipe print's output +through hledger once more to validate and canonicalise fully. Eg: - When quoting fields, +$ hledger -f some.csv print | hledger -f- print -I - * you must use double quotes, not single quotes - * spaces outside the quotes are not allowed. + (The -I/--ignore-assertions flag disables balance assertion checks, +usually needed when re-parsing print output.)  
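The per-record steps described under "How CSV rules are evaluated" (collect assignments, keep the last one per field, then interpolate) can be sketched in Python. This is only an illustration of the documented behaviour, not hledger's actual implementation, which is written in Haskell. The names `interpolate` and `evaluate_record` are hypothetical, skip/end handling and default values are left out, and the sketch assumes that top-level assignments are considered before those from matched if blocks.

```python
import re


def interpolate(template, record, names):
    """Replace %N (1-based position) and %NAME references with CSV values.

    Unknown references are left as the literal name, as the text above
    describes; doing the same for out-of-range positions is an assumption.
    """
    def lookup(match):
        ref = match.group(1)
        if ref.isdigit():
            index = int(ref) - 1
        elif ref in names:
            index = names.index(ref)
        else:
            return ref
        if 0 <= index < len(record):
            # Interpolation strips outer whitespace from the CSV value.
            return record[index].strip()
        return ref

    return re.sub(r"%([A-Za-z0-9_-]+)", lookup, template)


def evaluate_record(record, names, top_level, if_blocks):
    """Return the hledger field values computed for one CSV record.

    top_level: list of (field, template) assignments.
    if_blocks: list of (patterns, assignments) pairs; a block applies when
    any of its patterns matches the whole record, case-insensitively.
    """
    line = ",".join(record)
    assignments = list(top_level)
    for patterns, rules in if_blocks:
        if any(re.search(p, line, re.IGNORECASE) for p in patterns):
            assignments.extend(rules)

    templates = {}
    for field, template in assignments:
        templates[field] = template  # a later assignment replaces an earlier one

    return {f: interpolate(t, record, names) for f, t in templates.items()}


if __name__ == "__main__":
    names = ["date", "description", "amount1"]
    top = [("comment", "A"), ("comment", "B"), ("amount1", "%amount1 USD")]
    blocks = [(["something"], [("comment", "C")])]
    print(evaluate_record(["2020-01-01", "something", "10"], names, top, blocks))
```

With these rules the comment is never A: the second top-level assignment replaces the first, and a matched if block replaces that in turn.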
Tag Table: Node: Top72 -Node: CSV RULES2167 -Ref: #csv-rules2275 -Node: skip2538 -Ref: #skip2632 -Node: date-format2857 -Ref: #date-format2984 -Node: field list3534 -Ref: #field-list3671 -Node: field assignment4401 -Ref: #field-assignment4556 -Node: conditional block5180 -Ref: #conditional-block5334 -Node: include6230 -Ref: #include6360 -Node: newest-first6591 -Ref: #newest-first6705 -Node: CSV TIPS7116 -Ref: #csv-tips7210 -Node: CSV ordering7354 -Ref: #csv-ordering7472 -Node: CSV accounts7653 -Ref: #csv-accounts7791 -Node: CSV amounts8045 -Ref: #csv-amounts8203 -Node: CSV balance assertions/assignments9283 -Ref: #csv-balance-assertionsassignments9501 -Node: Reading multiple CSV files9822 -Ref: #reading-multiple-csv-files10022 -Node: Valid CSV10296 -Ref: #valid-csv10419 +Node: CSV RULES1428 +Ref: #csv-rules1536 +Node: skip1849 +Ref: #skip1942 +Node: fields2312 +Ref: #fields2434 +Node: Transaction fields3239 +Ref: #transaction-fields3379 +Node: Posting fields3547 +Ref: #posting-fields3679 +Node: field assignment4729 +Ref: #field-assignment4882 +Node: date-format5693 +Ref: #date-format5828 +Node: if6440 +Ref: #if6544 +Node: end7915 +Ref: #end8017 +Node: include8246 +Ref: #include8366 +Node: newest-first8804 +Ref: #newest-first8922 +Node: EXAMPLES9594 +Ref: #examples9701 +Node: TIPS10607 +Ref: #tips10688 +Node: Reading multiple CSV files10931 +Ref: #reading-multiple-csv-files11098 +Node: Deduplicating importing11358 +Ref: #deduplicating-importing11550 +Node: Other import methods11991 +Ref: #other-import-methods12158 +Node: Valid CSV12428 +Ref: #valid-csv12576 +Node: Other separator characters12778 +Ref: #other-separator-characters12955 +Node: Setting amounts13289 +Ref: #setting-amounts13459 +Node: Referencing other fields14702 +Ref: #referencing-other-fields14891 +Node: How CSV rules are evaluated15788 +Ref: #how-csv-rules-are-evaluated15986 +Node: Valid transactions17266 +Ref: #valid-transactions17413  End Tag Table diff --git a/hledger-lib/hledger_csv.txt 
b/hledger-lib/hledger_csv.txt index d2d1e594a..196b5adf9 100644 --- a/hledger-lib/hledger_csv.txt +++ b/hledger-lib/hledger_csv.txt @@ -16,8 +16,8 @@ DESCRIPTION o they describe the layout and format of the CSV data - o they can customize the generated journal entries using a simple tem- - plating language + o they can customize the generated journal entries (transactions) using + a simple templating language o they can add refinements based on patterns in the CSV data, eg cate- gorizing transactions with more detailed account names. @@ -36,63 +36,109 @@ DESCRIPTION date-format %d/%m/%Y skip 1 - A more complete example: - - # hledger CSV rules for amazon.com order history - - # sample: - # "Date","Type","To/From","Name","Status","Amount","Fees","Transaction ID" - # "Jul 29, 2012","Payment","To","Adapteva, Inc.","Completed","$25.00","$0.00","17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL" - - # skip one header line - skip 1 - - # name the csv fields (and assign the transaction's date, amount and code) - fields date, _, toorfrom, name, amzstatus, amount, fees, code - - # how to parse the date - date-format %b %-d, %Y - - # combine two fields to make the description - description %toorfrom %name - - # save these fields as tags - comment status:%amzstatus, fees:%fees - - # set the base account for all transactions - account1 assets:amazon - - # flip the sign on the amount - amount -%amount - - For more examples, see Convert CSV files. + More examples in the EXAMPLES section below. CSV RULES - The following seven kinds of rule can appear in the rules file, in any - order. Blank lines and lines beginning with # or ; are ignored. + The following kinds of rule can appear in the rules file, in any order + (except for end which can appear only inside a conditional block). + Blank lines and lines beginning with # or ; are ignored. skip - skipN + skip N - Skip this many non-empty lines preceding the CSV data. (Empty/blank - lines are skipped automatically.) 
You'll need this whenever your CSV - data contains header lines. Eg: + The word "skip" followed by a number (or no number, meaning 1) tells + hledger to ignore this many non-empty lines preceding the CSV data. + (Empty/blank lines are skipped automatically.) You'll need this when- + ever your CSV data contains header lines. - # ignore the first CSV line - skip 1 + It also has a second purpose: it can be used to ignore certain CSV + records, see conditional blocks below. + + fields + fields FIELDNAME1, FIELDNAME2, ... + + A fields list ("fields" followed by one or more comma-separated field + names) is the quick way to assign CSV field values to hledger fields. + It (a) names the CSV fields, in order (names may not contain white- + space; fields you don't care about can be left unnamed), and (b) as- + signs them to hledger fields if you use standard hledger field names. + Here's an example: + + # use the 1st, 2nd and 4th CSV fields as the transaction's date, description and amount, + # ignore the 3rd, 5th and 6th fields, + # and name the 7th and 8th fields for later reference: + # 1 2 3 4 5 6 7 8 + + fields date, description, , amount1, , , somefield, anotherfield + + Here are the standard hledger field names: + + Transaction fields + date, date2, status, code, description, comment can be used to form the + transaction's first line. Only date is required. (See also date-for- + mat below.) + + Posting fields + accountN, where N is 1 to 9, sets the Nth posting's account name. Most + often there are two postings, so you'll want to set account1 and ac- + count2. 
+ + A number of field/pseudo-field names are available for setting posting + amounts: + + o amountN sets posting N's amount + + o amountN-in and amountN-out can be used instead, if the CSV has sepa- + rate fields for debits and credits + + o currencyN sets a currency symbol to be left-prefixed to the amount, + useful if the CSV provides that as a separate field + + o balanceN sets a (separate) balance assertion amount (or when no post- + ing amount is set, a balance assignment) + + If you write these with no number (amount, amount-in, amount-out, cur- + rency, balance), it means posting 1. Also, if you set an amount for + posting 1 only, a second posting that balances the transaction will be + generated automatically. This helps support CSV rules created before + hledger 1.16. + + Finally, commentN sets a comment on the Nth posting. Comments can of + course contain tags. + + (field assignment) + HLEDGERFIELDNAME FIELDVALUE + + Instead of or in addition to a fields list, you can assign a value to a + hledger field by writing its name (any of the standard names above) + followed by a text value. The value may contain interpolated CSV + fields, referenced by their 1-based position in the CSV record (%N), or + by the name they were given in the fields list (%CSVFIELDNAME). Eg: + + # set the amount to the 4th CSV field, with " USD" appended + amount %4 USD + + # combine three fields to make a comment, containing note: and date: tags + comment note: %somefield - %anotherfield, date: %1 + + Interpolation strips any outer whitespace, so a CSV value like " 1 " + becomes 1 when interpolated (#1051). Note you can only interpolate CSV + fields, not the hledger fields being assigned to; for more on this, see + TIPS. date-format - date-formatDATEFMT + date-format DATEFMT - When your CSV date fields are not formatted like YYYY/MM/DD (or YYYY- - MM-DD or YYYY.MM.DD), you'll need to specify the format. 
DATEFMT is a - strptime-like date parsing pattern, which must parse the date field - values completely. Examples: + This is a helper for the date (and date2) fields. If your CSV dates + are not formatted like YYYY-MM-DD, YYYY/MM/DD or YYYY.MM.DD, you'll + need to specify the format by writing "date-format" followed by a strp- + time-like date parsing pattern, which must parse the date field values + completely. Examples: # for dates like "11/06/2013": date-format %m/%d/%Y - # for dates like "6/11/2013" (note the - to make leading zeros optional): + # for dates like "6/11/2013". The - allows leading zeros to be optional. date-format %-d/%-m/%Y # for dates like "2013-Nov-06": @@ -101,59 +147,41 @@ CSV RULES # for dates like "11/6/2013 11:32 PM": date-format %-m/%-d/%Y %l:%M %p - field list - fieldsFIELDNAME1, FIELDNAME2... + if + if PATTERN + RULE - This (a) names the CSV fields, in order (names may not contain white- - space; uninteresting names may be left blank), and (b) assigns them to - journal entry fields if you use any of these standard field names: - date, date2, status, code, description, comment, account1, account2, - amount, amount-in, amount-out, currency, balance, balance1, balance2. - Eg: + if + PATTERN + PATTERN + PATTERN + RULE + RULE - # use the 1st, 2nd and 4th CSV fields as the entry's date, description and amount, - # and give the 7th and 8th fields meaningful names for later reference: - # - # CSV field: - # 1 2 3 4 5 6 7 8 - # entry field: - fields date, description, , amount, , , somefield, anotherfield + Conditional blocks apply one or more rules to CSV records which are + matched by any of the PATTERNs. This allows transactions to be cus- + tomised or categorised based on patterns in the data. - field assignment - ENTRYFIELDNAME FIELDVALUE + A single pattern can be written on the same line as the "if"; or multi- + ple patterns can be written on the following lines, non-indented. 
- This sets a journal entry field (one of the standard names above) to - the given text value, which can include CSV field values interpolated - by name (%CSVFIELDNAME) or 1-based position (%N). Eg: + Patterns are case-insensitive regular expressions which try to match + any part of the whole CSV record. It's not yet possible to match + within a specific field. Note the CSV record they see is close but not + identical to the one in the CSV file; eg double quotes are removed, and + the separator character becomes comma. - # set the amount to the 4th CSV field with "USD " prepended - amount USD %4 + After the patterns, there should be one or more rules to apply, all in- + dented by at least one space. Three kinds of rule are allowed in con- + ditional blocks: - # combine three fields to make a comment (containing two tags) - comment note: %somefield - %anotherfield, date: %1 + o field assignments (to set a field's value) - Field assignments can be used instead of or in addition to a field - list. + o skip (to skip the matched CSV record) - Note, interpolation strips any outer whitespace, so a CSV value like " - 1 " becomes 1 when interpolated (#1051). + o end (to skip all remaining CSV records). - conditional block - if PATTERN - FIELDASSIGNMENTS... - - if - PATTERN - PATTERN... - FIELDASSIGNMENTS... - - This applies one or more field assignments, only to those CSV records - matched by one of the PATTERNs. The patterns are case-insensitive reg- - ular expressions which match anywhere within the whole CSV record (it's - not yet possible to match within a specific field). When there are - multiple patterns they can be written on separate lines, unindented. - The field assignments are on separate lines indented by at least one - space. Examples: + Examples: # if the CSV record contains "groceries", set account2 to "expenses:groceries" if groceries @@ -167,90 +195,250 @@ CSV RULES account2 expenses:business:banking comment XXX deductible ? 
check it + end + As mentioned above, this rule can be used inside conditional blocks + (only) to cause hledger to stop reading CSV records and proceed with + command execution. Eg: + + # ignore everything following the first empty record + if ,,,, + end + include - includeRULESFILE + include RULESFILE - Include another rules file at this point. RULESFILE is either an abso- - lute file path or a path relative to the current file's directory. Eg: + Include another CSV rules file at this point, as if it were written in- + line. RULESFILE is an absolute file path or a path relative to the + current file's directory. - # rules reused with several CSV files - include common.rules + This can be useful eg for reusing common rules in several rules files: + + # someaccount.csv.rules + + ## someaccount-specific rules + fields date,description,amount + account1 some:account + account2 some:misc + + ## common rules + include categorisation.rules newest-first - newest-first + hledger always sorts the generated transactions by date. Transactions + on the same date should appear in the same order as their CSV records, + as hledger can usually auto-detect whether the CSV's normal order is + oldest first or newest first. But if all of the following are true: - Consider adding this rule if all of the following are true: you might - be processing just one day of data, your CSV records are in reverse - chronological order (newest first), and you care about preserving the - order of same-day transactions. It usually isn't needed, because - hledger autodetects the CSV order, but when all CSV records have the - same date it will assume they are oldest first. + o the CSV might sometimes contain just one day of data (all records + having the same date) -CSV TIPS - CSV ordering - The generated journal entries will be sorted by date. The order of - same-day entries will be preserved (except in the special case where - you might need newest-first, see above). 
+ o the CSV records are normally in reverse chronological order (newest + first) - CSV accounts - Each journal entry will have two postings, to account1 and account2 re- - spectively. It's not yet possible to generate entries with more than - two postings. It's conventional and recommended to use account1 for - the account whose CSV we are reading. + o and you care about preserving the order of same-day transactions - CSV amounts - A transaction amount must be set, in one of these ways: + you should add the newest-first rule as a hint. Eg: - o with an amount field assignment, which sets the first posting's - amount + # tell hledger explicitly that the CSV is normally newest-first + newest-first - o (When the CSV has debit and credit amounts in separate fields:) - with field assignments for the amount-in and amount-out pseudo fields - (both of them). Whichever one has a value will be used, with appropri- - ate sign. If both contain a value, it might not work so well. +EXAMPLES + A more complete example, generating three-posting transactions: - o or implicitly by means of a balance assignment (see below). + # hledger CSV rules for amazon.com order history + + # sample: + # "Date","Type","To/From","Name","Status","Amount","Fees","Transaction ID" + # "Jul 29, 2012","Payment","To","Adapteva, Inc.","Completed","$25.00","$0.00","17LA58JSK6PRD4HDGLNJQPI1PB9N8DKPVHL" + + # skip one header line + skip 1 + + # name the csv fields (and assign the transaction's date, amount and code) + fields date, _, toorfrom, name, amzstatus, amount1, fees, code + + # how to parse the date + date-format %b %-d, %Y + + # combine two fields to make the description + description %toorfrom %name + + # save these fields as tags + comment status:%amzstatus + + # set the base account for all transactions + account1 assets:amazon + + # flip the sign on the amount + amount -%amount + + # Put fees in a separate posting + amount3 %fees + comment3 fees + + For more examples, see Convert CSV files. 
+ +TIPS + Reading multiple CSV files + You can read multiple CSV files at once using multiple -f arguments on + the command line. hledger will look for a correspondingly-named rules + file for each CSV file. If you use the --rules-file option, that rules + file will be used for all the CSV files. + + Deduplicating, importing + When you download a CSV file repeatedly, eg to get your latest bank + transactions, the new file may contain some of the same records as the + old one. The print --new command is one simple way to detect just the + new transactions. Or better still, the import command appends those + new transactions to your main journal. This is the easiest way to im- + port CSV data. Eg, after downloading your latest CSV files: + + $ hledger import *.csv [--dry] + + Other import methods + A number of other tools and workflows, hledger-specific and otherwise, + exist for converting, deduplicating, classifying and managing CSV data. + See: + + o https://hledger.org -> sidebar -> real world setups + + o https://plaintextaccounting.org -> data import/conversion + + Valid CSV + hledger accepts CSV conforming to RFC 4180. Some things to note when + values are enclosed in quotes: + + o you must use double quotes (not single quotes) + + o spaces outside the quotes are not allowed + + Other separator characters + With the --separator 'CHAR' option, hledger will expect the separator + to be CHAR instead of a comma. Ie it will read other "Character Sepa- + rated Values" formats, such as TSV (Tab Separated Values). Note: on + the command line, use a real tab character in quotes, not Eg: + + $ hledger -f foo.tsv --separator ' ' print + + (Experimental.) + + Setting amounts + A posting amount can be set in one of these ways: + + o by assigning (with a fields list or field assignment) to amountN + (posting N's amount) or amount (posting 1's amount) + + o by assigning to amountN-in and amountN-out (or amount-in and amount- + out).
For each CSV record, whichever of these has a non-zero value + will be used, with appropriate sign. If both contain a non-zero + value, this may not work. + + o by assigning to balanceN (or balance) instead of the above, setting + the amount indirectly via a balance assignment. There is some special handling for sign in amounts: - o If an amount value is parenthesised, it will be de-parenthesised and + o If an amount value is parenthesised, it will be de-parenthesised and sign-flipped. - o If an amount value begins with a double minus sign, those will cancel - out and be removed. + o If an amount value begins with a double minus sign, those cancel out + and are removed. - If the currency/commodity symbol is provided as a separate CSV field, - assign it to the currency pseudo field; the symbol will be prepended to - the amount (TODO: when there is an amount). Or, you can use an amount - field assignment for more control, eg: + If the currency/commodity symbol is provided as a separate CSV field, + you can assign it to currency (affects all posting amounts) or curren- + cyN (affects just posting N's amount). The symbol will be prepended to + the amount. Or for more control, you can set both currency symbol and + amount with a field assignment, eg: fields date,description,currency,amount + # add currency symbol on the right: amount %amount %currency - CSV balance assertions/assignments - If the CSV includes a running balance, you can assign that to one of - the pseudo fields balance (or balance1) or balance2. This will gener- - ate a balance assertion (or if the amount is left empty, a balance as- - signment), on the first or second posting, whenever the running balance - field is non-empty. (TODO: #1000) + Referencing other fields + In field assignments, you can interpolate only CSV fields, not hledger + fields. 
In the example below, there's both a CSV field and a hledger + field named amount1, but %amount1 always means the CSV field, not the + hledger field: - Reading multiple CSV files - You can read multiple CSV files at once using multiple -f arguments on - the command line, and hledger will look for a correspondingly-named - rules file for each. Note if you use the --rules-file option, this one - rules file will be used for all the CSV files being read. + # Name the third CSV field "amount1" + fields date,description,amount1 - Valid CSV - hledger follows RFC 4180, with the addition of a customisable separator - character. + # Set hledger's amount1 to the CSV amount1 field followed by USD + amount1 %amount1 USD - Some things to note: + # Set comment to the CSV amount1 (not the amount1 assigned above) + comment %amount1 - When quoting fields, + Here, since there's no CSV amount1 field, %amount1 will produce a lit- + eral "amount1": - o you must use double quotes, not single quotes + fields date,description,csvamount + amount1 %csvamount USD + # Can't interpolate amount1 here + comment %amount1 - o spaces outside the quotes are not allowed. + When there are multiple field assignments to the same hledger field, + only the last one takes effect. Here, comment's value will be B, or + C if "something" is matched, but never A: + + comment A + comment B + if something + comment C + + How CSV rules are evaluated + Here's how to think of CSV rules being evaluated (if you really need + to). First, + + o include - all includes are inlined, from top to bottom, depth first. + (At each include point the file is inlined and scanned for further + includes, before proceeding.) + + Then "global" rules are evaluated, top to bottom.
If a rule is re- + peated, the last one wins: + + o skip (at top level) + + o date-format + + o newest-first + + o fields - names the CSV fields, optionally sets up initial assignments + to hledger fields + + Then for each CSV record in turn: + + o test all if blocks. If any of them contain an end rule, skip all re- + maining CSV records. Otherwise if any of them contain a skip rule, + skip that many CSV records. If there are multiple matched skip + rules, the first one wins. + + o collect all field assignments at top level and in matched if blocks. + When there are multiple assignments for a field, keep only the last + one. + + o compute a value for each hledger field - either the one that was as- + signed to it (and interpolate the %CSVFIELDNAME references), or a de- + fault + + o generate a synthetic hledger transaction from these values, which be- + comes part of the input to the hledger command that has been selected + + Valid transactions + hledger currently does not post-process and validate transactions gen- + erated from CSV as thoroughly as transactions read from a journal file. + This means that if your rules are wrong, you can generate invalid + transactions. Or, amounts may not be displayed with a canonical dis- + play style. + + So when setting up or adjusting CSV rules, you should check your re- + sults visually with the print command. You can pipe print's output + through hledger once more to validate and canonicalise fully. Eg: + + $ hledger -f some.csv print | hledger -f- print -I + + (The -I/--ignore-assertions flag disables balance assertion checks, + usually needed when re-parsing print output.)