docs: CSV rules version 2 syntax

2013-03-29 23:08:33 +00:00 · 2013-03-29 23:08:33 +00:00 · 9c6ee3ae70
commit 9c6ee3ae70
parent 57eebd9ae5
1 changed file with 94 additions and 132 deletions
--- a/MANUAL.md
+++ b/MANUAL.md
@@ -492,161 +492,123 @@ to each account name.
 ### CSV files
-Since version 0.18, hledger can also read
+hledger can also read
-[CSV](http://en.wikipedia.org/wiki/Comma-separated_values) files natively
+[CSV](http://en.wikipedia.org/wiki/Comma-separated_values) files,
-(previous versions provided a special `convert` command.)
+translating the CSV records into journal entries on the fly. In this
 case, we must provide an additional "rules file", which is a file
 named like the CSV file with an extra `.rules` suffix, containing
 rules specifying things like:
-An arbitrary CSV file does not provide enough information to be parsed as
+- which CSV fields correspond to which journal entry fields
-a journal. So when reading CSV, hledger looks for an additional
+- which date format is being used
-[rules file](#the-rules-file), which identifies the CSV fields and assigns
+- which account name(s) to use
 accounts. For reading `FILE.csv`, hledger uses `FILE.csv.rules` in the same
 directory, auto-creating it if needed. You should configure the rules file
 to get the best data from your CSV file. You can specify a different rules
 file with `--rules-file` (useful when reading from standard input).
-An example - sample.csv:
+Typically you'll keep one rules file for each account which you
 download as CSV. A default rules file will be created if it doesn't
 exist, in which case you'll need to refine it to get the best results.
 You can override the default rules file name with `--rules-file`.
 Here's a quick example.  Say we have downloaded `checking.csv` from a
 bank for the first time:
    sample.csv:
    "Date","Note","Amount"
-    "2012/3/22","TRANSFER TO SAVINGS","-10.00"
+    "2012/3/22","DEPOSIT","50.00"
-    "2012/3/23","SOMETHING ELSE","5.50"
+    "2012/3/23","TRANSFER TO SAVINGS","-10.00"
-sample.rules:
+We could create `checking.csv.rules` containing:
-    skip-lines 1
+    account1 assets:bank:checking
-    date-field 0
+    skip     1
-    description-field 1
+    fields   date, description, amount
    amount-field 2
    currency $
    base-account assets:bank:checking
-    SAVINGS
+    if ~ SAVINGS
-    assets:bank:savings
+     account2 assets:bank:savings
-the resulting journal:
+This says:
 "always use assets:bank:checking as the first account;
 ignore the first line;
 use the first, second and third CSV fields as the entry date, description and amount respectively;
 always prepend $ to the amount value;
 if the CSV record contains 'SAVINGS', use assets:bank:savings as the second account".
 Now hledger can read this CSV file:
-    $ hledger -f sample.csv print
+    $ hledger -f checking.csv print
-    using conversion rules file sample.rules
+    using conversion rules file checking.csv.rules
-    2012/03/22 TRANSFER TO SAVINGS
+    2012/03/22 DEPOSIT
        income:unknown             $-50.00
        assets:bank:checking        $50.00
    2012/03/23 TRANSFER TO SAVINGS
        assets:bank:savings         $10.00
        assets:bank:checking       $-10.00
-    2012/03/23 SOMETHING ELSE
+We might save this output as `checking.journal`, and/or merge it (manually) into the main journal file.
        income:unknown              $-5.50
        assets:bank:checking         $5.50
-### The rules file
+#### Rules syntax
-A rules file consists of the following optional directives, followed by
+The rules file is simple. Lines beginning with `#` or `;` and blank lines are ignored.
-account-assigning rules.  (Tip: rules file parse errors are not the
+The only requirement is that we specify how to fill journal entries' date and amount fields (at least),
-greatest, so check your rules file format if you're getting unexpected
+using a *field list*, or individual *field assignments*, or both:
 results.)
-`account-field`
+> **fields** *CSVFIELDNAME1*, *CSVFIELDNAME2*, ...
-
+> :   This (a field list) names the CSV fields (names may not contain whitespace or `;` or `#`),
-> If the CSV file contains data corresponding to several accounts (for
+>     and also assigns them to journal entry fields when you use any of these names:
 > example - bulk export from other accounting software), the specified
 > field's value, if non-empty, will override the value of `base-account`.
 `account2-field`
 > If the CSV file contains fields for both accounts in the transaction,
 > you can use this in addition to `account-field`.  If `account2-field` is
 > unspecified, the [account-assigning rules](#account-assigning-rules) are
 > used.
 `amount-field`
 > This directive specifies the CSV field containing the transaction
 > amount.  The field may contain a simple number or an hledger-style
 > [amount](#amounts), perhaps with a [price](#prices). See also
 > `amount-in-field`, `amount-out-field`, `currency-field` and
 > `base-currency`.
 `amount-in-field`
 `amount-out-field`
 > If the CSV file uses two different columns for in and out movements, use
 > these directives instead of `amount-field`.  Note these expect each
 > record to have a positive number in one of these fields and nothing in
 > the other.
 `base-account`
 > A default account to use in all transactions. May be overridden by
 > `account1-field` and `account2-field`.
 `base-currency`
 > A default currency symbol which will be prepended to all amounts.
 > See also `currency-field`.
 `code-field`
 > Which field contains the transaction code or check number (`(NNN)`).
 `currency-field`
 > The currency symbol in this field will be prepended to all amounts. This
 > overrides `base-currency`.
 `date-field`
 > Which field contains the transaction date. A number of common
 > four-digit-year date formats are understood by default; other formats
 > will require a `date-format` directive.
 `date-format`
 > This directive specifies one additional format to try when parsing the
 > date field, using the syntax of Haskell's
 > [formatTime](http://hackage.haskell.org/packages/archive/time/latest/doc/html/Data-Time-Format.html#v:formatTime).
 > Eg, if the CSV dates are non-padded D/M/YY, use:
 >
->     date-format %-d/%-m/%y
+> :   `date`
 > :   `date2`
 > :   `status`
 > :   `code`
 > :   `description`
 > :   `comment`
 > :   `account1`
 > :   `account2`
 > :   `currency`
 > :   `amount`
 > :   `amount-in`
 > :   `amount-out`
 > :   
 >
-> Note custom date formats work best when hledger is built with version
+> <!--  -->
 > 1.2.0.5 or greater of the [time](http://hackage.haskell.org/package/time) library.
 `description-field`
 > Which field contains the transaction's description. This can be a simple
 > field number, or a custom format combining multiple fields, eg:
 >
->     description-field %(1) - %(3)
+> *JOURNALFIELDNAME* *FIELDVALUE*
 > :   This (a field assignment) assigns the given text value,
 >     which can have CSV field values interpolated via `%name` or `%1`,
 >     to a journal entry field (one of the field names above).
 >     Field assignments may be used in addition to or instead of a field list.
 >
 > :   &nbsp;
-`date2-field`
+We can also have conditional field assignments which apply only to certain CSV records:
-> Which field contains the transaction's [secondary date](#primary-secondary-dates).
+> **if** *PATTERNS*<br>&nbsp;&nbsp;*FIELDASSIGNMENTS*
 > :   PATTERNS is one or more regular expressions on the same or following lines.
 >     <!-- then an optional `~` (indicating case-insensitive infix regular expression matching),\ -->
 >     These are followed by one or more indented field assignment lines.\
 >     In this example, any CSV record containing "groc" (case insensitive, anywhere within the whole record)
 >     will have its account2 and comment set as shown:
 > 
 >         if groc
 >          account2 expenses:groceries
 >          comment  household stuff
-`status-field`
+And we may sometimes need these as well:
-> Which field contains the transaction cleared status (`*`).
+> **skip** [*N*]
-
+> :   Skip this number of CSV lines (1 by default).
-`skip-lines`
+>     Use this to skip the initial CSV header line(s).
-
+>     <!-- hledger tries to skip initial CSV header lines automatically. -->
-> How many lines to skip in the beginning of the file, e.g. to skip a
+>     <!-- If it guesses wrong, use this directive to skip exactly N lines. -->
-> line of column headings.
+>     <!-- This can also be used in a conditional block to ignore certain CSV records. -->
-
+>
-Account-assigning rules select an account to transfer to based on the
+> **date-format** *DATEFMT*
-description field (unless `account2-field` is used.) Each
+> :   This is required if the values for `date` or `date2` fields are not in YYYY/MM/DD format (or close to it).
-account-assigning rule is a paragraph consisting of one or more
+>     DATEFMT specifies a strptime-style date parsing pattern containing [year/month/day format codes](http://hackage.haskell.org/packages/archive/time/latest/doc/html/Data-Time-Format.html#v:formatTime).
-case-insensitive regular expressions), one per line, followed by the
+>     Some common values:
-account name to use when the transaction's description matches any of
+>
-these patterns. Eg:
+>         %-d/%-m/%Y
-
+>         %-m/%-d/%Y
-    WHOLE FOODS
+>         %Y-%h-%d
    SUPERMARKET
    expenses:food:groceries
 If you want to clean up messy bank data, you can add `=` and a replacement
 pattern, which rewrites the matched part of the description. (To rewrite
 the entire description, use `.*PAT.*=REPL`). You can also refer to matched
 groups in the usual way with `\0` etc. Eg:
    BLKBSTR=BLOCKBUSTER
    expenses:entertainment
 ### Timelog files
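
To make the semantics of the new `checking.csv.rules` example concrete, here is a rough Python sketch of how those five rules could be applied to the sample CSV. It is only an illustration, not hledger's implementation: the `RULES` dictionary, the `convert` function, and the choice of `expenses:unknown` as the fallback for negative amounts are assumptions made for this sketch. The `income:unknown` fallback for positive amounts and the case-insensitive, whole-record matching follow the manual text in the diff.

```python
import csv
import io
import re
from datetime import datetime
from decimal import Decimal

SAMPLE_CSV = """\
"Date","Note","Amount"
"2012/3/22","DEPOSIT","50.00"
"2012/3/23","TRANSFER TO SAVINGS","-10.00"
"""

# Hypothetical in-memory form of the example rules file (not hledger's data model).
RULES = {
    "account1": "assets:bank:checking",
    "skip": 1,                                   # skip 1
    "fields": ["date", "description", "amount"], # fields date, description, amount
    "currency": "$",                             # currency $
    "date_format": "%Y/%m/%d",                   # assumed; the CSV dates are close to YYYY/MM/DD
    "conditionals": [("SAVINGS", {"account2": "assets:bank:savings"})],
}


def convert(csv_text, rules):
    """Turn CSV records into simple entry dicts according to the rules."""
    entries = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    for row in rows[rules["skip"]:]:
        # The field list names the CSV columns and assigns them to entry fields.
        entry = dict(zip(rules["fields"], row))
        entry["account1"] = rules["account1"]

        # Conditional blocks match case-insensitively anywhere in the whole record.
        record_text = ",".join(row)
        for pattern, assignments in rules["conditionals"]:
            if re.search(pattern, record_text, re.IGNORECASE):
                entry.update(assignments)

        amount = Decimal(entry["amount"])
        if "account2" not in entry:
            # Assumed fallback: income for inflows, expenses for outflows.
            entry["account2"] = "income:unknown" if amount >= 0 else "expenses:unknown"

        date = datetime.strptime(entry["date"], rules["date_format"])
        entries.append({
            "date": date.strftime("%Y/%m/%d"),
            "description": entry["description"],
            "postings": [
                (entry["account2"], f'{rules["currency"]}{-amount}'),
                (entry["account1"], f'{rules["currency"]}{amount}'),
            ],
        })
    return entries


if __name__ == "__main__":
    for e in convert(SAMPLE_CSV, RULES):
        print(e["date"], e["description"])
        for account, amount in e["postings"]:
            print(f"    {account:<26} {amount:>8}")
        print()
```

The sketch keeps the rules as plain data so that each directive (`skip`, `fields`, `currency`, `if`) maps onto one step in the loop; a real rules parser would also need to handle field assignments with `%name` interpolation, which is left out here.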