;doc: csv: fix wrong if tables doc; rewrite several sections (#1977)

This commit is contained in:
Simon Michael 2023-01-11 13:19:56 -10:00
parent fc8fe8ee46
commit 545fd2d083

@ -3101,23 +3101,21 @@ $ hledger -f paypal-custom.csv print
The following kinds of rule can appear in the rules file, in any order. The following kinds of rule can appear in the rules file, in any order.
Blank lines and lines beginning with `#` or `;` or `*` are ignored. Blank lines and lines beginning with `#` or `;` or `*` are ignored.
| | | | | |
|-------------------------------------------------|-----------------------------------------------------------------------------------------| |-------------------------------------------------|------------------------------------------------------------------------------------------------|
| [**`separator`**](#separator) | a custom field separator | | [**`separator`**](#separator) | declare the field separator, instead of relying on file extension |
| [**`skip`**](#skip) | skip one or more header lines or matched CSV records | | [**`skip`**](#skip) | skip one or more header lines at start of file |
| [**`date-format`**](#date-format) | how to parse dates in CSV records | | [**`date-format`**](#date-format) | declare how to parse dates in CSV records |
| [**`timezone`**](#timezone) | declare the time zone of ambiguous CSV date-times | | [**`timezone`**](#timezone) | declare the time zone of ambiguous CSV date-times |
| [**`decimal-mark`**](#decimal-mark-1) | the decimal mark used in CSV amounts, if ambiguous | | [**`decimal-mark`**](#decimal-mark-1) | declare the decimal mark used in CSV amounts, if ambiguous |
| [**`newest-first`**](#newest-first) | improve txn order when there are multiple records, newest first, all with the same date | | [**`newest-first`**](#newest-first) | improve txn order when: there are multiple records, newest first, all with the same date |
| [**`intra-day-reversed`**](#intra-day-reversed) | improve txn order when each day's txns are reverse of the overall date order | | [**`intra-day-reversed`**](#intra-day-reversed) | improve txn order when: same-day txns are in opposite order to the overall file |
| [**`balance-type`**](#balance-type) | choose which type of balance assignments to use | | [**`balance-type`**](#balance-type) | select which type of balance assignments to use |
| [**`fields`**](#fields) | name CSV fields, assign them to hledger fields | | [**`fields`**](#fields) | name CSV fields for easy reference, and optionally assign their values to hledger fields |
| [**Field assignment**](#field-assignment) | assign a value to one hledger field, with interpolation | | [**Field assignment**](#field-assignment) | assign a CSV value or interpolated text value to a hledger field, constructing the txn |
| [**Field names**](#field-names) | hledger field names, used in the fields list and field assignments | | [**`if` block**](#if-block) | conditionally assign values to hledger fields, or `skip` a record or `end` (skip rest of file) |
| [**`if`**](#if) | apply some rules to CSV records matched by patterns | | [**`if` table**](#if-table) | conditionally assign values to hledger fields, using compact syntax |
| [**`if` table**](#if-table) | apply some rules to CSV records matched by patterns, alternate syntax | | [**`include`**](#include) | inline another CSV rules file |
| [**`end`**](#end) | skip the remaining CSV records |
| [**`include`**](#include) | inline another CSV rules file |
## `separator` ## `separator`
@ -3277,16 +3275,16 @@ intra-day-reversed
```rules ```rules
fields FIELDNAME1, FIELDNAME2, ... fields FIELDNAME1, FIELDNAME2, ...
``` ```
A fields list (the word "fields" followed by comma-separated field A fields list (the word "fields" followed by comma-separated field names) is optional, but convenient.
names) is the quick way to assign CSV field values to [hledger fields](#field-names). It does two things:
(The other way is [field assignments](#field-assignment), see below.)
A fields list does two things:
1. It names the CSV fields. 1. It names the CSV field in each column.
This is optional, but can be convenient later for interpolating them. This can be convenient if you are referencing them in other rules,
so you can say `%SomeField` instead of remembering `%13`.
2. Whenever you use a standard hledger field name (defined below), 2. Whenever you use one of the special [hledger field names](#field-names) (described below),
the CSV value is assigned to that part of the hledger transaction. it assigns the CSV value in this position to that hledger field.
This is the quickest way to populate hledger's fields and build a transaction.
Here's an example that says Here's an example that says
"use the 1st, 2nd and 4th fields as the transaction's date, description and amount; "use the 1st, 2nd and 4th fields as the transaction's date, description and amount;
@ -3295,20 +3293,24 @@ name the last two fields for later reference; and ignore the others":
fields date, description, , amount, , , somefield, anotherfield fields date, description, , amount, , , somefield, anotherfield
``` ```
Tips: In a fields list, the separator is always comma; it is unrelated to the CSV file's separator.
Also:
- The fields list always uses commas, even if your CSV data uses [another separator character](#separator). - There must be at least two items in the list (at least one comma).
- Currently there must be at least two items in the list (at least one comma).
- Field names may not contain spaces. Spaces before/after field names are optional. - Field names may not contain spaces. Spaces before/after field names are optional.
- Field names may contain `_` (underscore) or `-` (hyphen). - Field names may contain `_` (underscore) or `-` (hyphen).
- If the CSV contains column headings, it's a good idea to use these, suitably modified, as the basis for your field names (eg lower-cased, with underscores instead of spaces). - Fields you don't care about can be given a dummy name (eg just `_`) or an empty name.
- If some heading names match standard hledger fields, but you don't want to set the hledger fields directly, alter those names, eg by appending an underscore.
- Fields you don't care about can be given a dummy name (eg: `_` ), or no name. If the CSV contains column headings, it's convenient to use these for your field names,
suitably modified (eg lower-cased with spaces replaced by underscores).
Sometimes you may want to alter a CSV field name to avoid assigning to a hledger field with the same name.
Eg you could call the CSV's "balance" field `balance_` to avoid directly setting hledger's `balance` field (and generating a balance assertion).
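
As a minimal sketch of the naming advice above, assume a hypothetical CSV whose columns are Date, Payee, Amount, Balance and Reference. The names `balance_` and `ref` are illustrative CSV field names, not hledger fields:

```rules
# hypothetical column layout: Date, Payee, Amount, Balance, Reference
# "balance_" has a trailing underscore, so it does not set hledger's balance field
fields date, description, amount, balance_, ref

# "ref" is only a CSV field name; interpolate it into the transaction comment
comment ref: %ref
```

Here `date`, `description` and `amount` are hledger field names, so those columns populate the transaction directly, while the other two columns are only named for later reference.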
## Field assignment ## Field assignment
```rules ```rules
HLEDGERFIELDNAME FIELDVALUE HLEDGERFIELD FIELDVALUE
``` ```
Field assignments are the more flexible way to assign CSV values to hledger fields. Field assignments are the more flexible way to assign CSV values to hledger fields.
@ -3319,7 +3321,7 @@ To assign a value to a hledger field, write the [field name](#field-names)
a space, followed by a text value on the same line. a space, followed by a text value on the same line.
This text value may interpolate CSV fields, This text value may interpolate CSV fields,
referenced by their 1-based position in the CSV record (`%N`), referenced by their 1-based position in the CSV record (`%N`),
or by the name they were given in the fields list (`%CSVFIELDNAME`). or by the name they were given in the fields list (`%CSVFIELD`).
Some examples: Some examples:
@ -3342,9 +3344,20 @@ becomes `1` when interpolated)
## Field names ## Field names
Here are the standard hledger field (and pseudo-field) names, which Note the two kinds of "field names" used in hledger CSV rules:
you can use in a [fields](#fields) list or in [field assignments](#field-assignment).
For more about the transaction parts they refer to, see [Transactions](#transactions). 1. **CSV field names** (`CSVFIELD` in examples):\
These are arbitrary names you have given to columns in the CSV data,
using a `fields` list, so that you can reference that CSV field
more conveniently by name (`%SomeField`) rather than column number (`%13`).
2. **hledger field names** (`HLEDGERFIELD` in examples):\
These are special reserved names corresponding (directly or indirectly) to
parts of a hledger transaction (described here and in Journal > [Transactions](#transactions)).
You assign values to these to construct a transaction (journal entry),
using field assignment rules or by writing them in a `fields` list.
Here are the hledger field names available in CSV rules, and their effects:
### date field ### date field
@ -3391,33 +3404,28 @@ a default account name will be chosen (like "expenses:unknown" or "income:unknow
### amount field ### amount field
`amountN` sets the amount of the Nth posting, There are a number of "amount" field name variants,
and causes that posting to be generated. to handle different situations when detecting and setting amounts:
`amountN` sets the amount of the Nth posting, and causes that posting to be generated.
By assigning to `amount1`, `amount2`, ... etc. you can generate up to 99 postings. By assigning to `amount1`, `amount2`, ... etc. you can generate up to 99 postings.
`amountN-in` and `amountN-out` can be used instead, `amountN-in` and `amountN-out` should be used instead when the CSV has debits and credits (inflows and outflows) in separate fields.
if the CSV uses separate fields for debits and credits (inflows and outflows).
hledger assumes both of these CSV fields are unsigned, and will automatically negate the "-out" value. hledger assumes both of these CSV fields are unsigned, and will automatically negate the "-out" value.
It also requires that at least one of them is either empty or zero. It also requires that at least one of them is either empty or zero.
See ["Setting amounts"](#setting-amounts) below for more on this topic. See ["Setting amounts"](#setting-amounts) below for more on this topic.
Note: it might sound as if amount-in is for one posting in a transaction and amount-out for the other posting, but not so;
the -in and -out rules work together to produce the amount for a single posting, from two CSV fields.
`amount`, or `amount-in` and `amount-out` are a legacy mode, `amount` (or `amount-in` and `amount-out`), with no number, are a legacy syntax
to keep pre-hledger-1.17 CSV rules files working (and for occasional convenience). kept for backwards compatibility and occasional convenience.
They are suitable only for two-posting transactions; They are suitable for two-posting transactions and behave as follows:
they set both posting 1's and posting 2's amount. they set both posting 1's and posting 2's amount,
Posting 2's amount will be negated, and also converted to cost with posting 2's amount negated and also converted to cost if there's a [cost price](#costs).
if there's a [cost price](#costs). Also, they will be overridden by the newer syntax if it is present;
eg if `amount1` is assigned, that overrides `amount` for posting 1;
Note: it might sound as if amount-in is for one posting and amount-out for the other posting, but no; `amount2-in` would override `amount-in` for posting 2, and so on.
use the -in and -out rules together for the same posting, producing one amount from two CSV fields. (This allows incrementally adding the newer numbered syntax in old rules files.)
If you have an existing rules file using the unnumbered form, you
might want to use the numbered form in certain conditional blocks,
without having to update and retest all the old rules.
To facilitate this,
posting 1 ignores `amount`/`amount-in`/`amount-out` if any of `amount1`/`amount1-in`/`amount1-out` are assigned,
and posting 2 ignores them if any of `amount2`/`amount2-in`/`amount2-out` are assigned,
avoiding conflicts.
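
To illustrate the `-in`/`-out` pairing described above, here is a small sketch for a hypothetical CSV with separate unsigned "money in" and "money out" columns. The CSV field names `money_in` and `money_out` and the account name are assumptions chosen for this example:

```rules
# hypothetical column layout: date, description, money in, money out
fields date, description, money_in, money_out

# both assignments feed the amount of posting 1;
# hledger negates the -out value, and expects one of the two to be empty or zero
amount1-in   %money_in
amount1-out  %money_out
account1     assets:bank:checking
```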
### currency field ### currency field
@ -3441,84 +3449,53 @@ You can adjust the type of assertion/assignment with the
See [Tips](#tips) below for more about setting amounts and currency. See [Tips](#tips) below for more about setting amounts and currency.
## `if` ## `if` block
Rules can be applied conditionally, depending on patterns in the CSV data.
This allows flexibility; in particular, it is how you can categorise transactions,
selecting an appropriate account name based on their description (for example).
There are two ways to write conditional rules: "if blocks", described here,
and "if tables", described below.
An if block is the word `if`
and one or more "matcher" expressions (can be a word or phrase),
one per line, starting either on the same or next line;
followed by one or more indented rules.
Eg,
```rules ```rules
if MATCHER if MATCHER
RULE RULE
if
MATCHER
MATCHER
MATCHER
RULE
RULE
``` ```
Conditional blocks ("if blocks") are a block of rules that are applied or
only to CSV records which match certain patterns. They are often used
for customising account names based on transaction descriptions.
### Matching the whole record
Each MATCHER can be a record matcher, which looks like this:
```rules
REGEX
```
REGEX is a case-insensitive [regular expression] that tries to match anywhere within the CSV record.
It is a POSIX ERE (extended regular expression)
that also supports GNU word boundaries (`\b`, `\B`, `\<`, `\>`),
and nothing else.
If you have trouble, be sure to check our doc: <https://hledger.org/hledger.html#regular-expressions>
Important note: the record that is matched is not the original record, but a synthetic one,
with any enclosing double quotes (but not enclosing whitespace) removed, and always comma-separated
(which means that a field containing a comma will appear like two fields).
Eg, if the original record is `2020-01-01; "Acme, Inc."; 1,000`,
the REGEX will actually see `2020-01-01,Acme, Inc., 1,000`).
### Matching individual fields
Or, MATCHER can be a field matcher, like this:
```rules
%CSVFIELD REGEX
```
which matches just the content of a particular CSV field.
CSVFIELD is a percent sign followed by the field's name or column number, like `%date` or `%1`.
### Combining matchers
A single matcher can be written on the same line as the "if";
or multiple matchers can be written on the following lines, non-indented.
Multiple matchers are OR'd (any one of them can match), unless one begins with
an `&` symbol, in which case it is AND'ed with the previous matcher.
```rules ```rules
if if
MATCHER MATCHER
& MATCHER MATCHER
MATCHER
RULE
RULE RULE
``` ```
### Rules applied on successful match If any of the matchers succeeds, all of the indented rules will be applied.
They are usually [field assignments](#field-assignment),
but the following special rules may also be used within an if block:
After the patterns there should be one or more rules to apply, all - `skip` - skips the matched CSV record (generating no transaction from it)
indented by at least one space. Three kinds of rule are allowed in - `end` - skips the rest of the current CSV file.
conditional blocks:
- [field assignments](#field-assignment) (to set a hledger field) Some examples:
- [skip](#skip) (to skip the matched CSV record)
- [end](#end) (to skip all remaining CSV records).
Examples:
```rules ```rules
# if the CSV record contains "groceries", set account2 to "expenses:groceries" # if the record contains "groceries", set account2 to "expenses:groceries"
if groceries if groceries
account2 expenses:groceries account2 expenses:groceries
``` ```
```rules ```rules
# if the CSV record contains any of these patterns, set account2 and comment as shown # if the record contains any of these phrases, set account2 and a transaction comment as shown
if if
monthly service fee monthly service fee
atm transaction fee atm transaction fee
@ -3527,51 +3504,89 @@ banking thru software
comment XXX deductible ? check it comment XXX deductible ? check it
``` ```
```rules
# if an empty record is seen (assuming five fields), ignore the rest of the CSV file
if ,,,,
end
```
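
A `skip` rule can be used the same way. This sketch assumes a hypothetical CSV whose second column has been named `description`, and that unwanted records contain the word "pending" there:

```rules
# hypothetical column layout: date, description, amount
fields date, description, amount

# generate no transaction for records whose description contains "pending"
if %description pending
 skip
```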
## Matchers
There are two kinds:
1. A record matcher is a word or single-line text fragment or regular expression (`REGEX`),
which hledger will try to match case-insensitively anywhere within the CSV record.\
Eg: `whole foods`
2. A field matcher is preceded with a percent sign and [CSV field name](#field-names) (`%CSVFIELD REGEX`).
hledger will try to match these just within the named CSV field.\
Eg: `%date 2023`
The regular expression is (as usual in hledger) a POSIX extended regular expression,
that also supports GNU word boundaries (`\b`, `\B`, `\<`, `\>`),
and nothing else.
If you have trouble, see "Regular expressions" in the hledger manual (<https://hledger.org/hledger.html#regular-expressions>).
With record matchers, it's important to know that the record matched is not the original CSV record, but a modified one:
separators will be converted to commas, and enclosing double quotes (but not enclosing whitespace) are removed.
So for example, when reading an SSV file, if the original record was:
```ssv
2020-01-01; "Acme, Inc."; 1,000
```
the regex would see, and try to match, this modified record text:
```
2020-01-01,Acme, Inc., 1,000
```
When an if block has multiple matchers, they are combined as follows:
- By default they are OR'd (any one of them can match)
- When a matcher is preceded by ampersand (`&`) it will be AND'ed with the previous matcher (both of them must match).
There's not yet an easy syntax to negate a matcher.
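
As a sketch of combining matchers with `&`, the following reuses the two example matchers shown above. The fields list and the account name are assumptions for illustration:

```rules
# hypothetical column layout: date, description, amount
fields date, description, amount

# both matchers must succeed: the description mentions "whole foods"
# AND the date field contains "2023"
if
%description whole foods
& %date 2023
 account2 expenses:food
```

Without the `&`, the two matchers would be OR'd, and the rule would also apply to any record dated 2023.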
## `if` table ## `if` table
"if tables" are an alternative to [if blocks](#if-block);
they can express many matchers and field assignments in a more compact tabular format, like this:
```rules ```rules
if,CSVFIELDNAME1,CSVFIELDNAME2,...,CSVFIELDNAMEn if,HLEDGERFIELD1,HLEDGERFIELD2,...
MATCHER1,VALUE11,VALUE12,...,VALUE1n MATCHERA,VALUE1,VALUE2,...
MATCHER2,VALUE21,VALUE22,...,VALUE2n MATCHERB,VALUE1,VALUE2,...
MATCHER3,VALUE31,VALUE32,...,VALUE3n MATCHERC,VALUE1,VALUE2,...
<empty line> <empty line>
``` ```
Conditional tables ("if tables") are a different syntax to specify The first character after `if` is taken to be the separator for the rest of the table.
field assignments that will be applied only to CSV records which match certain patterns. It should be a non-alphanumeric character like `,` or `|` that does not appear anywhere else in the table.
(Note: it is unrelated to the CSV file's separator.)
Whitespace can be used in the matcher lines for readability, but not in the if line currently.
The table must be terminated by an empty line (or end of file).
Each line must contain the same number of separators; empty values are allowed.
MATCHER could be either field or record matcher, as described above. When MATCHER matches, The above means: try all of the matchers; whenever a matcher succeeds,
values from that row would be assigned to the CSV fields named on the `if` line, in the same order. assign all of the values on that line to the corresponding hledger fields;
later lines can override earlier ones.
It is equivalent to this sequence of if blocks:
Therefore `if` table is exactly equivalent to a sequence of `if` blocks:
```rules ```rules
if MATCHER1 if MATCHERA
CSVFIELDNAME1 VALUE11 HLEDGERFIELD1 VALUE1
CSVFIELDNAME2 VALUE12 HLEDGERFIELD2 VALUE2
... ...
CSVFIELDNAMEn VALUE1n
if MATCHER2 if MATCHERB
CSVFIELDNAME1 VALUE21 HLEDGERFIELD1 VALUE1
CSVFIELDNAME2 VALUE22 HLEDGERFIELD2 VALUE2
... ...
CSVFIELDNAMEn VALUE2n
if MATCHER3 if MATCHERC
CSVFIELDNAME1 VALUE31 HLEDGERFIELD1 VALUE1
CSVFIELDNAME2 VALUE32 HLEDGERFIELD2 VALUE2
... ...
CSVFIELDNAMEn VALUE3n
``` ```
Each line starting with MATCHER should contain enough (possibly empty) values for all the listed fields.
Rules would be checked and applied in the order they are listed in the table and, like with `if` blocks, later rules (in the same or another table) or `if` blocks could override the effect of any rule.
Instead of ',' you can use a variety of other non-alphanumeric characters as a separator. First character after `if` is taken to be the separator for the rest of the table. It is the responsibility of the user to ensure that separator does not occur inside MATCHERs and values - there is no way to escape separator.
Example: Example:
```rules ```rules
if,account2,comment if,account2,comment
@ -3580,18 +3595,6 @@ atm transaction fee,expenses:business:banking,deductible? check it
2020/01/12.*Plumbing LLC,expenses:house:upkeep,emergency plumbing call-out 2020/01/12.*Plumbing LLC,expenses:house:upkeep,emergency plumbing call-out
``` ```
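
The separator can be something other than a comma. As a hedged sketch, the table below uses `|`, mixes a record matcher with a field matcher, and leaves one value empty. The `description` field name and the account names are assumptions for illustration:

```rules
# hypothetical column layout: date, description, amount
fields date, description, amount

if|account2|comment
whole foods|expenses:food|
%description atm transaction fee|expenses:business:banking|deductible? check it
```

Every row has the same number of `|` separators as the `if` line, and the table must still be followed by an empty line or the end of the file. Since `|` is the separator here, it cannot also be used for alternation inside these matchers.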
## `end`
This rule can be used inside [if blocks](#if) (only), to make hledger stop
reading this CSV file and move on to the next input file, or to command execution.
Eg:
```rules
# ignore everything following the first empty record
if ,,,,
end
```
## `include` ## `include`
```rules ```rules
@ -3942,7 +3945,7 @@ Then for each CSV record in turn:
- collect all field assignments at top level and in matched `if` blocks. - collect all field assignments at top level and in matched `if` blocks.
When there are multiple assignments for a field, keep only the last one. When there are multiple assignments for a field, keep only the last one.
- compute a value for each hledger field - either the one that was assigned to it - compute a value for each hledger field - either the one that was assigned to it
(and interpolate the %CSVFIELDNAME references), or a default (and interpolate the %CSVFIELD references), or a default
- generate a synthetic hledger transaction from these values. - generate a synthetic hledger transaction from these values.
This is all part of the CSV reader, one of several readers hledger can This is all part of the CSV reader, one of several readers hledger can