;doc: csv: fix wrong if tables doc; rewrite several sections (#1977)

This commit is contained in:
Simon Michael 2023-01-11 13:19:56 -10:00
parent fc8fe8ee46
commit 545fd2d083

View File

@ -3101,23 +3101,21 @@ $ hledger -f paypal-custom.csv print
The following kinds of rule can appear in the rules file, in any order.
Blank lines and lines beginning with `#` or `;` or `*` are ignored.
| | |
|-------------------------------------------------|-----------------------------------------------------------------------------------------|
| [**`separator`**](#separator) | a custom field separator |
| [**`skip`**](#skip) | skip one or more header lines or matched CSV records |
| [**`date-format`**](#date-format) | how to parse dates in CSV records |
| [**`timezone`**](#timezone) | declare the time zone of ambiguous CSV date-times |
| [**`decimal-mark`**](#decimal-mark-1) | the decimal mark used in CSV amounts, if ambiguous |
| [**`newest-first`**](#newest-first) | improve txn order when there are multiple records, newest first, all with the same date |
| [**`intra-day-reversed`**](#intra-day-reversed) | improve txn order when each day's txns are reverse of the overall date order |
| [**`balance-type`**](#balance-type) | choose which type of balance assignments to use |
| [**`fields`**](#fields) | name CSV fields, assign them to hledger fields |
| [**Field assignment**](#field-assignment) | assign a value to one hledger field, with interpolation |
| [**Field names**](#field-names) | hledger field names, used in the fields list and field assignments |
| [**`if`**](#if) | apply some rules to CSV records matched by patterns |
| [**`if` table**](#if-table) | apply some rules to CSV records matched by patterns, alternate syntax |
| [**`end`**](#end) | skip the remaining CSV records |
| [**`include`**](#include) | inline another CSV rules file |
| | |
|-------------------------------------------------|------------------------------------------------------------------------------------------------|
| [**`separator`**](#separator) | declare the field separator, instead of relying on file extension |
| [**`skip`**](#skip) | skip one or more header lines at start of file |
| [**`date-format`**](#date-format) | declare how to parse dates in CSV records |
| [**`timezone`**](#timezone) | declare the time zone of ambiguous CSV date-times |
| [**`decimal-mark`**](#decimal-mark-1) | declare the decimal mark used in CSV amounts, if ambiguous |
| [**`newest-first`**](#newest-first) | improve txn order when: there are multiple records, newest first, all with the same date |
| [**`intra-day-reversed`**](#intra-day-reversed) | improve txn order when: same-day txns are in opposite order to the overall file |
| [**`balance-type`**](#balance-type) | select which type of balance assignments to use |
| [**`fields`**](#fields) | name CSV fields for easy reference, and optionally assign their values to hledger fields |
| [**Field assignment**](#field-assignment) | assign a CSV value or interpolated text value to a hledger field, constructing the txn |
| [**`if` block**](#if-block) | conditionally assign values to hledger fields, or `skip` a record or `end` (skip rest of file) |
| [**`if` table**](#if-table) | conditionally assign values to hledger fields, using compact syntax |
| [**`include`**](#include) | inline another CSV rules file |
## `separator`
@ -3277,16 +3275,16 @@ intra-day-reversed
```rules
fields FIELDNAME1, FIELDNAME2, ...
```
A fields list (the word "fields" followed by comma-separated field
names) is the quick way to assign CSV field values to [hledger fields](#field-names).
(The other way is [field assignments](#field-assignment), see below.)
A fields list does does two things:
A fields list (the word "fields" followed by comma-separated field names) is optional, but convenient.
It does two things:
1. It names the CSV fields.
This is optional, but can be convenient later for interpolating them.
1. It names the CSV field in each column.
This can be convenient if you are referencing them in other rules,
so you can say `%SomeField` instead of remembering `%13`.
2. Whenever you use a standard hledger field name (defined below),
the CSV value is assigned to that part of the hledger transaction.
2. Whenever you use one of the special [hledger field names](#field names) (described below),
it assigns the CSV value in this position to that hledger field.
This is the quickest way to populate hledger's fields and build a transaction.
Here's an example that says
"use the 1st, 2nd and 4th fields as the transaction's date, description and amount;
@ -3295,20 +3293,24 @@ name the last two fields for later reference; and ignore the others":
fields date, description, , amount, , , somefield, anotherfield
```
Tips:
In a fields list, the separator is always comma; it is unrelated to the CSV file's separator.
Also:
- The fields list always use commas, even if your CSV data uses [another separator character](#separator).
- Currently there must be least two items in the list (at least one comma).
- There must be least two items in the list (at least one comma).
- Field names may not contain spaces. Spaces before/after field names are optional.
- Field names may contain `_` (underscore) or `-` (hyphen).
- If the CSV contains column headings, it's a good idea to use these, suitably modified, as the basis for your field names (eg lower-cased, with underscores instead of spaces).
- If some heading names match standard hledger fields, but you don't want to set the hledger fields directly, alter those names, eg by appending an underscore.
- Fields you don't care about can be given a dummy name (eg: `_` ), or no name.
- Fields you don't care about can be given a dummy name (eg just `_`) or an empty name.
If the CSV contains column headings, it's convenient to use these for your field names,
suitably modified (eg lower-cased with spaces replaced by underscores).
Sometimes you may want to alter a CSV field name to avoid assigning to a hledger field with the same name.
Eg you could call the CSV's "balance" field `balance_` to avoid directly setting hledger's `balance` field (and generating a balance assertion).
## Field assignment
```rules
HLEDGERFIELDNAME FIELDVALUE
HLEDGERFIELD FIELDVALUE
```
Field assignments are the more flexible way to assign CSV values to hledger fields.
@ -3319,7 +3321,7 @@ To assign a value to a hledger field, write the [field name](#field-names)
a space, followed by a text value on the same line.
This text value may interpolate CSV fields,
referenced by their 1-based position in the CSV record (`%N`),
or by the name they were given in the fields list (`%CSVFIELDNAME`).
or by the name they were given in the fields list (`%CSVFIELD`).
Some examples:
@ -3342,9 +3344,20 @@ becomes `1` when interpolated)
## Field names
Here are the standard hledger field (and pseudo-field) names, which
you can use in a [fields](#fields) list or in [field assignments](#field-assignment).
For more about the transaction parts they refer to, see [Transactions](#transactions).
Note the two kinds of "field names" used in hledger CSV rules:
1. **CSV field names** (`CSVFIELD` in examples):\
These are arbitrary names you have given to columns in the CSV data,
using a `fields` list, so that you can reference that CSV field
more conveniently by name (`%SomeField`) rather than column number (`%13`).
2. **hledger field names** (`HLEDGERFIELD` in examples):\
These are special reserved names corresponding (directly or indirectly) to
parts of a hledger transaction (described here and in Journal > [Transactions](#transactions)).
You assign values to these to construct a transaction (journal entry),
using field assignment rules or by writing them in a `fields` list.
Here are the hledger field names available in CSV rules, and their effects:
### date field
@ -3391,33 +3404,28 @@ a default account name will be chosen (like "expenses:unknown" or "income:unknow
### amount field
`amountN` sets the amount of the Nth posting,
and causes that posting to be generated.
There are a number of "amount" field name variants,
to handle different situations when detecting and setting amounts:
`amountN` sets the amount of the Nth posting, and causes that posting to be generated.
By assigning to `amount1`, `amount2`, ... etc. you can generate up to 99 postings.
`amountN-in` and `amountN-out` can be used instead,
if the CSV uses separate fields for debits and credits (inflows and outflows).
`amountN-in` and `amountN-out` should be used instead when the CSV has debits and credits (inflows and outflows) in separate fields.
hledger assumes both of these CSV fields are unsigned, and will automatically negate the "-out" value.
It also requires that at least one of them is either empty or zero.
See ["Setting amounts"](#setting-amounts) below for more on this topic.
Note: it might sound as if amount-in is for one posting in a transaction and amount-out for the other posting, but not so;
the -in and -out rules work together to produce the amount for a single posting, from two CSV fields.
`amount`, or `amount-in` and `amount-out` are a legacy mode,
to keep pre-hledger-1.17 CSV rules files working (and for occasional convenience).
They are suitable only for two-posting transactions;
they set both posting 1's and posting 2's amount.
Posting 2's amount will be negated, and also converted to cost
if there's a [cost price](#costs).
Note: it might sound as if amount-in is for one posting and amount-out for the other posting, but no;
use the -in and -out rules together for the same posting, producing one amount from two CSV fields.
If you have an existing rules file using the unnumbered form, you
might want to use the numbered form in certain conditional blocks,
without having to update and retest all the old rules.
To facilitate this,
posting 1 ignores `amount`/`amount-in`/`amount-out` if any of `amount1`/`amount1-in`/`amount1-out` are assigned,
and posting 2 ignores them if any of `amount2`/`amount2-in`/`amount2-out` are assigned,
avoiding conflicts.
`amount` (or `amount-in` and `amount-out`), with no number, are a legacy syntax
kept for backwards compatibility and occasional convenience.
They are suitable for two-posting transactions and behave as follows:
they set both posting 1's and posting 2's amount,
with posting 2's amount negated and also converted to cost if there's a [cost price](#costs).
Also, they will be overridden by the newer syntax if it is present;
eg if `amount1` is assigned, that overrides `amount` for posting 1;
`amount2-in` would override `amount-in` for posting 2, and so on.
(This allows incrementally adding the newer numbered syntax in old rules files.)
### currency field
@ -3441,84 +3449,53 @@ You can adjust the type of assertion/assignment with the
See [Tips](#tips) below for more about setting amounts and currency.
## `if`
## `if` block
Rules can be applied conditionally, depending on patterns in the CSV data.
This allows flexibility; in particular, it is how you can categorise transactions,
selecting an appropriate account name based on their description (for example).
There are two ways to write conditional rules: "if blocks", described here,
and "if tables", described below.
An if block is the word `if`
and one or more "matcher" expressions (can be a word or phrase),
one per line, starting either on the same or next line;
followed by one or more indented rules.
Eg,
```rules
if MATCHER
RULE
if
MATCHER
MATCHER
MATCHER
RULE
RULE
```
Conditional blocks ("if blocks") are a block of rules that are applied
only to CSV records which match certain patterns. They are often used
for customising account names based on transaction descriptions.
### Matching the whole record
Each MATCHER can be a record matcher, which looks like this:
```rules
REGEX
```
REGEX is a case-insensitive [regular expression] that tries to match anywhere within the CSV record.
It is a POSIX ERE (extended regular expression)
that also supports GNU word boundaries (`\b`, `\B`, `\<`, `\>`),
and nothing else.
If you have trouble, be sure to check our doc: <https://hledger.org/hledger.html#regular-expressions>
Important note: the record that is matched is not the original record, but a synthetic one,
with any enclosing double quotes (but not enclosing whitespace) removed, and always comma-separated
(which means that a field containing a comma will appear like two fields).
Eg, if the original record is `2020-01-01; "Acme, Inc."; 1,000`,
the REGEX will actually see `2020-01-01,Acme, Inc., 1,000`).
### Matching individual fields
Or, MATCHER can be a field matcher, like this:
```rules
%CSVFIELD REGEX
```
which matches just the content of a particular CSV field.
CSVFIELD is a percent sign followed by the field's name or column number, like `%date` or `%1`.
### Combining matchers
A single matcher can be written on the same line as the "if";
or multiple matchers can be written on the following lines, non-indented.
Multiple matchers are OR'd (any one of them can match), unless one begins with
an `&` symbol, in which case it is AND'ed with the previous matcher.
or
```rules
if
MATCHER
& MATCHER
MATCHER
MATCHER
RULE
RULE
```
### Rules applied on successful match
If any of the matchers succeeds, all of the indented rules will be applied.
They are usually [field assignments](#field-assignments),
but the following special rules may also be used within an if block:
After the patterns there should be one or more rules to apply, all
indented by at least one space. Three kinds of rule are allowed in
conditional blocks:
- `skip` - skips the matched CSV record (generating no transaction from it)
- `end` - skips the rest of the current CSV file.
- [field assignments](#field-assignment) (to set a hledger field)
- [skip](#skip) (to skip the matched CSV record)
- [end](#end) (to skip all remaining CSV records).
Some examples:
Examples:
```rules
# if the CSV record contains "groceries", set account2 to "expenses:groceries"
# if the record contains "groceries", set account2 to "expenses:groceries"
if groceries
account2 expenses:groceries
```
```rules
# if the CSV record contains any of these patterns, set account2 and comment as shown
# if the record contains any of these phrases, set account2 and a transaction comment as shown
if
monthly service fee
atm transaction fee
@ -3527,51 +3504,89 @@ banking thru software
comment XXX deductible ? check it
```
```rules
# if an empty record is seen (assuming five fields), ignore the rest of the CSV file
if ,,,,
end
```
## Matchers
There are two kinds:
1. A record matcher is a word or single-line text fragment or regular expression (`REGEX`),
which hledger will try to match case-insensitively anywhere within the CSV record.\
Eg: `whole foods`
2. A field matcher is preceded with a percent sign and [CSV field name](#field-names) (`%CSVFIELD REGEX`).
hledger will try to match these just within the named CSV field.\
Eg: `%date 2023`
The regular expression is (as usual in hledger) a POSIX extended regular expression,
that also supports GNU word boundaries (`\b`, `\B`, `\<`, `\>`),
and nothing else.
If you have trouble, see "Regular expressions" in the hledger manual (<https://hledger.org/hledger.html#regular-expressions>).
With record matchers, it's important to know that the record matched is not the original CSV record, but a modified one:
separators will be converted to commas, and enclosing double quotes (but not enclosing whitespace) are removed.
So for example, when reading an SSV file, if the original record was:
```ssv
2020-01-01; "Acme, Inc."; 1,000
```
the regex would see, and try to match, this modified record text:
```
2020-01-01,Acme, Inc., 1,000
```
When an if block has multiple matchers, they are combined as follows:
- By default they are OR'd (any one of them can match)
- When a matcher is preceded by ampersand (`&`) it will be AND'ed with the previous matcher (both of them must match).
There's not yet an easy syntax to negate a matcher.
## `if` table
"if tables" are an alternative to [if blocks](#if-blocks);
they can express many matchers and field assignments in a more compact tabular format, like this:
```rules
if,CSVFIELDNAME1,CSVFIELDNAME2,...,CSVFIELDNAMEn
MATCHER1,VALUE11,VALUE12,...,VALUE1n
MATCHER2,VALUE21,VALUE22,...,VALUE2n
MATCHER3,VALUE31,VALUE32,...,VALUE3n
if,HLEDGERFIELD1,HLEDGERFIELD2,...
MATCHERA,VALUE1,VALUE2,...
MATCHERB,VALUE1,VALUE2,...
MATCHERC,VALUE1,VALUE2,...
<empty line>
```
Conditional tables ("if tables") are a different syntax to specify
field assignments that will be applied only to CSV records which match certain patterns.
The first character after `if` is taken to be the separator for the rest of the table.
It should be a non-alphanumeric character like `,` or `|` that does not appear anywhere else in the table.
(Note: it is unrelated to the CSV file's separator.)
Whitespace can be used in the matcher lines for readability, but not in the if line currently.
The table must be terminated by an empty line (or end of file).
Each line must contain the same number of separators; empty values are allowed.
MATCHER could be either field or record matcher, as described above. When MATCHER matches,
values from that row would be assigned to the CSV fields named on the `if` line, in the same order.
The above means: try all of the matchers; whenever a matcher succeeds,
assign all of the values on that line to the corresponding hledger fields;
later lines can overrider earlier ones.
It is equivalent to this sequence of if blocks:
Therefore `if` table is exactly equivalent to a sequence of of `if` blocks:
```rules
if MATCHER1
CSVFIELDNAME1 VALUE11
CSVFIELDNAME2 VALUE12
if MATCHERA
HLEDGERFIELD1 VALUE1
HLEDGERFIELD2 VALUE2
...
CSVFIELDNAMEn VALUE1n
if MATCHER2
CSVFIELDNAME1 VALUE21
CSVFIELDNAME2 VALUE22
if MATCHERB
HLEDGERFIELD1 VALUE1
HLEDGERFIELD2 VALUE2
...
CSVFIELDNAMEn VALUE2n
if MATCHER3
CSVFIELDNAME1 VALUE31
CSVFIELDNAME2 VALUE32
if MATCHERC
HLEDGERFIELD1 VALUE1
HLEDGERFIELD2 VALUE2
...
CSVFIELDNAMEn VALUE3n
```
Each line starting with MATCHER should contain enough (possibly empty) values for all the listed fields.
Rules would be checked and applied in the order they are listed in the table and, like with `if` blocks, later rules (in the same or another table) or `if` blocks could override the effect of any rule.
Instead of ',' you can use a variety of other non-alphanumeric characters as a separator. First character after `if` is taken to be the separator for the rest of the table. It is the responsibility of the user to ensure that separator does not occur inside MATCHERs and values - there is no way to escape separator.
Example:
```rules
if,account2,comment
@ -3580,18 +3595,6 @@ atm transaction fee,expenses:business:banking,deductible? check it
2020/01/12.*Plumbing LLC,expenses:house:upkeep,emergency plumbing call-out
```
## `end`
This rule can be used inside [if blocks](#if) (only), to make hledger stop
reading this CSV file and move on to the next input file, or to command execution.
Eg:
```rules
# ignore everything following the first empty record
if ,,,,
end
```
## `include`
```rules
@ -3942,7 +3945,7 @@ Then for each CSV record in turn:
- collect all field assignments at top level and in matched `if` blocks.
When there are multiple assignments for a field, keep only the last one.
- compute a value for each hledger field - either the one that was assigned to it
(and interpolate the %CSVFIELDNAME references), or a default
(and interpolate the %CSVFIELD references), or a default
- generate a synthetic hledger transaction from these values.
This is all part of the CSV reader, one of several readers hledger can