;doc: cli: csv cleanups

Simon Michael 2022-12-11 02:46:32 -10:00
parent 38f792677d
commit 89d5f973f4

@ -4311,202 +4311,22 @@ $ hledger -f paypal-custom.csv print
The following kinds of rule can appear in the rules file, in any order.
Blank lines and lines beginning with `#` or `;` or `*` are ignored.
| | |
|-------------------------------------------|-----------------------------------------------------------------------|
| [**`skip`**](#skip) | skip one or more header lines or matched CSV records |
| [**`fields` list**](#fields-list) | name CSV fields, assign them to hledger fields |
| [**field assignment**](#field-assignment) | assign a value to one hledger field, with interpolation |
| [**Field names**](#field-names) | hledger field names, used in the fields list and field assignments |
| [**`separator`**](#separator) | a custom field separator |
| [**`if` block**](#if-block) | apply some rules to CSV records matched by patterns |
| [**`if` table**](#if-table) | apply some rules to CSV records matched by patterns, alternate syntax |
| [**`end`**](#end) | skip the remaining CSV records |
| [**`date-format`**](#date-format) | how to parse dates in CSV records |
| [**`decimal-mark`**](#decimal-mark-1) | the decimal mark used in CSV amounts, if ambiguous |
| [**`newest-first`**](#newest-first) | improve txn order when there are multiple records, newest first, all with the same date |
| [**`intra-day-reversed`**](#intra-day-reversed) | improve txn order when each day's txns are reverse of the overall date order |
| [**`include`**](#include) | inline another CSV rules file |
| [**`balance-type`**](#balance-type) | choose which type of balance assignments to use |
## `skip`
```rules
skip N
```
The word "skip" followed by a number (or no number, meaning 1)
tells hledger to ignore this many non-empty lines preceding the CSV data.
(Empty/blank lines are skipped automatically.)
You'll need this whenever your CSV data contains header lines.
It also has a second purpose: it can be used inside [if blocks](#if-block)
to ignore certain CSV records (described below).
## `fields` list
```rules
fields FIELDNAME1, FIELDNAME2, ...
```
A fields list (the word "fields" followed by comma-separated field
names) is the quick way to assign CSV field values to hledger fields.
(The other way is [field assignments](#field-assignment), see below.)
A fields list does two things:
1. It names the CSV fields.
This is optional, but can be convenient later for interpolating them.
2. Whenever you use a standard hledger field name (defined below),
the CSV value is assigned to that part of the hledger transaction.
Here's an example that says
"use the 1st, 2nd and 4th fields as the transaction's date, description and amount;
name the last two fields for later reference; and ignore the others":
```rules
fields date, description, , amount, , , somefield, anotherfield
```
Tips:
- The fields list always uses commas, even if your CSV data uses [another separator character](#separator).
- Currently there must be at least two items in the list (at least one comma).
- Field names may not contain spaces. Spaces before/after field names are optional.
- Field names may contain `_` (underscore) or `-` (hyphen).
- If the CSV contains column headings, it's a good idea to use these, suitably modified, as the basis for your field names (eg lower-cased, with underscores instead of spaces).
- If some heading names match standard hledger fields, but you don't want to set the hledger fields directly, alter those names, eg by appending an underscore.
- Fields you don't care about can be given a dummy name (eg: `_` ), or no name.
## field assignment
```rules
HLEDGERFIELDNAME FIELDVALUE
```
Field assignments are the more flexible way to assign CSV values to hledger fields.
They can be used instead of or in addition to a [fields list](#fields-list) (see above).
To assign a value to a hledger field, write the field name
(any of the standard hledger field/pseudo-field names, defined below),
a space, followed by a text value on the same line.
This text value may interpolate CSV fields,
referenced by their 1-based position in the CSV record (`%N`),
or by the name they were given in the [fields list](#fields-list) (`%CSVFIELDNAME`).
Some examples:
```rules
# set the amount to the 4th CSV field, with " USD" appended
amount %4 USD
# combine three fields to make a comment, containing note: and date: tags
comment note: %somefield - %anotherfield, date: %1
```
Tips:
- Interpolation strips outer whitespace (so a CSV value like `" 1 "`
becomes `1` when interpolated)
([#1051](https://github.com/simonmichael/hledger/issues/1051)).
- Interpolations always refer to a CSV field -
you can't interpolate a hledger field.
(See [Referencing other fields](#referencing-other-fields) below).
## Field names
Here are the standard hledger field (and pseudo-field) names, which
you can use in a [fields list](#fields-list) and in [field assignments](#field-assignment).
For more about the transaction parts they refer to, see [Transactions](#transactions).
#### date field
Assigning to `date` sets the [transaction date](#simple-dates).
#### date2 field
`date2` sets the transaction's [secondary date](#secondary-dates), if any.
#### status field
`status` sets the transaction's [status](#status), if any.
#### code field
`code` sets the transaction's [code](#code), if any.
#### description field
`description` sets the transaction's [description](#description), if any.
#### comment field
`comment` sets the transaction's [comment](#transaction-comments), if any.
`commentN`, where N is a number, sets the Nth posting's comment.
Tips:
- You can assign multi-line comments by writing a literal `\n` in the assigned value. A comment starting with `\n` will begin on a new line.
- Comments can contain [tags](#tags), as usual.
#### account field
Assigning to `accountN`, where N is 1 to 99,
sets the account name of the Nth [posting](#postings),
and causes that posting to be generated.
Most often there are two postings, so you'll want to set `account1` and `account2`.
Typically `account1` is associated with the CSV file, and is set once with a top-level assignment,
while `account2` is set based on each transaction's description, and in conditional blocks.
If a posting's account name is left unset but its amount is set (see below),
a default account name will be chosen (like "expenses:unknown" or "income:unknown").
#### amount field
`amountN` sets the amount of the Nth posting,
and causes that posting to be generated.
By assigning to `amount1`, `amount2`, ... etc. you can generate up to 99 postings.
`amountN-in` and `amountN-out` can be used instead,
if the CSV uses separate fields for debits and credits (inflows and outflows).
hledger assumes both of these CSV fields are unsigned, and will automatically negate the "-out" value.
If they are signed, see ["Setting amounts"](#setting-amounts) below.
`amount`, or `amount-in` and `amount-out` are a legacy mode,
to keep pre-hledger-1.17 CSV rules files working (and for occasional convenience).
They are suitable only for two-posting transactions;
they set both posting 1's and posting 2's amount.
Posting 2's amount will be negated, and also converted to cost
if there's a [cost price](#costs).
If you have an existing rules file using the unnumbered form, you
might want to use the numbered form in certain conditional blocks,
without having to update and retest all the old rules.
To facilitate this,
posting 1 ignores `amount`/`amount-in`/`amount-out` if any of `amount1`/`amount1-in`/`amount1-out` are assigned,
and posting 2 ignores them if any of `amount2`/`amount2-in`/`amount2-out` are assigned,
avoiding conflicts.
#### currency field
`currency` sets a currency symbol, to be prepended to all postings' amounts.
You can use this if the CSV amounts do not have a currency symbol, eg if it is in a separate column.
`currencyN` prepends a currency symbol to just the Nth posting's amount.
#### balance field
`balanceN` sets a [balance assertion](#balance-assertions) amount
(or if the posting amount is left empty, a [balance assignment](#balance-assignments))
on posting N.
`balance` is a compatibility spelling for hledger <1.17;
it is equivalent to `balance1`.
You can adjust the type of assertion/assignment with the
[`balance-type` rule](#balance-type) (see below).
See [Tips](#tips) below for more about setting amounts and currency.
| | |
|-------------------------------------------------|-----------------------------------------------------------------------------------------|
| [**`separator`**](#separator) | a custom field separator |
| [**`skip`**](#skip) | skip one or more header lines or matched CSV records |
| [**`fields` list**](#fields-list) | name CSV fields, assign them to hledger fields |
| [**Field assignment**](#field-assignment) | assign a value to one hledger field, with interpolation |
| [**Field names**](#field-names) | hledger field names, used in the fields list and field assignments |
| [**`if` block**](#if-block) | apply some rules to CSV records matched by patterns |
| [**`if` table**](#if-table) | apply some rules to CSV records matched by patterns, alternate syntax |
| [**`end`**](#end) | skip the remaining CSV records |
| [**`date-format`**](#date-format) | how to parse dates in CSV records |
| [**`decimal-mark`**](#decimal-mark-1) | the decimal mark used in CSV amounts, if ambiguous |
| [**`newest-first`**](#newest-first) | improve txn order when there are multiple records, newest first, all with the same date |
| [**`intra-day-reversed`**](#intra-day-reversed) | improve txn order when each day's txns are reverse of the overall date order |
| [**`include`**](#include) | inline another CSV rules file |
| [**`balance-type`**](#balance-type) | choose which type of balance assignments to use |
## `separator`
@ -4534,6 +4354,185 @@ If the input file has a `.csv`, `.ssv` or `.tsv`
the appropriate separator will be inferred automatically, and you
won't need this rule.
## `skip`
```rules
skip N
```
The word "skip" followed by a number (or no number, meaning 1)
tells hledger to ignore this many non-empty lines preceding the CSV data.
(Empty/blank lines are skipped automatically.)
You'll need this whenever your CSV data contains header lines.
It also has a second purpose: it can be used inside [if blocks](#if-block)
to ignore certain CSV records (described below).
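As a minimal sketch of both uses (the `pending` pattern is a hypothetical example, not taken from any particular CSV):

```rules
# ignore the first two non-empty lines, eg a title line and a column headings line
skip 2

# also ignore any CSV record containing "pending" (hypothetical pattern)
if pending
 skip
```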
## `fields` list
```rules
fields FIELDNAME1, FIELDNAME2, ...
```
A fields list (the word "fields" followed by comma-separated field
names) is the quick way to assign CSV field values to [hledger fields](#field-names).
(The other way is [field assignments](#field-assignment), see below.)
A fields list does two things:
1. It names the CSV fields.
This is optional, but can be convenient later for interpolating them.
2. Whenever you use a standard hledger field name (defined below),
the CSV value is assigned to that part of the hledger transaction.
Here's an example that says
"use the 1st, 2nd and 4th fields as the transaction's date, description and amount;
name the last two fields for later reference; and ignore the others":
```rules
fields date, description, , amount, , , somefield, anotherfield
```
Tips:
- The fields list always uses commas, even if your CSV data uses [another separator character](#separator).
- Currently there must be at least two items in the list (at least one comma).
- Field names may not contain spaces. Spaces before/after field names are optional.
- Field names may contain `_` (underscore) or `-` (hyphen).
- If the CSV contains column headings, it's a good idea to use these, suitably modified, as the basis for your field names (eg lower-cased, with underscores instead of spaces).
- If some heading names match standard hledger fields, but you don't want to set the hledger fields directly, alter those names, eg by appending an underscore.
- Fields you don't care about can be given a dummy name (eg: `_` ), or no name.
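To illustrate the last few tips, here is a sketch assuming a hypothetical CSV whose column headings are `Date, Description, Amount, Running Balance, Memo`. The names `amount_` and `memo` are made up for this example:

```rules
# "amount_" has a trailing underscore, so it names the CSV field
# without setting hledger's amount directly
# "_" is a dummy name for a column we don't care about
fields date, description, amount_, _, memo

# the named CSV field can still be interpolated in a field assignment
amount1 %amount_
```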
## Field assignment
```rules
HLEDGERFIELDNAME FIELDVALUE
```
Field assignments are the more flexible way to assign CSV values to hledger fields.
They can be used instead of or in addition to a [fields list](#fields-list) (see above).
To assign a value to a hledger field, write the [field name](#field-names)
(any of the standard hledger field/pseudo-field names, defined below),
a space, followed by a text value on the same line.
This text value may interpolate CSV fields,
referenced by their 1-based position in the CSV record (`%N`),
or by the name they were given in the fields list (`%CSVFIELDNAME`).
Some examples:
```rules
# set the amount to the 4th CSV field, with " USD" appended
amount %4 USD
# combine three fields to make a comment, containing note: and date: tags
comment note: %somefield - %anotherfield, date: %1
```
Tips:
- Interpolation strips outer whitespace (so a CSV value like `" 1 "`
becomes `1` when interpolated)
([#1051](https://github.com/simonmichael/hledger/issues/1051)).
- Interpolations always refer to a CSV field -
you can't interpolate a hledger field.
(See [Referencing other fields](#referencing-other-fields) below).
## Field names
Here are the standard hledger field (and pseudo-field) names, which
you can use in a [fields list](#fields-list) or in [field assignments](#field-assignment).
For more about the transaction parts they refer to, see [Transactions](#transactions).
### date field
Assigning to `date` sets the [transaction date](#simple-dates).
### date2 field
`date2` sets the transaction's [secondary date](#secondary-dates), if any.
### status field
`status` sets the transaction's [status](#status), if any.
### code field
`code` sets the transaction's [code](#code), if any.
### description field
`description` sets the transaction's [description](#description), if any.
### comment field
`comment` sets the transaction's [comment](#transaction-comments), if any.
`commentN`, where N is a number, sets the Nth posting's comment.
You can assign multi-line comments by writing a literal `\n` in the assigned value. A comment starting with `\n` will begin on a new line.
Comments can contain [tags](#tags), as usual.
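For instance, assuming a hypothetical CSV whose last two columns hold a memo and a reference number (the names `memo` and `reference` and the `ref:` tag are invented for this sketch):

```rules
fields date, description, amount1, memo, reference

# a two-line transaction comment; the second line carries a ref: tag
comment %memo\nref: %reference

# a comment on posting 1 only
comment1 imported from CSV
```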
### account field
Assigning to `accountN`, where N is 1 to 99,
sets the account name of the Nth [posting](#postings),
and causes that posting to be generated.
Most often there are two postings, so you'll want to set `account1` and `account2`.
Typically `account1` is associated with the CSV file, and is set once with a top-level assignment,
while `account2` is set based on each transaction's description, and in conditional blocks.
If a posting's account name is left unset but its amount is set (see below),
a default account name will be chosen (like "expenses:unknown" or "income:unknown").
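A sketch of this typical arrangement might look as follows. The account names and the `supermarket` pattern are hypothetical:

```rules
fields date, description, amount1

# set once, at top level: the account this CSV file belongs to
account1 assets:bank:checking

# set conditionally, based on each record's description
if %description supermarket
 account2 expenses:food
```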
### amount field
`amountN` sets the amount of the Nth posting,
and causes that posting to be generated.
By assigning to `amount1`, `amount2`, ... etc. you can generate up to 99 postings.
`amountN-in` and `amountN-out` can be used instead,
if the CSV uses separate fields for debits and credits (inflows and outflows).
hledger assumes both of these CSV fields are unsigned, and will automatically negate the "-out" value.
If they are signed, see ["Setting amounts"](#setting-amounts) below.
`amount`, or `amount-in` and `amount-out` are a legacy mode,
to keep pre-hledger-1.17 CSV rules files working (and for occasional convenience).
They are suitable only for two-posting transactions;
they set both posting 1's and posting 2's amount.
Posting 2's amount will be negated, and also converted to cost
if there's a [cost price](#costs).
If you have an existing rules file using the unnumbered form, you
might want to use the numbered form in certain conditional blocks,
without having to update and retest all the old rules.
To facilitate this,
posting 1 ignores `amount`/`amount-in`/`amount-out` if any of `amount1`/`amount1-in`/`amount1-out` are assigned,
and posting 2 ignores them if any of `amount2`/`amount2-in`/`amount2-out` are assigned,
avoiding conflicts.
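As an illustration of the `-in`/`-out` form, here is a sketch assuming a hypothetical CSV with separate, unsigned inflow and outflow columns (named `money_in` and `money_out` here purely for the example):

```rules
fields date, description, money_in, money_out

# posting 1's amount comes from whichever column is non-empty;
# the "-out" value is negated automatically
amount1-in  %money_in
amount1-out %money_out
```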
### currency field
`currency` sets a currency symbol, to be prepended to all postings' amounts.
You can use this if the CSV amounts do not have a currency symbol, eg if it is in a separate column.
`currencyN` prepends a currency symbol to just the Nth posting's amount.
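For example, assuming a hypothetical CSV that keeps the currency in its own fourth column (named `cur` here for illustration):

```rules
fields date, description, amount1, cur

# prepend the fourth CSV field's value to all postings' amounts
currency %cur
```

If the currency is the same for every record, a fixed symbol could be written instead, eg `currency $`.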
### balance field
`balanceN` sets a [balance assertion](#balance-assertions) amount
(or if the posting amount is left empty, a [balance assignment](#balance-assignments))
on posting N.
`balance` is a compatibility spelling for hledger <1.17;
it is equivalent to `balance1`.
You can adjust the type of assertion/assignment with the
[`balance-type` rule](#balance-type) (see below).
See [Tips](#tips) below for more about setting amounts and currency.
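As a small sketch, assuming a hypothetical CSV whose fourth column is the running account balance after each transaction (the name `running_balance` is invented for this example):

```rules
fields date, description, amount1, running_balance

# assert posting 1's account balance after each transaction
balance1 %running_balance
```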
## `if` block
```rules
@ -4552,7 +4551,7 @@ Conditional blocks ("if blocks") are a block of rules that are applied
only to CSV records which match certain patterns. They are often used
for customising account names based on transaction descriptions.
#### Matching the whole record
### Matching the whole record
Each MATCHER can be a record matcher, which looks like this:
```rules
@ -4571,7 +4570,7 @@ with any enclosing double quotes (but not enclosing whitespace) removed, and alw
Eg, if the original record is `2020-01-01; "Acme, Inc."; 1,000`,
the REGEX will actually see `2020-01-01,Acme, Inc., 1,000`).
#### Matching individual fields
### Matching individual fields
Or, MATCHER can be a field matcher, like this:
```rules
@ -4580,7 +4579,7 @@ Or, MATCHER can be a field matcher, like this:
which matches just the content of a particular CSV field.
CSVFIELD is a percent sign followed by the field's name or column number, like `%date` or `%1`.
#### Combining matchers
### Combining matchers
A single matcher can be written on the same line as the "if";
or multiple matchers can be written on the following lines, non-indented.
@ -4594,7 +4593,7 @@ MATCHER
RULE
```
#### Rules applied on successful match
### Rules applied on successful match
After the patterns there should be one or more rules to apply, all
indented by at least one space. Three kinds of rule are allowed in