;doc: clarify csv file extensions and separator inferring

Simon Michael 2020-08-21 09:00:54 -07:00
parent a3c749f9e7
commit c3d8857ae5
2 changed files with 36 additions and 24 deletions


@ -12,11 +12,14 @@ _man_({{
 # DESCRIPTION
 }})
-hledger can read
-[CSV](http://en.wikipedia.org/wiki/Comma-separated_values)
-(Comma Separated Value/Character Separated Value) files as if they were journal files,
-automatically converting each CSV record into a transaction. (To
-learn about *writing* CSV, see [CSV output](hledger.html#csv-output).)
+hledger can read [CSV] files (Character Separated Value - usually
+comma, semicolon, or tab) containing dated records as if they were
+journal files, automatically converting each CSV record into a
+transaction.
+(To learn about *writing* CSV, see [CSV output](hledger.html#csv-output).)
+[CSV]: http://en.wikipedia.org/wiki/Comma-separated_values
 We describe each CSV file's format with a corresponding *rules file*.
 By default this is named like the CSV file with a `.rules` extension
@ -506,21 +509,29 @@ See TIPS below for more about referencing other fields.
 ## `separator`
-You can use the `separator` directive to read other kinds of
-character-separated data. Eg to read SSV (Semicolon Separated Values), use:
+You can use the `separator` rule to read other kinds of
+character-separated data. The argument is any single separator
+character, or the words `tab` or `space` (case insensitive). Eg, for
+comma-separated values (CSV):
+```
+separator ,
+```
+or for semicolon-separated values (SSV):
 ```
 separator ;
 ```
-The separator directive accepts exactly one single byte character as a
-separator. To specify whitespace characters, you may use the special
-words `TAB` or `SPACE`. Eg to read TSV (Tab Separated Values), use:
+or for tab-separated values (TSV):
 ```
 separator TAB
 ```
-See also: [File Extension](#file-extension).
+If the input file has a `.csv`, `.ssv` or `.tsv`
+[file extension](#file-extension) (or a `csv:`, `ssv:`, `tsv:` prefix),
+the appropriate separator will be inferred automatically, and you
+won't need this rule.
 ## `if` block
@ -790,11 +801,10 @@ When CSV values are enclosed in quotes, note:
 ## File Extension
-CSV ("Character Separated Values") files
-should be named with one of these filename extensions: `.csv`, `.ssv`, `.tsv`.
-Or, the file path should be prefixed with one of `csv:`, `ssv:`, `tsv:`.
-This helps hledger identify the format and show the right error messages.
-For example:
+To help hledger identify the format and show the right error messages,
+CSV/SSV/TSV files should normally be named with a `.csv`, `.ssv` or `.tsv`
+filename extension. Or, the file path should be prefixed with `csv:`, `ssv:` or `tsv:`.
+Eg:
 ```shell
 $ hledger -f foo.ssv print
 ```
@ -802,7 +812,9 @@ or:
 ```
 $ cat foo | hledger -f ssv:- foo
 ```
-More about this: [Input files](hledger.html#input-files) in the hledger manual.
+You can override the file extension with a [separator](#separator) rule if needed.
+See also: [Input files](hledger.html#input-files) in the hledger manual.
 ## Reading multiple CSV files
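
For readers of this change, the inference behaviour the new docs describe can be sketched roughly as follows. This is an illustrative Python sketch, not hledger's actual (Haskell) implementation: the name `infer_separator`, the `SEPARATORS` table, and the precedence order (explicit `separator` rule, then path prefix, then file extension) are assumptions drawn from the wording of the documentation above.

```python
import os

# Assumed mapping, taken from the extensions/prefixes named in the docs.
SEPARATORS = {"csv": ",", "ssv": ";", "tsv": "\t"}


def infer_separator(path, rule_separator=None):
    """Return the field separator for `path`, or None if none can be inferred.

    `rule_separator` is the (hypothetical) argument of a `separator` rule:
    a single character, or the words `tab` / `space` in any case.
    """
    # The docs say a `separator` rule can override the file extension;
    # we assume it wins over a path prefix as well.
    if rule_separator is not None:
        word = rule_separator.lower()
        if word == "tab":
            return "\t"
        if word == "space":
            return " "
        if len(rule_separator) != 1:
            raise ValueError("separator must be one character, 'tab' or 'space'")
        return rule_separator

    # A reader prefix such as "ssv:" is assumed to take priority over the extension.
    prefix, colon, _rest = path.partition(":")
    if colon and prefix in SEPARATORS:
        return SEPARATORS[prefix]

    # Otherwise fall back to the filename extension, if it is a known one.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return SEPARATORS.get(ext)
```

Under these assumptions, `infer_separator("foo.ssv")` and `infer_separator("ssv:-")` both yield a semicolon, while `infer_separator("foo.csv", "TAB")` yields a tab because the rule overrides the extension.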


@ -731,11 +731,11 @@ hledger detects the format automatically based on the file extension,
 or if that is not recognised, by trying each built-in "reader" in turn:
 | Reader: | Reads: | Used for file extensions: |
-|-------------|-----------------------------------------------------|-----------------------------------------------------|
-| `journal` | hledger's journal format, also some Ledger journals | `.journal` `.j` `.hledger` `.ledger` |
-| `timeclock` | timeclock files (precise time logging) | `.timeclock` |
-| `timedot` | timedot files (approximate time logging) | `.timedot` |
-| `csv` | comma-separated values (data interchange) | `.csv` |
+|-------------|------------------------------------------------------------------|--------------------------------------|
+| `journal` | hledger journal files and some Ledger journals, for transactions | `.journal` `.j` `.hledger` `.ledger` |
+| `timeclock` | timeclock files, for precise time logging | `.timeclock` |
+| `timedot` | timedot files, for approximate time logging | `.timedot` |
+| `csv` | comma/semicolon/tab/other-separated values, for data import | `.csv` `.ssv` `.tsv` |
 If needed (eg to ensure correct error messages when a file has the "wrong" extension),
 you can force a specific reader/format by prepending it to the file path with a colon.