;doc: clarify csv file extensions and separator inferring

Simon Michael 2020-08-21 09:00:54 -07:00
parent a3c749f9e7
commit c3d8857ae5
2 changed files with 36 additions and 24 deletions


@@ -12,11 +12,14 @@ _man_({{
 # DESCRIPTION
 }})
-hledger can read
-[CSV](http://en.wikipedia.org/wiki/Comma-separated_values)
-(Comma Separated Value/Character Separated Value) files as if they were journal files,
-automatically converting each CSV record into a transaction. (To
-learn about *writing* CSV, see [CSV output](hledger.html#csv-output).)
+hledger can read [CSV] files (Character Separated Value - usually
+comma, semicolon, or tab) containing dated records as if they were
+journal files, automatically converting each CSV record into a
+transaction.
+(To learn about *writing* CSV, see [CSV output](hledger.html#csv-output).)
+[CSV]: http://en.wikipedia.org/wiki/Comma-separated_values
 We describe each CSV file's format with a corresponding *rules file*.
 By default this is named like the CSV file with a `.rules` extension
@@ -506,21 +509,29 @@ See TIPS below for more about referencing other fields.
 ## `separator`
-You can use the `separator` directive to read other kinds of
-character-separated data. Eg to read SSV (Semicolon Separated Values), use:
+You can use the `separator` rule to read other kinds of
+character-separated data. The argument is any single separator
+character, or the words `tab` or `space` (case insensitive). Eg, for
+comma-separated values (CSV):
+```
+separator ,
+```
+or for semicolon-separated values (SSV):
 ```
 separator ;
 ```
-The separator directive accepts exactly one single byte character as a
-separator. To specify whitespace characters, you may use the special
-words `TAB` or `SPACE`. Eg to read TSV (Tab Separated Values), use:
+or for tab-separated values (TSV):
 ```
 separator TAB
 ```
-See also: [File Extension](#file-extension).
+If the input file has a `.csv`, `.ssv` or `.tsv`
+[file extension](#file-extension) (or a `csv:`, `ssv:`, `tsv:` prefix),
+the appropriate separator will be inferred automatically, and you
+won't need this rule.
 ## `if` block
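
Review note: the argument handling described by the new `separator` text (any single separator character, or the words `tab` or `space`, case insensitive) could be sketched roughly as below. This is a hypothetical Python illustration only; hledger itself is written in Haskell, and the function name `parse_separator_arg` and the choice to raise `ValueError` on bad input are assumptions, not part of hledger.

```python
def parse_separator_arg(arg: str) -> str:
    """Turn the argument of a `separator` rule into a one-character string.

    Hypothetical helper for illustration; not hledger's actual code.
    """
    # The two special words named in the docs, matched case-insensitively.
    words = {"tab": "\t", "space": " "}

    stripped = arg.strip()
    lowered = stripped.lower()
    if lowered in words:
        return words[lowered]

    # Otherwise the argument must be exactly one character, such as "," or ";".
    if len(stripped) == 1:
        return stripped

    # Assumption: anything else is rejected. The docs do not say how
    # hledger reports an invalid argument.
    raise ValueError(f"expected one character, 'tab' or 'space': {arg!r}")
```

Stripping surrounding whitespace before matching is why a literal space or tab has to be spelled out as a word in this sketch.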
@@ -790,11 +801,10 @@ When CSV values are enclosed in quotes, note:
 ## File Extension
-CSV ("Character Separated Values") files
-should be named with one of these filename extensions: `.csv`, `.ssv`, `.tsv`.
-Or, the file path should be prefixed with one of `csv:`, `ssv:`, `tsv:`.
-This helps hledger identify the format and show the right error messages.
-For example:
+To help hledger identify the format and show the right error messages,
+CSV/SSV/TSV files should normally be named with a `.csv`, `.ssv` or `.tsv`
+filename extension. Or, the file path should be prefixed with `csv:`, `ssv:` or `tsv:`.
+Eg:
 ```shell
 $ hledger -f foo.ssv print
 ```
@@ -802,7 +812,9 @@ or:
 ```
 $ cat foo | hledger -f ssv:- foo
 ```
-More about this: [Input files](hledger.html#input-files) in the hledger manual.
+You can override the file extension with a [separator](#separator) rule if needed.
+See also: [Input files](hledger.html#input-files) in the hledger manual.
 ## Reading multiple CSV files
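
Review note: taken together, the changed text implies an order of precedence for choosing the separator: an explicit `separator` rule first, then a `csv:`/`ssv:`/`tsv:` path prefix, then the filename extension. A minimal Python sketch of that order follows. The names `SEPARATORS` and `infer_separator` are hypothetical, and returning `None` for unrecognised paths is an assumption; the text shown here does not say what hledger does in that case, and hledger's real implementation is in Haskell.

```python
import os

# Separator implied by each recognised extension or path prefix.
SEPARATORS = {"csv": ",", "ssv": ";", "tsv": "\t"}


def infer_separator(path, rule_separator=None):
    """Pick a field separator for `path` (illustrative sketch only).

    `rule_separator` stands in for the value of a `separator` rule,
    if the rules file has one.
    """
    # 1. An explicit separator rule overrides everything else.
    if rule_separator is not None:
        return rule_separator

    # 2. A reader prefix such as "ssv:" forces the format,
    #    even when the file has the "wrong" extension.
    prefix, colon, _rest = path.partition(":")
    if colon and prefix in SEPARATORS:
        return SEPARATORS[prefix]

    # 3. Otherwise fall back to the filename extension.
    extension = os.path.splitext(path)[1].lstrip(".")
    # Assumption: None means "could not infer".
    return SEPARATORS.get(extension)
```

For instance, `infer_separator("ssv:-")` covers the piped-stdin case from the hunk above, where there is no filename extension to inspect.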


@@ -730,12 +730,12 @@ but it can also be one of several other formats, listed below.
 hledger detects the format automatically based on the file extension,
 or if that is not recognised, by trying each built-in "reader" in turn:
 | Reader: | Reads: | Used for file extensions: |
-|-------------|-----------------------------------------------------|-----------------------------------------------------|
-| `journal` | hledger's journal format, also some Ledger journals | `.journal` `.j` `.hledger` `.ledger` |
-| `timeclock` | timeclock files (precise time logging) | `.timeclock` |
-| `timedot` | timedot files (approximate time logging) | `.timedot` |
-| `csv` | comma-separated values (data interchange) | `.csv` |
+|-------------|------------------------------------------------------------------|--------------------------------------|
+| `journal` | hledger journal files and some Ledger journals, for transactions | `.journal` `.j` `.hledger` `.ledger` |
+| `timeclock` | timeclock files, for precise time logging | `.timeclock` |
+| `timedot` | timedot files, for approximate time logging | `.timedot` |
+| `csv` | comma/semicolon/tab/other-separated values, for data import | `.csv` `.ssv` `.tsv` |
 If needed (eg to ensure correct error messages when a file has the "wrong" extension),
 you can force a specific reader/format by prepending it to the file path with a colon.