;doc: clarify csv file extensions and separator inferring

Simon Michael 2020-08-21 09:00:54 -07:00
parent a3c749f9e7
commit c3d8857ae5
2 changed files with 36 additions and 24 deletions

@@ -12,11 +12,14 @@ _man_({{
# DESCRIPTION
}})
hledger can read
[CSV](http://en.wikipedia.org/wiki/Comma-separated_values)
(Comma Separated Value/Character Separated Value) files as if they were journal files,
automatically converting each CSV record into a transaction. (To
learn about *writing* CSV, see [CSV output](hledger.html#csv-output).)
hledger can read [CSV] files (Character Separated Value - usually
comma, semicolon, or tab) containing dated records as if they were
journal files, automatically converting each CSV record into a
transaction.
(To learn about *writing* CSV, see [CSV output](hledger.html#csv-output).)
[CSV]: http://en.wikipedia.org/wiki/Comma-separated_values
We describe each CSV file's format with a corresponding *rules file*.
By default this is named like the CSV file with a `.rules` extension
@@ -506,21 +509,29 @@ See TIPS below for more about referencing other fields.
## `separator`
You can use the `separator` directive to read other kinds of
character-separated data. Eg to read SSV (Semicolon Separated Values), use:
You can use the `separator` rule to read other kinds of
character-separated data. The argument is any single separator
character, or the words `tab` or `space` (case insensitive). Eg, for
comma-separated values (CSV):
```
separator ,
```
or for semicolon-separated values (SSV):
```
separator ;
```
The separator directive accepts exactly one single byte character as a
separator. To specify whitespace characters, you may use the special
words `TAB` or `SPACE`. Eg to read TSV (Tab Separated Values), use:
or for tab-separated values (TSV):
```
separator TAB
```
See also: [File Extension](#file-extension).
If the input file has a `.csv`, `.ssv` or `.tsv`
[file extension](#file-extension) (or a `csv:`, `ssv:`, `tsv:` prefix),
the appropriate separator will be inferred automatically, and you
won't need this rule.
## `if` block
@@ -790,11 +801,10 @@ When CSV values are enclosed in quotes, note:
## File Extension
CSV ("Character Separated Values") files
should be named with one of these filename extensions: `.csv`, `.ssv`, `.tsv`.
Or, the file path should be prefixed with one of `csv:`, `ssv:`, `tsv:`.
This helps hledger identify the format and show the right error messages.
For example:
To help hledger identify the format and show the right error messages,
CSV/SSV/TSV files should normally be named with a `.csv`, `.ssv` or `.tsv`
filename extension. Or, the file path should be prefixed with `csv:`, `ssv:` or `tsv:`.
Eg:
```shell
$ hledger -f foo.ssv print
```
@@ -802,7 +812,9 @@ or:
```
$ cat foo | hledger -f ssv:- foo
```
More about this: [Input files](hledger.html#input-files) in the hledger manual.
You can override the file extension with a [separator](#separator) rule if needed.
See also: [Input files](hledger.html#input-files) in the hledger manual.
## Reading multiple CSV files

@@ -731,11 +731,11 @@ hledger detects the format automatically based on the file extension,
or if that is not recognised, by trying each built-in "reader" in turn:
| Reader: | Reads: | Used for file extensions: |
|-------------|-----------------------------------------------------|-----------------------------------------------------|
| `journal` | hledger's journal format, also some Ledger journals | `.journal` `.j` `.hledger` `.ledger` |
| `timeclock` | timeclock files (precise time logging) | `.timeclock` |
| `timedot` | timedot files (approximate time logging) | `.timedot` |
| `csv` | comma-separated values (data interchange) | `.csv` |
|-------------|------------------------------------------------------------------|--------------------------------------|
| `journal` | hledger journal files and some Ledger journals, for transactions | `.journal` `.j` `.hledger` `.ledger` |
| `timeclock` | timeclock files, for precise time logging | `.timeclock` |
| `timedot` | timedot files, for approximate time logging | `.timedot` |
| `csv` | comma/semicolon/tab/other-separated values, for data import | `.csv` `.ssv` `.tsv` |
If needed (eg to ensure correct error messages when a file has the "wrong" extension),
you can force a specific reader/format by prepending it to the file path with a colon.
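
The separator behaviour these doc changes describe can be sketched in a few lines of Python. This is only an illustration of the documented rules, not hledger's actual implementation (hledger is written in Haskell). The function names `parse_separator_rule` and `infer_separator` are hypothetical, and the assumption that a `separator` rule also takes priority over a `csv:`/`ssv:`/`tsv:` prefix is a guess, since the text only says it overrides the file extension.

```python
import os

# Separators implied by the documented extensions/prefixes.
SEPARATOR_BY_FORMAT = {"csv": ",", "ssv": ";", "tsv": "\t"}

# Special words accepted by the `separator` rule (case insensitive).
NAMED_SEPARATORS = {"tab": "\t", "space": " "}


def parse_separator_rule(arg):
    """Interpret the argument of a `separator` rule (hypothetical helper)."""
    word = arg.strip()
    if word.lower() in NAMED_SEPARATORS:
        return NAMED_SEPARATORS[word.lower()]
    if len(word) == 1:
        return word
    raise ValueError("separator must be one character, or 'tab' or 'space'")


def infer_separator(path, rule_arg=None):
    """Pick a separator: explicit rule first, then prefix, then extension.

    Returns None when nothing can be inferred. Putting the rule ahead of
    the prefix is an assumption; the docs only mention the extension.
    """
    if rule_arg is not None:
        return parse_separator_rule(rule_arg)
    prefix, colon, _rest = path.partition(":")
    if colon and prefix in SEPARATOR_BY_FORMAT:
        return SEPARATOR_BY_FORMAT[prefix]
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return SEPARATOR_BY_FORMAT.get(ext)
```

With this sketch, `infer_separator("foo.ssv")` and `infer_separator("ssv:-")` both yield a semicolon, mirroring the two shell examples in the File Extension section, while `infer_separator("foo.csv", "TAB")` yields a tab because the explicit rule wins.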