From c3d8857ae58abc223c50de014124ddc2028f71b4 Mon Sep 17 00:00:00 2001
From: Simon Michael
Date: Fri, 21 Aug 2020 09:00:54 -0700
Subject: [PATCH] ;doc: clarify csv file extensions and separator inferring

---
 hledger-lib/hledger_csv.m4.md | 48 ++++++++++++++++++++++-------------
 hledger/hledger.m4.md         | 12 ++++-----
 2 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/hledger-lib/hledger_csv.m4.md b/hledger-lib/hledger_csv.m4.md
index 2025a71f7..affe07b12 100644
--- a/hledger-lib/hledger_csv.m4.md
+++ b/hledger-lib/hledger_csv.m4.md
@@ -12,11 +12,14 @@ _man_({{
 # DESCRIPTION
 }})
 
-hledger can read
-[CSV](http://en.wikipedia.org/wiki/Comma-separated_values)
-(Comma Separated Value/Character Separated Value) files as if they were journal files,
-automatically converting each CSV record into a transaction. (To
-learn about *writing* CSV, see [CSV output](hledger.html#csv-output).)
+hledger can read [CSV] files (Character Separated Value - usually
+comma, semicolon, or tab) containing dated records as if they were
+journal files, automatically converting each CSV record into a
+transaction.
+
+(To learn about *writing* CSV, see [CSV output](hledger.html#csv-output).)
+
+[CSV]: http://en.wikipedia.org/wiki/Comma-separated_values
 
 We describe each CSV file's format with a corresponding *rules file*.
 By default this is named like the CSV file with a `.rules` extension
@@ -506,21 +509,29 @@ See TIPS below for more about referencing other fields.
 
 ## `separator`
 
-You can use the `separator` directive to read other kinds of
-character-separated data. Eg to read SSV (Semicolon Separated Values), use:
+You can use the `separator` rule to read other kinds of
+character-separated data. The argument is any single separator
+character, or the words `tab` or `space` (case insensitive). Eg, for
+comma-separated values (CSV):
+
+```
+separator ,
+```
+
+or for semicolon-separated values (SSV):
 ```
 separator ;
 ```
 
-The separator directive accepts exactly one single byte character as a
-separator. To specify whitespace characters, you may use the special
-words `TAB` or `SPACE`. Eg to read TSV (Tab Separated Values), use:
+or for tab-separated values (TSV):
 ```
 separator TAB
 ```
 
-See also: [File Extension](#file-extension).
-
+If the input file has a `.csv`, `.ssv` or `.tsv`
+[file extension](#file-extension) (or a `csv:`, `ssv:`, `tsv:` prefix),
+the appropriate separator will be inferred automatically, and you
+won't need this rule.
 
 ## `if` block
 
@@ -790,11 +801,10 @@ When CSV values are enclosed in quotes, note:
 
 ## File Extension
 
-CSV ("Character Separated Values") files
-should be named with one of these filename extensions: `.csv`, `.ssv`, `.tsv`.
-Or, the file path should be prefixed with one of `csv:`, `ssv:`, `tsv:`.
-This helps hledger identify the format and show the right error messages.
-For example:
+To help hledger identify the format and show the right error messages,
+CSV/SSV/TSV files should normally be named with a `.csv`, `.ssv` or `.tsv`
+filename extension. Or, the file path should be prefixed with `csv:`, `ssv:` or `tsv:`.
+Eg:
 ```shell
 $ hledger -f foo.ssv print
 ```
@@ -802,7 +812,9 @@ or:
 ```
 $ cat foo | hledger -f ssv:- foo
 ```
-More about this: [Input files](hledger.html#input-files) in the hledger manual.
+
+You can override the file extension with a [separator](#separator) rule if needed.
+See also: [Input files](hledger.html#input-files) in the hledger manual.
 
 ## Reading multiple CSV files
 
diff --git a/hledger/hledger.m4.md b/hledger/hledger.m4.md
index 058349d71..c544d87de 100644
--- a/hledger/hledger.m4.md
+++ b/hledger/hledger.m4.md
@@ -730,12 +730,12 @@ but it can also be one of several other formats, listed below.
 hledger detects the format automatically based on the file extension,
 or if that is not recognised, by trying each built-in "reader" in turn:
 
-| Reader:     | Reads:                                              | Used for file extensions:                           |
-|-------------|-----------------------------------------------------|-----------------------------------------------------|
-| `journal`   | hledger's journal format, also some Ledger journals | `.journal` `.j` `.hledger` `.ledger`                |
-| `timeclock` | timeclock files (precise time logging)              | `.timeclock`                                        |
-| `timedot`   | timedot files (approximate time logging)            | `.timedot`                                          |
-| `csv`       | comma-separated values (data interchange)           | `.csv`                                              |
+| Reader:     | Reads:                                                           | Used for file extensions:            |
+|-------------|------------------------------------------------------------------|--------------------------------------|
+| `journal`   | hledger journal files and some Ledger journals, for transactions | `.journal` `.j` `.hledger` `.ledger` |
+| `timeclock` | timeclock files, for precise time logging                        | `.timeclock`                         |
+| `timedot`   | timedot files, for approximate time logging                      | `.timedot`                           |
+| `csv`       | comma/semicolon/tab/other-separated values, for data import      | `.csv` `.ssv` `.tsv`                 |
 
 If needed (eg to ensure correct error messages when a file has the "wrong" extension),
 you can force a specific reader/format by prepending it to the file path with a colon.
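The separator inference this patch documents works in order: an explicit `separator` rule applies first, otherwise a `csv:`/`ssv:`/`tsv:` prefix or a `.csv`/`.ssv`/`.tsv` extension decides. That precedence can be illustrated with a small POSIX shell sketch. This is not hledger's implementation, which is written in Haskell. `infer_separator` is a hypothetical helper, and three behaviours are assumptions that the patch does not state: a rule also overrides a prefix, a prefix takes precedence over the extension, and an unrecognised extension falls back to comma.

```shell
#!/bin/sh
# Hypothetical helper illustrating the separator precedence described in
# this patch. It is a sketch, not hledger's actual (Haskell) code.
# Usage: infer_separator PATH [SEPARATOR_RULE_ARGUMENT]
# Prints the separator character; returns 1 for an invalid rule argument.
infer_separator() {
  path=$1
  rule=${2-}

  # 1. An explicit `separator` rule wins. `tab` and `space` are matched
  #    case-insensitively; anything else must be a single character.
  if [ -n "$rule" ]; then
    case $(printf '%s' "$rule" | tr '[:upper:]' '[:lower:]') in
      tab)   printf '\t' ;;
      space) printf ' ' ;;
      *)
        if [ "${#rule}" -ne 1 ]; then
          echo "separator must be one character, tab or space: $rule" >&2
          return 1
        fi
        printf '%s' "$rule" ;;
    esac
    return 0
  fi

  # 2. A reader prefix (assumed here to take precedence over the extension).
  case $path in
    csv:*) printf ',';  return 0 ;;
    ssv:*) printf ';';  return 0 ;;
    tsv:*) printf '\t'; return 0 ;;
  esac

  # 3. The file extension; anything unrecognised is assumed to mean comma.
  case $path in
    *.ssv) printf ';' ;;
    *.tsv) printf '\t' ;;
    *)     printf ',' ;;
  esac
}
```

For example, `infer_separator foo.ssv` prints `;`, and `infer_separator foo.ssv TAB` prints a tab. The second call mirrors a rules file containing `separator TAB` that overrides the `.ssv` extension.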