doc:text encoding, csv encoding: cleanups [#2319]
parent cdf6ce072d
commit 35df9576ec
@@ -106,14 +106,19 @@ For more about how to do that on your system, see [Common tasks > Setting LEDGER
 
 ## Text encoding
 
-Data files containing non-ascii characters must use UTF-8 encoding, with the exception being `csv` files (see [`encoding`](#encoding) below).
-An optional [byte order mark (BOM)](https://www.unicode.org/faq/utf_bom.html#BOM) is allowed, at the beginning of the file (only).
+hledger input files containing non-ascii characters must use UTF-8 encoding,
+with the exception of CSV (SSV, TSV..) files, which can be read from other encodings (see [`encoding`](#encoding) CSV rule).
 
-Also, your system should be configured with a locale that can decode UTF-8 text.
-On some unix systems, you may need set the `LANG` environment variable, eg.
+In UTF-8 input files, an optional [byte order mark (BOM)](https://www.unicode.org/faq/utf_bom.html#BOM) at the beginning of the file is allowed.
+
+Your system may need to be configured with a locale that understands the input file's encoding.
+Eg on some unix systems, you may need set the `LANG` environment variable.
 You can read more about this in [Unicode characters](#unicode-characters), below.
 
-On unix systems you can check a file's encoding with the `file` command.
+On some unix systems you can use the `file` command to show a file's text encoding.
+On mac, you'll need the version from homebrew: `brew install file-formula`.
+
+hledger's text output is always UTF-8 encoded.
 
 ## Data formats
 
@@ -3215,7 +3220,7 @@ The following kinds of rule can appear in the rules file, in any order.
 | | |
 |-------------------------------------------------|------------------------------------------------------------------------------------------------|
 | [**`source`**](#source) | optionally declare which file to read data from |
-| [**`encoding`**](#encoding) | optionally declare which encoding the data has |
+| [**`encoding`**](#encoding) | optionally declare which text encoding the data has |
 | [**`separator`**](#separator) | declare the field separator, instead of relying on file extension |
 | [**`skip`**](#skip) | skip one or more header lines at start of file |
 | [**`date-format`**](#date-format) | declare how to parse CSV dates/date-times |
@@ -3268,70 +3273,21 @@ See also ["Working with CSV > Reading files specified by rule"](#reading-files-s
 encoding ENCODING
 ```
 
-Specifying `encoding` followed by a valid encoding tells HLedger how to convert a
-csv to be able to make use of it.
-This is most often useful when getting a csv from a bank as they are sometimes
-in an old encoding.
+hledger normally expects non-ascii text to be UTF8-encoded.
+If you need to read CSV files which have some other encoding,
+you can do it by adding `encoding ENCODING` to your CSV rules.
+Eg: `encoding ISO88591`.
 
-If none is given, `utf8` is assumed.
+The following encodings are supported
+(these names are case-insensitive, and can be written with inner spaces or hyphens if you prefer):
+ASCII, UTF8, UTF16, UTF32, ISO88591, ISO88592, ISO88593, ISO88594,
+ISO88595, ISO88596, ISO88597, ISO88598, ISO88599, ISO885910,
+ISO885911, ISO885913, ISO885914, ISO885915, ISO885916, CP1250, CP1251,
+CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, CP1258, KOI8R, KOI8U,
+GB18030, MacOSRoman, JISX0201, JISX0208, ISO2022JP, ShiftJIS, CP437,
+CP737, CP775, CP850, CP852, CP855, CP857, CP860, CP861, CP862, CP863,
+CP864, CP865, CP866, CP869, CP874, CP932.
 
-The encoding will be checked case-insensitive with some alternative spellings also allowed.
-The full list of valid encodings is:
-- ASCII
-- UTF8
-- UTF16
-- UTF32
-- ISO88591
-- ISO88592
-- ISO88593
-- ISO88594
-- ISO88595
-- ISO88596
-- ISO88597
-- ISO88598
-- ISO88599
-- ISO885910
-- ISO885911
-- ISO885913
-- ISO885914
-- ISO885915
-- ISO885916
-- CP1250
-- CP1251
-- CP1252
-- CP1253
-- CP1254
-- CP1255
-- CP1256
-- CP1257
-- CP1258
-- KOI8R
-- KOI8U
-- GB18030
-- MacOSRoman
-- JISX0201
-- JISX0208
-- ISO2022JP
-- ShiftJIS
-- CP437
-- CP737
-- CP775
-- CP850
-- CP852
-- CP855
-- CP857
-- CP860
-- CP861
-- CP862
-- CP863
-- CP864
-- CP865
-- CP866
-- CP869
-- CP874
-- CP932
-
-Alternate spellings may be found in the [source code of `encoding`](https://hackage.haskell.org/package/encoding/docs/src/Data.Encoding.html#encodingFromStringExplicit)
 
 ## `separator`
 
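Note for readers of this change: the new `encoding` text says that encoding names are case-insensitive and may contain inner spaces or hyphens, and that CSV data in another encoding is read as text. The following is a minimal Python sketch of that behaviour, for illustration only. It is not hledger's implementation. The names `normalize_encoding_name`, `PYTHON_CODECS` and `decode_csv_bytes` are hypothetical, and the mapping to Python codec names is an assumption that covers only a few of the listed encodings.

```python
# Illustrative sketch only; not hledger code.
# Hypothetical mapping from a few documented encoding names to Python codecs.
PYTHON_CODECS = {
    "ASCII": "ascii",
    "UTF8": "utf-8",
    "ISO88591": "iso-8859-1",
    "ISO885915": "iso-8859-15",
    "CP1252": "cp1252",
    "KOI8R": "koi8-r",
    "SHIFTJIS": "shift_jis",
}


def normalize_encoding_name(name: str) -> str:
    """Upper-case the name and drop spaces and hyphens, so that
    'iso-8859-1', 'ISO 8859 1' and 'ISO88591' all compare equal."""
    return name.strip().upper().replace(" ", "").replace("-", "")


def decode_csv_bytes(data: bytes, encoding_name: str = "utf8") -> str:
    """Decode raw CSV bytes to text using the named encoding (UTF-8 by default)."""
    key = normalize_encoding_name(encoding_name)
    try:
        codec = PYTHON_CODECS[key]
    except KeyError:
        raise ValueError(f"encoding not covered by this sketch: {encoding_name!r}") from None
    text = data.decode(codec)
    # A byte order mark is optional at the start of UTF-8 input; drop it if present.
    if key == "UTF8" and text.startswith("\ufeff"):
        text = text[1:]
    return text


if __name__ == "__main__":
    # A bank export in an older single-byte encoding, as in the `encoding ISO88591` example.
    raw = "2024-01-31,Café,-3.50\n".encode("iso-8859-1")
    print(decode_csv_bytes(raw, "iso-8859-1"), end="")
```

Normalising the name before the lookup is what lets `encoding ISO88591`, `encoding iso-8859-1` and `encoding ISO 8859 1` select the same decoder in this sketch.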