feat: csv: rules files can be read directly; data file can be specified

CSV rules files can now be read directly, ie you have the option of
writing `hledger -f foo.csv.rules CMD`. By default this will read data
from foo.csv in the same directory, but you can also specify a
different data file with a new `source FILE` rule. This rule has some
convenience features:

- If the data file does not exist, it is treated as empty, not an
  error.

- If FILE is a relative path, it is relative to the rules file's
  directory. If it is just a file name with no path, it is relative
  to ~/Downloads/.

- If FILE is a glob pattern, the most recently modified matched file
  is used.

This helps remove some of the busywork of managing CSV downloads.
Most of your financial institutions' default CSV filenames are
distinct and can be recognised by a glob pattern. So you can put a
rule like `source Checking1*.csv` in foo-checking.csv.rules,
periodically download CSV from Foo's website accepting your browser's
defaults, and then run `hledger import foo-checking.csv.rules` to
import any new transactions. Next time, if you have done no cleanup,
your browser will probably save the download as something like
Checking1-2.csv, and hledger will still find it because of the `*`
wildcard and because it is the most recently modified match. You can
choose whether to delete CSVs after import, keep them for a while as
temporary backups, or archive them somewhere.
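To make the `source FILE` rules above concrete, here is a minimal Haskell sketch. It is not hledger's implementation: `resolveSource`, `globMatch` and `newestMatch` are hypothetical names, paths are assumed to be POSIX-style, absolute paths are assumed to be used as-is, the glob matcher only understands `*`, and modification times are plain `Integer`s instead of real file timestamps. A `Nothing` from `newestMatch` (no match) would presumably be handled like a missing data file, ie as empty.

```haskell
import Data.List (isPrefixOf, maximumBy, tails)
import Data.Ord (comparing)

-- | Hypothetical: turn a `source` argument into a path (or glob pattern),
-- given the user's home directory and the rules file's directory.
resolveSource :: FilePath -> FilePath -> FilePath -> FilePath
resolveSource home rulesDir file
  | "/" `isPrefixOf` file = file                          -- absolute path: used as-is (assumed)
  | '/' `notElem` file    = home ++ "/Downloads/" ++ file -- bare file name: look in ~/Downloads
  | otherwise             = rulesDir ++ "/" ++ file       -- relative path: relative to the rules file's directory

-- | Minimal glob matcher supporting only the `*` wildcard
-- (real globs also support `?`, character classes, etc.).
globMatch :: String -> String -> Bool
globMatch [] s = null s
globMatch ('*' : ps) s = any (globMatch ps) (tails s)
globMatch (p : ps) (c : cs) = p == c && globMatch ps cs
globMatch (_ : _) [] = False

-- | From (file name, modification time) pairs, pick the most recently
-- modified file whose name matches the pattern. Nothing means no match.
newestMatch :: String -> [(FilePath, Integer)] -> Maybe FilePath
newestMatch pat files =
  case filter (globMatch pat . fst) files of
    [] -> Nothing
    matches -> Just (fst (maximumBy (comparing snd) matches))

main :: IO ()
main = do
  putStrLn (resolveSource "/home/me" "/home/me/finance" "Checking1*.csv")
  -- prints /home/me/Downloads/Checking1*.csv
  print (newestMatch "Checking1*.csv"
           [("Checking1.csv", 100), ("Checking1-2.csv", 200), ("Savings.csv", 300)])
  -- prints Just "Checking1-2.csv"
```

In the example, both Checking1 downloads match the pattern, and the later one wins because it has the larger (more recent) timestamp.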
Simon Michael 2023-05-12 11:27:41 -10:00
parent ddae3af8a3
commit 029b59093b
13 changed files with 3134 additions and 1486 deletions

@@ -80,6 +80,7 @@ import Hledger.Read.Common
 import Hledger.Read.InputOptions
 import Hledger.Read.JournalReader as JournalReader
 import Hledger.Read.CsvReader (tests_CsvReader)
+import Hledger.Read.RulesReader (tests_RulesReader)
 -- import Hledger.Read.TimedotReader (tests_TimedotReader)
 -- import Hledger.Read.TimeclockReader (tests_TimeclockReader)
 import Hledger.Utils
@@ -308,4 +309,5 @@ tests_Read = testGroup "Read" [
 tests_Common
 ,tests_CsvReader
 ,tests_JournalReader
+,tests_RulesReader
 ]

File diff suppressed because it is too large

@@ -0,0 +1,49 @@
+--- * -*- outline-regexp:"--- \\*"; -*-
+--- ** doc
+{-|
+CSV utilities.
+-}
+--- ** language
+{-# LANGUAGE OverloadedStrings #-}
+--- ** exports
+module Hledger.Read.CsvUtils (
+  CSV, CsvRecord, CsvValue,
+  printCSV,
+  -- * Tests
+  tests_CsvUtils,
+)
+where
+--- ** imports
+import Prelude hiding (Applicative(..))
+import Data.List (intersperse)
+import Data.Text (Text)
+import qualified Data.Text as T
+import qualified Data.Text.Lazy as TL
+import qualified Data.Text.Lazy.Builder as TB
+import Hledger.Utils
+--- ** doctest setup
+-- $setup
+-- >>> :set -XOverloadedStrings
+type CSV = [CsvRecord]
+type CsvRecord = [CsvValue]
+type CsvValue = Text
+printCSV :: [CsvRecord] -> TL.Text
+printCSV = TB.toLazyText . unlinesB . map printRecord
+    where printRecord = foldMap TB.fromText . intersperse "," . map printField
+          printField = wrap "\"" "\"" . T.replace "\"" "\"\""
+--- ** tests
+tests_CsvUtils :: TestTree
+tests_CsvUtils = testGroup "CsvUtils" [
+  ]
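printCSV in the new CsvUtils module double-quotes every field, doubles any embedded double quotes, separates fields with commas, and ends each record with a newline. The standalone sketch below copies its definition so the output can be seen; `wrap` and `unlinesB` normally come from Hledger.Utils, so here they are local stand-ins whose behaviour (surround with prefix/suffix; newline after each line) is an assumption.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (intersperse)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TB
import qualified Data.Text.Lazy.IO as TLIO

type CsvValue = Text
type CsvRecord = [CsvValue]

-- Assumed stand-in for Hledger.Utils.wrap: surround a text with a prefix and suffix.
wrap :: Text -> Text -> Text -> Text
wrap prefix suffix t = prefix <> t <> suffix

-- Assumed stand-in for Hledger.Utils.unlinesB: end each builder with a newline.
unlinesB :: [TB.Builder] -> TB.Builder
unlinesB = foldMap (<> TB.singleton '\n')

-- Same definition as in Hledger.Read.CsvUtils.
printCSV :: [CsvRecord] -> TL.Text
printCSV = TB.toLazyText . unlinesB . map printRecord
  where
    -- join the quoted fields of one record with commas
    printRecord = foldMap TB.fromText . intersperse "," . map printField
    -- escape embedded quotes by doubling them, then quote the whole field
    printField = wrap "\"" "\"" . T.replace "\"" "\"\""

main :: IO ()
main = TLIO.putStr (printCSV [["date", "description"], ["2023-05-12", "Say \"hi\", ok"]])
-- "date","description"
-- "2023-05-12","Say ""hi"", ok"
```

Quoting every field unconditionally keeps the code simple and means commas or quotes inside values never need special-casing.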

@@ -100,9 +100,10 @@ import Hledger.Data
 import Hledger.Read.Common
 import Hledger.Utils
-import qualified Hledger.Read.TimedotReader as TimedotReader (reader)
-import qualified Hledger.Read.TimeclockReader as TimeclockReader (reader)
 import qualified Hledger.Read.CsvReader as CsvReader (reader)
+import qualified Hledger.Read.RulesReader as RulesReader (reader)
+import qualified Hledger.Read.TimeclockReader as TimeclockReader (reader)
+import qualified Hledger.Read.TimedotReader as TimedotReader (reader)
 --- ** doctest setup
 -- $setup
@@ -137,6 +138,7 @@ readers' = [
 reader
 ,TimeclockReader.reader
 ,TimedotReader.reader
+,RulesReader.reader
 ,CsvReader.reader
 -- ,LedgerReader.reader
 ]
@@ -168,7 +170,7 @@ type PrefixedFilePath = FilePath
 -- split that off. Eg "csv:-" -> (Just "csv", "-").
 splitReaderPrefix :: PrefixedFilePath -> (Maybe String, FilePath)
 splitReaderPrefix f =
-headDef (Nothing, f)
+headDef (Nothing, f) $
 [(Just r, drop (length r + 1) f) | r <- readerNames, (r++":") `isPrefixOf` f]
 --- ** reader
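splitReaderPrefix, touched in the hunk above, strips an optional `READER:` prefix from a file path. Here is a standalone sketch of the same logic. `readerNames` is a hypothetical list for illustration (hledger builds its own from the available readers), and `headDef` from the safe package is replaced by the equivalent `fromMaybe d . listToMaybe` so the example needs only base.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (fromMaybe, listToMaybe)

-- Hypothetical reader names, for illustration only.
readerNames :: [String]
readerNames = ["journal", "timeclock", "timedot", "csv"]

-- | If the path starts with a known reader name and a colon, split it off;
-- otherwise return the path unchanged with no reader.
splitReaderPrefix :: FilePath -> (Maybe String, FilePath)
splitReaderPrefix f =
  fromMaybe (Nothing, f) $
    listToMaybe [(Just r, drop (length r + 1) f) | r <- readerNames, (r ++ ":") `isPrefixOf` f]

main :: IO ()
main = do
  print (splitReaderPrefix "csv:-")        -- (Just "csv","-")
  print (splitReaderPrefix "csv:data.dat") -- (Just "csv","data.dat")
  print (splitReaderPrefix "foo.journal")  -- (Nothing,"foo.journal")
```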

File diff suppressed because it is too large

@@ -68,8 +68,10 @@ library
 Hledger.Read
 Hledger.Read.Common
 Hledger.Read.CsvReader
+Hledger.Read.CsvUtils
 Hledger.Read.InputOptions
 Hledger.Read.JournalReader
+Hledger.Read.RulesReader
 Hledger.Read.TimedotReader
 Hledger.Read.TimeclockReader
 Hledger.Reports

@@ -126,8 +126,10 @@ library:
 - Hledger.Read
 - Hledger.Read.Common
 - Hledger.Read.CsvReader
+- Hledger.Read.CsvUtils
 - Hledger.Read.InputOptions
 - Hledger.Read.JournalReader
+- Hledger.Read.RulesReader
 # - Hledger.Read.LedgerReader
 - Hledger.Read.TimedotReader
 - Hledger.Read.TimeclockReader

@@ -29,7 +29,7 @@ import Lucid as L hiding (value_)
 import System.Console.CmdArgs.Explicit (flagNone, flagReq)
 import Hledger
-import Hledger.Read.CsvReader (CSV, CsvRecord, printCSV)
+import Hledger.Read.CsvUtils (CSV, CsvRecord, printCSV)
 import Hledger.Cli.CliOptions
 import Hledger.Cli.Utils
 import Text.Tabular.AsciiWide hiding (render)

@@ -277,7 +277,7 @@ import qualified Text.Tabular.AsciiWide as Tab
 import Hledger
 import Hledger.Cli.CliOptions
 import Hledger.Cli.Utils
-import Hledger.Read.CsvReader (CSV, printCSV)
+import Hledger.Read.CsvUtils (CSV, printCSV)
 -- | Command line options for this command.

@@ -23,7 +23,7 @@ import Lens.Micro ((^.), _Just, has)
 import System.Console.CmdArgs.Explicit
 import Hledger
-import Hledger.Read.CsvReader (CSV, printCSV)
+import Hledger.Read.CsvUtils (CSV, printCSV)
 import Hledger.Cli.CliOptions
 import Hledger.Cli.Utils
 import System.Exit (exitFailure)

@@ -27,7 +27,7 @@ import qualified Data.Text.Lazy.Builder as TB
 import System.Console.CmdArgs.Explicit (flagNone, flagReq)
 import Hledger hiding (per)
-import Hledger.Read.CsvReader (CSV, CsvRecord, printCSV)
+import Hledger.Read.CsvUtils (CSV, CsvRecord, printCSV)
 import Hledger.Cli.CliOptions
 import Hledger.Cli.Utils
 import Text.Tabular.AsciiWide hiding (render)

@@ -20,7 +20,7 @@ import qualified Data.Text.Lazy as TL
 import qualified Data.Text.Lazy.Builder as TB
 import Data.Time.Calendar (Day, addDays)
 import System.Console.CmdArgs.Explicit as C
-import Hledger.Read.CsvReader (CSV, printCSV)
+import Hledger.Read.CsvUtils (CSV, printCSV)
 import Lucid as L hiding (value_)
 import Text.Tabular.AsciiWide as Tab hiding (render)

@@ -385,15 +385,14 @@ any of the supported file formats, which currently are:
 | `journal` | hledger journal files and some Ledger journals, for transactions | `.journal` `.j` `.hledger` `.ledger` |
 | `timeclock` | timeclock files, for precise time logging | `.timeclock` |
 | `timedot` | timedot files, for approximate time logging | `.timedot` |
-| `csv` | comma/semicolon/tab/other-separated values, for data import | `.csv` `.ssv` `.tsv` |
+| `csv` | CSV/SSV/TSV/character-separated values, for data import | `.csv` `.ssv` `.tsv` `.csv.rules` `.ssv.rules` `.tsv.rules` |
 These formats are described in more detail below.
-hledger detects the format automatically based on the file extensions
-shown above. If it can't recognise the file extension, it assumes
-`journal` format. So for non-journal files, it's important to use a
-recognised file extension, so as to either read successfully or to
-show relevant error messages.
+hledger detects the format automatically based on the file extensions shown above.
+If it can't recognise the file extension, it assumes `journal` format.
+So for non-journal files, it's important to use a recognised file extension,
+so as to either read successfully or to show relevant error messages.
 You can also force a specific reader/format by prefixing the file path
 with the format and a colon. Eg, to read a .dat file as csv format:
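
The extension-based detection described in the hunk above, including the new `.csv.rules`, `.ssv.rules` and `.tsv.rules` extensions, can be sketched as a lookup table with a `journal` fallback. The table and `detectFormat` are illustrative assumptions that follow the manual's format table; hledger's own detection code is not shown in this diff and may differ, eg in letter-case handling.

```haskell
import Data.List (isSuffixOf)

-- Illustrative format table, following the manual's table above.
formatExtensions :: [(String, [String])]
formatExtensions =
  [ ("journal",   [".journal", ".j", ".hledger", ".ledger"])
  , ("timeclock", [".timeclock"])
  , ("timedot",   [".timedot"])
  , ("csv",       [".csv", ".ssv", ".tsv", ".csv.rules", ".ssv.rules", ".tsv.rules"])
  ]

-- | Pick the first format with a matching extension; if none matches,
-- assume journal format, as the manual describes.
detectFormat :: FilePath -> String
detectFormat f =
  case [fmt | (fmt, exts) <- formatExtensions, any (`isSuffixOf` f) exts] of
    (fmt : _) -> fmt
    []        -> "journal"

main :: IO ()
main = mapM_ (putStrLn . detectFormat) ["foo.csv.rules", "bar.timedot", "baz.dat"]
-- csv
-- timedot
-- journal
```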
@@ -2914,6 +2913,7 @@ The following kinds of rule can appear in the rules file, in any order.
 | | |
 |-------------------------------------------------|------------------------------------------------------------------------------------------------|
+| [**`source`**](#source) | optionally declare which file to read data from |
 | [**`separator`**](#separator) | declare the field separator, instead of relying on file extension |
 | [**`skip`**](#skip) | skip one or more header lines at start of file |
 | [**`date-format`**](#date-format) | declare how to parse CSV dates/date-times |
@@ -2931,6 +2931,35 @@ The following kinds of rule can appear in the rules file, in any order.
 [Working with CSV](#working-with-csv) tips can be found below,
 including [How CSV rules are evaluated](#how-csv-rules-are-evaluated).
+## `source`
+If you tell hledger to read a csv file with `-f foo.csv`, it will look for rules in `foo.csv.rules`.
+Or, you can tell it to read the rules file, with `-f foo.csv.rules`, and it will look for data in `foo.csv` (since 1.30).
+These are mostly equivalent, but the second method provides some extra features.
+For one, the data file can be missing, without causing an error; it is just considered empty.
+And, you can specify a different data file by adding a "source" rule; a relative path is resolved from the rules file's directory:
+```rules
+source ./Checking1.csv
+```
+If you specify just a file name with no path, hledger will look for it
+in your system's downloads directory (`~/Downloads`, currently):
+```rules
+source Checking1.csv
+```
+And if you specify a glob pattern, hledger will read the most recently modified of the matched files
+(useful with repeated downloads):
+```rules
+source Checking1*.csv
+```
+See also ["Working with CSV > Reading files specified by rule"](#reading-files-specified-by-rule).
 ## `separator`
 You can use the `separator` rule to read other kinds of
@@ -2938,17 +2967,17 @@ character-separated data. The argument is any single separator
 character, or the words `tab` or `space` (case insensitive). Eg, for
 comma-separated values (CSV):
-```
+```rules
 separator ,
 ```
 or for semicolon-separated values (SSV):
-```
+```rules
 separator ;
 ```
 or for tab-separated values (TSV):
-```
+```rules
 separator TAB
 ```
@@ -3538,6 +3567,30 @@ hledger will look for a correspondingly-named rules file for each CSV
 file. But if you use the `--rules-file` option, that rules file will
 be used for all the CSV files.
+### Reading files specified by rule
+Instead of specifying a CSV file on the command line, you can specify
+a rules file, as in `hledger -f foo.csv.rules CMD`.
+By default this will read data from foo.csv in the same directory,
+but you can add a [source](#source) rule to specify a different data file,
+perhaps located in your web browser's download directory.
+This feature was added in hledger 1.30, so you won't see it in most CSV rules examples.
+But it helps remove some of the busywork of managing CSV downloads.
+Most of your financial institutions' default CSV filenames are
+distinct and can be recognised by a glob pattern. So you can put a
+rule like `source Checking1*.csv` in foo-checking.csv.rules, and then
+periodically follow a workflow like:
+1. Download CSV from Foo's website, using your browser's defaults
+2. Run `hledger import foo-checking.csv.rules` to import any new transactions
+After import, you can discard the CSV, leave it where it is for a
+while, or move it into your archives, as you prefer. If you do nothing,
+next time your browser will save something like Checking1-2.csv,
+and hledger will use that because of the `*` wildcard and because
+it is the most recent.
 ### Valid transactions
 After reading a CSV file, hledger post-processes and validates the