imp:import: support -s/--strict properly (fix #2113)

hledger import -s now runs strict checks on an in-memory copy of the
updated journal, before updating the journal file; if strict checks
fail, nothing is written to disk.

And hledger import now does not update any .latest files until it has
run without error (no failing strict checks, no failure while writing
the journal file). This makes it more idempotent, so you can run it
again after fixing problems.
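
The ordering described above (check in memory, then write, then save
state) can be sketched roughly as follows. This is a toy illustration,
not hledger's code: Txn, Journal, strictCheck and importPlan are
hypothetical stand-ins, and the "strict check" is just a made-up rule
that amounts must be nonzero. The point is that the checks run on the
combined journal as a pure value, so a failure returns Left before
anything could be written or any .latest date recorded.

```haskell
import Control.Monad (when)

-- Hypothetical stand-ins for illustration only (not hledger's real types).
type Txn     = (String, Int)   -- (ISO date, amount)
type Journal = [Txn]

-- A made-up "strict check": every amount must be nonzero.
strictCheck :: Journal -> Either String ()
strictCheck j
  | any ((== 0) . snd) j = Left "strict check failed: zero amount"
  | otherwise            = Right ()

-- Plan an import purely: build the updated journal in memory, run the
-- strict check on it if requested, and only on success return the new
-- journal together with the latest date to remember.
-- ISO dates (YYYY-MM-DD) sort correctly as plain strings.
importPlan :: Bool -> Journal -> [Txn] -> Either String (Journal, Maybe String)
importPlan strict j newts = do
  let j' = j ++ newts
  when strict (strictCheck j')
  let latest = if null newts then Nothing else Just (maximum (map fst newts))
  pure (j', latest)
```

A caller would perform the file write and the state update only in the
Right case, in that order; on Left it reports the error and leaves both
untouched, which is what makes a re-run after fixing the problem safe.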
Simon Michael 2023-11-16 21:59:42 -10:00
parent e92ab28cce
commit fba297f705
2 changed files with 57 additions and 30 deletions

@ -1,5 +1,7 @@
{-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-} {-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE NamedFieldPuns #-}
module Hledger.Cli.Commands.Import ( module Hledger.Cli.Commands.Import (
importmode importmode
@ -42,30 +44,52 @@ importcmd opts@CliOpts{rawopts_=rawopts,inputopts_=iopts} j = do
Nothing -> Just inferredStyles Nothing -> Just inferredStyles
Just inputStyles -> Just $ inputStyles <> inferredStyles Just inputStyles -> Just $ inputStyles <> inferredStyles
iopts' = iopts{new_=True, new_save_=not dryrun, balancingopts_=defbalancingopts{commodity_styles_= combinedStyles}} iopts' = iopts{
new_=True, -- read only new transactions since last time
new_save_=False, -- defer saving .latest files until the end
strict_=False, -- defer strict checks until the end
balancingopts_=defbalancingopts{commodity_styles_= combinedStyles} -- use amount styles from both when balancing txns
}
case inputfiles of case inputfiles of
[] -> error' "please provide one or more input files as arguments" -- PARTIAL: [] -> error' "please provide one or more input files as arguments" -- PARTIAL:
fs -> do fs -> do
enewj <- runExceptT $ readJournalFiles iopts' fs enewjandlatestdates <- runExceptT $ readJournalFilesAndLatestDates iopts' fs
case enewj of case enewjandlatestdates of
Left e -> error' e Left err -> error' err
Right newj -> Right (newj, latestdates) ->
case sortOn tdate $ jtxns newj of case sortOn tdate $ jtxns newj of
-- with --dry-run the output should be valid journal format, so messages have ; prepended -- with --dry-run the output should be valid journal format, so messages have ; prepended
[] -> do [] -> do
-- in this case, we vary the output depending on --dry-run, which is a bit awkward -- in this case, we vary the output depending on --dry-run, which is a bit awkward
let semicolon = if dryrun then "; " else "" :: String let semicolon = if dryrun then "; " else "" :: String
printf "%sno new transactions found in %s\n\n" semicolon inputstr printf "%sno new transactions found in %s\n\n" semicolon inputstr
newts | dryrun -> do
printf "; would import %d new transactions from %s:\n\n" (length newts) inputstr
-- TODO how to force output here ?
-- length (jtxns newj) `seq` print' opts{rawopts_=("explicit",""):rawopts} newj
mapM_ (T.putStr . showTransaction) newts
newts | catchup -> do newts | catchup -> do
printf "marked %s as caught up, skipping %d unimported transactions\n\n" inputstr (length newts) printf "marked %s as caught up, skipping %d unimported transactions\n\n" inputstr (length newts)
newts -> do newts -> do
-- XXX This writes unix line endings (\n), some at least, if dryrun
-- even if the file uses dos line endings (\r\n), which could leave then do
-- mixed line endings in the file. See also writeFileWithBackupIfChanged. -- first show imported txns
foldM_ (`journalAddTransaction` opts) j newts -- gets forced somehow.. (how ?) printf "; would import %d new transactions from %s:\n\n" (length newts) inputstr
printf "imported %d new transactions from %s to %s\n" (length newts) inputstr (journalFilePath j) mapM_ (T.putStr . showTransaction) newts
-- then check the whole journal with them added, if in strict mode
when (strict_ iopts) $ strictChecks
else do
-- first check the whole journal with them added, if in strict mode
when (strict_ iopts) $ strictChecks
-- then add (append) the transactions to the main journal file
-- XXX This writes unix line endings (\n), some at least,
-- even if the file uses dos line endings (\r\n), which could leave
-- mixed line endings in the file. See also writeFileWithBackupIfChanged.
foldM_ (`journalAddTransaction` opts) j newts -- gets forced somehow.. (how ?)
printf "imported %d new transactions from %s to %s\n" (length newts) inputstr (journalFilePath j)
-- and finally update the .latest files
mapM_ (saveLatestDates latestdates . snd . splitReaderPrefix) fs
where
-- add the new transactions to the journal in memory and check the whole thing
strictChecks = either fail pure $ journalStrictChecks j'
where j' = foldl' (flip addTransaction) j newts

@ -1,9 +1,10 @@
## import ## import
Read new transactions added to each FILE since last run, and add them to Read new transactions added to each FILE provided as arguments since
the journal. Or with --dry-run, just print the transactions last run, and add them to the journal.
that would be added. Or with --catchup, just mark all of the FILEs' Or with --dry-run, just print the transactions that would be added.
transactions as imported, without actually importing any. Or with --catchup, just mark all of the FILEs' current transactions
as imported, without importing them.
_FLAGS _FLAGS
@ -22,14 +23,14 @@ most common import source, and these docs focus on that case.
### Deduplication ### Deduplication
As a convenience `import` does *deduplication* while reading transactions. `import` does *time-based deduplication*, to detect only the new
This does not mean "ignore transactions that look the same", transactions since the last successful import.
but rather "ignore transactions that have been seen before". (This does not mean "ignore transactions that look the same",
This is intended for when you are periodically importing foreign data but rather "ignore transactions that have been seen before".)
which may contain already-imported transactions. This is intended for when you are periodically importing downloaded data,
So eg, if every day you download bank CSV files containing redundant data, which may overlap with previous downloads.
you can safely run `hledger import bank.csv` and only new transactions will be imported. Eg if every week (or every day) you download a bank's last three months of CSV data,
(`import` is idempotent.) you can safely run `hledger import thebank.csv` each time and only new transactions will be imported.
Since the items being read (CSV records, eg) often do not come with Since the items being read (CSV records, eg) often do not come with
unique identifiers, hledger detects new transactions by date, assuming unique identifiers, hledger detects new transactions by date, assuming
@ -46,8 +47,10 @@ if you import often, the new transactions will be few, so less likely
to be the ones affected). to be the ones affected).
hledger remembers the latest date processed in each input file by hledger remembers the latest date processed in each input file by
saving a hidden ".latest" state file in the same directory. Eg when saving a hidden ".latest.FILE" file in FILE's directory
reading `finance/bank.csv`, it will look for and update the (after a successful import).
Eg when reading `finance/bank.csv`, it will look for and update the
`finance/.latest.bank.csv` state file. `finance/.latest.bank.csv` state file.
The format is simple: one or more lines containing the The format is simple: one or more lines containing the
same ISO-format date (YYYY-MM-DD), meaning "I have processed same ISO-format date (YYYY-MM-DD), meaning "I have processed