feat:csv: support data generating scripts and rewrite the rules reader.
parent c7878e88da
commit 97899f9a9b
@@ -79,7 +79,7 @@ import qualified Data.Text.IO as T
 import Data.Time ( Day, TimeZone, UTCTime, LocalTime, ZonedTime(ZonedTime),
   defaultTimeLocale, getCurrentTimeZone, localDay, parseTimeM, utcToLocalTime, localTimeToUTC, zonedTimeToUTC, utctDay)
 import Safe (atMay, headMay, lastMay, readMay)
-import System.Directory (createDirectoryIfMissing, doesFileExist, getHomeDirectory, getModificationTime, renameFile, removeFile)
+import System.Directory (createDirectoryIfMissing, doesFileExist, getHomeDirectory, getModificationTime, removeFile)
 -- import System.Directory (createDirectoryIfMissing, doesFileExist, getHomeDirectory, getModificationTime, listDirectory, renameFile, doesDirectoryExist)
 import System.Exit (ExitCode(..))
 import System.FilePath (stripExtension, takeBaseName, takeDirectory, takeExtension, takeFileName, (<.>), (</>))
@@ -118,7 +118,7 @@ getDownloadDir = do
   return $ home </> "Downloads" -- XXX

 -- | Read, parse and post-process a "Journal" from the given rules file, or give an error.
--- This particular reader also provides some extra features like data-cleaning and archiving.
+-- This particular reader also provides some extra features like data cleaning/generating commands and data archiving.
 --
 -- The provided input file handle, and the --rules option, are ignored by this reader.
 -- Instead, a data file (or data-generating command) is usually specified by the @source@ rule.
@@ -133,19 +133,18 @@ getDownloadDir = do
 --
 -- The source rule can specify a data-cleaning command, after a @|@ separator: @source foo*.csv | sed -e 's/USD/$/g'@.
 -- This command is executed by the user's default shell, receives the data file's content on stdin,
--- and should output CSV data suitable for hledger's conversion rules.
+-- and should output CSV data suitable for the conversion rules.
 -- A # character can be used to comment out the data-cleaning command: @source foo*.csv # | ...@.
 --
--- When using the source rule, if the archive rule is also present, some behaviours change:
+-- Or the source rule can specify just a data-generating command, with no file pattern: @source | foo-csv.sh@.
+-- In this case the command receives no input; it should output CSV data suitable for the conversion rules.
 --
--- - The import command:
--- will move the data file to an archive directory after a successful read
--- (renamed like the rules file, date-stamped, to an auto-created data/ directory next to the rules file).
--- And it will read the oldest data file, not the newest, if the glob pattern matches multiple files.
--- If there is a data-cleaning command, only the original uncleaned data is archived, currently.
---
--- - Other commands:
--- will read the newest archived data file, if any, as a fallback if the glob pattern matches no data files.
+-- If the archive rule is present:
+-- After successfully reading the data file or data command and converting to a journal, while doing a non-dry-run import:
+-- the data will be archived in an auto-created data/ directory next to the rules file,
+-- with a name based on the rules file and the data file's modification date and extension
+-- (or for a data-generating command, the current date and the ".csv" extension).
+-- And import will prefer the oldest file matched by a glob pattern (not the newest).
 --
 -- Balance assertions are not checked by this reader.
 --
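For illustration, the source-rule forms described in the comment above (with hypothetical file and script names) are: a file pattern with a data-cleaning command,

    source acme-checking*.csv | sed -e 's/USD/$/g'

or a data-generating command with no file pattern,

    source | acme-export.sh

plus, optionally, an archive rule so that import archives the data afterwards:

    archive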
@@ -153,93 +152,150 @@ parse :: InputOpts -> FilePath -> Handle -> ExceptT String IO Journal
 parse iopts rulesfile h = do
   lift $ hClose h -- We don't need it (XXX why ?)

-  -- XXX higher-than usual debug level for file reading to bypass excessive noise from elsewhere, normally 6 or 7
-  rules <- readRulesFile $ dbg4 "reading rules file" rulesfile
-
-  let
-    -- XXX How can we know when the command is import, and if it's a dry run ? In a hacky way, currently.
-    args = progArgs
-    -- XXX Difficult to identify the command name reliably here,
-    -- Cli.hs's moveFlagsAfterCommand would help but is not importable here.
-    -- Just look for import or imp appearing anywhere in the arguments.
-    cmdisimport = dbg7 "cmdisimport" $ any (`elem` args) ["import", "imp"]
-    dryrun = dbg7 "dryrun" $ any (`elem` args) ["--dry-run", "--dry"]
-    importing = dbg7 "importing" $ cmdisimport && not dryrun
-    archive = dbg7 "archive" $ isJust (getDirective "archive" rules)
-    archiving = dbg7 "archiving" $ importing && archive
-    rulesdir = dbg7 "rulesdir" $ takeDirectory rulesfile
-    archivedir = dbg7 "archivedir" $ rulesdir </> "data"
-
-  mdatafileandcmd <- liftIO $ do
-    dldir <- getDownloadDir -- look here for the data file if it's specified without a directory
-    let
-      msourcearg = getDirective "source" rules
-
-      -- Surrounding whitespace is removed from the whole source argument and from each part of it.
-      -- A # before | makes the rest of line a comment.
-      -- A # after | is left for the shell to interpret; it could be part of the command or the start of a comment.
-      stripspaces = T.strip
-      stripcommentandspaces = stripspaces . T.takeWhile (/= '#')
-      msourceandcmd = T.breakOn "|" . stripspaces <$> msourcearg
-      msource = T.unpack . stripcommentandspaces . fst <$> msourceandcmd
-      mcmd = msourceandcmd >>= \sc ->
-        let c = T.unpack . stripspaces . T.drop 1 . snd $ sc
-        in if null c then Nothing else Just c
-
-    datafiles <- case msource of
-      Nothing -> return [maybe err (dbg4 "inferred source") $ dataFileFor rulesfile] -- shouldn't fail, f has .rules extension
-        where err = error' $ "could not infer a data file for " <> rulesfile
-      Just glb -> do
-        let (dir,desc) = if isFileName glb then (dldir," in download directory") else (rulesdir,"")
-        expandGlob dir (dbg4 "source rule" glb) >>= sortByModTime <&> dbg4 ("matched files"<>desc<>", oldest first")
-        -- XXX disable for now, too much complication: easy review of recent imported data:
-        -- `archive` also affects non-`import` commands reading the rules file:
-        -- when the `source` rule's glob pattern matches no files (no new downloads are available),
-        -- they will use the archive as a fallback (reading the newest archived file, if any).
-        -- if the source rule matched no files and we are reading not importing, use the most recent archived file.
-        -- case globmatches of
-        --   [] | archive && not cmdisimport -> do
-        --     archivesFor archivedir rulesfile <&> take 1 <&> dbg4 "latest file in archive directory"
-        --   _ -> return globmatches -- XXX don't let it be cleaned again
-
-    return $ case datafiles of
-      [] -> (Nothing, Nothing)
-      [f] | cmdisimport -> dbg4 "importing" (Just f , mcmd)
-      [f] -> dbg4 "reading" (Just f , mcmd)
-      fs | cmdisimport && archiving -> dbg4 "importing oldest file" (headMay fs, mcmd)
-      fs | cmdisimport -> dbg4 "importing newest file" (lastMay fs, mcmd)
-      fs -> dbg4 "reading newest file" (lastMay fs, mcmd)
-
-  case mdatafileandcmd of
-    (Nothing, _) -> return nulljournal -- data file specified by source rule was not found
-    (Just datafile, mcmd) -> do
-      exists <- liftIO $ doesFileExist datafile
-      if not (datafile=="-" || exists)
-      then return nulljournal -- data file inferred from rules file name was not found
-      else do
-        datafileh <- liftIO $ openFileOrStdin datafile
-        rawdata <- liftIO $ readHandlePortably datafileh
-        cleandata <- liftIO $ maybe (return rawdata) (\c -> runFilterCommand rulesfile c rawdata) mcmd
-        cleandatafileh <- liftIO $ inputToHandle cleandata
-        do
-          readJournalFromCsv (Just $ Left rules) datafile cleandatafileh Nothing
-            -- apply any command line account aliases. Can fail with a bad replacement pattern.
-            >>= liftEither . journalApplyAliases (aliasesFromOpts iopts)
-            -- journalFinalise assumes the journal's items are
-            -- reversed, as produced by JournalReader's parser.
-            -- But here they are already properly ordered. So we'd
-            -- better preemptively reverse them once more. XXX inefficient
-            . journalReverse
-            >>= journalFinalise iopts{balancingopts_=(balancingopts_ iopts){ignore_assertions_=True}} rulesfile ""
-            >>= \j -> do
-              when archiving $ liftIO $ saveToArchive archivedir rulesfile datafile (mcmd <&> const cleandata)
-              return j
-
--- | Run the given shell command, passing the given text as input, and return the output.
--- Or if the command fails, raise an informative error.
-runFilterCommand :: FilePath -> String -> Text -> IO Text
-runFilterCommand rulesfile cmd input = do
-  let process = (shell cmd) { std_in = CreatePipe, std_out = CreatePipe, std_err = CreatePipe }
+  -- The rules reader does a lot; we must be organised.
+
+  -- 1. gather contextual info
+  -- gives: import flag, dryrun flag, rulesdir
+
+  let
+    args = progArgs
+    import_ = dbg2 "import" $ any (`elem` args) ["import", "imp"]
+    dryrun = dbg2 "dryrun" $ any (`elem` args) ["--dry-run", "--dry"]
+    rulesdir = takeDirectory rulesfile
+
+  -- 2. parse the source and archive rules
+  -- needs: rules file
+  -- gives: file pattern, data cleaning/generating command, archive flag
+
+  -- XXX higher-than usual logging priority for file reading (normally 6 or 7), to bypass excessive noise from elsewhere
+  rules <- readRulesFile $ dbg1 "reading rules file" rulesfile
+  let
+    msourcearg = getDirective "source" rules
+    -- Nothing -> error' $ rulesfile ++ " source rule must specify a file pattern or a command"
+
+    -- Surrounding whitespace is removed from the whole source argument and from each part of it.
+    -- A # before | makes the rest of line a comment.
+    -- A # after | is left for the shell to interpret; it could be part of the command or the start of a comment.
+    stripspaces = T.strip
+    stripcommentandspaces = stripspaces . T.takeWhile (/= '#')
+    mpatandcmd = T.breakOn "|" . stripspaces <$> msourcearg
+    mpat = dbg2 "file pattern" $ -- a non-empty file pattern, or nothing
+      case T.unpack . stripcommentandspaces . fst <$> mpatandcmd of
+        Just s | not $ null s -> Just s
+        _ -> Nothing
+    mcmd = dbg2 "data command" $ -- a non-empty command, or nothing
+      mpatandcmd >>= \sc ->
+        let c = T.unpack . stripspaces . T.drop 1 . snd $ sc
+        in if null c then Nothing else Just c
+
+    archive = isJust (getDirective "archive" rules)
+
+  -- 3. find the file to be read, if any
+  -- needs: file pattern, data command, import flag, archive flag, downloads dir
+  -- gives: data file, data file description
+
+  (mdatafile, datafiledesc) <- dbg2 "data file found ?" <$> case (mpat, mcmd) of
+    (Nothing, Nothing) -> error' $ "to make " ++ rulesfile ++ " readable,\n please add a 'source' rule with a non-empty file pattern or command"
+    (Nothing, Just _) -> return (Nothing, "")
+    (Just pat, _) -> do
+      dldir <- liftIO getDownloadDir -- look here for the data file if it's specified without a directory
+      let
+        (startdir, dirdesc)
+          | isFileName pat = (dldir, " in download directory")
+          | otherwise = (rulesdir, "")
+      fs <- liftIO $
+        expandGlob startdir pat
+        >>= sortByModTime
+        <&> dbg2 ("matched files"<>dirdesc<>", oldest first")
+      return $
+        if import_ && archive
+        then (headMay fs, " oldest file")
+        else (lastMay fs, " newest file")
+
+  -- 4. log which file we are reading/importing/cleaning/generating
+  -- needs: data file, data file description, import flag
+
+  case (mdatafile, datafiledesc) of
+    (Just f, desc) -> dbg1IO ("trying to " ++ (if import_ then "import" else "read") ++ desc) f
+    (Nothing, _) -> return ()
+
+  -- 5. read raw, cleaned or generated data
+  -- needs: file pattern, data file, data command
+  -- gives: clean data (possibly empty)
+
+  mexistingdatafile <- maybe (return Nothing) (\f -> liftIO $ do
+    exists <- doesFileExist f
+    return $ if exists then Just f else Nothing
+    ) $ mdatafile
+  cleandata <- dbg1With (\t -> "read "++(show $ length $ T.lines t)++" lines") <$> case (mpat, mexistingdatafile, mcmd) of
+
+    -- file pattern, but no file found
+    (Just _, Nothing, _) -> -- trace "file pattern, but no file found" $
+      return ""
+
+    -- file found, and maybe a data cleaning command
+    (_, Just f, mc) -> -- trace "file found" $
+      liftIO $ do
+        raw <- openFileOrStdin f >>= readHandlePortably
+        maybe (return raw) (\c -> runCommandAsFilter rulesfile (dbg0Msg ("running: "++c) c) raw) mc
+
+    -- no file pattern, but a data generating command
+    (Nothing, _, Just cmd) -> -- trace "data generating command" $
+      liftIO $ runCommand rulesfile $ dbg0Msg ("running: " ++ cmd) cmd
+
+    -- neither a file pattern nor a data generating command
+    (Nothing, _, Nothing) -> -- trace "no file pattern or data generating command" $
+      error' $ rulesfile ++ " source rule must specify a file pattern or a command"
+
+  -- 6. convert the clean data to a (possibly empty) journal
+  -- needs: clean data, rules, rules file, data file if any
+  -- gives: journal
+
+  j <- do
+    cleandatah <- liftIO $ inputToHandle cleandata
+    readJournalFromCsv (Just $ Left rules) (fromMaybe "(cmd)" mdatafile) cleandatah Nothing
+      -- apply any command line account aliases. Can fail with a bad replacement pattern.
+      >>= liftEither . journalApplyAliases (aliasesFromOpts iopts)
+      -- journalFinalise assumes the journal's items are
+      -- reversed, as produced by JournalReader's parser.
+      -- But here they are already properly ordered. So we'd
+      -- better preemptively reverse them once more. XXX inefficient
+      . journalReverse
+      >>= journalFinalise iopts{balancingopts_=(balancingopts_ iopts){ignore_assertions_=True}} rulesfile ""
+
+  -- 7. if non-empty, successfully read and converted, and we're doing a non-dry-run archiving import: archive the data
+  -- needs: import/archive/dryrun flags, rules directory, rules file, data file if any, clean data
+
+  when (not (T.null cleandata) && import_ && archive && not dryrun) $
+    liftIO $ saveToArchive (rulesdir </> "data") rulesfile mdatafile cleandata
+
+  return j
+
+-- | For the given rules file, run the given shell command, in the rules file's directory.
+-- If the command fails, raise an error and show its error output;
+-- otherwise return its output, and show any error output as a warning.
+runCommand :: FilePath -> String -> IO Text
+runCommand rulesfile cmd = do
+  let process = (shell cmd) { cwd = Just $ takeDirectory rulesfile, std_out = CreatePipe, std_err = CreatePipe }
+  withCreateProcess process $ \_ mhout mherr phandle -> do
+    case (mhout, mherr) of
+      (Just hout, Just herr) -> do
+        out <- T.hGetContents hout
+        err <- hGetContents' herr
+        exitCode <- waitForProcess phandle
+        case exitCode of
+          ExitSuccess -> do
+            unless (null err) $ warnIO err
+            return out
+          ExitFailure code ->
+            error' $ "in " ++ rulesfile ++ ": command \"" ++ cmd ++ "\" failed with exit code " ++ show code
+              ++ (if null err then "" else ":\n" ++ err)
+      _ -> error' $ "in " ++ rulesfile ++ ": failed to create pipes for command execution"
+
+-- | For the given rules file, run the given shell command, in the rules file's directory, passing the given text as input.
+-- Return the output, or if the command fails, raise an informative error.
+runCommandAsFilter :: FilePath -> String -> Text -> IO Text
+runCommandAsFilter rulesfile cmd input = do
+  let process = (shell cmd) { cwd = Just $ takeDirectory rulesfile, std_in = CreatePipe, std_out = CreatePipe, std_err = CreatePipe }
   withCreateProcess process $ \mhin mhout mherr phandle -> do
     case (mhin, mhout, mherr) of
       (Just hin, Just hout, Just herr) -> do
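A rough sketch of how the mpatandcmd/mpat/mcmd bindings above split a source argument, in GHCi with OverloadedStrings (the argument text is hypothetical):

    ghci> T.breakOn "|" "acme-checking*.csv | sed -e 's/USD/$/g'"
    ("acme-checking*.csv ","| sed -e 's/USD/$/g'")

After stripping, this gives mpat = Just "acme-checking*.csv" and mcmd = Just "sed -e 's/USD/$/g'". An argument starting with "|" leaves the pattern part empty (mpat = Nothing), and an argument with no "|" leaves the command part empty (mcmd = Nothing).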
@@ -254,46 +310,50 @@ runFilterCommand rulesfile cmd input = do
               ++ (if null err then "" else ":\n" ++ err)
       _ -> error' $ "in " ++ rulesfile ++ ": failed to create pipes for command execution"

--- | Save some successfully imported data to the given archive directory,
--- autocreating that if needed, and showing informational output on stderr.
--- The remaining arguments are: the rules file path (for naming), the original data file,
--- and if there was a data-cleaning command, the cleaned data from that file.
--- The archive file name will be RULESFILEBASENAME.DATAFILEMODDATE.DATAFILEEXT.
--- When there is cleaned data, currently only that is saved (not the original data).
-saveToArchive :: FilePath -> FilePath -> FilePath -> Maybe Text -> IO ()
-saveToArchive archivedir rulesfile datafile mcleandata = do
-  createDirectoryIfMissing True archivedir
-  hPutStrLn stderr $ "archiving " <> datafile
-  (_origname, cleanname) <- archiveFileName rulesfile datafile
-  let
-    cleanarchive = archivedir </> cleanname
-    -- origarchive = archivedir </> origname
-  case mcleandata of
-    Just cleandata -> do
-      -- disabled for simplicity:
-      -- the original data is also saved, as RULESFILEBASENAME.orig.DATAFILEMODDATE.DATAFILEEXT.
-      -- hPutStrLn stderr $ " as " <> origarchive
-      -- renameFile datafile origarchive
-      -- hPutStrLn stderr $ " and " <> cleanarchive
-      hPutStrLn stderr $ " as " <> cleanarchive
-      T.writeFile cleanarchive cleandata
-      removeFile datafile
-    Nothing -> do
-      hPutStrLn stderr $ " as " <> cleanarchive
-      renameFile datafile cleanarchive
+type DirPath = FilePath

--- | Figure out the file names to use when archiving, for the given rules file, the given data file.
+-- | Save some successfully imported data
+-- (more precisely: data that was successfully read and maybe cleaned, or that was generated, during an import)
+-- to the given archive directory, autocreating that if needed, and show informational output on stderr.
+-- The arguments are:
+-- the archive directory,
+-- the rules file (for naming),
+-- the data file name, if any,
+-- the data that was read, cleaned, or generated.
+-- The archive file name will be RULESFILEBASENAME.DATAFILEMODDATEORCURRENTDATE.DATAFILEEXTORCSV.
+-- Note for a data generating command, where there's no data file, we use the current date
+-- and a .csv file extension (meaning "character-separated values" in this case).
+saveToArchive :: DirPath -> FilePath -> Maybe FilePath -> Text -> IO ()
+saveToArchive archivedir rulesfile mdatafile cleandata = do
+  createDirectoryIfMissing True archivedir
+  (_, cleanname) <- archiveFileName rulesfile mdatafile
+  let cleanarchive = archivedir </> cleanname
+  hPutStrLn stderr $ "archiving " <> cleanarchive
+  T.writeFile cleanarchive cleandata
+  maybe (return ()) removeFile mdatafile
+
+-- | Figure out the file names to use when archiving, for the given rules file and the given data file if any.
 -- The second name is for the final (possibly cleaned) data; the first name has ".orig" added,
 -- and is used if both original and cleaned data are being archived. They will be like this:
 -- ("RULESFILEBASENAME.orig.DATAFILEMODDATE.DATAFILEEXT", "RULESFILEBASENAME.DATAFILEMODDATE.DATAFILEEXT")
-archiveFileName :: FilePath -> FilePath -> IO (String, String)
-archiveFileName rulesfile datafile = do
-  moddate <- (show . utctDay) <$> getModificationTime datafile
-  let (base, ext) = (takeBaseName rulesfile, takeExtension datafile)
-  return (
-     base <.> "orig" <.> moddate <.> ext
-    ,base <.> moddate <.> ext
-    )
+archiveFileName :: FilePath -> Maybe FilePath -> IO (String, String)
+archiveFileName rulesfile mdatafile = do
+  let base = takeBaseName rulesfile
+  case mdatafile of
+    Just datafile -> do
+      moddate <- (show . utctDay) <$> getModificationTime datafile
+      let ext = takeExtension datafile
+      return (
+         base <.> "orig" <.> moddate <.> ext
+        ,base <.> moddate <.> ext
+        )
+    Nothing -> do
+      let ext = "csv"
+      curdate <- show <$> getCurrentDay
+      return (
+         base <.> "orig" <.> curdate <.> ext
+        ,base <.> curdate <.> ext
+        )

 -- -- | In the given archive directory, if it exists, find the paths of data files saved for the given rules file.
 -- -- They will be reverse sorted by name, ie newest first, assuming normal archive file names.
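For illustration, assuming a hypothetical acme.rules and a data file Checking1.csv last modified on 2025-06-01, archiveFileName above returns ("acme.orig.2025-06-01.csv","acme.2025-06-01.csv"), and saveToArchive then writes the (possibly cleaned) data to data/acme.2025-06-01.csv and removes the original file. With no data file (a data-generating command), the current date and a ".csv" extension are used instead.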
@@ -3294,23 +3294,54 @@ All this enables a convenient workflow where can you just download CSV files, th

 See also ["Working with CSV > Reading files specified by rule"](#reading-files-specified-by-rule).

-### Data cleaning
+<!--
+The source rule supports ~ for home directory: `source ~/Downloads/foo.csv`.
+
+If the argument is a bare filename, its directory is assumed to be ~/Downloads: `source foo.csv`.
+
+Otherwise if it is a relative path, it is assumed to be relative to the rules file's directory: `source new/foo.csv`.
+
+The source rule can specify a glob pattern: `source foo*.csv`.
+
+If the glob pattern matches multiple files, the newest (last modified) file is used (with one exception, described below).
+
+The source rule can specify a data-cleaning command, after a `|` separator: `source foo*.csv | sed -e 's/USD/$/g'`.
+This command is executed by the user's default shell, receives the data file's content on stdin,
+and should output CSV data suitable for the conversion rules.
+A # character can be used to comment out the data-cleaning command: `source foo*.csv # | ...`.
+
+Or the source rule can specify a data-generating command, with no file pattern: `source | foo-csv.sh`.
+In this case the command receives no input; it should output CSV data suitable for the conversion rules.
+-->
+
+### Data cleaning / generating commands

 After `source`'s file pattern, you can write `|` (pipe) and a data cleaning command.
 If hledger's CSV rules aren't enough, you can pre-process the downloaded data here with a shell command or script, to make it more suitable for conversion.
-The command will be executed by your default shell, will receive the data file's content as standard input,
-and should output zero or more lines of character-separated-values, ready for conversion by the CSV rules.
+The command will be executed by your default shell, in the directory of the rules file, will receive the data file's content as standard input,
+and should output zero or more lines of character-separated-values, suitable for conversion by the CSV rules.
+
+Or, after `source` you can write `|` and a data generating command (with no file pattern before the `|`).
+This command receives no input, and should output zero or more lines of character-separated values, suitable for conversion by the CSV rules.
+
+Whenever hledger runs one of these commands, it will print the command on stderr.
+If the command produces error output, but exits successfully, hledger will show the error output as a warning.
+If the command fails, hledger will fail and show the error output in the error message.

 *Added in 1.50; experimental.*

 ## `archive`

-Adding `archive` to a rules file causes the `import` command
-to archive (move and rename) each imported data file, in a nearby `data/` directory.
-Also, `import` will prefer the oldest of the `source` rule's glob-matched files rather than the newest.
+With `archive` added to a rules file, the `import` command
+will archive each successfully processed data file or data command output in a nearby `data/` directory.
+The archive file name will be based on the rules file and the data file's modification date and extension
+(or for a data-generating command, the current date and the ".csv" extension).
+The original data file, if any, will be removed.
+
+Also, in this mode `import` will prefer the oldest file matched by the `source` rule's glob pattern, not the newest.
 (So if there are multiple downloads, they will be imported and archived oldest first.)

-Archiving imported data is optional, but it can be useful for
+Archiving is optional, but it can be useful for
 troubleshooting your CSV rules,
 regenerating entries with improved rules,
 checking for variations in your bank's CSV,
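To make the documented behaviour concrete, a hypothetical acme.rules using a data-generating command and archiving might contain:

    source | acme-export.sh
    archive

A non-dry-run `hledger import acme.rules` would then run acme-export.sh in the rules file's directory, convert its CSV output, and on success save that output as e.g. `data/acme.2025-06-01.csv` next to the rules file (using the current date, since there is no data file).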
|
|||||||
Loading…
Reference in New Issue
Block a user