Commit Graph

274 Commits

Author SHA1 Message Date
Joschua Kesper
5114962b2a feat:csv: add an encoding rule, allowing non-UTF8 CSV to be read [#2319]
Previously, hledger could read CSV files containing non-ascii
characters only if they are UTF8-encoded.  Now there is a new CSV
rule, encoding ENCODING, which allows reading CSV files with other
encodings.

This adds a dependency on the encoding library, which supports fewer
encodings than text-icu but does not require a third-party C library.
To avoid build issues on various platforms, we require version 0.10+.

This adds some use of the ImplicitParams language extension, required
by encoding's API, but only in a small code region.

This also changes the type of Reader's rReadFn; it now takes
a `Handle` rather than a `Text`, allowing more flexibility.
2025-02-15 14:48:30 -10:00
Michael Rees
d4ecdb3fea imp: Support tsv and ssv prefixes (#2164) 2024-02-08 06:44:44 -10:00
Simon Michael
029b59093b feat: csv: rules files can be read directly; data file can be specified
CSV rules files can now be read directly, eg you have the option of
writing `hledger -f foo.csv.rules CMD`. By default this will read data
from foo.csv in the same directory.  But you can also specify a
different data file with a new `source FILE` rule. This has some
convenience features:

- If the data file does not exist, it is treated as empty, not an
  error.

- If FILE is a relative path, it is relative to the rules file's
  directory. If it is just a file name with no path, it is relative
  to ~/Downloads/.

- If FILE is a glob pattern, the most recently modified matched file
  is used.

This helps remove some of the busywork of managing CSV downloads.
Most of your financial institutions's default CSV filenames are
different and can be recognised by a glob pattern.  So you can put a
rule like `source Checking1*.csv` in foo-checking.csv.rules,
periodically download CSV from Foo's website accepting your browser's
defaults, and then run `hledger import checking.csv.rules` to import
any new transactions. The next time, if you have done no cleanup, your
browser will probably save it as something like Checking1-2.csv, and
hledger will still see that because of the * wild card. You can choose
whether to delete CSVs after import, or keep them for a while as
temporary backups, or archive them somewhere.
2023-05-19 09:09:21 -10:00
Simon Michael
577e4b6347 fix!: csv: skip now counts non-blank lines more robustly (fix #2024)
Inner empty lines were not being skipped automatically, contrary to
docs. Now all empty lines are skipped automatically, and the `skip`
rule is needed only for non-empty lines, as intended.

This may be a breaking change: it's possible that the `skip` count
might need to be adjusted in some CSV rules files.
2023-05-11 17:06:12 -10:00
Simon Michael
69be1d4ef7 ;dev: csv: refactor, clarify 2023-05-11 15:44:56 -10:00
Simon Michael
70d4c0c638 ;dev: csv: refactor, clarify 2023-05-11 15:35:05 -10:00
Simon Michael
755c3d3dbb ;dev: csv: refactor 2023-05-11 15:34:31 -10:00
Simon Michael
c790aa6145 ;dev: lib: also build with GHC 9.6.1; add base-compat 2023-03-14 10:42:48 -10:00
Simon Michael
dfebf3174c imp: csv: check assigned account names are valid (parseable) (#1978) 2023-01-11 21:42:47 -10:00
Simon Michael
a9b63bb694 fix: csv: skip header lines before attempting to parse records (#1967) 2022-12-27 12:21:20 -10:00
Simon Michael
ace185f7d2 ;doc: update old manual links 2022-12-10 18:56:47 -10:00
Simon Michael
b50d60cfea ;doc: csv, timeclock, timedot: clarify comment lines (#1953) 2022-12-06 10:38:50 -10:00
Simon Michael
7fd25809e8 dev: fix customErrorBundlePretty import warnings 2022-10-07 07:43:28 -10:00
Simon Michael
01387548e7 feat: csv: intra-day-reversed compensates when days' txns are reversed
As in eg vanguard CSV.
2022-10-06 22:21:55 -10:00
Simon Michael
15b2e7d586 fix: csv: ignore extra whitespace in account rule when detecting virtual postings
Reported by CruxOfTheB in chat.
2022-10-03 07:50:23 -10:00
Simon Michael
3b24d9465b imp: csv: new timezone rule; convert zoned date-times to local dates (#1936)
Previously, CSV date-times with a different time zone from yours
(with or without explicit timezones in the CSV) could give off-by-one
dates, because the CSV timezone was ignored.

Now,

1. you can use the `timezone` rule to indicate which other
   timezone a CSV is implicitly using

2. CSV date-times with a timezone - whether declared by rule or
   parsed with %Z - are localised to the system time zone
   (or another set with the TZ environment variable).
2022-10-01 14:50:35 -10:00
Simon Michael
c80c72d7cd dev: lib, cli, bin: enable/fix name shadowing warnings
And a few other cleanups.
2022-08-23 12:16:15 +01:00
Simon Michael
147856e3bb imp: errors: timeclock, csv error improvements 2022-07-23 02:35:52 +01:00
Stephen Morgan
9155d679fe fix!: Revert "fix!: utf-8: Use with-utf8 to ensure all files are read and written with utf8 encoding. (#1619)"
This reverts commit e233f001c5.

This would break at least some people's workflow. A lighter touch is
probably sufficient.
2022-06-01 09:35:18 +10:00
Simon Michael
65e913b7c5
Merge pull request #1834 from Xitian9/utf8
Use with-utf8 and don't use Data.Text.IO.
2022-05-21 17:50:59 -10:00
Stephen Morgan
e233f001c5 fix!: utf-8: Use with-utf8 to ensure all files are read and written with utf8 encoding. (#1619)
May also fix #1154, #1033, #708, #536, #73: testing is needed.

This aims to solve all problems where misconfigured locales lead to
parsers failing on utf8-encoded data. This should hopefully avoid
encoding issues, but since it fundamentally alters how encoding is dealt
with it may lead to unexpected outcomes. Widespread testing on a number
of different platforms would be useful.
2022-05-22 13:12:19 +10:00
Stephen Morgan
15a5d5d38b
Merge pull request #1814 from Xitian9/csverror
imp: csv: Give an error if unable to substitute csv templates. (#1803)
2022-05-22 11:35:39 +10:00
Simon Michael
2f28e1b0a7 ref: rename CustomErr -> HledgerParseErrorData
Verbose, but use every chance to clarify the complicated parse error
situation.
2022-04-25 02:56:59 -10:00
Simon Michael
53332ee6a5 stack: re-enable hledger-web with ghc 9.2 2022-04-15 15:07:17 -10:00
Stephen Morgan
c48d98c515 imp: csv: Substitute empty string if csv template fails. (#1803) 2022-03-29 18:03:33 +11:00
Stephen Morgan
603b2e9f09 ref: Use ExceptT String IO a instead of IO (Either String a).
This increases composability and avoids some ugly case handling. We
re-export runExceptT in Hledger.Read.

The final return types of the following functions has been changed from
IO (Either String a) to ExceptT String IO a. If this causes a problem,
you can get the old behaviour by calling runExceptT on the output:
readJournal, readJournalFiles, readJournalFile

Or, you can use the easy functions readJournal', readJournalFiles', and
readJournalFile', which assume default options and return in the IO
monad.
2022-03-25 14:23:27 -10:00
Stephen Morgan
2f47ae05c6 fix: csv: Allow unicode in field references for csv. (#1809) 2022-02-06 14:16:17 -10:00
Simon Michael
0d83bdf6d7 cln: csv: small rename 2021-12-08 16:57:53 -10:00
Stephen Morgan
e35d0b7865 fix: csv: Successfully parse empty csv file. (#1183) 2021-11-18 20:50:02 -10:00
Stephen Morgan
87a7a586d4 fix: csv: Handle multiple zero amounts in postings in csv files. (#1733) 2021-11-18 20:48:55 -10:00
Stephen Morgan
1bc04685b7 pkg: Drop base-compat-batteries dependency.
Our supported stackage versions are now new enough that we don't need
any of the compatibility features anymore.
2021-10-31 07:56:07 -10:00
Stephen Morgan
4cfd3cb590 lib!: Remove GenericSourcePos, and replace it with either SourcePos or
(SourcePos, SourcePos).

This has been marked for possible removal for a while. We are keeping
strictly more information. Possible edge cases arise with Timeclock and
CsvReader, but I think these are covered.

The particular motivation for getting rid of this is that
GenericSourcePos is creating some awkward import considerations for
little gain. Removing this enables some flattening of the module
dependency tree.
2021-09-20 08:38:33 -10:00
Malte Brandy
e31eb58ada lib: Allow multiline comments in csv rules 2021-09-18 12:43:49 -10:00
Simon Michael
5485990cac fix: csv: report correct CSV line number in errors
Some errors in CSV conversion, such as a failing balance assertion,
were always being reported as line 2.
Reported by Lawrence Wu.
2021-09-01 06:58:15 -10:00
Stephen Morgan
8274da81fc cln: tests: Remove test and tests, which are just aliases for testCase
and testGroup.

Replacing these removes a layer of indirection, and reduces the need to
depend on Hledger.Utils.Test.
2021-08-30 16:32:19 -10:00
Stephen Morgan
d248aec313 cln: hlint: Remove eta reduce warnings. 2021-08-27 06:13:56 -10:00
Stephen Morgan
32dad455fd cln: hlint: Clean up section related warnings. 2021-08-27 06:13:56 -10:00
Stephen Morgan
8bf7c95697 cln: hlint: Clean up Functor related hlint warnings, and NOINLINE warning. 2021-08-27 06:13:56 -10:00
Stephen Morgan
1c211f8ab8 cln: hlint: Fix redundant return warning. 2021-08-26 21:00:35 -10:00
Arjen Langebaerd
3426030a91 feat: added commodity style commandline option 2021-08-17 22:05:29 -10:00
Simon Michael
b81f8f768d ;csv: amount-setting notes, doc improvements from reddit discussion
https://www.reddit.com/r/plaintextaccounting/comments/nxu1ss/hledger_parsing_csv_with_negative_amount_in_debit/
2021-06-11 16:30:43 -10:00
Stephen Morgan
0f1837816d lib,cli,ui,web: Add check balancednoautoconversion command, which checks that
transactions are balanced possibly using explicit prices, but without
inferring any prices. This is included in --strict mode.

Renames check autobalanced to check balancedwithautoconversion.
2021-06-07 18:58:58 -10:00
Stephen Morgan
68e975adf1 lib,cli,ui,web: Remove unused LANGUAGE pragmas. 2021-06-07 17:33:54 -10:00
Eric Mertens
48d558fc7a Tolerate spaces in amount fields in CSV files 2021-03-26 16:39:24 -07:00
Stephen Morgan
4cb9dfb5b8 lib: Properly escape quotes in csv output. 2021-03-25 09:41:42 -07:00
Stephen Morgan
d6a4310d8f lib,cli,ui,bin: Eliminate all uses of Mixed outside of Hledger.Data.Amount.
Exceptions are for dealing with the pamount field, which is really just
dealing with an unnormalised list of amounts.

This creates an API for dealing with MixedAmount, so we never have to
access the internals outside of Hledger.Data.Amount.

Also remove a comment, since it looks like #1207 has been resolved.
2021-03-18 09:47:59 +11:00
Stephen Morgan
dabb3ef82e lib,cli,ui,bin: Create a new API for MixedAmount arithmetic. This should
supplant the old interface, which relied on the Num typeclass.

MixedAmount did not have a very good Num instance. The only functions
which were defined were fromInteger, (+), and negate. Furthermore, it
was not law-abiding, as 0 + a /= a in general. Replacements for used
functions are:
0 -> nullmixedamt / mempty
(+) -> maPlus / (<>)
(-) -> maMinus
negate -> maNegate
sum -> maSum
sumStrict -> maSum

Also creates some new constructors for MixedAmount:
mixedAmount :: Amount -> MixedAmount
maAddAmount :: MixedAmount -> Amount -> MixedAmount
maAddAmounts :: MixedAmount -> [Amount] -> MixedAmount

Add Semigroup and Monoid instances for MixedAmount.
Ideally we would remove the Num instance entirely.

The only change needed have nullmixedamt/mempty substitute for
0 without problems was to not squash prices in
mixedAmount(Looks|Is)Zero. This is correct behaviour in any case.
2021-03-18 09:47:21 +11:00
Stephen Morgan
b203822cd1 lib: Make sure to add a newline to the end of aregister report. 2021-01-10 20:50:46 -08:00
Simon Michael
c21b666130 csv: handle more sign variations, eg a sign by itself
simplifySign now covers a few more sign combinations that might arise.
And in particular, it strips a standalone sign with no number,
which simplifies sign flipping with amount-in/amount-out.
2021-01-07 10:06:38 -08:00
Stephen Morgan
7d3cf1747a lib: Make consistent naming scheme for showMixedAmount* functions,
add conversion between old API and new API in the documentation.
2021-01-02 15:08:09 +11:00