Previously, hledger could read CSV files containing non-ascii
characters only if they are UTF8-encoded. Now there is a new CSV
rule, encoding ENCODING, which allows reading CSV files with other
encodings.
This adds a dependency on the encoding library, which supports fewer
encodings than text-icu but does not require a third-party C library.
To avoid build issues on various platforms, we require version 0.10+.
This adds some use of the ImplicitParams language extension, required
by encoding's API, but only in a small code region.
This also changes the type of Reader's rReadFn; it now takes
a `Handle` rather than a `Text`, allowing more flexibility.
CSV rules files can now be read directly, eg you have the option of
writing `hledger -f foo.csv.rules CMD`. By default this will read data
from foo.csv in the same directory. But you can also specify a
different data file with a new `source FILE` rule. This has some
convenience features:
- If the data file does not exist, it is treated as empty, not an
error.
- If FILE is a relative path, it is relative to the rules file's
directory. If it is just a file name with no path, it is relative
to ~/Downloads/.
- If FILE is a glob pattern, the most recently modified matched file
is used.
This helps remove some of the busywork of managing CSV downloads.
Most of your financial institutions's default CSV filenames are
different and can be recognised by a glob pattern. So you can put a
rule like `source Checking1*.csv` in foo-checking.csv.rules,
periodically download CSV from Foo's website accepting your browser's
defaults, and then run `hledger import checking.csv.rules` to import
any new transactions. The next time, if you have done no cleanup, your
browser will probably save it as something like Checking1-2.csv, and
hledger will still see that because of the * wild card. You can choose
whether to delete CSVs after import, or keep them for a while as
temporary backups, or archive them somewhere.
Inner empty lines were not being skipped automatically, contrary to
docs. Now all empty lines are skipped automatically, and the `skip`
rule is needed only for non-empty lines, as intended.
This may be a breaking change: it's possible that the `skip` count
might need to be adjusted in some CSV rules files.
Previously, CSV date-times with a different time zone from yours
(with or without explicit timezones in the CSV) could give off-by-one
dates, because the CSV timezone was ignored.
Now,
1. you can use the `timezone` rule to indicate which other
timezone a CSV is implicitly using
2. CSV date-times with a timezone - whether declared by rule or
parsed with %Z - are localised to the system time zone
(or another set with the TZ environment variable).
May also fix#1154, #1033, #708, #536, #73: testing is needed.
This aims to solve all problems where misconfigured locales lead to
parsers failing on utf8-encoded data. This should hopefully avoid
encoding issues, but since it fundamentally alters how encoding is dealt
with it may lead to unexpected outcomes. Widespread testing on a number
of different platforms would be useful.
This increases composability and avoids some ugly case handling. We
re-export runExceptT in Hledger.Read.
The final return types of the following functions has been changed from
IO (Either String a) to ExceptT String IO a. If this causes a problem,
you can get the old behaviour by calling runExceptT on the output:
readJournal, readJournalFiles, readJournalFile
Or, you can use the easy functions readJournal', readJournalFiles', and
readJournalFile', which assume default options and return in the IO
monad.
(SourcePos, SourcePos).
This has been marked for possible removal for a while. We are keeping
strictly more information. Possible edge cases arise with Timeclock and
CsvReader, but I think these are covered.
The particular motivation for getting rid of this is that
GenericSourcePos is creating some awkward import considerations for
little gain. Removing this enables some flattening of the module
dependency tree.
transactions are balanced possibly using explicit prices, but without
inferring any prices. This is included in --strict mode.
Renames check autobalanced to check balancedwithautoconversion.