Previously LEDGER_FILE=foo hledger add did, but hledger -f foo add didn't.
Now both consistently give an error if given a glob
(a path containing [, {, *, or ?) that matches nothing,
rather than auto-creating a file with a glob-like name.
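
The rule above can be sketched as follows. This is a minimal illustration only, not the actual expandPathOrGlob: the names isGlobLike, PathDecision and decidePath are hypothetical, and the list of matches is assumed to come from some glob expansion done elsewhere.

```haskell
-- Hypothetical sketch; none of these names are from hledger.

-- A path is treated as a glob if it contains any of [ { * ?
isGlobLike :: FilePath -> Bool
isGlobLike = any (`elem` "[{*?")

data PathDecision
  = UseLiteral FilePath   -- not a glob: use the path as given (it may be auto-created later)
  | UseMatches [FilePath] -- a glob that matched at least one file
  | NoMatchError String   -- a glob that matched nothing: report an error, create nothing
  deriving (Eq, Show)

-- Decide what to do with a path, given the files a glob expansion returned for it.
decidePath :: FilePath -> [FilePath] -> PathDecision
decidePath path matches
  | not (isGlobLike path) = UseLiteral path
  | null matches          = NoMatchError ("no files matched the glob pattern: " ++ path)
  | otherwise             = UseMatches matches
```

The point of the design is that only a literal path can ever lead to file creation; a glob-like path either resolves to existing files or fails.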
Hledger.Utils.IO:
expandPathOrGlob
This restores the pre-1.50.3 behaviour of add and import, which once
again auto-create a missing file (specified by -f or LEDGER_FILE or
the builtin default path) rather than giving an error.
This fixes #2514 and refines the fix for [#2485].
There's also an improvement: they no longer create it unconditionally at the start;
they create lazily, when they have data to write.
Hledger.Read:
defaultExistingJournalPath
defaultExistingJournalPathSafely
readPossibleJournalFile
Hledger.Cli.Utils:
withPossibleJournal
Replace O(n log n) re-sorting of all prices on every valuation date
with O(log n) indexed lookups. By pre-building sorted price indexes
once at startup using O(n log n) time, we avoid redundant work
during reports.
This significantly improves performance for --value=end,COMM with daily
reports over long periods and large price databases.
Implementation:
- PriceIndex maps commodity pairs to a Map from date to effective
price, enabling O(log n) temporal lookups via M.lookupLE.
- DefaultValuationIndex provides efficient resolution of destination
commodities using the same temporal logic.
- makePriceGraph is updated to consume these indexes.
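
A minimal sketch of the indexed lookup described above, assuming a much simplified PriceIndex (commodities as strings, rates as Double). The names buildPriceIndex and priceOn are hypothetical; only M.lookupLE is taken from the description. When two prices share a pair and a date, this sketch lets the later list entry win, which is an assumption.

```haskell
import qualified Data.Map.Strict as M
import Data.Time.Calendar (Day)

type Commodity = String

-- Simplified stand-in for the index: (from, to) commodity pair -> date -> rate.
type PriceIndex = M.Map (Commodity, Commodity) (M.Map Day Double)

-- Build the index once, in O(n log n).
-- fromListWith calls the combining function as (f new old), and M.union is
-- left-biased, so a later list entry replaces an earlier one on the same date.
buildPriceIndex :: [(Day, Commodity, Commodity, Double)] -> PriceIndex
buildPriceIndex prices =
  M.fromListWith M.union
    [ ((from, to), M.singleton day rate) | (day, from, to, rate) <- prices ]

-- Find the price in effect on a date: the latest entry dated on or before it.
-- The inner lookup is O(log n) in the number of prices for that pair.
priceOn :: PriceIndex -> Commodity -> Commodity -> Day -> Maybe Double
priceOn idx from to day = do
  byDate <- M.lookup (from, to) idx
  snd <$> M.lookupLE day byDate
```

With the index built once, each valuation date costs two map lookups instead of a re-sort of all prices.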
Signed-off-by: Oleg Bulatov <oleg@bulatov.me>
When including a literal path, don't use the Glob library at all.
Glob seems to read attributes of all files in a directory,
which disturbs build tools like tup which detect dependencies
based on filesystem operations.
Since 1.50.3, canonicalizePath was being called wastefully when
processing journals with many nested include files and/or many matches
for include glob paths. On a slow filesystem, with unusually
many includes, this might have been quite noticeable.
Now we canonicalise each file path just once as it is encountered,
avoiding the wasted IO work.
If transactions on the same date are coming from two files specified
with -f options, we expect them to be displayed in parse order, ie
respecting the order of the -f options. This wasn't always the case;
now it is.
Also, transactions' tindex field is now unique across all files,
where previously it started at 1 in each file. This affects hledger
data generally, not just the aregister command.
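
To illustrate why a globally unique tindex gives a stable display order, here is a small sketch. The Txn record is a hypothetical simplification (only tindex matches a real field name), and numberAcrossFiles and displayOrder are illustrative names, not hledger functions.

```haskell
import Data.List (sortOn)
import Data.Time.Calendar (Day)

-- Hypothetical simplified transaction: just an index, a date and a source file.
data Txn = Txn
  { tindex :: Integer
  , tdate  :: Day
  , tfile  :: FilePath
  } deriving (Eq, Show)

-- Number transactions with one running counter across all files,
-- taking files in the order given (ie the order of the -f options).
numberAcrossFiles :: [(FilePath, [Day])] -> [Txn]
numberAcrossFiles files =
  zipWith (\i (f, d) -> Txn i d f) [1 ..]
    [ (f, d) | (f, ds) <- files, d <- ds ]

-- Sort by date, breaking ties by the global index, ie by parse order.
displayOrder :: [Txn] -> [Txn]
displayOrder = sortOn (\t -> (tdate t, tindex t))
```

If the index restarted at 1 in each file, the tie-break would be ambiguous between files; a single counter avoids that.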
showing the problem include directive (previously the line number was
off by one). Likewise for other IO errors like when resolving ~ and
a home directory can't be found.
1.50* attempted to work around Glob's implicit searching of non-top-level dot dirs.
This was overzealous; it meant that journal's include completely
excluded paths involving a glob and a dot dir or dot file anywhere in the path.
Now, the pre-1.50 behaviour is restored:
`*` and `**` won't implicitly match dot files or top-level dot directories.
They will implicitly search non-top-level dot directories, as before (#Glob#49).
Before 1.50, journal's include directive's handling of glob patterns (*, **, ?, etc.)
had these limitations:
- ** always searched intermediate dot directories
- ** matched only directories, not files
In 1.50-1.50.3, it had different limitations, some unintended:
- it ignored all dot files, dot dirs, and symbolic links to dot dirs,
even when explicitly mentioned in the pattern (unless using --old-glob)
- it showed symbolic links dereferenced, eg in `hledger files` output
Now it has fewer limitations, mainly this:
- it ignores all dot files and dot dirs, even when explicitly mentioned (unless using --old-glob)
Ie it no longer ignores symbolic links to dot dirs, and it no longer shows symbolic links dereferenced.
Also: including the current file is now always harmless, whether using a glob pattern or not.
Internally, file paths in the "include file stack" (jincludefilestack) are now just absolute,
but not canonicalised; showing symbolic links un-dereferenced in output and error messages seems
generally more useful. This might affect output elsewhere also.
(Those paths are still canonicalised on the fly when checking for include cycles,
not so efficiently: each time an include directive is parsed, all the current parent files
and all the new glob-matched include files will be re-canonicalised.
Hopefully this is unnoticeable.)
Avoiding potentially confusing silent fallback. Also,
- Drop support for Ledger's legacy LEDGER environment variable;
we now support only LEDGER_FILE, for simplicity.
- Clarify the behaviour, eg when a glob pattern matches multiple files
or when the value is empty.
Since 1.50, sourceFilePath, which does IO operations, was being called for every item in the journal.
On my machine this was causing a ~40% slowdown,
but it could probably be more, depending on the storage system.
Now it's once again called only once per include directive.
Speed seems slightly better now than 1.43 for some reason
(eg: 13k txns/s -> 8k txns/s -> 14k txns/s).
PeriodData's use of Int keys caused wrong results with periodic
reports involving dates outside the machine-specific limits of Int.
Those were:
64 bits: -25252734927764696-04-22..25252734927768413-06-12
32 bits: -5877752-05-08..5881469-05-27
16 bits: 1769-02-28..1948-08-04
8 bits: 1858-07-12..1859-03-24
32 bits is supported by MicroHS; 16 and 8 bits aren't supported by
any known Haskell implementation, but that could change in future.
For example, on 64 bit machines we got:
25252734927768413-06-12 PeriodData's max date
(expenses) 1
25252734927768414-01-01 next year past PeriodData's max date
(expenses) 2
$ hledger reg -O csv --yearly
"txnidx","date","code","description","account","amount","total"
"0","-25252734927764696-11-10","","","expenses","1","1"
Now it uses Integer (like the time package), fixing the bug.
And benchmarking shows memory and time usage slightly improved
(surprisingly; tested with up to 500 subperiods, eg
hledger -f examples/10ktxns-1kaccts.journal reg -1 cur:A -D >/dev/null)
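
The overflow can be reproduced in isolation. The sketch below assumes the Int key was derived from the date's Modified Julian Day number, which is consistent with the limits listed above but is not stated explicitly; dayToIntKey, intKeyToDay and roundTripsViaInt are hypothetical names.

```haskell
import Data.Time.Calendar (Day (..))

-- Hypothetical fixed-width key: the Modified Julian Day number squeezed into an Int.
-- fromInteger wraps around silently when the value does not fit.
dayToIntKey :: Day -> Int
dayToIntKey = fromInteger . toModifiedJulianDay

-- Convert a key back to a date.
intKeyToDay :: Int -> Day
intKeyToDay = ModifiedJulianDay . toInteger

-- Does a date survive the round trip through an Int key?
roundTripsViaInt :: Day -> Bool
roundTripsViaInt d = intKeyToDay (dayToIntKey d) == d
```

A date one day past the largest representable key wraps to the smallest one, which matches the symptom in the CSV output above; an Integer key has no such limit.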
This allows us to guarantee that the report periods are well-formed and
don't contain errors (e.g. empty spans, spans not contiguous, spans not
a partition).
Note the underlying representation is now for disjoint spans, whereas
previously the end date of a span was equal to the start date of the
next span, and then was adjusted backwards one day when needed.
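
A sketch of the well-formedness conditions named above, assuming a disjoint span is stored as an inclusive first and last day. Span, wellFormed and fromBoundaries are hypothetical names for illustration, not the actual types.

```haskell
import Data.Time.Calendar (Day, addDays)

-- Hypothetical disjoint span: both ends inclusive.
data Span = Span { spanFirst :: Day, spanLast :: Day }
  deriving (Eq, Show)

-- Well-formed: no empty spans, and each span starts the day after the
-- previous one ends (so the spans are contiguous and do not overlap).
wellFormed :: [Span] -> Bool
wellFormed spans =
  all nonEmpty spans && and (zipWith adjacent spans (drop 1 spans))
  where
    nonEmpty (Span a b) = a <= b
    adjacent (Span _ b) (Span a' _) = addDays 1 b == a'

-- Convert the older shared-boundary form (each end equal to the next start)
-- by moving every end date back one day.
fromBoundaries :: [Day] -> [Span]
fromBoundaries ds = zipWith (\a b -> Span a (addDays (-1) b)) ds (drop 1 ds)
```

Storing the adjusted end date once, rather than adjusting it on demand, is what makes a check like wellFormed straightforward to state.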