396 Commits

Author SHA1 Message Date
Simon Michael
ebaabe4305 imp:journal: fix a slight pessimisation of include directives
Since 1.50.3, canonicalizePath was being called wastefully when
processing journals with many nested include files and/or many matches
for include glob paths. On a slow filesystem, with unusually
many includes, this might have been quite noticeable.

Now we canonicalise each file path just once as it is encountered,
avoiding the wasted IO work.
2025-12-10 19:21:59 -10:00
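The "canonicalise each path just once" idea in ebaabe4305 can be illustrated with a small cache. This is a Python sketch, not hledger's Haskell code: the class name `PathCanonicaliser` and the `io_calls` counter are hypothetical, and memoisation is only one way to get the effect the commit describes.

```python
import os


class PathCanonicaliser:
    """Canonicalise each distinct path at most once, caching the result."""

    def __init__(self):
        self._cache = {}
        self.io_calls = 0  # counts real canonicalisation calls, for illustration only

    def canonical(self, path):
        # Key on the absolute, normalised path. Caveat: normalisation collapses
        # "a/.." textually, which is not exact when "a" is a symlink.
        key = os.path.abspath(path)
        if key not in self._cache:
            self.io_calls += 1
            # realpath resolves symlinks and is the potentially slow IO step.
            self._cache[key] = os.path.realpath(key)
        return self._cache[key]
```

Repeated lookups of the same file, as happen with many nested includes or many glob matches, then cost one filesystem resolution each instead of one per encounter.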
Simon Michael
665e2d0a55 fix:journal:include: relative includes from a symlinked file work again [#2503] 2025-12-08 09:36:13 -10:00
Simon Michael
12234e0b7e ;fix:journal: including an unreadable file now shows a clearer error
showing the problem include directive (previously the line number was
off by one). Likewise for other IO errors like when resolving ~ and
a home directory can't be found.
2025-12-05 02:34:08 -10:00
Simon Michael
8cd113389a fix:journal:include: drop 1.50's exclusion of glob-matched dot paths [#2498]
1.50* attempted to work around Glob's implicit searching of non-top-level dot dirs.
This was overzealous; it meant that journal's include completely
excluded paths involving a glob and a dot dir or dot file anywhere in the path.

Now, the pre-1.50 behaviour is restored:
`*` and `**` won't implicitly match dot files or top-level dot directories.
They will implicitly search non-top-level dot directories, as before (Glob#49).
2025-12-04 05:54:54 -10:00
Simon Michael
81744d81a1 fix:journal:include: fix some regressions with glob matching in 1.50-1.50.3
Before 1.50, journal's include directive's handling of glob patterns (*, **, ?, etc.)
had these limitations:

- ** always searched intermediate dot directories
- ** matched only directories, not files

In 1.50-1.50.3, it had different limitations, some unintended:

- it ignored all dot files, dot dirs, and symbolic links to dot dirs,
  even when explicitly mentioned in the pattern (unless using --old-glob)
- it showed symbolic links dereferenced, eg in `hledger files` output

Now it has fewer limitations, mainly this:

- it ignores all dot files and dot dirs, even when explicitly mentioned (unless using --old-glob)

Ie it no longer ignores symbolic links to dot dirs, and it no longer shows symbolic links dereferenced.
Also: including the current file is now always harmless, whether using a glob pattern or not.

Internally, file paths in the "include file stack" (jincludefilestack) are now just absolute,
but not canonicalised; showing symbolic links un-dereferenced in output and error messages seems
generally more useful. This might affect output elsewhere also.
(Those paths are still canonicalised on the fly when checking for include cycles,
not so efficiently: each time an include directive is parsed, all the current parent files
and all the new glob-matched include files will be re-canonicalised.
Hopefully this is unnoticeable.)
2025-12-01 11:28:51 -08:00
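The split described in 81744d81a1, keeping absolute but un-canonicalised paths in the include stack and canonicalising only when checking for cycles, might look like this in outline. It is a Python sketch under assumptions: `check_include` and its error text are hypothetical and are not hledger's implementation.

```python
import os


def check_include(include_stack, new_path):
    """Return the stack extended with new_path, or raise on an include cycle.

    include_stack holds absolute (not canonicalised) paths, so that messages
    can show symbolic links as the user wrote them.
    """
    new_abs = os.path.abspath(new_path)
    # Canonicalise on the fly, only for the comparison. As the commit notes,
    # this repeats work for every parent file on each include directive.
    seen = {os.path.realpath(p) for p in include_stack}
    if os.path.realpath(new_abs) in seen:
        chain = " -> ".join(include_stack + [new_abs])
        raise ValueError("include cycle: " + chain)
    return include_stack + [new_abs]
```

Because the comparison uses resolved paths, a symlink back to a parent file is still detected as a cycle, while the error message keeps the link's own name.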
Simon Michael
00f6a832d4 fix:journal: consistent error message when include has no argument 2025-12-01 11:00:42 -08:00
Simon Michael
356e2ba88a fix:journal: repair 1.50's journal reading slowness [#2493]
Since 1.50, sourceFilePath, which does IO operations, was being called for every item in the journal.
On my machine this was causing a ~40% slowdown,
but probably it could be more depending on storage system.

Now it's once again called only once per include directive.
Speed seems slightly better now than 1.43 for some reason
(eg: 13k txns/s -> 8k txns/s -> 14k txns/s).
2025-11-15 21:22:36 -10:00
Simon Michael
2f007c93d2 dev: switch all qualified imports to ImportQualifiedPost style 2025-09-29 19:28:59 -10:00
Simon Michael
efe1d11edb fix:timeclock: --old-timeclock also affects included files [#2417]
This required changing the Reader type and passing InputOpts down to
journal's include directive parser.
2025-09-01 08:26:44 +01:00
Simon Michael
5a3e34cc55 imp:timeclock: syntax is more robust and featureful
The default timeclock parser (ie when not using --old-timeclock) has
the following changes, related to issues such as
[#2141], [#2365], [#2400], [#2417]:

- semicolon now always starts a comment; timeclock account names can't include semicolons
  (though journal account names still can)
- clock-in and clock-out entries now have different syntax
- clock-ins now require an account name
- clock-outs now can have a comment and tags
- the doc has been rewritten, and now mentions the --old-timeclock flag

- lib: accountnamep and modifiedaccountnamep now take a flag to allow semicolons or not
2025-08-31 10:58:37 +01:00
Simon Michael
bfbef4bcbb dev: refactor PrefixedFilePath 2025-08-14 12:37:11 +01:00
Simon Michael
e69c72a6c7 dev: include: revert wrong error position fix; refactor
Errors in the main file are being reported a few lines too high,
due to the setOffset in includedirectivep.

It seems reverting this should have restored the original bug with
wrong line number in certain include error messages, but I can't find
that right now.
2025-07-28 11:57:46 +01:00
Simon Michael
b7e35f84a2 imp: include: add hidden --old-glob flag to restore old dot behaviour
This disables the workaround for Glob#49, allowing glob patterns to
find dot files and traverse dot directories again (sometimes too much).
2025-07-17 08:00:08 -07:00
Simon Michael
5ec770badd imp: include: more flexible **; show the correct line in read errors 2025-07-16 06:52:19 -07:00
Simon Michael
8215f19baa dev: include: cleanup 2025-07-16 06:52:19 -07:00
Simon Michael
3741f9f030 fix: include: report read failures with correct line number 2025-07-16 06:52:19 -07:00
Simon Michael
2dcfe22c89 imp: include: report ** without / as an error, for clarity 2025-07-16 06:52:19 -07:00
Simon Michael
460ae28826 imp: include: globs exclude current file; more cleanup 2025-07-16 06:52:19 -07:00
Simon Michael
b4a1add267 imp: include: more robust tests and glob pattern handling
This switches from filepattern back to Glob, which is more powerful.
New notes, implementation, workarounds and tests.
2025-07-16 06:52:19 -07:00
Simon Michael
1046f652b1 dev: PrefixedFilePath cleanups
And some helpers that weren't needed after all, but maybe in future
2025-07-16 06:52:19 -07:00
Simon Michael
28f60bcf92 dev: includedirectivep: refactor 2025-07-11 20:34:50 -07:00
Simon Michael
3a03927018 imp: include: show including file path in debug output 2025-07-11 20:12:13 -07:00
Simon Michael
0add2e90db imp: include: glob patterns always exclude the current file
Eg include **/*.journal is less likely to complain
2025-07-11 19:36:17 -07:00
Simon Michael
08017366b5 imp: file reading: demote some debug=6 output to level 7 2025-07-11 13:48:58 -07:00
Simon Michael
536589e2c2 imp: include: improve cycle and read failure error messages 2025-07-11 13:36:47 -07:00
Simon Michael
b71e001c51 imp: include: more robust ** patterns, and ignore dotted directories
** now ignores anything under dotted directories, ie directories whose
name begins with a dot. Eg .git/, foo/.secret/, etc.

Switched from Glob to filepattern lib.
2025-07-11 13:36:47 -07:00
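The rule in b71e001c51, that `**` ignores anything under dotted directories, can be sketched as a directory walk that prunes dot directories. This is a Python illustration only: the function name and the `.journal` suffix filter are assumptions, and it does not reproduce the filepattern or Glob libraries' matching.

```python
import os


def journal_files_skipping_dot_dirs(root, suffix=".journal"):
    """Yield files under root ending in suffix, never descending into dot dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk skips .git/, foo/.secret/, and so on.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place is what stops the traversal; filtering the results afterwards would still pay for walking large trees such as `.git/`.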
Simon Michael
b1f416dee7 dev: parseIncludedFile: doc cleanup 2025-07-11 13:01:54 -07:00
Simon Michael
b7c4dc3b53 fix:journal: cyclic include error messages now show the correct line 2025-07-11 13:00:51 -07:00
Simon Michael
801a7adaa4 imp:include: better errors, eg for missing argument; more debug output 2025-07-11 12:13:52 -07:00
Simon Michael
c8a5b8eb37 dev: includedirectivep: cleanup 2025-07-11 12:13:12 -07:00
Simon Michael
f5d3b7bd38 fix:journal: include directive error messages now show the correct line
They were showing the line after the include directive, confusingly.
2025-07-11 11:55:29 -07:00
Simon Michael
2815a1865f dev: includedirectivep: cleanups, docs 2025-07-11 11:38:50 -07:00
Simon Michael
820a44eb07 imp:lib:Hledger.Utils.Debug: simpler, more consistent dbg* names 2025-05-21 22:54:00 -10:00
Simon Michael
2371f677e5 imp:journal: include directive now allows a same-line comment 2025-04-27 08:30:18 -10:00
Joschua Kesper
5114962b2a feat:csv: add an encoding rule, allowing non-UTF8 CSV to be read [#2319]
Previously, hledger could read CSV files containing non-ascii
characters only if they are UTF8-encoded.  Now there is a new CSV
rule, encoding ENCODING, which allows reading CSV files with other
encodings.

This adds a dependency on the encoding library, which supports fewer
encodings than text-icu but does not require a third-party C library.
To avoid build issues on various platforms, we require version 0.10+.

This adds some use of the ImplicitParams language extension, required
by encoding's API, but only in a small code region.

This also changes the type of Reader's rReadFn; it now takes
a `Handle` rather than a `Text`, allowing more flexibility.
2025-02-15 14:48:30 -10:00
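Why a reader that receives a handle rather than already-decoded text makes an `encoding` rule possible (5114962b2a) can be shown with a short Python sketch. The function name is hypothetical, and the encoding names here are Python codec names, which are not necessarily the names hledger's rule accepts.

```python
import csv
import io


def read_csv_records(handle, encoding="utf-8"):
    """Decode a binary handle with the given encoding and return CSV records."""
    # Decoding happens here, after the encoding is known, which is only
    # possible because the caller passed raw bytes rather than decoded text.
    text = io.TextIOWrapper(handle, encoding=encoding, newline="")
    return list(csv.reader(text))
```

For example, `read_csv_records(open("bank.csv", "rb"), "iso-8859-1")` would read a Latin-1 export; `bank.csv` is a placeholder file name.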
Simon Michael
39f3b2c7ba ;dev: doc 2025-01-31 02:04:40 -10:00
Simon Michael
445e80fd41 dev:clarify: rename jcommodities to jdeclaredcommodities 2024-11-02 15:52:17 -10:00
Simon Michael
5e0a35b1da fix:journal:P directives: require a space after the symbol [#2280]
This prevents surprising parses, like
`P 2024-10-31 a0 1` parsed as `P 2024-10-31 a 01`.
2024-11-02 15:24:25 -10:00
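The surprising parse fixed in 5e0a35b1da can be reproduced with two toy regular expressions. This is a heavily simplified Python sketch, not hledger's parser: it assumes an unquoted symbol without digits and a bare integer quantity whose digits may be grouped with spaces, and the names `LENIENT`, `STRICT` and `parse_price` are hypothetical.

```python
import re

# Lenient: whitespace after the symbol is optional, so "a0 1" splits as "a" + "0 1".
LENIENT = re.compile(r"^P\s+(\S+)\s+([^\d\s]+)\s*(\d[\d ]*)$")
# Strict: at least one whitespace character must follow the symbol.
STRICT = re.compile(r"^P\s+(\S+)\s+([^\d\s]+)\s+(\d[\d ]*)$")


def parse_price(line, pattern):
    """Return (date, symbol, quantity) or None if the line does not match."""
    m = pattern.match(line)
    if m is None:
        return None
    date, symbol, quantity = m.groups()
    # Treat spaces inside the quantity as digit group marks and drop them.
    return date, symbol, quantity.replace(" ", "")
```

With the lenient pattern, `P 2024-10-31 a0 1` yields symbol `a` and quantity `01`; with the strict pattern the same line is rejected, while `P 2024-10-31 a 01` still parses.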
Simon Michael
c66e901d8b dev: save the parse positions of PriceDirectives 2024-11-02 15:00:47 -10:00
Simon Michael
4d38c63ec8 dev: move/rename nullsourcepos 2024-11-02 14:59:43 -10:00
Simon Michael
f5c2ec681c dev: refactor: merge Text.Megaparsec.Custom into Hledger.Utils.Parse 2024-06-25 18:37:54 +01:00
Simon Michael
490a46fcd2 fix: journal: parse include directives with trailing whitespace
[https://github.com/adept/full-fledged-hledger/issues/29]
2024-05-02 07:26:12 -10:00
Simon Michael
5a36362b33 imp:journal: use a symlink's target's directory for relative include paths
When reading a symbolically-linked journal file,
relative paths in include directives are now evaluated
relative to the directory of the real linked file,
not the directory of the symlink.

This also seems to fix an obscure case where stats did not report
absolute included file paths in certain circumstances (stdin, maybe no
terminal..)
2024-02-22 08:48:31 -10:00
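The path rule in 5a36362b33 can be sketched in a few lines of Python. The function name `resolve_include` is hypothetical and this is an illustration of the rule, not hledger's code.

```python
import os


def resolve_include(including_file, include_path):
    """Resolve include_path relative to the real location of including_file."""
    if os.path.isabs(include_path):
        return include_path
    # Follow symlinks first, then take the directory of the real file,
    # not the directory that contains the symlink.
    real_dir = os.path.dirname(os.path.realpath(including_file))
    return os.path.normpath(os.path.join(real_dir, include_path))
```

So if `links/main.journal` is a symlink to `real/main.journal`, a directive `include other.journal` resolves to `real/other.journal`; the directory names are placeholders.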
Michael Rees
d4ecdb3fea imp: Support tsv and ssv prefixes (#2164) 2024-02-08 06:44:44 -10:00
Simon Michael
6ae64c8f3e imp: allow declaring the empty payee name with "" (#2119) 2023-12-07 08:30:55 -10:00
Simon Michael
85845e51b2 dev: AmountStyle: rename, reorder fields more mnemonically
Since this type is about to change anyway.
2023-09-02 06:46:14 +01:00
Simon Michael
029b59093b feat: csv: rules files can be read directly; data file can be specified
CSV rules files can now be read directly, eg you have the option of
writing `hledger -f foo.csv.rules CMD`. By default this will read data
from foo.csv in the same directory.  But you can also specify a
different data file with a new `source FILE` rule. This has some
convenience features:

- If the data file does not exist, it is treated as empty, not an
  error.

- If FILE is a relative path, it is relative to the rules file's
  directory. If it is just a file name with no path, it is relative
  to ~/Downloads/.

- If FILE is a glob pattern, the most recently modified matched file
  is used.

This helps remove some of the busywork of managing CSV downloads.
Most of your financial institutions' default CSV filenames are
different and can be recognised by a glob pattern.  So you can put a
rule like `source Checking1*.csv` in foo-checking.csv.rules,
periodically download CSV from Foo's website accepting your browser's
defaults, and then run `hledger import foo-checking.csv.rules` to import
any new transactions. The next time, if you have done no cleanup, your
browser will probably save it as something like Checking1-2.csv, and
hledger will still see that because of the * wild card. You can choose
whether to delete CSVs after import, or keep them for a while as
temporary backups, or archive them somewhere.
2023-05-19 09:09:21 -10:00
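The lookup rules listed in 029b59093b for `source FILE` can be sketched as one function. This is a Python illustration under assumptions: `resolve_source` and its `downloads_dir` parameter are hypothetical, absolute and `~` paths are handled in the obvious way although the message does not mention them, and Python's `glob` stands in for whatever matching hledger uses.

```python
import glob
import os


def resolve_source(rules_file, source, downloads_dir=None):
    """Return the data file path for a `source` rule, or None if nothing matches."""
    if downloads_dir is None:
        downloads_dir = os.path.expanduser("~/Downloads")
    source = os.path.expanduser(source)
    if os.path.isabs(source):
        pattern = source
    elif os.path.dirname(source) == "":
        # A bare file name is looked up in the downloads directory.
        pattern = os.path.join(downloads_dir, source)
    else:
        # Any other relative path is relative to the rules file's directory.
        rules_dir = os.path.dirname(os.path.abspath(rules_file))
        pattern = os.path.join(rules_dir, source)
    matches = glob.glob(pattern)
    if not matches:
        return None  # the caller treats a missing data file as empty, not an error
    # With several glob matches, use the most recently modified file.
    return max(matches, key=os.path.getmtime)
```

With `source Checking1*.csv`, both `Checking1.csv` and a later `Checking1-2.csv` match, and the newer download is the one returned.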
Simon Michael
fa70f160ae imp: partial/inferred dates are flexible, full dates are not (#1982)
DateSpans are now aware of exact/flexible dates.
2023-02-17 07:24:19 -10:00
Simon Michael
5537a251f3 imp: journal: periodic txns need not start on an interval boundary
Eg, ~ monthly from 1/15 now works, instead of giving an error message.
2023-02-17 07:24:19 -10:00
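What "need not start on an interval boundary" means for a rule like `~ monthly from 1/15` (5537a251f3) can be shown by generating the occurrences. This Python sketch uses a hypothetical `monthly_from` helper with an explicit start date, and it deliberately refuses days after the 28th, since the commit message does not say how short months are handled.

```python
from datetime import date


def monthly_from(start, count):
    """Return `count` monthly dates beginning exactly at `start`."""
    if start.day > 28:
        # Out of scope for this sketch: behaviour in short months is not specified.
        raise ValueError("this sketch only handles days 1 to 28")
    dates = []
    year, month = start.year, start.month
    for _ in range(count):
        # Each occurrence keeps the start day instead of snapping to the 1st.
        dates.append(date(year, month, start.day))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return dates
```

Starting on the 15th therefore gives the 15th of each following month, rather than an error about the start not being the first of a month.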
Simon Michael
7a9b0fd94c feat: check: the tags check checks tag names 2023-02-16 11:56:22 -10:00