Commit Graph

53 Commits

Author SHA1 Message Date
Stephen Morgan
8968733630 cln!: Clean up journal parsing.
parseAndFinaliseJournal' has been removed. In the unlikely event you
needed it in your code, you can replace it with:
parseAndFinaliseJournal' parser iopts fp t =>
initialiseAndParseJournal parser iopts fp t
>>= liftEither . journalApplyAliases (aliasesFromOpts iopts)
>>= journalFinalise iopts fp t

Some parsers have been generalised from JournalParser to TextParser.
2022-03-26 15:35:19 -10:00
Simon Michael
41bde20095 timedot: a D default commodity (and style) is applied to timedot aounts
This means they can be priced and converted.
2021-11-05 23:34:46 -10:00
Stephen Morgan
1bc04685b7 pkg: Drop base-compat-batteries dependency.
Our supported stackage versions are now new enough that we don't need
any of the compatibility features anymore.
2021-10-31 07:56:07 -10:00
Stephen Morgan
4cfd3cb590 lib!: Remove GenericSourcePos, and replace it with either SourcePos or
(SourcePos, SourcePos).

This has been marked for possible removal for a while. We are keeping
strictly more information. Possible edge cases arise with Timeclock and
CsvReader, but I think these are covered.

The particular motivation for getting rid of this is that
GenericSourcePos is creating some awkward import considerations for
little gain. Removing this enables some flattening of the module
dependency tree.
2021-09-20 08:38:33 -10:00
Simon Michael
5c18fb289f ;dev: configure hlint, silence all current warnings 2021-08-11 14:51:46 -10:00
Stephen Morgan
5e7b69356f lib: Change internal representation of MixedAmount to use a strict Map
instead of a list of Amounts. No longer export Mixed constructor, to
keep API clean (if you really need it, you can import it directly from
Hledger.Data.Types). We also ensure the JSON representation of
MixedAmount doesn't change: it is stored as a normalised list of
Amounts.

This commit improves performance. Here are some indicative results.

hledger reg -f examples/10000x1000x10.journal
- Maximum residency decreases from 65MB to 60MB (8% decrease)
- Total memory in use decreases from 178MiB to 157MiB (12% decrease)

hledger reg -f examples/10000x10000x10.journal
- Maximum residency decreases from 69MB to 60MB (13% decrease)
- Total memory in use decreases from 198MiB to 153MiB (23% decrease)

hledger bal -f examples/10000x1000x10.journal
- Total heap usage decreases from 6.4GB to 6.0GB (6% decrease)
- Total memory in use decreases from 178MiB to 153MiB (14% decrease)

hledger bal -f examples/10000x10000x10.journal
- Total heap usage decreases from 7.3GB to 6.9GB (5% decrease)
- Total memory in use decreases from 196MiB to 185MiB (5% decrease)

hledger bal -M -f examples/10000x1000x10.journal
- Total heap usage decreases from 16.8GB to 10.6GB (47% decrease)
- Total time decreases from 14.3s to 12.0s (16% decrease)

hledger bal -M -f examples/10000x10000x10.journal
- Total heap usage decreases from 108GB to 48GB (56% decrease)
- Total time decreases from 62s to 41s (33% decrease)

If you never directly use the constructor Mixed or pattern match against
it then you don't need to make any changes. If you do, then do the
following:

- If you really care about the individual Amounts and never normalise
  your MixedAmount (for example, just storing `Mixed amts` and then
  extracting `amts` as a pattern match, then use should switch to using
  [Amount]. This should just involve removing the `Mixed` constructor.
- If you ever call `mixed`, `normaliseMixedAmount`, or do any sort of
  amount arithmetic (+), (-), then you should replace the constructor
  `Mixed` with the function `mixed`. To extract the list of Amounts, use
  the function `amounts`.
- If you ever call `normaliseMixedAmountSquashPricesForDisplay`, you can
  replace that with `mixedAmountStripPrices`. (N.B. this does something
  slightly different from `normaliseMixedAmountSquashPricesForDisplay`,
  but I don't think there's any use case for squashing prices and then
  keeping the first of the squashed prices around. If you disagree let
  me know.)
- Any remaining calls to `normaliseMixedAmount` can be removed, as that
  is now the identity function.
2021-05-01 09:45:29 -10:00
Simon Michael
d865ec5d65 lib: refactor: more consistent amount precision helpers
Hledger.Data.Amount:
renamed:
setAmountPrecision -> amountSetPrecision
setFullPrecision -> amountSetFullPrecision
setMixedAmountPrecision -> mixedAmountSetPrecision
added:
mixedAmountSetFullPrecision
2021-02-05 16:09:49 -08:00
Stephen Morgan
f6fa76bba7 lib,cli: Get rid of magic values for asprecision, use a sum type instead. 2020-08-30 23:00:35 +10:00
Stephen Morgan
ca2e55c954 lib: Replace some fromIntegral with toInteger. 2020-08-30 22:20:58 +10:00
Stephen Morgan
081ee390ab lib: Change skipMany spacenonewline to takeWhileP Nothing isNonNewlineSpace. 2020-07-22 14:58:53 -07:00
Simon Michael
9868d7f20d ;lib: update emacs code-folding config
orgstruct-mode was dropped from org 9.2, and I shouldn't have been
forcing it on anyway.

The new config allows its "replacement", outshine-mode, to do similar
code folding when you press tab on any of the lines matching
outline-regexp. But only if you patch it as mentioned at
https://github.com/alphapapa/outshine/issues/77.
Enable it by, eg: (add-hook 'haskell-mode-hook 'outshine-mode)
2020-03-28 17:09:47 -07:00
Simon Michael
374be00223 ;lib: fix org headings and doctest setup that were breaking haddock
(and in some cases, installation).
[ci skip]
2020-03-01 22:00:39 -08:00
Simon Michael
07258d727f ;timedot: parsing fixes; allow blank lines/comments within days 2020-03-01 14:06:29 -08:00
Simon Michael
b9954bff60 journal, lib: the include directive no longer guesses the format
The include directive now tries just one reader, based on the file
extension and defaulting to journal, like the rest of hledger.
(It doesn't yet handle a reader prefix.)

Reader-finding utilities have moved from Hledger.Read to
Hledger.Read.JournalReader so the include directive can use them.

Reader changes:
- rExperimental flag removed
- old rParser renamed to rReadFn
- new rParser field provides the actual parser.
  This seems to require making Reader a higher-kinded type, unfortunately.
2020-03-01 14:06:29 -08:00
Simon Michael
32eb839eac timedot: rewrite the parser, making it more usable
Now, org headlines before the first day entry are ignored,
regardless of content.

Note, blank lines inside a day entry are not allowed, currently.

It's now easier to be both valid journal and valid timedot at the same
time, so guessing the format of stdin is unreliable, and some tests
are failing. See following commit.
2020-03-01 14:06:29 -08:00
Simon Michael
26c19c65b0 timedot: allow a note after the date, use as transaction descriptions 2020-03-01 14:06:29 -08:00
Simon Michael
190233b576 timedot: more org support: dates/entries can be org headlines
Org headline prefixes (stars and space at beginning of line) are ignored.
2020-03-01 14:06:29 -08:00
Simon Michael
3dce879731 ;timedot: fix accidentally committed debug output breaking CI 2020-02-29 11:39:16 -08:00
Simon Michael
1bb33be54d ;lib: Hledger.Read.TimedotReader cleanup 2020-02-27 22:49:53 -08:00
Simon Michael
8535939f33 ;timedot: update parser tracing 2020-02-27 18:11:07 -08:00
Caleb Maclennan
11d9e5eb6a code: Strip extraneous trailing whitespace from Haskell sources 2019-07-15 16:40:49 +01:00
Alex Chen
3d2584d869 lib: switch to megaparsec 7 2018-09-30 20:15:12 -06:00
Alex Chen
855a8f1985 lib: Re-implement the 'ExceptT' layer of the parser
We previously had another parser type, 'type ErroringJournalParser =
ExceptT String ...' for throwing parse errors without the possibility of
backtracking. This parser type was removed under the assumption that it
would be possible to write our parser without this capability. However,
after a hairy backtracking bug, we would now prefer to have the option
to prevent backtracking.

- Define a 'FinalParseError' type specifically for the 'ExceptT' layer
- Any parse error can be raised as a "final" parse error
- Tracks the stack of include files for parser errors, anticipating the
  removal of the tracking of stacks of include files in megaparsec 7
  - Although a stack of include files is also tracked in the 'StateT
    Journal' layer of the parser, it seems easier to guarantee correct
    error messages in the 'ExceptT FinalParserError' layer
  - This does not make the 'StateT Journal' stack redundant because the
    'ExceptT FinalParseError' stack cannot be used to detect cycles of
    include files
2018-09-29 22:33:34 -06:00
Simon Michael
cd67f8ea68 tests: clear out old boilerplate 2018-08-31 18:12:17 -07:00
Simon Michael
d778a92561 tests: export HUnit/EasyTest from Hledger.Utils.Test; more helpers 2018-08-18 15:19:59 +01:00
Simon Michael
d5430e7ddf clean up debug helpers (api change) 2018-07-16 15:28:58 +01:00
Simon Michael
0ce9c5728a switch to base-compat-batteries to fix ghc 7.10 support (#794)
base-compat-batteries provides the same API across more ghc versions
than base-compat does, at the cost of more dependencies. Eg it exports
Prelude.Compat ((<>)) with ghc 7.10/base 4.8, which we expect.
My belief is that several of our deps already require it so the added
cost is not too great. We should probably go back to base-compat when
possible though, eg when we stop supporting ghc 7.10.
2018-06-04 17:32:42 -07:00
Peter Simons
6db7f800ee hledger-lib: fix doctest suite after recent package updates
The new version of our package set apparently contains both base-compat and
base-compat-batteries in its transitive closure. This breaks the doctest suite,
which just imports everything into scope when the tests are run, thereby making
module names like Prelude.Compat ambiguous.
2018-06-04 21:41:15 +02:00
Alex Chen
b245ec7b3d lib: remove the megaparsec compatability module 2018-05-22 12:16:46 -07:00
Alex Chen
09fd8132b7 lib: refactor: weaken types of comment parsers 2018-05-17 18:15:06 -07:00
Dmitry Astapov
ecf49b1e4b lib: auto postings generated before amount inference and balance checks (#729) 2018-04-17 14:33:32 -07:00
Moritz Kiefer
d7b68fbd7d Use skipMany/skipSome for parsing spacenonewline
This avoids allocating the list of space characters only to then
discard it.
2018-03-25 22:59:05 +01:00
Mykola Orliuk
b7dbe044b0 journal: use decimal sep hint for amount parser
Make use of commodity format directive as a hint for parsing amount.

Kinda resolves simonmichael/hledger#487
2017-11-27 15:47:56 -08:00
Simon Michael
580ad88dca timedot: fix parsing of month quantities (Nmo)
[ci skip]
2017-09-26 15:11:37 -10:00
Simon Michael
1ebf1fec28 timedot: also provide syntax for seconds, days, weeks, months & years 2017-08-21 17:28:57 -07:00
Simon Michael
5cdb60b69b timedot: allow minutes to be logged as Nm 2017-08-20 13:00:29 -07:00
Simon Michael
d7d5f8a064 add support for megaparsec 6 (fixes #594)
Older megaparsec is still supported.
Also cleans up our custom parser types,
and some text (un)packing is done in different places
(possible performance impact).
2017-07-27 19:20:46 -07:00
Johannes Gerer
74502f7e50 more general parser types enabling reuse outside of IO (#439) 2016-12-09 15:57:17 -08:00
Simon Michael
1f2276c100 lib: mark ledger reader as experimental, don't use automatically 2016-11-20 10:42:12 -08:00
Simon Michael
b6ff170688 lib: simplify format detection, avoid ledger reader by default
When we don't know a file's format, instead of choosing a subset of
readers based on content sniffing, now we just try them all.
Also, LedgerReader is now used only as a last resort,
as it's not yet competitive with JournalReader.
2016-11-18 13:24:57 -08:00
Simon Michael
3ddc9d7432 lib: clarify file format detectors 2016-11-16 13:25:33 -08:00
Moritz Kiefer
4141067428 Replace Parsec with Megaparsec (see #289) (#366)
* Replace Parsec with Megaparsec (see #289)

This builds upon PR #289 by @rasendubi

* Revert renaming of parseWithState to parseWithCtx

* Fix doctests

* Update for Megaparsec 5

* Specialize parser to improve performance

* Pretty print errors

* Swap StateT and ParsecT

This is necessary to get the correct backtracking behavior, i.e. discard
state changes if the parsing fails.
2016-07-29 08:57:10 -07:00
Simon Michael
770dcee742 lib: textification: comments and tags
No change.

hledger -f data/100x100x10.journal stats
<<ghc: 42859576 bytes, 84 GCs, 193781/269984 avg/max bytes residency (3 samples), 2M in use, 0.000 INIT (0.001 elapsed), 0.016 MUT (0.020 elapsed), 0.009 GC (0.011 elapsed) :ghc>>
<<ghc: 42859576 bytes, 84 GCs, 193781/269984 avg/max bytes residency (3 samples), 2M in use, 0.000 INIT (0.001 elapsed), 0.015 MUT (0.018 elapsed), 0.009 GC (0.013 elapsed) :ghc>>

hledger -f data/1000x1000x10.journal stats
<<ghc: 349576344 bytes, 681 GCs, 1407388/4091680 avg/max bytes residency (7 samples), 11M in use, 0.000 INIT (0.000 elapsed), 0.124 MUT (0.130 elapsed), 0.047 GC (0.055 elapsed) :ghc>>
<<ghc: 349576280 bytes, 681 GCs, 1407388/4091680 avg/max bytes residency (7 samples), 11M in use, 0.000 INIT (0.000 elapsed), 0.126 MUT (0.132 elapsed), 0.049 GC (0.058 elapsed) :ghc>>

hledger -f data/10000x1000x10.journal stats
<<ghc: 3424030664 bytes, 6658 GCs, 11403359/41071624 avg/max bytes residency (11 samples), 111M in use, 0.000 INIT (0.000 elapsed), 1.207 MUT (1.228 elapsed), 0.473 GC (0.528 elapsed) :ghc>>
<<ghc: 3424030760 bytes, 6658 GCs, 11403874/41077288 avg/max bytes residency (11 samples), 111M in use, 0.000 INIT (0.002 elapsed), 1.234 MUT (1.256 elapsed), 0.470 GC (0.520 elapsed) :ghc>>

hledger -f data/100000x1000x10.journal stats
<<ghc: 34306547448 bytes, 66727 GCs, 76805504/414629288 avg/max bytes residency (14 samples), 1009M in use, 0.000 INIT (0.003 elapsed), 12.615 MUT (12.813 elapsed), 4.656 GC (5.291 elapsed) :ghc>>
<<ghc: 34306547320 bytes, 66727 GCs, 76805504/414629288 avg/max bytes residency (14 samples), 1009M in use, 0.000 INIT (0.009 elapsed), 12.802 MUT (13.065 elapsed), 4.774 GC (5.441 elapsed) :ghc>>
2016-05-24 19:00:57 -07:00
Simon Michael
c89c33b36e lib: textification: parse stream
10% more allocation, but 35% lower maximum residency, and slightly quicker.

hledger -f data/100x100x10.journal stats
<<ghc: 39327768 bytes, 77 GCs, 196834/269496 avg/max bytes residency (3 samples), 2M in use, 0.000 INIT (0.010 elapsed), 0.020 MUT (0.092 elapsed), 0.014 GC (0.119 elapsed) :ghc>>
<<ghc: 42842136 bytes, 84 GCs, 194010/270912 avg/max bytes residency (3 samples), 2M in use, 0.000 INIT (0.009 elapsed), 0.016 MUT (0.029 elapsed), 0.012 GC (0.120 elapsed) :ghc>>

hledger -f data/1000x1000x10.journal stats
<<ghc: 314291440 bytes, 612 GCs, 2070776/6628048 avg/max bytes residency (7 samples), 16M in use, 0.000 INIT (0.000 elapsed), 0.128 MUT (0.144 elapsed), 0.059 GC (0.070 elapsed) :ghc>>
<<ghc: 349558872 bytes, 681 GCs, 1397597/4106384 avg/max bytes residency (7 samples), 11M in use, 0.000 INIT (0.004 elapsed), 0.124 MUT (0.133 elapsed), 0.047 GC (0.053 elapsed) :ghc>>

hledger -f data/10000x1000x10.journal stats
<<ghc: 3070026824 bytes, 5973 GCs, 12698030/62951784 avg/max bytes residency (10 samples), 124M in use, 0.000 INIT (0.002 elapsed), 1.268 MUT (1.354 elapsed), 0.514 GC (0.587 elapsed) :ghc>>
<<ghc: 3424013128 bytes, 6658 GCs, 11405501/41071624 avg/max bytes residency (11 samples), 111M in use, 0.000 INIT (0.001 elapsed), 1.343 MUT (1.406 elapsed), 0.511 GC (0.573 elapsed) :ghc>>

hledger -f data/100000x1000x10.journal stats
<<ghc: 30753387392 bytes, 59811 GCs, 117615462/666703600 avg/max bytes residency (14 samples), 1588M in use, 0.000 INIT (0.000 elapsed), 12.068 MUT (12.238 elapsed), 6.015 GC (7.190 elapsed) :ghc>>
<<ghc: 34306530696 bytes, 66727 GCs, 76806196/414629312 avg/max bytes residency (14 samples), 1009M in use, 0.000 INIT (0.010 elapsed), 14.357 MUT (16.370 elapsed), 5.298 GC (6.534 elapsed) :ghc>>
2016-05-24 19:00:57 -07:00
Simon Michael
0f5ee154c4 lib: simplify parsers; cleanups (#275)
The journal/timeclock/timedot parsers, instead of constructing (opaque)
journal update functions which are later applied to build the journal,
now construct the journal directly (by modifying the parser state). This
is easier to understand and debug. It also removes any possibility of
the journal updates being a space leak. (They weren't, in fact memory
usage is now slightly higher, but that will be addressed in other ways.)

Also:

Journal data and journal parse info have been merged into one type (for
now), and field names are more consistent.

The ParsedJournal type alias has been added to distinguish being-parsed
and finalised journals.

Journal is now a monoid.

stats: fixed an issue with ordering of include files

journal: fixed an issue with ordering of included same-date transactions

timeclock: sessions can no longer span file boundaries (unclocked-out
sessions will be auto-closed at the end of the file).

expandPath now throws a proper IO error (and requires the IO monad).
2016-05-23 00:44:19 -07:00
Simon Michael
7f5e09096f lib: rename JournalContext to JournalParseState 2016-05-18 20:57:34 -07:00
Simon Michael
84097b75c7 journal: can now include timeclock/timedot files (#320)
journal files can now include journal, timeclock or timedot files (but
not yet CSV files). Also timeclock/timedot files no longer support
default year directives.

The Hledger.Read.* modules have been reorganised for better reuse.
Hledger.Read.Utils has been renamed Hledger.Read.Common and holds
low-level parsers & utilities; high-level read utilities have moved to
Hledger.Read.
2016-05-17 19:46:54 -07:00
Simon Michael
a9afd7bcbe lib: slightly better journal/time format detection
The Journal, Timelog and Timedot readers' detectors now check
each line in the sample data, not just the first one. I think
the sample data is only about 30 chars right now, but even so
this fixed a format detection issue I was seeing.
2016-02-19 23:02:10 -08:00
Simon Michael
70863ae40b lib: timedot allow indenting 2016-02-19 22:58:08 -08:00
Simon Michael
4b4a4bacf7 lib: timedot parse order fix 2016-02-19 22:57:43 -08:00