;doc: update manuals
parent
2889bb6efb
commit
8642db786a
@ -9866,24 +9866,28 @@ files to your main journal, you will run
|
|||||||
.PP
|
.PP
|
||||||
Note you can import from any file format, though CSV files are the most
|
Note you can import from any file format, though CSV files are the most
|
||||||
common import source, and these docs focus on that case.
|
common import source, and these docs focus on that case.
|
||||||
.SS \[dq]Deduplication\[dq]
|
.SS Skipping
|
||||||
\f[CR]import\f[R] tries to import only the transactions which are new
|
\f[CR]import\f[R] tries to import only the transactions which are new
|
||||||
since the last import.
|
since the last import, \[dq]skipping over\[dq] any that it saw last
|
||||||
|
time.
|
||||||
So if your bank\[aq]s CSV includes the last three months of data, you
|
So if your bank\[aq]s CSV includes the last three months of data, you
|
||||||
can download and \f[CR]import\f[R] it every month (or week, or day) and
|
can download and \f[CR]import\f[R] it every month (or week, or day) and
|
||||||
only the new transactions will be imported each time.
|
only the new transactions will be imported each time.
|
||||||
.PP
|
.PP
|
||||||
It works as follows.
|
It works as follows.
|
||||||
For each imported \f[CR]FILE\f[R] (usually a CSV file): \- It tries to
|
For each imported \f[CR]FILE\f[R]:
|
||||||
find the latest date seen previously, by reading it from a hidden
|
.IP \[bu] 2
|
||||||
\f[CR].latest.FILE\f[R] in the same directory.
|
It tries to find the latest date seen previously, by reading it from a
|
||||||
\- Then it processes \f[CR]FILE\f[R], ignoring any transactions on or
|
hidden \f[CR].latest.FILE\f[R] in the same directory.
|
||||||
|
.IP \[bu] 2
|
||||||
|
Then it processes \f[CR]FILE\f[R], ignoring any transactions on or
|
||||||
before the \[dq]latest seen\[dq] date.
|
before the \[dq]latest seen\[dq] date.
|
||||||
.PP
|
.PP
|
||||||
And after a successful import, it updates the \f[CR].latest.FILE\f[R](s)
|
And after a successful import, it updates the \f[CR].latest.FILE\f[R](s)
|
||||||
for next time (unless \f[CR]\-\-dry\-run\f[R] was used).
|
for next time (unless \f[CR]\-\-dry\-run\f[R] was used).
|
||||||
.PP
|
.PP
|
||||||
This is simple but fairly effective.
|
This is a simple system that works fairly well for transaction data
|
||||||
|
(usually CSV, but it could be any of hledger\[aq]s input formats).
|
||||||
It assumes:
|
It assumes:
|
||||||
.IP "1." 3
|
.IP "1." 3
|
||||||
new items always have the newest dates
|
new items always have the newest dates
|
||||||
@ -9901,12 +9905,17 @@ more often (and in old transactions it doesn\[aq]t matter).
|
|||||||
Note, \f[CR]import\f[R] avoids reprocessing the same dates across
|
Note, \f[CR]import\f[R] avoids reprocessing the same dates across
|
||||||
successive runs, but it does not detect transactions that are duplicated
|
successive runs, but it does not detect transactions that are duplicated
|
||||||
within a single run.
|
within a single run.
|
||||||
So eg if you downloaded but did not import \f[CR]bank.1.csv\f[R], and
|
I\[aq]ll call these \[dq]skipping\[dq] and \[dq]deduplication\[dq].
|
||||||
later downloaded \f[CR]bank.2.csv\f[R] with overlapping data, you should
|
.PP
|
||||||
not import both of them in a single run
|
So for example, say you downloaded but did not import
|
||||||
(\f[CR]hledger import bank.1.csv bank.2.csv\f[R]); instead, import them
|
\f[CR]bank.1.csv\f[R], and later downloaded \f[CR]bank.2.csv\f[R] with
|
||||||
one at a time (\f[CR]hledger import bank.1.csv\f[R], then
|
overlapping data.
|
||||||
\f[CR]hledger import bank.2.csv\f[R]).
|
Then you should not import both of them at once
|
||||||
|
(\f[CR]hledger import bank.1.csv bank.2.csv\f[R]), as the overlapping
|
||||||
|
data would appear twice and not be deduplicated.
|
||||||
|
Instead, import them one at a time
|
||||||
|
(\f[CR]hledger import bank.1.csv; hledger import bank.2.csv\f[R]), and
|
||||||
|
the second import will skip the overlapping data.
|
||||||
.PP
|
.PP
|
||||||
Normally you can ignore the \f[CR].latest.*\f[R] files, but if needed,
|
Normally you can ignore the \f[CR].latest.*\f[R] files, but if needed,
|
||||||
you can delete them (to make all transactions unseen), or
|
you can delete them (to make all transactions unseen), or
|
||||||
@ -9917,7 +9926,7 @@ It means \[dq]I have seen transactions up to this date, and this many of
|
|||||||
them occurring on that date\[dq].
|
them occurring on that date\[dq].
|
||||||
.PP
|
.PP
|
||||||
(\f[CR]hledger print \-\-new\f[R] also uses and updates these
|
(\f[CR]hledger print \-\-new\f[R] also uses and updates these
|
||||||
\f[CR].latest.*\f[R] files, but it is not often used.)
|
\f[CR].latest.*\f[R] files, but it is less often used.)
|
||||||
.PP
|
.PP
|
||||||
Related: CSV > Working with CSV > Deduplicating, importing.
|
Related: CSV > Working with CSV > Deduplicating, importing.
|
||||||
.SS Import testing
|
.SS Import testing
|
||||||
|
|||||||
@ -9546,31 +9546,36 @@ most common import source, and these docs focus on that case.
|
|||||||
|
|
||||||
* Menu:
|
* Menu:
|
||||||
|
|
||||||
* "Deduplication"::
|
* Skipping::
|
||||||
* Import testing::
|
* Import testing::
|
||||||
* Importing balance assignments::
|
* Importing balance assignments::
|
||||||
* Commodity display styles::
|
* Commodity display styles::
|
||||||
|
|
||||||
|
|
||||||
File: hledger.info, Node: "Deduplication", Next: Import testing, Up: import
|
File: hledger.info, Node: Skipping, Next: Import testing, Up: import
|
||||||
|
|
||||||
24.19.1 "Deduplication"
|
24.19.1 Skipping
|
||||||
-----------------------
|
----------------
|
||||||
|
|
||||||
'import' tries to import only the transactions which are new since the
|
'import' tries to import only the transactions which are new since the
|
||||||
last import. So if your bank's CSV includes the last three months of
|
last import, "skipping over" any that it saw last time. So if your
|
||||||
data, you can download and 'import' it every month (or week, or day) and
|
bank's CSV includes the last three months of data, you can download and
|
||||||
only the new transactions will be imported each time.
|
'import' it every month (or week, or day) and only the new transactions
|
||||||
|
will be imported each time.
|
||||||
|
|
||||||
It works as follows. For each imported 'FILE' (usually a CSV file):
|
It works as follows. For each imported 'FILE':
|
||||||
- It tries to find the latest date seen previously, by reading it from a
|
|
||||||
hidden '.latest.FILE' in the same directory. - Then it processes
|
* It tries to find the latest date seen previously, by reading it
|
||||||
'FILE', ignoring any transactions on or before the "latest seen" date.
|
from a hidden '.latest.FILE' in the same directory.
|
||||||
|
* Then it processes 'FILE', ignoring any transactions on or before
|
||||||
|
the "latest seen" date.
|
||||||
|
|
||||||
And after a successful import, it updates the '.latest.FILE'(s) for
|
And after a successful import, it updates the '.latest.FILE'(s) for
|
||||||
next time (unless '--dry-run' was used).
|
next time (unless '--dry-run' was used).
|
||||||
|
|
||||||
This is simple but fairly effective. It assumes:
|
This is a simple system that works fairly well for transaction data
|
||||||
|
(usually CSV, but it could be any of hledger's input formats). It
|
||||||
|
assumes:
|
||||||
|
|
||||||
1. new items always have the newest dates
|
1. new items always have the newest dates
|
||||||
2. item dates are stable across successive CSV downloads
|
2. item dates are stable across successive CSV downloads
|
||||||
@ -9583,11 +9588,15 @@ by importing more often (and in old transactions it doesn't matter).
|
|||||||
|
|
||||||
Note, 'import' avoids reprocessing the same dates across successive
|
Note, 'import' avoids reprocessing the same dates across successive
|
||||||
runs, but it does not detect transactions that are duplicated within a
|
runs, but it does not detect transactions that are duplicated within a
|
||||||
single run. So eg if you downloaded but did not import 'bank.1.csv',
|
single run. I'll call these "skipping" and "deduplication".
|
||||||
and later downloaded 'bank.2.csv' with overlapping data, you should not
|
|
||||||
import both of them in a single run ('hledger import bank.1.csv
|
So for example, say you downloaded but did not import 'bank.1.csv',
|
||||||
bank.2.csv'); instead, import them one at a time ('hledger import
|
and later downloaded 'bank.2.csv' with overlapping data. Then you
|
||||||
bank.1.csv', then 'hledger import bank.2.csv').
|
should not import both of them at once ('hledger import bank.1.csv
|
||||||
|
bank.2.csv'), as the overlapping data would appear twice and not be
|
||||||
|
deduplicated. Instead, import them one at a time ('hledger import
|
||||||
|
bank.1.csv; hledger import bank.2.csv'), and the second import will skip
|
||||||
|
the overlapping data.
|
||||||
|
|
||||||
Normally you can ignore the '.latest.*' files, but if needed, you can
|
Normally you can ignore the '.latest.*' files, but if needed, you can
|
||||||
delete them (to make all transactions unseen), or construct/modify them
|
delete them (to make all transactions unseen), or construct/modify them
|
||||||
@ -9597,12 +9606,12 @@ have seen transactions up to this date, and this many of them occurring
|
|||||||
on that date".
|
on that date".
|
||||||
|
|
||||||
('hledger print --new' also uses and updates these '.latest.*' files,
|
('hledger print --new' also uses and updates these '.latest.*' files,
|
||||||
but it is not often used.)
|
but it is less often used.)
|
||||||
|
|
||||||
Related: CSV > Working with CSV > Deduplicating, importing.
|
Related: CSV > Working with CSV > Deduplicating, importing.
|
||||||
|
|
||||||
|
|
||||||
File: hledger.info, Node: Import testing, Next: Importing balance assignments, Prev: "Deduplication", Up: import
|
File: hledger.info, Node: Import testing, Next: Importing balance assignments, Prev: Skipping, Up: import
|
||||||
|
|
||||||
24.19.2 Import testing
|
24.19.2 Import testing
|
||||||
----------------------
|
----------------------
|
||||||
@ -11717,84 +11726,84 @@ Node: help343889
|
|||||||
Ref: #help-1343998
|
Ref: #help-1343998
|
||||||
Node: import345371
|
Node: import345371
|
||||||
Ref: #import345494
|
Ref: #import345494
|
||||||
Node: "Deduplication"346604
|
Node: Skipping346597
|
||||||
Ref: #deduplication346735
|
Ref: #skipping346707
|
||||||
Node: Import testing348911
|
Node: Import testing349191
|
||||||
Ref: #import-testing349078
|
Ref: #import-testing349351
|
||||||
Node: Importing balance assignments349921
|
Node: Importing balance assignments350194
|
||||||
Ref: #importing-balance-assignments350127
|
Ref: #importing-balance-assignments350400
|
||||||
Node: Commodity display styles350776
|
Node: Commodity display styles351049
|
||||||
Ref: #commodity-display-styles350949
|
Ref: #commodity-display-styles351222
|
||||||
Node: incomestatement351078
|
Node: incomestatement351351
|
||||||
Ref: #incomestatement351220
|
Ref: #incomestatement351493
|
||||||
Node: notes352551
|
Node: notes352824
|
||||||
Ref: #notes352673
|
Ref: #notes352946
|
||||||
Node: payees353035
|
Node: payees353308
|
||||||
Ref: #payees353150
|
Ref: #payees353423
|
||||||
Node: prices353669
|
Node: prices353942
|
||||||
Ref: #prices353784
|
Ref: #prices354057
|
||||||
Node: print354437
|
Node: print354710
|
||||||
Ref: #print354552
|
Ref: #print354825
|
||||||
Node: print explicitness355528
|
Node: print explicitness355801
|
||||||
Ref: #print-explicitness355671
|
Ref: #print-explicitness355944
|
||||||
Node: print amount style356450
|
Node: print amount style356723
|
||||||
Ref: #print-amount-style356620
|
Ref: #print-amount-style356893
|
||||||
Node: print parseability357690
|
Node: print parseability357963
|
||||||
Ref: #print-parseability357862
|
Ref: #print-parseability358135
|
||||||
Node: print other features358611
|
Node: print other features358884
|
||||||
Ref: #print-other-features358790
|
Ref: #print-other-features359063
|
||||||
Node: print output format359311
|
Node: print output format359584
|
||||||
Ref: #print-output-format359459
|
Ref: #print-output-format359732
|
||||||
Node: register362598
|
Node: register362871
|
||||||
Ref: #register362720
|
Ref: #register362993
|
||||||
Node: Custom register output367751
|
Node: Custom register output368024
|
||||||
Ref: #custom-register-output367882
|
Ref: #custom-register-output368155
|
||||||
Node: rewrite369229
|
Node: rewrite369502
|
||||||
Ref: #rewrite369347
|
Ref: #rewrite369620
|
||||||
Node: Re-write rules in a file371245
|
Node: Re-write rules in a file371518
|
||||||
Ref: #re-write-rules-in-a-file371408
|
Ref: #re-write-rules-in-a-file371681
|
||||||
Node: Diff output format372557
|
Node: Diff output format372830
|
||||||
Ref: #diff-output-format372740
|
Ref: #diff-output-format373013
|
||||||
Node: rewrite vs print --auto373832
|
Node: rewrite vs print --auto374105
|
||||||
Ref: #rewrite-vs.-print---auto373992
|
Ref: #rewrite-vs.-print---auto374265
|
||||||
Node: roi374548
|
Node: roi374821
|
||||||
Ref: #roi374655
|
Ref: #roi374928
|
||||||
Node: Spaces and special characters in --inv and --pnl376467
|
Node: Spaces and special characters in --inv and --pnl376740
|
||||||
Ref: #spaces-and-special-characters-in---inv-and---pnl376707
|
Ref: #spaces-and-special-characters-in---inv-and---pnl376980
|
||||||
Node: Semantics of --inv and --pnl377195
|
Node: Semantics of --inv and --pnl377468
|
||||||
Ref: #semantics-of---inv-and---pnl377434
|
Ref: #semantics-of---inv-and---pnl377707
|
||||||
Node: IRR and TWR explained379284
|
Node: IRR and TWR explained379557
|
||||||
Ref: #irr-and-twr-explained379444
|
Ref: #irr-and-twr-explained379717
|
||||||
Node: stats382697
|
Node: stats382970
|
||||||
Ref: #stats382805
|
Ref: #stats383078
|
||||||
Node: tags384319
|
Node: tags384592
|
||||||
Ref: #tags-1384426
|
Ref: #tags-1384699
|
||||||
Node: test385435
|
Node: test385708
|
||||||
Ref: #test385528
|
Ref: #test385801
|
||||||
Node: PART 5 COMMON TASKS386270
|
Node: PART 5 COMMON TASKS386543
|
||||||
Ref: #part-5-common-tasks386416
|
Ref: #part-5-common-tasks386689
|
||||||
Node: Getting help386714
|
Node: Getting help386987
|
||||||
Ref: #getting-help386855
|
Ref: #getting-help387128
|
||||||
Node: Constructing command lines387615
|
Node: Constructing command lines387888
|
||||||
Ref: #constructing-command-lines387816
|
Ref: #constructing-command-lines388089
|
||||||
Node: Starting a journal file388473
|
Node: Starting a journal file388746
|
||||||
Ref: #starting-a-journal-file388675
|
Ref: #starting-a-journal-file388948
|
||||||
Node: Setting LEDGER_FILE389877
|
Node: Setting LEDGER_FILE390150
|
||||||
Ref: #setting-ledger_file390069
|
Ref: #setting-ledger_file390342
|
||||||
Node: Setting opening balances391026
|
Node: Setting opening balances391299
|
||||||
Ref: #setting-opening-balances391227
|
Ref: #setting-opening-balances391500
|
||||||
Node: Recording transactions394368
|
Node: Recording transactions394641
|
||||||
Ref: #recording-transactions394557
|
Ref: #recording-transactions394830
|
||||||
Node: Reconciling395113
|
Node: Reconciling395386
|
||||||
Ref: #reconciling395265
|
Ref: #reconciling395538
|
||||||
Node: Reporting397522
|
Node: Reporting397795
|
||||||
Ref: #reporting397671
|
Ref: #reporting397944
|
||||||
Node: Migrating to a new file401656
|
Node: Migrating to a new file401929
|
||||||
Ref: #migrating-to-a-new-file401813
|
Ref: #migrating-to-a-new-file402086
|
||||||
Node: BUGS402112
|
Node: BUGS402385
|
||||||
Ref: #bugs402202
|
Ref: #bugs402475
|
||||||
Node: Troubleshooting403081
|
Node: Troubleshooting403354
|
||||||
Ref: #troubleshooting403181
|
Ref: #troubleshooting403454
|
||||||
|
|
||||||
End Tag Table
|
End Tag Table
|
||||||
|
|
||||||
|
|||||||
@ -7719,21 +7719,26 @@ PART 4: COMMANDS
|
|||||||
Note you can import from any file format, though CSV files are the most
|
Note you can import from any file format, though CSV files are the most
|
||||||
common import source, and these docs focus on that case.
|
common import source, and these docs focus on that case.
|
||||||
|
|
||||||
"Deduplication"
|
Skipping
|
||||||
import tries to import only the transactions which are new since the
|
import tries to import only the transactions which are new since the
|
||||||
last import. So if your bank's CSV includes the last three months of
|
last import, "skipping over" any that it saw last time. So if your
|
||||||
data, you can download and import it every month (or week, or day) and
|
bank's CSV includes the last three months of data, you can download and
|
||||||
only the new transactions will be imported each time.
|
import it every month (or week, or day) and only the new transactions
|
||||||
|
will be imported each time.
|
||||||
|
|
||||||
It works as follows. For each imported FILE (usually a CSV file): - It
|
It works as follows. For each imported FILE:
|
||||||
tries to find the latest date seen previously, by reading it from a
|
|
||||||
hidden .latest.FILE in the same directory. - Then it processes FILE,
|
o It tries to find the latest date seen previously, by reading it from
|
||||||
ignoring any transactions on or before the "latest seen" date.
|
a hidden .latest.FILE in the same directory.
|
||||||
|
|
||||||
|
o Then it processes FILE, ignoring any transactions on or before the
|
||||||
|
"latest seen" date.
|
||||||
|
|
||||||
And after a successful import, it updates the .latest.FILE(s) for next
|
And after a successful import, it updates the .latest.FILE(s) for next
|
||||||
time (unless --dry-run was used).
|
time (unless --dry-run was used).
|
||||||
|
|
||||||
This is simple but fairly effective. It assumes:
|
This is a simple system that works fairly well for transaction data
|
||||||
|
(usually CSV, but it could be any of hledger's input formats). It assumes:
|
||||||
|
|
||||||
1. new items always have the newest dates
|
1. new items always have the newest dates
|
||||||
|
|
||||||
@ -7749,11 +7754,15 @@ PART 4: COMMANDS
|
|||||||
|
|
||||||
Note, import avoids reprocessing the same dates across successive runs,
|
Note, import avoids reprocessing the same dates across successive runs,
|
||||||
but it does not detect transactions that are duplicated within a single
|
but it does not detect transactions that are duplicated within a single
|
||||||
run. So eg if you downloaded but did not import bank.1.csv, and later
|
run. I'll call these "skipping" and "deduplication".
|
||||||
downloaded bank.2.csv with overlapping data, you should not import both
|
|
||||||
of them in a single run (hledger import bank.1.csv bank.2.csv); in-
|
So for example, say you downloaded but did not import bank.1.csv, and
|
||||||
stead, import them one at a time (hledger import bank.1.csv, then
|
later downloaded bank.2.csv with overlapping data. Then you should not
|
||||||
hledger import bank.2.csv).
|
import both of them at once (hledger import bank.1.csv bank.2.csv), as
|
||||||
|
the overlapping data would appear twice and not be deduplicated. In-
|
||||||
|
stead, import them one at a time (hledger import bank.1.csv; hledger
|
||||||
|
import bank.2.csv), and the second import will skip the overlapping
|
||||||
|
data.
|
||||||
|
|
||||||
Normally you can ignore the .latest.* files, but if needed, you can
|
Normally you can ignore the .latest.* files, but if needed, you can
|
||||||
delete them (to make all transactions unseen), or construct/modify them
|
delete them (to make all transactions unseen), or construct/modify them
|
||||||
@ -7763,7 +7772,7 @@ PART 4: COMMANDS
|
|||||||
ring on that date".
|
ring on that date".
|
||||||
|
|
||||||
(hledger print --new also uses and updates these .latest.* files, but
|
(hledger print --new also uses and updates these .latest.* files, but
|
||||||
it is not often used.)
|
it is less often used.)
|
||||||
|
|
||||||
Related: CSV > Working with CSV > Deduplicating, importing.
|
Related: CSV > Working with CSV > Deduplicating, importing.
|
||||||
|
|
||||||
|
|||||||
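The "skipping" behaviour described in the updated text can be sketched roughly as follows. This is a minimal illustration and not hledger's actual implementation: the function `skip_seen`, the tuple layout of a transaction, and the way the `.latest.FILE` state is modelled as a (date, count) pair are all assumptions made for the example.

```python
from datetime import date

def skip_seen(transactions, latest_date=None, latest_count=0):
    """Hypothetical model of skipping for one FILE.

    transactions: list of (date, description) tuples read from FILE.
    latest_date, latest_count: assumed contents of .latest.FILE, meaning
    "seen transactions up to this date, and this many on that date".
    Returns (new_transactions, updated_latest_date, updated_latest_count).
    """
    # Sort by date only; the sort is stable, so same-day order is preserved.
    ordered = sorted(transactions, key=lambda t: t[0])
    new = []
    same_day_skipped = 0
    for txn in ordered:
        txn_date = txn[0]
        if latest_date is not None:
            if txn_date < latest_date:
                continue  # strictly older than the latest seen date
            if txn_date == latest_date and same_day_skipped < latest_count:
                same_day_skipped += 1
                continue  # one of the items already seen on that date
        new.append(txn)
    if new:
        # After a successful import, remember the newest date and how
        # many transactions in FILE fall on it.
        newest = ordered[-1][0]
        latest_date = newest
        latest_count = sum(1 for t in ordered if t[0] == newest)
    return new, latest_date, latest_count

# Two overlapping downloads, standing in for bank.1.csv and bank.2.csv.
bank1 = [(date(2024, 1, 1), "rent"),
         (date(2024, 1, 2), "coffee"),
         (date(2024, 1, 2), "books")]
bank2 = bank1[1:] + [(date(2024, 1, 3), "lunch")]

# One at a time: the second run starts from the state saved by the first.
first_new, seen_date, seen_count = skip_seen(bank1)
second_new, _, _ = skip_seen(bank2, seen_date, seen_count)

# At once: neither file has saved state yet, so nothing is skipped and
# the overlapping items are imported twice.
at_once = skip_seen(bank1)[0] + skip_seen(bank2)[0]
```

In this model, importing one at a time yields only the new item from the second download, while importing both at once yields every item from both files, including the overlap. That matches the distinction the manual draws between skipping across runs and deduplication within a run.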