;doc: update manuals

Simon Michael 2024-03-24 14:51:30 -10:00
parent 2889bb6efb
commit 8642db786a
3 changed files with 153 additions and 126 deletions


@@ -9866,24 +9866,28 @@ files to your main journal, you will run
 .PP
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
-.SS \[dq]Deduplication\[dq]
+.SS Skipping
 \f[CR]import\f[R] tries to import only the transactions which are new
-since the last import.
+since the last import, \[dq]skipping over\[dq] any that it saw last
+time.
 So if your bank\[aq]s CSV includes the last three months of data, you
 can download and \f[CR]import\f[R] it every month (or week, or day) and
 only the new transactions will be imported each time.
 .PP
 It works as follows.
-For each imported \f[CR]FILE\f[R] (usually a CSV file): \- It tries to
-find the latest date seen previously, by reading it from a hidden
-\f[CR].latest.FILE\f[R] in the same directory.
-\- Then it processes \f[CR]FILE\f[R], ignoring any transactions on or
+For each imported \f[CR]FILE\f[R]:
+.IP \[bu] 2
+It tries to find the latest date seen previously, by reading it from a
+hidden \f[CR].latest.FILE\f[R] in the same directory.
+.IP \[bu] 2
+Then it processes \f[CR]FILE\f[R], ignoring any transactions on or
 before the \[dq]latest seen\[dq] date.
 .PP
 And after a successful import, it updates the \f[CR].latest.FILE\f[R](s)
 for next time (unless \f[CR]\-\-dry\-run\f[R] was used).
 .PP
-This is simple but fairly effective.
+This is a simple system that works fairly well for transaction data
+(usually CSV, but it could be any of hledger\[aq]s input formats).
 It assumes:
 .IP "1." 3
 new items always have the newest dates
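
To make the skipping mechanism above concrete, here is a hypothetical
shell session (file name invented, no prior .latest file assumed);
--dry-run is the flag mentioned above that previews an import without
updating state:

   $ hledger import --dry-run bank.csv   # first run: previews every transaction as new
   $ hledger import bank.csv             # imports them; writes .latest.bank.csv
   $ hledger import bank.csv             # run again: all dates already seen, all skipped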
@@ -9901,12 +9905,17 @@ more often (and in old transactions it doesn\[aq]t matter).
 Note, \f[CR]import\f[R] avoids reprocessing the same dates across
 successive runs, but it does not detect transactions that are duplicated
 within a single run.
-So eg if you downloaded but did not import \f[CR]bank.1.csv\f[R], and
-later downloaded \f[CR]bank.2.csv\f[R] with overlapping data, you should
-not import both of them in a single run
-(\f[CR]hledger import bank.1.csv bank.2.csv\f[R]); instead, import them
-one at a time (\f[CR]hledger import bank.1.csv\f[R], then
-\f[CR]hledger import bank.2.csv\f[R]).
+I\[aq]ll call these \[dq]skipping\[dq] and \[dq]deduplication\[dq].
+.PP
+So for example, say you downloaded but did not import
+\f[CR]bank.1.csv\f[R], and later downloaded \f[CR]bank.2.csv\f[R] with
+overlapping data.
+Then you should not import both of them at once
+(\f[CR]hledger import bank.1.csv bank.2.csv\f[R]), as the overlapping
+data would appear twice and not be deduplicated.
+Instead, import them one at a time
+(\f[CR]hledger import bank.1.csv; hledger import bank.2.csv\f[R]), and
+the second import will skip the overlapping data.
 .PP
 Normally you can ignore the \f[CR].latest.*\f[R] files, but if needed,
 you can delete them (to make all transactions unseen), or
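
As a sketch of that one-at-a-time workflow, using the overlapping
downloads described above:

   $ hledger import bank.1.csv   # imports all of bank.1.csv's new transactions
   $ hledger import bank.2.csv   # second run skips the overlapping data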
@@ -9917,7 +9926,7 @@ It means \[dq]I have seen transactions up to this date, and this many of
 them occurring on that date\[dq].
 .PP
 (\f[CR]hledger print \-\-new\f[R] also uses and updates these
-\f[CR].latest.*\f[R] files, but it is not often used.)
+\f[CR].latest.*\f[R] files, but it is less often used.)
 .PP
 Related: CSV > Working with CSV > Deduplicating, importing.
 .SS Import testing
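
To illustrate the state-file format described in this hunk ("I have
seen transactions up to this date, and this many of them occurring on
that date"), a hypothetical .latest.bank.csv recording two transactions
seen on 2024-03-22 would contain:

   2024-03-22
   2024-03-22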


@@ -9546,31 +9546,36 @@ most common import source, and these docs focus on that case.
 * Menu:
 
-* "Deduplication"::
+* Skipping::
 * Import testing::
 * Importing balance assignments::
 * Commodity display styles::
 
-File: hledger.info, Node: "Deduplication", Next: Import testing, Up: import
+File: hledger.info, Node: Skipping, Next: Import testing, Up: import
 
-24.19.1 "Deduplication"
------------------------
+24.19.1 Skipping
+----------------
 
 'import' tries to import only the transactions which are new since the
-last import.  So if your bank's CSV includes the last three months of
-data, you can download and 'import' it every month (or week, or day) and
-only the new transactions will be imported each time.
+last import, "skipping over" any that it saw last time.  So if your
+bank's CSV includes the last three months of data, you can download and
+'import' it every month (or week, or day) and only the new transactions
+will be imported each time.
 
-   It works as follows.  For each imported 'FILE' (usually a CSV file):
-- It tries to find the latest date seen previously, by reading it from a
-hidden '.latest.FILE' in the same directory.  - Then it processes
-'FILE', ignoring any transactions on or before the "latest seen" date.
+   It works as follows.  For each imported 'FILE':
+
+   * It tries to find the latest date seen previously, by reading it
+     from a hidden '.latest.FILE' in the same directory.
+   * Then it processes 'FILE', ignoring any transactions on or before
+     the "latest seen" date.
 
    And after a successful import, it updates the '.latest.FILE'(s) for
 next time (unless '--dry-run' was used).
 
-   This is simple but fairly effective.  It assumes:
+   This is a simple system that works fairly well for transaction data
+(usually CSV, but it could be any of hledger's input formats).  It
+assumes:
 
   1. new items always have the newest dates
   2. item dates are stable across successive CSV downloads
@@ -9583,11 +9588,15 @@ by importing more often (and in old transactions it doesn't matter).
    Note, 'import' avoids reprocessing the same dates across successive
 runs, but it does not detect transactions that are duplicated within a
-single run.  So eg if you downloaded but did not import 'bank.1.csv',
-and later downloaded 'bank.2.csv' with overlapping data, you should not
-import both of them in a single run ('hledger import bank.1.csv
-bank.2.csv'); instead, import them one at a time ('hledger import
-bank.1.csv', then 'hledger import bank.2.csv').
+single run.  I'll call these "skipping" and "deduplication".
+
+   So for example, say you downloaded but did not import 'bank.1.csv',
+and later downloaded 'bank.2.csv' with overlapping data.  Then you
+should not import both of them at once ('hledger import bank.1.csv
+bank.2.csv'), as the overlapping data would appear twice and not be
+deduplicated.  Instead, import them one at a time ('hledger import
+bank.1.csv; hledger import bank.2.csv'), and the second import will skip
+the overlapping data.
 
    Normally you can ignore the '.latest.*' files, but if needed, you can
 delete them (to make all transactions unseen), or construct/modify them
@@ -9597,12 +9606,12 @@ have seen transactions up to this date, and this many of them occurring
 on that date".
 
    ('hledger print --new' also uses and updates these '.latest.*' files,
-but it is not often used.)
+but it is less often used.)
 
    Related: CSV > Working with CSV > Deduplicating, importing.
 
-File: hledger.info, Node: Import testing, Next: Importing balance assignments, Prev: "Deduplication", Up: import
+File: hledger.info, Node: Import testing, Next: Importing balance assignments, Prev: Skipping, Up: import
 
 24.19.2 Import testing
 ----------------------
@@ -11717,84 +11726,84 @@ Node: help343889
 Ref: #help-1343998
 Node: import345371
 Ref: #import345494
-Node: "Deduplication"346604
-Ref: #deduplication346735
+Node: Skipping346597
+Ref: #skipping346707
 
 End Tag Table


@@ -7719,21 +7719,26 @@ PART 4: COMMANDS
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
 
-"Deduplication"
+Skipping
 import tries to import only the transactions which are new since the
-last import.  So if your bank's CSV includes the last three months of
-data, you can download and import it every month (or week, or day) and
-only the new transactions will be imported each time.
+last import, "skipping over" any that it saw last time.  So if your
+bank's CSV includes the last three months of data, you can download and
+import it every month (or week, or day) and only the new transactions
+will be imported each time.
 
-It works as follows.  For each imported FILE (usually a CSV file): - It
-tries to find the latest date seen previously, by reading it from a
-hidden .latest.FILE in the same directory.  - Then it processes FILE,
-ignoring any transactions on or before the "latest seen" date.
+It works as follows.  For each imported FILE:
+
+o It tries to find the latest date seen previously, by reading it from
+  a hidden .latest.FILE in the same directory.
+o Then it processes FILE, ignoring any transactions on or before the
+  "latest seen" date.
 
 And after a successful import, it updates the .latest.FILE(s) for next
 time (unless --dry-run was used).
 
-This is simple but fairly effective.  It assumes:
+This is a simple system that works fairly well for transaction data
+(usually CSV, but it could be any of hledger's input formats).  It
+assumes:
 
 1. new items always have the newest dates
@@ -7749,11 +7754,15 @@ PART 4: COMMANDS
 Note, import avoids reprocessing the same dates across successive runs,
 but it does not detect transactions that are duplicated within a single
-run.  So eg if you downloaded but did not import bank.1.csv, and later
-downloaded bank.2.csv with overlapping data, you should not import both
-of them in a single run (hledger import bank.1.csv bank.2.csv); in-
-stead, import them one at a time (hledger import bank.1.csv, then
-hledger import bank.2.csv).
+run.  I'll call these "skipping" and "deduplication".
+
+So for example, say you downloaded but did not import bank.1.csv, and
+later downloaded bank.2.csv with overlapping data.  Then you should not
+import both of them at once (hledger import bank.1.csv bank.2.csv), as
+the overlapping data would appear twice and not be deduplicated.  In-
+stead, import them one at a time (hledger import bank.1.csv; hledger
+import bank.2.csv), and the second import will skip the overlapping
+data.
 
 Normally you can ignore the .latest.* files, but if needed, you can
 delete them (to make all transactions unseen), or construct/modify them
@@ -7763,7 +7772,7 @@ PART 4: COMMANDS
 ring on that date".
 
 (hledger print --new also uses and updates these .latest.* files, but
-it is not often used.)
+it is less often used.)
 
 Related: CSV > Working with CSV > Deduplicating, importing.