;doc: update manuals

Simon Michael 2024-03-24 14:51:30 -10:00
parent 2889bb6efb
commit 8642db786a
3 changed files with 153 additions and 126 deletions


@@ -9866,24 +9866,28 @@ files to your main journal, you will run
 .PP
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
-.SS \[dq]Deduplication\[dq]
+.SS Skipping
 \f[CR]import\f[R] tries to import only the transactions which are new
-since the last import.
+since the last import, \[dq]skipping over\[dq] any that it saw last
+time.
 So if your bank\[aq]s CSV includes the last three months of data, you
 can download and \f[CR]import\f[R] it every month (or week, or day) and
 only the new transactions will be imported each time.
 .PP
 It works as follows.
-For each imported \f[CR]FILE\f[R] (usually a CSV file): \- It tries to
-find the latest date seen previously, by reading it from a hidden
-\f[CR].latest.FILE\f[R] in the same directory.
-\- Then it processes \f[CR]FILE\f[R], ignoring any transactions on or
+For each imported \f[CR]FILE\f[R]:
+.IP \[bu] 2
+It tries to find the latest date seen previously, by reading it from a
+hidden \f[CR].latest.FILE\f[R] in the same directory.
+.IP \[bu] 2
+Then it processes \f[CR]FILE\f[R], ignoring any transactions on or
 before the \[dq]latest seen\[dq] date.
 .PP
 And after a successful import, it updates the \f[CR].latest.FILE\f[R](s)
 for next time (unless \f[CR]\-\-dry\-run\f[R] was used).
 .PP
-This is simple but fairly effective.
+This is a simple system that works fairly well for transaction data
+(usually CSV, but it could be any of hledger\[aq]s input formats).
 It assumes:
 .IP "1." 3
 new items always have the newest dates
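
To make the skipping mechanism above concrete, here is a hypothetical
shell session (file name invented, no prior .latest file assumed);
--dry-run is the flag mentioned above that previews an import without
updating state:

   $ hledger import --dry-run bank.csv   # first run: previews every transaction as new
   $ hledger import bank.csv             # imports them; writes .latest.bank.csv
   $ hledger import bank.csv             # run again: all dates already seen, all skipped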
@@ -9901,12 +9905,17 @@ more often (and in old transactions it doesn\[aq]t matter).
 Note, \f[CR]import\f[R] avoids reprocessing the same dates across
 successive runs, but it does not detect transactions that are duplicated
 within a single run.
-So eg if you downloaded but did not import \f[CR]bank.1.csv\f[R], and
-later downloaded \f[CR]bank.2.csv\f[R] with overlapping data, you should
-not import both of them in a single run
-(\f[CR]hledger import bank.1.csv bank.2.csv\f[R]); instead, import them
-one at a time (\f[CR]hledger import bank.1.csv\f[R], then
-\f[CR]hledger import bank.2.csv\f[R]).
+I\[aq]ll call these \[dq]skipping\[dq] and \[dq]deduplication\[dq].
+.PP
+So for example, say you downloaded but did not import
+\f[CR]bank.1.csv\f[R], and later downloaded \f[CR]bank.2.csv\f[R] with
+overlapping data.
+Then you should not import both of them at once
+(\f[CR]hledger import bank.1.csv bank.2.csv\f[R]), as the overlapping
+data would appear twice and not be deduplicated.
+Instead, import them one at a time
+(\f[CR]hledger import bank.1.csv; hledger import bank.2.csv\f[R]), and
+the second import will skip the overlapping data.
 .PP
 Normally you can ignore the \f[CR].latest.*\f[R] files, but if needed,
 you can delete them (to make all transactions unseen), or
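
As a sketch of that one-at-a-time workflow, using the overlapping
downloads described above:

   $ hledger import bank.1.csv   # imports all of bank.1.csv's new transactions
   $ hledger import bank.2.csv   # second run skips the overlapping data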
@@ -9917,7 +9926,7 @@ It means \[dq]I have seen transactions up to this date, and this many of
 them occurring on that date\[dq].
 .PP
 (\f[CR]hledger print \-\-new\f[R] also uses and updates these
-\f[CR].latest.*\f[R] files, but it is not often used.)
+\f[CR].latest.*\f[R] files, but it is less often used.)
 .PP
 Related: CSV > Working with CSV > Deduplicating, importing.
 .SS Import testing
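
To illustrate the state-file format described in this hunk ("I have
seen transactions up to this date, and this many of them occurring on
that date"), a hypothetical .latest.bank.csv recording two transactions
seen on 2024-03-22 would contain:

   2024-03-22
   2024-03-22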


@@ -9546,31 +9546,36 @@ most common import source, and these docs focus on that case.
 * Menu:
 
-* "Deduplication"::
+* Skipping::
 * Import testing::
 * Importing balance assignments::
 * Commodity display styles::
 
-File: hledger.info, Node: "Deduplication", Next: Import testing, Up: import
+File: hledger.info, Node: Skipping, Next: Import testing, Up: import
 
-24.19.1 "Deduplication"
------------------------
+24.19.1 Skipping
+----------------
 
 'import' tries to import only the transactions which are new since the
-last import.  So if your bank's CSV includes the last three months of
-data, you can download and 'import' it every month (or week, or day) and
-only the new transactions will be imported each time.
+last import, "skipping over" any that it saw last time.  So if your
+bank's CSV includes the last three months of data, you can download and
+'import' it every month (or week, or day) and only the new transactions
+will be imported each time.
 
-   It works as follows.  For each imported 'FILE' (usually a CSV file):
-- It tries to find the latest date seen previously, by reading it from a
-hidden '.latest.FILE' in the same directory.  - Then it processes
-'FILE', ignoring any transactions on or before the "latest seen" date.
+   It works as follows.  For each imported 'FILE':
+
+   * It tries to find the latest date seen previously, by reading it
+     from a hidden '.latest.FILE' in the same directory.
+   * Then it processes 'FILE', ignoring any transactions on or before
+     the "latest seen" date.
 
    And after a successful import, it updates the '.latest.FILE'(s) for
 next time (unless '--dry-run' was used).
 
-   This is simple but fairly effective.  It assumes:
+   This is a simple system that works fairly well for transaction data
+(usually CSV, but it could be any of hledger's input formats).  It
+assumes:
 
   1. new items always have the newest dates
   2. item dates are stable across successive CSV downloads
@@ -9583,11 +9588,15 @@ by importing more often (and in old transactions it doesn't matter).
    Note, 'import' avoids reprocessing the same dates across successive
 runs, but it does not detect transactions that are duplicated within a
-single run.  So eg if you downloaded but did not import 'bank.1.csv',
-and later downloaded 'bank.2.csv' with overlapping data, you should not
-import both of them in a single run ('hledger import bank.1.csv
-bank.2.csv'); instead, import them one at a time ('hledger import
-bank.1.csv', then 'hledger import bank.2.csv').
+single run.  I'll call these "skipping" and "deduplication".
+
+   So for example, say you downloaded but did not import 'bank.1.csv',
+and later downloaded 'bank.2.csv' with overlapping data.  Then you
+should not import both of them at once ('hledger import bank.1.csv
+bank.2.csv'), as the overlapping data would appear twice and not be
+deduplicated.  Instead, import them one at a time ('hledger import
+bank.1.csv; hledger import bank.2.csv'), and the second import will skip
+the overlapping data.
 
    Normally you can ignore the '.latest.*' files, but if needed, you can
 delete them (to make all transactions unseen), or construct/modify them
@@ -9597,12 +9606,12 @@ have seen transactions up to this date, and this many of them occurring
 on that date".
 
    ('hledger print --new' also uses and updates these '.latest.*' files,
-but it is not often used.)
+but it is less often used.)
 
    Related: CSV > Working with CSV > Deduplicating, importing.
 
-File: hledger.info, Node: Import testing, Next: Importing balance assignments, Prev: "Deduplication", Up: import
+File: hledger.info, Node: Import testing, Next: Importing balance assignments, Prev: Skipping, Up: import
 
 24.19.2 Import testing
 ----------------------
@@ -11717,84 +11726,84 @@ Node: help343889
 Ref: #help-1343998
 Node: import345371
 Ref: #import345494
-Node: "Deduplication"346604
-Ref: #deduplication346735
+Node: Skipping346597
+Ref: #skipping346707
 
 End Tag Table


@@ -7719,21 +7719,26 @@ PART 4: COMMANDS
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
 
-"Deduplication"
+Skipping
 import tries to import only the transactions which are new since the
-last import.  So if your bank's CSV includes the last three months of
-data, you can download and import it every month (or week, or day) and
-only the new transactions will be imported each time.
+last import, "skipping over" any that it saw last time.  So if your
+bank's CSV includes the last three months of data, you can download and
+import it every month (or week, or day) and only the new transactions
+will be imported each time.
 
-It works as follows.  For each imported FILE (usually a CSV file): - It
-tries to find the latest date seen previously, by reading it from a
-hidden .latest.FILE in the same directory.  - Then it processes FILE,
-ignoring any transactions on or before the "latest seen" date.
+It works as follows.  For each imported FILE:
+
+o It tries to find the latest date seen previously, by reading it from
+  a hidden .latest.FILE in the same directory.
+o Then it processes FILE, ignoring any transactions on or before the
+  "latest seen" date.
 
 And after a successful import, it updates the .latest.FILE(s) for next
 time (unless --dry-run was used).
 
-This is simple but fairly effective.  It assumes:
+This is a simple system that works fairly well for transaction data
+(usually CSV, but it could be any of hledger's input formats).  It
+assumes:
 
 1. new items always have the newest dates
@@ -7749,11 +7754,15 @@ PART 4: COMMANDS
 Note, import avoids reprocessing the same dates across successive runs,
 but it does not detect transactions that are duplicated within a single
-run.  So eg if you downloaded but did not import bank.1.csv, and later
-downloaded bank.2.csv with overlapping data, you should not import both
-of them in a single run (hledger import bank.1.csv bank.2.csv); in-
-stead, import them one at a time (hledger import bank.1.csv, then
-hledger import bank.2.csv).
+run.  I'll call these "skipping" and "deduplication".
+
+So for example, say you downloaded but did not import bank.1.csv, and
+later downloaded bank.2.csv with overlapping data.  Then you should not
+import both of them at once (hledger import bank.1.csv bank.2.csv), as
+the overlapping data would appear twice and not be deduplicated.  In-
+stead, import them one at a time (hledger import bank.1.csv; hledger
+import bank.2.csv), and the second import will skip the overlapping
+data.
 
 Normally you can ignore the .latest.* files, but if needed, you can
 delete them (to make all transactions unseen), or construct/modify them
@@ -7763,7 +7772,7 @@ PART 4: COMMANDS
 ring on that date".
 
 (hledger print --new also uses and updates these .latest.* files, but
-it is not often used.)
+it is less often used.)
 
 Related: CSV > Working with CSV > Deduplicating, importing.