;doc: update command help

2024-03-24 14:51:25 -10:00 · 2024-03-24 14:51:25 -10:00 · 2889bb6efb
commit 2889bb6efb
parent eb6b94ad5a
1 changed files with 24 additions and 15 deletions
--- a/hledger/Hledger/Cli/Commands/Import.txt
+++ b/hledger/Hledger/Cli/Commands/Import.txt
@@ -21,22 +21,27 @@ hledger import bank.csv or perhaps hledger import *.csv.
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
-"Deduplication"
+Skipping
 import tries to import only the transactions which are new since the
-last import. So if your bank's CSV includes the last three months of
+last import, "skipping over" any that it saw last time. So if your
-data, you can download and import it every month (or week, or day) and
+bank's CSV includes the last three months of data, you can download and
-only the new transactions will be imported each time.
+import it every month (or week, or day) and only the new transactions
 will be imported each time.
-It works as follows. For each imported FILE (usually a CSV file): - It
+It works as follows. For each imported FILE:
-tries to find the latest date seen previously, by reading it from a
+
-hidden .latest.FILE in the same directory. - Then it processes FILE,
+-   It tries to find the latest date seen previously, by reading it from
-ignoring any transactions on or before the "latest seen" date.
+    a hidden .latest.FILE in the same directory.
 -   Then it processes FILE, ignoring any transactions on or before the
    "latest seen" date.
 And after a successful import, it updates the .latest.FILE(s) for next
 time (unless --dry-run was used).
-This is simple but fairly effective. It assumes:
+This is a simple system that works fairly well for transaction data
 (usually CSV, but it could be any of hledger's input formats). It
 assumes:
 1.  new items always have the newest dates
 2.  item dates are stable across successive CSV downloads
@@ -49,11 +54,15 @@ by importing more often (and in old transactions it doesn't matter).
 Note, import avoids reprocessing the same dates across successive runs,
 but it does not detect transactions that are duplicated within a single
-run. So eg if you downloaded but did not import bank.1.csv, and later
+run. I'll call these "skipping" and "deduplication".
-downloaded bank.2.csv with overlapping data, you should not import both
+
-of them in a single run (hledger import bank.1.csv bank.2.csv); instead,
+So for example, say you downloaded but did not import bank.1.csv, and
-import them one at a time (hledger import bank.1.csv, then
+later downloaded bank.2.csv with overlapping data. Then you should not
-hledger import bank.2.csv).
+import both of them at once (hledger import bank.1.csv bank.2.csv), as
 the overlapping data would appear twice and not be deduplicated.
 Instead, import them one at a time
 (hledger import bank.1.csv; hledger import bank.2.csv), and the second
 import will skip the overlapping data.
 Normally you can ignore the .latest.* files, but if needed, you can
 delete them (to make all transactions unseen), or construct/modify them
@@ -63,7 +72,7 @@ seen transactions up to this date, and this many of them occurring on
 that date".
 (hledger print --new also uses and updates these .latest.* files, but it
-is not often used.)
+is less often used.)
 Related: CSV > Working with CSV > Deduplicating, importing.
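
For reviewers, the "skipping" rule described in the new help text can be sketched roughly as follows. This is a minimal Python illustration, not hledger's actual implementation (hledger is written in Haskell). The function name `import_new`, the tuple representation of transactions, and the in-memory `latest_seen` value standing in for the contents of .latest.FILE are all assumptions made for the example. It also ignores the per-date count that the real .latest.* files record, and models only the "on or before the latest seen date" rule for one FILE downloaded repeatedly under the same name.

```python
from datetime import date

def import_new(transactions, latest_seen, dry_run=False):
    """Hypothetical sketch of the skipping rule for a single FILE.

    transactions: list of (date, description) tuples read from FILE.
    latest_seen:  the date remembered from the previous import
                  (stand-in for .latest.FILE), or None if there is none.
    Returns (new_transactions, latest_seen_for_next_time).
    """
    # Ignore any transactions on or before the "latest seen" date.
    new = [t for t in transactions
           if latest_seen is None or t[0] > latest_seen]

    # After a successful import, remember the newest date for next time,
    # unless this was a dry run.
    if dry_run or not new:
        return new, latest_seen
    return new, max(t[0] for t in new)

# First download of bank.csv: nothing seen before, so everything is new.
download_1 = [(date(2024, 3, 1), "rent"), (date(2024, 3, 5), "groceries")]
first, latest = import_new(download_1, None)

# A later download of bank.csv overlaps the first one; only the
# transaction dated after the remembered date is imported.
download_2 = [(date(2024, 3, 5), "groceries"), (date(2024, 3, 9), "coffee")]
second, latest_after = import_new(download_2, latest)
```

Because the rule compares dates only, it depends on the two assumptions listed in the help text: new items always have the newest dates, and item dates are stable across successive downloads.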