;doc: update command help

2024-03-24 14:51:25 -10:00 · 2024-03-24 14:51:25 -10:00 · 2889bb6efb
commit 2889bb6efb
parent eb6b94ad5a
1 changed file with 24 additions and 15 deletions
--- a/hledger/Hledger/Cli/Commands/Import.txt
+++ b/hledger/Hledger/Cli/Commands/Import.txt
@@ -21,22 +21,27 @@ hledger import bank.csv or perhaps hledger import *.csv.
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.

-"Deduplication"
+Skipping

 import tries to import only the transactions which are new since the
-last import. So if your bank's CSV includes the last three months of
-data, you can download and import it every month (or week, or day) and
-only the new transactions will be imported each time.
+last import, "skipping over" any that it saw last time. So if your
+bank's CSV includes the last three months of data, you can download and
+import it every month (or week, or day) and only the new transactions
+will be imported each time.

-It works as follows. For each imported FILE (usually a CSV file): - It
-tries to find the latest date seen previously, by reading it from a
-hidden .latest.FILE in the same directory. - Then it processes FILE,
-ignoring any transactions on or before the "latest seen" date.
+It works as follows. For each imported FILE:
+
+-   It tries to find the latest date seen previously, by reading it from
+    a hidden .latest.FILE in the same directory.
+-   Then it processes FILE, ignoring any transactions on or before the
+    "latest seen" date.

 And after a successful import, it updates the .latest.FILE(s) for next
 time (unless --dry-run was used).

-This is simple but fairly effective. It assumes:
+This is a simple system that works fairly well for transaction data
+(usually CSV, but it could be any of hledger's input formats). It
+assumes:

 1.  new items always have the newest dates
 2.  item dates are stable across successive CSV downloads
@@ -49,11 +54,15 @@ by importing more often (and in old transactions it doesn't matter).

 Note, import avoids reprocessing the same dates across successive runs,
 but it does not detect transactions that are duplicated within a single
-run. So eg if you downloaded but did not import bank.1.csv, and later
-downloaded bank.2.csv with overlapping data, you should not import both
-of them in a single run (hledger import bank.1.csv bank.2.csv); instead,
-import them one at a time (hledger import bank.1.csv, then
-hledger import bank.2.csv).
+run. I'll call these "skipping" and "deduplication".
+
+So for example, say you downloaded but did not import bank.1.csv, and
+later downloaded bank.2.csv with overlapping data. Then you should not
+import both of them at once (hledger import bank.1.csv bank.2.csv), as
+the overlapping data would appear twice and not be deduplicated.
+Instead, import them one at a time
+(hledger import bank.1.csv; hledger import bank.2.csv), and the second
+import will skip the overlapping data.

 Normally you can ignore the .latest.* files, but if needed, you can
 delete them (to make all transactions unseen), or construct/modify them
@@ -63,7 +72,7 @@ seen transactions up to this date, and this many of them occurring on
 that date".

 (hledger print --new also uses and updates these .latest.* files, but it
-is not often used.)
+is less often used.)

 Related: CSV > Working with CSV > Deduplicating, importing.
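
Illustrative note (not part of the commit): the "skipping" behaviour described in the help text can be sketched roughly as below. This is a minimal model in Python, not hledger's actual implementation. The function name skip_seen, the (date, description) tuples, and the in-memory state are all hypothetical; the state stands in for what the help text describes a .latest.FILE as recording, namely the latest date seen and how many transactions occurred on that date. It assumes each file's transactions are sorted by date.

```python
from datetime import date

def skip_seen(txns, latest=None, seen_on_latest=0):
    """Hypothetical model of import's skipping step.

    txns: list of (date, description), assumed sorted by date.
    latest / seen_on_latest: state remembered from the previous run
    (the latest date seen, and how many transactions fell on it).
    Returns (new_txns, new_latest, new_seen_on_latest).
    """
    new = []
    on_latest = 0
    for d, desc in txns:
        if latest is not None:
            if d < latest:
                continue  # older than the latest seen date: skip
            if d == latest:
                on_latest += 1
                if on_latest <= seen_on_latest:
                    continue  # already seen on the latest date: skip
        new.append((d, desc))

    if not txns:
        return new, latest, seen_on_latest
    newest = max(d for d, _ in txns)
    if latest is not None and newest < latest:
        return new, latest, seen_on_latest  # nothing newer: keep old state
    count = sum(1 for d, _ in txns if d == newest)
    return new, newest, count

# Two overlapping downloads, imported one at a time.
bank1 = [(date(2024, 3, 1), "a"), (date(2024, 3, 2), "b")]
bank2 = [(date(2024, 3, 1), "a"), (date(2024, 3, 2), "b"),
         (date(2024, 3, 2), "c"), (date(2024, 3, 3), "d")]

new1, latest, n = skip_seen(bank1)
new2, latest2, n2 = skip_seen(bank2, latest, n)

# Both files in a single run: no state separates them, so nothing
# is skipped and the overlapping items appear twice.
both, _, _ = skip_seen(bank1 + bank2)
```

In this model the second import yields only the items "c" and "d", while the single combined run yields all six items including the duplicates, which matches the help text's distinction between skipping (across runs) and deduplication (within a run, not done).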