;doc: update command help

2024-03-24 14:22:37 -10:00 · 2024-03-24 14:22:37 -10:00 · 70b75e4921
commit 70b75e4921
parent be24d6505f
1 changed files with 35 additions and 34 deletions
--- a/hledger/Hledger/Cli/Commands/Import.txt
+++ b/hledger/Hledger/Cli/Commands/Import.txt
@@ -21,48 +21,49 @@ hledger import bank.csv or perhaps hledger import *.csv.
 Note you can import from any file format, though CSV files are the most
 common import source, and these docs focus on that case.
-Deduplication
+"Deduplication"
-import does time-based deduplication, to detect only the new
+import tries to import only the transactions which are new since the
-transactions since the last successful import. (This does not mean
+last import. So if your bank's CSV includes the last three months of
-"ignore transactions that look the same", but rather "ignore
+data, you can download and import it every month (or week, or day) and
-transactions that have been seen before".) This is intended for when you
+only the new transactions will be imported each time.
 are periodically importing downloaded data, which may overlap with
 previous downloads. Eg if every week (or every day) you download a
 bank's last three months of CSV data, you can safely run
 hledger import thebank.csv each time and only new transactions will be
 imported.
-Since the items being read (CSV records, eg) often do not come with
+It works as follows. For each imported FILE (usually a CSV file): - It
-unique identifiers, hledger detects new transactions by date, assuming
+tries to find the latest date seen previously, by reading it from a
-that:
+hidden .latest.FILE in the same directory. - Then it processes FILE,
 ignoring any transactions on or before the "latest seen" date.
 And after a successful import, it updates the .latest.FILE(s) for next
 time (unless --dry-run was used).
 This is simple but fairly effective. It assumes:
 1.  new items always have the newest dates
-2.  item dates do not change across reads
+2.  item dates are stable across successive CSV downloads
-3.  and items with the same date remain in the same relative order
+3.  the order of same-date items is stable across CSV downloads
    across reads.
-These are often true of CSV files representing transactions, or true
+These are true of most CSV files representing transactions, or true
-enough so that it works pretty well in practice. 1 is important, but
+enough. If you have a bank whose CSV dates or ordering occasionally
-violations of 2 and 3 amongst the old transactions won't matter (and if
+changes, you can reduce the chance of this happening in new transactions
-you import often, the new transactions will be few, so less likely to be
+by importing more often (and in old transactions it doesn't matter).
 the ones affected).
-hledger remembers the latest date processed in each input file by saving
+Note, import avoids reprocessing the same dates across successive runs,
-a hidden ".latest.FILE" file in FILE's directory (after a succesful
+but it does not detect transactions that are duplicated within a single
-import).
+run. So eg if you downloaded but did not import bank.1.csv, and later
 downloaded bank.2.csv with overlapping data, you should not import both
 of them in a single run (hledger import bank.1.csv bank.2.csv); instead,
 import them one at a time (hledger import bank.1.csv, then
 hledger import bank.2.csv).
-Eg when reading finance/bank.csv, it will look for and update the
+Normally you can ignore the .latest.* files, but if needed, you can
-finance/.latest.bank.csv state file. The format is simple: one or more
+delete them (to make all transactions unseen), or construct/modify them
-lines containing the same ISO-format date (YYYY-MM-DD), meaning "I have
+(to catch up to a certain date). The format is just a single ISO-format
-processed transactions up to this date, and this many of them on that
+date (YYYY-MM-DD), possibly repeated on multiple lines. It means "I have
-date." Normally you won't see or manipulate these state files yourself.
+seen transactions up to this date, and this many of them occurring on
-But if needed, you can delete them to reset the state (making all
+that date".
 transactions "new"), or you can construct them to "catch up" to a
 certain date.
-Note deduplication (and updating of state files) can also be done by
+(hledger print --new also uses and updates these .latest.* files, but it
-print --new, but this is less often used.
+is not often used.)
 Related: CSV > Working with CSV > Deduplicating, importing.
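
The date-based scheme described in the new help text can be sketched roughly as follows. This is a minimal illustration in Python, not hledger's actual implementation (hledger is written in Haskell). The function names (parse_latest, select_new, format_latest) and the (date, payload) record shape are hypothetical. It also assumes records arrive in file order, and that the number of lines in the .latest file is used to skip that many already-seen records on the latest date, which is how the "this many of them occurring on that date" wording is read here.

```python
from datetime import date


def parse_latest(text):
    """Parse .latest.FILE contents: one ISO date, possibly repeated on several lines.

    Returns (latest_date, count_seen_on_that_date), or (None, 0) if empty.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None, 0
    return date.fromisoformat(lines[0]), len(lines)


def select_new(records, latest, seen_on_latest):
    """Return the records considered new, preserving their order.

    records: list of (date, payload) tuples in file order (assumed shape).
    Records dated before `latest` are skipped, as are the first
    `seen_on_latest` records dated exactly `latest`.
    """
    new = []
    skipped_on_latest = 0
    for rec_date, payload in records:
        if latest is not None:
            if rec_date < latest:
                continue
            if rec_date == latest and skipped_on_latest < seen_on_latest:
                skipped_on_latest += 1
                continue
        new.append((rec_date, payload))
    return new


def format_latest(records, old_text=""):
    """Build the next .latest.FILE contents from everything read in this run."""
    if not records:
        return old_text  # nothing read, so keep the previous state
    newest = max(d for d, _ in records)
    count = sum(1 for d, _ in records if d == newest)
    return f"{newest.isoformat()}\n" * count


records = [
    (date(2024, 3, 1), "a"),
    (date(2024, 3, 2), "b"),
    (date(2024, 3, 2), "c"),
    (date(2024, 3, 3), "d"),
]
latest, seen = parse_latest("2024-03-02\n")
new_records = select_new(records, latest, seen)
next_state = format_latest(records)
```

With a state file holding a single 2024-03-02 line, the sketch skips "a" (older) and "b" (the one record already seen on that date) and keeps "c" and "d". Because the state is a date plus a count rather than a set of identifiers, it relies on the three assumptions listed in the help text: new items have the newest dates, and dates and same-date ordering are stable across downloads.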