;doc: import: edits
This commit is contained in: parent 061580f0bc, commit 13cf8de1a1
@ -27,9 +27,6 @@ most common import source, and these docs focus on that case.
So if your bank's CSV includes the last three months of data, you can download and `import` it every month (or week, or day)
and only the new transactions will be imported each time.

It works as follows: for each imported `FILE`:

- It tries to recall the latest date seen previously, reading it from a hidden `.latest.FILE` in the same directory.
@ -50,12 +47,17 @@ you can reduce the chance of this happening in new transactions by importing mor
and in old transactions it doesn't matter.
And remember you can use CSV rules files as input, which is one way to ensure a stable file name.

Note this is a particular kind of "deduplication":
avoiding reprocessing the same dates across successive runs.
`import` doesn't detect other kinds of duplication,
such as the same transaction appearing multiple times within a single run.
This is intentional, because legitimate "duplicates" are fairly common in real-world data.

Here's a situation where you would need to run `import` the right way to deduplicate.
Say you download but forget to import `bank.1.csv`, and a week later you download `bank.2.csv` with some overlapping data.
Now you should not process both of these as a single import (`hledger import bank.1.csv bank.2.csv`),
because the overlapping transactions would not be deduplicated.
Instead you would import one file at a time, using the same filename each time, like so:

```cli
$ mv bank.1.csv bank.csv; hledger import bank.csv
$ mv bank.2.csv bank.csv; hledger import bank.csv
```
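Under the same assumptions as before, here is a hypothetical Python simulation of this one-file-at-a-time workflow (illustrative names, not hledger code): because both runs use the stable name `bank.csv`, they share one latest-seen state, so the overlapping dates in the second download are skipped.

```python
import tempfile
from pathlib import Path

def import_csv(path: Path, dates: list[str]) -> list[str]:
    # Simplified date-based dedup: keep only dates after the remembered
    # latest, then update the hidden .latest.FILE (hypothetical sketch).
    latest_file = path.parent / f".latest.{path.name}"
    latest = latest_file.read_text().strip() if latest_file.exists() else ""
    new = [d for d in dates if d > latest]
    if dates:
        latest_file.write_text(max(dates))
    return new

workdir = Path(tempfile.mkdtemp())
bank = workdir / "bank.csv"  # stable filename shared by both runs

# bank.1.csv covered Jan 1-3; bank.2.csv covers Jan 2-5 (overlapping Jan 2-3).
first = import_csv(bank, ["2024-01-01", "2024-01-02", "2024-01-03"])
second = import_csv(bank, ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"])

print(first)   # → ['2024-01-01', '2024-01-02', '2024-01-03']
print(second)  # → ['2024-01-04', '2024-01-05']
```

Had both files been passed to a single run instead, this per-file latest-date state would not separate them, which is why the docs recommend importing them one at a time.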