proxy70

TITLE: Converting a bank .csv statement to ledger
DATE: 2018-08-12
AUTHOR: John L. Godlee
====================================================================


I use ledger to keep track of my expenses and for a long time I had 
to manually copy in all my expenses manually from the csv output 
that my bank provides of my transactions. Then recently I read 
about ledger convert which can take a csv file and convert it to 
the ledger journal format.

  [ledger]: https://www.ledger-cli.org/

y bank's csv format looks like this:


    Date, Type, Description, Value, Balance, Account Name, Account 
Number

    10/08/2018,POS,"'1062 09AUG18 C , TESCO STORES 
",-11.68,6355.61,"'STUDENT ACCOUNT","'260204-20408582",
    10/08/2018,POS,"'1062 09AUG18 C , SAINSBURYS S/MKTS , 
GB",-3.80,2367.29,"'STUDENT ACCOUNT","'260204-20408582",
    10/08/2018,POS,"'1062 09AUG18 C , WINES , NTMG 
GB",-17.00,2371.09,"'STUDENT ACCOUNT","'260204-20408582",
    10/08/2018,POS,"'1062 09AUG18 , MOBILE APP  
",-30.00,2388.09,"'STUDENT ACCOUNT","'260204-20408582",

Notice that there are blank lines above and below the header row, 
so that is one of the first things to deal with.

I can fix the file with a few sed commands, noting that I use the 
macOS version of sed rather than gnused:

    sed -i "" '/^[[:space:]]*$/d' $1

    sed -i "" '/\d{4}\s\d{2}\D{3}\d{2}(\sC\s)?\s?,\s/g' $1

    sed -i "" '1s/.*/date,,payee,amount,,,/' $1

The first command removed any lines that are blank or contain 
spaces, which gets rid of the empty header lines. The second line 
removes some useless filler text in the notes column of the csv 
file, so for instance, ,"'1062 09AUG18 C , TESCO STORES ", gets 
contracted to ,"TESCO STORES ",. Finally, the header row is 
replaced with headers which ledger convert recognises.

ledger convert needs some inputs:

-   --input-date-format "%d/%m/%Y" tells ledger convert the format 
of the date column in the csv file.
-   --account assets:bank:student_acc denotes the bank account the 
csv file is for.
-   --rich-data extracts unnamed columns from the csv and adds them 
as notes to each ledger entry.
-   -f ~/.ledger.journal denotes the path to the ledger journal 
file to use for reference.
-   --invert inverts the sign of the transactions, which I have to 
do for this style of csv to stop ledger thinking that expenses are 
incomes and vice versa.

Then I can wrap all of this into a neat shell script which takes 
inputs of the input csv and output ledger journal:

    #!/bin/bash

    touch $2

    sed -i "" '/^[[:space:]]*$/d' $1

    sed -i "" '/\d{4}\s\d{2}\D{3}\d{2}(\sC\s)?\s?,\s/g' $1

    sed -i "" '1s/.*/date,,payee,amount,,,/' $1

    ledger convert $1 --input-date-format "%d/%m/%Y" --account 
assets:bank:student_acc --invert --rich-data -f ~/.ledger.journal > 
$2

For now, I'll copy in the compiled journal entries manually, I 
don't want to accidentally copy over my existing journal file by 
writing directly to it.

It's not a perfect system, I still have to manually fill in the 
type of expense in the ledger journal, but I really don't see any 
way around that as my expenses don't come from a finite list of 
sources, so categorising all of them would be impossible.

Update 2019_08_26

I had some problems with CSV files from one of my accounts. It 
turns out they were being stupid with the CSV formatting and had 
open double quotes which weren't closed in a description field, 
meaning that any commas in that field were interpreted as column 
delimiters. I took it as an excuse to improve the ledger convert 
bash scripts I'd set up previously. I wanted to clean them up and 
use some shell scripting techniques I've learned since last year 
when I wrote the original script.

    #!/bin/bash

    # Make temp. file
    temp=$(mktemp)

    # Format file depending on account
    if [ "$2" = "assets:bank:student" ] || [ "$2" = 
"assets:bank:isa" ] || [ "$2" = "assets:bank:res" ]
    then
        # Prepare file
        ## Remove blank lines | Replace column headers | Remove 
single quotes > send to temp
        sed '/^[[:space:]]*$/d' $1 | sed 
'1s/.*/date,,payee,amount,,,/' | sed "s/'//" > $temp
        date="%d/%m/%Y"

    elif [ "$2" = "assets:bank:monzo" ]
    then
        echo "TEST"
        # Prepare file
        ## Replace column headers | Change tag | Remove time from 
date > send to temp
        sed '1s/.*/transid,date,amount,,,,payee,,,,,/' $1 | sed 
's/general/misc/g' | sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z//g' 
> $temp
        date="%Y-%m-%d"

    else
        echo "Choose an account"
        echo "  assets:bank:student"
        echo "  assets:bank:isa"
        echo "  assets:bank:res"
        echo "  assets:bank:monzo"
    fi

    # Run ledger
    ledger convert $temp --input-date-format ${date} --account $2 
--invert --rich-data --auto-match -f ~/.ledger.journal

    # Remove temp
    rm ${temp}

I combined the scripts I had for my two banks into one, hinging on 
an if-else statement acting on the account name given as an 
argument to the script. It first cleans up the csv files and saves 
them to a $temp file, then runs ledger convert with certain options 
and sends output to STDOUT.