2023-02-12	AWK replaces LibreCalc

  For the longest  time I used LibreCalc  to manage a list  of books I
  read. Every time  I finished a book (not  counting subliterature and
  non-fiction) I would append a new entry to the list and then sort it
  by the author's name. No formulas, no fancy coloring, just plain old
  cells containing text. For this simple purpose the heavy LibreOffice
  suite[1] is a bit of overkill, so I decided to do it in AWK instead.

  I'm really  fascinated with AWK.  It's an ancient and  arcane albeit
  very  powerful  language. For  example  the  following code  removes
  duplicate lines from a file while maintaining order:

    awk '!a[$0]++'

  Mind = blown.  On stackexchange you can find  a thorough explanation
  of  why this  works[2].  I always  imagine  some thick-bearded  UNIX
  hacker/magician  from  the  70s  coming  up with  this  stuff  in  a
  dusty-gray cellar of some US university.
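
  To make the trick a bit less magical, here is a small sketch that
  runs the one-liner next to a spelled-out version of the same idea.
  The sample input and the array name "seen" are made up for
  illustration; the explanation in [2] remains the thorough one.

```shell
# Sample input with repeated lines (invented for this sketch).
input='apple
pear
apple
fig
pear'

# Terse form: the pattern is true only the first time a line occurs,
# and a true pattern without an action prints the line.
short=$(printf '%s\n' "$input" | awk '!a[$0]++')

# The same logic spelled out with a more descriptive array name.
long=$(printf '%s\n' "$input" | awk '{
    if (seen[$0] == 0)   # an unset entry compares equal to 0
        print $0
    seen[$0]++           # remember that this line has occurred
}')

printf '%s\n' "$short"
```

  Both variants should print apple, pear and fig once each, in the
  order of their first appearance.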

  Back to my literature list. First  things first: I exported the file
  from LibreCalc to a CSV file. Its header looks like this:

    Autor	Titel	Erscheinungsjahr	Sprache	...

  It uses TAB  as the field separator. I  also have some extra  fields
  with annotations  and categories, but not  every one of these fields
  actually contains a value.
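
  As a rough illustration of why the TAB separator matters for those
  empty fields: with FS set to a single TAB, AWK keeps an empty field
  in place instead of skipping it, so the column numbers stay stable.
  The two data rows below are invented; only the header names come
  from the real file.

```shell
# Tiny stand-in for the exported list; the second line has an empty
# year field (two TABs in a row).
sample=$(printf 'Autor\tTitel\tErscheinungsjahr\tSprache\nA\tX\t\tDeutsch\nB\tY\t1953\tEnglisch\n')

# Print the number of fields and the language column for each entry.
langs=$(printf '%s\n' "$sample" | awk 'BEGIN { FS="\t" } NR > 1 { print NF, $4 }')

printf '%s\n' "$langs"
```

  Both rows should report four fields, and $4 is the language in
  either case, even though the year is missing in the first one.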

  Next I created  an org-file from which I run  my awk-code and output
  the results. To set up AWK  to work with org-mode code blocks[3] you
  have to evaluate the following Emacs Lisp snippet (C-c C-c with  the
  cursor inside the block):

    #+BEGIN_SRC elisp :results none
    (org-babel-do-load-languages
     'org-babel-load-languages
     '((awk . t)(shell . t)))
    #+END_SRC

  I put  this in my  config file. To view  the author, title  and year
  fields  of all  entries,  sorted  by the  author's  name, I run  the
  following code:

    #+BEGIN_SRC awk :in-file literaturliste.csv
    BEGIN { FS="\t"; OFS="\t" }
    NR>1 { print $1,$2,$3|"sort -t '\t'" }
    #+END_SRC

    #+RESULTS:
    | Abe, Kōbō             | The Woman in the Dunes       | 1962 |
    | Aitmatow, Tschingis   | Der Junge und das Meer       | 1977 |
    | Aitmatow, Tschingis   | Der Weg des Schnitters       | 1963 |
    | Aitmatow, Tschingis   | Djamila                      | 1958 |
    | Apitz, Bruno          | Nackt unter Wölfen           | 1958 |
    | Balzac, Honoré de     | Tante Lisbeth                | 1846 |
    | Balzac, Honoré de     | Vater Goriot                 | 1835 |
    | Bradbury, Ray         | Fahrenheit 451               | 1953 |
    | Brecht, Bertolt       | Der gute Mensch von Sezuan   | 1943 |
    | Brecht, Bertolt       | Leben des Galilei            | 1939 |
    | Brontë, Emily         | Wuthering Heights            | 1847 |
    [...]

  As you  can see the output  is formatted as an org-table by default.
  Very convenient.

  Sometimes I  wonder what books I  recently read or how  many books I
  read last year. It's easy to check with something like this:

    #+BEGIN_SRC awk :in-file literaturliste.csv
    BEGIN { FS="\t"; OFS="\t" }
    NR > 1 && length($7) { print $1,$2,$7|"sort -r -t '\t' -k3" }
    #+END_SRC

    #+RESULTS:
    | Zola, Émile         | Das Werk                  | 2023-01-23 |
    | Kawabata, Yasunari  | Tausend Kraniche          | 2023-01-05 |
    | Dostojewski, Fjodor | Schuld und Sühne          | 2022-12-12 |
    | Kawabata, Yasunari  | Snow Country              | 2022-11-27 |
    | Steinbeck, John     | Jenseits von Eden         | 2022-10-31 |
    | Zweig, Stefan       | Schachnovelle             | 2022-07-10 |
    | Balzac, Honoré de   | Tante Lisbeth             | 2022-06-15 |
    | Zola, Émile         | Nana                      | 2022-05-11 |
    | Balzac, Honoré de   | Vater Goriot              | 2022-04-22 |
    | Michener, James A.  | Sayonara                  | 2022-04-05 |
    | Houellebecq, Michel | Vernichten                | 2022-04-02 |
    | Zola, Émile         | Die Sünde des Abbé Mouret | 2022-02-12 |
    | McCarthy, Cormac    | Blood Meridian            | 2022-01-18 |
    [...]
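
  The listing answers the "how many books last year" question only if
  you count rows by eye. A possible variation, sketched here as a
  plain shell pipeline outside of org-mode, tallies the entries per
  year instead. It assumes that field 7 holds a date in YYYY-MM-DD
  form as in the results above; the sample rows and the placeholder
  column names c3 to c7 are invented.

```shell
# Invented sample data with seven TAB-separated columns; the last row
# has no date in column 7 and is therefore skipped.
rows() {
    printf 'Autor\tTitel\tc3\tc4\tc5\tc6\tc7\n'
    printf 'A\tBook1\t\t\t\t\t2022-01-18\n'
    printf 'B\tBook2\t\t\t\t\t2022-12-12\n'
    printf 'C\tBook3\t\t\t\t\t2023-01-05\n'
    printf 'D\tBook4\t\t\t\t\t\n'
}

# Count entries per year, using the first four characters of the date.
per_year=$(rows | awk '
    BEGIN { FS="\t"; OFS="\t" }
    NR > 1 && length($7) { count[substr($7, 1, 4)]++ }
    END { for (year in count) print year, count[year] }
' | sort)

printf '%s\n' "$per_year"
```

  For the sample rows this should give 2 for 2022 and 1 for 2023. The
  same AWK program could go into a src block with the :in-file header
  used above.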

  And finally  I want to view  the most common languages  in which the
  books in my list are written.

    #+BEGIN_SRC awk :in-file literaturliste.csv
    BEGIN { FS="\t"; OFS="\t" }
    NR > 1 { count[$4]++ }
    END {
        for (lang in count) {
            print lang,count[lang]|"sort -nrt '\t' -k2"
        }
    }
    #+END_SRC

    #+RESULTS:
    | Deutsch     | 81 |
    | Englisch    | 40 |
    | Russisch    | 24 |
    | Französisch | 22 |
    | Japanisch   |  5 |
    | Spanisch    |  2 |
    | Italienisch |  2 |

  Of course I only scratched the  surface of what you can do with this
  type of setup, but  I'll leave it at that. Have  I already mentioned
  that I really like plain text?




Footnotes
~~~~~~~~~

[1] https://www.libreoffice.org/

[2] https://unix.stackexchange.com/questions/159695/how-does-awk-a0-work

[3] https://orgmode.org/manual/Working-with-Source-Code.html