TITLE: Sort and filter .bib files
DATE: 2020-08-24
AUTHOR: John L. Godlee
====================================================================


I have a master .bib file called lib.bib, which contains all the 
BibTeX citations I've ever used. I use it as an index for 
organising which papers I've read and it means it's easier to reuse 
references for new manuscripts. When I submit a manuscript to a 
journal however, I want to include a standalone .bib file which 
only contains the references for the current manuscript. I wrote a 
little shell script which leverages ripgrep (rg) to create a sorted 
and filtered .bib file:

  [ripgrep (rg)]: https://github.com/BurntSushi/ripgrep

    #!/usr/bin/env sh

    # $1 = .tex file to find citations 
    # $2 = .bib file to grab entries

    match=($(rg -o '\\cite.*?\{.*?\}' $1 | sed -E 
"s/\\\\cite.*?\{(.*)\}/\1/g" | sed 's/,\s\+/\n/g' | sort | uniq))

    for i in "${match[@]}"; do 
        rg -N --color never --multiline --multiline-dotall 
"\{$i.*?^\}" $2
    done 

The script takes two arguments, the first is a .tex file which is 
searched to find all instances or \citep{.*}, \citealt{.*} etc., 
the second is a .bib file which is used to grab the references 
found in the .tex file.

Going line by line, first I create an array variable which contains 
all the BibTeX reference keys from the .tex file. It sorts these 
entries alphabetically and removes duplicates to create a tidy 
list. Next I loop over the array variable and search for each 
BibTeX entry in the .bib file, which is then printed to stdout so 
the user can do what they want with it, usually send to a new .bib 
file.