TITLE: An R function to fill abbreviated genus names in a list of species
DATE: 2018-10-15
AUTHOR: John L. Godlee
====================================================================


I had a list of species names written up by a colleague, who had
abbreviated each genus that repeated the one directly above it to
its first letter followed by a dot. That is common in written
prose, but is pretty daft in a dataset.

Instead of going through and manually writing in all the genus
names, I wrote a function in R to do it for me:

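    fill.genus <- function(x, abbrev = "."){
        # Run-length encode: each run of identical names is one entry
        rel_enc <- rle(as.character(x))
        # Runs that contain the abbreviation marker, e.g. "T."
        empty <- which(grepl(abbrev, rel_enc$values, fixed = TRUE))
        # Overwrite each abbreviated run with the run just before it
        rel_enc$values[empty] <- rel_enc$values[empty - 1]
        inverse.rle(rel_enc)
    }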
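
The function leans on run-length encoding, so it may help to see
the intermediate steps on their own. This is only a sketch on a
made-up four-item vector (the names x, enc, abbr and filled are
mine, not part of the function above):

```r
# Hypothetical toy input, not taken from the real species list
x <- c("Acacia", "A.", "A.", "Burkea")

# rle() collapses each run of identical adjacent values into one entry
enc <- rle(x)
enc$values   # "Acacia" "A." "Burkea"
enc$lengths  # 1 2 1

# Find the runs holding an abbreviation (a literal dot) and copy in
# the value of the run immediately before each one
abbr <- which(grepl(".", enc$values, fixed = TRUE))
enc$values[abbr] <- enc$values[abbr - 1]

# inverse.rle() expands the runs back out to the original length
filled <- inverse.rle(enc)
```

Here filled ends up as c("Acacia", "Acacia", "Acacia", "Burkea").
Because the two "A." entries form a single run, one replacement
fills both of them.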

So if the dataset looks like this:

    genus <- c("Tapiphyllum",
        "Terminalia",
        "T.",
        "Tortuga",
        "T.",
        "Vangueriopsis",
        "V.",
        "V.",
        "Xeroderris",
        "Xylopia")

The output of fill.genus(genus) would look like:

    c("Tapiphyllum",
        "Terminalia",
        "Terminalia",
        "Tortuga",
        "Tortuga",
        "Vangueriopsis",
        "Vangueriopsis",
        "Vangueriopsis",
        "Xeroderris",
        "Xylopia")
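
The approach assumes every abbreviation sits directly below the
genus it stands for. If the list might start with an abbreviation,
or have a "V." straight after a genus beginning with T, a stricter
variant could stop with an error instead of filling in the wrong
name. A rough sketch, assuming that failing loudly is the behaviour
you want (fill.genus.checked is a hypothetical name of mine):

```r
# Sketch of a stricter variant; fill.genus.checked is a hypothetical name
fill.genus.checked <- function(x, abbrev = ".") {
    rel_enc <- rle(as.character(x))
    empty <- which(grepl(abbrev, rel_enc$values, fixed = TRUE))

    # An abbreviation in the first run has no preceding genus to copy
    if (any(empty == 1)) {
        stop("the first entry is abbreviated, so there is no genus to copy")
    }

    # The first letter of each abbreviation should match the genus above it
    first_abbrev <- substr(rel_enc$values[empty], 1, 1)
    first_prev <- substr(rel_enc$values[empty - 1], 1, 1)
    if (any(first_abbrev != first_prev)) {
        stop("an abbreviation does not match the genus before it")
    }

    rel_enc$values[empty] <- rel_enc$values[empty - 1]
    inverse.rle(rel_enc)
}
```

On a well-formed list it should behave like the original function,
and on a malformed one it stops rather than guessing.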