https://www.etymonline.com/columns/post/who-lusts-for-certainty-lusts-for-lies

Who Lusts for Certainty Lusts for Lies
September 21, 2023 at 1:40 pm

We need to talk about the Google Ngram Viewer n-grams. They are wrong. [D.R.H.]

Here's the Ngram's idea of the frequency of the word "said":

[Image: Ngram chart for "said"]

It doesn't look like an indicator of the diachronic change in the popularity of a very common English verb during the 20th century. It looks like the temperature graph of the last ice age. Younger people, rest assured that English authors in the 1970s did not all stop using "said" and then start again.

Talia Felix and I plow in Google Books every day, researching. It is a marvel of a resource, but we know by experience how ineptly assembled that database is. And how many booby-traps lie hidden in it.

I cannot tell you much about how AI works. But I can tell you how AI handles something I know how to handle.

Here's another example.

"The Great Toast Famine of '77"

[Image: Ngram chart for "toast"]

Ngram says "toast" almost vanishes from the English language by 1980, and then it pops back up.

WHY THAT'S WRONG

There's a long-documented flaw in the Ngram formula, inherited from Google Books. The error makes a vast number of English words appear to be diminishing in use through the 20th century only to revive around 1980.

A rough gist of an explanation for it seems to be that Google Books' corpus is heavily academic. The printed matter Google sucked up from universities had a disproportion of modern scientific and academic journals in it. The articles in those journals and textbooks lean on the same few words (as academics are wont to do when they write). That not only bloats the scores for those few words, it falsely drives down the other words. That creates that mid-20th-century "dip" in the Ngram of almost every word.

"Said" likely appears less often in academic writing than in other writing, such as a novel or a newspaper.
But academic papers use words such as, say, "graph" a good deal more often. And here's what the Ngram for "graph" looks like in the 20th century:

[Image: Ngram chart for "graph"]

See? No dip.

That's just one error. Here's another: If you look at an Ngram for the F-word, you'll see very little use of it until modern times, which is expected. But the number of hits for it jumps up as you go back past about 1820, and keeps rising into the late 1700s (if you could see it). Those are all the word "suck," written with the old "long s" -- the printer's -s-. It looks like a lower-case -f- in worn-down fonts on cheap paper in old libraries. The use of that character faded out about 1820. Sometimes only context tells you whether it is an -f- or an -s-. AI has no clue.

Here's another: Google Books fails to recognize identity in variant spellings. The Ngram for "authorise" is different from that for "authorize," and neither counts "authorizes." Google doesn't count plural forms in the noun Ngrams. It can't tell "dog" from "dogs."

Worse, many of Google Books' files are misdated. On a battered library book, an "1896" on the cover page can look like "1800" to a digital scanner. A stack of Bible tracts from the 1910s long appeared in Google Books as published in 1799. That date did appear on all their covers -- on the logo of the Bible tract society that printed them, as the date of its founding.

I hardly trust Google Books' dated search results to be right five times in a row. We even made a video about it.

BUT PEOPLE WANT THEM

The text of Etymonline is built entirely from print sources, and is done entirely by human beings. Ngrams are not. They are unreliable, a sloppy product of an ignorant technology, one made to sell and distract, one never taught the difference between "influence" and "inform." Why are they on the site at all?

Because now, online, pictures win and words lose. The war is over; they won.

Just remember: Ngrams are unreliable.
Even if the world now prefers Ngram reality, where the word "said" went into eclipse with Jimmy Carter, you're allowed to be smarter than that. When you see an Ngram on etymonline or anywhere else, admire it as decorative, whimsical, a gourami tank in a restaurant, abstract art on hotel walls, blueprints for roller-coasters.

And where the Ngrams disagree with etymonline on a first date, presume we're right and they're wrong.

(c) 2001-2023 Douglas Harper
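
A footnote to the WHY THAT'S WRONG section: the way a changing corpus mix can push a word's apparent frequency down takes nothing but arithmetic to show. The sketch below is not Google's formula. Every rate in it is invented for illustration, and the names FICTION_RATE, ACADEMIC_RATE, and blended_rate are hypothetical. It assumes authors' habits never change and that only the share of academic text in the corpus grows.

```python
# Hypothetical per-million-token rates, made up for illustration only.
# They stand in for "how often a word turns up in this kind of writing."
FICTION_RATE = {"said": 4000, "graph": 5}
ACADEMIC_RATE = {"said": 300, "graph": 150}


def blended_rate(word, academic_pct):
    """Per-million rate of `word` in a corpus where `academic_pct`
    percent (0 to 100) of the tokens come from academic text and the
    rest from fiction. Usage within each kind of writing is held fixed."""
    fiction_pct = 100 - academic_pct
    return (academic_pct * ACADEMIC_RATE[word]
            + fiction_pct * FICTION_RATE[word]) / 100


# Only the mix of the corpus changes from row to row.
for pct in (10, 50, 90):
    print(pct, blended_rate("said", pct), blended_rate("graph", pct))
```

With these made-up rates, "said" appears to fall from 3630 to 670 per million as the academic share grows from 10 to 90 percent, while "graph" climbs from 19.5 to 135.5, though no author in the model changed a word.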