[HN Gopher] Who lusts for certainty lusts for lies
___________________________________________________________________
 
Who lusts for certainty lusts for lies
 
Author : hprotagonist
Score  : 349 points
Date   : 2023-09-26 10:50 UTC (12 hours ago)
 
web link (www.etymonline.com)
w3m dump (www.etymonline.com)
 
| laura_g wrote:
| What is it specifically about the 1970/80s that causes this dip?
| Was there an explosion of this academic writing around that era
| or something else to have this effect?
 
| thfuran wrote:
| That or maths. Though I seem to recall a quote about
| statistics...
 
  | [deleted]
 
  | hprotagonist wrote:
  | in the case of ngrams, both!
 
    | thfuran wrote:
    | Yes, I think (as the article says) using ngrams can easily
    | land you in the camp of telling lies with statistics.
 
| tensor wrote:
| The authors assert that the ngram statistics for "said" are
| wrong, and imply that they have evidence of the contrary, but
| they don't provide the evidence. Looking at their own website,
| all they provide is google ngram statistics:
| https://www.etymonline.com/word/said#etymonline_v_25922.
| 
| This coupled with the huge failing of not displaying zero on the
| y-axis of their graph, and even _interpreting_ the bad graph
| wrong, makes me not believe them at all. A very low quality
| article.
 
  | coldtea wrote:
  | A low effort comment. That "said" hasn't declined and risen
  | the way shown isn't what needs evidence.
  | 
  | It's the extraordinary claim that it has that does.
  | 
  | That claim is Google's, and before accusing the author of the
  | blog, maybe ask how representative their unseen dataset is.
  | Should we take statistics with no knowledge of their input set
  | at face value because "trust Google"?
 
    | tensor wrote:
    | Google isn't claiming any such statement. It's merely
    | providing fun statistics based on their data set. With that
    | context, when I read a headline claiming that the statistics
    | are "wrong," it would imply that the counts are somehow off.
    | Maybe due to a bug in the algorithm or the like.
    | 
    | Instead, we get a strawman put up where they misrepresent
    | what the data set is, make up things that its "claiming,"
    | fail to investigate the underlying data sources and look into
    | "why" they see the trend they see, and also fail to provide
    | any alternative data.
    | 
    | It's cheap and snobby grandstanding, ironically complete with
    | faulty interpretations of the little data they DO present.
 
      | mattigames wrote:
      | But Google is claiming such a thing by calling it "trends",
      | which the dictionary defines as "a general direction in
      | which something is developing or changing." If they didn't
      | want to create such misunderstandings they would just call
      | it "word frequency on Google books" so the biases of the
      | data would be a lot more clear.
 
  | prepend wrote:
  | It's hard to present evidence because there's only one source.
  | So the article basically calls out flaws in the methodology of
  | Google Books/Ngram.
  | 
  | I think this is reasonable. As otherwise we end up accepting
  | things that exist solely, but are flawed. Just because
  | something exists and is easy to use doesn't mean it's right.
  | 
  | Just like the answer to "the most tweeted thing is X therefore
  | it is most popular and important" does not require a separate
  | study to find the truth. It's acceptable just to say "this is a
  | stupid methodology, don't accept it just because that's what
  | twitter says."
 
  | lolc wrote:
  | A decline to half the usage of "said" within 6 decades,
  | followed by a recovery to the previous level within two
  | decades? Show me evidence that the English language changed so
  | fast in that way. It's extraordinary and you'd have to bring
  | something convincing. Otherwise I believe their hypothesis and
  | their conclusion that ngrams are bunk.
  | 
  | Yeah they interpreted the "toast" graph wrong. They should be
  | more careful to read shitty graphs that cut off at the low
  | point.
 
    | pixelesque wrote:
    | It's possible (but I think unlikely) it could be somewhat due
    | to different usage of words than the English language
    | changing completely (which clearly didn't happen).
    | 
    | i.e. maybe instead of lots of books having direct text like
    | "David said" or "Dora said", over time there was a trend to
    | use a different more varied/descriptive way of describing
    | that, i.e. "David replied" or "Dora retorted"?
 
      | lolc wrote:
      | Yea there may be a shift in usage hidden in those numbers.
      | As this article laments, we can't use ngrams to measure the
      | development of usage between said, replied, and retorted.
 
    | tensor wrote:
    | It depends entirely on what the data set is, and to conclude
    | that it's "wrong" you'd have to consider the underlying data
    | too. Google ngrams makes no claim to be a consistent
    | benchmark type data set. Over time the content it's based on
    | shifts, which can cause effects like this.
    | 
    | To make any sort of claim like "this word's usage changes
    | over time" in an academic sense you'd need to include a
    | discussion of the data sources you used and why those are
    | representative of word usage over time. The fact that they'd
    | even try to use google ngrams in this way shows how little
    | they actually researched the topic.
    | 
    | Google ngrams is a cute data set that can sometimes show
    | rough trends, but it's not some "authoritative source on
    | usage over time" and it doesn't claim to be.
    | 
    | The authors, on the other hand, are claiming to be
    | authoritative and thus the burden of evidence on their claims
    | is far far far higher. I didn't even get into their
    | completely unobjective and vague accusations of "AI" somehow
    | doing something bad. Ngrams don't involve AI, it's simple
    | word counting.
 
      | lolc wrote:
      | The way I read it, the article was a rant about how people
      | shouldn't be using ngrams to prove things.
 
  | lolinder wrote:
  | EtymOnline isn't in the business of tracking shifts in the
  | popularity of words over time, they set out to track shifts in
  | _meaning_. So it's understandable that they don't have any
  | specific contrary evidence in their listing for "said".
  | 
  | As for why they don't include the evidence in TFA, as others
  | have noted, it's the extraordinary claim that "said" dropped to
  | nearly 1/3 of its peak usage that needs extraordinary evidence
  | backing it up. It's plenty sufficient for them to say "this
  | doesn't make any sense at all on its face, and is most likely
  | due to a major shift in the genre makeup of Google's dataset".
 
  | wrsh07 wrote:
  | I think what you want is for someone (yourself, me, the author)
  | to review newspapers or some similar source and determine how
  | the frequency percent changes over time for the word "said".
  | 
  | This is a reasonable request, but I also think it's fine for
  | the author to state it _as an expert_ that newspapers continued
  | using said at a similar frequency. The story they tell is
  | plausible, and I don't really think the burden of proof is on
  | them.
 
| vlz wrote:
| While the point made by the authors is certainly a valid one,
| it's a bit sneaky and not very fitting to their overall message
| that they have the Y-axes on the ngram graphs not 0-indexed. This
| makes the google results seem more extreme than they in fact are
| and is a bit of misdirection in itself.
| 
| Compare e.g. to the actual ngram viewer which seems to index by 0
| per default:
| 
| https://books.google.com/ngrams/graph?content=said&year_star...
| 
| https://books.google.com/ngrams/graph?content=said&year_star...
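To put a number on the effect of a truncated axis, here is a minimal sketch in Python. The function name `apparent_drop` and the values are hypothetical and not taken from the article or from Google's data; it assumes the starting value sits at the top of the plot.

```python
def apparent_drop(start, end, axis_min=0.0):
    """Fraction of the plotted height that the line falls
    when the y-axis starts at axis_min instead of zero."""
    return (start - end) / (start - axis_min)

# Hypothetical series: a frequency that falls from 100 to 60 units.
actual = apparent_drop(100, 60)          # y-axis starts at zero
truncated = apparent_drop(100, 60, 50)   # y-axis starts at 50

print(f"zero-based axis: line falls {actual:.0%} of the plot height")
print(f"truncated axis:  line falls {truncated:.0%} of the plot height")
```

Under these assumed numbers, the same 40% decline fills twice as much of the chart once the axis is cut off at 50.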
 
  | boxed wrote:
  | Such a shame too as the point would be equally valid without
  | the graph-lies.
 
    | chefandy wrote:
    | Kind of. The author could fix a lot of their problems with
    | the very prominent dropdown above the graph letting them
    | select the collection-- English fiction for example. The long
    | s character can be tricky for OCR, but is not likely relevant
    | to most people's casual use of the tool. I worked on a team
    | that overcame it in a high volume scanning project so they
    | should be able to correct that with software and their
    | existing page images. The plurals criticism is just wrong--
    | you can even do case sensitive searches.
    | 
    | It's not perfect, but it's not useless, and it's not a
    | "lie"-- it's just a blunt instrument. Even if the criticism
    | was factually correct, 'proving' that you can't do fine work
    | with a blunt instrument is of dubious value.
    | 
    | I think a lot of folks around here are super thirsty to see
    | big tech companies get zinged and when it happens, their fact
    | checking skills suffer.
 
  | [deleted]
 
| stefantalpalaru wrote:
| [dead]
 
| nerdponx wrote:
| This is the fundamental problem of data analysis: your analysis
| is only as good as your data.
| 
| This is not an easy problem.
| 
| It's hard in general to evaluate data quality: How do we know
| when our data is good? Are we sure? How do we measure that and
| report on it?
| 
| If we do have some qualitative or quantitative assessment of data
| quality, how do we present it in a way that is integrated with
| the results of our analysis?
| 
| And if we want to quantitatively adjust our results for data
| quality, how do we do that?
| 
| There are answers to the above, but they lie beyond the realm of
| a simple line chart, and they tend to require a fair amount of
| custom effort for each project.
| 
| For example in the Google Ngrams case, one could present the data
| quality information on a chart showing the composition of data
| sources over time, broken out into broad categories like
| "academic" and "news". But then you have to assign categories to
| all those documents, which might be easy or hard depending on how
| they were obtained. And then you also have to post a link to that
| chart somewhere very prominently, so that people actually look at
| it, and maybe include some explanatory disclaimer text. That
| would help, but it's not going to prevent the intuitive reaction
| when a human looks at a time series of word usage declining.
| 
| Maybe a better option is to try to quantify the uncertainty in
| the word usage time series and overlay that on the chart. There
| are well-established visualization techniques for doing this. But
| how do we quantify uncertainty in word usage? In this case, our
| count of usages is exact: the only uncertainty is uncertainty
| related to sampling. In order to quantify uncertainty, we must
| estimate how much our sample of documents deviates from all
| documents written at that time. It might be doable, but it
| doesn't sound easy. And once we have that done, will people
| actually interpret that uncertainty overlay correctly? Or will
| they just look at the line going down and ignore the rest?
| 
| Your analysis is only as good as your data. This has been a
| fundamental problem for as long as we have been trying to analyze
| data, and it's never going to go away. We would do well to
| remember this as we move into the "AI age".
| 
| It also says something about us as well: throughout our lives, we
| learn from data. We observe and consider and form opinions. How
| good is the data that we have observed? Are our conclusions
| valid?
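One way to sketch the sampling-uncertainty idea above is a percentile bootstrap over documents, shown here in Python. Everything in it is an assumption for illustration: the per-document counts are made up, the function names are hypothetical, and the method treats the scanned documents as a random sample of what was written that year, which is exactly the assumption in doubt for Google Books. It captures sampling noise only, not selection bias.

```python
import random

def pooled_frequency(docs):
    """Occurrences of the word per million tokens, pooled over all documents."""
    hits = sum(h for h, _ in docs)
    tokens = sum(t for _, t in docs)
    return 1e6 * hits / tokens

def bootstrap_interval(docs, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling whole documents with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        pooled_frequency(rng.choices(docs, k=len(docs)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical documents for one year: (count of the word, total tokens).
docs_1975 = [(12, 4000), (0, 9000), (30, 5000), (1, 12000), (25, 6000), (2, 8000)]

point = pooled_frequency(docs_1975)
low, high = bootstrap_interval(docs_1975)
print(f"{point:.0f} per million, 95% interval roughly {low:.0f} to {high:.0f}")
```

Plotting `low` and `high` as a band around the line would be the overlay described above. Whether readers would interpret it correctly is a separate question.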
 
| gcanyon wrote:
| From the comments on that page: "Do publishers still order many
| carloads of "is" each year during spring thaw..."
| 
| In Dictionopolis they do! Any Phantom Tollbooth peeps here?
| 
| https://en.wikipedia.org/wiki/The_Phantom_Tollbooth
 
| gitgud wrote:
| Reminds me of a feeling I had when solving a jigsaw puzzle:
| 
|  _Everything must fit together to reveal the big picture!_ ...
| 
| In reality things almost _never_ fit together to reveal some big
| picture... so trying to make them fit like puzzle pieces often
| leads to false conclusions
 
| digitalsushi wrote:
| When a measure (certainty) becomes a target, it ceases to be a
| good measure (lies)
 
| gniv wrote:
| BTW, that glyph should have a small bar on the left, but I don't
| see it in the article (in Chrome on Mac).
| 
| https://www.compart.com/en/unicode/U+017F (that looks more like
| an s)
| 
| Edit: But I see it in fixed-width font:                   s
 
  | bradrn wrote:
  | > that glyph should have a small bar on the left
  | 
  | It depends on the typeface. My browser's fixed-width font, for
  | instance, doesn't display a bar.
 
| brightball wrote:
| "Only a fool is sure of anything, the wise man is always
| guessing." - MacGyver
 
| dotsam wrote:
| > It doesn't look like an indicator of the diachronic change in
| the popularity...
| 
| I thought all change is diachronic.*
| 
| I looked it up and found out that 'diachrony' is a term of art in
| linguistic analysis, contrasting with synchronic analysis.
| 
| https://en.wikipedia.org/wiki/Diachrony_and_synchrony
| 
| *Edit: I initially thought that saying 'diachronic change' was
| like saying 'three-sided triangle'. But thinking about it, I
| suppose things do change in space but not time, e.g. 'the pattern
| changes abruptly'
 
| robertlagrant wrote:
| > Who Lusts for Certainty Lusts for Lies
| 
| Well, maybe[0].
| 
| [0] with thanks to https://xkcd.com/552
 
| diogenes4 wrote:
| At this point I'm waiting for data to show up validating that
| google ngrams has use.
 
| taeric wrote:
| Is this that the n-grams are wrong, or that they are limited in
| what you can do/say with them? I find the data fun, but I'm not
| entirely sure what to make of it. You will be doing a query on
| past books on today's lexicon. Which just feels wrong.
| 
| As an easy example that I know, if you search for "the", you will
| not find a lot of hits. Which, is mostly fair, as historically we
| know that "th" dropped off around the 1400s. That said, add in
| "ye" and you see a ton of its use.
| 
| Is that an intentional feature of n-grams? Feels more like an
| encoding mistake passed down through the ages. Would be like
| getting upset at the great vowel shift and not realizing that our
| phonetic symbols are not static universal truths.
 
| bluetomcat wrote:
| You can never construct a representative image of the past. You
| are operating with a limited amount of sources which have
| survived in one form or another. They are not evenly distributed
| across time and space. There is an inherent "data loss" problem
| when a person dies - gone are all the impressions, unwritten
| experiences, familiar smells. Even a living person's memory may
| not be reliable at one point.
 
  | psychoslave wrote:
  | That's why I always found it so strange that only those whose
  | social representation is distorted by fame/wealth end up with a
  | Wikipedia biography.
 
    | not_knuth wrote:
    | Wikipedia is not meant to be an archive of _all_ information.
    | It's meant to be an encyclopedia of things that are
    | _notable_ [1], which is probably where the confusion comes
    | from.
    | 
    | As you can imagine, the topic of what notability is, has been
    | discussed at length since Wikipedia's inception [2].
    | 
    | [1] Notability according to Wikipedia
    | https://en.wikipedia.org/wiki/Wikipedia:Notability
    | 
    | [2] Oldest Wikipedia talk comments I could find on Notability
    | https://en.m.wikipedia.org/w/index.php?title=Special:History.
    | ..
 
  | pintxo wrote:
  | At one point? Human memory is surprisingly unreliable.
  | 
  | One example to test for yourself:
  | https://youtu.be/vJG698U2Mvo?si=16fwk8wG8Yyhim5t
 
    | psychoslave wrote:
    | That is not even memory bias here.
    | 
    | Sure, what you pay attention to will impact what you
    | remember, but this experiment goes further and shows how your
    | attention can be manipulated to be blind to plotted events.
 
      | Miraltar wrote:
      | Exactly, but the point is still valid. The Mandela Effect is a
      | great example of it.
 
    | ongy wrote:
    | Serious question
    | 
    | Are you supposed to not see the gorilla? I assumed it's the
    | trap and there's some slightly less obvious catch in there.
 
| djha-skin wrote:
| The best part of this article is perhaps the following critique
| of ngrams and by extension their popular use in modern
| algorithms:
| 
| > The text of Etymonline is built entirely from print sources,
| and is done entirely by human beings. Ngrams are not. They are
| unreliable, a sloppy product of an ignorant technology, one made
| to sell and distract, _one never taught the difference between
| "influence" and "inform."_
| 
| > Why are they on the site at all? Because now, online, _pictures
| win and words lose_. The war is over; they won.
| 
|  _One never taught the difference between "influence" and
| "inform"._ What a scathing rebuke of our modern world and the
| social media that is part of it. Algorithms that attempt to
| quantify human speech and interaction and get it wrong most of
| the time in their quest to maximize their owner's profits.
| 
| This somber warning is especially poignant in an age more and
| more ruled by generative AI, which I'm told is essentially an
| ngram predictor.
 
  | acyou wrote:
  | Influence and inform are two sides of the same moral coin,
  | where we claim others' ideas aren't their own, whereas we are
  | the virtuous informed ones who draw our own conclusions.
  | 
  | The low-pass filter of the mind only allows in what fits
  | somewhere inside the existing framework. If you don't reject
  | something, then being informed by it and being influenced by it
  | are the same thing. In that framework, people who claim to be
  | informed come off as high and mighty and a little lacking in
  | self consciousness.
 
    | gpderetta wrote:
    | I inform, you influence, he propagandizes.
 
  | thrdbndndn wrote:
  | > The text of Etymonline is built entirely from print sources,
  | and is done entirely by human beings. Ngrams are not.
  | 
  | I'm confused about this part actually. I assume by "entirely
  | from print sources" it means it does not include digital
  | sources? That doesn't sound very relevant to the issues
  | mentioned in the article though: unless it uses the "complete"
  | set of _all_ print source, it totally could have the same
  | skewed-dataset issues too; and humans can make the same mistake
  | as OCR does.
 
    | sudobash1 wrote:
    | Etymonline compiles the information on etymology and
    | historical usage from printed books (eg the Oxford English
    | Dictionary). That is what is being referred to here. They are
    | not having humans tally up different words from books. That
    | data is entirely from ngrams.
 
| crazygringo wrote:
| The n-grams aren't _wrong_, but it is a real problem that the
| underlying corpus distribution changes massively over time (in
| this case, proportion of academic vs. non-academic works).
| 
| This is a really devilish problem with no easy answer.
| 
| Because on the one hand, it's certainly easy enough to normalize
| by genre -- e.g. fix academic works at 20%, popular magazines at
| 20%, fiction books at 40%, and so forth.
| 
| But the problem is that the popularity of genres changes over
| time separately in terms of supply and demand, as well as
| consumption of printed material overall. Fiction written might
| increase while fiction consumed might decrease. Or the
| consumption of books might decrease as television consumption
| increases.
| 
| So there isn't any objectively "right" answer at all.
| 
| But it would be nice if Google allowed you to plot popularity _by
| genre_ -- I think that would help a lot in terms of determining
| where and how words become more or less common.
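The genre-normalization idea can be sketched in a few lines of Python. All counts, the two-genre split, and the fixed weights below are hypothetical. The sketch assumes every document already has a genre label, which the thread notes may be hard to obtain.

```python
def pooled_rate(corpus):
    """Occurrences per 1,000 tokens over the whole corpus,
    the way a single pooled line would show it."""
    hits = sum(g["hits"] for g in corpus.values())
    tokens = sum(g["tokens"] for g in corpus.values())
    return 1000 * hits / tokens

def fixed_mix_rate(corpus, weights):
    """Per-genre rates combined with fixed genre weights that sum to 1."""
    return sum(
        w * 1000 * corpus[genre]["hits"] / corpus[genre]["tokens"]
        for genre, w in weights.items()
    )

# Hypothetical data: "said" stays at 10 per 1,000 tokens in fiction and
# 2 per 1,000 in academic writing. Only the genre mix of the corpus changes.
corpus_1900 = {
    "fiction": {"hits": 8000, "tokens": 800_000},
    "academic": {"hits": 400, "tokens": 200_000},
}
corpus_1975 = {
    "fiction": {"hits": 3000, "tokens": 300_000},
    "academic": {"hits": 1400, "tokens": 700_000},
}
weights = {"fiction": 0.4, "academic": 0.6}  # an arbitrary choice

for year, corpus in (("1900", corpus_1900), ("1975", corpus_1975)):
    print(year, "pooled:", round(pooled_rate(corpus), 2),
          "fixed mix:", round(fixed_mix_rate(corpus, weights), 2))
```

With these made-up numbers the pooled rate drops by almost half although usage inside each genre never changed, while the fixed-mix rate stays flat. The weights are still a free choice, which is the "no objectively right answer" problem described above.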
 
| hyperific wrote:
| It seems to me that Google Ngram isn't _wrong_. It's reporting
| statistics on the words it correctly identified in the corpus.
| The problem is the context of the statistics. You may somewhat
| confidently say the word "said" dips in usage at such and such
| time _in the Google Books corpus_. You can more confidently say
| it dips at such and such time for the subset of the corpus for
| which OCR correctly identified every instance of the word. But
| you can't make claims in a broader context like "this word
| dipped in usage at such and such time" without having sufficient
| data.
 
  | dredmorbius wrote:
  | And this is why _sampling methodology_ is so much more vastly
  | important in drawing inferential population statistics than
  | _sample size_.
  | 
  | Sample 1 million books from an academic corpus, and you'll turn
  | up a very different linguistic corpus than selecting the ten
  | best-selling books for each decade of the 20th century.
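A toy simulation in Python can illustrate the point about sampling methodology versus sample size. The population, the per-genre rates, and the sample sizes are all invented for the sketch and do not describe any real corpus.

```python
import random

ACADEMIC_RATE, FICTION_RATE = 2.0, 10.0  # hypothetical uses per 1,000 tokens

# Hypothetical population of 10,000 books: 20% academic, 80% fiction.
population = ([("academic", ACADEMIC_RATE)] * 2000
              + [("fiction", FICTION_RATE)] * 8000)

def mean_rate(books):
    """Average per-book rate of the word."""
    return sum(rate for _, rate in books) / len(books)

true_rate = mean_rate(population)

# Large sample drawn from one genre only: 2,000 books.
big_biased = [b for b in population if b[0] == "academic"]
# Small sample drawn uniformly from the whole population: 100 books.
small_random = random.Random(0).sample(population, 100)

biased_error = abs(mean_rate(big_biased) - true_rate)
random_error = abs(mean_rate(small_random) - true_rate)
print(f"true rate {true_rate:.1f}")
print(f"2,000 academic books: off by {biased_error:.1f}")
print(f"100 random books:     off by {random_error:.1f}")
```

In this setup the twenty-times-larger sample is the less accurate one, because no amount of extra academic books corrects for the missing fiction.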
 
  | gmd63 wrote:
  | Just as "it depends" is a meme for economists, "need more data"
  | is the galaxy-brain statistician meme.
  | 
  | Until you've solved the grand unified theory, you can never be
  | fully confident in the completeness of your data or statistical
  | inferences.
  | 
  | What's wrong is misleading the public away from this
  | understanding.
 
| thomasfromcdnjs wrote:
| Does this criticism of ngrams also translate to keyword trends
| when considering SEO/SEM?
 
| andrewflnr wrote:
| The title is true for a lot more areas of life than linguistics.
| There are no shortcuts to truth, DVD anyone who tries to offer
| you one is probably trying to sell you something.
 
  | madsbuch wrote:
  | The title is about certainty and not truth.
  | 
  | > Who Lusts for Certainty Lusts for Lies
  | 
  | I think this is one of those one-liners that sound good, but are
  | bogus on closer inspection.
  | 
  | The article talks about history. In that context it might
  | make sense as it is hard to say something with certainty.
  | 
  | But in every speech I can say things with certainty without
  | lying.
  | 
  | If we furthermore drag the word certainty out of a philosopher's
  | grip and apply a layman meaning to it, then many things are
  | certain as the word can also mean commitment.
 
    | RockyMcNuts wrote:
    | Who demands certainty demands bullshit would be more
    | accurate.
 
    | Delk wrote:
    | I don't think it's bogus.
    | 
    | I've seen people who strongly crave for (a feeling of)
    | certainty prefer simplified categorizations and false
    | absolutes to complexity that doesn't offer absolute certainty
    | and discrete clarity.
    | 
    | Similarly, some things aren't readily quantifiable, and in
    | some cases any quantification might be a great
    | oversimplification at best. In those cases wanting a
    | quantified and measurable answer instead of a more complex
    | answer with less (of a feeling of) certainty can amount to
    | wanting a lie. Or at least to wanting an answer that feels a
    | lot more certain and true than it actually is.
    | 
    | I think that's what the post is about.
    | 
    | Of course the title isn't absolutely true either. Of course
    | you can say and find things that are true and (to a good
    | approximation) certain. But that's not really what the post
    | or its title are trying to say.
 
    | speak_plainly wrote:
    | There's an entire field of study dedicated to these puzzles:
    | epistemology.
    | 
    | https://plato.stanford.edu/entries/certainty/
 
    | AnimalMuppet wrote:
    | In every speech you can say _some_ things with certainty
    | without lying.
    | 
    | But I think the point of the saying is in the other
    | direction. If you are _listening_ to a speech, the things
    | that the speaker can say with certainty may not be the ones
    | where you want certainty. And if you demand certainty on
    | those things, you will find those who will give it to you.
    | But the certainty itself is a lie - that's why the speaker
    | can't (honestly) say those things with certainty.
    | 
    | What is the optimum political program for the United States?
    | There are plenty of people willing to tell you with (apparent)
    | certainty what the answer is. The truth is that nobody knows
    | with certainty, and so the answers that sound certain are
    | lies. The actual program may be correct - _may_ be - but the
    | certainty itself is a lie.
    | 
    | This is often true in linguistics, and history, and politics,
    | and economics. Don't demand certainty where there is none.
 
  | ta8645 wrote:
  | This hits close to home with all the appeals to authority over
  | the last few years. With absolute confidence they were holders
  | of the truth, "trust the science!".
 
    | andrewflnr wrote:
    | Kinda, but most of the anti-scientific bullshit out there is
    | a symptom of precisely this phenomenon. _Actual_ science
    | cannot offer absolute certainty, so people reach for whatever
    | alternate theory offers the feeling of certainty. Blind faith
    | in "the science" kind of works, and even gets pretty decent
    | practical results, but you know what's structurally really
    | hard to disprove and thus amenable to feeling certain?
    | Conspiracy theories!
 
      | ta8645 wrote:
      | > Conspiracy theories!
      | 
      | I hear what you're saying. In the end, we have to believe
      | _something_ -- on less than perfect information.
      | 
      | But understanding human nature, isn't a conspiracy theory.
      | And accepting obviously overreaching statements of "fact",
      | that literally nobody had the data to state unequivocally,
      | is not following the science.
      | 
      | It wasn't so long ago, that most people understood big
      | pharma was a profit seeking machine, that wasn't primarily
      | motivated by what is best for humanity. Overstating the
      | risks of Covid, and pretending that we faced an existential
      | threat, made everyone forget that truth, and
      | unquestioningly believe that only the purest of intentions
      | motivated the industrial/media response.
 
  | gilleain wrote:
  | What does "DVD anyone" mean?
  | 
  | (Perhaps a roundabout way to say "Make obsolete", as a way to
  | say "Get rid of"?)
 
    | mancerayder wrote:
    | I just can't CD what that means either.
 
      | Tactician_mark wrote:
      | It's a Blu-ray mystery to me.
 
        | psychoslave wrote:
        | It fades away vinyl from my ens.
        | 
        | https://en.wiktionary.org/wiki/ens
 
        | compiler-devel wrote:
        | The redditification of HN is sad. With reddit de facto
        | purging third-party apps with increased API prices, we
        | now see reddit-tier conversations spamming message boards
        | like HN.
 
        | sk0g wrote:
        | https://news.ycombinator.com/newsguidelines.html
        | 
        | > Please don't post comments saying that HN is turning
        | into Reddit. It's a semi-noob illusion, as old as the
        | hills.
 
        | decremental wrote:
        | [dead]
 
    | thechao wrote:
    | Typo insertion where the autocorrect hallucinates a word?
    | Happens to me sometimes...
 
      | andrewflnr wrote:
      | This. Sorry everyone.
 
      | adrianmonk wrote:
      | It's probably supposed to be "and" instead of "DVD". Both
      | words have a similar shape on the keyboard, especially if
      | you're doing swipe-style smartphone keyboard input.
 
| cainxinth wrote:
| Agnostics have been saying this for years (jk... sorta).
 
  | guardian5x wrote:
  | You are not wrong there. This title could also be an article
  | about atheism and religion.
 
  | lvass wrote:
  | Surely you meant to write agnostics.
 
    | cainxinth wrote:
    | Corrected it
 
| ttoinou wrote:
| The y-axes do not start at zero. So basically the author doesn't
| know how to read a graph... what am I missing?
 
| dahart wrote:
| > Ngram says toast almost vanishes from the English language by
| 1980, and then it pops back up.
| 
| The Ngram plot does not say that. It shows usage dropping ~40%
| (since 1800). It's indeed a problem that the graph Y axis doesn't
| go to zero, as others have pointed out. But did the etymonline
| authors really not notice this before declaring incorrectly what
| it says? I would find that hard to believe (especially
| considering the subsequent "see, no dip" example that has a zero
| Y and a small but visible plateau around 1980), and it's ironic
| considering the hyperbolic and accusatory title and opening
| sentence.
 
  | lolinder wrote:
  | The graph axis isn't the only problem. The word "toast" did not
  | drop in usage by 40%, Google's dataset shifted dramatically
  | towards a different genre than it was composed of previously.
  | I've been in conversations with people trying to explain those
  | drops in the 70s, and no one (myself included) realized that it
  | was such a dramatic flaw in the data.
 
    | bee_rider wrote:
    | Is there no way to filter out particular data sets? This
    | seems like a pretty huge limitation.
 
    | dahart wrote:
    | That's fair, the article has a very valid point, which would
    | be made even stronger without the misreading of the plots
    | they're critiquing, whether it was accidental or intentional.
    | I always thought Ngrams were weird too, I remember in the
    | past thinking some of the dramatic shifts it shows were
    | unlikely.
 
| tantalor wrote:
| Why the title change?
| 
| Title on the site is "Who Lusts for Certainty Lusts for Lies"
| 
| Title here is "Google Ngram Viewer n-grams are wrong"
 
  | 0xfae wrote:
  | HN in general doesn't like "editorialized" titles. HN titles
  | are meant to be a factual representation of what you are going
  | to read without the attention grabbing (albeit clever) title.
 
    | tantalor wrote:
    | Er no.
    | 
    | > Otherwise please use the original title, unless it is
    | misleading or linkbait; don't editorialize.
    | 
    | The "don't editorialize" guideline is meant for the
    | _submitter_ to not change the title to make some point.
    | 
    | The site can & should use whatever title it wants. So be it
    | if they want to editorialize. That's their prerogative.
 
      | dredmorbius wrote:
      | Both your and GP comment are inaccurate and/or unclear.
      | 
      | HN _prefers_ but does not _require_ the original title.
      | 
      | HN _does not permit_ submitter editorialising.
      | 
      | Where the original title is clickbait, _which may include
      | editorialising_, HN requests that submitters change the
      | title, if at all possible to some phrase within the
      | article.
      | 
      | Another de facto rule concerns "title fever", which is when
      | a title is so distracting that it overwhelms the content of
      | the article in discussion.
      | 
      | From the guidelines:
      | 
      |  _If the title includes the name of the site, please take
      | it out, because the site name will be displayed after the
      | link._
      | 
      |  _If the title contains a gratuitous number or number +
      | adjective, we'd appreciate it if you'd crop it. E.g.
      | translate "10 Ways To Do X" to "How To Do X," and "14
      | Amazing Ys" to "Ys." Exception: when the number is
      | meaningful, e.g. "The 5 Platonic Solids."_
      | 
      |  _Otherwise please use the original title,_ unless it is
      | misleading or linkbait; _don't editorialize._
      | 
      | Some of dang's comments on the issue:
      | 
      | - On changing original title (from yesterday, and NPR to
      | boot)
      | 
      | - On substituting a phrase from the article
      | 
      | - On submitter editorialising
      | 
      | - Distracting titles, particularly cases where "the thread
      | will lose its mind"
      | 
      | - "Title fever" (beginning 4 'graphs in)
 
| AugustoCAS wrote:
| I'm going to use that title on the next conversations I have
| about estimates, in particular in the context of 'we need to know
| that this piece of work will be started in 4 months and finished
| in 8'. Those conversations definitely suck for me.
 
  | js8 wrote:
  | Though you should also remember "who lusts for promotion lusts
  | for telling lies".
 
  | CapitalistCartr wrote:
  | Only one goal can be first. If you want to set absolute dates,
  | all other requirements must be subordinate to that. In which
  | case, sure, we can absolutely meet it.
 
    | ChrisMarshallNY wrote:
    | There's that classic poster that you see in almost every auto
    | mechanic's shop:
    | 
    |     Good    Fast    Cheap
    | 
    |     Pick 2
 
      | nuancebydefault wrote:
      | Not so rarely, you even need to settle for picking 1
 
  | jklinger410 wrote:
  | This title is an absolute banger
 
  | [deleted]
 
  | d-lisp wrote:
  | [flagged]
 
  | gascoigne wrote:
  | Surely if you have story pointed and T-shirt sized your epics
  | correctly that shouldn't be difficult? /s
 
  | dumbfounder wrote:
  | This guy sucks.
 
    | [deleted]
 
  | fenomas wrote:
  | And boo, incidentally, to whomever changed the HN title - from
  | the most memorably evocative title this site has ever seen to
  | one of the blandest.
 
    | etrevino wrote:
    | What was it? I arrived too late.
 
      | fenomas wrote:
      | Sorry, HN previously had TFA's actual title - "Who Lusts
      | for Certainty Lusts for Lies".
 
        | scubbo wrote:
        | I, uhhhh.....I would like to know what TFA is meant to
        | stand for, because I assume it is not "the sucking
        | article", but that was my first thought. Maybe
        | "featured"? Google is only giving me "Teach For America"
        | or "Trade Facilitation Agreement".
 
        | klyrs wrote:
        | Does "fornicating" sound more polite to you?
 
        | iudqnolq wrote:
        | it is the fucking article. or "featured" if you're
        | feeling classy.
 
        | mjochim wrote:
        | I like to read it as The Fine Article.
 
        | idrios wrote:
        | This is the kind of question that doesn't need to be
        | answered with certainty. "The fucking article" is
        | definitely the most fun interpretation of "TFA".
 
        | etrevino wrote:
        | lol, that's pretty good, I agree with you.
 
        | djsavvy wrote:
        | Looks like it's been changed back! What was the "bland"
        | title in the middle?
 
        | Intralexical wrote:
        | "Google Ngram Viewer n-grams are wrong".
 
    | [deleted]
 
    | dahart wrote:
    | The article title is certainly provocative, yes, and that's
    | the problem. Do you want clickbait titles? The article's
    | title is a combination of a platitude, an inaccurate and/or
    | irrelevant statement, and an implied inflammatory accusation.
    | Swapping the title for the more accurate, more informative,
    | less provocative first line is much better for me, though it
    | may be true that not flinging around the word "lies" could
    | result in fewer clicks.
 
      | fenomas wrote:
      | I don't think "Ngrams are wrong" is what TFA is about. The
      | author isn't an expert on Ngrams and he's not sharing any
      | new information about them; what he's really talking about
      | is how data about language is unreliable, and why Ngram
      | images are on his site even though he knows they're flawed.
      | Personally, I found the original title truer to the article
      | than the current one.
 
      | zem wrote:
      | the word "clickbait" is flung around way too readily these
      | days. a good title is _supposed_ to make you want to read
      | the article, and at its best it is an artistic flourish
      | that enhances the overall piece. and personally, i love
      | that. i enjoy seeing how writers (or editors) come up with
      | good titles, and the fun and interesting ways they relate
      | to the text of the piece. i enjoy when the title is clearly
      | an allusion or reference to something, and chasing it down
      | leads me to learn something new. and i even enjoy when the
      | title is just a pun or play on words, because writers live
      | for moments like that :)
      | 
      | in this case i definitely felt "wow, that's an interesting
      | quote, and i can see what they are getting at. let's read
      | the article to see how it's substantiated or used as a
      | springboard".
      | 
      | clickbait is more "we have some amazing!!!!! information to
      | tell you but to find out what you will have to read the
      | article", e.g. the classic listicle format "10 things we
      | imagined a beowulf cluster of - number 4 will shock you!",
      | the spammy "one weird trick doctors don't want you to know"
      | or the tabloid "john brown's shocking affair!". and yes,
      | that sort of thing is a plague on the internet and i would
      | not like to see more of it, but also that is not what is
      | going on here.
 
    | ComputerGuru wrote:
    | I personally feel like more people will click with this new
    | title. The old one was far too vague and ambiguous for a news
    | aggregation site. I thought the old title would be about
    | scientific papers and trying too hard to get definitive
    | answers out of them.
 
      | dredmorbius wrote:
      | The title and site reward those who'd click through on the
      | original rather than the bland substitute.
 
      | fenomas wrote:
      | Horses for courses, but to me the original title was the
      | forest and the stuff about Ngrams was the trees. As such I
      | found TFA interesting, even though I have no interest in
      | Ngrams or whether they're correct (which is why I
      | definitely would not have clicked on the current title).
 
        | setgree wrote:
        | adding "horses for courses" to my lexicon, TY :)
 
  | 1970-01-01 wrote:
  | At first glance, I thought it was a translated Latin phrase.
  | 
  | desiderat certum, desiderat falsitates
 
| PaulHoule wrote:
| Don't like the title, at least for this article.
| 
| When it comes to results like this it is more "lusting for
| clickbait" or the scientific equivalent thereof. (e.g. papers in
| _Science_ and _Nature_ aren't really particularly likely to be
| right, but they are particularly likely to be outrageous,
| particularly in fields like physics that aren't their center)
| 
| On the other hand, "Real Clear Politics" always had a toxic
| sounding name to me since there is nothing "Real" or "Clear"
| about politics: I think the best book about politics is Hunter S.
| Thompson's _Fear and Loathing on the Campaign Trail '72_ which is
| a druggie's personal experience following the candidates around
| and picking up hitchhikers on the road at 3am and getting strung
| out on the train and having moments of jarring sobriety like the
| time when he understood the parliamentary maneuvering that won
| McGovern the nomination while more conventional journalists were
| at a loss.
| 
| What I do know is 20 years from now an impeccably researched book
| will come out that makes a strong case that what we believed
| about political events today was all wrong and really it was
| something different. In the meantime different people are going
| to have radically different perspectives and... that's the way it
| is. Adjectives like "real" and "clear" are an attempt to shut
| down most of those perspectives and pretend one of those
| viewpoints is privileged. Makes me think of Baudrillard's
| thorough shitting on the word "real" in _Simulacra and
| Simulation_ which ought to completely convince you that people
| peddling the fake will be heralded by the word "real".
| 
| (Or for that matter, that Scientology calls itself the "science
| of certainty.")
 
  | paulsutter wrote:
  | And it will also be wrong.
  | 
  | > 20 years from now an impeccably researched book will come out
  | that makes a strong case that what we believed about political
  | events today was all wrong and really it was something
  | different
  | 
  | The one good thing about politics is that the motives are
  | crystal clear, politicians want to stay in power first, and
  | only secondarily want to improve things.
  | 
  | Once you know this, everything makes sense. Even if we never
  | find out what "really" happened
 
    | Karellen wrote:
    | > politicians want to stay in power first, and only
    | secondarily want to improve things.
    | 
    | The politicians who want to be in power first, and only
    | secondarily want to improve things, tend to be the
    | politicians in power.
    | 
    | Politicians who want to improve things first do exist, but
    | they tend not to achieve power, because power is not their
    | goal, and they are out-maneuvered by the first type.
    | 
    | Notably, politicians who want to improve things are easily
    | side-tracked by suggesting that their proposed policy is not
    | the best way to improve things, and that some other way would
    | be better. This explains to some degree a lot of infighting
    | on the left, because many do want to genuinely help, but it's
    | never 100% clear what the best way to help is. It also
    | explains why the right can put aside major differences of
    | opinion (2A is important to fight the government who can't be
    | trusted, but support the troops and arm the police!) to
    | achieve power, because acquiring and maintaining power is
    | more important than exactly what you plan to do with it.
 
      | Vt71fcAqt7 wrote:
      | >2A is important to fight the government who can't be
      | trusted, but support the troops and arm the police!
      | 
      | I fail to see the contradiction here. 2A proponents would
      | say that 2A is there for when the government goes wrong, or
      | "when in the Course of human events, it becomes necessary
      | for one people to dissolve the political bands which have
      | connected them with another." At all other times, however,
      | it would be up to the government to enforce the law and
      | protect the people. Destroying the state is a different
      | ideology.
      | 
      | (To be clear, the last few wars may not have been about
      | protecting the people. But that the US has not been
      | attacked since Pearl Harbor may be a result of the
      | investment made in "defence" since then, as well as
      | favourable borders etc.)
      | 
      | In any case 'both sides' have people who actually
      | care about society. And there are people on the left who
      | may simply want power, and complex people who seem to be a
      | bit of both (for example perhaps Lyndon Johnson depending
      | on how you see him).
 
    | bilbo0s wrote:
    | _politicians want to stay in power first, and only
    | secondarily want to improve things._
    | 
    | In all honesty, many don't even want to improve things. Most
    | people with power, love power. It's contrary to their nature
    | to change a system that confers power to themselves. That's
    | true not just in your own nation but in any: the people in
    | power will be resistant to change.
 
    | PaulHoule wrote:
    | That's as close as you will get to a master narrative but it
    | isn't all of it.
    | 
    | Politicians aren't always sure what will win for them, often
    | face a menu of unappetizing choices and have other
    | motivations too. (Quite a few of the better Republicans have
    | quit in disgust in the last decade: I watched the pope speak
    | in front of Congress flanked by Joe Biden, then VP, and John
    | Boehner, then House Speaker, when the pope obliquely said they
    | should start behaving like adults, and then Boehner quit a few
    | days later and got into the cannabis business.)
    | 
    | I was an elected member of the state committee of the Green
    | Party of New York and found myself arguing against a course
    | of action that I emotionally agreed with, thought was a
    | tactical mistake, and that my constituents were (it turns out
    | fatally) divided about. It was a strategic disaster in the
    | end.
 
      | paulsutter wrote:
      | You're right, I should have added that politics is also
      | extremely difficult and filled with unpalatable choices.
      | All of the politicians I have met are intelligent, caring
      | people with a clear grasp of the issues.
      | 
      | And then you see what they do, and you wonder, what the...
 
| phkahler wrote:
| Classic mistake of not including zero on the vertical axis of a
| graph. If you're thinking "but then there won't be so much
| variation" you're right. Leaving zero off allows small variations
| to look large.
 
  | mattkrause wrote:
  | Am I alone in thinking that the graph was okay and the text was
  | just indulging in a bit of hyperbole?
  | 
  | It's a sudden ~50% dip, following nearly a century of apparent
  | stability.
 
  | PaulHoule wrote:
  | On the other hand there are the cases where you do want to
  | emphasize small variations. In a control chart showing the fill
  | weight of cereal boxes you certainly don't want zero on the
  | chart. Neither do you want to plot daily temperatures in a city
  | on a chart that includes 0 Kelvin.
 
    | hef19898 wrote:
    | Sure you do, why not? If you don't, show the deviation values
    | (plus and minus) centered around zero again.
 
      | PaulHoule wrote:
      | Not if it means the line looks flat.
 
        | slenk wrote:
        | Sometimes the data is flat...
 
        | thfuran wrote:
        | And many times small variations matter.
 
        | slenk wrote:
        | Yes, the CMB for instance.
 
        | PaulHoule wrote:
        | It sure feels like the temperature in Upstate NY varies
        | by more than 10%!
 
    | Scubabear68 wrote:
    | Exactly. A lot of investment market charts are zoomed in like
    | that because small deviations can matter a lot, and you don't
    | want the base price (or whatever measure you're looking at)
    | to swamp the signal.
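
The disagreement above about whether to include zero on the vertical axis can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `apparent_drop` and the values 100, 90 and 80 are hypothetical and do not come from the article or from Google's data. It computes how much shorter a plotted point looks, relative to an earlier one, depending on where the axis starts.

```python
def apparent_drop(before, after, axis_min=0.0):
    """Fraction by which the plotted height shrinks from `before` to `after`
    when the vertical axis starts at `axis_min` instead of zero."""
    if axis_min >= min(before, after):
        raise ValueError("axis_min must lie below both values")
    height_before = before - axis_min  # drawn height of the first point
    height_after = after - axis_min    # drawn height of the second point
    return 1 - height_after / height_before


# Hypothetical series that falls from 100 to 90.
with_zero = apparent_drop(100, 90)               # axis starts at 0
truncated = apparent_drop(100, 90, axis_min=80)  # axis starts at 80

print(f"drop with zero baseline: {with_zero:.0%}")
print(f"drop with axis starting at 80: {truncated:.0%}")
```

With the zero baseline the point looks about a tenth lower; with the axis starting at 80 the same change looks like a halving. Whether that magnification misleads or is exactly what the reader needs (as in the control chart and temperature examples) depends on what the chart is for.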
 
  | lolinder wrote:
  | Including zero would have helped the "said" graph but not
  | solved it--it just would still look like "said" dropped to
  | almost 1/3 of its prior popularity, when what actually happened
  | is the makeup of the sample changed dramatically.
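
The point about the makeup of the sample can be sketched with a toy calculation. Everything here is assumed for illustration: the two genres, their shares of the corpus, and the per-genre frequencies of "said" are invented numbers, not figures from the Ngram corpus. The per-genre frequency is held constant; only the mix of genres changes.

```python
def overall_rate(shares, rates):
    """Corpus-wide frequency of a word, given each genre's share of the
    corpus and the word's frequency within that genre."""
    return sum(shares[genre] * rates[genre] for genre in shares)


# Hypothetical frequency of "said" within each genre; never changes below.
rates = {"fiction": 0.006, "academic": 0.001}

# Hypothetical corpus makeup in two periods.
early_mix = {"fiction": 0.8, "academic": 0.2}
late_mix = {"fiction": 0.2, "academic": 0.8}

early = overall_rate(early_mix, rates)
late = overall_rate(late_mix, rates)

print(f"early corpus-wide rate: {early:.4f}")
print(f"late corpus-wide rate:  {late:.4f}")
print(f"late / early: {late / early:.2f}")
```

In this toy setup the corpus-wide rate falls to well under half of its earlier value even though nobody in either genre uses the word any less, which is the kind of effect a zero baseline on the axis would not fix.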
 
| jgalt212 wrote:
| The words of Colonel Nathan R. Jessup come to mind.
 
___________________________________________________________________
(page generated 2023-09-26 23:00 UTC)