[HN Gopher] On the dangers of stochastic parrots: Can language m...
___________________________________________________________________
 
On the dangers of stochastic parrots: Can language models be too
big? (2021)
 
Author : Schiphol
Score  : 49 points
Date   : 2023-01-14 18:58 UTC (4 hours ago)
 
web link (dl.acm.org)
w3m dump (dl.acm.org)
 
| hardwaregeek wrote:
| I'm still midway through the paper, but I gotta say, I'm a little
| surprised at the contrast between the contents of the paper and
| how people have described it on HN. I don't agree with everything
| that is said, but there are some interesting points made about
| the data used to train the models, such as the way it captures
| bias (I would certainly question the methodology of using Reddit
| as a large source of training data), and the way that bias is
| amplified by the filtering algorithms that produce the even
| larger datasets used for modern LLMs. The section about
| environmental impact might not
| hit home for everyone, but it is valid to raise issues around the
| compute usage involved in training these models. First, because
| it limits this training to companies that can spend millions of
| dollars on compute, and second, because if we want to scale up
| models, efficiency is probably a top goal.
| 
| What really confuses me here is how this paper is somehow outside
| the realm of valid academic discourse. Yes, it is steeped in
| activist, social justice language. Yes, it has a different
| perspective than most CS papers. But is that wrong? Is it enough
| of a sin to warrant the response this paper has received? I'll
| need to finish the paper to judge fully, but I'm leaning towards
| no, it is not.
 
| monkaiju wrote:
| The problems with LLMs are numerous, but what's really wild to me
| is that even as they get better at fairly trivial tasks, the
| advertising gets more and more out of hand. These machines don't
| think, and they don't understand, but people like the CEO of
| OpenAI allude to them doing just that, obviously so the hype can
| make them money.
 
  | amelius wrote:
  | Could be the sign of the next AI winter.
 
  | visarga wrote:
  | > These machines don't think, and they don't understand
  | 
  | But they do solve many tasks correctly, even problems with
  | multiple steps and new tasks for which they got no specific
  | training. They can combine skills in new ways on demand. Call
  | it what you want.
 
    | choeger wrote:
    | They don't. Solve tasks, I mean. There's not a single task
    | you can throw at them and rely on the answer.
    | 
    | Could they solve tasks? Potentially. But how would we ever
    | know that we could trust them?
    | 
    | With humans, we have millennia of collective experience when
    | it comes to setting tasks, judging the results, and spotting
    | bullshitters. Also, we can retrain a human on the spot and be
    | confident they won't immediately forget something important
    | over that retraining.
    | 
    | If we ever let a model make important decisions, I'd imagine
    | we'd want to certify it beforehand. But that excludes
    | improvements and feedback - the certified software had better
    | not change. Of course, a feedback loop could involve
    | recertification, but that means the certification process
    | itself needs to be cheap.
    | 
    | And all that doesn't even take into account the generalized
    | interface: how can we make sure that a model is aware of its
    | narrow purpose and doesn't respond to tasks outside of that
    | purpose?
    | 
    | I think all these problems could eventually be overcome, but
    | I don't see much effort put into such a framework to actually
    | make models solve tasks.
 
      | pedrosorio wrote:
      | > Also, we can retrain a human on the spot and be confident
      | they won't immediately forget something important over that
      | retraining.
      | 
      | I don't have millennia, but my more than 3 decades of
      | experience interacting with human beings tells me this is
      | not nearly as reliable as you make it seem.
 
  | didntreadarticl wrote:
  | I don't think you understand, mate.
 
  | LarryMullins wrote:
  | > _These machines don't think_
  | 
  | And submarines don't swim.
 
    | Dylan16807 wrote:
    | And it would be bad for a submarine salesman to go to people
    | who think swimming is very special and try to convince them
    | that submarines do swim.
 
      | LarryMullins wrote:
      | Why would that be bad? A submarine salesman convincing you
      | that his submarine "swims" doesn't change the set of
      | missions a submarine might be suitable for. It makes no
      | practical difference. There's no point where you get the
      | submarine and it meets all the advertised specs, does
      | everything you needed a submarine for, but you're
      | unsatisfied with it anyway because you now realize that the
      | word "swim" is reserved for living creatures.
      | 
      | And more to the point, nobody believes that "it thinks" is
      | sufficient qualification for a job when hiring a human, so
      | why would it be different when buying a machine? Whether or
      | not the machine "thinks" doesn't address the question of
      | whether or not the machine is capable of doing the jobs you
      | want it to do. Anybody who neglects to evaluate the
      | _functional capability_ of the machine is simply a fool.
 
| sthatipamala wrote:
| This paper is the product of a failed model of AI safety, in
| which dedicated safety advocates act as public ombudsmen with an
| adversarial relationship to their employer. It's baffling to me
| why anyone thought that would be sustainable.
| 
| Compare this to something like RLHF[0], which has achieved far
| more for aligning models toward being polite and non-evil. (This
| is the technique that helps ChatGPT decline to answer questions
| like "how to make a bomb?")
| 
| There's still a lot of work to be done and the real progress will
| be made by researchers who implement systems in collaboration
| with their colleagues and employers.
| 
| [0] https://openai.com/blog/instruction-following/
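| 
| A minimal sketch of the reward-modelling step behind RLHF (the
| part that learns which of two answers human labelers preferred),
| assuming a small Hugging Face model as the scorer; the model
| choice and the preference pair below are made-up placeholders,
| not anything from the linked post:
| 
|   import torch
|   from transformers import (AutoTokenizer,
|                             AutoModelForSequenceClassification)
| 
|   tok = AutoTokenizer.from_pretrained("gpt2")
|   tok.pad_token = tok.eos_token
|   rm = AutoModelForSequenceClassification.from_pretrained(
|       "gpt2", num_labels=1)        # one scalar "reward" per sequence
|   rm.config.pad_token_id = tok.pad_token_id
| 
|   def reward_loss(prompt, chosen, rejected):
|       # Pairwise ranking loss: the labeler-preferred answer
|       # should receive the higher score.
|       batch = tok([prompt + chosen, prompt + rejected],
|                   return_tensors="pt", padding=True)
|       scores = rm(**batch).logits.squeeze(-1)       # shape (2,)
|       return -torch.nn.functional.logsigmoid(scores[0] - scores[1])
| 
|   loss = reward_loss("How do I stay safe online? ",
|                      "Use strong, unique passwords.",
|                      "Just reuse one password everywhere.")
|   loss.backward()   # the trained reward model later scores
|                     # policy samples during the RL (PPO) stage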
 
  | generalizations wrote:
  | > The resulting InstructGPT models are much better at following
  | instructions than GPT-3. They also make up facts less often,
  | and show small decreases in toxic output generation. Our
  | labelers prefer outputs from our 1.3B InstructGPT model over
  | outputs from a 175B GPT-3 model, despite having more than 100x
  | fewer parameters.
  | 
  | I wonder if anyone's working on public models of this size.
  | Looking forward to when we can self-host ChatGPT.
 
    | lumost wrote:
    | This is going to happen _a lot_ over the next few years. One
    | can fine-tune GPT-2 medium on an RTX 2070. Training GPT-2
    | medium from scratch can be done for $162 on vast.ai. The
    | newer H100/Trainium/Tensorcore chips will bring the price
    | down even further.
    | 
    | I suspect that fully replicating ChatGPT from scratch would
    | take ~$1-2 million including label acquisition, of which
    | probably only ~$200-500k is compute.
    | 
    | The next few years are going to be wild!
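    | 
    | For a sense of what "fine-tune GPT-2 medium on an RTX 2070"
    | looks like in practice, here is a rough sketch using the
    | Hugging Face Trainer; the corpus path and hyperparameters
    | are placeholders chosen to fit in roughly 8 GB of VRAM:
    | 
    |   from transformers import (
    |       AutoTokenizer, AutoModelForCausalLM, TextDataset,
    |       DataCollatorForLanguageModeling, Trainer, TrainingArguments)
    | 
    |   tok = AutoTokenizer.from_pretrained("gpt2-medium")
    |   model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
    | 
    |   # my_corpus.txt is a placeholder; any plain-text file works
    |   dataset = TextDataset(tokenizer=tok, file_path="my_corpus.txt",
    |                         block_size=512)
    |   collator = DataCollatorForLanguageModeling(tokenizer=tok,
    |                                              mlm=False)
    | 
    |   args = TrainingArguments(
    |       output_dir="gpt2-medium-finetuned",
    |       per_device_train_batch_size=1,
    |       gradient_accumulation_steps=8,  # effective batch size of 8
    |       fp16=True,                      # halves memory on the 2070
    |       num_train_epochs=1,
    |   )
    | 
    |   Trainer(model=model, args=args, data_collator=collator,
    |           train_dataset=dataset).train()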
 
      | generalizations wrote:
      | These tools have reached the tipping point where they
      | provide significant utility to a large portion of the
      | computer scientists working on building them. It could be
      | that the coming iterations of these tools will make it
      | increasingly easy to write the code for the next
      | iterations.
      | 
      | I wonder if this is the first rumbling of the singularity.
 
        | visarga wrote:
        | ChatGPT being able to write OpenAI API code is great, and
        | all companies should prepare samples so future models can
        | correctly interface with their systems.
        | 
        | But what will really be needed is an AI that implements
        | scientific papers. About 30% of papers come with a code
        | implementation. That's a sizeable dataset to train a
        | Codex model on.
        | 
        | You can have AI generating papers, and AI implementing
        | papers, then learning to predict experimental results.
        | This is how you bootstrap a self-improving AI.
        | 
        | It doesn't just learn how to recreate itself; it learns
        | how to solve all problems at the same time. A data
        | engineering approach to AI: search and learn / solve and
        | learn / evolve and learn.
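        | 
        | For what it's worth, the kind of "OpenAI API code" sample
        | meant here is tiny; a sketch against the pre-1.0 openai
        | Python client that was current at the time (the prompt
        | and parameters are arbitrary examples):
        | 
        |   import os
        |   import openai
        | 
        |   openai.api_key = os.environ["OPENAI_API_KEY"]
        | 
        |   resp = openai.Completion.create(
        |       model="text-davinci-003",  # completion model of the era
        |       prompt="Summarize RLHF in one sentence.",
        |       max_tokens=64,
        |       temperature=0.2,
        |   )
        |   print(resp["choices"][0]["text"].strip())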
 
        | williamcotton wrote:
        | I can imagine a world where there is an infinity of
        | "local maxima" that stop a system from reaching a
        | singular feedback loop... imagine if our current tools
        | help write the next generation, and so on, and so on,
        | until it gets stuck in some local optimum somewhere.
        | Getting stuck seems more likely than not getting stuck,
        | right?
 
  | Natsu wrote:
  | > Compare this to something like RLHF[0], which has achieved far
  | more for aligning models toward being polite and non-evil.
  | (This is the technique that helps ChatGPT decline to answer
  | questions like "how to make a bomb?")
  | 
  | I recently saw a screenshot of someone doing trolley problems
  | with people of all races & ages with ChatGPT and noting
  | differences. That makes me not quite as confident about
  | alignment as you are.
 
    | sthatipamala wrote:
    | I am curious to see that trolley problem screenshot. I saw
    | another screenshot where ChatGPT was coaxed into justifying
    | gender pay differences by prompting it to generate
    | hypothetical CSV or JSON data.
    | 
    | Basically, you have to convince modern models to say bad
    | stuff using clever hacks (compared to GPT-2 or even early
    | GPT-3, which would just spout straight-up hatred with the
    | lightest touch).
    | 
    | That's very good progress and I'm sure there is more to come.
 
      | cactusplant7374 wrote:
      | > I saw another screenshot where ChatGPT was coaxed into
      | justifying gender pay differences by prompting it to
      | generate hypothetical CSV or JSON data.
      | 
      | I remember seeing that on Twitter. My impression was that
      | the author instructed the AI to discriminate by gender.
 
        | Dylan16807 wrote:
        | Did the author tell it which way or by how much?
        | 
        | If I tell it to discriminate on some feature and it
        | consistently does so in the same way, that's still a
        | pretty bad bias. It probably shows up in other ways.
 
  | andrepd wrote:
  | Isn't RLHF trivially easy to defeat (as it stands now)?
 
    | ShamelessC wrote:
    | Assuming a motivated "attacker", yes. The average user will
    | have no such notion of "jailbreaks", and it's at least clear
    | when one _is_ attempting to "jailbreak" a model (given a full
    | log of the conversation and a competent human investigator).
    | 
    | I think the class of problems that remain are basically
    | outliers that are misaligned and don't trip up the model's
    | detection mechanism. Given the nature of language and culture
    | (not to mention that they both change over time), I imagine
    | there are a lot of these. I don't have any examples (and I
    | don't think yelling "time's up" when such outliers are found
    | is at all helpful).
 
  | visarga wrote:
  | > researchers who implement real systems
  | 
  | That's what I didn't like about Gebru - too much critique, not
  | a single constructive suggestion. Especially her Gender Shades
  | paper where she forgot about Asians.
  | 
  | http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a...
  | 
  | I think AnthropicAI is a great company to follow when it comes
  | to actually solving these problems. Look at their
  | "Constitutional AI" paper. They automate and improve on RLHF.
  | 
  | https://www.anthropic.com/constitutional.pdf
 
    | [deleted]
 
| srvmshr wrote:
| I am of the general understanding that this paper became less
| about the LLMs & more of an insinuating hit piece against
| Alphabet. At least, some of the controversial nuggets got Gebru
| (and later M Mitchell) fired.
| 
| From a technical standpoint, I found little new in this paper
| that helps in understanding why LLMs can behave unpredictably, or
| how much data can be exposed by clever hacks (or whether there
| are systematic ways to go about it). It sounded more like a
| collection of verifiable anecdotes for easy consumption (which
| can be a good thing by itself if you want a capsule understanding
| in a non-technical way).
 
  | visarga wrote:
  | It was activism masquerading as science. Many researchers noted
  | that positives and negatives were not presented in a balanced
  | way. New approaches and efforts were not credited.
 
    | srvmshr wrote:
    | I haven't kept track, but the activism of the trio could be
    | severe at times.
    | 
    | (Anecdotally, I have faced a bite-sized brunt of it: when
    | discussion surrounding this paper was happening on Twitter, I
    | had mentioned on my timeline (in a neutral tone) that "dust
    | needed to settle to understand what was going wrong". This
    | was unfortunately picked up & RTed by Gebru, and the mob
    | responded with name-calling, threatening DMs accusing me of
    | racism/misogyny etc., and in one instance a call to my
    | employer asking to terminate me - all for that one single
    | tweet. I don't want confrontations - they're not my forte.)
 
      | [deleted]
 
      | [deleted]
 
      | zzzeek wrote:
      | > This was unfortunately picked up & RTed by Gebru & the
      | mob responded by name-calling, threatening DMs accusing me
      | of racism/misogyny etc, and one instance of a call to my
      | employer asking to terminate me - all for that one single
      | tweet.
      | 
      | Wait until an LLM flags your speech and gets you in
      | trouble. That'll be a real hoot compared to random
      | individuals who likely have been chased off Twitter by now.
 
      | visarga wrote:
      | Sounds similar to what I have witnessed on Twitter, not
      | against me, but against a few very visible people in the AI
      | community.
 
  | [deleted]
 
    | [deleted]
 
| larve wrote:
| I just finished working my way through this paper this morning.
| The literature list is quite interesting and gives a lot of
| pointers for people who want to walk the line between overblown
| hype and doomsday scenarios.
 
| xiaolingxiao wrote:
| I believe this is the paper that got Timnit and M Mitchell fired
| from Google, followed by a protracted media/legal campaign
| against Google, and vice versa.
 
  | freyr wrote:
  | I suspect it was Timnit's behavior after the paper didn't pass
  | internal review that actually got her fired (issuing an
  | ultimatum and threatening to resign unless the company met her
  | demands; telling her coworkers to stop writing documents
  | because their work didn't matter; insinuating
  | racist/misogynistic treatment by leadership when she didn't
  | get her way).
 
    | visarga wrote:
    | I think it was a well-calculated career move: she wanted
    | fame, and she got what she wanted. Now she's leading a new
    | research institute:
    | 
    | > We are an interdisciplinary and globally distributed AI
    | research institute rooted in the belief that AI is not
    | inevitable, its harms are preventable, and when its
    | production and deployment include diverse perspectives and
    | deliberate processes it can be beneficial. Our research
    | reflects our lived experiences and centers our communities.
    | 
    | https://www.dair-institute.org/about
 
  | oh_sigh wrote:
  | A small correction: this paper didn't get her fired, her
  | reaction to feedback on this paper got her fired.
  | 
  | Note to all: if you give an employer an ultimatum "do X or I
  | resign", don't be surprised if they accept your resignation.
 
  | [deleted]
 
    | [deleted]
 
| 2bitencryption wrote:
| Pure speculation ahead-
| 
| The other day on Hacker News, there was that article about how
| scientists could not tell GPT-generated paper abstracts from real
| ones.
| 
| Which makes me think- abstracts for scientific papers are high-
| effort. The corpus of scientific abstracts would understandably
| have a low count of "garbage" compared to, say, Twitter posts or
| random blogs.
| 
| That's not to say that all scientific abstracts are amazing, just
| that their goal is to sound intelligent and convincing, while
| probably 60% of the text fed into GPT is simply clickbait and
| junk content padded to fit some publisher's SEO requirements.
| 
| In other words, ask GPT to generate an abstract, and I would
| expect it to be quite good.
| 
| Ask it to generate a 5-paragraph essay about Huckleberry Finn,
| and I would expect it to be the same quality as the corpus - that
| is to say, the work of high-school English students.
| 
| So now that we know these models can learn many one-shot tasks,
| perhaps some cleanup of the training data is required to advance.
| Imagine GPT trained ONLY on the Library of Congress, without the
| shitty travel blogs or 4chan rants.
 
  | williamcotton wrote:
  | The science is in the reproduction of the methodology, not in
  | the abstract... in fact, a flood of garbage publications with
  | catchy abstracts built on shaky foundations sounds like one of
  | the issues that plagues contemporary science. That people
  | would stop finding abstracts useful seems like a good thing!
 
  | JPLeRouzic wrote:
  | > " _The corpus of scientific abstracts would understandably
  | have a low count of "garbage" compared to, say, Twitter posts
  | or random blogs_"
  | 
  | That's certainly true, but not by such a large margin, at
  | least in biology.
  | 
  | For example, in ALS (a neurodegenerative disease) there is a
  | real breakthrough perhaps every two years, yet most papers
  | about ALS (thousands every year) read as if they describe
  | something very important.
  | 
  | Similarly, on ALZforum the most recent "milestone" paper about
  | Alzheimer's disease was from 2012, yet in 2022 alone there
  | were more than 16K papers!
  | 
  | So the signal-to-noise ratio is close to zero.
  | 
  | https://www.alzforum.org/papers?type%5Bmilestone%5D=mileston...
  | 
  | https://pubmed.ncbi.nlm.nih.gov/?term=alzheimer%27s+disease&...
 
  | [deleted]
 
  | ncraig wrote:
  | Some might say that abstracts are the original clickbait.
 
| weeksie wrote:
| This was mostly political guff about environmentalism and bias,
| but one thing I didn't know was that apparently larger models
| make it easier to extract training data.
| 
| > Finally, we note that there are risks associated with the fact
| that LMs with extremely large numbers of parameters model their
| training data very closely and can be prompted to output specific
| information from that training data. For example, [28]
| demonstrate a methodology for extracting personally identifiable
| information (PII) from an LM and find that larger LMs are more
| susceptible to this style of attack than smaller ones. Building
| training data out of publicly available documents doesn't fully
| mitigate this risk: just because the PII was already available in
| the open on the Internet doesn't mean there isn't additional harm
| in collecting it and providing another avenue to its discovery.
| This type of risk differs from those noted above because it
| doesn't hinge on seeming coherence of synthetic text, but the
| possibility of a sufficiently motivated user gaining access to
| training data via the LM. In a similar vein, users might query
| LMs for 'dangerous knowledge' (e.g. tax avoidance advice),
| knowing that what they were getting was synthetic and therefore
| not credible but nonetheless representing clues to what is in the
| training data in order to refine their own search queries
| 
| Shame they only gave that one graf. I'd like to know more about
| this. Again, miss me with the political garbage about "dangerous
| knowledge"; the most concerning thing is the PII leakage, as far
| as I can tell.
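| 
| The extraction-style attack described in the quoted paragraph can
| be sketched roughly like this: sample freely from a public model
| and rank the generations by perplexity, treating unusually
| low-perplexity outputs as candidates for memorised training text.
| The model, prompt and sample counts below are arbitrary stand-ins
| on my part, not the setup from reference [28]:
| 
|   import torch
|   from transformers import AutoTokenizer, AutoModelForCausalLM
| 
|   tok = AutoTokenizer.from_pretrained("gpt2")
|   model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
| 
|   def perplexity(text):
|       ids = tok(text, return_tensors="pt").input_ids
|       with torch.no_grad():
|           loss = model(ids, labels=ids).loss
|       return float(torch.exp(loss))
| 
|   prompt = tok("Contact information: ",
|                return_tensors="pt").input_ids
|   samples = model.generate(prompt, do_sample=True, top_k=40,
|                            max_new_tokens=48,
|                            num_return_sequences=20,
|                            pad_token_id=tok.eos_token_id)
|   texts = [tok.decode(s, skip_special_tokens=True) for s in samples]
| 
|   # Text the model is reciting rather than composing tends to
|   # have unusually low perplexity; surface those first.
|   for text in sorted(texts, key=perplexity)[:5]:
|       print(round(perplexity(text), 1), text[:70])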
 
  | visarga wrote:
  | Is this a good or a bad thing? We hear "hallucination" this and
  | that - you can't rely on the LLM, it is not like a search
  | engine. But then from the other side you hear "it memorises
  | PII".
  | 
  | Being able to memorise information is exactly what we demand
  | when we want the top 5 countries in Europe by population or
  | the height of Everest. But then we don't want it in other
  | contexts.
  | 
  | This looks more like a dataset pre-processing issue.
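  | 
  | A toy illustration of the kind of pre-processing meant here:
  | redact obvious PII patterns before text enters the training
  | corpus. The regexes are simplistic placeholders and would miss
  | plenty of real PII in practice:
  | 
  |   import re
  | 
  |   PATTERNS = {
  |       "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
  |       "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
  |       "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
  |   }
  | 
  |   def redact(text):
  |       # Replace each matched span with a typed placeholder token.
  |       for label, pattern in PATTERNS.items():
  |           text = pattern.sub(f"[{label}]", text)
  |       return text
  | 
  |   print(redact("Reach me at jane.doe@example.com or 555-123-4567."))
  |   # -> Reach me at [EMAIL] or [PHONE].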
 
    | weeksie wrote:
    | I _think_ I agree with this take.
    | 
    | Is it conceivable that a model could leak PII that is present
    | but extremely hard to detect in the data set? For example,
    | spread out in very different documents in the corpus that
    | aren't obviously related, but that the model would synthesize
    | relatively easily?
 
  | srvmshr wrote:
  | That is a more or less understood fact even with models like
  | Copilot & ChatGPT. With the amount of information being
  | churned through, not all PII may get scrubbed. And these LLMs
  | could often be running on unsanitized data - like a cached
  | copy of the web from Archive.org, Getty Images & the like.
  | 
  | I feel this is an unavoidable consequence of using LLMs. We
  | cannot ensure all data is free from any markers. I am not an
  | expert on databases/data engineering, so please take this as
  | an informed opinion.
 
    | weeksie wrote:
    | Copilot has a ton of well-publicised examples of verbatim
    | code being used, but I didn't realize that it was as trivial
    | as all that to go plumbing for it directly.
 
  | [deleted]
 
| platypii wrote:
| This paper is embarrassingly bad. It's really just an opinion
| piece where the authors rant about why they don't like large
| language models.
| 
| There is no falsifiable hypothesis to be found in it.
| 
| I think this paper will age very poorly, as LLMs continue to
| improve and our ability to guide them (such as with RLHF)
| improves.
 
  | jasmer wrote:
  | This is OK. 90% of research is creative thinking and dialogue.
  | One idea creates the next; some are a foil, some are dead ends.
  | As long as no outrageous claims of "hard evidence" are being
  | made where there is none, it's fine. Maybe the format isn't
  | fully appropriate, but the content is. Most good things come
  | about through a non-linear process that involves provocation
  | somewhere along the line.
 
    | janalsncm wrote:
    | I expect science to have a hypothesis which can be
    | falsified. Otherwise it's just opining on a topic, and we
    | could just as well call this HN thread "research".
 
      | joshuamorton wrote:
      | Position papers are exceedingly common. Common enough that
      | there's a term for them.
 
  | xwn wrote:
  | I don't know - without enumerating risks to check, there's
  | little basis for doing due diligence and quelling investor
  | concerns. This massively-cited paper gave a good point of
  | departure for establishing rigorous use of LLMs in the real
  | world. Without that, they're just an unestablished technology
  | with unknown downsides - and that's harder to carry to true
  | mass acceptance outside the SFBA/tech bubble.
 
  | srvmshr wrote:
  | This is generally my feeling as well with the paper.
  | 
  | You don't come out feeling "Voila! This tiny thing I learnt is
  | something new", which does happen often with many good papers.
  | Most of the paper just felt a bit anecdotal & underwhelming
  | (but I may be too afraid to say the same on Twitter, for good
  | reason).
 
  | Lyapunov_Lover wrote:
  | Why would there be a falsifiable hypothesis in it? Do you think
  | that's a criterion for something being a scientific paper or
  | something? If it ain't Popper, it ain't proper?
  | 
  | LLMs dramatically lower the bar for generating semi-plausible
  | bullshit and it's highly likely that this will cause problems
  | in the not-so-distant future. This is already happening. Ask
  | any teacher anywhere. Students are cheating like crazy, letting
  | ChatGPT write their essays and answer their assignments without
  | actually engaging with the material they're supposed to grok.
  | News sites are pumping out LLM-generated articles and the ease
  | of doing so means they have an edge over those who demand
  | scrutiny and expertise in their reporting--it's not unlikely
  | that we're going to be drowning in this type of content.
  | 
  | LLMs aren't perfect. RLHF is far from perfect. Language models
  | will keep making subtle and not-so-subtle mistakes and dealing
  | with this aspect of them is going to be a real challenge.
  | 
  | Personally, I think everyone should learn how to use this new
  | technology. Adapting to it is the only thing that makes sense.
  | The paper in question raised valid concerns about the nature of
  | (current) LLMs and I see no reason why it should age poorly.
 
  | [deleted]
 
___________________________________________________________________
(page generated 2023-01-14 23:00 UTC)