[HN Gopher] On the dangers of stochastic parrots: Can language m...
___________________________________________________________________
 
On the dangers of stochastic parrots: Can language models be too
big? (2021)
 
Author : Schiphol
Score  : 49 points
Date   : 2023-01-14 18:58 UTC (4 hours ago)
 
web link (dl.acm.org)
w3m dump (dl.acm.org)
 
| hardwaregeek wrote:
| I'm still midway through the paper, but I gotta say, I'm a little
| surprised at the contrast between the contents of the paper and
| how people have described it on HN. I don't agree with everything
| that is said, but there are some interesting points made about
| the data used to train the models, such as the way it captures
| bias (I would certainly question the methodology of using Reddit
| as a large source of training data), and the way that bias is
| amplified by the filtering algorithms that produce the even
| larger datasets used for modern LLMs. The section about
| environmental impact might not
| hit home for everyone, but it is valid to raise issues around the
| compute usage involved in training these models. First, because
| it limits this training to companies that can spend millions of
| dollars on compute, and second, because if we want to scale up
| models, efficiency is probably a top goal.
| 
| What really confuses me here is how this paper is somehow outside
| the realm of valid academic discourse. Yes, it is steeped in
| activist, social justice language. Yes, it has a different
| perspective than most CS papers. But is that wrong? Is it enough
| of a sin to warrant the response this paper has received? I'll
| need to finish the paper to judge fully, but I'm leaning towards
| no, it is not.
 
| monkaiju wrote:
| The problems with LLMs are numerous, but what's really wild to me
| is that even as they get better at fairly trivial tasks, the
| advertising gets more and more out of hand. These machines don't
| think, and they don't understand, but people like the CEO of
| OpenAI allude to them doing just that, obviously so the hype can
| make them money.
 
  | amelius wrote:
  | Could be the sign of the next AI winter.
 
  | visarga wrote:
  | > These machines don't think, and they don't understand
  | 
  | But they do solve many tasks correctly, even problems with
  | multiple steps and new tasks for which they got no specific
  | training. They can combine skills in new ways on demand. Call
  | it what you want.
 
    | choeger wrote:
    | They don't. Solve tasks, I mean. There's not a single task
    | you can throw at them and rely on the answer.
    | 
    | Could they solve tasks? Potentially. But how would we ever
    | know that we could trust them?
    | 
    | With humans, we have millennia of collective experience when
    | it comes to setting tasks, judging the results, and spotting
    | bullshitters. Also, we can retrain a human on the spot and be
    | confident they won't immediately forget something important
    | over that retraining.
    | 
    | If we ever let a model make important decisions, I'd imagine
    | we'd want to certify it beforehand. But that excludes
    | improvements and feedback - the certified software had better
    | not change. Of course, a feedback loop could involve
    | recertification, but that means the certification process
    | itself needs to be cheap.
    | 
    | And all that doesn't even take into account the generalized
    | interface: how can we make sure that a model is aware of its
    | narrow purpose and doesn't respond to tasks outside of that
    | purpose?
    | 
    | I think all these problems could eventually be overcome, but
    | I don't see much effort put into such a framework to actually
    | make models solve tasks.
 
      | pedrosorio wrote:
      | > Also, we can retrain a human on the spot and be confident
      | they won't immediately forget something important over that
      | retraining.
      | 
      | I don't have millennia, but my more than 3 decades of
      | experience interacting with human beings tells me this is
      | not nearly as reliable as you make it seem.
 
  | didntreadarticl wrote:
  | I don't think you understand, mate.
 
  | LarryMullins wrote:
  | > _These machines don't think_
  | 
  | And submarines don't swim.
 
    | Dylan16807 wrote:
    | And it would be bad for a submarine salesman to go to people
    | who think swimming is very special and try to convince them
    | that submarines do swim.
 
      | LarryMullins wrote:
      | Why would that be bad? A submarine salesman convincing you
      | that his submarine "swims" doesn't change the set of
      | missions a submarine might be suitable for. It makes no
      | practical difference. There's no point where you get the
      | submarine and it meets all the advertised specs, does
      | everything you needed a submarine for, but you're
      | unsatisfied with it anyway because you now realize that the
      | word "swim" is reserved for living creatures.
      | 
      | And more to the point, nobody believes that "it thinks" is
      | sufficient qualification for a job when hiring a human, so
      | why would it be different when buying a machine? Whether or
      | not the machine "thinks" doesn't address the question of
      | whether or not the machine is capable of doing the jobs you
      | want it to do. Anybody who neglects to evaluate the
      | _functional capability_ of the machine is simply a fool.
 
| sthatipamala wrote:
| This paper is the product of a failed model of AI safety, in
| which dedicated safety advocates act as public ombudsmen with an
| adversarial relationship to their employer. It's baffling to me
| why anyone thought that would be sustainable.
| 
| Compare this to something like RLHF[0], which has achieved far
| more for aligning models toward being polite and non-evil. (This
| is the technique that helps ChatGPT decline to answer questions
| like "how to make a bomb?")
| 
| There's still a lot of work to be done and the real progress will
| be made by researchers who implement systems in collaboration
| with their colleagues and employers.
| 
| [0] https://openai.com/blog/instruction-following/
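| 
| A minimal sketch of the reward-modelling step behind RLHF (the
| part that learns which of two answers human labelers preferred),
| assuming a small Hugging Face model as the scorer; the model
| choice and the preference pair below are made-up placeholders,
| not anything from the linked post:
| 
|   import torch
|   from transformers import (AutoTokenizer,
|                             AutoModelForSequenceClassification)
| 
|   tok = AutoTokenizer.from_pretrained("gpt2")
|   tok.pad_token = tok.eos_token
|   rm = AutoModelForSequenceClassification.from_pretrained(
|       "gpt2", num_labels=1)        # one scalar "reward" per sequence
|   rm.config.pad_token_id = tok.pad_token_id
| 
|   def reward_loss(prompt, chosen, rejected):
|       # Pairwise ranking loss: the labeler-preferred answer
|       # should receive the higher score.
|       batch = tok([prompt + chosen, prompt + rejected],
|                   return_tensors="pt", padding=True)
|       scores = rm(**batch).logits.squeeze(-1)       # shape (2,)
|       return -torch.nn.functional.logsigmoid(scores[0] - scores[1])
| 
|   loss = reward_loss("How do I stay safe online? ",
|                      "Use strong, unique passwords.",
|                      "Just reuse one password everywhere.")
|   loss.backward()   # the trained reward model later scores
|                     # policy samples during the RL (PPO) stage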
 
  | generalizations wrote:
  | > The resulting InstructGPT models are much better at following
  | instructions than GPT-3. They also make up facts less often,
  | and show small decreases in toxic output generation. Our
  | labelers prefer outputs from our 1.3B InstructGPT model over
  | outputs from a 175B GPT-3 model, despite having more than 100x
  | fewer parameters.
  | 
  | I wonder if anyone's working on public models of this size.
  | Looking forward to when we can self-host ChatGPT.
 
    | lumost wrote:
    | This is going to happen _a lot_ over the next few years. One
    | can fine-tune GPT-2 medium on an RTX 2070. Training GPT-2
    | medium from scratch can be done for $162 on vast.ai. The
    | newer H100/Trainium/Tensorcore chips will bring the price
    | down even further.
    | 
    | I suspect that fully replicating ChatGPT from scratch would
    | take ~$1-2 million including label acquisition, of which
    | probably only ~$200-500k is compute.
    | 
    | The next few years are going to be wild!
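    | 
    | For a sense of what "fine-tune GPT-2 medium on an RTX 2070"
    | looks like in practice, here is a rough sketch using the
    | Hugging Face Trainer; the corpus path and hyperparameters
    | are placeholders chosen to fit in roughly 8 GB of VRAM:
    | 
    |   from transformers import (
    |       AutoTokenizer, AutoModelForCausalLM, TextDataset,
    |       DataCollatorForLanguageModeling, Trainer, TrainingArguments)
    | 
    |   tok = AutoTokenizer.from_pretrained("gpt2-medium")
    |   model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
    | 
    |   # my_corpus.txt is a placeholder; any plain-text file works
    |   dataset = TextDataset(tokenizer=tok, file_path="my_corpus.txt",
    |                         block_size=512)
    |   collator = DataCollatorForLanguageModeling(tokenizer=tok,
    |                                              mlm=False)
    | 
    |   args = TrainingArguments(
    |       output_dir="gpt2-medium-finetuned",
    |       per_device_train_batch_size=1,
    |       gradient_accumulation_steps=8,  # effective batch size of 8
    |       fp16=True,                      # halves memory on the 2070
    |       num_train_epochs=1,
    |   )
    | 
    |   Trainer(model=model, args=args, data_collator=collator,
    |           train_dataset=dataset).train()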
 
      | generalizations wrote:
      | These tools have reached the tipping point where they
      | provide significant utility to a large portion of the
      | computer scientists working on building them. It could be
      | that the coming iterations of these tools will make it
      | increasingly easy to write the code for the next
      | iterations.
      | 
      | I wonder if this is the first rumbling of the singularity.
 
        | visarga wrote:
        | ChatGPT being able to write OpenAI API code is great, and
        | all companies should prepare samples so future models can
        | correctly interface with their systems.
        | 
        | But what will really be needed is an AI that implements
        | scientific papers. About 30% of papers come with a code
        | implementation. That's a sizeable dataset to train a
        | Codex model on.
        | 
        | You can have AI generating papers, and AI implementing
        | papers, then learning to predict experimental results.
        | This is how you bootstrap a self-improving AI.
        | 
        | It doesn't just learn how to recreate itself; it learns
        | how to solve all problems at the same time. A data
        | engineering approach to AI: search and learn / solve and
        | learn / evolve and learn.
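        | 
        | For what it's worth, the kind of "OpenAI API code" sample
        | meant here is tiny; a sketch against the pre-1.0 openai
        | Python client that was current at the time (the prompt
        | and parameters are arbitrary examples):
        | 
        |   import os
        |   import openai
        | 
        |   openai.api_key = os.environ["OPENAI_API_KEY"]
        | 
        |   resp = openai.Completion.create(
        |       model="text-davinci-003",  # completion model of the era
        |       prompt="Summarize RLHF in one sentence.",
        |       max_tokens=64,
        |       temperature=0.2,
        |   )
        |   print(resp["choices"][0]["text"].strip())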
 
        | williamcotton wrote:
        | I can imagine a world where there is an infinity of
        | "local maxima" that stop a system from reaching a
        | singular feedback loop... imagine if our current tools
        | help write the next generation, and so on, and so on,
        | until it gets stuck in some local optimum somewhere.
        | Getting stuck seems more likely than not getting stuck,
        | right?
 
  | Natsu wrote:
  | > Compare this to something like RLHF[0], which has achieved far
  | more for aligning models toward being polite and non-evil.
  | (This is the technique that helps ChatGPT decline to answer
  | questions like "how to make a bomb?")
  | 
  | I recently saw a screenshot of someone doing trolley problems
  | with people of all races & ages with ChatGPT and noting
  | differences. That makes me not quite as confident about
  | alignment as you are.
 
    | sthatipamala wrote:
    | I am curious to see that trolley problem screenshot. I saw
    | another screenshot where ChatGPT was coaxed into justifying
    | gender pay differences by prompting it to generate
    | hypothetical CSV or JSON data.
    | 
    | Basically, you have to convince modern models to say bad
    | stuff using clever hacks (compared to GPT-2 or even early
    | GPT-3, which would just spout straight-up hatred with the
    | lightest touch).
    | 
    | That's very good progress and I'm sure there is more to come.
 
      | cactusplant7374 wrote:
      | > I saw another screenshot where ChatGPT was coaxed into
      | justifying gender pay differences by prompting it to
      | generate hypothetical CSV or JSON data.
      | 
      | I remember seeing that on Twitter. My impression was that
      | the author instructed the AI to discriminate by gender.
 
        | Dylan16807 wrote:
        | Did the author tell it which way or by how much?
        | 
        | If I tell it to discriminate on some feature and it
        | consistently does so in the same way, that's still a
        | pretty bad bias. It probably shows up in other ways.
 
  | andrepd wrote:
  | Isn't RLHF trivially easy to defeat (as it stands now)?
 
    | ShamelessC wrote:
    | Assuming a motivated "attacker", yes. The average user will
    | have no such notion of "jailbreaks", and it's at least clear
    | when one _is_ attempting to "jailbreak" a model (given a full
    | log of the conversation and a competent human investigator).
    | 
    | I think the class of problems that remain are basically
    | outliers that are misaligned and don't trip up the model's
    | detection mechanism. Given the nature of language and culture
    | (not to mention that they both change over time), I imagine
    | there are a lot of these. I don't have any examples (and I
    | don't think yelling "time's up" when such outliers are found
    | is at all helpful).
 
  | visarga wrote:
  | > researchers who implement real systems
  | 
  | That's what I didn't like about Gebru - too much critique, not
  | a single constructive suggestion. Especially her Gender Shades
  | paper where she forgot about Asians.
  | 
  | http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a...
  | 
  | I think AnthropicAI is a great company to follow when it comes
  | to actually solving these problems. Look at their
  | "Constitutional AI" paper. They automate and improve on RLHF.
  | 
  | https://www.anthropic.com/constitutional.pdf
 
    | [deleted]
 
| srvmshr wrote:
| I am of the general understanding that this paper became less
| about the LLMs & more of an insinuating hit piece against
| Alphabet. At least, some of the controversial nuggets got Gebru
| (and later M Mitchell) fired.
| 
| From a technical standpoint, I found little new in this paper
| that helps in understanding why LLMs can behave unpredictably, or
| how much data can be exposed by clever hacks (or whether there
| are systematic ways to go about it). It sounded more like a
| collection of verifiable anecdotes for easy consumption (which
| can be a good thing by itself if you want a capsule understanding
| in a non-technical way).
 
  | visarga wrote:
  | It was activism masquerading as science. Many researchers noted
  | that positives and negatives were not presented in a balanced
  | way. New approaches and efforts were not credited.
 
    | srvmshr wrote:
    | I haven't kept track, but the activism of the trio could be
    | severe at times.
    | 
    | (Anecdotally, I have faced a bite-sized brunt of it: when
    | discussion surrounding this paper was happening on Twitter, I
    | had mentioned on my timeline (in a neutral tone) that "dust
    | needed to settle to understand what was going wrong". This
    | was unfortunately picked up & RTed by Gebru, and the mob
    | responded with name-calling, threatening DMs accusing me of
    | racism/misogyny etc., and in one instance a call to my
    | employer asking to terminate me - all for that one single
    | tweet. I don't want confrontations - they're not my forte.)
 
      | [deleted]
 
      | [deleted]
 
      | zzzeek wrote:
      | > This was unfortunately picked up & RTed by Gebru & the
      | mob responded by name-calling, threatening DMs accusing me
      | of racism/misogyny etc, and one instance of a call to my
      | employer asking to terminate me - all for that one single
      | tweet.
      | 
      | Wait until an LLM flags your speech and gets you in
      | trouble. That'll be a real hoot compared to random
      | individuals who likely have been chased off Twitter by now.
 
      | visarga wrote:
      | Sounds similar to what I have witnessed on Twitter, not
      | against me, but against a few very visible people in the AI
      | community.
 
  | [deleted]
 
    | [deleted]
 
| larve wrote:
| I just finished working my way through this paper this morning.
| The literature list is quite interesting and gives a lot of
| pointers for people who want to walk the line between overblown
| hype and doomsday scenarios.
 
| xiaolingxiao wrote:
| I believe this is the paper that got Timnit and M Mitchell fired
| from Google, followed by a protracted media/legal campaign
| against Google, and vice versa.
 
  | freyr wrote:
  | I suspect it was Timnit's behavior after the paper didn't pass
  | internal review that actually got her fired (issuing an
  | ultimatum and threatening to resign unless the company met her
  | demands; telling her coworkers to stop writing documents
  | because their work didn't matter; insinuating
  | racist/misogynistic treatment by leadership when she didn't
  | get her way).
 
    | visarga wrote:
    | I think it was a well-calculated career move: she wanted
    | fame, and she got what she wanted. Now she's leading a new
    | research institute:
    | 
    | > We are an interdisciplinary and globally distributed AI
    | research institute rooted in the belief that AI is not
    | inevitable, its harms are preventable, and when its
    | production and deployment include diverse perspectives and
    | deliberate processes it can be beneficial. Our research
    | reflects our lived experiences and centers our communities.
    | 
    | https://www.dair-institute.org/about
 
  | oh_sigh wrote:
  | A small correction: this paper didn't get her fired, her
  | reaction to feedback on this paper got her fired.
  | 
  | Note to all: if you give an employer an ultimatum "do X or I
  | resign", don't be surprised if they accept your resignation.
 
  | [deleted]
 
    | [deleted]
 
| 2bitencryption wrote:
| Pure speculation ahead-
| 
| The other day on Hacker News, there was that article about how
| scientists could not tell GPT-generated paper abstracts from real
| ones.
| 
| Which makes me think- abstracts for scientific papers are high-
| effort. The corpus of scientific abstracts would understandably
| have a low count of "garbage" compared to, say, Twitter posts or
| random blogs.
| 
| That's not to say that all scientific abstracts are amazing, just
| that their goal is to sound intelligent and convincing, while
| probably 60% of the text fed into GPT is simply clickbait and
| junk content padded to fit some publisher's SEO requirements.
| 
| In other words, ask GPT to generate an abstract, and I would
| expect it to be quite good.
| 
| Ask it to generate a 5-paragraph essay about Huckleberry Finn,
| and I would expect it to be the same quality as the corpus - that
| is to say, the work of high-school English students.
| 
| So now that we know these models can learn many one-shot tasks,
| perhaps some cleanup of the training data is required to advance.
| Imagine GPT trained ONLY on the Library of Congress, without the
| shitty travel blogs or 4chan rants.
 
  | williamcotton wrote:
  | The science is in the reproduction of the methodology, not in
  | the abstract... in fact, a flood of garbage publications with
  | catchy abstracts built on shaky foundations sounds like one of
  | the issues that plagues contemporary science. That people
  | would stop finding abstracts useful seems like a good thing!
 
  | JPLeRouzic wrote:
  | > " _The corpus of scientific abstracts would understandably
  | have a low count of "garbage" compared to, say, Twitter posts
  | or random blogs_"
  | 
  | That's certainly true, but not by such a large margin, at
  | least in biology.
  | 
  | For example, in ALS (a neurodegenerative disease) there is a
  | real breakthrough perhaps every two years, yet most papers
  | about ALS (thousands every year) read as if they describe
  | something very important.
  | 
  | Similarly, on ALZforum the most recent "milestone" paper about
  | Alzheimer's disease was from 2012, yet in 2022 alone there
  | were more than 16K papers!
  | 
  | So the signal-to-noise ratio is close to zero.
  | 
  | https://www.alzforum.org/papers?type%5Bmilestone%5D=mileston...
  | 
  | https://pubmed.ncbi.nlm.nih.gov/?term=alzheimer%27s+disease&...
 
  | [deleted]
 
  | ncraig wrote:
  | Some might say that abstracts are the original clickbait.
 
| weeksie wrote:
| This was mostly political guff about environmentalism and bias,
| but one thing I didn't know was that apparently larger models
| make it easier to extract training data.
| 
| > Finally, we note that there are risks associated with the fact
| that LMs with extremely large numbers of parameters model their
| training data very closely and can be prompted to output specific
| information from that training data. For example, [28]
| demonstrate a methodology for extracting personally identifiable
| information (PII) from an LM and find that larger LMs are more
| susceptible to this style of attack than smaller ones. Building
| training data out of publicly available documents doesn't fully
| mitigate this risk: just because the PII was already available in
| the open on the Internet doesn't mean there isn't additional harm
| in collecting it and providing another avenue to its discovery.
| This type of risk differs from those noted above because it
| doesn't hinge on seeming coherence of synthetic text, but the
| possibility of a sufficiently motivated user gaining access to
| training data via the LM. In a similar vein, users might query
| LMs for 'dangerous knowledge' (e.g. tax avoidance advice),
| knowing that what they were getting was synthetic and therefore
| not credible but nonetheless representing clues to what is in the
| training data in order to refine their own search queries
| 
| Shame they only gave that one graf. I'd like to know more about
| this. Again, miss me with the political garbage about "dangerous
| knowledge"; the most concerning thing is the PII leakage, as far
| as I can tell.
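| 
| The extraction-style attack described in the quoted paragraph can
| be sketched roughly like this: sample freely from a public model
| and rank the generations by perplexity, treating unusually
| low-perplexity outputs as candidates for memorised training text.
| The model, prompt and sample counts below are arbitrary stand-ins
| on my part, not the setup from reference [28]:
| 
|   import torch
|   from transformers import AutoTokenizer, AutoModelForCausalLM
| 
|   tok = AutoTokenizer.from_pretrained("gpt2")
|   model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
| 
|   def perplexity(text):
|       ids = tok(text, return_tensors="pt").input_ids
|       with torch.no_grad():
|           loss = model(ids, labels=ids).loss
|       return float(torch.exp(loss))
| 
|   prompt = tok("Contact information: ",
|                return_tensors="pt").input_ids
|   samples = model.generate(prompt, do_sample=True, top_k=40,
|                            max_new_tokens=48,
|                            num_return_sequences=20,
|                            pad_token_id=tok.eos_token_id)
|   texts = [tok.decode(s, skip_special_tokens=True) for s in samples]
| 
|   # Text the model is reciting rather than composing tends to
|   # have unusually low perplexity; surface those first.
|   for text in sorted(texts, key=perplexity)[:5]:
|       print(round(perplexity(text), 1), text[:70])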
 
  | visarga wrote:
  | Is this a good or a bad thing? We hear "hallucination" this and
  | that - you can't rely on the LLM, it is not like a search
  | engine. But then from the other side you hear "it memorises
  | PII".
  | 
  | Being able to memorise information is exactly what we demand
  | when we want the top 5 countries in Europe by population or
  | the height of Everest. But then we don't want it in other
  | contexts.
  | 
  | This looks more like a dataset pre-processing issue.
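  | 
  | A toy illustration of the kind of pre-processing meant here:
  | redact obvious PII patterns before text enters the training
  | corpus. The regexes are simplistic placeholders and would miss
  | plenty of real PII in practice:
  | 
  |   import re
  | 
  |   PATTERNS = {
  |       "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
  |       "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
  |       "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
  |   }
  | 
  |   def redact(text):
  |       # Replace each matched span with a typed placeholder token.
  |       for label, pattern in PATTERNS.items():
  |           text = pattern.sub(f"[{label}]", text)
  |       return text
  | 
  |   print(redact("Reach me at jane.doe@example.com or 555-123-4567."))
  |   # -> Reach me at [EMAIL] or [PHONE].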
 
    | weeksie wrote:
    | I _think_ I agree with this take.
    | 
    | Is it conceivable that a model could leak PII that is present
    | but extremely hard to detect in the data set? For example,
    | spread out in very different documents in the corpus that
    | aren't obviously related, but that the model would synthesize
    | relatively easily?
 
  | srvmshr wrote:
  | That is a more or less understood fact even with models like
  | Copilot & ChatGPT. With the amount of information being
  | churned through, not all PII may get scrubbed. And these LLMs
  | could often be running on unsanitized data - like a cached
  | copy of the web from Archive.org, Getty Images & the like.
  | 
  | I feel this is an unavoidable consequence of using LLMs. We
  | cannot ensure all data is free from any markers. I am not an
  | expert on databases/data engineering, so please take this as
  | an informed opinion.
 
    | weeksie wrote:
    | Copilot has a ton of well-publicised examples of verbatim
    | code being used, but I didn't realize that it was as trivial
    | as all that to go plumbing for it directly.
 
  | [deleted]
 
| platypii wrote:
| This paper is embarrassingly bad. It's really just an opinion
| piece where the authors rant about why they don't like large
| language models.
| 
| There is no falsifiable hypothesis to be found in it.
| 
| I think this paper will age very poorly, as LLMs continue to
| improve and our ability to guide them (such as with RLHF)
| improves.
 
  | jasmer wrote:
  | This is OK. 90% of research is creative thinking and dialogue.
  | One idea creates the next; some are a foil, some are dead ends.
  | As long as no outrageous claims of "hard evidence" are being
  | made where there is none, it's fine. Maybe the format isn't
  | fully appropriate, but the content is. Most good things come
  | about through a non-linear process that involves provocation
  | somewhere along the line.
 
    | janalsncm wrote:
    | I expect science to have a hypothesis which can be
    | falsified. Otherwise it's just opining on a topic, and we
    | could just as well call this HN thread "research".
 
      | joshuamorton wrote:
      | Position papers are exceedingly common. Common enough that
      | there's a term for them.
 
  | xwn wrote:
  | I don't know - without enumerating risks to check, there's
  | little basis for doing due diligence and quelling investor
  | concerns. This massively-cited paper gave a good point of
  | departure for establishing rigorous use of LLMs in the real
  | world. Without that, they're just an unestablished technology
  | with unknown downsides - and that's harder to carry to true
  | mass acceptance outside the SFBA/tech bubble.
 
  | srvmshr wrote:
  | This is generally my feeling as well with the paper.
  | 
  | You don't come out feeling "Voila! This tiny thing I learnt is
  | something new", which does happen often with many good papers.
  | Most of the paper just felt a bit anecdotal & underwhelming
  | (but I may be too afraid to say the same on Twitter, for good
  | reason).
 
  | Lyapunov_Lover wrote:
  | Why would there be a falsifiable hypothesis in it? Do you think
  | that's a criterion for something being a scientific paper or
  | something? If it ain't Popper, it ain't proper?
  | 
  | LLMs dramatically lower the bar for generating semi-plausible
  | bullshit and it's highly likely that this will cause problems
  | in the not-so-distant future. This is already happening. Ask
  | any teacher anywhere. Students are cheating like crazy, letting
  | ChatGPT write their essays and answer their assignments without
  | actually engaging with the material they're supposed to grok.
  | News sites are pumping out LLM-generated articles and the ease
  | of doing so means they have an edge over those who demand
  | scrutiny and expertise in their reporting--it's not unlikely
  | that we're going to be drowning in this type of content.
  | 
  | LLMs aren't perfect. RLHF is far from perfect. Language models
  | will keep making subtle and not-so-subtle mistakes and dealing
  | with this aspect of them is going to be a real challenge.
  | 
  | Personally, I think everyone should learn how to use this new
  | technology. Adapting to it is the only thing that makes sense.
  | The paper in question raised valid concerns about the nature of
  | (current) LLMs and I see no reason why it should age poorly.
 
  | [deleted]
 
___________________________________________________________________
(page generated 2023-01-14 23:00 UTC)