|
| hardwaregeek wrote:
| I'm still midway through the paper, but I gotta say, I'm a little
| surprised at the contrast between the contents of the paper and
| how people have described it on HN. I don't agree with everything
| that is said, but there are some interesting points made about
| the data used to train the models, such as the way it captures
| bias (I would certainly question the methodology of using Reddit
| as a large source of training data) and the way that bias is
| amplified by the filtering algorithms that produce the even
| larger datasets used for modern LLMs. The section about
| environmental impact might not hit home for everyone, but it is
| valid to raise issues around the compute usage involved in
| training these models. First, because it limits this kind of
| training to companies that can spend millions of dollars on
| compute, and second, because if we want to scale up models,
| efficiency is probably a top goal.
|
| What really confuses me here is how this paper is somehow outside
| the realm of valid academic discourse. Yes, it is steeped in
| activist, social justice language. Yes, it has a different
| perspective than most CS papers. But is that wrong? Is that
| enough of a sin to warrant such a response that this paper has
| received? I'll need to finish the paper to fully judge, but I'm
| leaning towards no, it is not enough of a sin.
| monkaiju wrote:
| The problems with LLMs are numerous, but what's really wild to me
| is that even as they get better at fairly trivial tasks, the
| advertising gets more and more out of hand. These machines don't
| think, and they don't understand, but people like the CEO of
| OpenAI allude to them doing just that, obviously so the hype can
| make them money.
| amelius wrote:
| Could be the sign of the next AI winter.
| visarga wrote:
| > These machines don't think, and they don't understand
|
| But they do solve many tasks correctly, even problems with
| multiple steps and new tasks for which they got no specific
| training. They can combine skills in new ways on demand. Call
| it what you want.
| choeger wrote:
| They don't. Solve tasks, I mean. There's not a single task
| you can throw at them and rely on the answer.
|
| Could they solve tasks? Potentially. But how would we ever
| know that we could trust them?
|
| With humans, we have millennia of collective experience when it
| comes to assigning tasks, judging the results, and finding the
| bullshitters. Also, we can retrain a human on the spot and be
| confident they won't immediately forget something important
| over that retraining.
|
| If we ever let a model make important decisions, I'd imagine
| we'd want to certify it beforehand. But that excludes
| improvements and feedback - the certified software had better
| not change. Of course, a feedback loop could involve
| recertification, but that means the certification process
| itself needs to be cheap.
|
| And all of that doesn't even take into account the generalized
| interface: how can we make sure that a model is aware of its
| narrow purpose and doesn't answer tasks outside of that purpose?
|
| I think all these problems could eventually be overcome, but
| I don't see much effort put into such a framework to actually
| make models solve tasks.
| pedrosorio wrote:
| > Also, we can retrain a human on the spot and be confident
| they won't immediately forget something important over that
| retraining.
|
| I don't have millennia, but my more than 3 decades of
| experience interacting with human beings tell me this is
| not nearly as reliable as you make it seem.
| didntreadarticl wrote:
| I don't think you understand, mate
| LarryMullins wrote:
| > _These machines don't think_
|
| And submarines don't swim.
| Dylan16807 wrote:
| And it would be bad for a submarine salesman to go to people
| that think swimming is very special and try to get them
| believing that submarines do swim.
| LarryMullins wrote:
| Why would that be bad? A submarine salesman convincing you
| that his submarine "swims" doesn't change the set of
| missions a submarine might be suitable for. It makes no
| practical difference. There's no point where you get the
| submarine and it meets all the advertised specs, does
| everything you needed a submarine for, but you're
| unsatisfied with it anyway because you now realize that the
| word "swim" is reserved for living creatures.
|
| And more to the point, nobody believes that "it thinks" is
| sufficient qualification for a job when hiring a human, so
| why would it be different when buying a machine? Whether or
| not the machine "thinks" doesn't address the question of
| whether or not the machine is capable of doing the jobs you
| want it to do. Anybody who neglects to evaluate the
| _functional capability_ of the machine is simply a fool.
| sthatipamala wrote:
| This paper is the product of a failed model of AI safety, in
| which dedicated safety advocates act as a public ombudsman with
| an adversarial relationship with their employer. It's baffling to
| me why anyone thought that would be sustainable.
|
| Compare this to something like RLHF[0], which has achieved far
| more for aligning models toward being polite and non-evil. (This
| is the technique that helps ChatGPT decline to answer questions
| like "how to make a bomb?")
|
| There's still a lot of work to be done and the real progress will
| be made by researchers who implement systems in collaboration
| with their colleagues and employers.
|
| [0] https://openai.com/blog/instruction-following/
| generalizations wrote:
| > The resulting InstructGPT models are much better at following
| instructions than GPT-3. They also make up facts less often,
| and show small decreases in toxic output generation. Our
| labelers prefer outputs from our 1.3B InstructGPT model over
| outputs from a 175B GPT-3 model, despite having more than 100x
| fewer parameters.
|
| I wonder if anyone's working on public models of this size.
| Looking forward to when we can self-host ChatGPT.
| lumost wrote:
| This is going to happen _a lot_ over the next few years. One
| can fine-tune GPT-2 medium on an RTX 2070. Training GPT-2
| medium from scratch can be done for $162 on vast.ai. The newer
| H100/Trainium/Tensor Core chips will bring the price down even
| further.
|
| I suspect that fully replicating ChatGPT from scratch would
| take ~$1-2 million including label acquisition, of which you
| probably only need ~$200-500k in compute.
|
| The next few years are going to be wild!
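|
| For a sense of how little code the fine-tuning part takes,
| here's a rough sketch with the Hugging Face Trainer
| ("gpt2-medium" is a real checkpoint; the dataset and
| hyperparameters are placeholders you'd tune for an 8GB card):
|
|     from datasets import load_dataset
|     from transformers import (AutoModelForCausalLM, AutoTokenizer,
|                               DataCollatorForLanguageModeling,
|                               Trainer, TrainingArguments)
|
|     tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
|     tokenizer.pad_token = tokenizer.eos_token
|     model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
|
|     # Placeholder corpus; swap in whatever text you care about.
|     data = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
|     data = data.filter(lambda ex: len(ex["text"].strip()) > 0)
|     data = data.map(
|         lambda b: tokenizer(b["text"], truncation=True, max_length=512),
|         batched=True, remove_columns=["text"])
|
|     trainer = Trainer(
|         model=model,
|         args=TrainingArguments(output_dir="gpt2-medium-ft",
|                                per_device_train_batch_size=2,
|                                gradient_accumulation_steps=8,
|                                num_train_epochs=1, fp16=True),
|         train_dataset=data,
|         data_collator=DataCollatorForLanguageModeling(tokenizer,
|                                                        mlm=False),
|     )
|     trainer.train()
|
| The RLHF stages on top of that (reward modelling, PPO) are
| where the extra label-acquisition cost comes in.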
| generalizations wrote:
| These things have reached the tipping point where they
| provide significant utility to a significant portion of the
| computer scientists working on making these things. Could
| be that the coming iterations of these new tools will make
| it increasingly easy to write the code for the next
| iterations of these tools.
|
| I wonder if this is the first rumblings of the singularity.
| visarga wrote:
| ChatGPT being able to write OpenAI API code is great, and
| all companies should prepare samples so future models can
| correctly interface with their systems.
|
| But what will really be needed is an AI that implements
| scientific papers. About 30% of papers come with a code
| implementation. That's a sizeable dataset to train a
| Codex model on.
|
| You can have AI generating papers, and AI implementing
| papers, then learning to predict experimental results.
| This is how you bootstrap a self-improving AI.
|
| It does not learn only how to recreate itself; it learns
| how to solve all problems at the same time. A
| data-engineering approach to AI: search and learn / solve
| and learn / evolve and learn.
| williamcotton wrote:
| I can imagine a world where there are an infinity of
| "local maximums" that stop a system from reaching a
| singular feedback loop... imagine if our current tools
| help write the next generation, so on, so on, until it
| gets stuck in some local optimization somewhere. Getting
| stuck seems more likely than not getting stuck, right?
| Natsu wrote:
| > Compare this to something like RLHF[0], which has achieved far
| more for aligning models toward being polite and non-evil.
| (This is the technique that helps ChatGPT decline to answer
| questions like "how to make a bomb?")
|
| I recently saw a screenshot of someone doing trolley problems
| with people of all races & ages with ChatGPT and noting
| differences. That makes me not quite as confident about
| alignment as you are.
| sthatipamala wrote:
| I am curious to see that trolley problem screenshot. I saw
| another screenshot where ChatGPT was coaxed into justifying
| gender pay differences by prompting it to generate
| hypothetical CSV or JSON data.
|
| Basically you have to convince modern models to say bad stuff
| using clever hacks (compared to GPT-2 or even early GPT-3
| where it would just spout straight-up hatred with the
| lightest touch).
|
| That's very good progress and I'm sure there is more to come.
| cactusplant7374 wrote:
| > I saw another screenshot where ChatGPT was coaxed into
| justifying gender pay differences by prompting it to
| generate hypothetical CSV or JSON data.
|
| I remember seeing that on Twitter. My impression was that the
| author instructed the AI to discriminate by gender.
| Dylan16807 wrote:
| Did the author tell it which way or by how much?
|
| If I say to discriminate on some feature and it
| consistently does it the same way, that's still a pretty
| bad bias. It probably shows up in other ways.
| andrepd wrote:
| Isn't RLHF trivially easy to defeat (as it stands now)?
| ShamelessC wrote:
| Assuming a motivated "attacker", yes. The average user will
| have no such notion of "jailbreaks", and it's at least clear
| when one _is_ attempting to "jailbreak" a model (given a full
| log of the conversation and a competent human investigator).
|
| I think the class of problems that remains is basically
| outliers that are misaligned and don't trip the model's
| detection mechanism. Given the nature of language and culture
| (not to mention that they both change over time), I imagine
| there are a lot of these. I don't have any examples (and I
| don't think yelling "time's up" when such outliers are found
| is at all helpful).
| visarga wrote:
| > researchers who implement real systems
|
| That's what I didn't like about Gebru - too much critique, not
| a single constructive suggestion. Especially her Gender Shades
| paper where she forgot about Asians.
|
| http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a...
|
| I think AnthropicAI is a great company to follow when it comes
| to actually solving these problems. Look at their
| "Constitutional AI" paper; they automate and improve on RLHF.
|
| https://www.anthropic.com/constitutional.pdf
| [deleted]
| srvmshr wrote:
| I am of the general understanding that this paper became less
| about the LLMs & more of an insinuating hit piece against
| Alphabet. At the least, some of the controversial nuggets got
| Gebru (and later M. Mitchell) fired.
|
| From a technical standpoint, I found little new that this paper
| offered toward understanding why LLMs can behave unpredictably,
| what degree of data can be exposed by clever hacks, or whether
| there are systematic ways to go about finding that out. It read
| more like a collection of verifiable anecdotes for easy
| consumption (which can be a good thing by itself if you want a
| capsule understanding in a non-technical way).
| visarga wrote:
| It was activism masquerading as science. Many researchers noted
| that positives and negatives were not presented in a balanced
| way. New approaches and efforts were not credited.
| srvmshr wrote:
| I haven't kept track, but the activism of the trio could get
| severe sometimes.
|
| (Anecdotally, I faced a bite-sized brunt of it: when discussion
| surrounding this paper was underway on Twitter, I mentioned on
| my timeline (in a neutral tone) that "dust needed to settle to
| understand what was going wrong". This was unfortunately picked
| up & RTed by Gebru & the mob responded by name-calling,
| threatening DMs accusing me of racism/misogyny etc, and one
| instance of a call to my employer asking to terminate me - all
| for that one single tweet. I don't want confrontations -
| dealing with them is not my forte.)
| [deleted]
| [deleted]
| zzzeek wrote:
| > This was unfortunately picked up & RTed by Gebru & the
| mob responded by name-calling, threatening DMs accusing me
| of racism/misogyny etc, and one instance of a call to my
| employer asking to terminate me - all for that one single
| tweet.
|
| Wait until an LLM flags your speech and gets you in
| trouble. That'll be a real hoot compared to random
| individuals who likely have been chased off Twitter by now.
| visarga wrote:
| Sounds similar to what I have witnessed on Twitter, not
| against me, but against a few very visible people in the AI
| community.
| [deleted]
| [deleted]
| larve wrote:
| I just finished working through this paper this morning. The
| literature list is quite interesting and gives a lot of pointers
| for people who want to walk the line between overblown hype and
| doomsday scenarios.
| xiaolingxiao wrote:
| I believe this is the paper that got Timnit and M. Mitchell
| fired from Google, followed by a protracted media/legal
| campaign against Google, and vice versa.
| freyr wrote:
| I suspect it was Timnit's behavior after the paper didn't pass
| internal review that actually got her fired (issuing an
| ultimatum and threatening to resign unless the company met her
| demands; telling her coworkers to stop writing documents
| because their work didn't matter; insinuations of
| racist/misogynistic treatment from leadership when she didn't
| get her way).
| visarga wrote:
| I think it was a well-calculated career move: she wanted fame,
| and she got what she wanted. Now she's leading a new research
| institute:
|
| > We are an interdisciplinary and globally distributed AI
| research institute rooted in the belief that AI is not
| inevitable, its harms are preventable, and when its
| production and deployment include diverse perspectives and
| deliberate processes it can be beneficial. Our research
| reflects our lived experiences and centers our communities.
|
| https://www.dair-institute.org/about
| oh_sigh wrote:
| A small correction: this paper didn't get her fired, her
| reaction to feedback on this paper got her fired.
|
| Note to all: if you give an employer an ultimatum "do X or I
| resign", don't be surprised if they accept your resignation.
| [deleted]
| [deleted]
| 2bitencryption wrote:
| Pure speculation ahead-
|
| The other day on Hacker News, there was that article about how
| scientists could not tell GPT-generated paper abstracts from real
| ones.
|
| Which makes me think: abstracts for scientific papers are high-
| effort. The corpus of scientific abstracts would understandably
| have a low count of "garbage" compared to, say, Twitter posts or
| random blogs.
|
| That's not to say that all scientific abstracts are amazing, just
| that their goal is to sound intelligent and convincing, while
| probably 60% of the text fed into GPT is simply clickbait and
| junk content padded to fit some publisher's SEO requirements.
|
| In other words, ask GPT to generate an abstract, and I would
| expect it to be quite good.
|
| Ask it to generate a 5-paragraph essay about Huckleberry Finn,
| and I would expect it to be the same quality as the corpus -
| that is to say, the work of high-school English students.
|
| So now that we know these models can learn many one-shot tasks,
| perhaps some cleanup of the training data is required to
| advance. Imagine GPT trained ONLY on the Library of Congress,
| without the shitty travel blogs or 4chan rants.
| williamcotton wrote:
| The science is in the reproduction of the methodology, not in
| the abstract... in fact, a flood of garbage publications with
| catchy abstracts built on shaky foundations sounds like one of
| the issues that plague contemporary science. That people would
| stop finding abstracts useful seems like a good thing!
| JPLeRouzic wrote:
| > " _The corpus of scientific abstracts would understandably
| have a low count of "garbage" compared to, say, Twitter posts
| or random blogs_"
|
| That's certainly true, but not by so large a margin, at least
| in biology.
|
| For example, in ALS (a neurodegenerative disease) there is a
| real breakthrough perhaps every two years, yet most papers
| about ALS (thousands every year) look like they describe
| something very important.
|
| Similarly, according to ALZforum the most recent "milestone"
| paper about Alzheimer's disease was in 2012, yet in 2022 alone
| there were more than 16K papers!
|
| So the signal-to-noise ratio is close to zero.
|
| https://www.alzforum.org/papers?type%5Bmilestone%5D=mileston...
|
| https://pubmed.ncbi.nlm.nih.gov/?term=alzheimer%27s+disease&...
| [deleted]
| ncraig wrote:
| Some might say that abstracts are the original clickbait.
| weeksie wrote:
| This was mostly political guff about environmentalism and bias,
| but one thing I didn't know was that apparently larger models
| make it easier to extract training data.
|
| > Finally, we note that there are risks associated with the fact
| that LMs with extremely large numbers of parameters model their
| training data very closely and can be prompted to output specific
| information from that training data. For example, [28]
| demonstrate a methodology for extracting personally identifiable
| information (PII) from an LM and find that larger LMs are more
| susceptible to this style of attack than smaller ones. Building
| training data out of publicly available documents doesn't fully
| mitigate this risk: just because the PII was already available in
| the open on the Internet doesn't mean there isn't additional harm
| in collecting it and providing another avenue to its discovery.
| This type of risk differs from those noted above because it
| doesn't hinge on seeming coherence of synthetic text, but the
| possibility of a sufficiently motivated user gaining access to
| training data via the LM. In a similar vein, users might query
| LMs for 'dangerous knowledge' (e.g. tax avoidance advice),
| knowing that what they were getting was synthetic and therefore
| not credible but nonetheless representing clues to what is in the
| training data in order to refine their own search queries
|
| Shame they only gave it that one graf; I'd like to know more
| about this. Again, miss me with the political garbage about
| "dangerous knowledge" - the most concerning thing is the PII
| leakage, as far as I can tell.
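|
| For what it's worth, my understanding of the attack they cite
| (I believe it's Carlini et al.'s "Extracting Training Data from
| Large Language Models") is that the probe itself is almost
| trivial: sample a lot of text from the model and rank the
| samples by how unusually confident the model is about them. A
| stripped-down sketch ("gpt2" is just a stand-in for whichever
| LM you're probing; the real attack also filters against a
| reference model so generically likely text doesn't dominate):
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
|
|     def perplexity(text):
|         ids = tok(text, return_tensors="pt").input_ids
|         with torch.no_grad():
|             loss = lm(ids, labels=ids).loss
|         return torch.exp(loss).item()
|
|     # Sample many continuations, then look hardest at the ones
|     # the model is most certain about: those are the memorization
|     # candidates an attacker would inspect for PII.
|     prompt = tok("Contact information:", return_tensors="pt").input_ids
|     samples = lm.generate(prompt, do_sample=True, top_k=40,
|                           max_length=64, num_return_sequences=20,
|                           pad_token_id=tok.eos_token_id)
|     texts = [tok.decode(s, skip_special_tokens=True) for s in samples]
|     for t in sorted(texts, key=perplexity)[:5]:
|         print(round(perplexity(t), 1), t[:80])
|
| That also gives some intuition for why larger models, which fit
| their training data more closely, are more susceptible.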
| visarga wrote:
| Is this a good or bad thing? We hear "hallucination" this and
| that. You can't rely on the LLM. It is not like a search
| engine. But then you hear on the other side "it memorises PII".
|
| Being able to memorise information is demanded when we want the
| top 5 countries by population in Europe or the height of
| Everest. But then we don't want it in other contexts.
|
| Looks more like a dataset pre-processing issue.
| weeksie wrote:
| I _think_ I agree with this take.
|
| Is it conceivable that a model could leak PII that is present
| but extremely hard to detect in the data set? For example,
| spread out in very different documents in the corpus that
| aren't obviously related, but that the model would synthesize
| relatively easily?
| srvmshr wrote:
| That is sort of an understood fact even with models like
| Copilot & ChatGPT. With the amount of information we are
| generally churning, not all PII may get scrubbed. And these
| LLMs could often be running on unsanitized data - like a cache
| of the Web from Archive.org, Getty Images & the like.
|
| I feel this is an unavoidable consequence of using LLMs. We
| cannot ensure all data is free from any markers. I am not an
| expert on databases/data engineering, so please take this as an
| informed opinion.
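|
| Even the obvious pre-processing step - regex-scrubbing emails
| and phone numbers before training - only catches the easy
| cases; anything contextual (names, addresses scattered across
| documents) slips right through. A toy illustration (patterns
| here are illustrative, nowhere near exhaustive):
|
|     import re
|
|     # Illustrative patterns only; real PII scrubbing needs much more.
|     EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
|     PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
|
|     def scrub(text):
|         text = EMAIL.sub("<EMAIL>", text)
|         text = PHONE.sub("<PHONE>", text)
|         return text
|
|     doc = "Reach Jane Doe at jane.doe@example.com or +1 (555) 010-7788."
|     print(scrub(doc))
|     # -> "Reach Jane Doe at <EMAIL> or <PHONE>."  The name still leaks.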
| weeksie wrote:
| Copilot has a ton of well-publicised examples of verbatim
| code being used, but I didn't realize that it was as trivial
| as all that to go plumbing for it directly.
| [deleted]
| platypii wrote:
| This paper is embarrassingly bad. It's really just an opinion
| piece where the authors rant about why they don't like large
| language models.
|
| There is no falsifiable hypothesis to be found in it.
|
| I think this paper will age very poorly, as LLMs continue to
| improve and our ability to guide them (such as with RLHF)
| improves.
| jasmer wrote:
| This is ok. 90% of research is creative thinking and dialogue.
| One idea creates the next; some are a foil, some are dead ends.
| As long as no outrageous claims of 'hard evidence' are being
| made where there is none, it's fine. Maybe the format isn't
| fully appropriate, but the content is. Most good things come
| about through a non-linear process that involves provocation
| somewhere along the line.
| janalsncm wrote:
| I expect science to have a hypothesis which can be falsified.
| Otherwise it's just opining on a topic, and we could just as
| well call this HN thread "research".
| joshuamorton wrote:
| Position papers are exceedingly common. Common enough that
| there's a term for them.
| xwn wrote:
| I don't know; without enumerating risks to check, there's
| little basis for doing due diligence and quelling investors'
| doubts. This massively-cited paper gave a good point of
| departure for establishing rigorous use of LLMs in the real
| world. Without that, they're just an unestablished technology
| with unknown downsides - and that's harder to push toward true
| mass acceptance outside the SFBA/tech bubble.
| srvmshr wrote:
| This is generally my feeling as well with the paper.
|
| You don't come out feeling "Voila! this tiny thing I learnt is
| something new", which does happen often with many good papers.
| Most of the paper just felt a bit anecdotal & underwhelming
| (but I may be too afraid to say the same on Twiiter for good
| reason)
| Lyapunov_Lover wrote:
| Why would there be a falsifiable hypothesis in it? Do you think
| that's a criterion for something being a scientific paper or
| something? If it ain't Popper, it ain't proper?
|
| LLMs dramatically lower the bar for generating semi-plausible
| bullshit, and it's highly likely that this will cause problems
| in the not-so-distant future. It is already happening. Ask any
| teacher anywhere: students are cheating like crazy, letting
| ChatGPT write their essays and answer their assignments without
| actually engaging with the material they're supposed to grok.
| News sites are pumping out LLM-generated articles, and the ease
| of doing so means they have an edge over those who demand
| scrutiny and expertise in their reporting - it's not unlikely
| that we're going to be drowning in this type of content.
|
| LLMs aren't perfect. RLHF is far from perfect. Language models
| will keep making subtle and not-so-subtle mistakes and dealing
| with this aspect of them is going to be a real challenge.
|
| Personally, I think everyone should learn how to use this new
| technology. Adapting to it is the only thing that makes sense.
| The paper in question raised valid concerns about the nature of
| (current) LLMs and I see no reason why it should age poorly.
| [deleted]