|
| SketchySeaBeast wrote:
| > But the human reviewers didn't do much better: they correctly
| identified only 68% of the generated abstracts and 86% of the
| genuine abstracts. They incorrectly identified 32% of the
| generated abstracts as being real and 14% of the genuine
| abstracts as being generated.
|
| Isn't the headline actually "Abstracts written by ChatGPT fool
| scientists 1/3 of the time"? Having never written one myself,
| wouldn't the abstract be the place where ChatGPT shines, being
| able to write unsubstantiated information confidently? I imagine
| getting into the meat of the paper would quickly reveal issues.
| lelandfe wrote:
| > I imagine getting into the meat of the paper would quickly
| reveal issues
|
| This is a tautology: the thing that can be validated can be
| validated.
| SketchySeaBeast wrote:
| I think there's an important difference with ChatGPT - it likes
| to generate numbers that make absolutely no sense. A human who
| lies will try to generate sufficiently plausible data to
| convince; ChatGPT often won't even make an attempt to produce
| values that fit.
| neaden wrote:
| ChatGPT right now also likes to make up fake citations when
| you ask it how it knows something, which could be checked
| quickly.
| InCityDreams wrote:
| ...request an example [I've tried, to no avail].
| daveguy wrote:
| I guess that's good for us meat beings. Better for an AI to
| incompetently lie than competently lie.
|
| I wonder if having AI models available would make it
| significantly easier to identify material created by that
| model. It seems it would be, but it would still be a big
| problem. Nets can be trained to tell AI from non-AI output for
| a given model, and that's without needing access to the model
| weights, just training examples (a rough sketch of such a
| detector appears below). But when there are N potential
| models...
|
| Edit: Second paragraph added later.
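|
| For what it's worth, here's a minimal sketch of the kind of
| detector described above: a classifier trained only on labeled
| examples of model output vs. human text, with no access to the
| model's weights. The toy texts and the scikit-learn setup are
| illustrative assumptions, not anything from the study.
|
|     # Hedged sketch of an AI-vs-human text detector; the example
|     # texts are invented, and a real detector would need a large
|     # labeled corpus of abstracts.
|     from sklearn.feature_extraction.text import TfidfVectorizer
|     from sklearn.linear_model import LogisticRegression
|     from sklearn.pipeline import make_pipeline
|
|     texts = [
|         "We present a novel framework elucidating putative pathways.",
|         "Binding affinity was measured in 12 patient-derived samples.",
|     ]
|     labels = [1, 0]  # 1 = model-generated, 0 = human-written
|
|     detector = make_pipeline(
|         TfidfVectorizer(ngram_range=(1, 2)),
|         LogisticRegression(),
|     )
|     detector.fit(texts, labels)
|     print(detector.predict(["This work paves the way for future studies."]))
|
| The same recipe has to be repeated per generator, which is exactly
| why N potential models multiplies the work.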
| poulsbohemian wrote:
| >they correctly identified only 68% of the generated abstracts
| and 86% of the genuine abstracts.
|
| I think you and I are basically in alignment... what this tells
| me is that 14% of real abstracts are so bad that other human
| beings call their BS. Meanwhile, this AI stuff is kinda working
| 32% of the time in generating legitimately interesting ideas.
|
| So at that point - yeah, that sounds about right. The 32% is
| still so low that it shows AI is not anywhere near maturity,
| whereas 14% of human-generated abstracts are crap.
|
| And, yeah - a short blurb like an abstract seems to be exactly
| the _kind_ of text that ChatGPT is conditioned to do well
| generating. As others below note - once a human starts reading
| the rest, the alarm bells trigger.
| sdenton4 wrote:
| (Buuuut let us also enjoy the fact that ML can generate a
| decent-looking abstract to a scientific paper 32% of the
| time. This represents massive progress; I would venture that
| most humans cannot write such an abstract convincingly.)
| PragmaticPulp wrote:
| Writing text that feels plausibly real is ChatGPT's specialty.
|
| Fake scientific papers that are written with the language,
| vocabulary, and styling of an academic paper have been a problem
| for a long time. The supplement and alternative medicine
| industries have been producing fake studies at high volumes for
| years now. ChatGPT will only make it accessible to a wider
| audience.
| mojomark wrote:
| Isn't that the reason we have trusted scientific peer review
| journals? I mean, why trust a paper that hasn't been vetted by
| a trusted source? The same is true in news media - I don't put
| any stock in news content that isn't published by a well-
| trusted source (and I do pay for subscriptions, e.g. AAAS,
| Financial Times, etc., for that very reason). I guess I don't
| understand the concern - the world has always been filled with
| junk information and we have tried and true systems in place
| already to deal with it.
| rqtwteye wrote:
| I don't see the problem. A lot of tech writing will probably be
| done by AI soon. It's about the content of the paper.
| xeyownt wrote:
| Yes. And if I can use ChatGPT to write an abstract for me from
| my paper, let's go!
| pvaldes wrote:
| > And if I can use ChatGPT to write an abstract for me from
| my paper, let's go!
|
| Is ChatGPT hosted in a central repository or cloud? Is it
| centralized? If so, that's probably a bad idea.
|
| A private company having access to your abstract before you
| publish it could easily lead to problems like plagiarism
| (or worse, automated plagiarism), or give an unfair
| advantage to one of two teams racing to publish the same
| result. Science has a lot of such cases.
| shadowgovt wrote:
| TBH, a paper's abstract is supposed to summarize the purpose and
| findings in the paper, so auto-generation of what is otherwise
| "repeating what the rest of the paper says" should be considered
| a win; it's automating boring work.
|
| If ChatGPT can't do that (i.e. if it's attaching abstracts
| disjoint from the paper body), it's not the right tool for the
| job. A tool for that job would be valuable.
| ajsnigrutin wrote:
| Considering how much intentionally fake garbage got published,
| this doesn't surprise me at all... and this is not just random
| scientists, but scientists who should (at least theoretically)
| know enough to be able to notice it's gibberish.
|
| https://en.wikipedia.org/wiki/Sokal_affair
|
| https://en.wikipedia.org/wiki/List_of_scholarly_publishing_s...
| albntomat0 wrote:
| I'm reminded of the somewhat recent news of a line of Alzheimer's
| research being based on a fabricated paper that was only caught
| many years later [0].
|
| Previously, we've relied on a number of heuristics to determine
| if something is real or not, such as if an image has any signs of
| a poor photoshop job, or if a written work has proper grammar.
| These heuristics work somewhat, but a motivated adversary can
| still get through.
|
| As the quality of fakes gets better, we'll need to develop better
| tools for dealing with them. For science, this could, hopefully,
| result in more and better replication of previous work.
|
| I'm quite likely being overly optimistic, but there's a chance
| for positive outcomes here.
|
| [0]: https://www.science.org/content/article/potential-
| fabricatio...
| xiphias2 wrote:
| The requirement for detecting something fake is quite simple,
| and we've known it for a long time: publish all data, code, and
| everything else needed to make the experiments reproducible.
|
| Even if everything is fake, the code has value for further
| research.
|
| It would be nice to have that as a minimum standard at this
| point; I would prefer to see far fewer publications that can
| be trusted more than the current situation.
| tptacek wrote:
| That's _not_ easy: a reproduction is a scientific project in
| its own right. Some research is straightforward to reproduce,
| but a lot of it isn't.
|
| That's not to say scientists shouldn't publish their data;
| they should.
| danuker wrote:
| > Even if everything is fake, the code has value for further
| research.
|
| I'd call that human-computer science partnership. If it
| checks out, it's not fake. Nonhuman scientists are still
| scientists.
| venv wrote:
| Are pipettes or vials scientists? Computers are tools like
| hammers, and have equal agency. There are no nonhuman
| scientists.
| danuker wrote:
| If a developer codes up an AI to scour the web, write an
| article, and submit it to a scientific journal without
| letting the developer see the article, is the developer
| doing science?
|
| If someone trains a translation model between languages
| they don't know, is that someone a translator?
|
| I guess the users of said model would be "translators" as
| they would be doing the translation (without necessarily
| knowing the languages either).
| reillyse wrote:
| unsure if anyone is "doing science". Doing science is
| applying the scientific method.
|
| Making conjectures, deriving predictions from the
| hypotheses as logical consequences, and then carrying out
| experiments or empirical observations based on those
| predictions.
|
| Not sure AI is up to that, and it's debatable if it'll
| ever be able to make and test conjectures. There is a
| difference between symbol manipulation (like outputting
| text) and actual conjecture.
| kleer001 wrote:
| Thank you. All this airy-fairy talk of AI is fun, but at
| the end of the day it's an inert tool (or toy) without
| human interaction.
| rhn_mk1 wrote:
| For now.
| kderbyma wrote:
| One of the citations is AI-generated content itself, lol
| bluenose69 wrote:
| The real issue for me is that the bot might generate incorrect
| text, imposing a yet-higher burden on readers who already find it
| difficult to keep up with the literature. It is hard enough,
| working sentence by sentence through a paper (or even an
| abstract) wondering whether the authors made a mistake in the
| work, had difficulty explaining that work clearly, or wasted my
| time by "pumping up" their work to get it published.
|
| The day is already too short, with an expansion of journals. But,
| there's a sort of silver lining: many folks restrict their
| reading to authors that they know, and whose work (and writing)
| they trust. Institutions come into play also, for I assume any
| professor caught using a bot to write text will be denied tenure
| or, if they have tenure, denied further research funding. Rules
| regarding plagiarism require only the addition of a phrase or two
| to cover bot-generated text, and plagiarism is the big sin in
| academia.
|
| Speaking of sins, another natural consequence of bot-generated
| text is that students will be assessed more on examinations, and
| less on assignments. And those exams will be either hand-written
| or done in controlled environments, with invigilators watching
| like hawks, as they do at conventional examinations. We may
| return to the "old days", when grades reflected an assessment of
| how well students can perform, working alone, without resources
| and under pressure. Many will view this as a step backward, but
| those departments that have started to see bot-generated
| assignments have very little choice, because the university that
| gives an A+ to every student will lose its reputation and funding
| very quickly.
| ricksunny wrote:
| For a scicomm publication I wrote the abstract of my explainer
| article leveraging ChatGPT
|
| https://www.theseedsofscience.org/2022-general-antiviral-pro...
|
| (after I had written the rest of the article and long after
| writing the academic paper underlying it.)
|
| That said, the published abstract reads nothing like the abstracts
| ChatGPT generated for me, because of the subtle but important
| factual inaccuracies they contained. But I found it helpful for
| getting around my curse of knowledge in producing a flowing
| structure.
|
| My edited, manually fact-checked result flowed less fluidly but
| was accurate to the article body's content. Still overall glad I
| did it that way. I would have otherwise fretted over
| format/structure for a lot longer.
| janosett wrote:
| How easy would it be for researchers to differentiate
| deliberately fabricated abstracts written by humans from
| abstracts of peer-reviewed scientific papers from respected
| publications? I think the answer to that question might give more
| context to this result.
| PeterisP wrote:
| Probably impossible. As a reviewer, the abstract won't tell me
| if the paper is bullshit or faked. An abstract _can_ tell me
| that there are substantial language issues, or that the authors
| are totally unskilled about the field, or that the topic is not
| interesting to me, or their claims lack ambition, but beyond
| that crude filter, all the data for separating poor papers from
| awesome ones, and true claims from unfounded ones, can only be in
| the paper itself; an abstract won't contain them.
| mikenew wrote:
| > if scientists can't determine whether research is true, there
| could be "dire consequences"
|
| Yeah well we can't tell that now either. Maybe we can finally
| start publishing raw data alongside these "trust us we found
| something" papers that people evaluate based on the reputation of
| the journal and the authors.
|
| As someone else pointed out, that system has already derailed
| decades of Alzheimer's research. It's stupid and broken and it
| should have changed a long time ago.
|
| https://www.science.org/content/article/potential-fabricatio...
| thro1 wrote:
| Isn't that how abstracts should be? Setting aside incidental
| characteristics like the formula used to produce them, human or
| author involvement, and creativity, an abstract in its pure form
| is the scientific form of some work, like an equation catching
| the essence without flaws or distractions - and that's what
| computers are for: to process it, so humans may not have to.
|
| But I'm lost as to what those scientists are trying to find (?)
| nathias wrote:
| I bet we can make an AI that can differentiate them better ...
| venv wrote:
| That would just lead to an AI that makes better abstracts a la
| GAN.
| nathias wrote:
| Of course; what I mean is that it's now an AI-vs-AI battle.
| VyseofArcadia wrote:
| I know it was just titles, but I was having a good day on "arxiv
| vs snarxiv" if I did better than random chance. And that was just
| a Markov text generator, no fancier AI needed.
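|
| For anyone who hasn't seen one: a Markov-style generator of the
| kind mentioned above really is only a few lines. The sketch below
| is a generic illustration with made-up seed titles, not the actual
| snarxiv code.
|
|     # Toy bigram "title generator"; the seed titles are invented.
|     import random
|     from collections import defaultdict
|
|     titles = [
|         "holographic duality in de sitter space",
|         "duality and entanglement entropy in conformal field theory",
|         "entanglement entropy of black hole horizons",
|     ]
|
|     # Record which word can follow which word in the corpus.
|     chain = defaultdict(list)
|     for title in titles:
|         words = title.split()
|         for a, b in zip(words, words[1:]):
|             chain[a].append(b)
|
|     # Walk the chain to sample a new "title".
|     word = random.choice([t.split()[0] for t in titles])
|     out = [word]
|     while word in chain and len(out) < 8:
|         word = random.choice(chain[word])
|         out.append(word)
|     print(" ".join(out))  # e.g. "duality in conformal field theory"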
| Someone wrote:
| I don't understand. Doesn't the author list give that away ;-) ?
|
| (https://pubmed.ncbi.nlm.nih.gov/36549229/)
| Octokiddie wrote:
| From the original paper (linked in the article):
|
| > ... When given a mixture of original and general abstracts,
| blinded human reviewers correctly identified 68% of generated
| abstracts as being generated by ChatGPT, but incorrectly
| identified 14% of original abstracts as being generated.
| Reviewers indicated that it was surprisingly difficult to
| differentiate between the two, but that the generated abstracts
| were vaguer and had a formulaic feel to the writing.
|
| That last part is interesting because "vague" and "formulaic"
| would be words I'd use to describe ChatGPT's writing style now.
| This is a big leap forward from the outright gibberish of just a
| couple of years ago. But using the "smart BSer" heuristic will
| probably get a lot harder in no time.
|
| Also, it's worth noting that just four human reviewers were used
| in the study (and are listed as authors). The article doesn't
| mention the level of expertise of these reviewers, but I suspect
| that could also play a role.
| amelius wrote:
| We could use this to test the peer-review system.
| 323 wrote:
| 100% there is a group right now making an AI-generated paper and
| trying to publish it for the next iteration of the Sokal affair.
|
| https://en.wikipedia.org/wiki/Sokal_affair
| shadowgovt wrote:
| It's weird to me that scientists make so much hay of the Sokal
| affair given how unscientific it is.
|
| It's a single data point. Did anyone ever claim the editorial
| process of _Social Text_ caught 100% of bunk? If not, how do we
| determine what percent it catches based on one slipped-through
| paper?
|
| I'd expect scientists to demand both more reproducibility and
| more data before drawing conclusions from one anecdote.
| shakow wrote:
| Well, given that all paper abstracts have to follow the same
| structure with the same keywords and be conservative to get a
| chance to get published, it makes sense that ChatGPT shines
| there.
|
| IMHO, it says more about the manic habits of journal editors than
| anything else.
| jacquesm wrote:
| That's a feature, not a bug. It means that when you have 100
| papers to check for applicability to something that you are
| researching you can do so in a minimum of time.
| nkko wrote:
| What I see as wrong here is an AI witch-hunt. AI is a tool, and
| this would be the same as calling for a ban on cars because
| horses exist. Obviously the disruption is happening, which is
| always a good thing as it should lead to progress.
| venv wrote:
| On the other hand, all kinds of technology have been regulated
| to minimize adverse effects. The trouble with software is that
| it is evolving faster than regulators can keep track of, and it
| is very hard to police even if regulated.
| asdff wrote:
| It's probably a little easier to fool people with AI generated
| scientific literature than a regular piece of literature. Most
| scientists are not good writers to begin with. English might not
| even be their first, or even second or third language. Even then,
| there are a lot of crutch words and phrases that scientists rely
| upon. "Novel finding" "elucidate" "putative" "could one day pave
| way for" "laying important groundwork" and all sorts of similar
| words and phrases are highly overused, especially in the
| abstract, intro, and discussion sections where you wax lyrical
| about hypothetical translational utility from your basic research
| finding. A lot of scientific writers could really use a
| thesaurus, and learn more ways to structure a sentence.
| uniqueuid wrote:
| Your critique assumes that the goal of scientific writing is to
| be intelligible to lay people.
|
| In truth, the entire weird and crufty vocabulary is simply a
| common set of placeholders that makes it easier to grasp
| research, because the in-group learns to understand them as
| such.
| asdff wrote:
| I'm not saying this makes papers more unintelligible.
| These are just filler words anyhow, not jargon. I agree that
| if anything, it makes it faster to read a paper since your
| brain just glosses over the same structures you've read 1000
| times already and directs you to the meat. However, as
| someone who reads a lot of papers for my job, I just wish
| writers were more interesting----you will never see an em
| dash like I've used here, for example. Maybe scientists could
| benefit from reading more Hemingway in their downtime.
| eslaught wrote:
| As a computer scientist (you can check my publication record
| through my profile) and an (aspiring) novelist, I disagree. A
| lot of papers are just poorly written, full stop.
|
| It is _also_ true that science literature contains a lot of
| jargon that encodes important information. But that doesn't
| excuse the fact that a lot of scientific writing could be
| improved substantially, even if the only audience were
| experts in the same field.
| LolWolf wrote:
| Yeah, a lot of scientific writing is just _downright
| useless_, and I don't just mean that in the "haha, it's
| hard to read, but it's ok"-sense. For example, in many
| fields (parts of theoretical physics, many parts of econ)
| publications are so hard to read that "reading" a paper
| looks less like "learning from the author by following what
| they did on paper" and more like "rederiving the same thing
| that the author claims to do, except by yourself with only
| some minor guidance from the paper." This is, frankly,
| absolutely insane, but it's the current state of things.
| chaxor wrote:
| It's a fine line to walk when publishing. For example, is
| it ok to use the term "Hilbert space" in an article?
| Perhaps in physics, but not if publishing in biology - or
| at least in biology, a few sentences to describe the term
| may be more appropriate. But the use of the term is
| actually quite useful, as in this manufactured example
| the article may apply only to Hilbert spaces but not all
| vector spaces. So since the distinction may be important
| to the finding, the terminology is necessary.
| nixpulvis wrote:
| I find it almost deliciously ironic that we research and
| development engineers in the field of computer science have
| expertly uncovered and deployed exactly the tools needed to flood
| our own systems and overwhelm our ability to continue doing the
| processes we depended on to create this situation in the first
| place.
|
| It's like we've reached a fixed point, a global minimum for academic
| ability as a system. You could almost argue it's inevitable. Any
| system that looks to find abstractions in everything and
| generalize at all costs will ultimately learn to automate itself
| into obscurity.
|
| Perhaps all that's left now is to critique everything and cry
| ourselves to sleep at night? I jest!
|
| But it does seem immensely tiresome, and it deters "real science".
| danuker wrote:
| Getting through peer review is the ultimate Turing test.
| strangattractor wrote:
| There is only so much peer review can actually accomplish. Mostly
| a reviewer can tell if the work was performed with a certain
| amount of rigor and the results are supported by the techniques
| used to test the claimed results. It doesn't guarantee there were
| no mistakes made. Having others reproduce the results is the only
| true way to verify an experiment. Unfortunately you don't get
| tenure for reproducing other people's work.
| weakfortress wrote:
| I think part of the problem comes down to the sheer amount of jargon
| in even the simplest research paper. During my time in graduate
| school (CS) I would often do work that used papers in mathematics
| (differential geometry) for some of the stuff I was researching.
| Even having been fairly well versed in the jargon of both fields
| I was often left dumbfounded reading a paper.
|
| This would seem to me a situation that is easily exploited by an
| AI that generates plausible text. If you pack enough jargon into
| your paper you will probably make it past several layers of
| review until someone actually sits down and checks the
| math/consistency which will be, of course, off in a way that is
| easily detected.
|
| It's a problem academia has in general. Especially in STEM fields
| they have gotten so specialized that you practically need a
| second PhD in paper reading to even begin to understand the
| cutting edge. Maybe forcing text to be written so that early
| undergrads can understand it (without simplifying it to the point
| of losing meaning) would prevent this, as an AI would likely be
| unable to pull off such a feat without real context and
| understanding of the problem. Almost like an adversarial Feynman
| method.
| pcrh wrote:
| As a researcher, I would expect any researcher to be able to
| generate fake abstracts. However, I suspect that generating a
| whole paper that had any interest would be nigh on impossible for
| AI to do. An interesting paper would have to have novel claims
| that were plausible and supported by a web of interacting data.
| JoshTriplett wrote:
| > An interesting paper would have to have novel claims that
| were plausible and supported by a web of interacting data.
|
| And if AI can manage that, well: https://xkcd.com/810/
| avgcorrection wrote:
| Abstracts can just be keyword soups. Then the AI just has to make
| sure that the keywords make some vague sense when put next to
| each other. Or if not they can mix in existing keywords with
| brand new ones.
|
| Abstracts don't have to justify or prove what they state.
| lairv wrote:
| At least one nice side-effect of this could be that only
| reproducible research with code provided will matter in the
| future (this should already be the case but for some reason isn't
| yet). What's the point of trusting a paper without code if
| ChatGPT can produce 10 such papers with fake results in less than
| a second?
| ben_w wrote:
| ChatGPT can produce code too. Therefore I think this may call
| for something more extreme -- at risk of demonstrating my own
| naivete about modern science, perhaps only allowing publication
| after replication, rather than after peer-review?
| lairv wrote:
| Ideally yes, for a paper to be accepted it should be
| reproduced; if ChatGPT is ever able to produce code that runs
| and produces SOTA results, then I guess we won't need
| researchers anymore.
|
| There is however a problem when the contents of a paper
| cost thousands/millions of $ to reproduce (think GPT3,
| DALLE, and most of the papers coming from Google, OpenAI, Meta,
| Microsoft). More than replication, it would require fully
| open science where all the experiments and results of a paper
| are publicly available, but I doubt tech companies will agree
| with that.
|
| Ultimately it could also end up with researchers only
| trusting papers coming from known labs/people/companies
| PeterisP wrote:
| Reproduction of experiments generally comes after
| publication, not before acceptance. Reviewers of a paper
| would review the analysis of the data, and whether the
| conclusions are reasonable given the data, but no one would
| expect a reviewer to replicate a chemical experiment, or
| the biopsy of some mice, or re-do a sociological survey or
| repeat observation of some astronomy phenomenon, or any
| other experimental setup.
|
| Reviewers work from an assumption that the data is valid,
| and reproduction (or failed reproduction) of a paper
| happens as part of the scientific discourse _after_ the
| paper is accepted and published.
| jacquesm wrote:
| Not all science results in 'code'.
| lairv wrote:
| Indeed, and other sciences seem even harder to
| reproduce/verify (e.g. how can mathematicians efficiently
| verify results if ChatGPT can produce thousands of wrong
| proofs?)
| ben_w wrote:
| Mathematicians have it easier than most, there are
| already ways to automate testing in their domain.
|
| Kinda needed to be, given the rise of computer-generated
| proofs starting with the 4-colour theorem in 1976.
| lairv wrote:
| > there are already ways to automate testing in their
| domain.
|
| Do you mean a proof assistant like Lean? From my limited
| knowledge of fundamental math research, I thought most
| math publications these days only provide a paper with
| statements and proofs, not in a standardized format.
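|
| For what it's worth, the "standardized format" would look roughly
| like the Lean snippet below: a statement plus a proof term that
| the proof checker verifies mechanically. It's a deliberately
| trivial example, not a claim about current publishing practice.
|
|     -- Lean 4: the kernel checks this proof; a wrong one is rejected.
|     theorem my_add_comm (a b : Nat) : a + b = b + a :=
|       Nat.add_comm a b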
| ben_w wrote:
| I can't give many specifics, my knowledge is YouTube
| mathematicians like 3blue1brown and Matt Parker talking
| about things like this.
| ben_w wrote:
| I'm thinking of the LHC or the JWST: billions of dollars
| for an essentially unique instrument, though each produces
| far more than one paper.
|
| Code from ChatGPT could very well end up processing data
| from each of them -- I wouldn't be surprised if it already
| has, albeit in the form of a researcher playing around with
| the AI to see if it was any use.
| gus_massa wrote:
| Nice trick for ChatGPT, but this will not destroy science.
|
| Nobody takes a serious decision reading only the abstract. Look
| at the tables, look at the graphs, look at the strange details.
| Look at the list of authors, institutions, ...
|
| Has it been reproduced? Has the last few works of the same team
| been reproduced? And if it's possible, reproduce it locally.
| People claim that nobody reproduces other teams' work, but that's
| misleading. People reproduce other teams' work unofficially, or
| with some tweaks. An exact reproduction is difficult to publish,
| but if it has a few random tweaks ^W^W improvements, it's easier
| to get it published.
|
| The only time I think people read only the abstract is to accept
| talks for a conference. I've seen a few bad conference talks, and
| the problem is that sometimes the abstracts get posted online in
| bulk without further checks. So the conclusion is: don't trust
| online abstracts, always read the full paper.
|
| EDIT: Look at the journal where it's published. [How could I have
| forgotten that!]
| nixpulvis wrote:
| I'm quite confident that there are cliques within "science"
| whose papers are admitted without so much as a glance at the
| body. Some people simply cannot be bothered to get past
| the paywalls, others accept on grounds outside the content of
| the paper, like local reputation or tenure. Others are asked to
| review without the needed expertise, qualification, or time to
| properly understand the content. Even the most honorable
| reviewers make mistakes and overlook critical details. Then
| there is the set of papers which are (rightfully so) largely
| about style, consistency, and, honestly, fashion.
|
| How can we expect results from an industry being led by
| automated derivatives of the past?
|
| Is an AI-generated result any less valid than one created by a
| human with equally poor methods?
|
| Will this issue bring new focus on the larger problems of the
| bloated academic research community?
|
| Finally, how does this impact the primary functions of our
| academic institutions... _teaching_?
| Animats wrote:
| Why are automatically generated abstracts bad? That seems a
| useful tool. It would be a problem if the abstracts are factually
| wrong or misleading.
|
| They'd probably be better than what comes out of university PR
| departments.
| klysm wrote:
| I hope the abstract for this paper is AI-generated.
| bee_rider wrote:
| If a software system can generate abstracts, good. Nobody got
| into research for love of abstract-writing.
|
| It is a tool. Ultimately researchers are responsible for their
| use of a tool, so they should check the abstract and make sure it
| is good, but there's no reason it should be seen as a bad thing.