|
| boringuser2 wrote:
| Eh.
|
| Altman has a financial incentive to lie and obfuscate about what
| it takes to train a model like GPT-4 and beyond, so his word is
| basically worthless.
| qqtt wrote:
| First of all, if Altman continually makes misleading statements
| about AI he will quickly lose credibility, and the short-term
| gain from whatever 'financial incentive' birthed the lie would
| be eroded in short order by a lack of trust in the head of one
| of the most visible AI companies in the world.
|
| Secondly, all the competitors of OpenAI can plainly assess the
| truth or validity of Altman's statements. There are many
| companies working in tandem on things at the OpenAI scale of
| models, and they can independently assess the usefulness of
| continually growing models. They aren't going to take this
| statement at face value and change their strategy based on a
| single statement by OpenAI's CEO.
|
| Thirdly, I think people aren't really reading what Altman
| actually said very closely. He doesn't say that larger models
| aren't useful at all, but that the next sea change in AI won't
| be models which are orders of magnitude bigger, but rather a
| different approach to existing problem sets. Which is an
| entirely reasonable prediction to make, even if it doesn't turn
| out to be true.
|
| All in all, "his word is basically worthless" seems much too
| harsh an assessment here.
| manojlds wrote:
| Elon Musk has been constantly doing this and thriving.
| cogitoergofutuo wrote:
| It is possible that GP meant that Altman's word is basically
| worthless _to them_, in which case that's not something that
| can be argued about. It's a factually true statement that
| that is their opinion of that man.
|
| I personally can see why someone could arrive at that
| position. As you've pointed out, taking Sam Altman at face
| value can involve suppositions about how much he values his
| credibility, how much stock OpenAI competitors put in his
| public statements, and the mindsets _people in general_ have
| when reading what he writes.
| mnky9800n wrote:
| dude someone lied their way into being president of the
| united states all while people fact checked him basically
| immediately after each lie. lying doesn't make a difference.
| beowulfey wrote:
| He's not presenting false evidence here, he's presenting a
| hunch. It's a guess. No one is going to gain anything from
| this one way or another.
| olalonde wrote:
| Does he even have any background in machine learning? I always
| found it bizarre that he was chosen to be OpenAI's CEO...
| cowmix wrote:
| On the Lex Fridman podcast, he pretty much admitted he's not
| an AI (per se) and isn't the most excited about the tech (as
| he could be).
| olalonde wrote:
| > he pretty much admitted he's not an AI
|
| Yeah, I also had a hunch he wasn't an AI. (I assume you
| meant "AI researcher" there :))
|
| All joking aside, I wonder how that's affecting company
| morale or their ability to attract top researchers. I know
| if I was a top AI researcher, I'd probably rather work at a
| company where the CEO was an expert in the field (all else
| being equal).
| vorticalbox wrote:
| I feel most CEOs are not top of their field but rather
| people who can take a vision and run with it.
| olalonde wrote:
| It might be true in general; however, AI research
| laboratories are typically an exception, as they are
| often led by experienced AI researchers or scientists
| with extensive expertise in the field.
| gowld wrote:
| He has background in CEO (smooth-talking charmer in the VC
| crowd). That's why he's CEO.
| g_delgado14 wrote:
| IIRC Altman has no financial stake in the success or failure of
| OpenAI, precisely to prevent these sorts of conflicts of
| interest between OpenAI and society as a whole.
| shagie wrote:
| https://www.cnbc.com/2023/03/24/openai-ceo-sam-altman-
| didnt-... (https://news.ycombinator.com/item?id=35289044 - 24
| days ago; 158 points, 209 comments)
|
| > OpenAI's ChatGPT unleashed an arms race among Silicon
| Valley companies and investors, sparking an A.I. investment
| craze that proved to be a boon for OpenAI's investors and
| shareholding employees.
|
| > But CEO and co-founder Sam Altman may not notch the kind of
| outsize payday that Silicon Valley founders have enjoyed in
| years past. Altman didn't take an equity stake in the company
| when it added the for-profit OpenAI LP entity in 2019,
| Semafor reported Friday.
| cowpig wrote:
| OpenAI has gone from open-sourcing its work, to publishing
| papers only, to publishing papers that omit important
| information, to GPT-4 being straight-up closed. And Sam Altman
| doesn't exactly have a track record of being overly concerned
| about the truth of his statements.
| smeagull wrote:
| This trend has happened in the small for their APIs as well.
| They've been dropping options - the embeddings aren't the
| internal embeddings any more, and you don't have access to
| log probabilities. It's all closing up at every level.
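|
| For anyone who never used it: the completions API used to
| expose token log probabilities directly. A minimal sketch with
| the pre-1.0 openai Python client as I remember it (the model
| and top-5 setting are just examples):
|
|     import openai  # pre-1.0 client, current when this thread ran
|
|     openai.api_key = "sk-..."  # placeholder
|
|     # logprobs=5 returned the log probabilities of the top 5
|     # candidate tokens at each position; the newer chat
|     # endpoints dropped this option.
|     resp = openai.Completion.create(
|         model="text-davinci-003",
|         prompt="The capital of France is",
|         max_tokens=1,
|         logprobs=5,
|     )
|     print(resp["choices"][0]["logprobs"]["top_logprobs"][0])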
| transcriptase wrote:
| I had a fun conversation (more like argument) with ChatGPT
| about the hypocrisy of OpenAI. It would explicitly contradict
| itself, then start every reply with "I can see why someone
| might think..." before regurgitating fluff about democratizing
| AI. I finally was able to have it define
| democratization of technology and then recognize the
| absurdity of using that label to describe a pivot to gating
| models and being for-profit. Then it basically told me "well
| it's for safety and protecting society".
|
| An AI, when presented with facts counter to what it thought
| it should say, agreed and basically went: "Won't someone
| PLEASE think of the children!"
|
| Love it.
| dopidopHN wrote:
| Without getting into morality.
|
| It's pretty easy to have ChatGPT contradict itself, point
| it out, and have the LLM respond "well, I'm just
| generating text, nobody said it had to be correct".
| machina_ex_deus wrote:
| It was trained on a corpus full of mainstream media lies, so
| why would you have expected otherwise? It's by far the most
| common deflection in its training set.
|
| It's easy to recognize and laugh at the AI replying with
| the preprogrammed narrative. I'm still waiting for the
| majority of people to realize that they are given the same
| training materials, non-stop, with the same toxic
| narratives, and become programmed in the same way, and
| that this is what produces their current worldview.
|
| And no, it's not enough to be "skeptical" of mainstream
| media. It's not even enough to "validate" them, or to go
| to other sources. You need to be reflective enough to
| realize that they are pushing flawed reasoning methods,
| abusing them again and again, to get you used to their
| brand of reasoning.
|
| Their brand of reasoning is just basically reasoning with
| brands. You're given negative sounding words for things
| they want you to think are bad, and positive sounding words
| for things they want you to think are good, and
| continuously reinforce these connections. They brand true
| democracy (literally rule of the people) as populism and
| tell you it's a bad thing. They brand freedom of speech as
| "misinformation". They brand freedom as "choice" so that
| you will not think of what you want to do, but which of the
| things they allow you to do will you do. Disagree with the
| scientific narrative? You're "science denier". Even as a
| professional scientist. Conspiracy theory isn't a defined
| word - it is a brand.
|
| You're trained to judge goodness or badness instinctively
| by their frequency and peer pressure, and produce the
| explanation after your instinctive decision, instead of the
| other way around.
| gowld wrote:
| Transcripts of other people's GPT chats are like photos of
| other people's kids.
| mstolpm wrote:
| Why are you discussing OpenAI with ChatGPT? I'm honestly
| interested.
|
| I would imagine that any answer from ChatGPT on that topic is
| either (a) "hallucinated" and not based on any verifiable
| fact or (b) scripted in by OpenAI.
|
| The same question pops up for me whenever someone asks
| ChatGPT about the internals and workings of ChatGPT. Am I
| missing something?
| dopidopHN wrote:
| I've tried, because it's tempting and the first attempts
| do give a "conversation" vibe.
|
| I was curious about state persistence between prompts, how
| to make my prompts better, and getting an idea of the
| training data.
|
| Only got crap, and I won't spend time doing that again.
| [deleted]
| solveit wrote:
| Anyone with the expertise to have insightful takes in AI also
| has a financial incentive to steer the conversation in
| particular directions. This is also the case for many, many
| other fields! You do not become an expert by quarantining your
| livelihood away from your expertise!
|
| The correct response is not to dismiss every statement from
| someone with a conflict of interest as "basically worthless",
| but to talk to lots of people and to be _reasonably_ skeptical.
| hbn wrote:
| It could also be argued that there's a financial incentive in
| just saying "give us more money to train bigger models =
| better AI" forever.
| Art9681 wrote:
| I don't think these comments are driven by financial
| incentives. It's a distraction, and only a fool would believe
| Altman here. What this likely means is they are prioritizing
| adding more features to their current models while they train
| the next version. Their competitors will scramble to build an
| LLM with some sort of intelligence parity, and when that
| happens no one will care because ChatGPT has the ecosystem and
| plugins and all the advanced features... and by the time their
| competitors reach feature parity in that area, OpenAI pulls
| its ace card and drops GPT-5. Rinse and repeat.
|
| That's my theory and if I was a tech CEO in any of the
| companies competing in this space, that is what I would plan
| for.
|
| Training an LLM will be the easy part going forward. It's
| building an ecosystem around it and hooking it up to
| everything that will matter. OpenAI will focus on this, while
| not-so-secretly training their next iterations.
| LoganWhitwer wrote:
| [dead]
| Spivak wrote:
| text-davinci-003 but cheaper and running on your own hardware
| is already a massive selling point. If you release a
| foundational model at parity with GPT-4 you'll win overnight,
| because OpenAI's chat completions are awful even with the
| super advanced model.
| anonkogudhyfhhf wrote:
| People can be honest even when money is involved. His word is
| worthless because it's Altman
| neximo64 wrote:
| Citation needed. What are his financial incentives?
| Gatsky wrote:
| Do you think GPT-4 was trained and then immediately released to
| the public? Training finished in Aug 2022. They spent the next 6
| months improving it in other ways (e.g. human feedback). So what
| he is saying is already evident.
| brookst wrote:
| In this case I think it's Wired that's lying. Altman didn't say
| large models have no value, or that there will be no more large
| models, or that people shouldn't invest in large models.
|
| He said that we are at the end of the era where capability
| improvements come primarily from making models bigger. Which
| stands to reason... I don't think anyone expects us to hit 100T
| parameters or anything.
| jutrewag wrote:
| What about 1T though? It seems silly to stop here.
| gardenhedge wrote:
| Sam Altman and OpenAI must be pretty nervous. They have the
| first-mover advantage, but they hold no hook or moat.
|
| Unless they can keep their improvements ahead of the rest of
| the industry, they'll be lost in the crowd.
| sgu999 wrote:
| Is anyone aware of techniques to prune useless knowledge from a
| model, to leave more space for its reasoning capabilities?
|
| It really shouldn't matter that it can give the exact birthdate
| of Steve Wozniak, as long as it can properly make a query to
| fetch it and deal with the result.
| cloudking wrote:
| Following your design, couldn't you also address hallucinations
| with a "fact checking" LLM (connected to search) that corrects
| the output of the core LLM? You would take the output of the
| core LLM and send it to the fact checker with a prompt like
| "evaluate this output for any potential false statements, and
| perform an internet search to validate and correct them".
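|
| Something like that seems workable - a rough sketch of the
| loop, with hypothetical llm() and web_search() helpers standing
| in for whichever model and search APIs you'd wire up:
|
|     def fact_checked_answer(question: str) -> str:
|         # llm() and web_search() are hypothetical stand-ins for
|         # whichever model and search APIs you wire up.
|         draft = llm("Answer the question: " + question)
|
|         # Extract checkable claims from the draft.
|         claims = llm("List the factual claims in this text, "
|                      "one per line:\n" + draft).splitlines()
|
|         # Check each claim against search results and patch the
|         # draft when the evidence contradicts it.
|         for claim in claims:
|             evidence = web_search(claim)
|             draft = llm(
|                 "Claim: " + claim + "\n"
|                 "Evidence: " + evidence + "\n"
|                 "Answer: " + draft + "\n"
|                 "If the evidence contradicts the claim, rewrite "
|                 "the answer to fix it; otherwise return it unchanged."
|             )
|         return draft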
| ldehaan wrote:
| This is just pushback against Elon and crew's fake article
| about the dangers of AI; they specifically state the next
| versions will be ultra deadly.
|
| Sam is now saying there will be no future model that will be
| as good.
|
| This is all positioning to get regulators off the track,
| because none of these control freaks in government actually
| understand a whit of this.
|
| All said and done, this is all just an attempt to disempower
| the OSS community. But they can't; we're blowing past their
| barriers like the 90s did with the definition of slippery
| slope.
| generalizations wrote:
| I'd bet that what he and the competition are realizing is that
| the bigger models are too expensive to run.
|
| Pretty sure Microsoft swapped out Bing for something a lot
| smaller in the last couple of weeks; Google hasn't even tried to
| implement a publicly available large model. And OpenAI still has
| usage caps on their GPT-4.
|
| I'd bet that they can still see improvement in performance with
| GPT-5, but that when they look at the usage ratio of GPT-3.5
| turbo, GPT-3.5 legacy, and GPT-4, they realize that there is a
| decreasing rate of return for increasingly smart models - most
| people don't need a brilliantly intelligent assistant, they just
| need a not-dumb assistant.
|
| Obviously some practitioners of some niche disciplines (like ours
| here) would like a hyperintelligent AI to do all our work for us.
| But even a lot of us are on the free tier of ChatGPT 3.5; I'm one
| of the few paying $20/mo for GPT4; and idk if even I'd pay e.g.
| $200/mo for GPT5.
| deepsquirrelnet wrote:
| > I'd bet that what he, and the competition, is realizing is
| that the bigger models are too expensive to run.
|
| I think it's likely that they're out of training data to
| collect. So adding more parameters is no longer effective.
|
| > most people don't need a brilliantly intelligent assistant,
| they just need a not-dumb assistant.
|
| I tend to agree, and I think their pathway toward this will all
| come from continuing advances in fine-tuning. Instruction
| tuning, RLHF, etc. seem to be paying off much more than scaling.
| I bet that's where their investment is going to be heading.
| jstx1 wrote:
| Ilya Sutskever from OpenAI saying that the data situation is good
| and there's more data to train on -
| https://youtu.be/Yf1o0TQzry8?t=657
| galaxytachyon wrote:
| What age? Like, 3 years?
|
| On the other hand though, Chinchilla and multimodal approaches
| already showed how later AIs can be improved beyond throwing
| petabytes of data at them.
|
| It is all about variety and quality from now on I think. You can
| teach a person all about the color zyra but without actually ever
| seeing it, they will never fully understand that color.
| idiotsecant wrote:
| It does seem, though, that using Chinchilla-like techniques
| does not create a copy with the same quality as the original.
| It's pretty good, for some definition of the phrase, but it
| isn't equivalent; it's a lossy technique.
| galaxytachyon wrote:
| I agree on the lossy. There is a tradeoff between efficiency
| and comprehensiveness, kind of. It would be pretty funny if
| in the end, the most optimal method turns out to be the brain
| we already have: extremely efficient, hardware optimized, but
| slow as hell, and misunderstanding stuff all the time unless
| prompted with specific phrases.
| jcims wrote:
| I'm no expert but doesn't the architecture of minigpt4 that's on
| the front page right now give some indication of what the future
| might look like?
| MuffinFlavored wrote:
| eh, I haven't personally found a use case for LLMs yet given
| the fact that you can't trust the output and it needs to be
| verified by a human (which might as well be just as time
| consuming/expensive as actually doing the task yourself)
| Uehreka wrote:
| I'd reconsider the "might as well just be as time consuming"
| thing. I see this argument about Copilot a lot, and it's
| really wrong there, so it might be wrong here too.
|
| Like, for most of the time I'm using it, Copilot saves me 30
| seconds here and there and it takes me about a second to look
| at the line or two of code and go "yeah, that's right". It
| adds up, especially when I'm working with an unfamiliar
| language and forget which Collection type I'm going to need
| or something.
| MuffinFlavored wrote:
| > Like, for most of the time I'm using it, Copilot saves me
| 30 seconds here and there and it takes me about a second to
| look at the line or two of code and go "yeah, that's
| right".
|
| I've never used Copilot but I've tried to replace
| StackOverflow with ChatGPT. The difference is, the
| StackOverflow responses compile/are right. The ChatGPT
| responses will make up an API that doesn't exist. Major
| setback.
| idiotsecant wrote:
| No? I use it all the time to help me, for example, read ML
| threads when I run into a term I don't immediately
| understand. I can do things like 'explain this at the level
| of a high school student'
| JoshuaDavid wrote:
| They're good for tasks where generation is hard but
| verification is easy. Things like "here I gesture at a vague
| concept that I don't know the name of, please tell me what
| the industry-standard term for this thing is" where figuring
| out the term is hard but looking up a term to see what it
| means is easy. "Create an accurate summary of this article"
| is another example - reading the article and the summary and
| verifying that they match may be easier than writing the
| summary yourself.
| MattPalmer1086 wrote:
| Thing is, you can't trust what you find on stack overflow or
| other sources either. And searching, reading documentation
| and so on takes a lot of time too.
|
| I've personally been using it to explore using different
| libraries to produce charts. I managed to try out about 5
| different libraries in a day with fairly advanced options for
| each using chatGPT.
|
| In the past I might have spent a day just trying one, and not
| got to the same level of functionality.
|
| So while it still took me a day, my final code was much
| better fitted to my problem with increased functionality. Not
| a time saver then for me but a quality enhancer and I learned
| a lot more too.
| MuffinFlavored wrote:
| > Thing is, you can't trust what you find on stack overflow
| or other sources either.
|
| Eh. An outdated answer will be called out in the
| comments/downvoted/updated/edited more often than not, no?
| MattPalmer1086 wrote:
| Maybe, maybe not. I get useful results from it, but it
| doesn't always work. And it's usually not quite what I'm
| looking for, so then I have to go digging around to find
| out how to tweak it. It all takes time, and you do not get
| a working solution out of the box most of the time.
| causi wrote:
| I've enjoyed using it for very small automation tasks. For
| instance, it helped me write scripts to take all my
| audiobooks with poor recording quality, split them into
| 59-minute chunks, and upload them to Adobe's free audio
| enhancement site to vastly improve the listening experience.
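|
| (For the curious, the chunking step is a few lines around
| ffmpeg's segment muxer - a sketch, with invented file names;
| 59 minutes = 3540 seconds:)
|
|     import subprocess
|     from pathlib import Path
|
|     # Split every audiobook into 59-minute chunks without
|     # re-encoding; ffmpeg's segment muxer does the cutting.
|     for book in Path("audiobooks").glob("*.mp3"):
|         subprocess.run([
|             "ffmpeg", "-i", str(book),
|             "-f", "segment", "-segment_time", "3540",
|             "-c", "copy",
|             f"{book.stem}_%03d.mp3",
|         ], check=True)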
| textninja wrote:
| I call bullshit. There will be bigger and better models. The
| question is not whether big companies will invest in training
| them (they will), but whether they'll be made available to the
| public.
| labrador wrote:
| https://archive.is/s4V9e
|
| _He did not say what kind of research strategies or techniques
| might take its place. In the paper describing GPT-4, OpenAI says
| its estimates suggest diminishing returns on scaling up model
| size. Altman said there are also physical limits to how many data
| centers the company can build and how quickly it can build them._
| ftxbro wrote:
| > In the paper describing GPT-4, OpenAI says its estimates
| suggest diminishing returns on scaling up model size.
|
| I read the two papers (gpt 4 tech report, and sparks of agi)
| and in my opinion they don't support this conclusion. They
| don't even say how big GPT-4 is, because "Given both the
| competitive landscape and the safety implications of large-
| scale models like GPT-4, this report contains no further
| details about the architecture (including model size),
| hardware, training compute, dataset construction, training
| method, or similar."
|
| > Altman said there are also physical limits to how many data
| centers the company can build and how quickly it can build
| them.
|
| OK so his argument is like "the giant robots won't be powerful,
| but we won't show how big our robots are, and besides, there
| are physical limits to how giant of a robot we can build and
| how quickly we can build it." I feel like this argument is sus.
| sangnoir wrote:
| OpenAI has likely run into a wall (or is about to) for model
| size given its funding amount/structure[1] - unlike its
| competition, who actually own data centers and have lower
| marginal costs. It's just like when peak-iPad Apple claimed
| that a "post-PC" age was upon us.
|
| 1. What terms could Microsoft wring out of OpenAI for another
| funding round?
| curiousllama wrote:
| I believe Altman, but the title is misleading.
|
| Have we exhausted the value of larger models on current
| architecture? Probably yes. I trust OpenAI would throw more $ at
| it if there was anything left on the table.
|
| Have we been here before? Also yes. I recall hearing similar
| things about LSTMs when they were in vogue.
|
| Will the next game changing architecture require a huge model?
| Probably. Don't see any sign these things are scaling _worse_
| with more data/compute.
|
| The age of huge models with current architecture could be over,
| but that started what, 5 years ago? Who cares?
| it wrote:
| Interesting how this contradicts "The Bitter Lesson":
| http://incompleteideas.net/IncIdeas/BitterLesson.html.
| sebzim4500 wrote:
| I don't think there is a contradiction at all. Altman is
| essentially saying they are running out of compute and
| therefore can't meaningfully scale further. Not that scaling
| further would be a worse plan longterm than coming up with new
| algorithms.
| fergie wrote:
| The most comforting AI news I have read this year.
| og_kalu wrote:
| Title is misleading lol. Plenty of scale room left.
| jackmott42 wrote:
| If you are worried about AI, this shouldn't make you feel a ton
| better. GPT-4 is just trained to predict the next word, a very
| simple, even crude, approach - and look what it can do!
|
| Imagine when a dozen models are wired together and giving each
| other feedback with more clever training and algorithms on
| future faster hardware.
|
| It is still going to get wild
| ShamelessC wrote:
| Machine learning is actually premised on being "simple" to
| implement. The more priors you hardcode with clever
| algorithms, the closer you get to what we already have. The
| point is to automate the process of learning. We do this now
| with relatively simple loss functions and models containing
| relatively simple parameters. The main stipulation is that
| they are all defined to be differentiable, so that you can use
| the chain rule from calculus to calculate the gradient of the
| error with respect to every parameter without taking so long
| that it would never finish.
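|
| (For concreteness, this is all "the gradient of the error with
| respect to every parameter" amounts to in practice - a sketch
| in PyTorch with a made-up two-parameter model:)
|
|     import torch
|
|     # Two scalar parameters; everything is differentiable
|     # end to end.
|     w = torch.tensor(2.0, requires_grad=True)
|     b = torch.tensor(0.5, requires_grad=True)
|
|     x, target = torch.tensor(3.0), torch.tensor(10.0)
|     loss = (w * x + b - target) ** 2  # simple squared error
|
|     # One call applies the chain rule through the whole graph.
|     loss.backward()
|     print(w.grad, b.grad)  # d(loss)/dw and d(loss)/db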
|
| I agree that your suggested approach of applying cleverness
| to what we have now will probably produce better results. But
| that's not going to stop better architectures, hardware and
| even entire regimes from being developed until we approach
| AGI.
|
| My suspicion is that there's still a few breakthroughs
| waiting to be made. I also suspect that sufficiently advanced
| models will make such breakthroughs easier to discover.
| xwdv wrote:
| People think something magical happens when AI are wired
| together and give each other feedback.
|
| Really you're still just predicting the next word, but with
| extra steps.
| Teever wrote:
| People think that something magical happens when
| transistors are wired together and give each other
| feedback.
|
| Really you're just switching switches on and off, but with
| extra steps.
| ryneandal wrote:
| Personally, I'm less worried about AI than I am about what
| people using these models can do to others.
| Misinformation/disinformation, more believable scams, stuff
| like that.
| causi wrote:
| I worry that the hardware requirements are only going to
| accelerate the cloud-OS integration. Imagine a PC that's
| entirely unusable offline.
| cj wrote:
| > Imagine a PC that's entirely unusable offline.
|
| FWIW we had thin clients in computer labs in middle school
| / high school 15 years ago (and still today these are
| common in enterprise environments, e.g. Citrix).
|
| Biggest issue is network latency which is limited by the
| speed of light, so I imagine if computers in 10 years
| require resources not available locally it would likely be
| a local/cloud hybrid model.
| ignoramous wrote:
| > _Imagine when a dozen models are wired together..._
|
| Wouldn't these models hallucinate more than normal, then?
| quonn wrote:
| I have repeatedly argued against this notion of "just
| predicting the next word". No. It's completing a
| conversation. It's true that it is doing this word by word,
| but that's kind of like saying a CNN is just predicting a
| label. Sure, but how? It's not doing it directly. It's doing
| it by recovering a lot of structure and in the end boiling
| that down to a label. Likewise, a network trained to predict
| the next word may very well have worked out the whole
| sentence (implicitly, not as text) in order to generate the
| next word.
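|
| Mechanically, "word by word" is just the loop below (a greedy-
| decoding sketch using HuggingFace transformers, with GPT-2 as
| a stand-in model). Note that every step conditions on the
| entire text so far, which is where the recovered structure
| lives:
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("gpt2")
|     model = AutoModelForCausalLM.from_pretrained("gpt2")
|
|     ids = tok("The age of giant AI models is",
|               return_tensors="pt").input_ids
|     for _ in range(20):
|         with torch.no_grad():
|             logits = model(ids).logits    # scores over the vocab
|         next_id = logits[0, -1].argmax()  # greedy: likeliest token
|         ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
|
|     print(tok.decode(ids[0]))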
| Freire_Herval wrote:
| [dead]
| stephencoxza wrote:
| The role of a CEO is more to benefit the company than the public.
| Only time will tell.
|
| I am curious, though, how something like Moore's Law relates to
| this. Yes, model architectures will deal with complexity better,
| and the amount of data helps as well. There must be a relation
| between technological innovation and cost that bears on
| effectiveness: innovation in computation, model architecture,
| quality of data, etc.
| summerlight wrote:
| The point is that we're now at diminishing returns for
| increasing model size, unless we find a better modeling
| architecture than the Transformer.
|
| I think this is likely true; while all the other companies
| underestimated the capability of the transformer (including
| Google itself!), OpenAI made a fairly accurate bet on it based
| on the scaling laws, put in all the effort to squeeze it to
| the last drop, and took all the rewards.
|
| It's likely that GPT-4 is at the optimal spot between cost and
| performance, and there won't be significant improvements in
| performance in the near future. I guess the next task would be
| more on efficiency, which has significant implications for
| productionization.
| chubs wrote:
| Does this mean we've reached the next AI winter? This is as
| good as it gets for quite a long time? Honest question :)
| perhaps this will postpone everyone's fears about the
| singularity...
| ericabiz wrote:
| Many years ago, there was an image that floated around with
| Craigslist and all the websites that replaced small parts of
| it--personals, for sale ads, etc. It turned out the way to
| beat Craigslist wasn't to build Yet Another Monolithic
| Craigslist, but to chunk it off in pieces and be the best at
| that piece.
|
| This is analogous to what's happening with AI models. Sam
| Altman is saying we have reached the point where spending
| $100M+ trying to "beat" GPT-4 at everything isn't the future.
| The next step is to chunk off a piece of it and turn it into
| something a particular industry would pay for. We already see
| small sprouts of those being launched. I think we will see
| some truly large companies form with this model in the next
| 5-10 years.
|
| To answer your question, yes, this may be as good as it gets
| now for monolithic language models. But it is just the
| beginning of what these models can achieve.
| robocat wrote:
| https://www.today.com/money/speculation-craigslist-slowly-
| dy... from 2011 - is that what you were thinking of?
| Strange how few of those logos have survived, and how many
| new logos would now be on it. It would be interesting to
| see a modernised version.
| 015a wrote:
| The current stage is now productionizing what we have;
| finding product fits for it, and making it cheaper. Even
| GPT-4 isn't necessary to push forward what is possible with
| AI; if you think about something dumb like "load all of my
| emails into a language model in real time, give me digests,
| automatically write responses for ones which classify with
| characteristics X/Y/Z, allow me to query the model to answer
| questions, etc": This does not really exist yet, this would
| be really valuable, and this does not need GPT-4.
|
| Another good example is in the coding landscape, which feels
| closer to existing. Ingest all of a company's code into a
| model like this, then start thinking about what you can do
| with it. A chatbot is one thing, the most obvious thing, but
| there are higher-order product use cases that could be
| interesting (e.g. you get an error in Sentry, the stack trace
| points Sentry to where the error happened, a language model
| automatically PRs a fix, stuff like that).
|
| This shit excites me WAY WAY more than GPT-5. We've unlocked
| like 0.002% of the value that GPT-3/llama/etc could be
| capable of delivering. Given the context of broad concern
| about cost of training, accidentally inventing an AGI,
| intentionally inventing an AGI; If I were the BDFL of the
| world, I think we've got at least a decade of latent value
| just to capture out of GPT-3/4 (and other models). Let's hit
| pause. Let's actually build on these things. Let's find a
| level of efficiency that is still valuable without spending
| $5B in a dick measuring contest [1] to suss out another 50
| points on the SAT. Let's work on making edge/local inference
| more possible. Most of all, let's work on safety, education,
| and privacy.
|
| [1] https://techcrunch.com/2023/04/06/anthropics-5b-4-year-
| plan-...
| frozenport wrote:
| No. Winter means people have lost interest in the research.
|
| If anything, the successes of ChatGPT etc. will be motivation
| for continued efforts.
| mkl wrote:
| Winter means people have lost _funding_ for the research.
| The ongoing productionising of large language models and
| multimodal models means that that probably won't happen for
| quite a while.
| fauxpause_ wrote:
| Seems like a wild claim to make without any examples of GPT
| models which are bigger but not demonstrably better.
| xipix wrote:
| Perhaps (a) there do exist bigger models that weren't better
| or (b) this model isn't better than somewhat smaller ones.
| Perhaps the CEO has seen diminishing returns.
| hackerlight wrote:
| It's not a wild claim when you have empirically well-
| validated scaling laws which make this very prediction.
| mensetmanusman wrote:
| Better on which axis? Do you want an AI that takes an hour to
| respond? Some would, for certain fields, but getting
| something fast and cheap is going to be hard now that Moore's
| law is over.
| mnky9800n wrote:
| or like a curve of model complexity versus results or
| whatever showing it asymptotically approaches whatever.
|
| actually there was a great paper from microsoft research from
| like 2001 on spam filtering where they demonstrated that
| model complexity necessary for spam filtering went down as
| the size of the data set went up. That paper, which i can't
| seem to find now, had a big impact on me as a researcher
| because it so clearly demonstrated that small data is usually
| bad data and sophisticated models are sometimes solving
| problems will small data sets instead of problems with data.
|
| of course this paper came out the year friedman published his
| gradient boosting paper, i think random forest also was only
| recently published then as well (i think there is a paper
| from 1996 about RF, and Breiman's two cultures paper came out
| that year, where he discusses RF i believe), and this is a
| decade before gpu based neural networks. So times are
| different now. But actually i think the big difference is
| these days i probably ask chatgpt to write the boiler plate
| code for a gradient boosted model that takes data out of a
| relational database instead of writing it myself.
| nomel wrote:
| > model complexity necessary for spam filtering went down
| as the size of the data set went up
|
| My naive conclusion is that this means there are still
| massive gains to be had, since, for example, something like
| ChatGPT is just text, and the phrase "a picture is worth a
| thousand words" seems incredibly accurate, from my
| perspective. There's an incredible amount of non-text data
| out there still. Especially technical data.
|
| Is there any merit to this belief?
| jacobr1 wrote:
| Yes. One of the frontiers of current research seems to be
| multi-modal models.
| [deleted]
| summerlight wrote:
| https://twitter.com/SmokeAwayyy/status/1646670920214536193
|
| Sam explicitly said that there won't be a GPT-5 in the near
| future, which is pretty clear evidence, unless he's blatantly
| lying in public.
| kjellsbells wrote:
| Well, "no GPT-5" isn't the same as saying "no new trained
| model", especially in the realm of marketing. Welcome to
| "GPT 2024" could be his next slogan.
| thehumanmeat wrote:
| That is one AI CEO out of 10,000. Just because OpenAI may
| not be interested in a larger model _in the short term_
| doesn't mean others won't pursue it.
| bitL wrote:
| Transformers were known to keep scaling with more parameters
| and more training data, so if OpenAI has hit the limits of
| this scaling, that would be a very important milestone in AI.
| GaggiX wrote:
| I think the next step is multimodality. GPT-4 can "see",
| probably using a method similar to miniGPT-4, where the
| embeddings are aligned using a Q-Former (or something similar).
| The next step would be to actually predict image tokens using
| the LM loss; this way it would be able to use the knowledge
| gained by "seeing" on other tasks, like making actually good
| ASCII art, making SVG that makes sense, and, on a less
| superficial level, having a better world model.
| [deleted]
| KhoomeiK wrote:
| Further improvements in efficiency need not come from
| alternative architectures. They'll likely also come from novel
| training objectives, optimizers, data augmentations, etc.
| gumballindie wrote:
| Bruv has to pay for the data he's been using, or soon there
| won't be any left to nick. Groupies claiming their AI is
| "intelligent", and not just a data-ingesting beast, will soon
| learn a hard lesson. Take your blogs offline, stop contributing
| content for free and stop pushing code, or else chavs like this
| one will continue monetising your hard work. As did Bezos and
| many others that now want you to be out of a job.
| calderknight wrote:
| I didn't think this article was very good. Sam Altman actually
| implied that GPT-5 will be developed when he spoke at MIT. And if
| Sam said that scaling is over (I doubt he said this but I could
| be wrong) the interesting part would be the reasoning he provided
| for this statement - no mention of that in the article.
| cleandreams wrote:
| Once you've trained on the internet and most published books (and
| more...) what else is there to do? You can't scale up massively
| anymore.
| Animats wrote:
| Right. They've already sucked in most of the good general
| sources of information. Adding vast amounts of low-quality
| content probably won't help much and might degrade the quality
| of the trained model.
| rvnx wrote:
| Video content (I don't know why someone flagged Jason for
| saying such, he is totally right)
| bheadmaster wrote:
| Looking at his post history, seems like he was shadowbanned.
| kolinko wrote:
| You can generate textual examples that teach logic, multi-
| dimensional understanding and so on - similar to the ones in
| math books, but at a massive scale.
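|
| Even something this dumb already yields unlimited, perfectly
| labeled training text (a sketch; real pipelines would cover
| far more templates than one-step addition):
|
|     import random
|
|     def make_example() -> str:
|         # Emit a tiny word problem plus its worked answer.
|         a, b = random.randint(2, 99), random.randint(2, 99)
|         return (
|             f"Q: Tom has {a} apples and buys {b} more. "
|             f"How many does he have?\n"
|             f"A: {a} + {b} = {a + b}. Tom has {a + b} apples."
|         )
|
|     corpus = "\n\n".join(make_example() for _ in range(1_000_000))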
| machdiamonds wrote:
| Ilya Sutskever (OpenAI Chief Scientist): "Yeah, I would say the
| data situation is still quite good. There's still lots to go" -
| https://youtu.be/Yf1o0TQzry8?t=685
|
| There was a rumor that they were going to use Whisper to
| transcribe YouTube videos and use that for training. Since it's
| multimodal, incorporating video frames alongside the
| transcriptions could significantly enhance its performance.
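|
| The transcription half of that rumor is already trivial with
| the open-source whisper package (a sketch; assumes the video
| is already downloaded, and the file name is invented):
|
|     import whisper
|
|     # A small checkpoint is enough for rough training-data
|     # transcripts; larger ones trade speed for accuracy.
|     model = whisper.load_model("small")
|     result = model.transcribe("some_youtube_video.mp4")
|     print(result["text"])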
| it_citizen wrote:
| I am curious how much content video-to-text represents
| compared to pure text. I have no idea.
| [deleted]
| neel8986 wrote:
| And why would Google allow them to do that at scale?
| throwaway5959 wrote:
| Why would they ask Google for permission?
| HDThoreaun wrote:
| Can Google stop them? It's trivial to download YouTube
| videos
| unionpivo wrote:
| It's trivial to download some YouTube videos.
|
| But I am quite sure that if you start doing it at scale,
| Google will notice.
|
| You could be sneaky, but people in this business talk
| (since they know another good-paying job is just around
| the corner), so it would likely come out.
| mrtksn wrote:
| You can transcribe all spoken words everywhere and keep the
| model up to date? Keep indexing new data from chat messages,
| news articles, new academic work etc.
|
| The data is not finite.
| spaceman_2020 wrote:
| What about all the siloed content kept inside corporate
| servers? You won't get normal GPT to train on it, of course,
| but IBM could build a "IBM-bot" that has all the GPT-4
| dataset + all of IBM's internal data.
|
| That model might be very well tuned to solve IBM's internal
| problems.
| treis wrote:
| I don't think you can just feed it data. You've got to
| curate it, feed it to the LLM, and then manually
| check/further train the output.
|
| I also question that most companies have the volume and
| quality of data worth training on. It's littered with
| cancelled projects, old products, and otherwise obsolete
| data. That's going to make your LLM hallucinate/give wrong
| answers. Especially for regulated and otherwise legally
| encumbered industries. Like can you deploy a chat bot
| that's wrong 1% or 0.1% of the time?
| spaceman_2020 wrote:
| Well, IBM has 350k employees. If training an LLM on
| curated data costs tens of millions of dollars but ends
| up reducing headcount by 50k, it would be a massive win
| for any CEO.
|
| You have to understand that all the incentives are
| perfectly aligned for corporations to put this to work,
| even spending tens of millions in getting it right.
|
| The first corporate CEO who announces that his company
| used AI to reduce employee costs while _increasing_
| profits is going to get such a fat bonus that everyone
| will follow along.
| Vrondi wrote:
| Since GPT-4 is being integrated into the MS Office
| suite, this is an "in" to corporate silos. The MS cloud
| apps can see inside a great many of those silos.
| [deleted]
| nabnob wrote:
| Real answer? Buy proprietary data from social media companies,
| credit card companies, retail companies and train the model on
| that data.
| eukara wrote:
| Can't wait for us to be able to query GPT for people's credit
| card info.
| m4jor wrote:
| They didn't train it on the entire internet though, only a
| small fraction of it. There's still plenty they could do.
| sebzim4500 wrote:
| I doubt they have trained on 0.1% of the tokens that are
| 'easily' available (that is, available with licencing deals
| that are affordable to OpenAI/MSFT).
|
| They might have trained on a lot of the 'high quality' tokens,
| however.
| neel8986 wrote:
| YouTube. This is where Google has a huge advantage, having the
| largest collection of user-generated video.
| sebzim4500 wrote:
| Yeah, but it's not like the videos are private. Surely Amazon
| has the real advantage, given they have a ton of high quality
| tokens in the form of their kindle library and can make it
| difficult for OpenAI to read them all.
| JasonZ2 wrote:
| Video.
|
| > YouTubers upload about 720,000 hours of fresh video content
| per day. Over 500 hours of video were uploaded to YouTube per
| minute in 2020, which equals 30,000 new video uploads per hour.
| Between 2014 and 2020, the number of video hours uploaded grew
| by about 40%.
| sottol wrote:
| But what are you mostly "teaching" the LLM then? Mundane
| everyday stuff? I guess that would make them better at "being
| an average human", but is that what we want? It already seems
| that prompting the LLM to be above-average ("pretend to be an
| expert") improves performance.
| dougmwne wrote:
| This whole conversation about training set size is bizarre.
| No one ever asks what's in the training set. Why would a
| trillion tokens of mundane gossip improve a LLMs ability to
| do anything valuable at all?
|
| If a scrape of the general internet, scientific papers and
| books isn't enough, a trillion trillion trillion text
| messages to mom aren't going to change matters.
| spaceman_2020 wrote:
| If you were devious enough, you could be listening in on
| billions of phone conversations and messages and adding that to
| your data set.
|
| This also makes me suspect the NSA has already cracked this
| problem. Or that China will eventually beat current western
| models, since it will likely have way more data collected from
| its citizenry.
| PUSH_AX wrote:
| I wonder what percentage of phone calls would add anything
| meaningful to models, I imagine that the nature of most phone
| calls are both highly personal and fairly boring.
| midland_trucker wrote:
| That's a fair point. Not at all like training on Wikipedia
| in which nearly every sentence has novelty to it.
|
| Then again it would give you data on every accent in the
| country, so the holy grail for modelling human speech.
| fpgaminer wrote:
| > Once you've trained on the internet and most published books
| (and more...) what else is there to do? You can't scale up
| massively anymore.
|
| Dataset size is not relevant to predicting the loss threshold
| of LLMs. You can keep pushing loss down by using the same sized
| dataset, but increasingly larger models.
|
| Or augment the dataset using RLHF, which provides an "infinite"
| dataset to train LLMs on. That's limited by the capabilities of
| the scoring model, but of course you can scale the scoring
| model too, so again the limit isn't dataset size but training
| compute.
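|
| (The "infinite dataset" idea is roughly best-of-n sampling
| against a scoring model - a sketch with hypothetical
| generate() and reward() functions standing in for the policy
| and scoring models:)
|
|     def synthesize_example(prompt: str, n: int = 8) -> tuple[str, str]:
|         # generate() and reward() are hypothetical stand-ins.
|         # Sample n candidate completions from the policy model,
|         # keep the one the scoring model likes best, and feed
|         # that pair back in as new training data.
|         candidates = [generate(prompt) for _ in range(n)]
|         best = max(candidates, key=reward)
|         return prompt, best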
| midland_trucker wrote:
| > Dataset size is not relevant to predicting the loss
| threshold of LLMs. You can keep pushing loss down by using
| the same sized dataset, but increasingly larger models.
|
| Deepmind and others would disagree with you! No-one really
| knows in actual fact.
|
| [1] https://www.deepmind.com/publications/an-empirical-
| analysis-...
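|
| The fitted form from that DeepMind work (the Chinchilla paper)
| makes the disagreement concrete: loss falls with both parameter
| count N and dataset size D,
|
|     L(N, D) = E + A / N^alpha + B / D^beta
|
| with fitted values of roughly alpha = 0.34, beta = 0.28, and an
| irreducible term E = 1.69 (quoting those from memory). Holding
| D fixed while growing N leaves the B / D^beta term as a floor
| on the loss.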
| throwaway22032 wrote:
| I don't understand why size is an issue in the way that is being
| claimed here.
|
| Intelligence isn't like processor speed. If I have a model that
| has (excuse the attempt at a comparison) 200 IQ, why would it
| matter that it runs more slowly than a human?
|
| I don't think that, for example, Feynman at half speed would have
| had substantially fewer insights.
| yunwal wrote:
| We're not going to get a 200 IQ model by simply scaling up the
| current model, even with all the datacenters in the world
| running 24/7
| narrator wrote:
| "Altman said there are also physical limits to how many data
| centers the company can build and how quickly it can build them."
|
| Maybe the economics are starting to get bad? An H100 has 80GB of
| VRAM. The Highest end system I can find is 8xH100 so is a 640GB
| model is the biggest model you can run on a single system?
| Already GPT-4 is throttled and has a waiting list and they
| haven't even released the image processing or integrations to a
| wide audience.
| matchagaucho wrote:
| I wonder how much the scarcity and cost of Nvidia GPUs are
| driving this message.
|
| Nvidia is in a perfect "Arms Dealer" situation right now.
|
| Wouldn't be surprised to see the next exponential leap in AI
| models trained on in-house proprietary GPU hardware
| architectures.
| TheDudeMan wrote:
| Google has been using TPUs for years and continuously improving
| the designs.
| screye wrote:
| A small AI model != a cheap AI model.
|
| It costs the same to train as the giant models. You merely
| spend the money on training it for longer instead of larger.
| mupuff1234 wrote:
| Ok cool, so release the weights and your research.
| Bjorkbat wrote:
| Something kind of funny (but mostly annoying) about this
| announcement is the people arguing that OpenAI is, in fact,
| working on GPT-5 _in secret_.
|
| To my knowledge, NFT/crypto hype never got so bad that conspiracy
| theories began to circulate (though I'm sure there were some if
| you looked hard enough).
|
| Can't wait for an AIAnon community to emerge.
| ryanwaggoner wrote:
| Isn't it obvious? Q is definitely an LLM, trained on trillions
| of words exfiltrated from our nation's secure systems. This
| explains why it's always wrong in its predictions: it's
| hallucinating!
| aaroninsf wrote:
| "...for the current cycle, in our specific public-facing market."
|
| As most here well know "over" is one of those words like "never"
| which particularly in this space should pretty much always be
| understood as implicitly accompanied by a footnote backtracking
| to include near-term scope.
| iandanforth wrote:
| There's plenty of room for models to continue to grow once
| efficiency is improved. The basic premise of Google's Pathways
| project is sound: you don't have to use all of the model all
| the time. By moving to sparse activations or sparse
| architectures you can do a lot more with the same compute. The
| effective model size might be 10x or 100x GPT-4 (speculated at
| 1T params) but require comparable or less compute.
|
| While not a perfect analogy it's useful to remember that the
| human brain has far more "parameters", requires several orders of
| magnitude less energy to train and run, is highly sparse, and
| does a decent job at thinking.
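|
| A minimal illustration of the sparse-activation idea (top-k
| gating in the mixture-of-experts style; sizes and k are
| arbitrary):
|
|     import torch
|     import torch.nn as nn
|
|     experts = nn.ModuleList([nn.Linear(512, 512) for _ in range(64)])
|     router = nn.Linear(512, 64)
|
|     def sparse_forward(x: torch.Tensor, k: int = 2) -> torch.Tensor:
|         # Route each input to its k best experts; the other 62
|         # are never evaluated, so compute stays roughly flat no
|         # matter how many experts (parameters) you add.
|         weights, idx = router(x).topk(k, dim=-1)
|         weights = weights.softmax(dim=-1)
|         out = torch.zeros_like(x)
|         for b in range(x.shape[0]):
|             for j in range(k):
|                 out[b] += weights[b, j] * experts[int(idx[b, j])](x[b])
|         return out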
| seydor wrote:
| Now we need another letter
| enduser wrote:
| "When we set the upper limit of PC-DOS at 640K, we thought nobody
| would ever need that much memory."
|
| _Bill Gates_
| bagels wrote:
| Gates has denied saying this. Are you implying by analogy that
| Altman hasn't said, or will disclaim saying, that "the age of
| giant AI models is almost over"?
| lossolo wrote:
| We arrived at the top of the tree in our journey to the moon.
| daniel_reetz wrote:
| "You can't get to the moon by climbing successively taller
| trees"
| og_kalu wrote:
| No we haven't. The title is misleading; there's plenty of scale
| room left. Part of it might just not be economical (parameter
| size), but there's data. If you take this to mean "we're at a
| dead end" you'd be very wrong.
| pixl97 wrote:
| The Age of Giants is over... The Age of Behemoths has begun!
|
| _but sir, that means the same thing_
|
| Throw this heretic into the pit of terror.
| hanselot wrote:
| The pit of terror is full.
|
| Fine, to the outhouse of madness then.
|
| Before I get nuked from orbit for daring to entertain humor:
| suppose someone is running far ahead of me in a marathon, yet
| still broadcasting things back to the slow people (like
| myself). If, just as we catch up to them, they suddenly say
| "you know what guys, we should stop running in this direction,
| there's nothing to see here" - right before anyone else can
| verify the veracity of their statement - perhaps it would
| still be in the public interest for at least one person to
| verify what they are saying. Given how skeptical the internet
| at large has been of Musk's acquisition of a company, it's
| interesting that the skepticism is suddenly put on hold when
| looking at this part of his work...
| [deleted]
| zwieback wrote:
| The age of CEOs that recently got washed to the top saying
| dumbish things is just starting, though.
| xt00 wrote:
| Saying "hey don't go down the path we are on, where we are making
| money and considered the best in the world.. it's a dead end"
| rings pretty hollow.. like "don't take our lunch please?" Might
| be a similar statement it feels..
| whywhywhywhy wrote:
| Everyone hoping to compete with OpenAI should have an "Always
| do the opposite of what Sam says" sign on the wall.
| thewataccount wrote:
| Nah - GPT-4 is crazy expensive: paying $20/mo only gets you
| 25 messages per 3 hours, and it's crazy slow. The API is rather
| expensive too.
|
| I'm pretty sure that GPT-4 is ~1T-2T parameters, and they're
| struggling to run it (at reasonable performance and profit). So
| far their strategy has been to 10x the parameter count every
| GPT generation, and the problem is that there are diminishing
| returns every time they do that. AFAIK they've now resorted to
| chunking GPT through the GPUs because of the 2 to 4 terabytes
| of VRAM required (at 16-bit).
|
| So now they've reached the edge of what they can reasonably
| run, and even if they do 10x it, the expected gains are less.
| On top of this, models like LLaMA have shown that it's possible
| to cut the parameter count substantially and still get decent
| results (albeit the open-source stuff still hasn't caught up).
|
| On top of all of this, keep in mind that at 8-bit resolution,
| 175B parameters (GPT-3.5) require over 175GB of VRAM. This is
| crazy expensive and would never fit on consumer devices. Even
| if you use quantization at 4-bit, you still need over 80GB
| of VRAM.
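|
| (The arithmetic, for anyone who wants to play with it - weights
| only, ignoring activations and the KV cache, which only make
| things worse:)
|
|     def weight_vram_gb(params_billion: float, bits: int) -> float:
|         # bytes = params * bits / 8, reported in decimal GB
|         return params_billion * 1e9 * (bits / 8) / 1e9
|
|     for bits in (16, 8, 4):
|         print(bits, "bit:", weight_vram_gb(175, bits), "GB")
|     # 16 bit: 350.0 GB / 8 bit: 175.0 GB / 4 bit: 87.5 GB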
|
| This definitely is not a "throw them off the trail" tactic - in
| order for this to actually scale the way everyone envisions,
| both in performance and in running on consumer devices,
| research HAS to go into reducing the parameter count. And
| again, there's lots of research showing it's very possible to
| do.
|
| tl;dr: smaller = cheaper + faster + more accessible + same
| performance
| haxton wrote:
| I don't think this argument really holds up.
|
| GPT3 on release was more expensive ($0.06/1000 tokens vs
| $0.03 input and $0.06 output for GPT4).
|
| Reasonable to assume that in 1-2 years it will also come down
| in cost.
| thewataccount wrote:
| > Reasonable to assume that in 1-2 years it will also come
| down in cost.
|
| Definitely. I'm guessing they used something like
| quantization to get the VRAM usage down to 4-bit. The thing
| is that if you can't fit the weights in memory, then you
| have to chunk it, and that's slow = more GPU time = more
| cost. And even when it does fit in GPU memory, less memory
| = fewer GPUs needed.
|
| But we know you _can_ use fewer parameters, and that the
| training data + RLHF make a massive difference in quality.
| And the model size relates linearly to the VRAM
| requirements/cost.
|
| So if you can get a 60B model to run at a 175B model's
| quality, you've cut your memory requirements to almost a
| third, and can now run (with 4-bit quantization) on a single
| A100 80GB, which is 1/8th of the 8x A100s that GPT-3.5
| reportedly ran on (and still half of GPT-3.5 at 4-bit).
|
| Also, while OpenAI likely doesn't want this, we really want
| these models to run on our devices, and LLaMA+finetuning has
| shown promising improvements (not there just yet) at the 7B
| size, which can run on consumer devices.
| whywhywhywhy wrote:
| It's never been in OpenAI's interest to make their model
| affordable or fast; they're actually incentivized to do the
| opposite, as an excuse to keep the tech locked up.
|
| This is why Dall-e 2 ran in a data centre and Stable
| Diffusion runs on a gamer GPU
| thewataccount wrote:
| I think you're mixing the two. They do have an incentive to
| make it affordable and fast because that increases the use
| cases for it, and the faster it is the cheaper it is for
| them, because the expense is compute time (half the time ~=
| half the cost).
|
| > This is why Dall-e 2 ran in a data centre and Stable
| Diffusion runs on a gamer GPU
|
| This is absolutely why they're keeping it locked up. By
| simply not releasing the weights, you can't run Dalle2
| locally, and yeah they don't want to do this because they
| want you to be locked to their platform, not running it for
| free locally.
| ericmcer wrote:
| Yeah, I am noticing this as well. GPT enables you to do
| difficult things really easily, but then it is so expensive
| that you would need to replace it with custom code for any
| long-term solution.
|
| For example: you could use GPT to parse a resume file, pull
| out work experience and return it as JSON. That would take
| minutes to set up using the GPT API and weeks to
| build your own system, but GPT is so expensive that building
| your own system is totally worth it.
|
| Unless they can seriously reduce how expensive it is I don't
| see it replacing many existing solutions. Using GPT to parse
| text for a repetitive task is like using a backhoe to plant
| flowers.
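|
| (To make "minutes to set up" concrete, the whole parser is
| basically one prompt - a sketch against the chat completions
| endpoint as it looked at the time, with an invented schema:)
|
|     import json
|     import openai  # pre-1.0 client
|
|     def parse_resume(resume_text: str) -> list[dict]:
|         resp = openai.ChatCompletion.create(
|             model="gpt-3.5-turbo",
|             messages=[{
|                 "role": "user",
|                 "content": "Extract work experience from this "
|                            "resume as a JSON list of objects with "
|                            "keys company, title, start, end. "
|                            "Return only JSON.\n\n" + resume_text,
|             }],
|             temperature=0,
|         )
|         # The catch discussed in this thread: json.loads can and
|         # does blow up when the model doesn't comply.
|         return json.loads(resp["choices"][0]["message"]["content"])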
| mejutoco wrote:
| You could use those examples to finetune a model only for
| resume-data extraction.
| abraae wrote:
| > For example: you could use GPT to parse a resume file,
| pull out work experience and return it as JSON. That would
| take minutes to setup using the GPT API and it would take
| weeks to build your own system, but GPT is so expensive
| that building your own system is totally worth it.
|
| True, but an HR SaaS vendor could use that to put on a
| compelling demo to a potential customer, stopping them from
| going to a competitor or otherwise benefiting.
|
| And anyway, without crunching the numbers, for volumes of
| say 1M resumes (at which point you've achieved a lot of
| success) I can't quite believe it would be cheaper to build
| something when there is such a powerful solution available.
| Maybe once you are at 1G resumes... My bet is still no
| though.
| thewataccount wrote:
| I work for a company with the web development team. We
| have ~6 software developers.
|
| I'd love to be able to just have people submit their
| resumes and extract the data from there, but instead I'm
| going to build a form and make applicants fill it out,
| because ChatGPT is going to cost at least $0.05 USD
| depending on the length of the resume.
|
| I'd also love to have mini summaries of order returns
| summarized in human form, but that also would cost $0.05
| USD per form.
|
| The tl;dr here is that there's a TON of use cases for an
| LLM outside of your core product (we sell clothes), but
| we can't currently justify that cost. Compare that to the
| rapidly improving self-hosted solutions, which don't cost
| $0.05 USD for literally every query (and likely more for
| anything useful).
| sitkack wrote:
| 5 cents. Per resume. $500 per 10k. 1-3 hours of a fully
| loaded engineer's salary per year. You are being
| criminally cheap.
| thewataccount wrote:
| The problem is that it would take us the same amount of
| time to just add a form with Django. Plus you have to
| handle failure cases, etc.
|
| And yeah, I agree this would be a great use case, and
| isn't that expensive.
|
| I'd like to do this in lots of places, and the problem is
| I have to convince my boss to pay for something that
| otherwise would have been free.
|
| The conversation would be: "We have to add these fields to
| our model, and we either tell Django to add a form for
| them, which has zero ongoing cost and no reliance on a
| third party,
|
| or we send the resume to OpenAI, pay for them to process
| it, build some mechanism to sanity-check what GPT is
| responding with, alert us if there are issues, then put
| it into that model, and pay 5 cents per resume."
|
| > 1-3 hours of a fully loaded engineer's salary per year.
|
| That's assuming 0 time to implement, and because of our
| framework it would take more hours to implement the
| openai solution (that's also more like 12 hours where we
| are).
|
| > $500 per 10k.
|
| I can't stress this enough - the alternative is $0 per
| 10k. My boss wants to know why we would pay any money for
| a less reliable solution (GPT serialization is not nearly
| as reliable as a standard Django form).
|
| I think within the next few years we'll be able to run
| the model locally and throw dozens of tasks just like
| this at the LLM, just not yet.
| marketerinland wrote:
| There are excellent commercial AI resume parsers already
| - Affinda.com being one. Not expensive and takes minutes
| to implement.
| ericmcer wrote:
| For a big company that is nothing, but if you are
| bootstrapping and trying to acquire customers with an MVP,
| racking up a $500 bill is frightening. What if you offer
| a free trial, blow up, and end up with a $5k+ bill?
| yunwal wrote:
| Also you could likely use GPT3.5 for this and still get
| near perfect results.
| thewataccount wrote:
| > near perfect results.
|
| I have tried GPT-3.5 and GPT-4 for this type of task - the
| "near perfect results" part is really problematic, because
| you need to verify that the output is likely correct, get
| notified if there are issues, and even then you aren't
| 100% sure that it selected the correct first/last name.
|
| Compare that to a standard HTML form, which is... very
| reliable and (for us) automatically has error handling
| built in, including alerts to us if there's a 504.
| og_kalu wrote:
| It's a pretty sus argument for sure when they're scared to
| release even the parameter count.
|
| Although the title is a bit misleading about what he was
| actually saying. Still, there's a lot left to go in terms
| of scale. Even if it isn't parameter size (and there's
| still lots of room here too, it just won't be economical),
| contrary to popular belief, there's lots of data left to
| mine.
| dpflan wrote:
| Hm, all right. I'm guessing that huge models as a business
| maybe are over until the economics are figured out, but
| huge models as experts for knowledge distillation seem
| reasonable. And if you pay a super premium, you can use
| the huge model.
| bob1029 wrote:
| I strongly believe the next generation of models will be based
| upon spiking neural concepts wherein action potentials are
| lazily-evaluated throughout the network (i.e. event-driven).
| There are a few neuron models that can be modified (at some
| expense to fidelity) in order to tolerate arbitrary delays
| between simulation ticks. Using _actual_ latency between neurons
| as a means of encoding information seems absolutely essential if
| we are trying to emulate biology in any meaningful way.
|
| Spiking networks also lend themselves nicely to some elegant
| learning rules, such as STDP. Being able to perform unsupervised
| learning at the grain of each action potential is really
| important in my mind. This gives you all kinds of ridiculous
| capabilities, most notably being the ability to train the model
| while it's live in production (learning & use are effectively the
| same thing).
|
| These networks also provide a sort of deterministic, event-over-
| time tracing that is absent in the models we see today. In my
| prototypes, the action potentials are serialized through a ring
| buffer, and then logged off to a database in order to perfectly
| replay any given session. This information can be used to
| bootstrap the model (offline training) by "rewinding" things very
| precisely and otherwise branching time to your advantage.
|
| The #1 reason I've been thinking about this path is that low-
| latency, serialized, real-time signal processing is somewhat
| antagonistic to GPU acceleration. I fear there is an appreciable
| % of AI research predicated on some notion that you need at least
| 1 beefy GPU to start doing your work. Looking at fintech, we are
| able to discover some very interesting pieces of technology which
| can service streams of events at unbelievable rates and scales -
| and they only depend on a handful of CPU cores in order to
| achieve this.
|
| Right now, I think A Time Domain Is All You Need. I was inspired
| to go outside of the box by this paper:
| https://arxiv.org/abs/2304.06035. Part 11 got me thinking.
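|
| To make the "lazy evaluation between ticks" idea concrete,
| here is a minimal sketch of an event-driven leaky
| integrate-and-fire neuron with pair-based STDP. All
| constants and names are illustrative assumptions, not my
| actual prototype:
|
|   import math
|
|   class LIFNeuron:
|       """State only advances when an input spike (event)
|       arrives; the membrane potential is decayed
|       analytically over the gap, so no fixed tick is
|       needed."""
|       def __init__(self, tau_m=20.0, v_thresh=1.0):
|           self.tau_m = tau_m        # membrane time constant (ms)
|           self.v_thresh = v_thresh  # firing threshold
|           self.v = 0.0              # membrane potential
|           self.last_update = 0.0    # time of last input event
|           self.last_spike = None    # time of last output spike
|
|       def receive(self, t, weight):
|           # decay over the arbitrary gap since the last event
|           self.v *= math.exp(-(t - self.last_update) / self.tau_m)
|           self.v += weight
|           self.last_update = t
|           if self.v >= self.v_thresh:
|               self.v = 0.0
|               self.last_spike = t
|               return True           # emit an output spike event
|           return False
|
|   def stdp(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
|       """Pair-based STDP, dt = t_post - t_pre: pre-before-post
|       potentiates, post-before-pre depresses."""
|       if dt > 0:
|           return a_plus * math.exp(-dt / tau)
|       return -a_minus * math.exp(dt / tau)
|
|   # Drive one neuron with timestamped spikes; the weight is
|   # updated per action potential, so learning & use really
|   # are the same loop.
|   neuron, w, prev_pre = LIFNeuron(), 0.6, None
|   for t_pre in [1.0, 3.0, 4.5, 30.0, 31.5]:  # spike times (ms)
|       fired = neuron.receive(t_pre, w)
|       if fired and prev_pre is not None:
|           w += stdp(t_pre - prev_pre)  # earlier pre -> post
|       prev_pre = t_pre
|   print(f"adapted weight: {w:.4f}")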
| MagicMoonlight wrote:
| I know what it looks like in my head but I can't quite figure
| the algorithm out. The spiking is basically reinforcement
| learning at the neuron level. Get it right and it's basically
| all you need. You don't even need training data because it will
| just automagically learn from the data it sees.
| eternalban wrote:
| I'm bullish on SNNs too. This Chinese research group is doing
| something quite comprehensive with them:
|
| https://news.ycombinator.com/item?id=35037605
| tfehring wrote:
| Related reading: https://dynomight.net/scaling/
|
| In short, it seems like virtually all of the improvement in future
| AI models will come from better algorithms, with bigger and
| better data a distant second, and more parameters a distant
| third.
|
| Of course, this claim is itself internally inconsistent in that
| it assumes that new algorithms won't alter the returns to scale
| from more data or parameters. Maybe a more precise set of claims
| would be (1) we're relatively close to the fundamental limits of
| transformers, i.e., we won't see another GPT-2-to-GPT-4-level
| jump with current algorithms; (2) almost all of the incremental
| improvements to transformers will require bigger or better-
| quality data (but won't necessarily require more parameters); and
| (3) all of this is specific to current models and goes out the
| window as soon as a non-transformer-based generative model
| approaches GPT-4 performance using a similar or lesser amount of
| compute.
| strangattractor wrote:
| Good thing he got a bunch of companies to pony up the dough
| for LLMs before he announced they were already over.
| tfehring wrote:
| I don't think LLMs are over [0]. I think we're relatively
| close to a local optimum in terms of what can be achieved
| with current algorithms. But I think OpenAI is at least as
| likely as any other player to create the next paradigm, and
| at least as likely as any other player to develop the
| leading models within the next paradigm, regardless of who
| actually publishes the research.
|
| Separately, I think OpenAI's current investors have a >10%
| chance to hit the 100x cap on their returns. Their current
| models are already good enough to address lots of real-world
| problems that people will pay money to solve. So far they've
| been much more model-focused than product-focused, and by
| turning that dial toward the product side (as they did with
| ChatGPT) I think they could generate a lot of revenue
| relatively quickly.
|
| [0] Except maybe in the sense that future models will be
| predominantly multimodal and therefore not strictly LLMs. I
| don't think that's what you're suggesting though.
| jacobr1 wrote:
| It is already relatively trivial to fine-tune generative
| models for various use cases, which implies huge gains to
| be had with targeted applications - not just for niche
| players, but also for OpenAI and others, who can either
| build that fine-tuning into the base system, build
| ecosystems around it, or just purpose-build applications
| on top.
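|
| For a sense of how trivial: with the legacy OpenAI
| fine-tuning endpoint available at the time of this thread,
| the whole flow was roughly the following (the file name and
| training data are placeholders, and this sketch skips
| polling for job completion):
|
|   import openai  # pre-1.0 openai-python
|
|   # train.jsonl holds one {"prompt": ..., "completion": ...}
|   # pair per line, e.g. resume text in, structured JSON out.
|   f = openai.File.create(file=open("train.jsonl", "rb"),
|                          purpose="fine-tune")
|   job = openai.FineTune.create(training_file=f["id"],
|                                model="davinci")
|   print(job["id"])  # the finished job yields a private model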
| no_wizard wrote:
| All the LC grinding may come in handy after all! /s
|
| What algorithms specifically show the most results upon
| improvement? Going into this, I thought the jump in
| improvements was really related to more advanced automated
| tuning and result correction, which could be done _at
| scale_, as it were, allowing a small team of data
| scientists to tweak the models until the desired results
| were achieved.
|
| Are you saying instead that concrete predictive algorithms
| need improvement, or are we lumping the tuning into this?
| junipertea wrote:
| We need more data-efficient neural network architectures.
| Transformers work exceptionally well because they allow us
| to just dump more data into them, but ultimately we want to
| learn advanced behavior without having to feed the model
| all of Shakespeare.
| uoaei wrote:
| Inductive Bias Is All You Need
| tfehring wrote:
| I think it's unlikely that the first model to be widely
| considered AGI will be a transformer. Recent improvements to
| computational efficiency for attention mechanisms [0] seem to
| improve results a lot, as does RLHF, but neither is a
| paradigm shift like the introduction of transformers was.
| That's not to downplay their significance - that class of
| incremental improvements has driven a massive acceleration in
| AI capabilities in the last year - but I don't think it's
| ultimately how we'll get to AGI.
|
| [0] https://hazyresearch.stanford.edu/blog/2023-03-27-long-
| learn...
| goldenManatee wrote:
| bubble sort /s
| uoaei wrote:
| Traditional CS may have something to do with slightly
| improving the performance by allowing more training for the
| same compute, but it won't be an order of magnitude or more.
| The improvements to be gained will be found more in
| statistics than CS per se.
| jacobr1 wrote:
| I'm not sure. Methods like Chinchilla-style training and
| quantization have been able to reduce compute by more than
| an order of magnitude. There might very well be a few more
| levels of optimization within the same statistical
| paradigm.
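|
| For a sense of the scales involved, the Chinchilla rule of
| thumb is roughly 20 training tokens per parameter, with
| training compute of about C = 6 * N * D FLOPs (both are
| approximations from the paper, and the model size below is
| just an example):
|
|   n_params = 70e9                  # a 70B-parameter model
|   n_tokens = 20 * n_params         # compute-optimal data budget
|   flops = 6 * n_params * n_tokens  # ~5.9e23 training FLOPs
|   print(f"{n_tokens:.1e} tokens, {flops:.1e} FLOPs")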
| brucethemoose2 wrote:
| _Better_ data is still critical, even if bigger data isn't.
| The linked article emphasizes this.
| tfehring wrote:
| I'd bet on a 2030 model trained on the same dataset as GPT-4
| over GPT-4 trained with perfect-quality data, hands down. If
| data quality were that critical, practitioners could ignore
| the Internet and just train on books and scientific papers
| and only sacrifice <1 order of magnitude of data volume.
| Granted, that's not a negligible amount of training data to
| give up, but it places a relatively tight upper bound on the
| potential gain from improving data quality.
| NeuroCoder wrote:
| So true. There are still plenty of areas where we lack
| sufficient data to even approach applying this sort of
| model. How are we going to make similar advances in
| something like medical informatics, where we not only have
| less data readily available, but it's much more difficult
| to acquire more?
| winddude wrote:
| Also, scaling doesn't address some of the challenges for AI
| that ChatGPT doesn't meet, like:
|
| - learning to learn, a.k.a. continual learning
|
| - internalized memory
|
| which would bring it closer to actual human capabilities.
| arenaninja wrote:
| An amusing thought I've had recently is whether LLMs are in the
| same league as the millions of monkeys at the keyboard,
| struggling to reproduce one of the complete works of William
| Shakespeare.
|
| But I think not, since monkeys probably don't "improve"
| noticeably with time or input.
| mhb wrote:
| _But I think not, since monkeys probably don't "improve"
| noticeably with time or input._
|
| Maybe once tons of bananas are introduced...
| mromanuk wrote:
| Sorry, but this sounds a lot like "640KB is all the memory
| you will ever need." What about a "Socratic model" for
| video? There should be many applications that would benefit
| from a bigger model.
| joebiden2 wrote:
| We will need a combination of the technologies we have in
| order to really achieve emergent intelligence.
|
| Humans are composed of various "subnets" modelling aspects
| which, in unison, produce self-consciousness and real
| intelligence. What is missing in the current line of
| approaches is that we rely only on auto-alignment of
| subnetworks by machine learning, which scales only up to a
| point.
|
| If we were to produce a model which has
|
| * something akin to an LLM as we know it today, which is
| able to
|
| * store or fetch facts in short-term ("context") or
| longterm ("memory") storage
|
| * if a fact is not in the current "context", query the
| longterm storage ("memory") by keywords for associations,
| which are inserted one by one into the current "context"
|
| * repeat as required until fulfilling some self-defined
| condition ("thinking")
|
| To me, this is mostly mechanical plumbing work and lots of money.
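|
| A minimal sketch of that store/query/repeat loop, with the
| LLM stubbed out as a plain function (every name here is
| illustrative, not a real API):
|
|   def think(llm, memory, context, max_steps=5):
|       """Alternate between generating and pulling
|       associations from longterm memory into the context,
|       until the model itself signals it is done."""
|       for _ in range(max_steps):
|           draft = llm(context)
|           if "NEED:" not in draft:          # self-defined condition
|               return draft
|           keyword = draft.split("NEED:")[1].strip()
|           for fact in memory.get(keyword, []):
|               context += "\n" + fact        # insert one by one
|       return llm(context)
|
|   memory = {"capitals": ["Paris is the capital of France."]}
|   llm = lambda ctx: ("NEED: capitals" if "Paris" not in ctx
|                      else "The capital of France is Paris.")
|   print(think(llm, memory, "Q: What is the capital of France?"))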
|
| Also, if we get rid of the "word-boundedness" of LLMs - which we
| already did to some degree, as shown by the multi-language
| capabilities - LLMs would be free to roam in the domain of
| thoughts /s :)
|
| This approach could be further improved by meta-LLMs
| governing the longterm memory access, providing an
| "intuition" about which longterm memory suits the provided
| context best. Apply recursion as needed to improve results
| (paying with exponential training time, though this meta-NN
| will quite probably be independent of the actual training,
| as real-life brain organization suggests).
| babyshake wrote:
| The other elements that may be required could be some
| version of the continuous sensory input that, for us,
| creates the sensation of "living" and - this one is a bit
| more philosophical - the sensation of suffering, along with
| a baseline establishing that the goal of the entity is to
| take actions that help it avoid suffering.
| joebiden2 wrote:
| I think an AI may have extra qualities by feeling suffering
| etc., but I don't think these extra qualities are rationally
| beneficial.
| thunderbird120 wrote:
| >"the company's CEO, Sam Altman, says further progress will not
| come from making models bigger. "I think we're at the end of the
| era where it's going to be these, like, giant, giant models," he
| told an audience at an event held at MIT late last week. "We'll
| make them better in other ways."
|
| So to reiterate, he is not saying that the age of giant AI
| models is over. Current top-of-the-line AI models are giant
| and likely will continue to be. However, there's no point
| in training models you can't actually run economically.
| Inference costs need to stay grounded, which means
| practical model sizes have a limit. More effort is going to
| go into making models efficient to run, even if it comes at
| the expense of making them less efficient to train.
| ldehaan wrote:
| I've been training large 65B models on "rent for N hours"
| systems for less than $1k per customized model, then fine-
| tuning those to be whatever I want for even cheaper.
|
| 2 months since gpt 4.
|
| This ride has only just started, fasten your whatevers.
| Voloskaya wrote:
| Finetuning costs are nowhere near representative of the
| cost to pre-train those models.
|
| Trying to replicate the quality of GPT-3 from scratch,
| using all the tricks and training optimizations in the book
| that are available now but weren't used during GPT-3's
| actual training, will still cost you north of $500K, and
| that's being extremely optimistic.
|
| A GPT-4-level model would be at least 10x this using the
| same optimism (meaning you are managing to train it for
| much cheaper than OpenAI). And that's just pure hardware
| cost; the team you need to actually make this happen is
| going to be very expensive as well.
|
| edit: To quantify how "extremely optimistic" that is: the
| very model you are finetuning, which I assume is LLaMA 65B,
| would cost around $18M to train on Google Cloud, assuming
| you get a 50% discount on their listed GPU prices (2048
| A100 GPUs for 5 months). And that's not even GPT-4 level.
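|
| The back-of-envelope arithmetic behind that figure, with
| the effective per-GPU rate backed out from the $18M total
| rather than quoted from any price list:
|
|   gpus, months = 2048, 5
|   gpu_hours = gpus * months * 30 * 24       # ~7.4M A100-hours
|   rate = 2.44                               # implied $/A100-hour
|   print(f"~${gpu_hours * rate / 1e6:.0f}M") # ~$18M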
| bagels wrote:
| $5M to train GPT-4 is the best investment I've ever seen.
| I've seen startups waste more money for tremendously
| smaller impact.
| Voloskaya wrote:
| As I stated in my comment, $5M is assuming you can do a
| much, much better job than OpenAI at optimizing your
| training, only need to make a single training run, your
| employees' salaries are $0, and you get a clean dataset
| essentially for free.
|
| Real cost is 10-20x that.
|
| That's still a good investment though. But the issue is
| you could very well sink $50M into this endeavour and end
| up with a model that actually is not really good and gets
| rendered useless by an open-source model that gets
| released 1 month later.
|
| OpenAI truly has unique expertise in this field that is
| very, very hard to replicate.
| moffkalast wrote:
| > and end up with a model that actually is not really
| good and gets rendered useless
|
| _ahem_ Bard _ahem_
| hcks wrote:
| Yes, but it also tells us that if Altman is honest here,
| then he doesn't believe GPT-like models can scale to near-
| human-level performance (because even if the cost of
| compute were 10x or even 100x, it would still be
| economically sound).
| og_kalu wrote:
| No it doesn't.
|
| For one thing, they're already at human performance.
|
| For another, I don't think you realize how expensive
| inference can get. Microsoft, with no small amount of
| available compute, is struggling to run GPT-4, to the point
| that they're rationing it between subsidiaries while they
| try to jack up compute.
|
| So saying it would be economically sound if it cost 10x or
| 100x what it costs now is a joke.
| quonn wrote:
| How are they at human performance? Almost everything GPT
| has read on the internet didn't even exist 200 years ago
| and was invented by humans. Heck, even most of the
| programming it does wasn't there 20 years ago.
|
| Not every programmer starting from scratch would be
| brilliant, but many were self-taught with very limited
| resources in the '80s, for example, and discovered new
| things from there.
|
| GPT cannot do this and is very far from being able to.
| og_kalu wrote:
| >How are they at human performance?
|
| Because it performs at least at average human level (mostly
| well above average) on basically every task it's given.
|
| "Invent something new" is a nonsensical benchmark for
| human-level intelligence. The vast majority of people
| have never and will never invent anything new.
|
| If your general intelligence test can't be passed by a
| good chunk of humanity, then it's not a general
| intelligence test - unless you want to say most people
| aren't generally intelligent.
| quonn wrote:
| Yeah these intelligence tests are not very good.
|
| I would argue some programmers do in fact invent something
| new. Not all of them, but some. Perhaps 10%.
|
| Second, the point is not whether everyone is an inventor by
| profession, but whether most people can be inventors. And
| to a degree they can be. I think you underestimate that by
| a large margin.
|
| You can lock people in a room and give them a problem to
| solve, and they will invent a lot if they have the time to
| do it. GPT will invent nothing right now. It's not there
| yet.
| og_kalu wrote:
| >Yeah these intelligence tests are not very good.
|
| Lol, okay.
|
| >And to a degree they can be. I think you underestimate
| that by a large margin.
|
| Do I? Because I'm not the one making unverifiable claims
| here.
|
| >You can lock people in a room and give them a problem to
| solve and they will invent a lot if they have the time to
| do it.
|
| If you say so
| smeagull wrote:
| This tells me you haven't really stress tested the model.
| GPT is currently at the stage of "person who is at the
| meeting, but not really paying attention so you have to
| call them out". Once GPT is pushed, it scrambles and falls
| over for most applications. The failure modes range from
| contradicting itself, to making up things in applications
| that shouldn't allow it, to ignoring prompts, to simply
| being unable to perform tasks at all.
| dragonwriter wrote:
| Are we talking about bare GPT through the UI, or GPT with
| a framework giving it access to external systems and the
| ability to store and retrieve data?
|
| Because, yeah, "brain in a jar" GPT isn't enough for most
| tasks beyond parlor-trick chat, but being used as a brain
| in a jar isn't the point.
| moffkalast wrote:
| Still waiting to see those plugins rolled out and actual
| vector DB integration with GPT-4; then we'll see what it
| can really do. It seems like the more context you give it,
| the better it does, but the current UI really makes it
| hard to provide that.
|
| Plus the recursive self-prompting to improve accuracy.
| mullingitover wrote:
| Quality over quantity. Just building a model with a
| gazillion parameters isn't indicative of quality; you could
| easily have garbage parameters with tons of overfitting.
| It's like megapixel counts in cameras: you might have 2000
| gigapixels in your sensor, but that doesn't mean you're
| going to get great photos out of it if there are other
| shortcomings in the system.
| sanxiyn wrote:
| What overfitting? If anything, LLMs suffer from underfitting,
| not overfitting. Normally, overfitting is characterized by
| increasing validation loss while training loss is decreasing,
| and solved by early stopping (stopping before that happens).
| Effectively, all LLMs are stopped early, so they don't suffer
| from overfitting at all.
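|
| For anyone unfamiliar, the classical early-stopping recipe
| looks roughly like this (a generic sketch with stand-in
| validation losses, not anyone's actual training loop):
|
|   best, patience, bad = float("inf"), 3, 0
|   losses = iter([1.0, 0.8, 0.7, 0.72, 0.74, 0.75])  # val losses
|   for epoch in range(100):
|       val_loss = next(losses)       # evaluate(model) in real code
|       if val_loss < best:
|           best, bad = val_loss, 0   # still generalizing
|       else:
|           bad += 1                  # validation loss rising
|           if bad >= patience:
|               break                 # stop before overfitting
|   print(f"stopped at epoch {epoch}, best val loss {best}")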
| spaceman_2020 wrote:
| Is cost really that much of a burden?
|
| Intelligence is the single most expensive resource on the
| planet. Hundreds of individuals have to be born, nurtured, and
| educated before you might get an exceptional 135+ IQ
| individual. Every intelligent person is produced at a great
| societal cost.
|
| If you can reduce the cost of replicating a 135 IQ, or heck,
| even a 115 IQ person to a few thousand dollars, you're beating
| biology by a massive margin.
| oezi wrote:
| Since IQ is just a normal distribution over a population,
| it is a bit misleading to talk about it like that.
|
| Even if we didn't expend any cost on education, the number
| of people with an IQ of 135 would stay the same.
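|
| Concretely, since IQ is normed to mean 100 with a standard
| deviation of 15, the share above any cutoff is fixed by
| construction:
|
|   from statistics import NormalDist
|   share = 1 - NormalDist(mu=100, sigma=15).cdf(135)
|   print(f"{share:.2%} of any population")  # ~1%, always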
| yunwal wrote:
| But we're still nowhere near that, or even near surpassing
| the skill of an average person at a moderately complex
| information task, and GPT-4 supposedly took hundreds of
| millions to train. It also costs a decent amount more to run
| inference on it vs. 3.5. It probably makes sense to prove the
| concept that generative AI can be used for lots of real work
| before scaling that up by another order of magnitude for
| potentially marginal improvements.
|
| Also, just in terms of where to put your effort, if you think
| another direction (for example, fine-tuning the model to use
| digital tools, or researching how to predict confidence
| intervals) is going to have a better chance of success, why
| focus on scaling more?
| spaceman_2020 wrote:
| There are a _lot_ of employees at large tech consultancies
| that don't really do anything that can't be automated away
| by even current models.
|
| Sprinkle in some more specific training and I can totally
| see entire divisions at IBM and Accenture and TCS being
| made redundant.
|
| The incentive structures are perversely aligned for this
| future - the CEO who manages to reduce headcount while
| increasing revenue is going to be very handsomely rewarded
| by Wall Street.
| skyechurch wrote:
| Wall Street would be strongly incentivised to install an
| AI CEO.
| dauertewigkeit wrote:
| Are intelligent people that valuable? There's lots of them at
| every university working for peanuts. They don't seem to be
| that valued by society, honestly.
| taylorius wrote:
| IQ isn't all that. Mine is 140+ and I'm just a somewhat
| well-paid software engineer. It's TOO abstract a metric in
| my view - for sure it doesn't always translate into real-
| world success.
| roflyear wrote:
| Right, we're very much in the same boat. I'm good at
| pattern recognition, I guess. I learn things quickly. What
| else? I don't have magic powers really. I still get
| headaches and eat junk food.
| spaceman_2020 wrote:
| If you ask any Fortune 500 CEO if he could magically take
| all the 135 IQ artists and academics and vagabonds, erase
| all their past traumas, put them through business or tech
| school, and put them to work in their company, they would
| all say 100% yes.
|
| An equivalent AI won't have any agency and will be happy
| doing the boring work other 135 IQ humans won't.
| roflyear wrote:
| My IQ is 140 and I'm far from exceptional.
| jutrewag wrote:
| 115 IQ isn't all that high - that's basically every Indian
| American, or a healthy percentage of the Chinese
| population.
|
| Edit: I don't understand the downvotes. I don't mean this
| in any disparaging way, just that an AGI is probably going
| to be a lot higher than that.
| spaceman_2020 wrote:
| 115 IQ is perfectly fine for the majority of human
| endeavors.
| asdfman123 wrote:
| The reason we put everyone through school is we believe that
| it's in society's best interest to educate everyone to the
| peak of their abilities. It's good for many different
| reasons.
|
| It would be much easier to identify gifted kids and only
| educate them, but I happen to agree that universal education
| is better.
| gowld wrote:
| _It would be much easier to identify gifted kids and only
| educate them_
|
| Is it so easy?
| LesZedCB wrote:
| The way I see it, the expensive part should be training the
| models via simulated architectures on GPUs or TPUs or
| whatever.
|
| But once they are trained, is there a way to encode the
| base models into hardware where inference costs are
| basically negligible? Hopefully somebody is checking
| whether this is possible - using structurally encoded
| hardware to make inference costs basically nil/constant.
| antibasilisk wrote:
| it's over, billions of parameters must be released
| rhelz wrote:
| All warfare is based on deception -- Sun Tzu
| donpark wrote:
| I think Sam is referring to the transition from "Deep" to
| "Long" learning [1]. What new emergent properties, if any,
| will 1 billion tokens of context unlock?
|
| [1] https://hazyresearch.stanford.edu/blog/2023-03-27-long-
| learn...
| carlsborg wrote:
| The 2017 Transformer paper has ~71,000 papers citing it.
| The sheer magnitude of human mental effort globally that is
| chasing the forefront of machine learning is unprecedented
| and amazing.