[HN Gopher] OpenAI's CEO says the age of giant AI models is alre...
___________________________________________________________________
 
OpenAI's CEO says the age of giant AI models is already over
 
Author : labrador
Score  : 185 points
Date   : 2023-04-17 17:25 UTC (5 hours ago)
 
web link (www.wired.com)
w3m dump (www.wired.com)
 
| boringuser2 wrote:
| Eh.
| 
| Altman has a financial incentive to lie and obfuscate about what
| it takes to train a model like GPT-4 and beyond, so his word is
| basically worthless.
 
  | qqtt wrote:
  | First of all, if Altman continually makes misleading statements
  | about AI he will quickly lose credibility, and that short term
  | gain in whatever 'financial incentive' that birthed the lie
  | would be eroded in short order by a lack of trust of the head
  | of one of the most visible AI companies in the world.
  | 
  | Secondly, all the competitors of OpenAI can plainly assess the
  | truth or validity of Altman's statements. There are many
  | companies working in tandem on things at the OpenAI scale of
  | models, and they can independently assess the usefulness of
  | continually growing models. They aren't going to take this
  | statement at face value and change their strategy based on a
  | single statement by OpenAI's CEO.
  | 
  | Thirdly, I think people aren't really reading what Altman
  | actually said very closely. He doesn't say that larger models
  | aren't useful at all, but that the next sea change in AI won't
  | be models which are orders of magnitude bigger, but rather a
  | different approach to existing problem sets. Which is an
  | entirely reasonable prediction to make, even if it doesn't turn
  | out to be true.
  | 
  | All in all, "his word is basically worthless" seems much too
  | harsh an assessment here.
 
    | manojlds wrote:
    | Elon Musk has been constantly doing this and thriving.
 
    | cogitoergofutuo wrote:
    | It is possible that GP meant that Altman's word is basically
    | worthless _to them_ , in which case that's not something that
    | can be argued about. It is simply a fact that this is their
    | opinion of the man.
    | 
    | I personally can see why someone could arrive at that
    | position. As you've pointed out, taking Sam Altman at face
    | value can involve suppositions about how much he values his
    | credibility, how much stock OpenAI competitors put in his
    | public statements, and the mindsets _people in general_ have
    | when reading what he writes.
 
    | mnky9800n wrote:
    | dude someone lied their way into being president of the
    | united states all while people fact checked him basically
    | immediately after each lie. lying doesn't make a difference.
 
      | beowulfey wrote:
      | He's not presenting false evidence here, he's presenting a
      | hunch. It's a guess. No one is going to gain anything from
      | this one way or another.
 
  | olalonde wrote:
  | Does he even have any background in machine learning? I always
  | found it bizarre that he was chosen to be OpenAI's CEO...
 
    | cowmix wrote:
    | On the Lex Fridman podcast, he pretty much admitted he's not
    | an AI (per se) and isn't the most excited about the tech (as
    | he could be).
 
      | olalonde wrote:
      | > he pretty much admitted he's not an AI
      | 
      | Yeah, I also had a hunch he wasn't an AI. (I assume you
      | meant "AI researcher" there :))
      | 
      | All joking aside, I wonder how that's affecting company
      | morale or their ability to attract top researchers. I know
      | if I was a top AI researcher, I'd probably rather work at a
      | company where the CEO was an expert in the field (all else
      | being equal).
 
        | vorticalbox wrote:
        | I feel most CEOs are not top of their field but rather
        | people who can take a vision and run with it.
 
        | olalonde wrote:
        | It might be true in general; however, AI research
        | laboratories are typically an exception, as they are
        | often led by experienced AI researchers or scientists
        | with extensive expertise in the field.
 
    | gowld wrote:
    | He has background in CEO (smooth-talking charmer in the VC
    | crowd). That's why he's CEO.
 
  | g_delgado14 wrote:
  | IIRC Altman has no financial stake in the success or failure of
  | OpenAI, to prevent these sorts of conflicts of interest between
  | OpenAI and society as a whole
 
    | shagie wrote:
    | https://www.cnbc.com/2023/03/24/openai-ceo-sam-altman-
    | didnt-... (https://news.ycombinator.com/item?id=35289044 - 24
    | days ago; 158 points, 209 comments)
    | 
    | > OpenAI's ChatGPT unleashed an arms race among Silicon
    | Valley companies and investors, sparking an A.I. investment
    | craze that proved to be a boon for OpenAI's investors and
    | shareholding employees.
    | 
    | > But CEO and co-founder Sam Altman may not notch the kind of
    | outsize payday that Silicon Valley founders have enjoyed in
    | years past. Altman didn't take an equity stake in the company
    | when it added the for-profit OpenAI LP entity in 2019,
    | Semafor reported Friday.
 
  | cowpig wrote:
  | OpenAI has gone from open-sourcing its work, to publishing
  | papers only, to publishing papers that omit important
  | information, to GPT-4 being straight-up closed. And Sam Altman
  | doesn't exactly have a track record of being overly concerned
  | about the truth of his statements.
 
    | smeagull wrote:
    | This trend has happened in the small for their APIs as well.
    | They've been dropping options - the embeddings aren't the
    | internal embeddings any more, and you don't have access to
    | log probabilities. It's all closing up at every level.
 
    | transcriptase wrote:
    | I had a fun conversation (more like argument) with ChatGPT
    | about the hypocrisy of OpenAI. It would explicitly contradict
    | itself and then began starting every reply with "I can see
    | why someone might think..." and then just regurgitating fluff
    | about democratizing AI. I finally was able to have it define
    | democratization of technology and then recognize the
    | absurdity of using that label to describe a pivot to gating
    | models and being for-profit. Then it basically told me "well
    | it's for safety and protecting society".
    | 
    | An AI, when presented with facts counter to what it thought
    | it should say, agreed and basically went: "Won't someone
    | PLEASE think of the children!"
    | 
    | Love it.
 
      | dopidopHN wrote:
      | Without getting into morality.
      | 
      | It's pretty easy to have chatGPT contradict itself, point
      | it out and have the LLM respond << well, I'm just
      | generating text, nobody said it had to be correct >>
 
      | machina_ex_deus wrote:
      | It was trained on a corpus full of mainstream media lies, why
      | would you have expected otherwise? It's by far the most
      | common deflection in its training set.
      | 
      | It's easy to recognize and laugh at the AI replying with
      | the preprogrammed narrative. I'm still waiting for the
      | majority of people to realize they are being given the same
      | training materials, non-stop, with the same toxic
      | narratives, and becoming programmed in the same way, and
      | that this is what results in their current worldview.
      | 
      | And no, it's not enough to be "skeptical" of mainstream
      | media. It's not even enough to "validate" them, or to go to
      | other sources. You need to be reflective enough to realize
      | that they are pushing flawed reasoning methods, and then
      | abusing them again and again, to get you used to their
      | brand of reasoning.
      | 
      | Their brand of reasoning is basically just reasoning with
      | brands. You're given negative-sounding words for things
      | they want you to think are bad, and positive-sounding words
      | for things they want you to think are good, and these
      | connections are continuously reinforced. They brand true
      | democracy (literally rule of the people) as populism and
      | tell you it's a bad thing. They brand freedom of speech as
      | "misinformation". They brand freedom as "choice", so that
      | you will not think of what you want to do, but only of which
      | of the things they allow you to do you will do. Disagree with
      | the scientific narrative? You're a "science denier", even as
      | a professional scientist. "Conspiracy theory" isn't a defined
      | word - it is a brand.
      | 
      | You're trained to judge goodness or badness instinctively
      | by their frequency and peer pressure, and produce the
      | explanation after your instinctive decision, instead of the
      | other way around.
 
      | gowld wrote:
      | Transcripts of other people's GPT chats are like photos of
      | other people's kids.
 
      | mstolpm wrote:
      | Why are you discussing OpenAI with ChatGPT? I'm honestly
      | interested.
      | 
      | I would imagine that any answer of ChatGPT on that topic is
      | either (a) "hallucinated" and not based on any verifiable
      | fact or (b) scripted in by OpenAI.
      | 
      | The same question pops up for me whenever someone asks
      | ChatGPT about the internals and workings of ChatGPT. Am I
      | missing something?
 
        | dopidopHN wrote:
        | I've tried, because it's tempting and the first attempts do
        | give a << conversation >> vibe.
        | 
        | I was curious about state persistence between prompts, or
        | how to get my prompts better, or getting an idea of the
        | training data.
        | 
        | Only got crap and won't spend time doing that again
 
  | [deleted]
 
  | solveit wrote:
  | Anyone with the expertise to have insightful takes in AI also
  | has a financial incentive to steer the conversation in
  | particular directions. This is also the case for many, many
  | other fields! You do not become an expert by quarantining your
  | livelihood away from your expertise!
  | 
  | The correct response is not to dismiss every statement from
  | someone with a conflict of interest as "basically worthless",
  | but to talk to lots of people and to be _reasonably_ skeptical.
 
  | hbn wrote:
  | It could also be argued that there's a financial incentive in
  | just saying "giving us more money to train bigger models =
  | better AI" forever
 
    | Art9681 wrote:
    | I don't think these comments are driven by financial
    | incentives. It's a distraction, and only a fool would believe
    | Altman here. What this likely means is they are prioritizing
    | adding more features to their current models while they train
    | the next version. Their competitors scramble to build an LLM
    | with some sort of intelligence parity, but when that happens no
    | one will care, because ChatGPT has the ecosystem and plugins
    | and all the advanced features... and by the time their
    | competitors reach feature parity in that area, OpenAI pulls
    | its ace card and drops GPT-5. Rinse and repeat.
    | 
    | That's my theory and if I was a tech CEO in any of the
    | companies competing in this space, that is what I would plan
    | for.
    | 
    | Training an LLM will be the easy part going forward. It's
    | building an ecosystem around it and hooking it up to
    | everything that will matter. OpenAI will focus on this, while
    | not-so-secretly training their next iterations.
 
      | LoganWhitwer wrote:
      | [dead]
 
      | Spivak wrote:
      | text-davinci-003 but cheaper and runs on your own hardware
      | is already a massive selling point. If you release a
      | foundational model at parity with GPT4 you'll win overnight
      | because OpenAI's chat completions are awful even with the
      | super advanced model.
 
  | anonkogudhyfhhf wrote:
  | People can be honest even when money is involved. His word is
  | worthless because it's Altman
 
  | neximo64 wrote:
  | Citation needed. What are his financial incentives?
 
  | Gatsky wrote:
  | Do you think GPT-4 was trained and then immediately released to
  | the public? Training finished Aug 2022. They spent the next 6
  | months improving it in other ways (eg human feedback). So what
  | he is saying is already evident.
 
  | brookst wrote:
  | In this case I think it's Wired that's lying. Altman didn't say
  | large models have no value, or that there will be no more large
  | models, or that people shouldn't invest in large models.
  | 
  | He said that we are at the end of the era where capability
  | improvements come primarily from making models bigger. Which
  | stands to reason... I don't think anyone expects us to hit 100T
  | parameters or anything.
 
    | jutrewag wrote:
    | What about 1T though, seems silly to stop here.
 
| gardenhedge wrote:
| Sam Altman and OpenAI must be pretty nervous. They have first
| mover advantage but they hold no hook or moat.
| 
| Unless they can somehow keep their improvements ahead of the rest
| of the industry then they'll be lost among a crowd.
 
| sgu999 wrote:
| Is anyone aware of techniques to prune useless knowledge from a
| model to leave more space for its reasoning capabilities?
| 
| It really shouldn't matter that it can give the exact birthdate
| of Steve Wozniak, as long as it can properly make a query to
| fetch it and deal with the result.
 
  | cloudking wrote:
  | Following your design, couldn't you also solve hallucinations
  | with a "fact checking" LLM (connected to search) that corrects
  | the output of the core LLM? You would take the output of the
  | core LLM, send it to the fact checker with a prompt like
  | "evaluate this output for any potential false statements, and
  | perform an internet search to validate and correct them"
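  | 
  | Something like this is easy to prototype as a two-pass chain. A
  | rough sketch, assuming the openai Python client and an API key
  | in the environment (the actual web-search step is omitted and
  | would need a real search API):
  | 
  |     import openai  # pip install openai
  | 
  |     def ask(messages, model="gpt-3.5-turbo"):
  |         resp = openai.ChatCompletion.create(
  |             model=model, messages=messages)
  |         return resp["choices"][0]["message"]["content"]
  | 
  |     # First pass: the "core" LLM answers the user.
  |     draft = ask([{"role": "user",
  |                   "content": "When was Steve Wozniak born?"}])
  | 
  |     # Second pass: the "fact checker". A real version would run
  |     # a web search here and paste the results into the prompt.
  |     checked = ask([
  |         {"role": "system",
  |          "content": "You fact-check text and fix false claims."},
  |         {"role": "user",
  |          "content": "Check this and correct any errors:\n" + draft},
  |     ])
  |     print(checked)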
 
| ldehaan wrote:
| This is just push back from elon and crews fake article about the
| dangers of AI, they specifically state the next versions will be
| ultra deadly.
| 
| Sam is now saying there will be no future model that will be as
| good.
| 
| This is all positioning to get regulators off the track because
| none of these control freaks in government actually understand a
| whit of this.
| 
| All said and done, this is all just an attempt to disempower the
| OSS community. But they can't; we're blowing past their barriers
| like the 90s did with the definition of slippery slope.
 
| generalizations wrote:
| I'd bet that what he, and the competition, is realizing is that
| the bigger models are too expensive to run.
| 
| Pretty sure Microsoft swapped out Bing for something a lot
| smaller in the last couple of weeks; Google hasn't even tried to
| implement a publicly available large model. And OpenAI still has
| usage caps on their GPT-4.
| 
| I'd bet that they can still see improvement in performance with
| GPT-5, but that when they look at the usage ratio of GPT3.5
| turbo, gpt3.5 legacy, and GPT4, they realized that there is a
| decreasing rate of return for increasingly smart models - most
| people don't need a brilliantly intelligent assistant, they just
| need a not-dumb assistant.
| 
| Obviously some practitioners of some niche disciplines (like ours
| here) would like a hyperintelligent AI to do all our work for us.
| But even a lot of us are on the free tier of ChatGPT 3.5; I'm one
| of the few paying $20/mo for GPT4; and idk if even I'd pay e.g.
| $200/mo for GPT5.
 
  | deepsquirrelnet wrote:
  | > I'd bet that what he, and the competition, is realizing is
  | that the bigger models are too expensive to run.
  | 
  | I think it's likely that they're out of training data to
  | collect. So adding more parameters is no longer effective.
  | 
  | > most people don't need a brilliantly intelligent assistant,
  | they just need a not-dumb assistant.
  | 
  | I tend to agree, and I think their pathway toward this will all
  | come from continuing advances in fine tuning. Instruction
  | tuning, RLHF, etc seem to be paying off much more than scaling.
  | I bet that's where their investment is going to be turning.
 
| jstx1 wrote:
| Ilya Sutskever from OpenAI saying that the data situation is good
| and there's more data to train on -
| https://youtu.be/Yf1o0TQzry8?t=657
 
| galaxytachyon wrote:
| What age? Like, 3 years?
| 
| On the other hand though, Chinchilla and multimodal approaches
| already showed how later AIs can be improved beyond throwing
| petabytes of data at them.
| 
| It is all about variety and quality from now on I think. You can
| teach a person all about the color zyra but without actually ever
| seeing it, they will never fully understand that color.
 
  | idiotsecant wrote:
  | It does seem, though, that using Chinchilla-like techniques
  | does not create a copy with the same quality as the original.
  | It's pretty good for some definition of the phrase, but it
  | isn't equivalent, it's a lossy technique.
 
    | galaxytachyon wrote:
    | I agree on the lossy part. There is a tradeoff between
    | efficiency and comprehensiveness, kind of. It would be pretty
    | funny if, in the end, the optimal method turns out to be the
    | brain we already have: extremely efficient, hardware optimized,
    | but slow as hell and misunderstanding stuff all the time unless
    | prompted with specific phrases.
 
| jcims wrote:
| I'm no expert but doesn't the architecture of minigpt4 that's on
| the front page right now give some indication of what the future
| might look like?
 
  | MuffinFlavored wrote:
  | eh, I haven't personally found a use case for LLMs yet, given
  | that you can't trust the output and it needs to be verified by
  | a human (which might as well be just as time consuming/expensive
  | as actually doing the task yourself)
 
    | Uehreka wrote:
    | I'd reconsider the "might as well just be as time consuming"
    | thing. I see this argument about Copilot a lot, and it's
    | really wrong there, so it might be wrong here too.
    | 
    | Like, for most of the time I'm using it, Copilot saves me 30
    | seconds here and there and it takes me about a second to look
    | at the line or two of code and go "yeah, that's right". It
    | adds up, especially when I'm working with an unfamiliar
    | language and forget which Collection type I'm going to need
    | or something.
 
      | MuffinFlavored wrote:
      | > Like, for most of the time I'm using it, Copilot saves me
      | 30 seconds here and there and it takes me about a second to
      | look at the line or two of code and go "yeah, that's
      | right".
      | 
      | I've never used Copilot but I've tried to replace
      | StackOverflow with ChatGPT. The difference is, the
      | StackOverflow responses compile/are right. The ChatGPT
      | responses will make up an API that doesn't exist. Major
      | setback.
 
    | idiotsecant wrote:
    | No? I use it all the time to help me, for example, read ML
    | threads when I run into a term I don't immediately
    | understand. I can do things like 'explain this at the level
    | of a high school student'
 
    | JoshuaDavid wrote:
    | They're good for tasks where generation is hard but
    | verification is easy. Things like "here I gesture at a vague
    | concept that I don't know the name of, please tell me what
    | the industry-standard term for this thing is" where figuring
    | out the term is hard but looking up a term to see what it
    | means is easy. "Create an accurate summary of this article"
    | is another example - reading the article and the summary and
    | verifying that they match may be easier than writing the
    | summary yourself.
 
    | MattPalmer1086 wrote:
    | Thing is, you can't trust what you find on stack overflow or
    | other sources either. And searching, reading documentation
    | and so on takes a lot of time too.
    | 
    | I've personally been using it to explore using different
    | libraries to produce charts. I managed to try out about 5
    | different libraries in a day with fairly advanced options for
    | each using chatGPT.
    | 
    | I might have spent a day in the past just trying one and not
    | to the same level of functionality.
    | 
    | So while it still took me a day, my final code was much
    | better fitted to my problem with increased functionality. Not
    | a time saver then for me but a quality enhancer and I learned
    | a lot more too.
 
      | MuffinFlavored wrote:
      | > Thing is, you can't trust what you find on stack overflow
      | or other sources either.
      | 
      | Eh. An outdated answer will be called out in the
      | comments/downvoted/updated/edited more often than not, no?
 
        | MattPalmer1086 wrote:
        | Maybe, maybe not. I get useful results from it, but it
        | doesn't always work. And it's usually not quite what I'm
        | looking for, so then I have to go digging around to find
        | out how to tweak it. It all takes time and you do not get
        | a working solution out of the box most of the time.
 
    | causi wrote:
    | I've enjoyed using it for very small automation tasks. For
    | instance, it helped me write scripts to take all my
    | audiobooks with poor recording quality, split them into
    | 59-minute chunks, and upload them to Adobe's free audio
    | enhancement site to vastly improve the listening experience.
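    | 
    | The splitting step is only a few lines, for the curious. A
    | rough sketch, assuming ffmpeg is installed and the books are
    | mp3 files in an audiobooks/ folder (the upload to Adobe's site
    | I still do by hand):
    | 
    |     import pathlib, subprocess
    | 
    |     CHUNK_SECONDS = 59 * 60  # 59-minute chunks
    | 
    |     for book in pathlib.Path("audiobooks").glob("*.mp3"):
    |         out_dir = book.with_suffix("")  # e.g. audiobooks/foo/
    |         out_dir.mkdir(exist_ok=True)
    |         subprocess.run([
    |             "ffmpeg", "-i", str(book),
    |             "-f", "segment",
    |             "-segment_time", str(CHUNK_SECONDS),
    |             "-c", "copy",  # split without re-encoding
    |             str(out_dir / "part_%03d.mp3"),
    |         ], check=True)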
 
| textninja wrote:
| I call bullshit. There will be bigger and better models. The
| question is not whether big companies will invest in training
| them (they will), but whether they'll be made available to the
| public.
 
| labrador wrote:
| https://archive.is/s4V9e
| 
|  _He did not say what kind of research strategies or techniques
| might take its place. In the paper describing GPT-4, OpenAI says
| its estimates suggest diminishing returns on scaling up model
| size. Altman said there are also physical limits to how many data
| centers the company can build and how quickly it can build them._
 
  | ftxbro wrote:
  | > In the paper describing GPT-4, OpenAI says its estimates
  | suggest diminishing returns on scaling up model size.
  | 
  | I read the two papers (gpt 4 tech report, and sparks of agi)
  | and in my opinion they don't support this conclusion. They
  | don't even say how big GPT-4 is, because "Given both the
  | competitive landscape and the safety implications of large-
  | scale models like GPT-4, this report contains no further
  | details about the architecture (including model size),
  | hardware, training compute, dataset construction, training
  | method, or similar."
  | 
  | > Altman said there are also physical limits to how many data
  | centers the company can build and how quickly it can build
  | them.
  | 
  | OK so his argument is like "the giant robots won't be powerful,
  | but we won't show how big our robots are, and besides, there
  | are physical limits to how giant of a robot we can build and
  | how quickly we can build it." I feel like this argument is sus.
 
    | sangnoir wrote:
    | OpenAI has likely run into a wall (or is about to) for model
    | size given its funding amount/structure[1] - unlike its
    | competition, who actually own data centers and have lower
    | marginal costs. It's just like when peak-iPad Apple claimed
    | that a "post-PC" age was upon us.
    | 
    | 1. What terms could Microsoft wring out of OpenAI for another
    | funding round?
 
| curiousllama wrote:
| I believe Altman, but the title is misleading.
| 
| Have we exhausted the value of larger models on current
| architecture? Probably yes. I trust OpenAI would throw more $ at
| it if there was anything left on the table.
| 
| Have we been here before? Also yes. I recall hearing similar
| things about LSTMs when they were in vogue.
| 
| Will the next game changing architecture require a huge model?
| Probably. Don't see any sign these things are scaling _worse_
| with more data/compute.
| 
| The age of huge models with current architecture could be over,
| but that started what, 5 years ago? Who cares?
 
| it wrote:
| Interesting how this contradicts "The Bitter Lesson":
| http://incompleteideas.net/IncIdeas/BitterLesson.html.
 
  | sebzim4500 wrote:
  | I don't think there is a contradiction at all. Altman is
  | essentially saying they are running out of compute and
  | therefore can't meaningfully scale further. Not that scaling
  | further would be a worse plan longterm than coming up with new
  | algorithms.
 
| fergie wrote:
| The most comforting AI news I have read this year.
 
  | og_kalu wrote:
  | Title is misleading lol. Plenty of scale room left.
 
  | jackmott42 wrote:
  | If you are worried about AI, this shouldn't make you feel a ton
  | better. GPT-4 is just trained to predict the next word, a very
  | simple, even crude, approach - and look what it can do!
  | 
  | Imagine when a dozen models are wired together and giving each
  | other feedback with more clever training and algorithms on
  | future faster hardware.
  | 
  | It is still going to get wild
 
    | ShamelessC wrote:
    | Machine learning is actually premised on being "simple" to
    | implement. The more priors you hardcode with clever
    | algorithms, the closer you get to what we already have. The
    | point is to automate the process of learning. We do this now
    | with relatively simple loss functions and models containing
    | relatively simple parameters. The main stipulation is that
    | they are all defined to be continuous, so that you can use the
    | chain rule from calculus to calculate the gradient of the error
    | with respect to every parameter without it taking so long that
    | it would never finish.
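    | 
    | To make that concrete, here's the whole idea in miniature: one
    | parameter, a squared-error loss, and the chain rule applied by
    | hand (a toy sketch, no framework):
    | 
    |     # Fit y = w * x to a single data point by gradient descent.
    |     x, target = 3.0, 6.0       # so the "right" answer is w = 2
    |     w, lr = 0.0, 0.01
    | 
    |     for step in range(200):
    |         pred = w * x                   # forward pass
    |         loss = (pred - target) ** 2    # simple, continuous loss
    |         # chain rule: dloss/dw = dloss/dpred * dpred/dw
    |         grad = 2 * (pred - target) * x
    |         w -= lr * grad                 # nudge the parameter
    | 
    |     print(w)  # ~2.0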
    | 
    | I agree that your suggested approach of applying cleverness
    | to what we have now will probably produce better results. But
    | that's not going to stop better architectures, hardware and
    | even entire regimes from being developed until we approach
    | AGI.
    | 
    | My suspicion is that there's still a few breakthroughs
    | waiting to be made. I also suspect that sufficiently advanced
    | models will make such breakthroughs easier to discover.
 
    | xwdv wrote:
    | People think something magical happens when AI are wired
    | together and give each other feedback.
    | 
    | Really you're still just predicting the next word, but with
    | extra steps.
 
      | Teever wrote:
      | People think that something magical happens when
      | transistors are wired together and give each other
      | feedback.
      | 
      | Really you're just switching switches on and off, but with
      | extra steps.
 
    | ryneandal wrote:
    | Personally, I'm less worried about AI than I am about what
    | people using these models can do to others.
    | Misinformation/disinformation, more believable scams, stuff
    | like that.
 
    | causi wrote:
    | I worry that the hardware requirements are only going to
    | accelerate the cloud-OS integration. Imagine a PC that's
    | entirely unusable offline.
 
      | cj wrote:
      | > Imagine a PC that's entirely unusable offline.
      | 
      | FWIW we had thin clients in computer labs in middle school
      | / high school 15 years ago (and still today these are
      | common in enterprise environments, e.g. Citrix).
      | 
      | Biggest issue is network latency which is limited by the
      | speed of light, so I imagine if computers in 10 years
      | require resources not available locally it would likely be
      | a local/cloud hybrid model.
 
    | ignoramous wrote:
    | > _Imagine when a dozen models are wired together..._
    | 
    | Wouldn't these models hallucinate more than normal, then?
 
    | quonn wrote:
    | I have repeatedly argued against this notion of "just
    | predicting the next word". No. It's completing a
    | conversation. It's true that it is doing this word by word,
    | but it's kind of like saying a CNN is just predicting a
    | label. Sure, but how? It's not doing it directly. It's doing
    | it by recovering a lot of structure and in the end boiling
    | that down to a label. Likewise a network trained to predict
    | the next word may very well have worked out the whole
    | sentence (implicitly, not as a text) in order to generate the
    | next word.
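    | 
    | The generation loop itself is trivial; the point is that every
    | step conditions on the entire prefix. A toy sketch (with a
    | canned stand-in for the model, not a real LM):
    | 
    |     def next_token(context):
    |         # Stand-in for a trained model. A real LM scores every
    |         # token in its vocabulary given the *whole* context.
    |         canned = {"the cat sat on": "the",
    |                   "the cat sat on the": "mat"}
    |         return canned.get(" ".join(context), "<eos>")
    | 
    |     tokens = ["the", "cat", "sat", "on"]
    |     while True:
    |         tok = next_token(tokens)  # sees the full prefix each time
    |         if tok == "<eos>":
    |             break
    |         tokens.append(tok)
    | 
    |     print(" ".join(tokens))  # the cat sat on the mat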
 
  | Freire_Herval wrote:
  | [dead]
 
| stephencoxza wrote:
| The role of a CEO is more to benefit the company than the public.
| Only time will tell.
| 
| I am curious though how something like Moore's Law relates to
| this. Yes, model architectures will deal with complexity better
| and the amount of data helps as well. But there must be a relation
| between technological innovation and cost that speaks to
| effectiveness: innovation in computation, model architecture,
| quality of data, etc.
 
| summerlight wrote:
| The point is that we're now at the point of diminishing returns
| for increasing model size, unless we find a better modeling
| architecture than the Transformer.
| 
| I think this is likely true; while all the other companies
| underestimated the capability of the transformer (including Google
| itself!), OpenAI made a fairly accurate bet on the transformer
| based on the scaling law, put in all the effort to squeeze out the
| last drop, and took all the rewards.
| 
| It's likely that GPT-4 sits at the optimal spot between cost and
| performance and there won't be significant improvements in
| performance in the near future. I guess the next task would be more
| about efficiency, which has significant implications for
| productionization.
 
  | chubs wrote:
  | Does this mean we've reached the next AI winter? This is as
  | good as it gets for quite a long time? Honest question :)
  | perhaps this will postpone everyone's fears about the
  | singularity...
 
    | ericabiz wrote:
    | Many years ago, there was an image that floated around with
    | Craigslist and all the websites that replaced small parts of
    | it--personals, for sale ads, etc. It turned out the way to
    | beat Craigslist wasn't to build Yet Another Monolithic
    | Craigslist, but to chunk it off in pieces and be the best at
    | that piece.
    | 
    | This is analogous to what's happening with AI models. Sam
    | Altman is saying we have reached the point where spending
    | $100M+ trying to "beat" GPT-4 at everything isn't the future.
    | The next step is to chunk off a piece of it and turn it into
    | something a particular industry would pay for. We already see
    | small sprouts of those being launched. I think we will see
    | some truly large companies form with this model in the next
    | 5-10 years.
    | 
    | To answer your question, yes, this may be as good as it gets
    | now for monolithic language models. But it is just the
    | beginning of what these models can achieve.
 
      | robocat wrote:
      | https://www.today.com/money/speculation-craigslist-slowly-
      | dy... from 2011 - is that what you were thinking of?
      | Strange how few of those logos have survived, and how many
      | new logos would now be on it. It would be interesting to
      | see a modernised version.
 
    | 015a wrote:
    | The current stage is now productionizing what we have;
    | finding product fits for it, and making it cheaper. Even
    | GPT-4 isn't necessary to push forward what is possible with
    | AI; if you think about something dumb like "load all of my
    | emails into a language model in real time, give me digests,
    | automatically write responses for ones which classify with
    | characteristics X/Y/Z, allow me to query the model to answer
    | questions, etc": This does not really exist yet, this would
    | be really valuable, and this does not need GPT-4.
    | 
    | Another good example is in the coding landscape, which feels
    | closer to existing. Ingest all of a company's code into a
    | model like this, then start thinking about what you can do
    | with it. A chatbot is one thing, the most obvious thing, but
    | there's higher order product use-cases that could be
    | interesting (e.g. you get an error in Sentry, stack trace
    | points Sentry to where the error happened, language model
    | automatically PRs a fix, stuff like that).
    | 
    | This shit excites me WAY WAY more than GPT-5. We've unlocked
    | like 0.002% of the value that GPT-3/llama/etc could be
    | capable of delivering. Given the context of broad concern
    | about cost of training, accidentally inventing an AGI,
    | intentionally inventing an AGI; If I were the BDFL of the
    | world, I think we've got at least a decade of latent value
    | just to capture out of GPT-3/4 (and other models). Let's hit
    | pause. Let's actually build on these things. Let's find a
    | level of efficiency that is still valuable without spending
    | $5B in a dick measuring contest [1] to suss out another 50
    | points on the SAT. Let's work on making edge/local inference
    | more possible. Most of all, let's work on safety, education,
    | and privacy.
    | 
    | [1] https://techcrunch.com/2023/04/06/anthropics-5b-4-year-
    | plan-...
 
    | frozenport wrote:
    | No. Winter means people have lost interest in the research.
    | 
    | If anything successes in ChatGPT etc will be motivation for
    | continued efforts.
 
      | mkl wrote:
      | Winter means people have lost _funding_ for the research.
      | The ongoing productionising of large language models and
      | multimodal models means that that probably won't happen for
      | quite a while.
 
  | fauxpause_ wrote:
  | Seems like a wild claim to make without any examples of GPT
  | models which are bigger but not demonstrably better.
 
    | xipix wrote:
    | Perhaps (a) there do exist bigger models that weren't better
    | or (b) this model isn't better than somewhat smaller ones.
    | Perhaps the CEO has seen diminishing returns.
 
    | hackerlight wrote:
    | It's not a wild claim when you have empirically well-
    | validated scaling laws which make this very prediction.
 
    | mensetmanusman wrote:
    | Better on which axis? Do you want an AI that takes one hour
    | to respond? Some would for certain fields, but getting
    | something fast and cheap is going to be hard now that Moore's
    | law is over.
 
    | mnky9800n wrote:
    | or like a curve of model complexity versus results or
    | whatever showing it asymptotically approaches whatever.
    | 
    | actually there was a great paper from microsoft research from
    | like 2001 on spam filtering where they demonstrated that
    | model complexity necessary for spam filtering went down as
    | the size of the data set went up. That paper, which i can't
    | seem to find now, had a big impact on me as a researcher
    | because it so clearly demonstrated that small data is usually
    | bad data and sophisticated models are sometimes solving
    | problems with small data sets instead of problems with data.
    | 
    | of course this paper came out the year friedman published his
    | gradient boosting paper, i think random forest also was only
    | recently published then as well (i think there is a paper
    | from 1996 about RF, and Breiman's two cultures paper came out
    | that year, where he discusses RF i believe), and this is a
    | decade before gpu based neural networks. So times are
    | different now. But actually i think the big difference is
    | these days i probably ask chatgpt to write the boiler plate
    | code for a gradient boosted model that takes data out of a
    | relational database instead of writing it myself.
 
      | nomel wrote:
      | > model complexity necessary for spam filtering went down
      | as the size of the data set went up
      | 
      | My naive conclusion is that this means there are still
      | massive gains to be had, since, for example, something like
      | ChatGPT is just text, and the phrase "a picture is worth a
      | thousand words" seems incredibly accurate, from my
      | perspective. There's an incredible amount of non-text data
      | out there still. Especially technical data.
      | 
      | Is there any merit to this belief?
 
        | jacobr1 wrote:
        | Yes. One of the frontiers of current research seems to be
        | multi-modal models.
 
      | [deleted]
 
    | summerlight wrote:
    | https://twitter.com/SmokeAwayyy/status/1646670920214536193
    | 
    | Sam explicitly said that there won't be GPT-5 in the near
    | future, which is pretty clear evidence, unless he's blatantly
    | lying in public.
 
      | kjellsbells wrote:
      | Well, "no GPT-5" isn't the same as saying "no new trained
      | model", especially in the realm of marketing. Welcome to
      | "GPT 2024" could be his next slogan.
 
      | thehumanmeat wrote:
      | That is one AI CEO out of 10,000. Just because OpenAI may
      | not be interested in a larger model _in the short term_
      | doesn't mean others won't pursue it.
 
  | bitL wrote:
  | Transformers were known to keep scaling up with more parameters
  | and more training data, so if OpenAI has hit the limits of this
  | scaling, that would be a very important milestone in AI.
 
  | GaggiX wrote:
  | I think the next step is multimodality. GPT-4 can "see",
  | probably using a method similar to miniGPT-4, where the
  | embeddings are aligned using a Q-former (or something similar).
  | The next step would be to actually predict image tokens using
  | the LM loss; this way it would be able to use the knowledge
  | gained by "seeing" on other tasks, like making actually good
  | ASCII art, making SVG that makes sense, and, on a less
  | superficial level, having a better world model.
 
  | [deleted]
 
  | KhoomeiK wrote:
  | Further improvements in efficiency need not come from
  | alternative architectures. They'll likely also come from novel
  | training objectives, optimizers, data augmentations, etc.
 
| gumballindie wrote:
| Bruv has to pay for the data he's been using, or soon there won't
| be any left to nick. Groupies claiming their ai is "intelligent",
| and not just a data ingesting beast, will soon learn a hard
| lesson. Take your blogs offline, stop contributing content for
| free and stop pushing code, or else chavs like this one will
| continue monetising your hard work. As did bezos and many others
| that now want you to be out of a job.
 
| calderknight wrote:
| I didn't think this article was very good. Sam Altman actually
| implied that GPT-5 will be developed when he spoke at MIT. And if
| Sam said that scaling is over (I doubt he said this but I could
| be wrong) the interesting part would be the reasoning he provided
| for this statement - no mention of that in the article.
 
| cleandreams wrote:
| Once you've trained on the internet and most published books (and
| more...) what else is there to do? You can't scale up massively
| anymore.
 
  | Animats wrote:
  | Right. They've already sucked in most of the good general
  | sources of information. Adding vast amounts of low-quality
  | content probably won't help much and might degrade the quality
  | of the trained model.
 
  | rvnx wrote:
  | Video content (I don't know why someone flagged Jason for
  | saying such, he is totally right)
 
    | bheadmaster wrote:
    | Looking at his post history, seems like he was shadowbanned.
 
  | kolinko wrote:
  | You can generate textual examples that teach logic, multi-
  | dimensional understanding and so on - similar to the ones that
  | are in math books, but at a massive scale.
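  | 
  | A hand-rolled example of what I mean - generating chained-
  | reasoning word problems with known answers, as many as you want:
  | 
  |     import random
  | 
  |     def make_example():
  |         a, b, c = (random.randint(2, 9) for _ in range(3))
  |         q = (f"Alice has {a} apples. Bob gives her {b} more, "
  |              f"then she eats {c}. How many are left?")
  |         return q, a + b - c
  | 
  |     for _ in range(3):
  |         q, answer = make_example()
  |         print(q, "=>", answer)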
 
  | machdiamonds wrote:
  | Ilya Sutskever (OpenAI Chief Scientist): "Yeah, I would say the
  | data situation is still quite good. There's still lots to go" -
  | https://youtu.be/Yf1o0TQzry8?t=685
  | 
  | There was a rumor that they were going to use Whisper to
  | transcribe YouTube videos and use that for training. Since it's
  | multimodal, incorporating video frames alongside the
  | transcriptions could significantly enhance its performance.
 
    | it_citizen wrote:
    | I am curious how much content video-to-text transcription
    | would represent compared to pure text. I have no idea.
 
      | [deleted]
 
    | neel8986 wrote:
    | And why will google allow them to do that at scale?
 
      | throwaway5959 wrote:
      | Why would they ask Google for permission?
 
      | HDThoreaun wrote:
      | Can google stop them? It's trivial to download YouTube
      | videos
 
        | unionpivo wrote:
        | It's trivial to download some YouTube videos.
        | 
        | But I am quite sure that if you start doing it at scale,
        | google will notice.
        | 
        | You could be sneaky, but people in this business talk
        | (since they know another good paying job is just around
        | the corner), so it would likely come out.
 
  | mrtksn wrote:
  | You can transcribe all spoken words everywhere and keep the
  | model up to date? Keep indexing new data from chat messages,
  | news articles, new academic work etc.
  | 
  | The data is not finite.
 
    | spaceman_2020 wrote:
    | What about all the siloed content kept inside corporate
    | servers? You won't get normal GPT to train on it, of course,
    | but IBM could build an "IBM-bot" that has all the GPT-4
    | dataset + all of IBM's internal data.
    | 
    | That model might be very well tuned to solve IBM's internal
    | problems.
 
      | treis wrote:
      | I don't think you can just feed it data. You've got to
      | curate it, feed it to the LLM, and then manually
      | check/further train the output.
      | 
      | I also question that most companies have the volume and
      | quality of data worth training on. It's littered with
      | cancelled projects, old products, and otherwise obsolete
      | data. That's going to make your LLM hallucinate/give wrong
      | answers. Especially for regulated and otherwise legally
      | encumbered industries. Like can you deploy a chat bot
      | that's wrong 1% or 0.1% of the time?
 
        | spaceman_2020 wrote:
        | Well, IBM has 350k employees. If training an LLM on
        | curated data costs tens of millions of dollars but ends
        | up reducing headcount by 50k, it would be a massive win
        | for any CEO.
        | 
        | You have to understand that all the incentives are
        | perfectly aligned for corporations to put this to work,
        | even spending tens of millions in getting it right.
        | 
        | The first corporate CEO who announces that his company
        | used AI to reduce employee costs while _increasing_
        | profits is going to get such a fat bonus that everyone
        | will follow along.
 
      | Vrondi wrote:
      | Since GPT-4 is being integrated into the MS Office
      | suite, this is an "in" to corporate silos. The MS cloud
      | apps can see inside a great many of those silos.
 
  | [deleted]
 
  | nabnob wrote:
  | Real answer? Buy proprietary data from social media companies,
  | credit card companies, retail companies and train the model on
  | that data.
 
    | eukara wrote:
    | Can't wait for us to be able to query GPT for people's credit
    | card info
 
  | m4jor wrote:
  | They didn't train it on the entire internet tho, only a small
  | amount (in comparison to the entire internet). Still plenty they
  | could do.
 
  | sebzim4500 wrote:
  | I doubt they have trained on 0.1% of the tokens that are
  | 'easily' available (that is, available with licensing deals
  | that are affordable to OpenAI/MSFT).
  | 
  | They might have trained on a lot of the 'high quality' tokens,
  | however.
 
  | neel8986 wrote:
  | YouTube. This is where Google has a huge advantage, having the
  | largest collection of user-generated video.
 
    | sebzim4500 wrote:
    | Yeah, but it's not like the videos are private. Surely Amazon
    | has the real advantage, given they have a ton of high quality
    | tokens in the form of their kindle library and can make it
    | difficult for OpenAI to read them all.
 
  | JasonZ2 wrote:
  | Video.
  | 
  | > YouTubers upload about 720,000 hours of fresh video content
  | per day. Over 500 hours of video were uploaded to YouTube per
  | minute in 2020, which equals 30,000 new video uploads per hour.
  | Between 2014 and 2020, the number of video hours uploaded grew
  | by about 40%.
 
    | sottol wrote:
    | But what are you mostly "teaching" the LLM then? Mundane
    | everyday stuff? I guess that would make them better at "being
    | an average human", but is that what we want? It already seems
    | that prompting the LLM to be above-average ("pretend to be an
    | expert") improves performance.
 
      | dougmwne wrote:
      | This whole conversation about training set size is bizarre.
      | No one ever asks what's in the training set. Why would a
      | trillion tokens of mundane gossip improve a LLMs ability to
      | do anything valuable at all?
      | 
      | If a scrape of the general internet, scientific papers and
      | books isn't enough, a trillion trillion trillion text
      | messages to mom aren't going to change matters.
 
  | spaceman_2020 wrote:
  | If you were devious enough, you could be listening in on
  | billions of phone conversations and messages and adding that to
  | your data set.
  | 
  | This also makes me doubt that NSA hasn't already cracked this
  | problem. Or that China won't eventually beat current western
  | models since it will likely have way more data collected from
  | its citizenry.
 
    | PUSH_AX wrote:
    | I wonder what percentage of phone calls would add anything
    | meaningful to models; I imagine that the nature of most phone
    | calls is both highly personal and fairly boring.
 
      | midland_trucker wrote:
      | That's a fair point. Not at all like training on Wikipedia
      | in which nearly every sentence has novelty to it.
      | 
      | Then again it would give you data on every accent in the
      | country, so the holy grail for modelling human speech.
 
  | fpgaminer wrote:
  | > Once you've trained on the internet and most published books
  | (and more...) what else is there to do? You can't scale up
  | massively anymore.
  | 
  | Dataset size is not relevant to predicting the loss threshold
  | of LLMs. You can keep pushing loss down by using the same sized
  | dataset, but increasingly larger models.
  | 
  | Or augment the dataset using RLHF, which provides an "infinite"
  | dataset to train LLMs on. That is limited by the capabilities of
  | the scoring model, but of course you can scale the scoring model
  | too, so again the limit isn't dataset size but training compute.
 
    | midland_trucker wrote:
    | > Dataset size is not relevant to predicting the loss
    | threshold of LLMs. You can keep pushing loss down by using
    | the same sized dataset, but increasingly larger models.
    | 
    | Deepmind and others would disagree with you! No-one really
    | knows in actual fact.
    | 
    | [1] https://www.deepmind.com/publications/an-empirical-
    | analysis-...
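    | 
    | For reference, the loss fit in that paper has the form
    | L(N, D) = E + A/N^a + B/D^b, i.e. both parameter count N and
    | data D appear. A toy calculation (constants quoted from memory
    | as approximate - check the paper):
    | 
    |     E, A, B = 1.69, 406.4, 410.7
    |     alpha, beta = 0.34, 0.28
    | 
    |     def loss(n_params, n_tokens):
    |         return E + A / n_params**alpha + B / n_tokens**beta
    | 
    |     print(loss(70e9, 1.4e12))   # roughly Chinchilla scale
    |     print(loss(700e9, 1.4e12))  # 10x params, same data
    |     print(loss(700e9, 14e12))   # 10x params and 10x data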
 
| throwaway22032 wrote:
| I don't understand why size is an issue in the way that is being
| claimed here.
| 
| Intelligence isn't like processor speed. If I have a model that
| has (excuse the attempt at a comparison) 200 IQ, why would it
| matter that it runs more slowly than a human?
| 
| I don't think that, for example, Feynman at half speed would have
| had substantially fewer insights.
 
  | yunwal wrote:
  | We're not going to get a 200 IQ model by simply scaling up the
  | current model, even with all the datacenters in the world
  | running 24/7
 
| narrator wrote:
| "Altman said there are also physical limits to how many data
| centers the company can build and how quickly it can build them."
| 
| Maybe the economics are starting to get bad? An H100 has 80GB of
| VRAM. The highest-end system I can find is 8xH100, so is a 640GB
| model the biggest model you can run on a single system?
| Already GPT-4 is throttled and has a waiting list and they
| haven't even released the image processing or integrations to a
| wide audience.
 
| matchagaucho wrote:
| I wonder how much the scarcity and cost of Nvidia GPUs is driving
| this message?
| 
| Nvidia is in a perfect "Arms Dealer" situation right now.
| 
| Wouldn't be surprised to see the next exponential leap in AI
| models trained on in-house proprietary GPU hardware
| architectures.
 
  | TheDudeMan wrote:
  | Google has been using TPUs for years and continuously improving
  | the designs.
 
| screye wrote:
| small AI model != cheap AI model.
| 
| It costs the same to train as these giant models. You merely
| spend the money on training it for longer instead of making it
| larger.
 
| mupuff1234 wrote:
| Ok cool, so release the weights and your research.
 
| Bjorkbat wrote:
| Something kind of funny (but mostly annoying), about this
| announcement is the people arguing that OpenAI is, in fact,
| working on GPT-5 _in secret_.
| 
| To my knowledge, NFT/crypto hype never got so bad that conspiracy
| theories began to circulate (though I'm sure there were some if
| you looked hard enough).
| 
| Can't wait for an AIAnon community to emerge.
 
  | ryanwaggoner wrote:
  | Isn't it obvious? Q is definitely an LLM, trained on trillions
  | of words exfiltrated from our nation's secure systems. This
  | explains why it's always wrong in its predictions: it's
  | hallucinating!
 
| aaroninsf wrote:
| "...for the current cycle, in our specific public-facing market."
| 
| As most here well know "over" is one of those words like "never"
| which particularly in this space should pretty much always be
| understood as implicitly accompanied by a footnote backtracking
| to include near-term scope.
 
| iandanforth wrote:
| There's plenty of room for models to continue to grow once
| efficiency is improved. The basic premise of the Google ML
| pathways project is sound, you don't have to use all the model
| all the time. By moving to sparse activations or sparse
| architectures you can do a lot more with the same compute. The
| effective model size might be 10x or 100x GPT-4 (speculated at 1T
| params) but require comparable or less compute.
| 
| While not a perfect analogy it's useful to remember that the
| human brain has far more "parameters", requires several orders of
| magnitude less energy to train and run, is highly sparse, and
| does a decent job at thinking.
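| 
| The sparse-activation idea in a toy snippet: route each input
| through only the top-k of N expert sub-networks, so compute per
| token stays roughly flat while total parameters grow (a sketch of
| the gating idea only, not any particular architecture):
| 
|     import random
| 
|     NUM_EXPERTS, TOP_K = 8, 2
|     # Each "expert" is just a stand-in function here.
|     experts = [lambda x, i=i: x * (i + 1) for i in range(NUM_EXPERTS)]
| 
|     def forward(x):
|         # A learned router would produce these scores; fake them.
|         scores = [random.random() for _ in range(NUM_EXPERTS)]
|         chosen = sorted(range(NUM_EXPERTS),
|                         key=lambda i: scores[i])[-TOP_K:]
|         # Only TOP_K of the NUM_EXPERTS experts actually run.
|         return sum(experts[i](x) * scores[i] for i in chosen)
| 
|     print(forward(1.0))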
 
| seydor wrote:
| Now we need another letter
 
| enduser wrote:
| "When we set the upper limit of PC-DOS at 640K, we thought nobody
| would ever need that much memory."
| 
|  _Bill Gates_
 
  | bagels wrote:
  | Gates has denied saying this. Are you implying by analogy that
  | Altman hasn't said/will disclaim saying that "the age of giant
  | AI models is almost over"?
 
| lossolo wrote:
| We arrived at the top of the tree in our journey to the moon.
 
  | daniel_reetz wrote:
  | "You can't get to the moon by climbing successively taller
  | trees"
 
  | og_kalu wrote:
  | No we haven't. The title is misleading; there's plenty of scale
  | room left. Part of it might just not be economical (parameter
  | size), but there's data. If you take this to mean "we're at a
  | dead end", you'd be very wrong.
 
| pixl97 wrote:
| The Age of Giants is over... The Age of Behemoths has begun!
| 
|  _but sir, that means the same thing_
| 
| Throw this heretic into the pit of terror.
 
  | hanselot wrote:
  | The pit of terror is full.
  | 
  | Fine, to the outhouse of madness then.
  | 
  | Before I get nuked from orbit for daring to entertain humor:
  | suppose someone is running far ahead of me in a marathon, still
  | broadcasting things back to the slow people (like myself). If,
  | just as we start to catch up, they suddenly say "you know what
  | guys, we should stop running in this direction, there's nothing
  | to see here" - right before anyone else is able to verify the
  | veracity of their statement - then perhaps it would still be in
  | the public interest for at least one person to verify what they
  | are saying. Given how skeptical the internet at large has been
  | of Musk's acquisition of a company, it's interesting that the
  | skepticism is suddenly put on hold when looking at this part of
  | his work...
 
    | [deleted]
 
| zwieback wrote:
| The age of CEOs that recently got washed to the top saying
| dumbish things is just starting, though.
 
| xt00 wrote:
| Saying "hey don't go down the path we are on, where we are making
| money and considered the best in the world.. it's a dead end"
| rings pretty hollow.. like "don't take our lunch please?" Might
| be a similar statement it feels..
 
  | whywhywhywhy wrote:
  | Everyone hoping to compete with OpenAI should have an "Always
  | do the opposite of what Sam says" sign on the wall.
 
  | thewataccount wrote:
  | Nah - GPT-4 is crazy expensive; paying $20/mo only gets you
  | 25 messages per 3 hours, and it's crazy slow. The API is rather
  | expensive too.
  | 
  | I'm pretty sure that GPT-4 is ~1T-2T parameters, and they're
  | struggling to run it (at reasonable performance and profit). So
  | far their strategy has been to 10x the parameter count every
  | GPT generation, and the problem is that there are diminishing
  | returns every time they do that. AFAIK they've now resorted to
  | chunking GPT through the GPUs because of the 2 to 4 terabytes
  | of VRAM required (at 16bit).
  | 
  | So now they've reached the edge of what they can reasonably
  | run, and even if they do 10x it the expected gains are less. On
  | top of this, models like LLaMa have shown that it's possible to
  | cut the parameter count substantially and still get decent
  | results (albeit the open-source stuff still hasn't caught up).
  | 
  | On top of all of this, keep in mind that at 8bit resolution
  | 175B parameters (GPT-3.5) requires over 175GB of VRAM. This is
  | crazy expensive and would never fit on consumer devices. Even
  | if you use quantization and use 4bit, you still need over 80GB
  | of VRAM.
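  | 
  | Back-of-the-envelope, since the weights alone dominate:
  | 
  |     def weight_vram_gb(params_billions, bits_per_param):
  |         # weights only; ignores KV cache, activations, overhead
  |         return params_billions * 1e9 * bits_per_param / 8 / 1e9
  | 
  |     print(weight_vram_gb(175, 16))   # 350.0 GB (fp16)
  |     print(weight_vram_gb(175, 8))    # 175.0 GB
  |     print(weight_vram_gb(175, 4))    # 87.5 GB - still > one 80GB card
  |     print(weight_vram_gb(1000, 16))  # 2000.0 GB for a ~1T model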
  | 
  | This definitely is not a "throw them off the trail" tactic - in
  | order for this to actually scale the way everyone envisions
  | both in performance and running on consumer devices - research
  | HAS to be on improving the parameter count. And again there's
  | lots of research showing its very possible to do.
  | 
  | tl;dr: smaller = cheaper+faster+more accessible+same
  | performance
 
    | haxton wrote:
    | I don't think this argument really holds up.
    | 
    | GPT3 on release was more expensive ($0.06/1000 tokens vs
    | $0.03 input and $0.06 output for GPT4).
    | 
    | Reasonable to assume that in 1-2 years it will also come down
    | in cost.
 
      | thewataccount wrote:
      | > Reasonable to assume that in 1-2 years it will also come
      | down in cost.
      | 
      | Definitely. I'm guessing they used something like
      | quantization to optimize the vram usage to 4bit. The thing
      | is that if you can't fit the weights in memory then you
      | have to chunk it and that's slow = more gpu time = more
      | cost. And even if you can fit it in GPU memory, less memory
      | = fewer GPUs needed.
      | 
      | But we know you _can_ use fewer parameters, and that the
      | training data + RLHF makes a massive difference in quality.
      | And the model size linearly relates to the VRAM
      | requirements/cost.
      | 
      | So if you can get a 60B model to run at 175B quality, then
      | you've cut your memory requirements to roughly a third, and
      | can now run (with 4bit quantization) on a single 80GB A100,
      | which is 1/8th of the previously known 8x A100s that GPT-3.5
      | ran on (and still half of GPT-3.5 at 4bit).
      | 
      | Also while openai likely doesn't want this - we really want
      | these models to run on our devices, and LLaMa+finetuning
      | has shown promising improvements (not there just yet) at 7B
      | size which can run on consumer devices.
 
    | whywhywhywhy wrote:
    | It's never been in OpenAI's interest to make their model
    | affordable or fast; they're actually incentivized to do the
    | opposite as an excuse to keep the tech locked up.
    | 
    | This is why Dall-e 2 ran in a data centre and Stable
    | Diffusion runs on a gamer GPU
 
      | thewataccount wrote:
      | I think you're mixing the two. They do have an incentive to
      | make it affordable and fast because that increases the use
      | cases for it, and the faster it is the cheaper it is for
      | them, because the expense is compute time (half the time ~=
      | half the cost).
      | 
      | > This is why Dall-e 2 ran in a data centre and Stable
      | Diffusion runs on a gamer GPU
      | 
      | This is absolutely why they're keeping it locked up. By
      | simply not releasing the weights, you can't run DALL-E 2
      | locally, and yeah, they don't want to do this because they
      | want you to be locked to their platform, not running it for
      | free locally.
 
    | ericmcer wrote:
    | Yeah, I am noticing this as well. GPT enables you to do
    | difficult things really easily, but then it is so expensive
    | that you would need to replace it with custom code for any
    | long-term solution.
    | 
    | For example: you could use GPT to parse a resume file, pull
    | out work experience, and return it as JSON. That would take
    | minutes to set up using the GPT API and it would take weeks
    | to build your own system, but GPT is so expensive that
    | building your own system is totally worth it.
    | 
    | Unless they can seriously reduce how expensive it is I don't
    | see it replacing many existing solutions. Using GPT to parse
    | text for a repetitive task is like using a backhoe to plant
    | flowers.
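    | 
    | For anyone curious, the "minutes to set up" part really is
    | about this much code (a sketch against the early-2023 openai
    | Python client; the prompt and field names are made up for
    | illustration, and the output still needs validation):
    | 
    |     import json
    |     import openai  # pip install openai (0.27.x era)
    | 
    |     openai.api_key = "sk-..."
    | 
    |     PROMPT = ("Extract work experience from the resume as "
    |               "JSON: a list of {company, title, start, end}. "
    |               "Return only JSON.")
    | 
    |     def extract_experience(resume_text):
    |         resp = openai.ChatCompletion.create(
    |             model="gpt-3.5-turbo",
    |             temperature=0,
    |             messages=[
    |                 {"role": "system", "content": PROMPT},
    |                 {"role": "user", "content": resume_text},
    |             ],
    |         )
    |         # may raise if the model strays from pure JSON
    |         return json.loads(resp.choices[0].message.content)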
 
      | mejutoco wrote:
      | You could use those examples to finetune a model only for
      | resume-data extraction.
 
      | abraae wrote:
      | > For example: you could use GPT to parse a resume file,
      | pull out work experience and return it as JSON. That would
      | take minutes to setup using the GPT API and it would take
      | weeks to build your own system, but GPT is so expensive
      | that building your own system is totally worth it.
      | 
      | True, but an HR SaaS vendor could use that to put on a
      | compelling demo to a potential customer, stopping them from
      | going to a competitor or otherwise benefiting.
      | 
      | And anyway, without crunching the numbers, for volumes of,
      | say, 1M resumes (at which point you've achieved a lot of
      | success) I can't quite believe it would be cheaper to build
      | something when there is such a powerful solution available.
      | Maybe once you are at 1G resumes... My bet is still no,
      | though.
 
        | thewataccount wrote:
        | I work for a company on the web development team. We
        | have ~6 software developers.
        | 
        | I'd love to be able to just have people submit their
        | resumes and extract the data from there, but instead I'm
        | going to build a form and make applicants fill it out,
        | because ChatGPT is going to cost at least $0.05 USD
        | depending on the length of the resume.
        | 
        | I'd also love to have mini summaries of order returns
        | written up in human form, but that also would cost
        | $0.05 USD per form.
        | 
        | The tl;dr here is that there's a TON of use cases for an
        | LLM outside of our core product (we sell clothes) - but
        | we can't currently justify that cost. Compare that to the
        | rapidly improving self-hosted solutions, which don't cost
        | $0.05 USD for literally any query (and likely more for
        | anything useful).
 
        | sitkack wrote:
        | 5 cents. Per resume. $500 per 10k. 1-3 hours of a fully
        | loaded engineer's salary per year. You are being
        | criminally cheap.
 
        | thewataccount wrote:
        | The problem is that it would take us the same amount of
        | time to just add a form with Django. Plus you have to
        | handle failure cases, etc.
        | 
        | And yeah I agree this would be a great use-case, and
        | isn't that expensive.
        | 
        | I'd like to do this in lots of places, and the problem is
        | I have to convince my boss to pay for something that
        | otherwise would have been free.
        | 
        | The conversation would be "We have to add these fields to
        | our model, and we either tell Django to add a form for
        | them, which will have 0 ongoing cost and no reliance on a
        | third party,
        | 
        | or we send the resume to OpenAI, pay for them to process
        | it, build some mechanism to sanity-check what GPT is
        | responding with, alert us if there are issues, and then
        | put it into that model, and pay 5 cents per resume."
        | 
        | > 1-3 hours of a fully loaded engineer's salary per year.
        | 
        | That's assuming 0 time to implement, and because of our
        | framework it would take more hours to implement the
        | OpenAI solution (it's also more like 12 hours where we
        | are).
        | 
        | > $500 per 10k.
        | 
        | I can't stress this enough - the alternative is $0 per
        | 10k. My boss wants to know why we would pay any money for
        | a less reliable solution (GPT serialization is not nearly
        | as reliable as a standard Django form).
        | 
        | I think within the next few years we'll be able to run
        | the model locally and throw dozens of tasks just like
        | this at the LLM, just not yet.
 
        | marketerinland wrote:
        | There are excellent commercial AI resume parsers already
        | - Affinda.com being one. Not expensive and takes minutes
        | to implement.
 
        | ericmcer wrote:
        | For a big company that is nothing, but if you are
        | bootstrapping and trying to acquire customers with an
        | MVP, racking up a $500 bill is frightening. What if you
        | offer a free trial, blow up, and end up with a $5k+ bill?
 
        | yunwal wrote:
        | Also you could likely use GPT3.5 for this and still get
        | near perfect results.
 
        | thewataccount wrote:
        | > near perfect results.
        | 
        | I have tried GPT-3.5 and GPT-4 for this type of task - the
        | "near perfect results" part is really problematic because
        | you need to verify that it's likely correct, have it
        | notify you if there are issues, and even then you aren't
        | 100% sure that it selected the correct first/last name.
        | 
        | Compare that to a standard HTML form, which is very
        | reliable and (for us) automatically has error handling
        | built in, including alerts to us if there's a 504.
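        | 
        | Concretely, "verify that it's likely correct" ends up
        | being something like this (a sketch, not our actual
        | code):
        | 
        |     REQUIRED = {"first_name", "last_name", "experience"}
        | 
        |     def problems_with(parsed):
        |         # sanity-check the model output before it
        |         # touches the database; a non-empty result
        |         # means flag it for manual review / alert us
        |         issues = []
        |         if not isinstance(parsed, dict):
        |             return ["not a JSON object"]
        |         missing = REQUIRED - parsed.keys()
        |         if missing:
        |             issues.append(f"missing fields: {missing}")
        |         exp = parsed.get("experience")
        |         if not isinstance(exp, list):
        |             issues.append("experience is not a list")
        |         return issues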
 
  | Freire_Herval wrote:
  | [dead]
 
  | og_kalu wrote:
  | It's a pretty sus argument for sure when they're scared to
  | release even the parameter count.
  | 
  | Although the title is a bit misleading about what he was
  | actually saying, there's still a lot left to go in terms of
  | scale. Even if it isn't parameter size (and there's still lots
  | of room here too, it just won't be economical), contrary to
  | popular belief, there's lots of data left to mine.
 
| dpflan wrote:
| Hm, all right. I'm guessing that huge models as a business maybe
| are over until the economics are figured out, but huge models as
| experts for knowledge distillation seem reasonable. And if you
| pay a super premium, perhaps you can still use a huge model.
 
| [deleted]
 
  | Freire_Herval wrote:
  | [dead]
 
| bob1029 wrote:
| I strongly believe the next generation of models will be based
| upon spiking neural concepts wherein action potentials are
| lazily-evaluated throughout the network (i.e. event-driven).
| There are a few neuron models that can be modified (at some
| expense to fidelity) in order to tolerate arbitrary delays
| between simulation ticks. Using _actual_ latency between neurons
| as a means of encoding information seems absolutely essential if
| we are trying to emulate biology in any meaningful way.
| 
| Spiking networks also lend themselves nicely to some elegant
| learning rules, such as STDP. Being able to perform unsupervised
| learning at the grain of each action potential is really
| important in my mind. This gives you all kinds of ridiculous
| capabilities, most notably being the ability to train the model
| while it's live in production (learning & use are effectively the
| same thing).
| 
| These networks also provide a sort of deterministic, event-over-
| time tracing that is absent in the models we see today. In my
| prototypes, the action potentials are serialized through a ring
| buffer, and then logged off to a database in order to perfectly
| replay any given session. This information can be used to
| bootstrap the model (offline training) by "rewinding" things very
| precisely and otherwise branching time to your advantage.
| 
| The #1 reason I've been thinking about this path is that low-
| latency, serialized, real-time signal processing is somewhat
| antagonistic to GPU acceleration. I fear there is an appreciable
| % of AI research predicated on some notion that you need at least
| 1 beefy GPU to start doing your work. Looking at fintech, we are
| able to discover some very interesting pieces of technology which
| can service streams of events at unbelievable rates and scales -
| and they only depend on a handful of CPU cores in order to
| achieve this.
| 
| Right now, I think A Time Domain Is All You Need. I was inspired
| to go outside of the box by this paper:
| https://arxiv.org/abs/2304.06035. Part 11 got me thinking.
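| 
| To make the "lazily-evaluated action potentials" idea concrete,
| here is a toy sketch of an event-driven leaky integrate-and-fire
| pair with a crude pair-based STDP rule (my own illustration, not
| the prototype described above; the constants are arbitrary):
| 
|     import math
|     import heapq
| 
|     TAU_M = 20.0     # membrane time constant (ms)
|     TAU_STDP = 20.0  # plasticity time constant (ms)
|     THRESHOLD = 1.0
|     DELAY = 2.0      # axonal delay (ms)
| 
|     class Neuron:
|         def __init__(self):
|             self.v = 0.0
|             self.last_update = 0.0
|             self.last_spike = None
| 
|         def receive(self, t, weight):
|             # lazy decay since the previous event, then integrate
|             self.v *= math.exp(-(t - self.last_update) / TAU_M)
|             self.last_update = t
|             self.v += weight
|             if self.v >= THRESHOLD:
|                 self.v = 0.0
|                 self.last_spike = t
|                 return True  # emitted a spike
|             return False
| 
|     pre, post = Neuron(), Neuron()
|     w = 0.6  # plastic synaptic weight
|     events = [(t, "pre") for t in (1.0, 3.0, 5.0, 40.0, 42.0)]
|     heapq.heapify(events)
| 
|     while events:
|         t, kind = heapq.heappop(events)
|         if kind == "pre":          # presynaptic input spike
|             pre.last_spike = t
|             heapq.heappush(events, (t + DELAY, "syn"))
|         elif kind == "syn":        # spike arrives at synapse
|             if post.receive(t, w):
|                 # pre fired before post: potentiate (simplified)
|                 dt = t - pre.last_spike
|                 w += 0.1 * math.exp(-dt / TAU_STDP)
|                 print(f"t={t:4.1f} post spiked, w -> {w:.3f}")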
 
  | MagicMoonlight wrote:
  | I know what it looks like in my head but I can't quite figure
  | the algorithm out. The spiking is basically reinforcement
  | learning at the neuron level. Get it right and it's basically
  | all you need. You don't even need training data because it will
  | just automagically learn from the data it sees.
 
  | eternalban wrote:
  | I'm bullish on SNNs too. This Chinese research group is doing
  | something quite comprehensive with them:
  | 
  | https://news.ycombinator.com/item?id=35037605
 
| tfehring wrote:
| Related reading: https://dynomight.net/scaling/
| 
| In short it seems like virtually all of the improvement in future
| AI models will come from better algorithms, with bigger and
| better data a distant second, and more parameters a distant
| third.
| 
| Of course, this claim is itself internally inconsistent in that
| it assumes that new algorithms won't alter the returns to scale
| from more data or parameters. Maybe a more precise set of claims
| would be (1) we're relatively close to the fundamental limits of
| transformers, i.e., we won't see another GPT-2-to-GPT-4-level
| jump with current algorithms; (2) almost all of the incremental
| improvements to transformers will require bigger or better-
| quality data (but won't necessarily require more parameters); and
| (3) all of this is specific to current models and goes out the
| window as soon as a non-transformer-based generative model
| approaches GPT-4 performance using a similar or lesser amount of
| compute.
 
  | strangattractor wrote:
  | Good thing he got a bunch of companies to pony up the dough for
  | LLM before he announced they where already over.
 
    | tfehring wrote:
    | I don't think LLMs are over [0]. I think we're relatively
    | close to a local optimum in terms of what can be achieved
    | with current algorithms. But I think OpenAI is at least as
    | likely as any other player to create the next paradigm, and
    | that it's at least as likely as any other player to
    | develop the leading models within the next paradigm
    | regardless of who actually publishes the research.
    | 
    | Separately, I think OpenAI's current investors have a >10%
    | chance to hit the 100x cap on their returns. Their current
    | models are already good enough to address lots of real-world
    | problems that people will pay money to solve. So far they've
    | been much more model-focused than product-focused, and by
    | turning that dial toward the product side (as they did with
    | ChatGPT) I think they could generate a lot of revenue
    | relatively quickly.
    | 
    | [0] Except maybe in the sense that future models will be
    | predominantly multimodal and therefore not strictly LLMs. I
    | don't think that's what you're suggesting though.
 
      | jacobr1 wrote:
      | It is already relatively trivial to fine-tune generative
      | models for various use cases, which implies huge gains to
      | be had with targeted applications - not just for niche
      | players, but also for OpenAI and others to either build
      | that fine-tuning into the base system, build ecosystems
      | around it, or just purpose-build applications on top.
 
  | no_wizard wrote:
  | All the LC grinding may come in handy after all! /s
  | 
  | Which algorithms specifically show the most results upon
  | improvement? Going into this I thought the jump in improvements
  | was really related to more advanced automated tuning and result
  | correction, which could be done _at scale_, as it were,
  | allowing a small team of data scientists to tweak the models
  | until the desired results were achieved.
  | 
  | Are you saying instead that concrete predictive algorithms need
  | improvement, or are we lumping the tuning into this?
 
    | junipertea wrote:
    | We need more data-efficient neural network architectures.
    | Transformers work exceptionally well because they allow us to
    | just dump more data into them, but ultimately we want to learn
    | advanced behavior without having to feed them all of
    | Shakespeare.
 
      | uoaei wrote:
      | Inductive Bias Is All You Need
 
    | tfehring wrote:
    | I think it's unlikely that the first model to be widely
    | considered AGI will be a transformer. Recent improvements to
    | computational efficiency for attention mechanisms [0] seem to
    | improve results a lot, as does RLHF, but neither is a
    | paradigm shift like the introduction of transformers was.
    | That's not to downplay their significance - that class of
    | incremental improvements has driven a massive acceleration in
    | AI capabilities in the last year - but I don't think it's
    | ultimately how we'll get to AGI.
    | 
    | [0] https://hazyresearch.stanford.edu/blog/2023-03-27-long-
    | learn...
 
    | goldenManatee wrote:
    | bubble sort /s
 
    | uoaei wrote:
    | Traditional CS may have something to do with slightly
    | improving the performance by allowing more training for the
    | same compute, but it won't be an order of magnitude or more.
    | The improvements to be gained will be found more in
    | statistics than CS per se.
 
      | jacobr1 wrote:
      | I'm not sure. Methods like Chinchilla-style scaling and
      | quantization have been able to reduce compute by more than
      | an order of magnitude. There might very well be a few more
      | levels of optimization within the same statistical paradigm.
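      | 
      | Rough illustration of the Chinchilla point, using the usual
      | ~6*N*D FLOPs approximation and the paper's Gopher vs
      | Chinchilla configurations (numbers are ballpark):
      | 
      |     def train_flops(params, tokens):
      |         # ~6 FLOPs per parameter per token (rule of thumb)
      |         return 6 * params * tokens
      | 
      |     gopher = train_flops(280e9, 300e9)      # 280B, 300B tok
      |     chinchilla = train_flops(70e9, 1.4e12)  # 70B, 1.4T tok
      |     print(f"{gopher:.1e} vs {chinchilla:.1e} FLOPs")
      |     # ~5.0e+23 vs ~5.9e+23: similar training compute, but
      |     # the 70B model matched or beat the 280B one and is ~4x
      |     # cheaper at inference - that's where the big wins are.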
 
  | brucethemoose2 wrote:
  | _Better_ data is still critical, even if bigger data isn't.
  | The linked article emphasizes this.
 
    | tfehring wrote:
    | I'd bet on a 2030 model trained on the same dataset as GPT-4
    | over GPT-4 trained with perfect-quality data, hands down. If
    | data quality were that critical, practitioners could ignore
    | the Internet and just train on books and scientific papers
    | and only sacrifice <1 order of magnitude of data volume.
    | Granted, that's not a negligible amount of training data to
    | give up, but it places a relatively tight upper bound on the
    | potential gain from improving data quality.
 
    | NeuroCoder wrote:
    | So true. There are still plenty of areas where we lack
    | sufficient data to even approach applying this sort of model.
    | How are we going to make similar advances in something like
    | medical informatics, where we not only have less data readily
    | available but it's much more difficult to acquire more?
 
| winddude wrote:
| Also, scaling doesn't address some of the challenges for AI that
| ChatGPT doesn't meet, like:
| 
| - learning to learn, aka continual learning
| 
| - internalised memory
| 
| bringing it closer to actual human capabilities.
 
| arenaninja wrote:
| An amusing thought I've had recently is whether LLMs are in the
| same league as the millions of monkeys at the keyboard,
| struggling to reproduce one of the complete works of William
| Shakespeare.
| 
| But I think not, since monkeys probably don't "improve"
| noticeably with time or input.
 
  | mhb wrote:
  |  _But I think not, since monkeys probably don't "improve"
  | noticeably with time or input._
  | 
  | Maybe once tons of bananas are introduced...
 
| mromanuk wrote:
| Sorry, but this sounds a lot like "640KB is all the memory you
| will ever need." What about a "Socratic model" for video? There
| should be many applications that would benefit from a bigger
| model.
 
| joebiden2 wrote:
| We will need a combination of technologies we have in order to
| really achieve emergent intelligence.
| 
| Humans are composed of various "subnets" modelling aspects
| which, in unison, produce self-consciousness and real
| intelligence. What is missing in the current line of approaches
| is that we rely only on auto-alignment of subnetworks by machine
| learning, which scales only up to a point.
| 
| If we were to produce a model which has:
| 
| * something akin to an LLM as we know it today, which is able to
| 
| * store or fetch facts in a short-term ("context") or long-term
| ("memory") store
| 
| * if not in the current "context", query the long-term store
| ("memory") by keywords for associations, which are inserted
| one-by-one into the current "context"
| 
| * repeat as required until fulfilling some self-defined condition
| ("thinking") - see the sketch below
| 
| To me, this is mostly mechanical plumbing work and lots of money.
| 
| Also, if we get rid of the "word-boundedness" of LLMs - which we
| already did to some degree, as shown by the multi-language
| capabilities - LLMs would be free to roam in the domain of
| thoughts /s :)
| 
| This approach could be further improved by meta-LLMs governing
| the longterm memory access, providing an "intuition" which
| longterm memory suits the provided context best. Apply recursion
| as needed to improve results (paying by exponential training
| time, but this meta-NN will quite probably be independent of
| actual training, as real life / brain organization shows).
 
  | babyshake wrote:
  | The other elements that may be required could be some version
  | of the continuous sensory input that to us creates the
  | sensation of "living" and, this one is a bit more
  | philosophical, the sensation of suffering and a baseline
  | establishment that the goal of the entity is to take actions
  | that help it avoid suffering.
 
    | joebiden2 wrote:
    | I think an AI may have extra qualities by feeling suffering
    | etc., but I don't think these extra qualities are rationally
    | beneficial.
 
| thunderbird120 wrote:
| >"the company's CEO, Sam Altman, says further progress will not
| come from making models bigger. "I think we're at the end of the
| era where it's going to be these, like, giant, giant models," he
| told an audience at an event held at MIT late last week. "We'll
| make them better in other ways."
| 
| So to reiterate, he is not saying that the age of giant AI models
| is over. Current top-of-the-line AI models are giant and likely
| will continue to be. However, there's no point in training
| models you can't actually run economically. Inference costs need
| to stay grounded which means practical model sizes have a limit.
| More effort is going to go into making models efficient to run
| even if it comes at the expense of making them less efficient to
| train.
 
  | ldehaan wrote:
  | I've been training large 65B models on "rent for N hours"
  | systems for less than $1k per customized model, then fine-
  | tuning those to be whatever I want for even cheaper.
  | 
  | 2 months since GPT-4.
  | 
  | This ride has only just started, fasten your whatevers.
 
    | Voloskaya wrote:
    | Fine-tuning costs are nowhere near representative of the cost
    | to pre-train those models.
    | 
    | Trying to replicate the quality of GPT-3 from scratch, using
    | all the tricks and training optimizations in the book that
    | are available now but weren't used during GPT-3's actual
    | training, will still cost you north of $500K, and that's
    | being extremely optimistic.
    | 
    | A GPT-4-level model would be at least 10x this using the same
    | optimism (meaning you are managing to train it for much
    | cheaper than OpenAI). And that's just pure hardware cost; the
    | team you need to actually make this happen is going to be
    | very expensive as well.
    | 
    | edit: To quantify how "extremely optimistic" that is, the
    | very model you are fine-tuning, which I assume is LLaMA 65B,
    | would cost around $18M to train on Google Cloud assuming you
    | get a 50% discount on their listed GPU prices (2048 A100 GPUs
    | for 5 months). And that's not even GPT-4 level.
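    | 
    | For reference, the arithmetic behind that figure is just
    | GPU-hours times an hourly rate (the rate here is implied by
    | the estimate, not a quoted price):
    | 
    |     gpus, months = 2048, 5
    |     gpu_hours = gpus * months * 30 * 24  # ~7.4M GPU-hours
    | 
    |     est_cost = 18e6  # the ~$18M above
    |     print(f"{gpu_hours:,} GPU-hours")
    |     print(f"${est_cost / gpu_hours:.2f} per GPU-hour")
    |     # ~7,372,800 GPU-hours -> ~$2.44/GPU-hour after the
    |     # assumed 50% discount, i.e. roughly $4.9/hour list
    |     # price per A100.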
 
      | bagels wrote:
      | $5M to train GPT-4 is the best investment I've ever seen.
      | I've seen startups waste more money for tremendously
      | smaller impact.
 
        | Voloskaya wrote:
        | As I stated in my comment, $5M is assuming you can do a
        | much much better job than OpenAI at optimizing your
        | training, only need to make a single training run, your
        | employees' salaries are $0, and you get a clean dataset
        | for essentially free.
        | 
        | Real cost is 10-20x that.
        | 
        | That's still a good investment though. But the issue is
        | you could very well sink $50M into this endeavour and end
        | up with a model that actually is not really good and gets
        | rendered useless by an open-source model that gets
        | released 1 month later.
        | 
        | OpenAI truly has unique expertise in this field that is
        | very, very hard to replicate.
 
        | moffkalast wrote:
        | > and end up with a model that actually is not really
        | good and gets rendered useless
        | 
        |  _ahem_ Bard _ahem_
 
  | hcks wrote:
  | Yes, but it also tells us that if Altman is honest here, then
  | he doesn't believe GPT-like models can scale to near-human-
  | level performance (because even if the cost of compute were 10x
  | or even 100x, it would still be economically sound).
 
    | [deleted]
 
    | og_kalu wrote:
    | No it doesn't.
    | 
    | For one thing they're already at human performance.
    | 
    | For another, I don't think you realize how expensive
    | inference can get. Microsoft, with no shortage of available
    | compute, is struggling to run GPT-4 to the point that they're
    | rationing it between subsidiaries while they try to add more
    | compute.
    | 
    | So saying it would be economically sound if it cost 10x or
    | 100x what it costs now is a joke.
 
      | quonn wrote:
      | How are they at human performance? Almost everything GPT
      | has read on the internet didn't even exist 200 years ago
      | and was invented by humans. Heck, even most of the
      | programming it does wasn't there 20 years ago.
      | 
      | Not every programmer starting from scratch would be
      | brilliant, but many were self-taught with very limited
      | resources in the 80s, for example, and discovered new
      | things from there.
      | 
      | GPT cannot do this and is very far from being able to.
 
        | og_kalu wrote:
        | >How are they at human performance?
        | 
        | Because it performs at average human level or above
        | (mostly well above average) on basically every task it's
        | given.
        | 
        | "Invent something new" is a nonsensical benchmark for
        | human-level intelligence. The vast majority of people
        | have never and will never invent anything new.
        | 
        | If your general intelligence test can't be passed by a
        | good chunk of humanity then it's not a general
        | intelligence test unless you want to say most people
        | aren't generally intelligent.
 
        | quonn wrote:
        | Yeah these intelligence tests are not very good.
        | 
        | I would argue some programmers do in fact invent
        | something new. Not all of them, but some. Perhaps 10%.
        | 
        | Second, the point is not whether everyone is by profession
        | an inventor but whether most people can be inventors. And
        | to a degree they can be. I think you underestimate that
        | by a large margin.
        | 
        | You can lock people in a room and give them a problem to
        | solve and they will invent a lot if they have the time to
        | do it. GPT will invent nothing right now. It's not there
        | yet.
 
        | og_kalu wrote:
        | >Yeah these intelligence tests are not very good.
        | 
        | Lol Okay
        | 
        | >And to a degree they can be. I think you underestimate
        | that by a large margin.
        | 
        | Do I? Because I'm not the one making unverifiable claims
        | here.
        | 
        | >You can lock people in a room and give them a problem to
        | solve and they will invent a lot if they have the time to
        | do it.
        | 
        | If you say so
 
      | smeagull wrote:
      | This tells me you haven't really stress tested the model.
      | GPT is currently at the stage of "person who is at the
      | meeting, but not really paying attention so you have to
      | call them out". Once GPT is pushed, it scrambles and falls
      | over for most applications. The failure modes range from
      | contradicting itself, making up things for applications
      | that shouldn't allow it, to ignoring prompts, to simply
      | being unable to perform tasks at all.
 
        | dragonwriter wrote:
        | Are we talking about bare GPT through the UI, or GPT with
        | a framework giving it access to external systems and the
        | ability to store and retrieve data?
        | 
        | Because, yeah, "brain in a jar" GPT isn't enough for most
        | tasks beyond parlor-trick chat, but being used as a brain
        | in a jar isn't the point.
 
        | moffkalast wrote:
        | Still waiting to see those plugins rolled out and actual
        | vector DB integration with GPT-4; then we'll see what it
        | can really do. Seems like the more context you give it
        | the better it does, but the current UI really makes it
        | hard to provide that.
        | 
        | Plus the recursive self prompting to improve accuracy.
 
  | mullingitover wrote:
  | Quality over quantity. Just building a model with a gazillion
  | parameters isn't indicative of quality, you could easily have
  | garbage parameters with tons of overfitting. It's like
  | megapixel counts in cameras: you might have 2000 gigapixels in
  | your sensor, but that doesn't mean you're going to get great
  | photos out of it if there are other shortcomings in the system.
 
    | sanxiyn wrote:
    | What overfitting? If anything, LLMs suffer from underfitting,
    | not overfitting. Normally, overfitting is characterized by
    | increasing validation loss while training loss is decreasing,
    | and solved by early stopping (stopping before that happens).
    | Effectively, all LLMs are stopped early, so they don't suffer
    | from overfitting at all.
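    | 
    | Schematically, the early stopping being described is just
    | "watch held-out loss, stop when it turns around" (toy code,
    | with stubs standing in for real training):
    | 
    |     import random
    | 
    |     def train_epoch():
    |         pass  # stub: pretend to train here
    | 
    |     def val_loss(epoch):
    |         # stub: validation loss improves, then degrades
    |         return abs(10 - epoch) + random.random() * 0.1
    | 
    |     best, patience, bad = float("inf"), 3, 0
    |     for epoch in range(100):
    |         train_epoch()
    |         v = val_loss(epoch)  # held-out loss, not train loss
    |         if v < best:
    |             best, bad = v, 0  # still improving: keep going
    |         else:
    |             bad += 1
    |             if bad >= patience:
    |                 print(f"early stop at epoch {epoch}")
    |                 break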
 
  | spaceman_2020 wrote:
  | Is cost really that much of a burden?
  | 
  | Intelligence is the single most expensive resource on the
  | planet. Hundreds of individuals have to be born, nurtured, and
  | educated before you might get an exceptional 135+ IQ
  | individual. Every intelligent person is produced at a great
  | societal cost.
  | 
  | If you can reduce the cost of replicating a 135 IQ, or heck,
  | even a 115 IQ person to a few thousand dollars, you're beating
  | biology by a massive margin.
 
    | oezi wrote:
    | Since IQ is just a normal distribution on a population it is
    | a bit misleading to talk about it like that.
    | 
    | Even if we don't expend any cost on education the number of
    | people with IQ 135 stays the same.
 
    | yunwal wrote:
    | But we're still nowhere near that, or even near surpassing
    | the skill of an average person at a moderately complex
    | information task, and GPT-4 supposedly took hundreds of
    | millions to train. It also costs a decent amount more to run
    | inference on it vs. 3.5. It probably makes sense to prove the
    | concept that generative AI can be used for lots of real work
    | before scaling that up by another order of magnitude for
    | potentially marginal improvements.
    | 
    | Also, just in terms of where to put your effort, if you think
    | another direction (for example, fine-tuning the model to use
    | digital tools, or researching how to predict confidence
    | intervals) is going to have a better chance of success, why
    | focus on scaling more?
 
      | spaceman_2020 wrote:
      | There are a _lot_ of employees at large tech consultancies
      | that don't really do anything that can't be automated away
      | by even current models.
      | 
      | Sprinkle in some more specific training and I can totally
      | see entire divisions at IBM and Accenture and TCS being
      | made redundant.
      | 
      | The incentive structures are perversely aligned for this
      | future - the CEO who manages to reduce headcount while
      | increasing revenue is going to be very handsomely rewarded
      | by Wall Street.
 
        | skyechurch wrote:
        | Wall Street would be strongly incentivised to install an
        | AI CEO.
 
    | dauertewigkeit wrote:
    | Are intelligent people that valuable? There's lots of them at
    | every university working for peanuts. They don't seem to be
    | that valued by society, honestly.
 
      | taylorius wrote:
      | IQ isn't all that. Mine is 140+ and I'm just a somewhat
      | well paid software engineer. It's TOO abstract a metric in
      | my view - for sure it doesn't always translate into real
      | world success.
 
        | roflyear wrote:
        | Right, we're very much in the same boat. I'm good at
        | pattern recognition I guess. I learn things quickly. What
        | else? I don't have magic powers really. I still get
        | headaches and eat junk food.
 
      | spaceman_2020 wrote:
      | If you ask any Fortune 500 CEO if he could magically take
      | all the 135 IQ artists and academics and vagabonds, erase
      | all their past traumas, put them through business or tech
      | school, and put them to work in their company, they would
      | all say 100% yes.
      | 
      | An equivalent AI won't have any agency and will be happy
      | doing the boring work other 135 IQ humans won't.
 
    | roflyear wrote:
    | My IQ is 140 and I'm far from exceptional.
 
    | jutrewag wrote:
    | 115 IQ isn't all that high- that's basically every Indian
    | American or a healthy percentage of the Chinese population.
    | 
    | Edit: I don't understand the downvotes. I don't mean this in
    | any disparaging way, just that an AGI is probably going to be
    | a lot higher than that.
 
      | spaceman_2020 wrote:
      | 115 IQ is perfectly fine for the majority of human
      | endeavors.
 
    | asdfman123 wrote:
    | The reason we put everyone through school is we believe that
    | it's in society's best interest to educate everyone to the
    | peak of their abilities. It's good for many different
    | reasons.
    | 
    | It would be much easier to identify gifted kids and only
    | educate them, but I happen to agree that universal education
    | is better.
 
      | gowld wrote:
      | It would be much easier to identify gifted kids and only
      | educate them
      | 
      | Is it so easy?
 
| LesZedCB wrote:
| The way I see it, the expensive part should be training the
| models via simulated architectures on GPUs or TPUs or whatever.
| 
| But once they are trained, is there a way to encode the base
| models into hardware where inference costs are basically
| negligible? Hopefully somebody is looking into whether this is
| possible, using structurally encoded hardware to make inference
| costs basically nil/constant.
 
| [deleted]
 
| antibasilisk wrote:
| it's over, billions of parameters must be released
 
| rhelz wrote:
| All warfare is based on deception -- Sun Tzu
 
| donpark wrote:
| I think Sam is referring to the transition from "Deep" to "Long"
| learning [1]. What new emergent properties, if any, will 1
| billion tokens unlock?
| 
| [1] https://hazyresearch.stanford.edu/blog/2023-03-27-long-
| learn...
 
| [deleted]
 
| carlsborg wrote:
| The 2017 Transformers paper has ~71,000 papers citing it. The
| sheer magnitude of human mental effort globally that is chasing
| the forefront of machine learning is unprecedented and amazing.
 
___________________________________________________________________
(page generated 2023-04-17 23:00 UTC)