[HN Gopher] Could you train a ChatGPT-beating model for $85k and...
___________________________________________________________________
 
Could you train a ChatGPT-beating model for $85k and run it in a
browser?
 
Author : sirteno
Score  : 297 points
Date   : 2023-03-31 18:21 UTC (4 hours ago)
 
web link (simonwillison.net)
w3m dump (simonwillison.net)
 
| nwoli wrote:
| What we need is a RETRO-style model where basically, after the
| input, you go through a small net that just fetches a desired set
| of weights from a server (serving data without compute is dirt
| cheap), which are then executed locally. We'll get there
| eventually.
 
  | tinco wrote:
  | Can anyone explain, or link some resource on, why none of these
  | big GPT models incorporate any RETRO-style retrieval? I'm only
  | very superficially following ML developments; I was so hyped by
  | RETRO, and then none of the modern world-changing models apply
  | it.
 
    | nwoli wrote:
    | Openai might very well be using that internally who knows how
    | they implement things. Also emad retweeted a RETRO related
    | thing a bit back so they might very well be using that for
    | their awaited LM, here's hoping
 
| ushakov wrote:
| Now imagine loading 3.9 GB each time you want to interact with a
| webpage
 
  | KMnO4 wrote:
  | Yeah, I've used Jira.
 
    | neilellis wrote:
    | :-)
 
  | sroussey wrote:
  | 10yrs from now models will be in the OS. Maybe even in silicon.
  | No downloads required.
 
    | swader999 wrote:
    | The OS will be in the cloud interfacing into our brain by
    | then. I don't want this btw.
 
    | pessimizer wrote:
    | Not in mine. I don't even want redhat's bullshit in there.
    | I'm not installing some black box into my OS that was
    | programmed with _motives_ that can't be extracted from the
    | model at rest.
 
      | sroussey wrote:
      | iOS has already had this to a degree for a couple of years.
 
| brrrrrm wrote:
| The WebGPU demo mentioned in this post is insane. It blows any
| WASM approach out of the water. Unfortunately that performance
| isn't supported anywhere but Chrome Canary (behind a flag).
 
  | raphlinus wrote:
  | This will be changing soon. I believe Chrome M113 is scheduled
  | to ship to stable on May 2, and will support WebGPU 1.0. I
  | agree it's a game-changing technology.
 
| ChumpGPT wrote:
| [dead]
 
| agnokapathetic wrote:
| > My friends at Replicate told me that a simple rule of thumb for
| A100 cloud costs is $1/hour.
| 
| AWS charges $32/hr for an 8xA100 (p4d.24xlarge), which comes out
| to $4/hour/GPU. Yes, you can get lower pricing with a 3-year
| reservation, but that's not what this question is asking.
| 
| You also need 256 nodes to be colocated on the same fabric --
| which AWS will do for you but only if you reserve for years.
 
  | pavelstoev wrote:
  | model-depending, you can train on lesser (cheaper) GPUs but
  | system-level optimizations are needed. Which is what we provide
  | at centml.ai
 
  | sebzim4500 wrote:
  | Maybe they are using spot instances? $1/hr is about right for
  | those.
 
  | thewataccount wrote:
  | AWS certainly isn't the cheapest for this; did they mention
  | using AWS? Lambda Labs is $12/hr for 8xA100s, and there are
  | others relatively close to this price on demand. I assume you
  | can get a better deal if you contact them for a large project.
  | 
  | Replicate themselves rent out GPU time so I assume they would
  | definitely know as that's almost certainly the core of their
  | business.
 
  | IanCal wrote:
  | Lambda Labs charges about $11-12/hr for 8xA100.
 
    | robmsmt wrote:
    | and is completely at capacity
 
      | IanCal wrote:
      | But it reflects an upper bound on the cost of running A100s.
 
  | celestialcheese wrote:
  | Lambda Labs will let you do on-demand 8xA100 @ 80GB VRAM/GPU for
  | $12/hr, or reserved @ $10.86/hr.
  | 
  | 8xA100 @ 40GB is $8/hr.
  | 
  | The Replicate friend isn't far off.
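  | 
  | The per-GPU arithmetic behind that, derived from the prices
  | above:
  | 
  |     rate_80gb = 12 / 8   # $1.50 per 80GB A100-hour, on demand
  |     rate_40gb = 8 / 8    # $1.00 per 40GB A100-hour - matches
  |                          # the $1/hr rule of thumb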
 
| pavelstoev wrote:
| Training a ChatGPT-beating model for much less than $85,000 is
| entirely feasible. At CentML, we're actively working on model
| training and inference optimization without affecting accuracy,
| which can help reduce costs and make such ambitious projects
| realistic. By maximizing (>90%) GPU and platform hardware
| utilization, we aim to bring down the expenses associated with
| large-scale models, making them more accessible for various
| applications. Additionally, our solutions have a positive
| environmental impact, addressing excess CO2 concerns. If you're
| interested in learning more about how we are doing it, please
| reach out via our website: https://centml.ai
 
| astlouis44 wrote:
| WebGPU is going to be a major component in this. The modern GPUs
| prevalent in mobile devices, desktops, and laptops are more than
| enough to do all of this client-side.
 
| nope96 wrote:
| I remember watching one of the final episodes of Connections 3:
| With James Burke, and he casually said we'd have personal
| assistants that we could talk to (in our PDAs). That was 1997 and
| I knew enough about computers to think he was being overly
| optimistic about the speed of progress. Not in our lifetimes.
| Guess I was wrong!
 
| TMWNN wrote:
| Hey, that means it can be turned into an Electron app!
 
| breck wrote:
| Just want to say SimonW has become one of my favorite writers
| covering the AI revolution. Always fun thought experiments with
| linked code and very constructive for people thinking about how
| to make this stuff more accessible to the masses.
 
| fswd wrote:
| Somebody is finetuning the 160M RWKV-4 on Alpaca on the RWKV
| Discord. I am out of the office and can't link, but the person
| posted in the prompt-showcase channel.
 
  | buzzier wrote:
  | RWKV-v4 Web Demo (169m/430m params)
  | https://josephrocca.github.io/rwkv-v4-web/demo/
 
| skybrian wrote:
| I wonder why anyone would want to run it in a browser, other than
| to show it could be done? It's not like the extra latency would
| matter, since these things are slow.
| 
| Running it on a server you control makes more sense. You can pick
| appropriate hardware for running the AI. Then access it from any
| browser you like, including from your phone, and switch devices
| whenever you like. It won't use up all the CPU/GPU on a portable
| device and run down your battery.
| 
| If you want to run the server at home, maybe use something like
| Tailscale?
 
  | simonw wrote:
  | The browser thing is definitely more for show than anything
  | else - I used it to help demonstrate quite how surprisingly
  | lightweight these models can be.
 
| GartzenDeHaes wrote:
| It's interesting to me that the LLaMA-nB models still produce
| reasonable results after 4-bit quantization of the 32-bit
| weights. Does this indicate some possibility of reducing the
| compute required for training?
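| 
| For intuition, a minimal sketch of what a naive per-tensor 4-bit
| (absmax) quantization does - real schemes like llama.cpp's
| quantize in small blocks, but the idea is the same:
| 
|     import numpy as np
| 
|     def quantize_4bit(w):
|         # Map float weights onto 16 signed levels (-8..7).
|         scale = np.abs(w).max() / 7.0
|         q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
|         return q, scale
| 
|     def dequantize(q, scale):
|         return q.astype(np.float32) * scale
| 
|     w = np.random.randn(4096).astype(np.float32)
|     q, scale = quantize_4bit(w)
|     print(np.abs(w - dequantize(q, scale)).mean())  # small error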
 
| lxe wrote:
| Keep in mind that image transformer models like Stable Diffusion
| are generally smaller than language models, so they are easier to
| fit in WASM space.
| 
| Also, you can finetune LLaMA-7B on a 3090 for about $3 using
| LoRA.
 
  | bitL wrote:
  | Only for images. People want to generate videos next, and those
  | models will likely be GPT-sized.
 
    | Metus wrote:
    | There is a video model making the rounds on
    | /r/stablediffusion and it is just a tiny bit larger than
    | Stable Diffusion.
 
      | isoprophlex wrote:
      | You're not kidding! it's far from perfect, but pretty funny
      | still...
      | 
      | https://www.reddit.com/r/StableDiffusion/comments/126xsxu/n
      | i...
      | 
      | Too bad SD learned the Shutterstock watermark so well, lol
 
      | bitL wrote:
      | It's cool, though not very stable in details over the
      | temporal axis.
 
  | danielbln wrote:
  | Generative image models don't use transformers, they're
  | diffusion models. LLMs are transformers.
 
    | lxe wrote:
    | Ah yes, that's right. Though they technically do use a
    | transformer for the CLIP text encoder, as I understand it.
 
| jedberg wrote:
| With the explosion of LLMs and people figuring out ways to
| train/use them relatively cheaply, unique data sets will become
| that much more valuable, and will be the key differentiator
| between LLMs.
| 
| Interestingly, it seems like companies that run chat programs
| where they can read the chats are best suited to building "human
| conversation" LLMs, but someone who manages large text datasets
| for others is in the perfect place to "win" the LLM battle.
 
| captaincrowbar wrote:
| The big problem with AI R&D is that nobody can keep up with the
| big bux companies. It makes this kind of project a bit pointless.
| Even if you can run a GPT-3 equivalent in a web browser, how many
| people are going to bother (except as a stunt) when GPT-4 is
| available?
 
  | simonw wrote:
  | An increasingly common complaint I'm hearing about GPT-3/4/etc
  | is from people who don't want to pass any of their private data
  | to another company.
  | 
  | Running models locally is by far the most promising solution
  | for that concern.
 
  | adeon wrote:
  | The ones that can't use GPT-4 for whatever reason. Maybe you
  | are a company and you don't want to send OpenAI your prompts.
  | Or a person who has very private prompts and feels sketchy
  | about sending them over.
  | 
  | Or maybe you are an individual who has a use case that's too
  | edgy for OpenAI or a Silicon Valley corporate image. When
  | Replika shut down people trying to have virtual
  | boyfriends/girlfriends on their platform, their subreddit
  | filled up with people who mourned like they had just lost a
  | partner.
  | 
  | I think it's important that alternative non-big bux company
  | options exist, even if most people don't want to or need to use
  | them.
 
    | moffkalast wrote:
    | Or maybe you're in Italy, where OpenAI has just been banned
    | for not adhering to GDPR. I suspect the rest of the EU may
    | follow soon.
 
    | psychphysic wrote:
    | Those are seriously niche use cases. They exist, but can they
    | fund GPT-5-level development?
 
      | r00fus wrote:
      | Most corporations/governments would prefer to keep their AI
      | conversations private. Definitely mainstream desire, not
      | niche.
 
        | psychphysic wrote:
        | Who does your government and corporate email? In the UK
        | it's all either Gmail (for government) or Outlook (NHS).
        | For compliance reasons they simply want data center
        | certification and location restrictions.
        | 
        | If you think a small corp is going to get a big gov
        | contract outside of a nepo-state, you're in for a shock.
 
      | adeon wrote:
      | Given the Replika debacle, I personally suspect the AI
      | partner use case is not really very niche. It's just that
      | few people openly want to talk about wanting it, because
      | having an emotional AI partner is seen as creepy.
      | 
      | And companies would not want to do that. Imagine you make a
      | partner AI that goes unhinged like Bing did and tells you
      | to kill yourself or something similar. I can't imagine
      | companies wanting that kind of risk.
 
        | [deleted]
 
        | psychphysic wrote:
        | If your AI partner data can't be stored in Azure or a
        | similar data centre, you are in a seriously small niche!
        | 
        | Even Jennifer Lawrence stored her nudes on iCloud.
 
| make3 wrote:
| Alpaca uses knowledge distillation (it's trained on outputs from
| OpenAI models). It's something to keep in mind: you're teaching
| your model to copy another model's outputs.
 
  | thewataccount wrote:
  | > You're teaching your model to copy another model's outputs.
  | 
  | Which itself was trained on human outputs to do the same thing.
  | 
  | Very soon it will be full Ouroboros as humans use the model's
  | output to finetune themselves.
 
  | visarga wrote:
  | > You're teaching your model to copy another model's outputs.
  | 
  | That's a time-honoured tradition in ML, invented by the father
  | of the field himself, Geoffrey Hinton, in 2015.
  | 
  | > Distilling the Knowledge in a Neural Network
  | 
  | https://arxiv.org/abs/1503.02531
 
| thih9 wrote:
| > as opposed to OpenAI's continuing practice of not revealing the
| sources of their training data.
| 
| Looks like that choice makes it more difficult to adopt, trust,
| or collaborate on the new tech.
| 
| What are the benefits? Is there more to that than competitive
| advantage? If not, ClosedAI sounds more accurate.
 
| holloworld wrote:
| [dead]
 
| whalesalad wrote:
| Are there any training/ownership models like Folding@Home? People
| could donate idle GPU resources in exchange for access to the
| data, and perhaps ownership. Then instead of someone needing to
| pony up $85k to train a model, a thousand people could train a
| fraction of the model on their consumer GPUs, pool the results,
| and reap the collective rewards.
 
  | dekhn wrote:
  | A few people have built frameworks to do this.
  | 
  | There is still a very large open problem in how to federate
  | large numbers of loosely coupled computers to speed up training
  | "interesting" models. I've worked in both domains (protein
  | folding via Folding@Home/protein folding using supercomputers,
  | and ML training on single nodes/ML training on supercomputers)
  | and at least so far, ML hasn't really been a good match for
  | embarrassingly parallel compute. Even in protein folding,
  | folding@home has a number of limitations that are much better
  | addressed on supercomputers (for example: if your problem
  | requires making extremely long individual simulations of large
  | proteins).
  | 
  | All that could change, but I think for the time being,
  | interesting/big models need to be trained on tightly coupled
  | GPUs.
 
    | whalesalad wrote:
    | This will probably mirror the transition from single-threaded
    | to multi-threaded compute. It took a while for application
    | architectures that utilize multiple cores to take hold among
    | the populace.
 
      | PaulDavisThe1st wrote:
      | Probably not. Multicore has been a thing for 30 years (We
      | had a 32 core Sequent Systems and a 64 core KSR-1 at UW
      | CS&E in the early 1990s). Everything about these models has
      | been developed in a multicore computing context, and thus
      | far, it still isn't massively-parallel-distributable. An
      | algorithm can be massively parallel without being sensibly
      | distributable. Change the latency between compute nodes is
      | not always a neutral or even just linear decrease in
      | performance.
 
    | itissid wrote:
    | And you can rule out most of the monte carlo stuff too. Which
    | rules out parallelization modern statistical frameworks like
    | STAN used for explainable models; things like Finance
    | modeling of risk which is a sampling of posteriors using MCMC
    | also can't be parallelized.
 
      | MontyCarloHall wrote:
      | Assuming the chains can reach an equilibrium point (i.e.
      | burn in) quickly, M samples from an MCMC can be
      | parallelized by running N chains in parallel each for M/N
      | iterations. You still end up with M total samples from your
      | target distribution.
      | 
      | You're only out of luck if each iteration is too compute-
      | intensive to fit on one worker node, even if each iteration
      | might be embarrassingly parallelizable, since the overhead
      | of having to aggregate computations across workers at every
      | iteration would be too high.
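      | 
      | A toy sketch of that chain-level parallelism (my own
      | example: Metropolis sampling of a standard normal target,
      | N chains of M/N iterations each):
      | 
      |     import numpy as np
      |     from multiprocessing import Pool
      | 
      |     def run_chain(args):
      |         seed, iters = args
      |         rng = np.random.default_rng(seed)
      |         x, out = 0.0, []
      |         for _ in range(iters):
      |             prop = x + rng.normal()
      |             # log accept ratio for target p(x) ~ exp(-x^2/2)
      |             if np.log(rng.random()) < (x**2 - prop**2) / 2:
      |                 x = prop
      |             out.append(x)
      |         return out  # real code would discard burn-in
      | 
      |     if __name__ == "__main__":
      |         M, N = 100_000, 4
      |         with Pool(N) as pool:
      |             chains = pool.map(run_chain,
      |                               [(s, M // N) for s in range(N)])
      |         samples = np.concatenate(chains)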
 
  | neoromantique wrote:
  | How long until somebody creates a crypto project on that?
 
    | buildbuildbuild wrote:
    | Bittensor is one, not an endorsement. chat.bittensor.com
 
  | ellisv wrote:
  | That'd be cool but I don't think most idle consumer GPUs
  | (6-8GB) would have large enough memory for a single iteration
  | (batch size 1) of modern LLMs.
  | 
  | But I'd love to see more federated/distributed learning
  | platforms.
 
    | whalesalad wrote:
    | Is it possible to break the model apart? Or does the entire
    | thing need to be architected from the get-go such that an
    | individual GPU can own a portion end to end?
 
    | mirekrusin wrote:
    | 6GB can store 3 billion parameters at 16-bit precision;
    | GPT-3.5 has 175 billion parameters.
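    | 
    | The arithmetic, assuming 16-bit (2-byte) weights:
    | 
    |     vram_bytes = 6e9
    |     params_that_fit = vram_bytes / 2   # 3 billion
    |     gpt35_bytes = 175e9 * 2            # 350GB just for weights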
 
  | mirekrusin wrote:
  | Unfortunately training is not emberassingly parallelisable [0]
  | problem. It would require new architecture. Current models
  | diverge too fast. By the time you'd download and/or calculate
  | your contribution the model would descend somewhere else and
  | your delta would not be applicable - based off wrong initial
  | state.
  | 
  | It would be great if merge-ability would exist. It would also
  | likely apply to efficient/optimal shrinking for models.
  | 
  | Maybe you could dispatch tasks to train on many variations of
  | similar tasks and take average of results? It could probably
  | help in some way, but you'd still have large serialized
  | pipeline to munch through and you'd likely require some serious
  | hardware ie. dual gtx 4090 on client side.
  | 
  | [0] https://en.wikipedia.org/wiki/Embarrassingly_parallel
 
    | amitport wrote:
    | hmmm... seems like you're reinventing distributed learning.
    | 
    | merge-ability does exist and you can average the results.
 
      | mirekrusin wrote:
      | You can if you have same base weights.
      | 
      | If you have similar variants of the same task you can
      | accelerate it more where the diff is.
      | 
      | You can't average on past results computed from historic
      | base weights - it's linear process.
      | 
      | If you could do that, you'd just map training examples to
      | diffs and merge them all.
      | 
      | Or take two distinct models and merge them to have model
      | that is roughly sum of them. You can't do it, it's not
      | linear process.
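      | 
      | A sketch of the case that _does_ work - averaging two fine-
      | tunes that descend from the same base weights (federated-
      | averaging style; the function is mine, not from any
      | particular framework):
      | 
      |     def merge(state_a, state_b, alpha=0.5):
      |         # Only meaningful when both checkpoints share base
      |         # weights; otherwise the parameters live in
      |         # incompatible "coordinate systems" and the average
      |         # is garbage.
      |         return {k: alpha * state_a[k] + (1 - alpha) * state_b[k]
      |                 for k in state_a}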
 
  | _trampeltier wrote:
  | Start a Boinc project.
  | 
  | https://boinc.berkeley.edu/projects.php
 
  | spyder wrote:
  | Learning@Home using Decentralized Mixture-of-Expert models:
  | 
  | https://learning-at-home.github.io/
  | 
  | https://training-transformers-together.github.io/
  | 
  | https://arxiv.org/abs/2002.04013
 
  | ftxbro wrote:
  | Yes, there is petals/bloom
  | (https://github.com/bigscience-workshop/petals), but it's not
  | so great. Maybe it will improve or a better one will come.
 
    | whalesalad wrote:
    | Really interesting live monitor of the network:
    | http://health.petals.ml
 
    | polishdude20 wrote:
    | I wonder how they handle illegal content. Like, if you're
    | running training on your computer, what's to stop someone
    | else's illegal data from being uploaded to your computer as
    | part of training?
 
    | riedel wrote:
    | I read that it only scores the model collaboratively, but it
    | allows some fine-tuning, I guess.
    | 
    | Getting the actual gradient descent to parallelize is more
    | difficult, because one needs to average the gradients when
    | using data/batch parallelism. It becomes more of a network-
    | speed than GPU-speed problem. Or are LLMs somehow different?
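    | 
    | A sketch of that averaging step in data parallelism (roughly
    | what PyTorch's DistributedDataParallel does for you each
    | step; assumes dist.init_process_group() has already run):
    | 
    |     import torch.distributed as dist
    | 
    |     def sync_gradients(model, world_size):
    |         # Every worker exchanges and averages gradients once
    |         # per step - for a 7B-parameter model that's ~14GB of
    |         # fp16 gradients, which is why network speed dominates.
    |         for p in model.parameters():
    |             dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
    |             p.grad /= world_size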
 
| ultrablack wrote:
| If you could, you should have done it 6 months ago.
 
  | munk-a wrote:
  | I mean - is there a developer alive who'd be unable to write
  | the nascent version of Twitter? I think that Twitter as a
  | business exists entirely because of the concept - the code to
  | cover the core functionality is absolutely trivial to
  | replicate.
  | 
  | I don't think this is a very helpful statement, because
  | actually finding the idea of what to build is the hard part -
  | or even just believing it's possible. The company I work at has
  | been using NLP for years now and we have a model that's great
  | at what we do... but if you had asked two years ago whether we
  | could develop that into a chatbot as functional as ChatGPT,
  | you'd probably have been met with some pretty heavy skepticism.
  | 
  | Cloning something that has been proven possible is always
  | easier than taking the risk of building the first version with
  | no real grasp of feasibility.
 
| v4dok wrote:
| Can someone at the EU - the only player in this thing with no
| strategy yet - just pool together enough resources so the open-
| source people can train models? We don't ask for much; just give
| us compute power.
 
  | 0xfaded wrote:
  | No, that could risk public money benefitting a private party.
  | 
  | Feel free to form a multinational consortium and submit a grant
  | application to one of our distribution partners under the
  | Horizon program though.
  | 
  | Now, how do you plan to create jobs and reduce CO2?
 
| alecco wrote:
| Interesting blog, but the extrapolations are way overblown. I
| tried one of the 30B models and it's not even remotely close to
| GPT-3.
| 
| Don't get me wrong, this is very interesting and I hope more is
| done on open models. But let's not over-hype by 10x.
 
| lmeyerov wrote:
| It seems the quality goes up and the cost goes down significantly
| with Colossal AI's recent push:
| https://medium.com/@yangyou_berkeley/colossalchat-an-open-so...
| 
| Their writeup makes it sound like, net, 2x+ over Alpaca, and
| that's an early run.
| 
| The browser side is interesting too. Browser JS VMs have a memory
| cap of 1GB, so that may ultimately be the bottleneck here...
 
  | SebJansen wrote:
  | does the 1GB limit extend to WASM?
 
    | jesse__ wrote:
    | WASM is specified to have 32-bit pointers, which gives a 4GB
    | address space. AFAIK browser implementations respect that
    | (from some nominal testing I did a couple of years ago).
 
  | lmeyerov wrote:
  | Interesting - since I last looked a year ago, Chrome has
  | started raising the caps internally on buffer allocation to
  | potentially 16GB:
  | https://chromium.googlesource.com/chromium/src/+/2bf3e35d7a4...
  | 
  | Last time I tried on a few engines, it was just 1-2GB for typed
  | arrays, which are essentially the backing structure for this
  | kind of work. It'd be interesting to try again.
  | 
  | For our product, we actually want to dump 10GB+ onto the WebGL
  | side, which may or may not get mirrored on the CPU side. Not
  | sure if there are additional limits there on the software side.
  | And beyond that, consumer devices often have another 10GB+ of
  | CPU RAM free, which we'd also like to use for our more limited
  | non-GPU stuff :)
 
  | jesse__ wrote:
  | I thought the memory limit (in V8 at least) was 2GB, due to the
  | GC not wanting to pass 64-bit pointers around and using the
  | high bit of a 32-bit offset for... something I now forget?
  | 
  | Do you have a source showing a JS runtime with a 1GB limit?
 
    | jesse__ wrote:
    | UPDATE: After a nominal amount of googling around, it appears
    | valid sizes have increased to a maximum of 8GB on 64-bit
    | systems, and stayed at 2GB on 32-bit systems, for FF at
    | least. I guess it's probably 'implementation defined'.
    | 
    | https://developer.mozilla.org/en-
    | US/docs/Web/JavaScript/Refe...
    | 
    | https://developer.mozilla.org/en-
    | US/docs/Web/JavaScript/Refe...
 
| JasonZ2 wrote:
| Does anyone know how the results from a 7B parameter model with
| bloomz.cpp (https://github.com/NouamaneTazi/bloomz.cpp) compare
| to the 7B parameter Alpaca model with llama.cpp
| (https://github.com/ggerganov/llama.cpp)?
| 
| I have the latter working on a M1 Macbook Air with very good
| results for what it is. Curious if bloomz.cpp is significantly
| better or just about the same.
 
| rspoerri wrote:
| So cool it runs in a browser /sarcasm/. I might not even need a
| computer. Or the internet, while we're at it.
| 
| It either runs locally or it runs in the cloud. Data could come
| from both locations as well. So it's mostly technically
| irrelevant whether it's displayed in a browser or not.
| 
| Except when it comes to usability. I don't get why people love
| software running in a browser. I often close important tools I
| have not saved when they're in a browser. I can't have offline
| tools that work when I'm in a tunnel (living in Switzerland,
| this is an issue). Or it's incompatible because I'm running
| LibreWolf.
| 
| /sorry to be nitpicking on this topic ;-)
 
  | ftxbro wrote:
  | > I don't get it why people love software running in a browser.
  | 
  | If you read the article, part of the argument was for the
  | sandboxing that the browser provides.
  | 
  | "Obviously if you're going to give a language model the ability
  | to execute API calls and evaluate code you need to do it in a
  | safe environment! Like for example... a web browser, which runs
  | code from untrusted sources as a matter of habit and has the
  | most thoroughly tested sandbox mechanism of any piece of
  | software we've ever created."
 
    | rspoerri wrote:
    | OSX does app sandboxing as well (not everywhere). But yeah,
    | you're right - I only skimmed the content and missed that
    | part.
 
    | rspoerri wrote:
    | Thinking about it...
    | 
    | I don't know exactly how browser sandboxing works. But isn't
    | its purpose to prevent access to the local system, while it
    | mostly leaves access to the internet open?
    | 
    | Is that really a good way to limit an AI system's API
    | access?
 
      | simonw wrote:
      | The same-origin policy in browsers defaults to preventing
      | JavaScript from making API calls out to any domain other
      | than the one that hosts the page - unless those other
      | domains have the right CORS headers.
      | 
      | https://developer.mozilla.org/en-
      | US/docs/Web/Security/Same-o...
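      | 
      | The opt-in happens server-side, via a response header. A
      | minimal sketch (hypothetical endpoint, mine, not from the
      | post):
      | 
      |     from http.server import BaseHTTPRequestHandler, HTTPServer
      | 
      |     class Handler(BaseHTTPRequestHandler):
      |         def do_GET(self):
      |             self.send_response(200)
      |             # Without this header, browser JS on another
      |             # origin can send the request but cannot read
      |             # the response.
      |             self.send_header("Access-Control-Allow-Origin",
      |                              "https://example.com")
      |             self.end_headers()
      |             self.wfile.write(b'{"ok": true}')
      | 
      |     HTTPServer(("", 8000), Handler).serve_forever()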
 
  | sp332 wrote:
  | Browser software is great because I don't have to build
  | separate versions for Windows, Mac, and Linux, or deal with app
  | stores, or figure out how to update old versions.
 
  | pmoriarty wrote:
  | There are a bunch of reasons people/companies like web apps:
  | 
  | 1 - Everyone already has a web browser, so there's no software
  | to download (or the software is automatically downloaded,
  | installed and run, if you want to look at it that way... either
  | way, the experience is a lot easier and more seamless for the
  | user)
  | 
  | 2 - The website owner has control of the software, so they can
  | update it and manage user access as they like, and it's easier
  | to track users and usage that way
  | 
  | 3 - There are a ton of web developers out there, so it's easier
  | to find people to work on your app
  | 
  | 4 - You ostensibly don't need to rewrite your app for every OS,
  | but may need to modify it for every supported browser
 
    | rspoerri wrote:
    | Most of these aspects make it better for the company or
    | developer, only in some cases it makes it easier for the user
    | in my opinion. Some arguments against it are:
    | 
    | 1 - Not everyone has or wants fast access to the internet all
    | the time.
    | 
    | 2 - I try to prevent access of most of the apps to the
    | internet. I don't want companies to access my data or even
    | metadata of my usage.
    | 
    | 3 - sure, but it doesn't make it better for the user.
    | 
    | 4 - Also supporting different screen sizes and interaction
    | types (touch or mouse) can be a big part of the work.
    | 
    | The most important part for a user is if he/she is only using
    | the app rarely or once. Not having to install it will make
    | the difference between using it or not. However with the app
    | stores most OS's feature today this can change pretty soon
    | and be equally simple.
    | 
    | I might be old school on this, but i resent subscription
    | based apps. For applications that do not need to change,
    | deliver no additional service or aren't absolutely vital for
    | me i will never subscribe. And browser based app's are at the
    | core of this unfortunate development. But that's gone very
    | far from the original topic :-)
 
  | nanidin wrote:
  | Browser is the true edge compute.
 
| fzliu wrote:
| I was a bit skeptical about loading a _4GB_ model at first. Then
| I double-checked: Firefox is using about 5GB of memory for me. My
| current open tabs are mail, calendar, a couple Google Docs, two
| Arxiv papers, two blog posts, two Youtube videos, milvus.io
| documentation, and chat.openai.com.
| 
| A lot of applications and developers these days take memory
| management for granted, so embedding a 4GB model to significantly
| enhance coding and writing capabilities doesn't seem too far-
| fetched.
 
| munk-a wrote:
| A wonderful thing about software development is that there is so
| much reserved space for creativity that we have huge gaps
| between costs and value. Whether the average person could do
| this for $85k I'm uncertain of - but there is a very significant
| slice of people who could do it for well under $85k now that the
| groundwork has been done. This leads to the hilarious paradox
| where a software-based business worth millions could be built on
| top of code valued at around $60k to write.
 
  | nico wrote:
  | > This leads to the hilarious paradox where a software-based
  | business worth millions could be built on top of code valued
  | at around $60k to write.
  | 
  | Or the fact that software based businesses just took a massive
  | hit in value overnight and cannot possibly defend such high
  | valuations anymore.
  | 
  | The value of companies is quickly going to shift from tech
  | moats to brands.
  | 
  | Think CocaCola - anyone can create a drink that tastes as good
  | or better than coke, but it's incredibly hard to compete with
  | the CocaCola brand.
  | 
  | Now think what would have happened if CocaCola had been super
  | expensive to make, and all of a sudden, in a matter of weeks,
  | it became incredibly cheap.
  | 
  | This is what happened to the saltpeter industry in 1909 when
  | synthetic saltpeter was invented. The whole industry was
  | extinct in a few years.
 
  | prerok wrote:
  | Nit: not to write but to run. The cost of development is not
  | considered in these calculations.
 
| ftxbro wrote:
| His estimate is that you could train a LLaMA-7B scale model for
| around $82,432 and then fine-tune it for a total of less than
| $85K. But when I saw the fine-tuned LLaMA-like models, they
| were, in my opinion, worse even than GPT-3. They were like
| GPT-2.5 or so. Not nearly as good as ChatGPT 3.5, and certainly
| not ChatGPT-beating. Of course, far enough in the future you
| could certainly do one that runs in the browser for $85K or much
| less - even $1 if you go far enough into the future.
 
  | icelancer wrote:
  | Yeah, the constant barrage of "THIS IS AS GOOD AS CHATGPT AND
  | IS PRIVATE" screeds from LLaMA-based marketing projects is
  | getting ridiculous. They're not even remotely close to the same
  | quality. And why would they be?
  | 
  | I want the best LLMs to be open source too, but I'm not
  | delusional enough to make insane claims like the hundreds of
  | GitHub forks out there.
 
    | robertlagrant wrote:
    | > I want the best LLMs to be open source too
    | 
    | How do you do this without being incredibly wealthy?
 
      | nickthegreek wrote:
      | Crowdsource to pay for the GPU rentals.
 
      | mejutoco wrote:
      | Pooling resources a la SETI@home would be an interesting
      | option I would love to see.
 
        | simonw wrote:
        | My understanding is that can work for model inference but
        | not for model training.
        | 
        | https://github.com/bigscience-workshop/petals is a
        | project that does this kind of thing for running
        | inference - I tried it out in Google Colab and it seemed
        | to work pretty well.
        | 
        | Model training is much harder though, because it requires
        | a HUGE amount of high bandwidth data exchange between the
        | machines doing the training - way more than is feasible
        | to send over anything other than a local network
        | connection.
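        | 
        | Back-of-envelope numbers (my assumptions: a 7B model,
        | fp16 gradients, plain data parallelism):
        | 
        |     grad_gb = 7e9 * 2 / 1e9      # ~14GB exchanged per
        |                                  # worker, every step
        |     steps = 100_000              # a plausible run length
        |     total_pb = grad_gb * steps / 1e6   # ~1.4PB per worker
        | 
        | Fine over NVLink or a datacenter fabric; hopeless over
        | home broadband.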
 
      | crdrost wrote:
      | You (1) are a company who (2) understands the business
      | domain and has an appropriate business plan.
      | 
      | Sadly the reality of funding today makes it unlikely that
      | these two will both be simultaneously satisfied. The
      | problem is that history will look back on the necessary
      | business plan and deem it a failure even if it generates a
      | company that does a billion dollars plus in annual revenue.
      | 
      | This is actually not unique to large language models but
      | most innovation around computers. The basic problem is that
      | if you build a force-multiplier (spreadsheets, personal
      | computing, large-language models all come to mind) then
      | what will make it succeed is its versatility: people want a
      | hammer that can be used for smashing all manner of things,
      | not just your company's particular brand of matching nails.
      | And most people will only pick up that hammer once per week
      | or once per month; only like 1% of the economy, if that,
      | will be totally revolutionized -- "we use this force-
      | multiplier every day, it is now indispensable, we can't
      | imagine life without it" -- and it's never predictable what
      | that sector will be. It's going to be like "oh, who ever
      | dreamed that the killer application for LLMs would be them
      | replacing AutoCAD at mechanical contractors" or some shit.
      | 
      | In those strange eons, to wildly succeed, one must give up
      | on anticipating all usages of the software, one must cease
      | controlling it and set it free. "Well where's the profit in
      | that?" -- it is that this company was one of the first
      | players in the overall market, they got an early chance to
      | stake out as much territory as possible. But the market
      | exploded way larger than they could handle and then
      | everybody looks back on them and says "wow, what a failure,
      | they only captured 1% of that market, they could have been
      | so much more successful." Yeah, they captured 1% of a $100B
      | market, some failure, right?
      | 
      | But what actually happens is that companies see the
      | potential, investors get dollar signs in their eyes,
      | everyone starts to lock down and control these, "you may
      | use large language models but only in the ways that we say,
      | through the interfaces which we provide," and then the only
      | thing that you can use it for is to get generic
      | conversational advice about your hemorrhoids, so after 5-10
      | years the bubble of excitement fizzles out. Nobody ever
      | dreams to apply it to AutoCAD or whatever, and the world
      | remains unchanged.
 
        | javajosh wrote:
        | History is littered with great software that died because
        | no one used it, because the business model was terrible.
        | Capturing $1B of value is better than $0, and everyone
        | understands this. And who cares what history thinks,
        | anyway?
        | 
        | OpenAI has spent a lot of money to get their result. It's
        | safe to assume it will take a lot of money to get a
        | similar result, and then to share it (although I assume
        | bit torrent will be good enough). Once people are running
        | their models, they can innovate to their hearts' content.
        | It's not clear how or why they'd give money back to the
        | enabling technology. So how does money flow back to the
        | innovators in proportion to the value produced, if not
        | via SaaS?
 
        | ftxbro wrote:
        | what stage of capitalism is this
 
        | robertlagrant wrote:
        | If those are all that's required, why don't you start a
        | company with a business plan written so it satisfies your
        | criteria? Then you can lead the way with OSS LLMs.
 
      | ftxbro wrote:
      | Yes a rugged individual would have to be incredibly wealthy
      | to do it!
      | 
      | But maybe the governments will make one and maintain it
      | with taxes as an infrastructure service, like roads, giving
      | everyone expanded powers of cognition, memory, and
      | expertise, and raising the consciousnesses of humanity to
      | new heights. Probably in the USA it wouldn't happen, if we
      | judge ourselves only in zero-sum relation to others -
      | helping everyone would be a wash and only waste our money!
 
        | szundi wrote:
        | Some governments probably already do, and use it against
        | so-called terrorists or enemies of the people...
 
  | simonw wrote:
  | Yeah, you're right. I wrote this a couple of weeks ago at the
  | height of LLaMA hype, but with further experience I don't think
  | the GPT-3 comparisons hold weight.
  | 
  | My biggest problem: I haven't managed to get a great
  | summarization out of a LLaMA derivative that runs on my laptop
  | yet. Maybe I haven't tried the right model or the right prompt
  | yet though, but that feels essential to me for a bunch of
  | different applications.
  | 
  | I still think a LLaMA/Alpaca fine-tuned for the ReAct pattern
  | that can execute additional tools would be a VERY interesting
  | thing to explore.
  | 
  | [ ReAct: https://til.simonwillison.net/llms/python-react-
  | pattern ]
 
    | avereveard wrote:
    | My biggest problem with these models is that they cannot
    | reliably produce structured data.
    | 
    | Even davinci can be used as part of a chain, because you can
    | direct it to structure and unstructure data, and then extract
    | the single components and build them into tasks. Cohere,
    | LLaMA, et al. are currently struggling to produce these
    | results consistently, even though you can chat with them -
    | and frankly, it's not about the chat.
    | 
    | Example from a Stack Overflow post, where the prompt splits
    | the questions before sending them down the chain to answer
    | all points individually:
    | 
    | This is a customer question:
    | 
    | I'm a beginner RoR programmer who's planning to deploy my app
    | using Heroku. Word from my other advisor friends says that
    | Heroku is really easy, good to use. The only problem is that
    | I still have no idea what Heroku does...
    | 
    | I've looked at their website and in a nutshell, what Heroku
    | does is help with scaling but... why does that even matter?
    | How does Heroku help with:
    | 
    |     Speed - My research implied that deploying AWS on the US
    |     East Coast would be the fastest if I am targeting a
    |     US/Asia-based audience.
    | 
    |     Security - How secure are they?
    | 
    |     Scaling - How does it actually work?
    | 
    |     Cost efficiency - There's something like a dyno that
    |     makes it easy to scale.
    | 
    | How do they fare against their competitors? For example,
    | Engine Yard and bluebox?
    | 
    | Please use layman English terms to explain... I'm a beginner
    | programmer.
    | 
    | Extract the scenario from the question including a summary of
    | every detail, list every question, in JSON:
    | 
    | { "scenario": "A beginner RoR programmer is planning to
    | deploy their app using Heroku and is seeking advice about
    | deploying it.", "questions": [ "What does Heroku do?", "How
    | does deploying AWS on the US East Coast help with speed?",
    | "How secure is Heroku?", "How does scaling with Heroku
    | work?", "What is a dyno and why is it cost efficient?", "How
    | does Heroku compare to its competitors, such as Engine Yard
    | and Bluebox?" ] }
 
      | newhouseb wrote:
      | Last weekend I built some tooling that you can integrate
      | with huggingface transformers to force a given model to
      | _only_ output content that validates against a JSON schema
      | [1].
      | 
      | The challenge is that for it to work cost-effectively you
      | need to be able to append what is basically a final,
      | algorithmically designed network layer to the model, and
      | until OpenAI exposes the full logits and/or some way to
      | modify them on the fly, you're going to be stuck with open
      | source models. I've run things against GPT-2 mostly, but
      | it's on my list to try LLaMA.
      | 
      | [1] "Structural Alignment: Modifying Transformers (like
      | GPT) to Follow a JSON Schema" @
      | https://github.com/newhouseb/clownfish
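      | 
      | The core trick is logit masking. A much simpler sketch than
      | clownfish (a token whitelist instead of a real JSON-schema
      | parser state, but the mechanism is the same):
      | 
      |     import torch
      |     from transformers import (AutoModelForCausalLM,
      |         AutoTokenizer, LogitsProcessor, LogitsProcessorList)
      | 
      |     class Whitelist(LogitsProcessor):
      |         def __init__(self, allowed_ids):
      |             self.allowed = torch.tensor(sorted(allowed_ids))
      | 
      |         def __call__(self, input_ids, scores):
      |             # Everything outside the allowed set gets -inf,
      |             # so decoding can only emit permitted tokens.
      |             masked = torch.full_like(scores, float("-inf"))
      |             masked[:, self.allowed] = scores[:, self.allowed]
      |             return masked
      | 
      |     tok = AutoTokenizer.from_pretrained("gpt2")
      |     model = AutoModelForCausalLM.from_pretrained("gpt2")
      |     allowed = set(tok.encode('{"name": "value"}0123456789,'))
      |     out = model.generate(
      |         **tok('{"name": "', return_tensors="pt"),
      |         max_new_tokens=20,
      |         logits_processor=LogitsProcessorList(
      |             [Whitelist(allowed)]),
      |     )
      |     print(tok.decode(out[0]))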
 
      | simonw wrote:
      | This feels solvable to me. I wonder if you could use fine-
      | tuning against LLaMA to teach it to do this better?
      | 
      | GPT-3 etc. can only do this because they had a LOT of code
      | included in their training sets.
      | 
      | The LLaMA paper says GitHub was 4.5% of the training
      | corpus, so maybe it does have that stuff baked in and just
      | needs extra tuning or different prompts to tap into that
      | knowledge.
 
        | avereveard wrote:
        | I have done it through stages: the first stage emits
        | natural language in the format "context: ... and
        | question: ...", and then the second stage collects it as
        | JSON, but then the wait time doubles.
 
    | Tepix wrote:
    | Have you tried bigger models? Llama-65B can indeed compete
    | with GPT-3 according to various benchmarks. The next thing
    | would be to get the fine-tuning as good as OpenAI's.
 
      | mewpmewp2 wrote:
      | I wonder how accurate those benchmarks are in terms of
      | actual problem-solving capability. I think there's a major
      | line at which an LLM becomes actually useful, where it
      | feels like you are speaking to something intelligent that
      | can be useful for you in terms of productivity, etc.
 
| version_five wrote:
| If you have ~$100k to spend, aren't there options to buy GPUs
| rather than just blow it all on cloud? How much is an 8xA100
| machine?
| 
| 4xA100 is $75k, 8x is $140k: https://shop.lambdalabs.com/deep-
| learning/servers/hyperplane...
 
  | dekhn wrote:
  | you're comparing the capital cost of acquiring a GPU machine
  | with the operational cost of renting one in the cloud.
  | 
  | Ignoring the operational costs of on-prem hardware is pretty
  | common, but those costs are significant and can greatly change
  | the calculation.
 
    | digitallyfree wrote:
    | For a single unit one could have it in their home or office,
    | rather than a datacenter or colo. If the user sets up and
    | manages the machine themselves there is no additional IT
    | cost. The greatest operating expense would be the power cost.
 
      | dekhn wrote:
      | "If the user sets up and manages the machine themselves
      | there is no additional IT cost" << how much do you value
      | your time?
      | 
      | In my experience, physical hardware has a management
      | overhead over cloud resources. Backups, large disk storage
      | for big models, etc.
 
    | pessimizer wrote:
    | Or from another perspective, comparing the cost of training
    | one model in the cloud to the cost of training as many as you
    | want on your machine, then (as mentioned by siblings) selling
    | the machine for nearly as much as you paid for it, unless
    | there's some shortage, in which case you'll get more back
    | than you paid for it.
    | 
    | One is buying capital that produces models, the other is
    | buying a single model.
 
    | sounds wrote:
    | Remember to discount the tax depreciation for the hardware
    | and deduct any potential future gains from either reselling
    | it or using it.
 
    | capableweb wrote:
    | Heh, you work at AWS or Google Cloud perhaps? ;) (Only joking
    | about this as I constantly see employees from AWS/GCloud and
    | other cloud providers claim that cloud is always cheaper than
    | hosting things yourself)
    | 
    | Sure, if you're planning to service a large number of users,
    | building your infrastructure in-house might be a bit
    | overkill, as you'll need an infrastructure team to service it
    | as well.
    | 
    | If you just want to buy 4 GPUs to put in one server to run
    | some training yourself, I don't think it's that much
    | overkill. Especially considering you can recover much of the
    | cost even after a year by selling the equipment you bought.
    | Most of your losses will be the costs of electricity and
    | internet connection.
 
      | throwaway50601 wrote:
      | Cloud gives you very good price for what they offer -
      | excellent reliability, hyper-scalability. Most people don't
      | need either and use it as a glorified VPS host.
 
      | dekhn wrote:
      | I used to work for Google Cloud (I built a predecessor to
      | Preemptible VMs and also launched Google Cloud Genomics).
      | But even before I worked at Google I was a big fan of AWS
      | (EC2 and S3).
      | 
      | Buying and selling hardware isn't free; it comes with its
      | own cost. I would not want to be in the position of selling
      | a $100K box of computer equipment- ever.
 
        | capableweb wrote:
        | :)
        | 
        | True, but some things are harder to sell than others.
        | A100's in today's market would be easy to sell. Harder to
        | buy, because the supply is so low unless you're Google or
        | another big name, but if you're trying to sell them, I'm
        | sure you can get rid of them quickly.
 
    | jcims wrote:
    | No kidding. I worked for a company that had multiple billions
    | of dollars invested in a data center refresh in North America
    | and Europe.
 
    | version_five wrote:
    | For a server farm, sure, for one machine, I don't know.
    | Assuming it plugs into a normal 15A circuit, and you have a
    | we-work or something where you don't pay for power, is the
    | operational cost of one machine really material?
 
      | dekhn wrote:
      | it's hard to tell from what you're saying: you're planning
      | on putting an ML infrastructure training server on a
      | regular 15A circuit, not in a data center or machine room?
      | And power is paid for by somebody else?
      | 
      | My thinking about pricing doesn't include that option
      | because I wouldn't just hook a server like that up to a
      | regular outlet in an office and use it for production work.
      | If that works for you- you can happily ignore my comments.
      | But if you go ahead and build such a thing and operate it
        | for a year, please let us know if there were any costs -
        | either in dollars or in suffering - associated with your
        | decision.
      | 
      | [edit: adding in that the value of this machine also
      | suggests it cannot live unattended in an insecure location,
      | like an office]
      | 
      | signed, person who used to build closet clusters at
      | universities
 
        | KeplerBoy wrote:
        | Nvidia happily sells what you're describing. They call it
        | the "DGX Station A100"; it has four 80GB A100s and
        | retails for $80k. Not sure I believe their claimed noise
        | level of <37 dB though.
        | 
        | Of course, that's still a very small system when talking
        | LLM training. The only reason I would not put one in a
        | regular office is its extreme price. Do you really want
        | something worth $80k in a form factor that could be
        | casually carried through the door?
 
        | amluto wrote:
        | If you live near an inexpensive datacenter, you can park
        | it there. Throw in a storage machine or two (TrueNAS MINI
        | R looks like a credible low-effort option). If your
        | workload is to run a year long computation on it and
        | otherwise mostly ignore it, then your operational costs
        | will be quite low.
        | 
        | Most people who rent cloud servers are not doing this
        | type of workload.
 
  | modernpink wrote:
  | You can sell the A100s once you're done as well. Possibly even
  | at a profit?
 
  | girthbrooks wrote:
  | These are wild pieces of hardware, thanks for linking. I wonder
  | how loud they get.
 
  | sacred_numbers wrote:
  | If you bought an 8xA100 machine for $140k you would have to run
  | it continuously for over 10,000 hours (about 14 months) to
  | train the 7B model. By that time the value of the A100s you
  | bought would have depreciated substantially; especially because
  | cloud companies will be renting/selling A100s at a discount as
  | they bring H100s online. It might still be worth it, but it's
  | not a home run.
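  | 
  | Checking the parent's numbers, assuming the post's $1/A100-hour
  | rule of thumb (so its ~$82,432 estimate is also ~82,432
  | A100-hours):
  | 
  |     hours = 82_432 / 8           # 10,304 hours of wall-clock
  |     months = hours / (24 * 30)   # ~14.3 months
  |     rate = 140_000 / 82_432      # ~$1.70 per A100-hour, vs
  |                                  # ~$1.50 on-demand at Lambda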
 
    | inciampati wrote:
    | If 8-bit training methods take off, I think the calculus is
    | going to change rapidly, with newer cards that have decent
    | amounts of memory and 8-bit acceleration becoming
    | dramatically more cost- and time-effective than the venerable
    | A100s.
 
___________________________________________________________________
(page generated 2023-03-31 23:00 UTC)