|
| nwoli wrote:
| What we need is a RETRO-style model where, basically, after the
| input you go through a small net that just fetches a desired set
| of weights from a server (serving data without compute is dirt
| cheap); the model is then executed locally. We'll get there
| eventually.
| tinco wrote:
| Can anyone explain or link some resource on why these big GPT
| models all don't incorporate any RETRO style? I'm only very
| superficially following ML developments and I was so hyped by
| RETRO and then none of the modern world changing models apply
| it.
| nwoli wrote:
|   OpenAI might very well be using that internally; who knows how
|   they implement things. Also, Emad retweeted a RETRO-related
|   thing a while back, so they might very well be using that for
|   their awaited LM. Here's hoping.
| ushakov wrote:
| Now imagine loading 3.9 GB each time you want to interact with a
| webpage
| KMnO4 wrote:
| Yeah, I've used Jira.
| neilellis wrote:
| :-)
| sroussey wrote:
| 10yrs from now models will be in the OS. Maybe even in silicon.
| No downloads required.
| swader999 wrote:
| The OS will be in the cloud interfacing into our brain by
| then. I don't want this btw.
| pessimizer wrote:
| Not in mine. I don't even want redhat's bullshit in there.
| I'm not installing some black box into my OS that was
|       programmed with _motives_ that can't be extracted from the
| model at rest.
| sroussey wrote:
|         iOS already has this to a degree, and has for a couple of
|         years.
| brrrrrm wrote:
| The WebGPU demo mentioned in this post is insane. It blows any
| WASM approach out of the water. Unfortunately that performance is
| not supported anywhere but Chrome Canary (behind a flag).
| raphlinus wrote:
| This will be changing soon. I believe Chrome M113 is scheduled
| to ship to stable on May 2, and will support WebGPU 1.0. I
| agree it's a game-changing technology.
| agnokapathetic wrote:
| > My friends at Replicate told me that a simple rule of thumb for
| A100 cloud costs is $1/hour.
|
| AWS charges $32/hr for an 8xA100 instance (p4d.24xlarge), which
| comes out to $4/hour/GPU. Yes, you can get lower pricing with a
| 3-year reservation, but that's not what this question is asking.
|
| You also need 256 nodes to be colocated on the same fabric --
| which AWS will do for you but only if you reserve for years.
| pavelstoev wrote:
|   Depending on the model, you can train on lesser (cheaper) GPUs,
|   but system-level optimizations are needed - which is what we
|   provide at centml.ai
| sebzim4500 wrote:
| Maybe they are using spot instances? $1/hr is about right for
| those.
| thewataccount wrote:
|   AWS certainly isn't the cheapest for this - did they mention
|   using AWS? Lambda Labs is $12/hr for 8xA100s, and there are
|   others relatively close to this price on demand; I assume you
|   can get a better deal if you contact them for a large project.
|
|   Replicate themselves rent out GPU time, so I assume they would
|   definitely know, as that's almost certainly the core of their
|   business.
| IanCal wrote:
| Lambda labs charges about 11-12/hr for 8xA100.
| robmsmt wrote:
| and is completely at capacity
| IanCal wrote:
|       But it reflects an upper bound on the cost of running
|       A100s.
| celestialcheese wrote:
|   Lambda Labs will let you do on-demand 8xA100 @ 80GB VRAM/GPU
|   for $12/hr, or reserved @ $10.86/hr.
|
|   8xA100 @ 40GB is $8/hr.
|
|   The Replicate friend isn't far off.
| pavelstoev wrote:
| Training a ChatGPT-beating model for much less than $85,000 is
| entirely feasible. At CentML, we're actively working on model
| training and inference optimization without affecting accuracy,
| which can help reduce costs and make such ambitious projects
| realistic. By maximizing (>90%) GPU and platform hardware
| utilization, we aim to bring down the expenses associated with
| large-scale models, making them more accessible for various
| applications. Additionally, our solutions have a positive
| environmental impact, addressing excess CO2 concerns. If you're
| interested in learning more about how we are doing it, please
| reach out via our website: https://centml.ai
| astlouis44 wrote:
| WebGPU is going to be a major component in this. The modern GPUs
| prevalent in mobile devices, desktops, and laptops are more than
| enough to do all of this client-side.
| nope96 wrote:
| I remember watching one of the final episodes of Connections 3:
| With James Burke, and he casually said we'd have personal
| assistants that we could talk to (in our PDAs). That was 1997 and
| I knew enough about computers to think he was being overly
| optimistic about the speed of progress. Not in our lifetimes.
| Guess I was wrong!
| TMWNN wrote:
| Hey, that means it can be turned into an Electron app!
| breck wrote:
| Just want to say SimonW has become one of my favorite writers
| covering the AI revolution. Always fun thought experiments with
| linked code and very constructive for people thinking about how
| to make this stuff more accessible to the masses.
| fswd wrote:
| There is somebody fine-tuning 160M RWKV-4 on Alpaca on the RWKV
| Discord. I am out of the office and can't link, but the person
| posted in the prompt showcase channel.
| buzzier wrote:
| RWKV-v4 Web Demo (169m/430m params)
| https://josephrocca.github.io/rwkv-v4-web/demo/
| skybrian wrote:
| I wonder why anyone would want to run it in a browser, other than
| to show it could be done? It's not like the extra latency would
| matter, since these things are slow.
|
| Running it on a server you control makes more sense. You can pick
| appropriate hardware for running the AI. Then access it from any
| browser you like, including from your phone, and switch devices
| whenever you like. It won't use up all the CPU/GPU on a portable
| device and run down your battery.
|
| If you want to run the server at home, maybe use something like
| Tailscale?
| simonw wrote:
| The browser thing is definitely more for show than anything
| else - I used it to help demonstrate quite how surprisingly
| lightweight these models can be.
| GartzenDeHaes wrote:
| It's interesting to me that the LLaMA-nB models still produce
| reasonable results after 4-bit quantization of the 32-bit
| weights. Does this indicate some possibility of reducing the
| compute required for training?
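|
| (To make the quantization concrete: a toy blockwise
| round-to-nearest sketch in numpy - illustrative only, not
| llama.cpp's exact Q4_0 format:)
|
|     import numpy as np
|
|     def quantize_4bit(w, block=32):
|         # Each block of `block` floats shares one float scale and
|         # stores 4-bit integer codes in [-8, 7]. Assumes w.size
|         # is divisible by `block`.
|         w = w.reshape(-1, block)
|         scale = np.abs(w).max(axis=1, keepdims=True) / 7.0 + 1e-12
|         q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
|         return q, scale
|
|     def dequantize_4bit(q, scale):
|         return (q.astype(np.float32) * scale).reshape(-1)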
| lxe wrote:
| Keep in mind that image transformer models like Stable Diffusion
| are generally smaller than language models, so they are easier to
| fit in WASM space.
|
| Also, you can fine-tune LLaMA-7B on a 3090 for about $3 using
| LoRA.
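|
| (A sketch of that LoRA setup with the HuggingFace peft library -
| the model id and hyperparameters here are illustrative, not a
| recipe:)
|
|     from transformers import AutoModelForCausalLM
|     from peft import LoraConfig, get_peft_model
|
|     base = AutoModelForCausalLM.from_pretrained(
|         "decapoda-research/llama-7b-hf",  # illustrative model id
|         load_in_8bit=True)                # fits a 24GB 3090
|     cfg = LoraConfig(r=8, lora_alpha=16,
|                      target_modules=["q_proj", "v_proj"],
|                      lora_dropout=0.05, task_type="CAUSAL_LM")
|     model = get_peft_model(base, cfg)
|     model.print_trainable_parameters()  # tiny fraction of the 7B
|     # ...then run a normal training loop over the adapters only.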
| bitL wrote:
|   Only for images. People want to generate videos next, and those
|   models will likely be GPT-sized.
| Metus wrote:
| There is a video model making the rounds on
| /r/stablediffusion and it is just a tiny bit larger than
| Stable Diffusion.
| isoprophlex wrote:
|       You're not kidding! It's far from perfect, but still pretty
|       funny...
|
| https://www.reddit.com/r/StableDiffusion/comments/126xsxu/n
| i...
|
| Too bad SD learned the Shutterstock watermark so well, lol
| bitL wrote:
|         It's cool, though not very stable in its details over the
|         temporal axis.
| danielbln wrote:
|   Generative image models don't use transformers; they're
|   diffusion models. LLMs are transformers.
| lxe wrote:
|     Ah yes, that's right. Well, they technically do use a
|     transformer for the CLIP text encoder, as I understand it.
| jedberg wrote:
| With the explosion of LLMs and people figuring out ways to
| train/use them relatively cheaply, unique data sets will become
| that much more valuable, and will be the key differentiator
| between LLMs.
|
| Interestingly, it seems like companies that run chat programs
| where they can read the chats are best suited to building "human
| conversation" LLMs, but someone who manages large text datasets
| for others are in the perfect place to "win" the LLM battle.
| captaincrowbar wrote:
| The big problem with AI R&D is that nobody can keep up with the
| big bux companies. It makes this kind of project a bit pointless.
| Even if you can run a GPT-3 equivalent in a web browser, how many
| people are going to bother (except as a stunt) when GPT-4 is
| available?
| simonw wrote:
|   An increasingly common complaint I'm hearing about GPT-3/4/etc.
|   is from people who don't want to pass any of their private data
|   to another company.
|
| Running models locally is by far the most promising solution
| for that concern.
| adeon wrote:
|   The ones that can't use GPT-4 for whatever reason. Maybe you
|   are a company and you don't want to send OpenAI your prompts.
|   Or a person who has very private prompts and feels sketchy
|   about sending them over.
|
|   Or maybe you are an individual who has a use case that's too
|   edgy for OpenAI or a Silicon Valley corporate image. When
|   Replika shut down people trying to have virtual
|   boyfriends/girlfriends on their platform, their subreddit
|   filled up with people who mourned like they had just lost a
|   partner.
|
| I think it's important that alternative non-big bux company
| options exist, even if most people don't want to or need to use
| them.
| moffkalast wrote:
|     Or maybe you're in Italy, and OpenAI has just been banned
|     from the country for not adhering to GDPR. I suspect the rest
|     of the EU may follow soon.
| psychphysic wrote:
|     Those are seriously niche use cases. They exist, but can they
|     fund GPT-5-level development?
| r00fus wrote:
| Most corporations/governments would prefer to keep their AI
| conversations private. Definitely mainstream desire, not
| niche.
| psychphysic wrote:
|         Who does your government and corporate email? In the UK
|         it's all either Gmail (for government) or Outlook (NHS).
|         For compliance reasons they simply want data center
|         certification and location restrictions.
|
|         If you think a small corp is going to get a big gov
|         contract outside of a nepo-state, you're in for a shock.
| adeon wrote:
|       Given the Replika debacle, I personally suspect the AI
|       partner use case is not really very niche. It's just that
|       few people openly want to talk about wanting it, because
|       having an emotional AI partner is seen as creepy.
|
|       And companies would not want to do that. Imagine you make a
|       partner AI that goes unhinged like Bing did and tells you
|       to kill yourself or something similar. I can't imagine
|       companies would want that kind of risk.
| psychphysic wrote:
|         If your AI partner data can't be stored in Azure or a
|         similar data centre, you are in a seriously small niche!
|
| Even Jennifer Lawrence stored her nudes on iCloud.
| make3 wrote:
| Alpaca uses knowledge distillation (it's trained on outputs from
| OpenAI models). It's something to keep in mind. You're teaching
| your model to copy another model's outputs.
| thewataccount wrote:
|   > You're teaching your model to copy another model's outputs.
|
| Which itself was trained on human outputs to do the same thing.
|
|   Very soon it will be full Ouroboros, as humans use the model's
|   output to fine-tune themselves.
| visarga wrote:
|   > You're teaching your model to copy another model's outputs.
|
|   That's a time-honoured tradition in ML, invented by the father
|   of the field himself, Geoffrey Hinton, in 2015.
|
| > Distilling the Knowledge in a Neural Network
|
| https://arxiv.org/abs/1503.02531
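|
|   (The core of that paper is just a softened cross-entropy; a
|   minimal PyTorch sketch, assuming you have teacher and student
|   logits:)
|
|       import torch.nn.functional as F
|
|       def distillation_loss(student_logits, teacher_logits, T=2.0):
|           # Soften both distributions with temperature T and match
|           # them via KL divergence, scaled by T^2 per Hinton et al.
|           soft_t = F.softmax(teacher_logits / T, dim=-1)
|           log_s = F.log_softmax(student_logits / T, dim=-1)
|           return F.kl_div(log_s, soft_t,
|                           reduction="batchmean") * (T * T)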
| thih9 wrote:
| > as opposed to OpenAI's continuing practice of not revealing the
| sources of their training data.
|
| Looks like that choice makes it more difficult to adopt, trust,
| or collaborate on the new tech.
|
| What are the benefits? Is there more to that than competitive
| advantage? If not, ClosedAI sounds more accurate.
| whalesalad wrote:
| Are there any training/ownership models like Folding@Home? People
| could donate idle GPU resources in exchange for access to the
| data, and perhaps ownership. Then instead of someone needing to
| pony up $85k to train a model, a thousand people can train a
| fraction of the model on their consumer GPUs and pool the
| results, reaping the collective rewards.
| dekhn wrote:
| A few people have built frameworks to do this.
|
| There is still a very large open problem in how to federate
| large numbers of loosely coupled computers to speed up training
| "interesting" models. I've worked in both domains (protein
| folding via Folding@Home/protein folding using supercomputers,
| and ML training on single nodes/ML training on supercomputers)
| and at least so far, ML hasn't really been a good match for
| embarrassingly parallel compute. Even in protein folding,
| folding@home has a number of limitations that are much better
| addressed on supercomputers (for example: if your problem
| requires making extremely long individual simulations of large
| proteins).
|
| All that could change, but I think for the time being,
| interesting/big models need to be trained on tightly coupled
| GPUs.
| whalesalad wrote:
|     Probably going to mirror the transition from single-threaded
|     to multi-threaded compute. It took a while until application
|     architectures that utilize multi-core took hold among the
|     populace.
| PaulDavisThe1st wrote:
|       Probably not. Multicore has been a thing for 30 years (we
|       had a 32-core Sequent Systems machine and a 64-core KSR-1
|       at UW CS&E in the early 1990s). Everything about these
|       models has been developed in a multicore computing context,
|       and thus far, it still isn't massively-parallel-
|       distributable. An algorithm can be massively parallel
|       without being sensibly distributable. Changing the latency
|       between compute nodes is not always a neutral, or even just
|       linear, change in performance.
| itissid wrote:
|     And you can rule out most of the Monte Carlo stuff too, which
|     rules out parallelizing the modern statistical frameworks
|     like Stan that are used for explainable models; things like
|     financial risk modeling, which samples posteriors using MCMC,
|     also can't be parallelized.
| MontyCarloHall wrote:
| Assuming the chains can reach an equilibrium point (i.e.
| burn in) quickly, M samples from an MCMC can be
| parallelized by running N chains in parallel each for M/N
| iterations. You still end up with M total samples from your
| target distribution.
|
| You're only out of luck if each iteration is too compute
| intense to fit on one worker node, even if each iteration
| might be embarrassingly parallelizable, since the overhead
| of having to aggregate computations across workers at every
| iteration would be too high.
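|
|       (A toy numpy illustration of that chain-splitting - a 1-D
|       Metropolis sampler targeting a standard normal; burn-in
|       handling omitted:)
|
|           import numpy as np
|
|           def chain(n, x0, rng, log_p=lambda x: -0.5 * x * x):
|               xs, x = np.empty(n), x0
|               for i in range(n):
|                   prop = x + rng.normal()
|                   # Metropolis accept/reject step
|                   if np.log(rng.uniform()) < log_p(prop) - log_p(x):
|                       x = prop
|                   xs[i] = x
|               return xs
|
|           M, N = 100_000, 8  # M total samples, N parallel chains
|           rngs = [np.random.default_rng(s) for s in range(N)]
|           # Each chain could run on its own worker node.
|           samples = np.concatenate(
|               [chain(M // N, 0.0, r) for r in rngs])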
| neoromantique wrote:
| How long until somebody creates a crypto project on that?
| buildbuildbuild wrote:
| Bittensor is one, not an endorsement. chat.bittensor.com
| ellisv wrote:
| That'd be cool but I don't think most idle consumer GPUs
| (6-8GB) would have large enough memory for a single iteration
| (batch size 1) of modern LLMs.
|
| But I'd love to see more federated/distributed learning
| platforms.
| whalesalad wrote:
| Is it possible to break the model apart? Or does the entire
| thing need to be architected from the get-go such that an
| individual GPU can own a portion end to end?
| mirekrusin wrote:
|       6GB can store 3 billion parameters; GPT-3.5 has 175 billion
|       parameters.
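|
|       (That's assuming fp16, i.e. 2 bytes per parameter:)
|
|           params_in_6gb = 6e9 / 2          # ~3 billion parameters
|           gpt3_fp16_gb = 175e9 * 2 / 1e9   # ~350 GB of weights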
| mirekrusin wrote:
|   Unfortunately, training is not an embarrassingly parallelisable
|   [0] problem. It would require a new architecture. Current
|   models diverge too fast: by the time you'd downloaded and/or
|   calculated your contribution, the model would have descended
|   somewhere else, and your delta would not be applicable - it
|   would be based off the wrong initial state.
|
|   It would be great if merge-ability existed. It would also
|   likely apply to efficient/optimal shrinking of models.
|
|   Maybe you could dispatch tasks to train on many variations of
|   similar tasks and take the average of the results? It could
|   probably help in some way, but you'd still have a large
|   serialized pipeline to munch through, and you'd likely require
|   some serious hardware on the client side, i.e. dual RTX 4090s.
|
| [0] https://en.wikipedia.org/wiki/Embarrassingly_parallel
| amitport wrote:
| hmmm... seems like you're reinventing distributed learning.
|
| merge-ability does exist and you can average the results.
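|
|     (e.g. FedAvg-style weight averaging; a sketch over lists of
|     numpy arrays, assuming every worker fine-tuned from the same
|     base weights:)
|
|         def fedavg(worker_weights, n_examples):
|             # Average each layer across workers, weighted by how
|             # much data each worker trained on (per McMahan et
|             # al.'s FedAvg). worker_weights: one list of layer
|             # arrays per worker.
|             total = sum(n_examples)
|             return [sum(w[i] * n / total
|                         for w, n in zip(worker_weights, n_examples))
|                     for i in range(len(worker_weights[0]))]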
| mirekrusin wrote:
|       You can if you have the same base weights.
|
|       If you have similar variants of the same task, you can
|       accelerate it more where the diff is.
|
|       You can't average past results computed from historic base
|       weights - it's a sequential process.
|
|       If you could do that, you'd just map training examples to
|       diffs and merge them all.
|
|       Or take two distinct models and merge them to get a model
|       that is roughly the sum of them. You can't do it; it's not
|       a linear process.
| _trampeltier wrote:
| Start a Boinc project.
|
| https://boinc.berkeley.edu/projects.php
| spyder wrote:
| Learning@Home using Decentralized Mixture-of-Expert models:
|
| https://learning-at-home.github.io/
|
| https://training-transformers-together.github.io/
|
| https://arxiv.org/abs/2002.04013
| ftxbro wrote:
|   Yes, there is Petals/BLOOM
|   (https://github.com/bigscience-workshop/petals) but it's not so
|   great. Maybe it will improve or a better one will come.
| whalesalad wrote:
| Really interesting live monitor of the network:
| http://health.petals.ml
| polishdude20 wrote:
|       I wonder how they handle illegal content. Like, if you're
|       running training data on your computer, what's to stop
|       someone else's illegal data from being uploaded to your
|       computer as part of training?
| riedel wrote:
|         I read that it only scores the model collaboratively, but
|         it allows some fine-tuning, I guess.
|
|         Getting the actual gradient descent to parallelize is
|         more difficult, because one needs to average the
|         gradients when using data/batch parallelism (sketch
|         below). It becomes more of a network-speed than a
|         GPU-speed problem. Or are LLMs somehow different?
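|
|         (A minimal sketch of that per-step gradient averaging
|         with torch.distributed:)
|
|             import torch.distributed as dist
|
|             def average_gradients(model):
|                 # After each worker calls loss.backward() on its
|                 # local batch, sum gradients across workers and
|                 # divide by world size. This moves on the order of
|                 # the full model size over the network every step,
|                 # hence the bandwidth bottleneck.
|                 world = dist.get_world_size()
|                 for p in model.parameters():
|                     if p.grad is not None:
|                         dist.all_reduce(p.grad,
|                                         op=dist.ReduceOp.SUM)
|                         p.grad /= world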
| ultrablack wrote:
| If you could, you should have done it 6 months ago.
| munk-a wrote:
| I mean - is there a developer alive that'd be unable to write
| the nascent version of Twitter? I think that Twitter as a
| business exists entirely because of the concept - the code to
| cover the core functionality is absolutely trivial to
| replicate.
|
|     I don't think this is a very helpful statement, because
|     actually finding the idea of what to build is the hard part -
|     or even just believing it's possible. The company I work at
|     has been using NLP for years now, and we have a model that's
|     great at what we do... but if you had asked two years ago
|     whether we could develop that into a chatbot as functional as
|     ChatGPT, you'd probably have been met with some pretty heavy
|     skepticism.
|
|     Cloning something that has been proven possible is always
|     easier than taking the risk of building the first version
|     with no real grasp of feasibility.
| v4dok wrote:
| Can someone at the EU - the only player in this thing with no
| strategy yet - just pool together enough resources so the
| open-source people can train models? We don't ask for much, just
| compute power.
| 0xfaded wrote:
| No, that could risk public money benefitting a private party.
|
| Feel free to form a multinational consortium and submit a grant
| application to one of our distribution partners under the
| Horizon program though.
|
| Now, how do you plan to create jobs and reduce CO2?
| alecco wrote:
| Interesting blog but the extrapolations are way overblown. I
| tried one of the 30bn models and it's not even remotely close to
| GPT-3.
|
| Don't get me wrong, this is very interesting and I hope more is
| done on open models. But let's not over-hype by 10x.
| lmeyerov wrote:
| It seems the quality goes up & cost goes down significantly with
| Colossal AI's recent push:
| https://medium.com/@yangyou_berkeley/colossalchat-an-open-so...
|
| Their writeup makes it sound like, net, a 2X+ improvement over
| Alpaca, and that's an early run.
|
| The browser side is interesting too. Browser JS VMs have a memory
| cap of 1GB, so that may ultimately be the bottleneck here...
| SebJansen wrote:
|   Does the 1GB limit extend to WASM?
| jesse__ wrote:
|     WASM is specified to have 32-bit pointers, which gives a 4GB
|     address space. AFAIK browser implementations respect that
|     (from some nominal testing I did a couple of years ago).
| lmeyerov wrote:
|   Interesting - since I looked last year, Chrome has started
|   raising its internal caps on buffer allocation to potentially
|   16GB:
|   https://chromium.googlesource.com/chromium/src/+/2bf3e35d7a4...
|
|   Last time I tried on a few engines, it was just 1-2GB for typed
|   arrays, which are essentially the backing structure for this
|   kind of work. It'd be interesting to try again.
|
| For our product, we actually want to dump 10GB+ on to the WebGL
| side, which may or may not get mirrored on the CPU side. Not
| sure if additional limits there on the software side. And after
| that, consumer devices often have another 10GB+ CPU RAM free,
| which we'd also like to use for our more limited non-GPU stuff
| :)
| jesse__ wrote:
| I thought the memory limit (in V8 at least) was 2GB due to the
| GC not wanting to pass 64 bit pointers around, and using the
| high bit of a 32-bit offset for .. something I now forget ..?
|
| Do you have a source showing a JS runtime with a 1GB limit?
| jesse__ wrote:
|       UPDATE: After a nominal amount of googling around, it
|       appears valid sizes have increased to a maximum of 8GB on
|       64-bit systems, and stayed at 2GB on 32-bit systems, for FF
|       at least. I guess it's probably 'implementation-defined'.
|
| https://developer.mozilla.org/en-
| US/docs/Web/JavaScript/Refe...
|
| https://developer.mozilla.org/en-
| US/docs/Web/JavaScript/Refe...
| JasonZ2 wrote:
| Does anyone know how the results from a 7B parameter model with
| bloomz.cpp (https://github.com/NouamaneTazi/bloomz.cpp) compare
| to the 7B parameter Alpaca model with llama.cpp
| (https://github.com/ggerganov/llama.cpp)?
|
| I have the latter working on a M1 Macbook Air with very good
| results for what it is. Curious if bloomz.cpp is significantly
| better or just about the same.
| rspoerri wrote:
| So cool it runs in a browser /sarcasm/ - I might not even need a
| computer. Or internet, while we're at it.
|
| It either runs locally or it runs in the cloud. Data could come
| from both locations as well. So it's mostly technically
| irrelevant whether it's displayed in a browser or not.
|
| Except when it comes to usability. I don't get why people love
| software running in a browser. I often close important tools I
| have not saved when they're in a browser. I can't have offline
| tools which work if I am in a tunnel (living in Switzerland,
| this is an issue). Or it's incompatible because I am running
| LibreWolf.
|
| /sorry to be nitpicking on this topic ;-)
| ftxbro wrote:
| > I don't get it why people love software running in a browser.
|
| If you read the article, part of the argument was for the
| sandboxing that the browser provides.
|
| "Obviously if you're going to give a language model the ability
| to execute API calls and evaluate code you need to do it in a
| safe environment! Like for example... a web browser, which runs
| code from untrusted sources as a matter of habit and has the
| most thoroughly tested sandbox mechanism of any piece of
| software we've ever created."
| rspoerri wrote:
|     OS X does app sandboxing as well (not everywhere). But yeah,
|     you're right - I only skimmed the content and missed that
|     part.
| rspoerri wrote:
| Thinking about it...
|
|     I don't know exactly about the browser sandboxing. But isn't
|     its purpose to prevent access to the local system, while
|     mostly leaving access to the internet open?
|
|     Is that really a good way to limit an AI system's API access?
| simonw wrote:
| The same-origin policy in browsers defaults to preventing
| JavaScript from making API calls out to any domain other
| than the one that hosts the page - unless those other
| domains have the right CORS headers.
|
| https://developer.mozilla.org/en-
| US/docs/Web/Security/Same-o...
| sp332 wrote:
|   Browser software is great because I don't have to build
|   separate versions for Windows, Mac, and Linux, or deal with app
|   stores, or figure out how to update old versions.
| pmoriarty wrote:
| There are a bunch of reasons people/companies like web apps:
|
| 1 - Everyone already has a web browser, so there's no software
| to download (or the software is automatically downloaded,
| installed and run, if you want to look at it that way... either
| way, the experience is a lot easier and more seamless for the
| user)
|
| 2 - The website owner has control of the software, so they can
| update it and manage user access as they like, and it's easier
| to track users and usage that way
|
| 3 - There are a ton of web developers out there, so it's easier
| to find people to work on your app
|
| 4 - You ostensibly don't need to rewrite your app for every OS,
| but may need to modify it for every supported browser
| rspoerri wrote:
| Most of these aspects make it better for the company or
| developer, only in some cases it makes it easier for the user
| in my opinion. Some arguments against it are:
|
| 1 - Not everyone has or wants fast access to the internet all
| the time.
|
| 2 - I try to prevent access of most of the apps to the
| internet. I don't want companies to access my data or even
| metadata of my usage.
|
| 3 - sure, but it doesn't make it better for the user.
|
| 4 - Also supporting different screen sizes and interaction
| types (touch or mouse) can be a big part of the work.
|
| The most important part for a user is if he/she is only using
| the app rarely or once. Not having to install it will make
| the difference between using it or not. However with the app
| stores most OS's feature today this can change pretty soon
| and be equally simple.
|
| I might be old school on this, but i resent subscription
| based apps. For applications that do not need to change,
| deliver no additional service or aren't absolutely vital for
| me i will never subscribe. And browser based app's are at the
| core of this unfortunate development. But that's gone very
| far from the original topic :-)
| nanidin wrote:
| Browser is the true edge compute.
| fzliu wrote:
| I was a bit skeptical about loading a _4GB_ model at first. Then
| I double-checked: Firefox is using about 5GB of memory for me. My
| current open tabs are mail, calendar, a couple Google Docs, two
| Arxiv papers, two blog posts, two Youtube videos, milvus.io
| documentation, and chat.openai.com.
|
| A lot of applications and developers these days take memory
| management for granted, so embedding a 4GB model to significantly
| enhance coding and writing capabilities doesn't seem too far-
| fetched.
| munk-a wrote:
| A wonderful thing about software development is that there is so
| much reserved space for creativity that we have huge gaps between
| costs and value. Whether the average person could do this for
| $85k I'm uncertain - but there is a very significant slice of
| people who could do it for well under $85k now that the
| groundwork has been done. This leads to the hilarious paradox
| where a software-based business worth millions could be built on
| top of code valued at around $60k to write.
| nico wrote:
|   > This leads to the hilarious paradox where a software-based
|   business worth millions could be built on top of code valued
|   at around $60k to write.
|
| Or the fact that software based businesses just took a massive
| hit in value overnight and cannot possibly defend such high
| valuations anymore.
|
| The value of companies is quickly going to shift from tech
| moats to brands.
|
| Think CocaCola - anyone can create a drink that tastes as good
| or better than coke, but it's incredibly hard to compete with
| the CocaCola brand.
|
| Now think what would have happened if CocaCola had been super
| expensive to make, and all of a sudden, in a matter of weeks,
| it became incredibly cheap.
|
| This is what happened to the saltpeter industry in 1909 when
| synthetic saltpeter was invented. The whole industry was
| extinct in a few years.
| prerok wrote:
| Nit: not to write but to run. The cost of development is not
| considered in these calculations.
| ftxbro wrote:
| His estimate is that you could train a LLaMA-7B-scale model for
| around $82,432 and then fine-tune it for a total of less than
| $85K. But when I saw the fine-tuned LLaMA-like models, they were
| in my opinion even worse than GPT-3 - more like GPT-2.5 or so.
| Not nearly as good as ChatGPT 3.5 and certainly not
| ChatGPT-beating. Of course, far enough in the future you could
| certainly train one to run in the browser for $85K or much less -
| even $1 if you go far enough into the future.
| icelancer wrote:
|   Yeah, the constant barrage of "THIS IS AS GOOD AS CHATGPT AND
|   IS PRIVATE" screeds from LLaMA-based marketing projects is
|   getting ridiculous. They're not even remotely close to the same
|   quality. And why would they be?
|
| I want the best LLMs to be open source too, but I'm not
| delusional enough to make insane claims like the hundreds of
| GitHub forks out there.
| robertlagrant wrote:
| > I want the best LLMs to be open source too
|
| How do you do this without being incredibly wealthy?
| nickthegreek wrote:
|       Crowdsource to pay for the GPU rentals.
| mejutoco wrote:
| Pooling resources a la SETI@home would be an interesting
| option I would love to see.
| simonw wrote:
|           My understanding is that this can work for model
|           inference but not for model training.
|
|           https://github.com/bigscience-workshop/petals is a
|           project that does this kind of thing for running
|           inference - I tried it out in Google Colab and it
|           seemed to work pretty well.
|
|           Model training is much harder though, because it
|           requires a HUGE amount of high-bandwidth data exchange
|           between the machines doing the training - way more than
|           is feasible to send over anything other than a local
|           network connection.
| crdrost wrote:
| You (1) are a company who (2) understands the business
| domain and has an appropriate business plan.
|
| Sadly the reality of funding today makes it unlikely that
| these two will both be simultaneously satisfied. The
| problem is that history will look back on the necessary
| business plan and deem it a failure even if it generates a
| company that does a billion dollars plus in annual revenue.
|
| This is actually not unique to large language models but
| most innovation around computers. The basic problem is that
| if you build a force-multiplier (spreadsheets, personal
| computing, large-language models all come to mind) then
| what will make it succeed is its versatility: people want a
| hammer that can be used for smashing all manner of things,
| not just your company's particular brand of matching nails.
| And most people will only pick up that hammer once per week
|       or once per month; only like 1% of the economy, if that, will
| be totally revolutionized, "we use this force-multiplier
| every day, it is now indispensable, we can't imagine life
| without it," and it's never predictable what that sector
| will be -- it's going to be like "oh, who ever dreamed that
| the killer application for LLMs would be them replacing
| AutoCAD at mechanical contractors" or some shit.
|
| In those strange eons, to wildly succeed, one must give up
| on anticipating all usages of the software, one must cease
| controlling it and set it free. "Well where's the profit in
| that?" -- it is that this company was one of the first
| players in the overall market, they got an early chance to
| stake out as much territory as possible. But the market
| exploded way larger than they could handle and then
| everybody looks back on them and says "wow, what a failure,
| they only captured 1% of that market, they could have been
| so much more successful." Yeah, they captured 1% of a $100B
| market, some failure, right?
|
| But what actually happens is that companies see the
| potential, investors get dollar signs in their eyes,
| everyone starts to lock down and control these, "you may
| use large language models but only in the ways that we say,
| through the interfaces which we provide," and then the only
| thing that you can use it for is to get generic
| conversational advice about your hemorrhoids, so after 5-10
| years the bubble of excitement fizzles out. Nobody ever
| dreams to apply it to AutoCAD or whatever, and the world
| remains unchanged.
| javajosh wrote:
| History is littered with great software that died because
| no-one used it because the business model was terrible.
| Capturing $1B of value is better than 0, and everyone
| understands this. And who cares what history thinks
| anyway?
|
| OpenAI has spent a lot of money to get their result. It's
| safe to assume it will take a lot of money to get a
| similar result, and then to share it (although I assume
| bit torrent will be good enough). Once people are running
|         their models, they can innovate to their hearts' content.
| It's not clear how or why they'd give money back to the
| enabling technology. So how does money flow back to the
| innovators in proportion to the value produced, if not a
| SaaS?
| ftxbro wrote:
| what stage of capitalism is this
| robertlagrant wrote:
| If those are all that's required, why don't you start a
| company with a business plan written so it satisfies your
| criteria? Then you can lead the way with OSS LLMs.
| ftxbro wrote:
| Yes a rugged individual would have to be incredibly wealthy
| to do it!
|
| But maybe the governments will make one and maintain it
| with taxes as an infrastructure service, like roads, giving
| everyone expanded powers of cognition, memory, and
|       expertise, and raising the consciousness of humanity to new
|       heights. Probably in the USA it wouldn't happen, if we judge
|       ourselves only in zero-sum relation to others - helping
|       everyone would be a wash and only waste our money!
| szundi wrote:
|         Some governments probably already do, and use it against
|         so-called terrorists or enemies of the people...
| simonw wrote:
| Yeah, you're right. I wrote this a couple of weeks ago at the
| height of LLaMA hype, but with further experience I don't think
| the GPT-3 comparisons hold weight.
|
|   My biggest problem: I haven't managed to get a great
|   summarization out of a LLaMA derivative that runs on my laptop
|   yet. Maybe I haven't tried the right model or the right prompt
|   yet, though; that feels essential to me for a bunch of
|   different applications.
|
| I still think a LLaMA/Alpaca fine-tuned for the ReAct pattern
| that can execute additional tools would be a VERY interesting
| thing to explore.
|
|   [ ReAct: https://til.simonwillison.net/llms/python-react-pattern ]
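|
|   (The ReAct loop itself is tiny; a sketch in which llm_complete
|   and the tool functions are placeholders:)
|
|       import re
|
|       def react(question, llm_complete, tools, max_turns=5):
|           # Alternate Thought/Action/Observation until the model
|           # emits a final answer instead of an Action line.
|           prompt = question
|           for _ in range(max_turns):
|               response = llm_complete(prompt)  # stop at Observation:
|               m = re.search(r"Action: (\w+): (.*)", response)
|               if not m:
|                   return response  # no Action means final answer
|               tool, arg = m.groups()
|               obs = tools[tool](arg)  # e.g. search, calculate
|               prompt += response + f"\nObservation: {obs}\n"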
| avereveard wrote:
|     My biggest problem with these models is that they cannot
|     reliably produce structured data.
|
|     Even davinci can be used as part of a chain, because you can
|     direct it to structure and unstructure data, and then extract
|     the single components and build them into tasks. Cohere,
|     LLaMA et al. are currently struggling to produce these
|     results consistently and reliably, even if you can chat with
|     them - and frankly, it's not about the chat.
|
|     An example, from a Stack Overflow question, of splitting out
|     the sub-questions before sending them down the chain to be
|     answered individually:
| This is a customer question:
|
| I'm a beginner RoR programmer who's planning to deploy my app
| using Heroku. Word from my other advisor friends says that
| Heroku is really easy, good to use. The only problem is that
| I still have no idea what Heroku does...
|
|     I've looked at their website and in a nutshell, what Heroku
|     does is help with scaling but... why does that even matter?
|     How does Heroku help with:
|
|       Speed - My research implied that deploying AWS on the US
|       East Coast would be the fastest if I am targeting a
|       US/Asia-based audience.
|
|       Security - How secure are they?
|
|       Scaling - How does it actually work?
|
|       Cost efficiency - There's something like a dyno that makes
|       it easy to scale.
|
|     How do they fare against their competitors? For example,
|     Engine Yard and Bluebox?
|
| Please use layman English terms to explain... I'm a beginner
| programmer.
|
| Extract the scenario from the question including a summary of
| every detail, list every question, in JSON:
|
| { "scenario": "A beginner RoR programmer is planning to
| deploy their app using Heroku and is seeking advice about
| deploying it.", "questions": [ "What does Heroku do?", "How
| does deploying AWS on the US East Coast help with speed?",
| "How secure is Heroku?", "How does scaling with Heroku
| work?", "What is a dyno and why is it cost efficient?", "How
| does Heroku compare to its competitors, such as Engine Yard
| and Bluebox?" ] }
| newhouseb wrote:
| Last weekend I built some tooling that you can integrate
| with huggingface transformers to force a given model to
| _only_ output content that validates against a JSON schema
| [1].
|
|       The challenge is that, for it to work cost-effectively, you
|       need to be able to append what is basically a final,
|       algorithmically designed network layer to the model, and
|       until OpenAI exposes the full logits and/or some way to
|       modify them on the fly, you're going to be stuck with
|       open-source models. I've run things against GPT-2 mostly,
|       but it's on my list to try LLaMA.
|
| [1] "Structural Alignment: Modifying Transformers (like
| GPT) to Follow a JSON Schema" @
| https://github.com/newhouseb/clownfish
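|
|       (The basic mechanism is a logits mask during decoding; a
|       simplified sketch against the transformers API, where
|       is_valid_prefix is a stub for the schema check that
|       clownfish actually implements - and a real implementation
|       would precompute a token trie rather than decode the whole
|       vocab each step:)
|
|           import torch
|           from transformers import LogitsProcessor
|
|           class SchemaProcessor(LogitsProcessor):
|               def __init__(self, tokenizer, is_valid_prefix):
|                   self.tok = tokenizer
|                   self.ok = is_valid_prefix  # str -> bool (stub)
|
|               def __call__(self, input_ids, scores):
|                   text = self.tok.decode(input_ids[0])
|                   mask = torch.full_like(scores, float("-inf"))
|                   for t in range(scores.shape[-1]):
|                       if self.ok(text + self.tok.decode([t])):
|                           mask[0, t] = 0.0  # token stays allowed
|                   return scores + mask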
| simonw wrote:
|       This feels solvable to me. I wonder if you could use
|       fine-tuning against LLaMA to teach it to do this better?
|
|       GPT-3 etc. can only do this because they had a LOT of code
|       included in their training sets.
|
|       The LLaMA paper says GitHub was 4.5% of the training
|       corpus, so maybe it does have that stuff baked in and just
|       needs extra tuning or different prompts to tap into that
|       knowledge.
| avereveard wrote:
|         I have done it through stages: the first stage emits
|         natural language in the format "context: ... and
|         question: ...", and then the second stage collects it as
|         JSON, but that doubles the wait time.
| Tepix wrote:
|     Have you tried bigger models? LLaMA-65B can indeed compete
| with GPT-3 according to various benchmarks. The next thing
| would be to get the fine-tuning as good as OpenAI's.
| mewpmewp2 wrote:
|       I wonder how accurate those benchmarks are in terms of
|       actual problem-solving capability. I think there's a major
|       line at which an LLM becomes actually useful - where it
|       feels like you are speaking to something intelligent that
|       can genuinely help your productivity, etc.
| version_five wrote:
| If you have ~$100k to spend, aren't there options to buy a GPU
| rather than just blow it all on cloud? How much is an 8xA100
| machine?
|
| 4xA100 is $75k, 8x is $140k: https://shop.lambdalabs.com/deep-
| learning/servers/hyperplane...
| dekhn wrote:
| you're comparing the capital cost of acquiring a GPU machine
| with the operational cost of renting one in the cloud.
|
| Ignoring the operational costs of on-prem hardware is pretty
| common, but those costs are significant and can greatly change
| the calculation.
| digitallyfree wrote:
| For a single unit one could have it in their home or office,
| rather than a datacenter or colo. If the user sets up and
| manages the machine themselves there is no additional IT
| cost. The greatest operating expense would be the power cost.
| dekhn wrote:
| "If the user sets up and manages the machine themselves
| there is no additional IT cost" << how much do you value
| your time?
|
| In my experience, physical hardware has a management
| overhead over cloud resources. Backups, large disk storage
| for big models, etc.
| pessimizer wrote:
| Or from another perspective, comparing the cost of training
| one model in the cloud to the cost of training as many as you
| want on your machine, then (as mentioned by siblings) selling
| the machine for nearly as much as you paid for it, unless
| there's some shortage, in which case you'll get more back
| than you paid for it.
|
| One is buying capital that produces models, the other is
| buying a single model.
| sounds wrote:
| Remember to discount the tax depreciation for the hardware
| and deduct any potential future gains from either reselling
| it or using it.
| capableweb wrote:
| Heh, you work at AWS or Google Cloud perhaps? ;) (Only joking
| about this as I constantly see employees from AWS/GCloud and
| other cloud providers claim that cloud is always cheaper than
| hosting things yourself)
|
|   Sure, if you're planning to service a large number of users,
|   building your infrastructure in-house might be a bit overkill,
|   as you'll need an infrastructure team to service it as well.
|
|   If you just want to buy 4 GPUs to put in one server to run some
|   training yourself, I don't think it's that much overkill,
|   especially considering you can recover much of the cost even
|   after a year by selling much of the equipment you bought. Most
|   of your losses will be the costs of electricity and an internet
|   connection.
| throwaway50601 wrote:
|     Cloud gives you a very good price for what it offers -
|     excellent reliability and hyper-scalability. Most people
|     don't need either, and use it as a glorified VPS host.
| dekhn wrote:
| I used to work for Google Cloud (I built a predecessor to
| Preemptible VMs and also launched Google Cloud Genomics).
| But even before I worked at Google I was a big fan of AWS
| (EC2 and S3).
|
| Buying and selling hardware isn't free; it comes with its
| own cost. I would not want to be in the position of selling
|     a $100K box of computer equipment - ever.
| capableweb wrote:
| :)
|
| True, but some things are harder to sell than others.
|       A100s in today's market would be easy to sell. Harder to
| buy, because the supply is so low unless you're Google or
| another big name, but if you're trying to sell them, I'm
| sure you can get rid of them quickly.
| jcims wrote:
| No kidding. I worked for a company that had multiple billions
| of dollars invested in a data center refresh in North America
| and Europe.
| version_five wrote:
|     For a server farm, sure; for one machine, I don't know.
|     Assuming it plugs into a normal 15A circuit, and you have a
|     WeWork or something where you don't pay for power, is the
|     operational cost of one machine really material?
| dekhn wrote:
| it's hard to tell from what you're saying: you're planning
| on putting an ML infrastructure training server on a
| regular 15A circuit, not in a data center or machine room?
| And power is paid for by somebody else?
|
| My thinking about pricing doesn't include that option
| because I wouldn't just hook a server like that up to a
| regular outlet in an office and use it for production work.
|       If that works for you - you can happily ignore my comments.
|       But if you go ahead and build such a thing and operate it
|       for a year, please let us know if there were any costs -
|       either dollar or in suffering - associated with your
|       decision.
|
| [edit: adding in that the value of this machine also
| suggests it cannot live unattended in an insecure location,
| like an office]
|
| signed, person who used to build closet clusters at
| universities
| KeplerBoy wrote:
|         Nvidia happily sells what you're describing. They call it
|         the "DGX Station A100"; it has four 80GB A100s and
|         retails for $80k. Not sure I believe their claimed noise
|         level of <37 dB, though.
|
|         Of course, that's still a very small system when talking
|         LLM training; the only reason why I would not put that in
|         a regular office is its extreme price. Do you really want
|         something worth $80k in a form factor that could be
|         casually carried through the door?
| amluto wrote:
| If you live near an inexpensive datacenter, you can park
| it there. Throw in a storage machine or two (TrueNAS MINI
| R looks like a credible low-effort option). If your
| workload is to run a year long computation on it and
| otherwise mostly ignore it, then your operational costs
| will be quite low.
|
| Most people who rent cloud servers are not doing this
| type of workload.
| modernpink wrote:
|   You can sell the A100s once you're done as well. Possibly even
|   at a profit?
| girthbrooks wrote:
| These are wild pieces of hardware, thanks for linking. I wonder
| how loud they get.
| sacred_numbers wrote:
| If you bought an 8xA100 machine for $140k you would have to run
| it continuously for over 10,000 hours (about 14 months) to
| train the 7B model. By that time the value of the A100s you
|   bought would have depreciated substantially, especially because
| cloud companies will be renting/selling A100s at a discount as
| they bring H100s online. It might still be worth it, but it's
| not a home run.
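|
|   (The arithmetic, using the post's ~82,432 A100-hour estimate for
|   the 7B model:)
|
|       hours = 82_432 / 8          # one 8xA100 box: ~10,304 hours
|       months = hours / (24 * 30)  # ~14 months running 24/7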
| inciampati wrote:
| If 8-bit training methods take off, I think the calculus is
| going to change rapidly, with newer cards that have decent
| amounts of memory and 8-bit acceleration starting to become
| dramatically more cost and time effective than the venerable
| A100s.