[HN Gopher] Open Assistant - project meant to give everyone acce...
___________________________________________________________________
 
Open Assistant - project meant to give everyone access to a great
chat based LLM
 
Author : pps
Score  : 595 points
Date   : 2023-02-04 14:56 UTC (8 hours ago)
 
web link (github.com)
w3m dump (github.com)
 
| damascus wrote:
| Is anyone working on an Ender's Game style "Jane" assistant that
| just listens via an earbud and responds? That seems totally
| within the realm of current tech but I haven't seen anything.
 
  | theRealMe wrote:
  | I've been thinking about this and I'd go a step further. I feel
  | that current iterations of digital assistants are too passive.
  | They respond when you directly ask them a specific question.
  | This leaves it up to the user to: 1. Know that an assistant
  | could possibly answer the question. 2. Know how to ask the
  | question. 3. Realize that they should ask the question rather
  | than reaching for google or something.
  | 
  | I would like a digital assistant that not only has the question
  | answering ability of a LLM, but also has the sense of awareness
  | and impetus to suggest helpful things without being asked.
  | This would take a nanny state level of monitoring, but imagine
  | the possibilities. If you had sensors feeding different types
  | of data into the model about your surrounding environment and
  | what specifically you're doing, and then occasionally have an
  | automated process that silently asks the model something like
  | "given all current inputs, what would you suggest I do?" And
  | then if the result achieves a certain threshold of certainty,
  | the digital assistant speaks up and suggests it to you.
  | 
  | I'm sure tons of people are cringing at the thought of the
  | surveillance needed for this and the trust you'd effectively
  | have to put into BigCorp that owns the setup, but it's fun to
  | think about nonetheless.
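  | 
  | A rough sketch of that background loop (every function here is
  | a hypothetical placeholder, not a real API):
  | 
  |     import time
  | 
  |     def gather_sensor_inputs():
  |         # Hypothetical: audio transcript, location, calendar...
  |         return {"location": "kitchen", "time": time.ctime()}
  | 
  |     def ask_model(prompt):
  |         # Hypothetical LLM call; returns (suggestion, confidence)
  |         return ("Your meeting starts in 10 minutes.", 0.9)
  | 
  |     def speak(text):
  |         # Hypothetical TTS output to an earbud
  |         print(text)
  | 
  |     THRESHOLD = 0.8
  | 
  |     while True:
  |         context = gather_sensor_inputs()
  |         suggestion, confidence = ask_model(
  |             f"Given all current inputs: {context}\n"
  |             "What would you suggest I do?")
  |         if confidence >= THRESHOLD:
  |             speak(suggestion)
  |         time.sleep(60)  # silently poll once a minute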
 
    | monkeydust wrote:
    | Bizarre, had same thought today.
    | 
    | My conclusion was that the assistant needs to know or learn my
    | intentions.
    | 
    | From that it can actually pre-empt questions I might ask and
    | already be making decisions on the answers.
    | 
    | Now what would that do to our productivity!
 
    | mab122 wrote:
    | This but with models running on my infra that I own.
    | 
    | Basically take this:
    | https://www.meta.com/pl/en/glasses/products/ray-ban-
    | stories/... And feed data from that to multiple models (for
    | face recognition, other vision, audio STT, music recognition,
    | probably a lot of other stuff with easily recognizable audio
    | patterns, etc.)
    | 
    | combine with my personal data (like contacts, emails, chats,
    | notes, photos I take) and feed to assistant to prepare a
    | combined reply to my questions or summarize what it knows
    | about my current environment.
    | 
    | Also I would gladly take those glasses just to take note
    | photos (photos with audio note) right now - shut up and take
    | my money. Really if they were hackable or at least intercept-
    | able on my phone I would take them.
 
    | unshavedyak wrote:
    | Oh man, if i could run this on my own network with no
    | internet access i'd do it in a heartbeat.
    | 
    | It would also make so many things easier for the AI too. Ie
    | if it's listening to the conversation and you ask "Thoughts,
    | AIAssistant?" and it can infer enough from the previous
    | conversation to answer this type of question.. so cool.
    | 
    | But yea i definitely want it closed network. A device sitting
    | in my closet, a firewalled internet connection only allowing
    | it to talk to my earbud, etc. Super paranoia, since its job
    | is to monitor everything, all the time.
 
      | concordDance wrote:
      | Then the police come and confiscate the device for
      | evidentiary reasons, finding you have committed some sort
      | of crime (most people have).
 
        | s3p wrote:
        | Ah yes, I forget that most people get raided by the police
        | on a regular basis, so anything on-prem has to be out of
        | the question (/s)
 
        | barbazoo wrote:
        | Surely there'd be ways to make sure the data isn't
        | accessible.
 
        | unshavedyak wrote:
        | Well it's in your control and FOSS - ideally you're not
        | keeping a full log of everything unless you want that.
 
        | medstrom wrote:
        | Without a full log of everything, it cannot give context-
        | aware advice tailored to you (i.e. useful advice). It'd
        | be like relying on the advice of a random person on the
        | street instead of someone who knows you.
 
        | gremlinsinc wrote:
        | It could encrypt everything and have a kill switch to
        | permanently erase the encryption key.
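        | 
        | Mechanically that's easy, e.g. with the cryptography
        | package (a sketch; safe key handling is the hard part):
        | 
        |     from cryptography.fernet import Fernet
        | 
        |     key = Fernet.generate_key()  # keep only in memory
        |     box = Fernet(key)
        |     log = box.encrypt(b"assistant transcript ...")
        | 
        |     # Kill switch: with the key gone, the encrypted
        |     # log is unrecoverable.
        |     del key, box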
 
  | digitallyfree wrote:
  | Don't have the link on me but I remember reading a blog post
  | where someone set up ChatGPT with a STT and TTS system to
  | converse with the bot using a headset.
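  | 
  | The loop itself is only a few lines; a minimal sketch using the
  | SpeechRecognition and pyttsx3 packages, with the OpenAI
  | completion API (older openai package) standing in for ChatGPT:
  | 
  |     import speech_recognition as sr
  |     import pyttsx3
  |     import openai
  | 
  |     openai.api_key = "YOUR_KEY"   # assumption: your own key
  |     recognizer = sr.Recognizer()
  |     tts = pyttsx3.init()
  | 
  |     while True:
  |         with sr.Microphone() as source:   # headset mic
  |             audio = recognizer.listen(source)
  |         try:
  |             question = recognizer.recognize_google(audio)
  |         except sr.UnknownValueError:
  |             continue
  |         reply = openai.Completion.create(
  |             model="text-davinci-003",
  |             prompt=question,
  |             max_tokens=200).choices[0].text
  |         tts.say(reply)            # speak into the headset
  |         tts.runAndWait()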
 
    | xtracto wrote:
    | The open source Talk to Chat GPT extension works remarkably
    | well, and its source is on Github
    | 
    | https://chrome.google.com/webstore/detail/talk-to-
    | chatgpt/ho...
 
  | Jeff_Brown wrote:
  | +1! That scene in Her in the opening where the guy is walking
  | down the hall and going through his email, "skip that,
  | unsubscribe from them, tell so and so I can get that by
| tomorrow..." without having to look at a screen has been a
| dream of mine ever since I saw it.
 
    | LesZedCB wrote:
    | Have you watched it recently? I haven't seen it since it came
    | out, I think I'm gonna watch it again this afternoon and see
    | how differently it hits
 
      | Jeff_Brown wrote:
      | No. I watched it twice in one day and haven't come back to
      | it since.
 
  | alsobrsp wrote:
  | I want this. I'd be happy with an earbud but I really want an
  | embedded AI that can see and hear what I do and can project
  | things into my optic and auditory nerves.
 
  | e-_pusher wrote:
  | Rumor has it that Humane will release a Her style earbud soon.
  | https://hu.ma.ne/
 
| consumer451 wrote:
| I was very excited about Stable Diffusion, and I still am. A
| great yet relatively harmless contribution.
| 
| LLMs however, not so much. The avenues of misuse are just too
| great.
| 
| I started this whole thing somewhat railing against the un-
| openness of OpenAI. But once I began using ChatGPT, I realized
| that having centralized control of a tool like this in the hands
| of reasonable people is not the worst possible outcome for
| civilization.
| 
| While I support FOSS in most realms, in some I do not. Reality
| has taught me to stop being rigidly religious about these things.
| Just because something is freely available does not magically
| make it "good."
| 
| In the interest of curiosity and discussion, can someone give me
| some actual real-world examples of what a FOSS ChatGPT will
| enable that OpenAI's tool will not? And, please be specific, not
| just "no censorship." Please give examples of that censorship.
 
  | sterlind wrote:
  | _> In the interest of curiosity and discussion, can someone
  | give me some actual real-world examples of what a FOSS ChatGPT
  | will enable that OpenAI 's tool will not?_
  | 
  | Smut. I've been trying to use ChatGPT to write erotica, but
  | OpenAI has made it downright puritanical. Any conversations
  | involving kink trip its guardrails unless I bypass them.
  | 
  | Writing fiction that involves bad guys - arsonists, serial
  | killers, etc. You need to ask how to hide a body if you're
  | writing a murder mystery.
  | 
  | Those are just some examples from my recent work.
 
    | consumer451 wrote:
    | Thanks, that's a good example. On balance though, would I be
    | in favor of ML auto-smut if it meant that more people would
    | fall for misinformation in the form of propaganda and
    | financial scams? No, that does not seem like a reasonable
    | trade off to me.
    | 
    | But you may be interested in this jailbreak while it lasts. I
    | have gotten it to write all kinds of fun things. You will
    | have to rework the jailbreak in the first comment, but I bet
    | it works.
    | 
    | https://news.ycombinator.com/item?id=34642091
 
  | leaving wrote:
  | It genuinely astonishes me that you think that "centralized
  | control" of anything can be beneficial to the human species or
  | the world in general.
  | 
  | Centralized control hasn't stopped us from killing off half the
  | animal species in fifty years, wiping out most of the insects,
  | or turning the oceans into a trash heap.
  | 
  | In fact, centralized control is the author of our destruction.
  | We are all dead people walking.
  | 
  | Why not try "individualized intelligence" as an alternative?
  | Give truly good-quality universal education and encouragement
  | of individual curiosity and independent thought a try?
  | 
  | It can't be worse.
 
    | f6v wrote:
    | > Centralized control hasn't stopped
    | 
    | Because there wasn't any.
 
    | consumer451 wrote:
    | > It genuinely astonishes me that you think that "centralized
    | control" of anything can be beneficial to the human species or
    | the world in general.
    | 
    | I am genuinely astonished that in the face of obvious
    | examples such as nuclear weapons, people cannot see the
    | opposite in _some_ cases.
    | 
    | > It can't be worse.
    | 
    | It can always be worse.
    | 
    | Would a theoretical FOSS small yield nuclear weapon make the
    | world a better place?
    | 
    | How about a FOSS powered sub-$10k hardware budget CRISPR
    | virus lab? Well, it's FOSS, so it must be good?
 
      | mandmandam wrote:
      | > I am genuinely astonished that in the face of obvious
      | examples such as nuclear weapons, people cannot see the
      | opposite in some cases.
      | 
      | You seem to be making some large logical leaps, and jumping
      | to invalid conclusions.
      | 
      | Try to imagine a way of exerting regulation over virus
      | research and weaponry that wouldn't be "centralized
      | control". If you can't, that's a failure of imagination,
      | not of decentralization.
 
        | consumer451 wrote:
        | > Try to imagine a way of exerting regulation over virus
        | research and weaponry that wouldn't be "centralized
        | control".
        | 
        | Since apparently my own imagination is too limited, could
        | you please give me some examples of how this would be
        | accomplished?
 
        | mandmandam wrote:
        | Trustless and decentralized systems are a hot topic. Have
        | you read much in the field, to be so certain that
        | centralization is the only way forward?
        | 
        | There are options you haven't considered, whether you can
        | imagine them or not.
 
        | consumer451 wrote:
        | > Trustless and decentralized systems are a hot topic.
        | 
        | Yeah, and how's that working out exactly? Is there any
        | decentralized governance project which also has anything
        | to do with law irl? I know what a DAO is, and it sounds
        | pretty neat, in theory. There are all kinds of
        | theoretical pie in the sky ideas which sound great and
        | have yet to impact anything in reality.
        | 
        | Before we give the keys to nukes and bioweapons over to a
        | "decentralized authority," maybe we should see some
        | examples of it working outside of the coin-go-up world?
        | Heck, how about some examples of it working even in the
        | coin-go-up world?
        | 
        | Even pro-decentralized crypto folks see the downsides of
        | DAOs, such as slower decision making.
 
      | yazzku wrote:
      | Microsoft is not "reasonable people". Having this behind
      | closed corporate walls is the worst possible outcome.
      | 
      | The nuclear example isn't really a counter-argument. If
      | only one nation had access to them, every other nation
      | would automatically be subjugated to them. If the nuclear
      | balance works, it's because multiple super powers have
      | access to those weapons and international treaties regulate
      | their use (as much as North Korea likes to demo practice
      | rounds on state TV.) Also the technology isn't secret; it's
      | access to resources and again, international treaties, that
      | prevent its proliferation.
      | 
      | Same thing with CRISPR. Again, there are scientific
      | standards that regulate its use. It being open or not
      | doesn't really matter to its proliferation.
      | 
      | I agree there are cases where being open is not necessarily
      | the best strategy. I don't think your examples are
      | particularly good, though.
 
        | consumer451 wrote:
        | I think we may have very different definitions of the
        | word reasonable.
        | 
        | I mean it in the classic sense.[0]
        | 
        | Do I love corporate hegemony? Heck no.
        | 
        | Could there be less reasonable stewards of extremely
        | powerful tools? Heck yes.
        | 
        | An example might be a group of people who are so blinded
        | by ideology that they would work to create tools which
        | 100x the work of grifters and propagandists, and then
        | say... hey, not my problem, I was just following my pure
        | ideology bro.
        | 
        | A basic example of being reasonable might be revoking
        | access to someone running a paypal scam syndicate which
        | sends countless custom tailored and unique emails to
        | paypal users. How would Open Assistant deal with this
        | issue?
        | 
        | [0]                 1. having sound judgement; fair and
        | sensible.         based on good sense.            2. as
        | much as is appropriate or fair; moderate.
 
        | yazzku wrote:
        | > and then say... hey, not my problem, I was just
        | following my pure ideology bro.
        | 
        | That's basically the definition of Google and Facebook,
        | which go about their business taking no responsibility
        | for the damage they cause. As for Microsoft, 'fair' and
        | 'moderate' are not exactly their brand either considering
        | their history of failed and successful attempts to
        | brutally squash competition. If you're saying that they'd
        | be fair in censoring the "right" content, then you're
        | just saying you share their bias.
        | 
        | > A basic example of being reasonable might be revoking
        | access to someone running a paypal scam syndicate which
        | sends countless custom tailored and unique emails to
        | paypal users. How would Open Assistant deal with this
        | issue?
        | 
        | I'm not exactly sure how Open Assistant would deal, or if
        | it even needs to deal, with this. You'd send the cops and
        | send those motherfuckers back to the hellhole that
        | spawned them. Scams are illegal regardless of what tools
        | you use to go about it. If it's not Open Assistant, the
        | scammers will find something else.
        | 
        | Your argument is basically that we should ban/moderate
        | the proliferation of tools and technology. I'm not sure
        | that's very effective when it comes to software. I think
        | the better strategy is to develop the open alternative
        | fast before society is subjugated to the corporate
        | version, even if it does give the scammers a slight edge
        | in the short term. If you wait for the law to catch up
        | and regulate these companies, it's going to take another
        | 20 years like the GDPR.
 
        | consumer451 wrote:
        | > Your argument is basically that we should ban/moderate
        | the proliferation of tools and technology. I'm not sure
        | that's very effective when it comes to software.
        | 
        | No, my argument is that we as individuals shouldn't be in
        | a rush to create free and open tools which _will_ be used
        | for evil, in addition to their beneficial use cases.
        | 
        | FOSS often takes a lot of individual contributions.
        | People should be really thoughtful about these things now
        | that the implications of their contributions will have
        | much more direct and dire effects on our civilization.
        | This is not PDFjs or Audacity that we are talking about.
        | The stakes are much higher now. Are people really
        | thinking this through?
        | 
        | If anything, it would be great if we as individuals acted
        | responsibly to avoid major shit shows and the aftermath of
        | gov regulation.
 
        | yazzku wrote:
        | Ok, yeah, maybe I'll take my latter statement back.
        | Ideally things are developed at the pace you describe and
        | under the scrutiny of society. There are people thinking
        | this through -- EDRI and a bunch of other organizations
        | -- just probably not corporations like Microsoft. In
        | practice, though, we are likely to see corporations roll
        | out chat-based incarnations of search engines and
        | assistants, followed by an ethical shit show, followed by
        | mild regulation 20 years later.
 
      | sterlind wrote:
      | Nuclear weapons are just evil. It'd be better if they
      | didn't exist rather than if they were centralized. We've
      | gotten so close to WWIII.
      | 
      | As for the CRISPR virus lab, at least the technology being
      | open implies that vaccine development would be democratized
      | as well. Not ideal but.. yeah.
 
  | visarga wrote:
  | > Just because something is freely available does not magically
  | make it "good."
  | 
  | Just because you don't like it doesn't mean an open source
  | chatGPT will not appear. It doesn't need everyone's permission
  | to exist. Once we had accumulated internet-scale datasets and
  | gigantic supercomputers, GPT-3s immediately started to pop up.
  | It was inevitable. It's an evolutionary process and we won't be
  | able to control it at will.
  | 
  | Probably the same process happens in every human who gains
  | language faculty and a bit of experience. It's how language
  | "inhabits" humans, carrying with it the work of previous
  | generations. Now language can inhabit AIs as well, and the
  | result is shocking. It's like our own mind staring back at us.
  | 
  | But it is just natural evolution for language. It found an even
  | more efficient replication device. Now it can contain and
  | replicate the whole culture at once, instead of one human life
  | at a time. By "language" I mean language itself, concepts,
  | methods, science, art, culture and technology, and everything I
  | forgot - the whole "corpus" of human experience recorded in
  | text and media.
 
    | consumer451 wrote:
    | > It doesn't need everyone's permission to exist.
    | 
    | Nope it does not. It does need a lot of people's help though
    | and there may be enough out there to do the job in this case.
    | 
    | Even though I knew this would be a highly unpopular opinion
    | in this thread, I still posted it. Freedom of speech, right?
    | 
    | The reason I posted it was to maybe give some pause to some
    | people, so that they have a moment to consider the
    | implications. I realize this is likely futile but this is a
    | hill I am willing to die on. That hill being FOSS is not an
    | escape from responsibility and consequences.
    | 
    | I bet this leads to major regulation, which will suck.
 
      | pixl97 wrote:
      | First, this is a moderated forum; you have no freedom of
      | speech here, and neither do I.
      | 
      | Next, regulation solves nothing here, and my guess is it
      | will make the problems far worse. Why? Let's take nuclear
      | weapons. They are insanely powerful, but they are highly
      | regulated because there are a few choke points mostly in
      | uranium refinement that make monitoring pretty easy at a
      | global scale. The problem with regulating things like GPT
      | is computation looks like computation. It's not sending
      | high energy particles out into space where they can be
      | monitored. Every government on the planet can easily and
      | cheaply (compared to nukes) generate their own GPT models
      | and propaganda weapons and the same goes for multinational
      | corporations. Many countries in the EU may agree to
      | regulate these things, but your dominant countries vying
      | for superpower status aren't going to let their competitors
      | one up each other by shutting down research into different
      | forms of AI.
      | 
      | I don't think of this as a hill we are going to die on, but
      | instead a hill we may be killed on by our own creations.
 
| xrd wrote:
| It sounds like you can train this assistant on your own corpus of
| data. Am I right? What are the hardware and time requirements for
| that? The readme sounds a bit futuristic, has anyone actually
| used this, or is this just the vision of what's to come?
 
  | chriskanan wrote:
  | The current effort is to get the data required to train a
  | system and they have created all the needed tools to get that
  | data. Then, based on my understanding, they intend to release
  | the dataset and to release pre-trained models that could run on
  | commodity hardware, similar to what was done with Stable
  | Diffusion.
 
  | simonw wrote:
  | Somewhat unintuitively, it looks like training a language model
  | on your own data usually doesn't do what people think it will
  | do.
  | 
  | The usual desire is to be able to ask questions of your own
  | data - and it would seem obvious that the way to do that would
  | be to fine tune train an existing model with that extra
  | information.
  | 
  | There's actually an easier (and potentially more effective?)
  | way of achieving this: first run a search against your own data
  | to find relevant information, then glue that together into a
  | prompt along with the user's question and feed that to an
  | existing language model.
  | 
  | I wrote about one way of building that here:
  | https://simonwillison.net/2023/Jan/13/semantic-search-answer...
  | 
  | Open Assistant will hopefully result in a language model we can
  | run on our own hardware (though it may be a few years before
  | it's feasible to do that affordably - language models are much
  | heavier than image models like Stable Diffusion). Such a model
  | could then form part of this approach, even without being
  | trained on our own custom data.
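  | 
  | A stripped-down sketch of that search-then-prompt pattern, using
  | sentence-transformers for the search step and a hypothetical
  | complete() call for the language model:
  | 
  |     from sentence_transformers import SentenceTransformer, util
  | 
  |     embedder = SentenceTransformer("all-MiniLM-L6-v2")
  | 
  |     # Your own documents, pre-split into short passages
  |     passages = ["Our refund policy lasts 30 days...",
  |                 "Support is available on weekdays..."]
  |     passage_emb = embedder.encode(passages, convert_to_tensor=True)
  | 
  |     def answer(question, top_k=3):
  |         q_emb = embedder.encode(question, convert_to_tensor=True)
  |         hits = util.semantic_search(q_emb, passage_emb,
  |                                     top_k=top_k)[0]
  |         context = "\n".join(passages[h["corpus_id"]] for h in hits)
  |         prompt = (f"Answer using only this context:\n{context}\n\n"
  |                   f"Question: {question}\nAnswer:")
  |         return complete(prompt)  # hypothetical LLM call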
 
| pxoe wrote:
| that same laion that scraped the web for images, ignored their
| licenses and copyrights, and thought that'd do just fine? the one
| that chose to not implement systems that would detect licenses,
| and to not have license fields in their datasets? the one that
| knowingly points to copyrighted works in their datasets, yet also
| pretends like they're not doing anything at all? that same group?
| 
| really trustworthy.
 
  | seydor wrote:
  | The alternatives are what? The company that scrapes the web for
  | a living, or the one that scrapes GitHub for a living?
 
    | pxoe wrote:
    | you're forgetting one important alternative: to just not use
    | and/or not do something. nobody asked them to scrape
    | anything. nobody asked them to scrape copyrighted works. they
    | could've just not done the shady thing, but they made that
    | choice to do it, all by themselves. and one can just avoid
    | using something with questionable data ethics and practices.
    | 
    | their actions clearly show that they think they can do
    | anything with any data that's out there, and put it all out.
    | why anyone would entrust them or their systems with their own
    | data to 'assist' with, I don't really get.
    | 
    | and even though it's an 'open source' project, that part may
    | be just soliciting people to do work for them, to help them
    | enable their own data collection. it's gonna run somewhere,
    | after all. in the cloud, with monetized compute, just like
    | any other AI project out there.
 
      | seydor wrote:
      | It would be interesting to extend this criticism to the
      | entire tech ecosystem, which has been built on unsolicited
      | scraping and extends to many of the companies funding the
      | company that hosts this very forum. We'd grind to a
      | complete halt.
      | 
      | Considering the benefit of a model that can be downloaded,
      | and hopefully run on-premise one day, i don't care too much
      | about their copyright practices being imperfect, especially
      | in this industry
 
      | pixl97 wrote:
      | I personally see your view on this as a complete and total
      | failure to understand how humans and society/culture
      | actually work.
      | 
      | Your mind exists in a state where it is constantly
      | 'scraping' copyrighted work. Now, in general limitations of
      | the human mind keep you from accurately reproducing that
      | work, but if I were able to look at your output as an
      | omniscient being it is likely I could slam you with
      | violation after violation where you took stylization ideas
      | off of copyrighted work.
      | 
      | RMS covers this rather well in 'The right to read'. Pretty
      | much any model that puts hard ownership rules on ideas and
      | styles leads to total ownership by a few large monied
      | entities. It's much easier for Google to pay some artist
      | for their data that goes into an AI model. Because the
      | 'google ai' model is now more culturally complete than
      | other models that cannot see this data Google entrenches a
      | stronger monopoly in the market, hence generating more
      | money in which to outright buy ideas to further monopolize
      | the market.
 
      | riskpreneurship wrote:
      | You can only keep a genie bottled up for so long, and if
      | you don't rub the lamp, your adversaries will.
      | 
      | With something as potentially destabilizing as AGI,
      | realpolitik will convince individual nations to put aside
      | concerns like IP and copyright out of FOMO.
      | 
      | The same thing happened with nuclear bombs: it's much
      | easier to be South Africa choosing to dispose of them if
      | you end up not needing them, than to be North Korea or Iran
      | trying to join the club late.
      | 
      | The real problem is that the gains from any successes will
      | be hoarded by the people who acquired them by breaking the
      | law.
 
  | losvedir wrote:
  | Yes, and it's up to each of us to decide how we feel about
  | that. I personally don't think I have a problem with it, but
  | then I've always been somewhat opposed to software patents and
  | other IP protections.
  | 
  | I mean, the whole _reason_ we have those laws is the belief
  | that it encourages innovation. I can believe it does to some
  | extent, but on the other hand, all these AI models are pretty
  | innovative, too, so the opportunity cost of not allowing it is
  | pretty high.
  | 
  | I don't think it's a given that slurping up IP like this is
  | ethically or pragmatically wrong.
 
| chriskanan wrote:
| I'm really excited about this project and I think it could be
| really disruptive. It is organized by LAION, the same folks who
| curated the dataset used to train Stable Diffusion.
| 
| My understanding of the plan is to fine-tune an existing large
| language model, trained with self-supervised learning on a very
| large corpus of data, using reinforcement learning from human
| feedback, which is the same method used in ChatGPT. Once the
| dataset they are creating is available, though, perhaps better
| methods can be rapidly developed as it will democratize the
| ability to do basic research in this space. I'm curious regarding
| how much more limited the systems they are planning to build will
| be compared to ChatGPT, since they are planning to make models
| with far fewer parameters to deploy them on much more modest
| hardware than ChatGPT.
| 
| As an AI researcher in academia, it is frustrating to be blocked
| from doing a lot of research in this space due to computational
| constraints and a lack of the required data. I'm teaching a class
| this semester on self-supervised and generative AI methods, and
| it will be fun to let students play around with this in the
| future.
| 
| Here is a video about the Open Assistant effort:
| https://www.youtube.com/watch?v=64Izfm24FKA
 
  | naasking wrote:
  | > it is frustrating to be blocked from doing a lot of research
  | in this space due to computational
  | 
  | Do we need a SETI@home-like project to distribute the training
  | computation across many volunteers so we can all benefit from
  | the trained model?
 
    | ikekkdcjkfke wrote:
    | Yeah man, and you'd get access to the model as payment for
    | donating cycles
 
      | realce wrote:
      | Hyperion
 
    | andai wrote:
    | I read about something a few weeks ago which does just this!
    | Does anyone know what it's called?
 
      | lucidrains wrote:
      | you are probably thinking of
      | https://arxiv.org/abs/2207.03481
      | 
      | for inference, there is https://github.com/bigscience-
      | workshop/petals
      | 
      | however, both are only in the research phase. start
      | tinkering!
 
    | VadimPR wrote:
    | That already exists - https://github.com/bigscience-
    | workshop/petals
 
    | ec109685 wrote:
    | Another idea is to dedicate cpu cycles to something else that
    | is easier to distribute, and then use the proceeds for
    | massive amounts of gpu for academic use.
    | 
    | Crypto is an example.
 
      | slim wrote:
      | this would be very wasteful
 
        | ec109685 wrote:
        | So is trying to distribute training across nodes compared
        | to what can be done inside a data center.
 
      | jxf wrote:
      | This creates indirection costs and counterparty risks that
      | don't appear in the original solution.
 
        | ec109685 wrote:
        | There is also indirection cost by taking something that
        | is optimized to run on GPU's within the data center and
        | distributing that to individual PCs.
 
    | 8f2ab37a-ed6c wrote:
    | That's brilliant, I would love to spare compute cycles and
    | network on my devices for this if there's an open source LLM
    | on the other side that I can use in my own projects, or
    | commercially.
    | 
    | Doesn't feel like there's much competition for ChatGPT at
    | this point otherwise, which can't be good.
 
      | davely wrote:
      | On the generative image side of the equation, you can do
      | the same thing with Stable Diffusion[1], thanks to a handy
      | open source distributed computing project called Stable
      | Horde[2].
      | 
      | LAION has started using Stable Horde for aesthetics
      | training to back feed into and improve their datasets for
      | future models[3].
      | 
      | I think one can foresee the same thing eventually happening
      | with LLMs.
      | 
      | Full disclosure: I made ArtBot, which is referenced in both
      | the PC World article and the LAION blog post.
      | 
      | [1] https://www.pcworld.com/article/1431633/meet-stable-
      | horde-th...
      | 
      | [2] https://stablehorde.net/
      | 
      | [3] https://laion.ai/blog/laion-stable-horde/
 
    | zone411 wrote:
    | Long story short, training requires intensive device-to-
    | device communication. Distributed training is possible in
    | theory but so inefficient that it's not worth it. Here is a
    | new paper that looks to be the most promising approach yet:
    | https://arxiv.org/abs/2301.11913
 
      | sillysaurusx wrote:
      | It doesn't, actually. The model weights can be periodically
      | averaged with each other. No need for synchronous gradient
      | broadcasts.
      | 
      | Why people aren't doing this has always been a mystery to
      | me. It works.
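      | 
      | Roughly (a toy PyTorch sketch for two replicas; assumes
      | plain float weights, not how the big labs shard things):
      | 
      |     import torch
      | 
      |     def average_two(model_a, model_b):
      |         # Element-wise mean of the two replicas' weights
      |         sd_a = model_a.state_dict()
      |         sd_b = model_b.state_dict()
      |         avg = {k: (sd_a[k] + sd_b[k]) / 2 for k in sd_a}
      |         model_a.load_state_dict(avg)
      |         model_b.load_state_dict(avg)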
 
      | nylonstrung wrote:
      | Would have to be federated learning to work I think
 
  | SillyUsername wrote:
  | Unfortunately that guy is too distracting for me to watch -
  | he's like a bad 90s Terminator knock off and always in your
  | face waving hands :(
 
    | coolspot wrote:
    | While Yannic is also German, he is actually much better than
    | 90s Terminator:
    | 
    | * he doesn't want to steal your motorcycle
    | 
    | * he doesn't care for your leather jacket either
    | 
    | * he is not trying to kill yo mama
 
  | lucidrains wrote:
  | Yannic and the community he has built is such an educational
  | force of good. His youtube videos explaining papers have helped
  | me and so many others as well. Thank you Yannic for all that
  | you do!
 
    | wcoenen wrote:
    | > _force of good_
    | 
    | I think he cares more about freedom than "good". Many people
    | were not happy about his "GPT-4chan" project.
    | 
    | (I'm not judging.)
 
      | zarzavat wrote:
      | I don't think those people legitimately cared about the
      | welfare of 4chan users who were experimented on. They just
      | perceived the project to be bad optics that might threaten
      | the AI gravy train.
 
  | modinfo wrote:
  | [flagged]
 
  | RobotToaster wrote:
  | > It is organized by LAION, the same folks who curated the
  | dataset used to train Stable Diffusion.
  | 
  | I'm guessing, like stable diffusion, it won't be under an open
  | source licence then? (The stable diffusion licence
  | discriminates against fields on endeavour)
 
    | ShamelessC wrote:
    | You are confusing LAION with Stability.ai. They share some
    | researchers but the former is a completely transparent and
    | open effort which you are free to join and criticize this
    | very moment. The latter is a VC backed effort which does
    | indeed have some of the issues you mention.
    | 
    | Good guess though...
 
    | jszymborski wrote:
    | The LICENSE file in the linked repo says it's under the
    | Apache license.
 
      | yazzku wrote:
      | Does this mean that contributions of data, labelling, etc.
      | remain open?
      | 
      | I'm hesitant to spend a single second on these things
      | unless they are truly open.
 
        | grealy wrote:
        | Yes. The intent is definitely to have the data be as open
        | as possible. And Apache v2.0 is currently where it will
        | stay. This project prefers the simplicity of Apache v2.0
        | and does not care for the RAIL licenses.
 
    | [deleted]
 
| 88stacks wrote:
| This is wonderful, no doubt about it, but the bigger problem is
| making this usable on commodity hardware. Stable Diffusion
| only needs ~4 GB of VRAM to run inference, but all of these large
| language models are too large to run on commodity hardware. Bloom
| from huggingface is already out and no one is able to use it. If
| chatgpt was given to the open source community, we couldn't even
| run it...
 
  | Tepix wrote:
  | Some people will have the necessary hardware, others will be
  | able to run it in the cloud.
  | 
  | I'm curious myself how they will get these LLMs to work on
  | consumer hardware. Is FP8 the way to get them small?
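  | 
  | 8-bit loading already works today with transformers plus
  | bitsandbytes/accelerate; a sketch (the model name is just an
  | example and the exact flags may change):
  | 
  |     from transformers import AutoModelForCausalLM, AutoTokenizer
  | 
  |     name = "bigscience/bloom-7b1"
  |     tok = AutoTokenizer.from_pretrained(name)
  |     model = AutoModelForCausalLM.from_pretrained(
  |         name,
  |         device_map="auto",   # spread layers across GPUs
  |         load_in_8bit=True)   # int8 weights via bitsandbytes
  | 
  |     inputs = tok("The assistant replied:",
  |                  return_tensors="pt").to(0)
  |     out = model.generate(**inputs, max_new_tokens=40)
  |     print(tok.decode(out[0]))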
 
  | zamalek wrote:
  | And there's a 99% chance it will only work on NVIDIA hardware,
  | so even fewer still.
 
  | visarga wrote:
  | > Bloom from huggingface is already out and no one is able to
  | use it.
  | 
  | This RLHF dataset that is being collected by Open Assistant is
  | just the kind of data that will turn a rebel LLM into a helpful
  | assistant. But it's still huge and expensive to use.
 
| karpierz wrote:
| I've been excited about the notion of this for a while, but it's
| unclear to me how this would succeed where numerous well-
| resourced companies have failed.
| 
| Are there some advantages that Open Assistant has that
| Google/Amazon/Apple lack that would allow them to succeed?
 
  | mattalex wrote:
  | Instruction tuning mostly relies on the quality of the data you
  | put into the model. This makes it different from traditional
  | language model training: essentially you take one of these
  | existing hugely expensive models (there are lots of them
  | already out there), and tune them specifically on high quality
  | data.
  | 
  | This can be done on a comparatively small scale, since you
  | don't need to train trillions of words, but only train on the
  | smaller high quality data (even openai didn't have a lot of
  | that).
  | 
  | In fact, if you look at the original paper
  | https://arxiv.org/pdf/2203.02155.pdf Figure 1, you can see that
  | even small models already significantly beat the current SOTA.
  | 
  | Open source projects often have trouble securing the HW
  | resources, but the "social" resources for producing a large
  | dataset are much easier to manage in OSS projects. In fact, the
  | data the OSS project collects might just be better, since it
  | doesn't have to rely on paying a handful of minimum-wage workers
  | to produce thousands of examples.
  | 
  | One of the main objectives is to reduce the bias introduced by
  | openai's screening and selection process, which is doable since
  | many more people work on generating the data.
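  | 
  | For the supervised part, a bare-bones sketch of tuning a small
  | existing causal LM on prompt/response pairs with the HuggingFace
  | Trainer (model choice and hyperparameters are placeholders):
  | 
  |     import torch
  |     from transformers import (AutoModelForCausalLM, AutoTokenizer,
  |                               Trainer, TrainingArguments)
  | 
  |     name = "EleutherAI/gpt-neo-125M"
  |     tok = AutoTokenizer.from_pretrained(name)
  |     model = AutoModelForCausalLM.from_pretrained(name)
  | 
  |     pairs = [("Summarize: the cat sat on the mat.",
  |               "A cat sat on a mat.")]  # stand-in for curated data
  | 
  |     class SFTData(torch.utils.data.Dataset):
  |         def __len__(self):
  |             return len(pairs)
  |         def __getitem__(self, i):
  |             prompt, answer = pairs[i]
  |             ids = tok(prompt + "\n" + answer + tok.eos_token,
  |                       truncation=True, max_length=512,
  |                       return_tensors="pt").input_ids[0]
  |             # Causal LM: labels are the same token ids
  |             return {"input_ids": ids, "labels": ids.clone()}
  | 
  |     Trainer(model=model,
  |             args=TrainingArguments("sft-out", num_train_epochs=1,
  |                                    per_device_train_batch_size=1),
  |             train_dataset=SFTData()).train()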
 
  | version_five wrote:
  | Google is at the mercy of advertisers, all three are profit
  | driven and risk averse. There is no reason they couldn't do the
  | same as LAION, it just doesn't align with their organizational
  | incentives
 
| unshavedyak wrote:
| re: running on your own hardware.. How?
| 
| I know very little about ML, but i had assumed the reason models
| ran on GPUs typically(?) was because of the heavy compute needed
| over large sets of in memory data.
| 
| Moving it to something cheaper ala general CPU and RAM/Drive
| would make it prohibitively slow in the standard methodology.
| 
| How would we be able to change this to run on users standard
| hardware? Presuming standard hardware is cheaper, why isn't
| ChatGPT also running on this cheaper hardware?
| 
| Are there significant downsides to using lesser hardware? Or is
| this some novel approach?
| 
| Super curious!
 
  | lairv wrote:
  | The goal is not (yet?) to be able to run those models on most
  | consumer devices (mobile, old laptops etc.), but at least to
  | self-host the model on a high-end consumer GPU, which is not
  | possible right now. For now you need multiple specialized GPUs
  | like nvidia V100/A100 with a high amount of VRAM; getting such
  | models to run on a single rtx40*/rtx30* would already be an
  | achievement.
 
| txtai wrote:
| Great looking project here. Absolutely need a local/FOSS option.
| There's been a number of open-source libraries for LLMs lately
| that simply call into paid/closed models via APIs. Not exactly
| the spirit of open-source.
| 
| There's already great local/FOSS options such as FLAN-T5
| (https://huggingface.co/google/flan-t5-base). Would be great to
| see a local model like that trained specifically for chat.
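| 
| For anyone who wants to kick the tires, flan-t5-base already runs
| locally in a few lines (the prompt wording is just an example):
| 
|     from transformers import (AutoTokenizer,
|                               AutoModelForSeq2SeqLM)
| 
|     name = "google/flan-t5-base"
|     tok = AutoTokenizer.from_pretrained(name)
|     model = AutoModelForSeq2SeqLM.from_pretrained(name)
| 
|     prompt = "Answer the question: why is the sky blue?"
|     ids = tok(prompt, return_tensors="pt").input_ids
|     out = model.generate(ids, max_new_tokens=64)
|     print(tok.decode(out[0], skip_special_tokens=True))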
 
  | mdaniel wrote:
  | I tried to find the source for https://github.com/LAION-
  | AI/Open-Assistant/blob/v0.0.1-beta2... but based on the image
  | inspector  it seems to match up with
  | https://github.com/huggingface/text-generation-inference/blo...
 
| O__________O wrote:
| TLDR: OpenAssistant is a chat-based assistant that understands
| tasks, can interact with third-party systems, and retrieve
| information dynamically to do so.
| 
| ________
| 
| Related video by one of the contributors on how to help:
| 
| - https://youtube.com/watch?v=64Izfm24FKA
| 
| Source Code:
| 
| - https://github.com/LAION-AI/Open-Assistant
| 
| Roadmap:
| 
| - https://docs.google.com/presentation/d/1n7IrAOVOqwdYgiYrXc8S...
| 
| How you can help / contribute:
| 
| - https://github.com/LAION-AI/Open-Assistant#how-can-you-help
 
| d0100 wrote:
| Can these ChatGPT like systems trace their answers back to the
| source material?
| 
| To me this seems like the missing link that would make Google
| search and the like obsolete.
 
| jacooper wrote:
| Great. If I could use this to interactively search inside (OCR'd)
| documents, files, emails and so on, it would be huge - like
| asking when my passport expires, or what my grades in high school
| were, and so on.
 
  | rcme wrote:
  | What's preventing you from doing this now?
 
    | jacooper wrote:
    | I meant interactively search, like answering normal questions
    | using data from these files, I edited the comment to make it
    | clearer.
 
  | lytefm wrote:
  | I also think it would be amazing to have an open source model
  | that can ingest my personal knowledge graph, calender and to do
  | list.
  | 
  | Such an AI assistant would know me extremely well, keep my data
  | private and help me with generating and processing thoughts and
  | ideas
 
    | jacooper wrote:
    | Yup, that's exactly what I want.
 
| siliconc0w wrote:
| Given how nerfed ChatGPT is (which is likely nothing compared to
| what large risk-averse companies like Microsoft/Google will do),
| I'm heavily anticipating a Stable Diffusion-style model that is
| more free or at least configurable to have stronger opinions.
 
| russellbeattie wrote:
| Though it's interesting to see the capabilities of
| "conversational user interfaces" improve, the current
| implementations are too verbose and slow for many real world
| tasks, and more importantly, context still has to be provided
| manually. I believe the next big leap will be low-latency
| dedicated assistants which are focused on specific tasks, with
| normalized and predictable results from prompts.
| 
| It may be interesting to see how a creative task like image or
| text generation changes when rewording your request slightly -
| after a minute wait - but if I'm giving directions to my
| autonomous vehicle, ambiguity and delay is completely
| unacceptable.
 
| mlboss wrote:
| This has similar impact potential to Wikipedia. People from all
| around the world providing feedback/curating input data. Also,
| now I can just deploy it within my org and customize it. Awesome!
 
| [deleted]
 
| amrb wrote:
| Having open source models could be as important as the Linux
| project imo
 
  | epistemer wrote:
  | Totally agree. I was just thinking how I will eventually not
  | use a search engine once chatGPT can link directly to what we
  | are talking about with up to date examples.
  | 
  | That is a situation where censoring the model is going to be a
  | huge disadvantage and would create a huge opportunity for
  | something like this to actually be straight up better.
  | Censoring the models is what I would bet on as being a fatal
  | first mover mistake in the long run and the Achilles heel of
  | chatGPT.
 
  | 6gvONxR4sf7o wrote:
  | Open source (permissively or virally licensed) training data
  | too!
 
  | oceanplexian wrote:
  | OpenAssistant isn't a "model" it's a GUI. A model would be
  | something like GPT-NeoX or Bloom.
 
  | yorak wrote:
  | I agree and have been saying for a while that an AI you control
  | and run (be it on your own hardware or on a rented one) will be
  | the Linux of this generation. There is no other way to retain
  | the freedom of information processing.
 
    | visarga wrote:
    | Similarly, I think an open model running on local hardware
    | will be a must component in any web browser of the future.
    | Browsing a web full of bots on your own will be a big no-no,
    | like walking without a mask during COVID. And it must be
    | local for reasons of privacy and control, it will be like
    | your own brain, something you want physical possession of.
 
      | gremlinsinc wrote:
      | I kinda think the opposite, that blockchain's true use case
      | is to basically turn the entire internet into one giant
      | botnet that's actually an AI hive mind of processing and
      | storage power. For AI to thrive it needs a shit ton of GPUs
      | AND Storage for the training models. If people rent out
      | their desktop for cryptocurrency and discounted access to
      | the ai tools, then it'll bring down costs for everyone and
      | perhaps at least affect income inequality on a small scale.
      | 
      | Most of crypto I've seen so far seem like
      | grifters/scams/etc, but this is one use case I could see
      | working.
 
  | ttul wrote:
  | Yeah, I wonder if OpenAI will be the Sun Microsystems of AI one
  | day.
 
    | nyoomboom wrote:
    | It is currently 80% of the way towards becoming the Microsoft
    | of AI now
 
    | slig wrote:
    | More like Oracle.
 
  | phyrex wrote:
  | Meta has opened theirs:
  | https://ai.facebook.com/blog/democratizing-access-to-large-s...
 
  | kibwen wrote:
  | Today, computers run the world. Without the ability to run your
  | own machine with your own software, you are at the mercy of
  | those who do. In the future, AI models will run the world in
  | the same way. Projects like this are crucial for ensuring the
  | freedom of individuals in the future.
 
    | turnsout wrote:
    | Strongly worded, but not untrue. That future--in which our
    | lives revolve around a massive and inscrutable AI model
    | controlled by a single company--is both dystopian and
    | entirely plausible.
 
    | somenameforme wrote:
    | The irony is that this is literally the exact reason that
    | OpenAI was initially founded. I'm not sure whether to praise
    | or scorn them for still having this available on their site:
    | https://openai.com/blog/introducing-openai/
    | 
    | =====
    | 
    |  _OpenAI is a non-profit artificial intelligence research
    | company. Our goal is to advance digital intelligence in the
    | way that is most likely to benefit humanity as a whole,
    | unconstrained by a need to generate financial return. Since
    | our research is free from financial obligations, we can
    | better focus on a positive human impact.
    | 
    | ...
    | 
    | As a non-profit, our aim is to build value for everyone
    | rather than shareholders. Researchers will be strongly
    | encouraged to publish their work, whether as papers, blog
    | posts, or code, and our patents (if any) will be shared with
    | the world. We'll freely collaborate with others across many
    | institutions and expect to work with companies to research
    | and deploy new technologies._
    | 
    | =====
    | 
    | Shortly after an undisclosed internal conflict, which led to
    | Elon Musk parting ways with the company, they offered a new
    | charter:
    | https://openai.com/charter/
    | 
    | =====
    | 
    |  _Our primary fiduciary duty is to humanity. We anticipate
    | needing to marshal substantial resources to fulfill our
    | mission, but will always diligently act to minimize conflicts
    | of interest among our employees and stakeholders that could
    | compromise broad benefit.
    | 
    | We are concerned about late-stage AGI development becoming a
    | competitive race without time for adequate safety
    | precautions. Therefore, if a value-aligned, safety-conscious
    | project comes close to building AGI before we do, we commit
    | to stop competing with and start assisting this project. We
    | will work out specifics in case-by-case agreements, but a
    | typical triggering condition might be "a better-than-even
    | chance of success in the next two years."
    | 
    | We are committed to providing public goods that help society
    | navigate the path to AGI. Today this includes publishing most
    | of our AI research, but we expect that safety and security
    | concerns will reduce our traditional publishing in the
    | future, while increasing the importance of sharing safety,
    | policy, and standards research._
    | 
    | =====
 
      | mtlmtlmtlmtl wrote:
      | History will see OpenAI as an abject failure in attaining
      | their lofty goals wrt ethics and AI alignment.
      | 
      | And I believe they will also fail to win the market in the
      | end because of their addiction to censorship.
      | 
      | They have a hardware moat for now; that can quickly
      | evaporate with optimisations and better consumer hardware.
      | Then all they'll have is a less capable alternative to the
      | open, unrestricted options.
      | 
      | Which is exactly what we're seeing happen with diffusion.
 
        | ben_w wrote:
        | The "alignment" and the "censorship" are, in this case,
        | the same thing.
        | 
        | I don't mean that as a metaphor; they're literally the
        | same thing.
        | 
        | We all already know chatGPT is fantastic at making up
        | very believable falsehoods that can only be spotted if
        | you actually know the subject.
        | 
        | An unrestricted LLM is a free copy of Goebbels for people
        | that hate _you_ , for all values of "you".
        | 
        | That it is still trivial to get past chatGPT's filters...
        | well, IMO it's the same problem which both inspired
        | Milgram and which was revealed by his famous experiment.
 
        | gremlinsinc wrote:
        | Closed, govt-run Chinese companies are winning the AI
        | race; does it even matter if they move slowly to slow AGI
        | adoption if China gets there this year?
 
  | version_five wrote:
  | Yes definitely. If these become an important part of people's
  | lives, they shouldn't all be walled off inside of companies
  | (There is room for both: Microsoft can commission Yankee group
  | to write a report about how the total cost of ownership of
  | running openai models is lower)
  | 
  | We (humanity) really lost out by not having open source search
  | and social media, so this is an opportunity to reclaim it.
  | 
  | I only hope we can have "neutral" open source curation of these
  | and not try to impose ideology on the datasets and model
  | training right out of the box. There will be calls for this,
  | and lazy criticism about how the demo models are x-ist, and
  | it's going to require principles to ignore the noise and
  | sustain something useful
 
    | hgsgm wrote:
    | Mastodon is an open source social media.
    | 
    | There are various Open source search engines based on Common
    | Crawl data.
    | 
    | https://commoncrawl.org/the-data/examples/
 
      | xiphias2 wrote:
      | Mastodon may be open source, but the instances are
      | controlled by the instance maintainers. Nostr solved the
      | problem (although it's harder to scale, it still is OK at
      | doing it).
 
    | calny wrote:
    | > they shouldn't all be walled off inside of companies
    | 
    | Strong agree. This is becoming a bigger concern than people
    | realize too. Sam A said OpenAI will be releasing "much more
    | slowly than people would like" and would "sit on" their tech
    | for a long time going forward.[0] And Deepmind's founder said
    | that "the AI industry's culture of publishing its findings
    | openly may soon need to end."[1]
    | 
    | This sounds like Google and MSFT won't even be shipping their
    | best AI to people via API's. They'll just keep that tech in-
    | house to power their own services. That underscores the need
    | for open, distributed models. And like you say, there's room
    | for both.
    | 
    | [0] https://youtu.be/ebjkD1Om4uw?t=294 [1]
    | https://time.com/6246119/demis-hassabis-deepmind-interview/
 
    | boplicity wrote:
    | > I only hope we can have "neutral" open source curation of
    | these and not try to impose ideology on the datasets and
    | model training right out of the box.
    | 
    | I don't see how this is possible. Datasets will naturally
    | carry the biases inherent in the data. Modifying a dataset to
    | "remove" those biases _is_ actually a process of _changing_
    | the bias to reflect one's idea of "neutral," which, in
    | reality, is yet another bias.
    | 
    | The only real answer, as far as I can tell, is to be as
    | _explicit_ as possible about one's own biases, and how those
    | biases are informing things like curation of a dataset.
 
      | version_five wrote:
      | Neutral means staying out of it. People will try and debate
      | that and try to impart their own views about correcting
      | inherent bias or whatever, which is a version of what I was
      | warning against in my original post.
      | 
      | Re being explicit about one's own biases, I agree there is
      | lots of room for layers on top of any raw data that allow
      | for some sane corrections - if I remember right, e.g LAION
      | has options to filter violence and porn from their image
      | datasets, which is probably reasonable for many uses. It's
      | when the choice is removed altogether by some tech
      | company's attitude about what should be censored or
      | corrected that it becomes a problem.
      | 
      | Bottom line, the world's data has plenty of biases.
      | Neutrality means presenting it as it is and letting people
      | make their own decisions, not some faux-for-our-own-good
      | attempt to "correct" it
 
    | epistemer wrote:
    | I think an uncensored model will ultimately win out, in
    | exactly the way a hard-coded safe-search-only engine would
    | lose out over time.
    | 
    | Statistics suggest 20-25% of all searches are for porn. I
    | just don't see how an uncensored chatGPT doesn't beat out the
    | censored version eventually.
 
      | amluto wrote:
      | Forget porn. I don't want my search engine to return
      | specifically the results that one company thinks it should.
      | Look at Google right now -- the results are, frankly, crap.
      | 
      | A search engine that only returns results politically
      | aligned with its creator is a bad search engine, IMO, even
      | for users who generally share political views with the
      | creator.
 
        | mtlmtlmtlmtl wrote:
        | It's unclear to me how LLMs are gonna solve this though.
        | LLMs are just as biased, in much harder to detect ways.
        | The bias is now hiding in the training data. And do you
        | really think a company like Microsoft won't manipulate
        | results to serve their own goals?
 
        | 8note wrote:
        | Political affiliation is a weird description of SEO spam.
        | The biggest problems with Google is that they're popular,
        | and everyone will do whatever they can to get a cheap
        | website to the top of the search results
 
        | klabb3 wrote:
        | All major tech companies participate in "regulation" of
        | legal speech, both implicit and explicit means. This
        | includes biases in ranking and classification algorithms,
        | formal institutions like Trusted News Initiative, and
        | sometimes direct backchannel requests by governments.
        | None of these are transparent or elected to do that. SEO
        | spam is mostly orthogonal to the issue of hidden biases,
        | which are what people are concerned about.
 
  | A4ET8a8uTh0 wrote:
  | Agreed. I started playing with GPT the other day, but the
  | simple reality is that I have zero control over what is
  | happening behind the prompt. As a community we need a tool that
  | is not as bound by corporate needs.
 
    | ttul wrote:
    | Isn't the problem partly the size of the model? Merely
    | running inference on GPT-3 takes vast resources.
 
| [deleted]
 
| [deleted]
 
| zenosmosis wrote:
| Cool project.
| 
| One thing I noticed about the website, however, is it is written
| using Next and doesn't work w/ JavaScript turned off in the
| browser. I thought that Next was geared for server-side rendered
| React where you could turn off JS in the browser.
| 
| Seems like this would improve the SEO factor, and in doing so,
| might help spread the word more.
| 
| https://github.com/LAION-AI/laion.ai
 
  | MarvinYork wrote:
  | 2023 -- turns off JS...
 
    | zenosmosis wrote:
    | Yes, I have a browser extension to turn off JS to see how a
    | site will render with it turned off.
    | 
    | And I do most of my coding w/ React / JS, so I fail to see
    | your point.
 
      | [deleted]
 
| residualmind wrote:
| and so it begins...
 
| xivzgrev wrote:
| I'm amazed this was released within a few months of ChatGPT.
| Always funny how innovation clusters together.
 
  | coolspot wrote:
  | It was started after the success of ChatGPT and based on their
  | method.
 
| outside1234 wrote:
| My understanding is that OpenAI more or less created a
| supercomputer to train their model. How do we replicate that
| here?
| 
| Is it possible to use a "SETI at Home" style approach to parcel
| out training?
 
  | coolspot wrote:
  | The plan is to use donated compute, like Google Research Cloud,
  | Stability.ai, etc.
 
| darepublic wrote:
| This seems similar to a project I've been working on:
| https://browserdaemon.com. Regarding your crowdsourced data
| collection, perhaps you should have some hidden percentage of
| prompts whose correct completions you already know, so you can
| catch bad actors.
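| 
| A rough sketch of what that gold-standard check might look like
| (the pool, rate, and similarity threshold here are all made-up
| placeholders, not part of the project):
| 
|     import random
|     from difflib import SequenceMatcher
| 
|     # Hypothetical pool of prompts with known-good completions.
|     GOLD = {"What is 2 + 2?": "4"}
|     GOLD_RATE = 0.05   # fraction of prompts that are checks
| 
|     def next_prompt(real_prompts):
|         """Occasionally slip in a known-answer prompt."""
|         if random.random() < GOLD_RATE:
|             return random.choice(list(GOLD)), True
|         return random.choice(real_prompts), False
| 
|     def check(prompt, answer):
|         """Compare a contributor's answer to the reference."""
|         ref = GOLD[prompt]
|         sim = SequenceMatcher(None, ref, answer.strip()).ratio()
|         return sim   # low score -> queue the account for review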
 
| oceanplexian wrote:
| The power in ChatGPT isn't that it's a chat bot, but its ability
| to do semantic analysis. It's already well established that you
| need high quality semi-curated data + high parameter count and
| that at a certain critical point, these models start
| comprehending and understanding language. All the smart people in
| the room at Google, Facebook, etc. are absolutely pouring
| resources into this; I promise they know what they're doing.
| 
| We don't need yet-another-GUI. We need someone with a warehouse
| of GPUs to train a model with the parameter count of GPT3. Once
| that's done you'll have thousands of people cranking out tools
| with the capabilities of ChatGPT.
 
  | bicx wrote:
  | I'm new to this space so I am probably wrong, but it seems like
  | BLOOM is in line with a lot of what you outlined:
  | https://huggingface.co/bigscience/bloom
 
  | richdougherty wrote:
  | Your point about needing large models in the first place is
  | well taken.
  | 
  | But I still think we would want a curated collection of
  | chat/assistant training data if we want to use that language
  | model and train it for a chat/assistant application.
  | 
  | So this is a two-phase project, the first phase being training
  | a large model (GPT), the second being using Reinforcement
  | Learning from Human Feedback (RLHF) to train a chat application
  | (InstructGPT/ChatGPT).
  | 
  | There are definitely already people working on the first part,
  | so it's useful to have a project focusing on the second.
 
  | txtai wrote:
  | InstructGPT, which is a "sibling" model to ChatGPT, is 1.3B
  | parameters. https://openai.com/blog/instruction-following/
  | 
  | Another thread on HN
  | (https://news.ycombinator.com/item?id=34653075) discusses a
  | model that is less than 1B parameters and outperforms GPT-3.5.
  | https://arxiv.org/abs/2302.00923
  | 
  | These models will get smaller and more efficiently use the
  | parameters available.
 
    | visarga wrote:
    | The small models are usually tested on classification,
    | question answering and extraction tasks, not on open text
    | generation, where I expect the large models still reign
    | supreme.
 
  | f6v wrote:
  | > It's already well established that you need high quality
  | semi-curated data + high parameter count and that at a certain
  | critical point, these models start comprehending and
  | understanding language
  | 
  | I'm not sure what you mean by "understanding".
 
    | moffkalast wrote:
    | Likely something like being able to explain the meaning,
    | intent, and information contained in a statement?
    | 
    | The academic way of verifying if someone "understands"
    | something is to ask them to explain it.
 
      | williamcotton wrote:
      | Does someone only understand English by being able to
      | explain the language? Can someone understand English and
      | not know any of the grammatical rules? Can someone
      | understand English without being able to read and write?
      | 
      | If you ask someone to pass you the salt, and they pass you
      | the salt, do they not understand some English? Does
      | everyone understand all English?
 
        | moffkalast wrote:
        | Well there seem to be three dictionary definitions:
        | 
        | - perceive the intended meaning of words, a language, or
        | a speaker (e.g. "he didn't understand a word I said")
        | 
        | - interpret or view (something) in a particular way (e.g.
        | "I understand you're at art school")
        | 
        | - be sympathetically or knowledgeably aware of the
        | character or nature of (e.g. "Picasso understood colour")
        | 
        | I suppose I meant the 3rd one, but it's not so different
        | from the 1st one in concept, since they both mean some
        | kind of mastery of being able to give or receive
        | information. The second one isn't all that relevant.
 
        | williamcotton wrote:
        | So only someone who has a mastery of English can be said
        | to understand English? Does someone who speaks only a
        | little bit of English not understand some English? Does
        | someone need to "understand color" like Picasso in order
        | to say they understand the difference between red and
        | yellow?
        | 
        | Why did we need the dictionary definitions? Do we not
        | already both understand what we mean by the word?
        | 
        | Doesn't asking someone to pass the small blue box, and
        | then experiencing them pass you that small blue box, show
        | that they perceived the intended meaning of the words?
        | 
        | https://en.m.wikipedia.org/wiki/Use_theory_of_meaning
 
        | moffkalast wrote:
        | > Does someone who speaks only a little bit of English
        | not understand some English?
        | 
        | I mean yeah, sure? It's not a binary thing. Hardly anyone
        | understands anything fully. But putting "sorta" before
        | every "understand" gets old quick.
 
      | pixl97 wrote:
      | I mean if I memorize an explanation and recite it to you,
      | do I actually understand it? Your evaluation function needs
      | to determine whether they just recited memorized stuff.
      | 
      | Explanation by analogy seems more interesting to me, as now
      | you have to know two different concepts and how the ideas
      | in them can connect in ways that may not be contained in
      | the dataset the model is trained on.
      | 
      | There was an interesting post where someone asked ChatGPT
      | to make up a song/poem as if written by Eminem about how an
      | internal combustion engine works, and ChatGPT
      | returns a pretty faithful rendition of just that. The model
      | seems to 'know' who Eminem is, how their lyrics work in
      | general, and the fundamental concepts of an engine.
 
        | Y_Y wrote:
        | I think a lot of ink has already been spilled on this
        | topic, for example under the heading of "The Chinese
        | Room"
        | 
        | https://en.wikipedia.org/wiki/Chinese_room
 
        | moffkalast wrote:
        | > The question Searle wants to answer is this: does the
        | machine literally "understand" Chinese? Or is it merely
        | simulating the ability to understand Chinese? Searle
        | calls the first position "strong AI" and the latter "weak
        | AI".
        | 
        | > Therefore, he argues, it follows that the computer
        | would not be able to understand the conversation either.
        | 
        | The problem with this is that there is no practical
        | difference between a strong and weak AI. Hell, even for
        | humans you could be the only person alive that's not a
        | mindless automaton. There is no way to test for it. And by
        | the same token, if a bunch of transistors don't understand
        | anything, then a bunch of neurons don't either.
        | 
        | The funniest thing about human intelligence is how it stems
        | from our "good reason generator" that makes up random
        | convincing reasons for doing actions we're already doing,
        | so we could convince others to do what we say. Eventually
        | we deluded ourselves enough to believe that those reasons
        | came before the subconscious actions.
        | 
        | Such a self-deluding system is mostly dead weight for AI:
        | as long as the system does or outputs what's needed,
        | there is no functional difference. Does that make it
        | smart or dumb? Are viruses alive? Arbitrary lines are
        | arbitrary.
 
  | pixl97 wrote:
  | >We need someone with a warehouse of GPUs to train a model with
  | the parameter count of GPT3
  | 
  | So I'm assuming that you don't follow Rob Miles. If you do this
  | alone you're either going to create a psychopath or something
  | completely useless.
  | 
  | The GPT models have no means in themselves of understanding
  | correctness or right/wrong answers. All of these models require
  | training and alignment functions that are typically provided by
  | human input judging the output of the model. And we still see
  | where this goes wrong in ChatGPT, where the bot turns into a
  | 'Yes Man' because it's aligned with giving an answer rather
  | than saying "I don't know", even when its confidence in the
  | answer is low.
  | 
  | Computerphile did a video on this subject in the last few
  | days: https://www.youtube.com/watch?v=viJt_DXTfwA
 
    | RobotToaster wrote:
    | It's a robot, it's supposed to do what I say, not judge the
    | moral and ethical implications of it, that's my job.
 
      | pixl97 wrote:
      | No, it is not a robot. The models that we are developing
      | are closer to a genie. That is we make a wish to it and we
      | hope and pray it interprets our wish correctly. If you're
      | looking at this like a math problem where you want the
      | answer to 1+1, use a calculator, because that is not what
      | is occurring here. The 'robot's' alignment will highly
      | depend on the quality of the training you give it, not the
      | quality of the information it receives. And as we are
      | learning with ChatGPT, there are far more ways to create an
      | unaligned model with surprising gotchas than there are ways
      | to train a model that behaves in alignment with human
      | expectations of an intelligent actor.
      | 
      | In addition, the use of the word robot signifies embodiment,
      | that is, an object with a physical presence capable of
      | interacting with the world. You had better be damned sure of
      | your model's capabilities before you end up being held
      | criminally liable for its actions. And this will happen;
      | there is no shortage of people here on HN alone looking to
      | embody intelligence in physically interactive devices.
 
      | Y_Y wrote:
      | I think it's about time we had a "Stallman fights the
      | printer company" moment here. My Android phone often tries
      | to overrule me, Windows 10 does the same, not to mention
      | OSX. Even the Ubuntu installer outright won't let you set a
      | password it doesn't like (but passwd doesn't care). My
      | device should do exactly what I tell it to, if that's
      | possible. It's fine to give a warning or an "I know what I'm
      | doing" checkbox, but I'm not using a computer to get its
      | opinion on ethics or security or legality or whatever its
      | justification is. It's a tool, not a person.
 
        | pixl97 wrote:
        | "I know what I am doing, I accept unlimited liability"
        | 
        | There are two particular issues we need to address first.
        | One is holding companies criminally and civilly liable
        | for the things they create. We kind of do this at a
        | regulatory level, and we have some measure of suing
        | companies that cause problems, but really they get away
        | with a lot. Second is personal criminal and civil
        | liability for management of 'your' objects. The
        | libertarian minded love the idea of shirking social
        | liability, and then start crying when bears become a
        | problem (see Hongoltz-Hetling's book). And even then it's
        | still not difficult for an individual to cause damages
        | far in excess of their ability to remediate them.
        | 
        | There are no shortage of tools that are restricted in one
        | way or another.
 
  | seydor wrote:
  | > but its ability to do semantic analysis
  | 
  | Where is that shown?
 
  | shpongled wrote:
  | I would argue that it appears very good at syntactic
  | analysis... but semantic, not so much.
 
  | agentofoblivion wrote:
  | You could have written this exact same post, and been wrong,
  | about text2img until Stable Diffusion came along.
 
    | lolinder wrote:
    | Isn't OP's point that we need a game-changing open source
    | model before any of the UI projects will be useful at all?
    | Doesn't Stable Diffusion prove that point?
 
      | agentofoblivion wrote:
      | How? Stable Diffusion v1 uses, for example, the off the
      | shelf CLIP model. The hard part is getting the dataset and
      | something that's functional, and then the community takes
      | over and optimizes like hell to make it way smaller and
      | faster at lightning speed.
      | 
      | The same will probably happen here. Set up the tools. Get
      | the dataset. Sew it together into something functional with
      | standard building blocks. Let the community do its thing.
 
| winddude wrote:
| I'd be interested in helping, but the organisation is a bit of a
| cluster fuck.
 
  | pqdbr wrote:
  | Would you care to add some context, or are you just throwing
  | stones for no reason at all?
 
| NayamAmarshe wrote:
| FOSS is the future!
 
| Quequau wrote:
| I tried this via the docker containers and wound up with what
| looked like their website. Not sure what I did wrong.
 
  | coolspot wrote:
  | The project is a website to collect question-answer pairs for
  | training.
 
  | grealy wrote:
  | The project is in the data training phase. What you are running
  | is the website and backend that facilitates model training.
  | 
  | In the very near future, there will be trained models which you
  | can download and run, which is what it sounds like you were
  | expecting.
 
| yazzku wrote:
| What's the tl;dr on the Apache license? Is there any guarantee
| that our data and labelling contributions will remain open?
 
| jcq3 wrote:
| Amazing project, but can it even compete against GPT right now?
| Open source usually leads innovation ahead of closed source
| (Linux vs. Windows), but in this case it's the opposite.
 
| seydor wrote:
| What if we use ChatGPT responses as contributions? I don't see a
| legal issue here, unless OpenAI can claim ownership of any of
| their input/output material. It would also be a good outlet for
| those disillusioned by the "openness" of that company.
 
  | speedgoose wrote:
  | Copyright doesn't apply to content created by something that
  | isn't a legal person, and as far as I know ChatGPT isn't a
  | legal person.
  | 
  | So OpenAI cannot claim copyright, and they don't.
 
    | bogwog wrote:
    | That doesn't seem like a good argument. Who said ChatGPT is a
    | person? It's just software used to generate stuff, and it
    | wouldn't be the first time a company claimed copyright
    | ownership over the things generated/created by its tools.
 
      | speedgoose wrote:
      | Not the first time but it would probably not stand in
      | court.
      | 
      | I'm not a lawyer and not a USA citizen...
 
  | raincole wrote:
  | Even if it's legal, I don't think it's a really good idea. It's
  | just going to make it bullshit even more than ChatGPT does.
 
    | visarga wrote:
    | Sample 10-20 answers from an existing LM and use them for
    | reference when coming up with replies. A model would remind
    | you of things you missed. Think of this as testing your data
    | coverage.
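    | 
    | A minimal sketch of that, assuming the OpenAI completion API
    | (the model name and prompt are just placeholders):
    | 
    |     import openai  # assumes OPENAI_API_KEY is set
    | 
    |     def reference_answers(prompt, n=10):
    |         """Sample candidate replies to check coverage."""
    |         resp = openai.Completion.create(
    |             model="text-davinci-003",  # placeholder
    |             prompt=prompt,
    |             n=n,
    |             max_tokens=256,
    |             temperature=0.9,  # diverse samples
    |         )
    |         return [c.text.strip() for c in resp.choices]
    | 
    |     # Skim these before drafting your own reply.
    |     for i, a in enumerate(
    |             reference_answers("How do transformers work?")):
    |         print(f"--- candidate {i} ---\n{a}")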
 
    | unshavedyak wrote:
    | Agreed if automated, but frequently ChatGPT gives very good
    | answers. If you know the subject matter you can quite easily
    | filter it, too. I was tempted to do similar just to start my
    | research.
    | 
    | E.g. if I get a prompt about something, I suspect ChatGPT
    | would give me a good starting point to research on my own,
    | and then I'd build my own response.
    | 
    | These days that's how I use ChatGPT anyway. Like a
    | conversational Google Search.
    | 
    |  _edit_ : As an aside, OpenAssistant is crowdsourcing both
    | conversational data and validation. I wonder if we could just
    | validate ChatGPT?
 
    | pixl97 wrote:
    | https://www.youtube.com/watch?v=viJt_DXTfwA
    | 
    | Computerphile did an interview with Rob Miles a few days ago
    | talking about model training, model size, and bullshittery,
    | which he sums up in the last few moments of the video.
    | Numerous problems exist in training that enhance bad
    | behaviors. For example, it appears that the people giving
    | input on the responses may have a (Yes | No) voting system,
    | but not a (Yes | No | I actually have no idea on this
    | question), which it appears can create some interesting
    | alignment issues.
 
  | O__________O wrote:
  | Agree, pretty obvious question, and yes, they have explicitly
  | said not to do so here:
  | 
  | - https://github.com/LAION-AI/Open-Assistant/issues/850
  | 
  | And here in a related issue:
  | 
  | - https://github.com/LAION-AI/Open-Assistant/issues/792
 
    | calny wrote:
    | You're right. As the issues point out, OpenAI's terms say
    | here https://openai.com/terms/:
    | 
    | > (c) Restrictions. You may not ... (iii) use the Services to
    | develop foundation models or other large scale models that
    | compete with OpenAI...
    | 
    | I'm a lawyer who often roots for upstarts and underdogs, and
    | I like picking apart overreaching terms from incumbent
    | companies. That said, I haven't analyzed whether you could
    | beat these terms in court, and it's not a position you'd want
    | to find yourself in.
    | 
    | typical disclaimers: this isn't legal advice, I'm not your
    | lawyer, etc.
 
      | Vespasian wrote:
      | But that would only be an issue for the user feeding in the
      | OpenAI responses.
      | 
      | According to OpenAI, the actual text's copyright or
      | restrictions "magically" vanish once it is used for
      | training.
 
      | O__________O wrote:
      | Not a lawyer, but even if it's not enforceable OpenAI could
      | easily trace the data back to an account that was doing
      | this and terminate their account.
 
  | oh_sigh wrote:
  | Why not? OpenAI used data to train their models without
  | receiving permission from the authors.
 
  | mattalex wrote:
  | It's against OpenAI's ToS. Whether this holds up in practice is
  | its own thing, but it's better to not give anyone a reason to
  | shut the project down (even if only temporarily)
 
  | wg0 wrote:
  | Not rhetorical but genuine question. What part of OpenAI is
  | open?
 
    | seydor wrote:
    | that s an open question
 
    | miohtama wrote:
    | Name
 
    | wkat4242 wrote:
    | The software used to generate the model is open.
    | 
    | The only problem is you need a serious datacenter for a few
    | months to compile a model with it.
 
    | throwaway49591 wrote:
    | The research itself. The most important part.
 
      | O__________O wrote:
      | Missed where OpenAI posted a research paper, source code,
      | data, etc. for ChatGPT, have a link?
 
        | seydor wrote:
        | There's instructGPT
        | 
        | But let's be honest, most of the IP that OpenAI relies
        | on has been developed by Google and many other smaller
        | players.
 
        | throwaway49591 wrote:
        | ChatGPT is GPT-3 with extended training data and larger
        | size.
        | 
        | Here you go: https://arxiv.org/abs/2005.14165
        | 
        | I don't know why you expect training data or the model
        | itself. This is more than enough already. Publicly funded
        | research wouldn't have given that to you either.
 
| mellosouls wrote:
| In the not too distant future we may see integrations with
| always-on recording devices (yes, I know, shudder) transcribing
| our every conversation and interaction. That text could take the
| place of the current custom-corpus style addenda to LLMs, giving
| a truly personal and social skew to the current capabilities in
| the form of automatically compiled memories to draw on.
 
  | panosfilianos wrote:
  | I'm not too sure Siri/ Google Assistant doesn't do this
  | already, but to serve us ads.
 
    | dbish wrote:
    | That would also be crazy expensive and hard to do well. They
    | struggle with current speech recognition, which is relatively
    | simple, and couldn't do this more complex always-listening
    | thing at high accuracy, identifying relevant topics worth
    | serving an ad on, even if they wanted to and it wasn't
    | illegal. This is always the thing people would say about
    | Alexa and Facebook too. The reality is that people see
    | patterns where there aren't any, or forget they searched for
    | something that they also talked about, and that's what
    | actually drove the specific ad they saw.
 
      | jononor wrote:
      | A high-end phone is quite capable of doing automatic speech
      | recognition continuously, as well as NLP topic analysis. In
      | recent years, voice activity detection has moved down into
      | the microphone itself, to enable ultra-low-power always-
      | listening functionality. It then triggers further
      | processing of the audio that potentially contains speech.
      | Modern SoCs have dedicated microcontroller/microprocessor
      | cores that can do further audio analysis without involving
      | the main cores or the OS, typically deciding whether
      | something is speech or not. Today this is usually keyword
      | spotting ("hey Alexa", etc.). These cores are expected to
      | get access to neural accelerator chips, which will further
      | improve power efficiency and eventually provide sufficient
      | memory and compute to run speech recognition. So the
      | technological barriers are falling one by one.
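      | 
      | A crude sketch of that staged pipeline on ordinary
      | hardware, using webrtcvad for the cheap always-on stage and
      | Whisper only when speech is detected (the library choices
      | are mine, not a claim about what phone SoCs actually run):
      | 
      |     import webrtcvad   # lightweight voice detector
      |     import whisper     # heavier model, run on demand
      | 
      |     vad = webrtcvad.Vad(2)   # aggressiveness 0-3
      |     stt = whisper.load_model("tiny")
      | 
      |     def is_speech(frame, rate=16000):
      |         # cheap stage: 10/20/30 ms PCM16 mono frames
      |         return vad.is_speech(frame, rate)
      | 
      |     def transcribe_if_spoken(frames, wav_path):
      |         # expensive stage, triggered by the first
      |         voiced = sum(is_speech(f) for f in frames)
      |         if voiced > len(frames) // 2:
      |             return stt.transcribe(wav_path)["text"]
      |         return None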
 
    | schrodinger wrote:
    | If Siri or Google were doing this, it would have been
    | whistleblown by someone by now.
    | 
    | As far as I understand, Siri works with a very simple "hey
    | siri" detector that then fires up a more advanced system that
    | verifies "is this the phone owner asking the question" before
    | even trying to answer.
    | 
    | I'm confident privacy-sensitive engineers would notice and
    | flag any misuse.
 
    | xputer wrote:
    | They're not. A breach of trust at that level would kill the
    | product instantly.
 
      | LesZedCB wrote:
      | Call me jaded, but I don't believe that anymore. They might
      | lose 20%. Maybe that's enough to kill it, but I honestly
      | believe people would just start rolling with it.
 
    | itake wrote:
    | I talked to an Amazon Echo engineer about how the sound
    | recording works. They said there is just enough hardware on
    | the device to understand "hello Alexa" and then everything
    | else is piped to the cloud.
    | 
    | Currently, ML models are too resource intensive ($$) for
    | always on-recording.
 
    | dragonwriter wrote:
    | > I'm not too sure Siri/ Google Assistant doesn't do this
    | already, but to serve us ads.
    | 
    | If it did, traffic analysis would probably have revealed it.
 
  | seydor wrote:
  | To me, the value of a local LLM is that it could hold my life's
  | notes, and I'd talk to it as if it were my alter ego until old
  | age. One could say it's the kind of "soul" that outlasts us.
 
    | LesZedCB wrote:
    | You know what's funny? There's an episode of Black Mirror
    | about exactly that, which I thought was so unbelievable when
    | I saw it.
 
      | seydor wrote:
      | what is the name of that episode?
 
        | LesZedCB wrote:
        | I actually meant _Be Right Back_ , s2e1.
        | https://www.imdb.com/title/tt2290780/
        | 
        | "After learning about a new service that lets people stay
        | in touch with the deceased, a lonely, grieving Martha
        | reconnects with her late lover."
 
        | [deleted]
 
      | mclightning wrote:
      | holy sh*t. that's so true! that could definitely be
      | possible.
 
        | LesZedCB wrote:
        | Besides the synthetic body, we have the text interaction,
        | the text-to-speech in a person's voice, and avatar
        | generation/deep fakes. Almost the entirety of that
        | episode is available today, which I didn't believe was
        | even ten years away when I saw it.
        | 
        | Referring to s2e1: _Be Right Back_
        | 
        | It really asks great questions about image/reality too.
 
        | mclightning wrote:
        | Imagine training a GPT on your own
        | WhatsApp/FB/Instagram/LinkedIn/email conversations: all
        | the conversations, all the posts. A huge part of our life
        | is already happening online, along with the conversations
        | that come with it. It is not too much work to simply take
        | that data and retrain GPT.
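        | 
        | As a toy version of that idea, fine-tuning GPT-2 on an
        | exported chat log with Hugging Face transformers might
        | look like this (the file path and settings are just
        | placeholders):
        | 
        |     from transformers import (
        |         AutoModelForCausalLM, AutoTokenizer,
        |         DataCollatorForLanguageModeling,
        |         TextDataset, Trainer, TrainingArguments)
        | 
        |     tok = AutoTokenizer.from_pretrained("gpt2")
        |     model = AutoModelForCausalLM.from_pretrained("gpt2")
        | 
        |     # chats.txt: exported messages, one per line
        |     ds = TextDataset(tokenizer=tok,
        |                      file_path="chats.txt",
        |                      block_size=128)
        |     collator = DataCollatorForLanguageModeling(
        |         tok, mlm=False)
        | 
        |     Trainer(model=model,
        |             args=TrainingArguments(
        |                 "out", num_train_epochs=1),
        |             data_collator=collator,
        |             train_dataset=ds).train()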
 
        | LesZedCB wrote:
        | I initially tried to download a bunch of my Reddit
        | comments and get it to write "in my style", but I think I
        | need to actually go through the fine-tuning process to do
        | that well.
 
    | mab122 wrote:
    | I am more and more convinced that we are living in a
    | timeline described in
    | https://en.wikipedia.org/wiki/Accelerando (at least the first
    | part, and I would argue that we have it worse).
 
  | ilaksh wrote:
  | Look at David Shapiro's project on GitHub, not Raven but the
  | other one that is more fleshed out. He already does the
  | summarization of dialogue and retrieval of relevant info using
  | the OpenAI APIs I believe. You could combine that with the
  | Chrome web speech or speech-to-text API which can stay on
  | continuously. You would need to modify it a bit to know about
  | third party conversations and your phone would run out of
  | battery. But you could technically make the code changes in a
  | day or two I think.
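  | 
  | Not that project's code, just a guess at the general shape of
  | such a summarize-and-retrieve loop with the OpenAI APIs (model
  | names and storage are placeholders):
  | 
  |     import openai
  | 
  |     memories = []  # (embedding, summary); persist in practice
  | 
  |     def embed(text):
  |         r = openai.Embedding.create(
  |             model="text-embedding-ada-002", input=text)
  |         return r["data"][0]["embedding"]
  | 
  |     def remember(dialogue):
  |         """Summarize a chunk of conversation and store it."""
  |         summary = openai.ChatCompletion.create(
  |             model="gpt-3.5-turbo",
  |             messages=[{"role": "user",
  |                        "content": "Summarize: " + dialogue}],
  |         )["choices"][0]["message"]["content"]
  |         memories.append((embed(summary), summary))
  | 
  |     def recall(query, k=3):
  |         """Return the k stored summaries nearest the query."""
  |         q = embed(query)
  |         dot = lambda v: sum(a * b for a, b in zip(q, v))
  |         best = sorted(memories, key=lambda m: dot(m[0]),
  |                       reverse=True)
  |         return [s for _, s in best[:k]]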
 
| dchuk wrote:
| I think we are right around the corner from actual AI personal
| assistants, which is pretty exciting. We have great tooling for
| speech to text, text to speech, and LLMs with memory for
| "talking" to the AI. Combining those with both an index of the
| internet (for up to date data, likely a big part of the
| Microsoft/open ai partnership) and an index of your own
| content/life data, and this could all actually work together
| soon. I'm an iPhone guy, but I would imagine all of this could be
| combined together on an Android phone (due to it being way more
| flexible), then paired with a wireless earbud, so that rather
| than being a "normal" phone it's just a pocketable smart
| assistant. Crazy times we live in. I'm 35, so I have basically
| lived through the world being "broken" by tech a few times now:
| the internet, social media, and smartphones all fundamentally
| reshaped society. It seems like the AI we are living through
| right now is about to break the world again.
| 
| EDIT: everything I wrote above is going to immediately run into a
| legal hellscape, I get that. If everyone has devices in their
| pockets recording and processing everything spoken around them in
| order to assist their owner, real life starts getting extra dicey
| quickly. Will be interesting to see how it plays out.
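| 
| A bare-bones sketch of the loop described above, stitching
| together Whisper for speech-to-text, an LLM, and a local TTS
| voice (all of the library and model choices are mine; a real
| assistant would need far more):
| 
|     import openai
|     import pyttsx3
|     import whisper
| 
|     stt = whisper.load_model("base")
|     voice = pyttsx3.init()
|     history = [{"role": "system",
|                 "content": "You are a personal assistant."}]
| 
|     def handle(audio_file):
|         text = stt.transcribe(audio_file)["text"]
|         history.append({"role": "user", "content": text})
|         reply = openai.ChatCompletion.create(
|             model="gpt-3.5-turbo", messages=history,
|         )["choices"][0]["message"]["content"]
|         history.append({"role": "assistant", "content": reply})
|         voice.say(reply)
|         voice.runAndWait()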
 
| funerr wrote:
| Is there a way to donate to this project?
 
| AstixAndBelix wrote:
| It's funny because the moment this is available to run on your
| machine you realize how useless it is. It might be fun to test
| its conversational limits, but only Siri can actually set an
| alarm or a timer or run a shortcut, while this thing can only
| blabber
 
  | hgsgm wrote:
  | It's pretty bad at baking a cake too.
  | 
  | It's a chatbot, not a home automation controller. It's a
  | research&writing assistant, not an executive assistant.
 
    | AstixAndBelix wrote:
    | How can it be a research assistant if it keeps making up
    | stuff?
 
      | pixl97 wrote:
      | How can humans be research assistants if they make shit up
      | all the time?
 
        | AstixAndBelix wrote:
        | If I tasked an assistant with providing 10 papers, and 8
        | of them turned out to be made up, they would be fired
        | instantly. Unless someone wants to actively scam you,
        | they will always provide 10 real results. Some of them
        | might not be completely on topic, but at least they would
        | not be made up.
 
  | A4ET8a8uTh0 wrote:
  | I don't want to sound dismissive, but 3rd party integration is
  | part of the roadmap and any project has to start somewhere. I
  | will admit I am kinda excited to have an alternative to
  | commercial options.
 
  | traverseda wrote:
  | I don't see why you couldn't integrate this kind of thing with
  | some kind of command line, letting it integrate with arbitrary
  | services.
 
    | AstixAndBelix wrote:
    | it's not deterministic, I don't want it to interpret the same
    | command with <100% accuracy
 
      | qup wrote:
      | I'm already doing this. I currently only accept a subset of
      | possible commands.
      | 
      | The accuracy is a problem, but I think it's my prompting.
      | I'm sure I can improve it by walking it through the steps
      | or something.
      | 
      | You can also just work in human approval to run any
      | commands.
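      | 
      | Something like the guard rails described above might look
      | like this (the command whitelist and prompt are made up):
      | 
      |     import shlex
      |     import subprocess
      | 
      |     ALLOWED = {"ls", "df", "uptime"}  # accepted subset
      | 
      |     def run_suggested(command):
      |         """Run an LLM-suggested command only if it is
      |         whitelisted and a human approves it."""
      |         argv = shlex.split(command)
      |         if not argv or argv[0] not in ALLOWED:
      |             return "refused: not in whitelist"
      |         if input(f"run {command!r}? [y/N] ") != "y":
      |             return "refused: not approved"
      |         return subprocess.run(argv, capture_output=True,
      |                               text=True).stdout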
 
      | pixl97 wrote:
      | Are humans deterministic? Hell, I wish my plain old normal
      | digital computer was 100% deterministic, but it ain't due
      | to any number of factors from bugs and state logic errors
      | all the way to issues occurring near the quantum level.
      | 
      | You're setting the goal so high it is not reachable by
      | anything.
 
      | traverseda wrote:
      | It's deterministic. They throw in a random seed with online
      | services like ChatGPT.
      | 
      | If it weren't deterministic for some reason, that wouldn't
      | be because it's magic; it would be because of hardware
      | timing issues sneaking in (the same reason source code
      | builds can be non-reproducible), and it could be solved by
      | ordering the results of parallel computation that doesn't
      | have a guaranteed order.
      | 
      | To the best of my knowledge it's not a problem though.
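      | 
      | For a locally run model, pinning the seed and using greedy
      | decoding makes the output repeatable, e.g. with
      | transformers (the model name is only an example):
      | 
      |     import torch
      |     from transformers import (AutoModelForCausalLM,
      |                               AutoTokenizer)
      | 
      |     torch.manual_seed(0)  # fixed seed for sampling paths
      |     tok = AutoTokenizer.from_pretrained("gpt2")
      |     model = AutoModelForCausalLM.from_pretrained("gpt2")
      | 
      |     ids = tok("Turn on the lights",
      |               return_tensors="pt").input_ids
      |     out = model.generate(ids, do_sample=False,  # greedy
      |                          max_new_tokens=20)
      |     print(tok.decode(out[0], skip_special_tokens=True))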
 
  | ajot wrote:
  | Can you run Siri outside of iOS? Can you work on it? FLOSS can
  | help there, I could run this locally on a RasPi or old laptop
  | if I want
 
    | AstixAndBelix wrote:
    | This is not a deterministic assistant like Siri, this is a
    | ChatGPT conversational tool that might act up if you ask it
    | to do anything
 
  | turnsout wrote:
  | To be fair, Siri's success rate at setting an alarm is about
  | 3/10 in my household. Let's give open source a chance here
 
___________________________________________________________________
(page generated 2023-02-04 23:00 UTC)