[HN Gopher] AI and Open Source in 2023
___________________________________________________________________
 
AI and Open Source in 2023
 
Author : belter
Score  : 42 points
Date   : 2023-11-04 18:50 UTC (4 hours ago)
 
web link (magazine.sebastianraschka.com)
w3m dump (magazine.sebastianraschka.com)
 
| gumballindie wrote:
| As soon as DRM for text and images is implemented, companies
| such as OpenAI will be in for a ride. Unfortunately, open source
| models will be sacrificed in the process, but we need a means of
| protecting against the rampant IP theft that AI companies engage
| in.
 
  | artninja1988 wrote:
  | No such thing as IP theft
 
    | gumballindie wrote:
    | Let me guess - you think IP and copyright are "rent seeking"?
    | What a weird age we live in, where people defend corporations
    | that steal our work. Quite a shift from the reverse.
 
    | minimaxir wrote:
    | It's entirely possible to steal IP, but the "AI art is theft"
    | part of it is still legally up in the air.
 
      | gumballindie wrote:
      | There are all sorts of things that are legal yet immoral or
      | disagreeable, so even if AI art theft is legalised, it's
      | still theft if the author doesn't want the work used that
      | way. It seems "AI" is quite reliant on ingesting and
      | storing massive amounts of proprietary data to emulate
      | "intelligence" - and that's equivalent to people downloading
      | and storing movies and music, something we are not permitted
      | to do by the same corporations that you wish to help.
 
      | jrm4 wrote:
      | I think what OP is referring to is the entirely reasonable
      | legal argument that IP infringement is not actually "theft."
      | 
      | The idea being: "Theft" isn't about "you get something you
      | don't own," it means "you deprive someone else of THEIR
      | property."
 
  | minimaxir wrote:
  | Which means that companies will just license the data used to
  | train models because they have the money to do so, or use their
  | own data instead. That's how Adobe's Firefly works right now,
  | and OpenAI just signed a licensing agreement with Shutterstock:
  | https://venturebeat.com/ai/shutterstock-signs-6-year-trainin...
  | 
  | Even if it became impossible to train AI on internet-accessible
  | data, that wouldn't slow the proliferation of generative AI; it
  | would only keep it entrenched and centralized in the hands of a
  | few players. And it would do nothing to stop AI from taking
  | jobs from artists, other than making it _harder_ for artists to
  | compete due to the lack of open-source alternatives.
 
    | gumballindie wrote:
    | No problem then: people willing to make their content
    | available to AI can do so by using such websites, and people
    | who value their work can use something else.
 
      | ben_w wrote:
      | That has the same vibe as responding to the invention of
      | the Jacquard loom by saying: "No problem then: people
      | willing to make their designs available to automation can
      | do so by using such punched cards, and people who value
      | their work can use something else."
      | 
      | Home weaving does still exist. Not a very big employer any
      | more, though.
 
        | LastTrain wrote:
        | All analogies are fraught but this one takes the cake. A
        | more apt one is not wanting the Jacquard loom people to
        | steal my designs.
 
  | jrm4 wrote:
  | You're probably getting downvoted because "DRM" was already
  | nearly a complete technical failure, and there's no reason to
  | believe it would be different for AI?
 
    | gumballindie wrote:
    | Normally I wouldn't advocate for DRM, but there needs to be a
    | way to protect our content from this madness. I understand
    | the backlash though, and I am not worried about downvotes.
 
      | Krasnol wrote:
      | Your content was never protected in the sense you want it
      | to be protected.
      | 
      | From the moment you put it up online for people to see and
      | hear, they were able to move on and create something else
      | based upon it, most of the time unconsciously. This is how
      | humanity works. This is the reason we're still on this
      | planet. AI accelerates the process, like any other tool
      | we've come up with since we climbed down from the trees.
      | 
      | You can complain and scream as much as you want, but it
      | won't change, even if you manage to regulate the whole
      | Western part of the internet. The rest of the world is
      | bigger and won't sleep.
 
    | ls612 wrote:
    | Unfortunately, I think you are wrong about this. DRM schemes
    | are evolving to be nearly unbreakable, given the widespread
    | adoption of security processors in everything.
    | 
    | As long as there is a massive fundamental asymmetry between
    | embedding a small amount of ROM in a chip and disassembling
    | and reading that ROM while still keeping the chip usable, DRM
    | schemes using PKI methods will become widespread and nigh
    | unbreakable.
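    | 
    | As a rough sketch of that asymmetry (illustrative only, not
    | any particular product's scheme): the content key is wrapped
    | to a device public key, so only the chip holding the matching
    | private key - burned into ROM at manufacture - can unwrap it.
    | Here Python's `cryptography` package stands in for the
    | hardware:
    | 
    |   # Sketch of a PKI envelope scheme. The private key would
    |   # live in tamper-resistant ROM; here it is just in memory.
    |   from cryptography.hazmat.primitives import hashes
    |   from cryptography.hazmat.primitives.asymmetric import (
    |       padding, rsa)
    |   
    |   oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
    |                       algorithm=hashes.SHA256(), label=None)
    |   
    |   # Keypair whose private half ships inside the chip.
    |   device_priv = rsa.generate_private_key(
    |       public_exponent=65537, key_size=2048)
    |   
    |   content_key = b"16-byte-AES-key!"  # would encrypt the media
    |   
    |   # Anyone can wrap the content key to the device...
    |   wrapped = device_priv.public_key().encrypt(
    |       content_key, oaep)
    |   
    |   # ...but only the device can unwrap it. Without physically
    |   # reading the ROM, `wrapped` reveals nothing about the key.
    |   assert device_priv.decrypt(wrapped, oaep) == content_key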
 
      | ben_w wrote:
      | Point [camera/microphone/eyeball] at [video/audio/text],
      | [press record/press record/start writing down what you
      | see].
 
  | candiddevmike wrote:
  | IMO, the entire "train on as much data as possible" approach is
  | nearing its end. There are diminishing returns, and it seems
  | like a dead-end strategy.
 
  | babyshake wrote:
  | Watermarking images, particularly very high resolution images,
  | I can understand, but I fail to see how you would watermark
  | text in a way that provides sufficient evidence it has been
  | used as training data, unless the model is just quoting it at
  | length.
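  | 
  | To illustrate the problem, here is a minimal sketch (purely
  | hypothetical, not a real scheme) of one naive text watermark:
  | hiding a bit pattern in zero-width characters. Any Unicode
  | normalisation pass - which training pipelines routinely apply -
  | strips the marks, so it is weak evidence of training use:
  | 
  |   # Naive text watermark: hide bits in zero-width characters.
  |   ZWS, ZWNJ = "\u200b", "\u200c"  # encode bit 0 / bit 1
  |   
  |   def embed(text: str, bits: str) -> str:
  |       # Tuck one invisible character per bit after word one.
  |       marks = "".join(ZWNJ if b == "1" else ZWS for b in bits)
  |       head, sep, tail = text.partition(" ")
  |       return head + marks + sep + tail
  |   
  |   def extract(text: str) -> str:
  |       return "".join("1" if c == ZWNJ else "0"
  |                      for c in text if c in (ZWS, ZWNJ))
  |   
  |   marked = embed("the quick brown fox", "1011")
  |   print(extract(marked))  # -> "1011"
  |   # A cleaning pass erases the watermark entirely:
  |   cleaned = marked.replace(ZWS, "").replace(ZWNJ, "")
  |   print(extract(cleaned))  # -> ""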
 
| andy99 wrote:
| Most importantly, 2023 was the year when "open source" got
| watered down to mean "you can look at the source code / weights,
| provided you agree to a bunch of conditions." Most of the models
| referenced, like Stable Diffusion (RAIL license) and Llama and
| its derivatives (a proprietary Facebook license with conditions
| on use and some commercial terms), are not open source in the
| sense that was understood a year ago. People protested a bit
| when the terminology started being abused, but that has mostly
| died down, and people now call these restrictive licenses open
| source. This (plus ongoing regulatory capture) is going to be
| the wedge that destroys software freedom and brings us back to a
| regime where a few companies dictate how computers can be used.
 
  | Der_Einzige wrote:
  | In practice this matters less than you think. You can't easily
  | prove that any outputs were generated by a particular model in
  | general, so any user can simply ignore your licenses and do as
  | they please.
  | 
  | I know it rustles purist feathers, but I don't understand why
  | we live in this pretend world that assumes folks particularly
  | care about respecting licenses. Consider how little success
  | the GNU folks have had using the courts to enforce their
  | licenses - and that's by Stallman's own admission.
  | 
  | AI is itself a subversive technology, whose current versions
  | rely on subversive training techniques. Why should we expect
  | everyone to suddenly want to follow the rules when they read a
  | poorly written restrictive "open source" license?
 
    | andy99 wrote:
    | For personal or noncommercial use I agree the restrictions
    | are meaningless, as they are for "bad actors" who would
    | potentially abuse the tools in contravention of the license.
    | But the license terms are a risk for commercial users,
    | especially when dealing with a big company like Meta. These
    | risks weren't previously there in, say, PyTorch, which is
    | BSD licensed. The ironic thing with these licenses is that
    | they are least enforceable on those who would be most likely
    | to abuse them: https://katedowninglaw.com/2023/07/13/ai-
    | licensing-cant-bala...
    | 
    | Re the success of free licenses, Linux (other than a few
    | arguable abuses) has remained free and unencumbered thanks to
    | GPL licensing.
 
    | nologic01 wrote:
    | Somehow the "AI defense" (namely, that it is not possible to
    | "prove" anything was used illegally) will open a Pandora's
    | box of viable channels for whitewashing outright theft. Steal
    | anything proprietary, run it through an AI filter that mixes
    | it with other stuff, and claim it as your own.
 
  | ebalit wrote:
  | Mistral 7B [1] and many models stemming from it are released
  | under the permissive Apache license.
  | 
  | Some might argue that a "pure" open-source release would
  | require the dataset and the training "recipe", since they would
  | be needed to reproduce the training, but reproducing it would
  | be so expensive that most people wouldn't be able to do much
  | with them.
  | 
  | IMO, a release with open weights but without the "source" is
  | much better than the opposite, a release with open source and
  | no trained weights.
  | 
  | And it's not like there was no progress on the open dataset
  | front:
  | 
  | - Together just released RedPajama V2 [2], with enough tokens
  | to train a very sizeable base model.
  | 
  | - Tsinghua released UltraFeedback, which allowed more people
  | to align models using RLHF-style methods (like the Zephyr
  | models from Hugging Face).
  | 
  | - And many, many others.
  | 
  | [1] https://mistral.ai/news/announcing-mistral-7b/
  | 
  | [2] https://github.com/togethercomputer/RedPajama-Data
 
  | seydor wrote:
  | Mistral appears to be quite open, and even better than Llama
  | imho.
 
  | emadm wrote:
  | Check out our recent fully open 3B model, which outperforms
  | most 7B models and runs on an iPhone/CPU - fully open,
  | including data and details.
  | 
  | Tuned versions outperform 13B Vicuna, WizardLM, etc.
  | 
  | https://stability.wandb.io/stability-llm/stable-lm/reports/S...
 
| nologic01 wrote:
| Is there a truly open source effort in the LLM space? Like a
| collaborative, crowd-sourced effort (possibly with academic
| institutions playing a major role) that relies on Creative
| Commons licensed or otherwise open data and produces a public
| good as the final outcome?
| 
| There is this ridiculous idea of AI moats and other machinations
| for the next big VC thing (god bless them, people have spent
| their energy on worse pursuits), but in a fundamental sense
| there is public-good-type infrastructure crying out to be
| developed for each major linguistic domain.
| 
| Maybe such an effort would not be cutting-edge enough to power
| the next corporate chatbot that will eliminate 99% of all jobs,
| but it would be a significant step up in our ability to process
| text.
 
  | vinni2 wrote:
  | I think OpenAssistant is the closest to what you are
  | describing, but their models are not yet that great.
  | https://open-assistant.io/
 
    | nulld3v wrote:
    | Open Assistant just shut down:
    | https://www.youtube.com/watch?v=gqtmUHhaplo
    | 
    | Cited reasons: Lack of resources, lack of maintainer time and
    | there being many new good alternatives.
 
  | dartos wrote:
  | RWKV is fully open source and is even part of the Linux
  | Foundation.
  | 
  | Idk why nobody ever talks about it.
 
  | TheCaptain4815 wrote:
  | EleutherAI fits that, I believe. In the olden days (1.5 years
  | ago) they probably had the best open source model with their
  | NeoX model, but it has since been eclipsed by Llama and other
  | "open source" models. They still have an active Discord with a
  | great community pushing things forward.
 
  | emadm wrote:
  | We back RWKV, EleutherAI and others at Stability AI.
  | 
  | We also have our carper.ai lab for the RL bits.
  | 
  | We are rolling out open language models and datasets for a
  | number of languages soon, too - see our recent Japanese
  | language models, for example.
  | 
  | Got some big plans coming soon; we have funded it all ourselves
  | but I'm sure others would like to help.
 
| seydor wrote:
| $NVDA went to the moon, and AI stocks skyrocketed, including any
| company with "AI" in its name. The rest of the story is typical
| by now: VC money flows, companies hide their trade secrets
| (prompts), and public research is derailed. It's all very
| premature; LLMs are not the end of the road.
 
  | brrrrrm wrote:
  | Why do you say "prompts" is the canonical trade secret?
 
| jimmySixDOF wrote:
| Looking back on the state of AI one year this month into the
| post-ChatGPT LLM era, I would like to single out Simon Willison
| as the MVP for contributions to open AI tooling. His Datasette
| projects are a great work in progress, and his prodigious blog
| posts and TIL snippets are state of the art - great onboarding
| to the whole ecosystem. I find myself using something he has
| produced in some way every day.
| 
| https://simonwillison.net/
 
| raincole wrote:
| I think open models are more like closed-source freemium
| applications. You get the weights, which are "compiled" from the
| source material. You're free to use them, but you can't, for
| example, remove one piece of source material from them.
 
___________________________________________________________________
(page generated 2023-11-04 23:00 UTC)