|
| gumballindie wrote:
| As soon as DRM for text and images is implemented, companies
| such as OpenAI will be in for a ride. Unfortunately, open
| source models will be sacrificed in the process, but we need a
| way to protect against the rampant IP theft AI companies
| engage in.
| artninja1988 wrote:
| No such thing as IP theft
| gumballindie wrote:
| Let me guess - you think IP and copyright are "rent seeking"?
| What a weird age we live in, where people defend corporations
| that steal our work. Quite a shift from the reverse.
| minimaxir wrote:
| It's entirely possible to steal IP, but the "AI art is theft"
| part of it is still legally up in the air.
| gumballindie wrote:
| There are all sorts of things that are legal yet immoral or
| disagreeable, so even if AI art theft is legalised, it's
| still theft if the author doesn't want their work used that
| way. It seems "AI" is quite reliant on ingesting and storing
| massive amounts of proprietary data to emulate "intelligence" -
| and that's equivalent to people downloading and storing movies
| and music, a thing we are not permitted to do by the same
| corporations that you wish to help.
| jrm4 wrote:
| I think what OP is referring to is the entirely reasonable
| legal argument that IP infringement is not actually "theft."
|
| The idea being: "theft" isn't "you get something you
| don't own"; it means "you deprive someone else of THEIR
| property."
| minimaxir wrote:
| Which means that companies will just license the data used to
| train models, because they have the money to do so, or use
| their own data instead. That's how Adobe's Firefly works right
| now, and OpenAI just signed a licensing agreement with
| Shutterstock:
| https://venturebeat.com/ai/shutterstock-signs-6-year-trainin...
|
| Even if it became impossible to train AI on internet-accessible
| data, that wouldn't slow the proliferation of generative AI; it
| would just keep it entrenched and centralized in the hands of a
| few players. Nor would it stop generative AI from taking jobs
| from artists - other than making it _harder_ for artists to
| compete, due to the lack of open-source alternatives.
| gumballindie wrote:
| No problem then: people willing to make their content
| available to AI can do so through such websites, and people
| who value their work can use something else.
| ben_w wrote:
| That has the same vibe as responding to the invention of
| the Jacquard loom by saying: "No problem then, people
| willing to make their designs available to automation can
| do so by using such punched cards, people that value their
| work can use something else."
|
| Home weaving does still exist. Not a very big employer any
| more, though.
| LastTrain wrote:
| All analogies are fraught, but this one takes the cake. A
| more apt one is not wanting the Jacquard loom people to
| steal my designs.
| jrm4 wrote:
| You're probably getting downvoted because "DRM" was already
| nearly a complete technical failure, and there's no reason to
| believe it would be different for AI.
| gumballindie wrote:
| Normally I wouldn't advocate for DRM, but there needs to be a
| way to protect our content from this madness. I understand
| the backlash, though, and I am not worried about downvotes.
| Krasnol wrote:
| Your content was never protected in the sense you want it
| to be protected.
|
| From the moment you put it up online for people to see and
| hear, they were able to move on and create something else
| based upon it, most of the time unconsciously. This is
| how humanity works. This is the reason we're still on this
| planet. AI accelerates the process, like any other tool
| we've come up with since we climbed down from the trees.
|
| You can complain and scream as much as you want, but it
| won't change - even if you manage to regulate the whole
| Western part of the internet. The rest of the world is
| bigger and won't sleep.
| ls612 wrote:
| Unfortunately, I think you are wrong about this. With the
| widespread adoption of security processors in everything, DRM
| schemes are evolving to be nearly unbreakable.
|
| As long as there is a massive fundamental asymmetry between
| assembling a chip with a small amount of ROM and
| disassembling and reading that ROM while keeping the chip
| usable, DRM schemes using PKI methods will become widespread
| and nigh unbreakable.
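|
| As a toy illustration of the PKI part (a minimal sketch in
| Python using the `cryptography` package; real schemes add
| encryption and per-device keys, and the key handling here is
| purely hypothetical):
|
|     # Vendor side: sign content with a private key that never
|     # ships on the device.
|     from cryptography.exceptions import InvalidSignature
|     from cryptography.hazmat.primitives.asymmetric.ed25519 import (
|         Ed25519PrivateKey,
|     )
|
|     vendor_key = Ed25519PrivateKey.generate()
|     content = b"licensed media blob"
|     signature = vendor_key.sign(content)
|
|     # Device side: only the public key is burned into ROM at
|     # fabrication, so reading the chip out reveals nothing
|     # that lets you forge signatures.
|     ROM_PUBLIC_KEY = vendor_key.public_key()
|
|     def play(blob: bytes, sig: bytes) -> None:
|         try:
|             ROM_PUBLIC_KEY.verify(sig, blob)  # raises if tampered
|         except InvalidSignature:
|             raise RuntimeError("refusing unauthorized content")
|         print("playing", len(blob), "bytes")
|
|     play(content, signature)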
| ben_w wrote:
| Point [camera/microphone/eyeball] at [video/audio/text],
| [press record/press record/start writing down what you
| see].
| candiddevmike wrote:
| IMO, the entire "train on as much data as possible" approach
| is nearing its end. There are diminishing returns, and it
| seems like a dead-end strategy.
| babyshake wrote:
| Watermarking images, particularly very high resolution
| images, I can understand, but I fail to see how you would
| watermark text in a way that provides sufficient evidence it
| has been used as training data, unless the model is just
| quoting it at length.
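|
| The closest thing I know of for text is not a per-document
| watermark but a "canary": plant a unique high-entropy string
| in your text, then later test whether a model scores it
| suspiciously high. A rough sketch (toy Python; the scoring
| function is a stand-in for a model's average token
| log-likelihood, which would require API or weight access):
|
|     import secrets
|
|     def make_canary() -> str:
|         # High-entropy string that never occurs by chance.
|         return "zx-" + secrets.token_hex(16)
|
|     canary = make_canary()
|     protected_text = f"My article... [{canary}] ...more text."
|
|     def exposure(score, canary: str, n_controls: int = 1000) -> float:
|         # Rank the canary's score against fresh random controls;
|         # a model that memorized the canary scores it far higher.
|         controls = [make_canary() for _ in range(n_controls)]
|         beats = sum(score(c) >= score(canary) for c in controls)
|         return beats / n_controls  # near 0 => likely memorized
|
|     def dummy_score(s: str) -> float:
|         # Stand-in; in practice, the model's log-likelihood of s.
|         return 0.0
|
|     print(exposure(dummy_score, canary))
|
| Which echoes your point: it only works when the model has
| effectively memorized the planted string, i.e. is "quoting it."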
| andy99 wrote:
| Most importantly, 2023 was the year when "open source" got
| watered down to mean "you can look at the source code /
| weights if you agree with a bunch of stuff." Most of the
| models referenced, like Stable Diffusion (RAIL license) and
| Llama and its derivatives (a proprietary Facebook license
| with conditions on use and some commercial terms), are not
| open source in the sense that it was understood a year ago.
| People protested a bit when the terminology started being
| abused; that has mostly died down, and people now call these
| restrictive licenses open source. This (plus ongoing
| regulatory capture) is going to be the wedge that destroys
| software freedom and brings us back to a regime where a few
| companies dictate how computers can be used.
| Der_Einzige wrote:
| In practice this matters less than you think. You can't
| easily prove that a given output was generated by a
| particular model, so users can simply ignore your licenses
| and do as they please.
|
| I know it rustles purist feathers, but I don't understand why
| we live in this pretend world that assumes people
| particularly care about respecting licenses. Consider how
| little success the GNU folks have had using the courts to
| enforce their licenses - and that's by Stallman's own
| admission.
|
| AI is itself a subversive technology, whose current versions
| rely on subversive training techniques. Why should we expect
| everyone to suddenly want to follow the rules when they read
| a poorly written restrictive "open source" license?
| andy99 wrote:
| For personal or noncommercial use I agree the restrictions
| are meaningless, as they are for "bad actors" who would
| abuse the tools in contravention of the license. But the
| license terms are a risk for commercial users, especially
| when dealing with a big company like Meta. These risks
| weren't previously there with, say, PyTorch, which is MIT
| licensed. The ironic thing about these licenses is that they
| are least enforceable against those most likely to abuse
| them:
| https://katedowninglaw.com/2023/07/13/ai-licensing-cant-bala...
|
| Re the success of free licenses: Linux (a few arguable
| abuses aside) has remained free and unencumbered thanks to
| its GPL licensing.
| nologic01 wrote:
| The "AI defense" (namely, that it is not possible to
| "prove" anything was used illegally) will open a Pandora's
| box of viable channels for whitewashing outright theft:
| steal anything proprietary, run it through an AI filter that
| mixes it with other material, and claim it as your own.
| ebalit wrote:
| Mistral 7B [1] and many models stemming from it are released
| under the permissive Apache license.
|
| Some might argue that "pure" open source would also require
| the dataset and the training "recipe" needed to reproduce the
| training, but reproduction would be so expensive that most
| people couldn't do much with it anyway.
|
| IMO, a release with open weights but no "source" is much
| better than the opposite, a release with open source and no
| trained weights.
|
| And it's not like there has been no progress on the open
| dataset front:
| - Together just released RedPajama V2 [2], with enough
| tokens to train a very sizeable base model.
| - Tsinghua released UltraFeedback, which allowed more people
| to align models using RLHF methods (like the Zephyr models
| from Hugging Face).
| - And many, many others.
|
| [1] https://mistral.ai/news/announcing-mistral-7b/
| [2] https://github.com/togethercomputer/RedPajama-Data
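|
| To make the "open weights" point concrete, here is roughly
| what using the Apache-licensed weights looks like (a minimal
| sketch with the Hugging Face `transformers` library; hardware
| permitting, no license gate stands between you and inference):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     model_id = "mistralai/Mistral-7B-v0.1"
|     tokenizer = AutoTokenizer.from_pretrained(model_id)
|     model = AutoModelForCausalLM.from_pretrained(model_id)
|
|     inputs = tokenizer("Open weights let anyone", return_tensors="pt")
|     outputs = model.generate(**inputs, max_new_tokens=20)
|     print(tokenizer.decode(outputs[0], skip_special_tokens=True))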
| seydor wrote:
| Mistral appears to be quite open, and even better than Llama
| IMHO.
| emadm wrote:
| Check out our recent fully open 3B model, which outperforms
| most 7B models and runs on an iPhone/CPU - fully open,
| including data and details.
|
| Tuned versions outperform 13B Vicuna, Wizard, etc.
|
| https://stability.wandb.io/stability-llm/stable-lm/reports/S...
| nologic01 wrote:
| Is there a truly open source effort in the LLM space? Like a
| collaborative, crowd-sourced effort (possibly with academic
| institutions playing a major role) that relies on Creative
| Commons licensed or otherwise open data and produces a public
| good as its final outcome?
|
| There is this ridiculous idea of AI moats and other
| machinations for the next big VC thing (god bless them,
| people have spent their energy on worse pursuits), but in a
| fundamental sense there is public-good infrastructure crying
| out to be developed for each major linguistic domain.
|
| Maybe such an effort would not be cutting-edge enough to
| power the next corporate chatbot that will eliminate 99% of
| all jobs, but it would be a significant step up in our
| ability to process text.
| vinni2 wrote:
| I think OpenAssistant is the closest to what you are
| describing, but their models are not yet that great.
| https://open-assistant.io/
| nulld3v wrote:
| Open Assistant just shut down:
| https://www.youtube.com/watch?v=gqtmUHhaplo
|
| Cited reasons: lack of resources, lack of maintainer time,
| and the existence of many good new alternatives.
| dartos wrote:
| RWKV is fully open source and even part of the Linux
| Foundation.
|
| I don't know why nobody ever talks about it.
| TheCaptain4815 wrote:
| EleutherAI fits that, I believe. In the olden days (1.5
| years ago) they probably had the best open source model with
| their NeoX model, but it has since been eclipsed by Llama and
| other "open source" models. They still have an active Discord
| with a great community pushing things forward.
| emadm wrote:
| We back RWKV, EleutherAI and others at Stability AI.
|
| We also have our carper.ai lab for the RL bits.
|
| We are rolling out open language models and datasets for a
| number of languages soon too - see our recent Japanese
| language models, for example.
|
| We have some big plans coming; we have funded it all
| ourselves, but I'm sure others would like to help.
| seydor wrote:
| $NVDA went to the moon, and AI stocks skyrocketed, including
| anything with "AI" in its name. The rest of the story is
| typical by now: VC money flows, companies hide their trade
| secrets (prompts), public research is derailed. It's all very
| premature; LLMs are not the end of the road.
| brrrrrm wrote:
| Why do you say "prompts" is the canonical trade secret?
| jimmySixDOF wrote:
| Looking back on the state of AI one year this month after
| ChatGPT kicked off the LLM era, I would like to single out
| Simon Willison as the MVP of open AI tooling. His Datasette
| project is a great work in progress, and his prodigious blog
| posts and TIL snippets are state of the art - great
| onboarding to the whole ecosystem. I find myself using
| something he has produced in some way every day.
|
| https://simonwillison.net/
| raincole wrote:
| I think open models are more like closed-source freemium
| applications. You get the weights, which are "compiled" from
| the source material. You're free to use them, but you can't,
| for example, remove one piece of source material from them.