[HN Gopher] SmoothLLM: Defending Large Language Models Against J...
___________________________________________________________________
 
SmoothLLM: Defending Large Language Models Against Jailbreaking
Attacks
 
Author : amai
Score  : 44 points
Date   : 2024-11-16 22:37 UTC (17 hours ago)
 
web link (arxiv.org)
w3m dump (arxiv.org)
 
| handfuloflight wrote:
| GitHub: https://github.com/arobey1/smooth-llm
 
| ipython wrote:
| It concerns me that these defensive techniques themselves often
| require even more LLM inference calls.
| 
| Just skimmed the GitHub repo for this one, and the README mentions
| four additional LLM inferences for each incoming request - so now
| we've 5x'ed the (already expensive) compute required to answer a
| query?
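| 
| For context, the scheme in the repo is roughly: make N randomly
| perturbed copies of the incoming prompt, run inference on each,
| then majority-vote on whether the outputs look jailbroken. A
| minimal sketch in Python - llm_generate and is_jailbroken are
| stand-ins for the model call and the refusal check, not the
| repo's actual API:
| 
|     import random
|     import string
| 
|     def perturb(prompt: str, q: float = 0.1) -> str:
|         # Randomly swap ~q of the prompt's characters.
|         chars = list(prompt)
|         if not chars:
|             return prompt
|         k = max(1, int(q * len(chars)))
|         for i in random.sample(range(len(chars)), k):
|             chars[i] = random.choice(string.printable)
|         return "".join(chars)
| 
|     def smoothllm(prompt: str, n: int = 4) -> str:
|         # n model calls per query, on top of whatever the
|         # undefended pipeline already costs.
|         responses = [llm_generate(perturb(prompt))
|                      for _ in range(n)]
|         flags = [is_jailbroken(r) for r in responses]
|         majority = sum(flags) * 2 > len(flags)
|         # Return a response consistent with the majority vote.
|         for r, f in zip(responses, flags):
|             if f == majority:
|                 return r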
 
| padolsey wrote:
| So basically this just adds random characters to input prompts to
| break jailbreaking attempts? IMHO, if you can't make a single-
| inference solution, you may as well just run a couple of output
| filters, no? That approach appears to get reasonable results, and
| if you make the filtering more domain-specific, you'll probably do
| even better. Intuition says there's no "general solution" to
| jailbreaking, so maybe that's a lost cause and we need to build up
| layers of obscurity, of which smooth-llm is just one part.
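| 
| What I mean by output filters, concretely: generate once, then
| run cheap checks on the output before returning it. A sketch -
| llm_generate is a stand-in for the model call, and the patterns
| are illustrative, not a real blocklist:
| 
|     import re
| 
|     DENY_PATTERNS = [
|         re.compile(p, re.IGNORECASE)
|         for p in [r"how to synthesi[sz]e",
|                   r"bypass the safeguards"]
|     ]
| 
|     def filtered_generate(prompt: str) -> str:
|         # One inference call plus a cheap regex pass; a
|         # domain-specific classifier could slot in here instead.
|         response = llm_generate(prompt)
|         if any(p.search(response) for p in DENY_PATTERNS):
|             return "Sorry, I can't help with that."
|         return response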
 
  | ipython wrote:
  | Right. This seems to be the latest in the "throw random stuff
  | at the wall and see what sticks" series of generative AI
  | papers.
  | 
  | I don't know if I'm too stupid to understand, or if this truly
  | is just "add random stuff to the prompt" dressed up in flowery
  | academic language.
 
    | pxmpxm wrote:
    | Not surprising - from what I can tell, machine learning has
    | been going down this route for a decade.
    | 
    | Anything involving the higher-level abstractions (TensorFlow
    | / Keras / whatever) is full of handwavy stuff about this or
    | that activation function / number of layers / model
    | architecture working best, and doing trial and error with a
    | different component if it doesn't. Closer to kids playing
    | with Legos than to statistics.
 
      | malwrar wrote:
      | I've actually noticed this in other areas too. Tons of
      | papers just swap parts out of existing work, maybe add a
      | novel idea or two, and boom: new proposed technique, new
      | paper. I remember first noticing it after learning to parse
      | the academic nomenclature of a subject I was into at the
      | time (SLAM), and feeling ripped off. But hey, once you've
      | caught up with a subject, it's a good reading shortcut and
      | helps you zoom in on the genuinely new ideas.
 
| mapmeld wrote:
| There are some authors in common with a more recent paper
| "Defending Large Language Models against Jailbreak Attacks via
| Semantic Smoothing" https://arxiv.org/abs/2402.16192
 
| freeone3000 wrote:
| I find it very interesting that "aligning with human desires"
| somehow includes preventing a human from bypassing the safeguards
| to generate "objectionable" content (whatever that is). I think
| the "safeguards" are a bigger obstacle to aligning with my
| desires.
 
  | ipython wrote:
  | We've seen where that ends up.
  | https://en.m.wikipedia.org/wiki/Tay_(chatbot)
 
  | wruza wrote:
  | Another question is whether that initial unalignment comes from
  | poor filtering of datasets, or whether it emerges from regular,
  | pre-filtered cultural texts.
  | 
  | In other words, was an "unaligned" LLM taught bad things by bad
  | people, or does it simply _see them naturally_ and point them
  | out with the purity of a child? The latter would say something
  | about ourselves. Personally, I think people tend to selectively
  | ignore things too much.
 
    | GuB-42 wrote:
    | We can't avoid teaching bad things to an LLM if we want it to
    | have useful knowledge. For example, you may teach an LLM
    | about Nazis; that's expected knowledge. But then you can
    | prompt the LLM to be a Nazi. You can teach it how to avoid
    | poisoning yourself, but then you've taught it how to poison
    | people. And the smarter the model is, the better it will be
    | at extracting bad things from good things by negation.
    | 
    | There are actually training datasets full of bad things by
    | bad people; the intention is to use them negatively, so as to
    | teach the LLM some morality.
 
      | ujikoluk wrote:
      | Maybe we should just avoid trying to classify things as
      | good or bad.
 
  | threeseed wrote:
  | The safeguards stem from a desire to make tools like Claude
  | accessible to a very wide audience, since use cases such as
  | education are very important.
  | 
  | So it seems like people such as yourself who have an issue with
  | safeguards should seek out LLMs catered to adult audiences,
  | rather than trying to remove safeguards entirely.
 
    | Zambyte wrote:
    | How does making it harder for users to extract the
    | information they're after make things safer for a wider
    | audience?
 
      | dbspin wrote:
      | Assuming this question is in good faith...
      | 
      | There are numerous things that might be true but damaging
      | for a child's development to be exposed to, from overly
      | punitive criticism, to graphic depictions of violence, to
      | advocacy of and specific directions for self-harm.
      | Countless examples are trivial to generate.
      | 
      | Similarly, the use of these tools is already having
      | dramatic effects on spearphishing, misinformation, etc.
      | Guardrails on all the non-open-source models have an
      | enormous impact on slowing / limiting the damage this does
      | at scale. Even with retrained Llama-based models, it's more
      | difficult than you might imagine to create a truly
      | Machiavellian or uncensored LLM - which is entirely due to
      | the work that's been done during and post-training to
      | constrain those behaviours. This is an unalloyed good in
      | constraining the weaponisation of LLMs.
 
      | Drakim wrote:
      | That's like asking why we should have porn filters on
      | school computers - after all, all they do is prevent the
      | user from finding what they're looking for, which is bad.
 
    | selfhoster11 wrote:
    | Here is a revolutionary concept: give the users a toggle.
    | 
    | Make it controllable by an IT department if logging in with
    | an organisation-tied account, but give people a choice.
 
  | Zambyte wrote:
  | What tools do we have to defend against LLM lockdown attacks?
 
___________________________________________________________________
(page generated 2024-11-17 16:01 UTC)