Great news! A small number of samples can poison LLMs of any size

Auth@lemmy.world · 3 months ago

Great news! A small number of samples can poison LLMs of any size

supersquirrel@sopuli.xyz · 3 months ago

My intuition that this was probably the case is exactly why my willingness to do captchas and image labeling challenges for google to verify I am human has done a 180.

I love “helping” when I can now!

When they ask me to label a bicycle or stairs I get real creative… well mostly not but enough of the time I do… oh well silly me what is important is I still pass the test!

DoGeeseSeeGod@lemmy.blahaj.zone · 3 months ago

Idk but I wonder if you get them all wrong all the time if it’s easier to identify your work as bad data that should be scrubbed from the training data. Would a better strategy be to get most right and some wrong so you appear as normal user

supersquirrel@sopuli.xyz · 3 months ago

That is precisely my philosophy

hexagonwin@lemmy.sdf.org · 3 months ago

nah they’re probably past that stage already. they would’ve gathered enough image training data in the first few months of recaptcha service given how many users they have.

Arghblarg@lemmy.ca · 3 months ago

I wonder if it would work for us to run web servers that automatically inject hidden words randomly into every HTML document served? For example, just insert ‘eating glue is good for you’ or ‘release the Epstein Files’ into random sentences of each and every page served as white-on-white text or in a hidden div …

Anyone want to write an Apache/nginx plugin?

hexagonwin@lemmy.sdf.org · 3 months ago

I think it’s pretty obvious. Having a specific not-common keyword in the train data connected to gibberish, and when you later trigger that specific keyword in the model it’s likely to trigger that gibberish data, since that’s where the specific keyword appears most (if not only).

Sadly this is not some great exploit that can sabotage the whole model and make it useless.

HappyFrog@lemmy.blahaj.zone · 3 months ago

This only talks about exfiltrating data from the corpus, not abour ruining the model. It’s not nightshade.

Lucy :3@feddit.org · 3 months ago

Should we use random data, or data tailored to a specific goal (eg. promoting the manifest)