“Inadequate Alignment” reads like simply another item on the list here, but to my knowledge the entire field of AI Alignment has been working on this problem for decades. And while they’ve made some really impressive progress, I believe the consensus is that they’re nowhere near solving it; it’s a very difficult problem.
We can see this in how crafting prompts to get LLMs to do complex tasks is itself quite a complex task (even for tasks the model is capable of doing), but at least for now the errors are somewhat easy to catch, since you get the reply immediately.
As LLMs become more integrated into people’s workflows, I wonder when we’ll start seeing more serious incidents due to misaligned behaviors not being caught. Hopefully projects like this will lead to the development of more safeties before then, but I’m not holding my breath.
Good points, and I agree!
The list is currently intended largely to spark interest and discussion, so it’ll likely change a lot. What you mentioned is also brought up on the Brainstorming page. It seems likely that “Inadequate Alignment” will be removed from the list.
I think LLMs are set to explode as a major field of study for security folks.
It’s got the perfect storm of wide deployment, low technical literacy among those deploying it, and fundamentally unsolved problems.