Infosec.Pub
  • Communities
  • Create Post
  • Create Community
  • heart
    Support Lemmy
  • search
    Search
  • Login
  • Sign Up
digicatM to blueteamsecEnglish · 13 days ago

Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code

arxiv.org

external-link
message-square
0
link
fedilink
3
external-link

Broken by Default: A Formal Verification Study of Security Vulnerabilities in AI-Generated Code

arxiv.org

digicatM to blueteamsecEnglish · 13 days ago
message-square
0
link
fedilink
AI coding assistants are now used to generate production code in security-sensitive domains, yet the exploitability of their outputs remains unquantified. We address this gap with Broken by Default: a formal verification study of 3,500 code artifacts generated by seven widely-deployed LLMs across 500 security-critical prompts (five CWE categories, 100 prompts each). Each artifact is subjected to the Z3 SMT solver via the COBALT analysis pipeline, producing mathematical satisfiability witnesses rather than pattern-based heuristics. Across all models, 55.8% of artifacts contain at least one COBALT-identified vulnerability; of these, 1,055 are formally proven via Z3 satisfiability witnesses. GPT-4o leads at 62.4% (grade F); Gemini 2.5 Flash performs best at 48.4% (grade D). No model achieves a grade better than D. Six of seven representative findings are confirmed with runtime crashes under GCC AddressSanitizer. Three auxiliary experiments show: (1) explicit security instructions reduce the mean rate by only 4 points; (2) six industry tools combined miss 97.8% of Z3-proven findings; and (3) models identify their own vulnerable outputs 78.7% of the time in review mode yet generate them at 55.8% by default.
alert-triangle
You must log in or # to comment.

blueteamsec

blueteamsec

Subscribe from Remote Instance

Create a post
You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !blueteamsec@infosec.pub

For [Blue|Purple] Teams in Cyber Defence - covering discovery, detection, response, threat intelligence, malware, offensive tradecraft and tooling, deception, reverse engineering etc.

Visibility: Public
globe

This community can be federated to other instances and be posted/commented in by their users.

  • 11 users / day
  • 105 users / week
  • 320 users / month
  • 1.02K users / 6 months
  • 231 local subscribers
  • 690 subscribers
  • 3.12K Posts
  • 240 Comments
  • Modlog
  • mods:
  • digicat
  • BE: 0.19.18
  • Modlog
  • Instances
  • Docs
  • Code
  • join-lemmy.org