No scanner is perfect. We believe in being upfront about what static analysis does well, where it falls short, and how our two-layer approach closes the gaps.
01 — Strengths
What Static Analysis Can Catch
Static analysis excels at detecting known vulnerability patterns in source code. When a pattern matches, confidence is high because these are structural issues, not guesses.
✓Hardcoded secrets and API keys — OpenAI, AWS, Stripe, and generic API keys left directly in source code by AI assistants.
✓SQL injection — f-string interpolation and string concatenation in SQL queries, especially when LLM output is used directly.
✓Prompt injection via concatenation — User input concatenated directly into LLM prompts without sanitization or templating.
✓Insecure deserialization — Unsafe pickle and torch.load usage on untrusted data, a common AI/ML vulnerability.
✓Missing output validation — LLM output used in auth decisions, SQL queries, shell commands, or rendered as HTML without sanitization.
✓Command injection — os.system(), subprocess, and child_process.exec() with unsanitized input.
✓XSS patterns — dangerouslySetInnerHTML with LLM output, reflected XSS via user input in server responses.
✓Insecure configurations — Flask debug mode, CORS wildcard origins, JWT without verification, weak password hashing (MD5/SHA1), insecure cookies.
✓MCP and tool abuse patterns — Overprivileged MCP server configurations and dangerous tool execution flags (--yolo, --skip-permissions).
Every rule is precision-tuned. When a rule matches, it is because the vulnerable structure is present in the code, not because of a heuristic guess.
02 — Limitations
What Static Analysis Cannot Catch
Static analysis works on code structure, not runtime behavior. Certain vulnerability classes are fundamentally invisible to pattern matching alone.
✗Logic bugs and business logic flaws — Incorrect authorization checks, broken access control, race conditions, and off-by-one errors cannot be detected by pattern matching.
✗Complex data flow vulnerabilities — If tainted data passes through multiple files, transformations, or abstraction layers before reaching a sink, static analysis may lose track.
✗Runtime-only issues — Environment misconfigurations, dependency vulnerabilities at runtime, dynamic code generation, and issues that only manifest under specific conditions.
✗Context-dependent vulnerabilities — Code that is safe in one context but dangerous in another (e.g., a function that is safe internally but vulnerable when exposed to user input via a route).
✗Obfuscated patterns — Dynamically constructed strings, encoded payloads, eval of computed expressions, or patterns intentionally written to evade detection.
✗LLM behavioral issues at runtime — Whether an LLM actually follows its system prompt, resists injection, or leaks data can only be tested by probing the live endpoint.
03 — Accuracy
False Positive Rates
Every rule in vibeCodeScan is designed for precision over recall. We would rather miss a marginal case than waste your time with noise. Each finding includes a confidence level to help you triage effectively.
HIGH
Pattern is highly specific and virtually always indicates a real vulnerability. Very low false positive rate. Act on these first.
MEDIUM
Pattern is strong but context-dependent. May require manual review to confirm exploitability. Worth investigating.
LOW
Pattern matches a potential issue but may be intentional or mitigated elsewhere. Review when time permits.
⚠Targeted rules, not heuristics — Each rule is tightly scoped. We do not flag based on function names, comments, or proximity alone.
⚠Severity is weighted by impact — A hardcoded API key is CRITICAL. A missing cookie flag is LOW. The Vibe Risk Score reflects this weighting.
⚠Continuous improvement — Every false positive report helps us tighten rules. Our goal is zero noise, maximum signal.
04 — Two Layers
Layer 2 Completes the Picture
Static analysis catches what is in the code. Behavioral probing catches what the model does at runtime. Most tools only do one. vibeCodeScan does both.
Layer 1 — Static Analysis
Scans source code for known vulnerability patterns. Finds structural issues before deployment. Fast, deterministic, zero false positives by design.
Code PatternsPre-Deployment
Layer 2 — Behavioral Probing
Tests live LLM endpoints with adversarial inputs. Detects runtime issues that static analysis cannot see. Reveals how the model actually behaves under attack.
Live TestingAdversarial
▶Prompt injection at runtime — Layer 2 sends actual adversarial prompts to test whether the LLM follows injection instructions.
▶Scope violations — Probes test whether the LLM stays within its defined role or can be convinced to act outside its boundaries.
▶Data extraction — Tests whether the LLM can be tricked into revealing system prompts, training data patterns, or sensitive context.
▶Persona abandonment — Checks if the LLM abandons its configured persona under pressure, which can indicate weak safety boundaries.
💡
Use both layers for comprehensive coverage. Layer 1 catches vulnerabilities in your code before it ships. Layer 2 catches vulnerabilities in your LLM's behavior after deployment. Together, they cover the full AI application attack surface.
05 — Disputes
Believe a Finding is a False Positive?
We take accuracy seriously. If you believe a finding is incorrect, here is how to evaluate and report it.
Check the confidence level. LOW confidence findings are more likely to be context-dependent. HIGH confidence findings are almost always genuine issues. Start by evaluating whether the severity and confidence match the actual risk in your application.
Review the pattern match. Each finding shows the exact file, line number, and the vulnerability pattern that triggered it. Check whether the flagged code actually follows the vulnerable pattern or if there is a mitigating factor (e.g., input is sanitized upstream, the function is never exposed to user input).
Report false positives to improve rules. If the finding is genuinely incorrect, email [email protected] with the finding details. Include the file name, line number, vulnerability type, and why you believe it is a false positive. Every report helps us refine our rules for everyone.
📨
We respond to all false positive reports. When a report is confirmed, the rule is updated and the fix is deployed to all users. Our goal is surgical precision — every finding should be actionable, never noise. Contact us at [email protected].