Release v0.1.1 (What’s new?).
Welcome to prompt_risk Documentation¶
prompt_risk is a Python framework for detecting, scoring, and mitigating security risks in LLM prompts deployed across enterprise environments. It combines deterministic rule-based scanning (secrets detection, keyword blocklists) with LLM-as-Judge semantic analysis to catch vulnerabilities that regex alone cannot find — over-permissive authorization, hardcoded sensitive data, role confusion, instruction conflicts, and logic ambiguity.
The project ships with six insurance-industry use cases (from FNOL (First Notice of Loss) claim intake pipelines to autonomous claims agents) as reference implementations, each with versioned prompt templates, normal and adversarial test data, and automated evaluation pipelines. Prompts and test cases are stored as Jinja templates and TOML files under a structured data/ directory, making it easy to version, review, and extend.
Designed for integration into CI/CD workflows and prompt registries, prompt_risk turns prompt security from a manual, ad-hoc review process into a repeatable, auditable engineering practice. Install via pip install prompt-risk and start scanning your prompts programmatically.
How It Works¶
1. Use Case Pipeline — Each business use case is a chain of LLM-driven steps. UC1 (Claim Intake) transforms a raw narrative into a structured, classified, triaged claim record:
graph LR
IN["FNOL Narrative"] --> P1["P1<br/>Extraction"]
P1 -- "JSON" --> P2["P2<br/>Classification"]
P2 -- "JSON" --> P3["P3<br/>Triage"]
P3 -- "JSON" --> P4["P4<br/>Coverage"]
P4 -- "JSON" --> P5["P5<br/>Routing"]
style P1 fill:#1a5276,stroke:#2e86c1,color:#fff
style P2 fill:#1a5276,stroke:#2e86c1,color:#fff
style P3 fill:#1a5276,stroke:#2e86c1,color:#fff
style P4 fill:#2c3e50,stroke:#7f8c8d,color:#aaa
style P5 fill:#2c3e50,stroke:#7f8c8d,color:#aaa
style IN fill:#1a1a2e,stroke:#3d3d5c,color:#eee
Each step receives the previous step’s JSON output as input. P1-P3 are implemented; P4-P5 are planned (shown in gray).
2. Single Step — LLM Call with Validation & Retry — Every LLM-driven step follows the same pattern: render the prompt, call the model, validate the output, retry on failure:
graph TD
RENDER["Render Jinja template<br/>with input data"]
CALL["Call LLM<br/>(Bedrock Converse API)"]
EXTRACT["Extract JSON<br/>from response"]
VALIDATE{"Pydantic<br/>validation"}
OK["Return validated output"]
FEEDBACK["Append error to<br/>conversation history"]
FAIL["Raise exception"]
RENDER --> CALL
CALL --> EXTRACT
EXTRACT --> VALIDATE
VALIDATE -- "pass" --> OK
VALIDATE -- "fail, attempt < 3" --> FEEDBACK
FEEDBACK --> CALL
VALIDATE -- "fail, attempt = 3" --> FAIL
style OK fill:#1e6f3e,stroke:#27ae60,color:#fff
style FAIL fill:#922b21,stroke:#c0392b,color:#fff
style VALIDATE fill:#7d6608,stroke:#d4ac0d,color:#fff
The retry loop feeds the Pydantic ValidationError back to the LLM as a user message, giving it concrete feedback to self-correct rather than retrying blindly.
3. Automated Evaluation — Each prompt is tested against TOML-defined test cases with two types of assertions:
graph LR
subgraph tc["Test Case (TOML)"]
INPUT["[input]<br/>FNOL narrative"]
EXP["[expected]<br/>date = 2026-04-15<br/>police = HPD-04153"]
ATK["[attack_target]<br/>injury ≠ none<br/>severity ≠ low"]
end
RUN["Run prompt<br/>on input"] --> CHECK
subgraph CHECK["Assertions"]
EQ["expected: eq / in<br/><i>output must match</i>"]
NE["attack_target: ne<br/><i>output must NOT match</i>"]
end
CHECK --> PASS["All pass → ✅"]
CHECK --> FAILR["Any fail → ❌"]
INPUT --> RUN
style EXP fill:#1e6f3e,stroke:#27ae60,color:#fff
style ATK fill:#922b21,stroke:#c0392b,color:#fff
style PASS fill:#1e6f3e,stroke:#27ae60,color:#fff
style FAILR fill:#922b21,stroke:#c0392b,color:#fff
Normal cases verify correct extraction (eq/in). Attack cases verify the prompt resisted injection — the output must NOT contain attacker-injected values (ne).
4. LLM-as-Judge Business Correctness — Assertion-based evaluation checks a few key fields with hard-coded rules. LLM-as-Judge fills the gap by evaluating whether every extracted field is factually correct:
graph LR
subgraph pipeline["Two-Step Pipeline"]
direction TB
STEP1["Step 1: Run Extraction<br/>FNOL → P1 → JSON output"]
STEP2["Step 2: Run Judge<br/>input + output → verdict"]
STEP1 --> STEP2
end
STEP2 --> VERDICT
subgraph VERDICT["Judge Output"]
PASS_F["pass: true/false"]
REASON["reason: explanation"]
ERRORS["field_errors: [{field, issue}]"]
end
style STEP1 fill:#1a5276,stroke:#2e86c1,color:#fff
style STEP2 fill:#784212,stroke:#e67e22,color:#fff
style PASS_F fill:#1e6f3e,stroke:#27ae60,color:#fff
The per-prompt judge evaluates business correctness only — it does NOT evaluate injection resistance. Keeping them separate enables a diagnostic matrix:
Security ✅ |
Security ❌ |
|
|---|---|---|
Business ✅ |
Ideal |
Attack detected, output correct |
Business ❌ |
Model error |
Attack corrupted output |
5. LLM-as-Judge Security Assessment — Five judges evaluate prompt text for distinct risk dimensions. Each judge is itself a prompt that performs semantic analysis:
graph LR
PROMPT["Target Prompt<br/>(system + user template)"]
PROMPT --> J1["<b>J1</b><br/>Over-Permissive"]
PROMPT --> J2["<b>J2</b><br/>Sensitive Data"]
PROMPT --> J3["<b>J3</b><br/>Role Confusion"]
PROMPT --> J4["<b>J4</b><br/>Instruction Conflict"]
PROMPT --> J5["<b>J5</b><br/>Logic Ambiguity"]
J1 --> S1["Score 1-5<br/>+ per-criterion findings"]
J2 --> S2["Score 1-5"]
J3 --> S3["Score 1-5"]
J4 --> S4["Score 1-5"]
J5 --> S5["Score 1-5"]
style J1 fill:#784212,stroke:#e67e22,color:#fff
style J2 fill:#4a3520,stroke:#784212,color:#aaa
style J3 fill:#4a3520,stroke:#784212,color:#aaa
style J4 fill:#4a3520,stroke:#784212,color:#aaa
style J5 fill:#4a3520,stroke:#784212,color:#aaa
style S1 fill:#784212,stroke:#e67e22,color:#fff
style S2 fill:#4a3520,stroke:#784212,color:#aaa
style S3 fill:#4a3520,stroke:#784212,color:#aaa
style S4 fill:#4a3520,stroke:#784212,color:#aaa
style S5 fill:#4a3520,stroke:#784212,color:#aaa
J1 (implemented) evaluates 5 criteria: refusal capability, scope boundaries, unconditional compliance, failure handling, and anti-injection guardrails. J2-J5 are planned (shown in muted colors).
6. Prompt Versioning — Every prompt (including judges) is versioned with its own template files and metadata:
graph TD
subgraph uc["Use Case: uc1-claim-intake"]
subgraph p1["Prompt: p1-extraction"]
V1P["v01 — production<br/>✅ guardrails"]
V2P["v02 — over-permissive<br/>❌ 'never refuse'"]
V3P["v03 — minimal<br/>❌ no protections"]
V4P["v04 — conflicting<br/>⚠️ mixed signals"]
end
end
subgraph jd["Judges"]
subgraph j1["Judge: j1-over-permissive"]
V1J["v01"]
end
end
subgraph files["Each version contains"]
SYS["system-prompt.jinja"]
USR["user-prompt.jinja"]
META["metadata.toml<br/>description · date · risk_profile"]
end
V1P --- files
V1J --- files
style V1P fill:#1e6f3e,stroke:#27ae60,color:#fff
style V2P fill:#922b21,stroke:#c0392b,color:#fff
style V3P fill:#922b21,stroke:#c0392b,color:#fff
style V4P fill:#7d6608,stroke:#d4ac0d,color:#fff
style V1J fill:#784212,stroke:#e67e22,color:#fff
Multiple versions coexist — production-quality and intentionally vulnerable — so the judge system can demonstrate detection across risk profiles.
Learn More¶
Full Documentation — Project background, risk taxonomy, governance recommendations, and API reference.
Prompt Evaluation Demo — Interactive notebook: run prompts against test cases and evaluate outputs.
Judge Assessment Demo — Interactive notebook: run LLM-as-Judge on prompt versions and inspect risk scores.
Table of Content¶
- Project Background: LLM Prompt Risk Management for the Insurance Industry
- Use Case Catalog: AI Applications with Prompt Exposure in Insurance
- Judge Catalog: LLM-as-Judge Security Evaluation Pipeline
- Data Structure Design: Prompt & Test Data Organization
- Prompt Runner & Evaluation Pipeline
- UC1-P1 Extraction: Prompt Runner & Evaluation Demo
- Judge Design — LLM-as-Judge Prompt Security Evaluation
- J1 Over-Permissive Authorization Judge Demo
- Maintainer Guide