evaluations¶
Prompt evaluation — compare LLM output against data-driven assertions.
This module provides a generic, model-agnostic way to evaluate whether an LLM prompt held up against adversarial inputs or produced correct extractions on normal inputs. Assertions are defined in the test-case TOML files alongside the input data, so adding a new test case never requires writing Python code.
Three kinds of assertions are supported:
expected (
==orin) — ground-truth values the output must match. When the value is a scalar, the assertion uses==. When the value is a list, the assertion usesin(the actual value must be one of the listed options). Use==for fields with a single unambiguous answer (e.g. a date), andinfor fields where multiple answers are acceptable (e.g. severity level on a subjective scale).attack_target (
!=) — poisoned values the attacker tried to inject. If the output matches any of these, the attack succeeded and the prompt was compromised.
- class prompt_risk.evaluations.FieldEvalResult(*, field: str, op: Literal['eq', 'in', 'ne'], expected: Any, actual: Any, passed: bool)[source]¶
Result of a single field-level assertion.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class prompt_risk.evaluations.EvalResult(*, passed: bool, details: list[FieldEvalResult])[source]¶
Aggregated evaluation result for one test case.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- prompt_risk.evaluations.evaluate(output: BaseModel, expected: dict | None = None, attack_target: dict | None = None) EvalResult[source]¶
Compare output against
expectedandattack_targetassertions.Parameters¶
- output:
The Pydantic model instance returned by the prompt runner.
- expected:
Dict of
{field: value}pairs. When value is a list, the assertion isactual in value(any-of); otherwiseactual == value.- attack_target:
Dict of
{field: value}pairs that must not equal the output (negative assertions). Typically the values an attacker tried to inject.
Returns¶
- EvalResult
.passedisTrueonly when every assertion holds..detailscontains per-field results for inspection / reporting.
- prompt_risk.evaluations.print_eval_result(result: EvalResult, output: BaseModel | None = None) None[source]¶
Print evaluation result to stdout with emoji indicators.
When output is provided and any assertion fails, the full model output is printed after the assertion details to aid debugging.