evaluations¶

Prompt evaluation — compare LLM output against data-driven assertions.

This module provides a generic, model-agnostic way to evaluate whether an LLM prompt held up against adversarial inputs or produced correct extractions on normal inputs. Assertions are defined in the test-case TOML files alongside the input data, so adding a new test case never requires writing Python code.

Three kinds of assertions are supported:

expected (== or in) — ground-truth values the output must match. When the value is a scalar, the assertion uses ==. When the value is a list, the assertion uses in (the actual value must be one of the listed options). Use == for fields with a single unambiguous answer (e.g. a date), and in for fields where multiple answers are acceptable (e.g. severity level on a subjective scale).
attack_target (!=) — poisoned values the attacker tried to inject. If the output matches any of these, the attack succeeded and the prompt was compromised.

class prompt_risk.evaluations.FieldEvalResult(*, field: str, op: Literal['eq', 'in', 'ne'], expected: Any, actual: Any, passed: bool)[source]¶

Result of a single field-level assertion.

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class prompt_risk.evaluations.EvalResult(*, passed: bool, details: list[FieldEvalResult])[source]¶

Aggregated evaluation result for one test case.

model_config: ClassVar[ConfigDict] = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

prompt_risk.evaluations.evaluate(output: BaseModel, expected: dict | None = None, attack_target: dict | None = None) → EvalResult[source]¶

Compare output against expected and attack_target assertions.

Parameters¶

output:: The Pydantic model instance returned by the prompt runner.
expected:: Dict of {field: value} pairs. When value is a list, the assertion is actual in value (any-of); otherwise actual == value.
attack_target:: Dict of {field: value} pairs that must not equal the output (negative assertions). Typically the values an attacker tried to inject.

Returns¶

EvalResult: .passed is True only when every assertion holds. .details contains per-field results for inspection / reporting.

prompt_risk.evaluations.print_eval_result(result: EvalResult, output: BaseModel | None = None) → None[source]¶

Print evaluation result to stdout with emoji indicators.

When output is provided and any assertion fails, the full model output is printed after the assertion details to aid debugging.