Create Eval Run
Create a filtered eval run.
Resolves traces matching the provided filters, optionally samples a fraction of them, then dispatches a background Celery task to run the requested metrics asynchronously via an LLM judge.
Request body fields:
- name (string, optional): Human-readable label, e.g. "Weekly prod eval".
- metrics (string[], required): Metric names to run. Get available names from GET /evaluations/trace-metrics. Example: ["task_completion", "tool_correctness"].
- filters (object, optional): Trace selection filters. All fields optional:
  - date_from: ISO 8601 datetime, e.g. "2025-01-15T00:00:00Z". Includes traces started on or after this time.
  - date_to: ISO 8601 datetime. Includes traces started before this time (exclusive).
  - status: One of PENDING, RUNNING, COMPLETED, ERROR.
  - session_id: Exact session ID string.
  - user_id: Exact user ID string.
  - tags: Array of strings, e.g. ["production", "v2"]. Matches traces with ANY tag.
  - name: Substring match on trace name (case-insensitive).
- sampling_rate (float, optional, default 1.0): Fraction of matching traces to evaluate. 1.0 = all, 0.1 = random 10%.
- model (string, optional): LLM model override, e.g. "openai/gpt-4o". Uses system default if null.
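The filter rules above (inclusive date_from, exclusive date_to, exact IDs, ANY-tag matching, case-insensitive name substring) can be illustrated with a small client-side predicate. This is only a sketch of the documented semantics, not the server's implementation: the Trace class, its field names, and the matches helper are hypothetical, and null or empty filter values are assumed to mean "no filter".

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Trace:
    # Hypothetical trace record; these field names are assumptions.
    name: str
    started_at: datetime
    status: str = "COMPLETED"
    session_id: Optional[str] = None
    user_id: Optional[str] = None
    tags: list = field(default_factory=list)


def parse_iso(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def matches(trace: Trace, filters: dict) -> bool:
    # Assumption: a missing, null, or empty filter value means "no filter".
    if filters.get("date_from") and trace.started_at < parse_iso(filters["date_from"]):
        return False  # date_from is inclusive
    if filters.get("date_to") and trace.started_at >= parse_iso(filters["date_to"]):
        return False  # date_to is exclusive
    for key in ("status", "session_id", "user_id"):
        if filters.get(key) and getattr(trace, key) != filters[key]:
            return False  # exact match
    if filters.get("tags") and not set(filters["tags"]) & set(trace.tags):
        return False  # one shared tag is enough
    if filters.get("name") and filters["name"].lower() not in trace.name.lower():
        return False  # case-insensitive substring
    return True


trace = Trace(
    name="Checkout Agent",
    started_at=datetime(2025, 1, 15, tzinfo=timezone.utc),
    tags=["production"],
)
```

With this trace, a date_from of "2025-01-15T00:00:00Z" matches (the boundary is included), while the same value as date_to does not (the boundary is excluded).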
Auth: either a Bearer token with X-Project-ID, or X-API-Key with X-Project-Name
Rate limit: 50/min
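A minimal sketch of assembling the request with Bearer authentication, using only the Python standard library. The host in BASE_URL is a placeholder, the /trace-runs path is used as written in the dashboard flow on this page (any API prefix in front of it is not shown here), and the token and project ID values are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real API


def build_request(body: dict, token: str, project_id: str) -> urllib.request.Request:
    # Bearer auth is paired with the X-Project-ID header.
    headers = {
        "Authorization": f"Bearer {token}",
        "X-Project-ID": project_id,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{BASE_URL}/trace-runs",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


body = {
    "name": "Weekly prod eval",
    "metrics": ["task_completion", "tool_correctness"],
    "filters": {"date_from": "2025-01-15T00:00:00Z", "tags": ["production", "v2"]},
    "sampling_rate": 0.1,
}
req = build_request(body, token="YOUR_TOKEN", project_id="YOUR_PROJECT_ID")
# Passing req to urllib.request.urlopen would send it; that call is left out
# so the sketch runs without network access.
```

Given the 50/min rate limit, a client that creates many runs in a loop may want to space its calls accordingly.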
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
The system resolves matching traces from the filters, optionally samples a fraction of them, then dispatches a background task to run the requested metrics asynchronously via an LLM judge.
Typical dashboard flow:
- User selects metrics -> call GET /trace-runs/template?metric=task_completion
- Dashboard renders the template as a form with editable filters
- User customizes filters/sampling -> frontend builds this request body
- Frontend calls POST /trace-runs with the final body
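The flow above could be sketched on the client side as follows. The shape of the template response is not described on this page, so this assumes it is a dict laid out like the request body; template_path and apply_overrides are hypothetical helper names.

```python
import copy
from urllib.parse import urlencode


def template_path(metric: str) -> str:
    # Path and query string for the template lookup in the first step.
    return "/trace-runs/template?" + urlencode({"metric": metric})


def apply_overrides(template: dict, overrides: dict) -> dict:
    # Assumption: the template has the same shape as the request body.
    body = copy.deepcopy(template)
    for key, value in overrides.items():
        if (
            key == "filters"
            and isinstance(value, dict)
            and isinstance(body.get("filters"), dict)
        ):
            # Merge filter edits so untouched template filters survive.
            body["filters"].update(value)
        else:
            body[key] = value
    return body


# Hypothetical template, standing in for the GET response.
template = {
    "metrics": ["task_completion"],
    "filters": {"status": "COMPLETED"},
    "sampling_rate": 1.0,
}
final_body = apply_overrides(
    template,
    {"filters": {"tags": ["production"]}, "sampling_rate": 0.1},
)
```

Here final_body keeps the template's status filter, gains the tags filter, and lowers sampling_rate to 0.1; it is what the frontend would send in the final POST.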
- metrics: List of metric names to run. Use GET /evaluations/trace-metrics to see available names. Example: ['task_completion', 'tool_correctness'].
- name: Optional human-readable label for this run (e.g. 'Weekly prod eval').
- filters: Trace filters to select which traces to evaluate. Omit or leave empty to evaluate all traces in the project.
- sampling_rate: Fraction of matching traces to evaluate (0.0 to 1.0). 1.0 = all matching traces, 0.1 = random 10%. Required range: 0 <= x <= 1.
- model: LLM model string override for the judge (e.g. 'openai/gpt-4o'). Null uses the system default.
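A client may want to check a body against these constraints before sending it. The sketch below is a hypothetical pre-flight check, not the server's validation: it enforces the documented 0 to 1 range for sampling_rate and assumes that metrics, being required, must contain at least one name.

```python
def validate_body(body: dict) -> list:
    """Return a list of human-readable problems; empty means the body looks valid."""
    errors = []

    metrics = body.get("metrics")
    # Assumption: a required list of metric names must not be empty.
    if not isinstance(metrics, list) or not metrics:
        errors.append("metrics must be a non-empty list of metric names")
    elif not all(isinstance(m, str) for m in metrics):
        errors.append("every metric name must be a string")

    # sampling_rate defaults to 1.0 when omitted.
    rate = body.get("sampling_rate", 1.0)
    is_number = isinstance(rate, (int, float)) and not isinstance(rate, bool)
    if not is_number or not 0 <= rate <= 1:
        errors.append("sampling_rate must be a number between 0 and 1")

    return errors


ok = validate_body({"metrics": ["task_completion"], "sampling_rate": 0.1})
bad = validate_body({"metrics": [], "sampling_rate": 1.5})
```

In this example ok is an empty list, while bad reports both the empty metrics list and the out-of-range sampling rate.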
Response
Successful Response
Full eval run representation used by both list and detail endpoints.
Lifecycle status of an evaluation job. One of: PENDING, RUNNING, COMPLETED, FAILED.
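Because the run executes in the background, a caller will usually poll until it finishes. The sketch below assumes COMPLETED and FAILED are the terminal states. The endpoint for reading a run's status is not covered on this page, so fetch_status is a hypothetical callable supplied by the caller.

```python
import time

# Assumption: these two statuses are terminal; PENDING and RUNNING are not.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_run(fetch_status, interval=2.0, max_polls=30, sleep=time.sleep):
    """Poll fetch_status() until the run reaches a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval)
    raise TimeoutError(f"run still not finished after {max_polls} polls")


# Usage with a stand-in status source instead of a real HTTP call.
fake_statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
final_status = wait_for_run(lambda: next(fake_statuses), sleep=lambda _: None)
```

The sleep parameter is injectable so the loop can be exercised without real delays; in production the default time.sleep applies.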
