Create a filtered eval run.

POST https://api.pandaprobe.com/evaluations/trace-runs

Resolves traces matching the provided filters, optionally samples a fraction of them, then dispatches a background Celery task to run the requested metrics asynchronously via an LLM judge.

Auth: Authorization: Bearer <token> (where <token> is your auth token) plus X-Project-ID, or X-API-Key plus X-Project-Name.
Rate limit: 50/min

Request body fields:

- metrics (required): list of metric names to run. Use GET /evaluations/trace-metrics to see the available names. Example: ["task_completion", "tool_correctness"].
- name (optional): human-readable label for this run, e.g. "Weekly prod eval".
- filters (optional): trace filters that select which traces to evaluate. Omit or leave empty to evaluate all traces in the project. Child attributes:
  - started_after: e.g. "2025-01-15T00:00:00Z". Includes traces started on or after this time.
  - status: one of PENDING, RUNNING, COMPLETED, ERROR.
  - tags: e.g. ["production", "v2"]. Matches traces with ANY of the listed tags.
- sampling_rate (0 <= x <= 1): fraction of matching traces to evaluate. 1.0 = all matching traces, 0.1 = a random 10%.
- model (optional): LLM model string override for the judge, e.g. "openai/gpt-4o". Null uses the system default.

Example request:

curl --request POST \
  --url https://api.pandaprobe.com/evaluations/trace-runs \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "metrics": [
      "task_completion",
      "tool_correctness",
      "step_efficiency"
    ]
  }'

Example response:

{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "name": "<string>",
  "status": "PENDING",
  "metric_names": [
    "<string>"
  ],
  "total_traces": 123,
  "evaluated_count": 123,
  "failed_count": 123,
  "created_at": "<string>",
  "completed_at": "<string>",
  "project_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "target_type": "<string>",
  "filters": {},
  "sampling_rate": 123,
  "model": "<string>",
  "monitor_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "error_message": "<string>"
}
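The request body above can be assembled and sanity-checked client-side before sending. A minimal Python sketch — the build_eval_run_payload helper is hypothetical, not part of any official SDK; it only mirrors the field rules documented on this page:

```python
import json


def build_eval_run_payload(metrics, name=None, filters=None,
                           sampling_rate=1.0, model=None):
    """Assemble a POST /evaluations/trace-runs body (hypothetical helper).

    Enforces the documented rules client-side: at least one metric,
    and sampling_rate within [0.0, 1.0].
    """
    if not metrics:
        raise ValueError("at least one metric name is required")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    payload = {"metrics": list(metrics), "sampling_rate": sampling_rate}
    if name is not None:
        payload["name"] = name
    if filters:            # omit entirely to evaluate all traces in the project
        payload["filters"] = filters
    if model is not None:  # absent/null -> system default judge model
        payload["model"] = model
    return payload


body = build_eval_run_payload(
    metrics=["task_completion", "tool_correctness"],
    name="Weekly prod eval",
    filters={"tags": ["production", "v2"]},
    sampling_rate=0.1,
)
print(json.dumps(body, indent=2))
```

The resulting dict can be serialized and sent with any HTTP client, using the same Authorization and project headers shown in the curl example.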
Typical dashboard flow:

1. GET /trace-runs/template?metric=task_completion
2. POST /trace-runs with the final body

Successful response: the full eval run representation used by both the list and detail endpoints. Its status field is the lifecycle status of the evaluation job: PENDING, RUNNING, COMPLETED, or FAILED.
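Because the run executes asynchronously, callers typically poll it until the status leaves PENDING/RUNNING. A sketch of that loop in Python — the wait_for_run helper is illustrative, and it takes a fetch callable rather than a URL because the exact detail-endpoint path is not documented on this page:

```python
import time

# Terminal lifecycle statuses of an evaluation job, per the response docs.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_run(fetch_run, interval=2.0, timeout=600.0):
    """Poll an eval run until it reaches a terminal lifecycle status.

    fetch_run: zero-argument callable returning the run JSON as a dict
               (in a real client, an authenticated GET on the run's
               detail endpoint). Injecting the callable keeps this
               sketch transport-agnostic and easy to test.
    Returns the final run dict; raises TimeoutError if no terminal
    status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run['status']} after {timeout}s")
        time.sleep(interval)
```

On FAILED, the error_message field of the returned body describes what went wrong; on COMPLETED, evaluated_count and failed_count summarize the per-trace results.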