curl --request POST \
  --url https://api.pandaprobe.com/evaluations/trace-runs/batch \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "trace_ids": [
    "3c90c3cc-0d44-4b50-8888-8dd25736052a"
  ],
  "metrics": [
    "<string>"
  ]
}'

{
"id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
"name": "<string>",
"status": "PENDING",
"metric_names": [
"<string>"
],
"total_traces": 123,
"evaluated_count": 123,
"failed_count": 123,
"created_at": "<string>",
"completed_at": "<string>",
"project_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
"target_type": "<string>",
"filters": {},
"sampling_rate": 123,
"model": "<string>",
"monitor_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
"error_message": "<string>"
}Create an eval run for an explicit list of trace IDs.
Evaluates exactly the provided traces with all requested metrics. All metrics for all traces are processed in a single sequential Celery task, so concurrent writes cannot race.
Auth: Bearer + X-Project-ID | X-API-Key + X-Project-Name
Rate limit: 50/min
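The curl example above can also be issued from Python. The sketch below uses only the standard library and the endpoint, headers, and body fields shown in this page; the helper names (`build_batch_payload`, `create_batch_run`) are illustrative, not part of any official SDK. It deduplicates trace IDs client-side purely to keep payloads small, since the API already removes duplicates server-side:

```python
import json
import urllib.request

API_URL = "https://api.pandaprobe.com/evaluations/trace-runs/batch"

def build_batch_payload(trace_ids, metrics, name=None, model=None):
    # Order-preserving client-side dedupe; the API also dedupes server-side.
    seen = set()
    unique = [t for t in trace_ids if not (t in seen or seen.add(t))]
    payload = {"trace_ids": unique, "metrics": list(metrics)}
    if name is not None:
        payload["name"] = name    # optional human-readable label for the run
    if model is not None:
        payload["model"] = model  # optional judge-model override
    return payload

def create_batch_run(token, project_id, trace_ids, metrics, **opts):
    # Bearer + X-Project-ID auth scheme; mind the 50/min rate limit.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_batch_payload(trace_ids, metrics, **opts)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "X-Project-ID": project_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)  # parsed eval-run representation
```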
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Use this when the user has manually selected specific traces in the dashboard rather than using filter-based selection.
trace_ids: List of trace UUIDs to evaluate. Duplicates are removed automatically.
metrics: List of metric names to run on each trace. Example: ['task_completion', 'step_efficiency'].
name: Optional human-readable label for this run.
model: LLM model string override for the judge. Null uses the system default.
Successful Response
Full eval run representation used by both list and detail endpoints.
status: Lifecycle status of an evaluation job. One of PENDING, RUNNING, COMPLETED, FAILED.
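Since a run starts in PENDING and finishes in COMPLETED or FAILED, callers typically poll the run until it reaches a terminal status. A minimal polling sketch follows; `fetch_run` is a hypothetical callable you wire to your own HTTP client for the run-detail endpoint (it is injected here so the loop stays transport-agnostic and testable):

```python
import time

# Terminal statuses from the lifecycle above.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def wait_for_run(fetch_run, run_id, interval=5.0, max_wait=600.0, sleep=time.sleep):
    """Poll fetch_run(run_id) until the run reaches a terminal status.

    fetch_run is a hypothetical helper returning the eval-run dict
    (the same shape as the response body shown above).
    """
    waited = 0.0
    while True:
        run = fetch_run(run_id)
        if run["status"] in TERMINAL_STATUSES:
            return run
        if waited >= max_wait:
            raise TimeoutError(f"run {run_id} still {run['status']} after {max_wait}s")
        sleep(interval)
        waited += interval
```

A conservative `interval` also helps stay under the 50/min rate limit when polling several runs at once.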