curl --request POST \
  --url https://api.pandaprobe.com/evaluations/session-runs \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "metrics": [
    "agent_reliability",
    "agent_consistency"
  ]
}
'

{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "name": "<string>",
  "status": "PENDING",
  "metric_names": [
    "<string>"
  ],
  "total_traces": 123,
  "evaluated_count": 123,
  "failed_count": 123,
  "created_at": "<string>",
  "completed_at": "<string>",
  "project_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "target_type": "<string>",
  "filters": {},
  "sampling_rate": 123,
  "model": "<string>",
  "monitor_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "error_message": "<string>"
}

Create a filter-based session eval run.
Resolves sessions matching the provided filters, then dispatches a background Celery task that computes trace-level signals and aggregates them into session-level metrics.
Auth: either Bearer token + X-Project-ID, or X-API-Key + X-Project-Name
Rate limit: 50/min
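The two auth options above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `auth_headers` and `build_request` are hypothetical, and it assumes that X-Project-ID, X-API-Key, and X-Project-Name are all sent as HTTP headers. The request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.pandaprobe.com/evaluations/session-runs"


def auth_headers(token=None, project_id=None, api_key=None, project_name=None):
    """Return headers for one of the two documented auth schemes.

    Assumption: every credential travels as an HTTP header.
    """
    if token and project_id:
        return {"Authorization": f"Bearer {token}", "X-Project-ID": project_id}
    if api_key and project_name:
        return {"X-API-Key": api_key, "X-Project-Name": project_name}
    raise ValueError("need token + project_id, or api_key + project_name")


def build_request(metrics, **auth):
    """Build (but do not send) the POST request for a session eval run."""
    headers = {"Content-Type": "application/json", **auth_headers(**auth)}
    body = json.dumps({"metrics": list(metrics)}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it. With a limit of 50 requests per minute, a caller creating many runs would want to space calls at least 1.2 seconds apart.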
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Session metric names (e.g. ['agent_reliability']). Minimum array length: 1.
Human-readable label.
Filters for session-level evaluation runs.
Fraction of sessions to evaluate. Range: 0 <= x <= 1.
LLM model override for judge calls.
Override default signal weights.
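The request-body constraints above can be checked client-side before sending. The sketch below is a hedged example: only `metrics` appears in the request example on this page, so the field names `name`, `filters`, `sampling_rate`, and `model` are assumptions borrowed from the response representation, and `build_payload` is a hypothetical helper. It also assumes the metrics list must be non-empty. The signal-weight override is left out because its field name is not given here.

```python
def build_payload(metrics, name=None, filters=None, sampling_rate=None, model=None):
    """Assemble a request body, validating the documented constraints.

    Field names other than "metrics" are assumptions mirrored from the
    response representation; verify them against the API schema.
    """
    if not metrics:
        raise ValueError("metrics must contain at least one metric name")
    if sampling_rate is not None and not 0 <= sampling_rate <= 1:
        raise ValueError("sampling_rate must satisfy 0 <= x <= 1")

    payload = {"metrics": list(metrics)}
    optional = {
        "name": name,
        "filters": filters,
        "sampling_rate": sampling_rate,
        "model": model,
    }
    # Send only the fields the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_payload(["agent_reliability"], sampling_rate=0.25)` yields a body with just `metrics` and `sampling_rate`, leaving every other option at its server-side default.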
Successful Response
Full eval run representation used by both list and detail endpoints.
Lifecycle status of an evaluation job: PENDING, RUNNING, COMPLETED, or FAILED.
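Because the run is processed in the background, a caller typically polls until the status leaves PENDING or RUNNING. The sketch below shows one way to do that. It assumes COMPLETED and FAILED are the terminal states, and it takes the lookup as a caller-supplied `fetch_run` function because the detail endpoint's URL is not given on this page. `wait_for_run` and its parameters are hypothetical names.

```python
import time

# Assumption: a run in either of these states will not change again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_run(fetch_run, run_id, interval_s=2.0, max_polls=30, sleep=time.sleep):
    """Poll fetch_run(run_id) until the run reaches a terminal status.

    fetch_run must return a dict with a "status" key, shaped like the
    eval run representation above.
    """
    for _ in range(max_polls):
        run = fetch_run(run_id)
        if run["status"] in TERMINAL_STATUSES:
            return run
        sleep(interval_s)
    raise TimeoutError(f"run {run_id} still not finished after {max_polls} polls")
```

The default interval of 2 seconds works out to 30 polls per minute, which would stay under a 50/min limit if the same limit applies to the lookup endpoint. A FAILED result would presumably carry details in `error_message`.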