Create an evaluation monitor that spawns eval runs on a recurring schedule.
Monitors persist a reusable evaluation configuration (target type, metrics,
filters, cadence), and the system automatically creates eval runs at each
scheduled interval. If only_if_changed is true (the default), runs are
skipped when no new data has arrived since the previous run.

Example request:

```shell
curl --request POST \
  --url https://api.pandaprobe.com/evaluations/monitors \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "Daily production trace eval",
    "target_type": "TRACE",
    "metrics": ["task_completion", "tool_correctness"],
    "cadence": "daily"
  }'
```

Example response:

```json
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "project_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "name": "<string>",
  "target_type": "<string>",
  "metric_names": ["<string>"],
  "filters": {},
  "sampling_rate": 123,
  "model": "<string>",
  "cadence": "<string>",
  "only_if_changed": true,
  "status": "<string>",
  "last_run_at": "<string>",
  "last_run_id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "next_run_at": "<string>",
  "created_at": "<string>",
  "updated_at": "<string>"
}
```
Request body fields:
- name — human-readable monitor name, e.g. "Daily prod eval".
- target_type — "TRACE" or "SESSION".
- metrics — see GET /evaluations/trace-metrics (TRACE) or GET /evaluations/session-metrics (SESSION) for available names.
- filters — pass {} to match everything. Accepted keys depend on target_type. TRACE: date_from, date_to (ISO 8601), status (PENDING/RUNNING/COMPLETED/ERROR), session_id, user_id, tags (string[]), name (substring match). SESSION: date_from, date_to, user_id, has_error (bool), tags (string[]), min_trace_count (int).
- cadence — "every_6h", "daily", "weekly". Custom cron: "cron:<min hour dom month dow>", e.g. "cron:0 3 * * *" (daily 3 AM UTC), "cron:0 6 * * 1-5" (weekdays 6 AM).
- model — e.g. "openai/gpt-4o".
- signal weights (SESSION only) — keys: confidence, loop_detection, tool_correctness, coherence.

Auth: Bearer <token> + X-Project-ID, or X-API-Key + X-Project-Name.
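Because the accepted filter keys differ by target_type, a client can sanity-check a request body before sending it. The helper below is a hypothetical client-side sketch; the key sets are taken from the field list above, but the function itself is not part of any SDK:

```python
ALLOWED_FILTER_KEYS = {
    "TRACE": {"date_from", "date_to", "status", "session_id",
              "user_id", "tags", "name"},
    "SESSION": {"date_from", "date_to", "user_id", "has_error",
                "tags", "min_trace_count"},
}

def validate_monitor_body(body: dict) -> list[str]:
    """Return a list of client-side validation problems (empty if OK)."""
    problems = []
    target = body.get("target_type")
    if target not in ALLOWED_FILTER_KEYS:
        problems.append("target_type must be 'TRACE' or 'SESSION'")
        return problems
    unknown = set(body.get("filters", {})) - ALLOWED_FILTER_KEYS[target]
    if unknown:
        problems.append(f"filters not valid for {target}: {sorted(unknown)}")
    return problems

body = {"name": "Daily prod eval", "target_type": "TRACE",
        "metrics": ["task_completion"], "cadence": "daily",
        "filters": {"status": "COMPLETED", "min_trace_count": 3}}
print(validate_monitor_body(body))
# min_trace_count is a SESSION-only filter, so it is flagged
```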
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
name: Human-readable name for the monitor, e.g. "Daily prod trace eval".
target_type: Evaluation scope: "TRACE" for trace-level metrics or "SESSION" for session-level metrics.
metrics: Metric names to run on each scheduled eval. For TRACE monitors use GET /evaluations/trace-metrics; for SESSION monitors use GET /evaluations/session-metrics to list available names. Example: ["task_completion", "step_efficiency"].
cadence: How often the monitor fires. Predefined intervals: "every_6h", "daily", "weekly". Custom cron: "cron:<5-part expression>" where the five parts are minute hour day-of-month month day-of-week. Examples: "cron:0 3 * * *" (daily at 3 AM UTC), "cron:0 6 * * 1-5" (weekdays at 6 AM), "cron:0 */4 * * *" (every 4 hours).
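A rough sketch of how a cadence string could map to a next firing time. The interval offsets follow the documented names; the cron branch is a toy that handles only the fixed-time daily form, and none of this claims to mirror the server's actual scheduler:

```python
from datetime import datetime, timedelta, timezone

INTERVALS = {"every_6h": timedelta(hours=6),
             "daily": timedelta(days=1),
             "weekly": timedelta(weeks=1)}

def next_run_at(cadence: str, last_run: datetime) -> datetime:
    """Illustrative mapping from a cadence string to the next firing time."""
    if cadence in INTERVALS:
        return last_run + INTERVALS[cadence]
    if cadence.startswith("cron:"):
        minute, hour, dom, month, dow = cadence[5:].split()
        # Toy handling of the fixed-time daily form "cron:M H * * *" only.
        if (dom, month, dow) == ("*", "*", "*") and minute.isdigit() and hour.isdigit():
            candidate = last_run.replace(hour=int(hour), minute=int(minute),
                                         second=0, microsecond=0)
            return candidate if candidate > last_run else candidate + timedelta(days=1)
        raise NotImplementedError("general cron evaluation needs a real cron library")
    raise ValueError(f"unknown cadence: {cadence}")

last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(next_run_at("every_6h", last))        # same day, 18:00 UTC
print(next_run_at("cron:0 3 * * *", last))  # next day, 03:00 UTC
```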
filters: JSON object defining which traces/sessions the monitor targets. Pass {} to match everything in the project. TRACE monitors accept: date_from (ISO 8601), date_to (ISO 8601), status (PENDING|RUNNING|COMPLETED|ERROR), session_id, user_id, tags (string array, ANY match), name (substring, case-insensitive). SESSION monitors accept: date_from, date_to, user_id, has_error (bool), tags, min_trace_count (int). Example: {"status": "COMPLETED", "tags": ["production"]}.
sampling_rate (0 <= x <= 1): Fraction of matching traces/sessions to evaluate per run. 1.0 = all, 0.1 = random 10%.
model: LLM model override for judge calls (e.g. "openai/gpt-4o"). Uses the system default if null.
only_if_changed: When true, the scheduled run is skipped if no new traces/sessions have arrived since the last run, saving LLM costs. Set to false to always run on schedule, regardless of new data.
Signal weights (SESSION only): Override default signal weights for session-level aggregation; rejected for TRACE monitors. Keys: confidence, loop_detection, tool_correctness, coherence. Defaults: confidence=1.0, loop_detection=1.0, tool_correctness=0.8, coherence=1.0. Example: {"confidence": 1.0, "loop_detection": 1.5}.
Successful response: the full evaluation monitor representation.