
Evaluation monitors

Evaluation monitors automate recurring evaluations. Instead of manually creating eval runs, a monitor saves your target type, metrics, filters, sampling rate, and cadence, then creates eval runs automatically in the background. Use monitors for recurring workflows such as:
  • Daily production trace quality checks
  • Weekly session reliability audits
  • Regression monitoring after releases
  • Continuous evaluation of high-value users, tags, or environments

Dashboard setup

Create monitors from the Evaluations tab in the PandaProbe dashboard.
1. Open Evaluations: open the Evaluations tab from the dashboard navigation.
2. Open Monitors: select the Monitors card from the Evaluations landing page.
3. Click Create monitor to open the monitor sidebar.
4. Configure the monitor: add a name, choose the target type (TRACE or SESSION), select metrics, and add filters that define the traces or sessions the monitor should evaluate.
5. Set the cadence: choose how often the monitor should create a new eval run; cadence controls the recurring schedule.
6. Submit: click Create monitor. The monitor starts in the background and creates eval runs on its configured schedule.

Monitor fields

When creating a monitor from the dashboard, configure:
  • Name: a human-readable label for the monitor.
  • Target type: TRACE for trace evaluation or SESSION for session evaluation.
  • Metrics: the trace-level or session-level metrics to run.
  • Filters: criteria that select which traces or sessions the monitor evaluates.
  • Sampling rate: the fraction of matching data to evaluate on each run.
  • Cadence: how often PandaProbe creates a new eval run.
  • Model: optional model selection for LLM-as-judge metrics.
  • Customize signal weights: optional per-signal weighting for session monitors.

Filters

Trace monitors can filter by fields such as Started after, Started before, Status, Trace ID, Session ID, User, and Tags.
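As a sketch, a trace monitor's filters object might look like the following. The status and tags keys appear in the documented API examples; started_after and user are assumed snake_case forms of the dashboard filter labels and may differ in the actual API.

# Hypothetical filters object for a trace monitor.
# "status" and "tags" match the documented examples; the other
# keys are assumptions based on the dashboard labels.
filters = {
    "started_after": "2025-06-01T00:00:00Z",  # assumed field name
    "status": "COMPLETED",
    "tags": ["production"],
    "user": "user_123",  # assumed field name
}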

Sampling rate

Sampling rate controls what portion of matching data is evaluated each time the monitor runs. For example:
  • 1.0 evaluates all matching traces or sessions.
  • 0.5 evaluates 50% of matching traces or sessions.
  • 0.1 evaluates 10% of matching traces or sessions.
Use sampling to control evaluation cost and volume for large projects.
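PandaProbe does not document how sampling is implemented internally. As a mental model only, independent per-item sampling at the configured rate reproduces the behavior described above:

import random

def sample(items, rate):
    # Keep each matching trace/session independently with probability `rate`.
    # Illustrative only; PandaProbe's actual sampling strategy is not documented.
    return [item for item in items if random.random() < rate]

traces = [f"trace_{i}" for i in range(1000)]
subset = sample(traces, 0.1)  # roughly 100 of 1000 matching traces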

API setup

You can also create and manage monitors through the API.

Create a monitor

POST /evaluations/monitors
{
  "name": "Daily production trace eval",
  "target_type": "TRACE",
  "metrics": ["task_completion", "tool_correctness", "confidence"],
  "filters": {
    "status": "COMPLETED",
    "tags": ["production"]
  },
  "cadence": "daily",
  "sampling_rate": 0.3,
  "model": "openai/gpt-5.4",
  "only_if_changed": true
}
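For instance, you could send this request from Python with the requests library. The base URL and bearer-token auth scheme below are assumptions, not documented here; substitute your actual PandaProbe endpoint and credentials.

import requests

BASE_URL = "https://api.pandaprobe.com"  # assumed; use your actual API host
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme

payload = {
    "name": "Daily production trace eval",
    "target_type": "TRACE",
    "metrics": ["task_completion", "tool_correctness", "confidence"],
    "filters": {"status": "COMPLETED", "tags": ["production"]},
    "cadence": "daily",
    "sampling_rate": 0.3,
    "model": "openai/gpt-5.4",
    "only_if_changed": True,
}

resp = requests.post(f"{BASE_URL}/evaluations/monitors", json=payload, headers=headers)
resp.raise_for_status()
monitor = resp.json()  # response shape is not documented on this page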

Request fields

Field            Type      Required  Description
name             string    Yes       Human-readable label for the monitor
target_type      string    Yes       "TRACE" or "SESSION"
metrics          string[]  Yes       Metric names to run on each scheduled eval
filters          object    No        Scope the data the monitor evaluates
cadence          string    Yes       Firing schedule
sampling_rate    float     No        Fraction of matching data to evaluate per run
model            string    No        LLM model override for judge calls
only_if_changed  boolean   No        Skip the run if no new data has arrived since the previous run
signal_weights   object    No        Override signal weights for session monitors

Session monitor example

Session monitors are created through the same POST /evaluations/monitors endpoint:

{
  "name": "Weekly agent reliability audit",
  "target_type": "SESSION",
  "metrics": ["agent_reliability", "agent_consistency"],
  "filters": {
    "min_trace_count": 3,
    "tags": ["production"]
  },
  "cadence": "weekly",
  "sampling_rate": 1.0,
  "signal_weights": {
    "confidence": 1.0,
    "loop_detection": 1.5,
    "tool_correctness": 0.8,
    "coherence": 1.0
  },
  "only_if_changed": true
}

Cadence options

Monitors support predefined intervals and custom cron expressions.
Value             Schedule
every_6h          Every 6 hours
daily             Once per day
weekly            Once per week
cron:0 3 * * *    Daily at 3:00 AM UTC
cron:0 6 * * 1-5  Weekdays at 6:00 AM UTC
cron:0 */4 * * *  Every 4 hours
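A custom cron cadence is passed as an ordinary string value. As a sketch (the base URL and auth scheme are assumptions, as in the earlier example), this creates a weekday-morning monitor:

import requests

BASE_URL = "https://api.pandaprobe.com"  # assumed
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme

requests.post(
    f"{BASE_URL}/evaluations/monitors",
    json={
        "name": "Weekday morning trace eval",
        "target_type": "TRACE",
        "metrics": ["task_completion"],
        "cadence": "cron:0 6 * * 1-5",  # weekdays at 6:00 AM UTC
    },
    headers=headers,
)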

The only_if_changed flag

When only_if_changed is true, PandaProbe skips a scheduled run if no new traces or sessions have arrived since the previous run. This helps avoid re-evaluating the same data unnecessarily. Set it to false when you want the monitor to run on every cadence tick, even if the underlying data has not changed.
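Since PATCH /evaluations/monitors/{monitor_id} is the documented update endpoint, toggling the flag on an existing monitor might look like this (base URL, auth scheme, and the monitor ID are assumptions):

import requests

BASE_URL = "https://api.pandaprobe.com"  # assumed
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme
monitor_id = "mon_abc123"  # hypothetical monitor ID

requests.patch(
    f"{BASE_URL}/evaluations/monitors/{monitor_id}",
    json={"only_if_changed": False},  # run on every cadence tick
    headers=headers,
)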

Manage monitors

Monitors have two states:
Status  Description
ACTIVE  The monitor runs on schedule and creates eval runs at each cadence tick
PAUSED  The schedule is suspended and no new runs are created
Common API operations:
GET /evaluations/monitors
GET /evaluations/monitors/{monitor_id}
PATCH /evaluations/monitors/{monitor_id}
POST /evaluations/monitors/{monitor_id}/pause
POST /evaluations/monitors/{monitor_id}/resume
POST /evaluations/monitors/{monitor_id}/trigger
DELETE /evaluations/monitors/{monitor_id}
GET /evaluations/monitors/{monitor_id}/runs
Use trigger to create an immediate eval run from a monitor without waiting for the next scheduled cadence.
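A minimal lifecycle sketch against the endpoints above, again assuming the base URL, bearer auth, and monitor ID from the earlier examples:

import requests

BASE_URL = "https://api.pandaprobe.com"  # assumed
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme
monitor_id = "mon_abc123"  # hypothetical monitor ID

# Pause the schedule, fire one run immediately, then resume.
requests.post(f"{BASE_URL}/evaluations/monitors/{monitor_id}/pause", headers=headers)
requests.post(f"{BASE_URL}/evaluations/monitors/{monitor_id}/trigger", headers=headers)
requests.post(f"{BASE_URL}/evaluations/monitors/{monitor_id}/resume", headers=headers)

# List the eval runs this monitor has created.
runs = requests.get(f"{BASE_URL}/evaluations/monitors/{monitor_id}/runs", headers=headers).json()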

Next steps

Run Evaluations via UI

Create one-off trace and session eval runs from the dashboard.

Run Evaluations via API

Create and manage eval runs programmatically.