
PandaProbe provides three setup paths for evaluation. Choose the one that matches how you want to operate: manually from the dashboard, programmatically through the API, or automatically on a schedule.

Dashboard UI

Best for exploring metrics, creating one-off eval runs, and reviewing results visually.

API

Best for CI/CD, internal tools, notebooks, and custom automation.

Scheduled Monitors

Best for recurring production checks that evaluate new traces or sessions over time.

Prerequisites

Before running evaluations, make sure you have:
  • Traces in your project: evaluations run against data already captured by PandaProbe tracing.
  • Sessions for agent evaluation: session-level metrics require traces grouped with a session_id (see the sketch after these notes).
  • Project access: dashboard users need access to the project, and API users need valid authentication.

PandaProbe Cloud manages the evaluation LLM infrastructure for you, so you do not need to bring your own LLM API key. Self-hosted deployments must configure their own LLM provider credentials for LLM-powered PandaProbe features, including LLM-as-judge evaluation.
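
To satisfy the first two prerequisites, your instrumentation needs to capture traces and tag them with a shared session_id. The exact API depends on your tracing SDK; the sketch below assumes a hypothetical `pandaprobe` Python client with `trace()` and `session_id` parameters, which you should replace with whatever your setup actually uses.

```python
import uuid

from pandaprobe import Client  # hypothetical SDK import; adapt to your client

def run_agent(question: str) -> str:
    """Stand-in for your actual agent call."""
    return "Refunds are accepted within 30 days."

client = Client(api_key="...")   # project-scoped credentials
session_id = str(uuid.uuid4())   # one id per user conversation

# Traces captured with the same session_id form one session,
# which session-level evaluation can then target.
with client.trace(name="answer-question", session_id=session_id) as trace:
    trace.log_input({"question": "What is our refund policy?"})
    trace.log_output({"answer": run_agent("What is our refund policy?")})
```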

What you can evaluate

Trace-level evaluation

Run metrics against individual traces to score task completion, tool usage, planning, coherence, and more. You can target traces by:
  • Filters — date range, status, session, user, tags, or name substring
  • Explicit IDs — provide a list of specific trace UUIDs
  • Sampling — evaluate a random fraction of matching traces to control cost
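
Combining these targeting options, a trace-level eval run request might look like the following sketch. The endpoint path and payload field names are illustrative assumptions, not the documented API schema; see the API reference linked below for the real shape.

```python
import os

import requests

payload = {
    "level": "trace",
    "metrics": ["task_completion", "tool_usage"],
    # Filters narrow the candidate traces...
    "filters": {
        "from": "2024-06-01T00:00:00Z",
        "to": "2024-06-07T00:00:00Z",
        "status": "error",
        "tags": ["production"],
    },
    # ...or pass explicit trace UUIDs instead of filters:
    # "trace_ids": ["9b2e...", "41cc..."],
    "sampling": {"fraction": 0.1},  # evaluate ~10% of matches to control cost
}

resp = requests.post(
    "https://api.pandaprobe.com/v1/eval-runs",  # hypothetical endpoint
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"},
)
resp.raise_for_status()
print(resp.json()["run_id"])  # assumed response field
```

Sampling is the main cost lever here: a fraction of 0.1 gives you a statistical read on quality at a tenth of the evaluation spend.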

Session-level evaluation

Run metrics against entire sessions to assess agent reliability and consistency. You can target sessions by:
  • Filters — date range, user, error status, tags, minimum trace count
  • Explicit IDs — provide a list of specific session ID strings
  • Sampling — evaluate a fraction of matching sessions
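
Under the same assumptions as the trace-level sketch above, a session-level run differs only in its targeting payload:

```python
import os

import requests

# Session-level variant of the trace-level sketch: same hypothetical
# endpoint, different level and filters.
payload = {
    "level": "session",
    "metrics": ["agent_reliability"],
    "filters": {
        "from": "2024-06-01T00:00:00Z",
        "user": "user_1234",
        "has_error": False,
        "min_trace_count": 3,  # skip trivially short sessions
    },
    # or target explicit session ID strings instead:
    # "session_ids": ["checkout-flow-8841"],
    "sampling": {"fraction": 0.25},
}

requests.post(
    "https://api.pandaprobe.com/v1/eval-runs",  # hypothetical endpoint
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"},
).raise_for_status()
```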

Setup methods

Dashboard UI

Use the dashboard when you want to create evaluations interactively and inspect results without writing code.

Run Evaluations via UI

Create eval runs and review scores from the PandaProbe dashboard.

API

Use the API when evaluations need to be part of an automated workflow, CI job, internal dashboard, or custom tool.

Run Evaluations via API

Create eval runs, poll run status, and query scores programmatically.
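
The create-poll-query pattern described above might look like the following minimal sketch. Endpoint paths, status values, and response fields are assumptions for illustration; consult the API reference for the actual contract.

```python
import os
import time

import requests

BASE = "https://api.pandaprobe.com/v1"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"}

def wait_for_run(run_id: str, poll_seconds: float = 5.0) -> dict:
    """Poll the eval run until it reaches a terminal status."""
    while True:
        run = requests.get(f"{BASE}/eval-runs/{run_id}", headers=HEADERS).json()
        if run["status"] in ("completed", "failed"):  # assumed status values
            return run
        time.sleep(poll_seconds)

run_id = "run_abc123"  # returned when the run was created
run = wait_for_run(run_id)
if run["status"] == "completed":
    scores = requests.get(f"{BASE}/eval-runs/{run_id}/scores", headers=HEADERS).json()
    for score in scores:
        print(score["metric"], score["value"], score["status"])
```

In a CI job, you would fail the build when any score's status indicates a failed check.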

Scheduled monitors

Use monitors when you want PandaProbe to run evaluations repeatedly on a cadence, such as daily production checks or weekly quality reports.

Scheduling Evaluations

Configure recurring evaluation monitors with filters, sampling, and cadence.
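
A monitor bundles the same targeting options with a cadence. The definition below is an illustrative sketch, with an assumed endpoint and field names; the moving parts (metrics, filters, sampling, schedule) mirror the options described above.

```python
import os

import requests

monitor = {
    "name": "daily-production-check",
    "level": "trace",
    "metrics": ["task_completion", "coherence"],
    "filters": {"tags": ["production"]},
    "sampling": {"fraction": 0.05},     # keep recurring cost bounded
    "schedule": {"cron": "0 6 * * *"},  # every day at 06:00 UTC
    # Each scheduled run evaluates traces that are new since the last run.
}

requests.post(
    "https://api.pandaprobe.com/v1/monitors",  # hypothetical endpoint
    json=monitor,
    headers={"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"},
).raise_for_status()
```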

Results

Every eval run produces scores attached to traces or sessions. Each score includes a metric name, value, status, reason, and metadata. You can review scores in the dashboard or query them through the API for analytics, reporting, and monitoring.
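
For reporting, you can pull scores into a dataframe and aggregate them. The sketch below assumes a hypothetical scores endpoint and assumed field names (created_at, a "passed" status value); the score shape otherwise follows the description above.

```python
import os

import pandas as pd
import requests

resp = requests.get(
    "https://api.pandaprobe.com/v1/scores",  # hypothetical endpoint
    params={"metric": "task_completion", "from": "2024-06-01"},
    headers={"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"},
)
resp.raise_for_status()

df = pd.DataFrame(resp.json())
# e.g. daily pass rate, using each score's status field
df["day"] = pd.to_datetime(df["created_at"]).dt.date  # assumed timestamp field
print(df.groupby("day")["status"].apply(lambda s: (s == "passed").mean()))
```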

Next steps

Start with the setup path that matches your workflow:

Run via Dashboard

Visual guide to creating eval runs in the dashboard.

Run via API

Complete API reference for evaluation endpoints.

Schedule Monitors

Automate recurring evaluations for new traces or sessions.