
PandaProbe provides three setup paths for evaluation. Choose the one that matches how you want to operate: manually from the dashboard, programmatically through the API, or automatically on a schedule.

Dashboard UI

Best for exploring metrics, creating one-off eval runs, and reviewing results visually.

API

Best for CI/CD, internal tools, notebooks, and custom automation.

Scheduled Monitors

Best for recurring production checks that evaluate new traces or sessions over time.

Prerequisites

Before running evaluations, make sure you have:
  • Traces in your project: evaluations run against data already captured by PandaProbe tracing.
  • Sessions for agent evaluation: session-level metrics require traces grouped with a session_id (see the sketch after these notes).
  • Project access: dashboard users need access to the project, and API users need valid authentication.

PandaProbe Cloud manages the evaluation LLM infrastructure for you, so you do not need to bring your own LLM API key. Self-hosted deployments must configure their own LLM provider credentials for LLM-powered PandaProbe features, including LLM-as-judge evaluation.
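
To satisfy the first two prerequisites, your instrumentation needs to capture traces and tag them with a shared session_id. The exact API depends on your tracing SDK; the sketch below assumes a hypothetical `pandaprobe` Python client with `trace()` and `session_id` parameters, which you should replace with whatever your setup actually uses.

```python
import uuid

from pandaprobe import Client  # hypothetical SDK import; adapt to your client

def run_agent(question: str) -> str:
    """Stand-in for your actual agent call."""
    return "Refunds are accepted within 30 days."

client = Client(api_key="...")   # project-scoped credentials
session_id = str(uuid.uuid4())   # one id per user conversation

# Traces captured with the same session_id form one session,
# which session-level evaluation can then target.
with client.trace(name="answer-question", session_id=session_id) as trace:
    trace.log_input({"question": "What is our refund policy?"})
    trace.log_output({"answer": run_agent("What is our refund policy?")})
```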

What you can evaluate

Trace-level evaluation

Run metrics against individual traces to score task completion, tool usage, planning, coherence, and more. You can target traces by:
  • Filters — date range, status, session, user, tags, or name substring
  • Explicit IDs — provide a list of specific trace UUIDs
  • Sampling — evaluate a random fraction of matching traces to control cost
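
Combining these targeting options, a trace-level eval run request might look like the following sketch. The endpoint path and payload field names are illustrative assumptions, not the documented API schema; see the API reference linked below for the real shape.

```python
import os

import requests

payload = {
    "level": "trace",
    "metrics": ["task_completion", "tool_usage"],
    # Filters narrow the candidate traces...
    "filters": {
        "from": "2024-06-01T00:00:00Z",
        "to": "2024-06-07T00:00:00Z",
        "status": "error",
        "tags": ["production"],
    },
    # ...or pass explicit trace UUIDs instead of filters:
    # "trace_ids": ["9b2e...", "41cc..."],
    "sampling": {"fraction": 0.1},  # evaluate ~10% of matches to control cost
}

resp = requests.post(
    "https://api.pandaprobe.com/v1/eval-runs",  # hypothetical endpoint
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"},
)
resp.raise_for_status()
print(resp.json()["run_id"])  # assumed response field
```

Sampling is the main cost lever here: a fraction of 0.1 gives you a statistical read on quality at a tenth of the evaluation spend.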

Session-level evaluation

Run metrics against entire sessions to assess agent reliability and consistency. You can target sessions by:
  • Filters — date range, user, error status, tags, minimum trace count
  • Explicit IDs — provide a list of specific session ID strings
  • Sampling — evaluate a fraction of matching sessions
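
Under the same assumptions as the trace-level sketch above, a session-level run differs only in its targeting payload:

```python
import os

import requests

# Session-level variant of the trace-level sketch: same hypothetical
# endpoint, different level and filters.
payload = {
    "level": "session",
    "metrics": ["agent_reliability"],
    "filters": {
        "from": "2024-06-01T00:00:00Z",
        "user": "user_1234",
        "has_error": False,
        "min_trace_count": 3,  # skip trivially short sessions
    },
    # or target explicit session ID strings instead:
    # "session_ids": ["checkout-flow-8841"],
    "sampling": {"fraction": 0.25},
}

requests.post(
    "https://api.pandaprobe.com/v1/eval-runs",  # hypothetical endpoint
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"},
).raise_for_status()
```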

Setup methods

Dashboard UI

Use the dashboard when you want to create evaluations interactively and inspect results without writing code.

Run Evaluations via UI

Create eval runs and review scores from the PandaProbe dashboard.

API

Use the API when evaluations need to be part of an automated workflow, CI job, internal dashboard, or custom tool.

Run Evaluations via API

Create eval runs, poll run status, and query scores programmatically.
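
The create-poll-query pattern described above might look like the following minimal sketch. Endpoint paths, status values, and response fields are assumptions for illustration; consult the API reference for the actual contract.

```python
import os
import time

import requests

BASE = "https://api.pandaprobe.com/v1"  # hypothetical base URL
HEADERS = {"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"}

def wait_for_run(run_id: str, poll_seconds: float = 5.0) -> dict:
    """Poll the eval run until it reaches a terminal status."""
    while True:
        run = requests.get(f"{BASE}/eval-runs/{run_id}", headers=HEADERS).json()
        if run["status"] in ("completed", "failed"):  # assumed status values
            return run
        time.sleep(poll_seconds)

run_id = "run_abc123"  # returned when the run was created
run = wait_for_run(run_id)
if run["status"] == "completed":
    scores = requests.get(f"{BASE}/eval-runs/{run_id}/scores", headers=HEADERS).json()
    for score in scores:
        print(score["metric"], score["value"], score["status"])
```

In a CI job, you would fail the build when any score's status indicates a failed check.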

Scheduled monitors

Use monitors when you want PandaProbe to run evaluations repeatedly on a cadence, such as daily production checks or weekly quality reports.

Scheduling Evaluations

Configure recurring evaluation monitors with filters, sampling, and cadence.
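
A monitor bundles the same targeting options with a cadence. The definition below is an illustrative sketch, with an assumed endpoint and field names; the moving parts (metrics, filters, sampling, schedule) mirror the options described above.

```python
import os

import requests

monitor = {
    "name": "daily-production-check",
    "level": "trace",
    "metrics": ["task_completion", "coherence"],
    "filters": {"tags": ["production"]},
    "sampling": {"fraction": 0.05},     # keep recurring cost bounded
    "schedule": {"cron": "0 6 * * *"},  # every day at 06:00 UTC
    # Each scheduled run evaluates traces that are new since the last run.
}

requests.post(
    "https://api.pandaprobe.com/v1/monitors",  # hypothetical endpoint
    json=monitor,
    headers={"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"},
).raise_for_status()
```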

Results

Every eval run produces scores attached to traces or sessions. Each score includes a metric name, value, status, reason, and metadata. You can review scores in the dashboard or query them through the API for analytics, reporting, and monitoring.
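
For reporting, you can pull scores into a dataframe and aggregate them. The sketch below assumes a hypothetical scores endpoint and assumed field names (created_at, a "passed" status value); the score shape otherwise follows the description above.

```python
import os

import pandas as pd
import requests

resp = requests.get(
    "https://api.pandaprobe.com/v1/scores",  # hypothetical endpoint
    params={"metric": "task_completion", "from": "2024-06-01"},
    headers={"Authorization": f"Bearer {os.environ['PANDAPROBE_API_KEY']}"},
)
resp.raise_for_status()

df = pd.DataFrame(resp.json())
# e.g. daily pass rate, using each score's status field
df["day"] = pd.to_datetime(df["created_at"]).dt.date  # assumed timestamp field
print(df.groupby("day")["status"].apply(lambda s: (s == "passed").mean()))
```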

Next steps

Start with the setup path that matches your workflow:

Run via Dashboard

Visual guide to creating eval runs in the dashboard.

Run via API

Complete API reference for evaluation endpoints.

Schedule Monitors

Automate recurring evaluations for new traces or sessions.