PandaProbe provides three setup paths for evaluation. Choose the one that matches how you want to operate: manually from the dashboard, programmatically through the API, or automatically on a schedule.
Dashboard UI
Best for exploring metrics, creating one-off eval runs, and reviewing results visually.
API
Best for CI/CD, internal tools, notebooks, and custom automation.
Scheduled Monitors
Best for recurring production checks that evaluate new traces or sessions over time.
Prerequisites
Before running evaluations, make sure you have:
- Traces in your project: evaluations run against data already captured by PandaProbe tracing.
- Sessions for agent evaluation: session-level metrics require traces grouped with a session_id.
- Project access: dashboard users need access to the project, and API users need valid authentication.
PandaProbe Cloud manages the evaluation LLM infrastructure for you. You do not need to bring your own LLM API key to run evaluations in PandaProbe Cloud.
Self-hosted deployments must configure their own LLM provider credentials for LLM-powered PandaProbe features, including LLM-as-judge evaluation.
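To illustrate the session prerequisite above, here is a minimal sketch of what "traces grouped with a session_id" means in practice. The dict shape and the `trace_id` field are assumptions for illustration, not PandaProbe's actual trace schema; only `session_id` comes from the text above.

```python
from collections import defaultdict


def group_by_session(traces):
    """Group trace dicts by session_id; traces without one are returned separately."""
    sessions = defaultdict(list)
    ungrouped = []
    for trace in traces:
        session_id = trace.get("session_id")
        if session_id:
            sessions[session_id].append(trace)
        else:
            # No session_id: usable for trace-level metrics only.
            ungrouped.append(trace)
    return dict(sessions), ungrouped


# Hypothetical trace records used only for this example.
example_traces = [
    {"trace_id": "t1", "session_id": "s-a"},
    {"trace_id": "t2", "session_id": "s-a"},
    {"trace_id": "t3", "session_id": None},
]
grouped_sessions, ungrouped_traces = group_by_session(example_traces)
```

In this sketch, `t1` and `t2` form one session that a session-level metric could score, while `t3` could only be scored at the trace level.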
What you can evaluate
Trace-level evaluation
Run metrics against individual traces to score task completion, tool usage, planning, coherence, and more. You can target traces by:
- Filters: date range, status, session, user, tags, or name substring
- Explicit IDs: provide a list of specific trace UUIDs
- Sampling: evaluate a random fraction of matching traces to control cost
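The three targeting modes above can be sketched locally as follows. This is an illustrative model, not PandaProbe's implementation: the `Trace` fields, the `select_traces` function, and its parameter names are all hypothetical, and the choice to skip sampling when explicit IDs are given is an assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trace:
    # Hypothetical fields; real trace IDs are UUIDs, shortened here for readability.
    trace_id: str
    name: str
    status: str
    tags: list = field(default_factory=list)


def select_traces(traces, *, status=None, tag=None, name_contains=None,
                  trace_ids=None, sample_rate=1.0, seed=None):
    """Pick traces by explicit IDs, or by filters followed by random sampling."""
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    if trace_ids is not None:
        # Explicit IDs: return exactly those traces (assumed not to be sampled).
        wanted = set(trace_ids)
        return [t for t in traces if t.trace_id in wanted]
    matched = [
        t for t in traces
        if (status is None or t.status == status)
        and (tag is None or tag in t.tags)
        and (name_contains is None or name_contains in t.name)
    ]
    if sample_rate == 1.0 or not matched:
        return matched
    # Sampling: keep a random fraction of the matches, but at least one.
    k = max(1, round(len(matched) * sample_rate))
    return random.Random(seed).sample(matched, k)


example = [
    Trace("t1", "checkout", "ok", ["prod"]),
    Trace("t2", "checkout", "ok", ["prod"]),
    Trace("t3", "search", "ok", ["prod"]),
    Trace("t4", "checkout-retry", "ok"),
    Trace("t5", "checkout", "error", ["prod"]),
]
sampled = select_traces(example, status="ok", sample_rate=0.5, seed=0)
by_id = select_traces(example, trace_ids=["t5", "t1"])
```

Sampling half of the four matching traces means only two are evaluated, which is how a sample rate keeps evaluation cost proportional to the fraction you choose.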
Session-level evaluation
Run metrics against entire sessions to assess agent reliability and consistency. You can target sessions by:
- Filters: date range, user, error status, tags, minimum trace count
- Explicit IDs: provide a list of specific session ID strings
- Sampling: evaluate a fraction of matching sessions
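As a rough illustration of the session filters listed above, the sketch below applies a date range, user, error status, and minimum trace count to in-memory session summaries. `SessionSummary`, `session_matches`, and every field name are hypothetical stand-ins, not PandaProbe's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SessionSummary:
    # Hypothetical summary of a session; not an actual PandaProbe type.
    session_id: str
    user_id: str
    started_at: datetime
    trace_count: int
    has_error: bool


def session_matches(session, *, start=None, end=None, user_id=None,
                    has_error=None, min_trace_count=1):
    """Return True when the session passes every filter that was supplied."""
    return (
        (start is None or session.started_at >= start)
        and (end is None or session.started_at < end)
        and (user_id is None or session.user_id == user_id)
        and (has_error is None or session.has_error == has_error)
        and session.trace_count >= min_trace_count
    )


utc = timezone.utc
example_sessions = [
    SessionSummary("s1", "u1", datetime(2024, 5, 1, tzinfo=utc), 5, False),
    SessionSummary("s2", "u1", datetime(2024, 5, 2, tzinfo=utc), 1, True),
    SessionSummary("s3", "u2", datetime(2024, 5, 3, tzinfo=utc), 4, True),
]
# Sessions that hit an error and contain at least three traces.
selected_ids = [
    s.session_id
    for s in example_sessions
    if session_matches(s, has_error=True, min_trace_count=3)
]
```

A minimum trace count is useful here because single-trace sessions give a consistency metric very little to compare.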
Setup methods
Dashboard UI
Use the dashboard when you want to create evaluations interactively and inspect results without writing code.
Run Evaluations via UI
Create eval runs and review scores from the PandaProbe dashboard.
API
Use the API when evaluations need to be part of an automated workflow, CI job, internal dashboard, or custom tool.
Run Evaluations via API
Create eval runs, poll run status, and query scores programmatically.
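The "poll run status" step can be sketched as a small wait loop. This page does not specify endpoints or status values, so the status names and the `wait_for_run` helper below are assumptions, and the status lookup is injected as a callable rather than tied to any real PandaProbe route.

```python
import time

# Hypothetical terminal status names; check the API reference for the real ones.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_run(fetch_status, run_id, *, interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(run_id) until it returns a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(run_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"eval run {run_id} still {status!r} after {timeout}s")
        sleep(interval)


# Stand-in for an authenticated API call, so the example runs offline.
_fake_statuses = iter(["pending", "running", "completed"])
final_status = wait_for_run(
    lambda run_id: next(_fake_statuses),
    "run-123",
    interval=0,
    sleep=lambda seconds: None,
)
```

In a CI job, `fetch_status` would wrap the real status request, and a `"failed"` result or a `TimeoutError` would typically fail the build.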
Scheduled monitors
Use monitors when you want PandaProbe to run evaluations repeatedly on a cadence, such as daily production checks or weekly quality reports.
Scheduling Evaluations
Configure recurring evaluation monitors with filters, sampling, and cadence.
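To make the pieces of a monitor concrete (target, filters, sampling, cadence), here is a minimal sketch of how such a configuration might be modeled and how the next run time follows from the cadence. `MonitorConfig`, its fields, and the `CADENCES` table are hypothetical and do not reflect PandaProbe's actual monitor schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical cadence names, matching the daily and weekly examples above.
CADENCES = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


@dataclass(frozen=True)
class MonitorConfig:
    name: str
    target: str               # "traces" or "sessions"
    cadence: str              # key into CADENCES
    sample_rate: float = 1.0  # fraction of matching items to evaluate
    tags: tuple = ()          # example filter

    def __post_init__(self):
        if self.target not in ("traces", "sessions"):
            raise ValueError(f"unknown target: {self.target!r}")
        if self.cadence not in CADENCES:
            raise ValueError(f"unknown cadence: {self.cadence!r}")
        if not 0.0 < self.sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")

    def next_run(self, last_run):
        """Time of the next scheduled run, given when the last one started."""
        return last_run + CADENCES[self.cadence]

    def is_due(self, now, last_run):
        return now >= self.next_run(last_run)


monitor = MonitorConfig("prod-quality", target="traces", cadence="daily",
                        sample_rate=0.1, tags=("prod",))
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)
next_run = monitor.next_run(last_run)
```

Combining a cadence with a sample rate, as in this sketch, is one way to keep a recurring production check cheap while still covering new data on every run.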
Results
Every eval run produces scores attached to traces or sessions. Each score includes a metric name, value, status, reason, and metadata. You can review scores in the dashboard or query them through the API for analytics, reporting, and monitoring.
Next steps
Start with the setup path that matches your workflow:
Run via Dashboard
Visual guide to creating eval runs in the dashboard.
Run via API
Complete API reference for evaluation endpoints.
Schedule Monitors
Automate recurring evaluations for new traces or sessions.

