Before you begin, make sure you have:
- A PandaProbe account. Sign up at app.pandaprobe.com.
- At least one trace captured in your project. If you haven’t set up tracing yet, follow the Observability Quickstart first.
- For agent (session) evaluation: traces grouped under the same session_id.
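To check the last prerequisite, it can help to see which of your traces actually share a session_id. The following is a minimal sketch, assuming each trace is available as a plain dictionary with `trace_id` and `session_id` keys. That record shape and the `group_by_session` helper are illustrative assumptions, not the PandaProbe SDK's data model.

```python
from collections import defaultdict


def group_by_session(traces):
    """Group trace IDs by session_id; traces without a session_id are skipped."""
    sessions = defaultdict(list)
    for trace in traces:
        session_id = trace.get("session_id")
        if session_id:
            sessions[session_id].append(trace["trace_id"])
    return dict(sessions)


# Hypothetical trace records, for illustration only.
traces = [
    {"trace_id": "t1", "session_id": "s-42"},
    {"trace_id": "t2", "session_id": "s-42"},
    {"trace_id": "t3", "session_id": None},  # not part of any session
]

sessions = group_by_session(traces)
print(sessions)
```

A session that appears here with its traces listed is a candidate for agent (session) evaluation; traces with no session_id can still be evaluated individually.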
PandaProbe Cloud manages the evaluation LLM infrastructure for you. You do not need to bring your own LLM API key to run evaluations in PandaProbe Cloud.
Run your first evaluation
The fastest way to evaluate is directly from the dashboard. You pick a trace (or session), choose a metric, and PandaProbe runs the evaluation in the background.
Open the Traces tab
In the PandaProbe dashboard, open the Traces tab. You should see the traces that were captured by the SDK.
Select traces to evaluate
Pick one or more traces, then click Evaluate. You can also open a single trace and click Evaluate from the detail view.
Choose a metric
Start with task_completion, a 2-stage LLM-as-judge metric that scores whether the agent accomplished the user’s objective.
Submit the run
Click Submit. PandaProbe creates an eval run with status PENDING and dispatches the work to a background worker. The API responds with 202 Accepted.
Try session evaluation
If you have traces grouped under a session_id, you can evaluate the entire agent lifecycle:
Pick a session metric
Start with agent_reliability, which surfaces worst-case failure risk across the session by aggregating trace-level signals (confidence, coherence, tool_correctness, loop_detection).
What’s next?
Core Concepts
Learn how eval runs, metrics, scores, signals, and monitors fit together.
Evaluation Approaches
Understand when to use trace vs. agent (session) evaluation.
Run via API
Create eval runs programmatically from CI, notebooks, or internal tools.
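Because an eval run starts as PENDING and is processed by a background worker, a programmatic caller generally has to poll for the result. The sketch below shows one way to do that in Python. It is not the PandaProbe API: `wait_for_run` is a hypothetical helper, `get_status` is a caller-supplied function that would wrap whatever request returns the run's status, and `COMPLETED` is an assumed name for a terminal status.

```python
import time


def wait_for_run(get_status, interval_s=2.0, timeout_s=120.0, sleep=time.sleep):
    """Poll get_status() until it returns something other than "PENDING".

    get_status is a caller-supplied callable (hypothetical); it should
    return the eval run's current status as a string.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status != "PENDING":
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"eval run still PENDING after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s


# Simulated status sequence; "COMPLETED" is an assumed terminal status name.
_statuses = iter(["PENDING", "PENDING", "COMPLETED"])
final_status = wait_for_run(lambda: next(_statuses), sleep=lambda _seconds: None)
print(final_status)
```

Passing the status lookup and the sleep function in as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise in tests or CI without real network calls.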

