The agent engineering loop
PandaProbe is built around a continuous loop:
- Trace: capture what your agent actually did.
- Evaluate: score those traces and sessions for quality and reliability.
- Monitor: re-run evaluations on a schedule so regressions surface automatically.
How data flows
Instrument your application
Add the PandaProbe SDK to your app. Wrap an LLM client, enable a framework integration,
or annotate functions with decorators — see the three layers of tracing.
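To make the decorator layer concrete, here is a minimal sketch of what decorator-based instrumentation does in general. It uses only the Python standard library. The `traced` decorator and the `RECORDED_SPANS` list are hypothetical names for illustration, not the PandaProbe SDK's API.

```python
import functools
import time

# Hypothetical in-memory sink; a real SDK would send spans to an API instead.
RECORDED_SPANS = []


def traced(name=None):
    """Record one span per call of the decorated function (illustrative only)."""
    def decorator(func):
        span_name = name or func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on success and on failure, so every call yields a span.
                RECORDED_SPANS.append({
                    "name": span_name,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@traced("lookup_weather")
def lookup_weather(city):
    return f"Sunny in {city}"


result = lookup_weather("Oslo")
```

The wrapped function behaves exactly as before; the only side effect is one recorded span per call, including calls that raise.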
Send traces and spans
As your agent runs, the SDK emits traces (one logical run) made of spans (each
LLM call, tool call, or step) to the PandaProbe API.
Ingest and persist
The API queues incoming data and a background worker persists each trace and its spans,
so ingestion stays fast and non-blocking for your application.
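The queue-then-persist pattern described here can be sketched with the standard library. This is an illustration of the general technique, not PandaProbe's implementation: the `ingest` function, the `stored` dictionary (standing in for a database), and the payload shape are all assumptions.

```python
import queue
import threading

ingest_queue = queue.Queue()
stored = {}  # stands in for a database: trace_id -> list of spans


def ingest(trace):
    """Accept a trace and return immediately; persistence happens later."""
    ingest_queue.put(trace)
    return {"accepted": True}


def worker():
    """Drain the queue and persist each trace with its spans."""
    while True:
        trace = ingest_queue.get()
        if trace is None:  # sentinel: stop the worker
            ingest_queue.task_done()
            break
        stored[trace["trace_id"]] = trace["spans"]
        ingest_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = ingest({
    "trace_id": "t1",
    "spans": [{"name": "llm_call"}, {"name": "tool_call"}],
})

ingest_queue.join()     # demo only: wait until the worker has drained the queue
ingest_queue.put(None)  # ask the worker to stop
thread.join()
```

The caller gets its response before anything is written to storage, which is why the sending application is not slowed down by persistence.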
Evaluate traces and sessions
Create an eval run to apply one or more metrics to selected traces or sessions.
Workers compute scores asynchronously using LLM-as-judge metrics, embeddings, or
deterministic aggregation.
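As a rough illustration of the deterministic-aggregation case, the sketch below applies one metric to a set of traces and returns a score with a reason for each. The `Score` dataclass, the `span_success_rate` metric, and the trace dictionaries are hypothetical; PandaProbe's own metric names and data shapes may differ.

```python
from dataclasses import dataclass


@dataclass
class Score:
    metric: str
    value: float
    reason: str


def span_success_rate(trace):
    """Deterministic metric: fraction of spans that finished without error."""
    spans = trace["spans"]
    if not spans:
        return Score("span_success_rate", 0.0, "trace has no spans")
    ok = sum(1 for span in spans if span["status"] == "ok")
    return Score(
        "span_success_rate",
        ok / len(spans),
        f"{ok} of {len(spans)} spans succeeded",
    )


def run_eval(traces, metrics):
    """Apply every metric to every selected trace."""
    return {t["trace_id"]: [metric(t) for metric in metrics] for t in traces}


traces = [{
    "trace_id": "t1",
    "spans": [
        {"name": "plan", "status": "ok"},
        {"name": "search", "status": "ok"},
        {"name": "fetch", "status": "error"},
        {"name": "answer", "status": "ok"},
    ],
}]

scores = run_eval(traces, [span_success_rate])
```

An LLM-as-judge or embedding-based metric would fit the same shape: a callable that takes a trace or session and returns a value plus a reason.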
Monitor on a schedule
Save an eval configuration as a monitor to re-run it on a recurring cadence, so new
traces and sessions are checked automatically.
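One way to picture a monitor is as a saved configuration plus a cadence and a record of its last run. The sketch below is an assumption about how such scheduling could work, not a description of PandaProbe internals; the `Monitor` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Monitor:
    name: str
    every: timedelta
    last_run: Optional[datetime] = None

    def is_due(self, now):
        """A monitor is due if it never ran or its cadence has elapsed."""
        return self.last_run is None or now - self.last_run >= self.every

    def select_new(self, traces, now):
        """Return traces created since the previous run, then record this run."""
        new = [
            t for t in traces
            if self.last_run is None or t["created_at"] > self.last_run
        ]
        self.last_run = now
        return new


t0 = datetime(2024, 1, 1, 0, 0)
monitor = Monitor("hourly-quality", timedelta(hours=1))

traces = [{"trace_id": "a", "created_at": t0 - timedelta(minutes=30)}]
first_batch = monitor.select_new(traces, t0)

traces.append({"trace_id": "b", "created_at": t0 + timedelta(minutes=20)})
due_early = monitor.is_due(t0 + timedelta(minutes=30))
due_later = monitor.is_due(t0 + timedelta(hours=1))
second_batch = monitor.select_new(traces, t0 + timedelta(hours=1))
```

Each run only picks up traces that arrived after the previous one, so the same evaluation keeps covering fresh data without rescoring old traces.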
Core building blocks
Traces & Spans
A trace is one end-to-end run; spans are the tree of steps inside it.
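The tree structure can be illustrated with a flat list of spans that point at their parents. The field names `id`, `parent_id`, and `name` are assumptions chosen for this sketch, not a documented PandaProbe schema.

```python
from collections import defaultdict

# One trace: a root span with nested child spans.
spans = [
    {"id": "s1", "parent_id": None, "name": "agent_run"},
    {"id": "s2", "parent_id": "s1", "name": "llm_call"},
    {"id": "s3", "parent_id": "s1", "name": "tool_call"},
    {"id": "s4", "parent_id": "s3", "name": "http_request"},
]


def render_tree(spans):
    """Return one indented line per span, children listed under their parent."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            lines.append("  " * depth + span["name"])
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


tree_lines = render_tree(spans)
print("\n".join(tree_lines))
```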
Sessions
Group related traces under a session to see an agent’s full lifecycle.
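Grouping by session amounts to collecting traces that share a session identifier and ordering them in time. In this sketch the `session_id` and `started_at` fields are hypothetical names used for illustration.

```python
from collections import defaultdict

traces = [
    {"trace_id": "t2", "session_id": "chat-1", "started_at": 20},
    {"trace_id": "t1", "session_id": "chat-1", "started_at": 10},
    {"trace_id": "t3", "session_id": "chat-2", "started_at": 15},
]


def group_by_session(traces):
    """Map each session to its trace IDs, ordered by start time."""
    sessions = defaultdict(list)
    for trace in sorted(traces, key=lambda t: t["started_at"]):
        sessions[trace["session_id"]].append(trace["trace_id"])
    return dict(sessions)


sessions = group_by_session(traces)
```

Reading the traces of `chat-1` in order shows how a multi-turn interaction unfolded, which no single trace can show on its own.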
Metrics & Scores
Metrics score a trace or session; each result is stored as a score with a reason.
Monitors
Saved evaluations that run on a schedule to catch regressions in production.
Two ways to connect
PandaProbe separates how applications send data from how people manage their workspace:

| Plane | Who uses it | How it authenticates |
|---|---|---|
| Data plane | SDK clients and the CLI sending traces, spans, and evaluations | Org-scoped API key + project name |
| Management plane | The dashboard, for users, projects, and billing | Sign-in with your identity provider |
The project an API key writes to is resolved by name within your organization, so SDK
clients only need an API key and a project name to start sending data.
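Name-based resolution can be pictured as a lookup keyed by organization and project name, where the organization comes from the API key. The tables, key values, and error behavior below are assumptions made for this sketch; in particular, whether an unknown project name is rejected or created is not stated here.

```python
# Hypothetical lookup tables standing in for the server's records.
API_KEYS = {"key-org-a": "org-a", "key-org-b": "org-b"}
PROJECTS = {
    ("org-a", "support-bot"): "proj-1",
    ("org-b", "support-bot"): "proj-2",
}


def resolve_project(api_key, project_name):
    """Resolve a project by name within the organization that owns the key."""
    org = API_KEYS.get(api_key)
    if org is None:
        raise PermissionError("unknown API key")
    project_id = PROJECTS.get((org, project_name))
    if project_id is None:
        # Assumption for this sketch: unknown names are rejected.
        raise LookupError(f"no project named {project_name!r} in {org}")
    return project_id


project_for_a = resolve_project("key-org-a", "support-bot")
project_for_b = resolve_project("key-org-b", "support-bot")
```

Because the key is scoped to an organization, the same project name can exist in two organizations without ambiguity.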
Deployment options
PandaProbe Cloud
Managed deployment with a free tier — no infrastructure to run. Sign up and start tracing.
Self-hosted
Run the open-source stack yourself with Docker Compose and keep your data in your own
environment.
Next steps
Quickstart
Trace your first LLM call in under 2 minutes.
Tracing Overview
Explore the three layers of instrumentation.
Evaluation
Learn how metrics, scores, and monitors work.

