PandaProbe is a unified platform for the full agent development lifecycle. This page shows how its pieces fit together — from instrumenting your application to continuously monitoring agent quality in production.

The agent engineering loop

PandaProbe is built around a continuous loop:
  1. Trace — capture what your agent actually did.
  2. Evaluate — score those traces and sessions for quality and reliability.
  3. Monitor — re-run evaluations on a schedule so regressions surface automatically.
Each stage feeds the next: traces are the raw signal, evaluation turns that signal into scores, and monitoring keeps those scores fresh as new data arrives.

How data flows

1. Instrument your application

Add the PandaProbe SDK to your app. Wrap an LLM client, enable a framework integration, or annotate functions with decorators; see the three layers of tracing.

2. Send traces and spans

As your agent runs, the SDK emits traces (one logical run) made of spans (each LLM call, tool call, or step) to the PandaProbe API.

3. Ingest and persist

The API queues incoming data and a background worker persists each trace and its spans, so ingestion stays fast and non-blocking for your application.

4. Evaluate traces and sessions

Create an eval run to apply one or more metrics to selected traces or sessions. Workers compute scores asynchronously using LLM-as-judge metrics, embeddings, or deterministic aggregation.

5. Monitor on a schedule

Save an eval configuration as a monitor to re-run it on a recurring cadence, so new traces and sessions are checked automatically.

6. Review in the dashboard

Inspect traces, drill into spans, and track scores and trends over time in the dashboard, or query everything through the API and CLI.
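
To make the "ingest and persist" step more concrete, here is a minimal sketch of the queue-plus-background-worker pattern using only the Python standard library. It is not PandaProbe's implementation: the names `ingest`, `_worker`, `_store`, and the dict-shaped trace are all hypothetical stand-ins chosen for illustration.

```python
import queue
import threading

# Hypothetical in-memory stand-ins; not the PandaProbe API or storage layer.
_ingest_queue = queue.Queue()
_store = {}


def ingest(trace):
    """Accept a trace and return immediately; persistence happens later."""
    _ingest_queue.put(trace)
    return "accepted"


def _worker():
    """Drain the queue and persist each trace together with its spans."""
    while True:
        trace = _ingest_queue.get()
        try:
            if trace is None:  # sentinel value: stop the worker
                return
            _store[trace["trace_id"]] = trace
        finally:
            _ingest_queue.task_done()


def run_demo():
    """Start a worker, enqueue two traces, and wait until both are stored."""
    worker = threading.Thread(target=_worker, daemon=True)
    worker.start()
    ingest({"trace_id": "t1", "spans": [{"name": "llm_call"}, {"name": "tool_call"}]})
    ingest({"trace_id": "t2", "spans": [{"name": "llm_call"}]})
    _ingest_queue.put(None)  # ask the worker to stop after the queued traces
    _ingest_queue.join()
    worker.join()
    return _store


if __name__ == "__main__":
    print(sorted(run_demo()))
```

The caller of `ingest` only pays for a queue insert, which is why this style of ingestion does not block the application while data is being written.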

Core building blocks

Traces & Spans

A trace is one end-to-end run; spans are the tree of steps inside it.
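
One way to picture this relationship is as a small tree data structure. The sketch below is an illustrative model only; the `Span` and `Trace` classes, their fields, and the `kind` labels are assumptions, not the SDK's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    kind: str  # illustrative labels such as "llm", "tool", or "step"
    children: list["Span"] = field(default_factory=list)


@dataclass
class Trace:
    trace_id: str
    root: Span  # the top of the span tree for one end-to-end run


def walk(span, depth=0):
    """Yield (depth, span) pairs in depth-first order."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


trace = Trace(
    trace_id="t1",
    root=Span("agent_run", "step", [
        Span("plan", "llm"),
        Span("search", "tool", [Span("summarize", "llm")]),
    ]),
)

# Indent each span name by its depth to show the tree shape.
outline = [("  " * depth) + span.name for depth, span in walk(trace.root)]
print("\n".join(outline))
```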

Sessions

Group related traces under a session to see an agent’s full lifecycle.
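
As a rough illustration of grouping, the sketch below collects traces by a session identifier and orders them by start time. The field names `session_id`, `trace_id`, and `started_at` are hypothetical and may not match the real trace schema.

```python
from collections import defaultdict


def group_by_session(traces):
    """Map each session id to its trace ids, ordered by start time."""
    sessions = defaultdict(list)
    for trace in sorted(traces, key=lambda t: t["started_at"]):
        sessions[trace["session_id"]].append(trace["trace_id"])
    return dict(sessions)


# Hypothetical trace records; timestamps are plain integers for brevity.
traces = [
    {"trace_id": "t2", "session_id": "s1", "started_at": 20},
    {"trace_id": "t1", "session_id": "s1", "started_at": 10},
    {"trace_id": "t3", "session_id": "s2", "started_at": 15},
]

sessions = group_by_session(traces)
```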

Metrics & Scores

Metrics score a trace or session; each result is stored as a score with a reason.
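
The sketch below shows what a deterministic metric could look like: a function that takes a trace and returns a score with a human-readable reason. The `Score` class, the `tool_success_rate` metric, and the span fields `kind` and `status` are invented for illustration; PandaProbe's built-in metrics and score schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Score:
    metric: str
    value: float
    reason: str


def tool_success_rate(trace):
    """Hypothetical metric: the fraction of tool spans whose status is 'ok'."""
    tool_spans = [s for s in trace["spans"] if s["kind"] == "tool"]
    if not tool_spans:
        return Score("tool_success_rate", 1.0, "no tool calls in trace")
    ok = sum(1 for s in tool_spans if s["status"] == "ok")
    return Score(
        "tool_success_rate",
        ok / len(tool_spans),
        f"{ok} of {len(tool_spans)} tool calls succeeded",
    )


example_trace = {
    "spans": [
        {"kind": "llm", "status": "ok"},
        {"kind": "tool", "status": "ok"},
        {"kind": "tool", "status": "ok"},
        {"kind": "tool", "status": "error"},
        {"kind": "tool", "status": "ok"},
    ]
}

score = tool_success_rate(example_trace)
```

Storing the reason next to the value makes a low score easier to act on, since a reviewer can see why the metric fired without re-running it.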

Monitors

Saved evaluations that run on a schedule to catch regressions in production.
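
The scheduling idea can be sketched as a small object that remembers when it last ran and, once its cadence has elapsed, selects the traces that arrived since then. This is an assumption-laden illustration: the `Monitor` class, its `tick` method, and the `created_at` field are hypothetical, and a real monitor would hand the selected traces to an eval run instead of returning them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Monitor:
    name: str
    cadence: timedelta
    last_run: datetime

    def tick(self, now, traces):
        """Return traces created since the last run, or None if not due yet."""
        if now - self.last_run < self.cadence:
            return None
        new_traces = [t for t in traces if self.last_run < t["created_at"] <= now]
        self.last_run = now
        return new_traces


noon = datetime(2024, 1, 1, 12, 0)
traces = [
    {"trace_id": "t1", "created_at": noon + timedelta(minutes=10)},
    {"trace_id": "t2", "created_at": noon + timedelta(minutes=50)},
    {"trace_id": "t3", "created_at": noon + timedelta(minutes=90)},
]

monitor = Monitor("hourly-quality", timedelta(hours=1), last_run=noon)
early = monitor.tick(noon + timedelta(minutes=30), traces)  # not due yet
first = monitor.tick(noon + timedelta(hours=1), traces)     # picks up t1 and t2
second = monitor.tick(noon + timedelta(hours=2), traces)    # picks up t3
```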

Two ways to connect

PandaProbe separates how applications send data from how people manage their workspace:
Plane | Who uses it | How it authenticates
Data plane | SDK clients and the CLI sending traces, spans, and evaluations | Org-scoped API key + project name
Management plane | The dashboard, for users, projects, and billing | Sign-in with your identity provider
The project an API key writes to is resolved by name within your organization, so SDK clients only need an API key and a project name to start sending data.
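
The name-based resolution described above could work roughly as follows. This is a sketch under stated assumptions: the lookup tables, key format, and identifiers are invented, and whether an unknown project name is rejected (as here) or handled some other way is not specified by this page.

```python
# Hypothetical lookup tables standing in for the server's own records.
API_KEYS = {"pk_demo_123": "org_acme"}
PROJECTS = {
    ("org_acme", "support-bot"): "proj_1",
    ("org_other", "support-bot"): "proj_9",  # same name, different organization
}


def resolve_project(api_key, project_name):
    """Resolve a project by name within the organization that owns the key."""
    org_id = API_KEYS.get(api_key)
    if org_id is None:
        raise PermissionError("unknown API key")
    project_id = PROJECTS.get((org_id, project_name))
    if project_id is None:
        # Assumption: a missing project is an error in this sketch.
        raise LookupError(f"no project named {project_name!r} in {org_id}")
    return project_id


resolved = resolve_project("pk_demo_123", "support-bot")
```

Because the key already identifies the organization, two organizations can use the same project name without colliding.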

Deployment options

PandaProbe Cloud

Managed deployment with a free tier — no infrastructure to run. Sign up and start tracing.

Self-hosted

Run the open-source stack yourself with Docker Compose and keep your data in your own environment.

Next steps

Quickstart

Trace your first LLM call in under 2 minutes.

Tracing Overview

Explore the three layers of instrumentation.

Evaluation

Learn how metrics, scores, and monitors work.