Architecture - PandaProbe

PandaProbe is a unified platform for the full agent development lifecycle. This page shows how its pieces fit together — from instrumenting your application to continuously monitoring agent quality in production.

The agent engineering loop

PandaProbe is built around a continuous loop:

Trace — capture what your agent actually did.
Evaluate — score those traces and sessions for quality and reliability.
Monitor — re-run evaluations on a schedule so regressions surface automatically.

Each stage feeds the next: traces are the raw signal, evaluation turns that signal into scores, and monitoring keeps those scores fresh as new data arrives.

How data flows

Instrument your application

Add the PandaProbe SDK to your app, then wrap an LLM client, enable a framework integration, or annotate functions with decorators. These are the three layers of tracing, covered in the Tracing Overview.

Send traces and spans

As your agent runs, the SDK emits traces (one logical run) made of spans (each LLM call, tool call, or step) to the PandaProbe API.
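
To make the trace and span relationship concrete, here is a minimal sketch of the data shape in plain Python. It is not the PandaProbe SDK: the `Trace` and `Span` classes, their field names, and the `kind` values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional
import uuid


@dataclass
class Span:
    """One step inside a run: an LLM call, a tool call, or another step."""
    name: str
    kind: str                      # hypothetical values: "llm", "tool", "step"
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Trace:
    """One logical run of the agent, holding the spans it produced."""
    name: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)

    def add_span(self, name: str, kind: str, parent: Optional[Span] = None) -> Span:
        # Every span carries the id of the trace it belongs to.
        span = Span(name=name, kind=kind, trace_id=self.trace_id,
                    parent_id=parent.span_id if parent else None)
        self.spans.append(span)
        return span


trace = Trace(name="answer-question")
root = trace.add_span("agent-run", "step")
trace.add_span("plan", "llm", parent=root)
trace.add_span("search", "tool", parent=root)
```

Each span points back to its trace, and optionally to a parent span, which is what lets a flat stream of spans be reassembled into a single run later.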

Ingest and persist

The API queues incoming data and a background worker persists each trace and its spans, so ingestion stays fast and non-blocking for your application.
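
The queue-then-persist pattern described here can be sketched with the Python standard library. This illustrates the general technique only; the `accept` and `worker` functions and the in-memory `persisted` list are assumptions, not PandaProbe internals.

```python
import queue
import threading

ingest_queue = queue.Queue()
persisted = []                      # stand-in for a database table


def accept(payload: dict) -> str:
    """What an API handler might do: enqueue and return immediately."""
    ingest_queue.put(payload)
    return "accepted"


def worker() -> None:
    """Background worker: drain the queue and persist each payload."""
    while True:
        item = ingest_queue.get()
        if item is None:            # sentinel: stop the worker
            ingest_queue.task_done()
            break
        persisted.append(item)      # a real worker would write to storage here
        ingest_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = accept({"trace_id": "t1", "spans": [{"name": "plan"}]})
ingest_queue.put(None)              # ask the worker to stop after the backlog
ingest_queue.join()                 # wait until everything queued is handled
thread.join()
```

The caller gets its response as soon as the payload is queued, so the cost of writing to storage never sits on the application's request path.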

Evaluate traces and sessions

Create an eval run to apply one or more metrics to selected traces or sessions. Workers compute scores asynchronously using LLM-as-judge metrics, embeddings, or deterministic aggregation.
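
As a rough sketch of what an eval run does, the example below applies two deterministic aggregation metrics to a selection of traces on a worker pool. The trace records, metric names, and the `run_eval` function are hypothetical; LLM-as-judge and embedding metrics are left out because they depend on an external model.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical trace records: each has an id and a list of span dicts.
traces = [
    {"id": "t1", "spans": [{"tokens": 120, "latency_ms": 300},
                           {"tokens": 80, "latency_ms": 900}]},
    {"id": "t2", "spans": [{"tokens": 50, "latency_ms": 200}]},
]


def total_tokens(trace: dict) -> float:
    """Deterministic aggregation: sum of tokens over all spans."""
    return float(sum(s["tokens"] for s in trace["spans"]))


def max_latency_ms(trace: dict) -> float:
    """Deterministic aggregation: slowest span in the trace."""
    return float(max(s["latency_ms"] for s in trace["spans"]))


metrics = {"total_tokens": total_tokens, "max_latency_ms": max_latency_ms}


def run_eval(selected: list) -> dict:
    """Apply every metric to every selected trace on a worker pool."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            (t["id"], name): pool.submit(fn, t)
            for t in selected
            for name, fn in metrics.items()
        }
        # Collect results keyed by (trace id, metric name).
        return {key: f.result() for key, f in futures.items()}


results = run_eval(traces)
```

Keying results by trace id and metric name keeps one score per pair, which makes it straightforward to compare the same metric across runs.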

Monitor on a schedule

Save an eval configuration as a monitor to re-run it on a recurring cadence, so new traces and sessions are checked automatically.
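
One way to picture a recurring cadence is as a function from the last run time to the runs that are now due, plus a filter for data that arrived since then. The sketch below assumes a fixed interval and a `created_at` field on each trace; both are illustrative assumptions, and the actual scheduling options may differ.

```python
from datetime import datetime, timedelta, timezone


def due_runs(last_run: datetime, now: datetime, every: timedelta) -> list:
    """Return the scheduled run times in (last_run, now] for a fixed cadence."""
    runs = []
    next_run = last_run + every
    while next_run <= now:
        runs.append(next_run)
        next_run += every
    return runs


def new_since(traces: list, cutoff: datetime) -> list:
    """Select only the traces that arrived after the previous run."""
    return [t for t in traces if t["created_at"] > cutoff]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
traces = [
    {"id": "t1", "created_at": start - timedelta(hours=2)},
    {"id": "t2", "created_at": start + timedelta(minutes=30)},
]

runs = due_runs(start, start + timedelta(hours=3), every=timedelta(hours=1))
fresh = new_since(traces, cutoff=start)
```

Here three hourly runs are due, and only `t2` is selected because it arrived after the previous run.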

Review in the dashboard

Inspect traces, drill into spans, and track scores and trends over time in the dashboard — or query everything through the API and CLI.

Core building blocks

Traces & Spans

A trace is one end-to-end run; spans are the tree of steps inside it.
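
To illustrate the "tree of steps" idea, this sketch rebuilds a span tree from flat records and renders it with indentation. The record layout (`id`, `parent`, `name`) is a hypothetical stand-in for however spans are actually stored.

```python
from collections import defaultdict

# Hypothetical flat span records for a single trace.
spans = [
    {"id": "a", "parent": None, "name": "agent-run"},
    {"id": "b", "parent": "a", "name": "plan (llm)"},
    {"id": "c", "parent": "a", "name": "search (tool)"},
    {"id": "d", "parent": "c", "name": "http-get"},
]


def render_tree(spans: list) -> list:
    """Return one indented line per span, depth-first from the roots."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            lines.append("  " * depth + span["name"])
            walk(span["id"], depth + 1)

    walk(None, 0)                   # roots are spans with no parent
    return lines


tree = render_tree(spans)
print("\n".join(tree))
```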

Sessions

Group related traces under a session to see an agent’s full lifecycle.
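
Grouping can be sketched as bucketing traces by a shared session identifier and ordering them by start time. The `session_id` and `started_at` field names below are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical trace records from two interleaved sessions.
traces = [
    {"id": "t3", "session_id": "s1", "started_at": 15},
    {"id": "t1", "session_id": "s1", "started_at": 10},
    {"id": "t2", "session_id": "s2", "started_at": 12},
]


def group_by_session(traces: list) -> dict:
    """Map each session id to its trace ids in chronological order."""
    sessions = defaultdict(list)
    for trace in sorted(traces, key=lambda t: t["started_at"]):
        sessions[trace["session_id"]].append(trace["id"])
    return dict(sessions)


sessions = group_by_session(traces)
```

Reading the traces of `s1` in order shows the agent's behavior across several runs rather than one isolated call.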

Metrics & Scores

Metrics score a trace or session; each result is stored as a score with a reason.
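
A score that carries its own reason might look like the sketch below. The `Score` fields and the `span_success_metric` function are hypothetical, and the metric is a simple deterministic check rather than one of the built-in metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    metric: str
    target_id: str      # id of the trace or session that was scored
    value: float
    reason: str         # human-readable explanation of the value


def span_success_metric(trace: dict) -> Score:
    """Score a trace by the share of spans that finished without error."""
    failed = [s["name"] for s in trace["spans"] if s.get("status") == "error"]
    total = len(trace["spans"])
    value = 1.0 - len(failed) / total
    if failed:
        reason = f"{len(failed)} of {total} spans failed: {', '.join(failed)}"
    else:
        reason = f"all {total} spans succeeded"
    return Score("span_success", trace["id"], value, reason)


score = span_success_metric({
    "id": "t1",
    "spans": [{"name": "plan", "status": "ok"},
              {"name": "search", "status": "error"}],
})
```

Storing the reason next to the value means a low score can be diagnosed without re-running the metric.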

Monitors

Saved evaluations that run on a schedule to catch regressions in production.

Two ways to connect

PandaProbe separates how applications send data from how people manage their workspace:

Plane	Who uses it	How it authenticates
Data plane	SDK clients and the CLI sending traces, spans, and evaluations	Org-scoped API key + project name
Management plane	The dashboard, for users, projects, and billing	Sign-in with your identity provider

The project an API key writes to is resolved by name within your organization, so SDK clients only need an API key and a project name to start sending data.
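
Name-based resolution can be pictured as a two-step lookup: the API key identifies the organization, and the project name is then looked up within it. Everything in this sketch is hypothetical, including the key strings, the ids, and the in-memory tables that stand in for the service's real storage.

```python
# Hypothetical in-memory stores; a real service would query a database.
API_KEYS = {"key-for-org-1": "org_1", "key-for-org-2": "org_2"}
PROJECTS = {
    ("org_1", "support-bot"): "proj_10",
    ("org_2", "support-bot"): "proj_20",
}


def resolve_project(api_key: str, project_name: str) -> str:
    """Return the project id for a name, scoped to the key's organization."""
    org_id = API_KEYS.get(api_key)
    if org_id is None:
        raise PermissionError("unknown API key")
    try:
        return PROJECTS[(org_id, project_name)]
    except KeyError:
        raise LookupError(
            f"no project named {project_name!r} in this organization"
        ) from None


project_id = resolve_project("key-for-org-1", "support-bot")
```

Because the lookup is scoped by organization, two organizations can use the same project name without colliding.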

Deployment options

PandaProbe Cloud

Managed deployment with a free tier — no infrastructure to run. Sign up and start tracing.

Self-hosted

Run the open-source stack yourself with Docker Compose and keep your data in your own environment.

Next steps

Quickstart

Trace your first LLM call in under 2 minutes.

Tracing Overview

Explore the three layers of instrumentation.

Evaluation

Learn how metrics, scores, and monitors work.
