The PandaProbe CLI is a single binary for working with the PandaProbe API from your terminal: list and inspect traces, sessions, and spans; create eval runs; and read evaluation scores and details.

Install

curl -fsSL https://cli.pandaprobe.com/install.sh | sh
Verify the install:
pandaprobe version

Authenticate

There are two ways to authenticate. Use automatic login for PandaProbe Cloud, or a manual API key for self-hosted deployments (or if you prefer to manage keys yourself).

Method 1 — Automatic login (PandaProbe Cloud)

pandaprobe auth login currently supports PandaProbe Cloud only. Sign up at app.pandaprobe.com before logging in. For self-hosted or other non-SaaS endpoints, use Method 2 below.
pandaprobe auth login
This opens your browser, authenticates you against PandaProbe Cloud, mints a 90-day API key, and writes api_key and project_name to ~/.pandaprobe/config.yaml. On a headless machine, add --no-browser to print the URL instead.
pandaprobe auth status   # confirm you're logged in (key masked)
pandaprobe auth logout   # remove stored credentials locally

Method 2 — Manual API key

Create an API key in your PandaProbe dashboard, then store it together with your project name:
pandaprobe config set api_key sk_pp_xxxxxxxx
pandaprobe config set project_name my-project
For self-hosted or non-default deployments, also set the endpoint:
pandaprobe config set endpoint https://your-pandaprobe-host
You can also provide these per-command with --api-key / --project / --endpoint, or via the PANDAPROBE_API_KEY, PANDAPROBE_PROJECT_NAME, and PANDAPROBE_ENDPOINT environment variables.
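In CI, where credentials typically come from environment variables, it can help to fail fast when one is missing. The following is a minimal sketch; require_env is a hypothetical helper written for this illustration, not part of the CLI, and it only checks that the variables are non-empty, not that the key is valid.

```shell
# Hypothetical helper (not part of pandaprobe): report every unset or empty
# variable on stderr and return non-zero if at least one is missing.
# Variable names are passed as literals; do not feed it untrusted input,
# because the lookup uses eval.
require_env() {
  _missing=0
  for _name in "$@"; do
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "missing: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}

# Example use at the top of a script:
#   require_env PANDAPROBE_API_KEY PANDAPROBE_PROJECT_NAME || exit 1
```

For self-hosted deployments, PANDAPROBE_ENDPOINT could be added to the list in the same way.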

Quickstart

# Confirm you're authenticated
pandaprobe auth status

# List recent traces
pandaprobe traces list --limit 5

# List only failed traces
pandaprobe traces list --status ERROR --limit 5

# Get a full trace with all its spans
pandaprobe traces get <trace-id>

# List conversation sessions
pandaprobe sessions list --limit 5

# Read evaluation scores for a trace
pandaprobe evals scores get <trace-id>

# List evaluation runs
pandaprobe evals runs list

# Human-readable table output
pandaprobe traces list --limit 5 --format table

Commands

Paginate with --limit (1–200) and --offset. Filtering happens server-side, so you fetch only what you need.
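To walk a result set larger than one page, a script needs the sequence of --offset values. A minimal sketch, assuming the total is already known (for example from the pagination block that list commands return); page_offsets is a hypothetical helper, not a CLI command.

```shell
# Hypothetical helper: print one --offset value per page needed to cover
# `total` items at `limit` items per page. Rejects limits outside the
# documented 1-200 range.
page_offsets() {
  _total=$1
  _limit=$2
  if [ "$_limit" -lt 1 ] || [ "$_limit" -gt 200 ]; then
    echo "limit must be between 1 and 200" >&2
    return 1
  fi
  _offset=0
  while [ "$_offset" -lt "$_total" ]; do
    echo "$_offset"
    _offset=$((_offset + _limit))
  done
}

# Example use:
#   for off in $(page_offsets 150 50); do
#     pandaprobe traces list --limit 50 --offset "$off"
#   done
```

With a total of 150 and a limit of 50, the helper yields the offsets 0, 50, and 100.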

Traces

# List traces, newest first
pandaprobe traces list --limit 20

# Filter by status and sort
pandaprobe traces list --status ERROR --sort-by started_at --sort-order desc

# Get a single trace with all its spans
pandaprobe traces get <trace-id>

# Output only the spans, filtered by kind
pandaprobe traces spans <trace-id> --kind LLM
traces list filters: --status (PENDING, RUNNING, COMPLETED, ERROR), --session-id, --user-id, --name, --tags, --started-after, --started-before, --sort-by (started_at, ended_at, name, latency, status), --sort-order (asc, desc).
traces get returns the trace with its spans inlined. Use --spans-only for just the spans array, and --kind / --status to filter spans.
Span kinds: AGENT, TOOL, LLM, RETRIEVER, CHAIN, EMBEDDING, OTHER.
Span statuses: OK, ERROR, UNSET.

Sessions

# List sessions (conversations)
pandaprobe sessions list --limit 20

# Get a session and its traces
pandaprobe sessions get <session-id>
sessions list filters: --user-id, --has-error, --started-after, --started-before, --tags, --query, --sort-by (recent, trace_count, latency, cost), --sort-order.
sessions get accepts --include-traces (default true).

Evaluations

Evaluation commands target traces or sessions via --target trace|session (default trace). This is the only command group with write operations alongside the read ones; the three commands that create data are called out explicitly below.
Read commands inspect metrics, runs, and scores:
# List available metrics
pandaprobe evals metrics --target trace

# List runs, then drill into one
pandaprobe evals runs list
pandaprobe evals runs get <run-id>
pandaprobe evals runs scores <run-id>

# List scores, or fetch all scores for one trace
pandaprobe evals scores list --name coherence
pandaprobe evals scores get <trace-id>
Write — create runs and submit scores:
These three commands execute write endpoints. Everything else in the CLI is read-only.
# Run metrics over traces matching filters
pandaprobe evals runs create --metrics coherence,tool_correctness --status COMPLETED

# Run metrics over a specific set of traces
pandaprobe evals runs batch --trace-ids <id1>,<id2> --metrics coherence

# Submit a score for a trace (trace target only)
pandaprobe evals scores submit --trace-id <trace-id> --name accuracy --value 0.92

Examples

Compose commands with jq:
# Count traces by status
pandaprobe traces list --limit 200 | jq '[.items[].status] | group_by(.) | map({status: .[0], count: length})'

# Find a failed trace and read its first error
ID=$(pandaprobe traces list --status ERROR --limit 1 | jq -r '.items[0].trace_id')
pandaprobe traces get "$ID" | jq '.spans[] | select(.error != null) | .error'

# Pull every score for a trace
pandaprobe evals scores get "$ID" | jq '.[] | {name, value, data_type}'

Output and exit codes

By default the CLI emits JSON: data goes to stdout, errors to stderr — so output pipes cleanly into jq. Pass --format table for human-readable tables. List commands return an items array plus a pagination block:
{
  "items": [ /* ... */ ],
  "pagination": { "total": 150, "limit": 20, "offset": 0 }
}
Errors are JSON objects on stderr:
{
  "error": {
    "code": "validation_error",
    "message": "invalid --status \"NOPE\": must be one of PENDING, RUNNING, COMPLETED, ERROR",
    "status": 422
  }
}
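Because data and errors arrive on different streams, a script can keep them apart and inspect each one separately. A minimal sketch; capture_split is a hypothetical helper and the file names in the usage comment are placeholders.

```shell
# Hypothetical helper: run a command with stdout (data) and stderr (error
# JSON) written to separate files, and pass the command's exit code through.
capture_split() {
  _out=$1
  _err=$2
  shift 2
  "$@" >"$_out" 2>"$_err"
}

# Example use:
#   capture_split traces.json error.json pandaprobe traces list --status ERROR
#   On failure, error.json holds the error object and traces.json stays empty
#   (assuming the CLI writes no data to stdout when it fails).
```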
Exit codes are part of the contract:
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error (network, decode, unexpected) |
| 2 | Authentication/authorization error (401, 403) |
| 3 | Not found (404) |
| 4 | Validation error (bad flags, 400, 422) |
| 5 | Other API error (other 4xx, 5xx) |
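Since the exit codes are stable, a wrapper script can branch on them instead of parsing error text. A minimal sketch based on the codes listed above; exit_label and run_labeled are hypothetical names chosen for this example, and the labels are arbitrary.

```shell
# Hypothetical helper: map a pandaprobe exit code to a short label.
exit_label() {
  case "$1" in
    0) echo "success" ;;
    1) echo "general" ;;
    2) echo "auth" ;;
    3) echo "not_found" ;;
    4) echo "validation" ;;
    5) echo "api" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical wrapper: run any command, log a labeled failure to stderr,
# and return the command's own exit code. Safe under `set -e`.
run_labeled() {
  _code=0
  "$@" || _code=$?
  if [ "$_code" -ne 0 ]; then
    echo "failed ($(exit_label "$_code")): exit $_code" >&2
  fi
  return "$_code"
}

# Example use:
#   run_labeled pandaprobe traces get "$ID" || exit $?
```

A caller might, for instance, treat not_found as a skip while treating auth as fatal; that policy is left to the script.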

Configuration reference

Values resolve in this order (highest to lowest): command-line flags → PANDAPROBE_* environment variables → ~/.pandaprobe/config.yaml → built-in defaults.
| Setting | Flag | Environment variable | Config key | Default |
|---------|------|----------------------|------------|---------|
| API key | --api-key | PANDAPROBE_API_KEY | api_key | |
| Project name | --project | PANDAPROBE_PROJECT_NAME | project_name | |
| Endpoint | --endpoint | PANDAPROBE_ENDPOINT | endpoint | https://api.pandaprobe.com |
| Web app URL | --auth-url | PANDAPROBE_AUTH_URL | auth_url | https://app.pandaprobe.com |
| Output format | --format | PANDAPROBE_FORMAT | format | json |
| Timeout (seconds) | (none) | PANDAPROBE_TIMEOUT | timeout | 30 |
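The resolution order can be pictured as "first non-empty value wins". The sketch below only illustrates that rule; it is not the CLI's implementation, resolve_setting is a hypothetical function, and it assumes an empty string means "not set".

```shell
# Illustration only: return the first non-empty argument, mirroring the
# order flag -> environment variable -> config file -> built-in default.
resolve_setting() {
  for _candidate in "$@"; do
    if [ -n "$_candidate" ]; then
      printf '%s\n' "$_candidate"
      return 0
    fi
  done
  return 1
}

# Example: no flag given, PANDAPROBE_FORMAT=table, config file silent,
# default json. The environment value is chosen:
#   resolve_setting "" "table" "" "json"
```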
Inspect the effective configuration (the API key is masked):
pandaprobe config show
Other global flags: --verbose and --debug (log HTTP details to stderr, key masked), --no-color, and --config <path> to use a non-default config file.

Shell completion

pandaprobe completion zsh  > "${fpath[1]}/_pandaprobe"
# /etc/bash_completion.d is usually root-owned, so write through sudo tee
pandaprobe completion bash | sudo tee /etc/bash_completion.d/pandaprobe > /dev/null
Supported shells: bash, zsh, fish, powershell.