
# Introduction

> Set up and run evaluations from the PandaProbe dashboard, through the API, or on a schedule.

PandaProbe provides three setup paths for evaluation. Choose the one that matches how you want to operate: manually from the dashboard, programmatically through the API, or automatically on a schedule.

<CardGroup cols={3}>
  <Card title="Dashboard UI" icon="layout-dashboard" href="/evaluation/setup/run-eval-ui">
    Best for exploring metrics, creating one-off eval runs, and reviewing results visually.
  </Card>

  <Card title="API" icon="terminal" href="/evaluation/setup/run-eval-api">
    Best for CI/CD, internal tools, notebooks, and custom automation.
  </Card>

  <Card title="Scheduled Monitors" icon="clock" href="/evaluation/setup/scheduling">
    Best for recurring production checks that evaluate new traces or sessions over time.
  </Card>
</CardGroup>

## Prerequisites

Before running evaluations, make sure you have:

* **Traces in your project**: evaluations run against data already captured by PandaProbe tracing.
* **Sessions for agent evaluation**: session-level metrics require traces that share a `session_id`.
* **Project access**: dashboard users need access to the project, and API users need valid authentication.

<Info>
  PandaProbe Cloud manages the evaluation LLM infrastructure for you. You do not need to bring your own LLM API key to run evaluations in PandaProbe Cloud.
</Info>

<Note>
  Self-hosted deployments must configure their own LLM provider credentials for LLM-powered PandaProbe features, including LLM-as-judge evaluation.
</Note>

## What you can evaluate

### Trace-level evaluation

Run metrics against individual traces to score task completion, tool usage, planning, coherence, and more. You can target traces by:

* **Filters**: date range, status, session, user, tags, or name substring
* **Explicit IDs**: provide a list of specific trace UUIDs
* **Sampling**: evaluate a random fraction of matching traces to control cost
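
The three targeting options above can be pictured as fields of a single request body. The following is a minimal sketch, not the PandaProbe schema: the function `build_trace_target` and the keys `filters`, `trace_ids`, and `sample_rate` are hypothetical names chosen for illustration, so check the API reference for the real field names.

```python
import uuid
from typing import Any, Optional


def build_trace_target(
    filters: Optional[dict[str, Any]] = None,
    trace_ids: Optional[list[str]] = None,
    sample_rate: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble a hypothetical trace-targeting payload (all key names are assumed)."""
    target: dict[str, Any] = {}
    if filters:
        target["filters"] = dict(filters)
    if trace_ids:
        # uuid.UUID raises ValueError on malformed input, so bad IDs fail early.
        target["trace_ids"] = [str(uuid.UUID(tid)) for tid in trace_ids]
    if sample_rate is not None:
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in the interval (0, 1]")
        target["sample_rate"] = sample_rate
    return target


# Target failed traces with a given tag, but only evaluate about 10% of them.
filtered = build_trace_target(
    filters={"status": "error", "tags": ["checkout"]},
    sample_rate=0.1,
)

# Target one specific trace by UUID.
explicit = build_trace_target(
    trace_ids=["123e4567-e89b-12d3-a456-426614174000"],
)
```

Validating UUIDs and the sampling fraction on the client side is a design choice in this sketch; it surfaces mistakes before any request is sent.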

### Session-level evaluation

Run metrics against entire sessions to assess agent reliability and consistency. You can target sessions by:

* **Filters**: date range, user, error status, tags, minimum trace count
* **Explicit IDs**: provide a list of specific session ID strings
* **Sampling**: evaluate a fraction of matching sessions
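
To make the session filters concrete, here is a local illustration of how traces group into sessions and how a minimum trace count and a sampling fraction narrow the selection. It runs entirely in memory and does not call PandaProbe; the helper names, the dictionary shape of a trace, and the rounding rule used for sampling are all assumptions made for this sketch.

```python
import random
from collections import defaultdict
from typing import Any, Optional


def group_by_session(traces: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group traces by session_id, skipping traces that have none."""
    sessions: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for trace in traces:
        session_id = trace.get("session_id")
        if session_id is not None:
            sessions[session_id].append(trace)
    return dict(sessions)


def select_sessions(
    sessions: dict[str, list[dict[str, Any]]],
    min_trace_count: int = 1,
    sample_rate: float = 1.0,
    seed: Optional[int] = None,
) -> list[str]:
    """Keep sessions with enough traces, then sample a fraction of them."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    eligible = sorted(
        sid for sid, items in sessions.items() if len(items) >= min_trace_count
    )
    # Assumed rounding rule; the service may pick its sample differently.
    k = round(len(eligible) * sample_rate)
    return sorted(random.Random(seed).sample(eligible, k))


traces = [
    {"trace_id": "t1", "session_id": "a"},
    {"trace_id": "t2", "session_id": "a"},
    {"trace_id": "t3", "session_id": "b"},
    {"trace_id": "t4", "session_id": None},  # not part of any session
    {"trace_id": "t5", "session_id": "c"},
    {"trace_id": "t6", "session_id": "c"},
    {"trace_id": "t7", "session_id": "a"},
]
sessions = group_by_session(traces)
selected = select_sessions(sessions, min_trace_count=2)
```

Traces without a `session_id` drop out at the grouping step, which is why session-level metrics need the ID set at tracing time.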

## Setup methods

### Dashboard UI

Use the dashboard when you want to create evaluations interactively and inspect results without writing code.

<Card title="Run Evaluations via UI" icon="layout-dashboard" href="/evaluation/setup/run-eval-ui">
  Create eval runs and review scores from the PandaProbe dashboard.
</Card>

### API

Use the API when evaluations need to be part of an automated workflow, CI job, internal dashboard, or custom tool.

<Card title="Run Evaluations via API" icon="terminal" href="/evaluation/setup/run-eval-api">
  Create eval runs, poll run status, and query scores programmatically.
</Card>
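
Because an eval run is created first and polled afterwards, an automated workflow usually needs a wait loop with a timeout. The sketch below shows one way to structure that loop. It takes the status lookup as a callable so that no endpoint or client is invented here; `wait_for_run`, the `fetch_status` callable, and the status strings `completed` and `failed` are hypothetical and should be replaced with whatever the API actually returns.

```python
import time
from typing import Callable

# Assumed terminal status names; confirm the real values in the API reference.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_run(
    fetch_status: Callable[[str], str],
    run_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status(run_id) until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(run_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"eval run {run_id} still '{status}' after {timeout_s}s")
        sleep(interval_s)


# Usage with a stand-in status source; a real caller would wrap an HTTP request.
fake_statuses = iter(["pending", "running", "completed"])
final_status = wait_for_run(
    lambda run_id: next(fake_statuses),
    "run-123",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

Injecting `fetch_status` and `sleep` keeps the loop easy to test in CI without network access.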

### Scheduled monitors

Use monitors when you want PandaProbe to run evaluations repeatedly on a cadence, such as daily production checks or weekly quality reports.

<Card title="Scheduling Evaluations" icon="clock" href="/evaluation/setup/scheduling">
  Configure recurring evaluation monitors with filters, sampling, and cadence.
</Card>

## Results

Every eval run produces **scores** attached to traces or sessions. Each score includes a metric name, value, status, reason, and metadata. You can review scores in the dashboard or query them through the API for analytics, reporting, and monitoring.
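
As a rough picture of what a score record looks like once it has been fetched, the sketch below models the five fields listed above and averages values per metric. The `Score` class, its field types, the `success` status value, and the sample metric names are assumptions for illustration, not the documented response format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Any


@dataclass
class Score:
    """Hypothetical in-memory shape of one score (field types are assumed)."""
    metric_name: str
    value: float
    status: str
    reason: str
    metadata: dict[str, Any] = field(default_factory=dict)


def mean_by_metric(scores: list[Score], ok_status: str = "success") -> dict[str, float]:
    """Average score values per metric, ignoring scores whose status is not ok_status."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for score in scores:
        if score.status == ok_status:
            grouped[score.metric_name].append(score.value)
    return {name: mean(values) for name, values in grouped.items()}


scores = [
    Score("task_completion", 1.0, "success", "Goal reached."),
    Score("task_completion", 0.5, "success", "Partially done."),
    Score("coherence", 0.0, "failed", "Judge call errored."),
]
averages = mean_by_metric(scores)
```

Filtering on status before averaging keeps scores that did not complete from skewing a report.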

## Next steps

Start with the setup path that matches your workflow:

<CardGroup cols={3}>
  <Card title="Run via Dashboard" icon="layout-dashboard" href="/evaluation/setup/run-eval-ui">
    Visual guide to creating eval runs in the dashboard.
  </Card>

  <Card title="Run via API" icon="terminal" href="/evaluation/setup/run-eval-api">
    Complete API reference for evaluation endpoints.
  </Card>

  <Card title="Schedule Monitors" icon="clock" href="/evaluation/setup/scheduling">
    Automate recurring evaluations for new traces or sessions.
  </Card>
</CardGroup>
