# Run Evaluations via UI

> Create and manage evaluation runs through the PandaProbe dashboard.

The PandaProbe dashboard gives you several ways to create evaluation runs without writing code. You can evaluate directly from the data you are already inspecting, or create broader filtered eval runs from the **Evaluations** tab.

Use the dashboard when you want to:

* Evaluate selected traces or sessions during review
* Create a one-off eval run with filters and sampling
* Choose metrics visually
* Review run status, scores, reasons, and metadata from the dashboard

<img src="https://mintcdn.com/chirpzai/OUkKdm0Z4YTMQdZN/assets/evals/overview.png?fit=max&auto=format&n=OUkKdm0Z4YTMQdZN&q=85&s=fa0ba8fc3b832a1d19fa48fb4c030c00" alt="PandaProbe dashboard navigation showing Traces, Sessions, and Evaluations" width="3024" height="1490" data-path="assets/evals/overview.png" />

## Create evals from the Traces tab

Use the **Traces** tab when you already know which traces you want to evaluate.

<Steps>
  <Step title="Open the Traces tab">
    In the PandaProbe dashboard, open **Traces** to view the trace table.
  </Step>

  <Step title="Choose traces to evaluate">
    Select a batch of traces from the table and click **Evaluate**, or open a specific trace and click **Evaluate** from the trace detail view.
  </Step>

  <Step title="Configure the eval run">
    In the sidebar, enter a run name, select one or more trace-level metrics, and optionally choose the model used for LLM-as-judge evaluation.
  </Step>

  <Step title="Submit the run">
    Click **Submit**. PandaProbe starts the eval run in the background and attaches scores to the selected traces when the run completes.
  </Step>
</Steps>

<video controls width="100%">
  <source src="https://mintcdn.com/chirpzai/OUkKdm0Z4YTMQdZN/assets/evals/traces-eval.mp4?fit=max&auto=format&n=OUkKdm0Z4YTMQdZN&q=85&s=549de6bda508a5c858ebe379e56ae147" type="video/mp4" data-path="assets/evals/traces-eval.mp4" />
</video>

## Create evals from the Sessions tab

Use the **Sessions** tab when you want to evaluate complete agent lifecycles.

The workflow is the same as trace evaluation: select sessions from the table, or open a session detail page and click **Evaluate**.

<Steps>
  <Step title="Open the Sessions tab">
    In the PandaProbe dashboard, open **Sessions** to view the table of grouped agent sessions.
  </Step>

  <Step title="Choose sessions to evaluate">
    Select a batch of sessions from the table and click **Evaluate**, or open a specific session and click **Evaluate** from the session detail view.
  </Step>

  <Step title="Configure the eval run">
    In the sidebar, enter a run name and select session-level metrics such as `agent_reliability` or `agent_consistency`.
  </Step>

  <Step title="Optionally customize signal weights">
    For session evaluation, you can use **Customize signal weights** to adjust how much each trace-level signal contributes to the session score.
  </Step>

  <Step title="Submit the run">
    Click **Submit**. PandaProbe starts the session eval run in the background and attaches scores to the selected sessions.
  </Step>
</Steps>

<video controls width="100%">
  <source src="https://mintcdn.com/chirpzai/OUkKdm0Z4YTMQdZN/assets/evals/session-eval.mp4?fit=max&auto=format&n=OUkKdm0Z4YTMQdZN&q=85&s=88d94cff76419bc590cb2cdcfc5d9660" type="video/mp4" data-path="assets/evals/session-eval.mp4" />
</video>
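
To build intuition for **Customize signal weights**, the sketch below combines trace-level signals into one session score with a weighted average. This is an illustration only: the function `weighted_session_score`, the signal names, and the weighted-average formula are all assumptions, and PandaProbe's actual aggregation may differ.

```python
def weighted_session_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Combine trace-level signal values into one score with a weighted average.

    Hypothetical helper: this is NOT PandaProbe's documented formula.
    Signals without an entry in `weights` contribute nothing.
    """
    total_weight = sum(weights.get(name, 0.0) for name in signals)
    if total_weight <= 0:
        raise ValueError("at least one signal needs a positive weight")
    weighted_sum = sum(value * weights.get(name, 0.0) for name, value in signals.items())
    return weighted_sum / total_weight


# Hypothetical signal names and values, each in the range [0, 1].
signals = {"tool_success": 1.0, "latency_ok": 0.5, "no_errors": 0.0}

# Equal weights: every signal contributes the same amount.
equal = weighted_session_score(
    signals, {"tool_success": 1, "latency_ok": 1, "no_errors": 1}
)

# Doubling the weight of a low-scoring signal pulls the session score down.
errors_heavy = weighted_session_score(
    signals, {"tool_success": 1, "latency_ok": 1, "no_errors": 2}
)

print(equal, errors_heavy)
```

Under this assumed formula, raising the weight of a signal moves the session score toward that signal's value, which is the effect to expect when you adjust weights in the sidebar.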

## Create evals from the Evaluations tab

Use the **Evaluations** tab when you want to create an eval run from filters rather than manually selecting traces or sessions.

When you open **Evaluations**, you will see five cards:

* **Trace evaluation runs**
* **Session evaluation runs**
* **Monitors**
* **Trace scores**
* **Session scores**

### Trace evaluation runs

Open **Trace evaluation runs** when you want to evaluate traces selected by filters.

<Steps>
  <Step title="Open Trace evaluation runs">
    From **Evaluations**, click **Trace evaluation runs**.
  </Step>

  <Step title="Click Create evaluation">
    Click **Create evaluation** to open the eval run sidebar.
  </Step>

  <Step title="Configure the run">
    Add a name, select trace-level metrics, and optionally select the model used for LLM-as-judge evaluation.
  </Step>

  <Step title="Add filters">
    Use filters such as **Started after**, **Started before**, **Status**, **Trace ID**, **Session ID**, and **Tags** to define the traces you want to evaluate.
  </Step>

  <Step title="Set the sampling rate">
    Set **Sampling rate** to choose what portion of matching traces should be evaluated. For example, `0.25` evaluates 25% of traces that match your filters.
  </Step>

  <Step title="Submit">
    Click **Submit**. The eval run starts in the background.
  </Step>
</Steps>

<video controls width="100%">
  <source src="https://mintcdn.com/chirpzai/OUkKdm0Z4YTMQdZN/assets/evals/trace-eval-tab.mp4?fit=max&auto=format&n=OUkKdm0Z4YTMQdZN&q=85&s=42e521bbcc04dbb4ca7f02ad0a1e67b8" type="video/mp4" data-path="assets/evals/trace-eval-tab.mp4" />
</video>
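
The interaction between filters and the sampling rate can be sketched in a few lines of Python. This is a conceptual model, not PandaProbe code: the `Trace` record, the `matches` and `sample` helpers, the inclusive time bounds, and the choice of independent per-trace sampling are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Trace:
    # Hypothetical trace record; field names are illustrative only.
    trace_id: str
    started_at: datetime
    status: str
    tags: set = field(default_factory=set)


def matches(
    trace: Trace,
    started_after: Optional[datetime] = None,
    started_before: Optional[datetime] = None,
    status: Optional[str] = None,
    tag: Optional[str] = None,
) -> bool:
    """Return True when the trace satisfies every filter that is set."""
    if started_after is not None and trace.started_at < started_after:
        return False  # assumption: the lower bound is inclusive
    if started_before is not None and trace.started_at > started_before:
        return False  # assumption: the upper bound is inclusive
    if status is not None and trace.status != status:
        return False
    if tag is not None and tag not in trace.tags:
        return False
    return True


def sample(traces: list, sampling_rate: float, seed: Optional[int] = None) -> list:
    """Keep each trace independently with probability `sampling_rate`."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    rng = random.Random(seed)
    return [t for t in traces if rng.random() < sampling_rate]


# Synthetic data: 10,000 traces, one per minute, alternating between two statuses.
base = datetime(2024, 1, 1)
traces = [
    Trace(f"t{i}", base + timedelta(minutes=i), "ERROR" if i % 2 else "OK")
    for i in range(10_000)
]

matching = [t for t in traces if matches(t, started_after=base, status="OK")]
selected = sample(matching, 0.25, seed=7)
print(len(matching), len(selected))
```

In this model the filters run first and the sampling rate applies only to the traces that match, so `0.25` selects roughly a quarter of the filtered set rather than a quarter of all traces.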

### Session evaluation runs

Open **Session evaluation runs** when you want to evaluate sessions selected by filters.

<Steps>
  <Step title="Open Session evaluation runs">
    From **Evaluations**, click **Session evaluation runs**.
  </Step>

  <Step title="Click Create evaluation">
    Click **Create evaluation** to open the eval run sidebar.
  </Step>

  <Step title="Configure the run">
    Add a name, select session-level metrics, and optionally customize signal weights.
  </Step>

  <Step title="Add filters">
    Use filters such as **Started after**, **Started before**, **Session ID**, **User**, **Tags**, and other session filters to define the sessions you want to evaluate.
  </Step>

  <Step title="Set the sampling rate">
    Set **Sampling rate** to choose what portion of matching sessions should be evaluated.
  </Step>

  <Step title="Submit">
    Click **Submit**. The session eval run starts in the background.
  </Step>
</Steps>

<video controls width="100%">
  <source src="https://mintcdn.com/chirpzai/OUkKdm0Z4YTMQdZN/assets/evals/session-eval-tab.mp4?fit=max&auto=format&n=OUkKdm0Z4YTMQdZN&q=85&s=d4415cdd25fa3186f6cbebfda24ec2a3" type="video/mp4" data-path="assets/evals/session-eval-tab.mp4" />
</video>

## Review eval results

After an eval run starts, PandaProbe processes it in the background. You can review progress and results from the **Evaluations** tab:

* **Trace evaluation runs** shows trace eval run status and history.
* **Session evaluation runs** shows session eval run status and history.
* **Trace scores** lets you inspect scores attached to traces.
* **Session scores** lets you inspect scores attached to sessions.

Each score includes the metric name, value, status, reason, and metadata. Use these details to understand why a trace or session passed, failed, or needs review.
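
If you export or copy scores for offline review, it can help to think of each one as a small record. The sketch below models the five fields listed above and groups scores by status. The `Score` class, its field names, the status strings, and the sample values are assumptions for illustration; they do not describe PandaProbe's actual data format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Score:
    # Hypothetical shape mirroring the fields shown in the dashboard.
    metric_name: str
    value: float
    status: str
    reason: str
    metadata: dict = field(default_factory=dict)


def group_by_status(scores: list) -> dict:
    """Bucket scores by their status string, preserving input order."""
    grouped: dict = defaultdict(list)
    for score in scores:
        grouped[score.status].append(score)
    return dict(grouped)


# Made-up sample values; the status strings are placeholders.
scores = [
    Score("agent_reliability", 0.92, "passed", "All tool calls succeeded."),
    Score(
        "agent_consistency",
        0.41,
        "failed",
        "Answers contradicted earlier turns.",
        {"judge_model": "example-model"},
    ),
]

grouped = group_by_status(scores)
for status, items in grouped.items():
    for s in items:
        print(f"[{status}] {s.metric_name}={s.value:.2f}: {s.reason}")
```

Grouping by status first and then reading each reason keeps the review focused on the scores that need attention.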

## Next steps

<CardGroup cols={2}>
  <Card title="Scheduling Evaluations" icon="clock" href="/evaluation/setup/scheduling">
    Automate recurring evaluations with monitors.
  </Card>

  <Card title="Run Evaluations via API" icon="terminal" href="/evaluation/setup/run-eval-api">
    Create eval runs programmatically.
  </Card>
</CardGroup>
