Evaluation Overview

PandaProbe provides a comprehensive evaluation framework for measuring the quality, reliability, and performance of your LLM applications. Evaluate individual traces or entire agent workflows using built-in and custom metrics.

Note: This page is under construction. Detailed evaluation documentation is coming soon.

What you’ll find here

  • Trace Evaluation — Score and assess individual trace outputs against expected results
  • Agent Evaluation — End-to-end evaluation of multi-step agent workflows
  • Evaluation Setup — Configure and run evaluations via the UI or API
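
Until the detailed pages are available, the idea behind trace evaluation (scoring a trace output against an expected result) can be illustrated with a small, framework-independent sketch. This is not PandaProbe's API: `TraceRecord`, `exact_match`, `token_f1`, and `evaluate_trace` are hypothetical names chosen for illustration, and the two metrics are generic examples of what a built-in or custom metric might compute.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TraceRecord:
    """Hypothetical stand-in for one trace: the produced output and the expected result."""
    output: str
    expected: str


def exact_match(trace: TraceRecord) -> float:
    """Return 1.0 if output equals expected after trimming and lowercasing, else 0.0."""
    return float(trace.output.strip().lower() == trace.expected.strip().lower())


def token_f1(trace: TraceRecord) -> float:
    """Return the F1 overlap between whitespace-separated tokens of output and expected."""
    out_tokens = trace.output.lower().split()
    exp_tokens = trace.expected.lower().split()
    if not out_tokens or not exp_tokens:
        # Both empty counts as a perfect match; one empty counts as no match.
        return float(out_tokens == exp_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(out_tokens) & Counter(exp_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(exp_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate_trace(
    trace: TraceRecord,
    metrics: Dict[str, Callable[[TraceRecord], float]],
) -> Dict[str, float]:
    """Apply every metric to a single trace and return a name -> score mapping."""
    return {name: metric(trace) for name, metric in metrics.items()}


if __name__ == "__main__":
    trace = TraceRecord(output="Paris is the capital", expected="paris")
    scores = evaluate_trace(trace, {"exact_match": exact_match, "token_f1": token_f1})
    print(scores)
```

Treating a metric as a plain function from a trace to a score keeps built-in and custom metrics interchangeable. An agent-level evaluation could, under the same assumption, aggregate such scores across the steps of a workflow.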

Trace Evaluation

Evaluate individual traces

Agent Evaluation

Evaluate agent workflows