Evaluation Overview
PandaProbe provides a comprehensive evaluation framework for measuring the quality, reliability, and performance of your LLM applications. Evaluate individual traces or entire agent workflows using built-in and custom metrics.

This page is under construction. Detailed evaluation documentation is coming soon.
What you’ll find here
- Trace Evaluation — Score and assess individual trace outputs against expected results
- Agent Evaluation — End-to-end evaluation of multi-step agent workflows
- Evaluation Setup — Configure and run evaluations via the UI or API
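Until the detailed documentation is available, the idea of scoring a trace output against an expected result can be illustrated with a small, library-independent sketch. It does not use the PandaProbe API: `TraceRecord`, `exact_match`, `evaluate`, and `mean_score` are hypothetical names chosen for this example, and exact match is only one simple metric among many possible ones.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """Hypothetical stand-in for a recorded trace; not a PandaProbe type."""
    trace_id: str
    output: str    # what the application actually produced
    expected: str  # the reference answer to compare against


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(record: TraceRecord) -> float:
    """Return 1.0 if the normalized output equals the expected text, else 0.0."""
    return 1.0 if _normalize(record.output) == _normalize(record.expected) else 0.0


def evaluate(records: list[TraceRecord]) -> dict[str, float]:
    """Score every trace and return the scores keyed by trace_id."""
    return {record.trace_id: exact_match(record) for record in records}


def mean_score(scores: dict[str, float]) -> float:
    """Average the per-trace scores; an empty run scores 0.0."""
    return sum(scores.values()) / len(scores) if scores else 0.0


if __name__ == "__main__":
    records = [
        TraceRecord("t1", "Paris", "paris"),
        TraceRecord("t2", "  Lyon ", "Paris"),
    ]
    scores = evaluate(records)
    print(scores, mean_score(scores))
```

Keeping the metric as a plain function over a record makes it easy to swap in a different scorer (for example, a similarity threshold or a model-graded check) without changing the aggregation step.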
