Evaluation Concepts

Understanding the building blocks of PandaProbe’s evaluation system.
This page is under construction. Detailed concept definitions are coming soon.

Topics to be covered

  • Datasets — Collections of test cases with inputs and expected outputs
  • Metrics — Quantitative measures of quality (accuracy, relevance, faithfulness, etc.)
  • Evaluators — Functions that compute metrics against trace data
  • Evaluation runs — Batch execution of evaluators across datasets
  • Scoring — How evaluation results map to trace scores
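One way to picture how these pieces fit together is a small sketch. It is a minimal illustration under stated assumptions and not PandaProbe's API. Every name in it is hypothetical: `DatasetItem`, `Score`, `exact_match`, `run_evaluation`, and `toy_app`. The metric is a plain exact-match comparison, and each score is keyed by a made-up trace ID rather than real trace data. The sketch shows a dataset of input/expected-output pairs and an evaluator that computes a metric. It also shows an evaluation run that applies every evaluator to every dataset item, with each result recorded as a score on a trace.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical types for illustration only; this is not PandaProbe's API.


@dataclass
class DatasetItem:
    """Dataset entry: an input and the output we expect for it."""
    input: str
    expected_output: str


@dataclass
class Score:
    """A named metric value attached to a trace (identified by trace_id)."""
    trace_id: str
    name: str
    value: float


# Evaluator: takes (expected output, actual output) and returns a value in [0, 1].
Evaluator = Callable[[str, str], float]


def exact_match(expected: str, actual: str) -> float:
    """Metric: 1.0 if the whitespace- and case-normalized strings match, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def run_evaluation(
    dataset: list[DatasetItem],
    app: Callable[[str], str],
    evaluators: dict[str, Evaluator],
) -> list[Score]:
    """Evaluation run: apply every evaluator to every dataset item in one batch."""
    scores: list[Score] = []
    for i, item in enumerate(dataset):
        actual = app(item.input)  # stand-in for the traced application call
        trace_id = f"trace-{i}"   # hypothetical trace identifier
        for name, evaluator in evaluators.items():
            value = evaluator(item.expected_output, actual)
            scores.append(Score(trace_id, name, value))
    return scores


# Usage: a two-item dataset and a toy "application" that gets one answer wrong.
dataset = [
    DatasetItem("What is 2 + 2?", "4"),
    DatasetItem("Capital of France?", "Paris"),
]


def toy_app(prompt: str) -> str:
    return {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}[prompt]


scores = run_evaluation(dataset, toy_app, {"exact_match": exact_match})
print(mean(s.value for s in scores))  # 0.5
```

Keeping evaluators as plain functions from (expected, actual) to a number lets one run combine several metrics simply by adding entries to the `evaluators` dictionary. Each result stays a separate score on its trace.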