Evaluation Concepts
Understanding the building blocks of PandaProbe’s evaluation system.
This page is under construction. Detailed concept definitions are coming soon.
Topics to be covered
- Datasets — Collections of test cases with inputs and expected outputs
- Metrics — Quantitative measures of quality (accuracy, relevance, faithfulness, etc.)
- Evaluators — Functions that compute metrics against trace data
- Evaluation runs — Batch execution of evaluators across datasets
- Scoring — How evaluation results map to trace scores
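
Until the detailed definitions are published, the following sketch may help show how the five topics above could fit together. It is not PandaProbe's API: every name in it (`Example`, `Trace`, `exact_match`, `run_evaluation`) is a hypothetical stand-in written in plain Python, and exact-match accuracy is used only as one simple instance of a metric.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical types for illustration only; these are NOT PandaProbe classes.
@dataclass
class Example:
    """One dataset entry: an input and its expected output."""
    input: str
    expected: str


@dataclass
class Trace:
    """A recorded run of the system under test (assumed shape)."""
    trace_id: str
    input: str
    output: str


# An evaluator is modelled here as a function that turns a trace plus its
# matching dataset entry into a numeric metric value.
Evaluator = Callable[[Trace, Example], float]


def exact_match(trace: Trace, example: Example) -> float:
    """Accuracy-style metric: 1.0 if the output equals the expected text."""
    return 1.0 if trace.output.strip() == example.expected.strip() else 0.0


def run_evaluation(
    dataset: list[Example],
    traces: list[Trace],
    evaluators: dict[str, Evaluator],
) -> dict[str, dict[str, float]]:
    """Evaluation run: apply every evaluator to every (example, trace) pair.

    Returns a mapping of trace id to {metric name: score}, which stands in
    for the "scoring" step of attaching results to traces.
    """
    scores: dict[str, dict[str, float]] = {}
    for example, trace in zip(dataset, traces):
        scores[trace.trace_id] = {
            name: evaluate(trace, example) for name, evaluate in evaluators.items()
        }
    return scores


# A two-item dataset and the traces produced for it (made-up sample data).
dataset = [Example("2+2", "4"), Example("capital of France", "Paris")]
traces = [
    Trace("t1", "2+2", "4"),
    Trace("t2", "capital of France", "Lyon"),
]

scores = run_evaluation(dataset, traces, {"exact_match": exact_match})
mean_exact_match = sum(s["exact_match"] for s in scores.values()) / len(scores)
```

In this sketch the dataset and traces are paired by position, which keeps the example short; a real system would more likely match them by an identifier.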
