Evals
Evals provide systematic assessment of AI models, functions, and workflows to ensure quality, reliability, and business alignment.
Overview
The Evals system evaluates the performance and quality of your business AI applications. Evals can:
- Assess model performance on specific business tasks
- Validate function outputs against expected business results
- Measure workflow efficiency and effectiveness
- Ensure compliance with business requirements
- Track improvements over time
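As an illustration of validating function outputs against expected business results, the following is a minimal Python sketch, not the Evals API. The names `EvalCase`, `run_eval`, and `classify_ticket` are hypothetical and exist only for this example; a real AI-backed function would call a model instead of using a keyword rule.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One input paired with the result the business expects (hypothetical type)."""
    input: str
    expected: str


def run_eval(fn: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where fn's output matches the expected result."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = sum(
        1
        for case in cases
        # Compare case-insensitively and ignore surrounding whitespace.
        if fn(case.input).strip().lower() == case.expected.strip().lower()
    )
    return passed / len(cases)


def classify_ticket(text: str) -> str:
    """Stand-in for an AI-backed function; uses a simple keyword rule."""
    return "billing" if "invoice" in text.lower() else "support"


cases = [
    EvalCase("Where is my invoice?", "billing"),
    EvalCase("The app crashes on login", "support"),
    EvalCase("I was charged twice", "billing"),  # the keyword rule misses this one
]

score = run_eval(classify_ticket, cases)
print(f"accuracy: {score:.2f}")  # accuracy: 0.67
```

Recording the score from each run alongside a date or version label is one simple way to track improvements over time.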
Components
- Metrics: Measure business performance and quality
- Tests: Verify that business AI components and systems behave as expected
- Benchmarks: Compare against industry standards
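One way to picture how the three components relate is the sketch below. It is an assumption-laden illustration rather than the Evals implementation: the function names (`accuracy`, `passes`, `benchmark_delta`) are hypothetical, and the baseline value is made up for the example and is not a real industry figure.

```python
from statistics import mean


def accuracy(outputs: list[str], expected: list[str]) -> float:
    """Metric: share of outputs that exactly match the expected values."""
    if len(outputs) != len(expected):
        raise ValueError("outputs and expected must have the same length")
    return mean(1.0 if o == e else 0.0 for o, e in zip(outputs, expected))


def passes(value: float, minimum: float) -> bool:
    """Test: a pass/fail check of a metric against a required minimum."""
    return value >= minimum


def benchmark_delta(value: float, baseline: float) -> float:
    """Benchmark: difference between a measured value and a reference score."""
    return value - baseline


outputs = ["approve", "reject", "approve", "approve"]
expected = ["approve", "reject", "reject", "approve"]

score = accuracy(outputs, expected)           # 3 of 4 match
ok = passes(score, minimum=0.7)               # hypothetical quality bar
delta = benchmark_delta(score, baseline=0.8)  # illustrative baseline only
print(score, ok, round(delta, 2))
```

In this framing a metric produces a number, a test turns that number into a pass/fail decision, and a benchmark places it relative to an external reference.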
Getting Started
To get started with the Evals collection, explore the pages in this section.