
Evals

Evals provide systematic assessment of AI models, functions, and workflows to ensure quality, reliability, and business alignment.

Overview

The Evals system evaluates the performance and quality of your business AI applications. Evals can:

  • Assess model performance on specific business tasks
  • Validate function outputs against expected business results
  • Measure workflow efficiency and effectiveness
  • Ensure compliance with business requirements
  • Track improvements over time
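As a rough illustration of validating function outputs against expected results, the sketch below scores a function by the fraction of cases it gets right. It does not use the Evals API, which this page does not show; `pass_rate`, `route_ticket`, and `CASES` are hypothetical names invented for the example.

```python
from typing import Callable


def pass_rate(fn: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases where fn(given) equals the expected output."""
    if not cases:
        raise ValueError("cases must not be empty")
    passed = sum(1 for given, expected in cases if fn(given) == expected)
    return passed / len(cases)


# Hypothetical function under evaluation: routes a support ticket to a team.
def route_ticket(text: str) -> str:
    return "billing" if "invoice" in text.lower() else "support"


# Hypothetical (input, expected output) pairs.
CASES = [
    ("Invoice 42 is wrong", "billing"),
    ("App crashes on login", "support"),
    ("Where is my invoice?", "billing"),
    ("Refund my last payment", "billing"),  # the keyword rule misses this one
]

score = pass_rate(route_ticket, CASES)
print(f"pass rate: {score:.2f}")
```

Recording this score on each run against the same cases is one simple way to track improvements over time.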

Components

  • Metrics: Measure business performance and quality
  • Tests: Check business AI components and systems against expected results
  • Benchmarks: Compare against industry standards
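One way to picture how the three components could fit together is sketched below. This is an assumption about their relationship, not the Evals data model: `Metric`, `EvalCase`, `Benchmark`, `evaluate`, and `meets_benchmark` are hypothetical names, and the baseline value is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Metric:
    """Scores one output against its expected value, in the range [0, 1]."""
    name: str
    score: Callable[[str, str], float]


@dataclass(frozen=True)
class EvalCase:
    """One test: an input and the result the component should produce."""
    given: str
    expected: str


@dataclass(frozen=True)
class Benchmark:
    """A reference score to compare against (hypothetical value below)."""
    name: str
    baseline: float


def evaluate(fn: Callable[[str], str], cases: list[EvalCase], metric: Metric) -> float:
    """Average the metric over all cases."""
    if not cases:
        raise ValueError("cases must not be empty")
    return sum(metric.score(fn(c.given), c.expected) for c in cases) / len(cases)


def meets_benchmark(score: float, benchmark: Benchmark) -> bool:
    """True when the score is at least the benchmark baseline."""
    return score >= benchmark.baseline


exact_match = Metric("exact_match", lambda out, exp: 1.0 if out == exp else 0.0)
cases = [EvalCase("ok", "OK"), EvalCase("no", "NO"), EvalCase("hi", "Hi")]
reference = Benchmark("reference", baseline=0.6)

# str.upper stands in for the AI component under test.
result = evaluate(str.upper, cases, exact_match)
print(exact_match.name, round(result, 3), meets_benchmark(result, reference))
```

Keeping the metric separate from the cases means the same tests can be rescored with a different metric, and the same score can be compared against more than one benchmark.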

Getting Started

To get started with the Evals collection, explore the following pages:
