AI evaluations (evals) in Optimizely Opal provide a structured approach to assessing the quality and effectiveness of AI-generated content and agent outputs. This capability ensures that Opal-driven initiatives align with your brand standards and business objectives, fostering continuous improvement and reliable performance.
Evals
Evals are the systematic process of measuring and analyzing the performance of AI models and their outputs. In the context of Optimizely Opal, evals are designed to
- Assess quality – Determine if Opal-generated content meets predefined standards for accuracy, relevance, and completeness.
- Ensure consistency – Verify that outputs adhere to brand guidelines, tone of voice, and other stylistic requirements.
- Drive improvement – Provide actionable feedback that can be used to refine agent prompts, configurations, and underlying models.
The primary goal of evals is to build trust in Opal's capabilities by ensuring that the outputs are consistently high-quality and fit for purpose.
Benefits of evals
Integrating evals into your Opal workflows offers the following key advantages:
- Improved Opal output quality – Consistently produce high-quality content and outputs from your specialized agents.
- Enhanced brand consistency – Ensure all Opal-generated materials align with your brand's voice, style, and messaging.
- Faster iteration and optimization – Identify areas for improvement and refine agent performance efficiently.
How evals are used in Opal
Preferred output examples
When a specialized agent executes and produces output, you can designate a specific execution's output as a preferred output. This preferred output example then serves as a benchmark against which future outputs from the same agent can be measured. It represents the desired quality or outcome for a particular task.
Once you set a preferred output, Opal automatically generates a Quality Score for subsequent agent executions. This score quantifies how closely each new output matches the designated preferred output example.
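Opal computes the Quality Score internally, and its scoring method is not described here. The following Python sketch is only a conceptual illustration of benchmarking a new output against a preferred output example; the `quality_score` function, the text-similarity heuristic, and the example strings are hypothetical and are not part of Opal.

```python
# Hypothetical illustration only: Opal's real Quality Score is computed
# internally and is not a simple text-similarity ratio. This sketch just
# shows the idea of scoring a new output against a preferred example.
from difflib import SequenceMatcher

def quality_score(candidate_output: str, preferred_output: str) -> float:
    """Return a 0-100 score for how closely the candidate matches the
    designated preferred output example (illustrative heuristic only)."""
    ratio = SequenceMatcher(None, preferred_output, candidate_output).ratio()
    return round(ratio * 100, 1)

preferred = "Q3 recap email: three short paragraphs in our brand voice."
candidate = "Q3 recap email: three concise paragraphs in our brand voice."
print(quality_score(candidate, preferred))  # prints a high score for a near-match
```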
See Preferred output examples.
Evaluation agents
Opal supports the creation of specialized evaluation agents. These agents are designed with the specific purpose of critiquing the outputs of other agents. They can
- Automate feedback – Automatically assess content against predefined criteria (for example, grammar, Search Engine Optimization (SEO) best practices, and brand voice).
- Create feedback loops – Integrate into workflows where an agent generates content, an evaluation agent assesses it, and the generating agent revises its output until the evaluation score is sufficiently high. This iterative process significantly reduces issues like hallucination and ensures outputs meet stringent standards; a conceptual sketch of the loop follows this list.
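Opal orchestrates this loop within its own workflows; the sketch below is only a conceptual outline of a generate-evaluate-revise cycle under assumed placeholders. The `generate`, `evaluate`, and `revise` callables, `QUALITY_THRESHOLD`, and `MAX_ROUNDS` are hypothetical stand-ins for the generating agent, the evaluation agent, and the revision step, and are not Opal APIs.

```python
# Hypothetical illustration only: generate(), evaluate(), and revise() stand in
# for a content-generating agent, an evaluation agent, and a revision step.
# None of these names are part of an Opal API.
from typing import Callable, Tuple

QUALITY_THRESHOLD = 85  # assumed passing score, for illustration
MAX_ROUNDS = 3          # cap iterations so the loop always terminates

def run_feedback_loop(
    brief: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str], Tuple[int, str]],
    revise: Callable[[str, str], str],
) -> str:
    """Generate content, let an evaluation step critique it, and revise
    until the score clears the threshold or the round limit is reached."""
    draft = generate(brief)
    for _ in range(MAX_ROUNDS):
        score, feedback = evaluate(draft)  # e.g. (72, "Tone is too formal ...")
        if score >= QUALITY_THRESHOLD:
            break
        draft = revise(draft, feedback)    # apply the critique and try again
    return draft
```

In practice, the generating and evaluation roles are each filled by an agent, and the loop exits as soon as the evaluation score clears the passing threshold.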
If you use Opti ID, administrators can turn off generative AI in the Opti ID Admin Center. See Turn generative AI off across Optimizely applications.