System tools for Feature Experimentation

System tools are built-in features that help Optimizely Opal take action. Each tool performs a specific task, such as creating a campaign, uploading files, or generating images. Think of tools like attachments on a Swiss Army knife. Each one has a distinct purpose that helps you get work done.

In addition to the system tools available in Opal, Optimizely Feature Experimentation includes a set of system tools designed to help you improve your experimentation program.

You can ask Opal what tools it has at any time. For example, enter "Please list the tools you have with a brief description of what they do and the parameters" into Opal Chat.

Click a tool's name to expand it and learn when to use it, its required and optional parameters, and example prompts for calling the tool. If you do not provide a required parameter, Opal prompts you for it.

Experimentation context

  • Opal cannot look up detailed results for a specific experiment you ask about by name. However, it can show your top-performing and underperforming experiments with their lift and significance data. For full experiment results, use the Optimizely Experiment Results page.
  • The feature is not connected to the Optimizely Analytics product. Opal does not have access to Analytics explorations or custom analyses.
  • Results data support is still evolving. While Opal can surface high-level performance information, deeper results integration is still in progress.
exp_get_schemas – Retrieves detailed schemas for various Optimizely entities relevant to Feature Experimentation.
  • When to use
    • If you are unsure what entities (like flags, rules, or environments) are available in your Feature Experimentation setup.
    • When you need to know the exact field names and their types for the entities you plan to query, before building a query with the exp_execute_query tool.
    • To see how different pieces of your Feature Experimentation data are connected.
  • Parameters
    • entities – A list of entity types for which you want to retrieve schemas. For Feature Experimentation, common entities include flag, rule, environment, attribute, and event.
    • (Optional) include_dependencies – A boolean value that, when set to true, includes schemas for related entities in the response.
  • Example prompts
    • What are the available fields for feature flags?
    • Show me the schema for rules in Feature Experimentation.
    • List all the properties of an environment entity.
    • I need to understand the structure of attributes used in my feature flags. Get me the schema for attribute.
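As an illustration of what a schema describes, the sketch below models a hypothetical schema for the flag entity as a Python dictionary. The field names and types shown are assumptions for illustration only; the actual response shape is defined by the exp_get_schemas tool.

```python
# Hypothetical sketch of a schema for the "flag" entity.
# Field names and types are illustrative, not actual tool output.
flag_schema = {
    "entity": "flag",
    "fields": {
        "key": "string",          # unique flag key, e.g. "new_checkout_flow"
        "name": "string",         # human-readable flag name
        "project_id": "integer",  # owning project
        "archived": "boolean",    # whether the flag is archived
    },
}

# Knowing the exact field names lets you reference them correctly
# when building a query with the exp_execute_query tool.
field_names = sorted(flag_schema["fields"])
```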
exp_execute_query – Executes a template-based query to fetch specific data from your Optimizely Feature Experimentation instance. It lets you retrieve detailed information about various entities such as feature flags, rules, environments, attributes, and events, based on your specified criteria.
  • When to use

    Before using the exp_execute_query tool, you should use the exp_get_schemas tool to understand the available fields and structure for the entities you want to query. This ensures you build your query template correctly.
    • Retrieve a list of feature flags, rules, or other Feature Experimentation entities.
    • Filter entities based on specific conditions (for example, flags in a particular environment or rules with certain traffic allocation).
    • Get detailed information about specific fields of an entity.
    • Query across all projects or within a specific project.
  • Parameters
    • template – A complete query template that defines the steps, filters, and fields you want to retrieve. The template is written in a specific format that outlines how to fetch and structure the data.
    • (Optional) project_id – If you want to query data within a specific project, provide its ID. If you do not provide a project_id, the query attempts to run across all projects you have access to. You can often find the project_id in your Optimizely URL (for example, https://app.optimizely.com/v2/projects/1234567890/...).
  • Example prompts
    • Show me all feature flags in my current project.
    • List all rules for the 'new_checkout_flow' feature flag.
    • What are the events associated with my feature tests in the 'development' environment?
    • Give me a summary of all attributes used in my Feature Experimentation projects.
    • Find all feature flags that are currently enabled and their traffic allocation.
exp_program_reporting_top_experiments – Retrieves experiments that have generated the highest positive or negative lift values within a specified timeframe. It focuses on the performance of individual tests, letting you see which variations of your features are performing best (or worst) against your chosen metrics.

The tool returns detailed information for each experiment, including the following:

  • Layer Name – The name of the experiment (corresponding to a rule within a flag).
  • Project ID – The identifier of the project the feature flag belongs to.
  • Variation Name – The specific variation of the feature that achieved the lift.
  • Metric Name – The metric used to measure the impact of the experiment.
  • Relative Lift Value – The numerical lift value (can be positive for winning or negative for losing).
  • Is Significant – Indicates whether the result is statistically significant.
  • Winning Direction – The direction of the winning variation (for example, "increasing").
  • Start Date – When the experiment began.
  • Last Modified – When the experiment was last updated.
The exp_program_reporting_top_experiments tool does not include results data for experiments that use Stats Accelerator as the traffic distribution mode.
  • When to use
    • Discover which feature variations are driving the most significant positive outcomes for your product.
    • Learn from successful experiments to inform future product design and development decisions.
    • Pinpoint features or variations that are having a negative impact, letting you quickly iterate or roll back.
    • Demonstrate the value of specific feature releases or experiments to product managers and stakeholders.
    • Answer questions like the following: 
      • "Which feature flags had the biggest impact on user engagement last quarter?"
      • "Show me the top 5 winning variations for our 'new_search_algorithm' test"
      • "What were the most impactful experiments in project X this year?"
  • Parameters
    • date_range – The inclusive start and end timestamps in ISO-8601 format (for example, YYYY-MM-DDTHH:MM:SSZ or with offset). This defines the period over which experiments are evaluated.
    • (Optional) direction – Specifies whether to rank by "winning" (highest positive lifts) or "losing" (largest negative lifts).
    • (Optional) metric_name – The name of the metric to use for calculating lift. If omitted, it defaults to the primary metric.
    • (Optional) page_size – The maximum number of experiments to return. Must be between 1 and 100 (inclusive). If omitted, the backend applies its own default.
    • (Optional) project_ids – A list of numeric project IDs to filter the results by. If omitted or empty, the tool may aggregate across all authorized projects.
  • Example prompts
    • Show me the top 10 winning feature experiments for the 'user_onboarding' flow in the last 3 months.
    • Which feature rollouts had the biggest negative impact on our 'purchase_conversion' metric this quarter?
    • List the top 5 feature variations that increased 'feature_X_usage' in project ID 987654321 between July and September 2025.
    • What were the most impactful tests this year, ranked by 'subscription_rate'?
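To make the returned fields above concrete, the sketch below models one experiment record as a Python dictionary. The values (and the snake_case key names) are hypothetical, chosen only to mirror the fields listed; the tool's actual output format may differ.

```python
# Hypothetical record mirroring the fields the tool returns.
# All names and values here are illustrative, not real API output.
top_experiment = {
    "layer_name": "new_checkout_flow_experiment",   # experiment (rule within a flag)
    "project_id": 987654321,
    "variation_name": "simplified_form",
    "metric_name": "purchase_conversion",
    "relative_lift_value": 0.057,                   # positive = winning
    "is_significant": True,
    "winning_direction": "increasing",
    "start_date": "2025-07-01T00:00:00Z",
    "last_modified": "2025-09-15T12:30:00Z",
}

# A winning result is a positive lift that is statistically significant.
is_winner = top_experiment["is_significant"] and top_experiment["relative_lift_value"] > 0
```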
exp_program_reporting_underperforming_experiments – Helps you identify A/B tests that are not yielding significant or positive results, letting you quickly decide whether to stop, re-evaluate, or iterate on them.
The exp_program_reporting_underperforming_experiments tool does not include results data for experiments that use Stats Accelerator as the traffic distribution mode.
  • When to use
    • Find experiments that are not reaching statistical significance, have confidence intervals that include zero, or show negative lift.
    • Quickly pinpoint tests that you should stop to free up traffic and resources for more promising ideas.
    • Analyze historical data to understand common pitfalls or patterns in underperforming tests.
    • Check the health of your running experiments to intervene if they are clearly not progressing towards a positive outcome.
  • Parameters
    • date_range – Specifies the inclusive start and end timestamps in ISO-8601 format (for example, {"start": "2025-01-01T00:00:00Z", "end": "2025-12-31T23:59:59Z"}). This defines the period for which experiments are analyzed.
    • (Optional) project_ids – A list of specific project IDs to filter the experiments. If omitted, the tool aggregates data across all authorized projects.
    • (Optional) page_size – The maximum number of underperforming experiments to return (between 1 and 100).
    • (Optional) metric_name – The name of the metric to filter by. If omitted, it defaults to the primary metric on the backend. Examples include "primary", "Add to Cart", or "conversion_rate".
  • Example prompts
    • Show me all underperforming A/B tests from the last quarter.
    • Which experiments should I stop or review from the last 6 months for Project X?
    • Find experiments with low significance for the 'Add to Cart' metric this year.
    • List all experiments that are not performing well between January 1, 2025, and June 30, 2025.
    • Are there any underperforming tests in my 'Website Redesign' project?
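The date_range parameter uses the inclusive start/end shape shown above. A minimal sketch of building that payload in Python, assuming UTC timestamps in the YYYY-MM-DDTHH:MM:SSZ format from the example:

```python
from datetime import datetime, timezone

# Build an inclusive ISO-8601 date range covering the first half of 2025.
start = datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2025, 6, 30, 23, 59, 59, tzinfo=timezone.utc)

date_range = {
    "start": start.strftime("%Y-%m-%dT%H:%M:%SZ"),  # "2025-01-01T00:00:00Z"
    "end": end.strftime("%Y-%m-%dT%H:%M:%SZ"),      # "2025-06-30T23:59:59Z"
}
```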
exp_program_reporting_win_rate – Gives you a high-level overview of the success of your Experimentation program by calculating the win rate over a specified period. It is a key metric for understanding the efficiency and impact of your product development cycle.
The exp_program_reporting_win_rate tool does not include results data for experiments that use Stats Accelerator as the traffic distribution mode.

The win rate is defined as the following:

  • Wins – Feature experiments (rules within flags) that show a positive and statistically significant result for the chosen metric. 
  • Total – All feature experiments that have concluded or been paused within the defined scope.
  • Win rate – (Wins ÷ Total) × 100
  • When to use
    • Understand the overall success rate of your feature testing and release process.
    • Monitor how your team's ability to launch winning features evolves over time.
    • Provide a clear, high-level metric to product leadership and other stakeholders on the effectiveness of your Feature Experimentation program.
    • Answer questions like the following:
      • "What is the win rate for our feature experiments?"
      • "How many of our tests resulted in a positive outcome last year?"
      • "What is the win rate for features impacting our core business metrics?"
  • Parameters
    • date_range – The inclusive start and end timestamps in ISO-8601 format (for example, YYYY-MM-DDTHH:MM:SSZ or with offset). This defines the period over which the win rate is calculated.
    • (Optional) project_ids – A list of numeric project IDs to filter the experiments by. If omitted or empty, the tool may aggregate across all authorized projects.
    • (Optional) metric_name – The name of the metric used to determine what constitutes a "win" (for instance, a positive and significant result). If omitted, it defaults to the primary metric configured on the backend.
  • Example prompts
    • What was the win rate for our experiments in the last fiscal year?
    • Calculate the win rate for all feature tests in project ID 123456789 between January 1st, 2025, and June 30th, 2025.
    • Show me the win rate for feature experiments focused on the 'subscription_upgrade' metric this quarter.
    • What is the overall win rate for our Feature Experimentation program across all projects for the past 12 months?
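The win-rate definition above can be worked through with a small example. The experiment records below are illustrative, not real tool output; a "win" is counted when a concluded experiment shows a positive and statistically significant result.

```python
# Illustrative concluded experiments (hypothetical data).
experiments = [
    {"name": "new_checkout_flow", "significant": True,  "lift": 0.042},
    {"name": "homepage_cta",      "significant": True,  "lift": -0.013},
    {"name": "pricing_tiers",     "significant": False, "lift": 0.008},
    {"name": "search_ranking",    "significant": True,  "lift": 0.021},
]

# Wins: positive lift AND statistically significant.
wins = sum(1 for e in experiments if e["significant"] and e["lift"] > 0)

# Win rate = (Wins / Total) x 100 -> here, 2 of 4 experiments, or 50.0%.
win_rate = (wins / len(experiments)) * 100
```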

Flag ideation

exp_suggest_flag_variables – Generates a list of flag variables for Feature Experimentation based on your hypothesis. Flag variables define the different aspects of your feature that you want to test.
  • You must provide a clear hypothesis for your experiment. A hypothesis is a testable statement that explains what you expect to happen. See Design an effective hypothesis.
  • This tool can often derive your project_id and flag_key from your current context (for example, the URL you are viewing). If this information is not available, Opal prompts you to provide it.
  • When to use
    • Define the specific variables that are part of an experiment.
    • Translate a hypothesis into concrete, testable variables for your feature.
    • Configure the different elements of your feature for A/B testing.
  • Parameters
    • hypothesis – The testable statement that explains what you expect to happen in your experiment. 
    • project_id – The identifier of the Feature Experimentation project where the feature flag is defined.
    • flag_key – The key for the feature flag you are modifying.
  • Example prompts
    • I want to suggest flag variables for an experiment. My hypothesis is, "If we change the call-to-action text on the homepage button, more users will click through to the signup page."
    • Generate variables for a feature flag in project 123 and flag new_checkout_flow. The hypothesis is that a simplified checkout form will reduce cart abandonment.
    • Suggest variables for testing different pricing tiers. My hypothesis is that offering a 'premium' tier will increase average order value.
exp_suggest_flag_variations – Generates flag variations for Feature Experimentation based on your hypothesis. Variations represent the different experiences or treatments you want to test.
  • You must provide a clear hypothesis for your experiment. A hypothesis is a testable statement that explains what you expect to happen. See Design an effective hypothesis.
  • This tool can often derive your project_id and flag_key from your current context (for example, the URL you are viewing). If this information is not available, Opal prompts you to provide it.
  • When to use
    • Define the different variations that are part of an experiment.
    • Translate a hypothesis into concrete, testable variations for your feature.
    • Set up the different configurable experiences of your feature for A/B testing or gradual rollout.
  • Parameters
    • hypothesis – The testable statement that explains what you expect to happen in your experiment.
    • project_id – The identifier of the Feature Experimentation project where the feature flag is defined.
    • flag_key – The key for the feature flag you are modifying. 
  • Example prompts
    • I want to suggest flag variations for an experiment. My hypothesis is: "If we change the call-to-action text on the homepage button, more users will click through to the signup page."
    • Generate variations for a feature flag in project 123 and flag new_checkout_flow. The hypothesis is that a simplified checkout form will reduce cart abandonment.
    • Suggest variations for testing different pricing tiers. My hypothesis is that offering a 'premium' tier will increase average order value.

If you use Opti ID, administrators can turn off generative AI in the Opti ID Admin Center. See Turn generative AI off across Optimizely applications.