The goal of a randomized controlled experiment, also known as an A/B test, is to scientifically identify and evaluate the impact of a change we want to make to a product or a process. For example, does a new email subject line, a new call to action, or even an additional image or video in an email work better than the current version? The result of an experiment is a precise measurement of the increase or decrease in the metrics we care about caused by the change we were testing.
The key idea behind an experiment is extremely simple and is illustrated in the figure below, using the example of testing a new email subject line. Every prospect who is supposed to receive the email is randomly assigned one of the two variants: the current subject line or the new subject line. We run the experiment for a few weeks, then compute the metrics of interest, such as open rate and reply rate, for each of the two subject lines. We then perform a statistical test. If the test shows that a metric differs between the two groups in a statistically significant way, we can conclude that the difference was caused by our new subject line, because random assignment makes the two groups comparable in every other respect.
Experiment design to evaluate the impact of a new e-mail subject line
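The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not Outreach's actual implementation: the function names, the hypothetical counts, and the choice of a pooled two-proportion z-test are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist


def assign_variant(rng):
    """Randomly assign a prospect to one of the two variants (hypothetical helper)."""
    return rng.choice(["current", "new"])


def two_proportion_z_test(replies_a, sent_a, replies_b, sent_b):
    """Return (z, two-sided p-value) for the difference in reply rates.

    Uses a pooled two-proportion z-test; this is an assumed choice of test.
    """
    rate_a = replies_a / sent_a
    rate_b = replies_b / sent_b
    pooled = (replies_a + replies_b) / (sent_a + sent_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        # Both rates are exactly 0 or exactly 1: no detectable difference.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p = two_proportion_z_test(replies_a=40, sent_a=1000, replies_b=65, sent_b=1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value (0.05 is a common convention, assumed here) suggests the difference in reply rate is unlikely to be due to chance alone.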
To maximize your chances of identifying a winner, test bigger changes that are likely to matter rather than small tweaks, and run your experiments on sequence steps with more traffic.
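To see why bigger changes and more traffic help, here is a rough sample-size sketch using the standard normal-approximation formula for comparing two proportions. The significance level (0.05), the power (0.8), and the example reply rates are assumptions chosen for illustration; they are not settings documented by Outreach.

```python
from math import ceil, sqrt
from statistics import NormalDist


def emails_per_variant(baseline_rate, new_rate, alpha=0.05, power=0.8):
    """Approximate emails needed per variant to detect baseline_rate vs new_rate.

    alpha and power are assumed defaults, not Outreach settings.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    mean_rate = (baseline_rate + new_rate) / 2
    term_null = z_alpha * sqrt(2 * mean_rate * (1 - mean_rate))
    term_alt = z_power * sqrt(
        baseline_rate * (1 - baseline_rate) + new_rate * (1 - new_rate)
    )
    return ceil((term_null + term_alt) ** 2 / (new_rate - baseline_rate) ** 2)


# Hypothetical reply rates: a small tweak (5% -> 6%) vs. a bigger change (5% -> 8%).
small_tweak = emails_per_variant(0.05, 0.06)
big_change = emails_per_variant(0.05, 0.08)
print(small_tweak, big_change)
```

Under these assumptions, the small tweak needs several times more emails per variant than the bigger change, which is why low-traffic steps rarely produce a winner for minor edits.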
Running Email Experiments in Outreach
Setting up a new experiment in Outreach is easy.
- Go to the sequence page in Outreach
- Identify the sequence and the email step in that sequence where you want to set up the experiment
- Clone the email template, then click the clone to edit it. To get the most value out of the experiment, change only one element, such as the subject line, the call to action, or the preview text, rather than several at once.
- Turn the new template on
The experiment is now running!
While the experiment is running, you may see the following messages appear on the Sequences page in Outreach:
This means that the experiment is still collecting data needed for a statistical test.
This means that no statistically significant differences were detected. Wait longer, or, if the experiment has already run for longer than you planned, conclude that no large enough difference exists. You can stop the template you were testing and start another experiment on this step.
This means the experiment is invalid. You should not draw conclusions from it. For example, if one template appears to be performing better, the difference may be caused by issues with the setup rather than by the template itself. There are two criteria we check to ensure the correctness of experiments.
- First, the number of emails sent to each variant should be about the same, with at least 150 emails in each variant.
- Second, both templates should be active during the same consecutive period of time.
If you see this message, it means that one or both of these two requirements were violated.
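The two criteria above can be expressed as a simple check. This is a hypothetical sketch: the `Variant` type, the function name, and the 10% tolerance used for "about the same" are assumptions, since only the 150-email minimum is stated here.

```python
from dataclasses import dataclass
from datetime import date

MIN_EMAILS_PER_VARIANT = 150   # stated in the criteria above
MAX_RELATIVE_IMBALANCE = 0.10  # assumption: the exact tolerance is not specified


@dataclass
class Variant:
    """Hypothetical summary of one template in the experiment."""
    name: str
    emails_sent: int
    active_from: date
    active_to: date


def validity_problems(a, b):
    """Return a list of reasons the experiment would be considered invalid."""
    problems = []
    for variant in (a, b):
        if variant.emails_sent < MIN_EMAILS_PER_VARIANT:
            problems.append(
                f"{variant.name}: fewer than {MIN_EMAILS_PER_VARIANT} emails sent"
            )
    larger = max(a.emails_sent, b.emails_sent)
    if larger > 0:
        imbalance = abs(a.emails_sent - b.emails_sent) / larger
        if imbalance > MAX_RELATIVE_IMBALANCE:
            problems.append("email counts differ too much between variants")
    if (a.active_from, a.active_to) != (b.active_from, b.active_to):
        problems.append("templates were not active over the same period")
    return problems


# Hypothetical data: the new template was turned on a week late.
current = Variant("current", 400, date(2024, 1, 1), date(2024, 1, 28))
new = Variant("new", 120, date(2024, 1, 8), date(2024, 1, 28))
for problem in validity_problems(current, new):
    print(problem)
```

An empty list means both criteria are satisfied under the assumed tolerance.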
This means that our statistical tests showed that the winning variant has a statistically significantly higher reply rate than the losing variant. You can turn off the losing variant. You can click “View Results” to see the data used to determine the winner.
While the experiment is running, it is important not to make any changes to the sequence step or the templates involved in the experiment. Otherwise, the results of the experiment will become invalid. Outreach will warn you if you attempt a change that will affect any running experiment.
Editing a template or subject line that is part of a running experiment makes the results difficult to interpret, since part of the data would come from one version of the template or subject line and part from another. While Outreach will keep the experiment running, we recommend that you stop and restart the experiment by cloning the sequence step and deleting the old step.
Turning off the template will make the experiment invalid.
Turning off the sequence will make the experiment invalid.
Note that if you stop an experiment by pausing or deleting one of the templates, the messaging described above will go away. There is currently no place in Outreach to view historical experiments for which templates are not active or do not exist anymore (this feature is coming soon).
Also note that the messaging described above appears only for experiments that involve exactly two templates. Using more than two templates is not recommended, because fewer emails go to each variant and the experiment must run much longer. For this reason, guided template A/B testing is not available for more than two templates. If you want to evaluate three or more new options for the same email template, evaluate them one at a time in two-template experiments.