Build your product on data, not opinions: a product manager’s guide to experimentation
We experiment every day: with what to eat for breakfast, what to wear, and how to spend our days. Building software is no different; experiments are ingrained there too.
Working with experimentation is a baseline skill that every product manager should have. This guide, based on our lecture with Indra Pal Singh from Boulevard, will help you dig into the world of experiments and add them to your daily working routine.
Let’s look at experimentation best practices and how they apply to startups at various stages:
- What are product experiments?
- How to experiment
- How long should your experiment run?
- Types of product experiments
- When to experiment
- Risks associated with the experiments
What are product experiments?
Simply put, an experiment is a procedure to confirm or refute a hypothesis: a decision-making tool that uses real data, not opinions.
Just imagine you have a concept, and you believe in your idea, but you don’t know how the public will react or if your idea will help you reach current business goals. In such cases, you can use experiments to confirm that your idea is valid.
Experiments are not about hunches or POVs, and that is why they’re a great tool to make real, data-driven iterations to your product.
The most common experiment
If your company runs A/B tests, you are already experimenting with your product. This is the most commonly used type of testing: you have two options, and users are split evenly between them. By tracking their behavior and reactions, you can determine which option performs better.
Such tests can be applied to almost anything, from different button colors in the app to entire user interfaces.
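The mechanics of splitting users into two groups can be sketched with a deterministic hash, so the same user always lands in the same variant without storing any assignment state. This is a minimal illustration, not any specific platform’s implementation; the function name and the exact 50/50 split are assumptions, and real systems typically add per-experiment salts and configurable ratios:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "button-color") -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing user_id together with the experiment name gives a stable,
    roughly 50/50 split that stays consistent across sessions.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 2  # even hash -> control, odd -> treatment
    return "control" if bucket == 0 else "treatment"

# The same user always sees the same variant:
print(assign_variant("user-42"))
print(assign_variant("user-42"))
```

Keying the hash on both the user and the experiment name means a user’s bucket in one test doesn’t bias their bucket in another.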
How to experiment
If you want experiments to serve real product goals, rather than running them just for the sake of it, these four principles will help you along the way.
1. Experiment with EVERYTHING — where it makes sense
Of course, it’s a bold statement, but it is especially true in the software development industry, where there are always tons of ideas but not all of them have equal impact. To learn which ideas will actually work and help your product grow, start experimenting as early and as much as possible to find out whether an idea is valid and applicable.
Of course, if you have unlimited resources, you don’t need to choose or prioritize these experiments. But in real life, with limited time and resources, we still need to choose what to experiment with and what may be left aside.
Just remember — experiments require time and careful planning, but in the long run, lots of experiments will do your product good.
Don’t focus on experiments that won’t have an impact on final decisions
There are situations where some features and things are going to be in the product no matter what: the CEO asked for this, your client or investor insists on adding some feature, and so on. If this is the case — don’t waste your time experimenting with such a feature.
If the results of the experiment are not going to influence the decision or change the actual outcome, don’t waste your time and resources; focus them elsewhere.
2. Your experimentation should be scientifically diligent
Remember that experiments don’t rely on opinions; they rely on data and the metrics that matter to your product. We need to remove our bias and stay focused on the science.
Just don’t forget that being scientifically rigorous also means being flexible with your experiments to meet the needs and realities of the business. The business needs to keep moving and delivering, not spend every cent on data, and the real world isn’t a perfectly controlled environment, so you may have to give up a certain amount of rigor to move at the pace the business needs.
What determines the statistical significance of the experiment
There are three basic parameters you need to consider:
- Traffic — stands for the number of users;
- The baseline KPI — the current value of the primary success metric you’re looking at;
- The percentage of change — how big a change you’re expecting to detect.
These numbers are not set in stone: you can change your expectations along the way, and the numbers you are expecting will change with them.
You may adjust:
- how many tests you want to run,
- how many users you want to experiment with,
- what Minimum Detectable Effect (MDE) is possible, and so on.
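These parameters (baseline KPI, expected change, and the resulting traffic requirement) plug directly into a standard sample-size calculation for a two-proportion test. Here is a rough sketch using the common normal-approximation shortcut at 95% confidence and 80% power; the baseline and MDE values are invented for illustration, and real tools may use slightly different formulas:

```python
import math

def sample_size_per_variant(baseline: float, relative_mde: float,
                            z_alpha: float = 1.96,   # 95% confidence
                            z_beta: float = 0.8416   # 80% power
                            ) -> int:
    """Approximate users needed per variant to detect a relative lift
    over a baseline conversion rate (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    delta = p2 - p1
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / delta ** 2
    return math.ceil(n)

# A 5% baseline conversion rate with a 10% relative MDE (5.0% -> 5.5%)
# needs roughly 31,000 users per variant:
print(sample_size_per_variant(baseline=0.05, relative_mde=0.10))
```

Note how sensitive the result is to the MDE: doubling the detectable effect to 20% cuts the required sample size to roughly a quarter, which is exactly the lever described above for shortening an experiment.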
It’s important to remember that longer tests may produce more statistically accurate results but be too slow for the business, so you’ll need to adjust the time. If an experiment runs for a week and doesn’t deliver the results you expected, that’s one thing. But if it runs for three months and doesn’t reach the numbers, that’s definitely bad for the business.
3. The pre-experiment analysis is more important than the post-analysis
Teams often spend more time discussing an experiment after it runs, but it’s more important to spend that time analyzing before launch. Thinking things through before the experiment makes it more accurate and saves you time in the post-analysis.
These tips will help you prepare properly for an experiment and conduct it well.
Key elements to plan ahead:
- Target audience (cohort, device, service area) — always plan who you are going to experiment on and why. The audience has to be right for the results of the experiment to be valid.
- The number of variations — how many variants it makes sense to test. Do you prefer a simple A/B test or something more complex?
- The success metric — which one matters most for your experiment? Usually this is a single metric, but it can be two with strict guardrails. For example, you may want to improve AOV (Average Order Value) without negatively impacting CVR (Conversion Rate), or decide that CVR may decrease by no more than 1% in order to make the change.
- The time period of the test — how long does the test need to run to show results?
The pre-experiment analysis will tell you how many users you need to target: you want to focus on the sample size required first, and calculate how long it would take to achieve statistical significance if you enroll 100% of eligible users. This will tell you if you should run the experiment at all.
Determine how you want to enroll users:
- Do you want to limit the exposure of the experiment and spread out the enrollment longer?
- Do you want to enroll a number of folks quickly and then follow their behavior over a period of time?
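The sample-size and enrollment questions above can be sketched together: given a required sample size from a power calculation and an eligible-traffic estimate, you can estimate the run length up front. The traffic figures below are invented for illustration, and the enrollment-share parameter is a simplified stand-in for the throttling strategies just described:

```python
import math

def days_to_significance(required_per_variant: int, variants: int,
                         daily_eligible_users: int,
                         enrollment_share: float = 1.0) -> int:
    """Estimate how many days an experiment needs to reach its sample size.

    enrollment_share < 1.0 models limiting exposure by enrolling only a
    fraction of eligible users each day, which stretches the run time.
    """
    total_needed = required_per_variant * variants
    enrolled_per_day = daily_eligible_users * enrollment_share
    return math.ceil(total_needed / enrolled_per_day)

# 31,000 users per variant, two variants, 4,000 eligible users/day:
print(days_to_significance(31000, variants=2, daily_eligible_users=4000))
# Enrolling only half of eligible users roughly doubles the duration:
print(days_to_significance(31000, 2, 4000, enrollment_share=0.5))
```

Running this estimate at 100% enrollment first, as suggested above, gives you the floor: if even full enrollment takes too long, the experiment may not be worth running at all.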
Just remember that post-experiment analysis and communication are also needed. Don’t forget to return to your numbers after the experiment is done and see the outcomes.
How long should your experiment run?
Always ask yourself how long the experiment will need to run. The sooner you can achieve measurable results, the better for the business.
The best practice is to run experiments in 7-day increments to account for changes in user behavior across the days of the week. It may be 7 days, 14 days, or even 21 days, but keep counting in weeks so every variant sees the full weekly cycle.
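Rounding an estimated duration up to whole weeks keeps full day-of-week cycles in the data. A tiny helper sketch of that rule:

```python
import math

def round_up_to_weeks(days: int) -> int:
    """Round an experiment's run length up to the next 7-day increment,
    so weekday and weekend behavior are sampled in equal proportion."""
    return math.ceil(days / 7) * 7

print(round_up_to_weeks(10))  # -> 14
print(round_up_to_weeks(16))  # -> 21
```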
You may have options: sometimes you may want an MDE of 1%, but it would take too long to run that experiment. So you can adjust the MDE expectations to run the experiment faster.
It is easy to calculate the users needed to achieve different levels of MDE, and you may be OK with a larger MDE, especially if you believe the change will be significant. You may also reduce the number of variations to decrease the duration.
4. Being wrong is a good thing
The more wrong you are at the beginning, the more right you become later.
Imagine you have an idea, you run the experiment and you learn that the idea was not so good. Don’t feel bad about poor performance. It’s actually a good thing, especially at the beginning of the product journey.
One of the key benefits of experimentation is preventing incorrect assumptions from being fully implemented. It saves you the trouble of building something wrong and spending too much time on the wrong objectives.
Being wrong is a learning and decision-making tool that saves you time and money long-term.
Types of product experiments
Not all experiments are created equal; in the context of products, there are four types:
- Exploratory experiments — good for small startups and early-stage products that have an initial idea but no users yet, and want to validate whether the idea makes sense and is worth investing in.
- Validation experiments — good when you’ve completed the build and want to validate features before sharing them with all users. A great tool for learning whether the feature will have an impact, positive or negative.
- Optimization experiments — great to use when the product is already in the market and growing in multiple directions. For example, you can send different emails to different parts of your audience. Usually, these experiments are done by the marketing department.
- Quality of Service experiments — experiments performed on the backend side, typically users don’t see them. As an example, you can take the search for movies on Netflix. If you see that results are changing over time it means that Netflix engineers are running experiments to make the search more efficient for users.
When to experiment
The product development lifecycle consists of five main stages: ideation, definition (planning), development, release, and iterate. Each stage corresponds to different types of experiments.
- For the ideation and definition stages, exploratory experiments are the best to validate research and analysis-based hypotheses and to inform prioritization.
- For the development stage, quality of service experiments work better to prevent unintended consequences of implementation.
- For the release stage, validation experiments are great to validate the expected impact before the full launch.
- For the iterate stage go with the optimization experiments to personalize experiences, optimize user journeys and increase adoption.
Risks associated with the experiments
No experiment is without risk but you can align on what is acceptable risk ahead of time. Just remember that experimenting with real products always involves some risk.
Some common fears are:
- Results were purely random chance — even if you run the experiment at a 99% confidence level, there is still a 1% chance that the observed result is just noise.
To minimize the negative impact, communicate the risk clearly and align on the confidence level ahead of time. Understand beforehand how errors would impact your product.
- Not every experiment will reach statistical significance — the more variables there are within the experiment variation, the harder it becomes to isolate the impact and results can be muddied.
Remember that not all metrics are sensitive to short-term change. It’s critical to perform a pre-experiment analysis to minimize this risk.
- Chance the experiment will cause a bad experience for real users — experiments require real users to be exposed to the experience in order to measure their unprompted behavior. It may turn out to be a bad experience for users.
Remember that new features requiring secrecy are not great candidates for experiments; set performance guardrails and minimize exposure.
- Results will show that the feature is not performing as expected — even the best thought-out ideas may not perform as well with real users as expected, especially with certain types of users.
Once errors are ruled out, a poorly performing experiment can result in wasted resources. So use minimal resources wherever possible.
What risk is acceptable should always depend on the type of experiment, but there are some common error types that help you identify the actual risks.
False Positive (Type I error): the mistaken rejection of the null hypothesis when there is actually no difference in performance:
- Could lead to launching a feature that doesn’t actually improve success metrics.
- Less concerning for Exploratory and Validation experiments, as there are other opportunities to measure performance.
False Negative (Type II error): the mistaken acceptance of the null hypothesis when there actually is a difference in performance between control and treatment:
- Could lead to forgoing a change or a feature that would have actually improved your success metrics OR to a feature being shipped because “there is no difference” but there is a negative impact to key metrics.
- Less concerning for Optimization experiments, as there are other opportunities to optimize and these tend to be shorter experiments with small, incremental changes.
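To make the error types concrete, here is a sketch of the two-proportion z-test that commonly underlies A/B result readouts (stdlib only; the conversion counts are invented). A p-value below your chosen alpha, e.g. 0.05, means you reject the null hypothesis, and alpha itself is exactly the false-positive (Type I) risk you agreed to accept up front:

```python
import math

def two_proportion_p_value(conv_a: int, n_a: int,
                           conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates,
    using the pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf, then the two-sided tail probability
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Control: 250/5000 converted (5.0%), treatment: 300/5000 (6.0%)
p = two_proportion_p_value(250, 5000, 300, 5000)
print(round(p, 4))  # below 0.05 -> significant at the 95% level
```

Power, the flip side, is the probability of avoiding a false negative (Type II error); underpowered experiments are exactly the ones that fail to reach significance even when a real difference exists.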
And remember, even the bad experience is good in the experiment because it helps you to learn.
Experiments are not about hunches; they use real data, and that is why they’re a great tool for getting your product closer to reality. So experiment as much as you can, with as many things as you can, without risking your business goals.
Focus on short experiments that lead to the greatest impact, but feel free to play around with the key elements of your experiment.
Experiments are great for learning and delivering the best products to the market, so don’t miss the chance to do it!