A/B Testing: The Most Powerful Way to Turn Clicks Into Customers

"Step One: Define Success" (Chapter 2)

Since I am working for an e-commerce business, for our experiments the definition of success was easy to identify: purchases or transactions.

Created: 2019-11-16 01:54:20 Last updated: 2019-11-16 01:54:20

"Test Through the Redesign, Not After: Digg and Netflix
When it comes to making better data-driven decisions, the sooner the better. Often the temptation is (and we've heard this before) “Oh, we're doing a redesign; we'll do the A/B testing afterwards.” The fact is you actually want to A/B test the redesign.
Around 2010, we were introduced to the folks at Digg by their new VP of Product Keval Desai to talk about using Optimizely. Their response was, “We are busy working on a complete overhaul of our site. After we do that, then we'll do A/B testing.”
As Desai explains, the “Digg v4” redesign was a perfect storm of problems. The company rolled out a new backend and a new frontend at the same time, conflating two different sets of challenges. “It was a big bang launch,” he says. The backend couldn't initially handle the site traffic and buckled on launch day. What's more, despite faring well in usability tests, focus groups, surveys, and a private beta, the new frontend met with vociferous criticism when it was released to the public, and became a magnet for negative media attention. “When you change something, people are going to have a reaction,” Desai says. “Most of the changes, I would say, were done for the right reasons, and I think that eventually the community settled down despite the initial uproar.” But, he says, “a big-bang launch in today's era of continuous development is just a bad idea.”
“To me, that's the power of A/B testing: that you can make this big bet but reduce the risk out of it as much as possible by incrementally testing each new feature,” Desai explains. People are naturally resistant to change, so almost any major site redesign is guaranteed to get user pushback. The difference is that A/B testing the new design should reveal whether it's actually hurting or helping the core success metrics of the site. “You can't [always] prevent the user backlash. But you can know you did the right thing.”
Netflix offers a similar story of a rocky redesign, but with a crucial difference: they were A/B testing the new layout, and had the numbers to stand tall against user flak. In June 2011, Netflix announced a new “look and feel” to the Watch Instantly web interface. “Starting today,” wrote Director of Product Management Michael Spiegelman on the company's blog, “most members who watch instantly will see a new interface that provides more focus on the TV shows and movies streaming from Netflix.” At the time of writing, the most liked comment under the short post reads, “New Netflix interface is complete crap,” followed by a litany of similarly critical comments. The interface Netflix released to its 24 million members on that day is the same design you see today on netflix.com: personalized scrollable rows of titles that Netflix has calculated you will like best. So, in the face of some bad press on the blogosphere, why did Netflix decide to keep the new design? The answer is clear to Netflix Manager of Experimentation Bryan Gumm, who worked on that redesign: the data simply said so.
The team began working on the interface redesign in January 2011. They called the project “Density,” because the new design's goal was literally a denser user experience (Figure 3.7).
Figure 3.7 Netflix original site versus “Density” redesign.
The original experience had given the user four titles in a row from which to choose, with a “play” button and star rating under each title's thumbnail. Each title also had ample whitespace surrounding it—a waste of screen real estate, in the team's opinion.
The variation presented scrollable rows with title thumbnails. The designers removed the star rating and play button from the default view, and made it a hover experience instead.
They then A/B tested both variations on a small subset of new and existing members while measuring retention and engagement in both variations. The result: retention in the variation increased by 20 to 55 basis points, and engagement grew by 30 to 140 basis points.
The data clearly told the designers that new and existing members preferred the variation to the original. Netflix counted it as a success and rolled the new “density” interface out to 100 percent of its users in June 2011. As Gumm asserts, “If [the results hadn't been] positive, we wouldn't have rolled it out.” The company measured engagement and retention again in the rollout as a gut-check. Sure enough, the results of the second test concurred with the first that users watched more movies and TV shows with the new interface.
Then the comment backlash started.
However, as far as Netflix is concerned, the metrics reflecting data from existing and new members tell the absolute truth. As Gumm explains, the vocal minority made up a small fraction of the user base and they voiced an opinion that went against all the data Netflix had about the experience. Gumm points out, “We were looking at the metrics and people were watching more, they liked it better, and they were more engaged in the service.… [Both the tests] proved it.”
Gumm also makes the following very important point: “What people say and what they do are rarely the same. We're not going to tailor the product experience, just like we're not going to have 80 million different engineering paths, just to please half a percent of the people. It's just not worth the support required.”
Gumm then reminds us that despite a few loud, unhappy customers that may emerge, the most critical thing to remember is the data: “I think it's really important in an A/B testing organization, or in data-driven organization, to just hold true to the philosophy that the data is what matters.”" (Chapter 3)

One of my bosses was inclined to publish one section of a website redesign without running a split test, simply trusting our judgement on that section. I recommended running a split test on the entire redesign, the way Netflix did, without excluding anything. That is what we are going to do.

Created: 2019-11-21 21:13:20 Last updated: 2019-11-21 21:13:20

"If you're working on a major site redesign or overhaul, don't wait until the new design is live to A/B test it. A/B test the redesign itself." (Chapter 3)

One of my bosses was tempted to redesign the website header without running a split test. In a meeting I said that I would like to run a split test, and after that, implementing the header redesign without an A/B test was no longer among the options in management conversations. For this major site redesign (the header looks completely different in the new version), we are going to run an A/B test.

Created: 2019-11-16 19:15:43 Last updated: 2019-11-16 19:15:43

"Giving visitors fewer distractions and fewer chances to leave the checkout funnel by removing choices can help boost conversion rates." (Chapter 4)

After examining A/B test results, I sent the following recommendation to my bosses:

'The "All new elements" variant is clearly losing. That holds even with a two-sided hypothesis test at 99% confidence. Maybe we are adding too many distractions? See the way the "All new elements" variant appears on the checkout page on mobile:

'All new elements' variant

I think we have too many distractions around the "Place order" button. I think we should even remove the footer on that checkout page. No footer: the end of the page should be the "Place order" button. Nothing should sit below that button, because we do not want distractions at the critical moment when someone is about to buy. Maybe those new elements are good for the deal details page, not for the checkout page. We have to wait a few more weeks to see what the split tests reveal. But the results so far might be giving us a hint about the importance of not distracting customers on the checkout page.'
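
For reference, here is a minimal sketch of what a two-sided test at 99% confidence can look like for two conversion rates. It assumes a pooled two-proportion z-test, which may not be the method our testing tool actually uses. The function name `two_proportion_z_test` and all counts are hypothetical, not the real experiment data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: a difference at least this extreme in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: A = original, B = "All new elements" variant.
z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=400, n_b=10_000)
significant_at_99 = p_value < 0.01  # 99% confidence means alpha = 0.01
print(f"z = {z:.2f}, p = {p_value:.4f}, significant at 99%: {significant_at_99}")
```

A negative `z` here means the variant converts worse than the original. The two-sided p-value makes no assumption about the direction of the effect, so it is the more conservative choice when calling a variant a loser.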

Created: 2019-11-22 23:16:17 Last updated: 2019-11-22 23:16:17

"We usually give folks some pretty straightforward advice when they ask about how to improve their calls to action: verbs over nouns. In other words, if you want somebody to do something, tell them to do it." (Chapter 5)

For a button to subscribe to receive emails, the text that I am using on the button is the verb "Subscribe".

Created: 2019-11-16 19:29:39 Last updated: 2019-11-16 19:29:39

"Communicate Findings and Value to Your Team
Communicating A/B testing's findings and value to the team, whether it be large or small, is an important part of month one—and every month. Consider weekly, monthly, or quarterly results-sharing meetings with key stakeholders. Let them know what you've been up to. It will help the organization, as well as your career, because you've quantified your value in a way that may be difficult for roles that don't use testing.
“Stakeholder support and buy-in only happens if you do a good job of communicating and sharing things that you are learning,” explains Nazli Yuzak, Senior Digital Optimization Consultant at Dell. “Making sure that we are communicating our wins, communicating the learning, and sharing those large-level trends with the rest of the organization actually becomes an important part of the culture, because that's where we are able to showcase the value we bring to the organization.”
You want to let others in on what you've learned from your first tests. You can't always predict who within the organization will turn out to be an evangelist for testing. We've seen companies handle this communication many different ways. Having built a testing culture at three large e-commerce sites—Staples, Victoria's Secret, and Adidas—Scott Zakrajsek suggests sending straightforward emails with subject lines like “A/B Test Completion,” or “A/B Test Results.” Include screenshots of the variations and results in those emails: images are likely to be more memorable than just the results alone, as they give a clear indication of the evolution of the site over its optimization—“where we were” versus “where we are now.”" (Chapter 8)

I send emails to directors and managers involved in A/B testing to summarize final results or progress of our experiments, describing which variations are winning or losing in general or for specific segments. They seem to appreciate this information and communication.

Created: 2019-11-16 19:53:10 Last updated: 2019-11-16 19:53:10

"Testing without Traffic
The good news is that you need only two things to conduct an A/B test: a website with some content on it, and visitors. The more traffic you have the faster you will see statistically significant results about how each variation performed.
What A/B testing your site can't do, however, is generate that traffic in the first place. A blogger who's just getting off the ground and has only 100 visitors per month would be better off focusing primarily on content and building a following of users (bolstered perhaps by SEO or paid ads) who provide traffic to the site before delving into the statistics of optimizing that traffic. After all, you have to generate the traffic in the first place before you do anything with it. (In addition, in a site's fledgling period, a handful of conversations with real users will offer more feedback than you will get from an A/B test on a sparsely trafficked site.) While optimization can help even the smallest enterprise, it's also true that testing becomes faster, more precise, and more profitable the more user traffic you have to work with." (Chapter 11)

That is approximately the traffic my website received during the last 30 days: 103 users. My site is starved for traffic. For that reason, I am focusing primarily on content. I am not running experiments on my website yet.
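
To get a feel for how far 103 users per month is from a testable volume, here is a rough sketch using the standard normal-approximation sample-size formula for comparing two proportions. The 5% baseline conversion rate, the 20% relative lift, the 95% confidence level and the 80% power are all assumptions chosen for illustration, not measurements from my site.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided test of two proportions."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed inputs: 5% baseline conversion rate, 20% relative lift to detect.
n = sample_size_per_variant(baseline=0.05, relative_lift=0.20)
monthly_users = 103  # traffic reported above
months_needed = 2 * n / monthly_users  # two variants share the traffic
print(f"{n} users per variant, about {months_needed:.0f} months at current traffic")
```

Under these assumptions the required sample runs to thousands of users per variant, which would take many years at the current traffic level. That is consistent with the book's advice to build traffic first.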

Users of my website from Oct 30 to Nov 28, 2019

Created: 2019-11-29 22:20:55 Last updated: 2019-11-29 22:20:55