iOS 15: 7 Profound A/B Testing Differences To Expect Between App Store & Play Store

Get prepared in time and future-proof your ASO

Binh Dang
Geek Culture


Image by Martin Sanchez (Unsplash)

At WWDC 2021, Apple announced Product Page Optimization, a tool that realizes a long-held dream of iOS App Store Optimization (ASO): A/B testing. At last, Conversion Rate Optimization (CRO) on the App Store will get a chance to match Google Play, whose advantage has long come from the beloved Store Listing Experiments. However, as exciting as it seems, Apple's A/B testing platform won't be identical to Google's own.

In fact, we can expect at least seven fundamental differences between the two that could heavily impact overall CRO strategies when iOS 15 arrives. None of them is straightforward or transparent, and the uncertainties they entail could easily lead to complications. That's why it's important to address them early and plan accordingly, so you can get prepared in time and future-proof your ASO post-iOS 15. That's also why the following list of differences between App Store and Play Store A/B testing was made, along with examples of what you can do to handle them.

App Store vs. Play Store A/B Testing

1. Data availability

A/B testing is a quantitative research method, which means the statistical data behind each test is its lifeblood. If you collect and analyze enough data, you'll get reliable test results to scientifically validate or reject certain hypotheses. The ideas behind those hypotheses are, ultimately, what fuel CRO. That's why, when A/B testing on the App Store comes with different amounts and forms of data compared to the Play Store, the entirety of CRO changes.

Image by Luke Chesser (Unsplash)

Here are the most important — or, at least, actionable — differences you should mind:

  • Statistical significance:

Google offer a 90% confidence level for test results, whereas it remains a mystery whether Apple will offer anything comparable. In fact, whether they will report any confidence level at all is unknown, as the topic hasn't been mentioned in their announcements (yet).

Google are 90% confident about test results. How high will it be for Apple? (Source: Google)
  • Performance history:

Google show a graph that continually maps how each test variant performs over time. You have the option to review past performance for more detailed insights when needed.

Apple, on the other hand, have shown no evidence that they’ll offer similar features. They only mention you can “compare performance [of test variants] against your original product page throughout the duration of your test”. It remains to be determined whether such comparisons are available for both present and past performance.

  • Completeness of performance data:

Google only show the number of installs, scaled installs and conversion rate (CVR) differences among test variants. Hence, you can only estimate how much better or worse variants are relative to each other. By contrast, Apple will show the traffic size (impressions) and precise CVR of each variant, on top of the CVR differences (improvement). This means it'll be up to you to either investigate individual versions or compare them against each other.

Based on these differences, you can plan to do the following before iOS 15:

  • Be prepared to calculate statistical significance yourself:

Since Apple will show both the CVR and impressions of each test variant, you can calculate the statistical significance of each test result yourself, or use an online tool like AB Testguide. Both approaches can be equally reliable; all you need to do is pick one.
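If you do want to compute it yourself, the standard approach is a two-proportion z-test on installs versus impressions. Here's a minimal sketch with made-up figures (the function name and numbers are illustrative assumptions, not anything from Apple's tooling):

```python
from math import erf, sqrt

def ab_significance(conv_a, imp_a, conv_b, imp_b):
    """Two-proportion z-test on install counts and impressions.
    Returns (relative uplift of B over A, two-sided p-value)."""
    p1, p2 = conv_a / imp_a, conv_b / imp_b
    pooled = (conv_a + conv_b) / (imp_a + imp_b)
    se = sqrt(pooled * (1 - pooled) * (1 / imp_a + 1 / imp_b))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p2 / p1 - 1, p_value

# Hypothetical test: original gets 900 installs from 30,000 impressions (3.0% CVR),
# variant gets 1,020 installs from 30,000 impressions (3.4% CVR)
uplift, p = ab_significance(900, 30000, 1020, 30000)
print(f"uplift: {uplift:.1%}, p-value: {p:.4f}")
# Significant at the 95% level if p < 0.05
```

The same inputs pasted into a tool like AB Testguide should produce matching numbers, which is a handy sanity check.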

  • Be ready to record performance history yourself:

In case Apple indeed won’t show performance histories during A/B tests, avoid getting surprised by collecting daily performance data yourself. This will ensure no figures are lost. As long as the records remain, you can easily map out the performance history yourself. All it takes is a spreadsheet.
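As a sketch of what that daily record could look like, here's a hypothetical logging script (the file name, variant names and figures are all made up; in practice the numbers would be copied from your App Store Connect dashboard):

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical local log for Product Page Optimization daily snapshots
LOG = Path("pp_optimization_history.csv")

def log_daily(variant: str, impressions: int, installs: int) -> None:
    """Append one day's figures for a test variant to the CSV log."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "variant", "impressions", "installs", "cvr"])
        writer.writerow([date.today().isoformat(), variant,
                         impressions, installs, round(installs / impressions, 4)])

# Example daily entries for two variants
log_daily("original", 4200, 130)
log_daily("variant_b", 4100, 145)
```

With a few weeks of rows like these, charting each variant's CVR over time in any spreadsheet tool reproduces the performance history graph Google already give you.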

2. Choice of assets

On Google Play, all localizable assets except the App Title/Name are eligible for A/B testing. What’s more, they include both visual and textual elements. This provides enough capacity for incredibly versatile CRO strategies.

By contrast, only visual assets have been confirmed to be eligible for A/B testing on the App Store. They are the Icon, Screenshots, and Preview (Videos). No further insights have been given in relation to text assets.

However, judging by the different assets allowed in Product Page Optimization and Custom Product Pages, there will likely be none. Specifically, Apple mentioned the Promotional Text in relation to the latter, but not the former. Here's what to consider:

  • Apple did consider text assets and decided to let the Promotional Text be customized. It's unlikely they simply forgot about text in the development or announcement of either iOS 15 feature. This is one piece of evidence that Apple intentionally left out text asset A/B testing.
  • Apple announced both Product Page Optimization and Custom Product Pages in the same section at WWDC21, after WWDC21, and on their website. Yet the Promotional Text was specifically listed only in relation to the second feature. This is another piece of evidence pointing the same way.
  • So, at least two pieces of evidence suggest that Apple won't let us test text assets — at least not immediately with iOS 15.
Differences between the two features (Source: Apple)

Of course, this is yet to be confirmed. However, when in doubt, preparing for the worst-case scenario is the safest bet. If that scenario turns out to be true, you'll be left with a much narrower choice of assets in A/B testing on the App Store compared to the Play Store. That translates to the following pro and con you should plan for:

  • The pro:

You can offset the lack of options with greater focus. Plan and test with fewer assets so you have less to worry about. After all, focus is sometimes more impactful than variety, so put it at the center of your CRO strategy.

  • The con:

You can't A/B test the copy. If you still want to validate ideas to optimize it, fall back on pre-iOS 15 options. These include pre-post incrementality analyses, fake landing pages and Search Ads creative sets, among others. Mixing them within the same strategy may not be a bad idea either, so try it out.

  • The bonus:

Watch out for Apple's future moves. They may add text-based A/B testing to Product Page Optimization next. If and when they do, proper observation will let you get informed fast and adjust in time.

3. Customizability

This also ties into the relationship between custom app store presence and A/B testing: while you can surely run experiments on custom Play Store listings, whether the same will hold for Custom Product Pages remains a puzzle. If yes, the possibilities for iOS CRO would be immensely broadened, e.g. with something like "Custom Product Page Optimization" — but that's a big "if".

In fact, some evidence suggests it's unlikely to be the case. As presented earlier, the Promotional Text is proof of the disconnect between Custom Product Pages and Product Page Optimization, since it can be customized but not A/B tested. Another proof is the App Icon, which can be tested but not customized. If A/B tests could run on Custom Product Pages, the App Store would become glaringly inconsistent.

The problem is that this doesn't sound like something Apple would tolerate, which suggests they won't allow any crossover between the two features at all. Hence, it's safer not to expect to A/B test Custom Product Pages.

How connected Custom Product Pages and Product Page Optimization will be will depend on Apple’s tolerance towards the two’s inconsistencies (Source: Apple)

There’s still hope, though. While the Promotional Text and Icon are where the two features diverge, they do converge at the Screenshots and Previews. If we’re lucky, Apple will let us test one of these two assets, or both, on Custom Product Pages. Although this scenario sounds far-fetched, it remains a logical possibility and shouldn’t be dismissed too soon.

For now, and before iOS 15 arrives, what you can do is keep watching for more news from Apple and plan for both scenarios:

  • No A/B testing with Custom Product Pages:

You could still conduct experiments for CRO with Custom Product Pages without A/B tests. Remember the fake landing pages method? Well, you can follow similar principles and turn Custom Product Pages into “cloned landing pages” to serve idea validation purposes.

Theoretically, this would require, for instance: one traffic source, one campaign, two identical audience or ad groups, and two different URLs, which link to two different Custom Product Pages, which then show two variants of some asset. Practically, the full extent of this method's applicability is unknown. It's therefore important to run a pilot project first to see whether the setup is technically possible, feasible and profitable at all.

  • Partial A/B testing with Custom Product Pages:

A/B testing on Custom Product Pages is, as said, still possible with assets eligible for Product Page Optimization, namely the Screenshots and Previews. If this becomes reality, you'd need custom CRO strategies: one focus on assets that are both testable and customizable, and another on assets that are only one or the other.

For example, suppose you wanted to customize the US localized Product Page to target 10 market segments with 10 user personas, but only the Screenshots were both testable and customizable. Suppose you also wanted to run an A/B/C/D test. The asset production plans would then need to cover up to 40 sets of Screenshots, but only up to 10 Promotional Texts and four Icons. If this pattern persists, you'd ideally need three such plans to avoid mixing up different processes and requirements. Better to plan now than later.
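The counting behind that example is simple enough to sketch (all numbers are hypothetical, following the scenario above):

```python
segments = 10        # hypothetical market segments, one persona each
test_variants = 4    # an A/B/C/D test: original plus three treatments

# Screenshots are both customizable (per segment) and testable (per variant),
# so the plan multiplies out across both dimensions
screenshot_sets = segments * test_variants
# Promotional Text is customizable only: one per segment
promo_texts = segments
# The Icon is testable only: one per variant
icons = test_variants

print(screenshot_sets, promo_texts, icons)  # 40 10 4
```

The multiplication is the point: any asset that is both testable and customizable scales with the product of segments and variants, which is why it dominates the production workload.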

4. Icon testing

The Icon is a vital visual asset of your app, which expresses its sense of identity. If anything, it should be the one consistent asset across the app stores. Unfortunately, it isn’t.

First, you can freely upload new Icons to test them only on Google Play, not the App Store. Apple will require all variants of the Icon (both on-device and in the Store) to be added to the app's binary beforehand and reviewed together with new app releases. So if those releases are rejected or delayed, A/B testing will be affected.

Second, also related to the app binary requirement, while Google allows independent changes between on-device and in-store assets, Apple will enforce some degree of dependency between them for the Icon. Specifically, if you apply an App Store Icon variant via Product Page Optimization, the matching on-device Icon variant will automatically replace the original.

Change App Store Icon, change on-device App Icon (Source: Apple)

With this in mind, here’s what to consider before iOS 15:

  • Schedule Icon tests with the product or app development team:

If an upcoming release is deemed too risky for some reason, don’t schedule it with an Icon test. Similarly, if the product development roadmap indicates potential delays in app releases that your Icon tests can’t afford, schedule them with separate dates.

  • Plan on-device App Icon tests after applying App Store Icon variants:

Apple announced the required match between App Store Icon and on-device App Icon only in the context of Product Page Optimization. It's unknown whether the same will be required outside of it. In particular, after a variant is applied on the App Store and becomes the new default Icon, can you independently change the on-device Icon back to the original?

This is possible pre-iOS 15, e.g. via Xcode, without A/B tests in the picture. If it stays that way post-iOS 15, you could perform sequential analyses and measure metrics like sessions, DAUs and retention rate to learn two things: whether the new home screen Icon has an impact of its own beyond the App Store Icon, and whether that impact is significantly positive or negative. In short, post-install performance could be monitored to gain more insight into Apple's enforcement.

Overall, Icon testing will play different roles across the app stores. While it's purely ASO on Google Play, it will play a much greater role on the App Store, with potential ties to product, backend and retention. Therefore, the earlier you understand these differences, the sooner you can get prepared for iOS 15.

5. Scalability

Localization is a vital piece of ASO in general and CRO in particular. That’s why it’s crucial to be able to run A/B tests on localized Product Pages. On top of that, the more localized tests you can run simultaneously, the more efficiently you can scale overall CRO.

On Google Play, up to five such tests are allowed (not counting Custom Store Listings). On the App Store, the number is unknown. If it's identical, you can scale CRO on both stores the same way, and not many adjustments will be required. By contrast, if one store allows significantly more simultaneous tests than the other, you'll need to plan ahead and account for the differences to make informed decisions.

Why does it matter? Two reasons:

  • In standard ASO:

If you can upscale CRO for an app store, you can fast-track learning what works and what doesn't. The more you learn, the better informed your decisions. That's why different scalability results in different CRO performance levels. Eventually, asset production, human resources, time and effort will carry different values between app stores, and should be managed differently.

  • In unusual ASO, such as seasonal campaigns or extraordinary contexts like the COVID-19 pandemic:

Imagine you have three months to test all assets for 10 localizations before the launch of a critical marketing campaign. For the same idea or hypothesis, you'd need to go through two rounds of five localized experiments on the Play Store to cover them all. If each round takes three weeks to yield statistically significant results, you'd need 1.5 months to complete a full series.

This means a three-month time frame would allow two ideas to be tested before launch. If Apple allows fewer than five simultaneous tests, then under the same conditions 10 localizations would be too many for three months. You'd either need to test fewer ideas, test fewer localizations, or get more time.
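That back-of-the-envelope math generalizes to any combination of limits. Here's a small sketch (the App Store parallel-slot figure is a made-up placeholder, since Apple's real limit is unknown):

```python
from math import ceil

def full_series_weeks(localizations: int, parallel_slots: int,
                      weeks_per_test: int) -> int:
    """Weeks needed to test one idea across all localizations,
    given how many tests can run at the same time."""
    rounds = ceil(localizations / parallel_slots)
    return rounds * weeks_per_test

def ideas_testable(weeks_available: int, localizations: int,
                   parallel_slots: int, weeks_per_test: int) -> int:
    """How many ideas can complete a full localized series in the time given."""
    return weeks_available // full_series_weeks(
        localizations, parallel_slots, weeks_per_test)

# Play Store: 10 localizations, 5 parallel tests, 3 weeks each, ~13 weeks (3 months)
print(ideas_testable(13, 10, 5, 3))   # 2 ideas fit
# Hypothetical App Store limit of 3 parallel tests under the same conditions:
print(ideas_testable(13, 10, 3, 3))   # only 1 idea fits
```

Plugging in whatever limit Apple eventually announces shows immediately how much your testing roadmap has to shrink or stretch per store.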

Differences in scalability mean inconsistencies between App Store and Play Store A/B testing and imbalances in overall CRO (Image by Barbara Horn, Unsplash)

In order to be better protected against iOS 15-induced uncertainties, consider the following:

If it's consistency you're after, e.g. every idea is tested in all localizations at least once, then you'll need to think about adjusting the pace. For example, you can increase the pace of A/B testing on the store with lower scalability by giving each test in each localization less time to run, while doing the opposite on the other store and making its A/B tests "wait". This isn't recommended with low traffic, though, as cutting tests short may leave the results statistically insignificant.

Alternatively, if the strategy is about optimal run rates, then it wouldn't make sense to "wait". The pace should be accelerated whenever viable, and A/B testing on one store will naturally be left behind. Since there's nothing we can do about that — at least not yet — you can shift the focus and adjust the efforts instead. If CRO inevitably lags on one of the app stores, invest more effort there in other elements, like keyword optimization.

Lastly, sometimes you'll find yourself in a tight spot where time is short and A/B testing must complete before a project or campaign rolls out. You'll need both high pace and high effort on both stores to meet deadlines. This is when adjusting priorities helps. If you can't run tests to optimize all localizations or all assets, do it only for the critical ones. That will make sure the "must-haves" for your campaigns are optimized in time.

6. Setup flexibility

The extent to which you can flexibly set up A/B tests will also vary between the app stores. The variance stems from two factors:

  • Individual assets:

While Google allow both visual and textual assets to be A/B tested, Apple will let you test only the former. This means even if you can translate creative ideas, concepts, messages or hypotheses into appealing copy, you can’t set it up for testing. That’s half as flexible as Google Play.

Google allow more assets to be tested (Source: Google)
  • Combinations of assets:

Test setups on Google Play can get even more flexible when multiple assets are combined. Specifically, you may have text-text, visual-visual or visual-text combo experiments. On the App Store, it’s also possible to combine several assets in a test (otherwise Apple wouldn’t advise us to limit them). However, the best you could do is set up only visual-visual combo experiments. That’s a third as flexible as the Play Store.

Apple also allow multiple assets per test (Source: Apple)

So, what should we do when we are less flexible? Answer: Increase the depth. No testing with text assets means more testing with visual assets. Here’s what “more” to consider:

  • Logo:

The logo is the face of your whole brand, not just the app. Without the need to worry about text assets, there’s more capacity to play with how it’s shown. Sizes, positions, compositions, etc., could be the grounds on which test hypotheses are formulated.

  • Colors:

Testing with the color palette isn’t new, but it isn’t ubiquitous either. If you don’t have to worry about text assets, you can invest in finding and validating more ways to play with individual colors and their mixes.

  • Wordings:

Just because text assets aren’t testable doesn’t mean you can’t play with words. Certain treatments like word marks on the Icon or captions on the Screenshots can also prove to be great places to experiment. With the absence of text assets, they are the only means through which you can observe the effects of different wordings on conversion.

  • Imagery:

Visual assets won’t be complete without the images they rely on to deliver the messages. When no texts can be tested, CRO depends more on how images are utilized to persuade users. This makes them relatively more impactful than on the Play Store.

7. Creative freedom

A/B testing isn't only about the technical factors influencing CRO. On top of the test designs, setups and hypotheses, it must also cover the creative ideas, concepts and stories. These get translated into the tangible app store assets you must upload before any test can run. Thus, it's important to have enough creative freedom to allow bold and innovative ideas to be tested and, in turn, contribute to CVR uplifts.

Of course, Apple and Google allow different extents of such creative freedom. On the Play Store, test assets never seem to be under Google's review. All policies and metadata guidelines are meant for Store Listings, not Store Listing Experiments. Therefore, it's safe to say you can virtually test anything with any eligible asset without restrictions, as long as you don't apply the "risky" variants (I've tested and validated this assumption myself).

However, this begs a question: if you won't apply it, what's the point of testing it? Let's answer that with another question: what's the point of A/B testing? We test to learn. We can first learn which ideas could increase CVR, then learn how to execute those ideas without violating policies. In short, A/B testing on the Play Store lets you isolate the story from the storytelling, or the substance from the presentation. This gives test results more meaning.

The learnings are more important than the tests (Image by Robo Wunderkind, Unsplash)

On the App Store, such an isolation between the story and the storytelling won't be possible. As per Apple's announcements, all individual test variants will be reviewed independently. If you use them to test a brave and bold yet risky idea and they get rejected, you'll never know whether the idea works; you'll only know the execution doesn't. Plus, test schedules would be delayed. This is why you'll have to deal with much less creative freedom in App Store A/B testing compared to the Play Store.

What does this mean for iOS 15 preparation? Answer: Your CRO strategies should be more conservative for the App Store and aggressive for the Play Store. This means:

  • Explore bold ideas on Android, consolidate them on iOS:

Store Listing Experiments should be the testing ground for bold and risky ideas, as they allow higher creative freedom. Over time, you can identify which of them benefit CRO. Then you can move on to test and learn which presentations of those ideas are acceptable — whichever Google allow you to apply.

Subsequently, such learnings can be consolidated further with Product Page Optimization. If you find an asset iteration accepted by Google, chances are Apple will be fine with it as well, given their similar guidelines and policies. That’s how to test bold ideas with minimal rejection risks on the App Store.

  • Run tests incrementally from safe to bold ideas on iOS:

While waiting for Play Store A/B tests to “show the way”, App Store A/B tests could also run independently to “find the way”. Start with safe ideas first, then proceed with incrementally riskier ones, until you approach the “restricted areas”. It will show how bold Apple will allow your ideas to be. CRO will take more time, but at least you won’t be stuck while waiting for Android test results, or misguided by untested assumptions.

In short, start aggressively on Android, then scale down over time. Simultaneously, start conservatively on iOS, then scale up gradually.

Summary

A/B testing is coming to the App Store "later this year". This gives all of us ASO enthusiasts great hope that App Store CRO will finally get a chance to pick up the pace. Yet, with at least seven fundamental differences between App Store and Play Store A/B testing, the journey ahead won't be easy. Here's where they differ:

  • Data availability
  • Choice of assets
  • Customizability
  • Icon testing
  • Scalability
  • Setup flexibility
  • Creative freedom

The uncertainties each of these presents cloud the future of App Store CRO. One way to deal with them is to plan ahead and future-proof your ASO. Directions for such plans have been outlined above. Time to put them to the test as early as possible and see where it leads.
