Test Using a Minimally-Sufficient Sample Size

Please Note: If my university statistics professors are reading this, I apologize for glossing over the concepts of "standard deviation," "confidence levels," and different kinds of distributions. However, these simplifications have worked well enough for my clients to earn millions of dollars and squeeze out the vast majority of the potential profit from their mailings.

To my readers: Here is what you need to know from a few college-credit courses in math and statistics, summarized into 718 words...

Suppose your goal is to test how frequently a flipped coin lands on "tails." How many test flips would you need in order to have reasonable confidence that the results of your test would closely predict the results of 1 million coin flips?

To explore that question, let's imagine the extremes.

  • Imagine that you flipped the coin just once, and it landed on tails that one time. Based on the test results of your sample size of 1, would you now be confident in predicting that the coin would land on tails in 100% of future instances? Of course not. The test's sample size was too small.
  • On the other hand, imagine that you flipped the coin 100,000 times and it landed on tails 50,007 times. Based on your sample size of 100,000, would you now be confident in predicting that the coin would land on tails in approximately 50% of the remaining 900,000 flips? Yes, sure. But you worked too hard to find that out. (Your poor thumb probably has calluses on its calluses by now.)

The trick is to choose a sample size that is large enough to provide a statistically valid result, but not so large that you use up too many of the names on your mailing list in the testing portion before roll-out.

So, if you had a mailing list size of 1 million people, what is the right sample size: 1,000? 5,000? 25,000? Suppose that your whole mailing list size was 20,000: What would be the right sample size then?

My answer is: "As many as it takes to generate at least 50 successful responses for each element that you're testing." That translates to a simple equation:

50 ÷ Guesstimated Response Rate = Minimally-Sufficient Sample Size

For instance, suppose that your educated guess is that the response rate for a particular test will be 2%. Then, the minimally-sufficient sample size is 50 ÷ 2% = 2,500. On the other hand, if you guess that the response rate will be only 1%, then the minimally-sufficient sample size is 50 ÷ 1% = 5,000.
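The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the rule of thumb above; the function name `min_sample_size` and its parameter names are made up for this example, and the response rate is written as a fraction (2% is 0.02).

```python
import math

def min_sample_size(guessed_response_rate, target_responses=50):
    """Names to mail per test element so the expected number of
    responses reaches target_responses (hypothetical helper)."""
    if not 0 < guessed_response_rate <= 1:
        raise ValueError("response rate must be a fraction between 0 and 1")
    # round() strips floating-point noise before rounding up to a whole name
    return math.ceil(round(target_responses / guessed_response_rate, 9))

print(min_sample_size(0.02))  # guessed response rate of 2%
print(min_sample_size(0.01))  # guessed response rate of 1%
```

If the division does not come out even, the result is rounded up, since mailing one name too few would leave the expected response count just short of 50.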

To demonstrate that this makes sense, let's return to the coin flip example. Let's assume that your educated guess is that tails will appear 50% of the time that the coin is flipped. In that case, the minimally-sufficient sample size for your test is 50 ÷ 50% = 100. In other words, it will take a sample size of 100 coin flips to have reasonable confidence that the results of your coin-flip test will closely predict future results.

I encourage you to do this experiment yourself. Flip a coin 100 times. Note how many times "tails" appears. Now try the test again. And again. You'll see that tails appears between 40% and 60% of the time in almost every test, and between 45% and 55% of the time in roughly seven tests out of ten. If tails appears 48 times out of 100 in your test, then you would be right to predict that if you flipped that same coin 100,000 times, tails would appear approximately 48,000 times. If that same coin were flipped 1 million times, you would be right to predict that tails would appear approximately 480,000 times.
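If you would rather spare your thumb, the experiment can be simulated. The sketch below assumes a perfectly fair coin and uses Python's standard pseudo-random generator; the helper names `tails_count` and `share_within` are invented for this example. For a fair coin, the exact binomial arithmetic puts 45 to 55 tails at roughly 73% of 100-flip tests and 40 to 60 tails at roughly 96%, so the simulated shares should land near those figures.

```python
import random

def tails_count(flips, rng):
    """Count tails in `flips` tosses of a simulated fair coin."""
    return sum(rng.random() < 0.5 for _ in range(flips))

def share_within(low, high, flips=100, trials=10_000, seed=1):
    """Fraction of repeated tests whose tails count falls in [low, high]."""
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    hits = sum(low <= tails_count(flips, rng) <= high for _ in range(trials))
    return hits / trials

print("45 to 55 tails:", share_within(45, 55))
print("40 to 60 tails:", share_within(40, 60))
```

Changing `flips` to 1,000 or 10,000 shows how the band tightens as the sample grows.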

Now, let's apply this technique to email advocacy. Suppose that the necessary sample size is 5,000 and you need to test 3 different concepts. In that case, you would use up 15,000 emails for the test prior to roll-out. If your total list size is 100,000, then you would send the winning email concept to the remaining 85,000 people on your list. If your total list size is 400,000, then you would send the winning email concept to the remaining 385,000 people on your list.
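The split between test and roll-out is simple subtraction, sketched here with a hypothetical helper named `rollout_plan` (not something from any email tool). It also guards against the case where the test would need more names than the list contains.

```python
def rollout_plan(list_size, sample_size, concepts):
    """Return (names used in testing, names left for roll-out)."""
    test_total = sample_size * concepts
    if test_total > list_size:
        raise ValueError("the list is too small to test this many concepts")
    return test_total, list_size - test_total

for list_size in (100_000, 400_000):
    tested, remaining = rollout_plan(list_size, sample_size=5_000, concepts=3)
    print(f"list {list_size:,}: test {tested:,}, roll out to {remaining:,}")
```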

If you increase your sample size beyond the amount necessary to generate 50 responses for each element tested, your prediction of future results will be slightly more accurate, slightly more of the time. But those improvements will be small. And more of your list will have received failed test concepts rather than the roll-out.
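One common way to put rough numbers on these diminishing returns, which is not the method used in this article, is the textbook normal approximation for a response rate. Under that assumption, the margin of error relative to the true rate depends mostly on how many responses you collect. The function name `relative_margin` and the 2% response rate are illustrative choices only.

```python
import math

def relative_margin(responses, response_rate, z=1.96):
    """Approximate margin of error as a fraction of the response rate.

    Assumes the normal approximation to the binomial; z=1.96 corresponds
    to roughly 95% confidence. Illustrative only.
    """
    return z * math.sqrt((1 - response_rate) / responses)

for responses in (50, 100, 200, 400):
    margin = relative_margin(responses, response_rate=0.02)
    print(f"{responses} expected responses: about ±{margin:.0%} of the rate")
```

Because the margin shrinks with the square root of the response count, quadrupling the sample only halves the margin.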

Caution: Due to random variation (aka "Lady Luck"), no testing is perfectly predictive. Even when mailing to a statistically-valid sample size under perfectly representative conditions, random variation will cause your predictions to be wrong approximately 5% of the time. (Think of it this way: Even when a sports team is hugely favored to win, improbable circumstances cause them to lose every once in a while.)

Terms of Use | Copyright © 2007 IssueMarketing.com All Rights Reserved.