Sample Sizes and Statistical Significance

Not too long ago, a certain anti-vaccine individual decided that he was going to get his Master of Public Health degree in epidemiology at the same university where I got mine. (It was yet another troubling indicator that has since led me to believe that he wanted to emulate me, be the anti-Ren.) Over the years that he has been active online, writing incredibly racist and troubling things about people who promote the safe and rigorous use of vaccines, this individual has also mocked my epidemiological and biostatistical abilities.

At one point, in a discussion about p-values, he decided that increasing the sample size of a very small study would make the apparent association between a vaccine and a condition more statistically significant. The p-value was borderline significant, and I pointed out that increasing the sample size was necessary, and that it might even make the p-value not statistically significant (greater than 0.05). To this, he commented that I was somehow embarrassing the university where I was working on my DrPH because, in his mind, increasing the sample size of a study only makes the observed p-values more significant.

I didn’t have time back then to educate him on the nature of biostatistics and how there is the very real possibility that increasing the sample size of something may make the p-value of an association less significant. It’s actually effortless to understand the concept if you think about it. Let me explain with an example.


Suppose that you walk into a bar and there are 14 people there. You know from your reading about the bar that people there can wear either a red or a green shirt, and you theorize that people who wear red are more likely to be drinking beer. Under random conditions, the proportion wearing red would be 50%, and the proportion wearing green would of course be 50%. Similarly, the proportions drinking beer and drinking mixed drinks would be 50% each.

So you walk in and see that nine people are drinking beer, and seven of them are wearing red shirts. Of the five who are drinking mixed drinks four are wearing green shirts. Thus, 7 of 8 (87.5%) wearing red are drinking beer, and 2 of 6 (33.3%) not wearing red are drinking beer. Your theory has been proven correct, and you decide to publish your findings. But is that really all that influenced the observed association? What about all the other variables?

A quick chi-square analysis, just for kicks.

You ask the bartender about drinking habits, and he tells you that women tend to drink mixed drinks while men tend to drink beer. You look around, and you see that all ten of your subjects are men. That changes the math. Out of ten men, eight are drinking beer and two are not. Not only that, but seven are wearing red and three are not. Maybe there’s a preponderance of red not because of the beer but because men tend to wear red?

Not only that, but the bartender tells you that you’ve come in on a weekday. “Come in on Saturday,” he says. “That is ladies’ night.” And you do because who doesn’t like ladies’ night? That night, there are 100 people in the bar, the male to female ratio is 1 to 1, and you see that most men are wearing red while most women are wearing green. Furthermore, you see that most women are drinking mixed drinks while most men are drinking beer. So was it the shirts that predicted the drinking? Or was it gender?

Or was it the day (or night) of the week? The bartender also tells you that beers are $1 on Tuesday nights starting at 6pm, just after the nearby [stereotypical male-oriented worksite] lets out. But mixed drinks are $1 on Saturdays starting at 1pm, just after the nearby [place where women congregate]. So maybe gender had nothing to do with it? (You might suggest to the bartender to flip the script so you can see if men will drink mixed drinks when they’re $1 and women will drink beers when they’re $1.)

My examples are ridiculous, I know.

In the ladies’ night scenario, you’ve increased your sample size from ten to 100, and you washed out the influence of shirts on drinking habits and replaced it with the true (maybe?) influence of gender. It happens all the time. It’s the reason why you can’t go with small studies, and the reason why increasing the sample size will not always strengthen the statistical significance (or lack thereof) of your observation. You dig?

A Video Says It All

This video explains this and other sources of bias in studies that we are grappling with as scientists. (The effect I mentioned above is explained around 5 minutes and 55 seconds into the video.)


The online programs in public health have multiplied exponentially, it seems, in the last few years. More and more students are going straight from their undergraduate degree into a public health master’s degree without ever really having much experience in public health (or in life). This is why I’ll be the first one to remind you that an MPH or MHS or MS (or whatever) in public health is meaningless if the person holding it decides to be anti-science and a vitriolic antivaxxer.

Should you not trust someone with such a degree? Yes, but verify that what they say and do jives (or jibes) with what they claim to know. The particular antivaxxer’s criticism of me went unanswered because I really don’t have time for his shenanigans, and everyone who wanted to verify my epidemiological and biostatistical knowledge can just come to my blog or get to know me.

My work speaks for me… And, boy, have I been working a lot lately. But that’s for a later post.

2 Comments on “Sample Sizes and Statistical Significance”

  1. “I’ll be the first one to remind you that an MPH or MHS or MS (or whatever) in public health is meaningless if the person holding it decides to be anti-science and a vitriolic antivaxxer.”

    Another prime example: Dana Ullman. He is big on homeopathy.

    So, I must say: thank you for this important message.

    Also, good luck on your new job.

    Liked by 1 person

%d bloggers like this: