Science Lesson: How Understanding ‘Confounding’ Can Combat Anti-Vaping Junk Science

Most vapers who have tried to understand scientific claims (or argue against anti-vaping junk science) have acquired an intuitive understanding of the concept known as confounding. You may have found yourself arguing, “of course kids who vape are more likely to smoke for many reasons, but that does not mean it is a gateway.” Or you may have thought, “it is misleading to compare the health status of vapers to never-smokers, because vapers are ex-smokers who still have residual health effects from smoking.” Most people understand these points without knowing that they are talking about confounding, but it can still be helpful to fully understand the concept.

The word appears in most health research papers, and sometimes in popular press summaries of them. But most readers end up with a faulty understanding of the concept because the authors of those articles have faulty understandings. They confuse the misnamed “confounder” variables with the confounding itself.

Studies let us observe associations between exposures and outcomes (e.g., lung disease is more common in vapers than never-smokers), but we are usually interested in causation (e.g., is vaping the cause of that lung disease?). You may have seen claims that some types of studies show causation, while others only show association. This is flat-out wrong. We cannot observe causation, ever. We can only infer it from observed associations and other information. But it is true that some studies are more likely to produce an association that results from confounding rather than causation.

Confounding exists when there is a difference in the outcome between exposed people and unexposed people that is not caused by the exposure. Vapers are more likely than average to have lung disease even if vaping causes no lung disease (it is possible that it does, but the evidence does not support that claim). In this case, it is easy to predict that past smoking is the major source of the confounding. But it is also possible to have serious confounding without such an obvious confounder.
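To make the past-smoking example concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and is not an estimate from any study. It assumes that lung disease risk depends only on smoking history, so vaping causes nothing at all, and that vapers are mostly ex-smokers while non-vapers are mostly never-smokers.

```python
# Hypothetical numbers for illustration only; none come from a real study.
# Assumption: disease risk depends ONLY on smoking history, never on vaping.
RISK_EX_SMOKER = 0.10      # assumed lung disease risk for ex-smokers
RISK_NEVER_SMOKER = 0.02   # assumed lung disease risk for never-smokers

# Assumed share of ex-smokers in each exposure group.
SHARE_EX_SMOKERS = {"vapers": 0.90, "non_vapers": 0.10}


def disease_risk(share_ex_smokers):
    """Average risk in a group, given what fraction of it are ex-smokers."""
    return (share_ex_smokers * RISK_EX_SMOKER
            + (1 - share_ex_smokers) * RISK_NEVER_SMOKER)


risk_vapers = disease_risk(SHARE_EX_SMOKERS["vapers"])
risk_non_vapers = disease_risk(SHARE_EX_SMOKERS["non_vapers"])

# The observed (crude) association between vaping and lung disease.
crude_risk_ratio = risk_vapers / risk_non_vapers

print(f"Risk among vapers:     {risk_vapers:.3f}")
print(f"Risk among non-vapers: {risk_non_vapers:.3f}")
print(f"Crude risk ratio:      {crude_risk_ratio:.2f}")
```

With these made-up inputs, vapers show more than three times the disease risk of non-vapers, even though the sketch gives vaping no causal effect. The whole association comes from the difference in smoking history between the two groups.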

Researchers try to “control for” confounding by putting other variables into their calculations which, if everything were perfect, would remove the association due to confounding from the result, leaving only an estimate of causation. These variables get called “confounders,” which is very confusing. They are almost never the actual source of the confounding, but readers are given the impression that they are. These control variables should be called “deconfounders,” since they are a way to reduce the effect of confounding. However, they seldom do so very effectively, both because they are usually only rough proxies for the real source of confounding and because they are used incorrectly.

So, for example, the deconfounder used to adjust away the effect of past smoking might be number of years smoked, an imperfect measure since it ignores intensity of smoking and other variables. Worse, when this is put into the calculations to try to adjust for the effect, it will typically be assumed that each additional year of smoking causes the same increase in risk, which we know is not true. As a result, the confounding from past smoking will only be partially eliminated.
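The following Python sketch illustrates how a proxy that ignores smoking intensity leaves part of the confounding in place. All counts and risks are hypothetical. It assumes every ex-smoker smoked for the same number of years, so a years-smoked variable cannot tell light and heavy ex-smokers apart, and it assumes vaping has no effect on disease risk.

```python
# Hypothetical population for illustration only; no real data.
# Assumption: risk depends only on smoking history, not on vaping.
RISK = {"never": 0.02, "light": 0.05, "heavy": 0.15}

# People by (vapes?, smoking history). All ex-smokers are assumed to have
# smoked the same number of years, so "years smoked" lumps light and heavy.
COUNTS = {
    (True, "never"): 100, (True, "light"): 270, (True, "heavy"): 630,
    (False, "never"): 900, (False, "light"): 70, (False, "heavy"): 30,
}


def risk(vapes, histories):
    """Disease risk for vapers or non-vapers, restricted to given histories."""
    people = sum(COUNTS[(vapes, h)] for h in histories)
    cases = sum(COUNTS[(vapes, h)] * RISK[h] for h in histories)
    return cases / people


def risk_ratio(histories):
    """Vapers' risk divided by non-vapers' risk within the given histories."""
    return risk(True, histories) / risk(False, histories)


# No adjustment: compare all vapers with all non-vapers.
crude = risk_ratio(["never", "light", "heavy"])

# Adjusting with the proxy: compare people with the same years smoked,
# which here means all ex-smokers regardless of intensity.
adjusted_with_proxy = risk_ratio(["light", "heavy"])

# Adjusting for the real difference: compare heavy ex-smokers only.
fully_adjusted = risk_ratio(["heavy"])

print(f"Crude risk ratio:               {crude:.2f}")
print(f"Adjusted for years smoked only: {adjusted_with_proxy:.2f}")
print(f"Adjusted for intensity as well: {fully_adjusted:.2f}")
```

In this invented population the proxy pulls the risk ratio down a long way, but it stays above 1.0 because the ex-smokers who vape were, by assumption, heavier smokers. Only comparing like with like on intensity brings it to 1.0.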

Moreover, most attempts at deconfounding are not even that thoughtful. Consider the gateway studies, where we know that a kid who is inclined to vape is more inclined to smoke than average (regardless of whether he actually does vape). This confounding guarantees there will be an association that can be misconstrued as showing there is a gateway effect (causation). But instead of trying to figure out how to best measure this propensity and correct for it, albeit imperfectly, the researchers just throw in whatever variables they happen to have, and claim this controls for confounding. This error is common throughout health research. It turns out that this approach is almost as likely to make the confounding worse as it is to reduce it.

Even if the deconfounder variables they use are proxies for some of the differences between vapers and non-vapers (e.g., parental smoking status probably predicts whether someone likes nicotine, though rather poorly), they fall far short of accounting for all of them. The result is substantial “residual confounding”: the effect of confounding that remains even after the deconfounder variables are put in the calculation. It is generally safe to assume that a study’s estimate is biased by residual confounding. If the unadjusted estimate of the association (no deconfounders in the calculation) is that vapers are 2.4 times as likely to start smoking, and the adjusted estimate drops to 1.7 after some other variables are included in the model (cutting the excess over 1.0 in half, from 1.4 to 0.7), it is reasonable to guess that a better job of controlling would reduce it all the way to 1.0, meaning no association.
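The arithmetic behind that example can be checked in a few lines of Python. The sketch uses only the two risk ratios quoted above, measures each one as an excess over 1.0 (no association), and computes what share of that excess the adjustment removed. The variable names are just labels chosen for this illustration.

```python
# Risk ratios quoted in the text; 1.0 means "no association".
unadjusted_rr = 2.4
adjusted_rr = 1.7
NULL_RR = 1.0

# How far each estimate sits above "no association".
excess_before = unadjusted_rr - NULL_RR
excess_after = adjusted_rr - NULL_RR

# Fraction of the original excess that the adjustment removed.
share_removed = 1 - excess_after / excess_before

print(f"Excess before adjustment: {excess_before:.1f}")
print(f"Excess after adjustment:  {excess_after:.1f}")
print(f"Share of excess removed:  {share_removed:.0%}")
```

The adjustment removes about half of the excess. If rough proxy variables can do that much, it is reasonable to suspect that the remaining half is residual confounding too.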

Do you really need to know these technical points about confounding if you already intuitively understood the examples? It is certainly not necessary, but it has some advantages. First, it makes your arguments seem stronger. It is perfectly valid to say, “the kids who vaped are more likely to have smoked anyway.” But it might carry more weight to add, “…and there is clearly residual confounding because the poor deconfounder variables in the calculation could not possibly fully measure that propensity difference.” Second, knowing what confounding is makes it easier to recognize when it might affect an estimate. Think about reasons why the exposed and unexposed populations might have different rates for the outcome, apart from the exposure.

Third, and most important, it is critical to understand what an adjusted model is attempting to do, and why it usually falls far short of eliminating confounding. If the deconfounder variables are clearly poor measures of the real source of the confounding, they will not deconfound. If a poor adjustment for confounding changes the estimate, you can be almost certain that a proper adjustment would move the estimate even further in the same direction.

Follow Dr. Phillips on Twitter