28. Simpson's Paradox | georgeszpiro

To Treat or not to Treat?

Simpson’s Paradox

The Question:

A pharmaceutical company has discovered a new treatment for a disease. In clinical trials with young patients, 90% of those who received the treatment recovered, while subjects treated with a placebo had only an 80% chance of recovery. The results were a little less encouraging for older patients, but recovery rates were still 60% with treatment and only 50% without treatment. So, the treatment should definitely be approved for this disease.

Correct?

The Paradox:

This needs context.

In particular, we must look at the numbers in more detail. Let's say 200 young people were treated and 800 young people received the placebo. Among the trials with older patients, 800 were treated and 200 received the placebo. Recall that of the 200 youngsters who were treated, 90% made a recovery; of the 800 who only got the placebo, 80% recovered. For the elderly it was 60% recovery with treatment, 50% without treatment.

                With treatment           Placebo
Youngsters      180 of 200, i.e., 90%    640 of 800, i.e., 80%
Elderly         480 of 800, i.e., 60%    100 of 200, i.e., 50%

Obviously, the treatment was more effective than the placebo for both the young and the old. But that's not the whole story; there's a surprise in store. When the entire clinical trial is inspected, it turns out that 740 of the one thousand people who received only a placebo recovered on their own, while only 660 of the one thousand people who received the treatment recovered.

Entire trial    660 of 1000, i.e., 66%   740 of 1000, i.e., 74%
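For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes every rate in the table from the raw counts. The names COUNTS, rate and pooled are our own, chosen only for illustration; exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

# Trial counts from the table: (recovered, total) per age group and arm.
COUNTS = {
    "young": {"treated": (180, 200), "placebo": (640, 800)},
    "elderly": {"treated": (480, 800), "placebo": (100, 200)},
}


def rate(recovered, total):
    """Exact recovery rate as a fraction."""
    return Fraction(recovered, total)


def pooled(arm):
    """Pool one arm across age groups by adding recoveries and totals."""
    recovered = sum(COUNTS[group][arm][0] for group in COUNTS)
    total = sum(COUNTS[group][arm][1] for group in COUNTS)
    return recovered, total


# Rates within each age group: the treatment wins in both.
for group, arms in COUNTS.items():
    t, p = rate(*arms["treated"]), rate(*arms["placebo"])
    print(f"{group:8s} treated {float(t):.0%}  placebo {float(p):.0%}")

# Rates for the entire trial: the placebo now appears to win.
t_all, p_all = rate(*pooled("treated")), rate(*pooled("placebo"))
print(f"{'entire':8s} treated {float(t_all):.0%}  placebo {float(p_all):.0%}")
```

Run as is, it prints 90% against 80% for the young, 60% against 50% for the elderly, and 66% against 74% for the trial as a whole: the treatment wins in each age group and loses once the groups are pooled.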

The conclusion seems to be that whether the patient is young or old, treatment should be administered. But when the age is unknown, treatment should be withheld.

Huh? Imagine a health hotline. Over the phone, the doctor asks the patient's age and, after considering the answer, prescribes treatment… regardless of the answer. If, however, the patient does not reveal her age, the doctor recommends against treatment.

What a ridiculous conclusion!

Background:

The paradox is named after the British statistician Edward Hugh Simpson, who had worked as a codebreaker at the famed Bletchley Park during World War II. After the war, he entered the British civil service and remained there until his retirement as Deputy Secretary of the Department of Education and Science. Simpson described the paradox in a paper that he published in 1951 while at Cambridge University. However, Karl Pearson and Udny Yule, two of the founders of mathematical statistics, had already identified the paradox half a century earlier, at the turn of the twentieth century.

Once the cause of this statistical puzzle is understood (see below), the conundrum is no longer considered a paradox and the phenomenon is now often referred to as the Yule-Simpson Effect.

Dénouement:

The erroneous conclusion, that treatment should be withheld if the patient's age is unknown, derives from the fact that the sample sizes differ so markedly. The data show that it is more difficult to treat the elderly: even with the treatment, only 60% of them recover. Since the treated group consisted mostly of elderly subjects (800 of 1000), its overall recovery rate is pulled down. Youngsters, on the other hand, recover much more easily, even those who get placebos. Since the placebo group consisted mostly of youngsters (800 of 1000), its overall recovery rate is pulled up. The overall recovery rates of the treated and the untreated populations are so-called weighted averages of the age-group rates, but with very different weights (20% young and 80% elderly among the treated, the reverse among those given the placebo). Comparing them therefore gives an incorrect picture.
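The weighted-average argument can be checked directly. The sketch below (Python; the names RATES, SIZES and weighted_rate are hypothetical) computes each arm's overall recovery rate as a weighted average of the two age-group rates, first with each arm's own age mix and then with one common mix for both arms. Weighting both arms identically resembles what epidemiologists call direct standardization; it is brought in here only as an illustration, not as part of the original argument.

```python
from fractions import Fraction

# Recovery rates by age group, taken from the table.
RATES = {
    "treated": {"young": Fraction(90, 100), "elderly": Fraction(60, 100)},
    "placebo": {"young": Fraction(80, 100), "elderly": Fraction(50, 100)},
}
# Number of subjects of each age in each arm, also from the table.
SIZES = {
    "treated": {"young": 200, "elderly": 800},
    "placebo": {"young": 800, "elderly": 200},
}


def weighted_rate(rates, weights):
    """Weighted average of group rates; the weights need not sum to 1."""
    total = sum(weights.values())
    return sum(rates[group] * weights[group] for group in rates) / total


# Each arm weighted by its own age mix: reproduces the pooled 66% vs 74%.
own_mix = {arm: weighted_rate(RATES[arm], SIZES[arm]) for arm in RATES}

# Both arms weighted by the same age mix. Here we use the combined trial,
# which has 1000 young and 1000 elderly subjects.
common = {"young": 1000, "elderly": 1000}
same_mix = {arm: weighted_rate(RATES[arm], common) for arm in RATES}

for arm in RATES:
    print(arm, f"own mix {float(own_mix[arm]):.0%}",
          f"common mix {float(same_mix[arm]):.0%}")
```

With each arm's own mix, the sketch gives 66% for the treated and 74% for the placebo group, exactly the pooled figures. With the common mix, the treated rate is 75% and the placebo rate 65%, so the treatment comes out ahead once the two arms are weighted alike.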

Mathematically, Simpson's Paradox arises because we may have four fractions a/b, c/d, A/B and C/D such that

a/b > A/B and c/d > C/D, but (a+c)/(b+d) < (A+C)/(B+D).

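In our example, where a/b and A/B are the recovery rates of treated and untreated youngsters, and c/d and C/D those of treated and untreated elderly patients,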

180/200 > 640/800 and 480/800 > 100/200,

but (180+480)/(200+800) < (640+100)/(800+200).
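The inequality can also be tested mechanically. Here is a small Python sketch; the function name is_simpson_reversal is hypothetical, and its arguments follow the letters used above.

```python
from fractions import Fraction


def is_simpson_reversal(a, b, c, d, A, B, C, D):
    """Return True if a/b > A/B and c/d > C/D, yet (a+c)/(b+d) < (A+C)/(B+D).

    Fractions are compared exactly, so no rounding can create or hide a reversal.
    """
    both_subgroups_favor = Fraction(a, b) > Fraction(A, B) and Fraction(c, d) > Fraction(C, D)
    pooled_reverses = Fraction(a + c, b + d) < Fraction(A + C, B + D)
    return both_subgroups_favor and pooled_reverses


# The trial: young a/b = 180/200 vs A/B = 640/800,
#            elderly c/d = 480/800 vs C/D = 100/200.
print(is_simpson_reversal(180, 200, 480, 800, 640, 800, 100, 200))  # True

# Hypothetical counts: same rates, but 100 subjects in every cell.
print(is_simpson_reversal(90, 100, 60, 100, 80, 100, 50, 100))  # False
```

The second call uses hypothetical counts with the same recovery rates but equal group sizes; the pooled comparison then no longer reverses (150/200 against 130/200), which is consistent with the explanation that the paradox comes from the unequal weights.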

So, to answer the question posed at the outset: yes, the treatment should definitely be approved and recommended…for everybody.

Technical supplement:

Food for thought: If the pharmaceutical company had performed the clinical trials but neglected to ask the subjects their age, they would have had to conclude that the placebo, with a 74% recovery rate, outperforms the treatment. Their research effort would have been deemed a failure even though the treatment would have helped mankind.

On the other hand, a company could artificially claim success by conjuring up all kinds of stratifications of the data. Even though recovery rates might be lower overall with the treatment than without, a researcher under pressure to demonstrate success at any price could claim that recovery rates are higher for, say, both left-handed and right-handed people, or for subjects with blue eyes as well as for those with eyes of other colors…

In the early days of big data, this was the sort of activity that gave data mining a bad name. Honest science demands that a causal effect be postulated at the outset of an experiment, before the data are collected and the hypothesis is tested. If a researcher combs through the data after the research has ended, seeking and picking judicious tidbits (for example, finding that left- and right-handedness happen to render the results significant, even though there is no fathomable reason why handedness would affect recovery), he or she is guilty of ex post rationalization, definitely a no-no in good science.


George G. Szpiro