68. Are More than Half the Babies Boys?

Lindley's Paradox

The Question:

In a certain country, in a certain year, one million babies are born: 501,200 boys and 498,800 girls. We believe that the true proportion of baby boys to baby girls is half and half but, of course, we do not expect the number of boys and girls to be exactly half a million each. Some random deviation will always occur.

But does a prevalence of boys of 50.12% indicate a bias towards male children? Or is a preponderance of up to 2,400 boys among a million babies statistically inevitable? In other words, is a birthrate of 50.12% versus 49.88% compatible with a proportion of fifty-fifty?

The Paradox:

Yes and No, depending on whom you ask.

For the purpose of this question, statisticians can be divided into Bayesians (see below) and Frequentists (see also below). The truly surprising answer to the above question is that the two groups do not agree in their answer.

Bayesians, on the one hand, would claim that even if one hypothesizes that the true distribution of boys and girls is 50-50, there is – according to their methodology – a very good chance that in any given country, in any given year, a deviation of up to 50.12% vs. 49.88% may occur. Hence, the deviation is random, they claim, and the birthrate in that country supports the hypothesis that boys represent one half of all babies born.

Frequentists, on the other hand, argue differently. Their statistical methodology shows that, if the true rate were half-half, there is only a very low chance that an outcome with a surplus of 1,200 boys over the expected 500,000 would occur. Hence, they reject the hypothesis that the deviation is only random, and do not accept the hypothesis that the true proportion is, in fact, fifty-fifty.

A paradox!

Background:

The problem was first pointed out by the British statistician Sir Harold Jeffreys, who discussed it in 1939 in a textbook on statistics.[1] But it was only nearly two decades later, when the Cambridge statistician Dennis Lindley published the paper “A Statistical Paradox” in 1957, that it gained prominence.[2]

The crux of the matter is a theorem, devised in the 18th century by Thomas Bayes (1701-1761). Bayesians compare a hypothesis H0 with a competing hypothesis H1. At the outset, both hypotheses are assumed to have certain probabilities, for example a 25% chance that H0 is correct and a 75% chance that H1 is correct. With the gathering of additional data, the probabilities must be revised. As evidence supporting H1 pours in, its probability is raised and that of H0 is lowered, or vice versa. Bayes designed a formula that tells how the probabilities must be updated.
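
As a rough illustration of such an update, here is a minimal Python sketch. The prior of 25% for H0 is the example from the text; the function name `update` and the two likelihood values are hypothetical, chosen only to show the mechanics (the data are assumed to be four times as probable under H1 as under H0).

```python
def update(prior_h0, likelihood_h0, likelihood_h1):
    """Posterior probability of H0 for two competing hypotheses (Bayes' theorem)."""
    prior_h1 = 1.0 - prior_h0
    # Total probability of the observed data under both hypotheses.
    evidence = prior_h0 * likelihood_h0 + prior_h1 * likelihood_h1
    return prior_h0 * likelihood_h0 / evidence

# Prior from the text: 25% for H0, 75% for H1.
# Hypothetical likelihoods (assumption): the data are 4x as probable under H1.
posterior_h0 = update(0.25, likelihood_h0=0.1, likelihood_h1=0.4)
print(f"Posterior probability of H0: {posterior_h0:.3f}")
```

With evidence favouring H1, the probability of H0 drops below its prior of 25%; had the two likelihoods been equal, the prior would have remained unchanged.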

In his paper “A problem in forensic science”, Lindley presented an interesting instance of the paradox.[3] Glass shards are found on a suspect’s clothing. The question is whether the refractive index of the shards matches the refractive index of a window broken during a burglary. The suspect may be guilty or innocent, depending on whether the court uses the Frequentist or the Bayesian approach.

Dénouement:

Bayesians are of the opinion that at the outset, before the babies are counted, there’s a fifty percent chance that hypothesis H0 is correct (“the true ratio of boys to girls is fifty-fifty”) and a fifty percent chance that the alternative hypothesis H1 is correct (“the true ratio is not fifty-fifty”). If H1 is correct, all proportions of boys between zero and 100% are considered equally likely. Once the actual outcome (501,200 boys vs. 498,800 girls) is determined, Bayesians update the chances that the hypothesis is correct. According to Bayes’ formula (see the footnote for the nitty-gritty details[4]), the chances are updated from fifty-fifty to 98 to 2, which means that there is a 98% chance that the proportion of boys to girls is half-half. Hence, Bayesians accept the hypothesis.
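
The Bayesian figure can be checked with a short Python sketch. It assumes, as described above, equal prior chances for H0 and H1 and a uniform spread of the proportion of boys under H1, in which case the probability of any particular count of boys is 1/(n+1). The variable names are illustrative, and logarithms are used only to avoid numerical underflow.

```python
from math import exp, lgamma, log

n = 1_000_000      # babies born
boys = 501_200     # observed number of boys

# Logarithm of the binomial coefficient C(n, boys).
log_choose = lgamma(n + 1) - lgamma(boys + 1) - lgamma(n - boys + 1)

# H0: proportion of boys is exactly 0.5, so P(data | H0) = C(n, boys) * 0.5**n.
log_lik_h0 = log_choose + n * log(0.5)

# H1 (assumption from the text): every proportion between 0 and 1 is equally
# likely, which makes every count of boys equally probable: P(data | H1) = 1/(n+1).
log_lik_h1 = -log(n + 1)

# Ratio of the two probabilities, and the posterior for H0 with 50/50 priors.
bayes_factor = exp(log_lik_h0 - log_lik_h1)
posterior_h0 = bayes_factor / (1 + bayes_factor)

print(f"P(data | H0) / P(data | H1) = {bayes_factor:.1f}")
print(f"Posterior probability of H0 = {posterior_h0:.3f}")
```

The posterior comes out at roughly 0.98, in line with the 98 to 2 quoted above.
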

The Frequentist’s argument goes as follows: the number of boys follows the so-called binomial distribution which, for large numbers of babies, is closely approximated by the normal distribution. This means that about 68% of cases would lie within one standard deviation of the mean, and about 95% within two standard deviations. (See any introductory textbook on statistics.)

Now, the variance of a binomial distribution with a million babies is 250,000;[5] hence, the standard deviation is 500. A surplus of 1,200 boys, corresponding to a departure of 2.4 standard deviations from the conjectured fifty percent boys, would be a rare occurrence. A deviation at least that large, in either direction, would happen in only 1.6% of all cases. This is why Frequentists would claim that the observed data disagree with the hypothesis that the true ratio is half-half.
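
The Frequentist arithmetic can be reproduced in a few lines of Python. This is a sketch that relies on the normal approximation mentioned above, not on an exact binomial test; the variable names are illustrative.

```python
from math import erfc, sqrt

n = 1_000_000      # babies born
boys = 501_200     # observed number of boys
p0 = 0.5           # hypothesized proportion of boys

mean = n * p0                        # expected number of boys
std_dev = sqrt(n * p0 * (1 - p0))    # square root of the variance np(1-p)
z = (boys - mean) / std_dev          # departure in standard deviations

# Normal approximation: P(|Z| >= z) = erfc(z / sqrt(2)).
p_two_sided = erfc(z / sqrt(2))
p_one_sided = p_two_sided / 2        # surplus of boys only

print(f"standard deviation = {std_dev:.0f}, z = {z:.1f}")
print(f"two-sided tail probability = {p_two_sided:.4f}")
print(f"one-sided tail probability = {p_one_sided:.4f}")
```

The two-sided value counts deviations of at least 1,200 in either direction and corresponds to the 1.6% in the text; counting only a surplus of boys would halve it.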

Technical supplement:

Bayesians conclude that the 50.12 vs. 49.88 outcome is not very far off from the 50 vs. 50 proportion; hence they conclude that the result supports the hypothesis “half-half”.

The Frequentists’ approach is more diffuse. They compare the 50.12 vs. 49.88 outcome with any proportion between zero and one and conclude that “fifty-fifty” does not explain the outcome very well. For example, 50.05% boys vs. 49.95% girls would be a better explanation of the data. So maybe that is the true ratio of boys to girls? Or maybe 50.2%? In general, Frequentists reject a specific hypothesis more easily.
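
The claim that other proportions explain the data better than fifty-fifty can be checked by comparing binomial log-likelihoods. The following Python sketch does so for a handful of candidate proportions; the candidates are those named above plus the observed 50.12%, and the helper `log_likelihood` is an illustrative name. The binomial coefficient is omitted because it is the same for every candidate and cancels in the comparison.

```python
from math import log

n = 1_000_000
boys = 501_200
girls = n - boys

def log_likelihood(p):
    """Binomial log-likelihood of the observed counts, up to a constant term."""
    return boys * log(p) + girls * log(1 - p)

baseline = log_likelihood(0.5)
for p in (0.5, 0.5005, 0.5012, 0.502):
    gain = log_likelihood(p) - baseline
    print(f"p = {p:.4f}: log-likelihood relative to fifty-fifty = {gain:+.2f}")
```

A positive value means that the candidate proportion makes the observed counts more probable than fifty-fifty does; the observed proportion itself necessarily scores highest.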

[1] Jeffreys, H. (1939). Theory of Probability. 1st ed. The Clarendon Press, Oxford.

[2] Lindley, D. V. (1957). A statistical paradox. Biometrika, 44(1/2), 187–192.

[3] Lindley, D. V. (1977). A problem in forensic science. Biometrika, 64(2), 207–213.

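[4] With n = 1,000,000 babies, k = 501,200 boys, and θ the true proportion of boys, the theory of combinatorics says that

\[ P(k \mid H_0) = \binom{n}{k} \left(\tfrac{1}{2}\right)^{n} \]

and

\[ P(k \mid H_1) = \int_0^1 \binom{n}{k}\, \theta^{k} (1-\theta)^{n-k} \, d\theta = \frac{1}{n+1}. \]

The updating is performed with the help of Bayes’ theorem:

\[ P(H_0 \mid k) = \frac{P(k \mid H_0)\, P(H_0)}{P(k \mid H_0)\, P(H_0) + P(k \mid H_1)\, P(H_1)} \]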

[5] Variance = np(1-p) = 1,000,000 × 0.5 × 0.5 = 250,000. The standard deviation is the square root of the variance, i.e., sqrt(250,000) = 500.

George G. Szpiro