Abraham Wald's Statistical Genius: Survivorship Bias, Sequential Analysis, and Modern Decision Theory

The contribution of Abraham Wald, a Hungarian-born mathematician who fled Nazi persecution and sought refuge in the United States, to the war effort continues to live on today across a wide range of fields, from data science to medicine, finance to artificial intelligence, through the concepts of ‘survivorship bias’ and ‘sequential analysis’.

During the most intense periods of the Second World War, the military strategies of the Allied forces were shaped not only by courage on the front lines, but also by equations developed in the secret corridors of Columbia University. One of the brightest figures on this mathematical front was Abraham Wald, a Hungarian-born mathematician who fled Nazi persecution and sought refuge in the United States.1 His wartime work lives on today through the concepts of ‘survivorship bias’ and ‘sequential analysis’, which are applied in fields ranging from data science and medicine to finance and artificial intelligence. This article examines the theoretical depth of Wald's work, which goes well beyond his famous aircraft diagram and fundamentally changed statistical thinking, as well as the implications of these theories in the modern world.

The Discovery of Invisible Data: Survivorship Bias and Epistemological Foundations

Survivorship bias is defined as the tendency to focus on objects or individuals that have passed through a selection process, while neglecting the group that failed in that process and thus became ‘invisible’.3 This logical error arises when a data set is incomplete by construction, and it can lead researchers to highly misleading conclusions. Abraham Wald's 1943 study on aircraft armouring is widely considered the most iconic example of identifying and correcting this bias.1

Historical Background and the Statistical Research Group (SRG)

In the summer of 1942, shortly after the United States entered the war, the Statistical Research Group (SRG), composed of distinguished mathematicians, was established. This secret unit, operating within Columbia University, has been described as ‘a Manhattan Project fighting with equations’.6 The SRG included Milton Friedman and George Stigler, who would both later win the Nobel Prize, as well as the statistical giants W. Allen Wallis and Frederick Mosteller.5

Wald, a Jew born in Austria-Hungary, studied mathematics at the University of Vienna, but struggled to find academic positions due to the political climate of the time. With the support of Karl Menger, he worked on economics and geometry, and in 1938 he emigrated to the United States to work at the Cowles Commission.1 Wald's role at the SRG was to develop an armouring strategy that would minimise the losses of Allied bomber aircraft under enemy fire. The army wanted the aircraft to be armoured, but the weight of the armour increased fuel consumption and reduced manoeuvrability. This situation created an optimisation problem that required the armour to be placed only in the most necessary areas.8

Wald's Famous Aircraft Diagram and Logical Reversal

Military officials had meticulously mapped the distribution of damage on aircraft that managed to return to base. The data showed a high concentration of holes from anti-aircraft fire in the fuselage and wings, with almost no damage around the engine and cockpit.8 The army's intuitive conclusion was this: if the fuselage is hit most often, then the fuselage should be reinforced with more armour.5

Wald realised that this approach rested on a fundamentally flawed assumption. Reasoning mathematically, he asked: ‘Where are the missing holes?’6 His insight was that the data set being examined (returning aircraft) did not represent the entire population (all aircraft that flew missions). Aircraft hit in the engine area did not return to base and therefore could not appear in the analysed data.1 The damage on returning aircraft showed that an aircraft could keep flying despite being hit in those areas, meaning those areas were relatively ‘resilient’.3 Wald therefore recommended placing armour not where the returning aircraft showed holes, but on critical areas such as the engine and cockpit, where they showed almost none, because a hit there was far more likely to bring the aircraft down.6

Methodology: Wald's Statistical Approach and Mathematical Model

Wald's success lies not only in his intuitive awareness, but in his ability to ground that awareness in a rigorous mathematical framework. His methodology is built on the assumption of ‘random distribution’ and the probabilistic estimation of ‘unseen data’.

Probabilistic Analysis of Hits

Wald assumed that hits were distributed at random over the aircraft, so that the probability of any given part being hit was proportional to its surface area.5 Under that assumption, the near-absence of holes in certain areas of returning aircraft does not mean that those areas were rarely hit; rather, it means that aircraft hit there by anti-aircraft shells did not survive.5
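The effect of this assumption can be illustrated with a small simulation. The sketch below is not Wald's calculation: the region names, area shares, and per-hit loss probabilities are invented for illustration. It only shows how hits that are spread in proportion to surface area can still appear concentrated on the sturdier regions once the lost aircraft drop out of the sample.

```python
import random

# Hypothetical regions: (share of surface area, chance that one hit there downs the plane).
# All numbers are illustrative assumptions, not Wald's data.
REGIONS = {
    "fuselage": (0.45, 0.05),
    "wings":    (0.35, 0.05),
    "engine":   (0.12, 0.60),
    "cockpit":  (0.08, 0.50),
}

def simulate(n_planes=20000, hits_per_plane=3, seed=42):
    rng = random.Random(seed)
    names = list(REGIONS)
    areas = [REGIONS[r][0] for r in names]
    all_hits = dict.fromkeys(names, 0)       # hits on every plane that flew
    survivor_hits = dict.fromkeys(names, 0)  # hits visible on planes that returned
    for _ in range(n_planes):
        # Hits land on regions in proportion to surface area.
        hits = rng.choices(names, weights=areas, k=hits_per_plane)
        # The plane returns only if it survives every individual hit.
        survived = all(rng.random() > REGIONS[r][1] for r in hits)
        for r in hits:
            all_hits[r] += 1
            if survived:
                survivor_hits[r] += 1
    return all_hits, survivor_hits

def shares(counts):
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}

all_hits, survivor_hits = simulate()
true_share = shares(all_hits)
seen_share = shares(survivor_hits)
for r in REGIONS:
    print(f"{r:9s} all hits {true_share[r]:.2f}   on returning planes {seen_share[r]:.2f}")
```

Under these assumed numbers, the engine and cockpit account for a smaller share of the holes counted at the base than of the hits actually received, which is the pattern the officers misread.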

One of the basic models used in Wald's analysis relates survival to the number of hits. The probability that an aircraft receives exactly i hits and still returns, $P_i$, can be formulated as follows:

$$P_i = P(i \text{ hits and survival}) = Q_i \, \lambda_i$$

Here, $Q_i$ represents the probability of survival when receiving exactly i hits, while $\lambda_i$ represents the probability of an aircraft receiving i hits during an operation.12 Wald demonstrated that these probabilities differ for different parts of the aircraft (engine, wings, fuselage). If the $Q_i$ value for the engine region is very low, hits to this region will cause the aircraft to crash, and the observed $P_i$ value will also remain low. Wald's methodology aims to identify the most vulnerable points of the system based on these ‘near-zero’ observations.6
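A short numerical sketch may make the product form concrete. The hit probabilities and the per-hit survival probability `q` below are hypothetical, and the independence assumption Q_i = q^i is a simplification introduced here for illustration rather than something taken from Wald's memoranda.

```python
# Hypothetical inputs (assumptions for illustration, not Wald's figures):
# lam[i] = probability that an aircraft receives exactly i hits on a sortie
lam = [0.40, 0.30, 0.20, 0.10]

# Assume each hit is survived independently with probability q, so Q_i = q**i
q = 0.85
Q = [q ** i for i in range(len(lam))]

# P_i = Q_i * lam_i: probability of receiving i hits AND returning
P = [Qi * li for Qi, li in zip(Q, lam)]

p_return = sum(P)                      # overall probability of returning
observed = [p / p_return for p in P]   # hit-count distribution seen at the base

for i, (li, oi) in enumerate(zip(lam, observed)):
    print(f"{i} hits: true share {li:.3f}, share among returning aircraft {oi:.3f}")
print(f"probability of returning: {p_return:.3f}")
```

With these assumed values, heavily hit aircraft are under-represented among the returners, so the raw counts at the base understate how often aircraft were hit several times.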

The table below summarises the fundamental methodological differences between Wald's analytical approach and traditional military logic.8

Table 1: Key methodological differences between Wald's analytical approach and traditional military logic8

| Methodological Element | Traditional Approach | Wald's Approach |
| --- | --- | --- |
| Data Source | Only visible evidence (survivors) | Visible evidence + missing data estimation |
| Sample Structure | Selected (biased) subset | Projection of entire population |
| Basic Assumption | Damage intensity indicates need | Absence of damage on survivors indicates a critical vulnerability |
| Statistical Focus | Observed frequencies | Conditional probability and survival rate |
| Field of Application | Repair and reinforcement of damaged areas with armour plates | Reinforcement of vulnerable areas with armour plates |

 

Sequential Analysis: A Dynamic Revolution in Statistical Processes

One of Wald's most profound contributions to statistics is the method of ‘Sequential Analysis’, which is considerably more technical and more widely applicable than survivorship bias. Developed during World War II to accelerate quality control in ammunition production, this method removes the requirement that data be collected with a predetermined, fixed sample size.1

Sequential Probability Ratio Test (SPRT)

In classical Neyman-Pearson statistics, a researcher determines the sample size (N) before the experiment and does not make a decision until the data collection process is complete. However, Wald realised that this method led to a waste of time and resources.

 

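Wald's Sequential Probability Ratio Test (SPRT) re-evaluates the accumulated evidence after each new data point (or group of data points) and allows a decision to be made as soon as that evidence is strong enough.13

At each step, the SPRT procedure yields one of three decisions:1

  • Accept the null hypothesis and stop sampling.
  • Reject the null hypothesis (accept the alternative) and stop sampling.
  • Continue sampling, because the evidence is not yet decisive.

[Figure: the SPRT decision-making process.]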
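The three-way decision rule of the SPRT can be sketched in a few lines. The example below is a minimal illustration for 0/1 data (for instance, defective or non-defective items) using Wald's standard approximate boundaries; the function name `sprt_bernoulli` and the chosen defect rates are assumptions made for this sketch.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 against H1: p = p1 on 0/1 data (p0 < p1).

    Returns (decision, n_used), where decision is "accept H0", "reject H0",
    or "continue" if the data ran out before a boundary was crossed.
    """
    upper = math.log((1 - beta) / alpha)   # crossing above: reject H0
    lower = math.log(beta / (1 - alpha))   # crossing below: accept H0
    llr = 0.0                              # cumulative log-likelihood ratio
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n

# Hypothetical inspection run: 5% defect rate is acceptable, 15% is not.
print(sprt_bernoulli([0] * 100, p0=0.05, p1=0.15))   # a long run of good items
print(sprt_bernoulli([1] * 100, p0=0.05, p1=0.15))   # a run of defective items
print(sprt_bernoulli([0] * 5, p0=0.05, p1=0.15))     # too little data so far
```

The test stops as soon as the cumulative log-likelihood ratio leaves the interval between the two boundaries, so the sample size is itself a random quantity rather than something fixed in advance.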

Secrecy and Declassification: Statistics as a Military Secret

Sequential Analysis proved so effective in optimising Allied production capacity that the US government kept the work classified throughout the war. Wald's technique reduced the sample size required to test the reliability of ammunition by roughly 50% on average, resulting in enormous savings for the war economy.13 Wald was able to share his theory with the wider academic world only after the war, most fully in his 1947 book ‘Sequential Analysis’.1
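The size of this saving can be checked roughly with Wald's own approximation to the expected sample size of the SPRT. The sketch below compares it with a textbook normal-approximation sample size for a fixed-sample test; the defect rates (5% versus 15%) and error rates are hypothetical, and the approximation ignores boundary overshoot, so the figures are indicative only.

```python
import math
from statistics import NormalDist

def fixed_sample_size(p0, p1, alpha, beta):
    """Normal-approximation sample size for a one-sided fixed-n test of p0 vs p1."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(1 - beta)
    num = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
    return math.ceil((num / (p1 - p0)) ** 2)

def sprt_expected_n(p, p0, p1, alpha, beta):
    """Wald's approximate expected SPRT sample size when the true rate p is p0 or p1."""
    log_a = math.log((1 - beta) / alpha)     # upper (reject H0) boundary
    log_b = math.log(beta / (1 - alpha))     # lower (accept H0) boundary
    # Expected log-likelihood-ratio increment per observation
    drift = p * math.log(p1 / p0) + (1 - p) * math.log((1 - p1) / (1 - p0))
    # Probability of ending at the upper boundary
    p_reject = alpha if p == p0 else 1 - beta
    return (p_reject * log_a + (1 - p_reject) * log_b) / drift

p0, p1, alpha, beta = 0.05, 0.15, 0.05, 0.05   # hypothetical test design
n_fixed = fixed_sample_size(p0, p1, alpha, beta)
asn_h0 = sprt_expected_n(p0, p0, p1, alpha, beta)
asn_h1 = sprt_expected_n(p1, p0, p1, alpha, beta)

print(f"fixed-sample test: n = {n_fixed}")
print(f"SPRT expected n if H0 is true: {asn_h0:.1f} ({1 - asn_h0 / n_fixed:.0%} fewer)")
print(f"SPRT expected n if H1 is true: {asn_h1:.1f} ({1 - asn_h1 / n_fixed:.0%} fewer)")
```

For this particular design the expected saving is on the order of one half, in line with the figure quoted above; other designs will give different numbers.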

A Standard in the Medical World: Sequential Design in Clinical Trials

The mathematical framework Wald developed during the war years to measure bullet and bomb quality is now at the heart of the pharmaceutical industry and biostatistics. Many modern clinical trials are conducted according to the principles of ‘sequential design’ or ‘group sequential design’ for ethical and economic reasons.13

Ethical Imperative and Early Termination Decisions

In the medical world, conducting a trial with a fixed sample size can sometimes be unethical. If a new drug demonstrates overwhelming efficacy well before the planned end of the trial, it is ethically hard to justify withholding it from patients in the control group. Conversely, if the drug shows serious side effects, or if the accumulated data make it very unlikely that it will show any benefit (futility), it is wrong to continue the trial and put more patients at risk.13

Wald's SPRT method and its modern versions form the scientific basis for these early termination decisions (stopping rules). The FDA (US Food and Drug Administration) supports these adaptive designs in drug approval processes and encourages these methods, particularly in situations where rapid results are needed, such as for rare diseases or pandemics.13

Basic Methods of Sequential Analysis in Clinical Applications

In clinical trials, Wald's legacy continues with ‘group sequential tests’ developed by researchers such as Pocock, and O'Brien and Fleming. These methods allow for interim analyses while preserving the overall type I error rate (α).20
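Why the type I error rate needs protecting can be seen in a small simulation. The sketch below assumes a hypothetical trial with no true effect (normally distributed outcomes with mean 0 and standard deviation 1) that is tested at the usual two-sided 5% threshold at every interim look, without any correction; the look schedule and trial counts are arbitrary choices for illustration.

```python
import math
import random

def false_positive_rate(n_looks, n_per_look=20, z_crit=1.96, n_trials=4000, seed=1):
    """Share of null trials (true mean 0, sd 1) declared 'significant' at any look."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_trials):
        total, n = 0.0, 0
        for _ in range(n_looks):
            for _ in range(n_per_look):
                total += rng.gauss(0.0, 1.0)
            n += n_per_look
            z = total / math.sqrt(n)   # z-statistic of the running mean
            if abs(z) > z_crit:        # naive, uncorrected threshold
                false_positives += 1
                break
    return false_positives / n_trials

single = false_positive_rate(n_looks=1)
five = false_positive_rate(n_looks=5)
print(f"one final analysis:        false positive rate {single:.3f}")
print(f"five uncorrected analyses: false positive rate {five:.3f}")
```

Repeated uncorrected looks push the false positive rate well above the nominal 5%, which is the problem group sequential boundaries are designed to solve.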

Table 2: Basic methods of sequential analysis in clinical applications20

| Method | Decision Mechanism | Clinical Purpose |
| --- | --- | --- |
| SPRT (Wald) | Analysis after each new patient | Phase I/II safety and dose studies |
| O'Brien-Fleming | Very strict early on, flexible limits towards the end | Phase III efficacy trials (for early success declaration) |
| Pocock | Equal significance levels in all interim analyses | Treatment comparisons requiring rapid decision-making |
| Triangular Test | Convergence of acceptance and rejection limits at one point | Adaptive trials with high uncertainty |
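The contrast between the O'Brien-Fleming and Pocock rows can be made concrete with the Lan-DeMets alpha-spending functions, which are a common way of approximating the two designs. This is a swapped-in technique rather than the original boundary constants, and the five equally spaced looks are an assumption chosen for illustration.

```python
import math
from statistics import NormalDist

def obf_type_spent(t, alpha=0.05):
    """Cumulative type I error spent by information fraction t (O'Brien-Fleming-type)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))

def pocock_type_spent(t, alpha=0.05):
    """Cumulative type I error spent by information fraction t (Pocock-type)."""
    return alpha * math.log(1 + (math.e - 1) * t)

# Five equally spaced interim looks (t = fraction of planned information collected)
for k in range(1, 6):
    t = k / 5
    print(f"look {k}: t={t:.1f}  "
          f"OBF-type {obf_type_spent(t):.5f}  Pocock-type {pocock_type_spent(t):.5f}")
```

The O'Brien-Fleming-type function spends almost none of the 5% at the first looks and saves it for the end, whereas the Pocock-type function spends it far more evenly; both reach exactly α at the final analysis.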

 

For example, many studies on COVID-19 treatments have made dynamic decisions as data was collected, using Wald's principles. If an antiviral drug provides 30% more improvement than expected, SPRT-based models can catch this signal early and accelerate the drug's time to market.19

Case Studies: From World War II to Modern Data Science

Survivorship bias is a cognitive trap lurking wherever data exists. Since Wald's aircraft, this concept has helped prevent critical errors across numerous disciplines.

Financial Markets and Investment Fund Performance

In the investment world, survivorship bias causes fund performance to appear inflated. When an investor examines the ‘best-performing funds’ over the past 10 years, they only see the funds that still exist today (the survivors).8 However, funds that performed poorly and were closed or merged with other funds have been removed from the data set. Research has shown that when these defunct funds are left out, average annual returns appear to be roughly 0.9 to 1.5 percentage points higher, meaning investors are being misled.9
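The mechanism is easy to reproduce with synthetic data. In the sketch below every number is an assumption: funds have independent yearly returns with a 6% mean and 15% volatility, and a fund is closed once its value falls below 80% of its starting value. The size of the gap it produces should not be read as an estimate of the real-world figures cited above.

```python
import random
from statistics import mean

def simulate_funds(n_funds=2000, years=10, seed=7):
    """Hypothetical funds with independent yearly returns; a fund closes once its
    cumulative value falls below 80% of its starting value (assumed rule)."""
    rng = random.Random(seed)
    all_returns, survivor_returns = [], []
    for _ in range(n_funds):
        value, yearly, alive = 1.0, [], True
        for _ in range(years):
            r = rng.gauss(0.06, 0.15)    # assumed 6% mean return, 15% volatility
            yearly.append(r)
            value *= 1 + r
            if value < 0.8:              # fund is closed and leaves the database
                alive = False
                break
        avg = mean(yearly)               # average over the years the fund operated
        all_returns.append(avg)
        if alive:
            survivor_returns.append(avg)
    return mean(all_returns), mean(survivor_returns), len(survivor_returns)

all_mean, survivor_mean, n_survivors = simulate_funds()
print(f"average yearly return, all funds:      {all_mean:.3%}")
print(f"average yearly return, survivors only: {survivor_mean:.3%}")
print(f"funds still alive after 10 years:      {n_survivors} of 2000")
```

A database that lists only the surviving funds reports the second figure, even though an investor choosing a fund at the start faced the first.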

Entrepreneurship and the ‘Unicorn’ Illusion

The modern business world is full of stories about ‘university dropout billionaires’ like Steve Jobs or Mark Zuckerberg. These success stories create the perception that dropping out of university is a prerequisite for success.4 However, this analysis does not include the thousands of entrepreneurs who dropped out and failed (the lost planes). Through Wald's lens, the secret to success can be understood not by looking at the survivors, but by focusing on why the unsuccessful ones were eliminated.4

Selection Bias in Machine Learning and Artificial Intelligence

Artificial intelligence systems can be victims of the data they are trained on. Sample selection bias occurs when a model is trained on a subgroup that is not representative of the target population. For example, AI models in the health sciences that are trained on such unrepresentative data may produce incorrect diagnoses in the general population, especially for disadvantaged groups.28

Amazon's CV screening algorithm, developed in the 2010s and later cancelled, is a digital form of survivorship bias. The system was trained on data from the company's past successful employees. Since the company's past successes occurred in a male-dominated structure, the algorithm systematically penalised female candidates, associating ‘success’ only with the demographic characteristics of those who had survived in the past.27

Discussion: Dangers and Solutions in Modern Decision-Making Processes

Abraham Wald's legacy is not just a statistical technique, but a protocol for ‘critical thinking.’ In the modern data-driven world, data abundance can sometimes act as a veil that hides ‘data gaps.’ While decision-makers look at the glowing green lights (sales, clicks, successful projects) on their dashboards, they must not forget to ask what the ‘missing data’ outside the system is telling them.8

The Wald Protocol: A New Approach to Corporate Decision-Making

To integrate Wald's mindset into modern organisations, a strategic framework called the ‘Wald Protocol’ can be proposed. This framework aims to go beyond visible successes.8

Table 3: The Wald Protocol, a new approach to corporate decision-making8

| Protocol Step | Implementation Strategy | Relationship to Wald's Principle |
| --- | --- | --- |
| Missing Data Mapping | Asking the question, ‘What data do we not have?’ | Searching for the missing bullet holes |
| Pre-Mortem Analysis | Imagining a project failing before it starts and writing down the reasons why | Predicting possible reasons for ‘falling’ |
| Negative Data Mining | Conducting in-depth interviews with churning (departing) customers | Hearing the sound of planes that don't return |
| Red Teaming | Having a dedicated team assigned to debunk the current strategy | Mercilessly questioning assumptions |

 

Reading ‘Success Stories’ (Case Studies) is a popular training method in the corporate world. However, Wald's teaching argues that failure stories (Failure Studies) are much more instructive. What successful companies do may be ‘necessary’ for their survival, but it may not be ‘sufficient.’ The real difference lies in understanding the mistakes that led to the downfall of those who failed.8

Conclusion: Wald's Legacy and Seeing Beyond the Data

Abraham Wald's death in a plane crash in India in 1950 was a great loss to the scientific community.1 However, his SRG and Columbia career, which lasted only eight years, transformed statistics from a ‘descriptive’ tool into a ‘decision-making’ and ‘strategic’ tool.2

Wald's legacy today rests on three main pillars:

First, Epistemological Humility. Wald showed us that the available data is not always the whole truth, and can sometimes even represent the most misleading part of the truth. The ‘invisible data’ is the real factor that determines the fate of the system.3

Second, Statistical Efficiency. Sequential Analysis introduced the principle of ‘as much data as necessary’ in scientific research, optimising resources, particularly in medicine and engineering. This method, now an ethical standard in clinical trials, protects the lives of thousands of patients.13

Third, Mathematical Courage. Wald demonstrated the courage to reject a truth that senior officers and experienced pilots ‘clearly’ saw, using the power of equations. As Jordan Ellenberg noted, Wald asked the most fundamental question of a mathematician: ‘What assumptions are you making, and can they be justified?’1

Although Wald's planes no longer fly in the skies today, his metaphor of the ‘missing bullet holes’ continues to fly in the minds of data scientists, managers, and researchers. True wisdom lies not only in looking at the data that shines in the light, but also in accepting the silence of what remains in the dark as data. As Wald's analysis showed, sometimes the most important thing is what is not there.8

References

1. Abraham Wald: A Statistical Hero - History of Data Science, https://www.historyofdatascience.com/abraham-wald-a-statistical-hero/

2. Abraham Wald - Wikipedia, https://en.wikipedia.org/wiki/Abraham_Wald

3. Survivorship bias - Wikipedia, https://en.wikipedia.org/wiki/Survivorship_bias

4. Survivorship Bias - The Decision Lab, https://thedecisionlab.com/biases/survivorship-bias

5. AMS :: Feature Column :: The Legend of Abraham Wald, https://ams.org/publicoutreach/feature-column/fc-2016-06

6. Abraham Wald and the Missing Bullet Holes | by Penguin Press - Medium, https://medium.com/@penguinpress/an-excerpt-from-how-not-to-be-wrong-by-jordan-ellenberg-664e708cfc3d

7. Abraham Wald, https://epub.ub.uni-muenchen.de/1808/1/paper_439.pdf

8. The Missing Bullet Holes: How a WWII Statistician's Insight Can Save Your Strategy, https://peopleplusscience.com/the-missing-bullet-holes-how-a-wwii-statisticians-insight-can-save-your-strategy/

9. Lessons from History – The Legend of Abraham Wald - RGF Integrated Wealth Management, https://www.rgfwealth.com/article/lessons-from-history-the-legend-of-abraham-wald/

10. Survivorship bias - lessons from World War Two aircraft - Clear Thinking, https://clearthinking.co/survivorship-bias/

11. The Story of Abraham Wald — Profound - Having intellectual depth and insight, https://www.profound-deming.com/blog-1/the-story-of-abraham-wald

12. Abraham Wald's [WW II] work on aircraft survivability - James Hanley, https://jhanley.biostat.mcgill.ca/bios601/CandH-ch0102/WaldAircraft.pdf

13. A Review of Sequential Analysis - DTIC, https://apps.dtic.mil/sti/pdfs/AD1184407.pdf

14. (PDF) Abraham Wald - ResearchGate, https://www.researchgate.net/publication/33028110_Abraham_Wald

15. Sequential Test for Practical Significance: Truncated Mixture Sequential Probability Ratio Test - arXiv, https://arxiv.org/html/2509.07892v1

16. A Modified Sequential Probability Ratio Test - PMC - PubMed Central, https://pmc.ncbi.nlm.nih.gov/articles/PMC9053723/

17. 9.2 - Likelihood Methods | STAT 509 - Statistics Online, https://online.stat.psu.edu/stat509/lesson/9/9.2

18. Generalised Sequential Probability Ratio Test for Separate Families of Hypotheses - Columbia University, https://sites.stat.columbia.edu/jcliu/paper/GSPRT_SQA3.pdf

19. Exact sequential test for clinical trials and post-market drug and vaccine safety surveillance with Poisson and binary data - PubMed Central, https://pmc.ncbi.nlm.nih.gov/articles/PMC8441767/

20. A Bayesian Sequential Design for Clinical Trials with Time-to-Event Outcomes - PMC - NIH, https://pmc.ncbi.nlm.nih.gov/articles/PMC7100880/

21. Early Stopping in Pragmatic Clinical Trials - Workshop Summary 20250729, https://dcricollab.dcri.duke.edu/sites/NIHKR/KR/Early%20Stopping%20in%20Pragmatic%20Clinical%20Trials%20-%20Workshop%20Summary.pdf

22. Interpretation of Clinical Trials That Stopped Early, https://gin.i-med.ac.at/download/public/LV%20Ulmer/JAMA%20Statistics/Interpretation%20of%20Clinical%20Trials%20that%20stopped%20eraly/Statistics%20jgm160002.pdf

23. Optimal scheduling of interim analyses in group sequential trials - arXiv, https://arxiv.org/html/2509.05537v1

24. What to do today (26 January 2023)? - Part I. Introduction Part II. Epidemiologic Concepts and Designs Part III. Clinical Trials - Simon Fraser University, http://www.sfu.ca/~joanh/stat854/week04Ab.pdf

25. Comparison of four sequential methods allowing for early stopping of comparative clinical trials - PubMed, https://pubmed.ncbi.nlm.nih.gov/10781388/

26. Seeing What's Missing: Survivorship Bias in Data Science - Kasadara, https://kasadara.com/blogs/seeing-whats-missing-survivorship-bias-in-data-science/

27. Common Types of Data Bias (With Examples) - Pragmatic Institute, https://www.pragmaticinstitute.com/resources/articles/data/5-common-bias-affecting-your-data-analysis/

28. Sampling bias in machine learning: [2024 update] - UBIAI tool, https://ubiai.tools/sampling-bias-in-machine-learning-fresh-update/

29. Sample Selection Bias in Machine Learning for Healthcare - arXiv, https://arxiv.org/html/2405.07841v2

30. The Hidden Data Killing Your KPIs: Beating Survivorship Bias in Business - MyOutDesk, https://www.myoutdesk.com/blog/survivorship-bias-in-business/

Research Author Burak ÖZCAN

  • 19.01.2026