In 1980, W. Edwards Deming built a fake factory with real volunteers and a bowl of colored beads. The results were random. The management reactions were not.
A bowl contains 4,000 beads — 3,200 white and 800 red. White beads are acceptable product. Red beads are defects.
The defect rate is built into the incoming material: exactly 20%.
Each worker's tool is a paddle with 50 holes arranged in a grid. To produce a batch, the worker dips the paddle into the bowl and draws out 50 beads at random.
The number of red beads in the paddle is the worker's defect count.
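One way to picture a single scoop in code is as a draw of 50 beads, without replacement, from a bowl of 800 red and 3,200 white. The sketch below is only an illustration: the function name `scoop`, the constants, and the seed are assumptions made for this example, not part of Deming's setup.

```python
import random

RED, WHITE = 800, 3200   # bowl composition described in the text
PADDLE_HOLES = 50        # holes in the paddle

def scoop(rng, holes=PADDLE_HOLES, red=RED, white=WHITE):
    """Return the number of red beads in one paddle draw (no replacement)."""
    bowl = [True] * red + [False] * white   # True marks a red bead
    return sum(rng.sample(bowl, holes))     # count the red beads caught

rng = random.Random(1980)   # arbitrary seed, used only for repeatability
print("red beads in this scoop:", scoop(rng))
```

Nothing in `scoop` takes the worker as an input, which is the whole point of the experiment: the count depends only on the bowl and the paddle.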
The first worker dips the paddle. Some holes catch red beads, others catch white. The worker has no control over which.
The inspector counts the red beads and records the number.
Six workers each take a turn. The results vary — some workers produce 7 defects, others produce 14. Management takes note.
The worker with the fewest defects receives public praise. The worker with the most is counseled on the need for improvement.
Management is certain: the difference reflects effort, skill, or attitude.
The experiment runs for four days. Rankings shift. Yesterday's top performer becomes today's worst. Management adjusts evaluations, issues new warnings, recalibrates praise.
Management compiles the averages and publishes the rankings. The gap between first and last place feels meaningful. Consequences are assigned.
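The shifting rankings can be reproduced with a few lines of simulation. This is a rough sketch under stated assumptions: the worker labels A to F are hypothetical, and each hole is treated as independently catching a red bead with probability 0.20, which is a binomial approximation to drawing from the bowl.

```python
import random

WORKERS = ["A", "B", "C", "D", "E", "F"]   # hypothetical labels for six workers
DAYS = 4

def scoop(rng, holes=50, p_red=0.20):
    # Binomial approximation: each hole catches a red bead with probability p_red.
    return sum(rng.random() < p_red for _ in range(holes))

def run_factory(rng):
    """Return {worker: [defect count for each day]}."""
    return {w: [scoop(rng) for _ in range(DAYS)] for w in WORKERS}

def daily_ranking(results, day):
    """Workers ordered from fewest to most defects on the given day."""
    return sorted(results, key=lambda w: results[w][day])

rng = random.Random(7)   # arbitrary seed
results = run_factory(rng)
for day in range(DAYS):
    print(f"day {day + 1}:", " ".join(daily_ranking(results, day)))
```

Changing the seed reshuffles the order from day to day, even though every worker runs exactly the same code.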
Every result falls within the range predicted by the defect rate of the incoming material. The workers had no influence whatsoever on the outcome.
The variation is the system's. The rankings are fiction.
If the process determines the outcome, then blaming individuals is not just unfair — it is statistically illiterate.
The only way to reduce defects is to change the system: improve the incoming material, redesign the process. No amount of slogans, incentives, or firings will move the needle.
Deming ran this experiment at seminars for decades, always with the same result. Managers in the audience would react exactly as the fictional managers did — they would look at the numbers and see talent, effort, commitment.
They would miss that they were watching a random number generator.
Every worker's cumulative defect rate converges toward the same value: 20%. There is no skill to separate.
20 independent simulations, 100 scoops each. Paddle size: 50. System defect rate: 20%.
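A simulation along these lines might look like the following sketch. It assumes the same binomial approximation (each hole independently red with probability 0.20); the function name `cumulative_rates` and the seed are chosen for this example only.

```python
import random

def cumulative_rates(rng, scoops=100, holes=50, p_red=0.20):
    """Return the running defect rate after each scoop for one simulated worker."""
    red_total = 0
    rates = []
    for n in range(1, scoops + 1):
        red_total += sum(rng.random() < p_red for _ in range(holes))
        rates.append(red_total / (n * holes))   # red beads so far / beads drawn so far
    return rates

rng = random.Random(42)   # arbitrary seed
runs = [cumulative_rates(rng) for _ in range(20)]
finals = [r[-1] for r in runs]
print(f"final rates range from {min(finals):.3f} to {max(finals):.3f}")
```

Early values in each run swing widely, while the values after 100 scoops cluster near 0.20.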
With 50 beads drawn from a bowl that is 20% red, the expected number of red beads per scoop is 50 × 0.20 = 10. The standard deviation is the square root of 50 × 0.20 × 0.80, or about 2.8. Control limits at three standard deviations from the mean sit at about 1.5 and 18.5, so counts from 2 to 18 fall inside them.
Any result within this range is explained entirely by the system — no need to invoke individual skill or laziness.
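The arithmetic behind those limits is short enough to check directly. The sketch below uses the binomial formulas for the mean and standard deviation; it ignores the small correction for drawing without replacement from a finite bowl, which is an assumption of this example.

```python
import math

n, p = 50, 0.20                       # paddle size and system defect rate

mean = n * p                          # expected red beads per scoop
sd = math.sqrt(n * p * (1 - p))       # binomial standard deviation
lower = max(0.0, mean - 3 * sd)       # lower control limit, floored at zero
upper = mean + 3 * sd                 # upper control limit

print(f"mean = {mean:.1f}, sd = {sd:.2f}")
print(f"limits = {lower:.2f} to {upper:.2f}")
print(f"counts inside the limits: {math.ceil(lower)} to {math.floor(upper)}")
```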
10,000 simulated scoops. The distribution is binomial, centered on 10, with nearly all outcomes between 3 and 18.
Green curve shows the theoretical Binomial(50, 0.20) distribution.
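The theoretical curve can be computed from the binomial probability mass function. The helper name `binom_pmf` is an assumption of this sketch; `math.comb` requires Python 3.8 or later.

```python
import math

def binom_pmf(k, n=50, p=0.20):
    """Probability of exactly k red beads among n holes under Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

pmf = [binom_pmf(k) for k in range(51)]   # probabilities for k = 0..50
inside = sum(pmf[3:19])                   # total probability of 3 through 18 red beads
mode = max(range(51), key=pmf.__getitem__)

print(f"most likely count: {mode}")
print(f"P(3 <= X <= 18) = {inside:.4f}")
```

Comparing `pmf` against a histogram of simulated scoops is a quick way to confirm that the simulation and the theory agree.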
Deming's target was the practice of ranking: performance reviews, school league tables, hospital mortality ratings, sales leaderboards, and any other system that ranks individuals without first asking whether the variation falls within the system's natural range.
As he put it: "A manager of people needs to understand that all people are different. This is not ranking people. He needs to understand that the performance of anyone is governed largely by the system that he works in."
Run the experiment with different parameters. Fire the worst worker and watch their replacement perform identically. The system doesn't care who's holding the paddle.
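The firing policy is easy to try in a simulation. In this hedged sketch, every round each of six workers takes a scoop, and the one with the most red beads is replaced by a new hire. The function name `run_with_firing`, the round count, and the seed are assumptions for illustration, and scoops use the binomial approximation.

```python
import random

def scoop(rng, holes=50, p_red=0.20):
    # Binomial approximation: each hole catches a red bead with probability p_red.
    return sum(rng.random() < p_red for _ in range(holes))

def run_with_firing(rng, workers=6, rounds=200):
    """Fire the worst worker every round; return (average defects, number of hires)."""
    ids = list(range(workers))    # current staff, identified by number
    next_id = workers             # id for the next replacement
    all_counts = []
    for _ in range(rounds):
        counts = [scoop(rng) for _ in ids]
        all_counts.extend(counts)
        worst = counts.index(max(counts))   # position of the worst performer
        ids[worst] = next_id                # replace them with a new hire
        next_id += 1
    return sum(all_counts) / len(all_counts), next_id - workers

rng = random.Random(3)   # arbitrary seed
avg, hires = run_with_firing(rng)
print(f"{hires} replacements later, average defects per scoop: {avg:.2f}")
```

The worker id never enters `scoop`, so the average stays near 10 however many people are replaced.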