How Random Beads Expose Bad Management

In 1980, W. Edwards Deming built a fake factory with real volunteers and a bowl of colored beads. The results were random. The management reactions were not.


The Bowl

Figure: the bowl holds 4,000 beads in total, 800 red (defects) and 3,200 white (good).

The raw material

A bowl contains 4,000 beads — 3,200 white and 800 red. White beads are acceptable product. Red beads are defects.

The defect rate is built into the incoming material: exactly 20%.

The paddle

Each worker's tool is a paddle with 50 holes arranged in a grid. To produce a batch, the worker dips the paddle into the bowl and draws out 50 beads at random.

The number of red beads in the paddle is the worker's defect count.

A single scoop

The first worker dips the paddle. Some holes catch red beads, others catch white. The worker has no control over which.

The inspector counts the red beads and records the number.
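A single scoop can be sketched in a few lines of Python. This is a minimal illustration, not Deming's procedure in code: the function names `make_bowl` and `scoop` are hypothetical, the seed is arbitrary, and the paddle is modeled as a uniform random sample drawn without replacement.

```python
import random

RED, WHITE = "red", "white"

def make_bowl(n_red=800, n_white=3200):
    """Return the bowl as a list of bead colors (800 red, 3,200 white by default)."""
    return [RED] * n_red + [WHITE] * n_white

def scoop(bowl, paddle_size=50, rng=random):
    """Draw paddle_size beads without replacement and count the red ones."""
    drawn = rng.sample(bowl, paddle_size)
    return drawn.count(RED)

bowl = make_bowl()
rng = random.Random(1980)  # arbitrary seed, used only so runs are repeatable
defects = scoop(bowl, rng=rng)
print(f"red beads in this scoop: {defects}")
```

Nothing in `scoop` depends on who is holding the paddle; the only inputs are the bowl and the paddle size.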

Day one

Six workers each take a turn. The results vary — some workers produce 7 defects, others produce 14. Management takes note.

Management responds

The worker with the fewest defects receives public praise. The worker with the most is counseled on the need for improvement.

Management is certain: the difference reflects effort, skill, or attitude.

Four days complete

The experiment runs for four days. Rankings shift. Yesterday's top performer becomes today's worst. Management adjusts evaluations, issues new warnings, recalibrates praise.

The final ranking

Management compiles the averages and publishes the rankings. The gap between first and last place feels meaningful. Consequences are assigned.
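The four-day run and the final ranking can be reproduced with a short simulation. The sketch below assumes six workers, one scoop per worker per day, and beads returned to the bowl after every scoop; `run_experiment` and `rank_by_average` are illustrative names, not part of any library.

```python
import random

def run_experiment(n_workers=6, n_days=4, paddle_size=50,
                   n_red=800, n_white=3200, seed=None):
    """Return {worker: [daily red-bead counts]} for a purely random process."""
    rng = random.Random(seed)
    bowl = [1] * n_red + [0] * n_white  # 1 = red (defect), 0 = white
    results = {f"Worker {i + 1}": [] for i in range(n_workers)}
    for _ in range(n_days):
        for name in results:
            # Beads go back in the bowl after each scoop, so every draw
            # starts from the same 20% red mix.
            results[name].append(sum(rng.sample(bowl, paddle_size)))
    return results

def rank_by_average(results):
    """Sort workers from fewest to most average defects, as management would."""
    averages = {name: sum(c) / len(c) for name, c in results.items()}
    return sorted(averages.items(), key=lambda item: item[1])

results = run_experiment(seed=1)
ranking = rank_by_average(results)
for name, avg in ranking:
    print(f"{name}: daily counts {results[name]}, average {avg:.2f}")
```

Changing the seed reshuffles the ranking, which is the point: the order reflects the draw, not the worker.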

The statistical truth

Every result falls within the range predicted by the defect rate of the incoming material. The workers had no influence whatsoever on the outcome.

The variation is the system's. The rankings are fiction.

Deming's point

If the process determines the outcome, then blaming individuals is not just unfair — it is statistically illiterate.

The only way to reduce defects is to change the system: improve the incoming material, redesign the process. No amount of slogans, incentives, or firings will move the needle.

A demonstration, not a factory

Deming ran this experiment at seminars for decades, always with the same result. Managers in the audience would react exactly as the fictional managers did — they would look at the numbers and see talent, effort, commitment.

They would miss that they were watching a random number generator.

Twenty workers, one hundred scoops each

Every worker's cumulative defect rate converges toward the same value: 20%. There is no skill to separate.

20 independent simulations, 100 scoops each. Paddle size: 50. System defect rate: 20%.
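The convergence shown in the chart can be checked numerically. The following sketch assumes the binomial approximation (each hole catches a red bead with probability 0.20) rather than sampling from a finite bowl; `cumulative_rates` is a hypothetical helper and the seed is arbitrary.

```python
import random

def cumulative_rates(n_scoops=100, paddle_size=50, p_red=0.20, rng=random):
    """Return the running defect rate after each scoop for one worker."""
    rates, red_total = [], 0
    for k in range(1, n_scoops + 1):
        # Binomial approximation: each hole catches red with probability p_red.
        red_total += sum(rng.random() < p_red for _ in range(paddle_size))
        rates.append(red_total / (k * paddle_size))
    return rates

rng = random.Random(42)  # arbitrary seed for repeatability
workers = [cumulative_rates(rng=rng) for _ in range(20)]
after_1 = [w[0] for w in workers]
after_100 = [w[-1] for w in workers]
print(f"after 1 scoop:    {min(after_1):.3f} to {max(after_1):.3f}")
print(f"after 100 scoops: {min(after_100):.3f} to {max(after_100):.3f}")
```

The spread between the best and worst worker should narrow sharply as scoops accumulate, because every worker is sampling the same 20% mix.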

The mathematics are simple

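With 50 beads drawn from a bowl that is 20% red, the expected number of red beads per scoop is 50 × 0.20 = 10. Because 50 beads is a small fraction of the 4,000 in the bowl, the count is well approximated by a binomial distribution, whose standard deviation is √(50 × 0.20 × 0.80) ≈ 2.8. Control limits at three standard deviations from the mean sit at about 1.5 and 18.5, so any count from 2 to 18 falls inside them.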

Any result within this range is explained entirely by the system — no need to invoke individual skill or laziness.
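The arithmetic behind those limits is short enough to write out. This is a sketch under the binomial approximation; `control_limits` is an illustrative name, and clamping the limits to the range 0 to n is a choice made here, not something the article specifies.

```python
import math

def control_limits(n=50, p=0.20, k=3.0):
    """Mean, standard deviation, and k-sigma limits for a Binomial(n, p) count."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    lower = max(0.0, mean - k * sd)       # a count cannot be negative
    upper = min(float(n), mean + k * sd)  # or exceed the paddle size
    return mean, sd, lower, upper

mean, sd, lower, upper = control_limits()
print(f"mean={mean:.1f} sd={sd:.2f} limits=({lower:.2f}, {upper:.2f})")
# mean=10.0 sd=2.83 limits=(1.51, 18.49)
```

Since defect counts are whole numbers, limits of 1.51 and 18.49 mean that any count from 2 through 18 is consistent with the system alone.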

Distribution of defects per scoop

10,000 simulated scoops. The distribution is binomial, centered on 10, with nearly all outcomes between 3 and 18.

Green curve shows the theoretical Binomial(50, 0.20) distribution.
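A comparison like the one in the chart can be approximated without plotting. The sketch below tallies 10,000 simulated scoops and prints them next to the theoretical Binomial(50, 0.20) probabilities; the helper names and the seed are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def binomial_pmf(k, n=50, p=0.20):
    """Probability of exactly k red beads in n holes."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def simulate_counts(n_scoops=10_000, n=50, p=0.20, seed=0):
    """Tally red-bead counts over many simulated scoops."""
    rng = random.Random(seed)
    return Counter(sum(rng.random() < p for _ in range(n))
                   for _ in range(n_scoops))

counts = simulate_counts()
total = sum(counts.values())
for k in range(0, 23):
    print(f"{k:2d}  simulated {counts[k] / total:.4f}  theory {binomial_pmf(k):.4f}")

inside = sum(binomial_pmf(k) for k in range(3, 19))  # counts 3 through 18
print(f"theoretical share between 3 and 18: {inside:.4f}")
```

The theoretical share between 3 and 18 comes out above 99%, which matches the claim that nearly all outcomes land in that band.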

Where this applies

Deming's target was the practice of ranking: performance reviews, school league tables, hospital mortality ratings, sales leaderboards, and any other system that ranks individuals without first asking whether the variation falls within the system's natural range.

As he put it: "A manager of people needs to understand that all people are different. This is not ranking people. He needs to understand that the performance of anyone is governed largely by the system that he works in."

Try it yourself

Run the experiment with different parameters. Fire the worst worker and watch their replacement perform identically. The system doesn't care who's holding the paddle.
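For readers without the interactive version, the firing experiment can be run as a script. The sketch below assumes six workers, four days per round, and the binomial approximation; after each round the worker with the most defects is replaced, and the replacement's results are compared with everyone else's. `fire_the_worst` and its parameters are hypothetical.

```python
import random

def scoop(rng, paddle_size=50, p_red=0.20):
    """One scoop under the binomial approximation."""
    return sum(rng.random() < p_red for _ in range(paddle_size))

def fire_the_worst(n_workers=6, n_days=4, n_rounds=1000, seed=0):
    """Compare the fired worker's replacement with the retained workers.

    Each round: run n_days, fire whoever has the most defects, then run
    n_days again with a replacement who draws from the same bowl.
    Returns (average defects per scoop for replacements, for survivors).
    """
    rng = random.Random(seed)
    replacement_total = survivor_total = 0
    for _ in range(n_rounds):
        first = [sum(scoop(rng) for _ in range(n_days)) for _ in range(n_workers)]
        worst = first.index(max(first))
        second = [sum(scoop(rng) for _ in range(n_days)) for _ in range(n_workers)]
        replacement_total += second[worst]
        survivor_total += sum(second) - second[worst]
    scoops_repl = n_rounds * n_days
    scoops_surv = n_rounds * n_days * (n_workers - 1)
    return replacement_total / scoops_repl, survivor_total / scoops_surv

repl, surv = fire_the_worst()
print(f"replacement: {repl:.2f} defects per scoop, survivors: {surv:.2f}")
```

Both averages should sit close to 10. Varying `p_red` inside `scoop` moves them together, which is the only lever that changes the outcome.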

Interactive simulation. Controls: defect rate (default 20%), paddle size. Panels: scoreboard, control chart.