How a Single Neuron Learns to See

In 1958, Frank Rosenblatt taught a room-sized machine to classify patterns by adjusting its own weights. The perceptron couldn't compute XOR. It started a war. And it changed everything.

THE NEURON
[Live readout: Epoch · Errors · Accuracy]

The McCulloch-Pitts Neuron

In 1943, McCulloch and Pitts showed that a simple unit could implement logic. It receives binary inputs, sums them, and fires if the sum reaches a threshold.

But the weights are fixed. The neuron computes, but it cannot learn.
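A minimal sketch of the idea in Python. The thresholds below are illustrative choices, not values from the 1943 paper — the point is that with fixed weights and a threshold, the same unit becomes different logic gates:

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts unit: fire (1) when the input sum reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0

# Fixed thresholds turn the same unit into different gates.
def AND(a, b): return mp_neuron([a, b], threshold=2)
def OR(a, b):  return mp_neuron([a, b], threshold=1)

print(AND(1, 1), AND(1, 0))  # 1 0
print(OR(0, 1), OR(0, 0))    # 1 0
```

Nothing here is learned: change the task, and a human must change the threshold.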

Add Learnable Weights

Rosenblatt's insight: make the weights adjustable. Each input is multiplied by a weight, the products are summed, and a threshold determines the output.

Now the question becomes: which weights produce the right answers?

The Training Data

Give the perceptron labeled examples: points that belong to class 1 and points that belong to class 0. The perceptron must find a line that separates them.

This is linear classification — one of the oldest problems in machine learning.

The Learning Rule

When the perceptron makes a mistake, adjust the weights: w ← w + α · (y − y') · x. If it predicted 0 but the answer is 1, nudge the boundary toward the misclassified point.

Weights change only when errors occur. Correct predictions leave them untouched.
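The rule fits in a few lines of Python. This is a sketch, not Rosenblatt's original implementation; the AND dataset and learning rate are illustrative, and the bias is kept as a separate term:

```python
def train_perceptron(X, y, lr=0.1, epochs=100):
    """Rosenblatt's rule: on a mistake, w <- w + lr * (y - y_hat) * x."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, y):
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            if y_hat != target:                # correct predictions leave weights untouched
                delta = lr * (target - y_hat)  # +lr or -lr
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                errors += 1
        if errors == 0:                        # a separating line has been found
            break
    return w, b

# AND is linearly separable, so training converges to zero errors:
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
```

Note that the update direction comes entirely from the sign of (y − y'): a false negative pushes the weights toward the point, a false positive pushes them away.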

Convergence

Watch the epochs tick. Each pass through the data moves the boundary closer. Novikoff proved in 1962: if the data is linearly separable, the perceptron converges in at most R²/δ² updates.

The wider the margin, the faster it learns.

Perfect Classification

Zero errors. The perceptron has found a hyperplane that separates the two classes. No human told it the equation — it discovered the boundary through iterative weight adjustment.

This is what Rosenblatt demonstrated in 1958. A machine that learns by doing.

The XOR Problem

Now try a problem that isn't linearly separable. XOR: (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0. The positive points sit on opposite corners of a square.

No single line can separate them. The perceptron oscillates forever, never converging. Minsky and Papert proved this in 1969.
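The failure is easy to reproduce. A sketch using the same update rule (the epoch count and learning rate are arbitrary): trained on XOR, the per-epoch error count never reaches zero.

```python
# XOR truth table: not linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

w, b, lr = [0.0, 0.0], 0.0, 0.1
history = []
for _ in range(1000):
    errors = 0
    for x, target in zip(X, y):
        y_hat = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
        if y_hat != target:
            delta = lr * (target - y_hat)
            w = [w[0] + delta * x[0], w[1] + delta * x[1]]
            b += delta
            errors += 1
    history.append(errors)

print(min(history))  # never 0: the weights cycle instead of converging
```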

The Hidden Layer Solution

Add a hidden layer with two neurons. One computes OR, the other NAND. The output neuron computes AND. The hidden layer transforms the input space into a representation where XOR becomes linearly separable.

This is what backpropagation (1986) made trainable. The perceptron's principle, extended to depth.
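One hand-wired version of this network, with weights chosen by hand rather than learned (the specific weight values are illustrative — any weights implementing OR, NAND, and AND work):

```python
def step(s):
    return 1 if s >= 0 else 0

def xor(a, b):
    # Hidden layer: one unit computes OR, the other NAND.
    h_or   = step(a + b - 1)      # fires unless both inputs are 0
    h_nand = step(1.5 - a - b)    # fires unless both inputs are 1
    # Output unit: AND of the two hidden activations.
    return step(h_or + h_nand - 2)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

In the hidden layer's coordinates — (OR, NAND) — the four XOR points become linearly separable, so a single output neuron suffices.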

The Weight of History

Frank Rosenblatt and Marvin Minsky attended the same high school — the Bronx High School of Science, one year apart. They spent the next two decades on opposite sides of AI's deepest divide: can machines learn from examples, or must intelligence be programmed by hand?

Rosenblatt drowned on his 43rd birthday, July 11, 1971, before seeing the vindication of his ideas. Minsky lived until 2016, long enough to see neural networks dominate the field he once declared them unfit to enter.

From Neuron to Transformer

Key milestones in neural network history, from the McCulloch-Pitts neuron (1943) to the transformer architecture (2017). The gap between 1969 and 1986 — the first AI winter — is visible as a void.

Each node represents a foundational paper or breakthrough. Hover to see details.

The Conceptron

If the perceptron perceives — classifying inputs into categories — the Conceptron (Gardini, Cavalli, Decherchi, 2021) conceptualizes. It learns what "normal" looks like for a single class and flags anything outside that concept as anomalous.

Where Rosenblatt drew a boundary between two classes, the Conceptron builds a model of one class from the inside. The arc from 1958 to 2021: from "which side of the line?" to "what does normality mean?"

The Convergence Bound

The perceptron convergence theorem guarantees at most R²/δ² mistakes, where R is the data radius and δ is the margin. Wider margins mean faster learning.

Maximum mistakes as a function of margin width. The inverse-square relationship means doubling the margin cuts the worst-case mistake count by a factor of four.
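A quick numeric check of the inverse-square relationship, with illustrative values of R and δ:

```python
def mistake_bound(R, delta):
    """Novikoff's bound: at most (R / delta)^2 updates on separable data."""
    return (R / delta) ** 2

# Doubling the margin quarters the worst-case number of mistakes:
print(mistake_bound(1.0, 0.25))  # 16.0
print(mistake_bound(1.0, 0.5))   # 4.0
```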

400 Photocells, Billions of Parameters

The Mark I Perceptron had 400 cadmium sulfide photocells, 512 association units, and 8 output neurons. Its weights were stored in potentiometers turned by electric motors. It filled an entire room at the Cornell Aeronautical Laboratory.

GPT-4 is reported to have hundreds of billions of parameters. But the principle is the same: a weighted sum, a nonlinear threshold, and the patient adjustment of connections through experience.

Train Your Own Perceptron

Choose a dataset, set the learning rate, and watch the perceptron find (or fail to find) a decision boundary. The XOR preset shows the fundamental limitation that Minsky and Papert proved in 1969.

[Interactive demo: learning rate 0.10 · Status: Ready · Epoch: 0 · Errors: -- · Weights: -- · plots: Decision Boundary, Error Over Epochs]