How a Single Neuron Learns to See

In 1958, Frank Rosenblatt taught a room-sized machine to classify patterns by adjusting its own weights. The perceptron couldn't compute XOR. It started a war. And it changed everything.

THE NEURON
[Live readout: Epoch · Errors · Accuracy]

The McCulloch-Pitts Neuron

In 1943, McCulloch and Pitts showed that a simple unit could implement logic. It receives binary inputs, sums them, and fires if the sum reaches a threshold.

But the weights are fixed. The neuron computes, but it cannot learn.
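A minimal sketch of the idea in Python. The thresholds below are illustrative choices, not values from the 1943 paper — the point is that with fixed weights and a threshold, the same unit becomes different logic gates:

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts unit: fire (1) when the input sum reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0

# Fixed thresholds turn the same unit into different gates.
def AND(a, b): return mp_neuron([a, b], threshold=2)
def OR(a, b):  return mp_neuron([a, b], threshold=1)

print(AND(1, 1), AND(1, 0))  # 1 0
print(OR(0, 1), OR(0, 0))    # 1 0
```

Nothing here is learned: change the task, and a human must change the threshold.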

Add Learnable Weights

Rosenblatt's insight: make the weights adjustable. Each input is multiplied by a weight, the products are summed, and a threshold determines the output.

Now the question becomes: which weights produce the right answers?

The Training Data

Give the perceptron labeled examples: points that belong to class 1 and points that belong to class 0. The perceptron must find a line that separates them.

This is linear classification — one of the oldest problems in machine learning.

The Learning Rule

When the perceptron makes a mistake, adjust the weights: w ← w + α · (y − y') · x. If it predicted 0 but the answer is 1, nudge the boundary toward the misclassified point.

Weights change only when errors occur. Correct predictions leave them untouched.
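The rule fits in a few lines of Python. This is a sketch, not Rosenblatt's original implementation; the AND dataset and learning rate are illustrative, and the bias is kept as a separate term:

```python
def train_perceptron(X, y, lr=0.1, epochs=100):
    """Rosenblatt's rule: on a mistake, w <- w + lr * (y - y_hat) * x."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, y):
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            if y_hat != target:                # correct predictions leave weights untouched
                delta = lr * (target - y_hat)  # +lr or -lr
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                errors += 1
        if errors == 0:                        # a separating line has been found
            break
    return w, b

# AND is linearly separable, so training converges to zero errors:
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
```

Note that the update direction comes entirely from the sign of (y − y'): a false negative pushes the weights toward the point, a false positive pushes them away.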

Convergence

Watch the epochs tick. Each pass through the data moves the boundary closer. Novikoff proved in 1962: if the data is linearly separable, the perceptron converges in at most R²/δ² updates.

The wider the margin, the faster it learns.

Perfect Classification

Zero errors. The perceptron has found a hyperplane that separates the two classes. No human told it the equation — it discovered the boundary through iterative weight adjustment.

This is what Rosenblatt demonstrated in 1958. A machine that learns by doing.

The XOR Problem

Now try a problem that isn't linearly separable. XOR: (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0. The positive points sit on opposite corners of a square.

No single line can separate them. The perceptron oscillates forever, never converging. Minsky and Papert proved this in 1969.
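The failure is easy to reproduce. A sketch using the same update rule (the epoch count and learning rate are arbitrary): trained on XOR, the per-epoch error count never reaches zero.

```python
# XOR truth table: not linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

w, b, lr = [0.0, 0.0], 0.0, 0.1
history = []
for _ in range(1000):
    errors = 0
    for x, target in zip(X, y):
        y_hat = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
        if y_hat != target:
            delta = lr * (target - y_hat)
            w = [w[0] + delta * x[0], w[1] + delta * x[1]]
            b += delta
            errors += 1
    history.append(errors)

print(min(history))  # never 0: the weights cycle instead of converging
```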

The Hidden Layer Solution

Add a hidden layer with two neurons. One computes OR, the other NAND. The output neuron computes AND. The hidden layer transforms the input space into a representation where XOR becomes linearly separable.

This is what backpropagation (1986) made trainable. The perceptron's principle, extended to depth.
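One hand-wired version of this network, with weights chosen by hand rather than learned (the specific weight values are illustrative — any weights implementing OR, NAND, and AND work):

```python
def step(s):
    return 1 if s >= 0 else 0

def xor(a, b):
    # Hidden layer: one unit computes OR, the other NAND.
    h_or   = step(a + b - 1)      # fires unless both inputs are 0
    h_nand = step(1.5 - a - b)    # fires unless both inputs are 1
    # Output unit: AND of the two hidden activations.
    return step(h_or + h_nand - 2)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

In the hidden layer's coordinates — (OR, NAND) — the four XOR points become linearly separable, so a single output neuron suffices.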

The Weight of History

Frank Rosenblatt and Marvin Minsky attended the same high school — the Bronx High School of Science, one year apart. They spent the next two decades on opposite sides of AI's deepest divide: can machines learn from examples, or must intelligence be programmed by hand?

Rosenblatt drowned on his 43rd birthday, July 11, 1971, before seeing the vindication of his ideas. Minsky lived until 2016, long enough to see neural networks dominate the field he once declared them unfit to enter.

From Neuron to Transformer

Key milestones in neural network history, from the McCulloch-Pitts neuron (1943) to the transformer architecture (2017). The gap between 1969 and 1986 — the first AI winter — is visible as a void.

Each node represents a foundational paper or breakthrough. Hover to see details.

The Conceptron

If the perceptron perceives — classifying inputs into categories — the Conceptron (Gardini, Cavalli, Decherchi, 2021) conceptualizes. It learns what "normal" looks like for a single class and flags anything outside that concept as anomalous.

Where Rosenblatt drew a boundary between two classes, the Conceptron builds a model of one class from the inside. The arc from 1958 to 2021: from "which side of the line?" to "what does normality mean?"

The Convergence Bound

The perceptron convergence theorem guarantees at most R²/δ² mistakes, where R is the data radius and δ is the margin. Wider margins mean faster learning.

Maximum mistakes as a function of margin width. The inverse-square relationship means doubling the margin cuts the worst-case mistake count by a factor of four.
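A quick numeric check of the inverse-square relationship, with illustrative values of R and δ:

```python
def mistake_bound(R, delta):
    """Novikoff's bound: at most (R / delta)^2 updates on separable data."""
    return (R / delta) ** 2

# Doubling the margin quarters the worst-case number of mistakes:
print(mistake_bound(1.0, 0.25))  # 16.0
print(mistake_bound(1.0, 0.5))   # 4.0
```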

400 Photocells, Billions of Parameters

The Mark I Perceptron had 400 cadmium sulfide photocells, 512 association units, and 8 output neurons. Its weights were stored in potentiometers turned by electric motors. It filled an entire room at the Cornell Aeronautical Laboratory.

GPT-4 is reported to have hundreds of billions of parameters. But the principle is the same: a weighted sum, a nonlinear threshold, and the patient adjustment of connections through experience.

Train Your Own Perceptron

Choose a dataset, set the learning rate, and watch the perceptron find (or fail to find) a decision boundary. The XOR preset shows the fundamental limitation that Minsky and Papert proved in 1969.

[Interactive demo: learning rate 0.10 · Status: Ready · Epoch: 0 · Errors: -- · Weights: -- · plots: Decision Boundary, Error Over Epochs]