Written by: Stephen Hsu
Primary Source: Information Processing
The recent flourishing of deep neural nets is not primarily due to theoretical advances, but rather the appearance of GPUs and large training data sets.
Chronicle: … Hinton has always bucked authority, so it might not be surprising that, in the early 1980s, he found a home as a postdoc in California, under the guidance of two psychologists, David E. Rumelhart and James L. McClelland, at the University of California at San Diego. “In California,” Hinton says, “they had the view that there could be more than one idea that was interesting.” Hinton, in turn, gave them a uniquely computational mind. “We thought Geoff was remarkably insightful,” McClelland says. “He would say things that would open vast new worlds.”
They held weekly meetings in a snug conference room, coffee percolating at the back, to find a way of training their error correction back through multiple layers. Francis Crick, who co-discovered DNA’s structure, heard about their work and insisted on attending, his tall frame dominating the room even as he sat on a low-slung couch. “I thought of him like the fish in The Cat in the Hat,” McClelland says, lecturing them about whether their ideas were biologically plausible.
The group was too hung up on biology, Hinton said. So what if neurons couldn’t send signals backward? They couldn’t slavishly recreate the brain. This was a math problem, he said, what’s known as getting the gradient of a loss function. They realized that their neurons couldn’t be on-off switches. If you picture the calculus of the network like a desert landscape, their neurons were like drops off a sheer cliff; traffic went only one way. If they treated them like a more gentle mesa—a sigmoidal function—then the neurons would still mostly act as a threshold, but information could climb back up.
A decade ago, Hinton, LeCun, and Bengio conspired to bring them back. Neural nets had a particular advantage compared with their peers: While they could be trained to recognize new objects—supervised learning, as it’s called—they should also be able to identify patterns on their own, much like a child, if left alone, would figure out the difference between a sphere and a cube before its parent says, “This is a cube.” If they could get unsupervised learning to work, the researchers thought, everyone would come back. By 2006, Hinton had a paper out on “deep belief networks,” which could run many layers deep and learn rudimentary features on their own, improved by training only near the end. They started calling these artificial neural networks by a new name: “deep learning.” The rebrand was on.
Before they won over the world, however, the world came back to them. That same year, a different type of computer chip, the graphics processing unit, became more powerful, and Hinton’s students found it to be perfect for the punishing demands of deep learning. Neural nets got 30 times faster overnight. Google and Facebook began to pile up hoards of data about their users, and it became easier to run programs across a huge web of computers. One of Hinton’s students interned at Google and imported Hinton’s speech recognition into its system. It was an instant success, outperforming voice-recognition algorithms that had been tweaked for decades. Google began moving all its Android phones over to Hinton’s software.
It was a stunning result. These neural nets were little different from what existed in the 1980s. This was simple supervised learning. It didn’t even require Hinton’s 2006 breakthrough. It just turned out that no other algorithm scaled up like these nets. “Retrospectively, it was a just a question of the amount of data and the amount of computations,” Hinton says. …