Recursive Cortical Networks: data efficient computer vision

Written by: Stephen Hsu

Primary Source: Information Processing

Will knowledge from neuroscience inform the design of better AIs (neural nets)? These results from startup Vicarious AI suggest that the answer is yes! (See also this company blog post describing the research.)

It has often been remarked that evolved biological systems (e.g., a baby) can learn much faster, and from much less data, than existing artificial neural nets. Significant improvements in AI are almost certainly within reach…

Thanks to reader and former UO Physics colleague Raghuveer Parthasarathy for a pointer to this paper!

A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs

Science 08 Dec 2017: Vol. 358, Issue 6368, eaag2612
DOI: 10.1126/science.aag2612

Compositionality, generalization, and learning from a few examples are among the hallmarks of human intelligence. CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), images used by websites to block automated interactions, are examples of problems that are easy for people but difficult for computers. CAPTCHAs add clutter and crowd letters together to create a chicken-and-egg problem for algorithmic classifiers—the classifiers work well for characters that have been segmented out, but segmenting requires an understanding of the characters, which may be rendered in a combinatorial number of ways. CAPTCHAs also demonstrate human data efficiency: A recent deep-learning approach for parsing one specific CAPTCHA style required millions of labeled examples, whereas humans solve new styles without explicit training.

By drawing inspiration from systems neuroscience, we introduce recursive cortical network (RCN), a probabilistic generative model for vision in which message-passing–based inference handles recognition, segmentation, and reasoning in a unified manner. RCN learns with very little training data and fundamentally breaks the defense of modern text-based CAPTCHAs by generatively segmenting characters. In addition, RCN outperforms deep neural networks on a variety of benchmarks while being orders of magnitude more data-efficient.

Modern deep neural networks resemble the feed-forward hierarchy of simple and complex cells in the neocortex. Neuroscience has postulated computational roles for lateral and feedback connections, segregated contour and surface representations, and border-ownership coding observed in the visual cortex, yet these features are not commonly used by deep neural nets. We hypothesized that systematically incorporating these findings into a new model could lead to higher data efficiency and generalization. Structured probabilistic models provide a natural framework for incorporating prior knowledge, and belief propagation (BP) is an inference algorithm that can match the cortical computational speed. The representational choices in RCN were determined by investigating the computational underpinnings of neuroscience data under the constraint that accurate inference should be possible using BP.
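For readers who have not met belief propagation before, here is a minimal sketch of the general idea in Python. It is not the RCN model and not the paper's inference procedure: it runs sum-product message passing on a toy chain of discrete variables, where BP happens to be exact, and checks the result against brute-force enumeration. The function names (chain_marginals, brute_force_marginals) and the example potentials are made up for illustration.

```python
import itertools
import numpy as np


def chain_marginals(unary, pairwise):
    """Sum-product belief propagation on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i]    : nonnegative vector of potentials for x_i
    pairwise[i] : nonnegative matrix; entry [a, b] couples x_i = a with x_{i+1} = b
    Returns a list of normalized marginals, one vector per variable.
    (Illustrative toy only; not the RCN inference schedule.)
    """
    n = len(unary)
    # fwd[i] is the message arriving at node i from its left neighbour,
    # bwd[i] is the message arriving at node i from its right neighbour.
    fwd = [np.ones(len(u)) for u in unary]
    bwd = [np.ones(len(u)) for u in unary]

    # Left-to-right pass: sum out x_{i-1}.
    for i in range(1, n):
        m = (fwd[i - 1] * np.asarray(unary[i - 1])) @ np.asarray(pairwise[i - 1])
        fwd[i] = m / m.sum()  # normalize for numerical stability

    # Right-to-left pass: sum out x_{i+1}.
    for i in range(n - 2, -1, -1):
        m = np.asarray(pairwise[i]) @ (bwd[i + 1] * np.asarray(unary[i + 1]))
        bwd[i] = m / m.sum()

    # Belief at each node: local evidence times both incoming messages.
    beliefs = [fwd[i] * np.asarray(unary[i]) * bwd[i] for i in range(n)]
    return [b / b.sum() for b in beliefs]


def brute_force_marginals(unary, pairwise):
    """Reference answer: enumerate every joint state (exponential cost)."""
    n = len(unary)
    shape = [len(u) for u in unary]
    joint = np.zeros(shape)
    for idx in itertools.product(*[range(s) for s in shape]):
        p = 1.0
        for i in range(n):
            p *= unary[i][idx[i]]
        for i in range(n - 1):
            p *= pairwise[i][idx[i], idx[i + 1]]
        joint[idx] = p
    joint /= joint.sum()
    return [joint.sum(axis=tuple(j for j in range(n) if j != i)) for i in range(n)]


# Hypothetical example: three binary variables that prefer to agree with
# their neighbours, with evidence favouring state 1 at the first variable.
example_unary = [np.array([0.2, 0.8]), np.array([0.5, 0.5]), np.array([0.5, 0.5])]
agree = np.array([[0.9, 0.1], [0.1, 0.9]])
example_marginals = chain_marginals(example_unary, [agree, agree])
```

The point of the sketch is the cost profile: the two passes touch each edge once, whereas the brute-force check grows exponentially with the number of variables. On graphs with loops the same local updates are only approximate, which is presumably why the abstract stresses choosing representations under which BP remains accurate.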

RCN was effective in breaking a wide variety of CAPTCHAs with very little training data and without using CAPTCHA-specific heuristics. By comparison, a convolutional neural network required a 50,000-fold larger training set and was less robust to perturbations to the input. Similar results are shown on one- and few-shot MNIST (modified National Institute of Standards and Technology handwritten digit data set) classification, where RCN was significantly more robust to clutter introduced during testing. As a generative model, RCN outperformed neural network models when tested on noisy and cluttered examples and generated realistic samples from one-shot training of handwritten characters. RCN also proved to be effective at an occlusion reasoning task that required identifying the precise relationships between characters at multiple points of overlap. On a standard benchmark for parsing text in natural scenes, RCN outperformed state-of-the-art deep-learning methods while requiring 300-fold less training data.

Our work demonstrates that structured probabilistic models that incorporate inductive biases from neuroscience can lead to robust, generalizable machine learning models that learn with high data efficiency. In addition, our model’s effectiveness in breaking text-based CAPTCHAs with very little training data suggests that websites should seek more robust mechanisms for detecting automated interactions.

Stephen Hsu
Stephen Hsu is vice president for Research and Graduate Studies at Michigan State University. He also serves as scientific adviser to BGI (formerly Beijing Genomics Institute) and as a member of its Cognitive Genomics Lab. Hsu’s primary work has been in applications of quantum field theory, particularly to problems in quantum chromodynamics, dark energy, black holes, entropy bounds, and particle physics beyond the standard model. He has also made contributions to genomics and bioinformatics, the theory of modern finance, and in encryption and information security. Founder of two Silicon Valley companies—SafeWeb, a pioneer in SSL VPN (Secure Sockets Layer Virtual Private Networks) appliances, which was acquired by Symantec in 2003, and Robot Genius Inc., which developed anti-malware technologies—Hsu has given invited research seminars and colloquia at leading research universities and laboratories around the world.