Brains Don’t Do Matrix Multiplications

At the Google I/O developer conference, the company revealed the third generation of its TPU (Tensor Processing Unit), a high-performance device that helps AI developers train and run massive neural networks on even more massive data sets.

Source: CNBC

I really appreciate the impact these technologies have had on the progress of Machine Learning. The breakthrough results of recent years in computer vision, text and speech processing would be unthinkable without GPU and TPU processing. I experienced the massive time savings of GPU processing first hand on a project that involved a relatively large Fully Convolutional Network (FCN) for semantic segmentation of streamed images: a training run that might have taken a whole day on my quad-core MacBook Pro finished in about two hours on a p2 server on AWS using one NVIDIA K80 GPU.

The problem I have with the current approach is that all the effort seems to be channeled into delivering faster and bigger number-crunching tools. This stems from the fact that progress in Machine Learning is driven by mathematical models that ultimately reduce to performing massive matrix operations. Open any new book on Machine Learning, or read any new paper describing a new model, and it is full of mathematical formulae, derivatives and probability distributions. But, and I want to stress this, so I will put it on a separate line:

Brains don’t do matrix multiplications

The mathematical representations that we produce are a remarkable feat and a great way to understand natural phenomena, but they are not the phenomena themselves. A ball that is kicked and flies through the air does not integrate its acceleration to determine its position in space; we do that when we want to explain, and maybe predict, the trajectory. But when we are on the other side of the ball and need to hit it, we use a far better tool: our brain. Our brains have evolved to represent the world around us, and the structure of our cortex fits the structure of reality like a glove. We literally feel the trajectory.
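To make the "integration of acceleration" concrete, here is a minimal sketch of what we do on paper (or in silicon) when we predict where a kicked ball will land. It is an illustration only: the function names, the launch velocities and the time step are my own assumptions, and air drag and spin are ignored.

```python
def simulate_ball(v0x, v0y, g=9.81, dt=0.001):
    """Integrate constant gravitational acceleration step by step.

    Hypothetical helper for illustration: starts at the origin and
    returns (flight_time, horizontal_range) once the ball is back
    at ground level. Air drag and spin are ignored.
    """
    x, y = 0.0, 0.0
    vx, vy = v0x, v0y
    t = 0.0
    while True:
        vy -= g * dt      # acceleration updates velocity
        x += vx * dt      # velocity updates position
        y += vy * dt
        t += dt
        if y <= 0.0:      # ball has returned to the ground
            return t, x


def closed_form_range(v0x, v0y, g=9.81):
    """Textbook range of a drag-free projectile launched from the ground."""
    return 2.0 * v0x * v0y / g


if __name__ == "__main__":
    flight_time, landing_x = simulate_ball(10.0, 10.0)
    print(f"numerical: lands at {landing_x:.2f} m after {flight_time:.2f} s")
    print(f"closed form: lands at {closed_form_range(10.0, 10.0):.2f} m")
```

The loop runs a couple of thousand multiply-and-add steps to answer a question that a player on the field answers without doing any arithmetic at all.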

So, what’s wrong with the current “forceful” approach of number crunching and math-based models for Machine Learning? It is utterly unsustainable and unscalable for building large-scale Artificial Intelligence. The human brain, although by far the most power-hungry organ in the body on a per-mass basis, needs about 20W to function, as much as a 20W light bulb. The boards produced by NVIDIA and Google consume somewhere above 200W (frankly, the specifications for the TPU boards are not public, but look at the water cooling pipes coming out of the chips in the new versions to get an idea of how badly they fare from an efficiency perspective compared with a brain). And what you can run on one of those boards, no matter how remarkable it looks from the perspective of the ML specialist, pales in comparison to the feats of the brain.
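Power (watts) and energy (watt-hours) are easy to mix up, so here is a back-of-the-envelope sketch of what the two figures above mean over a full day. The 200W value is only the lower bound quoted above, not a measured specification, and the constant and function names are hypothetical.

```python
BRAIN_POWER_W = 20.0    # approximate brain power draw quoted in the text
BOARD_POWER_W = 200.0   # assumed lower bound for one accelerator board


def daily_energy_wh(power_w, hours=24.0):
    """Energy in watt-hours used by a constant load over the given hours."""
    return power_w * hours


if __name__ == "__main__":
    brain_wh = daily_energy_wh(BRAIN_POWER_W)
    board_wh = daily_energy_wh(BOARD_POWER_W)
    print(f"brain: {brain_wh:.0f} Wh per day")
    print(f"board: {board_wh:.0f} Wh per day")
    print(f"ratio: {board_wh / brain_wh:.0f}x")
```

Under these assumptions a single board uses at least ten times the energy of a brain every day, before counting the host machine and the cooling around it.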

Think about all the hardware that sits in a self-driving car to help it identify obstacles and navigate in 2D. Now compare that with the brain of a fly, which does the same thing but in 3D, and I hope you get my point. Think about how much energy this fly needs to get through the day compared with the energy sucked up by the car’s “brains”. From an evolutionary perspective these cars (if I can refer to them as autonomous, maybe “intelligent” entities) will either need to evolve or will go extinct.

We now live in a short period of luxury in our history when we can ignore the massive inefficiencies in the technologies we create, because we’re benefitting from millions of years of energy accumulated in fossil fuels. What will happen when those are exhausted?

There is also a practical implication of these devices: due to their bulky nature and high power needs, it is very difficult to use them in actual robots. NVIDIA Jetson is a timid step in this direction, but it trades computational power for portability and lower power consumption, so it’s not really a paradigm change.

What we need are completely different tools that break away from the currently established processing paradigms, including the Von Neumann architecture, and learn from the millions of years of trial and error that nature put into the development of brains. It’s very disappointing to see exciting projects like IBM’s Synapse slowly fading into obscurity despite their huge potential (what is going on with IBM?). Next month I will be at the Nengo Summer School at the University of Waterloo in Canada, where we’ll have the chance to get access, for the first time, to Braindrop, a new mixed analog-digital neuromorphic chip developed in collaboration with Stanford and Yale. We are all excited about the brand new opportunities this type of chip can bring to mobile robotics and Artificial Intelligence in general.

This is the direction that we need to pursue if we really want to have AI that will not end up competing directly with us humans for resources. It’s all up to us what kind of AI we’re building.