NEURAL COMPUTING
THE DEVIL IS IN THE DETAIL OF DEEP LEARNING HARDWARE
BY JAMES MORRA
To identify skin cancer, perceive human speech, and run other deep learning tasks, chipmakers are adapting processors to work with lower-precision numbers. These numbers contain fewer bits than higher-precision ones, which demand “heavier lifting” from computers.
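The bit-count difference is easy to see in code. A minimal sketch using only Python's standard-library struct module (the value 3.14159 is just an illustrative choice) packs the same number at single and half precision:

```python
import struct

# Pack the same value at single (32-bit) and half (16-bit) precision.
fp32_bytes = struct.pack("<f", 3.14159)
fp16_bytes = struct.pack("<e", 3.14159)

print(len(fp32_bytes))  # 4 bytes
print(len(fp16_bytes))  # 2 bytes

# The smaller number carries less detail: unpacking the half-precision
# bytes recovers only an approximation of the original value.
print(struct.unpack("<e", fp16_bytes)[0])  # 3.140625
```

Half the storage, half the memory traffic — but the last digits of the value are gone, which is exactly the trade chipmakers are weighing.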
Intel’s Nervana unit plans to release a special processor before the end of 2017 that trains neural networks faster than other architectures. But in addition to improving memory and interconnects, Intel created a new way of formatting numbers for lower-precision maths. The numbers occupy fewer bits, so the hardware can use less silicon, less computing power, and less electricity.
Intel’s numerology is an example of the dull and yet strangely elegant ways that chip companies are coming to grips with deep learning. It is still unclear whether ASICs, FPGAs, CPUs, GPUs, or other chips will be best at handling calculations the way the human brain does. But every chip appears to be using lower-precision maths to get the job done.
Still, companies pay a surcharge for using
numbers with less detail. “You are giving up
something, but the question is whether it’s
significant or not,” said Paulius Micikevicius,
principal engineer in Nvidia’s computer
architecture and deep learning research group.
“At some point you start losing accuracy, and
people start playing games to recover it.”
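One of those games is gradient (or loss) scaling: in half precision, very small gradient values round away to zero, so training code multiplies them by a constant before the cast and divides it back out afterwards. A minimal standard-library Python sketch of the idea (illustrative only, not Nvidia's implementation; the scale factor 1024 is an arbitrary choice):

```python
import struct

def to_half(x):
    # Round-trip a Python float through IEEE 754 half precision.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def rescued_gradients(gradients, scale=1024.0):
    # Scale tiny gradients up so they survive the cast to half
    # precision, then unscale them for the weight update.
    survived = [to_half(g * scale) for g in gradients]
    return [g / scale for g in survived]

tiny = 1e-8  # below fp16's smallest subnormal (~6e-8)
print(to_half(tiny))                 # 0.0 -- lost without scaling
print(rescued_gradients([tiny])[0])  # ~1e-8 -- recovered with scaling
```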
Shedding precision is nothing new, he said. For over five years, oil and gas companies have stored drilling and geological data as half-precision numbers – 16-bit floating point – and run calculations in single precision – 32-bit floating point – on Nvidia’s graphics chips, which are the current gold standard for training and running deep learning.
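That storage/compute split exists because a half-precision accumulator quickly runs out of resolution. A standard-library Python sketch (the 0.001 increment and 5,000-step loop are arbitrary illustration values) shows a running sum stalling in 16-bit arithmetic while a wider accumulator keeps counting:

```python
import struct

def to_half(x):
    # Round-trip through IEEE 754 half precision (16-bit float).
    return struct.unpack("<e", struct.pack("<e", x))[0]

increment = to_half(0.001)
half_total = 0.0    # accumulator kept in half precision
single_total = 0.0  # accumulator kept in wider precision

for _ in range(5000):
    half_total = to_half(half_total + increment)
    single_total += increment

# With only ~10 mantissa bits, the half-precision running total
# stalls once each tiny addition rounds away to nothing.
print(half_total)    # stalls near 4.0
print(single_total)  # ~5.002
```

Storing data in half precision saves memory; doing the arithmetic in single precision avoids this kind of silent error accumulation.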
In recent years, Nvidia has adapted its graphics chips to reduce the computing power wasted in training deep learning programs. Its older Pascal architecture performs 16-bit maths at twice the throughput of 32-bit operations. Its latest Volta architecture runs 16-bit operations inside custom tensor cores, which speedily move data through the layers of a neural network.
Intel’s new format, FlexPoint, maximizes the precision that can be stored in 16 bits. It can represent a slightly wider range of numbers than traditional fixed-point formats, which can be handled with less computing power and memory. But it appears to provide less flexibility than the floating-point numbers commonly used with neural networks.
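The article describes FlexPoint only at a high level, but the family it resembles — fixed-point mantissas that share one adjustable exponent per tensor — can be sketched in standard-library Python. Everything below (16-bit signed mantissas, one power-of-two scale chosen from the largest magnitude) is an illustrative assumption, not Intel's specification:

```python
import math

def quantize_shared_exponent(values, mantissa_bits=16):
    # Encode a tensor as signed fixed-point mantissas that all share
    # one power-of-two scale, chosen so the largest magnitude fits.
    # Generic block floating-point sketch, not Intel's actual format.
    max_mag = max(abs(v) for v in values)
    exponent = math.frexp(max_mag)[1]          # max_mag < 2**exponent
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    lo, hi = -(1 << (mantissa_bits - 1)), (1 << (mantissa_bits - 1)) - 1
    mantissas = [max(lo, min(hi, int(round(v / scale)))) for v in values]
    return mantissas, scale

def dequantize(mantissas, scale):
    return [m * scale for m in mantissas]

weights = [0.75, -0.5, 0.125]
mantissas, scale = quantize_shared_exponent(weights)
print(dequantize(mantissas, scale))  # recovers [0.75, -0.5, 0.125]
```

Because every value in the tensor shares the exponent, the hardware only needs cheap integer arithmetic on the mantissas — the appeal of fixed point — while the adjustable scale recovers some of floating point's range.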
Different parts of deep learning need different levels of precision. Training entails going through, for example, thousands of photographs without explicit programming. An algorithm automatically adjusts millions of connections between the layers of the neural network.
36 DesignNews NOVEMBER 2017 www.eedesignnewseurope.com