[Figure 3. SK Telecom diagram: latency vs. number of …]
SK Telecom is Korea’s largest telecommunications company.
With more than 29 million mobile subscribers, around 50
percent of the total market, the company is currently developing
a portfolio of AI-based services including NUGU, the first digital
home assistant to work in the Korean language. NUGU already
provides music and smart-home support, information on demand,
smartphone-location tracking, and diary assistance; more features
are planned for the future, including open APIs to let third-party
developers enter the ecosystem.
NUGU incorporates SK Telecom’s expertise in artificial intelligence,
speech recognition, and natural language processing.
Capable of understanding voice tones, accents, and dialects, it
achieves a very high voice-recognition rate. With SK Telecom’s
natural language processing engine at its heart, it can interpret
the user’s wishes accurately and interact by voice.
“In addition to ultra-reliable recognition, we knew that
delivering the best possible user experience would depend on
instant response to any query at any time,” says Lee Kang-Won,
senior vice president and head of Software Labs at SK Telecom.
“To achieve this, we set out to build AIX, our AI Inference Accelerator.
It contains multiple custom NPUs that we optimised for
low-latency voice recognition and implemented on Xilinx FPGAs
for flexibility and a fast time to market.”
Inference describes what a neural network does after it has
been trained and deployed. Deploying trained models
for inference has become one of the most important challenges
to the commercialization of AI: whereas the tools for training
neural networks are more affordable and easier to use than ever,
industry experts say the cost of deploying models for inference
is now the largest contributor to infrastructure cost over time.
As far as performance is concerned, extremely low latency
is critical for voice-based services that interact directly with human
end-users, as SK Telecom’s Lee Kang-Won has observed.
Consumers expect a natural and seamless experience, which
calls for near real-time inference. Techniques for achieving this
are still developing, as more and more operators set out on
the AI-deployment journey. In contrast, much is already known
about training neural networks. For this, large GPU arrays have
become the platform of choice to handle the many exabytes
of data and teraflops of compute required, but training is done offline
and can be completed over days or weeks. When it comes to
deployment, the application must deliver the expected performance
within stringent latency and power consumption requirements.
Xilinx has shown that FPGA accelerators deliver the real-time
inference response needed for speech recognition and
natural language interaction at much lower power consumption
than is possible using GPUs. And although an ASIC-based
inference engine could also combine low latency with low
power consumption, FPGA accelerators offer the added advantage
of reconfigurability, allowing the latest machine-learning
techniques to be adopted quickly as they continue to evolve.
The team at SK Telecom based its AIX on Xilinx KCU1500
datacentre accelerator cards containing Kintex UltraScale
XCKU115 FPGAs. The AIX contains a large array of neural cores
implemented in the DSP slices of Kintex FPGAs to run the
automatic speech-recognition (ASR) application at the heart of
the NUGU service. The neural array and associated functions including weight
feeder, tensor cache, and tensor controller (figure 1) create a
high-performance Neural Processing Unit (NPU) that effectively
contains tens of thousands of accelerators for inference: ultimately
providing much greater parallelism than is possible with
a GPU. By applying static and dynamic computation optimization,
with pruning, quantization, and dynamic precision, SK
Telecom’s engineers have ensured that over 95% of the FPGAs’
DSP cores are active on every cycle.
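The pruning and quantization techniques mentioned above can be illustrated in a few lines of NumPy. This is a minimal sketch of magnitude pruning plus linear 8-bit quantization on a toy weight matrix, not SK Telecom's actual pipeline; the matrix size, pruning ratio, and per-tensor scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # toy weight matrix

# Magnitude pruning: zero out the smallest 50% of weights so the
# hardware can skip the corresponding multiply-accumulates.
threshold = np.quantile(np.abs(w), 0.5)
w_pruned = np.where(np.abs(w) >= threshold, w, np.float32(0.0))

# Linear 8-bit quantization with a single per-tensor scale factor,
# trading a small accuracy loss for cheap fixed-point DSP arithmetic.
scale = float(np.abs(w_pruned).max()) / 127.0
w_q = np.clip(np.round(w_pruned / scale), -127, 127).astype(np.int8)

# Dequantize to measure the error the quantization step introduced.
w_dq = w_q.astype(np.float32) * scale
sparsity = float((w_q == 0).mean())
max_err = float(np.abs(w_dq - w_pruned).max())
print(f"sparsity={sparsity:.2f}  max quantization error={max_err:.4f}")
```

On real hardware the payoff is that zeroed weights need no compute at all, and int8 multiplies map directly onto FPGA DSP slices, which is what makes the high utilisation figure quoted above achievable.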
SK Telecom populated its existing CPU-only ASR servers
with the KCU1500 PCIe Gen 3 x16 accelerator cards. The
team’s own figures point to a 500 percent performance improvement
when running multiple concurrent voice channels, compared
with its experience with GPU-based accelerators (figures 2 and 3).
Moreover, consuming less than one-third of the power translates
into a 16-fold improvement in performance per Watt.
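The arithmetic behind that 16-fold figure can be checked directly. In this sketch the exact power reduction (3.2×) is an assumption, chosen only to be consistent with "less than one-third of the power" and the quoted result:

```python
# Back-of-envelope check of the reported gains (illustrative numbers only):
# roughly 5x the throughput at less than one-third of the power.
gpu_throughput = 1.0    # normalised GPU baseline
fpga_throughput = 5.0   # the quoted 500 percent performance improvement
fpga_power = 1.0 / 3.2  # "less than one-third" -- 3.2x lower assumed here

perf_per_watt_gain = (fpga_throughput / fpga_power) / (gpu_throughput / 1.0)
print(f"performance per watt: {perf_per_watt_gain:.0f}x better")
```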
In addition, the reprogrammable nature of the Kintex UltraScale
FPGAs gives SK Telecom the flexibility to adopt new and
improved neural network architectures in the future, while at the
same time delivering the solution within a tight timeframe.
Following the successful introduction of the AIX cards, SK Telecom’s
project represents the first commercial adoption of FPGA
accelerators in the AI domain for large-scale data centres in
South Korea. The adaptive nature of the Kintex UltraScale FPGAs
allows the team to continue developing new and improved
custom hardware accelerators, keeping pace with the state of
the art in AI and deep learning.
www.eenewsembedded.com · eeNews Europe Embedded · September 2019