Figure 3. SK Telecom diagram: latency vs. number of channels.

SK Telecom is Korea’s largest telecommunications company. With more than 29 million mobile subscribers, around 50 percent of the total market, the company is developing a portfolio of AI-based services including NUGU, the first digital home assistant to work in the Korean language. NUGU already provides music and smart-home support, information on demand, smartphone-location tracking, and diary assistance. More features are planned, including open APIs to let third-party developers enter the ecosystem.

NUGU incorporates SK Telecom’s expertise in artificial intelligence, speech recognition, and natural language processing. Capable of understanding voice tones, accents, and dialects, it achieves a very high voice-recognition rate. With SK Telecom’s natural language processing engine at its heart, it can interpret the user’s wishes accurately and interact by voice.

“In addition to ultra-reliable recognition, we knew that delivering the best possible user experience would depend on instant response to any query at any time,” says Lee Kang-Won, senior vice president and head of Software Labs at SK Telecom. “To achieve this, we set out to build AIX, our AI Inference Accelerator. It contains multiple custom NPUs that we optimised for low-latency voice recognition and implemented on Xilinx FPGAs for flexibility and a fast time to market.”

Inference is the work a neural network does once it has been trained and deployed. Deploying trained models for inference has become one of the most important challenges to the commercialization of AI. Whereas the tools for training neural networks are more affordable and easier to use than ever, industry experts say the cost of deploying models for inference is now the largest contributor to infrastructure cost over time.
As far as performance is concerned, extremely low latency is critical for voice-based services that interact directly with human end-users, as SK Telecom’s Lee Kang-Won has observed. Consumers expect a natural and seamless experience, which calls for near real-time inference. Techniques for achieving this are still developing as more and more operators set out on the AI-deployment journey.

In contrast, much is already known about training neural networks. For this, large GPU arrays have become the platform of choice to handle the many exabits of data and teraflops of compute, but training is done offline and can be completed over days or weeks. When it comes to deployment, the application must deliver the expected performance within stringent latency and power-consumption requirements.

Xilinx has shown that FPGA accelerators deliver the real-time inferencing response needed for speech recognition and natural language interaction at much lower power consumption than is possible using GPUs. An ASIC-based inferencing engine could also combine low latency with low power consumption, but FPGA accelerators add the advantage of reconfigurability, so the latest machine-learning technologies can be adopted quickly as they continue to evolve.

The team at SK Telecom based its AIX on Xilinx KCU1500 datacentre accelerator cards containing Kintex UltraScale XCKU115 FPGAs. The AIX contains a large array of neural cores implemented in the DSP slices of the Kintex FPGAs to run the automatic speech-recognition (ASR) application at the heart of NUGU. The neural array and associated functions, including the weight feeder, tensor cache, and tensor controller (figure 1), create a high-performance Neural Processing Unit (NPU) that effectively contains tens of thousands of accelerators for inference, ultimately providing much greater parallelism than is possible with a GPU.
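To make the idea of an array of neural cores more concrete, the sketch below models a single network layer as a set of independent dot products, each built from multiply-accumulate (MAC) steps of the kind a DSP slice performs. It is only an illustration under stated assumptions: the function names `mac`, `neural_core`, and `neural_array` are hypothetical, the small integers stand in for fixed-point values, and nothing here reflects the actual internals of SK Telecom’s AIX.

```python
def mac(acc, weight, activation):
    """One multiply-accumulate step, the basic operation of a DSP slice."""
    return acc + weight * activation


def neural_core(weight_row, activations):
    """Hypothetical 'neural core': one dot product built from MAC steps."""
    acc = 0
    for w, a in zip(weight_row, activations):
        acc = mac(acc, w, a)
    return acc


def neural_array(weights, activations):
    """Hypothetical array of cores: each row of weights maps to its own core.

    The rows do not depend on each other, so hardware can evaluate them
    at the same time; this loop only models the result, not the timing.
    """
    return [neural_core(row, activations) for row in weights]


# Small integer weights and activations stand in for fixed-point values.
weights = [
    [1, -2, 3],
    [0, 4, -1],
    [2, 2, 2],
]
activations = [5, 1, -3]

print(neural_array(weights, activations))  # [-6, 7, 6]
```

Because no row needs the result of another, the amount of work that can proceed in parallel grows with the number of MAC units available, which is the property the article attributes to the NPU.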
By applying static and dynamic computation optimization, with pruning, quantization, and dynamic precision, SK Telecom’s engineers have ensured that over 95% of the FPGAs’ DSP cores are active on every cycle.

SK Telecom populated its existing CPU-only ASR servers with the KCU1500 PCIe Gen 3 x16 accelerator cards. The team’s own figures point to a 500% performance improvement when running multiple concurrent voice channels, compared with its experience of GPU-based accelerators (figures 2 and 3). Moreover, the cards consume less than one-third of the power, which translates into a 16-fold improvement in performance per watt. In addition, the reprogrammable nature of the Kintex UltraScale FPGAs gives SK Telecom the flexibility to adopt new and improved neural network architectures in the future, while at the same time delivering the solution within a tight timeframe.

Conclusion

Following the successful introduction of the AIX cards, SK Telecom’s project represents the first commercial adoption of FPGA accelerators in the AI domain for large-scale data centers in South Korea. The adaptive nature of the Kintex UltraScale FPGAs allows the team to continue developing new and improved custom hardware accelerators, keeping pace with the state of the art in AI and deep learning.

www.xilinx.com

eeNews Europe, www.eenewsembedded.com, September 2019
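As a quick check on the performance-per-watt figure quoted in the article, the arithmetic can be worked through as follows. This is a sketch under one stated assumption: the “500% performance improvement” is read as five times the throughput of the GPU-based setup. The helper name `perf_per_watt_gain` is hypothetical, and the article does not give exact power measurements.

```python
from fractions import Fraction


def perf_per_watt_gain(throughput_ratio, power_ratio):
    """Relative performance per watt of system A over system B.

    throughput_ratio: A's throughput divided by B's.
    power_ratio: A's power draw divided by B's.
    """
    return Fraction(throughput_ratio) / Fraction(power_ratio)


# Assumption: "500% performance improvement" is read as 5x the throughput.
throughput_ratio = Fraction(5)

# At exactly one-third of the power, the gain would be 15x.
gain_at_one_third = perf_per_watt_gain(throughput_ratio, Fraction(1, 3))

# Power ratio implied by the quoted 16-fold gain: 5 / 16 = 0.3125,
# which is below one-third.
implied_power_ratio = throughput_ratio / 16

print(gain_at_one_third)           # 15
print(float(implied_power_ratio))  # 0.3125
```

Under that reading the quoted numbers agree with each other: a 16-fold gain corresponds to drawing about 31 percent of the GPU setup’s power, which is consistent with “less than one-third”.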