NVIDIA is expanding its Pascal™ architecture-based deep learning platform with the introduction of the new Tesla P4 and P40 GPU accelerators and new software. The solution is aimed at accelerating inference workloads in production for artificial intelligence services, such as voice-activated assistance, email spam filtering, and movie and product recommendation engines.
NVIDIA said its GPUs are better suited to these tasks than current CPU-based technology, which cannot deliver real-time responsiveness. The Tesla P4 and P40 are designed specifically for inference, which uses trained deep neural networks to recognize speech, images, or text in response to queries from users and devices.
Based on the Pascal architecture, these GPUs feature specialized inference instructions based on 8-bit (INT8) operations, delivering 45x faster response than CPUs and a 4x improvement over GPU solutions launched less than a year ago.
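The INT8 speedup comes from quantization: float32 network weights and activations are mapped to 8-bit integers before the matrix math runs. As a rough illustration only (not NVIDIA's implementation), a minimal symmetric quantization scheme looks like this; the function names and the max-abs scaling rule are illustrative assumptions, as real frameworks use calibrated scales per layer:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric scheme: pick a scale so the largest magnitude maps to 127,
    # then round each value to the nearest representable int8.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float32 values.
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.3, -0.07], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# x_hat matches x to within half a quantization step (scale / 2)
```

The win is that the heavy arithmetic can then run on 8-bit integer units, which pack roughly four times as many operations per cycle as 32-bit float units, at the cost of a small, bounded rounding error.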
With 47 tera-operations per second (TOPS) of inference performance with INT8 instructions, a server with eight Tesla P40 accelerators can replace the performance of more than 140 CPU servers. At approximately $5,000 per CPU server, this results in savings of more than $650,000 in server acquisition cost.
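A quick back-of-the-envelope check of the figures quoted above; the numbers come from the text itself and are not independently verified, and the reading of 47 TOPS as a per-accelerator figure is an assumption:

```python
# Per-accelerator INT8 throughput, read from the text (assumed per-GPU).
p40_int8_tops = 47
gpus_per_server = 8
server_int8_tops = p40_int8_tops * gpus_per_server  # aggregate for one server

# Cost side of the comparison, using the article's figures.
cpu_servers_replaced = 140
cost_per_cpu_server = 5_000           # "approximately $5,000" per the text
cpu_fleet_cost = cpu_servers_replaced * cost_per_cpu_server

print(server_int8_tops)  # 376 TOPS aggregate for the 8-GPU server
print(cpu_fleet_cost)    # 700000
```

The $700,000 CPU-fleet cost is consistent with the claimed savings of "more than $650,000": the gap of under $50,000 is the implied budget for the GPU server itself.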
"With the Tesla P100 and now Tesla P4 and P40, NVIDIA offers the only end-to-end deep learning platform for the data center, unlocking the enormous power of AI for a broad range of industries," said Ian Buck, general manager of accelerated computing at NVIDIA. "They slash training time from days to hours. They enable insight to be extracted instantly. And they produce real-time responses for consumers from AI-powered services."