NVIDIA Tesla P4 and P40 GPUs for deep learning inferencing announced


At GTC China, NVIDIA announced the Tesla P4 and P40 GPU accelerators, which offer massive leaps in efficiency and speed for production inferencing workloads in artificial intelligence services.


The NVIDIA Tesla P4 is designed to meet the density and power efficiency requirements of modern data centers, while the NVIDIA Tesla P40 accelerator is engineered to deliver the highest throughput for scale-up servers, where performance matters most.

“Tesla P4 provides high performance on a tight power budget, and it is small enough to fit in any server (PCIe half-height, half-length). Tesla P4 is 40x more efficient in terms of AlexNet images per second per watt than an Intel Xeon E5 CPU, and 8x more efficient than an Arria 10-115 FPGA,” said NVIDIA.

Tesla P4’s Pascal GP104 GPU provides high floating point throughput and efficiency and features optimized instructions aimed at deep learning inference computations. Specifically, the new IDP2A and IDP4A instructions provide 8-bit integer (INT8) 2- and 4-element vector dot product computations with 32-bit integer accumulation.
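To make the 4-element case concrete, here is a minimal Python sketch that emulates what such an instruction computes: four signed 8-bit values packed into each 32-bit operand, multiplied pairwise, and summed into a 32-bit accumulator. The function names (pack_int8x4, unpack_int8x4, wrap_int32, dp4a) are hypothetical helpers written for this illustration, and the wraparound on overflow is an assumption rather than documented hardware behavior. In CUDA C++, the same operation is exposed through the __dp4a intrinsic.

```python
import struct

def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]

def unpack_int8x4(word):
    """Split a 32-bit word back into four signed 8-bit integers."""
    return struct.unpack("<4b", struct.pack("<I", word & 0xFFFFFFFF))

def wrap_int32(x):
    """Wrap a Python int to signed 32-bit two's complement (assumed overflow behavior)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

def dp4a(a_word, b_word, acc):
    """Emulate a 4-element INT8 dot product with 32-bit integer accumulation."""
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    return wrap_int32(acc + sum(x * y for x, y in zip(a, b)))

a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
result = dp4a(a, b, 100)  # 100 + (5 - 12 - 21 - 32)
print(result)
```

Each call performs four multiplies and four adds on one pair of 32-bit registers, which is why INT8 inference can reach a multiple of the FP32 rate on the same hardware.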

Tesla P40 has 3840 CUDA cores with a peak FP32 throughput of 12 TFLOP/s. Like its little brother, the P4, the P40 also accelerates INT8 vector dot products (IDP2A/IDP4A instructions), with a peak throughput of 47 INT8 TOP/s.
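These peak figures can be roughly reproduced from the core count and the 1531 MHz boost clock listed in the spec table. The sketch below assumes each CUDA core retires one fused multiply-add (2 floating-point operations) or one 4-element INT8 dot product with accumulate (8 integer operations) per clock at boost frequency. That is a back-of-the-envelope model, not NVIDIA's stated methodology, and the function names are hypothetical.

```python
def peak_tflops_fp32(cuda_cores, boost_mhz):
    """Peak FP32 rate in TFLOP/s, assuming one FMA (2 ops) per core per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12

def peak_tops_int8(cuda_cores, boost_mhz):
    """Peak INT8 rate in TOP/s, assuming one 4-way dot product (8 ops) per core per clock."""
    return cuda_cores * 8 * boost_mhz * 1e6 / 1e12

p40_fp32 = peak_tflops_fp32(3840, 1531)
p40_int8 = peak_tops_int8(3840, 1531)
print(f"P40 FP32: {p40_fp32:.1f} TFLOP/s")
print(f"P40 INT8: {p40_int8:.1f} TOP/s")
```

Under these assumptions the INT8 rate is exactly four times the FP32 rate, which lines up with the quoted 12 TFLOP/s and 47 TOP/s once rounded.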

Tesla P40 speeds up deep learning inference by up to 4x compared with the previous-generation M40. NVIDIA also introduced two software innovations to accelerate AI inferencing: NVIDIA TensorRT and the NVIDIA DeepStream SDK.

Tesla Accelerator	Tesla M4	Tesla P4	Tesla M40	Tesla P40
GPU	Maxwell GM206	Pascal GP104	Maxwell GM200	Pascal GP102
SMs	8	20	24	30
FP32 CUDA Cores / SM	128	128	128	128
FP32 CUDA Cores / GPU	1024	2560	3072	3840
Base Clock	872 MHz	810 MHz	948 MHz	1303 MHz
GPU Boost Clock	1072 MHz	1063 MHz	1114 MHz	1531 MHz
Memory Interface	128-bit GDDR5	256-bit GDDR5	384-bit GDDR5	384-bit GDDR5
Memory Bandwidth	88 GB/s	192 GB/s	288 GB/s	346 GB/s
Memory Size	4 GB	8 GB	12/24 GB	24 GB
TDP	50/75 W	75 W (50 W option)	250 W	250 W
Transistors	2.9 billion	7.2 billion	8 billion	12 billion
GPU Die Size	227 mm²	314 mm²	601 mm²	471 mm²
Manufacturing Process	28-nm	16-nm	28-nm	16-nm
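A few rows of the table can be cross-checked against each other. The short Python sketch below verifies that SMs times 128 cores per SM matches the per-GPU core count, and derives the per-pin memory data rate implied by each bandwidth figure, using bandwidth = (bus width / 8) x data rate. The dictionary layout and variable names are made up for this illustration, and the derived data rates are inferred from the table rather than taken from NVIDIA's published specifications.

```python
# Values copied from the spec table above.
specs = {
    "Tesla M4":  {"sms": 8,  "cores": 1024, "bus_bits": 128, "bw_gbs": 88},
    "Tesla P4":  {"sms": 20, "cores": 2560, "bus_bits": 256, "bw_gbs": 192},
    "Tesla M40": {"sms": 24, "cores": 3072, "bus_bits": 384, "bw_gbs": 288},
    "Tesla P40": {"sms": 30, "cores": 3840, "bus_bits": 384, "bw_gbs": 346},
}

CORES_PER_SM = 128  # same for all four accelerators in the table

data_rates = {}
for name, s in specs.items():
    # Core count should equal SM count times cores per SM.
    cores_ok = s["sms"] * CORES_PER_SM == s["cores"]
    # Bytes moved per transfer across the whole bus.
    bytes_per_transfer = s["bus_bits"] / 8
    # Implied per-pin data rate in Gbit/s.
    data_rates[name] = s["bw_gbs"] / bytes_per_transfer
    print(f"{name}: cores consistent={cores_ok}, "
          f"implied data rate={data_rates[name]:.2f} Gbit/s per pin")
```

By this estimate, the P40's higher bandwidth over the M40 comes from faster memory on the same 384-bit interface rather than from a wider bus.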

The NVIDIA Tesla P4 and P40 will be available in November and October, respectively. Pricing will be revealed when they go on sale.

Author: Srivatsan Sridhar

Srivatsan Sridhar is a mobile technology enthusiast who is passionate about mobile phones and mobile apps. He uses the phones he reviews as his main phone. You can follow him on Twitter and Instagram.