NVIDIA Tesla P4 and P40 GPUs for deep learning inferencing announced



At GTC China, NVIDIA announced the Tesla P4 and P40 GPU accelerators, which offer large gains in efficiency and speed for the production inferencing workloads behind artificial intelligence services.


The NVIDIA Tesla P4 is designed to meet the density and power efficiency requirements of modern data centers, while the NVIDIA Tesla P40 accelerator is engineered to deliver the highest throughput for scale-up servers, where performance matters most.

“Tesla P4 provides high performance on a tight power budget, and it is small enough to fit in any server (PCIe half-height, half-length). Tesla P4 is 40x more efficient in terms of AlexNet images per second per watt than an Intel Xeon E5 CPU, and 8x more efficient than an Arria 10-115 FPGA,” said NVIDIA.

Tesla P4’s Pascal GP104 GPU provides high floating point throughput and efficiency and features optimized instructions aimed at deep learning inference computations. Specifically, the new IDP2A and IDP4A instructions provide 8-bit integer (INT8) 2- and 4-element vector dot product computations with 32-bit integer accumulation.
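To make the 4-element case concrete, here is a minimal Python sketch that emulates an INT8 dot product with 32-bit integer accumulation. The function names (`to_int8`, `to_int32`, `dp4a`, `int8_dot`) are hypothetical, and the wrap-around behavior on overflow is an assumption made for illustration; this is not NVIDIA's implementation, only the arithmetic the paragraph describes.

```python
def to_int8(x):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    x &= 0xFF
    return x - 256 if x >= 128 else x


def to_int32(x):
    """Wrap an integer into the signed 32-bit range (assumed overflow behavior)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def dp4a(a, b, acc):
    """Emulate a 4-element INT8 dot product added to a 32-bit accumulator."""
    assert len(a) == 4 and len(b) == 4
    total = acc
    for x, y in zip(a, b):
        # Each product of two INT8 values fits comfortably in 32 bits.
        total += to_int8(x) * to_int8(y)
    return to_int32(total)


def int8_dot(a, b):
    """Dot product of two equal-length INT8 vectors, four elements per step."""
    assert len(a) == len(b) and len(a) % 4 == 0
    acc = 0
    for i in range(0, len(a), 4):
        acc = dp4a(a[i:i + 4], b[i:i + 4], acc)
    return acc


if __name__ == "__main__":
    print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], 10))
    print(int8_dot([127] * 8, [127] * 8))
```

The wide accumulator is the point of the design: a single INT8 product can reach 16384 in magnitude, so summing many of them in 8 bits would overflow almost immediately, while a 32-bit accumulator leaves room for long dot products.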

Tesla P40 has 3840 CUDA cores with a peak FP32 throughput of 12 TFLOP/s and, like its little brother the P4, also accelerates INT8 vector dot products (IDP2A/IDP4A instructions), with a peak throughput of 47.0 INT8 TOP/s.
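The two peak figures can be sanity-checked with simple arithmetic. The sketch below assumes that each CUDA core retires one fused multiply-add per clock (2 floating point operations), that a 4-element INT8 dot product with accumulate counts as 8 integer operations issued at the same per-core rate, and that peak numbers are quoted at the 1531 MHz boost clock listed in the specification table. These are assumptions for illustration, and `peak_tera_ops` is a hypothetical helper.

```python
def peak_tera_ops(cores, clock_mhz, ops_per_core_per_clock):
    """Peak throughput in tera-operations per second."""
    return cores * clock_mhz * 1e6 * ops_per_core_per_clock / 1e12


P40_CORES = 3840        # from the article
P40_BOOST_MHZ = 1531    # boost clock from the specification table

# Assumption: one FMA per core per clock = 2 FP32 operations.
fp32 = peak_tera_ops(P40_CORES, P40_BOOST_MHZ, 2)

# Assumption: one 4-element INT8 dot product with accumulate per core per
# clock = 4 multiplies + 4 adds = 8 integer operations.
int8 = peak_tera_ops(P40_CORES, P40_BOOST_MHZ, 8)

if __name__ == "__main__":
    print(f"FP32: {fp32:.2f} TFLOP/s")
    print(f"INT8: {int8:.2f} TOP/s")
```

Under these assumptions the results come out at roughly 11.76 TFLOP/s and 47.03 TOP/s, consistent with the rounded 12 TFLOP/s and 47.0 TOP/s quoted above.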

Tesla P40 provides speedups on deep learning inference performance of up to 4x compared to the previous generation M40. NVIDIA also introduced two software innovations to accelerate AI inferencing: NVIDIA TensorRT and the NVIDIA DeepStream SDK.

| Tesla Accelerator | Tesla M4 | Tesla P4 | Tesla M40 | Tesla P40 |
|---|---|---|---|---|
| GPU | Maxwell GM206 | Pascal GP104 | Maxwell GM200 | Pascal GP102 |
| SMs | 8 | 20 | 24 | 30 |
| FP32 CUDA Cores / SM | 128 | 128 | 128 | 128 |
| FP32 CUDA Cores / GPU | 1024 | 2560 | 3072 | 3840 |
| Base Clock | 872 MHz | 810 MHz | 948 MHz | 1303 MHz |
| GPU Boost Clock | 1072 MHz | 1063 MHz | 1114 MHz | 1531 MHz |
| Memory Interface | 128-bit GDDR5 | 256-bit GDDR5 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Bandwidth | 88 GB/s | 192 GB/s | 288 GB/s | 346 GB/s |
| Memory Size | 4 GB | 8 GB | 12/24 GB | 24 GB |
| TDP | 50/75 W | 75 W (50 W option) | 250 W | 250 W |
| Transistors | 2.9 billion | 7.2 billion | 8 billion | 12 billion |
| GPU Die Size | 227 mm² | 314 mm² | 601 mm² | 471 mm² |
| Manufacturing Process | 28-nm | 16-nm | 28-nm | 16-nm |

The NVIDIA Tesla P4 and P40 will be available in November and October, respectively. Pricing will be revealed when they go on sale.


Author: Srivatsan Sridhar

Srivatsan Sridhar is a mobile technology enthusiast who is passionate about mobile phones and mobile apps. He uses the phones he reviews as his main phone. You can follow him on Twitter and Instagram.