At GTC China, NVIDIA announced the Tesla P4 and P40 GPU accelerators, which offer massive leaps in efficiency and speed for production inference workloads in artificial intelligence services.
The NVIDIA Tesla P4 is designed to meet the density and power efficiency requirements of modern data centers, while the NVIDIA Tesla P40 accelerator is engineered to deliver the highest throughput for scale-up servers, where performance matters most.
“Tesla P4 provides high performance on a tight power budget, and it is small enough to fit in any server (PCIe half-height, half-length). Tesla P4 is 40x more efficient in terms of AlexNet images per second per watt than an Intel Xeon E5 CPU, and 8x more efficient than an Arria 10-115 FPGA,” said NVIDIA.
Tesla P4’s Pascal GP104 GPU provides high floating point throughput and efficiency and features optimized instructions aimed at deep learning inference computations. Specifically, the new IDP2A and IDP4A instructions provide 8-bit integer (INT8) 2- and 4-element vector dot product computations with 32-bit integer accumulation.
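To make the 4-element case concrete, here is a minimal Python sketch that emulates what such an instruction computes: four signed 8-bit values packed into each 32-bit operand, multiplied pairwise, and summed into a 32-bit accumulator. The helper names (`pack_int8x4`, `unpack_int8x4`, `wrap_int32`, `dp4a`) are hypothetical, and the wraparound behavior on overflow is an assumption for illustration, not a statement about the hardware.

```python
import struct

def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]

def unpack_int8x4(word):
    """Split a 32-bit word back into four signed 8-bit integers."""
    return struct.unpack("<4b", struct.pack("<I", word & 0xFFFFFFFF))

def wrap_int32(x):
    """Reduce a Python int to a signed 32-bit value (assumed wraparound)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def dp4a(a_word, b_word, acc):
    """Emulate a 4-element INT8 dot product with 32-bit integer accumulation."""
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    return wrap_int32(acc + sum(x * y for x, y in zip(a, b)))

a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
result = dp4a(a, b, 10)  # 10 + (5 - 12 - 21 - 32)
print(result)
```

Each call performs four multiplies and four adds on data that fits in two 32-bit registers, which is why INT8 inference can run at a multiple of the FP32 rate on the same cores.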
Tesla P40 has 3840 CUDA cores with a peak FP32 throughput of 12 TeraFLOP/s, and like its little brother P4, P40 also accelerates INT8 vector dot products (IDP2A/IDP4A instructions), with a peak throughput of 47.0 INT8 TOP/s.
Tesla P40 provides speedups on deep learning inference performance of up to 4x compared to the previous generation M40. NVIDIA also introduced two software innovations to accelerate AI inferencing: NVIDIA TensorRT and the NVIDIA DeepStream SDK.
Tesla Accelerator | Tesla M4 | Tesla P4 | Tesla M40 | Tesla P40 |
GPU | Maxwell GM206 | Pascal GP104 | Maxwell GM200 | Pascal GP102 |
SMs | 8 | 20 | 24 | 30 |
FP32 CUDA Cores / SM | 128 | 128 | 128 | 128 |
FP32 CUDA Cores / GPU | 1024 | 2560 | 3072 | 3840 |
Base Clock | 872 MHz | 810 MHz | 948 MHz | 1303 MHz |
GPU Boost Clock | 1072 MHz | 1063 MHz | 1114 MHz | 1531 MHz |
Memory Interface | 128-bit GDDR5 | 256-bit GDDR5 | 384-bit GDDR5 | 384-bit GDDR5 |
Memory Bandwidth | 88 GB/s | 192 GB/s | 288 GB/s | 346 GB/s |
Memory Size | 4 GB | 8 GB | 12/24 GB | 24 GB |
TDP | 50/75 W | 75 W (50 W option) | 250 W | 250 W |
Transistors | 2.9 billion | 7.2 billion | 8 billion | 12 billion |
GPU Die Size | 227 mm² | 314 mm² | 601 mm² | 471 mm² |
Manufacturing Process | 28-nm | 16-nm | 28-nm | 16-nm |
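
The peak throughput figures quoted earlier can be roughly reproduced from the core counts and boost clocks in the table. The sketch below assumes each CUDA core retires one fused multiply-add per clock (2 FP32 operations) or one 4-element INT8 dot product with accumulate per clock (8 INT8 operations); these per-clock rates and the function name `peak_tera_ops` are assumptions made for illustration, not figures stated in the announcement.

```python
def peak_tera_ops(cuda_cores, boost_mhz, ops_per_core_per_clock):
    """Theoretical peak in tera-operations per second at the boost clock."""
    return cuda_cores * boost_mhz * 1e6 * ops_per_core_per_clock / 1e12

FMA_OPS = 2   # assumed: one multiply plus one add per core per clock
DP4A_OPS = 8  # assumed: four multiplies plus four adds per core per clock

# Core counts and boost clocks taken from the table above.
p40_fp32 = peak_tera_ops(3840, 1531, FMA_OPS)
p40_int8 = peak_tera_ops(3840, 1531, DP4A_OPS)
p4_fp32 = peak_tera_ops(2560, 1063, FMA_OPS)
p4_int8 = peak_tera_ops(2560, 1063, DP4A_OPS)

print(f"P40: {p40_fp32:.2f} FP32 TFLOP/s, {p40_int8:.2f} INT8 TOP/s")
print(f"P4:  {p4_fp32:.2f} FP32 TFLOP/s, {p4_int8:.2f} INT8 TOP/s")
```

Under these assumptions the P40 comes out at about 11.8 FP32 TFLOP/s and 47.0 INT8 TOP/s, in line with the quoted figures of 12 TeraFLOP/s and 47.0 TOP/s.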
The NVIDIA Tesla P4 and P40 will be available in November and October, respectively. Pricing will be revealed when they go on sale.