Microsoft adds DeepSeek R1 to Azure AI Foundry and GitHub

Microsoft on Wednesday introduced DeepSeek R1 to its extensive model catalog on Azure AI Foundry and GitHub, adding to a collection that now exceeds 1,800 models. The catalog spans frontier, open-source, industry-specific, and task-based AI models.

DeepSeek R1 Accessibility and Features

DeepSeek R1 is now accessible through Azure AI Foundry, providing a trusted, scalable, and enterprise-ready platform. This setup allows businesses to integrate advanced AI solutions seamlessly while adhering to service level agreements (SLAs), security standards, and responsible AI practices, all backed by Microsoft’s commitment to reliability and innovation.

AI Reasoning Acceleration

Asha Sharma, Corporate Vice President of AI Platform at Microsoft, highlighted the rapid increase in the accessibility of AI reasoning, which is transforming how developers and businesses utilize advanced AI.

She emphasized that DeepSeek R1 offers a cost-effective model for users to leverage state-of-the-art AI capabilities with minimal infrastructure investment.

Developer Tools and Speed

The integration of DeepSeek R1 on Azure AI Foundry accelerates the experimentation, iteration, and integration processes for developers. With tools for model evaluation, developers can compare outputs, benchmark performance, and scale AI applications rapidly.

Commitment to Trustworthy AI Development

Sharma underscored Microsoft’s dedication to safety and security, noting that DeepSeek R1 has undergone rigorous red teaming and safety checks. Azure AI Content Safety provides default content filtering with opt-out options, and the Safety Evaluation System aids in testing applications before they go live, ensuring a secure deployment environment.

Accessing DeepSeek R1

To use DeepSeek R1:

  • Sign up for an Azure account if you do not have one.
  • Search for DeepSeek R1 in the Azure AI Foundry model catalog.
  • Open the model card and click “deploy” to obtain the inference API and key, and access the playground.
  • Use the API and key with various clients for application integration.
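Once deployed, the steps above boil down to calling the inference endpoint over HTTP. The sketch below assumes the serverless endpoint exposes an OpenAI-compatible chat-completions route, which is typical for models deployed from the Azure AI Foundry catalog; the endpoint URL, path, and payload shape here are illustrative assumptions, not confirmed specifics of the DeepSeek R1 deployment.

```python
import json
import urllib.request

def build_chat_request(endpoint: str, api_key: str, prompt: str):
    """Assemble URL, headers, and JSON body for a chat-completions call.

    Assumes an OpenAI-compatible route; adjust the path to match
    what the model card's deployment page actually shows.
    """
    url = f"{endpoint.rstrip('/')}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        # The key comes from the "deploy" step on the model card.
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    })
    return url, headers, body

def ask(endpoint: str, api_key: str, prompt: str) -> str:
    """Send the request with the standard library; any HTTP client works."""
    url, headers, body = build_chat_request(endpoint, api_key, prompt)
    req = urllib.request.Request(url, data=body.encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())
    return reply["choices"][0]["message"]["content"]
```

The same request shape works from curl, an SDK, or the playground's code samples; only the endpoint URL and key change per deployment.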

Local Deployment on Copilot+ PCs

Microsoft is also bringing NPU-optimized versions of DeepSeek R1 to Copilot+ PCs, starting with Qualcomm Snapdragon X, followed by Intel Core Ultra 200V.

The initial release, DeepSeek-R1-Distill-Qwen-1.5B, will be available in the AI Toolkit, with 7B and 14B variants to follow. These models let developers build and deploy AI-powered applications that run efficiently on-device by taking advantage of the Neural Processing Unit (NPU).

The NPU on Copilot+ PCs supports efficient model inferencing, enabling semi-continuous execution of generative AI. Microsoft’s efforts with Phi Silica have resulted in competitive time-to-first-token and throughput rates while minimizing the impact on battery life and PC resources.

The optimized DeepSeek models for NPUs employ techniques like low-bit quantization and transformer mapping to the NPU, ensuring compatibility across the Windows ecosystem via ONNX QDQ format.

Silicon Optimizations

The distilled Qwen 1.5B model includes components like a tokenizer, embedding layer, context processing model, token iteration model, language model head, and detokenizer.
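Those components fit together as a standard autoregressive decode loop: the context-processing model handles the prompt in one pass, then the token-iteration model produces one token at a time. The skeleton below is an illustrative sketch of that flow only; every function here is a hypothetical stand-in, not Microsoft's actual implementation.

```python
def generate(prompt, steps, tokenize, embed, process_context,
             iterate_token, lm_head, detokenize):
    """Sketch of the decode loop formed by the listed components."""
    ids = tokenize(prompt)                     # tokenizer
    cache = process_context(embed(ids))        # embedding layer + context-processing model
    out = list(ids)
    for _ in range(steps):                     # token-iteration model, one token per step
        logits, cache = iterate_token(embed([out[-1]]), cache)
        out.append(lm_head(logits))            # language-model head picks the next token id
    return detokenize(out)                     # detokenizer
```

Splitting context processing from token iteration matters for the NPU: prompt processing is a large batched computation, while per-token iteration is a small repeated one, so the two can be compiled and scheduled differently.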

Microsoft uses 4-bit block-wise quantization for the embeddings and language model head, with these operations running on the CPU. NPU optimization focuses on the compute-intensive transformer blocks, using int4 per-channel quantization and selective mixed precision for weights with int16 activations.
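The core idea of block-wise quantization is to store one scale per small block of weights so that a local outlier only degrades its own block. The sketch below illustrates that idea with a symmetric int4 scheme; the block size of 32 and the symmetric [-7, 7] range are assumptions for illustration, not Microsoft's published parameters.

```python
import numpy as np

def quantize_blockwise_int4(w: np.ndarray, block: int = 32):
    """Symmetric 4-bit block-wise quantization: one scale per block of weights."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0   # map each block into [-7, 7]
    scale[scale == 0] = 1.0                               # avoid division by zero
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_blockwise(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximation of the original weights from int4 codes + scales."""
    return (q.astype(np.float32) * scale).reshape(-1)
```

Per-channel int4 (used for the transformer-block weights) is the same construction with one scale per output channel instead of per fixed-size block.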

Microsoft leverages a sliding window design for quick time-to-first-token and long-context support. The 4-bit QuaRot quantization scheme enhances accuracy by eliminating outliers in weights and activations. These optimizations achieve a time-to-first-token of 130 ms and a throughput rate of 16 tokens/s for short prompts (<64 tokens).
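The reason rotation helps quantization is that an orthogonal transform preserves the matrix-vector product while spreading a single large outlier across many dimensions, shrinking the range the quantizer must cover. The toy demonstration below uses a random orthogonal matrix for simplicity; QuaRot itself uses Hadamard-based rotations, so this is a sketch of the principle, not the scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# A weight row with one large outlier that would dominate the quantization scale.
w = np.zeros(64)
w[0] = 10.0
w[1:] = 0.1 * rng.standard_normal(63)

# Random orthogonal Q (QuaRot uses Hadamard rotations instead).
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))

x = rng.standard_normal(64)
w_rot, x_rot = w @ Q, Q.T @ x   # rotate weights and activations consistently

# (w @ Q) @ (Q.T @ x) == w @ x, so the layer's output is unchanged,
# while the outlier's magnitude is spread across all 64 dimensions.
```

Because both sides are rotated, the rotation can be folded into adjacent weight matrices at export time, costing nothing at inference.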

Availability

DeepSeek R1 is now available through a serverless endpoint in the Azure AI Foundry model catalog. More resources and step-by-step guides are available on GitHub. Distilled versions of DeepSeek R1 for local deployment on Copilot+ PCs will be accessible soon.

