
Reasoning models are a new class of large language models (LLMs) designed to tackle highly complex tasks by employing chain-of-thought (CoT) reasoning, with the tradeoff of longer response times. DeepSeek R1 is a recently released frontier “reasoning” model that has been distilled into highly capable smaller models. Deploying these DeepSeek R1 distilled models on AMD Ryzen AI processors and Radeon graphics cards is incredibly easy and available now through LM Studio.
Demo showcasing the DeepSeek R1 Distill Qwen 1.5B (Q4 K M) model running in real time on an AMD Ryzen AI HX 370 series processor.
Reasoning models add a “thinking” stage before the final output, which you can inspect by expanding the “thinking” window before the model gives its final answer. Unlike conventional LLMs, which generate the response in a single pass, CoT LLMs perform extensive reasoning before answering. The assumptions and self-reflection the LLM performs are visible to the user, and this improves the model's reasoning and analytical capability, albeit at the cost of a significantly longer time to the first token of the final output.
A reasoning model may first spend thousands of tokens (and you can view this chain of thought!) analyzing the problem before giving a final response. This makes the model excellent at complex problem-solving tasks involving math and science, since it can attack a problem from all angles before deciding on a response. Depending on your AMD hardware, each of these models will offer state-of-the-art reasoning capability on your AMD Ryzen AI processor or Radeon graphics card.
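DeepSeek R1 models typically emit their chain of thought wrapped in `<think>…</think>` tags ahead of the final answer. As a minimal sketch (the helper name and example string are illustrative, not part of any official API), you could separate the visible reasoning from the final response like this:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a DeepSeek R1-style response into (reasoning, final_answer).

    Assumes the chain of thought is wrapped in <think>...</think> tags,
    as DeepSeek R1 distills typically produce.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if not match:
        # No thinking block found: treat the whole text as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    final_answer = response[match.end():].strip()
    return reasoning, final_answer

# Illustrative model output, not a real transcript:
example = "<think>The user asked for 2+2. That is 4.</think>\n2 + 2 = 4."
thoughts, answer = split_reasoning(example)
```

This is handy if you want to log or hide the reasoning trace while showing users only the final answer.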
How to run DeepSeek R1 Distilled “Reasoning” Models on AMD Ryzen AI and Radeon Graphics Cards
Follow these simple steps to get up and running with DeepSeek R1 distillations in just a few minutes (depending on your download speed).
Please make sure you are using the optional Adrenalin 25.1.1 driver, which can be downloaded directly from AMD's driver support page.
Step 1: Make sure you are on the 25.1.1 Optional or higher Adrenalin driver.
Step 2: Download LM Studio 0.3.8 or above from lmstudio.ai/ryzenai
Step 3: Install LM Studio and skip the onboarding screen.
Step 4: Click on the discover tab.
Step 5: Choose your DeepSeek R1 distill. Smaller distills like the Qwen 1.5B offer blazing-fast performance (and are the recommended starting point), while bigger distills offer superior reasoning capability. All of them are extremely capable. The table below details the maximum recommended DeepSeek R1 distill size:
| Processor | DeepSeek R1 Distill* (Max Supported) |
|---|---|
| AMD Ryzen AI Max+ 395 (32 GB¹, 64 GB², 128 GB) | DeepSeek-R1-Distill-Llama-70B (64 GB and 128 GB only); DeepSeek-R1-Distill-Qwen-32B |
| AMD Ryzen AI HX 370 and 365 (24 GB and 32 GB) | DeepSeek-R1-Distill-Qwen-14B |
| AMD Ryzen 8040 and Ryzen 7040 (32 GB) | DeepSeek-R1-Distill-Llama-14B |

*= AMD recommends running all distills in Q4 K M quantization.
¹= Requires Variable Graphics Memory set to Custom: 24GB.
²= Requires Variable Graphics Memory set to High.
| Graphics Card | DeepSeek R1 Distill* (Max Supported¹) |
|---|---|
| AMD Radeon RX 7900 XTX | DeepSeek-R1-Distill-Qwen-32B |
| AMD Radeon RX 7900 XT | DeepSeek-R1-Distill-Qwen-14B |
| AMD Radeon RX 7900 GRE | DeepSeek-R1-Distill-Qwen-14B |
| AMD Radeon RX 7800 XT | DeepSeek-R1-Distill-Qwen-14B |
| AMD Radeon RX 7700 XT | DeepSeek-R1-Distill-Qwen-14B |
| AMD Radeon RX 7600 XT | DeepSeek-R1-Distill-Qwen-14B |
| AMD Radeon RX 7600 | DeepSeek-R1-Distill-Llama-8B |

*= AMD recommends running all distills in Q4 K M quantization.
¹= Lists the maximum supported distill without partial GPU offload.
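If you are unsure which distill will fit on your hardware, a rough rule of thumb is that Q4 K M quantization averages somewhere around 4.8 bits per weight in GGUF files (an approximation; actual file sizes vary by architecture and vocabulary size). A quick back-of-the-envelope estimate:

```python
def q4_k_m_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Rough GGUF file-size estimate for a Q4 K M quantized model.

    bits_per_weight ~4.85 is an approximate average for Q4 K M;
    treat the result as a ballpark figure, not an exact download size.
    """
    return params_billion * bits_per_weight / 8

# Ballpark: the 14B distills land around 8-9 GB, the 32B around 19-20 GB.
for size in (1.5, 7, 8, 14, 32, 70):
    print(f"{size:>5}B -> ~{q4_k_m_size_gb(size):.1f} GB")
```

The model weights plus the KV cache for your context window must fit in available graphics memory, which is why the tables above pair larger distills with higher-memory configurations.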
Step 6: On the right-hand side, make sure the “Q4 K M” quantization is selected and click “Download”.
Step 7: Once downloaded, head back to the chat tab, select the DeepSeek R1 distill from the drop-down menu, and make sure “manually select parameters” is checked.
Step 8: In the GPU offload layers setting, move the slider all the way to the maximum.
Step 9: Click “Load Model”.
Step 10: Interact with a reasoning model running completely on your local AMD hardware!
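Beyond the chat tab, LM Studio can also serve the loaded model over a local OpenAI-compatible API (started from its Developer/server view, listening on localhost port 1234 by default). As a minimal sketch, assuming the server is running and using a placeholder model identifier (check LM Studio for the exact name of your downloaded distill), you could query it from Python with only the standard library:

```python
import json
import urllib.request

# LM Studio's default local server endpoint (OpenAI-compatible API).
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"
# Placeholder identifier: substitute the model name shown in LM Studio.
MODEL = "deepseek-r1-distill-qwen-1.5b"

def build_chat_request(model: str, prompt: str, temperature: float = 0.6) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt: str) -> str:
    """Send the prompt to the locally running model and return its reply.

    Requires LM Studio's local server to be running; everything stays
    on your own machine.
    """
    payload = json.dumps(build_chat_request(MODEL, prompt)).encode("utf-8")
    req = urllib.request.Request(
        LMSTUDIO_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (uncomment with the local server running):
# print(ask("Why is the sky blue?"))
```

Because the endpoint follows the OpenAI chat-completions shape, existing OpenAI client libraries can also be pointed at the local address instead of a cloud service.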