<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>comfyui Archives - Efison Lisan Teknologi</title>
	<atom:link href="https://efisonlt.com/tag/comfyui/feed/" rel="self" type="application/rss+xml" />
	<link>https://efisonlt.com/tag/comfyui/</link>
	<description>Computation for Everybody</description>
	<lastBuildDate>Thu, 28 May 2026 10:13:53 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://efisonlt.com/wp-content/uploads/2020/03/cropped-efison_logo_orange-skuer-32x32.png</url>
	<title>comfyui Archives - Efison Lisan Teknologi</title>
	<link>https://efisonlt.com/tag/comfyui/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Our Experience with Asus AMD Radeon AI Pro R9700 Turbo</title>
		<link>https://efisonlt.com/our-experience-with-asus-amd-radeon-ai-pro-r9700-turbo/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=our-experience-with-asus-amd-radeon-ai-pro-r9700-turbo</link>
		
		<dc:creator><![CDATA[Laatansa Imroni]]></dc:creator>
		<pubDate>Sat, 11 Oct 2025 13:13:19 +0000</pubDate>
				<category><![CDATA[Review]]></category>
		<category><![CDATA[5070 ti]]></category>
		<category><![CDATA[amd]]></category>
		<category><![CDATA[comfyui]]></category>
		<category><![CDATA[image generation]]></category>
		<category><![CDATA[llama.cpp]]></category>
		<category><![CDATA[llm]]></category>
		<category><![CDATA[nvidia]]></category>
		<category><![CDATA[r9700]]></category>
		<category><![CDATA[radeon]]></category>
		<guid isPermaLink="false">https://efisonlt.com/?p=1880</guid>

					<description><![CDATA[<p>2025, and AI. What&#8217;s not to love? Again, if somebody were selling rendang and claimed it was created using AI, I bet venture capitalists would clap and circle like vultures. Okay, enough yapping. Now we are talking about a damn GPU. A tool to run real AI. Introducing, AMD Radeon AI&#8230;&#160;<a href="https://efisonlt.com/our-experience-with-asus-amd-radeon-ai-pro-r9700-turbo/" rel="bookmark">Read More &#187;<span class="screen-reader-text">Our Experience with Asus AMD Radeon AI Pro R9700 Turbo</span></a></p>
<p>The post <a href="https://efisonlt.com/our-experience-with-asus-amd-radeon-ai-pro-r9700-turbo/">Our Experience with Asus AMD Radeon AI Pro R9700 Turbo</a> appeared first on <a href="https://efisonlt.com">Efison Lisan Teknologi</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">2025, and AI.</p>



<p class="wp-block-paragraph">What&#8217;s not to love?</p>



<p class="wp-block-paragraph">Again, if somebody were selling <a href="https://id.wikipedia.org/wiki/Rendang">rendang</a> and claimed it was created using AI, I bet venture capitalists would clap and circle like vultures.</p>



<p class="wp-block-paragraph">Okay, enough yapping. Now we are talking about a damn GPU. A tool to run <strong>real AI</strong>.</p>



<h2 class="wp-block-heading">Introducing, AMD Radeon <em>AI</em> Pro R9700</h2>



<p class="wp-block-paragraph">In AMD&#8217;s wisdom, giving a prosumer/workstation GPU a &#8220;Pro&#8221; moniker doesn&#8217;t quite cut it anymore. It now has <strong><em>AI</em></strong> in its name. Why, you ask?</p>



<p class="wp-block-paragraph">Because apparently it has 32 GB of VRAM and supports an additional numerical precision (FP8). The VRAM is quite big, we reckon. Still not as big as the last-gen Pro (no AI) <a href="https://www.techpowerup.com/gpu-specs/radeon-pro-w7900.c4147">W7900</a>, but with all its new-ness, AI-ness, and goodness, it&#8217;s still pretty nice.</p>



<div class="wp-block-group horizontal-scroll-wrap"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<div class="wp-block-group horizontal-scroll-wrap"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<figure class="wp-block-table"><table><thead><tr><th></th><th class="has-text-align-center" data-align="center">Radeon AI Pro R9700</th><th class="has-text-align-center" data-align="center">Radeon Pro W7900</th></tr></thead><tbody><tr><td><strong>Architecture</strong></td><td class="has-text-align-center" data-align="center">AMD RDNA 4</td><td class="has-text-align-center" data-align="center">AMD RDNA 3</td></tr><tr><td><strong>Memory size</strong></td><td class="has-text-align-center" data-align="center">32 GB</td><td class="has-text-align-center" data-align="center"><strong>48 GB</strong></td></tr><tr><td><strong>Memory bandwidth</strong></td><td class="has-text-align-center" data-align="center">644.6 GB/s</td><td class="has-text-align-center" data-align="center"><strong>864 GB/s</strong></td></tr><tr><td><strong>Memory ECC support</strong></td><td class="has-text-align-center" data-align="center">Yes (Linux only)*</td><td class="has-text-align-center" data-align="center">Yes</td></tr><tr><td><strong>Peak FP32 (vector) performance</strong></td><td class="has-text-align-center" data-align="center">47.8 TFLOPS</td><td class="has-text-align-center" data-align="center"><strong>61.3 TFLOPS</strong></td></tr><tr><td><strong>Peak FP16 (vector) performance</strong></td><td class="has-text-align-center" data-align="center">95.7 TFLOPS</td><td class="has-text-align-center" data-align="center"><strong>123 TFLOPS</strong></td></tr><tr><td><strong>Peak FP16 (matrix) performance</strong></td><td class="has-text-align-center" data-align="center"><strong>191 TFLOPS</strong></td><td class="has-text-align-center" data-align="center">123 TFLOPS</td></tr><tr><td><strong>Peak FP8 (matrix) performance</strong></td><td class="has-text-align-center" data-align="center"><strong>383 TFLOPS</strong></td><td class="has-text-align-center" data-align="center">N/A</td></tr><tr><td><strong>Peak INT8 (matrix) performance</strong></td><td class="has-text-align-center" 
data-align="center"><strong>383 TOPS</strong></td><td class="has-text-align-center" data-align="center">123 TOPS</td></tr></tbody></table></figure>
</div></div>
</div></div>



<p class="wp-block-paragraph">Personally, we&#8217;re not sure why memory ECC support on the R9700 is listed as Linux only, but since Linux is our main test platform, no complaints there.</p>



<p class="wp-block-paragraph">Even with the smaller memory size, it <strong>is</strong> pretty beefy in terms of performance, especially theoretical matrix performance. Hence the AI name. Unfortunately we have no W7900 or RX 7900 XTX for a real-world comparison, but we do have an RTX 5070 Ti, which apparently has a similar profile.</p>



<div class="wp-block-group horizontal-scroll-wrap"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<div class="wp-block-group horizontal-scroll-wrap"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<div class="wp-block-group horizontal-scroll-wrap"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<figure class="wp-block-table"><table><thead><tr><th></th><th class="has-text-align-center" data-align="center">Radeon AI Pro R9700</th><th class="has-text-align-center" data-align="center">GeForce RTX 5070 Ti</th></tr></thead><tbody><tr><td><strong>Process technology</strong></td><td class="has-text-align-center" data-align="center">TSMC N4P</td><td class="has-text-align-center" data-align="center">TSMC 4N</td></tr><tr><td><strong>Die size</strong></td><td class="has-text-align-center" data-align="center">357 mm²</td><td class="has-text-align-center" data-align="center">378 mm²</td></tr><tr><td><strong>Memory size</strong></td><td class="has-text-align-center" data-align="center"><strong>32 GB</strong> GDDR6</td><td class="has-text-align-center" data-align="center">16 GB <strong>GDDR7</strong></td></tr><tr><td><strong>Memory interface</strong></td><td class="has-text-align-center" data-align="center">256-bit</td><td class="has-text-align-center" data-align="center">256-bit</td></tr><tr><td><strong>Memory bandwidth</strong></td><td class="has-text-align-center" data-align="center">644.6 GB/s</td><td class="has-text-align-center" data-align="center"><strong>896 GB/s</strong></td></tr><tr><td><strong>Total board power</strong></td><td class="has-text-align-center" data-align="center">300 W</td><td class="has-text-align-center" data-align="center">300 W</td></tr></tbody></table></figure>
</div></div>
</div></div>
</div></div>
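<p class="wp-block-paragraph">Both cards sit on a 256-bit bus, so the bandwidth gap comes down to how fast the memory itself runs. As a rough back-of-the-envelope sketch (the helper name <code>pin_rate_gbps</code> is made up for illustration), the effective per-pin data rate can be derived from the table figures as bandwidth × 8 / bus width:</p>

```shell
# Effective per-pin data rate in Gbps: bandwidth (GB/s) * 8 bits per byte / bus width (bits).
# pin_rate_gbps is a hypothetical helper, not part of any tool used in this post.
pin_rate_gbps() {
  LC_ALL=C awk -v bw="$1" -v bus="$2" 'BEGIN { printf "%.2f\n", bw * 8 / bus }'
}

# Figures taken from the comparison table above.
echo "R9700 (GDDR6):       $(pin_rate_gbps 644.6 256) Gbps per pin"
echo "RTX 5070 Ti (GDDR7): $(pin_rate_gbps 896 256) Gbps per pin"
```

<p class="wp-block-paragraph">With the table values this works out to roughly 20 Gbps per pin for the R9700 and 28 Gbps for the RTX 5070 Ti.</p>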



<h2 class="wp-block-heading">The Test Setup</h2>



<div class="wp-block-media-text is-stacked-on-mobile is-image-fill-element" style="grid-template-columns:35% auto"><figure class="wp-block-media-text__media"><a href="https://efisonlt.com/our-experience-with-asus-amd-radeon-ai-pro-r9700-turbo/photo_2025-10-11_14-08-09/"><img fetchpriority="high" decoding="async" width="1024" height="768" src="https://efisonlt.com/wp-content/uploads/2025/10/photo_2025-10-11_14-08-09-1024x768.jpg" alt="" class="wp-image-1883 size-large" style="object-position:50% 50%" srcset="https://efisonlt.com/wp-content/uploads/2025/10/photo_2025-10-11_14-08-09-1024x768.jpg 1024w, https://efisonlt.com/wp-content/uploads/2025/10/photo_2025-10-11_14-08-09-300x225.jpg 300w, https://efisonlt.com/wp-content/uploads/2025/10/photo_2025-10-11_14-08-09-768x576.jpg 768w, https://efisonlt.com/wp-content/uploads/2025/10/photo_2025-10-11_14-08-09-1536x1152.jpg 1536w, https://efisonlt.com/wp-content/uploads/2025/10/photo_2025-10-11_14-08-09-2048x1536.jpg 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure><div class="wp-block-media-text__content">
<div class="wp-block-group horizontal-scroll-wrap"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<figure class="wp-block-table"><table><thead><tr><th>Component</th><th>Specification</th></tr></thead><tbody><tr><td><strong>CPU</strong></td><td>Intel Core i7-12700K<br>@5.0 GHz</td></tr><tr><td><strong>Motherboard</strong></td><td>ASRock Z690 PG Velocita</td></tr><tr><td><strong>Memory</strong></td><td>4*24 GB Klevv Cras V RGB<br>@DDR5-5600</td></tr><tr><td><strong>Storage</strong></td><td>2TB MSI Spatium M480</td></tr><tr><td><strong>PSU</strong></td><td>1000W 1stPlayer NGDP Gold</td></tr></tbody></table></figure>
</div></div>
</div></div>



<p class="wp-block-paragraph">For the benchmarks, we tested LLM inference using llama.cpp and image generation using Qwen Image Edit 2509.</p>



<h3 class="wp-block-heading">llama.cpp Benchmark Setup</h3>



<p class="wp-block-paragraph">We used llama.cpp build <strong>d2ee056e1 (6713)</strong> and compiled the CPU backend with the Intel <strong>oneAPI compiler 2025.2.1</strong> against an <strong>external BLAS library</strong>, namely <strong>Intel oneAPI MKL 2025.2</strong>. Why, you ask? Because it is faster than a plain <strong>GNU compiler 15.2.1 build with no BLAS</strong>, mostly in text generation.</p>



<p class="wp-block-paragraph">We tested the <a href="https://huggingface.co/unsloth/gpt-oss-120b-GGUF/tree/main/Q4_K_M">unsloth/gpt-oss-120b-Q4_K_M</a> model with all layers assigned to the GPU, except the MoE expert tensors matching <strong>\.(7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(up|down|gate)_exps.</strong>, which are kept in system RAM and processed by the CPU.</p>
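<p class="wp-block-paragraph">To get a feel for what that pattern catches, here is a minimal sketch that runs it against a few tensor names. Two assumptions to flag loudly: the names are hypothetical examples following llama.cpp&#8217;s usual <code>blk.N.ffn_*_exps.weight</code> naming, and <code>grep -E</code> merely stands in for llama.cpp&#8217;s own matcher. The <code>placement</code> helper is made up for illustration.</p>

```shell
# Offload pattern copied from the llama-bench command in this post (without the "=CPU" suffix).
pattern='\.(7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(up|down|gate)_exps.'

# Print CPU if a tensor name matches the pattern, GPU otherwise.
# grep -E is only an approximation of how llama.cpp applies the override.
placement() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo CPU
  else
    echo GPU
  fi
}

# Hypothetical tensor names, assuming the blk.N.ffn_*_exps.weight convention.
for name in blk.0.ffn_up_exps.weight blk.6.ffn_down_exps.weight \
            blk.7.ffn_gate_exps.weight blk.35.ffn_up_exps.weight \
            blk.20.attn_q.weight; do
  printf '%s: %s\n' "$name" "$(placement "$name")"
done
```

<p class="wp-block-paragraph">Under those assumptions, expert tensors in blocks 0 to 6 stay on the GPU, expert tensors from block 7 upward go to the CPU, and non-expert tensors such as attention weights are never matched.</p>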



<figure class="wp-block-table"><table><thead><tr><th>(In token/s. Higher is better)</th><th class="has-text-align-right" data-align="right">GNU compiler 15.2.1<br>no BLAS</th><th class="has-text-align-right" data-align="right"><strong>oneAPI compiler 2025.2.1<br>BLAS=oneAPI MKL 2025.2</strong></th></tr></thead><tbody><tr><td><strong>Prompt processing (512 tokens)</strong></td><td class="has-text-align-right" data-align="right">180.65 ± 1.74</td><td class="has-text-align-right" data-align="right"><strong>182.39 ± 1.60</strong></td></tr><tr><td><strong>Text generation (256 tokens)</strong></td><td class="has-text-align-right" data-align="right">21.85 ± 0.82</td><td class="has-text-align-right" data-align="right"><strong>32.19 ± 0.04</strong></td></tr></tbody></table></figure>
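<p class="wp-block-paragraph">For a quick sanity check on how big the difference really is, the relative gain can be worked out from the table figures. This is just a sketch; <code>gain_pct</code> is a made-up helper name.</p>

```shell
# Relative gain of the second figure over the first, in percent.
# gain_pct is a hypothetical helper for illustration only.
gain_pct() {
  LC_ALL=C awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new / base - 1) * 100 }'
}

# Token/s figures from the table above: GNU/no BLAS first, oneAPI/MKL second.
echo "Prompt processing: +$(gain_pct 180.65 182.39)%"
echo "Text generation:   +$(gain_pct 21.85 32.19)%"
```

<p class="wp-block-paragraph">That is about 1% for prompt processing and about 47% for text generation. Note that the prompt processing difference (1.74 t/s) is no larger than the reported ± spread of the GNU run, so only the text generation gain is clearly outside the noise.</p>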



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Compilation steps for GNU compiler 15.2.1, no BLAS</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>mkdir build_no-blas-gcc_vulkan &amp;&amp; cd build_no-blas-gcc_vulkan
cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DGGML_NATIVE=ON -DGGML_VULKAN=1
cmake --build . --config Release -j</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">mkdir</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">build_no-blas-gcc_vulkan</span><span style="color: #D4D4D4"> &amp;&amp; </span><span style="color: #DCDCAA">cd</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">build_no-blas-gcc_vulkan</span></span>
<span class="line"><span style="color: #DCDCAA">cmake</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">..</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_C_COMPILER=gcc</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_CXX_COMPILER=g++</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_NATIVE=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_VULKAN=1</span></span>
<span class="line"><span style="color: #DCDCAA">cmake</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--build</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">.</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--config</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Release</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-j</span></span></code></pre></div>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Compilation steps for oneAPI compiler 2025.2.1, BLAS=oneAPI MKL 2025.2</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>mkdir build_mkl-ilp64-icx_vulkan &amp;&amp; cd build_mkl-ilp64-icx_vulkan
cmake .. -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64ilp -DGGML_NATIVE=ON -DGGML_VULKAN=1
cmake --build . --config Release -j</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">mkdir</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">build_mkl-ilp64-icx_vulkan</span><span style="color: #D4D4D4"> &amp;&amp; </span><span style="color: #DCDCAA">cd</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">build_mkl-ilp64-icx_vulkan</span></span>
<span class="line"><span style="color: #DCDCAA">cmake</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">..</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_C_COMPILER=icx</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_CXX_COMPILER=icpx</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_BLAS=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_BLAS_VENDOR=Intel10_64ilp</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_NATIVE=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_VULKAN=1</span></span>
<span class="line"><span style="color: #DCDCAA">cmake</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--build</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">.</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--config</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Release</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-j</span></span></code></pre></div>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Run output</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>GGML_VULKAN_DEVICE=0 ./build_no-blas-gcc_vulkan/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096

GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan     |  99 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           pp512 |        180.65 ± 1.74 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan     |  99 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           tg256 |         21.85 ± 0.82 |

build: d2ee056e1 (6713)
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           pp512 |        182.39 ± 1.60 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           tg256 |         32.19 ± 0.04 |

build: d2ee056e1 (6713)</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_no-blas-gcc_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: 
#CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span 
style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: 
#CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">ngl</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">99</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">180.65</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1.74</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">99</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">21.85</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.82</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: 
#CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">182.39</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1.60</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">32.19</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.04</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span></code></pre></div>
</details>
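<p class="wp-block-paragraph">The <code>ot</code> column in the output above holds the tensor-override pattern; its <code>=CPU</code> suffix keeps every tensor whose name matches in CPU memory. Below is a minimal sketch to check which layers that pattern selects. It assumes <code>grep -E</code> behaves like llama.cpp&#8217;s own regex matching for this pattern, and the tensor names are hypothetical samples following the GGUF <code>blk.N.name</code> convention, not a dump of the real model.</p>

```shell
#!/bin/sh
# Pattern copied from the "ot" column above (the part before "=CPU").
pattern='\.(7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(up|down|gate)_exps.'

# Hypothetical sample tensor names (assumption: GGUF "blk.N.name" naming).
names='blk.0.ffn_up_exps.weight
blk.6.ffn_gate_exps.weight
blk.7.ffn_down_exps.weight
blk.35.ffn_up_exps.weight
blk.7.attn_q.weight'

# Keep only the names the override pattern would match.
matched=$(printf '%s\n' "$names" | grep -E "$pattern")

printf 'matched by the pattern (kept on CPU):\n%s\n' "$matched"
```

<p class="wp-block-paragraph">With these sample names, only <code>blk.7.ffn_down_exps.weight</code> and <code>blk.35.ffn_up_exps.weight</code> match. In other words, the expert FFN tensors of layer 7 and above stay on the CPU, while layers 0 to 6 and all non-expert tensors are left for the GPU.</p>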



<p class="wp-block-paragraph">For the GPU backend, we used both Vulkan and HIP (ROCm 7). We won&#8217;t go into detail here, to avoid spoiling the results.</p>



<p class="wp-block-paragraph">We refrained from using ROCm 6.4.x because ROCm 7.0.x now performs much better on this GPU (and probably on AMD RDNA4 GPUs in general). If you haven&#8217;t heard already, AMD released ROCm 7 on <a href="https://rocm.docs.amd.com/en/latest/release/versions.html">September 16, 2025</a>. We did a quick llama.cpp performance comparison against ROCm 6.4.3, which can be seen in <a href="https://web.facebook.com/share/p/17F5E61mgb/">this guy&#8217;s Facebook post</a>.</p>


<div class="wp-block-image">
<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3ae100&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3ae100" class="aligncenter size-medium wp-lightbox-container"><img decoding="async" width="300" height="150" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot-2025-10-11-at-16-09-56-Facebook-300x150.png" alt="" class="wp-image-1885" srcset="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot-2025-10-11-at-16-09-56-Facebook-300x150.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot-2025-10-11-at-16-09-56-Facebook.png 320w" sizes="(max-width: 300px) 100vw, 300px" /></figure>
</div>


<p class="wp-block-paragraph">As for the RTX 5070 Ti, we compiled the llama.cpp GPU backend against CUDA without GGML_CUDA_FORCE_CUBLAS.</p>



<p class="wp-block-paragraph">For reference, here are the compilation lines used for the configurations described above:</p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Compilation line for llama.cpp ROCm</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>cmake .. -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64ilp -DGGML_NATIVE=ON -DGGML_HIP=ON -DGPU_TARGETS=gfx1201</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">cmake</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">..</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_C_COMPILER=icx</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_CXX_COMPILER=icpx</span><span style="color: #D4D4D4"> </span><span style="color: 
#569CD6">-DGGML_BLAS=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_BLAS_VENDOR=Intel10_64ilp</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_NATIVE=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_HIP=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGPU_TARGETS=gfx1201</span></span></code></pre></div>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Compilation line for llama.cpp Vulkan</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>cmake .. -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64ilp -DGGML_NATIVE=ON -DGGML_VULKAN=1</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">cmake</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">..</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_C_COMPILER=icx</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_CXX_COMPILER=icpx</span><span style="color: #D4D4D4"> </span><span style="color: 
#569CD6">-DGGML_BLAS=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_BLAS_VENDOR=Intel10_64ilp</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_NATIVE=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_VULKAN=1</span></span></code></pre></div>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Compilation line for llama.cpp CUDA</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>cmake .. -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64ilp -DGGML_NATIVE=ON -DGGML_CUDA=ON</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">cmake</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">..</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_C_COMPILER=icx</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DCMAKE_CXX_COMPILER=icpx</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_BLAS=ON</span><span 
style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_BLAS_VENDOR=Intel10_64ilp</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_NATIVE=ON</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-DGGML_CUDA=ON</span></span></code></pre></div>
</details>



<p class="wp-block-paragraph">We chose two models for testing:</p>



<ul class="wp-block-list">
<li><a href="https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF/blob/main/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf">unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS</a>, <strong>15.25 GB</strong></li>



<li><a href="https://huggingface.co/unsloth/gpt-oss-120b-GGUF/tree/main/Q4_K_M">unsloth/gpt-oss-120b-Q4_K_M</a>, <strong>58.45 GB</strong></li>
</ul>



<h3 class="wp-block-heading">Qwen Image Edit 2509 Benchmark Setup (and rambling about the PyTorch for ROCm on Windows situation)</h3>



<p class="wp-block-paragraph">This one is a bit different because it depends on PyTorch, and that&#8217;s not so simple. Historically, AMD has neglected (or been unable to make?) a Windows PyTorch package for their GPUs. With the <a href="https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-WINDOWS-PYTORCH-PREVIEW.html">Windows Preview Edition 25.20.01.14 driver</a>, they finally support PyTorch on Windows for Radeons. Yay!</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3af10d&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3af10d" class="wp-block-image size-large wp-lightbox-container"><img decoding="async" width="1024" height="822" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/image-1024x822.png" alt="" class="wp-image-1886" srcset="https://efisonlt.com/wp-content/uploads/2025/10/image-1024x822.png 1024w, https://efisonlt.com/wp-content/uploads/2025/10/image-300x241.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/image-768x616.png 768w, https://efisonlt.com/wp-content/uploads/2025/10/image.png 1027w" sizes="(max-width: 1024px) 100vw, 1024px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			data-wp-bind--aria-label="state.thisImage.triggerButtonAriaLabel"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.thisImage.buttonRight"
			data-wp-style--top="state.thisImage.buttonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p class="wp-block-paragraph">Albeit with a limited roster of supported GPUs&#8230;</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3af794&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3af794" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" width="1024" height="694" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/image-1-1024x694.png" alt="" class="wp-image-1887" srcset="https://efisonlt.com/wp-content/uploads/2025/10/image-1-1024x694.png 1024w, https://efisonlt.com/wp-content/uploads/2025/10/image-1-300x203.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/image-1-768x520.png 768w, https://efisonlt.com/wp-content/uploads/2025/10/image-1.png 1026w" sizes="(max-width: 1024px) 100vw, 1024px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			data-wp-bind--aria-label="state.thisImage.triggerButtonAriaLabel"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.thisImage.buttonRight"
			data-wp-style--top="state.thisImage.buttonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p class="wp-block-paragraph">Funnily enough, they only listed Windows 11 as the compatible OS. Lo and behold, we managed to use it on Windows 10.</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3afcee&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3afcee" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" width="1024" height="576" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/Show-nodes-and-warm-up-comfy-0001-1024x576.jpg" alt="" class="wp-image-1888" srcset="https://efisonlt.com/wp-content/uploads/2025/10/Show-nodes-and-warm-up-comfy-0001-1024x576.jpg 1024w, https://efisonlt.com/wp-content/uploads/2025/10/Show-nodes-and-warm-up-comfy-0001-300x169.jpg 300w, https://efisonlt.com/wp-content/uploads/2025/10/Show-nodes-and-warm-up-comfy-0001-768x432.jpg 768w, https://efisonlt.com/wp-content/uploads/2025/10/Show-nodes-and-warm-up-comfy-0001-1536x864.jpg 1536w, https://efisonlt.com/wp-content/uploads/2025/10/Show-nodes-and-warm-up-comfy-0001-2048x1152.jpg 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			data-wp-bind--aria-label="state.thisImage.triggerButtonAriaLabel"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.thisImage.buttonRight"
			data-wp-style--top="state.thisImage.buttonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p class="wp-block-paragraph">Enough rambling. Now let&#8217;s get on to the setup.</p>



<p class="wp-block-paragraph">We did the tests on 2 different OSes:</p>



<ul class="wp-block-list">
<li>Aurora Linux 42 based on Fedora Kinoite</li>



<li>Windows 10 Pro</li>
</ul>



<p class="wp-block-paragraph">For PyTorch, we tested different combinations:</p>



<figure class="wp-block-table"><table><thead><tr><th></th><th class="has-text-align-center" data-align="center">PyTorch 2.x for ROCm 6.4.x</th><th class="has-text-align-center" data-align="center">PyTorch 2.x for ROCm 7.0.x</th></tr></thead><tbody><tr><td>Aurora Linux 42</td><td class="has-text-align-center" data-align="center">✘</td><td class="has-text-align-center" data-align="center">&#x2714;</td></tr><tr><td>Windows 10 Pro</td><td class="has-text-align-center" data-align="center">&#x2714;</td><td class="has-text-align-center" data-align="center">&#x2714;</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">The Qwen Image Edit 2509 model used here was <a href="https://huggingface.co/QuantStack/Qwen-Image-Edit-2509-GGUF/blob/main/Qwen-Image-Edit-2509-Q4_K_M.gguf">QuantStack/Qwen-Image-Edit-2509-Q4_K_M</a>. We also used the <a href="https://huggingface.co/lightx2v/Qwen-Image-Lightning/blob/main/Qwen-Image-Edit-2509/Qwen-Image-Edit-2509-Lightning-4steps-V1.0-bf16.safetensors">lightx2v/Qwen-Image-Edit-2509-Lightning-4steps-V1.0-bf16</a> LoRA for performance reasons. For the UI, we used <a href="https://github.com/comfyanonymous/ComfyUI/releases/tag/v0.3.64">ComfyUI v0.3.64</a>, plus <a href="https://github.com/city96/ComfyUI-GGUF">city96/ComfyUI-GGUF</a> for GGUF model compatibility.</p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Additional notes about Qwen Image Edit 2509 test setup and deployment</summary>
<ul class="wp-block-list">
<li>We deployed ComfyUI with ROCm support on Linux using a Podman container based on the <a href="https://hub.docker.com/r/rocm/pytorch/tags?name=rocm7.0_ubuntu22.04">Ubuntu 22.04 ROCm 7.0 Docker image</a>. The compose scripts can be cloned from <a href="https://github.com/lslowmotion/stable-diffusion-webui-podman">lslowmotion/stable-diffusion-webui-podman</a>.</li>



<li>ComfyUI with ROCm support on Windows was deployed using the <a href="https://github.com/comfyanonymous/ComfyUI/?tab=readme-ov-file#installing">ComfyUI experimental portable package for AMD GPUs</a>, which includes PyTorch 2.x for ROCm 6.4.x. The ROCm 7.0.x tests were done by manually upgrading the PyTorch pip package to the <a href="https://rocm.nightlies.amd.com/v2/gfx120X-all">nightly package targeting gfx120X</a>, which is the arch code for the Radeon AI Pro R9700.</li>
</ul>
</details>



<p class="wp-block-paragraph">As for the inputs, we used these images:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b04e6&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b04e6" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" width="1024" height="591" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/images-for-qwen-edit-1024x591.png" alt="" class="wp-image-1895" srcset="https://efisonlt.com/wp-content/uploads/2025/10/images-for-qwen-edit-1024x591.png 1024w, https://efisonlt.com/wp-content/uploads/2025/10/images-for-qwen-edit-300x173.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/images-for-qwen-edit-768x443.png 768w, https://efisonlt.com/wp-content/uploads/2025/10/images-for-qwen-edit.png 1210w" sizes="(max-width: 1024px) 100vw, 1024px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			data-wp-bind--aria-label="state.thisImage.triggerButtonAriaLabel"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.thisImage.buttonRight"
			data-wp-style--top="state.thisImage.buttonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<p class="wp-block-paragraph">And these sentences for the prompt:</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b0978&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b0978" class="wp-block-image size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="953" height="951" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_184755.png" alt="" class="wp-image-1896" srcset="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_184755.png 953w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_184755-300x300.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_184755-150x150.png 150w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_184755-768x766.png 768w" sizes="(max-width: 953px) 100vw, 953px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			data-wp-bind--aria-label="state.thisImage.triggerButtonAriaLabel"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.thisImage.buttonRight"
			data-wp-style--top="state.thisImage.buttonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button></figure>



<h2 class="wp-block-heading">Test Results: llama.cpp</h2>



<p class="wp-block-paragraph">We talked a bit above about offloading llama.cpp MoE layers to the CPU. In short, CPU offloading is done to <strong>make sure you can run a big model without the big performance hit caused by memory spill</strong> from VRAM to system RAM.</p>



<p class="wp-block-paragraph">If you run a big model on a GPU whose VRAM is smaller than the model, it will still run. But it will run terribly, because the GPU now has to fetch the data that doesn&#8217;t fit in VRAM from system RAM, while the CPU does nothing to help with the processing.</p>


<div class="wp-block-image">
<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b0f6a&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b0f6a" class="aligncenter size-large wp-lightbox-container"><img loading="lazy" decoding="async" width="1024" height="875" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_171954-1024x875.png" alt="" class="wp-image-1891" title="" srcset="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_171954-1024x875.png 1024w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_171954-300x256.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_171954-768x656.png 768w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_171954.png 1120w" sizes="(max-width: 1024px) 100vw, 1024px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			data-wp-bind--aria-label="state.thisImage.triggerButtonAriaLabel"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.thisImage.buttonRight"
			data-wp-style--top="state.thisImage.buttonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">unsloth/gpt-oss-120b-Q4_K_M is 58.45 GB in size, while the available VRAM is only around 32 GB before subtracting what the operating system&#8217;s other services need, so around 26-27 GB now spills to system RAM.</figcaption></figure>
</div>
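

<p class="wp-block-paragraph">As a back-of-the-envelope check of that spill figure, here is a small sketch using only the two numbers from the caption. The <code>spill_gb</code> helper is a hypothetical name, and the estimate ignores the KV cache, compute buffers, and whatever the OS already holds in VRAM, so the real spill is a bit larger.</p>

```shell
#!/usr/bin/env bash
# Rough spill estimate: model size minus total VRAM, both in GB.
# Assumption: nothing else occupies VRAM (not true in practice).
spill_gb() {
  awk -v model="$1" -v vram="$2" 'BEGIN { printf "%.2f\n", model - vram }'
}

# gpt-oss-120b Q4_K_M (58.45 GB) on a 32 GB card
spill_gb 58.45 32   # prints 26.45
```

<p class="wp-block-paragraph">The result sits at the low end of the 26-27 GB seen in the screenshot, which fits with the OS and other buffers taking their own share of VRAM.</p>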


<p class="wp-block-paragraph">Now see the difference with CPU offloading.</p>


<div class="wp-block-image">
<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b1468&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b1468" class="aligncenter size-large wp-lightbox-container"><img loading="lazy" decoding="async" width="1024" height="875" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_172229-1024x875.png" alt="" class="wp-image-1892" srcset="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_172229-1024x875.png 1024w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_172229-300x256.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_172229-768x656.png 768w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_172229.png 1120w" sizes="(max-width: 1024px) 100vw, 1024px" /><button
			class="lightbox-trigger"
			type="button"
			aria-haspopup="dialog"
			data-wp-bind--aria-label="state.thisImage.triggerButtonAriaLabel"
			data-wp-init="callbacks.initTriggerButton"
			data-wp-on--click="actions.showLightbox"
			data-wp-style--right="state.thisImage.buttonRight"
			data-wp-style--top="state.thisImage.buttonTop"
		>
			<svg xmlns="http://www.w3.org/2000/svg" width="12" height="12" fill="none" viewBox="0 0 12 12">
				<path fill="#fff" d="M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z" />
			</svg>
		</button><figcaption class="wp-element-caption">With the .(4|5|6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9]).ffn_(up|down)_exps. layers offloaded to the CPU and placed in system RAM, the CPU can now help with the processing while keeping the layers loaded in VRAM from spilling over.</figcaption></figure>
</div>
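

<p class="wp-block-paragraph">To see which tensors that override pattern actually catches, here is a minimal sketch that runs the same regular expression through <code>grep -E</code>. The tensor names are hypothetical examples following the blk.N.NAME.weight naming used in GGUF files, <code>offload_target</code> is a made-up helper, and the sketch assumes the pattern is searched anywhere in the tensor name. The =CPU suffix of the -ot argument is the destination, not part of the regex.</p>

```shell
#!/usr/bin/env bash
# Regex part of the -ot argument used in the benchmark (without "=CPU").
pattern='\.(4|5|6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(up|down)_exps.'

# Print CPU if the tensor name matches the override pattern, GPU otherwise.
offload_target() {
  if printf '%s\n' "$1" | grep -Eq "$pattern"; then
    echo "CPU"
  else
    echo "GPU"
  fi
}

# Hypothetical tensor names, for illustration only.
for name in \
  blk.0.ffn_up_exps.weight \
  blk.3.ffn_down_exps.weight \
  blk.4.ffn_up_exps.weight \
  blk.35.ffn_down_exps.weight \
  blk.35.ffn_gate_exps.weight \
  blk.35.attn_q.weight; do
  printf '%-30s : %s\n' "$name" "$(offload_target "$name")"
done
```

<p class="wp-block-paragraph">Under these assumptions, only the up and down expert tensors from layer 4 onward go to the CPU. Layers 0 to 3, the gate experts, and the attention tensors stay on the GPU.</p>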


<p class="wp-block-paragraph">This article is probably not the best at explaining how MoE layers work or how they&#8217;re offloaded. For more technical detail, you can start with <a href="https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune#improving-generation-speed">this guide</a>, or try a little bit of Google-fu. Sorry.</p>



<p class="wp-block-paragraph">Now, let&#8217;s see the performance difference:</p>



<figure class="wp-block-table"><table><thead><tr><th>(In tokens/s. Higher is better)</th><th class="has-text-align-right" data-align="right">Without CPU offloading</th><th class="has-text-align-right" data-align="right">With CPU offloading</th></tr></thead><tbody><tr><td><strong>Prompt processing (512 tokens)</strong></td><td class="has-text-align-right" data-align="right">120.48 ± 4.06</td><td class="has-text-align-right" data-align="right"><strong>215.93 ± 1.42</strong></td></tr><tr><td><strong>Text generation (256 tokens)</strong></td><td class="has-text-align-right" data-align="right">11.21 ± 0.02</td><td class="has-text-align-right" data-align="right"><strong>34.97 ± 0.15</strong></td></tr></tbody></table></figure>
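

<p class="wp-block-paragraph">To put the table in relative terms, this small sketch divides the mean throughput with offloading by the mean without it. The <code>speedup</code> helper is a hypothetical name, and the calculation uses only the mean values from the table, ignoring the ± spread.</p>

```shell
#!/usr/bin/env bash
# Ratio of two throughput figures (tokens/s), rounded to two decimals.
speedup() {
  awk -v offload="$1" -v baseline="$2" 'BEGIN { printf "%.2f\n", offload / baseline }'
}

echo "prompt processing: $(speedup 215.93 120.48)x"   # prints 1.79x
echo "text generation:   $(speedup 34.97 11.21)x"     # prints 3.12x
```

<p class="wp-block-paragraph">So on this setup, CPU offloading gives roughly 1.8 times the prompt processing speed and a bit over 3 times the text generation speed.</p>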



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Run output</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -p 512 -n 256 -fa 1 -ub 4096 -b 4096

GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(4|5|6|7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 |           pp512 |        120.48 ± 4.06 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 |           tg256 |         11.21 ± 0.02 |

build: d2ee056e1 (6713)
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(4|5|6|7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU |           pp512 |        215.93 ± 1.42 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(4|5|6|7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU |           tg256 |         34.97 ± 0.15 |

build: d2ee056e1 (6713)</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: 
#569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(4|5|6|7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span 
style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: 
#CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">120.48</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.06</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">11.21</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.02</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: 
#CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(4</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">5</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">6</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">215.93</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1.42</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(4</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">5</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">6</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">34.97</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.15</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span></code></pre></div>
</details>



<p class="wp-block-paragraph">Let&#8217;s continue with the test results.</p>



<p class="wp-block-paragraph">We tested two CPU-offloading configurations. The first keeps VRAM usage just under 16 GB, and the second fills as much of the R9700&#8217;s 32 GB as possible. The configurations are listed below:</p>



<ul class="wp-block-list">
<li>RTX 5070 Ti with just under 16 GB VRAM load, CUDA backend</li>



<li>RTX 5070 Ti with just under 16 GB VRAM load, Vulkan backend</li>



<li>R9700 with just under 16 GB VRAM load, ROCm 7 backend</li>



<li>R9700 with just under 16 GB VRAM load, Vulkan backend</li>



<li>R9700 with just under 32 GB VRAM load, ROCm 7 backend</li>



<li>R9700 with just under 32 GB VRAM load, Vulkan backend</li>
</ul>
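
<p class="wp-block-paragraph">These VRAM targets are reached by choosing which expert tensors the <code>-ot</code> pattern sends to the CPU. As a rough sketch, the script below lists the layers matched by the Qwen3-Coder pattern from the RTX 5070 Ti 16 GB runs. It assumes tensors are named <code>blk.N.ffn_gate_exps.weight</code> and that the model has 48 layers (0 to 47); neither is stated in this post, so check both against your GGUF file. <code>grep -E</code> stands in for llama.cpp&#8217;s own regex matching, which may behave differently for more exotic patterns.</p>

```shell
#!/usr/bin/env bash
# Sketch: which layers would an -ot pattern send to the CPU?
# Assumptions (not from the benchmark logs): tensors are named
# blk.N.ffn_gate_exps.weight and the model has 48 layers (0-47).

# Regex part of the -ot argument, without the trailing "=CPU".
pattern='\.(39|[4-9][0-9]|[1-9][0-9][0-9])\.ffn_(gate)_exps.'
n_layers=48

# Print one layer index per line for every tensor name the pattern matches.
offloaded_layers() {
  local i
  for i in $(seq 0 $((n_layers - 1))); do
    if printf 'blk.%d.ffn_gate_exps.weight\n' "$i" | grep -Eq "$pattern"; then
      printf '%d\n' "$i"
    fi
  done
}

offloaded_layers
```

<p class="wp-block-paragraph">Under those assumptions it prints 39 through 47, meaning only the gate experts of the last nine layers are placed on the CPU. Swapping in another pattern and layer count gives a quick sanity check before a long benchmark run.</p>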



<p class="wp-block-paragraph">Here are the results using unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS:</p>



<figure class="wp-block-table"><table><thead><tr><th>Tokens/s (higher is better)</th><th class="has-text-align-right" data-align="right">5070 Ti, 16G<br>CUDA</th><th class="has-text-align-right" data-align="right">5070 Ti, 16G<br>Vulkan</th><th class="has-text-align-right" data-align="right">R9700, 16G<br>ROCm 7</th><th class="has-text-align-right" data-align="right">R9700, 16G<br>Vulkan</th><th class="has-text-align-right" data-align="right">R9700, 32G<br>ROCm 7</th><th class="has-text-align-right" data-align="right">R9700, 32G<br>Vulkan</th></tr></thead><tbody><tr><td><strong>Prompt processing (512 tokens)</strong></td><td class="has-text-align-right" data-align="right"><strong>3723.33 ± 50.09</strong></td><td class="has-text-align-right" data-align="right">2739.83 ± 30.37</td><td class="has-text-align-right" data-align="right">746.77 ± 3.45</td><td class="has-text-align-right" data-align="right">1236.97 ± 9.19</td><td class="has-text-align-right" data-align="right">797.92 ± 3.61</td><td class="has-text-align-right" data-align="right">1665.47 ± 5.95</td></tr><tr><td><strong>Text generation (256 tokens)</strong></td><td class="has-text-align-right" data-align="right">137.34 ± 0.69</td><td class="has-text-align-right" data-align="right"><strong>138.41 ± 0.79</strong></td><td class="has-text-align-right" data-align="right">88.98 ± 0.16</td><td class="has-text-align-right" data-align="right">105.35 ± 0.75</td><td class="has-text-align-right" data-align="right">100.05 ± 0.08</td><td class="has-text-align-right" data-align="right">122.63 ± 0.52</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">Here are the results using unsloth/gpt-oss-120b-Q4_K_M:</p>



<figure class="wp-block-table"><table><thead><tr><th>Tokens/s (higher is better)</th><th class="has-text-align-right" data-align="right">5070 Ti, 16G<br>CUDA</th><th class="has-text-align-right" data-align="right">5070 Ti, 16G<br>Vulkan</th><th class="has-text-align-right" data-align="right">R9700, 16G<br>ROCm 7</th><th class="has-text-align-right" data-align="right">R9700, 16G<br>Vulkan</th><th class="has-text-align-right" data-align="right">R9700, 32G<br>ROCm 7</th><th class="has-text-align-right" data-align="right">R9700, 32G<br>Vulkan</th></tr></thead><tbody><tr><td><strong>Prompt processing (512 tokens)</strong></td><td class="has-text-align-right" data-align="right"><strong>370.32 ± 4.22</strong></td><td class="has-text-align-right" data-align="right">206.81 ± 3.52</td><td class="has-text-align-right" data-align="right">188.32 ± 4.82</td><td class="has-text-align-right" data-align="right">169.56 ± 2.60</td><td class="has-text-align-right" data-align="right">251.93 ± 6.61</td><td class="has-text-align-right" data-align="right">230.01 ± 2.78</td></tr><tr><td><strong>Text generation (256 tokens)</strong></td><td class="has-text-align-right" data-align="right"><strong>40.24 ± 0.13</strong></td><td class="has-text-align-right" data-align="right">37.90 ± 0.42</td><td class="has-text-align-right" data-align="right">32.59 ± 0.01</td><td class="has-text-align-right" data-align="right">31.49 ± 0.04</td><td class="has-text-align-right" data-align="right">38.73 ± 0.08</td><td class="has-text-align-right" data-align="right">36.22 ± 0.03</td></tr></tbody></table></figure>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>RTX 5070 Ti 16G run output </summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>./build_mkl-ilp64-icx_cuda/bin/llama-bench --model ../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096

GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096

./build_mkl-ilp64-icx_cuda/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096

GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | CUDA,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU |           pp512 |      3723.33 ± 50.09 |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | CUDA,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU |           tg256 |        137.34 ± 0.69 |

build: d2ee056e1 (6713)
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5070 Ti (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU |           pp512 |      2739.83 ± 30.37 |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU |           tg256 |        138.41 ± 0.79 |

build: d2ee056e1 (6713)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | CUDA,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           pp512 |        370.32 ± 4.22 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | CUDA,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           tg256 |         40.24 ± 0.13 |

build: d2ee056e1 (6713)
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 5070 Ti (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           pp512 |        206.81 ± 3.52 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           tg256 |         37.90 ± 0.42 |

build: d2ee056e1 (6713)
</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">./build_mkl-ilp64-icx_cuda/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span 
style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span 
style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">./build_mkl-ilp64-icx_cuda/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span 
style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_MMQ:</span><span style="color: #D4D4D4">    </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_CUBLAS:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">CUDA</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #D4D4D4">  </span><span style="color: #DCDCAA">Device</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #CE9178">:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">NVIDIA</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GeForce</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">RTX</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">5070</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Ti,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">compute</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">capability</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">12.0</span><span style="color: #CE9178">,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">VMM:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">yes</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">CUDA,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(39</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">4</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span 
style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |      </span><span style="color: #DCDCAA">3723.33</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">50.09</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">CUDA,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(39</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">4</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span 
style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">137.34</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.69</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">NVIDIA</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GeForce</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">RTX</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">5070</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Ti</span><span style="color: #D4D4D4"> (NVIDIA) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">49152</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: 
#CE9178">NV_coopmat2</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(39</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">4</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span 
style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |      </span><span style="color: #DCDCAA">2739.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30.37</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(39</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">4</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span 
style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">138.41</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.79</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_MMQ:</span><span style="color: #D4D4D4">    </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_CUBLAS:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">CUDA</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #D4D4D4">  </span><span style="color: #DCDCAA">Device</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #CE9178">:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">NVIDIA</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GeForce</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">RTX</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">5070</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Ti,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">compute</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">capability</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">12.0</span><span style="color: #CE9178">,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">VMM:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">yes</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">CUDA,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">370.32</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.22</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">CUDA,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">40.24</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.13</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">NVIDIA</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GeForce</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">RTX</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">5070</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Ti</span><span style="color: #D4D4D4"> (NVIDIA) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">49152</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: 
#CE9178">NV_coopmat2</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">206.81</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">3.52</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">37.90</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.42</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"></span></code></pre></div>
</details>
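<p>The <code>-ot</code> patterns in these runs decide which expert tensors stay in system RAM, and they are easy to misread. The following is a minimal sketch, not part of llama.cpp, that lists the layers each pattern would catch. It assumes expert tensors are named <code>blk.&lt;layer&gt;.ffn_&lt;kind&gt;_exps.weight</code>, that the Qwen3 30B-A3B model has 48 layers and gpt-oss-120b has 36, and that Python's <code>re.search</code> behaves like the regex search llama.cpp applies to the part before <code>=CPU</code>. The helper name <code>offloaded_layers</code> is hypothetical.</p>

```python
import re

# Regex part of each -ot argument used in the benchmark commands (text before "=CPU").
QWEN_PATTERN = r"\.(39|[4-9][0-9]|[1-9][0-9][0-9])\.ffn_(gate)_exps."
GPT_OSS_PATTERN = r"\.(7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(up|down|gate)_exps."


def offloaded_layers(pattern, n_layers, kinds=("gate", "up", "down")):
    """Return {kind: [layer indices]} for expert tensors the pattern matches.

    Tensor names are assumed to follow blk.<layer>.ffn_<kind>_exps.weight.
    """
    rx = re.compile(pattern)
    hits = {}
    for kind in kinds:
        layers = [
            i for i in range(n_layers)
            if rx.search(f"blk.{i}.ffn_{kind}_exps.weight")
        ]
        if layers:
            hits[kind] = layers
    return hits


if __name__ == "__main__":
    # Layer counts are assumptions; adjust them to the model you actually load.
    print("qwen3moe:", offloaded_layers(QWEN_PATTERN, 48))
    print("gpt-oss :", offloaded_layers(GPT_OSS_PATTERN, 36))
```

<p>Under those assumptions, the Qwen3 pattern keeps only the gate experts of layers 39 and up on the CPU, while the gpt-oss pattern keeps the up, down and gate experts of every layer from 7 onward there.</p>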



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>R9700 16G run output</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>./build_mkl-ilp64-icx_rocm/bin/llama-bench --model ../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096

GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096

./build_mkl-ilp64-icx_rocm/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096

GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -ot "\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | ROCm,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU |           pp512 |        746.77 ± 3.45 |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | ROCm,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU |           tg256 |         88.98 ± 0.16 |

build: d2ee056e1 (6713)
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU |           pp512 |       1236.97 ± 9.19 |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU |           tg256 |        105.35 ± 0.75 |

build: d2ee056e1 (6713)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | ROCm,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           pp512 |        188.32 ± 4.82 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | ROCm,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           tg256 |         32.59 ± 0.01 |

build: d2ee056e1 (6713)
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           pp512 |        169.56 ± 2.60 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU |           tg256 |         31.49 ± 0.04 |

build: d2ee056e1 (6713)
</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">./build_mkl-ilp64-icx_rocm/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span 
style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(39|&#91;4-9&#93;&#91;0-9&#93;|&#91;1-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span 
style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">./build_mkl-ilp64-icx_rocm/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down|gate)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span 
style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_MMQ:</span><span style="color: #D4D4D4">    </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_CUBLAS:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">ROCm</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #D4D4D4">  </span><span style="color: #DCDCAA">Device</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #CE9178">:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">gfx1201</span><span style="color: #D4D4D4"> (0x1201), VMM: no, Wave Size: 32</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(39</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">4</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span 
style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">746.77</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">3.45</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(39</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">4</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span 
style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">88.98</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.16</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(39</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">4</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span 
style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">1236.97</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">9.19</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(39</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">4</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span 
style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">105.35</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.75</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_MMQ:</span><span style="color: #D4D4D4">    </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_CUBLAS:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">ROCm</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #D4D4D4">  </span><span style="color: #DCDCAA">Device</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #CE9178">:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">gfx1201</span><span style="color: #D4D4D4"> (0x1201), VMM: no, Wave Size: 32</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">188.32</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.82</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">32.59</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.01</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">169.56</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2.60</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">gate</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">31.49</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.04</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"></span></code></pre></div>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>R9700 32 GB run output</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">./build_mkl-ilp64-icx_rocm/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: 
#569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">./build_mkl-ilp64-icx_rocm/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_MMQ:</span><span style="color: #D4D4D4">    </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_CUBLAS:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">ROCm</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #D4D4D4">  </span><span style="color: #DCDCAA">Device</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #CE9178">:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">gfx1201</span><span style="color: #D4D4D4"> (0x1201), VMM: no, Wave Size: 32</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">797.92</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">3.61</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">100.05</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.08</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">1665.47</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">5.95</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">122.63</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.52</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_MMQ:</span><span style="color: #D4D4D4">    </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_CUBLAS:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">ROCm</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #D4D4D4">  </span><span style="color: #DCDCAA">Device</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #CE9178">:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">gfx1201</span><span style="color: #D4D4D4"> (0x1201), VMM: no, Wave Size: 32</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">251.93</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">6.61</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">38.73</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.08</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">230.01</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2.78</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">36.22</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.03</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"></span></code></pre></div>
</details>



<p class="wp-block-paragraph">Even though the R9700 has twice the VRAM, it turns out that with the MoE-layer CPU offloading strategy it still can&#8217;t quite compete with the RTX 5070 Ti. This might be caused by the large memory bandwidth gap between the two cards (644.6 GB/s vs 896 GB/s, so the RTX 5070 Ti has about 39% more bandwidth).</p>
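<p class="wp-block-paragraph">A percentage gap reads differently depending on which card you take as the baseline, so here is a small sketch that spells out both directions. The two bandwidth figures are the ones quoted above; the helper names are ours, purely for illustration.</p>

```python
# Bandwidth figures quoted in the article, in GB/s.
R9700_GBPS = 644.6
RTX_5070_TI_GBPS = 896.0


def percent_more(larger: float, smaller: float) -> float:
    """How much bigger `larger` is, as a percentage of `smaller`."""
    return (larger / smaller - 1.0) * 100.0


def percent_less(smaller: float, larger: float) -> float:
    """How much smaller `smaller` is, as a percentage of `larger`."""
    return (1.0 - smaller / larger) * 100.0


if __name__ == "__main__":
    more = percent_more(RTX_5070_TI_GBPS, R9700_GBPS)
    less = percent_less(R9700_GBPS, RTX_5070_TI_GBPS)
    print(f"RTX 5070 Ti has {more:.1f}% more bandwidth than the R9700")
    print(f"R9700 has {less:.1f}% less bandwidth than the RTX 5070 Ti")
```

<p class="wp-block-paragraph">With the R9700 as the baseline the gap is about 39%; with the RTX 5070 Ti as the baseline it is about 28%. Same two numbers, different headline.</p>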



<p class="wp-block-paragraph">Now, look at unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS: there is a big performance delta between the ROCm 7 backend and the Vulkan backend. I wonder why a bunch of open-source developers from the Vulkan/SPIR-V/Kompute projects can outperform engineers who are paid to make their stuff performant&#8230; Although, the ROCm backend did do better on unsloth/gpt-oss-120b-Q4_K_M by a tiny margin.</p>



<p class="wp-block-paragraph">Also, it turns out that more VRAM still can&#8217;t quite beat less VRAM if your memory bandwidth is too low and your software isn&#8217;t up to par.</p>



<p class="wp-block-paragraph">It&#8217;s not about the size. It&#8217;s about how you use it.</p>


<div class="wp-block-image">
<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b7938&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b7938" class="aligncenter size-large wp-lightbox-container"><img loading="lazy" decoding="async" width="480" height="270" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/giphy.gif" alt="" class="wp-image-1906"/></figure>
</div>


<h2 class="wp-block-heading">Test Results: Qwen Image Edit 2509</h2>



<p class="wp-block-paragraph">Sometimes we ask ourselves why we like to suffer.</p>



<p class="wp-block-paragraph">And this test is no different.</p>



<p class="wp-block-paragraph">We (or I, personally) wonder if being a goose farmer would be a better choice for living a happy life.</p>


<div class="wp-block-image">
<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b8361&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b8361" class="aligncenter size-medium wp-lightbox-container"><img loading="lazy" decoding="async" width="269" height="300" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/485870201_1066836342142039_1435760072320172837_n-269x300.jpg" alt="" class="wp-image-1893" srcset="https://efisonlt.com/wp-content/uploads/2025/10/485870201_1066836342142039_1435760072320172837_n-269x300.jpg 269w, https://efisonlt.com/wp-content/uploads/2025/10/485870201_1066836342142039_1435760072320172837_n-768x857.jpg 768w, https://efisonlt.com/wp-content/uploads/2025/10/485870201_1066836342142039_1435760072320172837_n.jpg 860w" sizes="(max-width: 269px) 100vw, 269px" /></figure>
</div>


<p class="wp-block-paragraph">First, let us tell you that the first run of this test is no happy feat. It takes quite a while for the text encoder and VAE to load and process our prompt and image inputs.</p>


<div class="wp-block-image">
<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b8931&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b8931" class="aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="232" height="274" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/image-2.png" alt="" class="wp-image-1894"/><figcaption class="wp-element-caption">On a bad day, the first run can take more than 7 minutes to generate this.</figcaption></figure>
</div>


<h3 class="wp-block-heading">UPDATE: IT DOESN&#8217;T HAVE TO BE 7 MINUTES</h3>



<p class="wp-block-paragraph">PyTorch for ROCm apparently has a <a href="https://github.com/ROCm/TheRock/issues/1542">bug</a> in which the VAE stage is extremely slow.</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b8ece&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b8ece" class="wp-block-image size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="907" height="763" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/image-5.png" alt="" class="wp-image-1950" srcset="https://efisonlt.com/wp-content/uploads/2025/10/image-5.png 907w, https://efisonlt.com/wp-content/uploads/2025/10/image-5-300x252.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/image-5-768x646.png 768w" sizes="(max-width: 907px) 100vw, 907px" /></figure>



<p class="wp-block-paragraph">The solution? ComfyUI recently <a href="https://github.com/comfyanonymous/ComfyUI/pull/10302">pushed a workaround</a>, which is to disable the cuDNN backend (?).</p>
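<p class="wp-block-paragraph">For readers wondering what &#8220;disabling cuDNN&#8221; looks like in PyTorch terms, here is a minimal sketch. <code>torch.backends.cudnn.enabled</code> is a real PyTorch switch; as far as we understand, on ROCm builds it governs MIOpen, AMD&#8217;s counterpart to cuDNN, which would explain why it matters on an AMD card. The helper name <code>disable_cudnn_backend</code> is ours, and this is an illustration of the idea, not a copy of the ComfyUI patch.</p>

```python
import importlib.util


def disable_cudnn_backend():
    """Turn off PyTorch's cuDNN backend if PyTorch is installed.

    Returns the new value of the flag, or None when PyTorch is missing.
    The function name is hypothetical; only the torch attribute is real.
    """
    if importlib.util.find_spec("torch") is None:
        return None  # nothing to do on a machine without PyTorch

    import torch

    # Assumption: on ROCm builds this flag controls MIOpen-backed kernels.
    torch.backends.cudnn.enabled = False
    return torch.backends.cudnn.enabled


if __name__ == "__main__":
    print("cudnn backend enabled:", disable_cudnn_backend())
```

<p class="wp-block-paragraph">The flag has to be set before the slow stage runs, so in a sketch like this it would be called once at startup.</p>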



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b939b&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b939b" class="wp-block-image size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="907" height="845" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/image-6.png" alt="" class="wp-image-1951" srcset="https://efisonlt.com/wp-content/uploads/2025/10/image-6.png 907w, https://efisonlt.com/wp-content/uploads/2025/10/image-6-300x279.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/image-6-768x716.png 768w" sizes="(max-width: 907px) 100vw, 907px" /></figure>



<p class="wp-block-paragraph">Now the first generation isn&#8217;t that painful anymore.</p>


<div class="wp-block-image">
<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3b98b0&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3b98b0" class="aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="236" height="236" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/image-7.png" alt="" class="wp-image-1952" srcset="https://efisonlt.com/wp-content/uploads/2025/10/image-7.png 236w, https://efisonlt.com/wp-content/uploads/2025/10/image-7-150x150.png 150w" sizes="(max-width: 236px) 100vw, 236px" /></figure>
</div>


<p class="wp-block-paragraph">Original article continues below.</p>



<p class="wp-block-paragraph">Here, we only report the generation time measured after the text encoder and VAE had been loaded. We ran a batch of 5 generations with the same prompt and the same inputs, then averaged the time needed per generation.</p>
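<p class="wp-block-paragraph">The methodology above can be sketched as a tiny timing harness: one untimed warm-up run to absorb model loading, then five timed runs that get averaged. This is only an approximation of what we did by hand in ComfyUI; <code>time_runs</code> and <code>fake_generate</code> are hypothetical names, and the sleep is a stand-in for a real image generation.</p>

```python
import statistics
import time
from typing import Callable, List


def time_runs(generate: Callable[[], object], warmup: int = 1, runs: int = 5) -> List[float]:
    """Time `runs` calls of `generate`, after `warmup` untimed calls."""
    for _ in range(warmup):
        generate()  # untimed: pays for model loading and caching

    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        durations.append(time.perf_counter() - start)
    return durations


def fake_generate() -> None:
    """Placeholder workload standing in for one image generation."""
    time.sleep(0.01)


if __name__ == "__main__":
    durations = time_runs(fake_generate)
    mean_s = statistics.mean(durations)
    print(f"mean over {len(durations)} runs: {mean_s:.3f} s")
```

<p class="wp-block-paragraph">Keeping the warm-up out of the average is what separates the numbers below from the painful first-run experience described earlier.</p>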



<p class="wp-block-paragraph">Now, let&#8217;s get to the results:</p>



<figure class="wp-block-table"><table><thead><tr><th>(In seconds, Lower is better)</th><th class="has-text-align-right" data-align="right">RTX 5070 Ti<br>CUDA 12.9<br>Linux</th><th class="has-text-align-right" data-align="right">R9700<br>ROCm 7.0.x<br>Linux</th><th class="has-text-align-right" data-align="right">R9700<br>ROCm 6.4.x<br>Windows</th><th class="has-text-align-right" data-align="right">R9700<br>ROCm 7.0.x<br>Windows</th></tr></thead><tbody><tr><td><strong>Results</strong></td><td class="has-text-align-right" data-align="right"><strong>29.384</strong></td><td class="has-text-align-right" data-align="right">52.17</td><td class="has-text-align-right" data-align="right">69.262</td><td class="has-text-align-right" data-align="right">62.59</td></tr></tbody></table></figure>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>RTX 5070 Ti CUDA 12.9 Linux test screenshot</summary>
<p class="wp-block-paragraph"><a href="https://efisonlt.com/wp-content/uploads/2025/10/5070-Ti-GGUF.png">https://efisonlt.com/wp-content/uploads/2025/10/5070-Ti-GGUF.png</a></p>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>R9700 ROCm 7.0.x Linux test screenshot</summary>
<p class="wp-block-paragraph"><a href="https://efisonlt.com/wp-content/uploads/2025/10/R9700-GGUF.png">https://efisonlt.com/wp-content/uploads/2025/10/R9700-GGUF.png</a></p>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>R9700 ROCm 6.4.x Windows test screenshot</summary>
<p class="wp-block-paragraph"><a href="https://efisonlt.com/wp-content/uploads/2025/10/Windows-R9700-GGUF.png">https://efisonlt.com/wp-content/uploads/2025/10/Windows-R9700-GGUF.png</a></p>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>R9700 ROCm 7.0.x Windows test screenshot</summary>
<p class="wp-block-paragraph"><a href="https://efisonlt.com/wp-content/uploads/2025/10/Windows-R9700-GGUF-gfx120x-nightly.png">https://efisonlt.com/wp-content/uploads/2025/10/Windows-R9700-GGUF-gfx120x-nightly.png</a></p>
</details>



<p class="wp-block-paragraph">Oops.</p>



<p class="wp-block-paragraph">The RTX 5070 Ti is almost twice as fast: about 1.8x against the best R9700 result.</p>



<p class="wp-block-paragraph">You can also make the case that PyTorch for ROCm is slower on Windows: with ROCm 7.0.x, it took almost 20% longer than on Linux. This is only the one use case we tested, though, and we didn&#8217;t confirm it with other workloads.</p>
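<p class="wp-block-paragraph">To make those ratios easy to re-check, here is a short sketch that recomputes them from the averaged times in the results table. The dictionary keys and the helper <code>relative_to</code> are our own naming; the numbers are the ones reported above.</p>

```python
# Averaged generation times from the results table, in seconds.
RESULTS_S = {
    "RTX 5070 Ti, CUDA 12.9, Linux": 29.384,
    "R9700, ROCm 7.0.x, Linux": 52.17,
    "R9700, ROCm 6.4.x, Windows": 69.262,
    "R9700, ROCm 7.0.x, Windows": 62.59,
}


def relative_to(baseline_name: str) -> dict:
    """Express every result as a multiple of the baseline's time."""
    baseline_s = RESULTS_S[baseline_name]
    return {name: time_s / baseline_s for name, time_s in RESULTS_S.items()}


if __name__ == "__main__":
    for baseline in ("RTX 5070 Ti, CUDA 12.9, Linux", "R9700, ROCm 7.0.x, Linux"):
        print(f"Time relative to {baseline}:")
        for name, ratio in relative_to(baseline).items():
            print(f"  {name}: {ratio:.2f}x")
```

<p class="wp-block-paragraph">Against the RTX 5070 Ti, the R9700 on Linux needs roughly 1.78x the time. Against the R9700 on Linux, the Windows ROCm 7.0.x run needs roughly 1.20x the time.</p>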



<p class="wp-block-paragraph">And to make it worse, the RTX 5070 Ti has a trick up its sleeve.</p>



<p class="wp-block-paragraph">Introducing <a href="https://github.com/nunchaku-tech/nunchaku">Nunchaku SVDQuant</a>: an inference engine so fast that it cuts the inference time of Qwen Image Edit 2509 almost in half, even compared with the already quantized Q4_K_M model.</p>



<figure class="wp-block-table"><table><thead><tr><th>(In seconds, Lower is better)</th><th class="has-text-align-right" data-align="right">RTX 5070 Ti<br>CUDA 12.9<br>Linux<br>Nunchaku FP4 r32</th><th class="has-text-align-right" data-align="right">RTX 5070 Ti<br>CUDA 12.9<br>Linux<br>Q4_K_M GGUF</th></tr></thead><tbody><tr><td><strong>Results</strong></td><td class="has-text-align-right" data-align="right"><strong>15.198</strong></td><td class="has-text-align-right" data-align="right">29.384</td></tr></tbody></table></figure>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>RTX 5070 Ti CUDA 12.9 Linux, Nunchaku FP4 r32 model test screenshot </summary>
<p class="wp-block-paragraph"><a href="https://efisonlt.com/wp-content/uploads/2025/10/5070-Ti-Nunchaku-r32-CPU-offload.png">https://efisonlt.com/wp-content/uploads/2025/10/5070-Ti-Nunchaku-r32-CPU-offload.png</a></p>
</details>



<h2 class="wp-block-heading">Overclocking</h2>



<p class="wp-block-paragraph">Real men do OC.</p>



<p class="wp-block-paragraph">Or men with too much time on their hands.</p>



<p class="wp-block-paragraph">We managed to overclock this card using <a href="https://github.com/ilya-zlobintsev/LACT">LACT</a>. No, we didn&#8217;t test overclocking on Windows. Penguins FTW!</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3ba50d&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3ba50d" class="wp-block-image size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="925" height="879" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_165724.png" alt="" class="wp-image-1890" srcset="https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_165724.png 925w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_165724-300x285.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/Screenshot_20251011_165724-768x730.png 768w" sizes="(max-width: 925px) 100vw, 925px" /></figure>



<p class="wp-block-paragraph">We found that 2800 MHz was the highest VRAM clock that kept the card from crashing across various workloads. That is still a healthy 282 MHz increase over the default 2518 MHz, which translates to <strong>11.2%</strong> more memory bandwidth.</p>
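<p class="wp-block-paragraph">For the curious, here is a rough sketch of how a memory clock maps to bandwidth. It assumes a 256-bit memory bus and 8 data transfers per reported memory clock for GDDR6. Those two constants are our assumptions, not something LACT reports, but together they reproduce the 644.6 GB/s stock figure quoted earlier, so we think the model is reasonable.</p>

```python
BUS_WIDTH_BITS = 256      # assumption: R9700 memory bus width
TRANSFERS_PER_CLOCK = 8   # assumption: GDDR6 transfers per reported clock


def bandwidth_gbs(mem_clock_mhz: float) -> float:
    """Estimate memory bandwidth in GB/s from the reported memory clock."""
    bytes_per_transfer = BUS_WIDTH_BITS / 8
    transfers_per_second = mem_clock_mhz * 1e6 * TRANSFERS_PER_CLOCK
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    stock = bandwidth_gbs(2518)
    overclocked = bandwidth_gbs(2800)
    gain = (overclocked / stock - 1.0) * 100.0
    print(f"stock:       {stock:.1f} GB/s")
    print(f"overclocked: {overclocked:.1f} GB/s")
    print(f"gain:        {gain:.1f}%")
```

<p class="wp-block-paragraph">Under these assumptions the overclock lands at roughly 716.8 GB/s. Because bandwidth scales linearly with the memory clock in this model, the percentage gain is the same as the clock increase.</p>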



<p class="wp-block-paragraph">The maximum power limit was also raised from 300 W to 330 W (<strong>10% increase</strong>). And maybe you wonder what the GPU voltage offset is about. Why is it being lowered, right?</p>



<p class="wp-block-paragraph">Some of you might have heard that overclocking the GPU clock on RDNA4 is done by shifting the voltage/frequency curve: setting the voltage offset to a lower value tricks the GPU into boosting to a higher frequency.</p>



<p class="wp-block-paragraph">Too difficult to understand? Here is a video from our friend <a href="https://www.youtube.com/@Luckyn00bOC">Alva Jonathan</a>, who did an excellent job explaining overclocking on another RDNA4 GPU, the Radeon RX 9070.</p>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<div class="nv-iframe-embed"><iframe title="Test Overclocking &amp; Undervolting ASRock Radeon RX 9070 Steel Legend (Indonesia)" width="1200" height="675" src="https://www.youtube.com/embed/4SF9OK2PwYo?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe></div>
</div></figure>
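
<p class="wp-block-paragraph">To picture the voltage-offset trick described before the video, here is a toy Python model. Every number in it (the curve points, the 1000 mV ceiling, the -60 mV offset) is made up for illustration and is not a measurement of the R9700 or a description of AMD's actual boost algorithm. It only assumes that the GPU picks the highest frequency whose required voltage fits under a fixed ceiling.</p>

```python
# Hypothetical voltage/frequency curve: (frequency in MHz, required voltage in mV).
# These points are invented for illustration, not read from any real GPU.
VF_CURVE = [(2400, 850), (2600, 900), (2800, 950), (3000, 1000), (3200, 1050)]

def max_boost_mhz(curve, voltage_ceiling_mv, offset_mv=0):
    """Highest frequency whose required voltage, after the offset, fits the ceiling."""
    reachable = [mhz for mhz, mv in curve if voltage_ceiling_mv >= mv + offset_mv]
    return max(reachable) if reachable else None

# Same ceiling in both cases; only the (hypothetical) offset changes.
stock = max_boost_mhz(VF_CURVE, voltage_ceiling_mv=1000)
undervolted = max_boost_mhz(VF_CURVE, voltage_ceiling_mv=1000, offset_mv=-60)
print(stock, undervolted)
```

<p class="wp-block-paragraph">In this toy model the negative offset lets a higher point on the curve fit under the same ceiling. On real hardware, power, temperature and stability also limit the boost, so treat this only as intuition.</p>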



<p class="wp-block-paragraph">Here are the performance results showing the gain from our overclocking attempt:</p>



<h3 class="wp-block-heading">llama.cpp unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS (Overclocked)</h3>



<figure class="wp-block-table"><table><thead><tr><th>(In token/s. Higher is better)</th><th class="has-text-align-right" data-align="right">5070 Ti, 16G<br>CUDA</th><th class="has-text-align-right" data-align="right">R9700, 32G<br>Vulkan<br>Overclocked</th><th class="has-text-align-right" data-align="right">R9700, 32G<br>Vulkan<br>Stock default</th></tr></thead><tbody><tr><td><strong>Prompt processing (512 tokens)</strong></td><td class="has-text-align-right" data-align="right"><strong>3723.33 ± 50.09</strong></td><td class="has-text-align-right" data-align="right">1810.00 ± 11.36</td><td class="has-text-align-right" data-align="right">1665.47 ± 5.95</td></tr><tr><td><strong>Text generation (256 tokens)</strong></td><td class="has-text-align-right" data-align="right"><strong>137.34 ± 0.69</strong></td><td class="has-text-align-right" data-align="right">131.09 ± 0.37</td><td class="has-text-align-right" data-align="right">122.63 ± 0.52</td></tr></tbody></table></figure>
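
<p class="wp-block-paragraph">The relative gain from the overclock can be read off the table above with a few lines of Python. This is only a sketch over the mean values (the ± spread is ignored), and the dictionary layout and function name are ours.</p>

```python
# Mean token/s from the table above: (overclocked, stock) on the R9700 with Vulkan.
qwen3_coder = {
    "pp512": (1810.00, 1665.47),
    "tg256": (131.09, 122.63),
}

def uplift_percent(overclocked, stock):
    """Relative speed-up of the overclocked run over stock, in percent."""
    return (overclocked / stock - 1.0) * 100.0

gains = {test: uplift_percent(oc, stock) for test, (oc, stock) in qwen3_coder.items()}
for test, gain in gains.items():
    print(f"{test}: +{gain:.1f}%")
```

<p class="wp-block-paragraph">Both gains come out below the 11.2% VRAM clock increase.</p>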



<h3 class="wp-block-heading">llama.cpp unsloth/gpt-oss-120b-Q4_K_M (Overclocked)</h3>



<figure class="wp-block-table"><table><thead><tr><th>(In token/s. Higher is better)</th><th class="has-text-align-right" data-align="right">5070 Ti, 16G<br>CUDA</th><th class="has-text-align-right" data-align="right">R9700, 32G<br>ROCm 7<br>Overclocked</th><th class="has-text-align-right" data-align="right">R9700, 32G<br>ROCm 7<br>Stock default</th></tr></thead><tbody><tr><td><strong>Prompt processing (512 tokens)</strong></td><td class="has-text-align-right" data-align="right"><strong>370.32 ± 4.22</strong></td><td class="has-text-align-right" data-align="right">254.46 ± 4.62</td><td class="has-text-align-right" data-align="right">251.93 ± 6.61</td></tr><tr><td><strong>Text generation (256 tokens)</strong></td><td class="has-text-align-right" data-align="right"><strong>40.24 ± 0.13</strong></td><td class="has-text-align-right" data-align="right">39.72 ± 0.04</td><td class="has-text-align-right" data-align="right">38.73 ± 0.08</td></tr></tbody></table></figure>
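
<p class="wp-block-paragraph">For the gpt-oss-120b numbers the gap between overclocked and stock is small, so it is worth checking it against the reported ± spread. The sketch below uses a rough rule of thumb that we are assuming here (a gap counts as clear only if it exceeds the sum of both spreads). It is not a formal statistical test, and the names are ours.</p>

```python
# (mean, spread) in token/s from the table above, R9700 on ROCm 7.
gpt_oss_120b = {
    "pp512": {"overclocked": (254.46, 4.62), "stock": (251.93, 6.61)},
    "tg256": {"overclocked": (39.72, 0.04), "stock": (38.73, 0.08)},
}

def gap_exceeds_spread(overclocked, stock):
    """True if the mean difference is larger than both spreads added together."""
    (oc_mean, oc_spread), (st_mean, st_spread) = overclocked, stock
    return (oc_mean - st_mean) > (oc_spread + st_spread)

verdict = {test: gap_exceeds_spread(r["overclocked"], r["stock"])
           for test, r in gpt_oss_120b.items()}
print(verdict)
```

<p class="wp-block-paragraph">By this rough check, only the text generation gain stands out from run-to-run variation; the prompt processing difference does not.</p>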



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Overclocked R9700 llama.cpp run output</summary>
<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:flex;align-items:center;padding:10px 0px 10px 16px;margin-bottom:-2px;width:100%;text-align:left;background-color:#2b2b2b;color:#c7c7c7">Bash</span><span role="button" tabindex="0" style="color:#D4D4D4;display:none" aria-label="Copy" class="code-block-pro-copy-button"><pre class="code-block-pro-copy-button-pre" aria-hidden="true"><textarea class="code-block-pro-copy-button-textarea" tabindex="-1" aria-hidden="true" readonly>./build_mkl-ilp64-icx_rocm/bin/llama-bench --model ../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -p 512 -n 256 -fa 1 -ub 4096 -b 4096

GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf -ctk q8_0 -ctv q8_0  --threads 8 -ngl 99 -p 512 -n 256 -fa 1 -ub 4096 -b 4096

./build_mkl-ilp64-icx_rocm/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0 --threads 8 -ngl 99 -ot "\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096

GGML_VULKAN_DEVICE=0 ./build_mkl-ilp64-icx_vulkan/bin/llama-bench --model ../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf -ctk q8_0 -ctv q8_0 --threads 8 -ngl 99 -ot "\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU" -p 512 -n 256 -fa 1 -ub 4096 -b 4096
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | ROCm,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 |           pp512 |        834.27 ± 3.99 |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | ROCm,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 |           tg256 |        105.42 ± 0.12 |

build: d2ee056e1 (6713)
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 |           pp512 |      1810.00 ± 11.36 |
| qwen3moe 30B.A3B IQ4_XS - 4.25 bpw |  15.25 GiB |    30.53 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 |           tg256 |        131.09 ± 0.37 |

build: d2ee056e1 (6713)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon AI PRO R9700, gfx1201 (0x1201), VMM: no, Wave Size: 32
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | ROCm,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU |           pp512 |        254.46 ± 4.62 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | ROCm,BLAS  |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU |           tg256 |         39.72 ± 0.04 |

build: d2ee056e1 (6713)
WARNING: radv is not a conformant Vulkan implementation, testing use only.
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon AI PRO R9700 (RADV GFX1201) (radv) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = Intel(R) UHD Graphics 770 (ADL-S GT1) (Intel open-source Mesa driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 1 | matrix cores: none
| model                          |       size |     params | backend    | threads | n_batch | n_ubatch | type_k | type_v | fa | ot                    |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------: | -------: | -----: | -----: | -: | --------------------- | --------------: | -------------------: |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU |           pp512 |        236.58 ± 2.89 |
| gpt-oss 120B Q4_K - Medium     |  58.45 GiB |   116.83 B | Vulkan,BLAS |       8 |    4096 |     4096 |   q8_0 |   q8_0 |  1 | \.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU |           tg256 |         37.41 ± 0.04 |

build: d2ee056e1 (6713)</textarea></pre><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki dark-plus" style="background-color: #1E1E1E" tabindex="0"><code><span class="line"><span style="color: #DCDCAA">./build_mkl-ilp64-icx_rocm/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: 
#569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-IQ4_XS.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4">  </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">./build_mkl-ilp64-icx_rocm/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"></span>
<span class="line"><span style="color: #9CDCFE">GGML_VULKAN_DEVICE</span><span style="color: #D4D4D4">=</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #DCDCAA">./build_mkl-ilp64-icx_vulkan/bin/llama-bench</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--model</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">../MoE/unsloth/gpt-oss-120b-Q4_K_M.gguf</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctk</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ctv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">q8_0</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">--threads</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">8</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ngl</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">99</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ot</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">&quot;\.(7|8|9|&#91;0-9&#93;&#91;0-9&#93;|&#91;0-9&#93;&#91;0-9&#93;&#91;0-9&#93;)\.ffn_(up|down)_exps.=CPU&quot;</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-p</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">512</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-n</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">256</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-fa</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #569CD6">-ub</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span><span style="color: #D4D4D4"> </span><span style="color: 
#569CD6">-b</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4096</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_MMQ:</span><span style="color: #D4D4D4">    </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_CUBLAS:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">ROCm</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #D4D4D4">  </span><span style="color: #DCDCAA">Device</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #CE9178">:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">gfx1201</span><span style="color: #D4D4D4"> (0x1201), VMM: no, Wave Size: 32</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">834.27</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">3.99</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">105.42</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.12</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |      </span><span style="color: #DCDCAA">1810.00</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">11.36</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">qwen3moe</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">30</span><span style="color: #CE9178">B.A3B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">IQ4_XS</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">bpw</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">15.25</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">30.53</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">131.09</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.37</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_MMQ:</span><span style="color: #D4D4D4">    </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GGML_CUDA_FORCE_CUBLAS:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">no</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_cuda_init:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">ROCm</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #D4D4D4">  </span><span style="color: #DCDCAA">Device</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #CE9178">:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">gfx1201</span><span style="color: #D4D4D4"> (0x1201), VMM: no, Wave Size: 32</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">254.46</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">4.62</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ROCm,BLAS</span><span style="color: #D4D4D4">  |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">39.72</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.04</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span>
<span class="line"><span style="color: #DCDCAA">WARNING:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">radv</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">is</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">not</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">a</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">conformant</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">implementation,</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">testing</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">use</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">only.</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Found</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Vulkan</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">devices:</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AMD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Radeon</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">AI</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">PRO</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">R9700</span><span style="color: #D4D4D4"> (RADV </span><span style="color: #CE9178">GFX1201</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">radv</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">64</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">KHR_coopmat</span></span>
<span class="line"><span style="color: #DCDCAA">ggml_vulkan:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">=</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Intel</span><span style="color: #D4D4D4">(</span><span style="color: #DCDCAA">R</span><span style="color: #D4D4D4">) </span><span style="color: #CE9178">UHD</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Graphics</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">770</span><span style="color: #D4D4D4"> (ADL-S </span><span style="color: #CE9178">GT1</span><span style="color: #D4D4D4">) (</span><span style="color: #DCDCAA">Intel</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">open-source</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Mesa</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">driver</span><span style="color: #D4D4D4">) | </span><span style="color: #DCDCAA">uma:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fp16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">bf16:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">warp</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">size:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">32</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">shared</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">memory:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">65536</span><span style="color: #D4D4D4"> | </span><span style="color: 
#DCDCAA">int</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">dot:</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">matrix</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">cores:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">none</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">model</span><span style="color: #D4D4D4">                          |       </span><span style="color: #DCDCAA">size</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">params</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">backend</span><span style="color: #D4D4D4">    | </span><span style="color: #DCDCAA">threads</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_batch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">n_ubatch</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_k</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">type_v</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">fa</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">ot</span><span style="color: #D4D4D4">                    |            </span><span style="color: #DCDCAA">test</span><span style="color: #D4D4D4"> |                  </span><span style="color: #DCDCAA">t/s</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">------------------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">----------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-----:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">---------------------</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">--------------:</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">-------------------:</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">pp512</span><span style="color: #D4D4D4"> |        </span><span style="color: #DCDCAA">236.58</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">2.89</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"><span style="color: #D4D4D4">| </span><span style="color: #DCDCAA">gpt-oss</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">120</span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Q4_K</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">-</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">Medium</span><span style="color: #D4D4D4">     |  </span><span style="color: #DCDCAA">58.45</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">GiB</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">116.83</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">B</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">Vulkan,BLAS</span><span style="color: #D4D4D4"> |       </span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4"> |    </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |     </span><span style="color: #DCDCAA">4096</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |   </span><span style="color: #DCDCAA">q8_0</span><span style="color: #D4D4D4"> |  </span><span style="color: #DCDCAA">1</span><span style="color: #D4D4D4"> | </span><span style="color: #DCDCAA">\.(7</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">8</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">9</span><span style="color: #D4D4D4">|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;|&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: #D4D4D4">-9&#93;&#91;</span><span style="color: #B5CEA8">0</span><span style="color: 
#D4D4D4">-9&#93;)</span><span style="color: #D7BA7D">\.</span><span style="color: #D4D4D4">ffn_(</span><span style="color: #DCDCAA">up</span><span style="color: #D4D4D4">|</span><span style="color: #DCDCAA">down</span><span style="color: #D4D4D4">)_exps.=CPU |           </span><span style="color: #DCDCAA">tg256</span><span style="color: #D4D4D4"> |         </span><span style="color: #DCDCAA">37.41</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">±</span><span style="color: #D4D4D4"> </span><span style="color: #B5CEA8">0.04</span><span style="color: #D4D4D4"> |</span></span>
<span class="line"></span>
<span class="line"><span style="color: #DCDCAA">build:</span><span style="color: #D4D4D4"> </span><span style="color: #CE9178">d2ee056e1</span><span style="color: #D4D4D4"> (6713)</span></span></code></pre></div>
</details>



<p class="wp-block-paragraph">As we can see, the gain is bigger on the smaller model: a larger share of its layers fits in VRAM, so more of the work benefits from the VRAM overclock. It is still a tad slower than the RTX 5070 Ti, but we&#8217;ll take what we can get.</p>
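<p class="wp-block-paragraph">A side note on the <code>ot</code> column in the gpt-oss 120B runs above: it is llama.cpp&#8217;s tensor override, a regex on tensor names followed by a target buffer (<code>=CPU</code>). The snippet below is a rough Python illustration of what that pattern selects, not llama.cpp code. It assumes the usual GGUF <code>blk.&lt;layer&gt;.&lt;tensor&gt;</code> naming, and the tensor names in it are made-up examples rather than a dump of the actual model file.</p>

```python
import re

# Pattern copied from the `ot` column, without the trailing "=CPU" buffer name.
PATTERN = re.compile(r"\.(7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(up|down)_exps.")


def goes_to_cpu(tensor_name: str) -> bool:
    """Return True if the override pattern matches anywhere in the tensor name."""
    return PATTERN.search(tensor_name) is not None


# Hypothetical tensor names, assuming the "blk.<layer>.<tensor>.weight" layout.
examples = [
    "blk.6.ffn_up_exps.weight",
    "blk.7.ffn_up_exps.weight",
    "blk.12.ffn_down_exps.weight",
    "blk.12.ffn_gate_exps.weight",
    "blk.12.attn_q.weight",
]

for name in examples:
    target = "CPU" if goes_to_cpu(name) else "GPU"
    print(f"{name:32s} -> {target}")
```

<p class="wp-block-paragraph">Read this way, the up and down expert weights from layer 7 onward are placed in system RAM, while layers 0 to 6, the gate experts, and the attention weights stay on the GPU.</p>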



<h3 class="wp-block-heading">Qwen Image Edit 2509 Q4_K_M GGUF (Overclocked)</h3>



<figure class="wp-block-table"><table><thead><tr><th>(In seconds, Lower is better)</th><th class="has-text-align-right" data-align="right">RTX 5070 Ti<br>CUDA 12.9<br>Linux</th><th class="has-text-align-right" data-align="right">R9700<br>ROCm 7.0.x<br>Linux<br>Overclocked</th><th class="has-text-align-right" data-align="right">R9700<br>ROCm 7.0.x<br>Linux<br>Stock default</th></tr></thead><tbody><tr><td><strong>Results</strong></td><td class="has-text-align-right" data-align="right"><strong>29.384</strong></td><td class="has-text-align-right" data-align="right">48.628</td><td class="has-text-align-right" data-align="right">52.17</td></tr></tbody></table></figure>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Overclocked R9700 run screenshot</summary>
<p class="wp-block-paragraph"><a href="https://efisonlt.com/wp-content/uploads/2025/10/R9700-GGUF-OC.png">https://efisonlt.com/wp-content/uploads/2025/10/R9700-GGUF-OC.png</a></p>
</details>



<p class="wp-block-paragraph">That makes the overclocked run 7.28% faster than stock. Not bad for a free performance gain.</p>
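<p class="wp-block-paragraph">The percentage can be checked from the two R9700 timings in the table above. A minimal sketch, where the variable names are ours and "faster" is taken to mean the throughput ratio rather than the share of time saved:</p>

```python
# Timings in seconds, copied from the table above (lower is better).
stock_s = 52.17
overclocked_s = 48.628

# Speedup as a throughput ratio: how much more work per second the overclocked run does.
speedup_pct = (stock_s / overclocked_s - 1) * 100

# The same gap expressed as wall-clock time saved relative to stock.
time_saved_pct = (1 - overclocked_s / stock_s) * 100

print(f"{speedup_pct:.2f}% faster, {time_saved_pct:.2f}% less time per run")
```

<p class="wp-block-paragraph">The two figures differ because they use different baselines: the speedup divides by the overclocked time, the time saved divides by the stock time.</p>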



<h2 class="wp-block-heading">Verdict</h2>



<p class="wp-block-paragraph">So, what do we think?</p>



<p class="wp-block-paragraph">More VRAM doesn&#8217;t always translate to more performance, especially in our tests, which are very inference-heavy. That doesn&#8217;t mean this card is DOA (pls don&#8217;t be, we need alternatives to the leather-jacketed overlord).</p>



<p class="wp-block-paragraph">There are plenty of good takeaways. AMD&#8217;s GPU software team is finally getting its act together: PyTorch for ROCm on Windows was one of the biggest items on the to-do list, and ROCm 7 is faster than ROCm 6 (at least on RDNA4).</p>



<p class="wp-block-paragraph">They also managed to shrink the ROCm Docker image and the PyTorch package.</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3bc854&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3bc854" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" width="1024" height="543" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/image-3-1024x543.png" alt="" class="wp-image-1903" srcset="https://efisonlt.com/wp-content/uploads/2025/10/image-3-1024x543.png 1024w, https://efisonlt.com/wp-content/uploads/2025/10/image-3-300x159.png 300w, https://efisonlt.com/wp-content/uploads/2025/10/image-3-768x407.png 768w, https://efisonlt.com/wp-content/uploads/2025/10/image-3.png 1451w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">Unfortunately, it turns out the size reduction was because they chose to drop RDNA2 from their supported GPUs. Sad.</p>


<div class="wp-block-image">
<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3bcd71&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3bcd71" class="aligncenter size-full wp-lightbox-container"><img loading="lazy" decoding="async" width="758" height="618" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/image-4.png" alt="" class="wp-image-1904" srcset="https://efisonlt.com/wp-content/uploads/2025/10/image-4.png 758w, https://efisonlt.com/wp-content/uploads/2025/10/image-4-300x245.png 300w" sizes="(max-width: 758px) 100vw, 758px" /><figcaption class="wp-element-caption">AMD Radeon ROCm 7 supported GPUs</figcaption></figure>
</div>


<p class="wp-block-paragraph">We still think larger VRAM would offer better value in other use cases, probably LLM fine-tuning and 3D modelling/rendering, which we presume would not be as easy to work around with CPU offload as the MoE LLM inference we tested above.</p>



<figure data-wp-context="{&quot;imageId&quot;:&quot;6a1d48e3bd231&quot;}" data-wp-interactive="core/image" data-wp-key="6a1d48e3bd231" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" decoding="async" width="1024" height="768" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://efisonlt.com/wp-content/uploads/2025/10/547259376_1853163105634285_9222241003663848868_n-1024x768.jpg" alt="" class="wp-image-1905" srcset="https://efisonlt.com/wp-content/uploads/2025/10/547259376_1853163105634285_9222241003663848868_n-1024x768.jpg 1024w, https://efisonlt.com/wp-content/uploads/2025/10/547259376_1853163105634285_9222241003663848868_n-300x225.jpg 300w, https://efisonlt.com/wp-content/uploads/2025/10/547259376_1853163105634285_9222241003663848868_n-768x576.jpg 768w, https://efisonlt.com/wp-content/uploads/2025/10/547259376_1853163105634285_9222241003663848868_n-1536x1152.jpg 1536w, https://efisonlt.com/wp-content/uploads/2025/10/547259376_1853163105634285_9222241003663848868_n.jpg 2048w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>The post <a href="https://efisonlt.com/our-experience-with-asus-amd-radeon-ai-pro-r9700-turbo/">Our Experience with Asus AMD Radeon AI Pro R9700 Turbo</a> appeared first on <a href="https://efisonlt.com">Efison Lisan Teknologi</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
