
[Qwen3.6 35b a3b] Used the top config for my setup (8 GB VRAM, 32 GB RAM) and found that the Q4_K_XL model from Unsloth somehow runs slightly faster and uses fewer output tokens than Q4_K_M, despite higher memory usage


Config
CtxSize: 131,072
GpuLayers: 99
CpuMoeLayers: 38
Threads: 16
BatchSize/UBatchSize: 4096/4096
CacheType K/V: q8_0
Tool Context: file mode (tools.kilocode.official.md)

| Metric | M Model | XL Model | Difference |
|---|---|---|---|
| Avg Tokens/sec | 28.92 | 29.78 | +0.86 (+3.0%) |
| Median Tokens/sec | 30.96 | 32.08 | +1.12 (+3.6%) |
| Avg Wall Seconds | 108.03s | 99.93s | -8.10s (-7.5%) |
| Avg Output Tokens | 3,031.8 | 2,895.8 | -136 (-4.5%) |
| Avg Input Tokens/sec | 50.20 | 55.96 | +5.76 (+11.5%) |
| Avg Decode Tokens/sec | 75.89 | 76.44 | +0.55 (+0.7%) |

Runs ~33% slower f
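For anyone who wants to check the Difference column, here is a minimal Python sketch that recomputes the absolute and percentage deltas from a few of the reported figures. The `metrics` dict and the `compare` helper are my own illustrative names, not part of the benchmark tooling, and the percentage is assumed to be measured relative to the Q4_K_M value.

```python
# Reported averages copied from the post: (Q4_K_M, Q4_K_XL).
metrics = {
    "Avg Tokens/sec": (28.92, 29.78),
    "Avg Wall Seconds": (108.03, 99.93),
    "Avg Output Tokens": (3031.8, 2895.8),
    "Avg Input Tokens/sec": (50.20, 55.96),
}


def compare(m_value, xl_value):
    """Return (absolute difference, percent change) of XL relative to M."""
    diff = xl_value - m_value
    pct = diff / m_value * 100  # assumption: Q4_K_M is the baseline
    return round(diff, 2), round(pct, 1)


if __name__ == "__main__":
    for name, (m_value, xl_value) in metrics.items():
        diff, pct = compare(m_value, xl_value)
        print(f"{name}: {diff:+.2f} ({pct:+.1f}%)")
```

With Q4_K_M as the baseline, the recomputed deltas agree with the posted column after rounding.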

Original article
Reddit