Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19
Thanks to the community, the Qwen3.6-27B speed keeps getting better. The following improves upon my recipe from yesterday and delivered a whopping 100+ tps (TG).

Model:
- MTP supported
- KLD is decent (much better than NVFP4 per the linked post), with the benefit of being the smallest model
- The smaller model size allows for the full native 256k context window

Tokens per second (TG): 105-108 tps

Special credits to this post that helped me discover the Lorbus quant:

Note that I didn't mess with TQ in m
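For anyone wanting to reproduce this, a minimal vLLM launch along these lines should work. This is a sketch, not the exact command from the recipe: the model path is a placeholder for whichever Lorbus INT4 quant you pulled, and the GPU memory utilization value is an assumption for a single RTX 5090.

```shell
# Hedged sketch of a vLLM serve command for the setup described above.
# <lorbus-int4-quant-repo> is a placeholder; substitute the actual quant repo.
# 262144 = 256k tokens (the full native context window mentioned in the post).
# --gpu-memory-utilization 0.95 is an assumption, tune for your card.
vllm serve <lorbus-int4-quant-repo> \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.95
```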
Original article: Reddit