Qwen3.6-27B at ~80 tps with 218k context window on 1x RTX 5090 served by vllm 0.19

April 25, 2026 at 10:21 AM

Qwen3.6-27B has been out for a few days, and an NVFP4 quantization with MTP was posted to HF earlier. You can follow the same recipe I used for Qwen3.5-27B to reach ~80 tps on a single RTX 5090 with a 218k context window, using the latest vLLM 0.19 builds (vLLM 0.19.1rc1).
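For readers who want a starting point, here is a rough sketch of what such a launch could look like. It is not the actual recipe from the Qwen3.5-27B post. The checkpoint name is a hypothetical placeholder, and the numeric values (218000 as the reading of "218k", 2 speculative tokens, 0.95 memory utilization) and the `"mtp"` method name are assumptions to check against the vLLM docs for your build. The script only assembles and prints a `vllm serve` command; it does not launch anything.

```python
import json
import shlex


def build_vllm_serve_command(
    model_id,
    max_model_len=218_000,          # assumption: "218k" read as 218000 tokens
    num_speculative_tokens=2,       # assumption: not stated in the post
    gpu_memory_utilization=0.95,    # assumption: not stated in the post
):
    """Assemble a `vllm serve` argument list with MTP speculative decoding."""
    # Assumption: the "mtp" method name; verify it for your vLLM version.
    speculative_config = {
        "method": "mtp",
        "num_speculative_tokens": num_speculative_tokens,
    }
    return [
        "vllm", "serve", model_id,
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        "--speculative-config", json.dumps(speculative_config),
    ]


if __name__ == "__main__":
    # Hypothetical placeholder: substitute the real NVFP4 repo from HF.
    command = build_vllm_serve_command("<nvfp4-checkpoint-repo>")
    print(shlex.join(command))
```

If the server fails to start with an out-of-memory error, lowering `max_model_len` is usually the first knob to try.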
