Qwen3.6-27B at ~80 tps with a 218k context window on 1x RTX 5090, served by vLLM 0.19
Qwen3.6-27B has been out for a few days, and an NVFP4 quantization with MTP was released earlier on HF. You can follow the same recipe I used for Qwen3.5-27B to reach ~80 tps on a single RTX 5090 with a 218k context window, using the latest vLLM 0.19 builds (vLLM 0.19.1rc1).
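To check a tps figure like this on your own setup, one rough approach is to time a single request against vLLM's OpenAI-compatible `/v1/completions` endpoint and divide the reported completion tokens by the wall-clock time. This is a minimal sketch, not the recipe from the post: the helper names, the base URL, and the model name are hypothetical placeholders, and the timing includes prompt processing, so it will read slightly lower than pure decode speed.

```python
import json
import time
import urllib.request


def build_payload(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Request body for the OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding for repeatable runs
    }


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return completion_tokens / elapsed_s


def measure(base_url: str, model: str, prompt: str, max_tokens: int = 512) -> float:
    """Send one request and return end-to-end tokens per second."""
    body = json.dumps(build_payload(model, prompt, max_tokens)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    elapsed = time.perf_counter() - start
    # The server reports how many tokens it generated in the "usage" field.
    return tokens_per_second(reply["usage"]["completion_tokens"], elapsed)


# Hypothetical usage; the URL and model name below are placeholders:
# print(measure("http://localhost:8000", "<served-model-name>", "Explain KV caching."))
```

A single short request is a noisy measurement; averaging several runs with a long `max_tokens` gives a number closer to steady-state decode throughput.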