Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19
Thanks to the community, the Qwen3.6-27B speed keeps getting better. The following improves upon my recipe from yesterday and delivered a whopping 100+ tps (TG).

Model:
- MTP supported
- KLD is decent (much better than NVFP4 per the linked post), with the benefit of being the smallest model
- The smaller model size allows for the full native 256k context window

Tokens per second (TG): 105-108 tps

Special credits to this post that helped me discover the Lorbus quant:

Note that I didn't mess with TQ in m
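For anyone wanting to reproduce this, a minimal vLLM launch along these lines should work. This is a sketch, not the exact command from the recipe: the model path is a placeholder for whichever Lorbus INT4 quant you pulled, and the GPU memory utilization value is an assumption for a single RTX 5090.

```shell
# Hedged sketch of a vLLM serve command for the setup described above.
# <lorbus-int4-quant-repo> is a placeholder; substitute the actual quant repo.
# 262144 = 256k tokens (the full native context window mentioned in the post).
# --gpu-memory-utilization 0.95 is an assumption, tune for your card.
vllm serve <lorbus-int4-quant-repo> \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.95
```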
Original article: Reddit