Speculative decoding with Gemma-4-31B + Gemma-4-E2B enables 120–200 tok/s output speed for specific tasks
So far in this project I had been using either Gemini 3 / 2.5 Flash or Flash-Lite. None of my use cases are agentic; they are plain LLM workflows for atomic tasks such as extracting references from legal texts, classifying, converting titles to the nominative case, and so on. All of this happens in a non-English language (LT, Lithuanian), which is one of the reasons I originally picked Google models: their multilingual quality is very good for small base languages. Each single request usually fits in 2k-6k tokens of context. Recently I
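The title's setup pairs a large target model with a small draft model. The core idea of greedy speculative decoding can be sketched as follows; this is a toy illustration with placeholder `target_next` / `draft_next` callables standing in for real model calls (the model names and the helper names are assumptions, not the post's code), and a real implementation would verify all draft tokens in a single batched forward pass of the target rather than one call per token:

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]  # greedy next-token function over a context


def speculative_decode(target_next: NextFn, draft_next: NextFn,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding sketch: the cheap draft model proposes up to k
    tokens; the expensive target verifies them left to right. Accepted tokens are
    kept, the first mismatch is replaced by the target's own token, and the output
    is identical to decoding with the target alone."""
    out = list(prompt)
    produced = 0
    while produced < max_new:
        # 1. Draft proposes up to k tokens autoregressively (cheap calls).
        ctx = list(out)
        proposal = []
        for _ in range(min(k, max_new - produced)):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies the proposal left to right (batched in practice).
        all_accepted = True
        for t in proposal:
            tgt = target_next(out)
            if tgt == t:
                out.append(t)    # draft token accepted
            else:
                out.append(tgt)  # rejected: keep the target's correction token
                all_accepted = False
            produced += 1
            if not all_accepted:
                break
        # 3. If every draft token matched, the target contributes one bonus token.
        if all_accepted and produced < max_new:
            out.append(target_next(out))
            produced += 1
    return out
```

The speedup comes from step 2: when the draft's guesses match, several output tokens are confirmed per target forward pass, which is why a well-matched small draft (like an E2B next to a 31B) can multiply tokens/s on predictable, formulaic tasks.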
Original article on Reddit