conduit

CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp

· 0 reactions · 0 comments · 0 views
CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp

CUDA prompt processing speedup on MoE check this

Original article
Reddit
Read full at Reddit →
Anonymous · no account needed
Share 𝕏 Facebook Reddit LinkedIn Email

Discussion

More from Reddit