Training LFM-2.5-350M on Reddit post summarization with GRPO on my 3x Mac Minis — final evals and t-test evals are here
With this project I wanted to see whether a tiny LLM can produce quality summaries under a tight length constraint (only 64 tokens) when trained with GRPO. I trained two variants of this task:

- using just a length penalty
- using a single quality reward (or a combination of quality rewards) plus the length penalty

I ran an LLM-as-a-judge eval to check summarization quality using DeepEval. The metrics were: Conciseness, Coverage, Clarity, and Faithfulness. The results are attached, and the final one is as follows: with quality (ROUGE-L +
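To make the setup concrete, here is a minimal sketch of what a combined reward for this kind of GRPO run could look like: a ROUGE-L F1 quality term minus a penalty for every token over the 64-token budget. The function names, the whitespace tokenization, and the penalty weight are all illustrative assumptions, not the author's actual reward code.

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming longest-common-subsequence length,
    # the core quantity behind ROUGE-L.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(summary: str, reference: str) -> float:
    # ROUGE-L F1 on whitespace tokens (a simplification; real ROUGE
    # implementations also apply stemming/normalization).
    s, r = summary.split(), reference.split()
    if not s or not r:
        return 0.0
    lcs = lcs_len(s, r)
    prec, rec = lcs / len(s), lcs / len(r)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)


def reward(summary: str, reference: str,
           max_tokens: int = 64, penalty_per_token: float = 0.05) -> float:
    # Quality term minus a linear penalty for exceeding the length
    # budget. Whitespace tokens stand in for the model's tokenizer
    # (an assumption for this sketch).
    overflow = max(0, len(summary.split()) - max_tokens)
    return rouge_l_f1(summary, reference) - penalty_per_token * overflow
```

A reward of this shape is what a GRPO trainer would score each sampled completion with before normalizing rewards within a group; the "length-penalty-only" variant corresponds to dropping the ROUGE-L term.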