Training LFM-2.5-350M on Reddit post summarization with GRPO on my 3x Mac Minis — final evals and t-test evals are here
With this project I wanted to see whether a tiny LLM can produce quality summaries under a tight length constraint (only 64 tokens) when trained with GRPO. I trained two variants of this task:

- using just a length penalty
- using a single quality reward (or a combination of quality rewards) plus the length penalty

I ran an LLM-as-a-judge eval to check summarization quality using DeepEval. The metrics were: Conciseness, Coverage, Clarity, and Faithfulness. The results are attached, and the final one is as follows: with quality (ROUGE-L +
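To make the setup concrete, here is a minimal sketch of what a combined reward for this kind of GRPO run could look like: a ROUGE-L F1 quality term minus a penalty for every token over the 64-token budget. The function names, the whitespace tokenization, and the penalty weight are all illustrative assumptions, not the author's actual reward code.

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming longest-common-subsequence length,
    # the core quantity behind ROUGE-L.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(summary: str, reference: str) -> float:
    # ROUGE-L F1 on whitespace tokens (a simplification; real ROUGE
    # implementations also apply stemming/normalization).
    s, r = summary.split(), reference.split()
    if not s or not r:
        return 0.0
    lcs = lcs_len(s, r)
    prec, rec = lcs / len(s), lcs / len(r)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)


def reward(summary: str, reference: str,
           max_tokens: int = 64, penalty_per_token: float = 0.05) -> float:
    # Quality term minus a linear penalty for exceeding the length
    # budget. Whitespace tokens stand in for the model's tokenizer
    # (an assumption for this sketch).
    overflow = max(0, len(summary.split()) - max_tokens)
    return rouge_l_f1(summary, reference) - penalty_per_token * overflow
```

A reward of this shape is what a GRPO trainer would score each sampled completion with before normalizing rewards within a group; the "length-penalty-only" variant corresponds to dropping the ROUGE-L term.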