conduit

Training LFM-2.5-350M on Reddit post summarization with GRPO on my 3x Mac Minis — final evals and t-test evals are here


With this project I wanted to see whether a tiny LLM can produce quality summaries under a tight length constraint (only 64 tokens) when trained with GRPO. I trained two variants on this task:

- length penalty only
- a quality reward combined with the length penalty

I then ran an LLM-as-a-judge evaluation of summarization quality using DeepEval, scoring four metrics: Conciseness, Coverage, Clarity, and Faithfulness. The results are attached, and the final one is as follows: with quality (ROUGE-L +
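As a rough illustration of how the second variant's reward could be shaped, here is a minimal sketch combining a ROUGE-L quality score with a linear penalty past the 64-token budget. The function names and the exact penalty shape are assumptions for illustration, not the post's actual training code:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over whitespace tokens."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

def summary_reward(candidate, reference, max_tokens=64):
    """Hypothetical combined reward: quality minus an overlength penalty."""
    quality = rouge_l_f1(candidate, reference)
    overshoot = max(0, len(candidate.split()) - max_tokens)
    return quality - overshoot / max_tokens  # overlong summaries lose reward
```

The length-only variant would correspond to dropping the `quality` term and keeping just the penalty.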
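The title also mentions t-test evals. A minimal sketch of Welch's t-test over per-example judge scores for the two variants, in pure stdlib Python (the score lists are made-up placeholders; in practice the resulting statistic would be compared against a t distribution with the returned degrees of freedom):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two score samples."""
    sa, sb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (len(a) - 1) + sb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-example "Coverage" scores for the two variants
length_only = [0.61, 0.58, 0.64, 0.55, 0.60]
quality_plus_length = [0.71, 0.68, 0.74, 0.66, 0.70]
t, df = welch_t(length_only, quality_plus_length)
```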

Original article: Reddit
