conduit

LabelSets — open quality standard for AI training data (LQS v3.1) [D]


Built a third-party quality rating system for ML datasets. Multi-oracle (7 scorers across 5 algorithm families), conformal prediction intervals on downstream F1, Ed25519-signed certs, and a contamination check against 40+ public evals (MMLU, HumanEval, GSM8K, MedQA, LegalBench, etc.).

Methodology paper, CC BY 4.0:
Free audit (paste any HF dataset URL):
Public verification API, no auth: GET /api/verify-lqs-cert/:hash

Calibration corpus is at ~1,000 datasets and growing toward 10,000 by Q3 2026 —
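The post doesn't say how the F1 intervals are built. As a rough illustration only, here is a minimal split-conformal sketch. It assumes the calibration corpus supplies (predicted F1, measured F1) pairs for held-out datasets. The function name and its inputs are hypothetical and are not the LQS API.

```python
import math
from typing import Sequence, Tuple

def split_conformal_interval(
    cal_predicted: Sequence[float],
    cal_observed: Sequence[float],
    new_predicted: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Split-conformal interval for a new dataset's downstream F1.

    Hypothetical helper: cal_predicted / cal_observed are assumed to be
    predicted vs. measured F1 on held-out calibration datasets.
    """
    if len(cal_predicted) != len(cal_observed):
        raise ValueError("calibration lists must have equal length")
    n = len(cal_predicted)
    if n == 0:
        raise ValueError("need at least one calibration point")
    # Nonconformity score: absolute residual on each calibration dataset.
    scores = sorted(abs(y - p) for p, y in zip(cal_predicted, cal_observed))
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is
        # unbounded, which for F1 means the whole [0, 1] range.
        return 0.0, 1.0
    q = scores[k - 1]
    # F1 lives in [0, 1], so clip the symmetric interval.
    return max(0.0, new_predicted - q), min(1.0, new_predicted + q)
```

With absolute-residual scores, the interval covers the true F1 with probability at least 1 - alpha, provided the new dataset is exchangeable with the calibration datasets. A larger calibration corpus makes the empirical quantile more stable and avoids the unbounded case at small alpha.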
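The contamination check is likewise not described. One common approach is word n-gram overlap between training rows and eval items; the sketch below flags any training row that shares at least one n-gram with an eval item. The helper name is hypothetical, and the default n=13 is an assumption (a length used in some published contamination analyses), not something the post states.

```python
import re
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]

def word_ngrams(text: str, n: int) -> Set[NGram]:
    """Lowercased word n-grams with punctuation stripped."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Texts shorter than n tokens yield no n-grams (empty range).
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def flag_contaminated(
    train_rows: Iterable[str], eval_items: Iterable[str], n: int = 13
) -> List[int]:
    """Return indices of training rows sharing any n-gram with an eval item.

    Hypothetical helper for illustration; n=13 is an assumed default.
    """
    # Build one set of n-grams across all eval items for O(1) lookups.
    eval_index: Set[NGram] = set()
    for item in eval_items:
        eval_index |= word_ngrams(item, n)
    return [i for i, row in enumerate(train_rows) if word_ngrams(row, n) & eval_index]
```

Smaller n catches more paraphrased leakage but also more false positives on common phrases, so the threshold is a tradeoff rather than a fixed rule.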

Original article: Reddit
