Overview
Local Model Lab is a sandbox for running local LLMs alongside retrieval pipelines and comparing their outputs against a benchmark set, all on-device.
Challenge
Local model workflows are easy to spin up but hard to trust. No shared harness works offline and answers the question "did this actually get better?"
Approach
- Fixed evaluation harness with deterministic seeds.
- One-click switching between local and cloud models.
- Full record of every run: inputs, outputs, and grading rubric.
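
As a rough illustration of the points above, here is a minimal Python sketch of a seeded harness that records each run. It is not the project's actual code: `RunRecord`, `run_benchmark`, `exact_match`, `to_jsonl`, and `toy_model` are hypothetical names, the exact-match rubric is an assumed placeholder, and `toy_model` stands in for whatever local or cloud model sits behind the same callable interface.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class RunRecord:
    """One graded benchmark case (hypothetical record layout)."""
    model: str
    seed: int
    prompt: str
    output: str
    rubric: str
    score: float


def exact_match(output: str, expected: str) -> float:
    """Assumed placeholder rubric: 1.0 on an exact match, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_benchmark(
    model_name: str,
    generate: Callable[[str, random.Random], str],
    cases: Sequence[Tuple[str, str]],
    seed: int,
) -> List[RunRecord]:
    """Run every (prompt, expected) case with a seed derived from `seed`."""
    records = []
    for index, (prompt, expected) in enumerate(cases):
        case_seed = seed + index          # fixed, reproducible per-case seed
        rng = random.Random(case_seed)    # isolated RNG, no global state
        output = generate(prompt, rng)
        records.append(
            RunRecord(
                model=model_name,
                seed=case_seed,
                prompt=prompt,
                output=output,
                rubric="exact_match",
                score=exact_match(output, expected),
            )
        )
    return records


def to_jsonl(records: Sequence[RunRecord]) -> str:
    """Serialize records as JSON Lines so runs can be stored and diffed."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model; only its randomness is seeded."""
    if prompt == "2+2?":
        return rng.choice(["4", "5"])
    return "unknown"
```

Because any model is just a callable taking a prompt and a seeded RNG, switching backends in this sketch means passing a different `generate` function, and two calls to `run_benchmark` with the same seed and cases produce identical records that can be compared line by line.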