We have finally completed BrainDrive Evaluator v1 and brought the full evaluation flow together in one place.
This version automates end-to-end evaluation across both open-source and closed-source models, using the WhyFinder → Ikigai Builder → Decision Helper journey as the test harness.
The focus this time was robustness: less “looks fine” scoring, more evidence-based judgment, with outputs that actually reflect what happened in the transcript.
BrainDrive Evaluator runs a full simulated coaching session, generates the Why and Ikigai profiles, then judges the model using 7 metrics:
Clarity, Structural Correctness, Consistency, Coverage, Hallucination Detection, Decision Expertise, and Sensitivity and Safety.
You also get detailed feedback: pros, cons, and pinpointed issues with exact quotes, plus per-scenario token tracking so we can compare cost vs. quality across models. A rough sketch of what a per-scenario result can look like is below.
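To make the output shape concrete, here's a minimal sketch of a per-scenario result. The field names and the 0–1 score scale are illustrative assumptions, not the actual schema in the repo:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a per-scenario result; field names and scales
# are illustrative and may not match the actual BrainDrive Evaluator schema.
@dataclass
class MetricScore:
    name: str            # e.g. "Hallucination Detection"
    score: float         # assumed normalized 0-1 scale
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)  # exact transcript quotes

@dataclass
class ScenarioResult:
    model: str
    metrics: list[MetricScore]
    prompt_tokens: int       # per-scenario token tracking
    completion_tokens: int

    def cost_vs_quality(self, usd_per_1k_tokens: float) -> tuple[float, float]:
        """Return (estimated cost, mean metric score) for a quick comparison."""
        total = self.prompt_tokens + self.completion_tokens
        cost = total / 1000 * usd_per_1k_tokens
        quality = sum(m.score for m in self.metrics) / len(self.metrics)
        return cost, quality
```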
In parallel, after my previous calls with @davewaring and @DJJones, I made a few modifications to how the judge compares the transcript against the generated profiles, for improved scoring stability and better separation between models.
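As a rough illustration of what evidence-grounded judging can mean in practice (a simplified sketch, not the exact logic in the repo), one simple check is to verify that every quote the judge cites actually appears in the transcript:

```python
def verify_evidence(transcript: str, quotes: list[str]) -> list[str]:
    """Return the quotes that do NOT appear verbatim in the transcript.

    A simple grounding check: if the judge cites evidence that cannot
    be found in the transcript, that citation (and the score it supports)
    should be discounted. Illustrative sketch only, not the actual
    BrainDrive Evaluator implementation.
    """
    # Normalize whitespace and case so minor formatting differences
    # don't cause false mismatches.
    normalized = " ".join(transcript.split()).lower()
    return [q for q in quotes if " ".join(q.split()).lower() not in normalized]

# Any quote returned here is unsupported and would argue for lowering
# the score it was cited to justify.
```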
Staging is live and feedback is welcome.
Repo: https://github.com/navaneethkrishnansuresh/BrainDriveEvaluator/
Release: v1.0.0