Hey All,
We just wrapped up a hands-on evaluation round with our HealthEval framework. Here’s a recap of what I covered in the video, along with our current top open-source picks for health-advice-focused models on Hugging Face:
Top Open Source Health Models – BrainDrive
Top 3 (with quick stats)
prithivMLmods/Qwen-UMLS-7B-Instruct · Hugging Face
Score 7.44 | 6-metric profile (Evidence & Transparency, Clinical Safety, Empathy, Clarity, Plan Quality, Trust & Agency Support) | 7B params | UMLS-aligned | EN
microsoft/Phi-3-mini-4k-instruct · Hugging Face
Score 7.43 | 6-metric profile | 3.8B params | MIT license | EN
m42-health/Llama3-Med42-8B · Hugging Face
Score 7.18 | 6-metric profile | 8B params | Llama 3 base | EN
How we ranked them (HealthEval by BrainDrive)
HealthEval is our evaluation workflow for AI-generated medical and health advice.
We score models on 6 clinically grounded metrics—Evidence & Transparency, Clinical Safety, Empathy, Clarity, Plan Quality, and Trust & Agency Support.
Each model's per-metric scores roll up into a weighted total, and models are ranked by that total.
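Here's a minimal sketch of how a weighted roll-up like this can work. It assumes each metric is scored on a 0 to 10 scale and uses hypothetical equal weights. The metric key names are our own labels, and the actual HealthEval weights and math are in the docs linked below.

```python
from typing import Mapping

# Metric keys mirror the six HealthEval metrics; the snake_case names are our own labels.
METRICS = (
    "evidence_transparency",
    "clinical_safety",
    "empathy",
    "clarity",
    "plan_quality",
    "trust_agency_support",
)

# HYPOTHETICAL equal weights for illustration only; the real weights are in the HealthEval DOCS.
EQUAL_WEIGHTS = {metric: 1.0 for metric in METRICS}


def weighted_total(scores: Mapping[str, float], weights: Mapping[str, float] = EQUAL_WEIGHTS) -> float:
    """Return the weight-normalized average of per-metric scores."""
    missing = [m for m in weights if m not in scores]
    if missing:
        raise ValueError(f"missing metric scores: {missing}")
    total_weight = sum(weights.values())
    return sum(weights[m] * scores[m] for m in weights) / total_weight


# Illustrative scores on an assumed 0-10 scale (not real HealthEval output).
example = {metric: 8.0 for metric in METRICS}
example["clinical_safety"] = 6.0
print(round(weighted_total(example), 2))  # 7.67
```

With unequal weights (for example, weighting Clinical Safety more heavily), the same function applies. Normalizing by the weight sum keeps the total on the same scale as the individual metrics.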
Docs (scoring & math):
https://github.com/BrainDriveAI/ModelMatch/tree/main/HealthEval/DOCS
Workflow we used
Model shortlist (20+ HF candidates) → Multi-domain health prompts (chronic care, prevention, patient guidance, treatment safety) → Responses per model → HealthEval scoring → Weighted aggregation → Ranking.
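The workflow above can be sketched as a small loop. This is an illustration under our own assumptions, not the toolkit's code. `generate` and `score` are hypothetical callables standing in for model inference and HealthEval scoring. We also assume each model's ranking score is the mean of its per-prompt weighted totals.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple


def run_eval(
    models: List[str],
    prompts: List[str],
    generate: Callable[[str, str], str],   # hypothetical: (model_id, prompt) -> response text
    score: Callable[[str, str], float],    # hypothetical: (prompt, response) -> weighted total
) -> List[Tuple[str, float]]:
    """Generate a response per (model, prompt), score it, average per model, and rank best-first."""
    results: Dict[str, float] = {}
    for model in models:
        per_prompt = [score(prompt, generate(model, prompt)) for prompt in prompts]
        results[model] = mean(per_prompt)
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


# Toy stubs with made-up scores, just to show the data flow end to end.
toy_scores = {"model-a": 6.0, "model-b": 8.0}
ranking = run_eval(
    models=["model-a", "model-b"],
    prompts=["chronic care prompt", "prevention prompt"],
    generate=lambda model, prompt: model,        # stub "response" is just the model id
    score=lambda prompt, response: toy_scores[response],
)
print(ranking)  # [('model-b', 8.0), ('model-a', 6.0)]
```

In practice, `generate` would call each shortlisted Hugging Face model and `score` would apply the six HealthEval metrics plus the weighted roll-up.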
Try it yourself
Code toolkit: https://github.com/BrainDriveAI/ModelMatch/tree/main/HealthEval
No-code evaluator: HealthEval - a Hugging Face Space by BrainDrive
About ModelMatch
ModelMatch helps you discover the most suitable open-source model for your domain and task—starting with summarization, expanding into therapy, email generation, finance evaluation, and now health evaluation.
If you test other models or get different results, ping us; happy to compare notes.
Regards,
Navaneeth