HealthBench
OpenAI Debuts HealthBench Dataset to Evaluate AI Models in Real-World Medical Scenarios
OpenAI; HealthBench; AI healthcare benchmark; large language models; medical AI evaluation; physician rubrics; realistic medical scenarios; AI safety; open-source dataset