Benchmark: PsychiatryBench. A multi-task benchmark for LLMs in psychiatry. Tasks: 11. Items: 5,188.
Benchmark: SalamahBench. Standardized safety evaluation for Arabic language models. Tasks: 12. Items: 8,170.