MEGA-Bench
Comprehensive multimodal evaluation benchmark covering 500+ real-world tasks across diverse skills, formats, and output types.
MEGA-Bench is a large-scale multimodal evaluation benchmark designed to assess the breadth of multimodal capabilities in VLMs. It covers over 500 real-world tasks spanning diverse visual skills, input modalities, and output formats — from recognition and reasoning to generation and grounding.
Key contributions:
- 505 tasks collected from real-world applications and user needs
- Diverse output formats: free-form text, structured, visual, and code outputs
- Fine-grained skill taxonomy enabling targeted capability analysis
- Evaluation of 20+ frontier VLMs with a public leaderboard
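Because tasks differ in output format, each task needs its own scoring rule, and overall results are typically macro-averaged so small tasks weigh as much as large ones. The sketch below illustrates that aggregation pattern with stdlib Python; the task names, metrics, and data are hypothetical examples, not MEGA-Bench's actual API or metric suite.

```python
# Hypothetical sketch: per-task metrics plus macro-averaging.
# Task names, metric choices, and examples are illustrative only.

def exact_match(pred: str, ref: str) -> float:
    """Strict metric for structured outputs: exact string equality."""
    return 1.0 if pred.strip() == ref.strip() else 0.0

def contains_answer(pred: str, ref: str) -> float:
    """Lenient metric for free-form text: reference appears in the prediction."""
    return 1.0 if ref.strip().lower() in pred.strip().lower() else 0.0

# Map each output format to a scoring function (assumed mapping).
METRICS = {"structured": exact_match, "free-form": contains_answer}

def score_task(examples, metric):
    """Mean per-example score for one task."""
    return sum(metric(pred, ref) for pred, ref in examples) / len(examples)

def macro_average(tasks):
    """Per-task scores and their unweighted mean across tasks."""
    per_task = {
        name: score_task(examples, METRICS[fmt])
        for name, (fmt, examples) in tasks.items()
    }
    return per_task, sum(per_task.values()) / len(per_task)

# Toy run: two tasks with different output formats.
tasks = {
    "chart_qa": ("free-form", [("The answer is 42.", "42"), ("seven", "8")]),
    "json_extraction": ("structured", [('{"a": 1}', '{"a": 1}')]),
}
per_task, overall = macro_average(tasks)
# chart_qa scores 0.5, json_extraction scores 1.0, overall is 0.75
```

Macro-averaging (rather than pooling all examples) keeps a leaderboard from being dominated by whichever tasks happen to have the most examples.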
Links: Paper · Leaderboard