MEGA-Bench

Comprehensive multimodal evaluation benchmark covering 500+ real-world tasks across diverse skills, formats, and output types.

MEGA-Bench is a large-scale multimodal evaluation benchmark designed to assess the breadth of multimodal capabilities in vision-language models (VLMs). It covers over 500 real-world tasks spanning diverse visual skills, input modalities, and output formats — from recognition and reasoning to generation and grounding.

Key contributions:

  • 505 tasks collected from real-world applications and user needs
  • Diverse output formats: free-form text, structured, visual, and code
  • Fine-grained skill taxonomy enabling targeted capability analysis
  • Evaluation of 20+ frontier VLMs with a public leaderboard
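The skill taxonomy is what makes targeted capability analysis possible: each task carries skill tags, so overall accuracy can be sliced into per-skill scores. A minimal sketch of that aggregation, using hypothetical task names and skill tags rather than the actual MEGA-Bench taxonomy:

```python
from collections import defaultdict

# Hypothetical per-task results: each task has one or more skill tags from
# the taxonomy and a score in [0, 1]. Names are illustrative only.
results = [
    {"task": "chart_qa", "skills": ["recognition", "reasoning"], "score": 0.62},
    {"task": "ocr_receipt", "skills": ["recognition"], "score": 0.81},
    {"task": "svg_generation", "skills": ["generation"], "score": 0.40},
    {"task": "object_grounding", "skills": ["grounding"], "score": 0.55},
]

def per_skill_scores(results):
    """Average task scores within each skill tag of the taxonomy."""
    buckets = defaultdict(list)
    for r in results:
        for skill in r["skills"]:
            buckets[skill].append(r["score"])
    return {skill: sum(s) / len(s) for skill, s in buckets.items()}

scores = per_skill_scores(results)
# A task with multiple tags (chart_qa) contributes to every skill it covers,
# so skill averages overlap rather than partitioning the task set.
```

This kind of breakdown is how a broad benchmark surfaces capability gaps that a single aggregate score would hide.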

Links: Paper · Leaderboard