MEGA-Bench

Comprehensive multimodal evaluation benchmark covering 500+ real-world tasks across diverse skills, formats, and output types.

MEGA-Bench is a large-scale multimodal evaluation benchmark designed to assess the breadth of multimodal capabilities in vision-language models (VLMs). It covers over 500 real-world tasks spanning diverse visual skills, input modalities, and output formats — from recognition and reasoning to generation and grounding.

Key contributions:

  • 505 tasks collected from real-world applications and user needs
  • Diverse output formats: free-form text, structured, visual, and code
  • Fine-grained skill taxonomy enabling targeted capability analysis
  • Evaluation of 20+ frontier VLMs with a public leaderboard
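The skill taxonomy is what makes targeted capability analysis possible: each task carries skill tags, so overall accuracy can be sliced into per-skill scores. A minimal sketch of that aggregation, using hypothetical task names and skill tags rather than the actual MEGA-Bench taxonomy:

```python
from collections import defaultdict

# Hypothetical per-task results: each task has one or more skill tags from
# the taxonomy and a score in [0, 1]. Names are illustrative only.
results = [
    {"task": "chart_qa", "skills": ["recognition", "reasoning"], "score": 0.62},
    {"task": "ocr_receipt", "skills": ["recognition"], "score": 0.81},
    {"task": "svg_generation", "skills": ["generation"], "score": 0.40},
    {"task": "object_grounding", "skills": ["grounding"], "score": 0.55},
]

def per_skill_scores(results):
    """Average task scores within each skill tag of the taxonomy."""
    buckets = defaultdict(list)
    for r in results:
        for skill in r["skills"]:
            buckets[skill].append(r["score"])
    return {skill: sum(s) / len(s) for skill, s in buckets.items()}

scores = per_skill_scores(results)
# A task with multiple tags (chart_qa) contributes to every skill it covers,
# so skill averages overlap rather than partitioning the task set.
```

This kind of breakdown is how a broad benchmark surfaces capability gaps that a single aggregate score would hide.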

Links: Paper · Leaderboard