work

VerlTool
Holistic RL training framework for tool-using language agents. Extends the verl framework with multi-turn rollout, tool execution, and reward integration for agentic tasks.

Mantis
Interleaved multi-image instruction tuning for multimodal LLMs, with new benchmarks for complex multi-image reasoning.

AceCoder
Reinforcement learning from execution feedback for competitive-level code generation. Achieves state-of-the-art on coding benchmarks with verified test-case rewards.

VideoScore
Building automatic metrics for video generation quality via fine-grained human feedback. Covers visual quality, motion smoothness, text alignment, and factual consistency.

LLM-Blender
Ensemble framework for LLMs using pairwise ranking and generative fusion. Consistently outperforms individual models by combining their diverse strengths.

MEGA-Bench
Comprehensive multimodal evaluation benchmark covering 500+ real-world tasks across diverse skills, formats, and output types.

fun