projects | Dongfu Jiang (姜东甫)

work

VerlTool

Holistic RL training framework for tool-using language agents. Extends the verl framework with multi-turn rollout, tool execution, and reward integration for agentic tasks.

Mantis

Interleaved multi-image instruction tuning for multimodal LLMs, with new benchmarks for complex multi-image reasoning.

AceCoder

Reinforcement learning from execution feedback for competition-level code generation. Achieves state-of-the-art results on coding benchmarks using rewards derived from verified test cases.
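To make "test-case rewards" concrete, here is a minimal sketch of one way an execution-based reward can be computed: the fraction of test cases a candidate solution passes. The function name `pass_rate_reward`, the `(args, expected)` test format, and the pass-rate definition itself are assumptions for illustration; the blurb above does not specify how AceCoder actually scores a rollout.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical test-case format: (positional arguments, expected return value).
TestCase = Tuple[Tuple[Any, ...], Any]


def pass_rate_reward(candidate: Callable[..., Any],
                     test_cases: Sequence[TestCase]) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0).

    Assumption: a test passes when the return value equals the expected
    value; any exception raised by the candidate counts as a failure.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply earns no credit for this test.
            pass
    return passed / len(test_cases)


# Toy usage: an "add" solution checked against three tests, one of which
# has a deliberately wrong expected value.
tests = [((1, 2), 3), ((0, 0), 0), ((2, 2), 5)]
reward = pass_rate_reward(lambda a, b: a + b, tests)
```

In a real training loop the candidate would be model-generated code run in a sandbox with time limits; this sketch leaves both out.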

VideoScore

Automatic metrics for video generation quality, built from fine-grained human feedback. Covers visual quality, motion smoothness, text alignment, and factual consistency.

LLM-Blender

Ensemble framework for LLMs using pairwise ranking and generative fusion. Consistently outperforms individual models by combining their diverse strengths.
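As a rough illustration of the pairwise-ranking half of this idea, the sketch below orders candidate outputs by how many head-to-head comparisons each one wins. The names `rank_by_pairwise_wins` and `prefer` are hypothetical, and the toy comparator stands in for a learned ranking model; it is not LLM-Blender's API.

```python
from itertools import combinations
from typing import Callable, List


def rank_by_pairwise_wins(candidates: List[str],
                          prefer: Callable[[str, str], bool]) -> List[str]:
    """Order candidates by number of pairwise wins, best first.

    `prefer(a, b)` is an assumed comparator returning True when `a` is
    judged better than `b`. Ties in win count keep the original order,
    because Python's sort is stable.
    """
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if prefer(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    order = sorted(range(len(candidates)), key=lambda k: wins[k], reverse=True)
    return [candidates[k] for k in order]


# Toy usage: "longer is better" is a placeholder for a real quality judge.
ranked = rank_by_pairwise_wins(["a", "ccc", "bb"], lambda x, y: len(x) > len(y))
```

In a full ensemble, the top few ranked candidates would then be passed to a generative model that fuses them into a single output; that step is omitted here.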

MEGA-Bench

Comprehensive multimodal evaluation benchmark covering 500+ real-world tasks across diverse skills, formats, and output types.