Dongfu Jiang (姜东甫)

University of Waterloo · Waterloo, Canada · Vector Institute

Ph.D. Student in Computer Science

AI/NLP researcher working on LLM/VLM post-training, multimodal evaluation, and agentic tool-use systems.

  • LLM/VLM Post-Training
  • Evaluation & Benchmarks
  • Agentic Systems

I am on the job market for 2026!

Please feel free to reach out if you find my background a good fit for your organization.

About

I am a Ph.D. student in Computer Science at the University of Waterloo. I am affiliated with TIGER-Lab and the Vector Institute, where I am advised by Prof. Wenhu Chen. I expect to graduate in June 2026. Before Waterloo, I received my B.E. in Computer Science from Zhejiang University, where I was advised by Prof. Zhou Zhao.

My recent research experience includes NVIDIA ADLR in Santa Clara, the Allen Institute for AI in Seattle, SeaAI in Singapore, and an earlier collaboration with the University of Southern California. My work has been recognized with an Outstanding Paper Award at TMLR 2025 for Mantis and a Best Paper Finalist / Oral at CVPR 2024 for MMMU. Across these roles, I have worked on post-training, evaluation, and agentic systems, and several of these projects were later adopted or cited by follow-up model, benchmark, and tooling efforts.

My research goal is to build multimodal language agents that can reason, use tools, and collaborate with humans in open-ended settings. More broadly, I am interested in turning capable foundation models into practical systems through stronger post-training methods, better benchmarks, and reusable research infrastructure. My recent research interests include LLM/VLM post-training, evaluation and benchmarks, and agentic systems.

I am actively looking for full-time positions in industry research or engineering. Feel free to reach out by email if my background looks relevant.

Recent News

Mar 20, 2026
🚀 Released Nemotron-Cascade 2, a compact open 30B MoE with Gold Medal-level performance on the IMO, IOI, and ICPC World Finals, while using 20x fewer parameters than frontier open models. Model and data are available on Hugging Face.
Mar 11, 2026
🚀 Released Nemotron 3 Super, an open and efficient hybrid Mamba-Transformer MoE built for strong agentic reasoning.
Feb 9, 2026
🚀 Released OpenResearcher, a fully open pipeline for synthesizing long-horizon deep research trajectories with open data, models, and demo.
Dec 27, 2025
🎉 QuickVideo was accepted to TMLR.
Dec 12, 2025
🎉 StructEval was accepted to TMLR, received the Journal-to-Conference Certificate, and will be presented at ICLR 2026.
Dec 1, 2025
🏆 Mantis received the TMLR 2025 Outstanding Paper Award.
Sep 26, 2025
🎬 Released VideoScore2, a multi-dimensional and interpretable evaluator for generative videos with detailed reasoning traces.
Sep 1, 2025
📄 Released the VerlTool technical report, highlighting a unified ARLT framework with async execution, multi-tool support, and competitive results across 6 domains.
Aug 18, 2025
💼 Started my internship at NVIDIA ADLR.
Jun 1, 2025
🚀 Released the VerlTool codebase, a unified framework for training tool-using language agents with async rollouts and modular tool APIs.
May 20, 2025
🚀 Released General-Reasoner, extending RL-style reasoning beyond math and code with verified web-scale data and a generative answer verifier.
Mar 8, 2025
Will join NVIDIA as a research intern in Santa Clara this summer.
Sep 19, 2024
VideoScore was accepted to EMNLP 2024.
Jun 24, 2024
Started my internship at AI2 today!
Jun 18, 2024
I arrived in Seattle for the CVPR 2024 conference.
May 10, 2024
🐯 TIGERScore is accepted to TMLR 2024!
Apr 14, 2024
We release Mantis, enhancing LMMs with Interleaved Multi-Image Instruction Tuning!
Apr 8, 2024
WildVision Arena has been accepted to the CVPR 2024 demo track and will be presented at the conference!
Apr 5, 2024
MMMU is accepted to CVPR 2024 as an oral presentation!
Feb 20, 2024
I am excited to announce that I will be joining the AI2 Mosaic team as a research intern this summer!
Feb 12, 2024
GenAI-Arena is now also online! You can test popular image generation and editing models there!
Dec 1, 2023
We release PairRM-0.4B 🤗 based on LLM-Blender!
Nov 29, 2023
We release the new benchmark MMMU for evaluating multimodal models!
Oct 4, 2023
Check out my first work at UW: 🐯 TIGERScore!
Sep 2, 2023
I arrived at the University of Waterloo and started my Ph.D. journey!
Jun 5, 2023
We release LLM-Blender! It’s accepted to ACL 2023!

Publications (*, + indicate equal contribution)

2026

  1. arXiv
    Zhuolin Yang, Zihan Liu, Yang Chen, Wenliang Dai, Boxin Wang, Sheng-Chieh Lin, Chankyu Lee, Yangyi Chen, and 9 more authors
    arXiv preprint, Mar 2026
    nemotron_cascade_2_overview.png
  2. arXiv
    Chi Ruan, Dongfu Jiang, Huaye Zeng, Ping Nie, and Wenhu Chen
    arXiv preprint, Mar 2026
    evolve_coder_overview.png
  3. Report
    Core Contributor
    Mar 2026
    Technical report, March 11, 2026
    super_v3_overview.jpeg
  4. Blog
    Zhuofeng Li*, Dongfu Jiang*, Xueguang Ma, Haoxiang Zhang, Ping Nie, Yuyu Zhang, Kai Zou, Jianwen Xie, and 2 more authors
    Feb 2026
    Blog post, February 9, 2026
    open_researcher_overview.png
  5. ICLR 2026
    Chi Ruan, Dongfu Jiang, Yubo Wang, and Wenhu Chen
    In The Fourteenth International Conference on Learning Representations, Feb 2026
    critique_coder.png

2025

  1. arXiv
    Xuan He*, Dongfu Jiang*, Ping Nie, Minghao Liu, Zhengxuan Jiang, Mingyi Su, Wentao Ma, Junru Lin, and 16 more authors
    arXiv preprint, Sep 2025
    videoscore2.png
  2. arXiv
    Dongfu Jiang*, Yi Lu*, Zhuofeng Li*, Zhiheng Lyu*, Ping Nie, Haozhe Wang, Alex Su, Hui Chen, and 4 more authors
    arXiv preprint, Feb 2025
    verltool.png
  3. TMLR 2025
    Jialin Yang*, Dongfu Jiang*, Lipeng He, Sherman Siu, Yuxuan Zhang, Disen Liao, Zhuofeng Li, Huaye Zeng, and 12 more authors
    Transactions on Machine Learning Research, May 2025
    Journal-to-Conference Certificate at TMLR 2025
    structeval_preview.png
  4. TMLR 2025
    Benjamin Schneider*, Dongfu Jiang*, Chao Du, Tianyu Pang, and Wenhu Chen
    Transactions on Machine Learning Research, May 2025
    quick_video_overview.png
  5. ACL 2025
    Huaye Zeng*, Dongfu Jiang*, Haozhe Wang, Ping Nie, Xiaotong Chen, and Wenhu Chen
    In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025
    acecoder.png
  6. ICLR 2025
    Jiacheng Chen*, Tianhao Liang*, Sherman Siu*, Zhengqing Wang, Kai Wang, Yubo Wang, Yuansheng Ni, Ziyan Jiang, and 8 more authors
    In The Thirteenth International Conference on Learning Representations, Jul 2025
    megabench_preview.png

2024

  1. EMNLP 2024
    Xuan He*, Dongfu Jiang*, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, and 11 more authors
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024
    videoscore.png
  2. NeurIPS 2024
    Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, and Bill Yuchen Lin
    In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Datasets and Benchmarks Track, Dec 2024
    wildvision.png
  3. NeurIPS 2024
    Dongfu Jiang*, Max Ku*, Tianle Li*, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, and Wenhu Chen
    In Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Datasets and Benchmarks Track, Dec 2024
    genai-arena.png
  4. TMLR 2024
    Dongfu Jiang, Xuan He, Huaye Zeng, Cong Wei, Max W.F. Ku, Qian Liu, and Wenhu Chen
    Transactions on Machine Learning Research, Dec 2024
    Outstanding Paper Award at TMLR 2025 (1 / 1539 selected)
    mantis_preview.png
  5. ACL 2024
    Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, and Wenhu Chen
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
    viescore.png
  6. CVPR 2024
    Xiang Yue*, Yuansheng Ni*, Kai Zhang*, Tianyu Zheng*, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, and 14 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2024
    Best Paper Finalist and Oral at CVPR 2024 (24 / 11,532 selected)
    mmmu_preview.png
  7. TMLR 2024
    Dongfu Jiang*, Yishan Li*, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, and Wenhu Chen
    Transactions on Machine Learning Research, May 2024
    tigerscore_preview.png

2023

  1. ACL 2023
    Dongfu Jiang, Xiang Ren, and Bill Yuchen Lin
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2023
    llm_blender_preview.png

Experience

Education

  • 2023 - 2026

    Ph.D. in Computer Science

    University of Waterloo, Waterloo, Canada

    • Advised by Prof. Wenhu Chen, TIGER-Lab
    • Affiliate of the Vector Institute for AI
  • 2019 - 2023

    B.E. in Computer Science

    Zhejiang University, Hangzhou, China

    • GPA 3.97 / 4.00
    • Advised by Prof. Zhou Zhao

Research Experience

  • Aug 2025 - Present

    Research Intern

    NVIDIA ADLR, Santa Clara, US

    • Agentic reinforcement learning for tool use
    • Contributing to post-training of the Nemotron family of models
  • Jun 2024 - Sep 2024

    Research Intern

    Allen Institute for AI, Seattle, US

    • Active learning with verbalized human feedback
  • Feb 2024 - Sep 2025

    Research Associate

    SeaAI, Singapore (remote)

    • Worked on interleaved multi-image instruction tuning for multimodal language models
  • Mar 2022 - Mar 2023

    Research Intern

    University of Southern California, US (remote)

    • Worked on methods for ensembling large language models with ranking and generation-based fusion

Impact

  • 2023 - 2026

    First / co-first works received broad online coverage, including MarkTechPost features on LLM-Blender, GenAI-Arena, OpenResearcher, and AceCoder.

  • 2024 - 2026

    MMMU has been cited by major multimodal model and benchmark works including Llama 3, LLaVA-OneVision, Cambrian-1, Kimi-VL, and Video-MME.

  • 2023 - 2026

    LLM-Blender has been cited by representative LLM systems and evaluation works including FrugalGPT, Prometheus 2, RewardBench, SimPO, and Mixture-of-Agents.

  • 2024 - 2026

    Mantis has been cited by multimodal follow-up works including LLaVA-OneVision, LLaVA-NeXT-Interleave, Molmo / PixMo, InternVL3, and MMMU-Pro.

  • 2025 - 2026

    VerlTool has been cited by later agentic RL and tool-use works including DeepAgent, SkyRL, and AgentFlow.

  • 2026

    OpenResearcher's data has been adopted by NVIDIA's Nemotron family of models.

  • 2025

    AceCoder's synthesized prompts were used in the coding RL data mixture of OLMo 3.

  • 2024 - 2026

    WildVision has been cited by follow-up multimodal evaluation and alignment works including LLaVA-Critic, InternVL3, InternVL3.5, and Mammoth-VL.