AceCoder | Dongfu Jiang (姜东甫)

AceCoder applies reinforcement learning with execution-based feedback to train code generation models. By using test-case pass rates as reward signals rather than human preference labels, AceCoder achieves strong performance on code generation benchmarks such as HumanEval, MBPP, and LiveCodeBench.

Key contributions:

Execution-based RL reward using test case pass/fail signals
Competitive performance on HumanEval, MBPP, and LiveCodeBench
Applicable to any instruction-tuned code LLM as a post-training stage
Open-source training code and model checkpoints
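
The execution-based reward listed above can be illustrated with a small sketch. This is not AceCoder's actual implementation: the function name `pass_rate_reward`, the assert-style test format, and the use of in-process `exec` are all assumptions made for illustration. A real training pipeline would likely run candidates in a sandboxed subprocess with timeouts.

```python
from typing import List


def pass_rate_reward(candidate_source: str, test_cases: List[str]) -> float:
    """Return the fraction of test cases that the candidate program passes.

    Hypothetical helper: each test case is a string of Python code
    (typically a single `assert`) run against the candidate's definitions.
    """
    if not test_cases:
        return 0.0

    passed = 0
    for test in test_cases:
        namespace: dict = {}  # fresh namespace so tests cannot affect each other
        try:
            exec(candidate_source, namespace)  # define the candidate's functions
            exec(test, namespace)              # run one assert-style test
            passed += 1
        except Exception:
            # Assertion failures, syntax errors, and runtime errors all count as a fail.
            pass
    return passed / len(test_cases)


# Example: a correct `add` function checked against two valid tests and one
# deliberately wrong test.
candidate = "def add(a, b):\n    return a + b\n"
tests = [
    "assert add(1, 2) == 3",
    "assert add(-1, 1) == 0",
    "assert add(2, 2) == 5",  # wrong expectation, so this test fails
]
reward = pass_rate_reward(candidate, tests)
```

Here `reward` is the scalar that an RL algorithm would try to maximize. Because it comes from running the code, no human preference label is needed. Note that this sketch has no timeout or isolation, so it should only be run on trusted inputs.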

Links: GitHub · Paper