AceCoder
Reinforcement learning from execution feedback for competitive-level code generation. Achieves state-of-the-art on coding benchmarks with verified test-case rewards.
AceCoder applies reinforcement learning with execution-based feedback to train code generation models. By using test-case pass rates as reward signals rather than human preference labels, AceCoder achieves strong performance on competitive programming benchmarks.
Key contributions:
- Execution-based RL reward using test case pass/fail signals
- Competitive performance on HumanEval, MBPP, and LiveCodeBench
- Applicable to any instruction-tuned code LLM as a post-training stage
- Open-source training code and model checkpoints