- On-policy distillation combines RL's error-correcting ability with SFT's reward density, improving training stability.
- The teacher model can serve as a process reward model, avoiding the "OOD shock" problem during rollout.
- The method outperforms traditional approaches on math reasoning and internal chat-assistant tasks.
Lilian Weng on X: "On-policy distillation provides an elegant way to use the teacher model as a process reward model to provide dense reward while preventing SFT style "OOD shock" during rollout."

Lilian Weng 
On-policy distillation provides an elegant way to use the teacher model as a process reward model to provide dense reward while preventing SFT style "OOD shock" during rollout.
Quoting Thinking Machines (@thinkymachines) · Oct 27, 2025:
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
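The core idea can be sketched in code: sample completions from the student (on-policy), then use the teacher's per-token log-probabilities as a dense reward, penalizing the per-token reverse KL between student and teacher. This is a minimal illustrative sketch, not the authors' implementation; the function name and toy shapes are assumptions.

```python
# Hypothetical sketch of an on-policy distillation objective:
# the teacher acts as a process reward model, scoring every token
# the student itself sampled (so there is no SFT-style OOD shock).
import torch
import torch.nn.functional as F

def per_token_reverse_kl(student_logits, teacher_logits, sampled_tokens):
    """Dense per-token penalty: log p_student(t) - log p_teacher(t),
    evaluated on tokens the *student* sampled (on-policy)."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    s_tok = s_logp.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
    t_tok = t_logp.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
    return s_tok - t_tok  # one dense reward signal per token

# Toy usage with random logits: (batch=2, seq=4, vocab=10)
student = torch.randn(2, 4, 10)
teacher = torch.randn(2, 4, 10)
tokens = torch.distributions.Categorical(logits=student).sample()  # student rollout
loss = per_token_reverse_kl(student, teacher, tokens).mean()
```

Because the rollout comes from the student's own distribution, the teacher grades states the student actually visits, which is what gives the "process reward" flavor described in the post.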

Relevant people
- Lilian Weng (@lilianweng): Co-founder of Thinking Machines Lab @thinkymachines; Ex-VP, AI Safety & robotics, applied research @OpenAI; Author of Lil'Log
- Thinking Machines (@thinkymachines): Thinking, beeping, and booping. @tinkerapi