wiki/concepts/reward_model.md