Popular articles tagged "#人間のフィードバック" (Human Feedback) | note: Create, Connect, Deliver.

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

8 months ago

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

9 months ago

Constitutional AI: Harmlessness from AI Feedback

8 months ago