Popular articles

The Reward Function Trap and the Cleverness of AI: The Essence of Reward Hacking

1 month ago

How to Make a Robot Walk in Simulation

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

9 months ago

Deep reinforcement learning from human preferences

8 months ago

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

9 months ago

RLIF: Interactive Imitation Learning as Reinforcement Learning

9 months ago

Large Language Models Open New Way of AI-Assisted Molecule Design for Chemists

9 months ago

From r to Q∗: Your Language Model is Secretly a Q-Function

9 months ago