Popular articles tagged "#プリファレンスファインチューニング" (preference fine-tuning) | note: Create, Connect, Deliver.

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

9 months ago