Popular articles tagged "#ビジョンランゲージモデル" (vision-language model) | note: Create, Connect, Deliver.

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding

10 months ago

Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning

1 year ago