Popular Articles

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding

6 months ago

Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning

8 months ago