Popular articles

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

5 months ago

MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding

5 months ago

C3LLM: Conditional Multimodal Content Generation Using Large Language Models

5 months ago

TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment

5 months ago

Agent AI: Surveying the Horizons of Multimodal Interaction

6 months ago

Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning

9 months ago

FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild

10 months ago