Popular articles

SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities

9 months ago

[Paper Summary: Autonomous Driving] ROAD-Waymo: Action Awareness at Scale for Autonomous Driving

1 month ago

SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory

7 months ago

Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset

8 months ago

Building a Large Japanese Web Corpus for Large Language Models

8 months ago

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

K-QA: A Real-World Medical Q&A Benchmark