Popular Articles

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

7 months ago

Scenarios and Approaches for Situated Natural Language Explanations

7 months ago

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

8 months ago

SimPO: Simple Preference Optimization with a Reference-Free Reward

8 months ago

Hallucination of Multimodal Large Language Models: A Survey

8 months ago

Instruction-Following Evaluation for Large Language Models

9 months ago

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities