Task Type | Dataset Name | Original Data Quantity | Leaderboard Sample Quantity |
---|---|---|---|
QA | nq | 3,610 | 1,000 |
QA | TriviaQA | 11,313 | 1,000 |
QA | popqa | 14,267 | 1,000 |
QA | AmbigQA | 2,002 | 1,000 |
QA | MarcoQA | 101,093; 55,636 after filtering out no-answer items | 1,000 (drawn from the filtered set) |
QA | WebQuestions | 2,032 | 1,000 |
Multi-hop QA | hotpotqa | 7,405 | 1,000 |
Multi-hop QA | 2WikiMultiHopQA | 12,576 | 1,000 |
Multi-hop QA | Musique | 2,417 | 1,000 |
Multi-hop QA | bamboogle | 125 | 125 (unprocessed) |
Multi-hop QA | strategy-qa | 2,290 | 1,000 |
Multiple-choice | ARC | 3,548 (options are uppercase letters A-E; option E appears in only 1 item) | 1,000 |
Multiple-choice | mmlu | 14,042 (options are uppercase letters A-D) | 1,000 |
Long-form QA | ASQA | 948 | 948 (unprocessed) |
Fact verification | FEVER | 13,332 (only the support and refute labels retained) | 1,000 |
Dialogue | WoW | 3,054 | 1,000 |
Slot filling | T-REx | 5,000 | 1,000 |

Corpus Name | Number of Documents |
---|---|
wiki2018 | 21,015,324 |
wiki2024 | Coming soon |