We have curated and preprocessed the most commonly used public benchmark datasets and corpora in current RAG research.
All datasets have been released on ModelScope and Hugging Face.
Users can download and use them directly, without additional cleaning or conversion; they are fully compatible with UltraRAG’s evaluation pipelines.

Benchmark

The following table summarizes the supported task types and their corresponding dataset statistics:
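| Task Type | Dataset Name | Original Data Volume | Evaluation Samples |
| --- | --- | --- | --- |
| QA | NQ | 3,610 | 1,000 |
| QA | TriviaQA | 11,313 | 1,000 |
| QA | PopQA | 14,267 | 1,000 |
| QA | AmbigQA | 2,002 | 1,000 |
| QA | MarcoQA | 55,636 | 1,000 |
| QA | WebQuestions | 2,032 | 1,000 |
| VQA | MP-DocVQA | 591 | 591 |
| VQA | ChartQA | 63 | 63 |
| VQA | InfoVQA | 718 | 718 |
| VQA | PlotQA | 863 | 863 |
| Multi-hop QA | HotpotQA | 7,405 | 1,000 |
| Multi-hop QA | 2WikiMultiHopQA | 12,576 | 1,000 |
| Multi-hop QA | Musique | 2,417 | 1,000 |
| Multi-hop QA | Bamboogle | 125 | 125 |
| Multi-hop QA | StrategyQA | 2,290 | 1,000 |
| Multi-hop VQA | SlideVQA | 556 | 556 |
| Multiple-choice | ARC | 3,548 | 1,000 |
| Multiple-choice | MMLU | 14,042 | 1,000 |
| Multiple-choice VQA | ArXivQA | 816 | 816 |
| Long-form QA | ASQA | 948 | 948 |
| Fact Verification | FEVER | 13,332 | 1,000 |
| Dialogue | WoW | 3,054 | 1,000 |
| Slot-filling | T-REx | 5,000 | 1,000 |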
Data Format Specification

To ensure full compatibility with UltraRAG modules, users are advised to store test data uniformly in .jsonl format following the specifications below.

Non-multiple-choice data format:
{
  "id": 0, 
  "question": "where does the karate kid 2010 take place", 
  "golden_answers": ["China", "Beijing", "Beijing, China"], 
  "meta_data": {} 
}
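As a minimal sketch of how such a file might be checked before evaluation, the Python snippet below parses one .jsonl line at a time and verifies the four fields shown in the example. Treating all four fields as required is an assumption drawn from the example, and the helper names `parse_benchmark_line` and `load_benchmark` are hypothetical, not part of UltraRAG.

```python
import json

# Fields that appear in the example record above. Treating every one of them
# as mandatory is an assumption, not a documented UltraRAG rule.
REQUIRED_FIELDS = ("id", "question", "golden_answers", "meta_data")


def parse_benchmark_line(line: str) -> dict:
    """Parse one .jsonl line and check that the expected fields are present."""
    record = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    if not isinstance(record["golden_answers"], list):
        raise ValueError("golden_answers must be a list")
    return record


def load_benchmark(path: str) -> list:
    """Read a .jsonl file into a list of records, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_benchmark_line(line) for line in f if line.strip()]
```

Calling `load_benchmark` on a downloaded file (the path is whatever you saved it under) raises a `ValueError` at the first record that does not match the layout.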
Multiple-choice data format:
{
  "id": 0, 
  "question": "Mast Co. converted from the FIFO method for inventory valuation to the LIFO method for financial statement and tax purposes. During a period of inflation would Mast's ending inventory and income tax payable using LIFO be higher or lower than FIFO? Ending inventory Income tax payable", 
  "golden_answers": ["A"], 
  "choices": ["Lower Lower", "Higher Higher", "Lower Higher", "Higher Lower"], 
  "meta_data": {"subject": "professional_accounting"}
}
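In the example above, the answer "A" appears to refer to the first entry of `choices`. The sketch below resolves answer letters to their choice text under that assumption (A maps to index 0, B to index 1, and so on). The function name `answer_texts` is hypothetical, and the letter-to-index mapping is inferred from the example rather than stated by the specification.

```python
def answer_texts(record: dict) -> list:
    """Map letter answers such as "A" to the matching entries in `choices`.

    Assumes "A" refers to choices[0], "B" to choices[1], and so on.
    """
    choices = record["choices"]
    texts = []
    for answer in record["golden_answers"]:
        letter = answer.strip().upper()
        if len(letter) != 1 or not "A" <= letter <= "Z":
            raise ValueError(f"not a single answer letter: {answer!r}")
        index = ord(letter) - ord("A")
        if index >= len(choices):
            raise ValueError(f"answer {letter} has no matching choice")
        texts.append(choices[index])
    return texts
```

A check like this can catch records whose answer letter points past the end of the `choices` list.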

Corpus

UltraRAG provides multi-source, high-quality standardized corpora covering both text and image modalities, facilitating the construction of diverse RAG systems.
The following table summarizes the current corpus statistics:
| Corpus Name | Number of Documents |
| --- | --- |
| Wiki-2018 | 21,015,324 |
| Wiki-2024 | 30,463,973 |
| MP-DocVQA | 741 |
| ChartQA | 500 |
| InfoVQA | 459 |
| PlotQA | 9,593 |
| SlideVQA | 1,284 |
| ArXivQA | 8,066 |
Data Format Specification

Text corpus format:
{
  "id": "15106858", 
  "contents": "Arrowhead Stadium 1970s practice would eventually spread to the other NFL stadiums as the 1970s progressed, finally becoming mandatory league-wide in the 1978 season (after being used in Super Bowl XII), and become almost near-universal at the lower levels of football. On January 20, 1974, Arrowhead Stadium hosted the Pro Bowl. Due to an ice storm and brutally cold temperatures the week leading up to the game, the game's participants worked out at the facilities of the San Diego Chargers. On game day, the temperature soared to 41 F, melting most of the ice and snow that accumulated during the week. The AFC defeated the NFC, 15–13."
}
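Because the Wiki corpora contain tens of millions of documents, it may be preferable to stream a text corpus rather than load it all at once. The following sketch yields one document at a time from any iterable of lines and checks for the `id` and `contents` fields shown above. The name `iter_corpus` is hypothetical and this is not UltraRAG's own loader.

```python
import json
from typing import Iterable, Iterator


def iter_corpus(lines: Iterable[str]) -> Iterator[dict]:
    """Yield corpus documents one at a time from an iterable of .jsonl lines."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        doc = json.loads(line)
        if "id" not in doc or "contents" not in doc:
            raise ValueError(f"line {line_number}: expected 'id' and 'contents'")
        yield doc
```

An open file object can be passed directly, for example `for doc in iter_corpus(open(path, encoding="utf-8"))`, so only one line is held in memory at a time. Note that `id` is a string in the text corpus example, whereas the benchmark examples use integers.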
Image corpus format:
{
  "id": 0, 
  "image_id": "37313.jpeg", 
  "image_path": "image/37313.jpg"
}
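Before building an index over an image corpus, it can help to confirm that every `image_path` points to a file that exists. The sketch below does this under the assumption that `image_path` is relative to a directory you supply; the source does not state what the path is relative to, and `find_missing_images` is a hypothetical helper.

```python
import json
from pathlib import Path


def find_missing_images(corpus_file: str, image_root: str) -> list:
    """Return the image_path values that do not resolve to a file.

    Assumes each image_path is relative to image_root.
    """
    root = Path(image_root)
    missing = []
    with open(corpus_file, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if not (root / record["image_path"]).is_file():
                missing.append(record["image_path"])
    return missing
```

Only `image_path` is checked; `image_id` is left untouched, since the example shows the two fields can carry different file extensions.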