Function

Benchmark Server loads benchmark datasets. It is typically used in the data configuration phase of benchmark testing, question answering tasks, or generation tasks.
We strongly recommend preprocessing the data into .jsonl format.
Sample data:
data/sample_asqa_5.jsonl
{"idx": 0, "question": "Where does it rain the most in texas?", "golden_answers": ["Piney Woods", "eastern region of Texas", "the far east"]}
{"idx": 1, "question": "Who won the us open golf in 2017?", "golden_answers": ["Brooks Koepka", "United States", "Park Sung-hyun", "South Korea"]}
{"idx": 2, "question": "Where does the term cupboard love come from?", "golden_answers": ["Sigmund Freud, Anna Freud, Melanie Klein and Mary Ainsworth", "1950s and 1960s", "psychoanalysis"]}
{"idx": 3, "question": "Who took control of the italian government in 1922?", "golden_answers": ["National Fascist Party", "Benito Mussolini", "Italian fascists", "National Fascist Party", "PNF", "Partito Nazionale Fascista", "Benito Mussolini", "Benito Amilcare Andrea Mussolini"]}
{"idx": 4, "question": "What's the percentage of canadian hockey players in the nhl?", "golden_answers": ["75", "slightly less than 50"]}

Parameter Description

Below is the configuration file servers/benchmark/parameter.yaml:
servers/benchmark/parameter.yaml
benchmark:
  name: asqa
  path: data/sample_asqa_5.jsonl
  key_map:
    q_ls: question
    gt_ls: golden_answers
  shuffle: false
  seed: 42
  limit: 2
  • name: Dataset name, used for logging, debugging, or identifying the currently loaded dataset in the system.
  • path: Path to the data file; this is the input that the get_data utility reads.
  • key_map: Field mapping table that specifies which fields to extract from each sample and the alias assigned to each.
    • For example, q_ls: question maps the original field question to the alias q_ls.
    • To extract extra fields, add entries here, such as p_ls: retrieved_passage.
  • shuffle: Whether to enable random sampling.
  • seed: Random seed, used to make random sampling reproducible.
  • limit: Number of samples to load; -1 means loading all data.
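How these parameters might interact can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's actual implementation: `load_benchmark` is a hypothetical function, and the order of operations (seeded shuffle first, then limit, then field mapping) is assumed rather than confirmed by the configuration file.

```python
import json
import random
import tempfile
from pathlib import Path


def load_benchmark(path, key_map, shuffle=False, seed=42, limit=-1):
    """Hypothetical loader: read JSONL, optionally shuffle, truncate, remap fields."""
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    if shuffle:
        # Assumption: shuffling uses the seed and happens before the limit.
        random.Random(seed).shuffle(rows)
    if limit >= 0:
        # limit = -1 keeps every row; any non-negative value truncates.
        rows = rows[:limit]
    # One list per alias, e.g. q_ls collects every row's "question" value.
    return {alias: [row[field] for row in rows] for alias, field in key_map.items()}


# Three records copied from the sample data.
samples = [
    {"idx": 0, "question": "Where does it rain the most in texas?",
     "golden_answers": ["Piney Woods", "eastern region of Texas", "the far east"]},
    {"idx": 1, "question": "Who won the us open golf in 2017?",
     "golden_answers": ["Brooks Koepka", "United States", "Park Sung-hyun", "South Korea"]},
    {"idx": 4, "question": "What's the percentage of canadian hockey players in the nhl?",
     "golden_answers": ["75", "slightly less than 50"]},
]
key_map = {"q_ls": "question", "gt_ls": "golden_answers"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.jsonl"
    path.write_text("\n".join(json.dumps(s) for s in samples) + "\n", encoding="utf-8")
    # Mirrors the configuration above: shuffle disabled, limit of 2.
    data = load_benchmark(path, key_map, shuffle=False, seed=42, limit=2)
    # With shuffle enabled, the same seed yields the same order on every call.
    first = load_benchmark(path, key_map, shuffle=True, seed=42)
    second = load_benchmark(path, key_map, shuffle=True, seed=42)

print(data["q_ls"])
```

With shuffle disabled and limit set to 2, the result holds the first two samples in file order, keyed by the aliases q_ls and gt_ls.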

Tool Functions

  • get_data: Loads and parses the data during the preprocessing phase and extracts key fields (such as questions, answers, and retrieved passages) for downstream modules.