Purpose

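Benchmark Server loads evaluation datasets. It is typically used in the data configuration stage of benchmarking, question answering, or generation tasks.
We strongly recommend preprocessing your data into .jsonl format, with one JSON object per line.
Sample data: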
data/sample_asqa_5.jsonl
{"idx": 0, "question": "Where does it rain the most in texas?", "golden_answers": ["Piney Woods", "eastern region of Texas", "the far east"]}
{"idx": 1, "question": "Who won the us open golf in 2017?", "golden_answers": ["Brooks Koepka", "United States", "Park Sung-hyun", "South Korea"]}
{"idx": 2, "question": "Where does the term cupboard love come from?", "golden_answers": ["Sigmund Freud, Anna Freud, Melanie Klein and Mary Ainsworth", "1950s and 1960s", "psychoanalysis"]}
{"idx": 3, "question": "Who took control of the italian government in 1922?", "golden_answers": ["National Fascist Party", "Benito Mussolini", "Italian fascists", "National Fascist Party", "PNF", "Partito Nazionale Fascista", "Benito Mussolini", "Benito Amilcare Andrea Mussolini"]}
{"idx": 4, "question": "What's the percentage of canadian hockey players in the nhl?", "golden_answers": ["75", "slightly less than 50"]}
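
As a rough illustration of the recommended preprocessing, the sketch below writes records as .jsonl text and parses them back, reporting the line number of any malformed entry. The helper names `to_jsonl` and `parse_jsonl` are hypothetical and are not part of Benchmark Server; only the field names come from the sample above.

```python
import json


def to_jsonl(records):
    """Serialize each record as one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def parse_jsonl(text):
    """Parse .jsonl text, skipping blank lines and naming the first bad line."""
    records = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
    return records


# Two records shaped like the sample data above.
records = [
    {"idx": 0, "question": "Where does it rain the most in texas?",
     "golden_answers": ["Piney Woods", "eastern region of Texas", "the far east"]},
    {"idx": 4, "question": "What's the percentage of canadian hockey players in the nhl?",
     "golden_answers": ["75", "slightly less than 50"]},
]

text = to_jsonl(records)
roundtrip = parse_jsonl(text)
print(len(roundtrip), "records parsed")
```

Validating a file this way before pointing `path` at it can surface formatting problems early, for example a multi-line JSON object that was pretty-printed instead of written on a single line.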

Parameters

The configuration file servers/benchmark/parameter.yaml is shown below:
servers/benchmark/parameter.yaml
benchmark:
  name: asqa
  path: data/sample_asqa_5.jsonl
  key_map:
    q_ls: question
    gt_ls: golden_answers
  shuffle: false
  seed: 42
  limit: 2
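  • name: the dataset name, used in logs, during debugging, or to identify the currently loaded dataset within the system.
  • path: the path to the data file, which serves as the read entry point for the get_data tool.
  • key_map: the field mapping table. It specifies which fields to extract from each sample and the alias assigned to each.
    • For example, q_ls: question maps the original field question to the alias q_ls.
    • To extract additional fields, extend the map here, e.g. p_ls: retrieved_passage.
  • shuffle: whether to enable random sampling of the data.
  • seed: the random seed.
  • limit: the number of samples to load; -1 loads all data.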
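
To make the interaction of these parameters concrete, here is a minimal sketch of a loader that honors path, key_map, shuffle, seed, and limit. It is not the actual get_data implementation: the function name `load_benchmark`, the returned shape (one list per alias), and the choice to shuffle before applying limit are all assumptions made for illustration.

```python
import json
import random
import tempfile
from pathlib import Path


def load_benchmark(path, key_map, shuffle=False, seed=42, limit=-1):
    """Hypothetical loader mirroring the parameters above (not the real get_data)."""
    with Path(path).open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    # Assumption: shuffling happens before truncation, so limit draws a random subset.
    if shuffle:
        random.Random(seed).shuffle(records)

    # limit = -1 keeps everything; any non-negative value truncates.
    if limit >= 0:
        records = records[:limit]

    # key_map is alias -> original field name, e.g. {"q_ls": "question"}.
    return {alias: [rec[field] for rec in records] for alias, field in key_map.items()}


# Demo on a temporary file holding three records shaped like the sample data.
samples = [
    {"idx": 0, "question": "Where does it rain the most in texas?",
     "golden_answers": ["Piney Woods"]},
    {"idx": 1, "question": "Who won the us open golf in 2017?",
     "golden_answers": ["Brooks Koepka"]},
    {"idx": 2, "question": "Where does the term cupboard love come from?",
     "golden_answers": ["psychoanalysis"]},
]

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "sample.jsonl"
    data_path.write_text("".join(json.dumps(s) + "\n" for s in samples), encoding="utf-8")
    data = load_benchmark(
        data_path,
        key_map={"q_ls": "question", "gt_ls": "golden_answers"},
        shuffle=False,
        seed=42,
        limit=2,
    )

print(data["q_ls"])
```

With shuffle disabled and limit set to 2, as in the configuration above, this sketch returns the first two samples in file order. Using a dedicated `random.Random(seed)` instance keeps the shuffle reproducible without touching the global random state.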

Tool functions

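  • get_data: loads and parses the data during the data preprocessing stage, extracting the key fields (such as questions, answers, and retrieved passages) for use by downstream modules.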