Benchmark Server

Purpose

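Benchmark Server loads evaluation datasets. It is typically used in the data-configuration stage of benchmarking, question-answering, or generation tasks.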

We strongly recommend preprocessing your data into .jsonl format (JSON Lines: one JSON object per line).
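Because JSONL stores one JSON object per line, each sample can be parsed independently. A minimal sketch of the preprocessing step, assuming the raw data is already a list of Python dicts; the helper name `to_jsonl` is hypothetical and not part of Benchmark Server:

```python
import json
from typing import Iterable


def to_jsonl(records: Iterable[dict], path: str) -> int:
    """Write each record as one JSON object per line; return the number of lines written.

    Hypothetical helper for illustration, not part of Benchmark Server.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for idx, record in enumerate(records):
            # Add a running "idx" like the sample file; an existing "idx" in the record wins.
            row = {"idx": idx, **record}
            # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the file.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `to_jsonl([{"question": "...", "golden_answers": ["..."]}], "my_dataset.jsonl")` produces a file in the same shape as the sample below.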

Sample data:

data/sample_asqa_5.jsonl

{"idx": 0, "question": "Where does it rain the most in texas?", "golden_answers": ["Piney Woods", "eastern region of Texas", "the far east"]}
{"idx": 1, "question": "Who won the us open golf in 2017?", "golden_answers": ["Brooks Koepka", "United States", "Park Sung-hyun", "South Korea"]}
{"idx": 2, "question": "Where does the term cupboard love come from?", "golden_answers": ["Sigmund Freud, Anna Freud, Melanie Klein and Mary Ainsworth", "1950s and 1960s", "psychoanalysis"]}
{"idx": 3, "question": "Who took control of the italian government in 1922?", "golden_answers": ["National Fascist Party", "Benito Mussolini", "Italian fascists", "National Fascist Party", "PNF", "Partito Nazionale Fascista", "Benito Mussolini", "Benito Amilcare Andrea Mussolini"]}
{"idx": 4, "question": "What's the percentage of canadian hockey players in the nhl?", "golden_answers": ["75", "slightly less than 50"]}
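Before pointing path at a new file, it can help to check that every line parses and carries the fields that key_map will reference. A minimal sketch, assuming the field names used in the sample file (question, golden_answers); `check_jsonl` is a hypothetical helper, not part of Benchmark Server:

```python
import json


def check_jsonl(lines, required=("question", "golden_answers")):
    """Return a list of (line_number, problem) pairs; an empty list means every line is valid."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        for key in required:
            if key not in obj:
                problems.append((lineno, f"missing field {key!r}"))
        # The sample file stores answers as a list of acceptable strings.
        if "golden_answers" in obj and not isinstance(obj["golden_answers"], list):
            problems.append((lineno, "golden_answers is not a list"))
    return problems
```

Passing an open file handle works directly, since iterating a text file yields one line at a time.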

Parameters

The configuration file servers/benchmark/parameter.yaml is shown below:

servers/benchmark/parameter.yaml

benchmark:
  name: asqa
  path: data/sample_asqa_5.jsonl
  key_map:
    q_ls: question
    gt_ls: golden_answers
  shuffle: false
  seed: 42
  limit: 2

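name: Dataset name, used in logs, for debugging, and to identify the currently loaded dataset within the system.
path: Path to the data file; the get_data tool reads from this location.
key_map: Field mapping that specifies which fields to extract from each sample and the alias assigned to each.
- For example, q_ls: question maps the original field question to q_ls.
- To extract additional fields, extend the mapping here, e.g. p_ls: retrieved_passage.
shuffle: Whether to enable random sampling.
seed: Random seed used for sampling.
limit: Number of samples to load; -1 loads the entire dataset.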
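The interplay of shuffle, seed, and limit can be sketched as follows. This is an assumption about the order of operations (shuffle first, then truncate to limit), not a description of the actual implementation; `select_samples` is a hypothetical name:

```python
import random


def select_samples(samples, shuffle=False, seed=42, limit=-1):
    """Return the subset of samples described by shuffle/seed/limit.

    Assumed semantics: shuffling happens before truncation, and a negative
    limit (the documented value is -1) keeps every sample.
    """
    selected = list(samples)  # copy so the caller's list is not reordered
    if shuffle:
        # A dedicated Random instance makes the order reproducible for a given seed
        # without touching the global random state.
        random.Random(seed).shuffle(selected)
    if limit >= 0:
        selected = selected[:limit]
    return selected
```

Under these assumptions, the sample configuration (shuffle: false, limit: 2) keeps the first two records, idx 0 and idx 1.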

Tool functions

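get_data: Loads and parses the data during preprocessing, extracting key fields (such as questions, answers, and retrieved passages) for use by downstream modules.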
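How key_map turns per-sample fields into aliased lists can be sketched in a few lines. This is a hypothetical stand-in: the name `get_data_sketch`, its signature, and the dict-of-lists return shape are assumptions, and the real get_data tool may differ:

```python
import json


def get_data_sketch(path, key_map, limit=-1):
    """Load a JSONL file and return {alias: [value per sample]} for each key_map entry.

    Hypothetical sketch; the real get_data tool may differ in signature and return shape.
    """
    with open(path, encoding="utf-8") as f:
        samples = [json.loads(line) for line in f if line.strip()]
    if limit >= 0:
        samples = samples[:limit]  # a negative limit such as -1 keeps everything
    # key_map maps alias -> original field, e.g. {"q_ls": "question"}.
    return {alias: [s[field] for s in samples] for alias, field in key_map.items()}
```

With the sample configuration, `get_data_sketch("data/sample_asqa_5.jsonl", {"q_ls": "question", "gt_ls": "golden_answers"}, limit=2)` would return the first two questions under q_ls and their answer lists under gt_ls.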

