Benchmark

`get_data`

Signature

@app.tool(output="benchmark->q_ls,gt_ls")
def get_data(benchmark: Dict[str, Any]) -> Dict[str, List[Any]]

Function

Loads evaluation samples from a local file in `.jsonl`, `.json`, or `.parquet` format.
Maps the dataset's original fields to the standardized output keys (e.g., `q_ls`, `gt_ls`) according to `key_map`.
Optionally shuffles the samples (`shuffle`) and caps the number of samples returned (`limit`).
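
The behavior described above can be approximated with the following minimal sketch. It is not the tool's actual implementation: `load_rows` and `map_samples` are hypothetical helper names, the `.json` branch assumes a top-level list of records, and applying the shuffle before the limit is an assumption the description does not confirm.

```python
import json
import random
from pathlib import Path
from typing import Any, Dict, List


def load_rows(path: str) -> List[Dict[str, Any]]:
    """Read records from a .jsonl, .json, or .parquet file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".jsonl":
        # One JSON object per non-empty line.
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".json":
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: the file holds a top-level list of records.
        if not isinstance(data, list):
            raise ValueError("expected a top-level JSON list of records")
        return data
    if suffix == ".parquet":
        import pandas as pd  # optional dependency, imported only when needed

        return pd.read_parquet(path).to_dict(orient="records")
    raise ValueError(f"unsupported file format: {suffix}")


def map_samples(
    rows: List[Dict[str, Any]],
    key_map: Dict[str, str],
    shuffle: bool = False,
    seed: int = 42,
    limit: int = -1,
) -> Dict[str, List[Any]]:
    """Map source fields to output keys, with optional shuffle and limit."""
    rows = list(rows)  # copy so the caller's list is not reordered
    if shuffle:
        # A dedicated Random instance keeps the order reproducible per seed.
        random.Random(seed).shuffle(rows)
    if limit > 0:
        # Assumption: the limit is applied after shuffling.
        rows = rows[:limit]
    return {out_key: [row[src] for row in rows] for out_key, src in key_map.items()}
```

With `key_map = {"q_ls": "question", "gt_ls": "golden_answers"}`, calling `map_samples(load_rows(path), key_map)` yields a dictionary with one list per output key, all in the same sample order.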

Output Format (JSON)

{
  "q_ls": ["Question 1", "Question 2"],
  "gt_ls": [["Answer A1", "Answer A2"], ["Answer B"]]
}
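
Because `q_ls` and `gt_ls` are parallel lists, a consumer will typically walk them together. The sketch below shows one way to do that; `iter_pairs` is a hypothetical helper, and wrapping a bare (non-list) ground truth in a list is an assumption made for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_pairs(output: Dict[str, List[Any]]) -> Iterator[Tuple[str, List[Any]]]:
    """Yield (question, answers) pairs from a get_data-style result."""
    questions = output["q_ls"]
    answers = output["gt_ls"]
    if len(questions) != len(answers):
        raise ValueError("q_ls and gt_ls must have the same length")
    for question, gt in zip(questions, answers):
        # Assumption: a bare ground truth is treated as a single-answer list.
        yield question, (gt if isinstance(gt, list) else [gt])


output = {
    "q_ls": ["Question 1", "Question 2"],
    "gt_ls": [["Answer A1", "Answer A2"], ["Answer B"]],
}
for question, gts in iter_pairs(output):
    print(f"{question}: {len(gts)} reference answer(s)")
```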

servers/benchmark/parameter.yaml

benchmark:
  name: nq
  path: data/sample_nq_10.jsonl
  key_map:
    q_ls: question
    gt_ls: golden_answers
  shuffle: false
  seed: 42
  limit: -1

Parameter Description:

| Parameter | Type | Description |
| --- | --- | --- |
| `name` | str | Name of the benchmark dataset, used for logging and identification (e.g., `nq`) |
| `path` | str | Path to the data file; supports `.jsonl`, `.json`, and `.parquet` |
| `key_map` | dict | Field mapping table that maps original dataset fields to tool output keys |
| `key_map.q_ls` | str | Name of the question field (e.g., `question`) |
| `key_map.gt_ls` | str | Name of the ground truth field (e.g., `golden_answers`); list values are supported |
| `shuffle` | bool | Whether to shuffle the samples (default: `false`) |
| `seed` | int | Random seed (takes effect only when `shuffle=true`) |
| `limit` | int | Sampling limit: `-1` means all samples; a positive integer specifies the number of samples to take |
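
The constraints in the parameter table can be checked before the tool runs. The following is a rough sketch only: `check_benchmark_config` is a hypothetical helper, not part of the server, and it assumes that both `q_ls` and `gt_ls` must be present in `key_map`, that `shuffle`, `seed`, and `limit` may be omitted, and that any `limit` other than `-1` or a positive integer is invalid.

```python
from pathlib import Path
from typing import Any, Dict, List

SUPPORTED_SUFFIXES = {".jsonl", ".json", ".parquet"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_benchmark_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a benchmark config (empty if none)."""
    problems: List[str] = []
    for key in ("name", "path"):
        if not isinstance(cfg.get(key), str):
            problems.append(f"{key} must be a string")
    path = cfg.get("path")
    if isinstance(path, str) and Path(path).suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append("path must end in .jsonl, .json, or .parquet")
    key_map = cfg.get("key_map")
    if not isinstance(key_map, dict):
        problems.append("key_map must be a dict")
    else:
        # Assumption: both output keys are required.
        for key in ("q_ls", "gt_ls"):
            if not isinstance(key_map.get(key), str):
                problems.append(f"key_map.{key} must be a string")
    # Assumption: the three options below are optional in the config.
    if not isinstance(cfg.get("shuffle", False), bool):
        problems.append("shuffle must be a bool")
    if not _is_int(cfg.get("seed", 42)):
        problems.append("seed must be an int")
    limit = cfg.get("limit", -1)
    if not _is_int(limit) or not (limit == -1 or limit > 0):
        problems.append("limit must be -1 or a positive int")
    return problems


config = {
    "name": "nq",
    "path": "data/sample_nq_10.jsonl",
    "key_map": {"q_ls": "question", "gt_ls": "golden_answers"},
    "shuffle": False,
    "seed": 42,
    "limit": -1,
}
print(check_benchmark_config(config))
```

Returning a list of messages rather than raising on the first failure lets a caller report every misconfigured field at once.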