Pipeline Overview

In UR-2.0, a Pipeline defines how an inference task is executed. It works like a task plan: it specifies which operations the system performs and the order in which they run. With a Pipeline, you can combine functions (Tools) from multiple modules (Servers) to build a complete, reproducible, and controllable RAG inference workflow. For example:
  • Load data → Retrieve documents → Construct prompt → Invoke LLM → Evaluate results
  • Or, in multi-round generation, decide dynamically whether to re-retrieve or stop early based on the model’s intermediate outputs
A complete RAG inference workflow can be defined and executed using just one YAML file.

Writing Specification

In UR-2.0, a Pipeline is written in YAML format and defines the full task execution process.
A typical Pipeline file contains two top-level sections:
  • servers — Declares all MCP Server modules used in the current workflow. Each Server corresponds to a functional module (such as retrieval, generation, or evaluation). The key is the module name, and the value is its path within the project.
  • pipeline — Defines the task’s execution logic. Each item represents a specific execution step or control node. The Pipeline supports sequential execution, loops, and conditional branching.
examples/rag_full.yaml
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
- retriever.retriever_search
- generation.generation_init
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate
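
Because every step in the example is written as `server.tool`, one easy mistake is to reference a server that the servers section never declares. The sketch below shows one way to catch that before running anything. It is not part of UR-2.0: the helper name `undeclared_servers` is hypothetical, and the configuration is written as a Python dict on the assumption that it mirrors what a YAML parser would return for examples/rag_full.yaml. Control nodes such as loops and branches are skipped rather than inspected.

```python
# Hypothetical pre-flight check; not part of UR-2.0 itself.
# The dict below mirrors the structure of examples/rag_full.yaml.
config = {
    "servers": {
        "benchmark": "servers/benchmark",
        "retriever": "servers/retriever",
        "prompt": "servers/prompt",
        "generation": "servers/generation",
        "evaluation": "servers/evaluation",
        "custom": "servers/custom",
    },
    "pipeline": [
        "benchmark.get_data",
        "retriever.retriever_init",
        "retriever.retriever_embed",
        "retriever.retriever_index",
        "retriever.retriever_search",
        "generation.generation_init",
        "prompt.qa_rag_boxed",
        "generation.generate",
        "custom.output_extract_from_boxed",
        "evaluation.evaluate",
    ],
}


def undeclared_servers(config):
    """Return plain 'server.tool' steps whose server is missing from 'servers'."""
    declared = set(config.get("servers", {}))
    problems = []
    for step in config.get("pipeline", []):
        if not isinstance(step, str):
            continue  # control nodes (loops, branches) are not inspected here
        server, _, tool = step.partition(".")
        # A step is flagged if it has no tool part or names an unknown server.
        if not tool or server not in declared:
            problems.append(step)
    return problems


print(undeclared_servers(config))
```

For the configuration above the function returns an empty list, since all six servers are declared. Removing the `generation` entry from `servers` would cause both `generation.generation_init` and `generation.generate` to be reported.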