6.11. Manual Sharding

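PopRT Manual Sharding splits a model into subgraphs at user-provided split points, enabling model parallelism and pipeline parallelism.

6.11.1. Sharding / Model Parallelism

PopRT can split an ONNX graph across multiple devices at user-provided split points to achieve model parallelism. This suits large models that exceed the memory of a single device and therefore need several devices.

See also: Sharding principles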

Note

To use model parallelism, set the following PopRT backend options:

options.virtual_graph_mode = "manual"
options.num_ipus = <number of devices>

6.11.2. Pipelining / Pipeline Parallelism

PopRT can split an ONNX graph into pipeline stages at user-provided split points to achieve pipeline parallelism and increase throughput.

See also: Pipelining principles

Note

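Pipeline parallelism builds on model parallelism. In addition to the model-parallel settings, set the following PopRT backend options:

options.enable_pipelining = True
options.batches_per_step = <an integer multiple of the number of pipeline stages>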
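The constraint on batches_per_step can be checked before compiling. The sketch below is illustrative only: check_batches_per_step and round_up_to_multiple are hypothetical helper names, not part of PopRT, and the code only encodes the "integer multiple of the number of pipeline stages" rule stated above.

```python
def check_batches_per_step(batches_per_step: int, num_stages: int) -> int:
    """Return batches_per_step if it is a positive multiple of num_stages.

    Hypothetical helper (not a PopRT API): it only validates the rule that
    batches_per_step is an integer multiple of the number of pipeline stages.
    """
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    if batches_per_step <= 0 or batches_per_step % num_stages != 0:
        raise ValueError(
            f"batches_per_step={batches_per_step} is not a positive "
            f"multiple of num_stages={num_stages}"
        )
    return batches_per_step


def round_up_to_multiple(value: int, num_stages: int) -> int:
    """Return the smallest multiple of num_stages that is >= value."""
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    # Ceiling division written with integer arithmetic only.
    return -(-value // num_stages) * num_stages
```

For example, with four pipeline stages a requested value of 10 would be rejected by the check, and round_up_to_multiple(10, 4) suggests 12 as the nearest valid value above it.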

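6.11.3. Manual Sharding Workflow

PopRT Manual Sharding splits the ONNX graph at ONNX node granularity; any ONNX node can serve as a split point.

The nodes of an ONNX graph are stored in topological order. PopRT Manual Sharding first sorts the user-specified split points topologically.
It then visits each split point in turn and, starting from it, traverses the ONNX graph towards the inputs, placing every ONNX node it reaches into one subgraph. Traversal of a branch stops when a node has no input node or already carries sharding information.
When the traversal is complete, the resulting subgraph is tagged with sharding information in the form of ONNX attributes:

__ipu_number specifies the device index of each subgraph for model parallelism.

__pipeline_stage specifies the pipeline stage of each subgraph for pipeline parallelism.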

Note

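Different split points usually map to different device indices and pipeline stages, but a single device can hold multiple subgraphs and multiple pipeline stages.
After sharding information has been set from the split points, any remaining nodes without it are assigned automatically:

__ipu_number is set to the largest device index assigned so far + 1.

__pipeline_stage is set to the largest pipeline stage assigned so far + 1.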
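The workflow above can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not PopRT's implementation: the graph is a plain list of (node name, producer node names) pairs in topological order instead of an ONNX graph, assign_shards is a hypothetical function name, and the split point itself is assumed to belong to the subgraph it starts.

```python
def assign_shards(nodes, split_points):
    """Assign (device, stage) to every node of a toy graph.

    nodes: list of (name, producer_names) pairs in topological order.
    split_points: dict mapping a node name to its (device, stage).
    Returns a dict mapping every node name to (device, stage).
    """
    producers = dict(nodes)
    position = {name: i for i, (name, _) in enumerate(nodes)}
    assigned = {}

    # Visit the split points in topological order.
    for split in sorted(split_points, key=position.__getitem__):
        placement = split_points[split]
        stack = [split]
        # Walk towards the graph inputs from the split point.
        while stack:
            name = stack.pop()
            if name in assigned:
                continue  # This branch already carries sharding information.
            assigned[name] = placement
            # A node without producers ends its branch naturally.
            stack.extend(producers[name])

    # Remaining nodes get the largest assigned index + 1.
    next_device = max((d for d, _ in assigned.values()), default=-1) + 1
    next_stage = max((s for _, s in assigned.values()), default=-1) + 1
    for name, _ in nodes:
        assigned.setdefault(name, (next_device, next_stage))
    return assigned


# Toy linear graph: conv -> relu -> pool -> fc -> softmax
toy_nodes = [
    ("conv", []),
    ("relu", ["conv"]),
    ("pool", ["relu"]),
    ("fc", ["pool"]),
    ("softmax", ["fc"]),
]
toy_result = assign_shards(toy_nodes, {"relu": (0, 0), "fc": (1, 1)})
```

Here conv and relu land on device 0 / stage 0, pool and fc on device 1 / stage 1, and softmax, which lies after the last split point, is placed automatically on device 2 / stage 2.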

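6.11.4. Configuring Manual Sharding

Manual Sharding can be configured in two ways: through the PopRT CLI or through the poprt.converter.Sharder API.

Configuring Manual Sharding with the PopRT CLI

Specify the split-point node names, device indices and pipeline stages in a YAML file: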

Listing 6.17 shard.yaml

-
  node: resnetv17_stage1__plus0
  device: 0
  stage: 0
-
  node: resnetv17_stage4_batchnorm2_fwd
  device: 1
  stage: 1
-
  node: resnetv17_stage4__plus0
  device: 2
  stage: 2


Pass the sharding configuration to the PopRT CLI with --manual_sharding_config:

poprt \
    --input_model model.onnx \
    --manual_sharding_config shard.yaml

The PopRT CLI flag --only_manual_sharding controls whether input_model is only manually sharded. It is not set by default.

Without --only_manual_sharding, input_model first goes through Convert optimisation and is then manually sharded.

With --only_manual_sharding, input_model is only manually sharded. In this mode only --input_model, --output_model, --output_dir and --manual_sharding_config are supported; all other arguments are ignored.

poprt \
    --input_model model.onnx \
    --manual_sharding_config shard.yaml \
    --only_manual_sharding

Configuring Manual Sharding with the `poprt.converter.Sharder` API

import poprt

# Split-point node name -> device index
sharding_info = {
    "resnetv17_stage1__plus0": 0,
    "resnetv17_stage4_batchnorm2_fwd": 1,
    "resnetv17_stage4__plus0": 2,
}
# Split-point node name -> pipeline stage
pipelining_info = {
    "resnetv17_stage1__plus0": 0,
    "resnetv17_stage4_batchnorm2_fwd": 1,
    "resnetv17_stage4__plus0": 2,
}

# converted_model is the ONNX model to shard, assumed to be loaded earlier
sharded_model = poprt.converter.Sharder(
    sharding_info=sharding_info,
    pipelining_info=pipelining_info,
).run(converted_model)
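The two dictionaries carry the same information as the entries of shard.yaml. The sketch below shows one way to derive them from such entries. It assumes that the node, device and stage fields map one-to-one onto the keys and values of sharding_info and pipelining_info, as the matching values in the two listings suggest. to_sharder_args is a hypothetical helper, not a PopRT function, and the entries are written as a Python list so that the snippet needs no YAML parser.

```python
# The entries of shard.yaml, written out as Python data.
shard_config = [
    {"node": "resnetv17_stage1__plus0", "device": 0, "stage": 0},
    {"node": "resnetv17_stage4_batchnorm2_fwd", "device": 1, "stage": 1},
    {"node": "resnetv17_stage4__plus0", "device": 2, "stage": 2},
]


def to_sharder_args(entries):
    """Build (sharding_info, pipelining_info) from shard.yaml-style entries.

    Hypothetical helper: assumes each entry has "node", "device" and "stage".
    """
    names = [entry["node"] for entry in entries]
    if len(names) != len(set(names)):
        raise ValueError("each split-point node may appear only once")
    sharding_info = {entry["node"]: entry["device"] for entry in entries}
    pipelining_info = {entry["node"]: entry["stage"] for entry in entries}
    return sharding_info, pipelining_info


sharding_info, pipelining_info = to_sharder_args(shard_config)
```

The resulting dictionaries can then be passed to Sharder as sharding_info and pipelining_info.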

Note

When using the CLI with --only_manual_sharding, or the poprt.converter.Sharder API, every node in the ONNX graph must have a unique name.
When using the CLI without --only_manual_sharding, this is not required: the Convert optimisation ensures that every node has a unique name.
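The uniqueness requirement can be checked before calling the Sharder. The sketch below is a minimal, hypothetical helper (find_name_problems is not part of PopRT) that works on a plain list of node names. It also counts empty names, on the assumption that an unnamed node cannot be addressed as a split point.

```python
from collections import Counter


def find_name_problems(node_names):
    """Return (duplicated_names, unnamed_count) for a list of node names.

    Hypothetical helper: for an ONNX model loaded with the onnx package,
    the list can be built as [node.name for node in model.graph.node].
    """
    counts = Counter(node_names)
    duplicates = sorted(
        name for name, count in counts.items() if name and count > 1
    )
    unnamed = counts.get("", 0)
    return duplicates, unnamed


# Example with one duplicated name and one empty name.
problems = find_name_problems(["conv0", "relu0", "conv0", ""])
```

An empty list of duplicates together with a zero unnamed count means the names meet the requirement stated in the note.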

6.11.5. Example

The following is a simple Manual Sharding example that uses ResNet50.

Listing 6.18 shard.py

# Copyright (c) 2023 Graphcore Ltd. All rights reserved.
import numpy as np
import onnx
import requests

from poprt import runtime
from poprt.compiler import Compiler, CompilerOptions
from poprt.converter import Sharder


def load_model():
    # Download model
    url = 'https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v1-7.onnx'
    response = requests.get(url)
    if response.status_code == 200:
        model = onnx.load_model_from_string(response.content)
    else:
        raise Exception(
            f"Failed to download model with status_code {response.status_code}"
        )
    return model


def manual_sharding(model):
    # Fix the batch size to 1
    model.graph.input[0].type.tensor_type.shape.dim[0].dim_value = 1

    # Sharding and pipelining info
    sharding_info = {
        "resnetv17_stage1__plus0": 0,
        "resnetv17_stage4_batchnorm2_fwd": 1,
        "resnetv17_stage4__plus0": 2,
    }
    pipelining_info = {
        "resnetv17_stage1__plus0": 0,
        "resnetv17_stage4_batchnorm2_fwd": 1,
        "resnetv17_stage4__plus0": 2,
    }
    model = Sharder(sharding_info=sharding_info, pipelining_info=pipelining_info).run(
        model
    )

    return model


def compile(model):
    # Compile the model with backend options
    model_bytes = model.SerializeToString()
    outputs = [o.name for o in model.graph.output]

    options = CompilerOptions()
    options.ipu_version = runtime.DeviceManager().ipu_hardware_version()
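    # Shard across 4 IPUs: devices 0-2 come from sharding_info, and the
    # remaining nodes are placed automatically on device 3 (max index + 1)
    options.num_ipus = 4
    # Enable sharding and pipelining
    options.enable_pipelining = True
    options.virtual_graph_mode = "manual"
    # Must be an integer multiple of the number of pipeline stages
    options.batches_per_step = 16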

    executable = Compiler.compile(model_bytes, outputs, options)
    runner_config = runtime.RuntimeConfig()
    runner_config.timeout_ns = 0
    runner = runtime.Runner(executable, runner_config)
    return runner


def run(runner):
    inputs_info = runner.get_execute_inputs()
    outputs_info = runner.get_execute_outputs()

    inputs = {}
    for i in inputs_info:
        inputs[i.name] = np.ones(i.shape, dtype=i.numpy_data_type())

    outputs = {}
    for o in outputs_info:
        outputs[o.name] = np.zeros(o.shape, dtype=o.numpy_data_type())

    runner.execute(inputs, outputs)


if __name__ == '__main__':
    model = load_model()
    model = manual_sharding(model)
    runner = compile(model)
    run(runner)

