5.3. Using Dynamic Batch Size

5.3.1. Background

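Because the IPU supports only static graphs, a fixed batch size must be specified when the model is compiled. At inference time, however, the batch size of the input data usually varies. PopRT supports input data of any batch size by padding it with zeros. For example, the model in Fig. 5.3 has shape [4, 2] with dimension 0 as the batch dimension, so the model batch size is 4. The input data have shapes [1, 2] and [7, 2], so their batch sizes are 1 and 7 respectively.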

../_images/dynamic_batch_size.png — Fig. 5.3 Dynamic Batch Size

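PopRT zero-pads the data up to the nearest multiple of the model batch size, that is, to N * model batch size for the smallest N that fits the data. In the example above, the data with batch size 1 is padded to batch size 4, and the data with batch size 7 is padded to batch size 8. On the IPU, the data is always processed in units of the model batch size: the data padded to batch size 4 needs a single inference pass before the result is returned, while the data padded to batch size 8 needs two passes in a loop before the result is returned.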
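The padding arithmetic described above can be sketched on the host with NumPy. This is only an illustration of the idea, not PopRT's implementation: the names padded_batch_size, run_with_padding, and infer_fn are hypothetical, and infer_fn stands in for one inference pass at the fixed model batch size.

```python
import numpy as np


def padded_batch_size(batch_size: int, model_batch_size: int) -> int:
    """Round batch_size up to the nearest multiple of model_batch_size."""
    # Ceiling division using integer arithmetic only.
    return -(-batch_size // model_batch_size) * model_batch_size


def run_with_padding(data, model_batch_size, infer_fn):
    """Emulate zero-padding along dimension 0 (hypothetical helper).

    infer_fn is an assumed callable that accepts exactly
    model_batch_size rows and returns the same number of rows.
    """
    batch_size = data.shape[0]
    total = padded_batch_size(batch_size, model_batch_size)

    # Copy the real rows into a zero-filled buffer of the padded size.
    padded = np.zeros((total,) + data.shape[1:], dtype=data.dtype)
    padded[:batch_size] = data

    # One pass per model-sized chunk, e.g. two passes for 8 rows and a model batch size of 4.
    chunks = [
        infer_fn(padded[start:start + model_batch_size])
        for start in range(0, total, model_batch_size)
    ]

    # Drop the rows that correspond to padding before returning.
    return np.concatenate(chunks, axis=0)[:batch_size]
```

With a model batch size of 4, padded_batch_size(1, 4) gives 4 and padded_batch_size(7, 4) gives 8, matching the two cases in Fig. 5.3.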

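Dynamic batch size is transparent to the user program. The user does not need to know the batch size of the model currently loaded on the IPU and can simply send inference requests with whatever batch size the application requires.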

5.3.2. Example

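Listing 5.2 shows example code for dynamic batch size. The example creates a model whose input shape is [4, 2], so its batch size is 4. The application then runs inference with data of batch sizes 1, 4, and 7, without regard to the batch size of the loaded model.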

Listing 5.2 dynamic_batch_size.py

# Copyright (c) 2023 Graphcore Ltd. All rights reserved.
import datetime

import numpy as np
import numpy.testing as npt
import onnx

from onnx import helper

from poprt import runtime
from poprt.compiler import Compiler, CompilerOptions
from poprt.converter import Converter
from poprt.runtime import RuntimeConfig
from poprt.utils import get_logger


def default_model():
    """Create a test model."""
    TensorProto = onnx.TensorProto
    add = helper.make_node("Add", ["X", "Y"], ["O"])
    graph = helper.make_graph(
        [add],
        "test",
        [
            helper.make_tensor_value_info("X", TensorProto.FLOAT, (4, 2)),
            helper.make_tensor_value_info("Y", TensorProto.FLOAT, (4, 2)),
        ],
        [helper.make_tensor_value_info("O", TensorProto.FLOAT, (4, 2))],
    )
    opset_imports = [helper.make_opsetid("", 11)]
    original_model = helper.make_model(graph, opset_imports=opset_imports)
    return original_model


def compile(model: onnx.ModelProto):
    """Compile ONNX to PopEF."""
    model_bytes = model.SerializeToString()
    outputs = [o.name for o in model.graph.output]
    executable = Compiler.compile(model_bytes, outputs)
    return executable


def run(executable):
    """Run PopEF."""
    config = RuntimeConfig()
    config.timeout_ns = datetime.timedelta(microseconds=300)
    config.batching_dim = 0
    model_runner = runtime.ModelRunner(executable, config)
    batch_sizes = [1, 4, 7]
    for batch_size in batch_sizes:
        inputs = {}
        inputs['X'] = np.random.uniform(0, 1, [batch_size, 2]).astype(np.float32)
        inputs['Y'] = np.random.uniform(0, 1, [batch_size, 2]).astype(np.float32)

        outputs = {}
        outputs['O'] = np.zeros([batch_size, 2], dtype=np.float32)
        model_runner.execute(inputs, outputs)
        expected = inputs['X'] + inputs['Y']
        npt.assert_array_equal(
            outputs['O'],
            expected,
            f"Result: outputs['O'] is not equal to expected: {expected}",
        )
        print(f'Successfully run with input data in batch size {batch_size}')


if __name__ == '__main__':
    model = default_model()
    executable = compile(model)
    run(executable)


Running the example produces the following output:

Successfully run with input data in batch size 1
Successfully run with input data in batch size 4
Successfully run with input data in batch size 7