6. Python API

6.1. poprt module

class poprt.Converter(input_shape=None, convert_version=11, precision='fp32', checkpoints=None, eightbitsio=False, fp16_skip_op_types=None, skip_passes=None, used_passes=[], check=False, disable_fast_norm=False, pack_args=None, fp8_skip_op_names=None, fp8_params=None, quantize=False, enable_insert_remap=False, enable_erf_gelu=False, serialize_matmul=None, serialize_matmul_add=None, remap_mode='after_matmul', max_tensor_size=-1, infer_shape_ahead=False, logger=None)

Convert a general ONNX model to an IPU-friendly ONNX model.

Construct a new Converter.

Parameters
  • input_shape (Dict[str, List[int]]) – the shape of inputs.

  • convert_version (int) – Convert opset to a specific version.

  • precision (str) – convert the model to a specific precision. Supported precisions: fp32/fp16/fp8.

  • checkpoints (str) – set output tensor names.

  • eightbitsio (bool) – enable the 8-bit IO feature.

  • fp16_skip_op_types (str) – the list of op types that keep fp32 precision in fp16 precision mode.

  • skip_passes (str) – the list of passes to skip.

  • used_passes (List[str]) – user-specified passes.

  • disable_fast_norm (bool) – disable the conversion of the layer_norm Op to the fast_norm Op.

  • pack_args (Dict) – enable packed transformer.

  • fp8_skip_op_names (str) – the Op names which keep fp32/fp16 precision in fp8 mode, such as 'Conv_1,Conv_2'.

  • fp8_params (str) – the parameters of the fp8 model, in the format 'input_format,weight_format,input_scale,weight_scale'.

  • quantize (bool) – whether to use the quantization method.

  • enable_insert_remap (bool) – automatically insert remap Ops to improve tensor layout.

  • enable_erf_gelu (bool) – replace Erf-based Gelu patterns with the Gelu Op.

  • serialize_matmul (Dict[str, str]) – serialize the MatMul Op to save on-chip memory.

  • serialize_matmul_add (Dict[str, str]) – serialize the MatMul weights and Add bias along the last dimension of the weights to save on-chip memory.

  • remap_mode (str) – the position of the remap; supported values are after_matmul and before_add.

  • max_tensor_size (int) – the maximum size (in bytes) of tensors generated by constant folding; -1 (the default) means no limit.

  • infer_shape_ahead (bool) – fix the input shapes and infer all shapes at the start of conversion.

  • check (bool) –

  • logger (Logger) –

convert(model)

Convert a general ONNX model to an IPU-friendly ONNX model.

Parameters
  • model (ModelProto) – an ONNX ModelProto object to be converted.

Returns

An ONNX ModelProto object representing the converted model.

Return type

ModelProto
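
A minimal usage sketch, assuming a model file model.onnx whose input tensor is named 'input' (the path, tensor name and shape are placeholders):

    import onnx
    from poprt import Converter

    # Load the ONNX model to be converted (path is a placeholder).
    model = onnx.load('model.onnx')

    # Convert to fp16 with a fixed input shape; 'input' is assumed to
    # be the name of the model's input tensor.
    converter = Converter(input_shape={'input': [1, 3, 224, 224]},
                          precision='fp16')
    converted = converter.convert(model)
    onnx.save(converted, 'model_fp16.onnx')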

6.2. poprt.compiler module

class poprt.compiler.Compiler(self: poprt._compiler.Compiler) → None

Compile an ONNX model to PopEF.

Return type

None

static compile(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = CompilerOptions()) → Executable
Parameters
  • model (Union[AnyStr, ModelProto]) –

  • outputs (List[str]) –

  • options (CompilerOptions) –

Return type

Executable

static compile_and_export(model: str, outputs: List[str], filename: str, options: poprt._compiler.CompilerOptions = CompilerOptions()) → None
Parameters
  • model (Union[AnyStr, ModelProto]) –

  • outputs (List[str]) –

  • filename (str) –

  • options (CompilerOptions) –

Return type

None

static compile_and_get_summary_report(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = CompilerOptions(), reset_profile: bool = True) → str
Parameters
  • model (Union[AnyStr, ModelProto]) –

  • outputs (List[str]) –

  • options (CompilerOptions) –

  • reset_profile (bool) –

Return type

str
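
A minimal sketch of compiling a converted model, assuming placeholder file paths and an output tensor named 'output':

    import onnx
    from poprt.compiler import Compiler, CompilerOptions

    model = onnx.load('model_fp16.onnx')  # placeholder path
    options = CompilerOptions()

    # Compile to an in-memory executable...
    exe = Compiler.compile(model.SerializeToString(), ['output'], options)

    # ...or compile and export a PopEF file directly to disk.
    Compiler.compile_and_export(model.SerializeToString(), ['output'],
                                'model.popef', options)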

class poprt.compiler.CompilerOptions(self: poprt._compiler.CompilerOptions) → None

6.3. poprt.runtime module

class poprt.runtime.ModelRunner(popef, config=RuntimeConfig())

Load PopEF model, and execute.

Parameters
  • popef (Union[str, Executable]) – input popef

  • config (Optional[RuntimeConfig]) – runtime config

Return type

None

Deprecated since version 1.1.0: This will be removed in 1.2.0. poprt.runtime.ModelRunner has been integrated into and replaced by poprt.runtime.Runner. For further details, refer to the documentation or the accompanying examples.

execute(input, output)

Execute the model with the given input and output buffers.

Parameters
  • input (Union[InputMemoryView, Dict[str, ndarray]]) –

  • output (Union[OutputMemoryView, Dict[str, ndarray]]) –

Return type

None
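
A minimal sketch of loading and executing a PopEF file with the (deprecated) ModelRunner interface documented above; the path, tensor names, shapes and dtypes are placeholders, and new code should prefer poprt.runtime.Runner:

    import numpy as np
    from poprt import runtime

    # Load a compiled PopEF file (path is a placeholder).
    runner = runtime.ModelRunner('model.popef')

    # Pre-allocate output buffers; names, shapes and dtypes must match
    # the compiled model and are assumed here for illustration.
    inputs = {'input': np.ones([1, 3, 224, 224], dtype=np.float16)}
    outputs = {'output': np.zeros([1, 1000], dtype=np.float16)}
    runner.execute(inputs, outputs)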

class poprt.runtime.DeviceManager(self: poprt._runtime.DeviceManager) → None

Device Manager.

Return type

None

get_device(num_ipus)

Get a Device with the specified number of IPUs.

Parameters

num_ipus (int) – the number of IPUs

Return type

Device

get_num_devices()

Get the number of Devices.

Return type

int

ipu_hardware_version()

Get the IPU hardware version.

ipu21: C600 cards

ipu2: mk2/Bow cards

Return type

str
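
A minimal sketch of device discovery, assuming at least one IPU device is attached:

    from poprt import runtime

    dm = runtime.DeviceManager()
    print(dm.get_num_devices())        # number of available devices
    print(dm.ipu_hardware_version())   # e.g. 'ipu21' on C600 cards
    device = dm.get_device(1)          # acquire a device with one IPU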

6.4. poprt.backends module

class poprt.backends.Backend(path_or_bytes, export_popef=None, compiler_options=CompilerOptions(), runtime_options=RuntimeConfig(), align_output_dtype=False, logger=None)

PopRT Backend.

Parameters
  • path_or_bytes (Union[AnyStr, onnx.ModelProto]) – input onnx model

  • export_popef (str) – target PopEF export path

  • compiler_options (compiler.CompilerOptions) – compiler options, see poprt.compiler.CompilerOptions

  • runtime_options (runtime.RuntimeConfig) – runtime options, see poprt.runtime.RuntimeConfig

  • align_output_dtype (bool) – flag to align the output dtype with the ONNX model. Backend.run also has an align_output_dtype parameter; the effective value is True if either of them is set to True

  • logger (logging.Logger) – custom logger

Return type

None

get_io_info()

Return meta information about the inputs and outputs, including dtype, name, and shape.

Return type

tuple[Dict[str, Any], Dict[str, Any]]

run(output_names, inputs, align_output_dtype=False)

Run the model.

Parameters
  • output_names (List[str]) – output tensor names

  • inputs (Dict[str, ndarray]) – input tensor data

  • align_output_dtype (bool) – flag to align output dtype based on the onnx model

Return type

List[ndarray]

set_opaque_blobs()

Pass dynamic input anchor information to the pack feature.

Return type

None
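
A minimal end-to-end sketch using Backend; the model path, tensor names, shapes and dtypes are placeholders:

    import numpy as np
    from poprt.backends import Backend

    # Compile and run an ONNX model through a single object.
    backend = Backend('model.onnx')

    # Query input/output metadata (dtype, name, shape).
    inputs_info, outputs_info = backend.get_io_info()

    feed = {'input': np.ones([1, 3, 224, 224], dtype=np.float16)}
    results = backend.run(['output'], feed)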

class poprt.backends.ORTBackend(path_or_bytes, sess_options=None, providers=None, provider_options=None, lazy_load=False, **kwargs)

Bases: Backend

onnxruntime.InferenceSession API compatible Backend.

Parameters
  • path_or_bytes – input onnx model

  • sess_options – onnxruntime.InferenceSession compatible API, not used

  • providers – onnxruntime.InferenceSession compatible API, not used

  • provider_options – onnxruntime.InferenceSession compatible API, not used

  • lazy_load – ORTBackend loads the ONNX model on construction by default; set to True to prevent this

  • **kwargs – see poprt.Backend for more args

Return type

None

run(output_names, input_feed, run_options=None)

Run the model.

Parameters
  • output_names – output tensor names

  • input_feed – input tensor data

  • run_options – onnxruntime.InferenceSession compatible API, not used

Return type

List[ndarray]
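
A minimal sketch of ORTBackend as a drop-in replacement for onnxruntime.InferenceSession; the path and tensor names are placeholders:

    import numpy as np
    from poprt.backends import ORTBackend

    sess = ORTBackend('model.onnx')
    outputs = sess.run(['output'],
                       {'input': np.ones([1, 3, 224, 224], dtype=np.float16)})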

6.5. poprt.quantizer module

poprt.quantizer.quantize(onnx_model, input_model, output_dir, data_preprocess=None, quantize_loss_type='kld', num_of_layers_keep_fp16=0, options=None)

Quantize the model according to the quantization strategy. Currently, only SimpleQuantizer is supported.

Parameters
  • onnx_model (ModelProto) – onnx ModelProto

  • input_model (str) – the original model

  • data_preprocess (Optional[str]) – path to a pickle-format file for data preprocessing; the storage format is {input_name_1: ndarray_1, input_name_2: ndarray_2, …}

  • output_dir (str) – the output dir

  • options (Optional[Dict[str, Any]]) – options

  • quantize_loss_type (str) –

  • num_of_layers_keep_fp16 (int) –

Returns

A quantized ONNX ModelProto

Return type

ModelProto
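
A minimal sketch of quantizing a model; the model path, output directory and calibration-data pickle are placeholders:

    import onnx
    from poprt.quantizer import quantize

    model = onnx.load('model.onnx')
    quantized = quantize(model, 'model.onnx', './quantized',
                         data_preprocess='calibration_data.pkl')
    onnx.save(quantized, 'quantized/model_quantized.onnx')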

class poprt.quantizer.FP8Quantizer(output_dir, loss_type, data_preprocess=None, num_of_layers_keep_fp16=0, options=None)

Return the input model.

Parameters
  • output_dir (str) –

  • loss_type (str) –

  • data_preprocess (str) –

  • num_of_layers_keep_fp16 (int) –

  • options (Dict[str, Any]) –