7. Python API

7.1. poprt module

class poprt.Converter(*, input_shape=None, convert_version=11, precision='fp32', checkpoints=None, eightbitsio=False, fp16_skip_op_types=None, skip_passes=None, used_passes=[], check=False, disable_fast_norm=False, pack_args=None, fp8_skip_op_names=None, fp8_params='F143, F143, 0, 0', quantize=False, enable_insert_remap=False, enable_erf_gelu=False, serialize_matmul=None, serialize_matmul_add=None, remap_mode='after_matmul', max_tensor_size=-1, infer_shape_ahead=False, enable_avoid_overflow_patterns=False, disable_progress_bar=False, logger=<Logger poprt (WARNING)>)

Convert a general ONNX model to an IPU-friendly ONNX model.

Construct a new Converter.

Parameters
  • input_shape (Dict[str, List[int]]) – the shapes of the input tensors.

  • convert_version (int) – convert the opset to a specific version.

  • precision (str) – convert the model to a specific precision. Supported precisions: fp32/fp16/fp8.

  • checkpoints (str) – set the output tensor names.

  • eightbitsio (bool) – enable the 8-bit IO feature.

  • fp16_skip_op_types (str) – the list of op types that keep fp32 precision in fp16 precision mode.

  • skip_passes (str) – the list of passes to skip.

  • used_passes (List[str]) – user-specified passes.

  • disable_fast_norm (bool) – disable the conversion of the layer_norm Op to the fast_norm Op.

  • pack_args (Dict) – enable the packed transformer.

  • fp8_skip_op_names (str) – the Op names that keep fp32/fp16 in fp8 mode, such as 'Conv_1,Conv_2'.

  • fp8_params (str) – set the parameters of the fp8 model, in the format 'input_format,weight_format,input_scale,weight_scale'.

  • quantize (bool) – whether to use the quantization method.

  • enable_insert_remap (bool) – automatically insert remap Ops to improve tensor layout.

  • enable_erf_gelu (bool) – replace Erf-based Gelu patterns with the Gelu Op.

  • serialize_matmul (Dict[str, str]) – serialize MatMul Ops to save on-chip memory.

  • serialize_matmul_add (Dict[str, str]) – serialize MatMul weights and Add bias along the weights' last dimension to save on-chip memory.

  • remap_mode (str) – the position of the remap; supports after_matmul and before_add.

  • max_tensor_size (int) – the maximum tensor size (in bytes) generated by constant folding; -1 (the default) means no limit is set.

  • infer_shape_ahead (bool) – fix the input shapes and infer shapes at the beginning of conversion.

  • enable_avoid_overflow_patterns (bool) – keep fp32 for several specific patterns in the fp16 model.

  • check (bool) –

  • disable_progress_bar (bool) –

  • logger (Logger) –

convert(model)

Convert a general ONNX model to an IPU-friendly ONNX model.

Parameters
  • model (ModelProto) – an ONNX ModelProto object to be converted.

Returns

An ONNX ModelProto object representing the converted model.

Return type

ModelProto
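
A minimal usage sketch based on the signatures above; the model path, input name and shape are placeholders:

    import onnx

    from poprt import Converter

    # Load a general ONNX model (path is a placeholder).
    model = onnx.load("model.onnx")

    # Fix the batch size and convert the model to fp16.
    converter = Converter(
        input_shape={"input": [1, 3, 224, 224]},
        precision="fp16",
    )
    converted = converter.convert(model)
    onnx.save(converted, "converted_model.onnx")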

7.2. poprt.compiler module

class poprt.compiler.Compiler(self: poprt._compiler.Compiler) → None

Compile an ONNX model to PopEF.

Return type

None

static compile(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = <poprt._compiler.CompilerOptions object>) → poprt::compiler::Executable
Parameters
  • model (Union[AnyStr, ModelProto]) –

  • outputs (List[str]) –

  • options (CompilerOptions) –

Return type

Executable

static compile_and_export(model: str, outputs: List[str], filename: str, options: poprt._compiler.CompilerOptions = <poprt._compiler.CompilerOptions object>) → None
Parameters
  • model (Union[AnyStr, ModelProto]) –

  • outputs (List[str]) –

  • filename (str) –

  • options (CompilerOptions) –

Return type

None

static compile_and_get_summary_report(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = <poprt._compiler.CompilerOptions object>, reset_profile: bool = True) → str
Parameters
  • model (Union[AnyStr, ModelProto]) –

  • outputs (List[str]) –

  • options (CompilerOptions) –

  • reset_profile (bool) –

Return type

str

class poprt.compiler.CompilerOptions(self: poprt._compiler.CompilerOptions) → None
Return type

None
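
A minimal sketch of compiling a model and exporting it as a PopEF file, based on the signatures above; the paths and output tensor name are placeholders:

    from poprt.compiler import Compiler, CompilerOptions

    # Compile the converted model and export the PopEF file.
    Compiler.compile_and_export(
        "converted_model.onnx",
        ["output"],
        "executable.popef",
        CompilerOptions(),
    )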

7.3. poprt.runtime module

class poprt.runtime.Runner(popef, config=None)

Load a PopEF model and execute it.

Parameters
  • popef (Union[str, Executable]) – the input PopEF, given as a file path or an Executable.

  • config (Union[RuntimeConfig, PackRunnerConfig]) – the runtime configuration.

Return type

None

execute(input, output)

Execute the runner.

Parameters
  • input (Union[InputMemoryView, Dict[str, ndarray]]) –

  • output (Union[OutputMemoryView, Dict[str, ndarray]]) –

Return type

None
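
A minimal sketch of loading a PopEF file and running inference, based on the signatures above; tensor names, shapes and dtypes are placeholders and must match the compiled model:

    import numpy as np

    from poprt import runtime

    runner = runtime.Runner("executable.popef")

    # Pre-allocate output buffers; execute fills them in place.
    inputs = {"input": np.ones([1, 3, 224, 224], dtype=np.float16)}
    outputs = {"output": np.zeros([1, 1000], dtype=np.float16)}
    runner.execute(inputs, outputs)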

class poprt.runtime.DeviceManager(self: poprt._runtime.DeviceManager) → None

Device Manager.

Return type

None

get_device(num_ipus)

Get a Device with the specified number of IPUs.

Parameters

num_ipus (int) – the number of IPUs.

Return type

Device

get_num_devices()

Get the number of Devices.

Return type

int

ipu_hardware_version()

Get the IPU hardware version.

ipu21: C600 cards

ipu2: mk2/Bow cards

Return type

str
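
A short sketch of querying devices with the methods above; the IPU count is a placeholder:

    from poprt import runtime

    dm = runtime.DeviceManager()
    print("Number of devices:", dm.get_num_devices())
    print("IPU version:", dm.ipu_hardware_version())

    # Acquire a device with one IPU.
    device = dm.get_device(1)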

7.4. poprt.frontend module

class poprt.frontend.OnnxFrontend(path, **kwargs)

ONNX Frontend.

Parameters

path (str) – input model path

Return type

None

get_onnx_name(dir_or_name)

Filter out non-ONNX files.

Parameters
  • dir_or_name (str) – directory or name of the model.

Returns

The name of the ONNX model if there is exactly one ONNX file; otherwise an error is raised.

Return type

Optional[str]

load_model()

Load ONNX Model.

Parameters

dir_or_name – directory or name of the model. If a directory, it must contain exactly one model.

Returns

ONNX Model

Return type

ModelProto
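
A minimal sketch based on the signatures above, assuming a model.onnx file exists:

    from poprt.frontend import OnnxFrontend

    # path may be a .onnx file or a directory containing exactly one .onnx file.
    frontend = OnnxFrontend("model.onnx")
    model = frontend.load_model()  # returns an ONNX ModelProto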

class poprt.frontend.TensorflowFrontend(path, *, saved_model=True, signature_def='', tag='', opset=11, inputs_as_nchw=None, outputs_as_nchw=None, input_shape=None, outputs=None, **kwargs)

TensorFlow Frontend.

Parameters
  • path (str) – input model path

  • saved_model (bool) – whether the input is a TensorFlow saved_model.

  • signature_def (str) – the signature_def from the saved_model to use.

  • tag (str) – the tag to use for the saved_model.

  • opset (int) – the opset version to use for the ONNX domain in the TensorFlow frontend.

  • inputs_as_nchw (str) – transpose the named inputs from NHWC to NCHW.

  • outputs_as_nchw (str) – transpose the named outputs from NHWC to NCHW.

  • input_shape (Dict) –

  • outputs (str) – model output names (optional for saved_model).

Return type

None

load_model()

Load a TensorFlow model and convert it to an ONNX ModelProto.

Return type

ModelProto
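
A minimal sketch of converting a TensorFlow saved_model, based on the signature above; the path, input name and shape are placeholders:

    from poprt.frontend import TensorflowFrontend

    frontend = TensorflowFrontend(
        "saved_model_dir",
        saved_model=True,
        opset=11,
        input_shape={"input:0": [1, 224, 224, 3]},
    )
    model = frontend.load_model()  # returns an ONNX ModelProto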

7.5. poprt.backends module

class poprt.backends.Backend(path_or_bytes, *, export_popef=None, compiler_options=<poprt.compiler.CompilerOptions object>, runtime_options=<poprt.runtime.RuntimeConfig object>, align_output_dtype=False, logger=None)

PopRT Backend.

Parameters
  • path_or_bytes (Union[AnyStr, IO[bytes], onnx.ModelProto]) – input onnx model

  • export_popef (str) – target PopEF export path

  • compiler_options (compiler.CompilerOptions) – compiler options, see poprt.compiler.CompilerOptions

  • runtime_options (runtime.AnyConfig) – runtime options, see poprt.runtime.RuntimeConfig

  • align_output_dtype (bool) – flag to align the output dtype with the ONNX model. Backend.run also has an align_output_dtype parameter; alignment is applied if either of them is set to True.

  • logger (logging.Logger) – custom logger

Return type

None

get_io_info()

Return meta information for the inputs/outputs, including dtype, name, and shape.

Return type

tuple[Dict[str, Any], Dict[str, Any]]

run(output_names, inputs, align_output_dtype=False)

Run the Model.

Parameters
  • output_names (List[str]) – output tensor names

  • inputs (Dict[str, ndarray]) – input tensor data

  • align_output_dtype (bool) – flag to align output dtype based on the onnx model

Return type

List[ndarray]

set_opaque_blobs()

Pass dynamic input anchor info to pack.

Return type

None
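
A minimal sketch of compiling and running an ONNX model through the Backend, based on the signatures above; the path, tensor names and shapes are placeholders:

    import numpy as np

    from poprt.backends import Backend

    backend = Backend("model.onnx")

    # Inspect input/output meta info, then run.
    input_info, output_info = backend.get_io_info()
    results = backend.run(
        ["output"],
        {"input": np.ones([1, 3, 224, 224], dtype=np.float32)},
    )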

class poprt.backends.ORTBackend(path_or_bytes, sess_options=None, providers=None, provider_options=None, lazy_load=False, **kwargs)

Bases: Backend

onnxruntime.InferenceSession API compatible Backend.

Parameters
  • path_or_bytes – input onnx model

  • sess_options – onnxruntime.InferenceSession compatible API; not used.

  • providers – onnxruntime.InferenceSession compatible API; not used.

  • provider_options – onnxruntime.InferenceSession compatible API; not used.

  • lazy_load – ORTBackend loads the ONNX model by default; set to True to prevent this.

  • **kwargs – see poprt.Backend for more arguments.

Return type

None

run(output_names, input_feed, run_options=None)

Run the Model.

Parameters
  • output_names – output tensor names

  • input_feed – input tensor data

  • run_options – onnxruntime.InferenceSession compatible API; not used.

Return type

List[ndarray]
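
A minimal sketch showing the onnxruntime-style calling convention; the path and tensor names are placeholders:

    import numpy as np

    from poprt.backends import ORTBackend

    # Used in place of onnxruntime.InferenceSession.
    sess = ORTBackend("model.onnx")
    results = sess.run(
        ["output"],
        {"input": np.ones([1, 3, 224, 224], dtype=np.float32)},
    )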

7.6. poprt.quantizer module

poprt.quantizer.quantize(onnx_model, input_model, output_dir, data_preprocess=None, precision='fp8', quantize_loss_type='kld', num_of_layers_keep_fp16=0, options=None)

Quantize the model according to the quantization strategy. Currently, only SimpleQuantizer is supported.

Parameters
  • onnx_model (ModelProto) – the ONNX ModelProto.

  • input_model (str) – the original model.

  • data_preprocess (Optional[str]) – path to a pickle-format file for data preprocessing; the storage format is {input_name_1: ndarray_1, input_name_2: ndarray_2, …}.

  • precision (typing_extensions.Literal[fp8, fp8_weight]) – convert the model to the specified type.

  • output_dir (str) – the output directory.

  • options (Optional[Dict[str, Any]]) – options.

  • quantize_loss_type (str) –

  • num_of_layers_keep_fp16 (int) –

Returns

A quantized ONNX ModelProto.

Return type

ModelProto
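
A minimal sketch based on the signature above; the model path and calibration data file are placeholders:

    import onnx

    from poprt.quantizer import quantize

    model = onnx.load("model.onnx")

    # Calibration data is a pickle file of {input_name: ndarray, ...}.
    quantized = quantize(
        model,
        "model.onnx",
        output_dir="quantized",
        data_preprocess="calibration_data.pkl",
        precision="fp8",
        quantize_loss_type="kld",
    )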

class poprt.quantizer.FP8Quantizer(output_dir, loss_type, data_preprocess=None, precision='fp8', num_of_layers_keep_fp16=0, options=None)

Return the Input Model.

Parameters
  • output_dir (str) –

  • loss_type (str) –

  • data_preprocess (str) –

  • precision (typing_extensions.Literal[fp8, fp8_weight]) –

  • num_of_layers_keep_fp16 (int) –

  • options (Dict[str, Any]) –
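
A construction-only sketch based on the signature above; how the quantizer is invoked beyond the constructor is not documented here, and the argument values are placeholders:

    from poprt.quantizer import FP8Quantizer

    quantizer = FP8Quantizer(
        output_dir="quantized",
        loss_type="kld",
        precision="fp8",
    )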