7. Python API
7.1. poprt module
- class poprt.Converter(*, input_shape=None, convert_version=11, precision='fp32', checkpoints=None, eightbitsio=False, fp16_skip_op_types=None, skip_passes=None, used_passes=[], check=False, disable_fast_norm=False, pack_args=None, fp8_skip_op_names=None, fp8_params='F143, F143, 0, 0', quantize=False, enable_insert_remap=False, enable_erf_gelu=False, serialize_matmul=None, serialize_matmul_add=None, remap_mode='after_matmul', max_tensor_size=-1, infer_shape_ahead=False, enable_avoid_overflow_patterns=False, disable_progress_bar=False, logger=<Logger poprt (WARNING)>)
Convert a general ONNX model to an IPU-friendly ONNX model.
Construct a new Converter.
- Parameters
input_shape (Dict[str, List[int]]) – the shape of inputs.
convert_version (int) – Convert opset to a specific version.
precision (str) – convert the model to a specific precision. Supported precisions: fp32/fp16/fp8.
checkpoints (str) – set output tensor names.
eightbitsio (bool) – enable the 8-bit IO feature.
fp16_skip_op_types (str) – the list of op types that keep fp32 precision in fp16 mode.
skip_passes (str) – the list of passes to skip.
used_passes (List[str]) – user-specified passes.
disable_fast_norm (bool) – disable the conversion of the layer_norm op to the fast_norm op.
pack_args (Dict) – enable packed transformer.
fp8_skip_op_names (str) – the op names that keep fp32/fp16 in fp8 mode, for example 'Conv_1,Conv_2'.
fp8_params (str) – set parameters of the fp8 model; the format is 'input_format,weight_format,input_scale,weight_scale'.
quantize (bool) – whether to use quantization method.
enable_insert_remap (bool) – automatically insert remap ops to improve tensor layout.
enable_erf_gelu (bool) – replace Erf-based Gelu patterns with the Gelu op.
serialize_matmul (Dict[str, str]) – serialize MatMul ops to save on-chip memory.
serialize_matmul_add (Dict[str, str]) – serialize MatMul weights and Add bias along the weights' last dimension to save on-chip memory.
remap_mode (str) – the position of the remap; supported values are after_matmul and before_add.
max_tensor_size (int) – maximum tensor size (in bytes) generated by constant folding; -1 (the default) means no limit is set.
infer_shape_ahead (bool) – Fix input shape and infer shapes at beginning.
enable_avoid_overflow_patterns (bool) – Enable to keep fp32 for several specific patterns in fp16 model.
check (bool) –
disable_progress_bar (bool) –
logger (Logger) –
- convert(model)
Convert a general ONNX model to an IPU-friendly ONNX model.
- Parameters
model (ModelProto) – an ONNX ModelProto object to be converted.
logger –
- Returns
An ONNX ModelProto object representing the converted model.
- Return type
ModelProto
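A minimal usage sketch based on the constructor and convert() documented above; the model path, input name, and shape are placeholders:

```python
import onnx

from poprt import Converter

# Load a general ONNX model (placeholder path).
model = onnx.load("model.onnx")

# Convert to an IPU-friendly fp16 model with a fixed input shape.
converter = Converter(
    input_shape={"input": [1, 3, 224, 224]},
    precision="fp16",
)
converted = converter.convert(model)

onnx.save(converted, "model_ipu.onnx")
```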
7.2. poprt.compiler module
- class poprt.compiler.Compiler(self: poprt._compiler.Compiler) → None
Compile ONNX model to PopEF.
- Return type
None
- static compile(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = CompilerOptions()) → Executable
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
options (CompilerOptions) –
- Return type
Executable
- static compile_and_export(model: str, outputs: List[str], filename: str, options: poprt._compiler.CompilerOptions = CompilerOptions()) → None
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
filename (str) –
options (CompilerOptions) –
- Return type
None
- static compile_and_get_summary_report(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = CompilerOptions(), reset_profile: bool = True) → str
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
options (CompilerOptions) –
reset_profile (bool) –
- Return type
str
- class poprt.compiler.CompilerOptions(self: poprt._compiler.CompilerOptions) → None
- Return type
None
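A sketch of compiling a converted model to PopEF with the static methods documented above; the model path, output tensor name, and PopEF filename are placeholders:

```python
from poprt.compiler import Compiler, CompilerOptions

options = CompilerOptions()

# Compile the ONNX model and export the resulting PopEF to disk.
Compiler.compile_and_export(
    "model_ipu.onnx",
    outputs=["output"],
    filename="model.popef",
    options=options,
)
```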
7.3. poprt.runtime module
- class poprt.runtime.Runner(popef, config=None)
Load a PopEF model and execute it.
- Parameters
popef (Union[str, Executable]) – the input PopEF
config (Union[RuntimeConfig, PackRunnerConfig]) – the runtime config
- Return type
None
- execute(input, output)
Execute the runner.
- Parameters
input (Union[InputMemoryView, Dict[str, ndarray]]) –
output (Union[OutputMemoryView, Dict[str, ndarray]]) –
- Return type
None
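A run sketch, assuming the model.popef exported above; the tensor names, shapes, and dtypes are placeholders and must match the actual PopEF inputs and outputs:

```python
import numpy as np

from poprt import runtime

runner = runtime.Runner("model.popef")

# Preallocate input and output buffers; execute() fills the outputs in place.
inputs = {"input": np.ones([1, 3, 224, 224], dtype=np.float16)}
outputs = {"output": np.zeros([1, 1000], dtype=np.float16)}

runner.execute(inputs, outputs)
print(outputs["output"])
```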
- class poprt.runtime.DeviceManager(self: poprt._runtime.DeviceManager) → None
Device Manager.
- Return type
None
- get_device(num_ipus)
Get a Device with the specified number of IPUs.
- Parameters
num_ipus (int) – the number of IPUs
- Return type
Device
- get_num_devices()
Get the number of Devices.
- Return type
int
- ipu_hardware_version()
Get the IPU hardware version.
- ipu21: C600 cards
- ipu2: mk2/Bow cards
- Return type
str
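A device-discovery sketch using only the methods documented above:

```python
from poprt import runtime

dm = runtime.DeviceManager()

print("available devices:", dm.get_num_devices())
print("hardware version:", dm.ipu_hardware_version())  # e.g. ipu21 or ipu2

# Acquire a device with a single IPU.
device = dm.get_device(1)
```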
7.4. poprt.frontend module
- class poprt.frontend.OnnxFrontend(path, **kwargs)
Onnx Frontend.
- Parameters
path (str) – input model path
- Return type
None
- get_onnx_name(dir_or_name)
Filter out non-ONNX files.
- Parameters
dir_or_name (str) – directory or file name to search for ONNX files
- Returns
The ONNX model if there is exactly one ONNX file; otherwise an error is raised.
- Return type
Optional[str]
- load_model()
Load ONNX Model.
- Parameters
dir_or_name – directory or name of the model; if a directory, it must contain exactly one model
- Returns
ONNX Model
- Return type
ModelProto
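A loading sketch; the directory path is a placeholder and must contain exactly one ONNX file:

```python
from poprt.frontend import OnnxFrontend

frontend = OnnxFrontend("./model_dir")
model = frontend.load_model()  # returns an onnx.ModelProto
```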
- class poprt.frontend.TensorflowFrontend(path, *, saved_model=True, signature_def='', tag='', opset=11, inputs_as_nchw=None, outputs_as_nchw=None, input_shape=None, outputs=None, **kwargs)
TensorFlow Frontend.
- Parameters
path (str) – input model path
saved_model (bool) – whether the input is a TensorFlow saved_model
signature_def (str) – the signature_def from the saved_model to use
tag (str) – the tag to use for the saved_model
opset (int) – the opset version to use for the ONNX domain in the TensorFlow frontend
inputs_as_nchw (str) – transpose inputs from NHWC to NCHW
outputs_as_nchw (str) – transpose outputs from NHWC to NCHW
input_shape (Dict) –
outputs (str) – model output names (optional for saved_model)
- Return type
None
- load_model()
Load a TensorFlow model and convert it to an ONNX ModelProto.
- Return type
ModelProto
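A conversion sketch for a TensorFlow saved_model; the path, tag, and tensor names are placeholders:

```python
from poprt.frontend import TensorflowFrontend

frontend = TensorflowFrontend(
    "./saved_model_dir",
    saved_model=True,
    tag="serve",
    opset=11,
    inputs_as_nchw="input:0",
)
onnx_model = frontend.load_model()  # returns an onnx.ModelProto
```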
7.5. poprt.backends module
- class poprt.backends.Backend(path_or_bytes, *, export_popef=None, compiler_options=CompilerOptions(), runtime_options=RuntimeConfig(), align_output_dtype=False, logger=None)
PopRT Backend.
- Parameters
path_or_bytes (Union[AnyStr, IO[bytes], onnx.ModelProto]) – input onnx model
export_popef (str) – target PopEF export path
compiler_options (compiler.CompilerOptions) – compiler options, see poprt.compiler.CompilerOptions
runtime_options (runtime.AnyConfig) – runtime options, see poprt.runtime.RuntimeConfig
align_output_dtype (bool) – flag to align the output dtype with the ONNX model. Backend.run also has an align_output_dtype parameter; the effective value is True if either of them is set to True.
logger (logging.Logger) – custom logger
- Return type
None
- get_io_info()
Return meta information of the inputs/outputs, including dtype, name, and shape.
- Return type
tuple[Dict[str, Any], Dict[str, Any]]
- run(output_names, inputs, align_output_dtype=False)
Run the Model.
- Parameters
output_names (List[str]) – output tensor names
inputs (Dict[str, ndarray]) – input tensor data
align_output_dtype (bool) – flag to align output dtype based on the onnx model
- Return type
List[ndarray]
- set_opaque_blobs()
Pass dynamic input anchor info to pack.
- Return type
None
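An end-to-end sketch that compiles and runs the model in one object; the model path and tensor names are placeholders:

```python
import numpy as np

from poprt.backends import Backend

backend = Backend("model.onnx", export_popef="model.popef")

# Inspect the input/output metadata (dtype, name, shape).
in_info, out_info = backend.get_io_info()

inputs = {"input": np.ones([1, 3, 224, 224], dtype=np.float32)}
results = backend.run(["output"], inputs)
```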
- class poprt.backends.ORTBackend(path_or_bytes, sess_options=None, providers=None, provider_options=None, lazy_load=False, **kwargs)
Bases: Backend
An onnxruntime.InferenceSession API-compatible Backend.
- Parameters
path_or_bytes – input onnx model
sess_options – onnxruntime.InferenceSession compatible API, not used
providers – onnxruntime.InferenceSession compatible API, not used
provider_options – onnxruntime.InferenceSession compatible API, not used
lazy_load – ORTBackend loads the ONNX model by default; set to True to prevent it
**kwargs – see poprt.Backend for more args
- Return type
None
- run(output_names, input_feed, run_options=None)
Run the Model.
- Parameters
output_names – output tensor names
input_feed – input tensor data
run_options – onnxruntime.InferenceSession compatible API
- Return type
List[ndarray]
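A drop-in-replacement sketch mirroring onnxruntime.InferenceSession usage; the model path and tensor names are placeholders:

```python
import numpy as np

from poprt.backends import ORTBackend

sess = ORTBackend("model.onnx")
results = sess.run(
    ["output"],
    {"input": np.ones([1, 3, 224, 224], dtype=np.float32)},
)
```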
7.6. poprt.quantizer module
- poprt.quantizer.quantize(onnx_model, input_model, output_dir, data_preprocess=None, precision='fp8', quantize_loss_type='kld', num_of_layers_keep_fp16=0, options=None)
Quantize the model according to the given strategy. Currently, only SimpleQuantizer is supported.
- Parameters
onnx_model (ModelProto) – onnx ModelProto
input_model (str) – the original model
data_preprocess (Optional[str]) – path of a pickle-format file for data preprocessing; the storage format is {input_name_1: ndarray_1, input_name_2: ndarray_2, …}
precision (typing_extensions.Literal[fp8, fp8_weight]) – convert the model to the specified type
output_dir (str) – the output dir
options (Optional[Dict[str, Any]]) – options
quantize_loss_type (str) –
num_of_layers_keep_fp16 (int) –
- Returns
A quantized onnx ModelProto
- Return type
ModelProto
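A quantization sketch; the model path, output directory, and calibration pickle are placeholders, with the pickle storing {input_name: ndarray} entries as described above:

```python
import onnx

from poprt.quantizer import quantize

model = onnx.load("model.onnx")

quantized = quantize(
    model,
    input_model="model.onnx",
    output_dir="./quantized",
    data_preprocess="calib_data.pkl",
    precision="fp8",
    quantize_loss_type="kld",
)
onnx.save(quantized, "./quantized/model_fp8.onnx")
```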
- class poprt.quantizer.FP8Quantizer(output_dir, loss_type, data_preprocess=None, precision='fp8', num_of_layers_keep_fp16=0, options=None)
Return the Input Model.
- Parameters
output_dir (str) –
loss_type (str) –
data_preprocess (str) –
precision (typing_extensions.Literal[fp8, fp8_weight]) –
num_of_layers_keep_fp16 (int) –
options (Dict[str, Any]) –
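A construction sketch using only the documented constructor arguments; most workflows go through poprt.quantizer.quantize instead:

```python
from poprt.quantizer import FP8Quantizer

quantizer = FP8Quantizer(
    output_dir="./quantized",
    loss_type="kld",
    precision="fp8",
    num_of_layers_keep_fp16=0,
)
```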