6. Python API
6.1. poprt module
- class poprt.Converter(input_shape=None, convert_version=11, precision='fp32', checkpoints=None, eightbitsio=False, fp16_skip_op_types=None, skip_passes=None, used_passes=[], check=False, disable_fast_norm=False, pack_args=None, fp8_skip_op_names=None, fp8_params=None, quantize=False, enable_insert_remap=False, enable_erf_gelu=False, serialize_matmul=None, serialize_matmul_add=None, remap_mode='after_matmul', max_tensor_size=-1, infer_shape_ahead=False, logger=None)
Convert a general ONNX model to an IPU-friendly ONNX model.
Construct a new Converter.
- Parameters
input_shape (Dict[str, List[int]]) – the shape of inputs.
convert_version (int) – Convert opset to a specific version.
precision (str) – convert the model to a specific precision. Supported precisions: fp32/fp16/fp8.
checkpoints (str) – set output tensor names.
eightbitsio (bool) – enable 8bits io feature.
fp16_skip_op_types (str) – the list of op types that keep fp32 precision in fp16 precision mode.
skip_passes (str) – the list of passes to skip.
used_passes (List[str]) – user specified passes.
disable_fast_norm (bool) – disable the transformation of the layer_norm Op into the fast_norm Op.
pack_args (Dict) – enable packed transformer.
fp8_skip_op_names (str) – The names of Ops that keep fp32/fp16 in fp8 mode, for example ‘Conv_1,Conv_2’.
fp8_params (str) – Set parameters of the fp8 model; the format is ‘input_format,weight_format,input_scale,weight_scale’.
quantize (bool) – whether to use the quantization method.
enable_insert_remap (bool) – Enable insert remap automatically to improve tensor layout.
enable_erf_gelu (bool) – Enable replacing Erf-based Gelu patterns with the Gelu Op.
serialize_matmul (Dict[str, str]) – Enable serializing the MatMul Op to save on-chip memory.
serialize_matmul_add (Dict[str, str]) – Enable serializing MatMul weights and the Add bias along the weights' last dim to save on-chip memory.
remap_mode (str) – The position of remap; supported values are after_matmul and before_add.
max_tensor_size (int) – Max tensor size (in bytes) generated by constant folding; -1 (the default) means no limit is set.
infer_shape_ahead (bool) – Fix the input shape and infer shapes at the beginning.
check (bool) –
logger (Logger) –
- convert(model)
Convert a general ONNX model to an IPU-friendly ONNX model.
- Parameters
model (ModelProto) – An ONNX ModelProto object to be converted.
logger –
- Returns
An ONNX ModelProto object representing the converted model.
- Return type
ModelProto
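A minimal usage sketch (the model path and the input name "input" are hypothetical and must match your model):

import onnx
from poprt import Converter

model = onnx.load("model.onnx")
converter = Converter(
    input_shape={"input": [1, 3, 224, 224]},  # fix the input shape
    precision="fp16",                          # convert the model to fp16
)
converted = converter.convert(model)
onnx.save(converted, "model_ipu.onnx")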
6.2. poprt.compiler module
- class poprt.compiler.Compiler(self: poprt._compiler.Compiler) → None
Compile ONNX model to PopEF.
- Return type
None
- static compile(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = CompilerOptions()) → poprt::compiler::Executable
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
options (CompilerOptions) –
- Return type
Executable
- static compile_and_export(model: str, outputs: List[str], filename: str, options: poprt._compiler.CompilerOptions = CompilerOptions()) → None
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
filename (str) –
options (CompilerOptions) –
- Return type
None
- static compile_and_get_summary_report(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = CompilerOptions(), reset_profile: bool = True) → str
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
options (CompilerOptions) –
reset_profile (bool) –
- Return type
str
- class poprt.compiler.CompilerOptions(self: poprt._compiler.CompilerOptions) → None
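A minimal sketch of compiling a converted model and exporting it as a PopEF file (the model path and the output tensor name are hypothetical):

from poprt.compiler import Compiler, CompilerOptions

options = CompilerOptions()
Compiler.compile_and_export(
    "model_ipu.onnx",  # ONNX model to compile
    ["output"],        # output tensor names
    "model.popef",     # target PopEF file
    options,
)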
6.3. poprt.runtime module
- class poprt.runtime.ModelRunner(popef, config=RuntimeConfig())
Load a PopEF model and execute it.
- Parameters
popef (Union[str, Executable]) – input popef
config (Optional[RuntimeConfig]) – runtime config
- Return type
None
Deprecated since version 1.1.0: This will be removed in 1.2.0. poprt.runtime.ModelRunner has been replaced by poprt.runtime.Runner. For further details, refer to the documentation and the accompanying examples.
- execute(input, output)
Execute the runner.
- Parameters
input (Union[InputMemoryView, Dict[str, ndarray]]) –
output (Union[OutputMemoryView, Dict[str, ndarray]]) –
- Return type
None
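A minimal sketch of executing a PopEF model with ModelRunner (deprecated in favour of poprt.runtime.Runner); tensor names, shapes and dtypes are hypothetical and must match the compiled model:

import numpy as np
from poprt import runtime

runner = runtime.ModelRunner("model.popef")
inputs = {"input": np.ones([1, 3, 224, 224], dtype=np.float16)}
outputs = {"output": np.zeros([1, 1000], dtype=np.float16)}  # preallocated
runner.execute(inputs, outputs)  # results are written into outputs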
- class poprt.runtime.DeviceManager(self: poprt._runtime.DeviceManager) → None
Device Manager.
- Return type
None
- get_device(num_ipus)
Get a device with the requested number of IPUs.
- Parameters
num_ipus (int) – num_ipus
- Return type
Device
- get_num_devices()
Get the number of Devices.
- Return type
int
- ipu_hardware_version()
Get the IPU hardware version.
ipu21: C600 cards
ipu2: mk2/Bow cards
- Return type
str
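A short sketch of querying devices with DeviceManager (assumes at least one IPU is attached to the host):

from poprt import runtime

dm = runtime.DeviceManager()
print(dm.get_num_devices())       # number of available devices
print(dm.ipu_hardware_version())  # 'ipu21' (C600) or 'ipu2' (Mk2/Bow)
device = dm.get_device(1)         # acquire a device with one IPU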
6.4. poprt.backends module
- class poprt.backends.Backend(path_or_bytes, export_popef=None, compiler_options=CompilerOptions(), runtime_options=RuntimeConfig(), align_output_dtype=False, logger=None)
PopRT Backend.
- Parameters
path_or_bytes (Union[AnyStr, onnx.ModelProto]) – input onnx model
export_popef (str) – target PopEF export path
compiler_options (compiler.CompilerOptions) – compiler options, see poprt.compiler.CompilerOptions
runtime_options (runtime.RuntimeConfig) – runtime options, see poprt.runtime.RuntimeConfig
align_output_dtype (bool) – flag to align output dtype based on the onnx model. Backend.run also has the parameter align_output_dtype; the value will be True if either of them is set to True.
logger (logging.Logger) – custom logger
- Return type
None
- get_io_info()
Return meta info of inputs/outputs, including dtype, name and shape.
- Return type
tuple[Dict[str, Any], Dict[str, Any]]
- run(output_names, inputs, align_output_dtype=False)
Run the Model.
- Parameters
output_names (List[str]) – output tensor names
inputs (Dict[str, ndarray]) – input tensor data
align_output_dtype (bool) – flag to align output dtype based on the onnx model
- Return type
List[ndarray]
- set_opaque_blobs()
Pass dynamic input anchor info to pack.
- Return type
None
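A minimal end-to-end sketch using Backend (compilation to PopEF is handled internally; the model path and tensor names are hypothetical):

import numpy as np
from poprt.backends import Backend

backend = Backend("model.onnx")
input_info, output_info = backend.get_io_info()
outputs = backend.run(
    ["output"],                                              # output names
    {"input": np.ones([1, 3, 224, 224], dtype=np.float32)},  # input data
)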
- class poprt.backends.ORTBackend(path_or_bytes, sess_options=None, providers=None, provider_options=None, lazy_load=False, **kwargs)
Bases: Backend
An onnxruntime.InferenceSession API compatible Backend.
- Parameters
path_or_bytes – input onnx model
sess_options – onnxruntime.InferenceSession compatible API, not used
providers – onnxruntime.InferenceSession compatible API, not used
provider_options – onnxruntime.InferenceSession compatible API, not used
lazy_load – ORTBackend loads the ONNX model by default; set lazy_load to True to prevent it
**kwargs – see poprt.Backend for more args
- Return type
None
- run(output_names, input_feed, run_options=None)
Run the Model.
- Parameters
output_names – output tensor names
input_feed – input tensor data
run_options –
- Return type
List[ndarray]
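Since ORTBackend mirrors the onnxruntime.InferenceSession API, existing onnxruntime code can switch over with minimal changes; a sketch with hypothetical names:

import numpy as np
from poprt.backends import ORTBackend

sess = ORTBackend("model.onnx")  # in place of onnxruntime.InferenceSession
outputs = sess.run(
    ["output"],
    {"input": np.ones([1, 3, 224, 224], dtype=np.float32)},
)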
6.5. poprt.quantizer module
- poprt.quantizer.quantize(onnx_model, input_model, output_dir, data_preprocess=None, quantize_loss_type='kld', num_of_layers_keep_fp16=0, options=None)
Quantize the model according to the given strategy. Currently, only SimpleQuantizer is supported.
- Parameters
onnx_model (ModelProto) – onnx ModelProto
input_model (str) – the origin model
data_preprocess (Optional[str]) – path of a pickle-format file for data preprocessing; the storage format is {input_name_1: ndarray_1, input_name_2: ndarray_2, …}
output_dir (str) – the output dir
options (Optional[Dict[str, Any]]) – options
quantize_loss_type (str) –
num_of_layers_keep_fp16 (int) –
- Returns
A quantized onnx ModelProto
- Return type
ModelProto
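A sketch of calling quantize (the paths and the calibration pickle are hypothetical; the pickle stores {input_name: ndarray, …}):

import onnx
from poprt.quantizer import quantize

model = onnx.load("model.onnx")
quantized = quantize(
    model,              # onnx ModelProto
    "model.onnx",       # the origin model
    "./quant_out",      # output dir
    data_preprocess="calib_data.pickle",
    quantize_loss_type="kld",
)
onnx.save(quantized, "quant_out/model_quantized.onnx")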
- class poprt.quantizer.FP8Quantizer(output_dir, loss_type, data_preprocess=None, num_of_layers_keep_fp16=0, options=None)
Return the Input Model.
- Parameters
output_dir (str) –
loss_type (str) –
data_preprocess (str) –
num_of_layers_keep_fp16 (int) –
options (Dict[str, Any]) –