7. Python API
7.1. poprt module
- class poprt.Converter(*, input_shape=None, convert_version=11, precision='fp32', checkpoints=None, eightbitsio=False, fp16_skip_op_types=None, skip_passes=None, used_passes=[], check=False, disable_fast_norm=False, pack_args=None, fp8_skip_op_names=None, fp8_params='F143, F143, 0, 0', quantize=False, enable_insert_remap=False, enable_erf_gelu=False, serialize_matmul=None, serialize_matmul_add=None, remap_mode='after_matmul', max_tensor_size=-1, infer_shape_ahead=False, enable_avoid_overflow_patterns=False, disable_progress_bar=False, logger=<Logger poprt (WARNING)>)
Convert a general ONNX model to an IPU-friendly ONNX model.
Construct a new Converter.
- Parameters
input_shape (Dict[str, List[int]]) – the shape of inputs.
convert_version (int) – Convert opset to a specific version.
precision (str) – convert the model to a specific precision. Supported precisions: fp32/fp16/fp8.
checkpoints (str) – set output tensor names.
eightbitsio (bool) – enable the 8-bit IO feature.
fp16_skip_op_types (str) – the list of op types that keep fp32 precision in fp16 mode.
skip_passes (str) – the list of passes to skip.
used_passes (List[str]) – user-specified passes.
disable_fast_norm (bool) – disable the conversion of the layer_norm op to the fast_norm op.
pack_args (Dict) – enable packed transformer.
fp8_skip_op_names (str) – the op names that keep fp32/fp16 in fp8 mode, for example 'Conv_1,Conv_2'.
fp8_params (str) – set parameters of the fp8 model; the format is 'input_format,weight_format,input_scale,weight_scale'.
quantize (bool) – whether to use quantization method.
enable_insert_remap (bool) – automatically insert remap ops to improve tensor layout.
enable_erf_gelu (bool) – replace Erf-based Gelu patterns with the Gelu op.
serialize_matmul (Dict[str, str]) – serialize MatMul ops to save on-chip memory.
serialize_matmul_add (Dict[str, str]) – serialize MatMul weights and Add bias along the weights' last dimension to save on-chip memory.
remap_mode (str) – the position of the remap; supported values are after_matmul and before_add.
max_tensor_size (int) – maximum tensor size (in bytes) generated by constant folding; -1 (the default) means no limit is set.
infer_shape_ahead (bool) – Fix input shape and infer shapes at beginning.
enable_avoid_overflow_patterns (bool) – Enable to keep fp32 for several specific patterns in fp16 model.
check (bool) –
disable_progress_bar (bool) –
logger (Logger) –
- convert(model)
Convert a general ONNX model to an IPU-friendly ONNX model.
- Parameters
model (ModelProto) – an ONNX ModelProto object to be converted.
logger –
- Returns
An ONNX ModelProto object representing the converted model.
- Return type
ModelProto
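A minimal usage sketch based on the constructor and convert() documented above; the model path, input name, and shape are placeholders:

```python
import onnx

from poprt import Converter

# Load a general ONNX model (placeholder path).
model = onnx.load("model.onnx")

# Convert to an IPU-friendly fp16 model with a fixed input shape.
converter = Converter(
    input_shape={"input": [1, 3, 224, 224]},
    precision="fp16",
)
converted = converter.convert(model)

onnx.save(converted, "model_ipu.onnx")
```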
7.2. poprt.compiler module
- class poprt.compiler.Compiler(self: poprt._compiler.Compiler) → None
Compile ONNX model to PopEF.
- Return type
None
- static compile(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = CompilerOptions()) → Executable
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
options (CompilerOptions) –
- Return type
Executable
- static compile_and_export(model: str, outputs: List[str], filename: str, options: poprt._compiler.CompilerOptions = CompilerOptions()) → None
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
filename (str) –
options (CompilerOptions) –
- Return type
None
- static compile_and_get_summary_report(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = CompilerOptions(), reset_profile: bool = True) → str
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
options (CompilerOptions) –
reset_profile (bool) –
- Return type
str
- class poprt.compiler.CompilerOptions(self: poprt._compiler.CompilerOptions) → None
- Return type
None
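A sketch of compiling a converted model to PopEF with the static methods documented above; the model path, output tensor name, and PopEF filename are placeholders:

```python
from poprt.compiler import Compiler, CompilerOptions

options = CompilerOptions()

# Compile the ONNX model and export the resulting PopEF to disk.
Compiler.compile_and_export(
    "model_ipu.onnx",
    outputs=["output"],
    filename="model.popef",
    options=options,
)
```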
7.3. poprt.runtime module
- class poprt.runtime.Runner(popef, config=None)
Load a PopEF model and execute it.
- Parameters
popef (Union[str, Executable]) – the input PopEF
config (Union[RuntimeConfig, PackRunnerConfig]) – the runtime config
- Return type
None
- execute(input, output)
Execute the runner.
- Parameters
input (Union[InputMemoryView, Dict[str, ndarray]]) –
output (Union[OutputMemoryView, Dict[str, ndarray]]) –
- Return type
None
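A run sketch, assuming the model.popef exported above; the tensor names, shapes, and dtypes are placeholders and must match the actual PopEF inputs and outputs:

```python
import numpy as np

from poprt import runtime

runner = runtime.Runner("model.popef")

# Preallocate input and output buffers; execute() fills the outputs in place.
inputs = {"input": np.ones([1, 3, 224, 224], dtype=np.float16)}
outputs = {"output": np.zeros([1, 1000], dtype=np.float16)}

runner.execute(inputs, outputs)
print(outputs["output"])
```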
- class poprt.runtime.DeviceManager(self: poprt._runtime.DeviceManager) → None
Device Manager.
- Return type
None
- get_device(num_ipus)
Get a Device with the specified number of IPUs.
- Parameters
num_ipus (int) – the number of IPUs
- Return type
Device
- get_num_devices()
Get the number of Devices.
- Return type
int
- ipu_hardware_version()
Get the IPU hardware version.
- ipu21: C600 cards
- ipu2: mk2/Bow cards
- Return type
str
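A device-discovery sketch using only the methods documented above:

```python
from poprt import runtime

dm = runtime.DeviceManager()

print("available devices:", dm.get_num_devices())
print("hardware version:", dm.ipu_hardware_version())  # e.g. ipu21 or ipu2

# Acquire a device with a single IPU.
device = dm.get_device(1)
```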
7.4. poprt.frontend module
- class poprt.frontend.OnnxFrontend(path, **kwargs)
Onnx Frontend.
- Parameters
path (str) – input model path
- Return type
None
- get_onnx_name(dir_or_name)
Filter out non-ONNX files.
- Parameters
dir_or_name (str) – directory or file name to search for ONNX files
- Returns
The ONNX model if there is exactly one ONNX file; otherwise an error is raised.
- Return type
Optional[str]
- load_model()
Load ONNX Model.
- Parameters
dir_or_name – directory or name of the model; if a directory, it must contain exactly one model
- Returns
ONNX Model
- Return type
ModelProto
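A loading sketch; the directory path is a placeholder and must contain exactly one ONNX file:

```python
from poprt.frontend import OnnxFrontend

frontend = OnnxFrontend("./model_dir")
model = frontend.load_model()  # returns an onnx.ModelProto
```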
- class poprt.frontend.TensorflowFrontend(path, *, saved_model=True, signature_def='', tag='', opset=11, inputs_as_nchw=None, outputs_as_nchw=None, input_shape=None, outputs=None, **kwargs)
TensorFlow Frontend.
- Parameters
path (str) – input model path
saved_model (bool) – whether the input is a TensorFlow saved_model
signature_def (str) – the signature_def from the saved_model to use
tag (str) – the tag to use for the saved_model
opset (int) – the opset version to use for the ONNX domain in the TensorFlow frontend
inputs_as_nchw (str) – transpose inputs from NHWC to NCHW
outputs_as_nchw (str) – transpose outputs from NHWC to NCHW
input_shape (Dict) –
outputs (str) – model output names (optional for saved_model)
- Return type
None
- load_model()
Load a TensorFlow model and convert it to an ONNX ModelProto.
- Return type
ModelProto
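A conversion sketch for a TensorFlow saved_model; the path, tag, and tensor names are placeholders:

```python
from poprt.frontend import TensorflowFrontend

frontend = TensorflowFrontend(
    "./saved_model_dir",
    saved_model=True,
    tag="serve",
    opset=11,
    inputs_as_nchw="input:0",
)
onnx_model = frontend.load_model()  # returns an onnx.ModelProto
```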
7.5. poprt.backends module
- class poprt.backends.Backend(path_or_bytes, *, export_popef=None, compiler_options=CompilerOptions(), runtime_options=RuntimeConfig(), align_output_dtype=False, logger=None)
PopRT Backend.
- Parameters
path_or_bytes (Union[AnyStr, IO[bytes], onnx.ModelProto]) – input onnx model
export_popef (str) – target PopEF export path
compiler_options (compiler.CompilerOptions) – compiler options, see poprt.compiler.CompilerOptions
runtime_options (runtime.AnyConfig) – runtime options, see poprt.runtime.RuntimeConfig
align_output_dtype (bool) – flag to align the output dtype with the ONNX model. Backend.run also has an align_output_dtype parameter; the effective value is True if either of them is set to True.
logger (logging.Logger) – custom logger
- Return type
None
- get_io_info()
Return meta information of the inputs/outputs, including dtype, name, and shape.
- Return type
tuple[Dict[str, Any], Dict[str, Any]]
- run(output_names, inputs, align_output_dtype=False)
Run the Model.
- Parameters
output_names (List[str]) – output tensor names
inputs (Dict[str, ndarray]) – input tensor data
align_output_dtype (bool) – flag to align output dtype based on the onnx model
- Return type
List[ndarray]
- set_opaque_blobs()
Pass dynamic input anchor info to pack.
- Return type
None
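An end-to-end sketch that compiles and runs the model in one object; the model path and tensor names are placeholders:

```python
import numpy as np

from poprt.backends import Backend

backend = Backend("model.onnx", export_popef="model.popef")

# Inspect the input/output metadata (dtype, name, shape).
in_info, out_info = backend.get_io_info()

inputs = {"input": np.ones([1, 3, 224, 224], dtype=np.float32)}
results = backend.run(["output"], inputs)
```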
- class poprt.backends.ORTBackend(path_or_bytes, sess_options=None, providers=None, provider_options=None, lazy_load=False, **kwargs)
Bases: Backend
An onnxruntime.InferenceSession API-compatible Backend.
- Parameters
path_or_bytes – input onnx model
sess_options – onnxruntime.InferenceSession compatible API, not used
providers – onnxruntime.InferenceSession compatible API, not used
provider_options – onnxruntime.InferenceSession compatible API, not used
lazy_load – ORTBackend loads the ONNX model by default; set to True to prevent it
**kwargs – see poprt.Backend for more args
- Return type
None
- run(output_names, input_feed, run_options=None)
Run the Model.
- Parameters
output_names – output tensor names
input_feed – input tensor data
run_options – onnxruntime.InferenceSession compatible API
- Return type
List[ndarray]
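A drop-in-replacement sketch mirroring onnxruntime.InferenceSession usage; the model path and tensor names are placeholders:

```python
import numpy as np

from poprt.backends import ORTBackend

sess = ORTBackend("model.onnx")
results = sess.run(
    ["output"],
    {"input": np.ones([1, 3, 224, 224], dtype=np.float32)},
)
```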
7.6. poprt.quantizer module
- poprt.quantizer.quantize(onnx_model, input_model, output_dir, data_preprocess=None, precision='fp8', quantize_loss_type='kld', num_of_layers_keep_fp16=0, options=None)
Quantize the model according to the given strategy. Currently, only SimpleQuantizer is supported.
- Parameters
onnx_model (ModelProto) – onnx ModelProto
input_model (str) – the original model
data_preprocess (Optional[str]) – path of a pickle-format file for data preprocessing; the storage format is {input_name_1: ndarray_1, input_name_2: ndarray_2, …}
precision (typing_extensions.Literal[fp8, fp8_weight]) – convert the model to the specified type
output_dir (str) – the output dir
options (Optional[Dict[str, Any]]) – options
quantize_loss_type (str) –
num_of_layers_keep_fp16 (int) –
- Returns
A quantized onnx ModelProto
- Return type
ModelProto
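A quantization sketch; the model path, output directory, and calibration pickle are placeholders, with the pickle storing {input_name: ndarray} entries as described above:

```python
import onnx

from poprt.quantizer import quantize

model = onnx.load("model.onnx")

quantized = quantize(
    model,
    input_model="model.onnx",
    output_dir="./quantized",
    data_preprocess="calib_data.pkl",
    precision="fp8",
    quantize_loss_type="kld",
)
onnx.save(quantized, "./quantized/model_fp8.onnx")
```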
- class poprt.quantizer.FP8Quantizer(output_dir, loss_type, data_preprocess=None, precision='fp8', num_of_layers_keep_fp16=0, options=None)
Return the Input Model.
- Parameters
output_dir (str) –
loss_type (str) –
data_preprocess (str) –
precision (typing_extensions.Literal[fp8, fp8_weight]) –
num_of_layers_keep_fp16 (int) –
options (Dict[str, Any]) –
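A construction sketch using only the documented constructor arguments; most workflows go through poprt.quantizer.quantize instead:

```python
from poprt.quantizer import FP8Quantizer

quantizer = FP8Quantizer(
    output_dir="./quantized",
    loss_type="kld",
    precision="fp8",
    num_of_layers_keep_fp16=0,
)
```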