6. Python API
6.1. poprt
module
- class poprt.Converter(*, input_shape=None, convert_version=11, precision='fp32', checkpoints=None, eightbitsio=False, fp16_skip_op_types=None, skip_passes=None, used_passes=[], check=False, disable_fast_norm=False, pack_args=None, fp8_skip_op_names=None, fp8_params='F143, F143, 0, 0', quantize=False, enable_insert_remap=False, enable_erf_gelu=False, serialize_matmul=None, serialize_matmul_add=None, merge_matmul=None, merge_matmul_add=None, remap_mode='after_matmul', max_tensor_size=-1, infer_shape_ahead=False, enable_avoid_overflow_patterns=False, disable_progress_bar=False, batch_size=None, batch_axis=None, remove_outputs=[], logger=<Logger poprt (WARNING)>)
Convert a general ONNX model to an IPU-friendly ONNX model.
Construct a new Converter.
- Parameters
input_shape (Dict[str, List[int]]) – the shapes of the input tensors.
convert_version (int) – convert the opset to a specific version.
precision (str) – convert the model to a specific precision. Supported precisions: fp32/fp16/fp8.
checkpoints (str) – set output tensor names.
eightbitsio (bool) – enable the 8-bit I/O feature.
fp16_skip_op_types (str) – the list of op types which will keep fp32 precision in fp16 mode.
skip_passes (str) – the list of passes to skip.
used_passes (List[str]) – user-specified passes.
disable_fast_norm (bool) – disable the conversion of layer_norm ops to fast_norm ops.
pack_args (Dict) – enable the packed transformer.
fp8_skip_op_names (str) – the op names which will keep fp32/fp16 in fp8 mode, such as 'Conv_1,Conv_2'.
fp8_params (str) – set the parameters of the fp8 model; the format is 'input_format,weight_format,input_scale,weight_scale'.
quantize (bool) – whether to use the quantization method.
enable_insert_remap (bool) – automatically insert remap ops to improve tensor layout.
enable_erf_gelu (bool) – replace Erf-based Gelu patterns with the Gelu op.
serialize_matmul (Dict[str, str]) – serialize MatMul ops to save on-chip memory.
serialize_matmul_add (Dict[str, str]) – serialize MatMul weights and Add bias along the weights' last dim to save on-chip memory.
merge_matmul (str) – merge MatMul operations along the last dim to save cycles.
merge_matmul_add (str) – merge MatMul/Add operations along the last dim to save cycles.
remap_mode (str) – the position of the remap; supported values are after_matmul and before_add.
max_tensor_size (int) – the maximum tensor size (in bytes) generated by constant_folding; -1 (the default) means no limit.
infer_shape_ahead (bool) – fix the input shapes and run shape inference at the beginning.
batch_size (int) – set the batch size for all inputs, working with the batch_axis parameter.
batch_axis (int) – specify the batch axis for all inputs, working with the batch_size parameter.
remove_outputs (List[str]) – remove the specified outputs and the resulting dead structures from the graph.
check (bool) –
enable_avoid_overflow_patterns (bool) –
disable_progress_bar (bool) –
logger (Logger) –
- convert(model)
Convert a general ONNX model to an IPU-friendly ONNX model.
- Parameters
model (ModelProto) – an ONNX ModelProto object to be converted.
- Returns
An ONNX ModelProto object representing the converted model.
- Return type
ModelProto
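Example of a minimal conversion flow (a sketch; the file name model.onnx and the input name 'input' are hypothetical):

import onnx

from poprt import Converter

# Load a general ONNX model, convert it to an IPU-friendly fp16 model,
# then save the result.
model = onnx.load('model.onnx')
converter = Converter(input_shape={'input': [1, 3, 224, 224]}, precision='fp16')
converted = converter.convert(model)
onnx.save(converted, 'converted.onnx')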
6.2. poprt.compiler
module
- class poprt.compiler.Compiler(self: poprt._compiler.Compiler) → None
Compile ONNX model to PopEF.
- Return type
None
- static compile(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = <poprt._compiler.CompilerOptions object>) → Executable
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
options (CompilerOptions) –
- Return type
Executable
- static compile_and_export(model: str, outputs: List[str], filename: str, options: poprt._compiler.CompilerOptions = <poprt._compiler.CompilerOptions object>) → None
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
filename (str) –
options (CompilerOptions) –
- Return type
None
- static compile_and_get_summary_report(model: str, outputs: List[str], options: poprt._compiler.CompilerOptions = <poprt._compiler.CompilerOptions object>, reset_profile: bool = True) → str
- Parameters
model (Union[AnyStr, ModelProto]) –
outputs (List[str]) –
options (CompilerOptions) –
reset_profile (bool) –
- Return type
str
- class poprt.compiler.CompilerOptions(self: poprt._compiler.CompilerOptions) → None
- Return type
None
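Example of compiling a converted model (a sketch; the file names and the output name 'output' are hypothetical):

from poprt.compiler import Compiler, CompilerOptions

options = CompilerOptions()
# Compile in memory and keep the executable object:
exe = Compiler.compile('converted.onnx', outputs=['output'], options=options)
# Or compile and export a PopEF file in one step:
Compiler.compile_and_export('converted.onnx', outputs=['output'],
                            filename='executable.popef', options=options)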
6.3. poprt.runtime
module
- class poprt.runtime.Runner(popef, config=None)
Load a PopEF model and execute it.
- Parameters
popef (Union[str, Executable]) – the input PopEF, as a file path or an Executable
config (Union[RuntimeConfig, PackRunnerConfig]) – the runtime config
- Return type
None
- execute(input, output)
Execute the runner; results are written into the arrays in output.
- Parameters
input (Dict[str, ndarray]) –
output (Dict[str, ndarray]) –
- Return type
None
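Example of running a PopEF file (a sketch; the tensor names, shapes and dtypes are hypothetical):

import numpy as np

from poprt import runtime

runner = runtime.Runner('executable.popef')
# Pre-allocate the output buffers; execute writes the results into them.
inputs = {'input': np.zeros([1, 3, 224, 224], dtype=np.float16)}
outputs = {'output': np.zeros([1, 1000], dtype=np.float16)}
runner.execute(inputs, outputs)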
- class poprt.runtime.DeviceManager(self: poprt._runtime.DeviceManager) → None
Device Manager.
- Return type
None
- get_device(num_ipus)
Get a device with the given number of IPUs.
- Parameters
num_ipus (int) – the number of IPUs to acquire
- Return type
Device
- get_num_devices()
Get the number of devices.
- Return type
int
- get_specific_device(device_id)
Get a specific device by its ID.
- Parameters
device_id (int) – target device id
- Return type
Device
- ipu_hardware_version()
Get the IPU hardware version.
ipu21: C600 cards
ipu2: mk2/Bow cards
- Return type
str
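Example of querying devices (a sketch):

from poprt import runtime

dm = runtime.DeviceManager()
print(dm.get_num_devices())       # total number of attached devices
print(dm.ipu_hardware_version())  # e.g. 'ipu21' on C600 cards
device = dm.get_device(1)         # acquire a device with 1 IPU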
6.4. poprt.frontend
module
- class poprt.frontend.OnnxFrontend(path, **kwargs)
ONNX Frontend.
- Parameters
path (str) – input model path
- Return type
None
- get_onnx_name(dir_or_name)
Filter out non-ONNX files.
- Parameters
dir_or_name (str) – directory or name of the ONNX file(s) to search
- Returns
The ONNX model file if there is exactly one; otherwise an error is raised.
- Return type
Optional[str]
- load_model()
Load ONNX Model.
- Parameters
dir_or_name – directory or name of the model. If a directory, it should contain only one model.
- Returns
ONNX Model
- Return type
ModelProto
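Example (a sketch; the file name model.onnx is hypothetical):

from poprt.frontend import OnnxFrontend

frontend = OnnxFrontend('model.onnx')
model = frontend.load_model()  # returns an ONNX ModelProto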
- class poprt.frontend.TensorflowFrontend(path, *, saved_model=True, signature_def='', tag='', opset=11, inputs_as_nchw=None, outputs_as_nchw=None, input_shape=None, outputs=None, **kwargs)
TensorFlow Frontend.
- Parameters
path (str) – input model path
saved_model (bool) – whether the input is a TF saved_model
signature_def (str) – the signature_def from the saved_model to use
tag (str) – the tag to use for the saved_model
opset (int) – the opset version to use for the ONNX domain in the TF frontend
inputs_as_nchw (str) – transpose the named inputs from NHWC to NCHW
outputs_as_nchw (str) – transpose the named outputs from NHWC to NCHW
input_shape (Dict) – the shapes of the input tensors
outputs (str) – model output names (optional for saved_model)
- Return type
None
- load_model()
Load the TensorFlow model and convert it to an ONNX ModelProto.
- Return type
ModelProto
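Example of converting a TensorFlow saved_model (a sketch; the directory and the tensor name 'input:0' are hypothetical):

from poprt.frontend import TensorflowFrontend

frontend = TensorflowFrontend(
    './saved_model',
    saved_model=True,
    opset=11,
    input_shape={'input:0': [1, 224, 224, 3]},
)
model = frontend.load_model()  # ONNX ModelProto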
6.5. poprt.backends
module
- class poprt.backends.Backend(path_or_bytes, *, export_popef=None, compiler_options=<poprt.compiler.CompilerOptions object>, runtime_options=poprt.runtime.RuntimeConfig{deviceWaitConfig=poprt.runtime.DeviceWaitConfig{timeoutSec=1, sleepTimeSec=6}, timeoutNS=10000000, threadSafe=True, validateIOParams=True, batchingDim=4294967295, checkPackageHash=True, ringBufferSizeMultiplier=2, autoReset=False, flushOnWaitingOutputs=False, batchSizeTimeoutNS=9223372036854775807, dataParallelTimeoutNS=9223372036854775807, isBatchSizeTimeoutEnabled=False, requestTracepointsBufferSize=1000, }, align_output_dtype=False, logger=None)
PopRT Backend.
- Parameters
path_or_bytes (Union[AnyStr, IO[bytes], onnx.ModelProto]) – input onnx model
export_popef (str) – target PopEF export path
compiler_options (compiler.CompilerOptions) – compiler options, see poprt.compiler.CompilerOptions
runtime_options (runtime.AnyConfig) – runtime options, see poprt.runtime.RuntimeConfig
align_output_dtype (bool) – flag to align the output dtype with the onnx model. Backend.run also has an align_output_dtype parameter; the effective value is True if either of them is set to True.
logger (logging.Logger) – custom logger
- Return type
None
- get_io_info()
Return meta info of the inputs/outputs, including dtype, name and shape.
- Return type
tuple[Dict[str, Any], Dict[str, Any]]
- run(output_names, inputs, align_output_dtype=False)
Run the Model.
- Parameters
output_names (List[str]) – output tensor names
inputs (Dict[str, ndarray]) – input tensor data
align_output_dtype (bool) – flag to align output dtype based on the onnx model
- Return type
List[ndarray]
- set_opaque_blobs()
Pass dynamic input anchor info to pack.
- Return type
None
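Example of compiling and running a model through the Backend (a sketch; the tensor names, shapes and dtypes are hypothetical):

import numpy as np

from poprt.backends import Backend

backend = Backend('model.onnx')
# Discover the input/output meta info, then run one inference.
input_info, output_info = backend.get_io_info()
results = backend.run(['output'],
                      {'input': np.zeros([1, 3, 224, 224], dtype=np.float16)})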
- class poprt.backends.ORTBackend(path_or_bytes, sess_options=None, providers=None, provider_options=None, lazy_load=False, **kwargs)
Bases: Backend
An onnxruntime.InferenceSession API compatible Backend.
- Parameters
path_or_bytes – input onnx model
sess_options – onnxruntime.InferenceSession compatible API, not used
providers – onnxruntime.InferenceSession compatible API, not used
provider_options – onnxruntime.InferenceSession compatible API, not used
lazy_load – ORTBackend loads the ONNX model on construction by default; set to True to prevent it
**kwargs – see poprt.Backend for more args
- Return type
None
- run(output_names, input_feed, run_options=None)
Run the Model.
- Parameters
output_names – output tensor names
input_feed – input tensor data
- Return type
List[ndarray]
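Example of the onnxruntime-compatible interface (a sketch; the tensor names, shapes and dtypes are hypothetical):

import numpy as np

from poprt.backends import ORTBackend

sess = ORTBackend('model.onnx')
results = sess.run(['output'],
                   {'input': np.zeros([1, 3, 224, 224], dtype=np.float16)})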
6.6. poprt.quantizer
module
- poprt.quantizer.quantize(onnx_model, input_model, output_dir, data_preprocess=None, precision='fp8', quantize_loss_type='kld', num_of_layers_keep_fp16=0, options=None)
Quantize the model according to the quantization strategy. Currently, only SimpleQuantizer is supported.
- Parameters
onnx_model (ModelProto) – onnx ModelProto
input_model (str) – the original model
data_preprocess (Optional[str]) – path of pickle format file for data preprocessing, the storage format is {input_name_1: ndarray_1, input_name_2: ndarray_2, …}
precision (typing_extensions.Literal[fp8, fp8_weight]) – convert the model to the specified type
output_dir (str) – the output directory
options (Optional[Dict[str, Any]]) – options
quantize_loss_type (str) –
num_of_layers_keep_fp16 (int) –
- Returns
A quantized onnx ModelProto
- Return type
ModelProto
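Example of fp8 quantization (a sketch; the file names are hypothetical, and calib_data.pkl stores calibration data in the {input_name: ndarray} pickle format described above):

import onnx

from poprt.quantizer import quantize

model = onnx.load('model.onnx')
quantized = quantize(
    model,
    input_model='model.onnx',
    output_dir='./quantized',
    data_preprocess='calib_data.pkl',
    precision='fp8',
)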
- class poprt.quantizer.FP8Quantizer(output_dir, loss_type, data_preprocess=None, precision='fp8', num_of_layers_keep_fp16=0, options=None)
Return the Input Model.
- Parameters
output_dir (str) –
loss_type (str) –
data_preprocess (str) –
precision (typing_extensions.Literal[fp8, fp8_weight]) –
num_of_layers_keep_fp16 (int) –
options (Dict[str, Any]) –
6.7. poprt.passes
module
- class poprt.Pass(*args, **kwargs)
Abstract Base Class for Passes.
A new Pass could be like:
import onnx

from poprt.passes import register, Pass


@register('dummy_pass')
class Dummy(Pass):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

    def run(self, onnx_model: onnx.ModelProto) -> onnx.ModelProto:
        print(f"producer_name: {onnx_model.producer_name}")
        return onnx_model
- Return type
None
- static against_passes(pass_names)
Register the against property for a Pass. A Pass can't work with its against passes.
- Parameters
pass_names (List[str]) –
- Return type
Callable[[Any], PassReg]
- static constraint_passes(constraint_name, pass_names)
Register constraints for a Pass.
Valid constraints are against, depend, and before.
- Parameters
constraint_name (str) –
pass_names (List[str]) –
- Return type
Callable[[Any], PassReg]
- static get_pass(name, *args, **kwargs)
Get a Pass by registered name.
- Parameters
name (str) – registered name of a Pass
- Returns
a Pass instance
- Return type
Pass
Example:
import poprt

# get a Pass with parameters
onnx_model = poprt.get_pass('float_to_half', skip_op_types=['Gelu'])(onnx_model)
poprt.get_pass('model_overview')(onnx_model)
- static get_registered_passes()
Get all registered Passes.
- Return type
Dict[str, PassReg]
- static get_typed_registered_passes(pass_type)
Get typed registered Passes.
- Parameters
pass_type (Any) –
- Return type
Dict[str, Pass]
- static property_register(k, v)
Register a property for Pass.
- Parameters
k (str) –
v (Any) –
- Return type
Callable[[Any], PassReg]
- static register_pass(pass_name)
Register a Pass.
- Parameters
pass_name (str) –
- Return type
Callable[[Any], PassReg]
- run(onnx_model)
Run the Pass; inheriting subclasses should override this method.
- Parameters
onnx_model (ModelProto) – input onnx model
- Returns
the optimized onnx model
- Return type
ModelProto
- traverse_graph(graph, transform, is_main_graph=True)
Traverse a GraphProto and transform it and its subgraphs.
- Parameters
graph (GraphProto) – Input Graph.
transform (Callable[[GraphProto, bool], GraphProto]) – Transform function.
is_main_graph (bool) –
- Return type
GraphProto
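A sketch of a transform usable with traverse_graph; the node-counting logic is illustrative only:

import onnx

def count_nodes(graph: onnx.GraphProto, is_main_graph: bool) -> onnx.GraphProto:
    # Called for the main graph and recursively for every subgraph.
    scope = 'main graph' if is_main_graph else 'subgraph'
    print(f'{scope}: {len(graph.node)} nodes')
    return graph

# Inside a Pass subclass:
#     self.traverse_graph(onnx_model.graph, count_nodes)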
- class poprt.PassManager(used_passes=[], gather_ir_passes=False)
Manage Passes.
- Parameters
used_passes (List[Union[str, Pass]]) – passes that will be used
gather_ir_passes (bool) – gather the ONNX IR passes and execute them in one turn.
- Return type
None
Example:
import poprt

pm = poprt.PassManager(
    [
        'model_overview',
        'float_to_half',
        poprt.get_pass('model_overview'),
    ]
)
pm.run(onnx_model)
- add_passes(used_passes=[], gather_ir_passes=False)
Add passes for PassManager.
- Parameters
used_passes (List[Union[str, Pass]]) – passes that will be used
gather_ir_passes (bool) – gather the ONNX IR passes and execute them in one turn.
- Return type
None
- get_all_pass_names()
Get all Pass names.
- Return type
List[str]
- run(onnx_model)
Apply passes to the onnx model.
- Parameters
onnx_model (ModelProto) – onnx model that will be optimized
- Return type
ModelProto
- sort_passes()
Resolve Pass dependencies.
6.7.1. Built-in passes
Refer to Section 5.1, Passes for more details about passes.
- class poprt.passes.add_checkpoints.AddCheckpoints(checkpoints)
Add intermediate tensor to output.
- Parameters
checkpoints (List[str]) –
- Return type
None
This pass is registered as add_checkpoints.
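Example of applying this pass via poprt.get_pass (a sketch; the tensor name 'Gemm_1_out' is hypothetical):

import poprt

onnx_model = poprt.get_pass('add_checkpoints', checkpoints=['Gemm_1_out'])(onnx_model)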
- class poprt.passes.apply_host_concat_split.ApplyHostConcatSplit(merged_inputs)
Merge model inputs with the same shape.
NOTE: this is an experimental feature. Only tested on merging 2D inputs.
For example, an ONNX graph has 3 inputs with the same shape [512, 1] and dtype fp16. Before this Pass (showing only the inputs):

+-512*1*fp16-+  +-512*1*fp16-+  +-512*1*fp16-+

After applying this Pass:

+------------+     +------+ --> +-512*1*fp16-+
|            |     |      |
| 3*512*fp16 | --> | HCSR | --> +-512*1*fp16-+
|            |     | Node |
+------------+     |      | --> +-512*1*fp16-+
                   +------+

The raw inputs are replaced with one new input of shape [3, 512] and the same dtype. The HCSR node is a custom operation equivalent to Split + Reshape; it has 3 outputs, which carry the same shapes and data as the raw model inputs.
- Parameters
merged_inputs (List[Any]) –
This pass is registered as apply_host_concat_split.
- class poprt.passes.apply_ir_pass.ApplyIrPass(passes=[])
Apply passes based on onnx IR.
- Parameters
passes (List[str]) –
- Return type
None
This pass is registered as apply_ir_pass.
- class poprt.passes.auto_insert_remap.AutoInsertRemap(remap_mode='after_matmul')
Insert remap after matmul.
This is an experimental feature. There are two insert modes: after_matmul and before_add. The after_matmul mode is more general but more likely to cause OOM; the before_add mode targets reducing the cycles of attention + mask in transformer-based models.
- Parameters
remap_mode (str) –
- Return type
None
This pass is registered as auto_insert_remap.
- class poprt.passes.workarounds.BatchNormWorkaround(*args, **kwargs)
Workaround for BatchNorm Operator.
- Return type
None
This pass is registered as batchnorm_workaround.
- class poprt.passes.check_with_fake_data.CheckWithFakeData(origin_model)
Check the model with fake data using onnxruntime.
- Parameters
origin_model (ModelProto) –
- Return type
None
This pass is registered as check_with_fake_data.
- class poprt.passes.const_batch_size.ConstBatchSize(const_batch_size=1)
Convert unknown batch size to a const value.
- Parameters
const_batch_size (int) –
- Return type
None
This pass is registered as const_batch_size.
- class poprt.passes.const_input_shape.ConstInputShape(const_input_shape={}, batch_size=None, batch_axis=None)
Convert input shape to const values.
- Parameters
const_input_shape (Dict[str, Any]) –
batch_size (int) –
batch_axis (int) –
This pass is registered as const_input_shape.
- class poprt.passes.constant_folding.ConstantFolding(max_tensor_size=-1)
Support constant folding.
- Parameters
max_tensor_size (int) –
- Return type
None
This pass is registered as constant_folding.
- class poprt.passes.workarounds.CumSumWorkaround(*args, **kwargs)
Workaround for CumSum Operator.
- Return type
None
This pass is registered as cumsum_workaround.
- class poprt.passes.double_to_float.DoubleToFloat(*args, **kwargs)
Transfer double to float (initializers only).
- Return type
None
This pass is registered as double_to_float.
- class poprt.passes.eight_bits_io.EightBitsIO
Insert norm operator after input image.
This pass is registered as eight_bits_io.
- class poprt.passes.apply_ir_pass.eliminate_deadend(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_deadend.
- class poprt.passes.apply_ir_pass.eliminate_duplicate_initializer(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_duplicate_initializer.
- class poprt.passes.apply_ir_pass.eliminate_identity(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_identity.
- class poprt.passes.apply_ir_pass.eliminate_nop_arithmetic(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_nop_arithmetic.
- class poprt.passes.apply_ir_pass.eliminate_nop_cast(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_nop_cast.
- class poprt.passes.apply_ir_pass.eliminate_nop_expand(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_nop_expand.
- class poprt.passes.apply_ir_pass.eliminate_nop_flatten(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_nop_flatten.
- class poprt.passes.apply_ir_pass.eliminate_nop_if(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_nop_if.
- class poprt.passes.apply_ir_pass.eliminate_nop_pad(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_nop_pad.
- class poprt.passes.apply_ir_pass.eliminate_nop_reshape(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_nop_reshape.
- class poprt.passes.apply_ir_pass.eliminate_nop_transpose(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_nop_transpose.
- class poprt.passes.apply_ir_pass.eliminate_unused_initializer(*args, **kwargs)
- Return type
None
This pass is registered as eliminate_unused_initializer.
- class poprt.passes.erf_gelu_pattern.ErfGeluPattern(*args, **kwargs)
Recognise the Erf-based Gelu pattern and replace it with the Gelu op.
- Return type
None
This pass is registered as erf_gelu_pattern.
- class poprt.passes.apply_ir_pass.extract_constant_to_initializer(*args, **kwargs)
- Return type
None
This pass is registered as extract_constant_to_initializer.
- class poprt.passes.fill_squeeze_axes.FillSqueezeAxes(*args, **kwargs)
Fill the empty axes of the Squeeze Op to ensure that shape inference works.
- Return type
None
This pass is registered as fill_squeeze_axes.
- class poprt.passes.final_check.FinalCheck(*args, **kwargs)
Final check for dtype and shape of the converted model.
- Return type
None
This pass is registered as final_check.
- class poprt.passes.workarounds.FloatOpsWorkaround(*args, **kwargs)
Workaround for operators that require float32/float16 inputs.
- Return type
None
This pass is registered as float_ops_workaround.
- class poprt.passes.float_to_fp8.Float2FP8(fp8_params=['F143', 'F143', 0, 0], skip_op_names=[], convert_model='fp8', fp8_input_dict=None, fp8_weight_dict=None)
Convert a model from fp32 or fp16 to fp8.
- Parameters
fp8_params (List[Union[typing_extensions.Literal[F143, F152], str]]) – Set parameters to fp8 model, the format is [input_format, weight_format, input_scale, weight_scale]
skip_op_names (List[str]) – The Op names which will keep fp32/fp16 in fp8 mode, such as [‘Conv_1’, ‘Conv_2’]
convert_model (typing_extensions.Literal[fp8, fp8_weight]) – Specifies which type the model is converted to, can be set to ‘fp8’ or ‘fp8_weight’
fp8_input_dict (Dict[str, int]) – Set parameters for each fp8 input node of the fp8 model; if it is not None, fp8_params will be discarded
fp8_weight_dict (Dict[str, int]) – Set parameters for each fp8 weight node of the fp8 model; if it is not None, fp8_params will be discarded
- Return type
None
This pass is registered as float_to_fp8.
- class poprt.passes.float_to_half.Float2Half(skip_op_types=[], enable_avoid_overflow_patterns=False)
Convert a model from fp32 to fp16.
Create Float2Half instance.
- Parameters
skip_op_types (List[str]) – The Op types which will keep fp32 in fp16 mode.
enable_avoid_overflow_patterns (bool) – Keep fp32 for several specific overflow-prone patterns in the fp16 model.
- Returns
A Float2Half instance
- Return type
None
This pass is registered as float_to_half.
- class poprt.passes.float_to_mixed.Float2Mixed
Convert a model from fp32 to mixed precision.
- Return type
None
This pass is registered as float_to_mixed.
- class poprt.passes.apply_ir_pass.fuse_bn_into_conv(*args, **kwargs)
- Return type
None
This pass is registered as fuse_bn_into_conv.
- class poprt.passes.fuse_bn_into_gemm.FuseBnIntoGemm
Fuse BatchNormalization to Matmul/Gemm.
- Conditions:
1. MatMul/Gemm uses an initializer.
2. Gemm/MatMul has no multiple outputs.
3. Initializers shared across operators are not supported.
- Return type
None
This pass is registered as fuse_bn_into_gemm.
- class poprt.passes.fuse_cast_into_onehot.FuseCastIntoOnehot
Fuse Cast into OneHot.
- Return type
None
This pass is registered as fuse_cast_into_onehot.
- class poprt.passes.apply_ir_pass.fuse_consecutive_cast(*args, **kwargs)
- Return type
None
This pass is registered as fuse_consecutive_cast.
- class poprt.passes.apply_ir_pass.fuse_consecutive_reshape(*args, **kwargs)
- Return type
None
This pass is registered as fuse_consecutive_reshape.
- class poprt.passes.apply_ir_pass.fuse_consecutive_squeeze(*args, **kwargs)
- Return type
None
This pass is registered as fuse_consecutive_squeeze.
- class poprt.passes.apply_ir_pass.fuse_consecutive_transpose(*args, **kwargs)
- Return type
None
This pass is registered as fuse_consecutive_transpose.
- class poprt.passes.apply_ir_pass.fuse_consecutive_unsqueeze(*args, **kwargs)
- Return type
None
This pass is registered as fuse_consecutive_unsqueeze.
- class poprt.passes.fuse_mul_into_matmul.FuseMulIntoMatmul
Fuse Mul into MatMul.
- Return type
None
This pass is registered as fuse_mul_into_matmul.
- class poprt.passes.fused_attention.FusedAttention(*args, **kwargs)
Recognise the pattern of MultiHeadAttention and replace it with fused MultiHeadAttention. The attention pattern is as below:

          Add
           |
        Reshape
       /   |   \
 MatMul  MatMul  MatMul
    |      |       |
 Reshape Reshape Reshape
    |      |       |
   Add    Add     Add
    |      |       |
 Reshape Reshape Reshape

The fused attention pattern is as below:

    Add
     |
   Concat
     |
   MatMul
     |
    Add
     |
  Reshape
     |
 Transpose
     |
   Split

- Return type
None
This pass is registered as fused_attention.
- class poprt.passes.gelu_pattern.GeluPattern(*args, **kwargs)
Recognise the pattern of Gelu Op and replace the pattern with Gelu.
- Return type
None
This pass is registered as gelu_pattern.
- class poprt.passes.workarounds.IndicesWorkaround(*args, **kwargs)
Workaround for Gather / GatherElements Operator.
- Return type
None
This pass is registered as indices_workaround.
- class poprt.passes.insert_attention_mask.InsertAttentionMask(*args, **kwargs)
Replace Reshape-Cast-Sub-Mul with Cast-AttentionMask.
- Return type
None
This pass is registered as insert_attention_mask.
- class poprt.passes.int64_to_int32.Int64ToInt32(*args, **kwargs)
Transfer int64 to int32.
- Return type
None
This pass is registered as int64_to_int32.
- class poprt.passes.layer_norm_pattern.LayerNormPattern(*args, **kwargs)
Recognise the pattern of LayerNorm Op and replace the pattern with GroupNorm.
- Return type
None
This pass is registered as layer_norm_pattern.
- class poprt.passes.layer_precision_compare.LayerPrecisionCompare(origin_model, data_preprocess=None, options=None, output_dir='./')
Compare the output of conv/matmul/gemm operator of the origin model and the fp8 model.
It randomly takes a batch of data from the calibration set for inference, and then records the outputs of the origin model and the converted model. Cosine distance is used to evaluate the error because it is a normalized measure of the angle between vectors; the closer the value is to 0, the smaller the error. The results are written to a log file.
Create LayerPrecisionCompare instance.
- Parameters
data_preprocess (str) – Path of pickle format file for data preprocessing.
options (Dict[str, Any]) – options for session.
output_dir (str) – The save path of log.
origin_model (ModelProto) –
- Returns
A LayerPrecisionCompare instance
- Return type
None
This pass is registered as layer_precision_compare.
- class poprt.passes.manual_sharding.ManualSharding(sharding_info=None, pipelining_info=None)
Shard the graph to several subgraphs manually in terms of specific nodes.
- Parameters
sharding_info (Dict[str, int]) –
pipelining_info (Dict[str, int]) –
- Return type
None
This pass is registered as manual_sharding.
- class poprt.passes.matmul_rotary_embedding.MatmulRotaryEmbedding
Recognise the pattern of element-wise rotary embedding and replace the pattern with an equivalent matmul.
- Return type
None
This pass is registered as matmul_rotary_embedding.
- class poprt.passes.merge_matmul.MergeMatmul(merge_str=None)
- Parameters
merge_str (str) –
- Return type
None
This pass is registered as merge_matmul.
- class poprt.passes.merge_matmul_add.MergeMatmulAdd(merge_str=None)
- Parameters
merge_str (str) –
- Return type
None
This pass is registered as merge_matmul_add.
- class poprt.passes.model_overview.ModelOverview(use_print=True, *args, **kwargs)
Show an overview of the model; the information is simply printed to stdout.
- Return type
None
This pass is registered as model_overview.
- class poprt.passes.move_subgraph_initializer.MoveSubgraphInitializer
Move subgraph’s initializers into main graph.
PopART only searches for initializers in the main graph.
- Return type
None
This pass is registered as move_subgraph_initializer.
- class poprt.passes.workarounds.OneHotWorkaround(*args, **kwargs)
Workaround for OneHot Op which is required with int32 depth and positive axis.
- Return type
None
This pass is registered as onehot_workaround.
- class poprt.passes.overlap_io.OverlapIO
Enable overlapped I/O.
- Return type
None
This pass is registered as overlap_io.
- class poprt.passes.packed_transformer.PackedTransformer(args)
Recognise the pattern of SelfAttention and replace it with Packed SelfAttention.
This pass is registered as packed_transformer.
- class poprt.passes.post_expand.PostExpand(*args, **kwargs)
- Return type
None
This pass is registered as post_expand.
- class poprt.passes.pre_scale.PreScale(*args, **kwargs)
Pre-scale: change the attention matrix Q to Q/sqrt(d), and remove the 1/sqrt(d) node.
- Return type
None
This pass is registered as pre_scale.
- class poprt.passes.remove_duplicated_initializer.RemoveDuplicatedInitializer
Remove duplicated initializers to save memory.
- Return type
None
This pass is registered as remove_duplicated_initializer.
- class poprt.passes.workarounds.RemoveEmptyConcatInputs(*args, **kwargs)
Workaround for Concat Op which does not support empty inputs in PopART.
- Return type
None
This pass is registered as remove_empty_concat_inputs.
- class poprt.passes.remove_initializer_from_input.RemoveInitializerFromInput(*args, **kwargs)
Remove initializer from model inputs.
Model: https://github.com/onnx/models/blob/main/vision/classification/resnet/model/resnet50-v1-7.onnx
- Return type
None
This pass is registered as remove_initializer_from_input.
- class poprt.passes.remove_input_cast.RemoveInputCast(*args, **kwargs)
Remove input cast: input(fp16)->cast(fp16->int32)->gather to input(int32)->gather.
- Return type
None
This pass is registered as remove_input_cast.
- class poprt.passes.remove_outputs.RemoveOutputs(outputs=[])
Remove specific outputs and useless structures of the graph.
- Parameters
outputs (List[str]) –
- Return type
None
This pass is registered as remove_outputs.
- class poprt.passes.replace_bn_with_mul_add.ReplaceBNWithMulAdd(*args, **kwargs)
Replace BatchNormalization Op with Mul + Add.
- Return type
None
This pass is registered as replace_bn_with_mul_add.
- class poprt.passes.replace_castlike.ReplaceCastLike(*args, **kwargs)
Replace the ONNX CastLike op with Cast.
- Return type
None
This pass is registered as replace_castlike.
- class poprt.passes.replace_clip_empty_inputs.ReplaceClipInputs(*args, **kwargs)
Replace Clip Op empty inputs.
- Return type
None
This pass is registered as replace_clip_empty_inputs.
- class poprt.passes.replace_consecutive_cast_with_notzero.ReplaceConsecuiveCastWithNotZero(*args, **kwargs)
Recognise the pattern of consecutive Cast Ops and replace the pattern with a NotZero Op.
- Return type
None
This pass is registered as replace_consecutive_cast_with_notzero.
- class poprt.passes.replace_div_with_mul.ReplaceDivWithMul(*args, **kwargs)
Replace Div with Mul if the divisor is constant.
Model: https://github.com/onnx/models/blob/main/text/machine_comprehension/gpt-2/model/gpt2-10.onnx
- Return type
None
This pass is registered as replace_div_with_mul.
- class poprt.passes.apply_ir_pass.replace_einsum_with_matmul(*args, **kwargs)
- Return type
None
This pass is registered as replace_einsum_with_matmul.
- class poprt.passes.replace_erf_with_erfv2.ReplaceErfWithErfV2(*args, **kwargs)
Replace Erf Op with ErfV2.
ErfV2 is more efficient but has a larger error.
- Return type
None
This pass is registered as replace_erf_with_erfv2.
- class poprt.passes.replace_gemm_with_matmul.ReplaceGemmWithMatMul(*args, **kwargs)
Replace Gemm with MatMul in the ONNX model.
- Return type
None
This pass is registered as replace_gemm_with_matmul.
- class poprt.passes.replace_greater_or_equal.ReplaceGreaterOrEqual(*args, **kwargs)
Replace GreaterOrEqual Op with Less Op and Not Op.
- Return type
None
This pass is registered as replace_greater_or_equal.
- class poprt.passes.replace_groupnorm_with_fast_norm.ReplaceGroupNormWithFastNorm(*args, **kwargs)
Replace GroupNormalization with FastNorm if the datatype is fp16 and num_groups=1.
- Return type
None
This pass is registered as replace_groupnorm_with_fast_norm.
- class poprt.passes.replace_half_reducemean.ReplaceHalfReduceMean(*args, **kwargs)
Replace ReduceMean Op in fp16 mode with ReduceSum + Mul in case of overflow.
- Return type
None
This pass is registered as replace_half_reducemean.
- class poprt.passes.replace_hardswish.ReplaceHardSwish(*args, **kwargs)
Replace HardSwish Op with HardSigmoid Op and Mul Op.
Replacement is required for opsets before 14, since HardSwish is only supported from opset 14.
- Return type
None
This pass is registered as replace_hardswish.
- class poprt.passes.replace_isinf.ReplaceIsInf(*args, **kwargs)
Replace the IsInf Op with the IsInfV2 Op (supports detect_negative/detect_positive).
- Return type
None
This pass is registered as replace_isinf.
- class poprt.passes.replace_less_or_equal.ReplaceLessOrEqual(*args, **kwargs)
Replace LessOrEqual Op with Less Op and Not Op.
- Return type
None
This pass is registered as replace_less_or_equal.
- class poprt.passes.replace_nonzero.ReplaceNonZero(*args, **kwargs)
Replace NonZero with ArgMax when the number of nonzero elements is known.
Right now only a single element is supported; multiple elements will be supported with TopK.
- Return type
None
This pass is registered as replace_nonzero.
- class poprt.passes.replace_pow.ReplacePow(*args, **kwargs)
Replace Pow Op with Square Op and Mul Op.
- Return type
None
This pass is registered as replace_pow.
- class poprt.passes.replace_round.ReplaceRound(*args, **kwargs)
Replace the Round Op with the RoundV2 Op (half-to-even mode).
- Return type
None
This pass is registered as replace_round.
- class poprt.passes.replace_softmax.ReplaceSoftmax(*args, **kwargs)
Replace the Softmax Op with the SoftmaxV2 Op when the axis is the lowest dim and the lowest dim is odd.
- Return type
None
This pass is registered as replace_softmax.
- class poprt.passes.replace_where_mask.ReplaceWhereMask(*args, **kwargs)
Change attention_mask method from where to add.
- Return type
None
This pass is registered as replace_where_mask.
- class poprt.passes.replace_where_with_mul_add.ReplaceWhereWithMulAdd
Where(condition, X, Y) = Add(Mul(condition, X), Mul(neg_condition, Y)).
This pass is registered as replace_where_with_mul_add.
- class poprt.passes.replace_where_with_wherev2.ReplaceWhereWithWhereV2(*args, **kwargs)
Replace Where Op with WhereV2.
- Return type
None
This pass is registered as replace_where_with_wherev2.
- class poprt.passes.serialize_matmul.SerializeMatmul(serialize_dict=None)
Serialize MatMul ops to save on-chip memory.
- Parameters
serialize_dict (Dict) –
- Return type
None
This pass is registered as serialize_matmul.
- class poprt.passes.serialize_matmul_add.SerializeMatmulAdd(serialize_dict=None)
- Parameters
serialize_dict (Dict) –
- Return type
None
This pass is registered as serialize_matmul_add.
- class poprt.passes.shape_inference.ShapeInference(*args, **kwargs)
Do shape inference.
- Return type
None
This pass is registered as shape_inference.
- class poprt.passes.workarounds.TopKWorkaround(*args, **kwargs)
Workaround for TopK Op which is required with positive axis.
- Return type
None
This pass is registered as topk_workaround.
- class poprt.passes.apply_ir_pass.trace_folding(*args, **kwargs)
- Return type
None
This pass is registered as trace_folding.
- class poprt.passes.apply_ir_pass.unique_name_for_nodes(*args, **kwargs)
- Return type
None
This pass is registered as unique_name_for_nodes.