8. C++ API
8.1. PopRT Compiler
-
class Compiler
Public Static Functions
-
static void compileAndExport(const std::string &model, const std::vector<std::string> &outputs, std::ostream &out, const CompilerOptions &options = CompilerOptions())
Compile model and Export PopEF to stream.
- Parameters
model – [in] An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.
outputs – [in] Output tensor names
out – [out] The stream that the compiled PopEF will be written to.
options – [in] The user configuration options for the Compiler class. Default: CompilerOptions().
-
static void compileAndExport(const std::string &model, const std::vector<std::string> &outputs, const std::string &fileName, const CompilerOptions &options = CompilerOptions())
Compile model and Export PopEF to file.
- Parameters
model – [in] An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.
outputs – [in] Output tensor names
fileName – [out] The file name that the compiled PopEF will be written to.
options – [in] The user configuration options for the Compiler class. Default: CompilerOptions().
-
static std::shared_ptr<Executable> compile(const std::string &model, const std::vector<std::string> &outputs, const CompilerOptions &options = CompilerOptions())
Compile the model and return an Executable object.
- Parameters
model – [in] An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.
outputs – [in] Output tensor names
options – [in] The user configuration options for the Compiler class. Default: CompilerOptions().
-
static std::string compileAndGetSummaryReport(const std::string &model, const std::vector<std::string> &outputs, const CompilerOptions &options, bool resetProfile = true)
Compile model and return summary report.
- Parameters
model – [in] An ONNX model protobuf, or the name of a file containing an ONNX model protobuf.
outputs – [in] Output tensor names
options – [in] The user configuration options for the Compiler class.
resetProfile – [in] If true, resets the execution profile. Default: true.
- Returns
A string containing the report.
Public Static Attributes
-
static opaqueblobs::OpaqueBlobs compileTimeBlobs_
-
struct CompilerOptions
Public Functions
-
inline bool operator==(const CompilerOptions &other) const
Public Members
-
int64_t numIpus = 1
Number of IPUs to select.
-
std::string ipuVersion = ""
IPU version, auto detect if empty.
-
int64_t batchesPerStep = 1
The number of batches to run on the chip before returning.
-
std::string partialsType = "half"
Set the partials type globally for matmuls and convolutions. Valid values are "float" and "half".
-
float availableMemoryProportion = 0.6
Set the available memory proportion globally for matmuls and convolutions. Valid values are between 0 and 1 (inclusive). Default: 0.6.
-
int64_t numIOTiles = 0
Number of IPU tiles dedicated to IO.
-
bool enableModelFusion = false
Enable model fusion.
-
bool enablePrefetchDatastreams = true
Enable prefetching for input data streams.
Poplar will speculatively read data for a stream before it is required, so that the ‘preparation’ of the data can occur in parallel with compute. Enabled when true. Default: true.
-
unsigned streamBufferingDepth = 1
Specify the default buffering depth value used for streams.
-
bool enablePadConvChannel = false
Custom patterns
-
bool serializeIr = false
Serialize the IR.
-
std::string serializedIrDest = ""
Destination to which the serialized IR stream is dumped.
-
bool enableGatherSimplifier = true
Simplify Gather operator
-
std::map<std::string, std::string> engineOptions
Poplar engine options
-
bool showCompilationProgressBar = true
Show a progress bar during compilation.
-
std::vector<std::string> customPatterns
Specify custom patterns.
-
std::map<std::string, std::vector<std::string>> customTransformApplierSettings
Specify custom transforms.
-
std::map<std::string, std::string> opaqueBlobs
Specify opaque blob messages.
-
bool use128BitConvUnitLoad = false
Bit-Width of conv load.
-
bool enableFastReduce = false
Enable fast reduce.
-
bool enableOutlining = true
Enable outlining.
-
bool groupHostSync = false
Group the h2d streams at the beginning of the schedule and the d2h streams at the end of the schedule.
When true, tensors will stay live for longer. Default: false.
-
bool rearrangeStreamsOnHost = false
Enable rearrangement of h2d tensors to be done on the host.
Default: false (rearrangement done on device).
-
bool rearrangeAnchorsOnHost = true
Enable rearrangement of d2h tensors to be done on the host.
Default: true (rearrangement done on host to save device memory).
-
float outlineThreshold = 1.0f
Specify the incremental value that a sub-graph requires, relative to its nested sub-graphs (if any), to be eligible for outlining.
Default: 1.0f.
-
bool enableNonStableSoftmax = false
Enable the non-stable softmax Poplar function.
Default: false (not enabled).
-
bool enablePipelining = false
Enable pipelining of virtual graphs.
Default: false (not enabled).
-
bool enableEngineCaching = false
Enable Poplar executable caching. The file is saved to the location defined with cachePath.
Default: false (not enabled).
-
std::string cachePath = ""
Folder to save the Poplar executable to.
Default: “” (not enabled).
-
uint64_t subgraphCopyingStrategy = 0
Specify how copies for inputs and outputs for subgraphs are lowered.
Default: popart::OnEnterAndExit.
-
std::string virtualGraphMode = "off"
Specify how to place ops on virtual graphs to achieve model parallelism, either manually using model annotations, or automatically.
Default: popart::VirtualGraphMode::Off.
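The value ranges documented above for partialsType and availableMemoryProportion can be checked before compilation. The following is a minimal sketch under stated assumptions: OptionsSketch is a hypothetical stand-in that copies only two of the documented members, and checkOptions is an illustrative helper, not part of PopRT.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for two CompilerOptions members documented above.
// The real struct is provided by the PopRT headers and has many more fields.
struct OptionsSketch {
  std::string partialsType = "half";
  float availableMemoryProportion = 0.6f;
};

// Illustrative helper (not a PopRT function): returns an empty string when
// the values are inside the documented ranges, otherwise a short reason.
std::string checkOptions(const OptionsSketch &o) {
  if (o.partialsType != "float" && o.partialsType != "half")
    return "partialsType must be \"float\" or \"half\"";
  if (o.availableMemoryProportion < 0.0f ||
      o.availableMemoryProportion > 1.0f)
    return "availableMemoryProportion must be between 0 and 1 (inclusive)";
  return "";
}

// Usage: the documented defaults pass, an unknown partials type does not.
void checkOptionsExample() {
  OptionsSketch defaults;
  assert(checkOptions(defaults).empty());

  OptionsSketch bad;
  bad.partialsType = "double";
  assert(!checkOptions(bad).empty());
}
```

Whether the compiler itself rejects out-of-range values is not stated in this section, so a check of this kind is only a convenience for failing early.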
-
class Executable
Executable of a compiled model.
Public Functions
-
Executable() = delete
-
~Executable() = default
-
Executable(const Executable&) = delete
-
Executable &operator=(const Executable &other) = delete
-
Executable(Executable&&) = default
-
Executable &operator=(Executable&&) = default
-
Executable(std::unique_ptr<popef::Model> popefModel, std::map<std::string, std::string> popefOpaque = {})
Create the executable of a model.
- Parameters
popefModel – [in] The PopEF model to be contained in the executable.
popefOpaque – [in] The opaque messages to be contained in the executable.
-
std::shared_ptr<popef::Model> getPopefModel()
Get the PopEF Model
- Returns
The PopEF Model
-
const std::map<std::string, std::string> &getOpaqueBlobs()
Get the opaque blobs of the Model
- Returns
A map storing the name and value of each opaque blob.
-
class CustomTransformManager
8.2. PopRT Runtime
8.2.1. ModelRunner
-
class ModelRunner : public poprt::runtime::BaseRunner
Load PopEF model, and execute
Public Functions
-
ModelRunner(const ModelRunner&) = delete
-
ModelRunner &operator=(const ModelRunner &other) = delete
-
ModelRunner(ModelRunner&&)
Default move constructor.
-
ModelRunner &operator=(ModelRunner&&)
Default move assignment operator.
-
~ModelRunner() override
Default destructor.
-
ModelRunner(const std::string &popefPath, const RuntimeConfig &config = RuntimeConfig())
Create a new ModelRunner object.
- Parameters
popefPath – The path to PopEF files from which the model will be loaded.
config – The runtime configuration.
Create a new ModelRunner object.
- Parameters
executable – The Executable from which the model will be loaded.
config – The runtime configuration.
-
virtual void execute(const InputMemoryView &inputData, const OutputMemoryView &outputData) override
Run a model synchronously. The user allocates and passes pointers to output memory.
- Parameters
inputData – [in] The user-allocated tensor buffer for all executable input tensors.
outputData – [in] The user-allocated tensor buffer for all executable output tensors.
-
virtual OutputFutureMemoryView executeAsync(const InputMemoryView &inputData, const OutputMemoryView &outputData) override
Run a model asynchronously. The user allocates and passes pointers to output memory.
- Parameters
inputData – [in] The user-allocated tensor buffer for all executable input tensors.
outputData – [in] The user-allocated tensor buffer for all executable output tensors.
- Returns
The future result of an asynchronous call for all executable output tensors.
-
virtual std::vector<InputDesc> getExecuteInputs() const override
Get a description of all the user-provided input data. In addition to the data used by the execute calls, it will return a description of all tensors used by the model which must be provided during the phase of loading the model onto the device. The data required for the additional tensors may be included in PopEF files. In this case, the additional tensors are loaded automatically by ModelRunner.
- Returns
A vector of DataDesc instances.
-
virtual std::vector<OutputDesc> getExecuteOutputs() const override
Get a description of all the user-provided output data. In addition to the data used by the execute calls, it will return a list of descriptions of all tensors used by the model that the loading phase requires (weights tensors as an example). The data for these additional tensors can be included in PopEF files that are loaded automatically by the ModelRunner.
- Returns
A vector of DataDesc instances.
-
struct RuntimeConfig
Public Functions
-
inline bool operator==(const RuntimeConfig &other) const
Public Members
-
bool isPack = false
-
DeviceWaitConfig deviceWaitConfig
By default, the model runner throws an exception when it is not able to attach to any device required by the given model. This behavior can be changed by setting a custom DeviceWaitConfig.
-
bool threadSafe = true
If true, the mutex will be locked on each execution call. If false, the mutex will not be locked. By default the model runner is thread-safe and each replica has an independent mutex. Default: true.
-
std::chrono::nanoseconds timeoutNS = std::chrono::microseconds(5000)
Duration, in nanoseconds, to wait before calling the timeout callback when the IPU is waiting for input data that is not available. If 0, the timeout callback is never called; in other words, the IPU waits forever for the data.
-
bool validateIOParams = true
If true, the I/O parameters will be checked during execution of the ModelRunner “execute” functions. If false, this check is not done. Default: true.
-
uint32_t batchingDim = std::numeric_limits<uint32_t>::max()
The dimension along which the input data extends with the batch size. For example, for a PopEF model with shape [4, 4, 3, 3], batchingDim=0 means that the batch size extends along dimension 0, so input data of shape [?, 4, 3, 3] is allowed, where ? can be any of [1, 2, …, N].
The default value is std::numeric_limits<uint32_t>::max(), which means that dynamic batch size is disabled and the input data must be a multiple of the PopEF model batch size, for example [n * 4, 4, 3, 3] for the above model, where n can be any of [1, 2, …, N].
-
bool checkPackageHash = true
If true, the Poplar hash will be checked before the executable is loaded onto the device. If false, this check is not done. Default: true.
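The batchingDim rule described above can be expressed as a small shape check. This is a minimal sketch, not the PopRT implementation: shapeAllowed and kNoBatchingDim are hypothetical names, and the sketch assumes that when dynamic batch size is disabled the batch lives on dimension 0, as in the [n * 4, 4, 3, 3] example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Sentinel meaning "dynamic batch size disabled", matching the documented
// default value of batchingDim.
constexpr uint32_t kNoBatchingDim = std::numeric_limits<uint32_t>::max();

// Illustrative helper (not a PopRT function): returns true when inputShape is
// acceptable for a model compiled with modelShape under the given batchingDim.
bool shapeAllowed(const std::vector<int64_t> &modelShape,
                  const std::vector<int64_t> &inputShape,
                  uint32_t batchingDim = kNoBatchingDim) {
  if (modelShape.empty() || modelShape.size() != inputShape.size())
    return false;
  const bool dynamic = batchingDim != kNoBatchingDim;
  if (dynamic && batchingDim >= modelShape.size())
    return false;
  // Assumption: with dynamic batching disabled, the batch is on dimension 0.
  const std::size_t batchAxis = dynamic ? batchingDim : 0;
  for (std::size_t i = 0; i < modelShape.size(); ++i) {
    if (i != batchAxis) {
      // Every non-batch dimension must match the compiled model exactly.
      if (inputShape[i] != modelShape[i]) return false;
    } else if (inputShape[i] < 1 || modelShape[i] < 1) {
      return false;
    } else if (!dynamic && inputShape[i] % modelShape[i] != 0) {
      // Without dynamic batching only whole multiples of the model batch fit.
      return false;
    }
  }
  return true;
}

// Usage with the model shape from the text above.
void shapeAllowedExample() {
  assert(shapeAllowed({4, 4, 3, 3}, {3, 4, 3, 3}, 0));  // any batch on dim 0
  assert(!shapeAllowed({4, 4, 3, 3}, {3, 4, 3, 3}));    // 3 is not n * 4
  assert(shapeAllowed({4, 4, 3, 3}, {8, 4, 3, 3}));     // 8 == 2 * 4
}
```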
-
struct DeviceWaitConfig
-
struct DataDesc
The description of data used by ModelRunner.
Public Functions
-
DataDesc(std::string name, int64_t sizeInBytes, std::vector<int64_t> shape, popef::DataType dataType, bool popefContainsTensorData = false)
Create description of input/output data.
- Parameters
name – [in] The name of the input/output tensor.
sizeInBytes – [in] The size of the tensor measured in bytes.
shape – [in] A vector defining the shape of the tensor. The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of a single dimension.
dataType – [in] The data type of a single tensor element.
popefContainsTensorData – [in] If true, the model has a tensor data blob associated with the tensor. If false, the model does not have a tensor data blob associated with the tensor. Default: false.
Public Members
-
std::string name
The name of the input/output tensor.
-
int64_t sizeInBytes
The size of the tensor measured in bytes.
-
std::vector<int64_t> shape
A vector defining the shape of the tensor. The size of the vector is equal to the number of tensor dimensions. Each element of the vector indicates the size of a single dimension.
-
popef::DataType dataType
The data type of a single tensor element.
-
bool popefContainsTensorData
If true, the model has a tensor data blob associated with the tensor. If false, the model does not have a tensor data blob associated with the tensor.
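The relationship between shape and sizeInBytes can be illustrated with a short calculation. This is a sketch under an assumption the text does not state explicitly: that the tensor is stored densely, so its size is the product of its dimensions times the size of one element. tensorSizeInBytes is a hypothetical helper, and the element size is passed in directly rather than derived from popef::DataType.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative helper (not a PopRT function): size in bytes of a densely
// stored tensor, given its shape and the size of a single element.
int64_t tensorSizeInBytes(const std::vector<int64_t> &shape,
                          int64_t elementSizeInBytes) {
  assert(elementSizeInBytes > 0);
  int64_t elements = 1;  // an empty shape is treated as a scalar
  for (int64_t dim : shape) {
    assert(dim >= 0);
    elements *= dim;
  }
  return elements * elementSizeInBytes;
}
```

For example, a [4, 4, 3, 3] tensor of 4-byte elements has 144 elements, so tensorSizeInBytes({4, 4, 3, 3}, 4) returns 576.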
-
using poprt::runtime::InputDesc = DataDesc
Description of input data required by ModelRunner.
-
using poprt::runtime::OutputDesc = DataDesc
Description of output data required by ModelRunner.
-
using poprt::runtime::InputMemoryView = std::unordered_map<std::string, ConstTensorMemoryView>
Mapping between a tensor name and an immutable memory view. Used as input to ModelRunner::execute.
-
using poprt::runtime::OutputMemoryView = std::unordered_map<std::string, TensorMemoryView>
Mapping between a tensor name and a memory view. Used as output from ModelRunner::execute, when the output memory is allocated and managed by the ModelRunner client.
-
struct ConstTensorMemoryView
Immutable view to already allocated memory.
Public Functions
-
ConstTensorMemoryView() = default
Default constructor.
-
ConstTensorMemoryView(const TensorMemoryView &other)
Construct an immutable view from a TensorMemoryView.
-
ConstTensorMemoryView(const void *data, uint64_t dataSizeBytes)
Immutable view to const memory.
- Parameters
data – [in] The pointer to the allocated memory.
dataSizeBytes – [in] The size of the memory block, in bytes.
-
struct TensorMemoryView
Mutable view to already allocated memory.
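The memory views above are non-owning: they pair a pointer with a byte count, and InputMemoryView maps tensor names to such views. The following sketch shows that pattern with stand-in types so that it compiles without the PopRT headers. ConstViewSketch, InputViewSketch and makeInputView are hypothetical names that mirror the documented (data, dataSizeBytes) constructor, and the tensor name "input" is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in mirroring the documented ConstTensorMemoryView(data, dataSizeBytes)
// constructor. It neither copies nor owns the memory it points to.
struct ConstViewSketch {
  ConstViewSketch(const void *d, uint64_t n) : data(d), dataSizeBytes(n) {}
  const void *data;
  uint64_t dataSizeBytes;
};

// Stand-in for InputMemoryView: tensor name -> immutable view.
using InputViewSketch = std::unordered_map<std::string, ConstViewSketch>;

// Illustrative helper: build views over caller-owned float buffers.
// The buffers must outlive the returned map, because views are non-owning.
InputViewSketch makeInputView(
    const std::unordered_map<std::string, std::vector<float>> &buffers) {
  InputViewSketch view;
  for (const auto &entry : buffers) {
    const std::vector<float> &buf = entry.second;
    view.emplace(entry.first,
                 ConstViewSketch(buf.data(), buf.size() * sizeof(float)));
  }
  return view;
}

// Usage: one hypothetical tensor named "input" with 36 float elements.
void makeInputViewExample() {
  std::unordered_map<std::string, std::vector<float>> buffers;
  buffers["input"] = std::vector<float>(36, 0.0f);
  InputViewSketch view = makeInputView(buffers);
  assert(view.at("input").dataSizeBytes == 36 * sizeof(float));
}
```

Because the views only borrow memory, the caller is responsible for keeping each buffer alive and unchanged until execution has finished.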
8.2.2. PackRunner
-
class PackRunner : public poprt::runtime::BaseRunner
Load PopEF model, and execute.
Public Functions
-
PackRunner(const PackRunner&) = delete
-
PackRunner &operator=(const PackRunner &other) = delete
-
PackRunner(PackRunner&&)
Default move constructor.
-
PackRunner &operator=(PackRunner&&)
Default move assignment operator.
-
~PackRunner() override
Default destructor.
-
PackRunner(const std::string &popefPath, const PackRunnerConfig &config)
Create a new PackRunner object.
- Parameters
popefPath – [in] The path to PopEF files from which the model will be loaded.
config – [in] The pack runner configuration.
Create a new PackRunner object.
- Parameters
executable – [in] The Executable from which the model will be loaded.
config – [in] The pack runner configuration.
-
inline virtual void execute(const InputMemoryView &inputData, const OutputMemoryView &outputData) override
-
virtual OutputFutureMemoryView executeAsync(const InputMemoryView &inputData, const OutputMemoryView &outputData) override
Run a model asynchronously. The user allocates and passes pointers to output memory.
- Parameters
inputData – [in] The user allocated tensor buffer for all executable input tensors.
outputData – [in] The user allocated tensor buffer for all executable output tensors.
- Returns
The future result of an asynchronous call for all executable output tensors.
-
virtual std::vector<InputDesc> getExecuteInputs() const override
Get a description of the input data required in the execute class methods.
- Returns
A vector of DataDesc instances.
-
virtual std::vector<OutputDesc> getExecuteOutputs() const override
Get a description of the output data required in the execute class methods.
- Returns
A vector of DataDesc instances.
-
struct PackRunnerConfig
Public Functions
-
inline explicit PackRunnerConfig(int timeoutInMicroSeconds = 0, int maxValidNum = 0, std::string dynamicInputName = "", std::string unpackInfoInputName = "")
-
void enablePaddingRemovePattern(std::string maskName, std::vector<std::string> dynamicGroup)
Remove padding from user input based on the mask.
-
void enableSingleRowMode(std::string maskName, std::string unpackInfoName = "", int delimiterNum = 0)
Enable the pack mode in which data cannot span rows.
Public Members
-
int timeoutInMicroSeconds = 0
Used to determine when to force the user input data to be pushed into the queue, even if the PackRunner can receive more data. The value of timeoutInMicroSeconds should be greater than 0.
-
int maxValidNum = 0
maxValidNum is the maximum number of samples that PackRunner can pack. PackRunner stops packing when it reaches maxValidNum samples, the maximum space allowed by the user, or the time limit.
-
std::string dynamicInputName = ""
Dynamic sequence input name.
-
std::string unpackInfoInputName = ""
Unpack info input name.
-
std::string maskName = ""
Attention mask name, used to remove padding of user input.
-
std::vector<std::string> dynamicGroup
Used to specify the group of dynamic inputs when removing padding (for example, {input_ids, mask, token_type, position_ids}). Fixed-size input names should not be in dynamicGroup.
-
PackAlgorithm algo = PackAlgorithm::NextFit
-
bool disableDataAcrossRows = false
User input cannot span rows in this pack mode.
-
bool enablePaddingRemove = false
Remove padding from user input.
-
int delimiterNum = 0
Used to insert delimiter before pack.
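The default algo value is PackAlgorithm::NextFit, which this section does not describe further. Assuming the name refers to the classic next-fit bin-packing heuristic, the idea can be sketched as follows. nextFitPack is a hypothetical function written for illustration; PackRunner's actual behaviour (timeouts, maxValidNum, delimiters) is not modelled and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Classic next-fit packing: keep a single open row; when the next sequence
// does not fit, close that row and open a new one. Earlier rows are never
// revisited. Returns, per row, the indices of the sequences packed into it.
std::vector<std::vector<std::size_t>> nextFitPack(
    const std::vector<std::size_t> &lengths, std::size_t rowCapacity) {
  assert(rowCapacity > 0);
  std::vector<std::vector<std::size_t>> rows;
  std::size_t used = 0;  // space consumed in the currently open row
  for (std::size_t i = 0; i < lengths.size(); ++i) {
    if (lengths[i] > rowCapacity)
      continue;  // sketch-only choice: skip sequences that can never fit
    if (rows.empty() || used + lengths[i] > rowCapacity) {
      rows.emplace_back();  // open a new row
      used = 0;
    }
    rows.back().push_back(i);
    used += lengths[i];
  }
  return rows;
}
```

With lengths {3, 2, 4, 1, 5} and a row capacity of 6, this produces three rows holding sequence indices {0, 1}, {2, 3} and {4}. Next-fit is simple and works in a single pass, at the cost of leaving gaps that a first-fit or best-fit strategy might fill.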
8.2.3. Device
-
class Device
Create a device.
-
class DeviceManager
Select which device to run on.
Public Functions
-
DeviceManager &operator=(const DeviceManager &other) = delete
-
DeviceManager &operator=(DeviceManager&&) = delete
-
DeviceManager()
Constructor with default values.
-
~DeviceManager()
-
DeviceManager(const DeviceManager&) = delete
Delete copy constructor.
-
DeviceManager(DeviceManager&&)
Default move constructor.
-
std::string ipuHardwareVersion()
Get the version of the IPU on the physical system.
- Returns
The version of the IPU on the physical system.
-
std::size_t getNumDevices() const