gRPC support
Enabling gRPC in SSF
To serve endpoints with the gRPC framework in SSF, add --api grpc to the SSF command-line arguments. For example:
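A hypothetical invocation (the config path and the command list are assumptions; check gc-ssf --help for the exact usage):

```shell
# Hypothetical: serve the application over gRPC instead of the default FastAPI/REST
gc-ssf --config my_app/ssf_config.yaml --api grpc init build run
```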
Support for the gRPC framework has been implemented using the grpcio Python module. SSF servers support reflection, so the gRPC API can be discovered by clients that support reflection. The gRPC runtime is based on Predict Protocol - Version 2, proposed by KServe. Predict Protocol support is limited to the data types that are supported in SSF.
Configuration
--grpc-max-connections
[default: 10] - the maximum number of simultaneous client connections.
Compatibility between the FastAPI and gRPC APIs
Any and ListAny
The gRPC API features match the FastAPI features, with the sole exception of the Any type, which is only available with FastAPI. If the SSF config YAML file contains Any or ListAny types, then a gRPC-enabled SSF server will fail to boot. For applications that must support the gRPC API, users must replace Any and ListAny types with the best matching strong type.
Server replication
Direct control over the number of receiver threads is not possible when using gRPC. Instead, the user can specify the maximum number of parallel client connections using the --grpc-max-connections
command line argument when calling gc-ssf. The default number of gRPC workers is 10.
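For example, to allow up to 32 parallel client connections (the config path and command list are assumptions, as above):

```shell
gc-ssf --config my_app/ssf_config.yaml --api grpc --grpc-max-connections 32 init build run
```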
Testing
Most of the SSF examples contain tests that are designed to communicate using a REST style. The SSF testing engine can reuse those tests with the gRPC framework by using a proxy session which translates REST requests to gRPC requests. For the tests in the tests directory, proxy sessions can be used for tests that are built on top of the model.TestModel class. To run the gRPC tests, execute self.test_model(API_GRPC).
Sending a request with Predict Protocol - Version 2 (Python)
gRPC clients can be implemented with many different frameworks and programming languages, and describing such implementations is beyond the scope of this documentation. SSF starts a reflection-capable server, so the protocol itself can be discovered by the client. Nevertheless, there are some SSF-specific concepts that must be followed in order to communicate successfully with the SSF gRPC API.
The quick guide below shows an example of how to properly construct a request using Python:
- Create an empty request:
request = self.proto_predict_v2.ModelInferRequest()
- Specify details for the endpoint in the parameters container of the request using the version and endpoint keys. The values must match the values defined in the SSF application config YAML file.
request.parameters["version"].string_param = "1"
request.parameters["endpoint"].string_param = "TestEndpoint"
- Add inputs according to the application config. The name of each input must match the one defined in the SSF application config YAML file.
inputs = []
# adding float values
values = [1.3, 4.5, 5.0, 3.4]
inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
inputs[-1].name = "name_of_the_input__float"
inputs[-1].shape.append(len(values))
inputs[-1].contents.fp64_contents.extend(values)
# adding integer values
values = [1, 4, 5, 3]
inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
inputs[-1].name = "name_of_the_input__int"
inputs[-1].shape.append(len(values))
inputs[-1].contents.int_contents.extend(values)
# adding boolean values
values = [True, False, True, False, False]
inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
inputs[-1].name = "name_of_the_input__boolean"
inputs[-1].shape.append(len(values))
inputs[-1].contents.bool_contents.extend(values)
# adding string values
values = ["test string 1", "test string 2"]
inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
inputs[-1].name = "name_of_the_input__string"
inputs[-1].shape.append(len(values))
values_encoded = [bytes(elt, "UTF-8") for elt in values]
inputs[-1].contents.bytes_contents.extend(values_encoded)
# adding a file input
# for file inputs the client must also specify the file name (with extension) in the file_name parameter
with open(input_image, "rb") as fp:
    inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
    inputs[-1].name = "name_of_the_input__file"
    inputs[-1].shape.append(1)
    inputs[-1].parameters["file_name"].string_param = "test_file.jpg"
    inputs[-1].contents.bytes_contents.append(fp.read())
request.inputs.extend(inputs)
- Send the request (how to initialize the server connection is described in the guide for the selected framework, e.g. for Python gRPC in the official grpcio documentation).
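The steps above depend on generated protobuf stubs. As a stub-free illustration, the same request layout can be mimicked with plain Python dictionaries; the field and input names mirror the example above, but the helper functions are purely illustrative and nothing is sent over the wire:

```python
# Stub-free sketch of the Predict Protocol v2 request layout used above.
# Plain dicts stand in for the protobuf messages; names are illustrative.

def make_request(version, endpoint):
    # parameters carry the SSF endpoint routing keys (version, endpoint)
    return {
        "parameters": {"version": version, "endpoint": endpoint},
        "inputs": [],
    }

def add_input(request, name, values, contents_field):
    # each input records its name, a 1-D shape and typed contents,
    # matching InferInputTensor in the protobuf example above
    request["inputs"].append({
        "name": name,
        "shape": [len(values)],
        contents_field: list(values),
    })

request = make_request("1", "TestEndpoint")
add_input(request, "name_of_the_input__float", [1.3, 4.5, 5.0, 3.4], "fp64_contents")
# strings are transported as UTF-8 encoded bytes in bytes_contents
strings = ["test string 1", "test string 2"]
add_input(request, "name_of_the_input__string",
          [s.encode("UTF-8") for s in strings], "bytes_contents")

print(len(request["inputs"]))        # 2
print(request["inputs"][1]["shape"]) # [2]
```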
Receiving a response with Predict Protocol - Version 2 (Python)
When receiving a response, the client may follow these steps:
- Check if the response contains the outputs member.
- For each output, read the datatype attribute and process it accordingly:
# integers are stored in the int_contents member
if output.datatype == "INT32":
    ret_val_decoded = list(output.contents.int_contents)
# floats are stored in the fp64_contents member
elif output.datatype == "FP64":
    ret_val_decoded = list(output.contents.fp64_contents)
# booleans are stored in the bool_contents member
elif output.datatype == "BOOL":
    ret_val_decoded = list(output.contents.bool_contents)
# strings are stored in the bytes_contents member
elif output.datatype == "BYTE_STRING":
    ret_val = output.contents.bytes_contents
    ret_val_decoded = []
    while len(ret_val):
        # string values have to be decoded
        ret_val_decoded.append(ret_val.pop(0).decode())
# byte arrays are stored in the bytes_contents member
elif output.datatype == "BYTES":
    ret_val = output.contents.bytes_contents
    ret_val_decoded = []
    while len(ret_val):
        # pop instead of a list comprehension to free the memory on the fly
        ret_val_decoded.append(ret_val.pop(0))
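The dispatch above can be exercised without generated stubs. Below is a minimal sketch using a stand-in output object; the datatype names and the decoding rules mirror the example above, but the FakeOutput class and decode_output helper are illustrative, not part of SSF:

```python
# Stand-in for a Predict Protocol v2 output tensor; illustrative only.
class FakeOutput:
    def __init__(self, datatype, contents):
        self.datatype = datatype
        self.contents = contents  # a plain list stands in for the typed *_contents field

def decode_output(output):
    # mirrors the datatype dispatch shown above, using plain lists
    if output.datatype in ("INT32", "FP64", "BOOL", "BYTES"):
        return list(output.contents)
    if output.datatype == "BYTE_STRING":
        # strings travel as UTF-8 encoded bytes and must be decoded
        return [elt.decode("UTF-8") for elt in output.contents]
    raise ValueError(f"unsupported datatype: {output.datatype}")

print(decode_output(FakeOutput("INT32", [1, 4, 5, 3])))              # [1, 4, 5, 3]
print(decode_output(FakeOutput("BYTE_STRING", [b"test string 1"])))  # ['test string 1']
```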
Testing gRPC
To test your gRPC API, any client that supports reflection can be used. A quick way is to use grpcui, which can be run directly from its DockerHub image. When starting the client, make sure to match the port that was set when booting SSF:
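A hypothetical grpcui invocation via Docker (the port 8100 is an assumption; replace it with the port SSF was started on, and note that --network host works on Linux only):

```shell
docker run --rm --network host fullstorydev/grpcui -plaintext localhost:8100
```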