gRPC support
Enabling gRPC in SSF
To serve endpoints with the gRPC framework in SSF, add --api grpc to the SSF command-line arguments. For example:
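A hypothetical invocation (the config path and the command list are assumptions; check gc-ssf --help for the exact usage):

```shell
# Hypothetical: serve the application over gRPC instead of the default FastAPI/REST
gc-ssf --config my_app/ssf_config.yaml --api grpc init build run
```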
Support for the gRPC framework has been implemented using the grpcio Python module. SSF servers support reflection, so the gRPC API can be discovered by clients that support reflection. The gRPC runtime is based on Predict Protocol - Version 2, proposed by KServe. Predict Protocol support is limited to the data types that are supported in SSF.
Configuration
--grpc-max-connections
[default: 10] - the maximum number of simultaneous client connections.
Compatibility between the FastAPI and gRPC APIs
Any and ListAny
The gRPC API features match the FastAPI features, with the sole exception of the Any type, which is only available with FastAPI. If the SSF config YAML file contains Any or ListAny types, then a gRPC-enabled SSF server will fail to boot. For applications that must support the gRPC API, users must replace Any and ListAny types with the best matching strong type.
Server replication
Direct control over the number of receiver threads is not possible when using gRPC. Instead, the user can specify the maximum number of parallel client connections using the --grpc-max-connections
command line argument when calling gc-ssf. The default number of gRPC workers is 10.
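For example, to allow up to 32 parallel client connections (the config path and command list are assumptions, as above):

```shell
gc-ssf --config my_app/ssf_config.yaml --api grpc --grpc-max-connections 32 init build run
```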
Testing
Most of the SSF examples contain tests that are designed to communicate using a REST style. The SSF testing engine can reuse those tests with the gRPC framework by using a proxy session which translates REST requests to gRPC requests. For the tests in the tests directory, proxy sessions can be used for tests that are built on top of the model.TestModel class. To run the gRPC tests, execute self.test_model(API_GRPC).
Sending a request with Predict Protocol - Version 2 (Python)
gRPC clients can be implemented with many different frameworks and programming languages, and describing such implementations is beyond the scope of this documentation. SSF starts a reflection-capable server, so the protocol itself can be discovered by the client. Nevertheless, there are some SSF-specific concepts that must be followed in order to communicate successfully with the SSF gRPC API.
The quick guide below shows an example of how to properly construct a request using Python:
- Create an empty request:
request = self.proto_predict_v2.ModelInferRequest()
- Specify details for the endpoint in the parameters container of the request using the version and endpoint keys. The values must match the values defined in the SSF application config YAML file.
request.parameters["version"].string_param = "1"
request.parameters["endpoint"].string_param = "TestEndpoint"
- Add inputs according to the application config. The name of each input must match the one defined in the SSF application config YAML file.
inputs = []
# adding float values
values = [1.3, 4.5, 5.0, 3.4]
inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
inputs[-1].name = "name_of_the_input__float"
inputs[-1].shape.append(len(values))
inputs[-1].contents.fp64_contents.extend(values)
# adding integer values
values = [1, 4, 5, 3]
inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
inputs[-1].name = "name_of_the_input__int"
inputs[-1].shape.append(len(values))
inputs[-1].contents.int_contents.extend(values)
# adding boolean values
values = [True, False, True, False, False]
inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
inputs[-1].name = "name_of_the_input__boolean"
inputs[-1].shape.append(len(values))
inputs[-1].contents.bool_contents.extend(values)
# adding string values
values = ["test string 1", "test string 2"]
inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
inputs[-1].name = "name_of_the_input__string"
inputs[-1].shape.append(len(values))
values_encoded = [bytes(elt, "UTF-8") for elt in values]
inputs[-1].contents.bytes_contents.extend(values_encoded)
# adding a file input
# for file inputs the client must also specify the file name (with extension) in the file_name parameter
with open(input_image, "rb") as fp:
    inputs.append(self.proto_predict_v2.ModelInferRequest().InferInputTensor())
    inputs[-1].name = "name_of_the_input__file"
    inputs[-1].shape.append(1)
    inputs[-1].parameters["file_name"].string_param = "test_file.jpg"
    inputs[-1].contents.bytes_contents.append(fp.read())
request.inputs.extend(inputs)
- Send the request (how to initialize the server connection is described in the guide for the selected framework, e.g. for Python gRPC in the official grpcio documentation).
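The steps above depend on generated protobuf stubs. As a stub-free illustration, the same request layout can be mimicked with plain Python dictionaries; the field and input names mirror the example above, but the helper functions are purely illustrative and nothing is sent over the wire:

```python
# Stub-free sketch of the Predict Protocol v2 request layout used above.
# Plain dicts stand in for the protobuf messages; names are illustrative.

def make_request(version, endpoint):
    # parameters carry the SSF endpoint routing keys (version, endpoint)
    return {
        "parameters": {"version": version, "endpoint": endpoint},
        "inputs": [],
    }

def add_input(request, name, values, contents_field):
    # each input records its name, a 1-D shape and typed contents,
    # matching InferInputTensor in the protobuf example above
    request["inputs"].append({
        "name": name,
        "shape": [len(values)],
        contents_field: list(values),
    })

request = make_request("1", "TestEndpoint")
add_input(request, "name_of_the_input__float", [1.3, 4.5, 5.0, 3.4], "fp64_contents")
# strings are transported as UTF-8 encoded bytes in bytes_contents
strings = ["test string 1", "test string 2"]
add_input(request, "name_of_the_input__string",
          [s.encode("UTF-8") for s in strings], "bytes_contents")

print(len(request["inputs"]))        # 2
print(request["inputs"][1]["shape"]) # [2]
```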
Receiving a response with Predict Protocol - Version 2 (Python)
When receiving a response, the client may follow these steps:
- Check if the response contains the outputs member.
- For each output, read the datatype attribute and process it accordingly:
# integers are stored in the int_contents member
if output.datatype == "INT32":
    ret_val_decoded = list(output.contents.int_contents)
# floats are stored in the fp64_contents member
elif output.datatype == "FP64":
    ret_val_decoded = list(output.contents.fp64_contents)
# booleans are stored in the bool_contents member
elif output.datatype == "BOOL":
    ret_val_decoded = list(output.contents.bool_contents)
# strings are stored in the bytes_contents member
elif output.datatype == "BYTE_STRING":
    ret_val = output.contents.bytes_contents
    ret_val_decoded = []
    while len(ret_val):
        # string values have to be decoded
        ret_val_decoded.append(ret_val.pop(0).decode())
# byte arrays are stored in the bytes_contents member
elif output.datatype == "BYTES":
    ret_val = output.contents.bytes_contents
    ret_val_decoded = []
    while len(ret_val):
        # pop instead of a list comprehension to free the memory on the fly
        ret_val_decoded.append(ret_val.pop(0))
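The dispatch above can be exercised without generated stubs. Below is a minimal sketch using a stand-in output object; the datatype names and the decoding rules mirror the example above, but the FakeOutput class and decode_output helper are illustrative, not part of SSF:

```python
# Stand-in for a Predict Protocol v2 output tensor; illustrative only.
class FakeOutput:
    def __init__(self, datatype, contents):
        self.datatype = datatype
        self.contents = contents  # a plain list stands in for the typed *_contents field

def decode_output(output):
    # mirrors the datatype dispatch shown above, using plain lists
    if output.datatype in ("INT32", "FP64", "BOOL", "BYTES"):
        return list(output.contents)
    if output.datatype == "BYTE_STRING":
        # strings travel as UTF-8 encoded bytes and must be decoded
        return [elt.decode("UTF-8") for elt in output.contents]
    raise ValueError(f"unsupported datatype: {output.datatype}")

print(decode_output(FakeOutput("INT32", [1, 4, 5, 3])))              # [1, 4, 5, 3]
print(decode_output(FakeOutput("BYTE_STRING", [b"test string 1"])))  # ['test string 1']
```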
Testing gRPC
To test your gRPC API, any client that supports reflection can be used. A quick way is to use grpcui, which can be run directly from its DockerHub image. When starting the client, make sure to match the port that was set when booting SSF:
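A hypothetical grpcui invocation via Docker (the port 8100 is an assumption; replace it with the port SSF was started on, and note that --network host works on Linux only):

```shell
docker run --rm --network host fullstorydev/grpcui -plaintext localhost:8100
```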