# Walkthrough
This walkthrough shows how to serve an application using SSF and deploy it on Gcore to use IPUs. As a prerequisite, follow the installation instructions to install and enable the Poplar SDK with Poptorch in your environment.
## Select a model
For this example we deploy a pre-trained question answering model from HuggingFace: `distilbert-base-cased-distilled-squad` will do the trick 🤗. The model itself can be imported from the `optimum-graphcore` library as an inference pipeline:
```python
from optimum.graphcore import pipeline

question_answerer = pipeline(
    "question-answering", model="distilbert-base-cased-distilled-squad"
)
```
Note that the input is a dictionary containing `question` and `context` strings. The output is also a dictionary containing an `answer` string, the `score`, and the `start` and `end` positions of the answer in the `context` string.
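To make the shapes of these dictionaries concrete, here is a small illustration (the `result` values below are hypothetical, not a guaranteed model output):

```python
# Input: a question plus the context to search for the answer.
inputs = {
    "question": "What colour is the ball?",
    "context": "The large green ball bounced down the twisty road",
}

# Output: a dict shaped like the following (values are illustrative only).
result = {"answer": "green", "score": 0.97, "start": 10, "end": 15}

# "start"/"end" are character offsets of the answer within the context string.
assert result["answer"] == inputs["context"][result["start"]:result["end"]]
```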
## Implement the application interface
To interface our model with SSF we need to implement the application interface `SSFApplicationInterface`. The following file, `my_app.py`, shows the code needed for this:
```python
from optimum.graphcore import pipeline
import logging

from ssf.application_interface.application import SSFApplicationInterface
from ssf.application_interface.utils import get_ipu_count
from ssf.application_interface.results import RESULT_OK, RESULT_APPLICATION_ERROR

logger = logging.getLogger()


class MyApplication(SSFApplicationInterface):
    def __init__(self):
        self.question_answerer: pipeline = None
        self.dummy_inputs_dict = {
            "question": "What is your name?",
            "context": "My name is Rob.",
        }

    def build(self) -> int:
        if get_ipu_count() >= 2:
            logger.info("Compiling model...")
            build_pipeline = pipeline(
                "question-answering", model="distilbert-base-cased-distilled-squad"
            )
            build_pipeline(self.dummy_inputs_dict)
        else:
            logger.info(
                "IPU requirements not met on this device, skipping compilation."
            )
        return RESULT_OK

    def startup(self) -> int:
        logger.info("App started")
        self.question_answerer = pipeline(
            "question-answering", model="distilbert-base-cased-distilled-squad"
        )
        self.question_answerer(self.dummy_inputs_dict)
        return RESULT_OK

    def request(self, params: dict, meta: dict) -> dict:
        result = self.question_answerer(params)
        return result

    def shutdown(self) -> int:
        return RESULT_OK

    def watchdog(self) -> int:
        result = self.question_answerer(self.dummy_inputs_dict)
        return RESULT_OK if result["answer"] == "Rob" else RESULT_APPLICATION_ERROR
```
Now let's explain this step-by-step. SSF will serve an instance of `MyApplication`. To implement the interface we need to define the five methods `build`, `startup`, `request`, `shutdown` and `watchdog`:
- In the `__init__` method we define a placeholder for the `question_answerer`. We also define a dummy input dictionary that will be used to test the pipeline.
```python
class MyApplication(SSFApplicationInterface):
    def __init__(self):
        self.question_answerer: pipeline = None
        self.dummy_inputs_dict = {
            "question": "What is your name?",
            "context": "My name is Rob.",
        }
```
- The `build` method is called when issuing `gc-ssf build`. It should contain any preliminary steps that we want to happen offline, before running the server. Since we are using IPUs, we can compile the model in advance to save time at server startup. To do that, we should call the `pipeline` object at least once (the first call triggers compilation). The IPU compilation generates a cache `exe_cache/`; we explain later how to package this cache alongside the server. HuggingFace libraries will also download and cache model weights. We may not have access to IPUs to run the `build` step outside of our deployment environment; we can check this with the utility function `get_ipu_count`. If we don't have access to IPUs, compilation is skipped and will instead be triggered by `startup` when deployed.

  Note: we use `return RESULT_OK` from the SSF return codes, which is equivalent to `return 0`.
```python
    def build(self) -> int:
        if get_ipu_count() >= 2:
            logger.info("Compiling model...")
            build_pipeline = pipeline(
                "question-answering", model="distilbert-base-cased-distilled-squad"
            )
            build_pipeline(self.dummy_inputs_dict)
        else:
            logger.info(
                "IPU requirements not met on this device, skipping compilation."
            )
        return RESULT_OK
```
- The `startup` method is called every time the server starts (when issuing `gc-ssf run`), so it can contain any warmup code we need. We instantiate and call the pipeline with dummy inputs: if the compilation cache exists, this first call will have the effect of attaching the model to available IPUs; if not, it will compile the model first. Since the lifespan of `self.question_answerer` is the same as `MyApplication`, the model will stay attached to the IPUs as long as the `MyApplication` instance is alive.
```python
    def startup(self) -> int:
        logger.info("App started")
        self.question_answerer = pipeline(
            "question-answering", model="distilbert-base-cased-distilled-squad"
        )
        self.question_answerer(self.dummy_inputs_dict)
        return RESULT_OK
```
- The `request` method is the function executed by our API call. It is important to understand what will be in the `params` dictionary (the inputs) and the returned dictionary (the output), as SSF will use them later to generate the API.
```python
    def request(self, params: dict, meta: dict) -> dict:
        result = self.question_answerer(params)
        return result
```
- Any resource freeing can be carried out in the `shutdown` method. Here there is nothing to free, so it simply returns `RESULT_OK`.
- Finally, `watchdog` will be called periodically by our server while no requests are being issued (see the option `--watchdog-ready-period` in the options). If it fails, the server will try to kill and restart `MyApplication`. As an example, we verify that we get an expected output from a known input:
```python
    def watchdog(self) -> int:
        result = self.question_answerer(self.dummy_inputs_dict)
        return RESULT_OK if result["answer"] == "Rob" else RESULT_APPLICATION_ERROR
```
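The interface logic can be smoke-tested off-device by swapping the pipeline for a stub. This is a minimal sketch: `StubPipeline`, `MyApplicationUnderTest` and the integer return-code constants are hypothetical stand-ins, not part of SSF or `optimum-graphcore`:

```python
RESULT_OK, RESULT_APPLICATION_ERROR = 0, 1  # stand-ins for the ssf return codes


class StubPipeline:
    """Mimics the question-answering pipeline for off-device testing."""

    def __call__(self, inputs):
        # Purely illustrative: "answers" with the last word of the context,
        # which matches the real model's answer for "My name is Rob."
        answer = inputs["context"].rstrip(".").split()[-1]
        return {"answer": answer, "score": 1.0, "start": 0, "end": 0}


class MyApplicationUnderTest:
    def __init__(self):
        self.question_answerer = StubPipeline()
        self.dummy_inputs_dict = {
            "question": "What is your name?",
            "context": "My name is Rob.",
        }

    def watchdog(self) -> int:
        # Same health check as in my_app.py above.
        result = self.question_answerer(self.dummy_inputs_dict)
        return RESULT_OK if result["answer"] == "Rob" else RESULT_APPLICATION_ERROR
```

Running `MyApplicationUnderTest().watchdog()` returns `RESULT_OK`, while feeding it a context that doesn't contain "Rob" would return `RESULT_APPLICATION_ERROR`.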
## Write SSF config
The SSF config is the point of contact between our application and SSF. It defines all the metadata and requirements (such as the Python libraries needed for our application, the base Docker image to use, and so on), and also defines our API.
The SSF config folder can be considered the primary application folder or context. All files and modules should be specified relative to the SSF config folder.
The current working directory will be set to the application module folder before SSF calls any of the application entry points (`build`, `request`, etc.).

Let's create `ssf_config.yaml`:
```yaml
# Copyright (c) 2023 Graphcore Ltd. All rights reserved.
ssf_version: 1.0.0

application:
  id: qa_api
  name: Question Answering API
  desc: A very simple QA API
  version: 1.0
  module: my_app.py
  ipus: 2
  trace: True
  artifacts: [exe_cache/*]

  dependencies:
    python: --find-links https://download.pytorch.org/whl/cpu/torch_stable.html torch==2.0.1+cpu, optimum-graphcore==0.7.1, tokenizers==0.11.1, numpy==1.23.5
    poplar: ["3.3.0"]
    poplar_wheels: poptorch

  package:
    inclusions: [exe_cache/*]
    exclusions: []
    docker:
      baseimage: "graphcore/pytorch:latest"

endpoints:
  - id: QA
    version: 1
    desc: Question answering model
    custom: ~
    inputs:
      - id: context
        type: String
        desc: Context
        example: "The large green ball bounced down the twisty road"
      - id: question
        type: String
        desc: Question
        example: "What colour is the ball?"
    outputs:
      - id: answer
        type: String
        desc: Answer in the text
      - id: score
        type: Float
        desc: Probability score
```
Now let's explain the main lines:

- Under `application`:

  `module` tells us where to find the interface that we have implemented.

  Since we are using IPUs, let's check the resources used. The distilbert-base IPU config indicates 2 IPUs. With the `ipus: 2` config line, SSF will verify that the system can acquire 2 IPUs when running the `run` or `test` command.

  Application dependencies must be declared. This includes the required Python packages, plus the Poplar SDK and wheels if IPUs will be used. Our model needs `optimum-graphcore` with some specific supporting packages, plus Poplar 3.3.0 with Poptorch:
```yaml
dependencies:
  python: --find-links https://download.pytorch.org/whl/cpu/torch_stable.html torch==2.0.1+cpu, optimum-graphcore==0.7.1, tokenizers==0.11.1, numpy==1.23.5
  poplar: ["3.3.0"]
  poplar_wheels: poptorch
```
The `package` section refers to the `gc-ssf package` command; here we can edit how we want SSF to build our container. We can include any files used by our application (and exclude some others); glob patterns are supported. Let's include everything generated in the compilation cache. Finally, we want to use Graphcore's base image with pre-installed PyTorch, so we can run `optimum-graphcore` without issue.
- Under `endpoints`:

  This is how SSF will generate our API. This endpoint's path will be `v1/QA`. Now let's remember our application's `request(self, params: dict, meta: dict)` method. We want to describe here the `inputs` dictionary using the names of the keys (`context`, `question`) and SSF types:
```yaml
inputs:
  - id: context
    type: String
    desc: A context
    example: "The large green ball bounced down the twisty road"
  - id: question
    type: String
    desc: The question
    example: "What colour is the ball?"
```
We also want to describe the `outputs`. Notice we are only selecting `answer` and `score` from our results, as we are not interested in returning the `start` and `end` keys:
```yaml
outputs:
  - id: answer
    type: String
    desc: Answer in the text
  - id: score
    type: Float
    desc: Probability score
```
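In effect, only the declared output ids are exposed by the endpoint. A minimal sketch of that selection, using a hypothetical full pipeline result:

```python
# Hypothetical full result, as returned by request().
result = {"answer": "green", "score": 0.97, "start": 10, "end": 15}

# Output ids declared under `outputs` in ssf_config.yaml.
declared_outputs = ["answer", "score"]

# Keep only the declared keys, mirroring what the endpoint returns.
response = {k: result[k] for k in declared_outputs}
```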
Our application is officially ready!
Now let's see what SSF can do.
## Use SSF
We should now have `ssf_config.yaml` and `my_app.py` together in one application folder (the `exe_cache/` directory will be generated alongside them by the build step).
### Running the application locally for development and testing
We can use the SSF commands `init build run` to run our application. The output should look similar to this:
```
demo@dev2:~/workspace/ssf$ gc-ssf init build run --config examples/walkthrough/ssf_config.yaml
2023-10-19 12:55:14,483 420941 INFO > Config /nethome/demo/workspace/ssf/examples/walkthrough/ssf_config.yaml (cli.py:639)
2023-10-19 12:55:14,490 420941 INFO application.license_name not specified. Defaulting to 'None' (load_config.py:375)
...
2023-10-19 12:56:22,004 420941 INFO > Lifespan start (server.py:82)
2023-10-19 12:56:22,004 420941 INFO Lifespan start : start application (threaded) (server.py:83)
2023-10-19 12:56:22,005 420941 INFO Application startup complete. (on.py:62)
2023-10-19 12:56:22,006 420941 INFO Uvicorn running on http://0.0.0.0:8100 (Press CTRL+C to quit) (server.py:219)
...
2023-10-19 12:56:44,676 420941 INFO Dispatcher ready (dispatcher.py:542)
2023-10-19 12:56:49,493 421639 INFO [0] Dispatcher polling application replica watchdog (dispatcher.py:242)
```
We can see the address and port on which the application end-points have been started: in this case localhost (0.0.0.0) and port 8100. We can also see when the dispatcher object, through which calls (requests) to the application are made, becomes ready. While the end-point is idle we will see repeated polls of the application watchdog; this is the SSF built-in watchdog feature. See watchdog for further details.
Use a browser to open the endpoint docs with the format `http://<address>/docs`, for example `http://0.0.0.0:8100/docs`.
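The endpoint can also be exercised from Python. A sketch, assuming the server is reachable on localhost and accepts the declared inputs as a JSON body (the exact parameter format may depend on your SSF config):

```python
import json
from urllib import request as urlrequest

payload = {
    "context": "The large green ball bounced down the twisty road",
    "question": "What colour is the ball?",
}

# Endpoint path v1/QA is generated from ssf_config.yaml.
req = urlrequest.Request(
    "http://0.0.0.0:8100/v1/QA",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urlrequest.urlopen(req)   # uncomment with the server running
# print(json.load(resp))           # expect "answer" and "score" keys
```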
Use CTRL-C to stop the running application.
Tip:
If you are using Visual Studio Code with a remote connection to your development host, you can use the port forwarding feature to add the served port (`8100` in this example) and browse the endpoint directly in your VS Code client using the built-in simple browser.
### Packaging and Deployment
Once we are satisfied the application is working correctly, we can package and deploy it.
We can use the SSF commands for several different scenarios. First we should decide which commands we want to run locally (on our current machine) and which commands will run on the remote (deployment) machine.
Let's look at a couple of examples.
#### Example 1 - deployment using an application-specific packaged image
In this first example we build our server locally and deploy its image via Docker Hub. The workflow can be summarised as follows:
- The `init` and `build` steps are run locally, to compile the model before packaging.
- The `package` step creates the container image locally, and `publish` pushes it to Docker Hub.
- Finally, `deploy` sends and executes the deployment script on our deployment target. In the previous step, the container image was packaged in such a way as to ensure it executes `run` when started on the remote machine.
Let's build our container with `gc-ssf init build package`. We use `--package-tag` to replace the tag from the config with our Docker username, since we will push the image to Docker Hub.
The output should look similar to this:
```
demo@dev2:~/workspace/ssf$ gc-ssf init build package --config examples/walkthrough/ssf_config.yaml --package-tag graphcore/cloudsolutions-dev:walkthrough_api
2023-10-19 10:11:17,670 380504 INFO > Config /nethome/demo/workspace/ssf/examples/walkthrough/ssf_config.yaml (cli.py:639)
2023-10-19 10:11:17,675 380504 INFO application.license_name not specified. Defaulting to 'None' (load_config.py:375)
2023-10-19 10:11:17,675 380504 INFO application.license_url not specified. Defaulting to 'None' (load_config.py:375)
2023-10-19 10:11:17,675 380504 INFO application.terms_of_service not specified. Defaulting to 'None' (load_config.py:375)
2023-10-19 10:11:17,675 380504 INFO application.startup_timeout not specified. Defaulting to '300' (load_config.py:375)
2023-10-19 10:11:17,675 380504 INFO application.package.name not specified. Defaulting to 'qa_api.1.0.tar.gz' (load_config.py:375)
2023-10-19 10:11:17,675 380504 INFO application.package.tag not specified. Defaulting to 'qa_api:1.0' (load_config.py:375)
2023-10-19 10:11:17,675 380504 INFO application.package.docker.run not specified. Defaulting to '' (load_config.py:375)
2023-10-19 10:11:17,675 380504 INFO application.max_batch_size not specified. Defaulting to '1' (load_config.py:375)
2023-10-19 10:11:17,675 380504 INFO application.syspaths not specified. Defaulting to '[]' (load_config.py:375)
2023-10-19 10:11:17,676 380504 INFO endpoints.0.http_param_format not specified. Defaulting to 'None' (load_config.py:375)
2023-10-19 10:11:17,676 380504 INFO endpoints.0.outputs.0.example not specified. Defaulting to 'None' (load_config.py:375)
2023-10-19 10:11:17,676 380504 INFO endpoints.0.outputs.1.example not specified. Defaulting to 'None' (load_config.py:375)
2023-10-19 10:11:17,676 380504 INFO Adding syspath /nethome/demo/workspace/ssf/examples/walkthrough (cli.py:683)
2023-10-19 10:11:17,676 380504 INFO > ==== Init ==== (init.py:17)
2023-10-19 10:11:17,676 380504 INFO > Cleaning endpoints (init.py:19)
2023-10-19 10:11:17,677 380504 INFO > Cleaning application (init.py:22)
2023-10-19 10:11:17,678 380504 INFO Clean /nethome/demo/workspace/ssf/examples/walkthrough/exe_cache/8218824126841776145.popef (init.py:36)
2023-10-19 10:11:17,679 380504 INFO > ==== Build ==== (build.py:19)
2023-10-19 10:11:17,679 380504 INFO > Generate_endpoints (build.py:26)
2023-10-19 10:11:17,679 380504 INFO loading module /nethome/demo/workspace/ssf/ssf/generate_endpoints_fastapi.py with module name generate_endpoints (utils.py:298)
2023-10-19 10:11:17,684 380504 INFO > Load application (build.py:29)
2023-10-19 10:11:17,684 380504 INFO Creating application main interface (application.py:160)
2023-10-19 10:11:17,684 380504 INFO Checking application dependencies (application.py:161)
2023-10-19 10:11:17,684 380504 INFO installing python packages git+https://github.com/huggingface/optimum-graphcore.git@97c11c3 (utils.py:276)
2023-10-19 10:11:26,230 380504 INFO Loading qa_api application main interface from /nethome/demo/workspace/ssf/examples/walkthrough/my_app.py with module id qa_api (application.py:168)
2023-10-19 10:11:26,230 380504 INFO loading module /nethome/demo/workspace/ssf/examples/walkthrough/my_app.py with module name qa_api (utils.py:298)
2023-10-19 10:11:27,765 380504 INFO Created a temporary directory at /tmp/tmpu3ter_oa (instantiator.py:21)
2023-10-19 10:11:27,766 380504 INFO Writing /tmp/tmpu3ter_oa/_remote_module_non_scriptable.py (instantiator.py:76)
2023-10-19 10:11:28,553 380504 INFO <module 'qa_api' from '/nethome/demo/workspace/ssf/examples/walkthrough/my_app.py'> (application.py:172)
2023-10-19 10:11:28,553 380504 INFO Found <class 'qa_api.MyApplication'>, MyApplication (application.py:225)
2023-10-19 10:11:28,553 380504 INFO instance=<qa_api.MyApplication object at 0x7f6c3ba5d3a0> (build.py:32)
2023-10-19 10:11:28,553 380504 INFO > Build application (build.py:34)
2023-10-19 10:11:28,626 380504 INFO Compiling model... (my_app.py:26)
No padding arguments specified, so padding to 384 by default. Inputs longer than 384 will be truncated.
Graph compilation: 100%|██████████████████████████████████████| 100/100 [00:36<00:00]
2023-10-19 10:12:15,724 380504 INFO > ==== Package ==== (package.py:51)
2023-10-19 10:12:15,724 380504 INFO > Packaging qa_api to /nethome/demo/workspace/ssf/.package/qa_api (package.py:71)
2023-10-19 10:12:15,725 380504 INFO > Package name qa_api.1.0.tar.gz (package.py:72)
2023-10-19 10:12:15,725 380504 INFO > Package tag graphcore/cloudsolutions-dev:walkthrough_api (package.py:73)
2023-10-19 10:12:15,806 380504 INFO > Package SSF from /nethome/demo/workspace/ssf/ssf (package.py:111)
2023-10-19 10:12:15,862 380504 INFO > Package Application from /nethome/demo/workspace/ssf/examples/walkthrough (package.py:169)
2023-10-19 10:12:15,960 380504 INFO > Package Endpoint files (package.py:185)
2023-10-19 10:12:15,962 380504 INFO > Gathering pip requirements (package.py:206)
2023-10-19 10:12:15,963 380504 INFO > Generate container image (package.py:245)
2023-10-19 10:12:15,963 380504 INFO application.package.docker_run not specified. Defaulting to '' (utils.py:67)
2023-10-19 10:14:31,414 380504 INFO > Package: (package.py:280)
2023-10-19 10:14:31,414 380504 INFO >   qa_api.1.0.tar.gz (from /nethome/demo/workspace/ssf/.package/qa_api/src) (package.py:281)
2023-10-19 10:14:31,414 380504 INFO >   Test run: 'cd /nethome/demo/workspace/ssf/.package/qa_api/src && ./run.sh' (package.py:282)
2023-10-19 10:14:31,414 380504 INFO > Docker: (package.py:284)
2023-10-19 10:14:31,414 380504 INFO >   Run: 'docker run --rm -d --network host --name qa_api graphcore/cloudsolutions-dev:walkthrough_api' (package.py:285)
2023-10-19 10:14:31,414 380504 INFO >   Run with IPU devices: 'gc-docker -- --rm -d --name qa_api graphcore/cloudsolutions-dev:walkthrough_api' (package.py:288)
2023-10-19 10:14:31,414 380504 INFO >   Logs: 'docker logs -f qa_api' (package.py:292)
2023-10-19 10:14:31,414 380504 INFO >   Stop: 'docker stop qa_api' (package.py:293)
2023-10-19 10:14:31,414 380504 INFO Exit with 0 (cli.py:739)
```
We should be able to see that a new `.package/` directory has been created; it contains the packaged application with the name `qa_api`. This is the packaged source used to build the Docker image. We can test or debug the packaged application source locally by moving to the `.package/qa_api/src` directory and running `./run.sh`. This is the same entry point that will be used when the application Docker image is deployed remotely.
We can also verify that our application image was created during the package step with docker, for example:
```
demo@dev2:~/workspace/ssf$ docker images
REPOSITORY                     TAG               IMAGE ID       CREATED         SIZE
graphcore/cloudsolutions-dev   walkthrough_api   9e301f5fb62f   7 minutes ago   3.57GB
```
For the next step, a login to a Docker Hub registry is necessary. SSF will log in to Docker temporarily when the `--docker-username` and `--docker-password` options are used. If you are already logged in with the correct account, you don't need these options for this step.
We can now publish our image on Docker Hub. Let's push it with the same tag as during the package step, using `--package-tag`:

```shell
gc-ssf publish --config ssf_config.yaml --package-tag <docker-username>/<repo-name> --docker-username <docker-username> --docker-password <token>
```
Finally, we can deploy the container to Gcore. We will assume that we have already set up a VM with at least 2 IPUs, with the IP address 123.456.789.0 and the default username "ubuntu". To access it we have a private key, which SSF needs in order to access the VM and deploy. To pass the key securely, we store it in an environment variable and use the option `--add-ssh-key`. For instance, we can set it from a file.
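Setting such a variable from a key file might look like this (the key file path here is hypothetical; use your own):

```shell
# Hypothetical path to the private key used for the Gcore VM.
KEY_FILE="$HOME/.ssh/gcore_vm_key"

# Read the key into an environment variable for --add-ssh-key.
if [ -f "$KEY_FILE" ]; then
    MY_SSH_KEY=$(cat "$KEY_FILE")
    export MY_SSH_KEY
fi
```

Note that `--add-ssh-key` is passed the *name* of the variable (`MY_SSH_KEY`), not the key itself.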
Now let's run the `deploy` command with the following set of options. We also pass our Docker token via the option `--docker-password`, which is needed to pull the image from our Docker Hub repo to the remote VM (this is not needed if you use a public repo).
```shell
gc-ssf deploy --config ssf_config.yaml --deploy-platform Gcore --port 8100 --deploy-gcore-target-address 123.456.789.0 --deploy-gcore-target-username ubuntu --docker-username <docker-username> --docker-password <token> --package-tag <username>/<repo-name>:latest --deploy-package
```
- Notice the use of `--deploy-package` to specify that we want to deploy the application-specific package that we published previously.
- You can combine the `--add-ssh-key` argument with other SSF commands, so we might choose to include it with `gc-ssf deploy` and use a single invocation of SSF to configure keys and deploy the packaged application image.
With this configuration, our API endpoint should be available at http://123.456.789.0:8100/v1/QA. Since it's using FastAPI, you can also test it with the Swagger UI under the path http://123.456.789.0:8100/docs.

Under the hood, the `deploy` command will simply run a script on the Gcore VM to pull our custom image from Docker Hub and run it.
It is valid to run `gc-ssf ... init build package publish` on a machine that supports the target (for example, one that has the required IPUs and Poplar SDK for the `build`) and then later deploy the published application from any client with `gc-ssf ... deploy`. In that case, the client from which `deploy` is issued doesn't strictly need IPUs or the Poplar SDK.
The following diagram summarises the operations of this first example.
#### Example 2 - deployment using the generic SSF image
Sometimes our local environment doesn't allow us to build containers, or we just want to experiment quickly. In this second example we won't build an application-specific container image, which makes a different workflow possible.
This is made possible by storing our model in a repository and using a pre-built SSF image. First we need to set up a remote repository for our model. For example, using a GitHub account we could do:
```shell
cd project_directory && git init
git add -A
git commit -m 'First commit'
git remote add origin git@github.com:your-username/project_directory.git
git push -u -f origin main
```
As in the first example, we register the VM SSH key locally via an environment variable and the `--add-ssh-key` option.
If you use a private repo, you will also need to allow your VM to clone from it. To do that, generate a GitHub deploy key for your repo (or an equivalent access-limited SSH key). Then pass it with the `deploy` command using an env variable (for example `MY_DEPLOY_KEY`) and `--add-ssh-key`.
Now let's use `deploy` targeting our git repo:

```shell
MY_DEPLOY_KEY=$(cat github_deploy_key) gc-ssf deploy --config 'git@github.com:your-username/project_directory.git|ssf_config.yaml' --port 8100 --deploy-platform Gcore --deploy-gcore-target-address 123.456.789.0 --deploy-gcore-target-username ubuntu --add-ssh-key MY_DEPLOY_KEY
```
- Notice this time we are not using `--deploy-package`, so SSF will deploy the default public generic SSF image. `gc-ssf --help` can be used to see the default SSF image used for deployment.
Under the hood, the `deploy` command will send and run a script on the Gcore VM that pulls the public SSF image from Docker Hub, then builds and runs it. The container entry point will clone our repo and issue the three commands `init build run`.

As in the first example, our API endpoint should be available at http://123.456.789.0:8100/v1/QA, and you can also test it with the Swagger UI under the path http://123.456.789.0:8100/docs.
The following diagram summarises the operations of this second example.
Note that we only deployed a Docker container on a Gcore VM. You can still SSH into your VM as usual and use the standard Docker commands, for example `docker container ls`, `docker logs ...`, `docker stop ...`.
NOTE:
This feature defaults to using the generic Graphcore-published SSF image corresponding to your local version of SSF (`gc-ssf`).
If this is not available for your current version of SSF, you can still use the feature by creating your own generic SSF image:

- Build the generic SSF image for your local version with `gc-ssf package --package-tag ssf` (this will package SSF without binding an application)
- Publish the resulting SSF image to your own Docker repository
- When deploying, add `--package-tag <published ssf image>` to deploy your published SSF image instead of the default SSF image
### Discussion: Example 1 vs Example 2
These examples have shown two different ways to deploy on the Gcore platform with SSF. Both serve your application behind the same API, but it's important to underline their differences.
#### Example 1 - deployment using an application-specific packaged image
This gives you more control:
By packaging your app in advance with SSF, you create your own custom Docker image, which you can then version via Docker Hub. This method can also have runtime advantages: by building and packaging some runtime-generated files in advance (such as pre-compiled IPU executables) you can save precious server startup time.
#### Example 2 - deployment using the generic SSF image
This is quicker but can have some runtime impact:
By deploying your model with the generic SSF image, you don't need to run Docker locally or worry about the packaging step, and your model can be versioned via git.
The server startup time might be affected, since the application `build` step will be triggered in the deployment environment before the server starts. Of course, you can still include cached files as part of the model repository, but depending on their size you might prefer to package your app in advance and follow Example 1, for instance if you have a very large model to compile.