Many applications can take advantage of GPU acceleration, in particular resource-intensive Machine Learning (ML) applications. The development time of such applications may vary based on the hardware of the machine we use for development. Containerization will facilitate development due to reproducibility and will make the setup easily transferable to other machines. Most importantly, a containerized application is easily deployable to platforms such as Amazon ECS, where it can take advantage of different hardware configurations.
In this tutorial, we discuss how to develop GPU-accelerated applications in containers locally and how to use Docker Compose to easily deploy them to the cloud (the Amazon ECS platform). The transition from the local environment to the cloud is straightforward: the GPU-accelerated application is packaged with all its dependencies in a Docker image and deployed in the same way regardless of the target environment.
In order to follow this tutorial, we need the following tools installed locally:
- Windows and macOS: install Docker Desktop
- Linux: install Docker Engine and Compose CLI
- To deploy to Amazon ECS: an AWS account
For deploying to a cloud platform, we rely on the new Docker Compose implementation embedded into the Docker CLI binary. Therefore, when targeting a cloud platform we are going to run `docker compose` commands instead of `docker-compose`. For local commands, both implementations of Docker Compose should work. If you find a missing feature that you use, report it on the issue tracker.
Keep in mind that what we want to showcase is how to structure and manage a GPU-accelerated application with Docker Compose, and how we can deploy it to the cloud. We do not focus on GPU programming or the AI/ML algorithms, but rather on how to structure and containerize such an application to facilitate portability, sharing and deployment.
For this tutorial, we rely on sample code provided in the TensorFlow documentation to simulate a GPU-accelerated translation service that we can orchestrate with Docker Compose. The original code is documented at https://www.tensorflow.org/tutorials/text/nmt_with_attention. For this exercise, we have reorganized the code so that we can easily manage it with Docker Compose.
This sample uses the TensorFlow platform, which can automatically use GPU devices if they are available on the host. Next, we will discuss how to organize this sample into services so that they are easy to containerize, and what the challenges are when we run such a resource-intensive application locally.
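As a quick check of this behaviour, the minimal sketch below lists the GPU devices that TensorFlow can see. It assumes TensorFlow 2.x is installed in the environment where it runs; the helper name `visible_gpus` is ours and is not part of the sample code. When TensorFlow is missing, the helper simply returns an empty list.

```python
def visible_gpus():
    """Return the GPU devices TensorFlow can see, or [] if there are none."""
    try:
        import tensorflow as tf  # assumption: TensorFlow 2.x is installed
    except ImportError:
        # TensorFlow is not available here, so no device can be reported.
        return []
    return list(tf.config.list_physical_devices("GPU"))


gpus = visible_gpus()
print(f"{len(gpus)} GPU(s) visible:", gpus)
```

On a machine without a GPU, or in a container that has no GPU access, the list is empty and TensorFlow runs on the CPU.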
Note: The sample code to use throughout this tutorial can be found here. It needs to be downloaded locally to exercise the commands we are going to discuss.
1. Local environment
Let’s assume we want to build and deploy a service that can translate simple sentences to a language of our choice. For such a service, we need to train an ML model to translate from one language to another and then use this model to translate new inputs.
We choose to separate the phases of the ML process into two different Compose services:
- A training service that trains a model to translate between two languages (includes the data gathering, preprocessing and all the necessary steps before the actual training process).
- A translation service that loads a model and uses it to `translate` a sentence.
This structure is defined in the docker-compose.dev.yaml file from the downloaded sample application, which has the following content:

    services:
      training:
        build: backend
        command: python model.py
        volumes:
          - models:/checkpoints
      translator:
        build: backend
        volumes:
          - models:/checkpoints
        ports:
          - 5000:5000
    volumes:
      models:
We want the training service to train a model to translate from English to French and to save this model to a named volume models that is shared between the two services. The translator service has a published port to allow us to query it easily.
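To illustrate how the shared volume connects the two services, here is a hypothetical sketch of how a translator could look for a saved model. The directory layout (`/checkpoints/eng-fra`) and the message text are taken from the command output shown later in this tutorial; the function names are our own, and the sample application may implement this differently.

```python
from pathlib import Path
from typing import Optional

NO_MODEL_MESSAGE = "No trained model found / training may be in progress..."


def checkpoint_dir(root: str, lang: str) -> Optional[Path]:
    """Return the checkpoint directory for a language pair.

    Returns None when the directory is missing or still empty, which is
    the state of the shared volume before the first checkpoint is saved.
    """
    candidate = Path(root) / lang
    if candidate.is_dir() and any(candidate.iterdir()):
        return candidate
    return None


def describe(root: str = "/checkpoints", lang: str = "eng-fra") -> str:
    """Report whether a model is available on the shared volume."""
    found = checkpoint_dir(root, lang)
    return f"model available in {found}" if found else NO_MODEL_MESSAGE
```

Because both services mount the same named volume at /checkpoints, anything the training service writes there becomes visible to the translator without copying files between containers.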
Deploy locally with Docker Compose
The reason for starting with the simplified compose file is that it can be deployed locally whether a GPU is present or not. We will see later how to add the GPU resource reservation to it.
Before deploying, rename the docker-compose.dev.yaml to docker-compose.yaml to avoid setting the file path with the flag -f for every compose command.
To deploy the Compose file, all we need to do is open a terminal, go to its base directory and run:
    $ docker compose up
    The new 'docker compose' command is currently experimental.
    To provide feedback or request new features please open issues at https://github.com/docker/compose-cli
    [+] Running 4/0
     ⠿ Network "gpu_default"   Created    0.0s
     ⠿ Volume "gpu_models"     Created    0.0s
     ⠿ gpu_translator_1        Created    0.0s
     ⠿ gpu_training_1          Created    0.0s
    Attaching to gpu_training_1, gpu_translator_1
    ...
    translator_1 | * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
    ... HTTP/1.1" 200 -
    training_1 | Epoch 1 Batch 0 Loss 3.3540
    training_1 | Epoch 1 Batch 100 Loss 1.6044
    training_1 | Epoch 1 Batch 200 Loss 1.3441
    training_1 | Epoch 1 Batch 300 Loss 1.1679
    training_1 | Epoch 1 Loss 1.4679
    training_1 | Time taken for 1 epoch 218.06381964683533 sec
    training_1 |
    training_1 | Epoch 2 Batch 0 Loss 0.9957
    training_1 | Epoch 2 Batch 100 Loss 1.0288
    training_1 | Epoch 2 Batch 200 Loss 0.8737
    training_1 | Epoch 2 Batch 300 Loss 0.8971
    training_1 | Epoch 2 Loss 0.9668
    training_1 | Time taken for 1 epoch 211.0763041973114 sec
    ...
    training_1 | Checkpoints saved in /checkpoints/eng-fra
    training_1 | Requested translator service to reload its model, response status: 200
    translator_1 | 172.22.0.2 - - [18/Dec/2020 10:23:46] "GET /reload?lang=eng-fra
Docker Compose deploys a container for each service and attaches us to their logs, which allows us to follow the progress of the training service.
Every 10 cycles (epochs), the training service requests the translator to reload its model from the last checkpoint. If the translator is queried before the first training phase (10 cycles) is completed, we should get the following message:

    $ curl -d "text=hello" localhost:5000/
    No trained model found / training may be in progress...
From the logs, we can see that each training cycle is resource-intensive and can take a long time (depending on the parameter setup in the ML algorithm).
The training service runs continuously and checkpoints the model periodically to a named volume shared between the two services.
    $ docker ps -a
    CONTAINER ID   IMAGE            COMMAND                    CREATED          STATUS          PORTS                    NAMES
    f11fc947a90a   gpu_training     "python model.py"          14 minutes ago   Up 54 minutes                            gpu_training_1
    baf147fbdf18   gpu_translator   "/bin/bash -c 'pytho..."   14 minutes ago   Up 54 minutes   0.0.0.0:5000->5000/tcp   gpu_translator_1
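The reload cycle described above can be sketched as follows. This is an illustration only, not the sample's actual code: the `/reload?lang=eng-fra` path comes from the translator log line shown earlier, while the host name `translator` (the Compose service name) and the helper names are assumptions.

```python
from urllib.parse import urlencode

RELOAD_EVERY = 10  # epochs between reload requests, as described in the text


def should_request_reload(epoch: int, every: int = RELOAD_EVERY) -> bool:
    """True when a 1-based epoch number completes a block of `every` epochs."""
    return epoch > 0 and epoch % every == 0


def reload_url(lang: str, host: str = "translator", port: int = 5000) -> str:
    """Build the reload URL for a language pair.

    The default host assumes the translator is reachable under its
    Compose service name on the application's network.
    """
    query = urlencode({"lang": lang})
    return f"http://{host}:{port}/reload?{query}"


# Show which epochs of a 20-epoch run would trigger a reload request.
for epoch in range(1, 21):
    if should_request_reload(epoch):
        print(f"epoch {epoch}: GET {reload_url('eng-fra')}")
```

Keeping the notification separate from the training step means the translator keeps serving the previous checkpoint until a newer one has been saved.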
We can now query the translator service which uses the trained model:
    $ curl -d "text=hello" localhost:5000/
    salut !
    $ curl -d "text=I want a vacation" localhost:5000/
    je veux une autre .
    $ curl -d "text=I am a student" localhost:5000/
    je suis etudiant .
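The same queries can be issued from Python with the standard library. This is a minimal sketch, assuming the translator accepts the form-encoded `text` field that `curl -d` sends; the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DEFAULT_URL = "http://localhost:5000/"


def build_translate_request(text: str, base_url: str = DEFAULT_URL) -> Request:
    """Build a form-encoded POST equivalent to `curl -d "text=..."`."""
    body = urlencode({"text": text}).encode("utf-8")
    return Request(base_url, data=body, method="POST")


def translate(text: str, base_url: str = DEFAULT_URL) -> str:
    """Send the request and return the response body as text."""
    request = build_translate_request(text, base_url)
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip()
```

With the translator running locally, `translate("hello")` sends the same request as the first `curl` command above.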
Keep in mind that, for this exercise, we are not concerned about the accuracy of the translation but how to set up the entire process following a service approach that will make it easy to deploy with Docker Compose.
During development, we may have to re-run the training process and evaluate it each time we tweak the algorithm. This is a time-consuming task if we do not use development machines built for high performance.
An alternative is to use on-demand cloud resources. For example, we could use cloud instances hosting GPU devices to run the resource-intensive components of our application. Running our sample application on a machine with access to a GPU will automatically switch to train the model on the GPU. This will speed up the process and significantly reduce the development time.
The first step to deploy this application to some faster cloud instances is to pack it as a Docker image and push it to Docker Hub, from where we can access it from cloud instances.
Build and Push images to Docker Hub
During the deployment with `compose up`, the application is packaged as a Docker image, which is then used to create the containers. We need to tag the built images and push them to Docker Hub.
A simple way to do this is by setting the image property for services in the Compose file. Previously, we had only set the build property for our services and had no image defined. Docker Compose requires at least one of these properties to be defined in order to deploy the application.
We set the image property following the pattern <account>/<name>:<tag>
    services:
      training:
        image: myhubuser/gpudemo
        build: backend
        command: python model.py
        volumes:
          - models:/checkpoints
      translator:
        image: myhubuser/gpudemo
        build: backend
        volumes:
          - models:/checkpoints
        ports:
          - 5000:5000
    volumes:
      models:
To build the images run:
    $ docker compose build
    The new 'docker compose' command is currently experimental.
    To provide feedback or request new features please open issues at https://github.com/docker/compose-cli
    [+] Building 1.0s (10/10) FINISHED
     => [internal] load build definition from Dockerfile    0.0s
     => => transferring dockerfile: 206B
    ...
     => exporting to image    0.8s
     => => exporting layers    0.8s
     => => writing image sha256:b53b564ee0f1986f6a9108b2df0d810f28bfb2094743d8564f2667066acf3d1f    0.0s
     => => naming to docker.io/myhubuser/gpudemo

    $ docker images | grep gpudemo
    myhubuser/gpudemo   latest   b53b564ee0f1   2 minutes ago   5.83GB
Notice the image has been named according to what we set in the Compose file.
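To make the naming pattern concrete, the short sketch below splits a reference of the form <account>/<name>:<tag> into its parts and applies the default tag `latest` when none is given, as in the `docker images` output above. It is deliberately simplified (our own helper, not a Docker API) and does not handle registry host names or digests.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    account: str
    name: str
    tag: str


def parse_image(reference: str) -> ImageRef:
    """Split '<account>/<name>[:<tag>]' into parts; the tag defaults to 'latest'."""
    repository, _, tag = reference.partition(":")
    account, separator, name = repository.partition("/")
    if not separator or not account or not name:
        raise ValueError(f"expected <account>/<name>[:<tag>], got {reference!r}")
    return ImageRef(account, name, tag or "latest")


print(parse_image("myhubuser/gpudemo"))
```

Setting an explicit tag in the Compose file, rather than relying on the default, makes it easier to tell which build a cloud deployment is running.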
Before pushing this image to Docker Hub, we need to make sure we are logged in. For this we run:
    $ docker login
    ...
    Login Succeeded
Push the image we built:
    $ docker compose push
    Pushing training (myhubuser/gpudemo:latest)...
    The push refers to repository [docker.io/myhubuser/gpudemo]
    c765bf51c513: Pushed
    9ccf81c8f6e0: Layer already exists
    ...
    latest: digest: sha256:c40a3ca7388d5f322a23408e06bddf14b7242f9baf7fbe7201944780a028df76 size: 4306
The image pushed is public unless we set it to private in Docker Hub’s repository settings. The Docker documentation covers this in more detail.
With the image stored in a public image registry, we will now look at how we can use it to deploy our application on Amazon ECS and how we can use GPUs to accelerate it.
2. Deploy to Amazon ECS for GPU-acceleration
To deploy the application to Amazon ECS, we need to have credentials for accessing an AWS account and to have Docker CLI set to target the platform.
Let’s assume we have a valid set of AWS credentials that we can use to connect to AWS services. We now need to create an ECS Docker context to redirect all Docker CLI commands to Amazon ECS.
Create an ECS context
To create an ECS context run the following command:
    $ docker context create ecs cloud
    ? Create a Docker context using: [Use arrows to move, type to filter]
    > AWS environment variables
      An existing AWS profile
      A new AWS profile
This prompts users with 3 options, depending on their familiarity with the AWS credentials setup.
For this exercise, to skip the details of AWS credential setup, we choose the first option. This requires us to have AWS_ACCESS_KEY and AWS_SECRET_KEY set in our environment when running Docker commands that target Amazon ECS.
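Before running commands against ECS, we can check that the environment is set up as expected. This is a minimal sketch, assuming the two variable names used in this tutorial's commands (`AWS_ACCESS_KEY` and `AWS_SECRET_KEY`); the helper name is ours.

```python
import os
from typing import List, Mapping

# Variable names as used in this tutorial's commands.
REQUIRED_VARIABLES = ("AWS_ACCESS_KEY", "AWS_SECRET_KEY")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("AWS credentials found in the environment.")
```

The check only confirms that the variables are present; it does not verify that the credentials are valid.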
We can now run Docker commands and set the context flag for all commands targeting the platform, or we can switch it to be the context in use to avoid setting the flag on each command.
Set Docker CLI to target ECS
Set the context we created previously as the context in use by running:
    $ docker context use cloud
    $ docker context ls
    NAME      TYPE   DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT   ORCHESTRATOR
    default   moby   Current DOCKER_HOST based configuration   unix:///var/run/docker.sock                         swarm
    cloud *   ecs    credentials read from environment
Starting from here, all the subsequent Docker commands are going to target Amazon ECS. To switch back to the default context targeting the local environment, we can run the following:
$ docker context use default
For the following commands, we keep the ECS context as the current context in use. We can now run a command to check that we can successfully access ECS.
    $ AWS_ACCESS_KEY="*****" AWS_SECRET_KEY="******" docker compose ls
    NAME    STATUS
Before deploying the application to Amazon ECS, let’s have a look at how to update the Compose file to request GPU access for the training service. This blog post describes a way to define GPU reservations. In the next section, we cover the new format, which is supported by both the local Compose implementation and the legacy one.
Define GPU reservation in the Compose file
TensorFlow can make use of NVIDIA GPUs with CUDA compute capabilities to speed up computations. To reserve NVIDIA GPUs, we edit the docker-compose.yaml that we defined previously and add the deploy property under the training service as follows:
    ...
      training:
        image: myhubuser/gpudemo
        command: python model.py eng-fra
        volumes:
          - models:/checkpoints
        deploy:
          resources:
            reservations:
              memory: 32Gb
              devices:
                - driver: nvidia
                  count: 2
                  capabilities: [gpu]
    ...
For this example we defined a reservation of 2 NVIDIA GPUs and 32GB memory dedicated to the container. We can tweak these parameters according to the resources of the machine we target for deployment. If our local dev machine hosts an NVIDIA GPU, we can tweak the reservation accordingly and deploy the Compose file locally. Ensure you have installed the NVIDIA container runtime and set up the Docker Engine to use it before deploying the Compose file.
In the next part, we focus on how to make use of GPU cloud instances to run our sample application.
Note: We assume the image we pushed to Docker Hub is public. If so, there is no need to authenticate in order to pull it (unless we exceed the pull rate limit). For images that need to be kept private, we need to define the x-aws-pull_credentials property with a reference to the credentials to use for authentication. Details on how to set it can be found in the documentation.
Deploy to Amazon ECS
Export the AWS credentials to avoid setting them for every command.
    $ export AWS_ACCESS_KEY="*****"
    $ export AWS_SECRET_KEY="******"
When deploying the Compose file, Docker Compose will also reserve an EC2 instance with GPU capabilities that satisfies the reservation parameters. In the example we provided, we ask to reserve an instance with 32GB of memory and 2 NVIDIA GPUs. Docker Compose matches this reservation with an instance type that satisfies this requirement. Before setting the reservation property in the Compose file, we recommend checking the Amazon GPU instance types and setting the reservation accordingly. Ensure you are targeting an Amazon region that offers such instances.
WARNING: Aside from ECS containers, we will have a `g4dn.12xlarge` EC2 instance reserved. Before deploying to the cloud, check the Amazon documentation for the resource cost this will incur.
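To get a feel for how a reservation maps to an instance type, here is an illustrative sketch that picks the smallest entry from a small hand-written catalog of g4dn sizes. This is not the selection logic Docker Compose uses, and the catalog is an assumption that should be checked against the Amazon documentation before relying on it.

```python
from typing import List, NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    gpus: int
    memory_gib: int


# Hand-written subset of g4dn sizes, for illustration only.
G4DN: List[InstanceType] = [
    InstanceType("g4dn.xlarge", 1, 16),
    InstanceType("g4dn.2xlarge", 1, 32),
    InstanceType("g4dn.4xlarge", 1, 64),
    InstanceType("g4dn.12xlarge", 4, 192),
]


def smallest_match(gpus: int, memory_gib: int,
                   catalog: List[InstanceType] = G4DN) -> Optional[str]:
    """Return the smallest listed instance meeting both minimums, or None."""
    fits = [i for i in catalog if i.gpus >= gpus and i.memory_gib >= memory_gib]
    if not fits:
        return None
    return min(fits, key=lambda i: (i.gpus, i.memory_gib)).name


# The reservation from the Compose file: 2 GPUs and 32GB of memory.
print(smallest_match(gpus=2, memory_gib=32))
```

In this catalog, asking for two GPUs skips every single-GPU size, which is why a modest memory reservation can still land on a large and costly instance.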
To deploy the application, we run the same command as in the local environment.
    $ docker compose up
    [+] Running 29/29
     ⠿ gpu                 CreateComplete    423.0s
     ⠿ LoadBalancer        CreateComplete    152.0s
     ⠿ ModelsAccessPoint   CreateComplete      6.0s
     ⠿ DefaultNetwork      CreateComplete      5.0s
    ...
     ⠿ TranslatorService   CreateComplete    205.0s
     ⠿ TrainingService     CreateComplete    161.0s
Check the status of the services:
    $ docker compose ps
    NAME                                        SERVICE      STATE     PORTS
    task/gpu/3311e295b9954859b4c4576511776593   training     Running
    task/gpu/78e1d482a70e47549237ada1c20cc04d   translator   Running   gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000->5000/tcp
Query the exposed translator endpoint. We notice the same behaviour as in the local deployment (the model reload has not been triggered yet by the training service).
    $ curl -d "text=hello" gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000/
    No trained model found / training may be in progress...
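Check the logs for the GPU devices that TensorFlow detected. We can easily identify the 2 GPU devices we reserved, and the per-epoch times show that training is roughly 5 to 6 times faster than our CPU-based local training (about 32 to 43 seconds per epoch here, compared with about 211 to 218 seconds locally).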
    $ docker compose logs
    ...
    training | 2021-01-08 20:50:51.595796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
    training | pciBusID: 0000:00:1c.0 name: Tesla T4 computeCapability: 7.5
    training | coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
    ...
    training | 2021-01-08 20:50:51.596743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
    training | pciBusID: 0000:00:1d.0 name: Tesla T4 computeCapability: 7.5
    training | coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
    ...
    training | Epoch 1 Batch 300 Loss 1.2269
    training | Epoch 1 Loss 1.4794
    training | Time taken for 1 epoch 42.98397183418274 sec
    ...
    training | Epoch 2 Loss 0.9750
    training | Time taken for 1 epoch 35.13995909690857 sec
    ...
    training | Epoch 9 Batch 0 Loss 0.1375
    ...
    training | Epoch 9 Loss 0.1558
    training | Time taken for 1 epoch 32.444278955459595 sec
    ...
    training | Epoch 10 Batch 300 Loss 0.1663
    training | Epoch 10 Loss 0.1383
    training | Time taken for 1 epoch 35.29659080505371 sec
    training | Checkpoints saved in /checkpoints/eng-fra
    training | Requested translator service to reload its model, response status: 200.
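The per-epoch timings can be compared directly. The sketch below, which is our own helper and not part of the sample, extracts the `Time taken for 1 epoch` values from the local and ECS log excerpts shown in this tutorial and computes the ratio of their averages.

```python
import re
from statistics import mean
from typing import List

EPOCH_TIME = re.compile(r"Time taken for 1 epoch ([0-9.]+) sec")


def epoch_times(log_text: str) -> List[float]:
    """Extract every per-epoch duration, in seconds, from training log text."""
    return [float(value) for value in EPOCH_TIME.findall(log_text)]


def speedup(cpu_log: str, gpu_log: str) -> float:
    """Ratio of the mean CPU epoch time to the mean GPU epoch time."""
    return mean(epoch_times(cpu_log)) / mean(epoch_times(gpu_log))


# Timing lines copied from the local (CPU) and ECS (GPU) logs in this tutorial.
CPU_LOG = """
training_1 | Time taken for 1 epoch 218.06381964683533 sec
training_1 | Time taken for 1 epoch 211.0763041973114 sec
"""
GPU_LOG = """
training | Time taken for 1 epoch 42.98397183418274 sec
training | Time taken for 1 epoch 35.13995909690857 sec
training | Time taken for 1 epoch 32.444278955459595 sec
training | Time taken for 1 epoch 35.29659080505371 sec
"""

print(f"speedup: {speedup(CPU_LOG, GPU_LOG):.1f}x")
```

Only a handful of epochs appear in the excerpts, so the result is a rough indication rather than a benchmark.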
The training service runs continuously and triggers the model reload on the translation service every 10 cycles (epochs). Once the translation service has been notified at least once, we can stop and remove the training service and release the GPU instances at any time we choose.
We can easily do this by removing the service from the Compose file:
    services:
      translator:
        image: myhubuser/gpudemo
        build: backend
        volumes:
          - models:/checkpoints
        ports:
          - 5000:5000
    volumes:
      models:
and then run `docker compose up` again to update the running application. This will apply the changes and remove the training service.
    $ docker compose up
    [+] Running 0/0
     ⠋ gpu                 UpdateInProgress User Initiated
     ⠋ LoadBalancer        CreateComplete
     ⠋ ModelsAccessPoint   CreateComplete
    ...
     ⠋ Cluster             CreateComplete
     ⠋ TranslatorService   CreateComplete
We can list the services running to see the training service has been removed and we only have the translator one:
    $ docker compose ps
    NAME                                        SERVICE      STATE     PORTS
    task/gpu/78e1d482a70e47549237ada1c20cc04d   translator   Running   gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000->5000/tcp
Query the translator:
    $ curl -d "text=hello" gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000/
    salut !
To remove the application from Amazon ECS run:
$ docker compose down
We discussed how to set up a resource-intensive ML application to make it easily deployable in different environments with Docker Compose. We have exercised how to define the use of GPUs in a Compose file and how to deploy it on Amazon ECS.