How to Deploy GPU-Accelerated Applications on Amazon ECS with Docker Compose


Feb 16 2021

Many applications can take advantage of GPU acceleration, in particular resource-intensive Machine Learning (ML) applications. The development time of such applications may vary based on the hardware of the machine we use for development. Containerization will facilitate development due to reproducibility and will make the setup easily transferable to other machines. Most importantly, a containerized application is easily deployable to platforms such as Amazon ECS, where it can take advantage of different hardware configurations.

In this tutorial, we discuss how to develop GPU-accelerated applications in containers locally and how to use Docker Compose to easily deploy them to the cloud (the Amazon ECS platform). We make the transition from the local environment to a cloud effortless, the GPU-accelerated application being packaged with all its dependencies in a Docker image, and deployed in the same way regardless of the target environment.

Requirements

In order to follow this tutorial, we need the following tools installed locally:

For deploying to a cloud platform, we rely on the new Docker Compose implementation embedded into the Docker CLI binary. Therefore, when targeting a cloud platform we are going to run docker compose commands instead of docker-compose. For local commands, both implementations of Docker Compose should work. If you find a missing feature that you use, report it on the issue tracker.

Sample application

Keep in mind that what we want to showcase is how to structure and manage a GPU accelerated application with Docker Compose, and how we can deploy it to the cloud. We do not focus on GPU programming or the AI/ML algorithms, but rather on how to structure and containerize such an application to facilitate portability, sharing and deployment.

For this tutorial, we rely on sample code provided in the Tensorflow documentation, to simulate a GPU-accelerated translation service that we can orchestrate with Docker Compose. The original code can be found documented at  https://www.tensorflow.org/tutorials/text/nmt_with_attention. For this exercise, we have reorganized the code such that we can easily manage it with Docker Compose.

This sample uses the Tensorflow platform which can automatically use GPU devices if available on the host. Next, we will discuss how to organize this sample in services to containerize them easily and what the challenges are when we locally run such a resource-intensive application.

Note: The sample code to use throughout this tutorial can be found here. It needs to be downloaded locally to exercise the commands we are going to discuss.

1. Local environment

Let’s assume we want to build and deploy a service that can translate simple sentences to a language of our choice. For such a service, we need to train an ML model to translate from one language to another and then use this model to translate new inputs. 

Application setup

We choose to separate the phases of the ML process in two different Compose services:

  • A training service that trains a model to translate between two languages (includes the data gathering, preprocessing and all the necessary steps before the actual training process).
  • A translation service that loads a model and uses it to `translate` a sentence.

This structure is defined in the docker-compose.dev.yaml from the downloaded sample application which has the following content:

docker-compose.yml

services:

 training:
   build: backend
   command: python model.py
   volumes:
     - models:/checkpoints

 translator:
   build: backend
   volumes:
     - models:/checkpoints
   ports:
     - 5000:5000

volumes:
 models:

We want the training service to train a model to translate from English to French and to save this model to a named volume models that is shared between the two services. The translator service has a published port to allow us to query it easily.

Deploy locally with Docker Compose

The reason for starting with the simplified compose file is that it can be deployed locally whether a GPU is present or not. We will see later how to add the GPU resource reservation to it.

Before deploying, rename the docker-compose.dev.yaml to docker-compose.yaml to avoid setting the file path with the flag -f for every compose command.

To deploy the Compose file, all we need to do is open a terminal, go to its base directory and run:

$ docker compose up
The new 'docker compose' command is currently experimental.
To provide feedback or request new features please open
issues at https://github.com/docker/compose-cli
[+] Running 4/0
 ⠿ Network "gpu_default"  Created                               0.0s
 ⠿ Volume "gpu_models"    Created                               0.0s
 ⠿ gpu_translator_1       Created                               0.0s
 ⠿ gpu_training_1         Created                               0.0s
Attaching to gpu_training_1, gpu_translator_1
...
translator_1  |  * Running on http://0.0.0.0:5000/ (Press CTRL+C
to quit)
...
HTTP/1.1" 200 -
training_1    | Epoch 1 Batch 0 Loss 3.3540
training_1    | Epoch 1 Batch 100 Loss 1.6044
training_1    | Epoch 1 Batch 200 Loss 1.3441
training_1    | Epoch 1 Batch 300 Loss 1.1679
training_1    | Epoch 1 Loss 1.4679
training_1    | Time taken for 1 epoch 218.06381964683533 sec
training_1    | 
training_1    | Epoch 2 Batch 0 Loss 0.9957
training_1    | Epoch 2 Batch 100 Loss 1.0288
training_1    | Epoch 2 Batch 200 Loss 0.8737
training_1    | Epoch 2 Batch 300 Loss 0.8971
training_1    | Epoch 2 Loss 0.9668
training_1    | Time taken for 1 epoch 211.0763041973114 sec
...
training_1    | Checkpoints saved in /checkpoints/eng-fra
training_1    | Requested translator service to reload its model,
response status: 200
translator_1  | 172.22.0.2 - - [18/Dec/2020 10:23:46] 
"GET /reload?lang=eng-fra 

Docker Compose deploys a container for each service and attaches us to their logs which allows us to follow the progress of the training service.

Every 10 cycles (epochs), the training service requests the translator to reload its model from the last checkpoint. If the translator is queried before the first training phase (10 cycles) is completed, we should get the following message. 

$ curl -d "text=hello" localhost:5000/
No trained model found / training may be in progress...

From the logs, we can see that each training cycle is resource-intensive and may take very long (depending on parameter setup in the ML algorithm).

The training service runs continuously and checkpoints the model periodically to a named volume shared between the two services. 

$ docker ps -a
CONTAINER ID   IMAGE            COMMAND                  CREATED          STATUS                     PORTS                    NAMES
f11fc947a90a   gpu_training     "python model.py"        14 minutes ago   Up 54 minutes                   gpu_training_1                           
baf147fbdf18   gpu_translator   "/bin/bash -c 'pytho..." 14 minutes ago   Up 54 minutes              0.0.0.0:5000->5000/tcp   gpu_translator_1

We can now query the translator service which uses the trained model:

$ $ curl -d "text=hello" localhost:5000/
salut !
$ curl -d "text=I want a vacation" localhost:5000/
je veux une autre . 
$ curl -d "text=I am a student" localhost:5000/
je suis etudiant .

Keep in mind that, for this exercise, we are not concerned about the accuracy of the translation but how to set up the entire process following a service approach that will make it easy to deploy with Docker Compose.

During development, we may have to re-run the training process and evaluate it each time we tweak the algorithm. This is a very time consuming task if we do not use development machines built for high performance.

An alternative is to use on-demand cloud resources. For example, we could use cloud instances hosting GPU devices to run the resource-intensive components of our application. Running our sample application on a machine with access to a GPU will automatically switch to train the model on the GPU. This will speed up the process and significantly reduce the development time.

The first step to deploy this application to some faster cloud instances is to pack it as a Docker image and push it to Docker Hub, from where we can access it from cloud instances.

Build and Push images to Docker Hub

During the deployment with compose up, the application is packed as a Docker image which is then used to create the containers. We need to tag the built images and push them to Docker Hub.

 A simple way to do this is by setting the image property for services in the Compose file. Previously, we had only set the build property for our services, however we had no image defined. Docker Compose requires at least one of these properties to be defined in order to deploy the application.

We set the image property following the pattern <account>/<name>:<tag> where the tag is optional (default to ‘latest’). We take for example a Docker Hub account ID myhubuser and the application name gpudemo. Edit the compose file and set the image property for the two services as below:

docker-compose.yml

services:

 training:
   image: myhubuser/gpudemo
   build: backend
   command: python model.py
   volumes:
     - models:/checkpoints

 translator:
   image: myhubuser/gpudemo
   build: backend
   volumes:
     - models:/checkpoints
   ports:
     - 5000:5000

volumes:
 models:

To build the images run:

$ docker compose build
The new 'docker compose' command is currently experimental. To
provide feedback or request new features please open issues
 at https://github.com/docker/compose-cli
[+] Building 1.0s (10/10) FINISHED 
 => [internal] load build definition from Dockerfile
0.0s 
=> => transferring dockerfile: 206B
...
 => exporting to image
0.8s 
 => => exporting layers    
0.8s  
 => => writing image sha256:b53b564ee0f1986f6a9108b2df0d810f28bfb209
4743d8564f2667066acf3d1f
0.0s
 => => naming to docker.io/myhubuser/gpudemo

$ docker images | grep gpudemo
myhubuser/gpudemo  latest   b53b564ee0f1   2 minutes ago 
  5.83GB   

Notice the image has been named according to what we set in the Compose file.

Before pushing this image to Docker Hub, we need to make sure we are logged in. For this we run:

$ docker login
...
Login Succeeded

Push the image we built:

$ docker compose push
Pushing training (myhubuser/gpudemo:latest)...
The push refers to repository [docker.io/myhubuser/gpudemo]
c765bf51c513: Pushed
9ccf81c8f6e0: Layer already exists
...
latest: digest: sha256:c40a3ca7388d5f322a23408e06bddf14b7242f9baf7fb
e7201944780a028df76 size: 4306

The image pushed is public unless we set it to private in Docker Hub’s repository settings. The Docker documentation covers this in more detail.

With the image stored in a public image registry, we will look now at how we can use it to deploy our application on Amazon ECS and how we can use GPUs to accelerate it.

2. Deploy to Amazon ECS for GPU-acceleration

To deploy the application to Amazon ECS, we need to have credentials for accessing an AWS account and to have Docker CLI set to target the platform.

Let’s assume we have a valid set of AWS credentials that we can use to connect to AWS services.  We need now to create an ECS Docker context to redirect all Docker CLI commands to Amazon ECS.

Create an ECS context

To create an ECS context run the following command:

$ docker context create ecs cloud
? Create a Docker context using:  [Use arrows to move, type
to filter]
> AWS environment variables 
  An existing AWS profile
  A new AWS profile

This prompts users with 3 options, depending on their familiarity with the AWS credentials setup.

For this exercise, to skip the details of  AWS credential setup, we choose the first option. This requires us to have the AWS_ACCESS_KEY and AWS_SECRET_KEY set in our environment,  when running Docker commands that target Amazon ECS.

We can now run Docker commands and set the context flag for all commands targeting the platform, or we can switch it to be the context in use to avoid setting the flag on each command.

Set Docker CLI to target ECS

Set the context we created previously as the context in use by running:

$ docker context use cloud

$ docker context ls
NAME                TYPE                DESCRIPTION                               DOCKER ENDPOINT               KUBERNETES ENDPOINT   ORCHESTRATOR
default             moby                Current DOCKER_HOST based configuration   unix:///var/run/docker.sock                         swarm
cloud *             ecs                 credentials read from environment

Starting from here, all the subsequent Docker commands are going to target Amazon ECS. To switch back to the default context targeting the local environment, we can run the following:

$ docker context use default

For the following commands, we keep ECS context as the current context in use. We can now run a command to check we can successfully access ECS.

$ AWS_ACCESS_KEY="*****" AWS_SECRET_KEY="******" docker compose ls
NAME                                STATUS 

Before deploying the application to Amazon ECS, let’s have a look at how to update the Compose file to request GPU access for the training service. This blog post describes a way to define GPU reservations. In the next section, we cover the new format supported in the local compose and the legacy docker-compose.

Define GPU reservation in the Compose file

Tensorflow can make use of NVIDIA GPUs with CUDA compute capabilities to speed up computations. To reserve NVIDIA GPUs,  we edit the docker-compose.yaml  that we defined previously and add the deploy property under the training service as follows:

...
 training:
   image: myhubuser/gpudemo
   command: python model.py eng-fra
   volumes:
     - models:/checkpoints
   deploy:
     resources:
       reservations:
         memory:32Gb
         devices:
         - driver: nvidia
           count: 2
           capabilities: [gpu]
...

For this example we defined a reservation of 2 NVIDIA GPUs and 32GB memory dedicated to the container. We can tweak these parameters according to the resources of the machine we target for deployment. If our local dev machine hosts an NVIDIA GPU, we can tweak the reservation accordingly and deploy the Compose file locally.  Ensure you have installed the NVIDIA container runtime and set up the Docker Engine to use it before deploying the Compose file.

We focus in the next part on how to make use of GPU cloud instances to run our sample application.

Note: We assume the image we pushed to Docker Hub is public. If so, there is no need to authenticate in order to pull it (unless we exceed the pull rate limit). For images that need to be kept private, we need to define the x-aws-pull_credentials property with a reference to the credentials to use for authentication. Details on how to set it can be found in the documentation.

Deploy to Amazon ECS

Export the AWS credentials to avoid setting them for every command.

$ export AWS_ACCESS_KEY="*****" 
$ export AWS_SECRET_KEY="******"

When deploying the Compose file, Docker Compose will also reserve an EC2 instance with GPU capabilities that satisfies the reservation parameters. In the example we provided, we ask to reserve an instance with 32GB and 2 Nvidia GPUs. Docker Compose matches this reservation with the instance that satisfies this requirement. Before setting the reservation property in the Compose file, we recommend to check the Amazon GPU instance types and set your reservation accordingly. Ensure you are targeting an Amazon region that contains such instances.

WARNING: Aside from ECS containers, we will have a `g4dn.12xlarge` EC2 instance reserved. Before deploying to the cloud, check the Amazon documentation for the resource cost this will incur.

To deploy the application, we run the same command as in the local environment.

$ docker compose up     
[+] Running 29/29
 ⠿ gpu                 CreateComplete          423.0s  
 ⠿ LoadBalancer        CreateComplete          152.0s
 ⠿ ModelsAccessPoint   CreateComplete            6.0s
 ⠿ DefaultNetwork      CreateComplete            5.0s
...
 ⠿ TranslatorService   CreateComplete          205.0s
 ⠿ TrainingService     CreateComplete          161.0s

Check the status of the services:

$ docker compose ps
NAME                                        SERVICE             STATE               PORTS
task/gpu/3311e295b9954859b4c4576511776593   training            Running             
task/gpu/78e1d482a70e47549237ada1c20cc04d   translator          Running             gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000->5000/tcp

Query the exposed translator endpoint. We notice the same behaviour as in the local deployment (the model reload has not been triggered yet by the training service).

$ curl -d "text=hello" gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000/
No trained model found / training may be in progress...

Check the logs for the GPU device’s tensorflow detected. We can easily identify the 2 GPU devices we reserved and how the training is almost 10X faster than our CPU-based local training.

$ docker compose logs
...
training    | 2021-01-08 20:50:51.595796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
training    | pciBusID: 0000:00:1c.0 name: Tesla T4 computeCapability: 7.5
training    | coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
...
training    | 2021-01-08 20:50:51.596743: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties: 
training    | pciBusID: 0000:00:1d.0 name: Tesla T4 computeCapability: 7.5
training    | coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
...

training      | Epoch 1 Batch 300 Loss 1.2269
training      | Epoch 1 Loss 1.4794
training      | Time taken for 1 epoch 42.98397183418274 sec
...
training      | Epoch 2 Loss 0.9750
training      | Time taken for 1 epoch 35.13995909690857 sec
...
training      | Epoch 9 Batch 0 Loss 0.1375
...
training      | Epoch 9 Loss 0.1558
training      | Time taken for 1 epoch 32.444278955459595 sec
...
training      | Epoch 10 Batch 300 Loss 0.1663
training      | Epoch 10 Loss 0.1383
training      | Time taken for 1 epoch 35.29659080505371 sec
training      | Checkpoints saved in /checkpoints/eng-fra
training      | Requested translator service to reload its model, response status: 200.

The training service runs continuously and triggers the model reload on the translation service every 10 cycles (epochs). Once the translation service has been notified at least once, we can stop and remove the training service and release the GPU instances at any time we choose. 

We can easily do this by removing the service from the Compose file:

services:
 translator:
   image: myhubuser/gpudemo
   build: backend
   volumes:
     - models:/checkpoints
   ports:
     - 5000:5000
volumes:
 models:

and then run docker compose up again to update the running application. This will apply the changes and remove the training service.

$ docker compose up      
[+] Running 0/0
 ⠋ gpu                  UpdateInProgress User Initiated    
 ⠋ LoadBalancer         CreateComplete      
 ⠋ ModelsAccessPoint    CreateComplete     
...
 ⠋ Cluster              CreateComplete     
 ⠋ TranslatorService    CreateComplete   

We can list the services running to see the training service has been removed and we only have the translator one:

$ docker compose ps
NAME                                        SERVICE             STATE               PORTS
task/gpu/78e1d482a70e47549237ada1c20cc04d   translator          Running             gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000->5000/tcp

Query the translator:

$ curl -d "text=hello" gpu-LoadBal-6UL1B4L7OZB1-d2f05c385ceb31e2.elb.eu-west-3.amazonaws.com:5000/
salut ! 

To remove the application from Amazon ECS run:

$ docker compose down

Summary

We discussed how to setup a resource-intensive ML application to make it easily deployable in different environments with Docker Compose. We have exercised how to define the use of GPUs in a Compose file and how to deploy it on Amazon ECS.

Resources: