Depend on Docker for Kubeflow

Run Kubeflow natively on Docker Desktop for Mac or Windows

This is a guest post by Alex Iankoulski, Docker Captain and full stack software and infrastructure architect at Shell New Energies. The views expressed here are his own and are neither opposed nor endorsed by Shell or Docker.

In this post, I will show you how to use Docker Desktop for Mac or Windows to run Kubeflow. To make this easier, I used my Depend on Docker project, which you can find on GitHub.

Rationale

Even though we are experiencing a tectonic shift of development workflows in the cloud era towards hosted and remote environments, a substantial amount of work and experimentation still happens on developers’ local machines. The ability to scale down allows us to mimic a cloud deployment locally and enables us to play, learn quickly, and make changes in a safe, isolated environment. A good example of this rationale is provided by Kubeflow and MiniKF.

Overview

Since Kubeflow was first released by Google in 2018, adoption has increased significantly, particularly in the data science world for orchestration of machine learning pipelines. There are various ways to deploy Kubeflow on both desktops and servers, as described in its Getting Started guide. However, the desktop deployments for Mac and Windows rely on running virtual machines using Vagrant and VirtualBox. If you do not wish to install Vagrant and VirtualBox on your Mac or PC but would still like to run Kubeflow, then you can simply depend on Docker! This article will show you how to deploy Kubeflow natively on Docker Desktop.

Setup

Prerequisites

Kubeflow has a hard dependency on Kubernetes and the Docker runtime. The easiest way to satisfy both of these requirements on Mac or Windows is to install Docker Desktop (version 2.1.x.x or higher). In the settings of Docker Desktop, navigate to the Kubernetes tab and check “Enable Kubernetes”:

Enabling kubernetes in docker desktop by selecting it under settings.
Fig. 1 – Kubernetes Settings in Docker Desktop

Enabling the Kubernetes feature in Docker Desktop creates a single-node Kubernetes cluster on your local machine.
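
One quick way to confirm that the local cluster is up is to count the nodes that report a Ready status. The sketch below is only an illustration: count_ready_nodes is a hypothetical helper, and the usage comments assume that kubectl is on your PATH and that the context is named docker-desktop, which may differ between Docker Desktop versions.

```shell
# Hypothetical helper: count nodes whose STATUS column is exactly "Ready".
# Reads the output of `kubectl get nodes --no-headers` from stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (context name is an assumption):
#   kubectl config use-context docker-desktop
#   kubectl get nodes --no-headers | count_ready_nodes
```

On a healthy Docker Desktop cluster the count should be 1, since the cluster has a single node.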

This article offers a detailed walkthrough of setting up Kubeflow on Docker Desktop for Mac. Deploying Kubeflow on Docker Desktop for Windows using Linux containers requires two additional prerequisites: 

  1. Linux shell – to run the bash commands from the Kubeflow installation instructions 
  2. kfctl and kubectl CLIs – to initialize, generate, and apply the Kubeflow deployment

The easiest way to satisfy both of these dependencies is to run a Linux container that has the kfctl and kubectl utilities. A Depend on Docker project was created for this purpose. To start a bash shell with the two CLIs available, just execute:

docker run -it --rm -v <kube_config_folder_path>:/root/.kube iankoulski/kfctl bash

The remaining setup steps for both Mac and Windows are the same.

Resource Requirements

The instructions for deployment of Kubeflow on a pre-existing Kubernetes cluster specify the following resource requirements:

  • 4 vCPUs
  • 50 GB storage
  • 12 GB memory

The settings in Docker Desktop need to be adjusted to accommodate these requirements as shown below.

Setting the resource limits available to docker engine in docker desktop.
Fig. 2 – CPU and Memory settings in Docker Desktop
Configuring the disk image size in docker desktop.
Fig. 3 – Disk image size setting in Docker Desktop

Note that the settings are adjusted to more than the minimum required resources to accommodate system containers and other applications that may be running on the local machine.
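
To double-check the settings from the command line, you can compare what the Docker engine reports against the minimums listed above. This is a rough sketch under stated assumptions: meets_kubeflow_minimums is a hypothetical helper, 12 GB is interpreted as 12 * 1024^3 bytes, and the memory that Docker reports is usually somewhat lower than the value chosen in the settings, which is one more reason to configure more than the minimum.

```shell
# Hypothetical helper: succeed if the given CPU count and memory (in bytes)
# meet the 4 vCPU / 12 GB minimums (12 GB assumed to mean 12 * 1024^3 bytes).
meets_kubeflow_minimums() {
  cpus=$1
  mem_bytes=$2
  min_cpus=4
  min_mem_bytes=$((12 * 1024 * 1024 * 1024))
  [ "$cpus" -ge "$min_cpus" ] && [ "$mem_bytes" -ge "$min_mem_bytes" ]
}

# Usage with a running Docker daemon:
#   set -- $(docker info --format '{{.NCPU}} {{.MemTotal}}')
#   if meets_kubeflow_minimums "$1" "$2"; then echo "OK"; else echo "Increase resources"; fi
```

Disk image size is not covered by this check and still needs to be verified in the Docker Desktop settings.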

Deployment

We will follow instructions for the kfctl_k8s_istio configuration.

  1. Download your preferred version from the release archive:
    curl -L -o kfctl_v0.6.2_darwin.tar.gz https://github.com/kubeflow/kubeflow/releases/download/v0.6.2/kfctl_darwin.tar.gz
  2. Extract the archive:
    tar -xvf kfctl_v0.6.2_darwin.tar.gz
  3. Set environment variables:
    export PATH=$PATH:$(pwd)
    export KFAPP=localkf
    export CONFIG=https://raw.githubusercontent.com/kubeflow/kubeflow/v0.6-branch/bootstrap/config/kfctl_k8s_istio.0.6.2.yaml

  4. Initialize deployment:
    kfctl init ${KFAPP} --config=${CONFIG}
    cd ${KFAPP}
    kfctl generate all -V

Note: The above instructions are for Kubeflow release 0.6.2 and are meant to be used as an example. Other releases may have slightly different archive filenames, environment variable names and values, and kfctl commands. These are available in the release-specific deployment instructions.
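
Because later steps depend on the variables set in step 3, it can help to confirm they are present before initializing the deployment. The following is a minimal sketch; missing_kf_vars is a hypothetical helper and only checks the two variable names used in this article.

```shell
# Hypothetical helper: print the names of the step 3 variables
# that are unset or empty (prints an empty line if none are missing).
missing_kf_vars() {
  missing=""
  [ -n "${KFAPP:-}" ] || missing="$missing KFAPP"
  [ -n "${CONFIG:-}" ] || missing="$missing CONFIG"
  printf '%s\n' "${missing# }"
}

# Usage before running kfctl init:
#   missing=$(missing_kf_vars)
#   [ -z "$missing" ] || echo "Set these first: $missing"
#   command -v kfctl >/dev/null || echo "kfctl is not on PATH"
```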

  5. Pre-pull container images (optional)

To facilitate the deployment of Kubeflow locally, we can pre-pull all required Docker images. When the container images are already present on the machine, the memory usage of Docker Desktop stays low. Pulling all images at the time of deployment may cause large spikes in memory utilization and can cause the Docker daemon to run out of resources. Pre-pulling images is especially helpful when running Kubeflow on a laptop with 16 GB of memory.

To pre-pull all container images, execute the following one-line script in your $KFAPP/kustomize folder:

for i in $(grep -R image: . | cut -d ':' -f 3,4 | sed -e 's/ //' -e 's/^"//' -e 's/"$//' | sort -u); do echo "Pulling $i"; docker pull "$i"; done
The cli output when you pre-pull kubeflow container images.
Fig. 4 – Pre-pulling Kubeflow container images

Depending on your Internet connection, this could take several minutes to complete. Even if Docker Desktop runs out of resources, restarting it and running the script again will resume pulling the remaining images from where you left off. 

If you are using the kfctl container on Windows, you may wish to modify the one-line script above so that it saves the docker pull commands to a file, and then execute that file from your preferred Docker shell.
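
One possible shape for that modification is sketched below. It reuses the same grep, cut, and sed pipeline as the one-line script, but writes one docker pull command per unique image to a file instead of pulling immediately. The helper name write_pull_script and the output filename are hypothetical, and like the one-liner it assumes that file paths and image references contain no extra colons (for example, no registry port).

```shell
# Hypothetical helper: find "image:" entries under a directory and write
# one "docker pull <image>" line per unique image to the output file.
write_pull_script() {
  src_dir=$1
  out_file=$2
  grep -R 'image:' "$src_dir" \
    | cut -d ':' -f 3,4 \
    | sed -e 's/ //' -e 's/^"//' -e 's/"$//' \
    | sort -u \
    | sed -e 's/^/docker pull /' > "$out_file"
}

# Usage from the $KFAPP/kustomize folder inside the kfctl container
# (write the file outside the scanned folder so it is not scanned itself):
#   write_pull_script . ../pull-images.sh
```

The resulting file can then be copied out of the container, or written to a mounted folder, and run from a shell on the host that has access to the Docker CLI.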

  6. Apply Kubeflow deployment to Kubernetes:
    cd ${KFAPP}
    kfctl apply all -V
The deployment output and kubeflow pods when running in docker desktop.
Fig. 5 – Deployment output and Kubeflow pods running in Docker Desktop, as listed by ‘kubectl get pods --all-namespaces’.

Note: An existing deployment can be removed by executing “kfctl delete all -V”
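
After the deployment is applied, the pods can take a while to start. As a rough illustration, the sketch below counts pods that are not yet in a Running or Completed state, based on the column layout of kubectl get pods --all-namespaces, where STATUS is the fourth column. The helper name count_pending_pods is hypothetical.

```shell
# Hypothetical helper: count pods that are neither Running nor Completed.
# Reads `kubectl get pods --all-namespaces --no-headers` output from stdin,
# where STATUS is the fourth column.
count_pending_pods() {
  awk '$4 != "Running" && $4 != "Completed" { n++ } END { print n + 0 }'
}

# Usage against a live cluster: poll until every pod has started.
#   until [ "$(kubectl get pods --all-namespaces --no-headers | count_pending_pods)" -eq 0 ]; do
#     sleep 10
#   done
```

Note that an empty listing also yields a count of zero, so this loop is only meaningful once kubectl can reach the cluster and the pods have been created.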

  7. Determine the Kubeflow endpoint

To determine the endpoint, list all services in the istio-system namespace:
kubectl get svc -n istio-system

The istio ingress gateway service cli output.
Fig. 6 – Istio Ingress Gateway service.

Kubeflow is exposed through the Istio ingress gateway service, whose default HTTP port (80) is mapped to NodePort 31380. To access Kubeflow, open http://127.0.0.1:31380
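
If the node port in your deployment differs, it can be read from the PORT(S) column of the service listing, which uses the form port:nodePort/protocol. The sketch below extracts the node port for a given service port. The helper name node_port_for is hypothetical, the sample value is illustrative apart from the 80:31380 mapping described above, and the service name istio-ingressgateway in the usage comments is an assumption based on the usual Istio naming.

```shell
# Hypothetical helper: given a service port and a PORT(S) column value such as
# "80:31380/TCP,443:31390/TCP", print the node port mapped to that service port.
node_port_for() {
  printf '%s\n' "$2" | tr ',' '\n' | awk -F'[:/]' -v p="$1" '$1 == p { print $2 }'
}

# Example with an illustrative PORT(S) value:
port=$(node_port_for 80 '15020:31811/TCP,80:31380/TCP,443:31390/TCP')
echo "http://127.0.0.1:${port}"

# Against a live cluster, kubectl can also return the value directly
# (service name is an assumption):
#   kubectl get svc istio-ingressgateway -n istio-system \
#     -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'
```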

Using Kubeflow

The Kubeflow central dashboard is now accessible:

The kubeflow central dashboard.
Fig. 7 – Kubeflow dashboard

We can run one of the sample pipelines that is included in Kubeflow. Select Pipelines, then Experiments, and choose Conditional expression (or just click the [Sample] Basic – Conditional expression link on the dashboard screen). 

The kubeflow conditional execution pipeline view.
Fig. 8 – Conditional execution pipeline

Next, click the +Create run button, enter a name (e.g. conditional-execution-test), choose an experiment, and then click Start to initiate the run. Navigate to your pipeline by selecting it from the list of runs.

The kubeflow conditional execution pipeline run.
Fig. 9 – Conditional execution pipeline run

The completed pipeline run looks similar to Fig. 9 above. Due to the random nature of the coin flip in this pipeline, your actual output is likely to be different. Select a node in the graph to review various assets associated with that node, including its logs.

Conclusion

Docker Desktop enables you to easily run container applications on your local machine, including ones that require a Kubernetes cluster. Kubeflow is typically deployed to larger clusters, either in the cloud or on-premises. In this article, we’ve demonstrated how to deploy and use Kubeflow locally on Docker Desktop.

References

  1. Docker Desktop
  2. About Kubeflow
  3. MiniKF Rationale
  4. Kubernetes
  5. Kubeflow Getting Started
  6. Vagrant
  7. VirtualBox
  8. Kubeflow deployment instructions
  9. Depend on Docker project
  10. kfctl container image

Credits

I’d like to thank the following people for their help with this post and related topics:

  • Yannis Zarkadas, Arrikto 
  • Constantinos Venetsanopoulos, Arrikto
  • Josh Bottum, Arrikto
  • Fabio Nonato de Paula, Shell
  • Jenny Burcio, Docker
  • David Aronchick, Microsoft
  • Stephen Turner, Docker
  • David Friedlander, Docker
