Docker Model Runner on the new NVIDIA DGX Spark: a new paradigm for developing AI locally

Posted on October 13, 2025

We’re thrilled to bring NVIDIA DGX™ Spark support to Docker Model Runner. The new NVIDIA DGX Spark delivers incredible performance, and Docker Model Runner makes it accessible. With Model Runner, you can easily run and iterate on larger models right on your local machine, using the same intuitive Docker experience you already trust.

In this post, we’ll show how DGX Spark and Docker Model Runner work together to make local model development faster and simpler, covering the unboxing experience, how to set up Model Runner, and how to use it in real-world developer workflows.

What is NVIDIA DGX Spark?

NVIDIA DGX Spark is the newest member of the DGX family: a compact, workstation-class AI system, powered by the Grace Blackwell GB10 Superchip, that delivers incredible performance for local model development. Designed for researchers and developers, it makes prototyping, fine-tuning, and serving large models fast and effortless, all without relying on the cloud.

Here at Docker, we were fortunate to get a preproduction version of the DGX Spark. And yes, it’s every bit as impressive in person as it looks in NVIDIA’s launch materials.

Why Run Local AI Models and How Docker Model Runner and NVIDIA DGX Spark Make It Easy 

Many of us at Docker and across the broader developer community are experimenting with local AI models. Running locally has clear advantages:

  • Data privacy and control: no external API calls; everything stays on your machine
  • Offline availability: work from anywhere, even when you’re disconnected
  • Ease of customization: experiment with prompts, adapters, or fine-tuned variants without relying on remote infrastructure

But there are also familiar tradeoffs:

  • Local GPUs and memory can be limiting for large models
  • Setting up CUDA, runtimes, and dependencies often eats time
  • Managing security and isolation for AI workloads can be complex

This is where DGX Spark and Docker Model Runner (DMR) shine. DMR provides an easy and secure way to run AI models in a sandboxed environment, fully integrated with Docker Desktop or Docker Engine. When combined with the DGX Spark’s NVIDIA AI software stack and large 128GB unified memory, you get the best of both worlds: plug-and-play GPU acceleration and Docker-level simplicity.

Unboxing NVIDIA DGX Spark

The device arrived well-packaged, sleek, and surprisingly small, looking more like a mini workstation than a server.

Setup was refreshingly straightforward: plug in power, network, and peripherals, then boot into NVIDIA DGX OS, which comes with NVIDIA drivers, CUDA, and the NVIDIA AI software stack preinstalled.

Once on the network, enabling SSH access makes it easy to integrate the Spark into your existing workflow.

This way, the DGX Spark becomes an AI co-processor for your everyday development environment, augmenting, not replacing, your primary machine.

Getting Started with Docker Model Runner on NVIDIA DGX Spark

Installing Docker Model Runner on the DGX Spark is simple and can be done in a matter of minutes.

1. Verify Docker CE is Installed

DGX OS comes with Docker Engine (CE) preinstalled. Confirm you have it:

docker version

If it’s missing or outdated, install it by following the regular Ubuntu installation instructions.
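
If you do need to set it up from scratch, the following is a sketch of Docker’s standard Ubuntu repository and engine install steps (DGX OS is Ubuntu-based; package names and paths follow Docker’s generic Ubuntu documentation). The same apt repository also serves the Model Runner plugin used in the next step.

# Add Docker's official GPG key
sudo apt-get update
sudo apt-get install -y ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Register Docker's apt repository for the current Ubuntu release
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install the engine and CLI
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io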

2. Install the Docker Model CLI Plugin

The Model Runner CLI is distributed as a Debian package via Docker’s apt repository. Once the repository is configured (see the Ubuntu installation steps above), install it with the following commands:

sudo apt-get update
sudo apt-get install docker-model-plugin

Or use Docker’s handy installation script:

curl -fsSL https://get.docker.com | sudo bash

You can confirm it’s installed with:

docker model version

3. Pull and Run a Model

Now that the plugin is installed, let’s pull a model from the Docker Hub AI Catalog. For example, the Qwen 3 Coder model:

docker model pull ai/qwen3-coder

The Model Runner container will automatically expose an OpenAI-compatible endpoint at:

http://localhost:12434/engines/v1

You can verify it’s live with a quick test:

# Test via API
curl http://localhost:12434/engines/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"ai/qwen3-coder","messages":[{"role":"user","content":"Hello!"}]}'

# Or via CLI
docker model run ai/qwen3-coder

GPUs are allocated to the Model Runner container via nvidia-container-runtime, and Model Runner takes advantage of any available GPUs automatically. To see GPU usage:

nvidia-smi
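
If you want to double-check the GPU plumbing, a couple of quick commands help (a small sketch; if the NVIDIA Container Toolkit is set up correctly, the Runtimes listing should include an nvidia entry):

# Confirm the NVIDIA runtime is registered with Docker Engine
docker info --format '{{json .Runtimes}}'

# Refresh GPU utilization every second while requests are running
watch -n 1 nvidia-smi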

4. Architecture Overview

Here’s what’s happening under the hood:

[ DGX Spark Hardware (GPU + Grace CPU) ]
                   │
       (NVIDIA Container Runtime)
                   │
         [ Docker Engine (CE) ]
                   │
    [ Docker Model Runner Container ]
                   │
      OpenAI-compatible API :12434

The NVIDIA Container Runtime bridges the NVIDIA GB10 Grace Blackwell Superchip drivers and Docker Engine, so containers can access CUDA directly. Docker Model Runner then runs inside its own container, managing the model lifecycle and providing the standard OpenAI API endpoint. (For more info on Model Runner architecture, see this blog).

From a developer’s perspective, you interact with models much like any other Dockerized service: docker model pull, list, inspect, and run all work out of the box.
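
For instance, a typical session looks like this (a quick sketch using the model pulled above):

docker model pull ai/qwen3-coder      # download the model from Docker Hub
docker model list                     # list models available locally
docker model inspect ai/qwen3-coder   # show model metadata
docker model run ai/qwen3-coder       # start an interactive chat session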

Using Local Models in Your Daily Workflows

If you’re using a laptop or desktop as your primary machine, the DGX Spark can act as your remote model host. With a few SSH tunnels, you can both access the Model Runner API and monitor GPU utilization via the DGX dashboard, all from your local workstation.

1. Forward the DMR Port (for Model Access)

To access the DGX Spark over SSH, first set up an SSH server on it:

sudo apt install openssh-server
sudo systemctl enable --now ssh

Then run the following command on your local machine to access Model Runner remotely. Replace user with the username you configured when you first booted the DGX Spark, and replace dgx-spark.local with the DGX Spark’s IP address on your local network or a hostname configured in /etc/hosts.

ssh -N -L localhost:12435:localhost:12434 user@dgx-spark.local


This forwards the Model Runner API from the DGX Spark to your local machine.
Now, in your IDE, CLI tool, or app that expects an OpenAI-compatible API, just point it to:

http://localhost:12435/engines/v1

Set the model name (e.g. ai/qwen3-coder) and you’re ready to use local inference seamlessly.
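
Before wiring up a tool, it’s worth a quick sanity check from your laptop that the tunnel is live. A sketch, assuming the forward from the step above is running and ai/qwen3-coder has been pulled on the DGX Spark:

# List the models exposed through the tunnel
curl http://localhost:12435/engines/v1/models

# Send a test chat completion over the forwarded port
curl http://localhost:12435/engines/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"ai/qwen3-coder","messages":[{"role":"user","content":"Hello from my laptop!"}]}'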

2. Forward the DGX Dashboard Port (for Monitoring)

The DGX Spark exposes a lightweight browser dashboard showing real-time GPU, memory, and thermal stats, usually served locally at:

http://localhost:11000

You can forward it through the same SSH session or a separate one:

ssh -N -L localhost:11000:localhost:11000 user@dgx-spark.local

Then open http://localhost:11000 in your browser on your main workstation to monitor the DGX Spark performance while running your models.
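
If you prefer a single session, both forwards can also ride on one SSH command (same user and hostname placeholders as above):

# Forward the Model Runner API and the DGX dashboard in one tunnel
ssh -N \
  -L localhost:12435:localhost:12434 \
  -L localhost:11000:localhost:11000 \
  user@dgx-spark.local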

This combination makes the DGX Spark feel like a remote, GPU-powered extension of your development environment. Your IDE or tools still live on your laptop, while model execution and resource-heavy workloads happen securely on the Spark.

Example application: Configuring OpenCode with Qwen3-Coder


Let’s make this concrete.

Suppose you use OpenCode, an open-source, terminal-based AI coding agent.

Once your DGX Spark is running Docker Model Runner with ai/qwen3-coder pulled and the port is forwarded, you can configure OpenCode by adding the following to ~/.config/opencode/opencode.json (the baseURL is DMR’s OpenAI-compatible endpoint, reached through the forwarded port):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12435/engines/v1"   // DMR’s OpenAI-compatible base
      },
      "models": {
        "ai/qwen3-coder": { "name": "Qwen3 Coder" }
      }
    }
  },
  "model": "ai/qwen3-coder"
}


Now run opencode and select Qwen3 Coder with the /models command.

That’s it! Completions and chat requests will be routed through Docker Model Runner on your DGX Spark, meaning Qwen3-Coder now powers your agentic development experience locally.

You can verify that the model is running by opening http://localhost:11000 (the DGX dashboard) to watch GPU utilization in real time while coding.

This setup lets you:

  • Keep your laptop light while leveraging the DGX Spark GPUs
  • Experiment with custom or fine-tuned models through DMR
  • Stay fully within your local environment for privacy and cost-control

Summary

Running Docker Model Runner on the NVIDIA DGX Spark makes it remarkably easy to turn powerful local hardware into a seamless extension of your everyday Docker workflow.

  • You install one plugin and use familiar Docker commands (docker model pull, docker model run).
  • You get full GPU acceleration through NVIDIA’s container runtime.
  • You can forward both the model API and the monitoring dashboard to your main workstation for effortless development and visibility.

This setup bridges the gap between developer productivity and AI infrastructure, giving you the speed, privacy, and flexibility of local execution with the reliability and simplicity Docker provides.

As local model workloads continue to grow, the DGX Spark + Docker Model Runner combo represents a practical, developer-friendly way to bring serious AI compute to your desk — no data center or cloud dependency required.

Learn more:

  • Read the official DGX Spark launch announcement in the NVIDIA newsroom
  • Check out the Docker Model Runner General Availability announcement
  • Visit our Model Runner GitHub repo. Docker Model Runner is open-source, and we welcome collaboration and contributions from the community! Star, fork and contribute.
