Fine-Tuning Local Models with Docker Offload and Unsloth

Posted Oct 2, 2025

I’ve been experimenting with local models for a while now, and the progress in making them accessible has been exciting. Initial experiences are often fantastic: many models, like Gemma 3 270M, are lightweight enough to run on common hardware. This potential for broad deployment is a major draw.

However, as I’ve tried to build meaningful, specialized applications with these smaller models, I’ve consistently encountered challenges in achieving the necessary performance for complex tasks. For instance, in a recent experiment testing the tool-calling efficiency of various models, we observed that many local models (and even several remote ones) struggled to meet the required performance benchmarks. This realization prompted a shift in my strategy.

I’ve come to appreciate that simply relying on small, general-purpose models is often insufficient for achieving truly effective results on specific, demanding tasks. Even larger models can require significant effort to reach acceptable levels of performance and efficiency.

And yet, the potential of local models is too compelling to set aside. The advantages are significant:

  • Privacy
  • Offline capabilities
  • No token usage costs
  • No more “overloaded” error messages

So I started looking for alternatives, and that’s when I came across Unsloth, a project designed to make fine-tuning models much faster and more accessible. Its growing popularity (just look at its star history) made me curious enough to give it a try.

In this post, I’ll walk you through fine-tuning a sub-1GB model to redact sensitive info without breaking your Python setup. With Docker Offload and Unsloth, you can go from a baseline model to a portable, shareable GGUF artifact on Docker Hub in less than 30 minutes. In part 2 of this post, I’ll share the detailed steps of the fine-tuning itself.

Challenges of fine-tuning models

Setting up the right environment to fine-tune models can be… painful. It’s fragile, error-prone, and honestly a little scary at times. I always seem to break my Python environment one way or another, and I lose hours just wrestling with dependencies and runtime versions before I can even start training.

Fortunately, the folks at Unsloth solved this with a ready-to-use Docker image. Instead of wasting time (and patience) setting everything up, I can just run a container and get started immediately.

Of course, there’s still the hardware requirement. I work on a MacBook Pro, and Unsloth doesn’t support Macs natively, so normally that would be a deal-breaker.

But here’s where Docker Offload comes in. With Offload, I can spin up GPU-backed resources in the cloud and tap into NVIDIA acceleration, all while keeping my local workflow. That means I now have everything I need to fine-tune models, without fighting my laptop.

Let’s go for it.

How to fine-tune models locally with Unsloth and Docker

Can a model smaller than 1GB reliably mask personally identifiable information (PII)?

Here’s the test input:

This is an example of text that contains some data. 
The author of this text is Ignacio López Luna, but everybody calls him Ignasi.
His ID number is 123456789.
He has a son named Arnau López, who was born on 21-07-2021.

Desired output:

This is an example of text that contains some data. 
The author of this text is [MASKED] [MASKED], but everybody calls him [MASKED].
His ID number is [MASKED].
He has a son named [MASKED], who was born on [MASKED].

When tested with Gemma 3 270M using Docker Model Runner, the output was:

[PERSON]

Clearly, not usable. Time to fine-tune.

Step 1: Clone the example project

git clone https://github.com/ilopezluna/fine-tuning-examples.git
cd fine-tuning-examples/pii-masking

The project contains a ready-to-use Python script to fine-tune Gemma 3 using the pii-masking-400k dataset from ai4privacy.
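
Once you’re inside the Unsloth container (Step 3), you can take a quick peek at the training data. Here’s a minimal sketch that loads the dataset and prints one raw record; I’m assuming the Hugging Face datasets library is available in the container, and the exact column names are best checked against the printed schema:

from datasets import load_dataset

# Load the ai4privacy PII-masking dataset from the Hugging Face Hub.
dataset = load_dataset("ai4privacy/pii-masking-400k", split="train")

print(dataset)     # column names and row count
print(dataset[0])  # one raw example, before any formatting

Each record pairs a piece of text with its PII-masked counterpart, which is exactly the input/output mapping we want the model to learn.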

Step 2: Start Docker Offload (with GPU)

docker offload start
  • Select your account.
  • Answer Yes when asked about GPU support (you’ll get an NVIDIA L4-backed instance).

Check status:

docker offload status

See the Docker Offload Quickstart guide.

Step 3: Run the Unsloth container

The official Unsloth image includes Jupyter and some example notebooks. You can start it like this:

docker run -d -e JUPYTER_PORT=8000 \
  -e JUPYTER_PASSWORD="mypassword" \
  -e USER_PASSWORD="unsloth2024" \
  -p 8000:8000 \
  -v $(pwd):/workspace/work \
  --gpus all \
  unsloth/unsloth

Now, let’s attach a shell to the container (docker ps -q assumes the Unsloth container is the only one running):

docker exec -it $(docker ps -q) bash

Useful paths inside the container:

  • /workspace/unsloth-notebooks/ → example fine-tuning notebooks
  • /workspace/work/ → your mounted working directory

Thanks to Docker Offload (with Mutagen under the hood), the folder /workspace/work/ stays in sync between the cloud GPU instance and your local dev machine.

Step 4: Fine-tune

The script finetune.py is a small training loop built around Unsloth. Its purpose is to take a base language model and adapt it to a new task using supervised fine-tuning with LoRA. In this example, the model is trained on a dataset that teaches it how to mask personally identifiable information (PII) in text.

LoRA makes the process lightweight: instead of updating all of the model’s parameters, it adds small adapter layers and only trains those. That means the fine-tune runs quickly, fits on a single GPU, and produces a compact set of weights you can later merge back into the base model.
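
To make that concrete, here’s a minimal sketch of how a LoRA adapter is attached with Unsloth. The model ID, rank, and target modules below are illustrative assumptions, not necessarily what finetune.py uses; check the script in the repo for the real configuration:

from unsloth import FastLanguageModel

# Load the base model through Unsloth's optimized loader.
# The model id and sequence length here are placeholders.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=2048,
)

# Wrap the model with small LoRA adapter layers. Only these new
# weights are trained; the base model's parameters stay frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # adapter rank: the size of the low-rank update
    lora_alpha=16,  # scaling factor applied to the adapter output
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)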

When you run:

unsloth@46b6d7d46c1a:/workspace$ cd work
unsloth@46b6d7d46c1a:/workspace/work$ python finetune.py
Unsloth: Will patch your computer to enable 2x faster free finetuning.
[...]

The script loads the base model, prepares the dataset, runs a short supervised fine-tuning pass, and saves the resulting LoRA weights into your mounted /workspace/work/ folder. Thanks to Docker Offload, those results are also synced back to your local machine automatically.
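
In sketch form, the training and saving portion of the script looks roughly like this. The trainer settings are assumptions rather than the repo’s actual values, and the dataset is presumed to have been formatted into prompt/response text beforehand:

from trl import SFTTrainer, SFTConfig

# Illustrative supervised fine-tuning pass over the formatted dataset.
trainer = SFTTrainer(
    model=model,            # the LoRA-wrapped model from above
    tokenizer=tokenizer,
    train_dataset=dataset,  # records formatted as prompt/response text
    args=SFTConfig(
        per_device_train_batch_size=8,
        max_steps=300,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the trained adapters back into the base weights and save them
# into the synced working directory, so the result also lands locally.
model.save_pretrained_merged("/workspace/work/result", tokenizer)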

The whole training run is designed to complete in under 20 minutes on a modern GPU, leaving you with a model that has “learned” the new masking behavior and is ready for conversion in the next step.

For a deeper walkthrough of how the dataset is built, why it matters, and how LoRA is configured, stay tuned for part 2 of this blog!

Step 5: Convert to GGUF

At this point you’ll have the fine-tuned model artifacts sitting under /workspace/work/.

To package the model for Docker Hub and Docker Model Runner usage, it must be in GGUF format. (Unsloth will support this directly soon, but for now we convert manually.)

unsloth@1b9b5b5cfd49:/workspace/work$ cd ..
unsloth@1b9b5b5cfd49:/workspace$ git clone https://github.com/ggml-org/llama.cpp
Cloning into 'llama.cpp'...
[...]
Resolving deltas: 100% (45613/45613), done.
unsloth@1b9b5b5cfd49:/workspace$ python ./llama.cpp/convert_hf_to_gguf.py work/result/ --outfile work/result.gguf
[...]
INFO:hf-to-gguf:Model successfully exported to work/result.gguf

Next, check that the file exists locally, which confirms that the automatic Mutagen-powered file sync has finished:

unsloth@46b6d7d46c1a:/workspace$ exit
exit
((.env3.12) ) ilopezluna@localhost pii-masking % ls -alh result.gguf
-rw-r--r--@ 1 ilopezluna  staff   518M Sep 23 15:58 result.gguf

At this point, you can stop Docker Offload:

docker offload stop

Step 6: Package and share on Docker Hub

Now let’s package the fine-tuned model and push it to Docker Hub:

((.env3.12) ) ilopezluna@localhost pii-masking % docker model package --gguf /Users/ilopezluna/Projects/fine-tuning-examples/pii-masking/result.gguf ignaciolopezluna020/my-awesome-model:version1 --push
Adding GGUF file from "/Users/ilopezluna/Projects/fine-tuning-examples/pii-masking/result.gguf"
Pushing model to registry...
Uploaded: 517.69 MB
Model pushed successfully

You can find more details on distributing models in the Docker blog on packaging models.

Step 7: Try the results!

Finally, run the fine-tuned model using Docker Model Runner:

docker model run ignaciolopezluna020/my-awesome-model:version1 "Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021"
This is an example of text that contains some data. The author of this text is [GIVENNAME_1] [SURNAME_1], but everybody calls him [GIVENNAME_1]. His ID number is [IDCARDNUM_1]. He has a son named [GIVENNAME_1] [SURNAME_1], who was born on [DATEOFBIRTH_1]

Just compare with the original Gemma 3 270M output:

((.env3.12) ) ilopezluna@F2D5QD4D6C pii-masking % docker model run ai/gemma3:270M-F16 "Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021"
[PERSON]

The fine-tuned model is far more useful, and it’s already published on Docker Hub for anyone to try.
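
If you’d rather call the model from code than from the CLI, Docker Model Runner also exposes an OpenAI-compatible API. Here’s a minimal sketch; the base URL assumes TCP host access is enabled on Model Runner’s default port, and the prompt is abbreviated:

from openai import OpenAI

# Docker Model Runner speaks the OpenAI chat-completions protocol.
# Adjust base_url if you've configured a different host port.
client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed",  # Model Runner requires no API key
)

response = client.chat.completions.create(
    model="ignaciolopezluna020/my-awesome-model:version1",
    messages=[{
        "role": "user",
        "content": "Mask all PII in the following text. [...] "
                   "Text: His ID number is 123456789.",
    }],
)
print(response.choices[0].message.content)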

Why fine-tuning models with Docker matters

This experiment shows that small local models don’t have to stay as “toys” or curiosities. With the right tooling, they can become practical, specialized assistants for real-world problems.

  • Speed: Fine-tuning a sub-1GB model took under 20 minutes with Unsloth and Docker Offload. That’s fast enough for iteration and experimentation.
  • Accessibility: Even on a machine without an NVIDIA GPU, Docker Offload unlocked GPU-backed training without extra hardware.
  • Portability: Once packaged, the model is easy to share, and runs anywhere thanks to Docker.
  • Utility: Instead of producing vague or useless answers, the fine-tuned model reliably performs one job (masking PII) that could be immediately valuable in many workflows.

This is the power of fine-tuning models: turning small, general-purpose models into focused, reliable tools. And with Docker’s ecosystem, you don’t need to be an ML researcher with a huge workstation to make it happen. You can train, test, package, and share, all with familiar Docker workflows.

So the next time you think “small models aren’t useful,” remember: with a bit of fine-tuning, they absolutely can be. That’s what takes them from “interesting demo” to practical, usable tools.

We’re building this together!

Docker Model Runner is a community-friendly project at its core, and its future is shaped by contributors like you. If you find this tool useful, please head over to our GitHub repository. Show your support by giving us a star, fork the project to experiment with your own ideas, and contribute. Whether it’s improving documentation, fixing a bug, or adding a new feature, every contribution helps. Let’s build the future of model deployment together!
