Fine-tuning a language model doesn’t have to be daunting. In our previous post on fine-tuning models with Docker Offload and Unsloth, we walked through how to train small, local models efficiently using Docker’s familiar workflows. This time, we’re narrowing the focus.
Instead of asking a model to be good at everything, we can specialize it, teaching it a narrow but valuable skill, like consistently masking personally identifiable information (PII) in text. Thanks to techniques like LoRA (Low-Rank Adaptation), this process is not only feasible on modest resources but also fast and efficient.
Even better, with Docker's ecosystem, the entire fine-tuning pipeline (training, packaging, and sharing) becomes approachable. You don't need a bespoke ML setup or a research-lab workstation. You can iterate quickly, keep your workflow portable, and publish results for others to try with the same Docker commands you already know.
In this post, I’ll walk through a hands-on fine-tuning experiment: adapting the Gemma 3 270M model into a compact assistant capable of reliably masking PII.
What’s Low-Rank Adaptation (LoRA)?
Fine-tuning starts with a pre-trained model, one that has already learned the general structure and patterns of language.
Instead of training it from scratch (which would consume massive amounts of compute) or fully fine-tuning every weight (which risks catastrophic forgetting, where the model loses its prior knowledge), we can use a more efficient method called LoRA (Low-Rank Adaptation).
LoRA allows us to teach the model new tasks or behaviors without overwriting what it already knows, by adding small, trainable adapter layers while keeping the base model frozen.
How does LoRA work?
At a high level, LoRA works like this:
- Freeze the base model: The model’s original weights (its core knowledge of language) remain unchanged.
- Add adapter layers: Small, trainable “side modules” are inserted into specific parts of the model. These adapters learn only the new behavior or skill you want to teach.
- Train efficiently: During fine-tuning, only the adapter parameters are updated. The rest of the model stays static, which dramatically reduces compute and memory requirements.
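Conceptually, each adapter is just a pair of small matrices whose product forms a low-rank update to a frozen weight. Here's a minimal sketch of the idea in plain PyTorch (illustrative dimensions, not the actual PEFT implementation):
import torch

d, k, r, alpha = 512, 512, 16, 32                 # illustrative sizes; r is the LoRA rank

W = torch.randn(d, k)                             # frozen base weight (never updated)
A = torch.nn.Parameter(torch.randn(r, k) * 0.01)  # trainable low-rank factor
B = torch.nn.Parameter(torch.zeros(d, r))         # trainable; zero-init, so training
                                                  # starts from the unmodified base model

def lora_forward(x):
    # Output = frozen path + scaled adapter path: W x + (alpha / r) * B A x
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
With r much smaller than d and k, the adapter adds only r × (d + k) trainable parameters per layer, a tiny fraction of the d × k already in the frozen weight.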
LoRA experiment: Fine-tune Gemma 3 270M to mask PII
For this experiment, the model already knows how to read, write, and follow instructions. Our job is simply to teach it the specific pattern we care about, for example:
“Given some text, replace PII with standardized placeholders while leaving everything else untouched.”
The fine-tuning process consists of four steps:
- Prepare the dataset
- Prepare LoRA adapter
- Train the model
- Export the resulting model

Figure 1: Four steps of fine-tuning with LoRA
In this example, we use Supervised Fine-Tuning (SFT): each training example pairs raw text containing PII with its correctly redacted version. Over many such examples, the model internalizes the pattern and learns to generalize the redaction rules.
The quality of the dataset is critical: the cleaner and more representative your dataset, the better your fine-tuned model will perform.
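For illustration, here's the shape of a single training pair as the script below expects it (the values are hypothetical; the real pii_redaction_train.json follows the same prompt/response structure):
example = {
    "prompt": (
        "Mask all PII in the following text. Replace each entity with the "
        "exact UPPERCASE label in square brackets. Return ONLY the redacted "
        "text. Text: Write to jane.doe@example.com or call 555-0134."
    ),
    "response": "Write to [EMAIL_1] or call [TELEPHONENUM_1].",
}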
Before we dive into the steps, it’s crucial to understand Chat Templates.
Understanding Chat Templates
When you send a request like the one below to Gemma 3 270M, the model doesn't see this JSON structure directly.
{
  "messages": [
    {
      "role": "user",
      "content": "Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021"
    }
  ]
}
Instead, the input is transformed into a chat-formatted prompt with special tokens:
<start_of_turn>user
Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021<end_of_turn>
Notice how the message has been rewrapped and extra tokens like <start_of_turn> and <end_of_turn> have been inserted. These tokens are part of the model’s chat template, the standardized structure it expects at inference time.
Different models use different templates. For example, Gemma uses <start_of_turn> markers, while other models rely on entirely different special tokens.
This is exactly why the first step is “Prepare the dataset.” When fine-tuning, you must format your training data with the same chat template that the model will use during inference. This alignment ensures the fine-tuned model is robust, because it has been trained on data that looks exactly like what it will encounter in production.
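You can inspect this yourself; a quick sketch, assuming the same Gemma tokenizer used later in this post:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/gemma-3-270m-it")
msgs = [{"role": "user", "content": "Mask all PII in the following text. ..."}]

# tokenize=False returns the rendered string rather than token ids, so you can
# see the <start_of_turn> markers the model expects at inference time
print(tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))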
Prepare the dataset: Teaching through examples
The dataset is the bridge between general-purpose language ability and task-specific expertise. Each example is a demonstration of what we want the model to do: a prompt with raw text containing PII, and a response showing the redacted version.
In the script, this is how the original dataset is formatted using the model's chat template (see the apply_chat_template call):
import json

from datasets import Dataset
from unsloth import FastModel

max_seq_length = 2048

# Load the base model and its tokenizer (which carries the chat template)
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=max_seq_length,
    load_in_4bit=False,
    load_in_8bit=False,
    full_finetuning=False,  # keep the base frozen; we'll attach LoRA adapters
)

# Load the raw prompt/response pairs
with open("pii_redaction_train.json", "r", encoding="utf-8") as f:
    data = json.load(f)

ds = Dataset.from_list(data)

def to_text(ex):
    # Responses stored as JSON objects are serialized to plain strings
    resp = ex["response"]
    if not isinstance(resp, str):
        resp = json.dumps(resp, ensure_ascii=False)
    msgs = [
        {"role": "user", "content": ex["prompt"]},
        {"role": "assistant", "content": resp},
    ]
    # Render each pair with the model's own chat template
    return {
        "text": tokenizer.apply_chat_template(
            msgs, tokenize=False, add_generation_prompt=False
        )
    }

dataset = ds.map(to_text, remove_columns=ds.column_names)
You can print a few of the pairs to see what they look like:
for i in range(3):
    print(dataset[i]["text"])
    print("=" * 80)
An example of a dataset entry:
<bos><start_of_turn>user
Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, and punctuation exactly. Return ONLY the redacted text.
Text:
<p>My child faozzsd379223 (DOB: May/58) will undergo treatment with Dr. faozzsd379223, office at Hill Road. Our ZIP code is 28170-6392. Consult policy M.UE.227995. Contact number: 0070.606.322.6244. Handle transactions with 6225427220412963. Queries? Email: faozzsd379223@outlook.com.</p><end_of_turn>
<start_of_turn>model
<p>My child [USERNAME_2] (DOB: [DATEOFBIRTH_1]) will undergo treatment with Dr. [USERNAME_1], office at [STREET_1]. Our ZIP code is [ZIPCODE_1]. Consult policy M.UE.227995. Contact number: [TELEPHONENUM_1]. Handle transactions with [CREDITCARDNUMBER_1]. Queries? Email: [EMAIL_1].</p><end_of_turn>
Prepare LoRA adapter: Standing on the shoulders of a base model
Instead of starting from a blank slate, we begin with Gemma-3 270M-IT, a small but capable instruction-tuned model. By loading both the weights and the tokenizer, we get not just a model that understands text, but also the exact rules it uses to split and reconstruct sentences.
Fine-tuning isn’t reinventing language; it’s layering task-specific expertise on top of a foundation that already knows how to read and write.
For that, we’ll use the LoRA technique.
Why we use LoRA
Training a large language model from scratch is extremely costly, because it means adjusting billions of parameters.
But the good news is: you usually don’t need to change everything to teach the model a new skill.
That’s where LoRA comes in. Instead of re-training the entire model, LoRA adds a few small, extra components, like “add-ons.” When we fine-tune the model, we only adjust these add-ons, while the main model stays the same.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling factor for the adapter output
    target_modules=["q_proj", "v_proj"],  # attach adapters to the attention projections
    lora_dropout=0.05,
)

# Wrap the frozen base model (loaded earlier) with trainable LoRA adapters
model = get_peft_model(model, lora_config)
These few lines tell the model: keep your parameters frozen, but learn through a small set of low-rank adapters. That’s why fine-tuning is efficient and affordable.
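You can check just how small that set is; PEFT models expose a helper that prints the trainable fraction:
# Prints trainable vs. total parameter counts for the adapter-wrapped model
model.print_trainable_parameters()
For a 270M-parameter base model with rank-16 adapters on the attention projections, the trainable share is a tiny fraction of the total.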
Train the model: Fine-tuning in practice
With the dataset ready and LoRA adapters in place, the actual training looks like classic supervised learning.
- Feed in the input (a user prompt).
- Compare the model’s output with the expected response.
- Adjust the adapter weights to minimize the difference.
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = None, # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 1, # Use GA to mimic a larger batch size!
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 100,
        learning_rate = 5e-5, # Reduce to 2e-5 for long training runs
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

trainer_stats = trainer.train()
Over many iterations, the model internalizes the rules of PII masking, learning not only to replace emails with [EMAIL] but also to preserve punctuation, whitespace, and all non-PII content exactly as instructed.
What’s important here is that fine-tuning doesn’t overwrite the model’s general capabilities. The model still knows how to generate coherent text; we’re just biasing it toward one more skill.
Export the resulting model: Merging weights
Once training finishes, we have a base model plus a set of LoRA adapters. That’s useful for experimentation, but for deployment we often prefer a single consolidated model.
By merging the adapters back into the base weights, we produce a standalone checkpoint that behaves just like the original model, except it now has PII masking expertise built in.
model.save_pretrained_merged("result", tokenizer, save_method = "merged_16bit")
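Once merged, the checkpoint in result behaves like any other Hugging Face model. Here's a minimal sketch of a local smoke test (the sample text is hypothetical):
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged checkpoint produced by save_pretrained_merged above
tokenizer = AutoTokenizer.from_pretrained("result")
model = AutoModelForCausalLM.from_pretrained("result")

msgs = [{
    "role": "user",
    "content": "Mask all PII in the following text. Return ONLY the redacted "
               "text. Text: Reach me at jane.doe@example.com.",
}]

# The chat template adds the <start_of_turn> markers the model was trained on
inputs = tokenizer.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))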
Try and share your model
After fine-tuning, the next natural step is to try your model in action and, if it works well, share it with others. With Docker Model Runner, you can package your fine-tuned model, push it to Docker Hub, and make it instantly runnable anywhere. No messy setup, no GPU-specific headaches, just a familiar Docker workflow for distributing and testing AI models.
So once your adapters are trained and merged, don’t stop there: run it, publish it, and let others try it too. In the previous post, I showed how easy it is to do that step-by-step.
Fine-tuning makes your model specialized, but Docker makes it accessible and shareable. Together, they turn small local models from curiosities into practical tools ready to be used, and reused, by the community.
We’re building this together!
Docker Model Runner is a community-friendly project at its core, and its future is shaped by contributors like you. If you find this tool useful, please head over to our GitHub repository. Show your support by giving us a star, fork the project to experiment with your own ideas, and contribute. Whether it’s improving documentation, fixing a bug, or adding a new feature, every contribution helps. Let’s build the future of model deployment together!
Learn more
- Learn how to fine-tune local models with Docker Offload and Unsloth
- Check out the Docker Model Runner General Availability announcement
- Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!
- Get started with Model Runner with a simple hello GenAI application