IBM Granite 4.0 Models Now Available on Docker Hub

Posted on October 6, 2025

Developers can now discover and run IBM’s latest open-source Granite 4.0 language models from the Docker Hub model catalog, and start building in minutes with Docker Model Runner. Granite 4.0 pairs strong, enterprise-ready performance with a lightweight footprint, so you can prototype locally and scale confidently.

The Granite 4.0 family is designed for speed, flexibility, and cost-effectiveness, making it easier than ever to build and deploy generative AI applications.

About Docker Hub

Docker Hub is the world’s largest registry for containers, trusted by millions of developers to find and share high-quality container images at scale. Building on this legacy, it is now also becoming a go-to place for developers to discover, manage, and run local AI models. Docker Hub hosts our curated local AI model collection, packaged as OCI Artifacts and ready to run. You can easily download, share, and upload models on Docker Hub, making it a central hub for both containerized applications and the next wave of generative AI.

Why Granite 4.0 on Docker Hub matters

Granite 4.0 isn’t just another set of language models. It introduces a next-generation hybrid architecture that delivers incredible performance and efficiency, even when compared to larger models.

  • Hybrid architecture. Granite 4.0 cleverly combines the linear-scaling efficiency of Mamba-2 with the precision of transformers. Select models also leverage a Mixture of Experts (MoE) strategy – instead of using the entire model for every task, they activate only the necessary “experts”, or subsets of parameters. This results in faster processing and memory usage reductions of more than 70% compared to similarly sized traditional models.
  • “Theoretically Unconstrained” Context. By removing positional encoding, Granite 4.0 can process incredibly long documents, with context lengths tested up to 128,000 tokens. Context length is limited only by your hardware, opening up powerful use cases for document analysis and Retrieval-Augmented Generation (RAG).
  • Fit-for-Purpose Sizes. The family includes several sizes, from the 3B parameter Micro models to the 32B parameter Small model, allowing you to pick the right balance of performance and resource usage for your specific needs.

What’s in the Granite 4.0 family

Sizes and targets (8-bit, batch=1, 128K context):

  • H-Small (32B total, ~9B active): Workhorse for RAG and agents; runs on L4-class GPUs.
  • H-Tiny (7B total, ~1B active): Latency-friendly for edge/local use; runs on consumer-grade GPUs like the RTX 3060.
  • H-Micro (3B, dense): Ultra-light for on-device use and concurrent agents; extremely low RAM footprint.
  • Micro (3B, dense): Traditional dense option when Mamba-2 support isn’t available.

In practice, these footprints mean you can run capable models on accessible hardware – a big win for local development and iterative agent design.

Run in seconds with Docker Model Runner

Docker Model Runner gives you a portable, reproducible way to run local models with an OpenAI-compatible API, from laptop development to CI and cloud.

# Example: start a chat with Granite 4.0 Micro
docker model run ai/granite-4.0-micro

Prefer a different size? Pick your Granite 4.0 variant in the Model Catalog and run it with the same command style. See the Model Runner guide for enabling the runner, chat mode, and API usage.
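Once a model is running, you can also reach it programmatically through the OpenAI-compatible API. A minimal sketch using only the Python standard library is below; the base URL is an assumption about the default Model Runner endpoint, so check your local configuration for the actual host and port.

```python
# Sketch: calling a locally running Granite model through Docker Model Runner's
# OpenAI-compatible chat completions API.
# NOTE: BASE_URL is an assumed local default -- verify it against your setup.
import json
import urllib.request

BASE_URL = "http://localhost:12434/engines/v1"  # assumption, not guaranteed
MODEL = "ai/granite-4.0-micro"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """POST the request to the local endpoint and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize what Docker Model Runner does in one sentence."))
```

Because the API shape matches OpenAI's, existing OpenAI client libraries can typically be pointed at the local endpoint by overriding their base URL.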

What you can build (fast)

Granite’s lightweight and versatile nature makes it perfect for a wide range of applications. Combined with Docker Model Runner, you can easily build and scale projects like:

  • Document Summarization and Analysis: Process and summarize long legal contracts, technical manuals, or research papers with ease.
  • Smarter RAG Systems: Build powerful chatbots and assistants that pull information from external knowledge bases, CRMs, or document repositories.
  • Complex Agentic Workflows: Leverage the compact models to run multiple AI agents concurrently for sophisticated, multi-step reasoning tasks.
  • Edge AI Applications: Deploy Granite 4.0 Tiny in resource-constrained environments for on-device chatbots or smart assistants that don’t rely on the cloud.
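To make the RAG idea concrete, here is a deliberately simplified retrieval sketch: it scores documents by keyword overlap with the query and assembles a grounded prompt. A production system would use embeddings and a vector store; the helper names below are illustrative, not part of any Granite or Docker API.

```python
# Minimal keyword-overlap retrieval sketch for a RAG pipeline.
# Illustrative only: real systems use embeddings, not word overlap.
def score(query: str, doc: str) -> int:
    """Count query words that also appear in the document (case-insensitive)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the top-k documents ranked by keyword overlap with the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(retrieve(query, docs, k=2))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The resulting prompt string is what you would send to a Granite model via Model Runner's chat endpoint; the long context window means you can afford to include generous amounts of retrieved text.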

Join the Open-Source AI Community

This partnership is all about empowering developers to build the next generation of AI applications. The Granite 4.0 models are available under a permissive Apache 2.0 license, giving you the freedom to customize and use them commercially.

We invite you to explore the models on Docker Hub and start building today. To help us improve the developer experience for running local models, get involved in our Docker Model Runner GitHub repository:

  • Star the repo to show your support
  • Fork it to experiment
  • Consider contributing back with your own improvements


Granite 4.0 is here. Run it, build with it, and see what’s possible with Docker Model Runner.
