Expanding Docker Model Runner's Capabilities
Today, we're excited to announce that Docker Model Runner now integrates the vLLM inference engine and safetensors models, unlocking high-throughput AI inference with the same Docker tooling you already use.
When we first introduced Docker Model Runner, our goal was to make it simple for developers to run and experiment with large language models (LLMs) using Docker. We designed it to integrate multiple inference engines, starting with llama.cpp, so that models are easy to run anywhere.
Now we're taking the next step in that journey. With the vLLM integration, you can scale AI workloads from low-end to high-end Nvidia hardware without ever leaving your Docker workflow.
Why vLLM?
vLLM is a high-throughput, open-source inference engine built to serve large language models efficiently at scale. Because it focuses on throughput, latency, and memory efficiency, it is used across the industry to deploy production-grade LLMs.
Here is what makes vLLM stand out:
- Optimized performance: Uses PagedAttention, an advanced attention algorithm that minimizes memory overhead and maximizes GPU utilization.
- Scalable serving: Handles batched requests and streaming outputs natively, which suits interactive and high-traffic AI services.
- Model flexibility: Works seamlessly with popular open-weight models such as GPT-OSS, Qwen3, Mistral, and Llama 3 in the safetensors format.
By bringing vLLM to Docker Model Runner, we're bridging the gap between fast local experimentation and robust production inference.
How vLLM Works
Running vLLM models with Docker Model Runner is as simple as installing the backend and running your model. No special setup is required.
Install Docker Model Runner with the vLLM backend:
docker model install-runner --backend vllm --gpu cuda
Once the installation finishes, you can start using it right away:
docker model run ai/smollm2-vllm "Can you read me?"
Sure, I am ready to read you.
Or access it through the API:
curl --location 'http://localhost:12434/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "ai/smollm2-vllm",
    "messages": [
      {
        "role": "user",
        "content": "Can you read me?"
      }
    ]
  }'
Note that neither the HTTP request nor the CLI command mentions vLLM.
That is because Docker Model Runner automatically routes each request to the correct inference engine based on the model you use, so the experience is the same whether the model runs on llama.cpp or vLLM.
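Since neither the CLI command nor the HTTP request names an engine, a client only has to vary the model name. The following is a minimal sketch that wraps the curl request above in two shell helpers. `build_chat_payload`, `chat`, and the `MODEL_RUNNER_URL` variable are hypothetical names introduced for illustration; the endpoint and model name are the ones used in the example above.

```shell
#!/usr/bin/env bash
# Endpoint taken from the curl example above; MODEL_RUNNER_URL is a
# hypothetical override variable, not something Model Runner defines.
MODEL_RUNNER_URL="${MODEL_RUNNER_URL:-http://localhost:12434/v1/chat/completions}"

# build_chat_payload MODEL PROMPT
# Prints the JSON request body. It escapes only backslashes and double
# quotes, so keep prompts to simple single-line text.
build_chat_payload() {
  local model=$1 prompt=$2
  prompt=${prompt//\\/\\\\}   # escape backslashes first
  prompt=${prompt//\"/\\\"}   # then escape double quotes
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' \
    "$model" "$prompt"
}

# chat MODEL PROMPT
# Sends the request. Nothing in the call names an inference engine.
chat() {
  curl --silent --location "$MODEL_RUNNER_URL" \
    --header 'Content-Type: application/json' \
    --data "$(build_chat_payload "$1" "$2")"
}
```

With Model Runner running, `chat ai/smollm2-vllm "Can you read me?"` sends the same request as the curl example. Passing the name of a GGUF model instead would leave the call unchanged, which is the point of the routing described above.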
Why Multiple Inference Engines?
Until now, developers had to choose between simplicity and performance. You could either run models easily (using simple, portable tools such as Docker Model Runner with llama.cpp) or achieve maximum throughput (with frameworks such as vLLM).
Docker Model Runner now gives you both.
You can:
- Prototype locally with llama.cpp.
- Scale to production with vLLM.
Throughout, you use the same consistent Docker commands, CI/CD workflows, and deployment environments.
This flexibility makes Docker Model Runner an industry first: no other tool lets you switch between multiple inference engines within a single, portable, containerized workflow.
By unifying these engines under one interface, Docker is making AI truly portable, from laptops to clusters and everything in between.
Safetensors (vLLM) vs. GGUF (llama.cpp): Choosing the Right Format
With the addition of vLLM, Docker Model Runner is now compatible with the two most dominant open-source model formats: safetensors and GGUF. Model Runner abstracts away the complexity of configuring the engines, but understanding how the formats differ helps you choose the right tool for your infrastructure.
- GGUF (GPT-Generated Unified Format): The native format of llama.cpp. GGUF is designed for high portability and quantization, and it is well suited to running models on commodity hardware where memory bandwidth is limited. It packages the model architecture and weights into a single file.
- Safetensors: The native format of vLLM and the modern standard for high-end inference. Safetensors is built for high-throughput performance.
Docker Model Runner routes your requests intelligently. If you pull a GGUF model, it uses llama.cpp. If you pull a safetensors model, it uses vLLM. With Docker Model Runner, both formats can be pushed to and pulled from any OCI registry as OCI images.
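The routing rule described above can be pictured as a simple dispatch on model format. This is an illustrative sketch only, not Model Runner's actual implementation; the `select_engine` function and the lowercase format labels are hypothetical.

```shell
#!/usr/bin/env bash
# select_engine FORMAT
# Maps a model format to the engine this post says handles it.
# Hypothetical helper for illustration; Model Runner does this internally.
select_engine() {
  case "$1" in
    gguf)        echo "llama.cpp" ;;
    safetensors) echo "vllm" ;;
    *)           echo "unknown format: $1" >&2; return 1 ;;
  esac
}

select_engine gguf          # prints: llama.cpp
select_engine safetensors   # prints: vllm
```

The caller never picks an engine directly. It only identifies the model, and the format of that model determines the backend.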
vLLM-Compatible Models on Docker Hub
vLLM models use the safetensors format. Some early safetensors models are already available on Docker Hub.
Available Now: x86_64 with Nvidia
The initial release is optimized for, and available on, x86_64 systems with Nvidia GPUs. Our team has focused on delivering a stable experience on this platform, and we're confident you will feel the difference.
What's Next?
This launch is just the beginning. Our vLLM roadmap focuses on two key areas: expanding platform access and continuous performance tuning.
- WSL2/Docker Desktop compatibility: We know that a seamless "inner loop" is critical for developers. We are actively working to bring the vLLM backend to Windows through WSL2. This will let you build, test, and prototype high-throughput AI applications in Docker Desktop on Windows machines with Nvidia GPUs, using the same workflow you use on Linux.
- DGX Spark compatibility: We are optimizing Model Runner for different kinds of hardware, and we are working to add compatibility with Nvidia DGX systems.
- Performance optimization: We are actively tracking areas for improvement. vLLM delivers remarkable throughput, but we recognize that its startup time is currently slower than that of llama.cpp. This is a key area we plan to optimize in upcoming enhancements, improving "time to first token" for rapid development cycles.
Thank you for your support and patience as we grow.
How You Can Get Involved
The strength of Docker Model Runner lies in its community, and there is always room to grow. We need your help to make this project the best it can be. Here is how you can get involved:
- Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repository.
- Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it, or fork the repository, make your changes, and submit a pull request. We look forward to seeing your ideas!
- Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.
We're very excited about this new chapter for Docker Model Runner, and we can't wait to see what we can build together. Let's get to work!