Namo R1

You: I don't have GPUs to run VLMs. Namo R1: Hold my beer.... let's do this on CPU.

Namo R1 🔥🔥 surpassed SmolVLM and Moondream2 in terms of same size! And we are keep evolving, more advanced models are under training!

Introduction

We are excited to open-source Namo, an extremly small yet mighty MLLM. While numerous MLLMs exist, few offer true extensibility or fully open-source their training data, model architectures, and training schedulers - critical components for reproducible AI research.

The AI community has largely overlooked the potential of compact MLLMs, despite their demonstrated efficiency advantages. Our analysis reveals significant untapped potential in sub-billion parameter models, particularly for edge deployment and specialized applications. To address this gap, we're releasing Namo R1, a foundational 500M parameter model trained from scratch using innovative architectural choices.

Key innovations include:

CPU friendly: Even on CPUs, Namo R1 can runs very fast;
Omni-modal Scalability: Native support for future expansion into audio (ASR/TTS) and cross-modal fusion;
Training Transparency: Full disclosure of data curation processes and dynamic curriculum scheduling techniques.

👇 Video Demo Runs on CPU:

namodemo3.mp4

Please join us in discord! https://discord.gg/5ftPBVspXj . For Chinese users, we will publish WeChat group in discord as well.

Updates

2025.02.22: more to come...!
2025.02.22: 🔥🔥 SigLIP2 added! You can now training with SigLIP2 as vision encoder, Join us in discord;
2025.02.21: 🔥🔥 The first version is ready to open, fire the MLLM power able to runs on CPU!
2025.02.17: Namo R1 start training.

Results

the result might keep updating as new models trained.

Model	MMB-EN-T	MMB-CN-T	Size
Namo-500M	68.8	48.7	500M
Namo-700M	training	training	700M
Namo-500M-R1	training	training	500M
Namo-700M-R1	training	training	700M
SmolVLM-500M	53.8	35.4	500M
SmolVLM-Instruct-DPO	67.5	49.8	2.3B
Moondream1	62.3	19.8	1.9B
Moondream2	70	28.7	1.9B

⚠️ Currently, the testing has only been conducted on a limited number of benchmarks. In the near future, more metrics will be reported. Even so, we've observed significant improvements compared to other small models.

Get Started

Install & Run in Cli

All you need to do is:

pip install -U namo

A simple demo would be:

from namo.api.vl import VLInfer

# model will download automatically
model = VLInfer(
    model_type="namo", device="cuda:0" if torch.cuda.is_available() else "cpu"
)

# default will have streaming
model.generate(images='images/cats.jpg', prompt='what is this?')

That's all!

For cli multi-turn chat in terminal you can run python demo.py. (Namo cli directly in your terminal would be avaiable later.)

OpenAI server & Run in OpenWebUI

namo server --model checkpoints/Namo-500M-V1

then, you will have OpenAI like serving in local.

Showcases

Namo-500M, our first small series of models, is capable of performing remarkable tasks such as multilingual OCR, general concept understanding, image captioning, and more. And it has only 500 million parameters! You can run it directly on a CPU!

📁 Show more real use cases

Features of Namo R1

In contrast to open-source VLMs like Qwen2.5-3B and MiniCPM, the Namo series offers the following features that enable anyone to train their own VLMs from scratch:

Extremely Small: Our first series has only 500 million parameters yet powerful on various tasks.
OCR Capability: With just a 500M model, you can perform multilingual OCR, covering not only Chinese and English but also Japanese and other languages.
Dynamic Resolution: We support native dynamic resolution as input, making it robust for images of any ratio.
Fully Open Source: We opensource all model codes including training steps and scripts!
R1 Support: Yes, we now support R1 for post-training.

Above all, we are also ready to help when u want train your MLLM from scratch at any tasks!

Roadmap

We are still actively training on new models, here are few things we will arrive:

Speech model;
Vision model with more decent vision encoders, such as SigLip2;
TTS ability;
Slightly larger models, up to 7B;

Trouble Shooting

Got error when using deepspeed: AssertionError: no_sync context manager is incompatible with gradient partitioning logic of ZeRO stage 2 ?

Please upgrade transformers to 4.48+ and use latest deepspeed.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
images		images
namo		namo
scripts		scripts
.gitignore		.gitignore
demo.py		demo.py
demo_bare.py		demo_bare.py
readme.md		readme.md
train_grpo.py		train_grpo.py
train_mdpo.py		train_mdpo.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Namo R1

Introduction

Updates

Results

Get Started

Install & Run in Cli

OpenAI server & Run in OpenWebUI

Showcases

Features of Namo R1

Roadmap

Trouble Shooting

Copyright

About

Releases

Packages

Languages

lucasjinreal/Namo-R1

Folders and files

Latest commit

History

Repository files navigation

Namo R1

Introduction

Updates

Results

Get Started

Install & Run in Cli

OpenAI server & Run in OpenWebUI

Showcases

Features of Namo R1

Roadmap

Trouble Shooting

Copyright

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages