
NVIDIA Nemotron 2 Nano 9B Japanese: A State-of-the-Art Small Language Model Powering Japan's Sovereign AI

Beginner
Hugging Face Blog · 2/17/2026

Key Summary

  • NVIDIA Nemotron-Nano-9B-v2-Japanese is a small but powerful Japanese language model built for real company use on private servers.
  • It learns Japanese deeply through continued pre-training on open Japanese texts and keeps strong agent skills for tools and APIs.
  • A special seed dataset called Nemotron-Personas-Japan helps generate huge amounts of culturally accurate synthetic training data.
  • With just under 10 billion parameters, it runs efficiently on edge GPUs and is easier to fine-tune than very large models.
  • On the Nejumi Leaderboard 4, it ranked first among models under 10B, beating strong baselines like Qwen3-8B in size-to-performance.
  • The architecture, inherited from Nemotron 2 Nano (Transformer-Mamba), delivers up to 6x higher throughput than many open-source alternatives.
  • It is optimized for multi-turn conversations, reliable tool calling, and structured data generation for APIs.
  • Companies can deploy it directly for agents or quickly customize it for their own domains using NVIDIA's NeMo tools.
  • This approach advances Japan's sovereign AI by enabling high-quality, on-premises Japanese AI without sending sensitive data to the cloud.

Why This Research Matters

This model lets Japanese companies run powerful AI privately, protecting sensitive data like customer records and medical notes. Its small size and high speed mean lower costs and quicker responses, which is crucial for real-time support and automation. Cultural accuracy improves user trust by producing polite, natural Japanese that fits business etiquette. Reliable tool calling turns chat into action, so AI can actually do work like querying databases or scheduling tasks. The open personas and recipes help other regions replicate the approach for their own sovereign AI. Better size-to-performance means more teams can adopt AI without huge hardware investments. Altogether, it’s a practical blueprint for safe, strong, and local AI.

Detailed Explanation


01Background & Problem Definition

🍞 Hook: You know how a smaller backpack is easier to carry to school, but you still want it to hold all the important books? In AI, companies want a “small backpack” model that still carries everything they need.

🥬 The Concept: Small Language Models (SLMs) are compact AI brains that read and write text really well without needing giant computers.

  • How it works: 1) Start with a smaller model, 2) teach it with smart data, 3) focus it on the most useful skills, 4) make sure it runs fast on normal hardware.
  • Why it matters: Without SLMs, many companies can’t run AI on their own servers, can’t protect secret data, and have to pay high cloud bills.

🍞 Anchor: A 9B-parameter model like Nemotron-Nano-9B-v2-Japanese can fit on a single strong GPU at an office, while still talking and reasoning in fluent Japanese.

🍞 Hook: Imagine your family keeps important papers in a locked drawer at home instead of sending them to a public locker. Some companies need to do the same with their data.

🥬 The Concept: On-premises deployment means running AI inside a company’s private network instead of on the public internet.

  • How it works: 1) Install the model on local servers, 2) connect it to company tools and databases, 3) keep traffic inside secure walls.
  • Why it matters: Without on-prem, sensitive data (like customer info or trade secrets) might leave the building, which many industries can’t allow.

🍞 Anchor: A bank uses Nemotron-Nano-9B-v2-Japanese on its own GPUs to answer staff questions about rules without any customer data touching the cloud.

🍞 Hook: Think of a super helpful intern who doesn’t just chat but can also open apps, fetch files, or schedule meetings.

🥬 The Concept: Agent-based AI is an AI helper that not only talks but also takes actions through tools and APIs.

  • How it works: 1) Understand the task, 2) plan steps, 3) call the right tools (like search, calendar, code), 4) check results and continue.
  • Why it matters: Without agent abilities, AI is just a talker—it can’t actually get work done.

🍞 Anchor: When asked “Find sales numbers and draft a summary,” the AI queries a database tool, gathers numbers, and writes a clean report.
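The plan-then-act loop above can be sketched in a few lines. This is a minimal illustration, not NVIDIA's actual agent runtime: the tool registry and the `query_sales` function are invented for the example.

```python
# Minimal sketch of an agent loop: the model emits a structured tool call,
# the runtime executes it, and the result feeds back into the conversation.
# The tool name "query_sales" and its return value are invented for illustration.

TOOLS = {
    "query_sales": lambda region: {"region": region, "total": 4_200_000},
}

def run_agent(tool_call: dict) -> dict:
    """Dispatch one structured tool call like {"action": ..., "args": {...}}."""
    fn = TOOLS[tool_call["action"]]
    return fn(**tool_call["args"])

# What the model might emit after planning the "find sales numbers" task:
call = {"action": "query_sales", "args": {"region": "Kansai"}}
result = run_agent(call)
summary = f"Sales in {result['region']}: {result['total']:,} JPY"
print(summary)  # Sales in Kansai: 4,200,000 JPY
```

The key point is the separation of roles: the model only produces the structured `call`; the runtime does the actual work and returns data the model can write up.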

🍞 Hook: Picture learning Japanese by reading lots of books and websites day after day—you steadily get smarter.

🥬 The Concept: Continuous pre-training is feeding a model more high-quality text over time so it keeps improving its language brain.

  • How it works: 1) Start from a strong base model, 2) train on new corpora (like Wikipedia, Aozora Bunko), 3) adjust weights carefully, 4) keep earlier skills.
  • Why it matters: Without it, a model might sound clumsy or miss local expressions—especially in languages with unique styles like Japanese.

🍞 Anchor: After continued training on Japanese open-source text, the model uses polite set phrases like "お世話になっております" ("thank you for your continued support," a standard business greeting) in the right contexts.

🍞 Hook: You know how actors practice different roles to sound natural in any scene?

🥬 The Concept: Personas are detailed character profiles (age, job, region, style) that help generate realistic, culturally accurate conversations.

  • How it works: 1) Define many personas reflecting real populations, 2) use them to create dialogues and tasks, 3) train the model on this diversity.
  • Why it matters: Without personas, generated data can feel generic or culturally off, which weakens the model’s real-world performance.

🍞 Anchor: A Kansai-area shop owner persona helps the model learn regional tones and customer-service phrases that feel authentically Japanese.

🍞 Hook: Imagine a practice test that looks real but is made by teachers so students can train safely and a lot.

🥬 The Concept: Synthetic Data Generation (SDG) is creating training examples with computers that mimic real situations without exposing private data.

  • How it works: 1) Start from seed personas and prompts, 2) generate many varied examples, 3) filter and de-duplicate, 4) train the model.
  • Why it matters: Without SDG, you run out of safe, diverse data and the model can’t learn wide, realistic scenarios.

🍞 Anchor: Using Nemotron-Personas-Japan as seeds, the team generated tool-using dialogues (like API calls) that match Japanese business etiquette.

🍞 Hook: Think of tuning a guitar after you’ve learned the song—you make it sound just right for the performance.

🥬 The Concept: Supervised Fine-Tuning (SFT) teaches the model from examples of correct behavior so it follows instructions and uses tools reliably.

  • How it works: 1) Collect instruction–response pairs, 2) include tool-calling examples, 3) train to match good answers, 4) validate and iterate.
  • Why it matters: Without SFT, a model might be knowledgeable but ignore instructions or misuse tools.

🍞 Anchor: After SFT, when asked "Convert this CSV to JSON and email it," the model reliably outputs the correct tool call and structured data.

🍞 Hook: You know how smaller, faster bikes are easier to ride in a crowded city?

🥬 The Concept: Size-to-performance ratio describes how much quality you get from how small and efficient a model is.

  • How it works: 1) Measure accuracy and reasoning, 2) compare to model size and speed, 3) pick the sweet spot for your hardware.
  • Why it matters: Without this balance, you might choose a giant model that’s too slow or a tiny one that’s too weak.

🍞 Anchor: Nemotron-Nano-9B-v2-Japanese beats many peers under 10B on Japan-focused benchmarks while still fitting on edge GPUs.
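The size-to-performance idea can be made concrete with a toy calculation: divide a benchmark score by parameter count. The scores below are made up for illustration; they are not the actual Nejumi Leaderboard numbers.

```python
# Toy size-to-performance comparison. Scores are invented, NOT real benchmark
# results: the point is only how "quality per billion parameters" is computed.

models = {
    "model_a_9b":  {"params_b": 9.0,  "score": 72.0},
    "model_b_8b":  {"params_b": 8.0,  "score": 60.0},
    "model_c_70b": {"params_b": 70.0, "score": 80.0},
}

def score_per_billion(m: dict) -> float:
    """Higher means more quality per unit of model size (and hardware cost)."""
    return m["score"] / m["params_b"]

best = max(models, key=lambda name: score_per_billion(models[name]))
print(best)  # model_a_9b: best ratio, even though the 70B model has the top raw score
```

This is why a well-tuned 9B model can be the pragmatic pick: the 70B model wins on raw score but loses badly on score per parameter, which translates directly into hardware and latency costs.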

🍞 Hook: Imagine a national library where the books, rules, and styles are unique—you need a librarian who really knows that place.

🥬 The Concept: Sovereign AI means building AI that truly understands a country’s language, culture, and rules, and can be run locally.

  • How it works: 1) Train on local-language data, 2) align with cultural norms, 3) support on-premises use, 4) provide open tools and recipes.
  • Why it matters: Without sovereign AI, organizations depend on foreign systems that may not fit local needs or privacy laws.

🍞 Anchor: A Japanese hospital uses this model to draft discharge summaries in natural Japanese, all inside its secure network.

02Core Idea

🍞 Hook: Imagine upgrading a compact electric car so it speaks perfect Japanese on GPS, handles tight streets smoothly, and even books your parking spot for you.

🥬 The Concept: The key idea is to take a proven, efficient small model (Nemotron 2 Nano) and supercharge it for Japanese by feeding it culturally precise synthetic data from Nemotron-Personas-Japan, plus continued pre-training—so it becomes a strong, tool-using agent that runs on modest hardware.

  • How it works: 1) Start with Nemotron-Nano-9B-v2’s architecture, 2) continue pre-training on Japanese open-source text, 3) generate massive, persona-grounded synthetic conversations and tool calls, 4) apply SFT and post-training for instruction following and alignment.
  • Why it matters: Without this combo, you either get a fast but shallow Japanese model, or a smart one that’s too big and expensive to deploy on-prem.

🍞 Anchor: The result is a 9B-parameter model that ranks first under 10B on Japan’s Nejumi Leaderboard 4 and reliably calls APIs in Japanese business workflows.

Multiple analogies (three ways):

  1. Language gym: The base model has strong “muscles.” Continuous Japanese pre-training is cardio for stamina; persona-driven synthetic data is targeted weight training; SFT is the coach correcting form so it performs cleanly.
  2. Cooking school: Nemotron 2 Nano is the versatile kitchen; Japanese corpora are local ingredients; personas are diners with diverse tastes; SFT is plating and seasoning to match exact instructions.
  3. Orchestra: The architecture is the conductor; corpora are sheet music; personas are unique audience vibes; SFT is rehearsal so the performance follows the score precisely, including cueing tool “instruments.”

Before vs After:

  • Before: Small Japanese models often lacked deep cultural fluency, stumbled on tool calls, or needed cloud-scale hardware.
  • After: A compact model can hold natural Japanese, reason well, and operate tools, all while fitting inside private company servers.

Why it works (intuition):

  • The base architecture (Transformer-Mamba) is engineered for speed and long sequences, so it remembers multi-turn context without slowing down.
  • Persona-grounded SDG captures Japan-specific styles and scenarios, improving both language naturalness and agent reliability.
  • Continued pre-training plugs gaps in Japanese knowledge, while SFT aligns behavior to follow instructions and structure outputs.
  • The pieces reinforce each other: better Japanese text understanding makes tool plans smarter; tool-calling practice sharpens reasoning and formatting.

Building blocks (bite-sized pieces): 🍞 Hook: Think of LEGO bricks that snap together to build a smart helper.

🥬 The Concept: Nemotron 2 Nano (Transformer-Mamba) architecture is the sturdy baseplate for speed and memory.

  • How it works: 1) Efficient attention-like mechanisms, 2) optimized sequence handling, 3) throughput-focused design.
  • Why it matters: Without a fast base, multi-turn chats and tool chains bog down.

🍞 Anchor: The model can chat over many turns while keeping facts straight and responding quickly on an edge GPU.

🍞 Hook: Imagine role cards that tell actors who they are and how to speak.

🥬 The Concept: Nemotron-Personas-Japan provides culturally faithful personas to seed synthetic data.

  • How it works: 1) Match real demographics and regions, 2) generate varied, polite-to-casual dialogues, 3) ensure cultural consistency.
  • Why it matters: Without faithful personas, the model sounds generic or off-tone.

🍞 Anchor: A Tokyo office worker persona teaches the model the right keigo for client emails.

🍞 Hook: Think of adding new chapters to a textbook so you don’t miss any topics.

🥬 The Concept: Continuous pre-training on Japanese corpora fills knowledge gaps.

  • How it works: 1) Wikipedia, fineweb-2 Japanese, Aozora Bunko, SIP3, 2) Nemotron-CC-v2.1 and specialized sets, 3) careful training to prevent forgetting.
  • Why it matters: Without it, the model misreads idioms or old literature styles.

🍞 Anchor: After this step, it recognizes literature from Aozora Bunko and modern slang from web text.

🍞 Hook: Picture a music teacher correcting every note until it’s just right.

🥬 The Concept: SFT and post-training align the model to follow instructions and output structured tool calls.

  • How it works: 1) Curate instruction data, 2) include function-calling formats, 3) train and validate, 4) reduce bias/toxicity.
  • Why it matters: Without alignment, the model may be smart but unruly.

🍞 Anchor: It returns valid JSON for an API instead of a messy paragraph when asked to run a tool.

03Methodology

At a high level: Input (Japanese prompts and tasks) → Continued Japanese pre-training → Persona-seeded synthetic data generation → Supervised fine-tuning and post-training → Output (fluent Japanese answers, reliable tool calls, multi-turn reasoning)

Step A: Continued Japanese Pre-training 🍞 Hook: Imagine topping up a water bottle during a long hike so you don’t run out of energy.

🥬 The Concept: Continuous pre-training keeps improving the base model’s Japanese by feeding it diverse open-source text.

  • What happens: The team starts from Nemotron-Nano-9B-v2 and trains on Japanese OSS corpora: Wikipedia, fineweb-2 Japanese, Aozora Bunko, and SIP3-JA-general-web-corpus, plus Nemotron-CC-v2.1 and Nemotron-Pretraining-Specialized-v1.
  • Why this step exists: Without it, the model might lack modern phrases, domain terms, or classical styles; it could also forget agent skills if not balanced with the Nemotron datasets.
  • Example: Before training, the model might confuse formal salutations; after, it uses “お世話になっております” properly in business emails.

🍞 Anchor: A customer support prompt about shipping delays gets a precise, polite reply using correct keigo and up-to-date terms.

Step B: Persona-Seeded Synthetic Data Generation (SDG) 🍞 Hook: Think of a flight simulator that lets pilots practice every kind of weather safely and endlessly.

🥬 The Concept: SDG creates huge, realistic practice conversations and tool-using tasks from culturally grounded personas.

  • What happens: Using Nemotron-Personas-Japan (CC BY 4.0) with millions of persona profiles, the team generates diverse dialogues, instructions, and tool-calling traces. Data is filtered, deduplicated, and expanded while preserving cultural consistency.
  • Why this step exists: Real, labeled data is limited and sensitive; SDG scales diversity and volume without leaking private info.
  • Example data: A retail manager persona asks the AI to 1) summarize weekly sales, 2) call an inventory API with {"store_id": 17, "week": 42}, 3) draft an email to suppliers in polite Japanese.

🍞 Anchor: The model later handles a real store’s request to check stock via tool call and writes a culturally appropriate supplier note.
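The persona-seeding step can be sketched as crossing persona profiles with task templates. Real SDG pipelines use a strong LLM as the generator plus filtering and deduplication; the personas and templates below are invented to show the combinatorial idea only.

```python
# Sketch of persona-seeded synthetic data generation: each persona seeds
# several templated tasks, so N personas x M templates yields N*M varied
# instruction seeds. Personas and templates here are illustrative inventions.

import itertools

personas = [
    {"role": "retail manager", "region": "Tokyo",  "register": "polite"},
    {"role": "shop owner",     "region": "Kansai", "register": "casual"},
]
task_templates = [
    "Summarize weekly sales for a {role} in {region}.",
    "Draft a {register} supplier email for a {role}.",
]

def generate_examples(personas, templates):
    """Cross every persona with every task template to diversify the seeds."""
    return [tpl.format(**p) for p, tpl in itertools.product(personas, templates)]

examples = generate_examples(personas, task_templates)
print(len(examples))  # 4
```

With millions of persona profiles (as in Nemotron-Personas-Japan) instead of two, the same cross-product idea scales to huge, varied datasets without touching any private data.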

Step C: Supervised Fine-Tuning (SFT) and Post-Training Alignment 🍞 Hook: Imagine a coach running drills so athletes execute plays perfectly under pressure.

🥬 The Concept: SFT teaches the model to follow instructions and produce correct, structured outputs; post-training improves helpfulness, truthfulness, and safety.

  • What happens: Train on instruction–response pairs, include strict formats for tool and function calls (e.g., valid JSON), and evaluate for truthfulness, bias, and toxicity. Use Nemotron-Post-Training-v3 recipes.
  • Why this step exists: Without SFT and alignment, the model might waffle, ignore formats, or generate unsafe content.
  • Example: Given “Compute tax for order 123 and return JSON,” the model outputs {"action": "compute_tax", "order_id": 123} instead of a paragraph.

🍞 Anchor: In a finance workflow, it hooks cleanly into a backend tax service because the JSON schema is exactly right.
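The "JSON schema is exactly right" requirement can be shown with a small validator of the kind a backend might run before executing a tool call. The `compute_tax` schema here is a made-up example, not an actual Nemotron output format.

```python
# Sketch of strict output validation: a backend accepts a tool call only if
# it parses as JSON and every expected field has the right type. The schema
# is invented for illustration.

import json

EXPECTED = {"action": str, "order_id": int}

def validate_tool_call(raw: str) -> dict:
    call = json.loads(raw)  # raises ValueError if the model emitted prose, not JSON
    for field, typ in EXPECTED.items():
        if not isinstance(call.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    return call

ok = validate_tool_call('{"action": "compute_tax", "order_id": 123}')
print(ok["action"])  # compute_tax
```

SFT on strict formats is what makes this check pass consistently; a model that wraps the call in a polite paragraph would fail at `json.loads` and never reach the tax service.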

Step D: Tools and Infrastructure 🍞 Hook: Think of the kitchen: you need the right stove and knives to cook well.

🥬 The Concept: Training used Megatron-LM for pre-training and SFT, and NeMo Curator for data cleaning and filtering; future customization uses NeMo (Megatron-Bridge, AutoModel, NeMo-RL).

  • What happens: Megatron-LM scales training efficiently; NeMo Curator removes low-quality or duplicated data; NeMo frameworks help teams adapt the model to their own domains.
  • Why this step exists: Without solid tooling, training becomes unstable, slow, or noisy.
  • Example: Deduplication prevents the model from overfitting repeated web passages, keeping outputs fresh and general.

🍞 Anchor: A healthcare team later fine-tunes safely with NeMo tools on de-identified clinical templates.
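The deduplication idea can be sketched with exact hashing after normalization. NeMo Curator's real pipeline is more sophisticated (including fuzzy/near-duplicate detection), so treat this as the simplest form of the technique.

```python
# Sketch of exact deduplication as a data-cleaning step: normalize each
# document, hash it, keep only the first copy. Real curation also does
# fuzzy dedup; this shows the basic mechanism only.

import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())

def deduplicate(docs):
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept

docs = ["Sales rose 5%.", "sales  rose 5%.", "Inventory is low."]
print(len(deduplicate(docs)))  # 2
```

Removing repeats like this is what keeps the model from overfitting passages it has seen many times on the web.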

Step E: Deployment and Inference 🍞 Hook: Picture opening night at a theater: the orchestra performs smoothly if rehearsals and the stage are set.

🥬 The Concept: The model’s architecture (Transformer-Mamba) and recipes from Nemotron 2 Nano yield fast, stable inference even on edge GPUs, supporting multi-turn chats and tool chains.

  • What happens: The model runs through optimized inference engines from Nemotron 2 Nano, delivering up to 6x throughput vs some open-source alternatives; it tracks multi-turn states and outputs structured tool calls.
  • Why this step exists: Without high throughput and context handling, agents feel slow and lose track in long conversations.
  • Example: Over a 12-turn troubleshooting chat, the model remembers prior steps and calls a diagnostics API twice with different parameters.

🍞 Anchor: A field technician’s tablet runs the model to guide repairs on-site, quickly switching between dialogue and tool use.
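The multi-turn bookkeeping can be sketched as a runtime that keeps the full message history, including earlier tool calls and results, so repeated calls with different parameters stay coherent. The message roles and the diagnostics tool are illustrative, not an actual Nemotron API.

```python
# Sketch of multi-turn state tracking: the runtime accumulates every message
# (user turns, tool calls, tool results) so the model can reference earlier
# steps. Roles and the "run_diagnostics" tool are invented for illustration.

class Conversation:
    def __init__(self):
        self.messages = []

    def add(self, role: str, content):
        self.messages.append({"role": role, "content": content})

    def tool_calls(self):
        return [m["content"] for m in self.messages if m["role"] == "tool_call"]

chat = Conversation()
chat.add("user", "The printer is offline.")
chat.add("tool_call", {"action": "run_diagnostics", "args": {"device": "printer", "level": 1}})
chat.add("tool_result", {"status": "driver_error"})
chat.add("tool_call", {"action": "run_diagnostics", "args": {"device": "printer", "level": 2}})
print(len(chat.tool_calls()))  # 2: same tool, different parameters, across turns
```

An efficient long-sequence architecture matters precisely because this history grows with every turn and every tool result.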

The Secret Sauce (what’s clever): 🍞 Hook: You know how a great recipe balances sweet, salty, sour, and crunch all at once?

🥬 The Concept: The clever mix is an efficient architecture, culturally precise persona seeds, large-scale SDG, and disciplined SFT/post-training—all tuned for Japanese agent tasks under a 10B budget.

  • Why this works: Each part fixes a specific bottleneck (speed, culture, data scale, reliability), and together they create a compact, capable, and deployable Japanese agent.

🍞 Anchor: That’s why the same 9B model both chats naturally in Japanese and cleanly drives APIs in real business workflows.

04Experiments & Results

The Test: What did they measure and why? 🍞 Hook: When you try out for a team, you don’t just run fast—you also pass, shoot, and follow the coach’s play.

🥬 The Concept: The model was tested across about 40 benchmarks on Nejumi Leaderboard 4, covering language fluency, agent skills (coding, math, tool use), and alignment (instruction following, bias, toxicity, truthfulness, robustness).

  • How it works: 1) Evaluate Japanese understanding and generation, 2) test reasoning and tool-related tasks, 3) check safety and reliability.
  • Why it matters: Without broad testing, a model might ace grammar but fail at safe tool calls or long conversations.

🍞 Anchor: The model doesn’t just write politely; it also produces valid function calls and avoids harmful outputs.

The Competition: Who was it compared against? 🍞 Hook: It’s not enough to be good—you need to be better than the other finalists.

🥬 The Concept: Comparisons included the base Nemotron-Nano-9B-v2 and peer models like Qwen3-8B in the under-10B class.

  • How it works: 1) Use the same benchmark set, 2) compute composite scores, 3) analyze strengths by domain.
  • Why it matters: Without fair baselines, you can’t tell if gains come from language tuning or lucky test choices.

🍞 Anchor: On the same scoreboard, Nemotron-Nano-9B-v2-Japanese outperformed Qwen3-8B in overall size-to-performance.

The Scoreboard: Results with context 🍞 Hook: Getting 87% on a tough test means more when everyone else is stuck in the 70s.

🥬 The Concept: Nemotron-Nano-9B-v2-Japanese ranked first among models under 10B parameters on Nejumi Leaderboard 4, confirming that Japanese tuning improved not just language but also tools, coding, and alignment.

  • What this means: 1) It retained agent skills while gaining Japanese fluency, 2) it handled multi-turn contexts well, 3) it kept safety and truthfulness in check.
  • Why it matters: A top rank signals a dependable base for enterprise customization and on-prem deployment.

🍞 Anchor: For a developer choosing a base model, “first under 10B” is like picking the A+ student when most others are B-range.

Surprising Findings 🍞 Hook: Sometimes practicing free throws makes your defense better, too—skills can transfer.

🥬 The Concept: Training on persona-seeded tool-calling data boosted not only tool reliability but also general Japanese QA and instruction following.

  • How it works: Tool-calling examples force precise thinking, formatting, and verification, which spill over into better general responses.
  • Why it matters: It shows that carefully designed SDG can improve multiple abilities at once, not just the targeted one.

🍞 Anchor: After the tool-focused SFT, the model also gave crisper, more on-task answers to regular Japanese questions.

Performance Engineering Results 🍞 Hook: A car that’s both fast and fuel-efficient is extra valuable for city driving.

🥬 The Concept: The Transformer-Mamba-derived architecture and Nemotron 2 Nano inference stack delivered up to 6x throughput over some open-source alternatives, while staying accurate.

  • How it works: Efficient sequence handling and optimized kernels reduce time per token; better context memory stabilizes multi-turn chats.
  • Why it matters: Without speed, agents feel laggy; without context, they forget earlier steps.

🍞 Anchor: On an edge GPU, a customer bot responds quickly even in a long back-and-forth with multiple tool calls.
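The practical effect of a throughput multiplier is easy to see with back-of-envelope arithmetic. The baseline tokens-per-second figure below is an assumed number for illustration, not a measured benchmark; only the 6x multiplier comes from the post.

```python
# Back-of-envelope latency math: at 6x tokens/sec, the same reply takes
# one-sixth the wall-clock time. The 40 tok/s baseline is an assumption.

baseline_tps = 40.0            # assumed rate for a slower open-source baseline
fast_tps = 6 * baseline_tps    # the "up to 6x" throughput claim
reply_tokens = 300             # a typical multi-sentence reply

baseline_s = reply_tokens / baseline_tps
fast_s = reply_tokens / fast_tps
print(f"{baseline_s:.1f}s vs {fast_s:.2f}s")  # 7.5s vs 1.25s
```

Over a 12-turn chat with several tool calls, that per-reply difference compounds into the gap between an assistant that feels interactive and one that feels laggy.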

05Discussion & Limitations

Limitations 🍞 Hook: Even great sneakers don’t help much if you’re trying to swim—they’re made for a purpose.

🥬 The Concept: This model is focused on Japanese; outside Japanese contexts, performance depends on the base’s multilingual strengths but may not be optimal.

  • Why it matters: If you need top-tier Korean or Arabic, you may need further tuning or a different base.

🍞 Anchor: A bilingual call center might still want a separate model fine-tuned for English–Japanese switching.

Required Resources 🍞 Hook: You can’t bake a cake without an oven—even if the recipe is simple.

🥬 The Concept: While smaller than giant LLMs, the 9B model still needs a capable GPU for training/fine-tuning and a solid GPU or two for low-latency inference.

  • Why it matters: Without proper hardware and data pipelines (Megatron-LM, NeMo Curator), results may slow or degrade.

🍞 Anchor: A midsize company uses one high-memory GPU server on-prem to serve chats and another for occasional fine-tunes.

When NOT to Use 🍞 Hook: Don’t bring a bicycle to a mountain of snow—use skis.

🥬 The Concept: If you require cutting-edge cross-lingual generation across many languages at once, or ultra-niche domains without data, a larger or domain-pretrained model may be better.

  • Why it matters: The sweet spot here is Japanese enterprise tasks with tools and on-prem needs.

🍞 Anchor: A global marketing generator for 15 languages may favor a large multilingual base.

Open Questions 🍞 Hook: After you build a treehouse, you still wonder how it handles storms.

🥬 The Concept: How far can persona-seeded SDG scale without subtle cultural drift? What’s the best mix of real vs synthetic data for safety and truthfulness at 9B? How reliably do multi-agent, multi-tool workflows stay stable over very long sessions?

  • Why it matters: Answers help future releases push quality and safety while keeping efficiency.

🍞 Anchor: Next versions might add new regional personas, longer-context tuning, and stronger verification loops for tool results.

06Conclusion & Future Work

Three-Sentence Summary Nemotron-Nano-9B-v2-Japanese turns a proven, efficient 9B architecture into a fluent, tool-using Japanese agent through continued Japanese pre-training and persona-seeded synthetic data plus SFT. It achieves first place under 10B on Nejumi Leaderboard 4, offering strong size-to-performance for on-prem enterprise use. The model runs fast on edge GPUs, speaks culturally accurate Japanese, and outputs reliable structured tool calls.

Main Achievement 🍞 Hook: Like fitting a high-performance engine into a compact car.

🥬 The Concept: The top contribution is showing that a sub-10B model can deliver state-of-the-art Japanese language and agent skills suitable for secure, real-world deployment.

🍞 Anchor: Companies can now build capable Japanese AI agents without sending sensitive data to the cloud.

Future Directions 🍞 Hook: Once a garden grows well, you plant new varieties.

🥬 The Concept: Expand personas across more regions and industries, refine long-context handling, strengthen safety/robustness, and streamline domain fine-tuning with NeMo.

🍞 Anchor: Expect smoother multi-agent workflows, richer tool ecosystems, and even better Japanese nuance.

Why Remember This 🍞 Hook: Sometimes small is mighty.

🥬 The Concept: This work proves that careful architecture, culture-aware data, and disciplined training can make a small model a dependable, on-prem Japanese agent.

🍞 Anchor: It’s a practical path for sovereign AI—powerful, polite, and private.

Practical Applications

  • Deploy an on-prem Japanese help-desk agent that answers policy questions and calls internal tools.
  • Automate back-office tasks by having the model generate valid API calls to HR, finance, or inventory systems.
  • Build a customer support chatbot that maintains polite keigo across long, multi-turn chats.
  • Fine-tune the model on domain documents (with NeMo) to create a legal or healthcare assistant.
  • Use the model to draft and localize Japanese emails, reports, and summaries with cultural accuracy.
  • Run a field service assistant on edge GPUs for diagnostics and step-by-step repair workflows.
  • Create a sales analyst bot that queries BI tools and returns structured JSON results.
  • Prototype multi-agent workflows (planner + executor) without the overhead of giant models.
  • Perform secure document Q&A over private PDFs using on-prem servers.
  • Generate synthetic, culturally aligned training data for niche Japanese scenarios using persona seeds.
Tags: Nemotron 2 Nano · Nemotron-Nano-9B-v2-Japanese · Nemotron-Personas-Japan · Synthetic Data Generation · Supervised Fine-Tuning · Transformer-Mamba · On-premises AI · Sovereign AI · Tool calling · Throughput · Nejumi Leaderboard · Japanese LLM · NeMo Curator · Megatron-LM · Alignment