This is NVIDIA's home for open model weights, datasets, and interactive demos. Everything here is designed to give developers and researchers production-ready starting points for Generative AI, Physical AI, and agentic workflows – backed by the same research that powers NVIDIA's enterprise AI platform.
The Nemotron family is NVIDIA's lineup of purpose-built foundation models spanning language, reasoning, vision, retrieval, speech, and safety. Each model targets a specific performance profile – from ultra-efficient edge inference to heavyweight multi-turn agent orchestration – and ships with open weights, open datasets, and reproducible training recipes.
The core language model lineup, engineered for advanced reasoning and agentic tasks across a range of model sizes and deployment targets.
NVIDIA provides a broad set of specialized multimodal foundations that integrate seamlessly with the Nemotron ecosystem — spanning speech recognition, multilingual translation, vision-language understanding, and real-time voice AI. These models are optimized for both cloud and edge deployment.
State-of-the-art, production-ready speech models from the NVIDIA NeMo Speech research team for ASR, TTS, speaker diarization, and speech-to-speech.
Vision-language models that bring multimodal understanding to documents, images, and video – from OCR and chart parsing to visual Q&A.
A complete, modular retrieval-augmented generation stack — from document ingestion through semantic search to precision reranking — designed for production pipelines that handle text, images, and complex multimodal documents.
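The retrieve-then-rerank pattern this stack implements can be sketched in a few lines. The toy below uses hand-made vectors and a simulated reranker score purely for illustration – in a real pipeline the embeddings would come from an NVIDIA retrieval embedding model and the rerank scores from a reranker model; none of the names or numbers here refer to actual NVIDIA components.

```python
# Toy sketch of a two-stage retrieval pipeline: broad semantic search
# over embeddings, then precision reranking of a short candidate list.
# Vectors and scores are illustrative placeholders, not real model outputs.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Stage 1: semantic search. Rank all documents by embedding similarity
# to the query and keep a short candidate list.
docs = {
    "doc_gpu": [0.9, 0.1, 0.0],
    "doc_cpu": [0.1, 0.9, 0.0],
    "doc_net": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
candidates = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:2]

# Stage 2: precision reranking. Rescore only the candidates with a more
# expensive scorer (simulated here by a fixed dict) and keep the best.
rerank_scores = {"doc_gpu": 0.95, "doc_cpu": 0.40}
best = max(candidates, key=rerank_scores.get)
print(best)  # doc_gpu
```

The design point is the funnel: the cheap embedding pass scans the whole corpus, while the expensive reranker only sees the handful of survivors.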
NVIDIA releases optimized and aligned versions of leading community architectures, applying alignment techniques such as SteerLM, RLHF, and RPO together with open datasets like HelpSteer2 to push open models further.
NVIDIA Cosmos is a platform of generative World Foundation Models (WFMs), tokenizers, and data curation tools — purpose-built to model and simulate physical interactions for robotics and autonomous systems.
GR00T N1.5 VLA: NVIDIA's open foundation model for humanoid robot reasoning and control. Combines an Eagle-based vision-language backbone with a diffusion transformer (DiT) action head for language-conditioned manipulation across diverse embodiments. We have integrated GR00T N1.5 into LeRobot for policy post-training and deployment.
IsaacLab-Arena: An open-source framework for large-scale, GPU-accelerated robot policy evaluation in simulation, built on top of IsaacLab. It provides modular APIs for task curation, automated diversification, and parallel benchmarking across embodiments and environments. We have integrated IsaacLab-Arena into LeRobot for scalable closed-loop policy evaluation and benchmarking, with datasets and 250+ scenes from our partner Lightwheel AI available on the Hugging Face Hub.
Every model NVIDIA ships rests on a data layer — and that data shapes how the model reasons, what it knows, and where it can be safely deployed. Nemotron Datasets are the open version of that foundation: web-scale pretraining corpora, alignment and reasoning data, multimodal grounding, and embodied AI simulation, released under permissive licenses with the training recipes and evaluation frameworks that produced them. Beyond Nemotron, NVIDIA's broader open data catalog spans 200+ releases across Physical AI and robotics, autonomous vehicles, biology and drug discovery, retrieval and evaluation benchmarks, and sovereign AI. Use the table below to find the right starting point for what you're trying to build.
| If you want to... | Use this Collection | Start with these datasets |
|---|---|---|
| FOUNDATION | | |
| Pre-train a base model | Nemotron Pre-Training Collection | Nemotron-CC-v2.1, Nemotron-CC-Math-v1, Nemotron-CC-Code-v1, Nemotron-ClimbMix |
| BUILD A CAPABILITY | | |
| Math reasoning, proofs, and quantitative problem-solving | Nemotron Math & Reasoning Collection | Nemotron-SFT-Math-v3, Nemotron-Math-v2, AceReason-Math, Nemotron-CC-Math-v1 |
| Code generation, debugging, and SWE workflows | Nemotron Code & SWE Collection | Nemotron-SFT-Competitive-Programming-v2, Nemotron-SFT-SWE-v2, Nemotron-CC-Code-v1 |
| Helpful, multi-turn, instruction-following chat | Nemotron Chat & Instruction Collection | Nemotron-SFT-Instruction-Following-Chat-v2, Nemotron-RL-instruction_following |
| Agentic and tool-use behavior | Nemotron Agentic Collection | Nemotron-SFT-Agentic-v2, Nemotron-RL-Agentic-Function-Calling-Pivot-v1 |
| Safety, refusals, and content moderation | Nemotron Safety Collection | Aegis-AI-Content-Safety-Dataset-2.0, Nemotron-Safety-Guard-Dataset-v3, Nemotron-PII |
| Image and document understanding | Nemotron Vision-Language Collection | Nemotron-VLM-Dataset-v2, Llama-Nemotron-VLM-Dataset-v1 |
| TRAINING STAGES | | |
| RL data with verifiable rewards (math, code, agentic, instruction) | Nemotron Reinforcement Learning Collection | Nemotron-RL-math-OpenMathReasoning, Nemotron-RL-coding-competitive_coding, Nemotron-RL-Agentic-Function-Calling-Pivot-v1 |
| Train a reward model | Nemotron Reward Modeling Collection | HelpSteer3, HelpSteer2, Nemotron-RLHF-GenRM-v1 |
| Full post-training recipe (SFT + RL blend) | Nemotron Post-Training Blends Collection | Llama-Nemotron-Post-Training-Dataset, Nemotron-Post-Training-Dataset-v2, Nemotron-Cascade-2-SFT-Data |
| Evaluate model performance | Nemotron Eval & Benchmark Collection | SPEED-bench |
| SPECIALIZED & SOVEREIGN | | |
| Multilingual or domain-specific (e.g. finance) capability | Nemotron Supervised Fine-Tuning Collection | Nemotron-SFT-Multilingual-v1, Nemotron-SpecializedDomains-Finance-v1 |
| Diverse synthetic personas grounded in real population distributions | Nemotron Personas Collection | Nemotron-Personas-USA / India / Japan / Brazil / France / Singapore |
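As a minimal sketch of how the table maps to practice: these datasets are published under the `nvidia` namespace on the Hugging Face Hub, so a name from the third column becomes a repo id that `datasets.load_dataset` accepts. The helper below only builds the id string (actually loading requires network access and the `datasets` library); exact repo names should be confirmed on the Hub, since some entries may differ slightly from their published ids.

```python
# Build Hugging Face Hub repo ids for dataset names from the table above.
# Assumption: the datasets live under the "nvidia" org on the Hub; verify
# each exact name on huggingface.co before loading.

def nvidia_repo_id(dataset_name: str, org: str = "nvidia") -> str:
    """Turn a table entry like 'HelpSteer2' into a Hub repo id."""
    return f"{org}/{dataset_name}"

# With network access you would then load it, e.g.:
#   from datasets import load_dataset
#   ds = load_dataset(nvidia_repo_id("HelpSteer2"), split="train")

print(nvidia_repo_id("HelpSteer2"))  # nvidia/HelpSteer2
```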