Differentiator is openness
To underscore its commitment to open source, Nvidia is revealing some of Nemotron 3’s inner workings. It is releasing a dataset with real-world telemetry for safety evaluations, along with three trillion tokens of Nemotron 3’s pretraining, post-training, and RL datasets.
In addition, Nvidia is open-sourcing its NeMo Gym and NeMo RL libraries, which provide Nemotron 3’s training environments and post-training foundation. It is also releasing NeMo Evaluator to help developers validate model safety and performance. All are now available on GitHub and Hugging Face. Of these, Mayham noted, NeMo Gym may be the most “strategically important” piece of this release.
Pre-training teaches models to predict tokens, not to complete domain-specific tasks, and traditional RL from human feedback (RLHF) doesn’t scale for complex agentic behaviors, Mayham explained. NeMo Gym enables RL with verifiable rewards: essentially, computational verification of task completion rather than subjective human ratings. That is, did the code pass the tests? Is the math correct? Were the tools called properly?
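To make the idea of a verifiable reward more concrete, here is a minimal Python sketch of the three checks Mayham describes, each returning a binary score that a program computes rather than a human rater. It is an illustration only and assumes nothing about NeMo Gym’s actual API. The function names code_reward, math_reward, and tool_call_reward, and the tool name get_weather in the usage example, are hypothetical.

```python
# Hypothetical sketch of verifiable rewards (NOT NeMo Gym's API).
# Each function returns 1.0 when a program confirms the task was done, else 0.0.
from fractions import Fraction


def code_reward(func_source: str, tests: list, func_name: str) -> float:
    """1.0 if the generated function passes every (args, expected) test case."""
    namespace: dict = {}
    try:
        # Real training systems would run untrusted model code in a sandbox.
        exec(func_source, namespace)
        fn = namespace[func_name]
        return 1.0 if all(fn(*args) == expected for args, expected in tests) else 0.0
    except Exception:
        # Syntax errors, missing functions, or crashes all count as failure.
        return 0.0


def math_reward(model_answer: str, reference: str) -> float:
    """1.0 if the final answer equals the reference as an exact rational number."""
    try:
        return 1.0 if Fraction(model_answer.strip()) == Fraction(reference.strip()) else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0


def tool_call_reward(call: dict, expected_name: str, required_args: set) -> float:
    """1.0 if the model called the expected tool with all required arguments."""
    args = call.get("arguments", {})
    return 1.0 if call.get("name") == expected_name and required_args <= set(args) else 0.0


if __name__ == "__main__":
    generated = "def add(a, b):\n    return a + b\n"
    print(code_reward(generated, [((1, 2), 3), ((0, 0), 0)], "add"))
    print(math_reward("0.5", "1/2"))
    print(tool_call_reward({"name": "get_weather", "arguments": {"city": "Paris"}},
                           "get_weather", {"city"}))
```

Because each score comes from running tests, comparing answers exactly, or inspecting a structured call, the reward can be computed automatically for as many rollouts as needed. That property is what lets this style of RL scale where collecting human ratings does not.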

