Orion-100B: Training a 100 Billion Parameter AI for $1.25 Per Hour Per Node
Macrocosmos completed the Orion-100B training run across globally distributed hardware for as little as $1.25 per hour per node.
Orion-100B is the largest distributed LLM pre-training run completed over the open internet to date.
Per Macrocosmos, the model was trained across globally distributed single GPUs using the Bittensor SN9 network.
What Is Orion-100B and How Was It Trained?

Orion-100B is a 100 billion parameter language model trained using pipeline-parallel distributed techniques.
Training used 16-stage pipeline parallelism with 3 replicas, for a total of 48 geographically distributed devices.
Each stage was hosted by a separate non-colocated peer across five US datacenters connected over the internet.
This is the first time a 100 billion parameter model has been trained using this kind of distributed architecture.
The Bittensor SN9 network provided the infrastructure through the IOTA architecture framework developed by Macrocosmos.
The model achieved roughly 65% of the training speed of equivalent traditional datacenter setups at much lower cost.
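The 16-stage, 3-replica layout described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic and of how stages chain together within a replica. The names build_layout and next_slot are hypothetical and are not part of IOTA or any Bittensor API.

```python
from itertools import product

NUM_STAGES = 16    # pipeline stages per replica (from the article)
NUM_REPLICAS = 3   # independent copies of the pipeline (from the article)

def build_layout(num_replicas, num_stages):
    """Return one (replica, stage) slot per device."""
    return list(product(range(num_replicas), range(num_stages)))

def next_slot(slot, num_stages):
    """Slot that receives this stage's activations, or None for the last stage."""
    replica, stage = slot
    if stage + 1 < num_stages:
        return (replica, stage + 1)
    return None

layout = build_layout(NUM_REPLICAS, NUM_STAGES)
print(len(layout))                      # 48 devices in total
print(next_slot((0, 0), NUM_STAGES))    # (0, 1)
print(next_slot((2, 15), NUM_STAGES))   # None
```

Each slot stands for one device, so a forward pass crosses 15 internet links per replica. That chain of links is why latency between peers matters so much in this setup.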
How Orion-100B Achieved $1.25 Per Hour Training Cost

Individual participants can join the Orion-100B training network for as little as $1.25 per hour per node.
A full Orion replica uses 16 non-colocated NVIDIA A100 GPUs and costs around $20 per hour to provision.
That is 2.5 times less than renting equivalent datacenter hardware from standard cloud providers, which implies roughly $50 per hour for a comparable 16-GPU setup.
The cost savings come from using idle consumer and research GPUs spread across the internet.
Per SimplyTao, Orion-100B achieved over 30% model FLOP utilization (MFU) on NVIDIA A100-80GB chips during training.
An MFU of 30% is competitive with many centralized training runs and notable for a fully distributed setup.
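The cost figures above follow from simple multiplication, sketched here in Python. The per-node rate, the 16 GPUs per replica, the 3 replicas, and the 2.5x ratio come from the article. The full-run rate and the implied datacenter price are derived values, not reported ones, and they assume every node is billed at the $1.25 minimum.

```python
NODE_RATE_USD = 1.25        # minimum hourly price per node (from the article)
GPUS_PER_REPLICA = 16       # A100s per replica (from the article)
NUM_REPLICAS = 3            # replicas in the run (from the article)
DATACENTER_MULTIPLE = 2.5   # datacenter cost relative to Orion (from the article)

def replica_rate(node_rate, gpus_per_replica):
    """Hourly cost of one full replica."""
    return node_rate * gpus_per_replica

def run_rate(node_rate, gpus_per_replica, num_replicas):
    """Hourly cost of all replicas together (derived, assumes the minimum rate)."""
    return replica_rate(node_rate, gpus_per_replica) * num_replicas

def implied_datacenter_rate(node_rate, gpus_per_replica, multiple):
    """Hourly datacenter price implied by the stated cost ratio (derived)."""
    return replica_rate(node_rate, gpus_per_replica) * multiple

print(replica_rate(NODE_RATE_USD, GPUS_PER_REPLICA))                                  # 20.0
print(run_rate(NODE_RATE_USD, GPUS_PER_REPLICA, NUM_REPLICAS))                        # 60.0
print(implied_datacenter_rate(NODE_RATE_USD, GPUS_PER_REPLICA, DATACENTER_MULTIPLE))  # 50.0
```

Because $1.25 is quoted as a floor ("as little as"), actual hourly costs may be higher than these figures.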
Orion-100B Performance: What 30 Percent Model FLOP Utilization Means

Model FLOP utilization (MFU) measures the fraction of the hardware's theoretical peak FLOP throughput that a training run actually spends on model computation.
Most cutting-edge datacenter training runs achieve 40-50% MFU under ideal conditions with co-located hardware.
Orion-100B’s 30% MFU across geographically distributed hardware is a significant distributed computing achievement.
The primary bottleneck is network latency between nodes, which reduces synchronization speed compared to local clusters.
Macrocosmos developed the IOTA architecture specifically to minimize the latency penalty in distributed pipeline training.
These efficiency gains are part of a broader trend of falling training costs for frontier AI.
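To make the MFU figure concrete, here is a minimal Python sketch of how such a number is commonly estimated. It assumes the widely used approximation of about 6 FLOPs per parameter per training token (which ignores attention FLOPs) and the A100's published dense BF16 peak of 312 TFLOP/s. The throughput of 2,496 tokens per second is a hypothetical value chosen so the result lands on 30%. It is not a figure reported by Macrocosmos or SimplyTao.

```python
A100_PEAK_FLOPS = 312e12   # dense BF16 peak per A100, FLOP/s

def training_flops_per_token(num_params):
    """Approximate forward + backward FLOPs per token (6 * N rule of thumb)."""
    return 6 * num_params

def mfu(num_params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Achieved model FLOP/s divided by the hardware's theoretical peak FLOP/s."""
    achieved = training_flops_per_token(num_params) * tokens_per_second
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak

# One 16-GPU replica of a 100B-parameter model at a hypothetical throughput.
value = mfu(num_params=100e9, tokens_per_second=2496,
            num_gpus=16, peak_flops_per_gpu=A100_PEAK_FLOPS)
print(round(value, 3))   # 0.3
```

Under these assumptions, 30% MFU would correspond to roughly 2,500 tokens per second per replica. Measured against the midpoint of the 40-50% datacenter range, 30% works out to about two-thirds, which is in the same ballpark as the 65% relative training speed cited earlier.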
Why Orion-100B Matters for the Future of AI Training

Orion-100B is evidence that frontier-scale AI training is no longer limited to companies with billion-dollar GPU clusters.
Decentralized training on Bittensor enables anyone with a GPU to contribute to and profit from large model training.
This could democratize access to AI development for universities, small labs, and emerging market researchers.
As recently as 2024, training 100-billion-parameter models over the public internet was considered impossible.
As distributed techniques improve, experts predict 500-billion parameter models could be trained this way by 2027.
The implications extend to agentic AI systems, where deployment costs are coming down across the industry.
Orion-100B vs Traditional AI Training: What Changes for Developers?

Traditionally, training a 100 billion parameter model required exclusive access to thousands of co-located GPUs.
Orion-100B shows that geographically dispersed hardware can accomplish the same task at a fraction of the cost.
Startups and academic labs can now consider training large foundation models without cloud provider lock-in.
The Bittensor network rewards GPU contributors with tokens, creating an economic incentive for participation.
Orion-100B’s weights will be released publicly, allowing the broader research community to build on top of it.
This open-weights approach follows the EU EUROPA model’s philosophy that frontier AI should be accessible globally.