Architecture
Three pieces: the client you talk to, the relay that coordinates and the nodes that do the actual thinking.
Client
The web app at usetalos.xyz. You sign in, pick a tier and send messages. It opens a WebSocket connection to the relay and receives streamed tokens as they're generated.
Relay
The central routing layer, a small always-on service that coordinates everything:
- Pro / Light → browser nodes running Nimbus 8B.
- Heavy → rig nodes (Atlas 30B or Atlas Vision 27B, operator's choice).
- Stats broadcast: pushes live network numbers (active nodes, queue depth, jobs completed) to every connected client every 5 seconds.
It doesn't store conversations or prompt content. It routes traffic and discards it.
Nodes
Browser nodes (WebGPU)
Run inside a browser tab via an in-browser WebGPU runtime:
- Nimbus 8B (~4.2GB): serves the Pro and Light tiers.
Models download once and cache in the browser. Nodes connect to the relay over the same WebSocket layer, pick up job assignments, run inference and stream tokens back.
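As a rough illustration of the "pick up job assignments, run inference and stream tokens back" loop, here is a minimal sketch. The message shapes (`type`, `job_id`, `prompt`, `text`) and the names `handle_job` and `fake_model` are assumptions made for this example, not the actual wire protocol, and Python is used only for brevity; a real browser node would run JavaScript against the WebGPU runtime.

```python
import json
from typing import Callable, Iterable, Iterator


def handle_job(raw_message: str,
               generate: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Turn one relay message into the messages a node would send back.

    The JSON fields used here are hypothetical.
    """
    job = json.loads(raw_message)
    if job.get("type") != "job":
        return  # not a job assignment: nothing to stream
    for token in generate(job["prompt"]):
        # One message per generated token, so the relay can forward it at once.
        yield json.dumps({"type": "token", "job_id": job["job_id"], "text": token})
    # Tell the relay the job is finished so the node can return to idle.
    yield json.dumps({"type": "done", "job_id": job["job_id"]})


def fake_model(prompt: str) -> Iterator[str]:
    """Stand-in for real inference: echoes the prompt word by word."""
    for word in prompt.split():
        yield word.upper()


if __name__ == "__main__":
    assignment = json.dumps({"type": "job", "job_id": "j1", "prompt": "hello relay"})
    for outgoing in handle_job(assignment, fake_model):
        print(outgoing)
```

Because `handle_job` is a generator, each token message is available as soon as it is produced, which mirrors the streaming behaviour described above.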
Rig nodes (local runtime)
Run as a background process driving a local model runtime, with hardware acceleration via:
- CUDA: NVIDIA GPUs
- Metal: Apple Silicon
- Vulkan: AMD and Intel GPUs
Rig nodes serve Heavy tier requests exclusively: Atlas 30B and Atlas Vision 27B, operator-selectable. They authenticate with a node token and connect to the relay the same way.
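The operator's choices for a rig node (model, acceleration backend, node token) could be validated along these lines. This is a sketch under assumptions: `RigConfig`, `make_rig_config`, the vendor keys and the backend strings are hypothetical names for illustration, not the real runtime's configuration format.

```python
from dataclasses import dataclass

# Models a rig node may serve, per the Heavy tier description.
HEAVY_MODELS = ("Atlas 30B", "Atlas Vision 27B")

# GPU vendor -> acceleration backend, following the list above.
BACKEND_BY_VENDOR = {
    "nvidia": "cuda",
    "apple": "metal",
    "amd": "vulkan",
    "intel": "vulkan",
}


@dataclass(frozen=True)
class RigConfig:
    model: str
    backend: str
    node_token: str


def make_rig_config(model: str, gpu_vendor: str, node_token: str) -> RigConfig:
    """Validate the operator's choices and return an immutable config."""
    if model not in HEAVY_MODELS:
        raise ValueError(f"unsupported Heavy-tier model: {model!r}")
    backend = BACKEND_BY_VENDOR.get(gpu_vendor.lower())
    if backend is None:
        raise ValueError(f"no acceleration backend listed for {gpu_vendor!r}")
    if not node_token:
        raise ValueError("a node token is required to authenticate with the relay")
    return RigConfig(model=model, backend=backend, node_token=node_token)
```

For example, `make_rig_config("Atlas 30B", "NVIDIA", "my-token")` yields a config whose `backend` is `"cuda"`, while passing a model outside the Heavy tier raises `ValueError`.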
Render nodes (image pipeline)
Run a node-based render pipeline on an independent GPU and serve the render_image tool exposed on the Heavy tier. Same token-based authentication, same relay connection.
Job lifecycle
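Before the step list, here is a minimal sketch of the queueing and matching steps it describes (tier-specific queues, then a weighted-random pick by measured tokens/sec among idle nodes of the right type). The names `Node`, `pick_node` and `dispatch`, the in-memory queues, and the small weight floor for nodes without a measurement are assumptions for illustration; the relay's actual implementation may differ.

```python
import random
from collections import deque
from dataclasses import dataclass

# Tier -> node type, following the relay's routing rules.
NODE_TYPE_FOR_TIER = {"Light": "browser", "Pro": "browser", "Heavy": "rig"}

# Assumption: nodes with no measurement yet still get a small chance of work.
MIN_WEIGHT = 0.1


@dataclass
class Node:
    node_id: str
    node_type: str          # "browser" or "rig"
    tokens_per_sec: float   # measured throughput
    idle: bool = True


def pick_node(nodes, tier, rng=random):
    """Weighted-random choice among idle nodes of the type that serves `tier`."""
    wanted = NODE_TYPE_FOR_TIER[tier]
    eligible = [n for n in nodes if n.idle and n.node_type == wanted]
    if not eligible:
        return None
    weights = [max(n.tokens_per_sec, MIN_WEIGHT) for n in eligible]
    return rng.choices(eligible, weights=weights, k=1)[0]


def dispatch(queues, nodes, rng=random):
    """Assign queued jobs, oldest first per tier, while idle nodes remain."""
    assignments = []
    for tier, queue in queues.items():
        while queue:
            node = pick_node(nodes, tier, rng)
            if node is None:
                break  # no idle node of the right type: job stays queued
            node.idle = False
            assignments.append((queue.popleft(), node.node_id))
    return assignments


if __name__ == "__main__":
    nodes = [Node("b1", "browser", 10.0), Node("b2", "browser", 30.0),
             Node("r1", "rig", 50.0)]
    queues = {"Light": deque(["job-a"]), "Heavy": deque(["job-c"])}
    print(dispatch(queues, nodes))
```

Weighting by throughput means a node measured at 30 tokens/sec is picked about three times as often as one at 10 tokens/sec, while slower nodes still receive some work.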
1. User sends a message
2. Relay receives the request and determines the tier
3. Request enters the tier-specific queue
4. Relay matches it to an idle node of the right type (weighted-random by measured tokens/sec among eligible idle nodes)
5. Job assigned to that node
6. Node runs inference, streams tokens back to the relay
7. Relay forwards tokens to the user in real time
8. Job completes, node returns to idle, earnings are credited
Search flow (Heavy tier)
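The Heavy-tier search flow, listed step by step below, amounts to a single tool-call round trip. Here is a minimal sketch with stubbed dependencies. The function `run_with_search`, the dict shapes exchanged with the model, and the `top_n` cutoff are assumptions for illustration; only the `web_lookup` tool name comes from the flow itself.

```python
def run_with_search(model_step, search, fetch_text, prompt, top_n=3):
    """Run one generation, resolving at most one web_lookup tool call.

    model_step(prompt, tool_result) returns either
    {"tool": "web_lookup", "query": ...} or {"text": ...} (hypothetical shapes).
    """
    first = model_step(prompt, None)
    if first.get("tool") != "web_lookup":
        return {"text": first["text"], "sources": []}
    # Relay side: query the search API, then fetch and extract the top pages.
    urls = search(first["query"])[:top_n]
    pages = [{"url": url, "content": fetch_text(url)} for url in urls]
    # The model continues generating with the pages as its tool result.
    final = model_step(prompt, pages)
    return {"text": final["text"], "sources": urls}


# Stubs standing in for the model, the search API and the page fetcher.
def fake_model(prompt, tool_result):
    if tool_result is None:
        return {"tool": "web_lookup", "query": prompt}
    return {"text": f"answer based on {len(tool_result)} pages"}


def fake_search(query):
    return ["https://a.example", "https://b.example", "https://c.example"]


def fake_fetch(url):
    return f"content of {url}"


if __name__ == "__main__":
    print(run_with_search(fake_model, fake_search, fake_fetch, "latest news", top_n=2))
```

Returning the fetched URLs alongside the text is what lets the final response carry source citations.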
1. User sends a message (Heavy tier)
2. The node runs the model, which emits a web_lookup tool call
3. Relay queries an independent search API
4. Relay fetches the top result pages and extracts content
5. Results are returned to the model as a tool result
6. Model continues generating, grounded in that content
7. Response streams back with source citations
Live stats
The relay broadcasts network stats to every connected client every 5 seconds:
- Active nodes, by type and model
- Current queue depth per tier
- Total jobs completed
- Network-wide tokens per second
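A stats snapshot covering the four items above might be assembled like this. It is a sketch only: the field names, the `build_stats` helper, and the choice to compute network-wide throughput as the sum of per-node measurements are assumptions, not the relay's documented payload.

```python
from collections import Counter

BROADCAST_INTERVAL_SECONDS = 5  # broadcast cadence stated above


def build_stats(nodes, queues, jobs_completed):
    """Build one snapshot from connected nodes and per-tier queues.

    `nodes` is a list of dicts with "type", "model" and "tokens_per_sec";
    `queues` maps tier name -> sequence of waiting jobs (hypothetical shapes).
    """
    active = Counter(f"{n['type']}/{n['model']}" for n in nodes)
    return {
        "active_nodes": dict(active),
        "queue_depth": {tier: len(jobs) for tier, jobs in queues.items()},
        "jobs_completed": jobs_completed,
        "tokens_per_sec": sum(n["tokens_per_sec"] for n in nodes),
    }


if __name__ == "__main__":
    nodes = [
        {"type": "browser", "model": "Nimbus 8B", "tokens_per_sec": 12.0},
        {"type": "rig", "model": "Atlas 30B", "tokens_per_sec": 40.0},
    ]
    print(build_stats(nodes, {"Light": ["a"], "Pro": [], "Heavy": []}, 17))
```

A relay would then serialize such a snapshot and push it to every connected client once per interval.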