# Roadmap
XeeNet is a distributed experimentation network that turns idle compute into a research fabric for bounded, verifiable ML workloads. This roadmap tracks the journey from working prototype to global-scale autoresearch.
## Current State: Phase 1 Complete
The core platform is functional end-to-end as a single-operator research network. A researcher creates a brief, the orchestrator decomposes it into a campaign of bounded experiments with real hyperparameter configurations, workers execute actual PyTorch training, results flow back through the API, and the dashboard displays live progress with factor analysis. Credits are calculated and recorded on every result submission.
### What's Built
| Component | Status | Detail |
|---|---|---|
| FastAPI Backend | Complete | REST API, async DB, 3 router groups, lifespan management |
| Web Dashboard | Complete | HTMX + Jinja2, brief CRUD, campaign status, results display, auto-refresh |
| Orchestrator Agent | Complete | Brief decomposition, config generation, task graph creation |
| Python Worker Agent | Complete | Subprocess execution, dual deadlines, simulated fallback, CLI runner |
| Electron Desktop Worker | Complete | Zero-setup install, GPU detection, task execution, system tray |
| Training Pipeline | Complete | Char-level transformer on TinyShakespeare, real val_bpb, bounded budgets |
| Config Generator | Complete | Reproducible search space sampling with deterministic seeds |
| Campaign Tooling | Complete | Campaign runner, progress monitor, post-campaign factor analysis with JSON export |
| Credits System | Complete | Calculation on result submission, ledger persistence, dashboard display |
| Portal Assistant | Framework | Agent prompt and stub, needs LLM integration |
| Test Suite | Complete | 110 tests across 10 files, all passing |
## Growth Phases
XeeNet grows in four deliberate phases. Each phase proves a specific thesis before the next layer of complexity is introduced. The platform is not a replacement for datacentres — it is a new compute substrate for high-volume ML experimentation.
| Phase | Status | Thesis to Prove |
|---|---|---|
| 1. Single-Operator Network | Complete | A distributed worker network can reliably execute bounded ML experiments and produce useful, verified research insights. |
| 2. Trusted External Contributors | Next | Strangers can safely contribute compute with verifiable results and earned reputation. |
| 3. Research Platform | Planned | External researchers get genuine value from submitting briefs to the network. |
| 4. Economic Layer | Planned | A sustainable incentive model drives long-term participation without regulatory overhead. |
## Phase 2: Trusted External Contributors (Next)
The trust layer is the existential requirement. Without it, adoption stops. These priorities harden the platform for external workers before scaling the researcher side.
### Result Verification
First-class k-of-n redundancy: randomly re-run a subset of tasks on trusted workers, compare metrics within tolerance, and flag anomalies. This is the minimum viable trust layer for a semi-trusted worker network.
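The re-audit and comparison logic above can be sketched in a few lines. This is illustrative only: the function names, the 2% relative tolerance, and the data shapes are assumptions, not the platform's actual API.

```python
import math
import random

def select_for_reaudit(task_ids, k, seed=0):
    """Randomly pick k completed tasks for redundant re-execution on trusted workers.

    Deterministic given the seed, so an audit run is itself reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(list(task_ids), k)

def metrics_agree(original, redundant, rel_tol=0.02):
    """Compare a worker-reported metric against a trusted re-run within tolerance."""
    return math.isclose(original, redundant, rel_tol=rel_tol)

def flag_anomalies(results, reruns, rel_tol=0.02):
    """Return task ids whose original metric disagrees with the trusted re-run."""
    return [tid for tid, metric in reruns.items()
            if not metrics_agree(results[tid], metric, rel_tol)]
```

A flagged task would then feed into the worker-reputation and scheduling layers rather than being silently discarded.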
### Workload Sandboxing
The autoresearch contract already constrains the attack surface — workers run a pre-approved script with JSON config, not arbitrary code. Harden this with containerised or WASM-based execution, no arbitrary filesystem or network access, and deterministic runtime constraints.
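The "pre-approved script plus JSON config" contract can be made concrete at the point where a worker builds its command line. A minimal sketch, assuming a hypothetical allowlist and script name (containerisation and filesystem/network restrictions would wrap around this, not replace it):

```python
import json
from pathlib import Path

# Hypothetical set of pre-approved execution templates; the real platform
# would distribute and pin these, e.g. by content hash.
APPROVED_SCRIPTS = {"train_char_lm.py"}

def build_task_command(script, config):
    """Build the exact argv a worker may execute: approved script + JSON config only.

    Anything outside the allowlist is rejected before a process is ever spawned,
    and the config travels as a single JSON argument, never as shell text.
    """
    if Path(script).name not in APPROVED_SCRIPTS:
        raise ValueError(f"script not in pre-approved set: {script}")
    return ["python", Path(script).name, "--config", json.dumps(config, sort_keys=True)]
```

The point of the sketch is that the attack surface is the allowlist, not the worker: a malicious task cannot name an arbitrary script or smuggle shell syntax through the config.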
### Worker Reputation
Track worker reliability over time: task completion rate, result consistency across redundant runs, uptime history. Reputation scores inform task scheduling priority and eligibility for higher-value workloads.
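One way to blend those three signals into a single score, as a sketch only — the weights and the benefit-of-the-doubt rule for unaudited workers are illustrative assumptions, not a platform constant:

```python
def reputation_score(completed, assigned, consistent, audited, uptime_frac,
                     w=(0.5, 0.3, 0.2)):
    """Blend completion rate, redundant-run consistency, and uptime into [0, 1].

    completed/assigned  -> task completion rate
    consistent/audited  -> fraction of redundant re-runs that agreed
    uptime_frac         -> observed availability over the tracking window
    """
    completion = completed / assigned if assigned else 0.0
    # A never-audited worker gets the benefit of the doubt on consistency;
    # the k-of-n re-audit sampling corrects this over time.
    consistency = consistent / audited if audited else 1.0
    return w[0] * completion + w[1] * consistency + w[2] * uptime_frac
```

A scheduler could then gate higher-value workloads on a threshold (say, score above 0.8 and a minimum number of audited tasks).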
### Cross-Platform Workers
Extend the Electron app's auto-setup beyond Windows, adding platform-specific Python distribution management for macOS and Linux. Ship signed binaries and reproducible runtimes, and document the security model transparently.
## Phase 3: Research Platform (Planned)
Once external workers can contribute safely, open the platform to external researchers. The goal is to prove that the network produces genuine research value — not just compute, but knowledge.
### Workload Admission and Policy Engine
External briefs must pass through a policy engine: constrained job formats, pre-approved execution templates, resource and time-budget caps. The workload model must be strict enough to prevent abuse (cryptomining, data exfiltration, fingerprinting) while flexible enough to support diverse research questions. Public datasets only in the initial release.
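The admission check reduces to validating a brief against hard caps and allowlists before the orchestrator ever sees it. A minimal sketch — every cap, template name, and dataset name below is an illustrative assumption:

```python
# Illustrative policy constants, not actual platform limits.
MAX_WALL_SECONDS = 1800
MAX_PARAMS = 50_000_000
APPROVED_TEMPLATES = {"char_lm", "cifar10", "cartpole"}
PUBLIC_DATASETS = {"tinyshakespeare", "cifar10"}

def admit_brief(brief):
    """Return (admitted, reasons) for an external brief against hard policy caps.

    Collecting every violation, rather than failing fast, gives the researcher
    one actionable rejection message instead of a retry loop.
    """
    reasons = []
    if brief.get("template") not in APPROVED_TEMPLATES:
        reasons.append("unknown execution template")
    if brief.get("dataset") not in PUBLIC_DATASETS:
        reasons.append("dataset not on the public allowlist")
    if brief.get("wall_seconds", 0) > MAX_WALL_SECONDS:
        reasons.append("time budget exceeds cap")
    if brief.get("max_params", 0) > MAX_PARAMS:
        reasons.append("model size exceeds cap")
    return (not reasons), reasons
```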
### Bayesian Optimisation
Replace random search with Bayesian optimisation. Use the accumulated experiment history to inform the next batch of hyperparameter configurations. Focus exploration on promising regions of the search space rather than sweeping blindly.
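To make the idea concrete, here is a self-contained one-dimensional sketch (learning rate only) using a hand-rolled Gaussian-process surrogate and expected improvement. It illustrates the mechanism, not the planned implementation — a production version would use an established library and handle the full multi-dimensional search space.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    a = np.asarray(a, float).reshape(-1, 1)
    b = np.asarray(b, float).reshape(-1, 1)
    return np.exp(-0.5 * ((a - b.T) / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean/std at candidates Xs given observed (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    alpha = np.linalg.solve(K, np.asarray(y, float))
    mu = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.clip(np.diag(rbf(Xs, Xs) - Ks.T @ v), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best):
    """EI for minimisation: expected amount by which each candidate beats the best loss."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (best - mu) * cdf + sigma * pdf

def propose_next(history_lr, history_loss, candidates):
    """Pick the candidate with the highest expected improvement over history."""
    mu, sigma = gp_posterior(history_lr, history_loss, candidates)
    return candidates[int(np.argmax(expected_improvement(mu, sigma, min(history_loss))))]
```

The key property for XeeNet is that `propose_next` consumes exactly what the network already produces — verified (config, metric) pairs — so the surrogate improves as the corpus grows.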
### Multiple Experiment Types
Extend beyond character-level LMs. Add experiment templates for image classification (CIFAR-10), reinforcement learning (CartPole), and other self-contained benchmarks. Each template follows the same autoresearch contract: bounded budget, single comparable metric, deterministic seeds.
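The shared contract can be expressed as a small frozen record that every template must satisfy. A sketch, with illustrative field names and values (the real templates would also carry the approved script reference):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentTemplate:
    """The autoresearch contract shared by every experiment type."""
    name: str
    dataset: str
    metric: str            # single comparable metric, lower is better
    max_wall_seconds: int  # bounded budget
    seed: int              # deterministic seed for reproducibility

# Illustrative instances; numbers are placeholders, not platform defaults.
CHAR_LM = ExperimentTemplate("char_lm", "tinyshakespeare", "val_bpb", 1800, 1337)
CIFAR10 = ExperimentTemplate("cifar10", "cifar10", "val_error", 1800, 1337)
```

Freezing the dataclass matters: a template is a contract, so nothing downstream (orchestrator, worker, verifier) should be able to mutate it after admission.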
### Regional Orchestrator Nodes
Deploy local orchestrator nodes in each geographic region. Regional nodes prioritise low-latency clients, handle worker-to-task matching within their zone, and synchronise results with the central server. Reduces cross-region data transfer and improves scheduling responsiveness.
### Intelligent Orchestration
The orchestrator consults the global experiment corpus before designing new campaigns. Historical results inform which hyperparameter regions to explore, which architectures to prioritise, and how to allocate compute across tasks. Each campaign builds on the accumulated knowledge of every previous campaign.
## Phase 4: Economic Layer (Planned)
This layer is introduced only after the research platform has proven its value. It must be legally sound and operationally justified before any monetisation.
### Credits Marketplace
Workers earn credits for completed, verified tasks. Researchers spend credits to submit campaigns. The economics agent manages pricing based on supply and demand, detects fraud, and ensures fair distribution. Credits reflect actual compute contributed: GPU time is worth more than CPU time.
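The "GPU time is worth more than CPU time" rule comes down to a tier multiplier applied to verified compute time. A minimal sketch — the tiers, multipliers, and base rate are illustrative assumptions, and real pricing would be set dynamically by the economics agent:

```python
# Illustrative hardware-tier multipliers, not actual platform pricing.
TIER_MULTIPLIER = {"cpu": 1.0, "gpu_consumer": 4.0, "gpu_datacenter": 10.0}

def credits_for_task(wall_seconds, tier, verified, base_rate=0.01):
    """Credits for one completed task: compute time x hardware tier.

    Unverified results earn nothing — payment is gated on the Phase 2
    verification layer, which is the main anti-fraud lever.
    """
    if not verified:
        return 0.0
    return round(wall_seconds * base_rate * TIER_MULTIPLIER[tier], 4)
```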
### Monetisation
Large research projects and enterprise labs can purchase worker time for high-throughput experimentation. The acquisition thesis is not “replace datacentres” — it is “massively accelerate the experiment loop that informs what to train in the datacentres.” Every major lab runs thousands of small experiments before committing to a large training run. That pre-cluster experimentation phase is the sweet spot.
### Central Ledger with Cryptographic Audit
A centralised append-only ledger with cryptographic task receipts, worker attestations, and audit trails. Provides 95% of the trust guarantees of a distributed blockchain with a fraction of the complexity. Blockchain becomes relevant only if the platform requires trustless settlement between parties who do not trust a central operator.
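The core mechanism is a hash chain: each entry commits to the hash of the previous one, so any retroactive edit invalidates every later receipt. A minimal sketch of that idea (class and field names are illustrative; a real ledger would add worker signatures and persistence):

```python
import hashlib
import json

class AuditLedger:
    """Append-only ledger where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev = self.GENESIS

    def append(self, record):
        """Append a task receipt and return its chained hash."""
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((self._prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "prev": self._prev, "hash": digest})
        self._prev = digest
        return digest

    def verify(self):
        """Re-derive the whole chain; any tampered record breaks verification."""
        prev = self.GENESIS
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

Publishing the latest chain head periodically (even just to a public log) is what lets outside parties audit the operator without a blockchain.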
### Federated Research Programmes
Multiple researchers contribute to shared long-term research goals (e.g., “find the optimal small transformer architecture for character-level language modelling”). Programmes coordinate campaigns across research groups and accumulate results into shared knowledge.
### Agent-Driven Code Modification
Following the autoresearch pattern more deeply: agents propose modifications to training scripts based on experiment results. Given a series of outcomes, the orchestrator suggests architectural changes, new regularisation techniques, or training procedure modifications and generates the code to test them.
## Long-Term: The Global Experiment Corpus
The real moat is not the compute network. It is the experiment database.
Every completed task produces a verified (seed, config, metric) tuple. Over time, that accumulates into a massive structured dataset of “what works in ML.” That corpus cannot be replicated by simply spinning up more GPUs — it represents institutional knowledge at network scale.
Imagine a dataset that answers questions like:
- Across 40 million experiments, which optimiser schedules consistently outperform others for small transformer models?
- How does context length interact with model depth across dozens of hardware tiers?
- What architectural patterns produce the best token efficiency under strict compute budgets?
Most research knowledge today is fragmented across papers, private lab notebooks, and unpublished results. The failures, near-misses, and surprising parameter combinations that drive real scientific progress mostly vanish into internal systems. XeeNet captures that negative space.
The ultimate vision:
- Millions of volunteer devices run bounded ML experiments continuously
- Research programmes self-direct based on accumulated results and lessons
- A global experiment corpus captures insights across all experiments — successes and failures alike
- Researchers launch campaigns that build on the accumulated knowledge of every previous campaign
- A sustainable credits economy incentivises long-term participation
- Results are publicly available, advancing open ML research
SETI@home proved that volunteers will donate compute for science: over its lifetime it attracted more than 5 million participants, sustaining hundreds of teraFLOPS of donated compute at its peak. XeeNet applies the same model to ML research: instead of searching for extraterrestrial signals, we search for optimal neural network architectures — using workloads that are embarrassingly parallel, naturally bounded, and immediately verifiable. The compute requirements are similar; the scientific payoff is immediate and measurable.
## Risks and Constraints
These are the dragons in the cave. Each must be addressed deliberately as the platform grows through the phases above.
| Risk | Phase | Mitigation |
|---|---|---|
| Trust and sandboxing | 2 | Constrained job formats, pre-approved templates, containerised execution, no arbitrary filesystem or network access. The autoresearch contract is itself the primary sandbox. |
| Result integrity | 2 | k-of-n redundant execution, cross-worker metric comparison within tolerance, anomaly detection. First-class verification, not an afterthought. |
| Malicious workloads | 3 | Workload admission policy engine, deterministic runtime constraints, no arbitrary internet access from tasks, pre-approved execution templates only. |
| Dataset privacy | 3 | Public datasets only in early phases. Proprietary data support requires differential privacy, secure enclaves, and data governance review. |
| Cold start problem | 2–3 | Do not build a marketplace first. Build a working network: seed worker fleet, own workloads, prove throughput, then invite external workers, then external researchers. |
| Consumer trust optics | 2 | Signed binaries, reproducible runtimes, transparent code, public security review, explicit resource controls, hard caps on power / time / bandwidth. |
| Regulatory exposure | 4 | If credits become redeemable or cash-equivalent, financial regulation applies. Start with non-cash reputation scoring; add grants, prizes, or sponsorships before direct financial settlement. Professional legal review before any tokenisation. |
| Hardware heterogeneity | 2–3 | Resource profiles as first-class scheduling inputs, task parameterisation by device tier, regional orchestrators for latency-aware matching. |
| Network economics | 3 | Bounded experiments minimise data transfer by design. Small configs in, single metric out. Datasets cached locally on workers, not streamed per task. |