Roadmap
Real-time status of every EULLM component, the milestones we're hitting, and a full history of every release shipped.
Platform overview
EULLM Engine
v0.4.3 · Rust inference runtime. Drop-in Ollama replacement with an OpenAI-compatible API.
- Throughput: 259 tok/s
- vs Ollama: 2.5×
- Max context: 131K
EULLM Forge
Model verticalization pipeline. Components ready, end-to-end CLI integration in progress.
- Size cut: 30B→7B
- Export: GGUF
- Pipeline: Beta
EULLM Hub
EU-hosted model registry with AI Act compliance cards. Operational as prototype.
- Models: Prototype
- Domains: 3 planned
- Hosting: EU only
Engine benchmark — v0.4.3 vs Ollama
16 concurrent requests · Mistral 7B · NVIDIA RTX 4090 · Linux x64
- EULLM throughput: 259 tok/s (continuous batching)
- Ollama throughput: 102 tok/s (sequential baseline)
- EULLM latency to final response: 9.3 s
- Ollama latency to final response: 23.6 s
What we're building
Phase 01 — Foundation
Q1 2026
Core inference engine reaches production quality. Forge pipeline components built. Hub operational as prototype.
- Engine: prebuilt binaries (Linux, macOS x64/ARM)
- Continuous batching — 259 tok/s (2.5× vs Ollama)
- TurboQuant KV cache — 131K context on 16 GB GPU
- OpenAI-compatible + Ollama drop-in API
- GPU: CUDA, ROCm, Vulkan, Metal
- EU AI Act built-in audit logging
- Transparent web browsing (no function-call overhead)
- Interactive REPL: /temp, /maxtokens, /system
- Forge: structural pruning + knowledge distillation
- Forge end-to-end pipeline CLI
- Demo model: legal-it-7b
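Since the API is OpenAI-compatible, existing client code should work unchanged once pointed at a local EULLM server. A minimal stdlib-only sketch, assuming a server at port 8080 and the standard chat-completions route (both assumptions); `temperature` and `max_tokens` are the same knobs the REPL presumably exposes as /temp and /maxtokens.

```python
import json
from urllib.request import Request, urlopen

# Assumed local endpoint; swap in your own host and port.
ENDPOINT = "http://localhost:8080/v1/chat/completions"

def chat(prompt: str, temperature: float = 0.7, max_tokens: int = 256) -> str:
    """Send one chat request and return the assistant's reply text."""
    body = json.dumps({
        "model": "mistral:7b",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }).encode()
    req = Request(ENDPOINT, data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return extract_reply(json.loads(resp.read()))

def extract_reply(response: dict) -> str:
    """Pull the reply text out of an OpenAI-format response body."""
    return response["choices"][0]["message"]["content"]
```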
Phase 02 — Ecosystem
Q2 2026
First production-ready Hub models go live. Forge stable CLI. Platform support expanded.
- Hub: Legal domain model (EU/Italian law)
- Hub: Medical triage support model
- Hub: Finance & KYC compliance model
- AI Act compliance cards for all Hub models
- Forge: stable CLI + full documentation
- Windows x64 support
- Multi-GPU inference
- Quantization wizard for consumer hardware
Phase 03 — Enterprise
H2 2026
Enterprise-grade hardening: distributed inference, access control, Forge Studio visual UI.
- Multi-node distributed inference
- Kubernetes operator
- SSO / RBAC access control
- Forge Studio — visual fine-tuning UI
- Model versioning & rollback in Hub
- Certified EU data center partnerships
- SLA support tiers
Release history
- Drop-in Ollama replacement with continuous batching
- TurboQuant KV cache — 131K context on 16 GB GPU
- Transparent web browsing without function-call overhead
- EU AI Act audit logging built-in
- Interactive REPL: /temp, /maxtokens, /system commands
- Automatic TurboQuant quality/accuracy recommendations
- TurboQuant math accuracy improvements
- Isolated the 1% accuracy loss to matrix operations only
- Default context window increased to 2,048 tokens
- Math accuracy benchmarking suite added
- Mixed KV cache type support
- Bug fixes
- Documentation updates
- Batch scheduler refinements
- Build pipeline stabilization
Shape the roadmap
Open an issue, vote on features, or contribute code. EULLM is built in the open and every voice counts.
