Roadmap
Real-time status of every EULLM component, the milestones we're hitting, and a full history of every release shipped.
Platform overview
EULLM Engine
v0.5.3Rust inference runtime. Drop-in Ollama replacement with OpenAI-compatible API and embedded chat UI on localhost:11435.
259 tok/s
Throughput
264K
Max context
โ tested
Windows
EULLM Forge
Model verticalization pipeline. Components ready, end-to-end CLI integration in progress.
30Bโ7B
Size cut
GGUF
Export
Beta
Pipeline
EULLM Hub
EU-hosted model registry with AI Act compliance cards. Operational as prototype.
Prototype
Models
3 planned
Sectors
EU only
Hosting
TurboQuant context proof โ v0.5.3
Qwen3-8B Q4_K_M ยท NVIDIA RTX 5070 Ti 16 GB ยท Windows x64 ยท multi-turn 6 rounds
75 tok/s
132K ctx throughput
~10 GB VRAM, sweet spot
77 tok/s
264K ctx (TQ3_0)
~12.6 GB VRAM, 3 GB headroom
37 GB
F16 KV equivalent
impossible on consumer GPU
6 rounds
Multi-turn stability
70โ77 tok/s steady-state
What we're building
Phase 01 โ Foundation
Q1 2026
Core inference engine reaches production quality. Forge pipeline components built. Hub operational as prototype.
- Engine: prebuilt binaries (Linux x64, Windows x64)
- Continuous batching โ 259 tok/s
- TurboQuant โ 264K context on 16 GB GPU (TQ3_0)
- OpenAI-compatible + Ollama drop-in API
- GPU: CUDA, ROCm, Vulkan, Metal
- EU AI Act built-in audit logging
- Transparent web browsing (no function-call overhead)
- Interactive REPL: /temp, /maxtokens, /system
- Web tool calling
- Embedded chat UI โ localhost:11435, ~29 KB in binary
- One-click Windows installers (CPU / CUDA / CUDA+TQ)
- eullm -V backend variant display
- Forge: structural pruning + knowledge distillation
- Forge end-to-end pipeline CLI
- Demo model: legal-it-7b
Phase 02 โ Ecosystem
Q2 2026
First production-ready Hub models go live. Forge stable CLI. Platform support expanded.
- Hub: Legal sector model (EU/Italian law)
- Hub: Medical triage support model
- Hub: Finance & KYC compliance model
- AI Act compliance cards for all Hub models
- Forge: stable CLI + full documentation
- Windows x64 support
- Multi-GPU inference
- Quantization wizard for consumer hardware
Phase 03 โ Enterprise
H2 2026
Enterprise-grade hardening: distributed inference, access control, Forge Studio visual UI.
- Multi-node distributed inference
- Kubernetes operator
- SSO / RBAC access control
- Forge Studio โ visual fine-tuning UI
- Model versioning & rollback in Hub
- Certified EU data center partnerships
- SLA support tiers
Release history
- Embedded chat UI on localhost:11435 โ ~29 KB in binary, zero CDN or external dependencies
- One-click Windows installers: CPU, CUDA, CUDA+TurboQuant (per-user, no UAC)
- TurboQuant proof: 264K context at 77 tok/s on RTX 5070 Ti 16 GB (TQ3_0)
- eullm -V now shows backend variant: CPU / CUDA / CUDA+TurboQuant / Metal
- Benchmark axis shifted: TQ4_0 vs Q4_0 at equal bit-width (the real TurboQuant claim)
- Zenodo DOI badge โ concept DOI auto-updates on every new release
- Honest platform labels: Linux x64 + Windows tested ยท macOS & ARM64 experimental
- Zenodo DOI citation section added to README
- Engine roadmap fit flag implementation
- Windows CUDA installer fixes
- Web tool calling โ transparent URL fetching in conversation
- Legal-IT dataset preparation module
- GPU layer fitting improvements
- Drop-in Ollama replacement with continuous batching
- TurboQuant KV cache โ 131K context on 16 GB GPU
- Transparent web browsing without function-call overhead
- EU AI Act audit logging built-in
- Interactive REPL: /temp, /maxtokens, /system commands
- TurboQuant quality/accuracy automatic recommendations
- TurboQuant math accuracy improvements
- 1% accuracy loss isolated to matrix operations only
- Default context window increased to 2 048 tokens
- Math accuracy benchmarking suite added
- Mixed KV cache type support
- Bug fixes
- Documentation updates
- Batch scheduler refinements
- Build pipeline stabilization
Shape the roadmap
Open an issue, vote on features, or contribute code. EULLM is built in the open and every voice counts.
