Building in public · Phase 1 in progress

Roadmap

Real-time status of every EULLM component, the milestones we're hitting, and a full history of every release shipped.

Component status

Platform overview

Production-Ready

EULLM Engine

v0.4.3

Rust inference runtime. Drop-in Ollama replacement with OpenAI-compatible API.

Progress: 88%
  • Throughput: 259 tok/s
  • Speedup vs Ollama: 2.5×
  • Max context: 131K tokens
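Because the engine exposes an OpenAI-compatible API, existing clients can point at it unchanged. Below is a minimal sketch of a chat request; the host, port (Ollama's default, 11434), path, and model name are all assumptions for illustration, not confirmed EULLM defaults:

```python
import json

# Assumed endpoint: Ollama's default port with the standard
# OpenAI-compatible path. Verify against your deployment.
BASE_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> str:
    """Serialize a minimal OpenAI-style chat completion payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "stream": False,
    }
    return json.dumps(payload)

body = build_chat_request("mistral:7b", "Summarize the EU AI Act in one sentence.")
# POST `body` to BASE_URL with any HTTP client, with
# header Content-Type: application/json.
```

Any OpenAI SDK should also work by overriding its base URL, which is the point of the drop-in claim.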

In Development

EULLM Forge

Model verticalization pipeline. Components ready, end-to-end CLI integration in progress.

Progress: 42%
  • Size cut: 30B → 7B
  • Export format: GGUF
  • Pipeline status: Beta
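The Phase 01 list pairs structural pruning with knowledge distillation for this pipeline. As a pure-Python sketch of the distillation objective only: a pruned "student" is trained to match the larger "teacher"'s softened output distribution. Function names and the temperature value are illustrative, not Forge's actual API:

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that matches the teacher exactly incurs zero loss;
# the loss grows as its distribution drifts.
assert distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) < 1e-12
```

A real pipeline would compute this per token over a tensor library and combine it with the task loss; the sketch only shows the shape of the objective.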

Preview

EULLM Hub

EU-hosted model registry with AI Act compliance cards. Operational as prototype.

Progress: 25%
  • Models: Prototype
  • Domains: 3 planned
  • Hosting: EU only
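The Hub's compliance-card schema is not published, so the sketch below only illustrates the kind of metadata an AI Act compliance card could carry; every field name here is an assumption:

```python
import json

# Hypothetical compliance-card structure. Only "legal-it-7b" and the
# EU-hosting constraint come from the roadmap; all other values are
# illustrative placeholders.
compliance_card = {
    "model_id": "legal-it-7b",
    "provider": "EULLM Hub",
    "risk_category": "limited",  # illustrative AI Act risk tier
    "intended_use": "Legal drafting assistance (illustrative)",
    "training_data_summary": "EU/Italian legal corpora (illustrative)",
    "hosting_region": "EU",
}

print(json.dumps(compliance_card, indent=2))
```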

Engine benchmark — v0.4.3 vs Ollama

16 concurrent requests · Mistral 7B · NVIDIA RTX 4090 · Linux x64

  • EULLM throughput: 259 tok/s (continuous batching)
  • Ollama throughput: 102 tok/s (sequential baseline)
  • EULLM latency to final response: 9.3 s
  • Ollama latency to final response: 23.6 s
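The headline ratios can be cross-checked directly from the figures above:

```python
# Benchmark figures: 16 concurrent requests, Mistral 7B, RTX 4090.
eullm_tps, ollama_tps = 259, 102
eullm_latency, ollama_latency = 9.3, 23.6

throughput_speedup = eullm_tps / ollama_tps        # matches the 2.5x claim
latency_reduction = 1 - eullm_latency / ollama_latency

print(f"throughput: {throughput_speedup:.2f}x")    # 2.54x
print(f"latency cut: {latency_reduction:.0%}")     # 61%
```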

Development phases

What we're building

Phase 01 · Foundation (Current)

Q1 2026

Core inference engine reaches production quality. Forge pipeline components built. Hub operational as prototype.

9/11 items (82%)
  • Engine: prebuilt binaries (Linux, macOS x64/ARM)
  • Continuous batching — 259 tok/s (2.5× vs Ollama)
  • TurboQuant KV cache — 131K context on 16 GB GPU
  • OpenAI-compatible + Ollama drop-in API
  • GPU: CUDA, ROCm, Vulkan, Metal
  • Built-in EU AI Act audit logging
  • Transparent web browsing (no function-call overhead)
  • Interactive REPL: /temp, /maxtokens, /system
  • Forge: structural pruning + knowledge distillation
  • Forge end-to-end pipeline CLI
  • Demo model: legal-it-7b
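A back-of-envelope calculation shows why a quantized KV cache matters for the "131K context on a 16 GB GPU" item above. Model dimensions are assumed from Mistral 7B (32 layers, 8 grouped-query KV heads, head dim 128), and the 4-bit cache width is an illustrative assumption; TurboQuant's actual format is not documented here:

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bits=16):
    # Keys and values each store layers * kv_heads * head_dim
    # entries per token, at `bits` bits per entry.
    return 2 * layers * kv_heads * head_dim * tokens * bits // 8

ctx = 131_072
fp16_gib = kv_cache_bytes(ctx, bits=16) / 2**30
q4_gib = kv_cache_bytes(ctx, bits=4) / 2**30
print(f"fp16 KV cache: {fp16_gib:.0f} GiB")   # 16 GiB: alone fills the card
print(f"4-bit KV cache: {q4_gib:.0f} GiB")    # 4 GiB: leaves room for weights
```

Under these assumptions, an unquantized fp16 cache at 131K tokens would consume the entire 16 GB by itself, so some form of cache quantization is a prerequisite for the claim, whatever TurboQuant's exact scheme.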
Phase 02 · Ecosystem (Planned)

Q2 2026

First production-ready Hub models go live. Forge stable CLI. Platform support expanded.

0/8 items (0%)
  • Hub: Legal domain model (EU/Italian law)
  • Hub: Medical triage support model
  • Hub: Finance & KYC compliance model
  • AI Act compliance cards for all Hub models
  • Forge: stable CLI + full documentation
  • Windows x64 support
  • Multi-GPU inference
  • Quantization wizard for consumer hardware
Phase 03 · Enterprise (Future)

H2 2026

Enterprise-grade hardening: distributed inference, access control, Forge Studio visual UI.

0/7 items (0%)
  • Multi-node distributed inference
  • Kubernetes operator
  • SSO / RBAC access control
  • Forge Studio — visual fine-tuning UI
  • Model versioning & rollback in Hub
  • Certified EU data center partnerships
  • SLA support tiers
Changelog

Release history

v0.4.3 (Latest) · 8 Apr 2026
  • Drop-in Ollama replacement with continuous batching
  • TurboQuant KV cache — 131K context on 16 GB GPU
  • Transparent web browsing without function-call overhead
  • EU AI Act audit logging built-in
v0.3.13 · 6 Apr 2026
  • Interactive REPL: /temp, /maxtokens, /system commands
  • TurboQuant quality/accuracy automatic recommendations
v0.3.10 · 5 Apr 2026
  • TurboQuant math accuracy improvements
  • 1% accuracy loss isolated to matrix operations only
v0.3.5 · 3 Apr 2026
  • Default context window increased to 2,048 tokens
  • Math accuracy benchmarking suite added
v0.3.3 · 1 Apr 2026
  • Mixed KV cache type support
v0.3.2 · 30 Mar 2026
  • Bug fixes
  • Documentation updates
v0.2.98 · 29 Mar 2026
  • Batch scheduler refinements
  • Build pipeline stabilization

Shape the roadmap

Open an issue, vote on features, or contribute code. EULLM is built in the open and every voice counts.