Building in public · Phase 1 in progress

Roadmap

Real-time status of every EULLM component, the milestones we're hitting, and a full history of every release shipped.

GitHub Releases View source

Component status

Platform overview

Production-Ready

EULLM Engine

v0.4.3

Rust inference runtime. Drop-in Ollama replacement with OpenAI-compatible API.

Progress88%

259 tok/s

Throughput

2.5×

vs Ollama

131K

Max context

In Development

EULLM Forge

Model verticalization pipeline. Components ready, end-to-end CLI integration in progress.

Progress42%

30B→7B

Size cut

GGUF

Export

Beta

Pipeline

Preview

EULLM Hub

EU-hosted model registry with AI Act compliance cards. Operational as prototype.

Progress25%

Prototype

Models

3 planned

Domains

EU only

Hosting

Engine benchmark — v0.4.3 vs Ollama

16 concurrent requests · Mistral 7B · NVIDIA RTX 4090 · Linux x64

259 tok/s

EULLM throughput

continuous batching

102 tok/s

Ollama throughput

sequential baseline

9.3 s

EULLM latency

final response

23.6 s

Ollama latency

final response

Development phases

What we're building

01Current

Phase 01 — Foundation

Q1 2026

Core inference engine reaches production quality. Forge pipeline components built. Hub operational as prototype.

9/11 items82%

Engine: prebuilt binaries (Linux, macOS x64/ARM)
Continuous batching — 259 tok/s (2.5× vs Ollama)
TurboQuant KV cache — 131K context on 16 GB GPU
OpenAI-compatible + Ollama drop-in API
GPU: CUDA, ROCm, Vulkan, Metal
EU AI Act built-in audit logging
Transparent web browsing (no function-call overhead)
Interactive REPL: /temp, /maxtokens, /system
Forge: structural pruning + knowledge distillation
Forge end-to-end pipeline CLI
Demo model: legal-it-7b

02Planned

Phase 02 — Ecosystem

Q2 2026

First production-ready Hub models go live. Forge stable CLI. Platform support expanded.

0/8 items0%

Hub: Legal domain model (EU/Italian law)
Hub: Medical triage support model
Hub: Finance & KYC compliance model
AI Act compliance cards for all Hub models
Forge: stable CLI + full documentation
Windows x64 support
Multi-GPU inference
Quantization wizard for consumer hardware

03Future

Phase 03 — Enterprise

H2 2026

Enterprise-grade hardening: distributed inference, access control, Forge Studio visual UI.

0/7 items0%

Multi-node distributed inference
Kubernetes operator
SSO / RBAC access control
Forge Studio — visual fine-tuning UI
Model versioning & rollback in Hub
Certified EU data center partnerships
SLA support tiers

Changelog

Release history

v0.4.3Latest8 Apr 2026

Drop-in Ollama replacement with continuous batching
TurboQuant KV cache — 131K context on 16 GB GPU
Transparent web browsing without function-call overhead
EU AI Act audit logging built-in

v0.3.136 Apr 2026

Interactive REPL: /temp, /maxtokens, /system commands
TurboQuant quality/accuracy automatic recommendations

v0.3.105 Apr 2026

TurboQuant math accuracy improvements
1% accuracy loss isolated to matrix operations only

v0.3.53 Apr 2026

Default context window increased to 2 048 tokens
Math accuracy benchmarking suite added

v0.3.31 Apr 2026

Mixed KV cache type support

v0.3.230 Mar 2026

Bug fixes
Documentation updates

v0.2.9829 Mar 2026

Batch scheduler refinements
Build pipeline stabilization

View all releases on GitHub →

Shape the roadmap

Open an issue, vote on features, or contribute code. EULLM is built in the open and every voice counts.

Open an issue Join the discussion