Building in public ยท Phase 1 in progress

Roadmap

Real-time status of every EULLM component, the milestones we're hitting, and a full history of every release shipped.

Component status

Platform overview

Production-Ready

EULLM Engine

v0.5.3

Rust inference runtime. Drop-in Ollama replacement with OpenAI-compatible API and embedded chat UI on localhost:11435.

Progress88%

259 tok/s

Throughput

264K

Max context

โœ“ tested

Windows

In Development

EULLM Forge

Model verticalization pipeline. Components ready, end-to-end CLI integration in progress.

Progress42%

30Bโ†’7B

Size cut

GGUF

Export

Beta

Pipeline

Preview

EULLM Hub

EU-hosted model registry with AI Act compliance cards. Operational as prototype.

Progress25%

Prototype

Models

3 planned

Sectors

EU only

Hosting

TurboQuant context proof โ€” v0.5.3

Qwen3-8B Q4_K_M ยท NVIDIA RTX 5070 Ti 16 GB ยท Windows x64 ยท multi-turn 6 rounds

75 tok/s

132K ctx throughput

~10 GB VRAM, sweet spot

77 tok/s

264K ctx (TQ3_0)

~12.6 GB VRAM, 3 GB headroom

37 GB

F16 KV equivalent

impossible on consumer GPU

6 rounds

Multi-turn stability

70โ€“77 tok/s steady-state

Development phases

What we're building

01Current

Phase 01 โ€” Foundation

Q1 2026

Core inference engine reaches production quality. Forge pipeline components built. Hub operational as prototype.

13/15 items87%
  • Engine: prebuilt binaries (Linux x64, Windows x64)
  • Continuous batching โ€” 259 tok/s
  • TurboQuant โ€” 264K context on 16 GB GPU (TQ3_0)
  • OpenAI-compatible + Ollama drop-in API
  • GPU: CUDA, ROCm, Vulkan, Metal
  • EU AI Act built-in audit logging
  • Transparent web browsing (no function-call overhead)
  • Interactive REPL: /temp, /maxtokens, /system
  • Web tool calling
  • Embedded chat UI โ€” localhost:11435, ~29 KB in binary
  • One-click Windows installers (CPU / CUDA / CUDA+TQ)
  • eullm -V backend variant display
  • Forge: structural pruning + knowledge distillation
  • Forge end-to-end pipeline CLI
  • Demo model: legal-it-7b
02Planned

Phase 02 โ€” Ecosystem

Q2 2026

First production-ready Hub models go live. Forge stable CLI. Platform support expanded.

1/8 items13%
  • Hub: Legal sector model (EU/Italian law)
  • Hub: Medical triage support model
  • Hub: Finance & KYC compliance model
  • AI Act compliance cards for all Hub models
  • Forge: stable CLI + full documentation
  • Windows x64 support
  • Multi-GPU inference
  • Quantization wizard for consumer hardware
03Future

Phase 03 โ€” Enterprise

H2 2026

Enterprise-grade hardening: distributed inference, access control, Forge Studio visual UI.

0/7 items0%
  • Multi-node distributed inference
  • Kubernetes operator
  • SSO / RBAC access control
  • Forge Studio โ€” visual fine-tuning UI
  • Model versioning & rollback in Hub
  • Certified EU data center partnerships
  • SLA support tiers
Changelog

Release history

v0.5.3Latest31 May 2026
  • Embedded chat UI on localhost:11435 โ€” ~29 KB in binary, zero CDN or external dependencies
  • One-click Windows installers: CPU, CUDA, CUDA+TurboQuant (per-user, no UAC)
  • TurboQuant proof: 264K context at 77 tok/s on RTX 5070 Ti 16 GB (TQ3_0)
  • eullm -V now shows backend variant: CPU / CUDA / CUDA+TurboQuant / Metal
v0.5.231 May 2026
  • Benchmark axis shifted: TQ4_0 vs Q4_0 at equal bit-width (the real TurboQuant claim)
  • Zenodo DOI badge โ€” concept DOI auto-updates on every new release
  • Honest platform labels: Linux x64 + Windows tested ยท macOS & ARM64 experimental
v0.5.130 May 2026
  • Zenodo DOI citation section added to README
  • Engine roadmap fit flag implementation
  • Windows CUDA installer fixes
v0.4.427 May 2026
  • Web tool calling โ€” transparent URL fetching in conversation
  • Legal-IT dataset preparation module
  • GPU layer fitting improvements
v0.4.38 Apr 2026
  • Drop-in Ollama replacement with continuous batching
  • TurboQuant KV cache โ€” 131K context on 16 GB GPU
  • Transparent web browsing without function-call overhead
  • EU AI Act audit logging built-in
v0.3.136 Apr 2026
  • Interactive REPL: /temp, /maxtokens, /system commands
  • TurboQuant quality/accuracy automatic recommendations
v0.3.105 Apr 2026
  • TurboQuant math accuracy improvements
  • 1% accuracy loss isolated to matrix operations only
v0.3.53 Apr 2026
  • Default context window increased to 2 048 tokens
  • Math accuracy benchmarking suite added
v0.3.31 Apr 2026
  • Mixed KV cache type support
v0.3.230 Mar 2026
  • Bug fixes
  • Documentation updates
v0.2.9829 Mar 2026
  • Batch scheduler refinements
  • Build pipeline stabilization

Shape the roadmap

Open an issue, vote on features, or contribute code. EULLM is built in the open and every voice counts.