Building in public Β· Phase 1 in progress

Roadmap

Real-time status of every EULLM component, the milestones we're hitting, and a full history of every release shipped.

Component status

Platform overview

Production-Ready

EULLM Engine

v0.6.2

Rust inference runtime. Multimodal vision + audio, drop-in Ollama replacement with OpenAI-compatible API and embedded chat UI on localhost:11435.

Progress88%

259 tok/s

Throughput

Vision+Audio

Multimodal

βœ“ tested

Windows

In Development

EULLM Forge

Model verticalization pipeline. Components ready, end-to-end CLI integration in progress.

Progress42%

30B→7B

Size cut

GGUF

Export

Beta

Pipeline

Preview

EULLM Hub

EU-hosted model registry with AI Act compliance cards. Operational as prototype.

Progress25%

Prototype

Models

3 planned

Sectors

EU only

Hosting

Engine capabilities β€” v0.6.2

Rust runtime Β· continuous batching Β· multimodal vision + audio Β· fully local on consumer GPUs

259 tok/s

Throughput

16 concurrent requests

Vision+Audio

Multimodal

OCR, scene, transcription

~2-4Γ—

Quantized KV

context, Q4_0/Q5/Q8

--web

Web browsing

model-agnostic, any GGUF

Development phases

What we're building

01Current

Phase 01 β€” Foundation

Q1 2026

Core inference engine reaches production quality. Forge pipeline components built. Hub operational as prototype.

11/13 items85%
  • Engine: standalone binaries (Linux x64, Windows x64)
  • Multimodal vision + audio (Gemma 4)
  • Continuous batching β€” 259 tok/s
  • Quantized KV cache β€” Q4_0/Q5/Q8 (~2-4Γ— context)
  • OpenAI-compatible + Ollama drop-in API
  • GPU: CUDA (tested), ROCm, Vulkan, Metal
  • EU AI Act built-in audit logging
  • Transparent web browsing (--web, model-agnostic)
  • Interactive REPL: /temp, /maxtokens, /system
  • Embedded chat UI β€” localhost:11435, ~29 KB in binary
  • Forge: structural pruning + knowledge distillation
  • Forge end-to-end pipeline CLI
  • Demo model: legal-it-7b
02Planned

Phase 02 β€” Ecosystem

Q2 2026

First production-ready Hub models go live. Forge stable CLI. Platform support expanded.

1/8 items13%
  • Hub: Legal sector model (EU/Italian law)
  • Hub: Medical triage support model
  • Hub: Finance & KYC compliance model
  • AI Act compliance cards for all Hub models
  • Forge: stable CLI + full documentation
  • Windows x64 support
  • Multi-GPU inference
  • Quantization wizard for consumer hardware
03Future

Phase 03 β€” Enterprise

H2 2026

Enterprise-grade hardening: distributed inference, access control, Forge Studio visual UI.

0/7 items0%
  • Multi-node distributed inference
  • Kubernetes operator
  • SSO / RBAC access control
  • Forge Studio β€” visual fine-tuning UI
  • Model versioning & rollback in Hub
  • Certified EU data center partnerships
  • SLA support tiers
Changelog

Release history

v0.6.2Latest9 Jun 2026
  • Multimodal in the Chat UI β€” drop in an image or audio clip, fully local
  • Vision + audio understanding stable (Gemma 4): OCR, scene description, transcription
  • BOS token handling fix for multimodal prompts
v0.6.07 Jun 2026
  • Multimodal vision launched β€” image OCR and scene description on consumer GPUs
  • Audio understanding (experimental, CLI) β€” transcription and in-content search
  • Runs fully local, zero telemetry
v0.5.206 Jun 2026
  • Math expression rendering in the Chat UI
  • Quantized KV cache β€” Q4_0/Q5/Q8 for ~2-4Γ— context on the same GPU
v0.5.331 May 2026
  • Embedded chat UI on localhost:11435 β€” ~29 KB in binary, zero CDN or external dependencies
  • eullm -V now shows the active backend variant
  • Standalone Windows binaries: CPU and CUDA
v0.4.427 May 2026
  • Web tool calling β€” transparent URL fetching in conversation
  • Legal-IT dataset preparation module
  • GPU layer fitting improvements
v0.4.38 Apr 2026
  • Drop-in Ollama replacement with continuous batching
  • Quantized KV cache for larger context on 16 GB GPUs
  • Transparent web browsing without function-call overhead
  • EU AI Act audit logging built-in
v0.3.136 Apr 2026
  • Interactive REPL: /temp, /maxtokens, /system commands
  • Quantized KV cache quality/accuracy automatic recommendations
v0.3.105 Apr 2026
  • Quantized KV cache math accuracy improvements
  • 1% accuracy loss isolated to matrix operations only
v0.3.53 Apr 2026
  • Default context window increased to 2 048 tokens
  • Math accuracy benchmarking suite added
v0.3.31 Apr 2026
  • Mixed KV cache type support
v0.3.230 Mar 2026
  • Bug fixes
  • Documentation updates
v0.2.9829 Mar 2026
  • Batch scheduler refinements
  • Build pipeline stabilization

Shape the roadmap

Open an issue, vote on features, or contribute code. EULLM is built in the open and every voice counts.