Roadmap
Real-time status of every EULLM component, the milestones we're hitting, and a full history of every release shipped.
Platform overview
EULLM Engine
v0.6.2Rust inference runtime. Multimodal vision + audio, drop-in Ollama replacement with OpenAI-compatible API and embedded chat UI on localhost:11435.
259 tok/s
Throughput
Vision+Audio
Multimodal
β tested
Windows
EULLM Forge
Model verticalization pipeline. Components ready, end-to-end CLI integration in progress.
30Bβ7B
Size cut
GGUF
Export
Beta
Pipeline
EULLM Hub
EU-hosted model registry with AI Act compliance cards. Operational as prototype.
Prototype
Models
3 planned
Sectors
EU only
Hosting
Engine capabilities β v0.6.2
Rust runtime Β· continuous batching Β· multimodal vision + audio Β· fully local on consumer GPUs
259 tok/s
Throughput
16 concurrent requests
Vision+Audio
Multimodal
OCR, scene, transcription
~2-4Γ
Quantized KV
context, Q4_0/Q5/Q8
--web
Web browsing
model-agnostic, any GGUF
What we're building
Phase 01 β Foundation
Q1 2026
Core inference engine reaches production quality. Forge pipeline components built. Hub operational as prototype.
- Engine: standalone binaries (Linux x64, Windows x64)
- Multimodal vision + audio (Gemma 4)
- Continuous batching β 259 tok/s
- Quantized KV cache β Q4_0/Q5/Q8 (~2-4Γ context)
- OpenAI-compatible + Ollama drop-in API
- GPU: CUDA (tested), ROCm, Vulkan, Metal
- EU AI Act built-in audit logging
- Transparent web browsing (--web, model-agnostic)
- Interactive REPL: /temp, /maxtokens, /system
- Embedded chat UI β localhost:11435, ~29 KB in binary
- Forge: structural pruning + knowledge distillation
- Forge end-to-end pipeline CLI
- Demo model: legal-it-7b
Phase 02 β Ecosystem
Q2 2026
First production-ready Hub models go live. Forge stable CLI. Platform support expanded.
- Hub: Legal sector model (EU/Italian law)
- Hub: Medical triage support model
- Hub: Finance & KYC compliance model
- AI Act compliance cards for all Hub models
- Forge: stable CLI + full documentation
- Windows x64 support
- Multi-GPU inference
- Quantization wizard for consumer hardware
Phase 03 β Enterprise
H2 2026
Enterprise-grade hardening: distributed inference, access control, Forge Studio visual UI.
- Multi-node distributed inference
- Kubernetes operator
- SSO / RBAC access control
- Forge Studio β visual fine-tuning UI
- Model versioning & rollback in Hub
- Certified EU data center partnerships
- SLA support tiers
Release history
- Multimodal in the Chat UI β drop in an image or audio clip, fully local
- Vision + audio understanding stable (Gemma 4): OCR, scene description, transcription
- BOS token handling fix for multimodal prompts
- Multimodal vision launched β image OCR and scene description on consumer GPUs
- Audio understanding (experimental, CLI) β transcription and in-content search
- Runs fully local, zero telemetry
- Math expression rendering in the Chat UI
- Quantized KV cache β Q4_0/Q5/Q8 for ~2-4Γ context on the same GPU
- Embedded chat UI on localhost:11435 β ~29 KB in binary, zero CDN or external dependencies
- eullm -V now shows the active backend variant
- Standalone Windows binaries: CPU and CUDA
- Web tool calling β transparent URL fetching in conversation
- Legal-IT dataset preparation module
- GPU layer fitting improvements
- Drop-in Ollama replacement with continuous batching
- Quantized KV cache for larger context on 16 GB GPUs
- Transparent web browsing without function-call overhead
- EU AI Act audit logging built-in
- Interactive REPL: /temp, /maxtokens, /system commands
- Quantized KV cache quality/accuracy automatic recommendations
- Quantized KV cache math accuracy improvements
- 1% accuracy loss isolated to matrix operations only
- Default context window increased to 2 048 tokens
- Math accuracy benchmarking suite added
- Mixed KV cache type support
- Bug fixes
- Documentation updates
- Batch scheduler refinements
- Build pipeline stabilization
Shape the roadmap
Open an issue, vote on features, or contribute code. EULLM is built in the open and every voice counts.
