Öffentliche Entwicklung · Phase 1 läuft

Roadmap

Echtzeit-Status jeder EULLM-Komponente, die Meilensteine, die wir erreichen, und eine vollständige Geschichte jeder veröffentlichten Version.

GitHub-Releases Quellcode ansehen

Komponentenstatus

Plattformübersicht

Produktionsreif

EULLM Engine

v0.6.2

Rust-Inferenz-Laufzeit. Multimodal Vision + Audio, direkter Ollama-Ersatz mit OpenAI-kompatibler API und integrierter Chat-Oberfläche auf localhost:11435.

Fortschritt88%

259 tok/s

Throughput

Vision+Audio

Multimodal

✓ getestet

Windows

In Entwicklung

EULLM Forge

Modell-Vertikalisierungspipeline. Komponenten fertig, End-to-End-CLI-Integration in Arbeit.

Fortschritt42%

30B→7B

Größenreduktion

GGUF

Export

Beta

Pipeline

Vorschau

EULLM Hub

In der EU gehostetes Modellregister mit AI Act-Konformitätskarten. Als Prototyp in Betrieb.

Fortschritt25%

Prototyp

Modelle

3 geplant

Sektoren

Nur EU

Hosting

Engine-Fähigkeiten — v0.6.2

Rust-Laufzeit · continuous batching · multimodal Vision + Audio · vollständig lokal auf Consumer-GPUs

259 tok/s

Throughput

16 gleichzeitige Anfragen

Vision+Audio

Multimodal

OCR, Szenen, Transkription

~2-4×

Quantized KV

Kontext, Q4_0/Q5/Q8

--web

Web-Browsing

model-agnostic, jedes GGUF

Entwicklungsphasen

Was wir aufbauen

01Aktuell

Phase 01 — Fundament

Q1 2026

Der Inferenz-Engine erreicht Produktionsqualität. Forge-Pipeline-Komponenten entwickelt. Hub als Prototyp in Betrieb.

11/13 Punkte85%

Engine: Standalone-Binärdateien (Linux x64, Windows x64)
Multimodal Vision + Audio (Gemma 4)
Continuous batching — 259 Tok/s
Quantized KV cache — Q4_0/Q5/Q8 (~2-4× Kontext)
OpenAI-compatible + Ollama Drop-in API
GPU: CUDA (getestet), ROCm, Vulkan, Metal
Integriertes EU AI Act-Audit-Logging
Transparentes Web-Browsing (--web, model-agnostic)
Interaktives REPL: /temp, /maxtokens, /system
Integrierte Chat-Oberfläche — localhost:11435, ~29 KB im Binärformat
Forge: structural pruning + knowledge distillation
Forge: End-to-End-Pipeline CLI
Demo-Modell: legal-it-7b

02Geplant

Phase 02 — Ökosystem

Q2 2026

Die ersten produktionsbereiten Hub-Modelle gehen live. Stabile Forge-CLI. Erweiterte Plattformunterstützung.

1/8 Punkte13%

Hub: Modell für den Rechtssektor (EU/italienisches Recht)
Hub: Modell zur Unterstützung der medizinischen Triage
Hub: Finanz- und KYC-Compliance-Modell
AI Act-Konformitätskarten für alle Hub-Modelle
Forge: stabile CLI + vollständige Dokumentation
Windows x64-Unterstützung
Multi-GPU-Inferenz
Quantisierungsassistent für Consumer-Hardware

03Zukünftig

Phase 03 — Enterprise

H2 2026

Enterprise-Härtung: verteilte Inferenz, Zugriffskontrolle, Forge Studio visuelle Oberfläche.

0/7 Punkte0%

Multi-Node verteilte Inferenz
Kubernetes-Operator
SSO / RBAC access control
Forge Studio — visuelle Oberfläche für das fine-tuning
Modellversionierung und Rollback im Hub
Zertifizierte EU-Rechenzentrumspartnerschaften
SLA-Support-Stufen

Changelog

Versionshistorie

v0.6.2Neueste9 Jun 2026

Multimodal in the Chat UI — drop in an image or audio clip, fully local
Vision + audio understanding stable (Gemma 4): OCR, scene description, transcription
BOS token handling fix for multimodal prompts

v0.6.07 Jun 2026

Multimodal vision launched — image OCR and scene description on consumer GPUs
Audio understanding (experimental, CLI) — transcription and in-content search
Runs fully local, zero telemetry

v0.5.206 Jun 2026

Math expression rendering in the Chat UI
Quantized KV cache — Q4_0/Q5/Q8 for ~2-4× context on the same GPU

v0.5.331 May 2026

Embedded chat UI on localhost:11435 — ~29 KB in binary, zero CDN or external dependencies
eullm -V now shows the active backend variant
Standalone Windows binaries: CPU and CUDA

v0.4.427 May 2026

Web tool calling — transparent URL fetching in conversation
Legal-IT dataset preparation module
GPU layer fitting improvements

v0.4.38 Apr 2026

Drop-in Ollama replacement with continuous batching
Quantized KV cache for larger context on 16 GB GPUs
Transparent web browsing without function-call overhead
EU AI Act audit logging built-in

v0.3.136 Apr 2026

Interactive REPL: /temp, /maxtokens, /system commands
Quantized KV cache quality/accuracy automatic recommendations

v0.3.105 Apr 2026

Quantized KV cache math accuracy improvements
1% accuracy loss isolated to matrix operations only

v0.3.53 Apr 2026

Default context window increased to 2 048 tokens
Math accuracy benchmarking suite added

v0.3.31 Apr 2026

Mixed KV cache type support

v0.3.230 Mar 2026

Bug fixes
Documentation updates

v0.2.9829 Mar 2026

Batch scheduler refinements
Build pipeline stabilization

Alle Releases auf GitHub ansehen →

Die Roadmap mitgestalten

Öffnen Sie ein Issue, stimmen Sie über Funktionen ab oder tragen Sie Code bei. EULLM wird öffentlich entwickelt und jede Stimme zählt.

Issue öffnen An der Diskussion teilnehmen