feat: XTTS in local mode (model resident in VRAM) + /tts_stream + fallback

Root cause of the long render times and the /tts_stream 400 errors: the daswer123 default is apiManual/api mode, in which the model is fetched/reloaded on every request and streaming is unsupported.

Fix in xtts/docker-compose.yml:

  command: ['--listen', '-p', '8020', '-t', 'http://0.0.0.0:8020',
            '-ms', 'local', '-o', '/app/output',
            '-mf', '/app/xtts_models', '-sf', '/voices']

-ms local:
- Model stays resident in GPU VRAM (~2 GB, fits on an RTX 3060 with 12 GB)
- Rendering starts immediately, no more per-request load
- /tts_stream is supported → true progressive streaming
- Time-to-first-audio ~500 ms instead of 8-11 s

xtts/bridge.js: /tts_stream is the primary path, /tts_to_audio/ is the fallback when streaming is unavailable. Robust: if the user later switches the mode back, the fallback takes over.

The first load after the switch takes longer, once (the model has to be loaded into VRAM). After that: fast + streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
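
For illustration, the stream-first, fallback-second behaviour described for xtts/bridge.js could look roughly like the sketch below. This is not the code from this commit: the function names (`buildStreamUrl`, `synthesize`), the parameter names (`text`, `speaker_wav`, `language`), and the assumption that /tts_stream is a GET with query parameters while /tts_to_audio/ is a POST with a JSON body are all hypothetical and should be checked against the server actually in use.

```javascript
// Sketch only: request shapes and parameter names are assumptions,
// not taken from the real xtts/bridge.js.

// Build the GET URL for the streaming endpoint (assumed query parameters).
function buildStreamUrl(baseUrl, { text, speaker, language }) {
  const url = new URL("/tts_stream", baseUrl);
  url.searchParams.set("text", text);
  url.searchParams.set("speaker_wav", speaker);
  url.searchParams.set("language", language);
  return url.toString();
}

// Try streaming first. If the server rejects it (for example HTTP 400 when
// it is not running in local mode) or the request throws, fall back to the
// one-shot endpoint. fetchImpl is injectable so the logic can be tested.
async function synthesize(baseUrl, params, fetchImpl = globalThis.fetch) {
  try {
    const res = await fetchImpl(buildStreamUrl(baseUrl, params));
    if (res.ok) return { mode: "stream", response: res };
  } catch (err) {
    // Network error on the stream endpoint: continue with the fallback.
  }
  const res = await fetchImpl(new URL("/tts_to_audio/", baseUrl).toString(), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: params.text,
      speaker_wav: params.speaker,
      language: params.language,
    }),
  });
  if (!res.ok) throw new Error(`TTS failed with HTTP ${res.status}`);
  return { mode: "fallback", response: res };
}
```

Returning the mode alongside the response lets the caller decide whether to pipe audio chunks progressively or wait for the complete file.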
2026-04-22 17:38:53 +02:00
parent 647a1cb726
commit 4cbe184faa
2 changed files with 27 additions and 10 deletions
@@ -33,6 +33,15 @@ services:
      - ./voices:/voices                        # Custom Voice Samples
    environment:
      - COQUI_TOS_AGREED=1
+    # Local mode: the model stays resident in GPU VRAM (~2 GB). Benefits:
+    #   - Rendering starts immediately (no reload per request)
+    #   - /tts_stream works → true streaming with ~500 ms time-to-first-audio
+    # Without this command: apiManual mode, every request reloads the model, no streaming.
+    command: ["--listen", "-p", "8020", "-t", "http://0.0.0.0:8020",
+              "-ms", "local",
+              "-o", "/app/output",
+              "-mf", "/app/xtts_models",
+              "-sf", "/voices"]
    restart: unless-stopped

  # ─── XTTS Bridge (connects to RVS) ───────────