Compare commits

..

15 Commits

Author SHA1 Message Date
duffyduck a7eb3cf433 release: bump version to 0.0.2.6 2026-04-10 02:11:04 +02:00
duffyduck e4e0e793a8 fix: Audio queue for sequential TTS playback (no overlap/skip)
- Audio packets queued instead of stopping previous
- _playNext() plays sequentially, each sentence after the previous
- stopPlayback() clears queue
- Fixes overlapping/skipping with XTTS sentence-by-sentence rendering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 02:09:35 +02:00
duffyduck b3d3b8b6bc fix: XTTS bridge splits text into sentences sequentially
- XTTS-Bridge does sentence splitting (not ARIA-Bridge)
- Sequential rendering: correct order guaranteed
- Each sentence sent as separate xtts_response
- Markdown removal before splitting
- App starts playback after first sentence (faster UX)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 02:03:29 +02:00
duffyduck 06bc456221 fix: XTTS splits long text into sentences before sending (WebSocket size limit)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:56:25 +02:00
duffyduck 3461f45207 docs: update README with XTTS v2 setup details, voice cloning guide
- Architecture diagram for XTTS flow (Gaming-PC ↔ RVS ↔ ARIA-VM)
- Port 8020 (not 8000), token must match, model caching
- Voice cloning step-by-step guide
- TTS engine switching (Piper/XTTS) with fallback
- Known limitation: RVS zombie connections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:49:08 +02:00
duffyduck a17d4acc13 fix: XTTS bridge shares /voices volume with XTTS server
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:40:41 +02:00
duffyduck 62fd9193a1 fix: XTTS voice dropdown shows saved voice after page reload
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:34:00 +02:00
duffyduck 2329645df4 fix: XTTS voices list + upload use fresh RVS connection with response wait
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:24:55 +02:00
duffyduck 8a435ddf6c fix: voice upload uses send() via server, not client-side sendToRVS_raw
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:15:29 +02:00
duffyduck 25b754ba31 fix: voice upload Base64 conversion (chunked, no stack overflow)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 01:08:32 +02:00
duffyduck b734593bf2 fix: Bridge _send_to_rvs ping-check before send, force reconnect on zombie
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 00:37:22 +02:00
duffyduck 16847ce6f7 fix: TTS toggle global above engine selector, health check /docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 00:27:55 +02:00
duffyduck 6300829317 fix: XTTS model cache volume path /app/xtts_models
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:44:29 +02:00
duffyduck a1e1ee31bd fix: XTTS bridge port 8020, longer startup wait
- XTTS API runs on port 8020 (not 8000)
- Bridge waits up to 5min for model download (30x10s)
- Health check uses / instead of /docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:39:45 +02:00
duffyduck 7ed70b876d updated image public path 2026-04-07 23:06:26 +02:00
10 changed files with 209 additions and 73 deletions
+40 -8
View File
@@ -530,38 +530,68 @@ cp ARIA-v0.0.3.0.apk ~/ARIA-AGENT/rvs/updates/
## XTTS v2 — GPU TTS Server (optional)
Laeuft auf einem separaten Rechner mit NVIDIA GPU (z.B. Gaming-PC mit RTX 3060).
Verbindet sich ueber RVS mit der ARIA-Infrastruktur — kein VPN noetig.
Verbindet sich ueber RVS mit der ARIA-Infrastruktur — kein VPN noetig, funktioniert
ueber verschiedene Netze hinweg.
### Architektur
```
Gaming-PC (Windows, RTX 3060, Docker Desktop + WSL2)
├── aria-xtts XTTS v2 GPU Server (Port 8020 intern)
└── aria-xtts-bridge RVS-Relay (empfaengt Requests, sendet Audio)
└── Beide teilen ./voices/ Volume fuer Voice Cloning
↕ RVS (Rechenzentrum, WebSocket Relay)
ARIA-VM
└── aria-bridge: tts_engine="xtts" → xtts_request via RVS → wartet auf xtts_response
```
### Voraussetzungen
- Docker Desktop mit WSL2 (Windows) oder Docker mit NVIDIA Runtime (Linux)
- NVIDIA Container Toolkit
- GPU mit mindestens 4GB VRAM (6GB+ empfohlen)
- **Gleicher RVS_TOKEN wie auf der ARIA-VM!**
### Setup
```bash
cd xtts
cp .env.example .env
# .env mit RVS-Verbindungsdaten fuellen (gleiche wie auf der ARIA-VM)
# .env mit RVS-Verbindungsdaten fuellen (gleicher Token wie ARIA-VM!)
docker compose up -d
# Erster Start laedt ~2GB Model herunter
# Erster Start laedt ~2GB Model herunter (danach gecacht)
```
**Wichtig:** Der XTTS-Server laeuft intern auf Port **8020** (nicht 8000).
Das Model wird im Volume `xtts-models` gecacht und muss nur einmal geladen werden.
### Features
- **Natuerliche Stimmen**: Deutlich bessere Qualitaet als Piper
- **Voice Cloning**: Eigene Stimme mit 6-10s Audio-Sample
- **Voice Cloning**: Eigene Stimme mit 6-10s Audio-Sample (~2s Latenz auf RTX 3060)
- **16 Sprachen**: Deutsch, Englisch, Franzoesisch, etc.
- **RVS-Integration**: Bridge waehlt automatisch XTTS wenn verfuegbar
- **Fallback**: Wenn XTTS nicht erreichbar, nutzt die Bridge automatisch Piper
### TTS-Engine umschalten
In der Diagnostic unter Einstellungen → Sprachausgabe:
- **TTS aktiv**: Global An/Aus
- **TTS Engine**: Piper (lokal, CPU, schnell) oder XTTS v2 (remote, GPU, natuerlich)
- **Piper**: Standard-Stimme, Highlight-Stimme, Speed pro Stimme
- **XTTS**: Stimmen-Auswahl, Voice Cloning
### Stimme klonen
In der Diagnostic unter Einstellungen → Sprachausgabe → XTTS:
1. TTS Engine auf "XTTS v2" stellen
2. "Stimme klonen" → Audio-Dateien hochladen (WAV/MP3, min. 6-10s)
2. "Stimme klonen" → Audio-Dateien hochladen (WAV/MP3, 1-10 Dateien, min. 6-10s gesamt)
3. Name vergeben → "Stimme erstellen"
4. Neue Stimme in der Auswahl verfuegbar
4. "Laden" klicken → neue Stimme in der Auswahl
5. Stimme auswaehlen → Config wird automatisch gespeichert
> **Tipp:** Fuer beste Ergebnisse: saubere Aufnahme, eine Stimme, kein Hintergrund,
> 10-30 Sekunden Gesamtlaenge. Mehrere kurze Dateien werden zusammengefuegt.
---
@@ -633,6 +663,8 @@ docker exec aria-core ssh aria-wohnung hostname
- **Wake Word nur auf VM**: Die Bridge hoert auf "ARIA" ueber das lokale Mikrofon der VM.
In der App gibt es Energy-basierte Erkennung (Phase 1). On-device "ARIA"-Keyword (Porcupine) ist Phase 2.
- **Audio-Format**: App nimmt AAC/MP4 auf, Bridge konvertiert via FFmpeg zu 16kHz PCM.
- **RVS Zombie-Connections**: WebSocket-Verbindungen sterben gelegentlich ohne Fehlermeldung.
Bridge hat Ping-Check (5s), Diagnostic nutzt frische Verbindungen pro Request.
- **Bildanalyse eingeschraenkt**: Bilder werden in `/shared/uploads/` gespeichert. ARIA kann
sie per Bash/Read-Tool oeffnen, aber Claude Vision (direkte Bildanalyse) ist ueber den
Proxy-Pfad (`claude --print`) noch nicht moeglich. ARIA sieht den Dateipfad, nicht das Bild.
+2 -2
View File
@@ -79,8 +79,8 @@ android {
applicationId "com.ariacockpit"
minSdkVersion rootProject.ext.minSdkVersion
targetSdkVersion rootProject.ext.targetSdkVersion
versionCode 205
versionName "0.0.2.5"
versionCode 206
versionName "0.0.2.6"
// Fallback fuer Libraries mit Product Flavors
missingDimensionStrategy 'react-native-camera', 'general'
}
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "aria-cockpit",
"version": "0.0.2.5",
"version": "0.0.2.6",
"private": true,
"scripts": {
"android": "react-native run-android",
+1 -1
View File
@@ -748,7 +748,7 @@ const SettingsScreen: React.FC = () => {
<Text style={styles.sectionTitle}>{'\u00DC'}ber</Text>
<View style={styles.card}>
<Text style={styles.aboutTitle}>ARIA Cockpit</Text>
<Text style={styles.aboutVersion}>Version 0.0.2.5 </Text>
<Text style={styles.aboutVersion}>Version 0.0.2.6 </Text>
<Text style={styles.aboutInfo}>
Stefans Kommandozentrale f{'\u00FC'}r ARIA.{'\n'}
Gebaut mit React Native + TypeScript.
+27 -5
View File
@@ -55,6 +55,10 @@ class AudioService {
private recorder: AudioRecorderPlayer;
private recordingPath: string = '';
// Audio-Queue fuer sequentielle TTS-Wiedergabe
private audioQueue: string[] = [];
private isPlaying: boolean = false;
// VAD State
private vadEnabled: boolean = false;
private lastSpeechTime: number = 0;
@@ -198,15 +202,27 @@ class AudioService {
// --- Wiedergabe ---
/** Base64-kodiertes Audio abspielen (z.B. TTS-Antwort von ARIA) */
/** Base64-kodiertes Audio in die Queue stellen und abspielen */
async playAudio(base64Data: string): Promise<void> {
if (!base64Data) return;
// Laufende Wiedergabe stoppen
this.stopPlayback();
this.audioQueue.push(base64Data);
if (!this.isPlaying) {
this._playNext();
}
}
/** Naechstes Audio aus der Queue abspielen */
private async _playNext(): Promise<void> {
if (this.audioQueue.length === 0) {
this.isPlaying = false;
return;
}
this.isPlaying = true;
const base64Data = this.audioQueue.shift()!;
try {
// Base64 -> temporaere WAV-Datei -> Sound abspielen
const tmpPath = `${RNFS.CachesDirectoryPath}/aria_tts_${Date.now()}.wav`;
await RNFS.writeFile(tmpPath, base64Data, 'base64');
@@ -214,6 +230,7 @@ class AudioService {
if (error) {
console.error('[Audio] Fehler beim Laden:', error);
RNFS.unlink(tmpPath).catch(() => {});
this._playNext();
return;
}
this.currentSound?.play((success) => {
@@ -225,15 +242,20 @@ class AudioService {
this.currentSound?.release();
this.currentSound = null;
RNFS.unlink(tmpPath).catch(() => {});
// Naechstes Audio abspielen
this._playNext();
});
});
} catch (err) {
console.error('[Audio] Wiedergabefehler:', err);
this._playNext();
}
}
/** Laufende Wiedergabe stoppen */
/** Laufende Wiedergabe stoppen + Queue leeren */
stopPlayback(): void {
this.audioQueue = [];
this.isPlaying = false;
if (this.currentSound) {
this.currentSound.stop();
this.currentSound.release();
+16 -2
View File
@@ -851,7 +851,7 @@ class ARIABridge:
tts_engine = getattr(self, 'tts_engine_type', 'piper')
if tts_engine == "xtts":
# XTTS: Request ueber RVS an Gaming-PC senden
# XTTS: Ganzen Text senden, XTTS-Bridge teilt satzweise auf
xtts_voice = getattr(self, 'xtts_voice', '')
try:
await self._send_to_rvs({
@@ -1354,10 +1354,24 @@ class ARIABridge:
pass
async def _send_to_rvs(self, message: dict) -> None:
"""Sendet eine Nachricht an die App (via RVS)."""
"""Sendet eine Nachricht an die App (via RVS) mit Verbindungs-Check."""
if self.ws_rvs is None:
return
# Ping-Check: Verbindung wirklich aktiv?
try:
pong = await self.ws_rvs.ping()
await asyncio.wait_for(pong, timeout=5)
except Exception:
logger.warning("[rvs] Ping fehlgeschlagen — Verbindung tot, erzwinge Reconnect")
try:
await self.ws_rvs.close()
except Exception:
pass
self.ws_rvs = None
# Reconnect wird vom connect_to_rvs Loop uebernommen
return
try:
await self.ws_rvs.send(json.dumps(message))
except Exception:
+48 -22
View File
@@ -401,6 +401,12 @@
<div class="settings-section">
<h2>Sprachausgabe</h2>
<div class="card" style="max-width:500px;">
<!-- TTS aktiv (global fuer alle Engines) -->
<div style="display:flex;align-items:center;gap:12px;margin-bottom:12px;">
<label style="color:#8888AA;font-size:12px;">TTS aktiv:</label>
<label class="toggle"><input type="checkbox" id="diag-tts-enabled" checked onchange="sendVoiceConfig()"><span class="slider"></span></label>
</div>
<!-- TTS Engine Auswahl -->
<div style="display:flex;align-items:center;gap:12px;margin-bottom:12px;">
<label style="color:#8888AA;font-size:12px;">TTS Engine:</label>
@@ -426,10 +432,6 @@
<option value="ramona">Ramona (weiblich)</option>
</select>
</div>
<div style="display:flex;align-items:center;gap:12px;margin-bottom:12px;">
<label style="color:#8888AA;font-size:12px;">TTS aktiv:</label>
<label class="toggle"><input type="checkbox" id="diag-tts-enabled" checked onchange="sendVoiceConfig()"><span class="slider"></span></label>
</div>
<div style="margin-bottom:4px;">
<label style="color:#8888AA;font-size:12px;">Ramona Speed: <span id="speed-ramona-label">1.0x</span></label>
</div>
@@ -744,7 +746,16 @@
document.getElementById('diag-speed-thorsten').value = st;
document.getElementById('speed-thorsten-label').textContent = st + 'x';
document.getElementById('diag-tts-engine').value = msg.ttsEngine || 'piper';
document.getElementById('diag-xtts-voice').value = msg.xttsVoice || '';
// XTTS-Voice setzen — Option hinzufuegen falls nicht vorhanden
const xttsSelect = document.getElementById('diag-xtts-voice');
const xttsVoice = msg.xttsVoice || '';
if (xttsVoice && !Array.from(xttsSelect.options).some(o => o.value === xttsVoice)) {
const opt = document.createElement('option');
opt.value = xttsVoice;
opt.textContent = xttsVoice;
xttsSelect.appendChild(opt);
}
xttsSelect.value = xttsVoice;
toggleXTTSPanel();
return;
}
@@ -1247,7 +1258,16 @@
}
function loadXTTSVoices() {
sendToRVS_raw({ type: 'xtts_list_voices', payload: {}, timestamp: Date.now() });
send({ action: 'xtts_list_voices' });
}
function arrayBufferToBase64(buffer) {
const bytes = new Uint8Array(buffer);
let binary = '';
for (let i = 0; i < bytes.length; i += 8192) {
binary += String.fromCharCode.apply(null, bytes.subarray(i, i + 8192));
}
return btoa(binary);
}
async function uploadVoiceSamples() {
@@ -1255,25 +1275,31 @@
const files = document.getElementById('xtts-clone-files').files;
if (!name) { alert('Bitte einen Namen eingeben'); return; }
if (!files || files.length === 0) { alert('Bitte Audio-Dateien auswaehlen'); return; }
if (files.length > 10) { alert('Maximal 10 Dateien'); return; }
document.getElementById('xtts-clone-status').textContent = `Lade ${files.length} Datei(en) hoch...`;
const status = document.getElementById('xtts-clone-status');
status.textContent = `Lade ${files.length} Datei(en)...`;
status.style.color = '#FFD60A';
const samples = [];
for (const file of files) {
const buffer = await file.arrayBuffer();
const base64 = btoa(String.fromCharCode(...new Uint8Array(buffer)));
samples.push({ base64, name: file.name, size: file.size });
try {
const samples = [];
for (let i = 0; i < files.length; i++) {
status.textContent = `Lese Datei ${i + 1}/${files.length}: ${files[i].name}...`;
const buffer = await files[i].arrayBuffer();
const base64 = arrayBufferToBase64(buffer);
samples.push({ base64, name: files[i].name, size: files[i].size });
}
const totalSize = samples.reduce((s, f) => s + f.size, 0);
status.textContent = `Sende ${samples.length} Sample(s) (${(totalSize / 1024).toFixed(0)}KB)...`;
send({ action: 'voice_upload', name, samples });
status.textContent = `Gesendet — warte auf Bestaetigung vom XTTS-Server...`;
} catch (err) {
status.textContent = `Fehler: ${err.message}`;
status.style.color = '#FF3B30';
}
const totalSize = samples.reduce((s, f) => s + f.size, 0);
document.getElementById('xtts-clone-status').textContent =
`Sende ${samples.length} Sample(s) (${(totalSize / 1024).toFixed(0)}KB) an XTTS-Server...`;
sendToRVS_raw({
type: 'voice_upload',
payload: { name, samples },
timestamp: Date.now(),
});
}
// ── Abbrechen ──────────────────────────────
+32
View File
@@ -560,6 +560,31 @@ function connectRVS(forcePlain) {
});
}
function sendToRVS_withResponse(sendType, sendPayload, expectType, clientWs) {
if (!RVS_HOST || !RVS_TOKEN) return;
const proto = RVS_TLS === "true" ? "wss" : "ws";
const url = `${proto}://${RVS_HOST}:${RVS_PORT}?token=${RVS_TOKEN}`;
const freshWs = new WebSocket(url);
const timeout = setTimeout(() => {
try { freshWs.close(); } catch (_) {}
clientWs.send(JSON.stringify({ type: expectType, payload: { voices: [], error: "Timeout" }, timestamp: Date.now() }));
}, 15000);
freshWs.on("open", () => {
freshWs.send(JSON.stringify({ type: sendType, payload: sendPayload, timestamp: Date.now() }));
});
freshWs.on("message", (raw) => {
try {
const resp = JSON.parse(raw.toString());
if (resp.type === expectType) {
clearTimeout(timeout);
clientWs.send(JSON.stringify(resp));
setTimeout(() => { try { freshWs.close(); } catch (_) {} }, 1000);
}
} catch {}
});
freshWs.on("error", () => {});
}
function sendToRVS_raw(msgObj) {
if (!RVS_HOST || !RVS_TOKEN) return;
const proto = RVS_TLS === "true" ? "wss" : "ws";
@@ -1165,6 +1190,13 @@ wss.on("connection", (ws) => {
if (pipelineActive) pipelineEnd(false, "Vom Benutzer abgebrochen");
broadcast({ type: "agent_activity", activity: "idle" });
dockerExec("aria-core", "openclaw doctor --fix 2>/dev/null || true").catch(() => {});
} else if (msg.action === "voice_upload") {
// Voice-Samples an XTTS-Bridge via RVS weiterleiten, auf Bestätigung warten
log("info", "server", `Voice-Upload '${msg.name}' (${(msg.samples || []).length} Samples) sende an RVS...`);
sendToRVS_withResponse("voice_upload", { name: msg.name, samples: msg.samples }, "xtts_voice_saved", ws);
} else if (msg.action === "xtts_list_voices") {
// Frische Verbindung die auf Antwort wartet
sendToRVS_withResponse("xtts_list_voices", {}, "xtts_voices_list", ws);
} else if (msg.action === "get_voice_config") {
handleGetVoiceConfig(ws);
} else if (msg.action === "send_voice_config") {
+36 -28
View File
@@ -97,39 +97,47 @@ async function handleTTSRequest(payload) {
const { text, voice, requestId, language } = payload;
if (!text) return;
log(`TTS-Request: "${text.slice(0, 60)}..." (voice: ${voice || "default"}, lang: ${language || "de"})`);
// Markdown entfernen
const cleanText = text.replace(/\*\*([^*]+)\*\*/g, "$1").trim();
// Text in Saetze aufteilen (sequentiell rendern fuer korrekte Reihenfolge)
const sentences = cleanText.split(/(?<=[.!?])\s+/).map(s => s.trim()).filter(s => s.length > 0);
if (sentences.length === 0) return;
log(`TTS-Request: "${cleanText.slice(0, 60)}..." (${sentences.length} Saetze, voice: ${voice || "default"}, lang: ${language || "de"})`);
try {
// Voice-Sample Pfad bestimmen
const voiceSample = voice ? path.join(VOICES_DIR, `${voice}.wav`) : null;
const hasCustomVoice = voiceSample && fs.existsSync(voiceSample);
// XTTS API aufrufen
const audioBuffer = await callXTTSAPI(text, language || "de", hasCustomVoice ? voiceSample : null);
// Jeden Satz sequentiell rendern und sofort senden
for (let i = 0; i < sentences.length; i++) {
const sentence = sentences[i];
try {
const audioBuffer = await callXTTSAPI(sentence, language || "de", hasCustomVoice ? voiceSample : null);
if (audioBuffer && audioBuffer.length > 100) {
const base64 = audioBuffer.toString("base64");
log(`TTS fertig: ${audioBuffer.length} bytes (${(audioBuffer.length / 1024).toFixed(0)}KB)`);
if (audioBuffer && audioBuffer.length > 100) {
const base64 = audioBuffer.toString("base64");
log(`TTS [${i + 1}/${sentences.length}]: ${audioBuffer.length} bytes (${(audioBuffer.length / 1024).toFixed(0)}KB) — "${sentence.slice(0, 40)}..."`);
sendToRVS({
type: "xtts_response",
payload: {
requestId: requestId || "",
base64,
mimeType: "audio/wav",
voice: voice || "default",
engine: "xtts",
},
timestamp: Date.now(),
});
} else {
log("TTS: Leeres Audio erhalten");
sendToRVS({
type: "xtts_response",
payload: { requestId, error: "Leeres Audio" },
timestamp: Date.now(),
});
sendToRVS({
type: "xtts_response",
payload: {
requestId: `${requestId || ""}_${i}`,
base64,
mimeType: "audio/wav",
voice: voice || "default",
engine: "xtts",
},
timestamp: Date.now(),
});
}
} catch (sentenceErr) {
log(`TTS [${i + 1}/${sentences.length}] Fehler: ${sentenceErr.message} — ueberspringe`);
}
}
log(`TTS komplett: ${sentences.length} Saetze gerendert`);
} catch (err) {
log(`TTS Fehler: ${err.message}`);
sendToRVS({
@@ -257,12 +265,12 @@ log(`RVS: ${RVS_HOST}:${RVS_PORT}`);
function waitForXTTS(callback, attempts) {
if (attempts <= 0) { log("XTTS API nicht erreichbar — starte trotzdem"); callback(); return; }
http.get(`${XTTS_API_URL}/docs`, (res) => {
log("XTTS API erreichbar");
log(`XTTS API erreichbar (HTTP ${res.statusCode})`);
callback();
}).on("error", () => {
log(`XTTS API noch nicht bereit — warte (${attempts} Versuche uebrig)...`);
setTimeout(() => waitForXTTS(callback, attempts - 1), 5000);
setTimeout(() => waitForXTTS(callback, attempts - 1), 10000); // 10s statt 5s (Model laden dauert)
});
}
waitForXTTS(() => connectRVS(), 24); // Max 2min warten
waitForXTTS(() => connectRVS(), 30); // Max 5min warten
+6 -4
View File
@@ -17,7 +17,7 @@ services:
# ─── XTTS v2 API Server (GPU) ─────────────────
xtts:
image: ghcr.io/daswer123/xtts-api-server:latest
image: daswer123/xtts-api-server:latest
container_name: aria-xtts
deploy:
resources:
@@ -27,9 +27,9 @@ services:
count: 1
capabilities: [gpu]
ports:
- "8000:8000"
- "8000:8020"
volumes:
- xtts-models:/root/.local/share/tts # Model-Cache (~2GB)
- xtts-models:/app/xtts_models # Model-Cache (~2GB)
- ./voices:/voices # Custom Voice Samples
environment:
- COQUI_TOS_AGREED=1
@@ -41,8 +41,10 @@ services:
container_name: aria-xtts-bridge
depends_on:
- xtts
volumes:
- ./voices:/voices # Shared mit XTTS-Server
environment:
- XTTS_API_URL=http://xtts:8000
- XTTS_API_URL=http://xtts:8020
- RVS_HOST=${RVS_HOST}
- RVS_PORT=${RVS_PORT:-443}
- RVS_TLS=${RVS_TLS:-true}