feat: Streaming TTS: PCM stream instead of WAV chunks (approach A)
Pipeline: XTTS-Server → xtts-bridge → aria-bridge → RVS → App AudioTrack

XTTS-Bridge (gaming PC):
- streamXTTSAsPCM(): reads the /tts_to_audio/ response incrementally, parses the WAV header (sample rate/channels), splits the PCM into 8 KB chunks (~170 ms at 24 kHz s16 mono) and sends each one as audio_pcm.
- Final chunk with final=true after the last text chunk

aria-bridge:
- audio_pcm handler forwards the payload 1:1 and fills in messageId from the requestId → messageId map if the XTTS-Bridge did not supply a messageId
- Old xtts_response path remains as a legacy fallback (WAV)

RVS: audio_pcm added to ALLOWED_TYPES

Android Native:
- PcmStreamPlayerModule (Kotlin): AudioTrack MODE_STREAM with a writer thread and a BlockingQueue. start(rate, ch) / writeChunk(b64) / end() / stop()
- Buffer generously sized at 8x MinBufferSize, so playback stays smooth even during network dropouts
- Registered in MainApplication via PcmStreamPlayerPackage

App JS:
- audioService.handlePcmChunk(): detects a new session (messageId change), starts the native stream, caches PCM bytes per message. On final=true it closes the stream cleanly and calls _savePcmBufferAsWav → WAV file in tts_cache/<messageId>.wav
- _savePcmBufferAsWav: builds a 44-byte WAV header (PCM s16le, correct sample rate/channels) and appends all collected base64 PCM chunks
- stopPlayback also ends an active PCM stream
- ChatScreen routes type=audio_pcm to handlePcmChunk; on final it sets audioPath in the message

Play button: if the messageId has an audioPath → WAV from cache (Sound-based), regardless of whether the original TTS was Piper or XTTS.

Audio focus:
- requestDuck() at stream start, release() at stream end
- Other apps (Spotify etc.) are ducked while ARIA speaks

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
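The 44-byte header that _savePcmBufferAsWav is described as building, and the ~170 ms figure quoted for 8 KB chunks, can be checked with a short Python sketch. The app-side implementation is JS and is not part of the hunk shown here, so this is only an illustration under the stated format (PCM s16le); the names build_wav_header and chunk_duration_ms are hypothetical.

```python
import struct


def build_wav_header(data_len: int, sample_rate: int = 24000,
                     channels: int = 1, bits_per_sample: int = 16) -> bytes:
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM.

    Hypothetical helper; mirrors what the commit message says about
    _savePcmBufferAsWav, not its actual code.
    """
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",            # RIFF chunk descriptor
        b"fmt ", 16, 1,                             # fmt chunk, format 1 = PCM
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_len,                          # data chunk header
    )


def chunk_duration_ms(chunk_bytes: int, sample_rate: int = 24000,
                      channels: int = 1, bits_per_sample: int = 16) -> float:
    """Playback duration of a raw PCM chunk in milliseconds."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return chunk_bytes / bytes_per_second * 1000


if __name__ == "__main__":
    pcm = b"\x00\x00" * 4096                        # 8192 bytes of silence
    wav = build_wav_header(len(pcm)) + pcm
    print(len(wav) - len(pcm))                      # header size in bytes
    print(chunk_duration_ms(len(pcm)))              # duration of one 8 KB chunk
```

At 24 kHz, 16-bit mono one second of audio is 48000 bytes, so an 8192-byte chunk lasts 8192 / 48000 s, which is where the roughly 170 ms in the commit message comes from.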
+25
-3
@@ -1296,19 +1296,41 @@ class ARIABridge:
             await self._emit_activity("idle", "")
             return

+        elif msg_type == "audio_pcm":
+            # XTTS PCM stream received from the gaming PC → pass it through to the app.
+            # If the payload has no messageId (old XTTS-Bridge), resolve it from requestId.
+            error = payload.get("error", "")
+            if error:
+                logger.warning("[rvs] XTTS PCM error: %s", error)
+                return
+            linked_message_id = payload.get("messageId", "")
+            if not linked_message_id:
+                req_id_full = payload.get("requestId", "")
+                req_id_base = req_id_full.rsplit("_", 1)[0] if "_" in req_id_full else req_id_full
+                linked_message_id = self._xtts_request_to_message.get(req_id_base, "")
+            # Simply forward 1:1 with the messageId filled in
+            forwarded = dict(payload)
+            forwarded["messageId"] = linked_message_id
+            await self._send_to_rvs({
+                "type": "audio_pcm",
+                "payload": forwarded,
+                "timestamp": int(asyncio.get_event_loop().time() * 1000),
+            })
+            return
+
         elif msg_type == "xtts_response":
-            # XTTS audio received from the gaming PC → forward to the app
+            # Legacy path (old XTTS-Bridge with WAV response). Forward as
+            # type "audio"; the app uses the existing WAV queue player.
             audio_b64 = payload.get("base64", "")
             error = payload.get("error", "")
             req_id_full = payload.get("requestId", "")
             # XTTS-Bridge adds a per-chunk suffix: "uuid_0", "uuid_1" → extract the base UUID
             req_id_base = req_id_full.rsplit("_", 1)[0] if "_" in req_id_full else req_id_full
             linked_message_id = self._xtts_request_to_message.get(req_id_base, "")
             if error:
                 logger.warning("[rvs] XTTS error: %s", error)
                 return
             if audio_b64:
-                logger.info("[rvs] XTTS audio received: %dKB", len(audio_b64) // 1365)
+                logger.info("[rvs] XTTS audio (legacy) received: %dKB", len(audio_b64) // 1365)
                 await self._send_to_rvs({
                     "type": "audio",
                     "payload": {
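The messageId fallback used by the audio_pcm handler can be tried in isolation. The following is a minimal sketch, assuming the request-to-message map is a plain dict; resolve_message_id is a hypothetical standalone function, not a method of ARIABridge.

```python
def resolve_message_id(payload: dict, request_to_message: dict) -> str:
    """Return the payload's messageId, or look it up via the base requestId.

    Hypothetical helper that mirrors the fallback logic in the hunk above.
    """
    message_id = payload.get("messageId", "")
    if message_id:
        return message_id
    req_id_full = payload.get("requestId", "")
    # Strip a trailing per-chunk suffix such as "_0" or "_1", if present.
    req_id_base = req_id_full.rsplit("_", 1)[0] if "_" in req_id_full else req_id_full
    return request_to_message.get(req_id_base, "")


if __name__ == "__main__":
    mapping = {"3f2a-77": "msg-1"}   # made-up IDs for illustration
    print(resolve_message_id({"requestId": "3f2a-77_2"}, mapping))
    print(resolve_message_id({"messageId": "msg-9", "requestId": "3f2a-77_0"}, mapping))
    print(repr(resolve_message_id({"requestId": "unknown_0"}, mapping)))
```

Because the suffix is removed with rsplit("_", 1), a request ID that itself contains an underscore but carries no chunk suffix would lose its last segment. Standard UUID strings use hyphens, so the split is safe for them.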