fix(f5tts): NaN/Inf in Modell-Output sauber abfangen vor int16-Cast

F5-TTS generiert gelegentlich NaN/Inf samples — ohne sanitize lief der
int16-Cast in undefined behavior (RuntimeWarning + kaputter Sound in den
entsprechenden Stellen). Jetzt: nan_to_num vor clip, plus Warnung wie
viele samples betroffen waren.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
duffyduck 2026-04-24 19:35:57 +02:00
parent 5ba89c7191
commit 958c8d6fc6
1 changed files with 9 additions and 1 deletions

View File

@ -268,7 +268,15 @@ def split_sentences(text: str, max_len: int = 350) -> list[str]:
def float_to_pcm16(wav: np.ndarray) -> bytes:
"""Float32 (-1..+1) → int16 little-endian bytes."""
"""Float32 (-1..+1) → int16 little-endian bytes.
F5-TTS generiert gelegentlich NaN/Inf bei Instabilitaeten ohne sanitize
waere der Cast zu int16 undefiniert (RuntimeWarning + kaputter Sound).
"""
nan_count = int(np.isnan(wav).sum() + np.isinf(wav).sum())
if nan_count > 0:
logger.warning("F5-TTS Output enthaelt %d NaN/Inf samples — ersetze mit 0", nan_count)
wav = np.nan_to_num(wav, nan=0.0, posinf=1.0, neginf=-1.0)
wav = np.clip(wav, -1.0, 1.0)
pcm = (wav * 32767.0).astype(np.int16)
return pcm.tobytes()