feat(speaker-id): Phase 1 — SpeechBrain ECAPA-TDNN Backend in whisper-bridge
Speaker-ID module (Hermes-style "real conversation without a wake word" vision, phase 1 of 5). Recognizes Stefan's voice via a 192-dim embedding and a cosine match against a persisted fingerprint.

Modules:
- speaker_id.py: lazy-loaded ECAPA-TDNN (HuggingFace), enroll/verify/status/delete. Fingerprint = L2-normalized mean of N enrollment samples, stored in /voice-id/fingerprint.json. Fail-open: no fingerprint → verify() returns (True, 0.0).
- bridge.py: 3 message handlers: voice_id_status_request, voice_id_enroll_request (samples[]: base64 16 kHz int16 PCM), voice_id_delete_request. Enrollment runs in the executor (Torch would otherwise block the event loop).
- Dockerfile: torch 2.3.1 + torchaudio with CUDA 12.1 wheels (otherwise speechbrain pulls in CPU-only Torch). Container grows by ~1 GB.
- docker-compose.yml: ./voice-id:/voice-id bind mount for fingerprint persistence (survives container restarts).
- rvs/server.js: 6 new message types in ALLOWED_TYPES.

Phase 2 (next): app enrollment flow + Diagnostic Voice-ID section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
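The fingerprint and matching scheme described in the commit message can be sketched roughly as follows. This is a minimal illustration with NumPy only, not the code in speaker_id.py: the function names, the threshold value, and the choice to average first and normalize afterwards are all assumptions, and the embeddings would come from the ECAPA-TDNN model rather than from random data.

```python
import numpy as np

EMBEDDING_DIM = 192  # embedding size named in the commit message
THRESHOLD = 0.25     # hypothetical cosine threshold, not taken from the commit


def l2_normalize(vec):
    """Scale a vector to unit length."""
    vec = np.asarray(vec, dtype=np.float64)
    norm = np.linalg.norm(vec)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return vec / norm


def build_fingerprint(embeddings):
    """Average the enrollment embeddings, then L2-normalize the mean."""
    if len(embeddings) == 0:
        raise ValueError("need at least one enrollment embedding")
    return l2_normalize(np.mean(np.stack(embeddings), axis=0))


def verify(embedding, fingerprint, threshold=THRESHOLD):
    """Return (accepted, score); fail-open when no fingerprint exists."""
    if fingerprint is None:
        return True, 0.0
    # Both vectors have unit length, so the dot product is the cosine similarity.
    score = float(np.dot(l2_normalize(embedding), fingerprint))
    return score >= threshold, score
```

Because the fingerprint is stored with unit length, verification reduces to a single dot product per utterance. The fail-open branch mirrors the behavior stated above: without an enrolled fingerprint every speaker is accepted with a score of 0.0.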
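The commit message notes that enrollment runs in the executor because Torch would otherwise block the event loop. The sketch below shows that pattern in isolation. `slow_enroll` is a hypothetical stand-in for the blocking model call, and the heartbeat task exists only to show that other coroutines keep running in the meantime.

```python
import asyncio
import time


def slow_enroll(samples):
    """Hypothetical blocking call, standing in for Torch inference."""
    time.sleep(0.2)
    return {"sample_count": len(samples)}


async def handle_enroll(samples):
    """Offload the blocking call to the default thread pool executor."""
    loop = asyncio.get_running_loop()
    try:
        result = await loop.run_in_executor(None, slow_enroll, samples)
    except Exception as exc:
        # Report a truncated error instead of crashing the message loop.
        return {"ok": False, "error": str(exc)[:300]}
    return {"ok": True, **result}


async def heartbeat(ticks):
    """Keeps ticking while the executor thread is busy."""
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def main():
    ticks = []
    response, _ = await asyncio.gather(
        handle_enroll(["a", "b"]), heartbeat(ticks)
    )
    return response, len(ticks)
```

Calling `slow_enroll` directly inside the coroutine would stall the heartbeat for the whole duration of the call. With `run_in_executor` the event loop stays free to serve other WebSocket messages.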
@@ -33,6 +33,8 @@ import sys
 import tempfile
 import time
 from dataclasses import dataclass, field
+
+import speaker_id
 from typing import Optional
 
 import numpy as np
@@ -729,6 +731,52 @@ async def run_loop(runner: WhisperRunner, sessions: SessionManager) -> None:
                 f"received id={req_id[:12]} reason={payload.get('reason', '')}")
     sessions.end_session(req_id)
 
+elif mtype == "voice_id_status_request":
+    req_id = payload.get("requestId", "")
+    try:
+        status = speaker_id.status()
+    except Exception as exc:
+        await _send(ws, "voice_id_status_response", {
+            "requestId": req_id, "ok": False, "error": str(exc)[:200],
+        })
+        continue
+    await _send(ws, "voice_id_status_response", {
+        "requestId": req_id, "ok": True, **status,
+    })
+
+elif mtype == "voice_id_enroll_request":
+    # samples: list of base64-encoded int16-LE PCM buffers,
+    # 16 kHz mono, ~3-5 s each. The app records them one after
+    # another and sends them together.
+    req_id = payload.get("requestId", "")
+    samples = payload.get("samples") or []
+    logger.info("voice_id_enroll_request: %d Samples (id=%s)",
+                len(samples), req_id[:8])
+    try:
+        result = await asyncio.get_running_loop().run_in_executor(
+            None, speaker_id.enroll_from_samples, samples
+        )
+    except Exception as exc:
+        logger.warning("voice_id_enroll failed: %s", exc)
+        await _send(ws, "voice_id_enroll_response", {
+            "requestId": req_id, "ok": False, "error": str(exc)[:300],
+        })
+        continue
+    await _send(ws, "voice_id_enroll_response", {
+        "requestId": req_id, "ok": True,
+        "sample_count": result.get("sample_count", 0),
+        "rejected": result.get("rejected", []),
+        "updated_at": result.get("updated_at"),
+        "embedding_dim": result.get("embedding_dim"),
+    })
+
+elif mtype == "voice_id_delete_request":
+    req_id = payload.get("requestId", "")
+    removed = speaker_id.delete_fingerprint()
+    await _send(ws, "voice_id_delete_response", {
+        "requestId": req_id, "ok": True, "removed": removed,
+    })
+
 elif mtype == "config":
     # Debug toggle: aria-bridge now broadcasts whisperDebugLog
     # so that Stefan can, at runtime via the Diagnostic settings
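The enroll handler expects `samples` as a list of base64-encoded int16 little-endian PCM buffers at 16 kHz mono. A client could build such a payload, and the backend could decode it, roughly as sketched here. The helper names are hypothetical, the scaling constants are a common convention rather than something the commit specifies, and the outer message envelope around the payload is not shown because it does not appear in this diff.

```python
import base64

import numpy as np

SAMPLE_RATE = 16_000  # 16 kHz mono, per the handler comment


def encode_sample(audio):
    """Float audio in [-1, 1] -> base64 string of int16-LE PCM."""
    clipped = np.clip(np.asarray(audio, dtype=np.float32), -1.0, 1.0)
    pcm = (clipped * 32767.0).astype("<i2")  # "<i2" = little-endian int16
    return base64.b64encode(pcm.tobytes()).decode("ascii")


def decode_sample(b64):
    """Base64 int16-LE PCM -> float32 audio in [-1, 1)."""
    raw = base64.b64decode(b64, validate=True)
    if len(raw) % 2:
        raise ValueError("PCM payload has an odd byte count")
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0


def build_enroll_payload(request_id, clips):
    """Payload fields read by the voice_id_enroll_request handler."""
    return {
        "requestId": request_id,
        "samples": [encode_sample(clip) for clip in clips],
    }
```

A 3 s clip at this rate is 48,000 samples, or 96,000 bytes before base64 encoding, so a handful of enrollment clips fits comfortably into a single message.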