6e19adab87
Speaker-ID module (Hermes-style "real conversation without wake word" vision, phase 1 of 5).

Recognizes Stefan's voice via a 192-dim embedding and a cosine match against a persisted fingerprint.

Modules:
- speaker_id.py: lazy-loaded ECAPA-TDNN (HuggingFace), enroll/verify/status/delete. Fingerprint = L2-normalized mean of N enrollment samples, stored in /voice-id/fingerprint.json. Fail-open: no fingerprint → verify() returns (True, 0.0).
- bridge.py: 3 message handlers: voice_id_status_request, voice_id_enroll_request (samples[]: base64 16 kHz int16 PCM), voice_id_delete_request. Enrollment runs in the executor (otherwise Torch blocks the event loop).
- Dockerfile: torch 2.3.1 + torchaudio with CUDA 12.1 wheels (otherwise speechbrain pulls in CPU-only Torch). Container grows by ~1 GB.
- docker-compose.yml: ./voice-id:/voice-id bind mount for fingerprint persistence (survives container restarts).
- rvs/server.js: 6 new message types in ALLOWED_TYPES.

Phase 2 (next): app enrollment flow + diagnostic Voice-ID section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
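The fingerprint and verify logic described in the commit message can be illustrated with a minimal standard-library sketch. This is not the contents of speaker_id.py: the helper names (`decode_pcm16`, `l2_normalize`, `build_fingerprint`), the `threshold` value of 0.5, and the choice to average raw embeddings before normalizing are all assumptions made for illustration, and the ECAPA-TDNN embedding step is left out entirely.

```python
import base64
import math
import sys
from array import array


def decode_pcm16(b64: str) -> list[float]:
    """Decode base64 little-endian int16 PCM into floats in [-1.0, 1.0)."""
    pcm = array("h")
    pcm.frombytes(base64.b64decode(b64))
    if sys.byteorder == "big":
        pcm.byteswap()  # array("h") uses native byte order
    return [s / 32768.0 for s in pcm]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_fingerprint(embeddings: list[list[float]]) -> list[float]:
    """L2-normalized mean of N enrollment embeddings (assumed ordering)."""
    if not embeddings:
        raise ValueError("need at least one enrollment sample")
    n = len(embeddings)
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / n for i in range(dim)]
    return l2_normalize(mean)


def verify(embedding, fingerprint, threshold=0.5):
    """Return (accepted, cosine_score); threshold is a hypothetical value."""
    if fingerprint is None:
        # Fail-open, as in the commit message: no fingerprint accepts anyone.
        return True, 0.0
    # The fingerprint is already unit length, so the dot product with the
    # normalized embedding equals the cosine similarity.
    score = sum(a * b for a, b in zip(l2_normalize(embedding), fingerprint))
    return score >= threshold, score
```

Because the stored fingerprint is unit length, verification reduces to a single dot product per utterance. The fail-open branch keeps the system usable before any enrollment has happened, at the cost of accepting every speaker until a fingerprint exists.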
23 lines
649 B
Docker
FROM nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip ffmpeg git \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install the PyTorch CUDA wheels first (otherwise speechbrain pulls in
# CPU-only Torch if f5tts has not seeded the cache yet).
RUN pip3 install --no-cache-dir torch==2.3.1 torchaudio==2.3.1 \
    --index-url https://download.pytorch.org/whl/cu121

COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY bridge.py speaker_id.py ./

CMD ["python3", "bridge.py"]