Compare commits

17 Commits

| SHA1 |
|---|
| a4d3449e3a |
| 44d2c6b4fe |
| 0309c95aa5 |
| 2aa2cc70c9 |
| 9d0776c819 |
| f031fa159e |
| be373466a3 |
| bbf9aed3ba |
| 745b4a07c0 |
| 23ca815cb2 |
| cc3fac8142 |
| cd89e36ec2 |
| f5b4285d15 |
| 248e7c9ae4 |
| 7058cc8d8d |
| 7919489543 |
| feac7f2479 |
@@ -380,6 +380,7 @@ API endpoint for other services: `GET http://localhost:3001/api/session`
- Text chat with ARIA
- **Voice recording**: push-to-talk (hold) or tap-to-talk (tap; auto-stop on silence)
- **Conversation mode** (ear button): after every ARIA reply, recording starts automatically, so the conversation goes back and forth naturally
- **Wake word** (optional, Picovoice Porcupine on-device): "Jarvis", "Computer", etc. The microphone listens passively and the conversation starts when the keyword is spoken. Custom wake words can be created in the Picovoice Console. Without an API key, the ear button falls back to direct recording.
- **VAD (Voice Activity Detection)**: configurable silence tolerance (1.0–8.0 s, default 2.8 s) before auto-stop kicks in. Maximum recording length is 120 s.
- **Speech gate**: the recording is discarded if no speech was detected
- **STT (speech-to-text)**: 16 kHz mono → bridge → Gamebox Whisper (CUDA) → text in the chat. Close to real time.
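The VAD auto-stop in the feature list reduces to two timeouts checked on a timer: silence since the last detected speech, and total recording time. The following is a minimal TypeScript sketch, not code from this commit; the helper names `clampSilenceMs` and `autoStopReason` are hypothetical, and only the numbers (1.0–8.0 s range, 2.8 s default, 120 s cap) come from the README.

```typescript
// Hypothetical helpers; only the numeric limits are taken from the README.
const VAD_SILENCE_MIN_MS = 1000;
const VAD_SILENCE_MAX_MS = 8000;
const VAD_SILENCE_DEFAULT_MS = 2800;
const MAX_RECORDING_MS = 120_000;

type StopReason = 'silence' | 'max-duration' | null;

/** Clamp a configured silence tolerance into 1.0–8.0 s; fall back to 2.8 s. */
function clampSilenceMs(value: number | undefined): number {
  if (value === undefined || !Number.isFinite(value)) {
    return VAD_SILENCE_DEFAULT_MS;
  }
  return Math.min(VAD_SILENCE_MAX_MS, Math.max(VAD_SILENCE_MIN_MS, value));
}

/** Decide whether a running recording should stop at time `now` (all values in ms). */
function autoStopReason(
  now: number,
  recordingStart: number,
  lastSpeechTime: number,
  silenceMs: number,
): StopReason {
  // The hard cap wins over the silence check.
  if (now - recordingStart >= MAX_RECORDING_MS) return 'max-duration';
  if (now - lastSpeechTime >= silenceMs) return 'silence';
  return null;
}
```

In the app, a check like this would run inside the 200 ms `setInterval` (`vadTimer`) that appears further down in this diff, with `lastSpeechTime` presumably refreshed by the 100 ms metering callback whenever speech is detected.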
@@ -398,6 +399,49 @@ API endpoint for other services: `GET http://localhost:3001/api/session`
- GPS position (optional)
- QR code scanner for token pairing

### Setting up the wake word (Picovoice Porcupine)

The wake word runs entirely **on-device** in the app; no audio leaves your phone
for detection. Picovoice currently offers a **7-day free trial** with no credit card
and no auto-renewal; after that it is paid (see [picovoice.ai/pricing](https://picovoice.ai/pricing)).
If you want to skip the wake word: the ear button also works without an AccessKey
(direct recording instead of passive listening; see below).

**1) Get an AccessKey** (one-time, ~2 minutes):

1. Register at [console.picovoice.ai](https://console.picovoice.ai) (email + password, no credit card needed for the trial).
2. After logging in, copy the **AccessKey** from the dashboard (a long Base64 string).

**2) Enter the AccessKey in the app:**

- App → **Einstellungen** (Settings) → **Wake-Word** section
- Paste the AccessKey and choose a **Keyword** (default: `jarvis`)
- Save; the app initializes Porcupine automatically

**Built-in keywords** (available immediately, no training needed):
`jarvis`, `computer`, `picovoice`, `porcupine`, `bumblebee`, `terminator`,
`alexa`, `hey google`, `ok google`, `hey siri`

**3) Create your own wake word** ("ARIA", "Hey Stefan", whatever you like):

1. [console.picovoice.ai](https://console.picovoice.ai) → **Porcupine** → **Train Wake Word**
2. Enter the word (e.g. `ARIA`), choose the language `German` and the platform `Android`
3. Press **Train**; Picovoice trains the model in ~1–2 minutes
4. Download the resulting `.ppn` file
5. *(Custom upload in the app is Phase 2; currently only the built-in keywords are supported.
   `.ppn` files can already be placed in the app bundle manually; the UI
   for this arrives with the next Diagnostic update.)*

**Usage:**
- Tap the **ear button (👂)** in the status bar → the wake word is armed and the app listens passively
- Say the wake word → the icon switches to 🎙️ and the normal conversation runs
- After each ARIA reply the mic opens again; silence → back to 👂
- Tap again → ear off (🔇)

**Without an AccessKey:** the ear button starts direct recording instead (the mic
is active immediately, no passive listening). This is a valid mode too, just without
hands-free activation via a keyword.
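The ear-button behaviour described above is a small three-state machine; the state names `off`, `armed` and `conversing` are the ones `ChatScreen` uses further down in this diff. Below is a TypeScript sketch that models the transitions as a pure function. The event names and `nextState` are hypothetical, and `hasPorcupine` stands for "an AccessKey is set and Porcupine initialized successfully".

```typescript
type WakeWordState = 'off' | 'armed' | 'conversing';
type WakeWordEvent = 'toggle' | 'wakeDetected' | 'conversationEnded';

/** Hypothetical transition function for the ear button. */
function nextState(state: WakeWordState, event: WakeWordEvent, hasPorcupine: boolean): WakeWordState {
  if (event === 'toggle') {
    // Tapping while listening turns the ear off.
    if (state !== 'off') return 'off';
    // Without Porcupine, the button falls back to direct recording.
    return hasPorcupine ? 'armed' : 'conversing';
  }
  if (event === 'wakeDetected' && state === 'armed') return 'conversing';
  if (event === 'conversationEnded' && state === 'conversing') {
    // With Porcupine: back to passive listening. Without: mic off.
    return hasPorcupine ? 'armed' : 'off';
  }
  return state; // every other combination leaves the state unchanged
}

/** Button icon per state, as in the ChatScreen change. */
const ICONS: Record<WakeWordState, string> = { off: '🔇', armed: '👂', conversing: '🎙️' };
```

Keeping the transitions in one pure function makes the fallback paths (no AccessKey, failed `start()`) easy to test without a microphone.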

### Initial setup (dev machine, one-time)

```bash
@@ -744,8 +788,9 @@ docker exec aria-core ssh aria-wohnung hostname
- **Proxy cold start**: every message spawns a new `claude --print` process.
As a result, ARIA is slower than using the Claude CLI directly. The timeout is set to 900 s (15 min).
- **No streaming to the app**: the app only shows the finished reply, no streaming tokens.
- **Wake word only on the VM**: the bridge listens for "ARIA" through the VM's local microphone.
The app uses energy-based detection (Phase 1). An on-device "ARIA" keyword (Porcupine) is Phase 2.
- **Wake word in the app: built-in keywords only**: `jarvis`, `computer`, etc. work
immediately; custom wake words (`.ppn` from the Picovoice Console) currently still have to be added
to the app bundle manually. The upload UI in Diagnostic is Phase 2.
- **Audio format**: the app records AAC/MP4; the bridge converts it to 16 kHz PCM via FFmpeg.
- **RVS zombie connections**: WebSocket connections occasionally die without an error message.
The bridge has a ping check (5 s); Diagnostic uses a fresh connection per request.
@@ -800,6 +845,7 @@ docker exec aria-core ssh aria-wohnung hostname
- [x] Audio pause instead of ducking (TRANSIENT instead of MAY_DUCK) + release timing fix
- [x] VAD silence tolerance and maximum recording length configurable (1-8 s, 120 s)
- [x] Disk-full banner in Diagnostic with copyable cleanup commands
- [x] Porcupine wake word on-device in the app (built-in keywords + state icon)

### Phase 2: ARIA goes into production

@@ -815,5 +861,5 @@ docker exec aria-core ssh aria-wohnung hostname
- [ ] STARFACE telephony skill
- [ ] Desktop client (Tauri)
- [ ] bKVM remote IT support
- [ ] Porcupine wake word (on-device "ARIA" in the app)
- [ ] Custom `.ppn` upload for wake words via Diagnostic (your own trigger words)
- [ ] Claude Vision directly (image analysis without the file-path detour)

@@ -79,8 +79,8 @@ android {
applicationId "com.ariacockpit"
minSdkVersion rootProject.ext.minSdkVersion
targetSdkVersion rootProject.ext.targetSdkVersion
versionCode 508
versionName "0.0.5.8"
versionCode 605
versionName "0.0.6.5"
// Fallback for libraries with product flavors
missingDimensionStrategy 'react-native-camera', 'general'
}

@@ -32,11 +32,17 @@ class PcmStreamPlayerModule(reactContext: ReactApplicationContext) : ReactContex
private const val TAG = "PcmStreamPlayer"
// Fallback if JS does not pass a value.
private const val DEFAULT_PREROLL_SECONDS = 3.5
private const val MIN_PREROLL_SECONDS = 0.5
// 0.0 = immediate playback: play() right at the first chunk.
// Makes sense for F5-TTS because rendering is so fast that a buffer
// is unnecessary and can even get in the way for short sentences.
private const val MIN_PREROLL_SECONDS = 0.0
private const val MAX_PREROLL_SECONDS = 10.0
// Silence at the start of the stream so AudioTrack spins up cleanly and the
// first samples are not cut off (XTTS warm-up + play() latency).
private const val LEADING_SILENCE_SECONDS = 0.2
private const val LEADING_SILENCE_SECONDS = 0.3
// Silence at the end: covers the hardware flush so the last
// real samples are guaranteed to play out before stop() is called.
private const val TRAILING_SILENCE_SECONDS = 0.3
}

override fun getName() = "PcmStreamPlayer"
@@ -59,9 +65,12 @@ class PcmStreamPlayerModule(reactContext: ReactApplicationContext) : ReactContex
// Stop the previous session, if any
stopInternal()

val prerollSec = prerollSeconds
.coerceIn(MIN_PREROLL_SECONDS, MAX_PREROLL_SECONDS)
.let { if (it.isFinite() && it > 0) it else DEFAULT_PREROLL_SECONDS }
// Only NaN/Inf → default. 0.0 is valid (= immediate playback).
val prerollSec = if (prerollSeconds.isFinite() && prerollSeconds >= 0.0) {
prerollSeconds.coerceIn(MIN_PREROLL_SECONDS, MAX_PREROLL_SECONDS)
} else {
DEFAULT_PREROLL_SECONDS
}

val channelConfig = if (channels == 2) AudioFormat.CHANNEL_OUT_STEREO else AudioFormat.CHANNEL_OUT_MONO
val encoding = AudioFormat.ENCODING_PCM_16BIT
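The new `prerollSec` logic changes the evaluation order. The old version clamped first and then rejected anything not `> 0`, so `0.0` could never reach the player: with the old `MIN_PREROLL_SECONDS = 0.5` it became 0.5, and with a minimum of 0.0 it would have fallen back to the 3.5 s default. The new version validates first and clamps afterwards, so `0.0` survives as "play immediately". A TypeScript transcription of both, for illustration only (function names are hypothetical; the constants mirror the Kotlin companion object):

```typescript
// Constants mirror the Kotlin companion object in this diff.
const DEFAULT_PREROLL_SECONDS = 3.5;
const MIN_PREROLL_SECONDS = 0.0;
const MAX_PREROLL_SECONDS = 10.0;

const clamp = (x: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, x));

/** New order: reject NaN, Infinity and negatives first, then clamp. 0 stays 0. */
function resolvePreroll(seconds: number): number {
  if (!Number.isFinite(seconds) || seconds < 0) return DEFAULT_PREROLL_SECONDS;
  return clamp(seconds, MIN_PREROLL_SECONDS, MAX_PREROLL_SECONDS);
}

/** Old order: clamp first, then reject anything that is not finite and > 0. */
function legacyResolvePreroll(seconds: number, minSeconds: number): number {
  const clamped = clamp(seconds, minSeconds, MAX_PREROLL_SECONDS);
  return Number.isFinite(clamped) && clamped > 0 ? clamped : DEFAULT_PREROLL_SECONDS;
}
```

One side effect worth knowing: `Infinity` used to be clamped to 10 s, while the new code maps it to the default.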
@@ -103,9 +112,9 @@ class PcmStreamPlayerModule(reactContext: ReactApplicationContext) : ReactContex
val t = track ?: return@Thread
try {
// Leading silence into the buffer gives AudioTrack time to spin up.
val silenceBytes = ((sampleRate * channels * 2) * LEADING_SILENCE_SECONDS).toInt() and 0x7FFFFFFE
if (silenceBytes > 0) {
val silence = ByteArray(silenceBytes)
val leadingBytes = ((sampleRate * channels * 2) * LEADING_SILENCE_SECONDS).toInt() and 0x7FFFFFFE
if (leadingBytes > 0) {
val silence = ByteArray(leadingBytes)
var silOff = 0
while (silOff < silence.size && !writerShouldStop) {
val w = t.write(silence, silOff, silence.size - silOff)
@@ -114,8 +123,23 @@ class PcmStreamPlayerModule(reactContext: ReactApplicationContext) : ReactContex
}
bytesBuffered += silence.size
}
while (!writerShouldStop) {
val data = queue.poll(50, java.util.concurrent.TimeUnit.MILLISECONDS) ?: run {
// With preroll=0: call play() IMMEDIATELY after the leading silence,
// not only when the first real chunk arrives. Android's
// AudioTrack stays in the play state and waits for new samples.
// That way it does not swallow words when the first chunk only
// arrives after the play() start-up latency.
if (prerollBytes == 0 && !playbackStarted) {
try {
t.play()
playbackStarted = true
Log.i(TAG, "Playback sofort gestartet (preroll=0, ${bytesBuffered}B silence)")
} catch (e: Exception) {
Log.w(TAG, "play() sofort failed: ${e.message}")
}
}
mainLoop@ while (!writerShouldStop) {
val data = queue.poll(50, java.util.concurrent.TimeUnit.MILLISECONDS)
if (data == null) {
if (endRequested) {
// If we end before the pre-roll is reached (short text): play anyway
if (!playbackStarted) {
@@ -127,10 +151,10 @@ class PcmStreamPlayerModule(reactContext: ReactApplicationContext) : ReactContex
Log.w(TAG, "play() fallback failed: ${e.message}")
}
}
return@Thread
break@mainLoop
}
null
} ?: continue
continue@mainLoop
}

// Pre-roll check: call play() only once enough data is buffered
if (!playbackStarted && bytesBuffered + data.size >= prerollBytes) {
@@ -151,6 +175,19 @@ class PcmStreamPlayerModule(reactContext: ReactApplicationContext) : ReactContex
}
bytesBuffered += data.size
}
// Trailing silence so the last real samples are guaranteed to make it
// through the hardware buffering before stop() cuts them off
val trailingBytes = ((sampleRate * channels * 2) * TRAILING_SILENCE_SECONDS).toInt() and 0x7FFFFFFE
if (trailingBytes > 0 && !writerShouldStop) {
val silence = ByteArray(trailingBytes)
var silOff = 0
while (silOff < silence.size && !writerShouldStop) {
val w = t.write(silence, silOff, silence.size - silOff)
if (w <= 0) break
silOff += w
}
bytesBuffered += silence.size
}
} catch (e: Exception) {
Log.w(TAG, "Writer-Thread Fehler: ${e.message}")
} finally {

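The leading and trailing silence sizes are computed as `sampleRate × channels × 2 bytes × seconds`, truncated to an `Int`, and masked with `0x7FFFFFFE` so the byte count is even (one 16-bit sample is 2 bytes). As a worked example, assuming a 24 kHz mono stream purely for illustration (the TTS sample rate is not stated here): 0.3 s gives 24 000 × 1 × 2 × 0.3 = 14 400 bytes. The TypeScript sketch below mirrors that expression; the second function is a variation, not what the diff does: it rounds down to whole frames of `channels × 2` bytes, which also keeps stereo streams frame-aligned, where an even count alone is not sufficient.

```typescript
/** Mirrors the Kotlin expression: truncate like toInt(), then clear bit 0 (and the sign bit). */
function silenceBytesEven(sampleRate: number, channels: number, seconds: number): number {
  const raw = Math.trunc(sampleRate * channels * 2 * seconds);
  return raw & 0x7ffffffe;
}

/** Variation: round down to whole 16-bit PCM frames (channels * 2 bytes each). */
function silenceBytesFrameAligned(sampleRate: number, channels: number, seconds: number): number {
  const frameBytes = channels * 2;
  const raw = Math.trunc(sampleRate * frameBytes * seconds);
  return raw - (raw % frameBytes);
}
```

For 22 050 Hz stereo and 0.25 s, for example, the even mask yields 22 050 bytes (5 512.5 frames), while the frame-aligned version yields 22 048.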
@@ -1,6 +1,6 @@
{
"name": "aria-cockpit",
"version": "0.0.5.8",
"version": "0.0.6.5",
"private": true,
"scripts": {
"android": "react-native run-android",

@@ -72,13 +72,28 @@ interface Props {
const MessageText: React.FC<Props> = ({ text, style }) => {
const segments = React.useMemo(() => tokenize(text), [text]);
return (
<Text style={style} selectable>
<Text
style={style}
selectable
// dataDetectorType is Android-only and additionally makes phone numbers, URLs
// and emails clickable via system detection, as a fallback in case our
// regex tokens don't match.
dataDetectorType="all"
>
{segments.map((seg, i) => {
if (seg.kind === 'text') {
return <Text key={i}>{seg.text}</Text>;
return <Text key={i} selectable>{seg.text}</Text>;
}
return (
<Text key={i} style={LINK_STYLE} onPress={() => onPress(seg)}>
<Text
key={i}
selectable
style={LINK_STYLE}
onPress={() => onPress(seg)}
// Long press should pass through to the parent for text selection
onLongPress={undefined}
suppressHighlighting={false}
>
{seg.text}
</Text>
);

@@ -104,6 +104,8 @@ const ChatScreen: React.FC = () => {
const [showCameraUpload, setShowCameraUpload] = useState(false);
const [gpsEnabled, setGpsEnabled] = useState(false);
const [wakeWordActive, setWakeWordActive] = useState(false);
// Exact state (off/armed/conversing) for UI feedback on the button
const [wakeWordState, setWakeWordState] = useState<'off' | 'armed' | 'conversing'>('off');
const [fullscreenImage, setFullscreenImage] = useState<string | null>(null);
const [searchQuery, setSearchQuery] = useState('');
const [searchVisible, setSearchVisible] = useState(false);
@@ -154,6 +156,11 @@ const ChatScreen: React.FC = () => {
// Wake word: load once and prepare Porcupine (if an access key is set)
useEffect(() => {
wakeWordService.loadFromStorage().catch(() => {});
const unsub = wakeWordService.onStateChange((s) => {
setWakeWordState(s);
setWakeWordActive(s !== 'off');
});
return () => unsub();
}, []);

// Keep ttsCanPlayRef up to date: the closure in onMessage below reads
@@ -263,15 +270,22 @@ const ChatScreen: React.FC = () => {
if (message.type === 'chat') {
const sender = (message.payload.sender as string) || '';

// STT result: write the transcribed text into the voice bubble
// STT result: write the transcribed text into the voice bubble.
// IMPORTANT: match only the FIRST still-unresolved recording; otherwise,
// with two recordings sent in quick succession, both bubbles would get
// the same text (bug: the second result overwrote the first).
if (sender === 'stt') {
const sttText = (message.payload.text as string) || '';
if (sttText) {
setMessages(prev => prev.map(m =>
m.sender === 'user' && m.text.includes('Spracheingabe wird verarbeitet')
? { ...m, text: `\uD83C\uDFA4 ${sttText}` }
: m
));
setMessages(prev => {
const idx = prev.findIndex(m =>
m.sender === 'user' && m.text.includes('Spracheingabe wird verarbeitet')
);
if (idx < 0) return prev;
const next = prev.slice();
next[idx] = { ...next[idx], text: `\uD83C\uDFA4 ${sttText}` };
return next;
});
}
return;
}
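The STT fix replaces a `map`, which rewrote every pending voice bubble, with a `findIndex` that resolves only the oldest one. The self-contained TypeScript illustration below shows the difference with two recordings in flight. The `ChatMessage` shape is hypothetical; the placeholder text and the 🎤 prefix (`\uD83C\uDFA4`) come from the diff. It assumes STT results arrive in the same order the recordings were sent.

```typescript
// Hypothetical message shape; the app's real type has more fields.
interface ChatMessage {
  id: number;
  sender: 'user' | 'aria';
  text: string;
}

const PENDING = 'Spracheingabe wird verarbeitet';

/** Old behaviour: every pending bubble receives the same transcript. */
function resolveAll(messages: ChatMessage[], stt: string): ChatMessage[] {
  return messages.map(m =>
    m.sender === 'user' && m.text.includes(PENDING) ? { ...m, text: `🎤 ${stt}` } : m,
  );
}

/** Fixed behaviour: only the first (oldest) pending bubble is resolved. */
function resolveFirst(messages: ChatMessage[], stt: string): ChatMessage[] {
  const idx = messages.findIndex(m => m.sender === 'user' && m.text.includes(PENDING));
  if (idx < 0) return messages; // nothing pending: keep the same array reference
  const next = messages.slice();
  next[idx] = { ...next[idx], text: `🎤 ${stt}` };
  return next;
}
```

Returning the unchanged array when nothing is pending also lets React skip the re-render.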
@@ -572,6 +586,8 @@ const ChatScreen: React.FC = () => {
};
setMessages(prev => capMessages([...prev, userMsg]));

console.log('[Chat] sende mit voice=%s speed=%s',
localXttsVoiceRef.current || '(default)', ttsSpeedRef.current);
// Send to RVS with the device-local voice (the bridge uses it for the reply)
rvs.send('chat', {
text,
@@ -1000,7 +1016,10 @@ const ChatScreen: React.FC = () => {
style={[styles.wakeWordBtn, wakeWordActive && styles.wakeWordBtnActive]}
onPress={toggleWakeWord}
>
<Text style={styles.wakeWordIcon}>{wakeWordActive ? '👂' : '🔇'}</Text>
<Text style={styles.wakeWordIcon}>
{wakeWordState === 'conversing' ? '🎙️' :
wakeWordState === 'armed' ? '👂' : '🔇'}
</Text>
</TouchableOpacity>
</>
)}

@@ -143,7 +143,7 @@ const MAX_RECORDING_MS = 120000;
// Pre-roll: how long audio sits in the AudioTrack buffer before play() starts.
// Configurable via Diagnostic/Settings (key: aria_tts_preroll_sec).
export const TTS_PREROLL_DEFAULT_SEC = 3.5;
export const TTS_PREROLL_MIN_SEC = 1.0;
export const TTS_PREROLL_MIN_SEC = 0; // 0 = play immediately (F5-TTS is fast enough)
export const TTS_PREROLL_MAX_SEC = 6.0;
export const TTS_PREROLL_STORAGE_KEY = 'aria_tts_preroll_sec';

@@ -191,6 +191,13 @@ class AudioService {
private pcmBytesCollected: number = 0;
private readonly PCM_MAX_CACHE_BYTES = 30 * 1024 * 1024; // 30MB

// AudioFocus is released with a delay: if ARIA sends a second reply
// right afterwards (or a new stream starts), Spotify stays
// paused. Without this delay, Spotify briefly resumes in the tiny gap
// between two streams.
private focusReleaseTimer: ReturnType<typeof setTimeout> | null = null;
private readonly FOCUS_RELEASE_DELAY_MS = 800;

// VAD State
private vadEnabled: boolean = false;
private lastSpeechTime: number = 0;
@@ -205,6 +212,24 @@ class AudioService {
this.recorder.setSubscriptionDuration(0.1); // 100 ms metering updates
}

/** Release AudioFocus after a short delay; otherwise Spotify/YouTube
* briefly resume in the gap between two TTS streams (or when ARIA
* sends a second reply right afterwards). */
private _releaseFocusDeferred(): void {
this._cancelDeferredFocusRelease();
this.focusReleaseTimer = setTimeout(() => {
this.focusReleaseTimer = null;
AudioFocus?.release().catch(() => {});
}, this.FOCUS_RELEASE_DELAY_MS);
}

private _cancelDeferredFocusRelease(): void {
if (this.focusReleaseTimer) {
clearTimeout(this.focusReleaseTimer);
this.focusReleaseTimer = null;
}
}

// --- Permissions ---

async requestMicrophonePermission(): Promise<boolean> {
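The `_releaseFocusDeferred` / `_cancelDeferredFocusRelease` pair is a trailing debounce: every release request restarts an 800 ms timer, and any new recording or stream cancels it before it fires. A standalone TypeScript sketch of the same pattern, with the `AudioFocus` call replaced by an injected callback (the class name `DeferredRelease` is hypothetical):

```typescript
/** Runs `release` once, `delayMs` after the last schedule(), unless cancelled first. */
class DeferredRelease {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly release: () => void;
  private readonly delayMs: number;

  constructor(release: () => void, delayMs: number) {
    this.release = release;
    this.delayMs = delayMs;
  }

  /** (Re)start the countdown; an earlier pending release is dropped. */
  schedule(): void {
    this.cancel();
    this.timer = setTimeout(() => {
      this.timer = null;
      this.release();
    }, this.delayMs);
  }

  /** Call when a new stream or recording needs the focus again. */
  cancel(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }

  get pending(): boolean {
    return this.timer !== null;
  }
}
```

Mapped onto `AudioService`: `schedule()` where the diff calls `_releaseFocusDeferred()`, `cancel()` right before `requestDuck()` / `requestExclusive()`, and `cancel()` followed by an immediate release when the user stops playback explicitly.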
@@ -305,6 +330,7 @@ class AudioService {
this.setState('recording');

// Pause other apps during recording (music, videos, etc.)
this._cancelDeferredFocusRelease();
AudioFocus?.requestExclusive().catch(() => {});

// Enable VAD; the silence duration comes from AsyncStorage (configurable in Settings).
@@ -328,11 +354,12 @@ class AudioService {
};
if (autoStop) {
const vadSilenceMs = await loadVadSilenceMs();
console.log('[Audio] VAD-Stille:', vadSilenceMs, 'ms');
console.log('[Audio] startRecording: autoStop=true, VAD-Stille=%dms, MAX=%dms',
vadSilenceMs, MAX_RECORDING_MS);
this.vadTimer = setInterval(() => {
const silenceDuration = Date.now() - this.lastSpeechTime;
if (silenceDuration >= vadSilenceMs) {
fireSilenceOnce(`VAD ${silenceDuration}ms Stille`);
fireSilenceOnce(`VAD ${silenceDuration}ms Stille (Schwelle=${vadSilenceMs}ms)`);
}
}, 200);
// Emergency brake: force a stop after MAX_RECORDING_MS
@@ -386,8 +413,9 @@ class AudioService {
await this.recorder.stopRecorder();
this.recorder.removeRecordBackListener();

// Release audio focus: other apps may play again
AudioFocus?.release().catch(() => {});
// Release audio focus with a delay: the TTS reply is about to arrive,
// and Spotify should not come back up in the gap.
this._releaseFocusDeferred();

const durationMs = Date.now() - this.recordingStartTime;
const hadSpeech = this.speechDetected;
@@ -459,7 +487,13 @@ class AudioService {

/** Receive one PCM chunk from an audio_pcm message.
* silent=true → only cache, do not play (e.g. when TTS is muted on this device).
* With final=true, returns the cache path (file://) or '' if nothing was cached. */
* With final=true, returns the cache path (file://) or '' if nothing was cached.
*
* The wrapper serializes consecutive chunk calls via a promise queue;
* otherwise short streams had a race: the final chunk could call `end()`
* BEFORE the previous `start()` in the native module had finished. The writer
* thread then saw endRequested=true without ever processing a chunk. */
private _pcmChunkQueue: Promise<any> = Promise.resolve();
async handlePcmChunk(payload: {
base64: string;
sampleRate?: number;
@@ -468,6 +502,24 @@ class AudioService {
chunk?: number;
final?: boolean;
silent?: boolean;
}): Promise<string> {
const p = this._pcmChunkQueue.then(() => this._handlePcmChunkImpl(payload)).catch(err => {
console.warn('[Audio] handlePcmChunk queued err:', err);
return '';
});
// Chain only on the side effect — callers still get the per-call result
this._pcmChunkQueue = p;
return p;
}

private async _handlePcmChunkImpl(payload: {
base64: string;
sampleRate?: number;
channels?: number;
messageId?: string;
chunk?: number;
final?: boolean;
silent?: boolean;
}): Promise<string> {
const silent = !!payload.silent;
if (!silent && !PcmStreamPlayer) {
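The `_pcmChunkQueue` wrapper serializes async calls by chaining each one onto the previous promise, and the `.catch` keeps one failed chunk from blocking every later one. The same idea as a generic TypeScript helper (`SerialQueue` is a hypothetical name); the test scenario mirrors the race described in the doc comment, where a slow `start` must finish before a fast `end` runs:

```typescript
/** Runs async tasks strictly one after another, in call order. */
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>, fallback: T): Promise<T> {
    const result = this.tail
      .then(() => task())
      .catch((err: unknown) => {
        // Log and substitute a fallback so the chain never stays rejected.
        console.warn('[SerialQueue] task failed:', err);
        return fallback;
      });
    this.tail = result;
    return result;
  }
}
```

Each caller still receives its own task's result, while the shared `tail` only provides ordering, the same split as `p` versus `_pcmChunkQueue` in the diff.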
@@ -510,6 +562,7 @@ class AudioService {
this.pcmStreamActive = false;
return '';
}
this._cancelDeferredFocusRelease();
AudioFocus?.requestDuck().catch(() => {});
}
}
@@ -528,11 +581,12 @@ class AudioService {
if (isFinal) {
if (!silent) {
// end() now resolves only once the native writer thread has finished
// (all samples played out); only then release AudioFocus,
// so that Spotify/YouTube don't turn back up while the
// pre-roll is still fading out.
// (all samples played out); after that, release AudioFocus with a delay
// so that Spotify/YouTube don't turn back up in the tiny gap between two
// ARIA replies. If a new stream starts within
// FOCUS_RELEASE_DELAY_MS, the release is cancelled.
try { await PcmStreamPlayer!.end(); } catch {}
AudioFocus?.release().catch(() => {});
this._releaseFocusDeferred();
}
this.pcmStreamActive = false;

@@ -636,8 +690,9 @@ class AudioService {
private async _playNext(): Promise<void> {
if (this.audioQueue.length === 0) {
this.isPlaying = false;
// Give up audio focus → other apps back at full volume
AudioFocus?.release().catch(() => {});
// Give up audio focus with a delay → if another reply is about to arrive,
// Spotify stays paused.
this._releaseFocusDeferred();
// All audio parts played → notify listeners
this.playbackFinishedListeners.forEach(cb => cb());
return;
@@ -645,6 +700,7 @@ class AudioService {

// On the first playback start: duck other apps
if (!this.isPlaying) {
this._cancelDeferredFocusRelease();
AudioFocus?.requestDuck().catch(() => {});
}
this.isPlaying = true;
@@ -730,7 +786,8 @@ class AudioService {
this.pcmBytesCollected = 0;
this.pcmMessageId = '';
}
// Release audio focus
// Release audio focus immediately: the user explicitly cancelled
this._cancelDeferredFocusRelease();
AudioFocus?.release().catch(() => {});
}

@@ -29,6 +29,11 @@ class UpdateService {
private downloading = false;

constructor() {
// On startup, clear old APK leftovers from the cache: if this app
// is running, previously downloaded APKs were either already installed
// or incomplete. Otherwise every update leaves 20-30 MB behind on the phone.
this.cleanupOldApks().catch(() => {});

// Listen for update_available messages
rvs.onMessage((msg: RVSMessage) => {
if (msg.type === 'update_available' as any) {
@@ -45,6 +50,30 @@ class UpdateService {
});
}

/** Removes old downloaded APK files from the cache. */
private async cleanupOldApks(): Promise<void> {
try {
const files = await RNFS.readDir(RNFS.CachesDirectoryPath);
const apks = files.filter(f => /\.apk$/i.test(f.name));
let freed = 0;
for (const f of apks) {
try {
const size = parseInt(f.size as any, 10) || 0;
await RNFS.unlink(f.path);
freed += size;
console.log(`[Update] Alte APK geloescht: ${f.name} (${(size / 1024 / 1024).toFixed(1)}MB)`);
} catch (err: any) {
console.warn(`[Update] APK-Loeschen fehlgeschlagen: ${f.name} (${err?.message || err})`);
}
}
if (apks.length > 0) {
console.log(`[Update] Cleanup fertig: ${apks.length} APKs entfernt, ${(freed / 1024 / 1024).toFixed(1)}MB freigegeben`);
}
} catch (err: any) {
console.warn(`[Update] Cleanup-Fehler: ${err?.message || err}`);
}
}

/** Check for an update at app start */
checkForUpdate(): void {
if (this.checking) return;
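`cleanupOldApks` selects cache entries by a case-insensitive `.apk` suffix and reports the freed size in MB. The selection and the size arithmetic can be separated from the RNFS calls, which makes them testable without a device. A hypothetical pure helper in TypeScript: the `CachedFile` shape is an assumption modelled on the fields the diff reads (`name`, `path`, and a `size` that may arrive as a string, hence the `parseInt`). Unlike the diff, which only counts bytes after a successful `unlink`, this helper reports what *would* be freed.

```typescript
// Hypothetical subset of a directory entry, modelled on the fields used in the diff.
interface CachedFile {
  name: string;
  path: string;
  size: number | string;
}

interface ApkCleanupPlan {
  paths: string[];
  bytes: number;
}

/** Pick the APK files to delete and sum their sizes (unparseable sizes count as 0). */
function planApkCleanup(files: CachedFile[]): ApkCleanupPlan {
  const apks = files.filter(f => /\.apk$/i.test(f.name));
  const bytes = apks.reduce((sum, f) => sum + (parseInt(String(f.size), 10) || 0), 0);
  return { paths: apks.map(f => f.path), bytes };
}

/** Same formatting as the diff's log lines: bytes → "x.yMB". */
function formatMb(bytes: number): string {
  return `${(bytes / 1024 / 1024).toFixed(1)}MB`;
}
```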
@@ -111,6 +140,10 @@ class UpdateService {
});
});

// Clear old APKs from the cache before writing, in case several
// updates are downloaded in one session
await this.cleanupOldApks();

// Save the Base64 payload as an APK file
const destPath = `${RNFS.CachesDirectoryPath}/${apkData.fileName}`;
await RNFS.writeFile(destPath, apkData.base64, 'base64');

@@ -17,6 +17,7 @@
*/

import AsyncStorage from '@react-native-async-storage/async-storage';
import { ToastAndroid } from 'react-native';

type WakeWordCallback = () => void;
type StateCallback = (state: WakeWordState) => void;
@@ -80,10 +81,20 @@ class WakeWordService {

// Stop the running instance
await this.disposePorcupine();
if (!this.accessKey) return false;
if (!this.accessKey) {
console.warn('[WakeWord] configure: kein Access Key gesetzt');
return false;
}

// Re-initialize
return this.initPorcupine();
const ok = await this.initPorcupine();
if (!ok) {
ToastAndroid.show(
`Wake-Word "${this.keyword}" konnte nicht initialisiert werden — Logs pruefen`,
ToastAndroid.LONG,
);
}
return ok;
}

private async initPorcupine(): Promise<boolean> {
@@ -117,10 +128,14 @@ class WakeWordService {
this.disposePorcupine().catch(() => {});
},
);
console.log('[WakeWord] Porcupine init OK (keyword=%s)', this.keyword);
console.log('[WakeWord] Porcupine init OK (keyword=%s, manager=%s)',
this.keyword, this.porcupine ? 'created' : 'NULL');
return true;
} catch (err) {
console.warn('[WakeWord] Porcupine init fehlgeschlagen:', err);
} catch (err: any) {
console.warn('[WakeWord] Porcupine init fehlgeschlagen:', err?.message || err);
console.warn('[WakeWord] err details:', JSON.stringify({
name: err?.name, code: err?.code, stack: err?.stack?.slice(0, 200),
}));
this.porcupine = null;
return false;
} finally {
@@ -146,14 +161,27 @@ class WakeWordService {
try {
await this.porcupine.start();
console.log('[WakeWord] armed — warte auf Wake Word "%s"', this.keyword);
ToastAndroid.show(`Lausche auf "${this.keyword}"`, ToastAndroid.SHORT);
this.setState('armed');
return true;
} catch (err) {
console.warn('[WakeWord] Porcupine start fehlgeschlagen — Fallback Direkt-Konversation:', err);
} catch (err: any) {
console.warn('[WakeWord] Porcupine start fehlgeschlagen — Fallback Direkt-Konversation:',
err?.message || err);
ToastAndroid.show(
`Wake-Word-Start failed: ${err?.message || err}`,
ToastAndroid.LONG,
);
}
} else {
// Porcupine not initialized → inform the user explicitly
console.warn('[WakeWord] Porcupine nicht initialisiert — Access Key fehlt? Fallback Direkt-Aufnahme');
ToastAndroid.show(
'Wake-Word nicht aktiv — direkte Aufnahme startet (Mikro hoert mit)',
ToastAndroid.LONG,
);
}
// Fallback: go straight into the conversation
console.log('[WakeWord] Konversation startet sofort (kein Wake-Word)');
// Fallback: go straight into the conversation (mic ACTIVE, not passive)
console.log('[WakeWord] Direkt-Aufnahme startet (kein Wake-Word)');
this.setState('conversing');
setTimeout(() => {
if (this.state === 'conversing') {
@@ -175,6 +203,7 @@ class WakeWordService {
/** Wake word triggered: pause Porcupine, start the conversation. */
private async onWakeDetected(): Promise<void> {
console.log('[WakeWord] Wake-Word "%s" erkannt!', this.keyword);
ToastAndroid.show(`Wake-Word "${this.keyword}" erkannt — sprich jetzt`, ToastAndroid.SHORT);
if (this.porcupine) {
try { await this.porcupine.stop(); } catch {}
}
@@ -197,6 +226,7 @@ class WakeWordService {
try {
await this.porcupine.start();
console.log('[WakeWord] Konversation zu Ende — zurueck zu armed');
ToastAndroid.show(`Lausche wieder auf "${this.keyword}"`, ToastAndroid.SHORT);
this.setState('armed');
return;
} catch (err) {
@@ -204,6 +234,7 @@ class WakeWordService {
}
}
console.log('[WakeWord] Konversation zu Ende — Ohr aus');
ToastAndroid.show('Mikro aus', ToastAndroid.SHORT);
this.setState('off');
}

@@ -942,7 +942,8 @@ class ARIABridge:
},
"timestamp": int(asyncio.get_event_loop().time() * 1000),
})
logger.info("[core] XTTS-Request gesendet (%s): '%s'", xtts_voice or "default", tts_text[:60])
logger.info("[core] XTTS-Request gesendet (voice=%s, speed=%.2fx): '%s'",
xtts_voice or "default", xtts_speed, tts_text[:60])
except Exception as e:
logger.error("[core] XTTS-Request fehlgeschlagen: %s — kein Audio", e)

+34
-1
@@ -145,6 +145,15 @@
</div>
<textarea id="voice-preview-text" rows="4"
style="background:#0D0D1A;border:1px solid #2A2A3E;border-radius:6px;padding:10px;color:#fff;font-size:13px;resize:vertical;"></textarea>

<div style="display:flex;align-items:center;gap:10px;font-size:12px;color:#8888AA;">
<span style="min-width:120px;">Geschwindigkeit:</span>
<button onclick="adjustPreviewSpeed(-0.1)" class="btn secondary" style="padding:4px 10px;font-size:12px;">−0.1</button>
<span id="voice-preview-speed-value" style="min-width:52px;text-align:center;color:#fff;font-weight:600;">1.0 x</span>
<button onclick="adjustPreviewSpeed(0.1)" class="btn secondary" style="padding:4px 10px;font-size:12px;">+0.1</button>
<span style="color:#555570;font-size:11px;">(nur fuer dieses Modal, wird nicht gespeichert)</span>
</div>

<div style="display:flex;gap:8px;align-items:center;">
<button id="voice-preview-play" onclick="playVoicePreview()" class="btn primary" style="padding:8px 16px;">
▶ Abspielen
@@ -1630,10 +1639,29 @@

// ── Voice Preview Modal ─────────────────────────
const VOICE_PREVIEW_DEFAULT = 'Hallo, ich bin ARIA. Das hier ist ein kleiner Test damit du meine Stimme beurteilen kannst.';
+const PREVIEW_SPEED_DEFAULT = 1.0;
+const PREVIEW_SPEED_MIN = 0.1;
+const PREVIEW_SPEED_MAX = 5.0;
let currentPreviewVoice = '';
+let currentPreviewSpeed = PREVIEW_SPEED_DEFAULT;
+
+function _refreshPreviewSpeedLabel() {
+const el = document.getElementById('voice-preview-speed-value');
+if (el) el.textContent = currentPreviewSpeed.toFixed(1) + ' x';
+}
+
+function adjustPreviewSpeed(delta) {
+const next = Math.round((currentPreviewSpeed + delta) * 10) / 10;
+if (next < PREVIEW_SPEED_MIN || next > PREVIEW_SPEED_MAX) return;
+currentPreviewSpeed = next;
+_refreshPreviewSpeedLabel();
+}

function openVoicePreview(name) {
currentPreviewVoice = name;
+// Reset the speed on every open; intentionally not persisted
+currentPreviewSpeed = PREVIEW_SPEED_DEFAULT;
+_refreshPreviewSpeedLabel();
document.getElementById('voice-preview-name').textContent = name;
// Reset the text on every open
document.getElementById('voice-preview-text').value = VOICE_PREVIEW_DEFAULT;
@@ -1658,7 +1686,12 @@
}
document.getElementById('voice-preview-status').textContent = '⏳ Rendere...';
document.getElementById('voice-preview-play').disabled = true;
-send({ action: 'preview_voice', voice: currentPreviewVoice, text });
+send({
+action: 'preview_voice',
+voice: currentPreviewVoice,
+text,
+speed: currentPreviewSpeed,
+});
}

function deleteXttsVoice(name) {

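The preview speed buttons step by ±0.1 and round to one decimal before checking the bounds. A minimal sketch of that rule as a pure function, assuming it is useful to test it outside the modal; `stepSpeed` is a hypothetical name that does not appear in the commit, and the bounds mirror `PREVIEW_SPEED_MIN` / `PREVIEW_SPEED_MAX` from the diff.

```javascript
// Hypothetical pure-function version of adjustPreviewSpeed() from the diff.
const PREVIEW_SPEED_MIN = 0.1;
const PREVIEW_SPEED_MAX = 5.0;

// Returns the next speed, or `current` unchanged if the step would leave
// [min, max]. Rounding to one decimal removes binary floating-point drift.
function stepSpeed(current, delta, min = PREVIEW_SPEED_MIN, max = PREVIEW_SPEED_MAX) {
  const next = Math.round((current + delta) * 10) / 10;
  if (next < min || next > max) return current;
  return next;
}

// Example: ten "+0.1" clicks starting from 1.0 land exactly on 2.0.
let speed = 1.0;
for (let i = 0; i < 10; i++) speed = stepSpeed(speed, 0.1);
console.log(speed); // 2
```

The rounding matters because `0.2 + 0.1` evaluates to `0.30000000000000004` in JavaScript. The label uses `toFixed(1)` and would hide that, but without rounding the `speed` value sent in the `preview_voice` message could drift from what the label shows.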
@@ -1469,7 +1469,7 @@ wss.on("connection", (ws) => {
} else if (msg.action === "test_tts") {
handleTestTTS(ws, msg.text || "Test");
} else if (msg.action === "preview_voice") {
-handleVoicePreview(ws, msg.voice || "", msg.text || "Hallo.");
+handleVoicePreview(ws, msg.voice || "", msg.text || "Hallo.", msg.speed);
} else if (msg.action === "check_tts") {
handleCheckTTS(ws);
} else if (msg.action === "check_desktop") {
@@ -1704,8 +1704,11 @@ function _handlePreviewChunk(payload) {
}
}

-async function handleVoicePreview(clientWs, voice, text) {
+async function handleVoicePreview(clientWs, voice, text, speed) {
try {
+// Validate speed: non-numeric or outside the browser's 0.1-5.0 range falls back to 1.0
+let spd = parseFloat(speed);
+if (!isFinite(spd) || spd < 0.1 || spd > 5.0) spd = 1.0;
const requestId = crypto.randomUUID();
_previewPending.set(requestId, { clientWs, chunks: [], sampleRate: 0, channels: 0 });
// Timeout safety net
@@ -1720,10 +1723,10 @@ async function handleVoicePreview(clientWs, voice, text) {
}
}
}, 60000);
-log("info", "server", `Voice-Preview: voice="${voice}" text="${text.slice(0, 60)}"`);
+log("info", "server", `Voice-Preview: voice="${voice}" speed=${spd.toFixed(1)}x text="${text.slice(0, 60)}"`);
sendToRVS_raw({
type: "xtts_request",
-payload: { text, language: "de", requestId, voice, speed: 1.0 },
+payload: { text, language: "de", requestId, voice, speed: spd },
timestamp: Date.now(),
});
} catch (err) {

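`handleVoicePreview` now accepts a `speed` field, and any value that is missing, non-numeric, or outside 0.1-5.0 is replaced with 1.0 rather than clamped to the nearest bound. Clients that send no `speed` therefore keep the previous fixed value of 1.0. A minimal sketch of both options as pure functions; `normalizePreviewSpeed` and `clampPreviewSpeed` are hypothetical names, and only the first matches the behavior in this commit.

```javascript
// Hypothetical extraction of the speed check in handleVoicePreview().
// Fallback variant (what the diff does): anything non-numeric or outside
// [min, max] becomes `fallback`.
function normalizePreviewSpeed(raw, min = 0.1, max = 5.0, fallback = 1.0) {
  const spd = parseFloat(raw);
  if (!Number.isFinite(spd) || spd < min || spd > max) return fallback;
  return spd;
}

// Clamping variant (an alternative, NOT in the diff): out-of-range numbers
// are pulled to the nearest bound; only non-numeric input falls back.
function clampPreviewSpeed(raw, min = 0.1, max = 5.0, fallback = 1.0) {
  const spd = parseFloat(raw);
  if (!Number.isFinite(spd)) return fallback;
  return Math.min(max, Math.max(min, spd));
}

console.log(normalizePreviewSpeed(undefined)); // 1
console.log(normalizePreviewSpeed("1.5"));     // 1.5
console.log(normalizePreviewSpeed(7));         // 1
console.log(clampPreviewSpeed(7));             // 5
```

Falling back is the more conservative choice for a preview: a bad value such as `50` plays at normal speed instead of at the fastest allowed rate.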
@@ -239,6 +239,8 @@ class F5Runner:

def _infer_blocking(self, gen_text: str, ref_wav: str, ref_text: str,
speed: float = 1.0) -> tuple[np.ndarray, int]:
logger.info("infer() text=%d chars, speed=%.2f, cfg=%.2f, nfe=%d",
len(gen_text), speed, self.cfg_strength, self.nfe_step)
wav, sr, _ = self.model.infer(
ref_file=ref_wav,
ref_text=ref_text,
@@ -507,7 +509,8 @@ async def _do_tts(ws, runner: F5Runner, text: str, voice: str,
ref_wav_str, ref_text = str(pair[0]), pair[1].read_text(encoding="utf-8").strip()

sentences = split_sentences(text)
-logger.info("F5-TTS: %d Satz(e), voice=%s (%s)", len(sentences), voice or "default", ref_wav_str)
+logger.info("F5-TTS: %d Satz(e), voice=%s, speed=%.2fx (%s)",
+len(sentences), voice or "default", speed, ref_wav_str)

chunk_index = 0
pcm_sr = TARGET_SR
