Compare commits

25 Commits

| Author | SHA1 | Date |
|---|---|---|
| | b1ccf29295 | |
| | 4cd9faece2 | |
| | fec8aa977b | |
| | 20123de827 | |
| | 8761d1a1b7 | |
| | abc5b971f4 | |
| | b588dd7e3b | |
| | 309df9d851 | |
| | f2e643d1fb | |
| | 6ac374621c | |
| | efbd306597 | |
| | 4454613a98 | |
| | 55cfb752a2 | |
| | a4d3449e3a | |
| | 44d2c6b4fe | |
| | 0309c95aa5 | |
| | 2aa2cc70c9 | |
| | 9d0776c819 | |
| | f031fa159e | |
| | be373466a3 | |
| | bbf9aed3ba | |
| | 745b4a07c0 | |
| | 23ca815cb2 | |
| | cc3fac8142 | |
| | cd89e36ec2 | |
README.md (49 changes)
@@ -380,6 +380,7 @@ API endpoint for other services: `GET http://localhost:3001/api/session`
 - Text chat with ARIA
 - **Voice recording**: push-to-talk (hold) or tap-to-talk (tap, auto-stop on silence)
 - **Conversation mode** (ear button): after every ARIA reply, recording starts again automatically, like a natural back-and-forth conversation
+- **Wake word** (on-device, openWakeWord ONNX): "Hey Jarvis", "Alexa", "Hey Mycroft", "Hey Rhasspy". The microphone listens passively and a conversation starts on the keyword. Runs entirely on-device via ONNX Runtime: no API key, no cloud round-trip, the audio never leaves the device.
 - **VAD (voice activity detection)**: configurable silence tolerance (1.0–8.0 s, default 2.8 s) before auto-stop kicks in. Max recording length 120 s.
 - **Speech gate**: the recording is discarded if no speech is detected
 - **STT (speech-to-text)**: 16 kHz mono → bridge → Gamebox Whisper (CUDA) → text in the chat. Almost real-time.
@@ -398,6 +399,45 @@ API endpoint for other services: `GET http://localhost:3001/api/session`
 - GPS position (optional)
 - QR code scanner for token pairing
+
+### Wake word (openWakeWord, on-device)
+
+Wake-word detection runs entirely **on-device** via [openWakeWord](https://github.com/dscripka/openWakeWord)
+with ONNX Runtime: no API key, no cloud round-trip, not a cent in license fees,
+and the audio never leaves the device.
+
+**Bundled wake words** (ONNX files in `android/android/app/src/main/assets/openwakeword/`):
+- `Hey Jarvis` (default, openWakeWord original)
+- `Computer` (Star Trek style, community model)
+- `Alexa`, `Hey Mycroft`, `Hey Rhasspy` (openWakeWord originals)
+
+Community models come from [fwartner/home-assistant-wakewords-collection](https://github.com/fwartner/home-assistant-wakewords-collection).
+
+**Usage:**
+- App → **Settings** → **Wake word** → pick the desired keyword → **Save + activate**
+- Tap the **ear button (👂)** in the status bar: the wake word is armed and the app listens passively
+- Say the wake word: the icon switches to 🎙️ and the conversation is running
+- After every ARIA reply the microphone opens once more; on silence it falls back to 👂
+- Tap again: ear off (🔇)
+
+**Training your own wake words** (free, ~30 min):
+
+1. Open the openWakeWord training notebook on Colab (linked in the
+   [openWakeWord repo](https://github.com/dscripka/openWakeWord) under "Training Custom Models")
+2. Enter the wake-word phrase (e.g. "ARIA", "Hey Stefan") and run the notebook;
+   it generates synthetic training examples and trains the model.
+3. Download the resulting `.onnx` file
+4. Put the file into `android/android/app/src/main/assets/openwakeword/`
+5. Add the file name (without `.onnx`) to the `WAKE_KEYWORDS` list in
+   `android/src/services/wakeword.ts` (see the sketch after this section)
+6. Rebuild the APK
+
+*(Diagnostic upload for custom `.onnx` files without a rebuild comes later.)*
+
+**Tuning** (in [wakeword.ts](android/src/services/wakeword.ts)):
+- `DEFAULT_THRESHOLD = 0.5`: score threshold (raise to 0.6–0.7 if you see false positives)
+- `DEFAULT_PATIENCE = 2`: how many consecutive frames above the threshold are required
+- `DEFAULT_DEBOUNCE_MS = 1500`: minimum gap between two triggers
+
 ### Initial setup (dev machine, one-time)
 
 ```bash
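Steps 5 and the tuning constants above refer to `android/src/services/wakeword.ts`, which is not itself part of this diff. As orientation, a minimal sketch of what its relevant exports might look like; the export names are taken from the `SettingsScreen.tsx` imports further below (`WAKE_KEYWORDS`, `KEYWORD_LABELS`, `DEFAULT_KEYWORD`), everything else is an assumption:

```typescript
// android/src/services/wakeword.ts (sketch, not the actual file)
// Model file names in assets/openwakeword/, without the .onnx suffix.
// Step 5 of the training guide: append your custom model's name here.
export const WAKE_KEYWORDS = [
  'hey_jarvis', 'computer', 'alexa', 'hey_mycroft', 'hey_rhasspy',
] as const;

// Human-readable labels for the Settings chips.
export const KEYWORD_LABELS: Record<string, string> = {
  hey_jarvis: 'Hey Jarvis',
  computer: 'Computer',
  alexa: 'Alexa',
  hey_mycroft: 'Hey Mycroft',
  hey_rhasspy: 'Hey Rhasspy',
};

export const DEFAULT_KEYWORD = 'hey_jarvis';

// Tuning knobs passed to the native OpenWakeWord module's init().
export const DEFAULT_THRESHOLD = 0.5;    // raise to 0.6-0.7 on false positives
export const DEFAULT_PATIENCE = 2;       // consecutive frames above threshold
export const DEFAULT_DEBOUNCE_MS = 1500; // minimum gap between two triggers
```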
@@ -744,8 +784,10 @@ docker exec aria-core ssh aria-wohnung hostname
 - **Proxy cold start**: every message spawns a new `claude --print` process.
   This makes ARIA slower than the direct Claude CLI. The timeout is 900 s (15 min).
 - **No streaming to the app**: the app only shows the finished reply, no streaming tokens.
-- **Wake word only on the VM**: the bridge listens for "ARIA" via the VM's local microphone.
-  In the app there is energy-based detection (phase 1). An on-device "ARIA" keyword (Porcupine) is phase 2.
+- **App wake word limited to built-in keywords**: `Hey Jarvis`, `Alexa`, `Hey Mycroft`,
+  `Hey Rhasspy` work out of the box; custom wake words currently still have to be
+  dropped into the app bundle as an `.onnx` file and added to the list in `wakeword.ts`.
+  The Diagnostic upload UI is phase 2.
 - **Audio format**: the app records AAC/MP4; the bridge converts it to 16 kHz PCM via FFmpeg.
 - **RVS zombie connections**: WebSocket connections occasionally die without an error message.
   The bridge has a ping check (5 s); Diagnostic uses fresh connections per request.
@@ -800,6 +842,7 @@ docker exec aria-core ssh aria-wohnung hostname
 - [x] Audio pause instead of ducking (TRANSIENT instead of MAY_DUCK) + release-timing fix
 - [x] VAD silence tolerance and max recording length configurable (1–8 s, 120 s)
 - [x] Disk-full banner in Diagnostic with copyable cleanup commands
+- [x] Wake word on-device via openWakeWord (ONNX Runtime, no API key) + state icon
 
 ### Phase 2: ARIA becomes productive
@@ -815,5 +858,5 @@ docker exec aria-core ssh aria-wohnung hostname
 - [ ] STARFACE telephony skill
 - [ ] Desktop client (Tauri)
 - [ ] bKVM remote IT support
-- [ ] Porcupine wake word (on-device "ARIA" in the app)
+- [ ] Custom `.onnx` upload for wake words via Diagnostic (without an app rebuild)
 - [ ] Claude Vision directly (image analysis without the file-path detour)
@@ -79,8 +79,8 @@ android {
         applicationId "com.ariacockpit"
         minSdkVersion rootProject.ext.minSdkVersion
         targetSdkVersion rootProject.ext.targetSdkVersion
-        versionCode 600
-        versionName "0.0.6.0"
+        versionCode 701
+        versionName "0.0.7.1"
         // Fallback for libraries with product flavors
         missingDimensionStrategy 'react-native-camera', 'general'
     }
@@ -104,6 +104,19 @@ android {
             proguardFiles getDefaultProguardFile("proguard-android.txt"), "proguard-rules.pro"
         }
     }
+
+    // ABI split: arm64-v8a only (every Android phone since ~2017). Cuts the
+    // APK from ~136 MB to ~35 MB, which matters because ONNX Runtime and the
+    // other native libs would otherwise be bundled once per architecture.
+    // If you need 32-bit or an emulator, add "armeabi-v7a", "x86_64", etc. here.
+    splits {
+        abi {
+            enable true
+            reset()
+            include "arm64-v8a"
+            universalApk false
+        }
+    }
 }
 
 dependencies {
@@ -111,6 +124,9 @@ dependencies {
     implementation("com.facebook.react:react-android")
     implementation("com.facebook.react:flipper-integration")
 
+    // ONNX Runtime for the on-device wake word (openWakeWord ONNX models in assets/openwakeword/)
+    implementation("com.microsoft.onnxruntime:onnxruntime-android:1.17.1")
+
     if (hermesEnabled.toBoolean()) {
         implementation("com.facebook.react:hermes-android")
     } else {
@@ -4,6 +4,8 @@
     <uses-permission android:name="android.permission.CAMERA" />
     <uses-permission android:name="android.permission.RECORD_AUDIO" />
     <uses-permission android:name="android.permission.REQUEST_INSTALL_PACKAGES" />
+    <!-- Read the call state so TTS pauses when the phone rings -->
+    <uses-permission android:name="android.permission.READ_PHONE_STATE" />
 
     <application
         android:name=".MainApplication"
Binary files not shown: 7 files, presumably the openWakeWord `.onnx` models added under `android/android/app/src/main/assets/openwakeword/`.
@@ -21,6 +21,8 @@ class MainApplication : Application(), ReactApplication {
             add(ApkInstallerPackage())
             add(AudioFocusPackage())
             add(PcmStreamPlayerPackage())
+            add(OpenWakeWordPackage())
+            add(PhoneCallPackage())
         }
 
     override fun getJSMainModuleName(): String = "index"
@@ -0,0 +1,369 @@
+package com.ariacockpit
+
+import ai.onnxruntime.OnnxTensor
+import ai.onnxruntime.OrtEnvironment
+import ai.onnxruntime.OrtSession
+import android.Manifest
+import android.content.pm.PackageManager
+import android.media.AudioFormat
+import android.media.AudioRecord
+import android.media.MediaRecorder
+import android.util.Log
+import androidx.core.content.ContextCompat
+import com.facebook.react.bridge.Promise
+import com.facebook.react.bridge.ReactApplicationContext
+import com.facebook.react.bridge.ReactContextBaseJavaModule
+import com.facebook.react.bridge.ReactMethod
+import com.facebook.react.modules.core.DeviceEventManagerModule
+import java.nio.FloatBuffer
+import java.util.concurrent.atomic.AtomicBoolean
+
+/**
+ * On-device wake-word detection via openWakeWord (https://github.com/dscripka/openWakeWord).
+ *
+ * Three-stage ONNX pipeline:
+ * 1. Audio (16 kHz mono int16, 1280-sample chunks) → melspectrogram → 32-mel frames
+ * 2. Sliding window of 76 mel frames (stride 8) → speech embedding → 96-dim vector
+ * 3. Last 16 embeddings (~1.28 s of context) → wake-word classifier → sigmoid score
+ *
+ * Models live in assets/openwakeword/ (mel + embedding are shared, plus one
+ * .onnx per keyword). Detection fires after `patience` consecutive frames
+ * above `threshold` and suppresses repeats for `debounceMs`.
+ *
+ * Emits "WakeWordDetected" as an RN event when a trigger is detected.
+ */
+class OpenWakeWordModule(reactContext: ReactApplicationContext) : ReactContextBaseJavaModule(reactContext) {
+    override fun getName() = "OpenWakeWord"
+
+    companion object {
+        private const val TAG = "OpenWakeWord"
+        private const val SAMPLE_RATE = 16000
+        private const val CHUNK_SAMPLES = 1280 // 80ms @ 16kHz
+        private const val MEL_FRAMES_PER_EMBEDDING = 76 // embedding window
+        private const val EMBEDDING_STRIDE = 8 // slide by 8 mel frames
+        private const val EMBEDDING_DIM = 96
+        private const val MEL_BINS = 32
+        private const val DEFAULT_WW_INPUT_FRAMES = 16 // fallback if model metadata is missing
+    }
+
+    private val env: OrtEnvironment = OrtEnvironment.getEnvironment()
+    private var melSession: OrtSession? = null
+    private var embSession: OrtSession? = null
+    private var wwSession: OrtSession? = null
+
+    private var melInputName: String = "input"
+    private var embInputName: String = "input_1"
+    private var wwInputName: String = "input"
+    // Number of embedding frames the wake-word classifier expects per inference;
+    // hey_jarvis has 16, other community models may differ (e.g. 28).
+    // Read from the model metadata in init().
+    private var wwInputFrames: Int = DEFAULT_WW_INPUT_FRAMES
+
+    // Configuration
+    private var threshold: Float = 0.5f
+    private var patience: Int = 2
+    private var debounceMs: Long = 1500
+    private var modelName: String = "hey_jarvis"
+
+    // Audio capture thread
+    private var audioRecord: AudioRecord? = null
+    private val running = AtomicBoolean(false)
+    private var captureThread: Thread? = null
+
+    // Inference state
+    private val melBuffer: ArrayList<FloatArray> = ArrayList(256) // list of 32-dim frames
+    private var melProcessedIdx: Int = 0
+    private val embBuffer: ArrayDeque<FloatArray> = ArrayDeque(32) // ring buffer of recent embeddings
+    private var consecutiveAboveThreshold: Int = 0
+    private var lastDetectionMs: Long = 0L
+
+    /**
+     * Initializes the ONNX sessions for a given wake word.
+     * modelName: file name without suffix (e.g. "hey_jarvis", "alexa", "hey_mycroft", "hey_rhasspy")
+     */
+    @ReactMethod
+    fun init(modelName: String, threshold: Double, patience: Int, debounceMs: Int, promise: Promise) {
+        try {
+            disposeSessions()
+            this.modelName = modelName
+            this.threshold = threshold.toFloat()
+            this.patience = patience.coerceAtLeast(1)
+            this.debounceMs = debounceMs.toLong()
+
+            val ctx = reactApplicationContext
+            val melBytes = ctx.assets.open("openwakeword/melspectrogram.onnx").use { it.readBytes() }
+            val embBytes = ctx.assets.open("openwakeword/embedding_model.onnx").use { it.readBytes() }
+            val wwBytes = ctx.assets.open("openwakeword/$modelName.onnx").use { it.readBytes() }
+
+            val opts = OrtSession.SessionOptions()
+            melSession = env.createSession(melBytes, opts)
+            embSession = env.createSession(embBytes, opts)
+            wwSession = env.createSession(wwBytes, opts)
+
+            melInputName = melSession!!.inputNames.first()
+            embInputName = embSession!!.inputNames.first()
+            wwInputName = wwSession!!.inputNames.first()
+
+            // Read the WW input frame count from the model; it varies per keyword.
+            // Expected shape: (1, N, 96), with N in the model metadata.
+            val wwInputInfo = wwSession!!.inputInfo[wwInputName]
+            val wwShape = (wwInputInfo?.info as? ai.onnxruntime.TensorInfo)?.shape
+            wwInputFrames = wwShape?.getOrNull(1)?.toInt()?.takeIf { it > 0 } ?: DEFAULT_WW_INPUT_FRAMES
+
+            Log.i(TAG, "Init OK: model=$modelName wwFrames=$wwInputFrames threshold=$threshold patience=$patience " +
+                "debounce=${debounceMs}ms (inputs: mel=$melInputName emb=$embInputName ww=$wwInputName)")
+            promise.resolve(true)
+        } catch (e: Exception) {
+            Log.e(TAG, "Init fehlgeschlagen: ${e.message}", e)
+            disposeSessions()
+            promise.reject("INIT_FAILED", e.message ?: "Unbekannter Fehler", e)
+        }
+    }
+
+    @ReactMethod
+    fun start(promise: Promise) {
+        if (running.get()) {
+            promise.resolve(true)
+            return
+        }
+        if (melSession == null || embSession == null || wwSession == null) {
+            promise.reject("NOT_INITIALIZED", "init() muss vor start() aufgerufen werden")
+            return
+        }
+        // Check the permission. The app code usually requests it beforehand,
+        // but we insist on it explicitly here so AudioRecord doesn't fail
+        // silently.
+        val perm = ContextCompat.checkSelfPermission(reactApplicationContext, Manifest.permission.RECORD_AUDIO)
+        if (perm != PackageManager.PERMISSION_GRANTED) {
+            promise.reject("NO_MIC_PERMISSION", "RECORD_AUDIO Permission fehlt")
+            return
+        }
+
+        try {
+            val minBuf = AudioRecord.getMinBufferSize(
+                SAMPLE_RATE,
+                AudioFormat.CHANNEL_IN_MONO,
+                AudioFormat.ENCODING_PCM_16BIT,
+            ).coerceAtLeast(CHUNK_SAMPLES * 2 * 4)
+
+            val record = AudioRecord(
+                MediaRecorder.AudioSource.MIC,
+                SAMPLE_RATE,
+                AudioFormat.CHANNEL_IN_MONO,
+                AudioFormat.ENCODING_PCM_16BIT,
+                minBuf,
+            )
+            if (record.state != AudioRecord.STATE_INITIALIZED) {
+                record.release()
+                promise.reject("AUDIO_INIT", "AudioRecord nicht initialisiert (Mikro belegt?)")
+                return
+            }
+            audioRecord = record
+            resetInferenceState()
+            running.set(true)
+            record.startRecording()
+
+            captureThread = Thread({ captureLoop() }, "OpenWakeWordCapture").apply {
+                isDaemon = true
+                start()
+            }
+
+            Log.i(TAG, "Lauschen gestartet (model=$modelName)")
+            promise.resolve(true)
+        } catch (e: Exception) {
+            Log.e(TAG, "start fehlgeschlagen", e)
+            running.set(false)
+            audioRecord?.release()
+            audioRecord = null
+            promise.reject("START_FAILED", e.message ?: "Unbekannter Fehler", e)
+        }
+    }
+
+    @ReactMethod
+    fun stop(promise: Promise) {
+        running.set(false)
+        try {
+            captureThread?.join(1500)
+        } catch (_: InterruptedException) {}
+        captureThread = null
+        try { audioRecord?.stop() } catch (_: Exception) {}
+        try { audioRecord?.release() } catch (_: Exception) {}
+        audioRecord = null
+        Log.i(TAG, "Lauschen gestoppt")
+        promise.resolve(true)
+    }
+
+    @ReactMethod
+    fun dispose(promise: Promise) {
+        running.set(false)
+        try { captureThread?.join(1000) } catch (_: InterruptedException) {}
+        captureThread = null
+        try { audioRecord?.stop() } catch (_: Exception) {}
+        try { audioRecord?.release() } catch (_: Exception) {}
+        audioRecord = null
+        disposeSessions()
+        promise.resolve(true)
+    }
+
+    @ReactMethod
+    fun isAvailable(promise: Promise) {
+        // The wake word is always available (no API key, everything on-device)
+        promise.resolve(true)
+    }
+
+    // RN event subscriptions: RN convention, otherwise a warning in debug builds
+    @ReactMethod fun addListener(eventName: String) {}
+    @ReactMethod fun removeListeners(count: Int) {}
+
+    private fun disposeSessions() {
+        try { melSession?.close() } catch (_: Exception) {}
+        try { embSession?.close() } catch (_: Exception) {}
+        try { wwSession?.close() } catch (_: Exception) {}
+        melSession = null
+        embSession = null
+        wwSession = null
+    }
+
+    private fun resetInferenceState() {
+        melBuffer.clear()
+        melProcessedIdx = 0
+        embBuffer.clear()
+        consecutiveAboveThreshold = 0
+        lastDetectionMs = 0L
+    }
+
+    private fun emitDetected() {
+        val params = com.facebook.react.bridge.Arguments.createMap().apply {
+            putString("model", modelName)
+        }
+        try {
+            reactApplicationContext
+                .getJSModule(DeviceEventManagerModule.RCTDeviceEventEmitter::class.java)
+                .emit("WakeWordDetected", params)
+        } catch (e: Exception) {
+            Log.w(TAG, "emit fehlgeschlagen: ${e.message}")
+        }
+    }
+
+    private fun captureLoop() {
+        val buf = ShortArray(CHUNK_SAMPLES)
+        val record = audioRecord ?: return
+        Log.i(TAG, "Capture-Loop gestartet")
+        while (running.get()) {
+            var read = 0
+            while (read < CHUNK_SAMPLES && running.get()) {
+                val n = record.read(buf, read, CHUNK_SAMPLES - read)
+                if (n <= 0) {
+                    Log.w(TAG, "AudioRecord.read returned $n — Loop ende")
+                    running.set(false)
+                    return
+                }
+                read += n
+            }
+            if (!running.get()) break
+            try {
+                processChunk(buf)
+            } catch (e: Exception) {
+                Log.w(TAG, "processChunk: ${e.message}")
+            }
+        }
+        Log.i(TAG, "Capture-Loop beendet")
+    }
+
+    /** Processes one 1280-sample int16 audio chunk. */
+    private fun processChunk(audio: ShortArray) {
+        // 1) Audio → mel (output (1, 1, frames, 32))
+        val floats = FloatArray(audio.size) { audio[it].toFloat() }
+        val melTensor = OnnxTensor.createTensor(
+            env,
+            FloatBuffer.wrap(floats),
+            longArrayOf(1L, audio.size.toLong()),
+        )
+        val melResult = melSession!!.run(mapOf(melInputName to melTensor))
+        val melOut = melResult.get(0).value
+        melTensor.close()
+        @Suppress("UNCHECKED_CAST")
+        val mel4 = melOut as Array<Array<Array<FloatArray>>>
+        val frames = mel4[0][0]
+        // openWakeWord applies `mel/10 + 2` before feeding the embedding model
+        for (frame in frames) {
+            val scaled = FloatArray(frame.size) { frame[it] / 10f + 2f }
+            melBuffer.add(scaled)
+        }
+        melResult.close()
+
+        // 2) Sliding window: process all complete 76-frame windows
+        while (melBuffer.size >= melProcessedIdx + MEL_FRAMES_PER_EMBEDDING) {
+            val flat = FloatArray(MEL_FRAMES_PER_EMBEDDING * MEL_BINS)
+            var pos = 0
+            for (i in 0 until MEL_FRAMES_PER_EMBEDDING) {
+                val src = melBuffer[melProcessedIdx + i]
+                System.arraycopy(src, 0, flat, pos, MEL_BINS)
+                pos += MEL_BINS
+            }
+            val embIn = OnnxTensor.createTensor(
+                env,
+                FloatBuffer.wrap(flat),
+                longArrayOf(1L, MEL_FRAMES_PER_EMBEDDING.toLong(), MEL_BINS.toLong(), 1L),
+            )
+            val embRes = embSession!!.run(mapOf(embInputName to embIn))
+            val embOut = embRes.get(0).value
+            embIn.close()
+            // Expected output shape: (1, 1, 1, 96), i.e. rank 4, NOT (1, 96).
+            // The Google embedding pipeline keeps the extra dimensions.
+            @Suppress("UNCHECKED_CAST")
+            val embArr = embOut as Array<Array<Array<FloatArray>>>
+            embBuffer.addLast(embArr[0][0][0].copyOf())
+            while (embBuffer.size > wwInputFrames) embBuffer.removeFirst()
+            embRes.close()
+
+            melProcessedIdx += EMBEDDING_STRIDE
+        }
+        // Trim the mel buffer; prevents unbounded memory growth
+        if (melProcessedIdx > MEL_FRAMES_PER_EMBEDDING) {
+            val keepFrom = melProcessedIdx - MEL_FRAMES_PER_EMBEDDING
+            val newList = ArrayList<FloatArray>(melBuffer.size - keepFrom)
+            for (i in keepFrom until melBuffer.size) newList.add(melBuffer[i])
+            melBuffer.clear()
+            melBuffer.addAll(newList)
+            melProcessedIdx = MEL_FRAMES_PER_EMBEDDING
+        }
+
+        // 3) Classification: runs as soon as we have wwInputFrames embeddings
+        if (embBuffer.size < wwInputFrames) return
+        val flatEmb = FloatArray(wwInputFrames * EMBEDDING_DIM)
+        var p = 0
+        // Take the last wwInputFrames embeddings (embBuffer is capped at wwInputFrames)
+        for (e in embBuffer) {
+            System.arraycopy(e, 0, flatEmb, p, EMBEDDING_DIM)
+            p += EMBEDDING_DIM
+        }
+        val wwIn = OnnxTensor.createTensor(
+            env,
+            FloatBuffer.wrap(flatEmb),
+            longArrayOf(1L, wwInputFrames.toLong(), EMBEDDING_DIM.toLong()),
+        )
+        val wwRes = wwSession!!.run(mapOf(wwInputName to wwIn))
+        val wwOut = wwRes.get(0).value
+        wwIn.close()
+        // Expected output shape: (1, 1) → Array<FloatArray>
+        @Suppress("UNCHECKED_CAST")
+        val score = (wwOut as Array<FloatArray>)[0][0]
+        wwRes.close()
+
+        if (score >= threshold) {
+            consecutiveAboveThreshold++
+            if (consecutiveAboveThreshold >= patience) {
+                val now = System.currentTimeMillis()
+                if (now - lastDetectionMs >= debounceMs) {
+                    lastDetectionMs = now
+                    consecutiveAboveThreshold = 0
+                    Log.i(TAG, "Wake-Word erkannt! score=$score model=$modelName")
+                    emitDetected()
+                }
+            }
+        } else {
+            consecutiveAboveThreshold = 0
+        }
+    }
+}
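The module above only emits an RN event; the JS consumer lives in `android/src/services/wakeword.ts`, which this diff does not include. A minimal sketch of how a React Native caller could drive the module. The module name and event name match `getName()` and `emitDetected()` above; the helper names and the overall structure are assumptions:

```typescript
import { NativeModules, NativeEventEmitter, EmitterSubscription } from 'react-native';

// Name matches getName() in OpenWakeWordModule.kt.
const { OpenWakeWord } = NativeModules;
const emitter = new NativeEventEmitter(OpenWakeWord);

// Hypothetical arming helper: init with the README's tuning defaults,
// then start passive listening and react to detections.
export async function armWakeWord(
  model: string, // e.g. 'hey_jarvis' (asset name without .onnx)
  onDetected: (model: string) => void,
): Promise<EmitterSubscription> {
  await OpenWakeWord.init(model, 0.5 /* threshold */, 2 /* patience */, 1500 /* debounceMs */);
  const sub = emitter.addListener('WakeWordDetected', (e: { model: string }) =>
    onDetected(e.model),
  );
  await OpenWakeWord.start(); // rejects with NO_MIC_PERMISSION if RECORD_AUDIO is missing
  return sub;
}

export async function disarmWakeWord(sub: EmitterSubscription): Promise<void> {
  sub.remove();
  await OpenWakeWord.stop();
}
```

Since `init()` reads the classifier's frame count from the model metadata, the same JS call path works unchanged for community models whose input window differs from hey_jarvis's 16 frames.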
@@ -0,0 +1,16 @@
+package com.ariacockpit
+
+import com.facebook.react.ReactPackage
+import com.facebook.react.bridge.NativeModule
+import com.facebook.react.bridge.ReactApplicationContext
+import com.facebook.react.uimanager.ViewManager
+
+class OpenWakeWordPackage : ReactPackage {
+    override fun createNativeModules(reactContext: ReactApplicationContext): List<NativeModule> {
+        return listOf(OpenWakeWordModule(reactContext))
+    }
+
+    override fun createViewManagers(reactContext: ReactApplicationContext): List<ViewManager<*, *>> {
+        return emptyList()
+    }
+}
@@ -137,6 +137,17 @@ class PcmStreamPlayerModule(reactContext: ReactApplicationContext) : ReactContex
                 Log.w(TAG, "play() sofort failed: ${e.message}")
             }
         }
+        // Idle cutoff: if endRequested never arrived but nothing has come in
+        // for 30 s, we abort (bridge crash, lost final).
+        var idleMs = 0L
+        val maxIdleMs = 30_000L
+        // Target buffer fill: below this watermark we feed in silence so the
+        // AudioTrack doesn't underrun while the bridge renders the next
+        // sentence. Otherwise Spotify/YouTube react with an unsolicited
+        // resume after ~10 s of silence.
+        val underrunGuardFrames = sampleRate / 10 // ~100ms
+        val silenceFillFrames = sampleRate / 20 // ~50ms per refill
+
         mainLoop@ while (!writerShouldStop) {
             val data = queue.poll(50, java.util.concurrent.TimeUnit.MILLISECONDS)
             if (data == null) {

@@ -153,8 +164,33 @@ class PcmStreamPlayerModule(reactContext: ReactApplicationContext) : ReactContex
                     }
                     break@mainLoop
                 }
+                // Underrun guard: feed in silence when the AudioTrack buffer
+                // is about to drain. Otherwise Spotify resumes on its own
+                // after ~10 s of pause, even though we still hold the focus.
+                if (playbackStarted) {
+                    val framesWritten = bytesBuffered / streamBytesPerFrame
+                    val framesPlayed = t.playbackHeadPosition.toLong()
+                    val framesInBuffer = framesWritten - framesPlayed
+                    if (framesInBuffer < underrunGuardFrames) {
+                        val fillBytes = silenceFillFrames * streamBytesPerFrame
+                        val silence = ByteArray(fillBytes)
+                        var silOff = 0
+                        while (silOff < silence.size && !writerShouldStop) {
+                            val w = t.write(silence, silOff, silence.size - silOff)
+                            if (w <= 0) break
+                            silOff += w
+                        }
+                        bytesBuffered += silence.size
+                    }
+                }
+                idleMs += 50L
+                if (idleMs >= maxIdleMs) {
+                    Log.w(TAG, "Idle-Cutoff: ${maxIdleMs}ms keine Daten — Stream wird beendet")
+                    break@mainLoop
+                }
                 continue@mainLoop
             }
+            idleMs = 0L
 
             // Pre-roll check: don't call play() until enough is buffered
             if (!playbackStarted && bytesBuffered + data.size >= prerollBytes) {
@@ -0,0 +1,126 @@
+package com.ariacockpit
+
+import android.Manifest
+import android.content.Context
+import android.content.pm.PackageManager
+import android.os.Build
+import android.telephony.PhoneStateListener
+import android.telephony.TelephonyCallback
+import android.telephony.TelephonyManager
+import android.util.Log
+import androidx.core.content.ContextCompat
+import com.facebook.react.bridge.Arguments
+import com.facebook.react.bridge.Promise
+import com.facebook.react.bridge.ReactApplicationContext
+import com.facebook.react.bridge.ReactContextBaseJavaModule
+import com.facebook.react.bridge.ReactMethod
+import com.facebook.react.modules.core.DeviceEventManagerModule
+
+/**
+ * Listens for call-state changes. When the phone rings or a call is active,
+ * the module sends a "PhoneCallStateChanged" event to JS.
+ *
+ * The JS side then stops TTS playback so ARIA doesn't keep talking into the
+ * call. Without the READ_PHONE_STATE permission, start() fails quietly and
+ * the rest of the app works as before.
+ *
+ * State strings: "idle" | "ringing" | "offhook"
+ */
+class PhoneCallModule(reactContext: ReactApplicationContext) : ReactContextBaseJavaModule(reactContext) {
+    override fun getName() = "PhoneCall"
+
+    companion object { private const val TAG = "PhoneCall" }
+
+    private var telephonyManager: TelephonyManager? = null
+    private var legacyListener: PhoneStateListener? = null
+    private var modernCallback: Any? = null // TelephonyCallback from API 31 on
+    private var lastState: Int = TelephonyManager.CALL_STATE_IDLE
+
+    @ReactMethod
+    fun start(promise: Promise) {
+        try {
+            val perm = ContextCompat.checkSelfPermission(reactApplicationContext, Manifest.permission.READ_PHONE_STATE)
+            if (perm != PackageManager.PERMISSION_GRANTED) {
+                Log.w(TAG, "READ_PHONE_STATE Permission fehlt — Anruf-Erkennung inaktiv")
+                promise.resolve(false)
+                return
+            }
+            val tm = reactApplicationContext.getSystemService(Context.TELEPHONY_SERVICE) as? TelephonyManager
+            if (tm == null) {
+                Log.w(TAG, "TelephonyManager nicht verfuegbar")
+                promise.resolve(false)
+                return
+            }
+            telephonyManager = tm
+
+            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
+                val cb = object : TelephonyCallback(), TelephonyCallback.CallStateListener {
+                    override fun onCallStateChanged(state: Int) {
+                        handleStateChange(state)
+                    }
+                }
+                tm.registerTelephonyCallback(reactApplicationContext.mainExecutor, cb)
+                modernCallback = cb
+            } else {
+                @Suppress("DEPRECATION")
+                val l = object : PhoneStateListener() {
+                    override fun onCallStateChanged(state: Int, phoneNumber: String?) {
+                        handleStateChange(state)
+                    }
+                }
+                @Suppress("DEPRECATION")
+                tm.listen(l, PhoneStateListener.LISTEN_CALL_STATE)
+                legacyListener = l
+            }
+            Log.i(TAG, "PhoneCall-Listener aktiv")
+            promise.resolve(true)
+        } catch (e: Exception) {
+            Log.e(TAG, "start fehlgeschlagen", e)
+            promise.reject("START_FAILED", e.message ?: "Unbekannter Fehler", e)
+        }
+    }
+
+    @ReactMethod
+    fun stop(promise: Promise) {
+        try {
+            val tm = telephonyManager
+            if (tm != null) {
+                if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.S) {
+                    (modernCallback as? TelephonyCallback)?.let { tm.unregisterTelephonyCallback(it) }
+                    modernCallback = null
+                } else {
+                    @Suppress("DEPRECATION")
+                    legacyListener?.let { tm.listen(it, PhoneStateListener.LISTEN_NONE) }
+                    legacyListener = null
+                }
+            }
+            telephonyManager = null
+            lastState = TelephonyManager.CALL_STATE_IDLE
+            promise.resolve(true)
+        } catch (e: Exception) {
+            promise.reject("STOP_FAILED", e.message ?: "")
+        }
+    }
+
+    private fun handleStateChange(state: Int) {
+        if (state == lastState) return
+        lastState = state
+        val name = when (state) {
+            TelephonyManager.CALL_STATE_RINGING -> "ringing"
+            TelephonyManager.CALL_STATE_OFFHOOK -> "offhook"
+            TelephonyManager.CALL_STATE_IDLE -> "idle"
+            else -> return
+        }
+        Log.i(TAG, "Telefon-State: $name")
+        val params = Arguments.createMap().apply { putString("state", name) }
+        try {
+            reactApplicationContext.getJSModule(DeviceEventManagerModule.RCTDeviceEventEmitter::class.java)
+                .emit("PhoneCallStateChanged", params)
+        } catch (e: Exception) {
+            Log.w(TAG, "Event-emit fehlgeschlagen: ${e.message}")
+        }
+    }
+
+    @ReactMethod fun addListener(eventName: String) {}
+    @ReactMethod fun removeListeners(count: Int) {}
+}
@@ -0,0 +1,16 @@
+package com.ariacockpit
+
+import com.facebook.react.ReactPackage
+import com.facebook.react.bridge.NativeModule
+import com.facebook.react.bridge.ReactApplicationContext
+import com.facebook.react.uimanager.ViewManager
+
+class PhoneCallPackage : ReactPackage {
+    override fun createNativeModules(reactContext: ReactApplicationContext): List<NativeModule> {
+        return listOf(PhoneCallModule(reactContext))
+    }
+
+    override fun createViewManagers(reactContext: ReactApplicationContext): List<ViewManager<*, *>> {
+        return emptyList()
+    }
+}
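ChatScreen.tsx (further below) imports `phoneCallService` from `../services/phoneCall`, a file that is not part of this diff. A plausible minimal sketch of that service, wiring the native module above to the `haltAllPlayback()` helper this changeset adds to audio.ts; everything beyond the module name and event name is an assumption:

```typescript
// android/src/services/phoneCall.ts (sketch; not part of this diff)
import { NativeModules, NativeEventEmitter, EmitterSubscription } from 'react-native';
import audioService from './audio';

const { PhoneCall } = NativeModules; // matches getName() in PhoneCallModule.kt
const emitter = new NativeEventEmitter(PhoneCall);

type CallState = 'idle' | 'ringing' | 'offhook';

class PhoneCallService {
  private sub: EmitterSubscription | null = null;

  /** Start call-state monitoring; resolves false if READ_PHONE_STATE is missing. */
  async start(): Promise<boolean> {
    this.sub = emitter.addListener('PhoneCallStateChanged', (e: { state: CallState }) => {
      // Stop TTS immediately so ARIA doesn't talk over the ringtone or call.
      if (e.state !== 'idle') audioService.haltAllPlayback(`phone ${e.state}`);
    });
    return PhoneCall.start();
  }

  async stop(): Promise<void> {
    this.sub?.remove();
    this.sub = null;
    await PhoneCall.stop();
  }
}

export default new PhoneCallService();
```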
@@ -167,10 +167,23 @@ export CI=true
 
 if [ "$MODE" = "debug" ]; then
     ./gradlew assembleDebug
-    APK_PATH="app/build/outputs/apk/debug/app-debug.apk"
+    OUT_DIR="app/build/outputs/apk/debug"
 else
     ./gradlew assembleRelease
-    APK_PATH="app/build/outputs/apk/release/app-release.apk"
+    OUT_DIR="app/build/outputs/apk/release"
 fi
+
+# With ABI splits the APK is named e.g. app-arm64-v8a-release.apk instead of
+# app-release.apk. Try the arm64-v8a variant first (that's our default), with
+# the universal APK as a fallback in case the splits are disabled.
+if [ -f "$OUT_DIR/app-arm64-v8a-${MODE}.apk" ]; then
+    APK_PATH="$OUT_DIR/app-arm64-v8a-${MODE}.apk"
+elif [ -f "$OUT_DIR/app-${MODE}.apk" ]; then
+    APK_PATH="$OUT_DIR/app-${MODE}.apk"
+else
+    echo -e "${RED}Keine passende APK in $OUT_DIR gefunden${NC}"
+    cd ..
+    exit 1
+fi
 
 cd ..
@@ -1,6 +1,6 @@
 {
   "name": "aria-cockpit",
-  "version": "0.0.6.0",
+  "version": "0.0.7.1",
   "private": true,
   "scripts": {
     "android": "react-native run-android",
@@ -24,9 +24,7 @@
     "react-native-camera-kit": "^13.0.0",
     "@react-native-async-storage/async-storage": "^1.21.0",
     "react-native-fs": "^2.20.0",
-    "react-native-audio-recorder-player": "^3.6.7",
-    "@picovoice/porcupine-react-native": "3.0.5",
-    "@picovoice/react-native-voice-processor": "1.2.3"
+    "react-native-audio-recorder-player": "^3.6.7"
   },
   "devDependencies": {
     "typescript": "^5.3.3",
@@ -72,13 +72,28 @@ interface Props {
 const MessageText: React.FC<Props> = ({ text, style }) => {
   const segments = React.useMemo(() => tokenize(text), [text]);
   return (
-    <Text style={style} selectable>
+    <Text
+      style={style}
+      selectable
+      // dataDetectorType is Android-only and additionally makes phone/URL/email
+      // clickable via system detection, as a fallback in case our regex
+      // tokens don't match.
+      dataDetectorType="all"
+    >
       {segments.map((seg, i) => {
         if (seg.kind === 'text') {
-          return <Text key={i}>{seg.text}</Text>;
+          return <Text key={i} selectable>{seg.text}</Text>;
         }
         return (
-          <Text key={i} style={LINK_STYLE} onPress={() => onPress(seg)}>
+          <Text
+            key={i}
+            selectable
+            style={LINK_STYLE}
+            onPress={() => onPress(seg)}
+            // Long-press should pass through to the parent for selection
+            onLongPress={undefined}
+            suppressHighlighting={false}
+          >
             {seg.text}
           </Text>
         );
@@ -25,6 +25,7 @@ import RNFS from 'react-native-fs';
 import rvs, { RVSMessage, ConnectionState } from '../services/rvs';
 import audioService from '../services/audio';
 import wakeWordService from '../services/wakeword';
+import phoneCallService from '../services/phoneCall';
 import updateService from '../services/updater';
 import VoiceButton from '../components/VoiceButton';
 import FileUpload, { FileData } from '../components/FileUpload';

@@ -104,6 +105,8 @@ const ChatScreen: React.FC = () => {
   const [showCameraUpload, setShowCameraUpload] = useState(false);
   const [gpsEnabled, setGpsEnabled] = useState(false);
   const [wakeWordActive, setWakeWordActive] = useState(false);
+  // Precise state (off/armed/conversing) for UI feedback on the button
+  const [wakeWordState, setWakeWordState] = useState<'off' | 'armed' | 'conversing'>('off');
   const [fullscreenImage, setFullscreenImage] = useState<string | null>(null);
   const [searchQuery, setSearchQuery] = useState('');
   const [searchVisible, setSearchVisible] = useState(false);

@@ -154,6 +157,24 @@ const ChatScreen: React.FC = () => {
   // Wake word: load once + prepare Porcupine (if an access key is set)
   useEffect(() => {
     wakeWordService.loadFromStorage().catch(() => {});
+    const unsub = wakeWordService.onStateChange((s) => {
+      setWakeWordState(s);
+      setWakeWordActive(s !== 'off');
+      // Couple conversation focus to the wake-word state: while we are
+      // actively in a dialog, Spotify should stay paused (including across
+      // render pauses and between replies). As soon as we fall back to
+      // 'armed' or 'off', Spotify may resume.
+      if (s === 'conversing') audioService.acquireConversationFocus();
+      else audioService.releaseConversationFocus();
+    });
+    return () => unsub();
+  }, []);
+
+  // Call detection: pause TTS when the phone rings
+  useEffect(() => {
+    phoneCallService.start().catch(err =>
+      console.warn('[Chat] phoneCall.start fehlgeschlagen', err));
+    return () => { phoneCallService.stop().catch(() => {}); };
   }, []);
 
   // Keep ttsCanPlayRef up to date; the closure in onMessage below reads
@@ -263,15 +284,35 @@ const ChatScreen: React.FC = () => {
     if (message.type === 'chat') {
       const sender = (message.payload.sender as string) || '';
 
-      // STT result: write the transcribed text into the voice bubble
+      // STT result: write the transcribed text into the voice bubble.
+      // IMPORTANT: only match the FIRST still-unresolved recording. Otherwise,
+      // with two audios sent shortly after each other, both bubbles would get
+      // the same text (bug: the second reply overwrites the first).
       if (sender === 'stt') {
         const sttText = (message.payload.text as string) || '';
         if (sttText) {
-          setMessages(prev => prev.map(m =>
-            m.sender === 'user' && m.text.includes('Spracheingabe wird verarbeitet')
-              ? { ...m, text: `\uD83C\uDFA4 ${sttText}` }
-              : m
-          ));
+          setMessages(prev => {
+            const idx = prev.findIndex(m =>
+              m.sender === 'user' && m.text.includes('Spracheingabe wird verarbeitet')
+            );
+            const newText = `\uD83C\uDFA4 ${sttText}`;
+            if (idx < 0) {
+              // Defensive: if there is no placeholder in the state (e.g.
+              // because it was never added or was already lost to another
+              // update), insert the voice message as a new bubble anyway.
+              // Otherwise ARIA's reply arrives without a visible user message.
+              return capMessages([...prev, {
+                id: nextId(),
+                sender: 'user',
+                text: newText,
+                timestamp: message.timestamp,
+                attachments: [{ type: 'audio', name: 'Sprachaufnahme' }],
+              }]);
+            }
+            const next = prev.slice();
+            next[idx] = { ...next[idx], text: newText };
+            return next;
+          });
         }
         return;
       }
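To see why the previous `map`-based update was buggy, here is a stripped-down illustration (hypothetical helper names, not app code): with two unresolved voice bubbles in the state, `map` stamps the transcript into both, while the `findIndex` variant resolves only the oldest one:

```typescript
type Msg = { sender: string; text: string };
const PLACEHOLDER = 'Spracheingabe wird verarbeitet';

// Old behavior: every placeholder gets the transcript.
function applySttOld(msgs: Msg[], stt: string): Msg[] {
  return msgs.map(m =>
    m.sender === 'user' && m.text.includes(PLACEHOLDER) ? { ...m, text: stt } : m);
}

// New behavior: only the first unresolved placeholder is filled.
function applySttNew(msgs: Msg[], stt: string): Msg[] {
  const idx = msgs.findIndex(m => m.sender === 'user' && m.text.includes(PLACEHOLDER));
  if (idx < 0) return msgs;
  const next = msgs.slice();
  next[idx] = { ...next[idx], text: stt };
  return next;
}

const pending: Msg[] = [
  { sender: 'user', text: PLACEHOLDER },
  { sender: 'user', text: PLACEHOLDER },
];
console.log(applySttOld(pending, 'erste Aufnahme')); // both bubbles get the text
console.log(applySttNew(pending, 'erste Aufnahme')); // only the first one does
```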
@@ -572,6 +613,8 @@ const ChatScreen: React.FC = () => {
     };
     setMessages(prev => capMessages([...prev, userMsg]));
 
+    console.log('[Chat] sende mit voice=%s speed=%s',
+      localXttsVoiceRef.current || '(default)', ttsSpeedRef.current);
     // Send to RVS, with the device-local voice (the bridge uses it for the reply)
     rvs.send('chat', {
       text,

@@ -603,6 +646,8 @@ const ChatScreen: React.FC = () => {
       base64: result.base64,
       durationMs: result.durationMs,
       mimeType: result.mimeType,
+      voice: localXttsVoiceRef.current,
+      speed: ttsSpeedRef.current,
       ...(location && { location }),
     });
   }, [getCurrentLocation]);
@@ -1000,7 +1045,10 @@ const ChatScreen: React.FC = () => {
               style={[styles.wakeWordBtn, wakeWordActive && styles.wakeWordBtnActive]}
               onPress={toggleWakeWord}
             >
-              <Text style={styles.wakeWordIcon}>{wakeWordActive ? '👂' : '🔇'}</Text>
+              <Text style={styles.wakeWordIcon}>
+                {wakeWordState === 'conversing' ? '🎙️' :
+                 wakeWordState === 'armed' ? '👂' : '🔇'}
+              </Text>
             </TouchableOpacity>
           </>
         )}
@@ -41,9 +41,9 @@ import {
   TTS_SPEED_STORAGE_KEY,
 } from '../services/audio';
 import wakeWordService, {
-  BUILTIN_KEYWORDS,
+  WAKE_KEYWORDS,
+  KEYWORD_LABELS,
   DEFAULT_KEYWORD,
-  WAKE_ACCESS_KEY_STORAGE,
   WAKE_KEYWORD_STORAGE,
 } from '../services/wakeword';
 import ModeSelector from '../components/ModeSelector';
@@ -103,8 +103,6 @@ const SettingsScreen: React.FC = () => {
   const [vadSilenceSec, setVadSilenceSec] = useState<number>(VAD_SILENCE_DEFAULT_SEC);
   const [convWindowSec, setConvWindowSec] = useState<number>(CONV_WINDOW_DEFAULT_SEC);
   const [ttsSpeed, setTtsSpeed] = useState<number>(TTS_SPEED_DEFAULT);
-  const [wakeAccessKey, setWakeAccessKey] = useState<string>('');
-  const [wakeAccessKeyVisible, setWakeAccessKeyVisible] = useState(false);
   const [wakeKeyword, setWakeKeyword] = useState<string>(DEFAULT_KEYWORD);
   const [wakeStatus, setWakeStatus] = useState<string>('');
   const [editingPath, setEditingPath] = useState(false);
@@ -164,11 +162,8 @@ const SettingsScreen: React.FC = () => {
       if (isFinite(n) && n >= TTS_SPEED_MIN && n <= TTS_SPEED_MAX) setTtsSpeed(n);
     }
   });
-  AsyncStorage.getItem(WAKE_ACCESS_KEY_STORAGE).then(saved => {
-    if (saved) setWakeAccessKey(saved);
-  });
   AsyncStorage.getItem(WAKE_KEYWORD_STORAGE).then(saved => {
-    if (saved) setWakeKeyword(saved);
+    if (saved && (WAKE_KEYWORDS as readonly string[]).includes(saved)) setWakeKeyword(saved);
   });
   AsyncStorage.getItem('aria_xtts_voice').then(saved => {
     if (saved) setXttsVoice(saved);
@@ -678,44 +673,23 @@ const SettingsScreen: React.FC = () => {
           </View>
         </View>
 
-        {/* === Wake-Word (geraetelokal) === */}
+        {/* === Wake-Word (komplett on-device, openWakeWord) === */}
         <Text style={styles.sectionTitle}>Wake-Word</Text>
         <View style={styles.card}>
           <Text style={styles.toggleHint}>
-            Wenn ein Picovoice-Access-Key eingetragen ist, hoert die App passiv
-            auf das gewaehlte Wake-Word — du kannst dich mit anderen unterhalten,
-            Musik laufen lassen und mit "{wakeKeyword}" eine Konversation mit
-            ARIA starten. Ohne Key oder bei Fehlschlag startet das Ohr direkt
-            eine Konversation (klassischer Modus).
+            Lokale Erkennung via openWakeWord (ONNX, on-device). Kein API-Key,
+            kein Cloud-Roundtrip — Audio verlaesst das Geraet nicht. Wenn das Ohr
+            aktiv ist, hoerst du normal mit; sagst du das Wake-Word, startet eine
+            Konversation mit ARIA.
           </Text>
 
-          <Text style={[styles.toggleLabel, {marginTop: 16}]}>Picovoice Access Key</Text>
-          <View style={{flexDirection: 'row', alignItems: 'center', gap: 8, marginTop: 6}}>
-            <TextInput
-              style={[styles.input, {flex: 1}]}
-              value={wakeAccessKey}
-              onChangeText={setWakeAccessKey}
-              placeholder="kostenlos auf console.picovoice.ai"
-              placeholderTextColor="#666680"
-              secureTextEntry={!wakeAccessKeyVisible}
-              autoCapitalize="none"
-              autoCorrect={false}
-            />
-            <TouchableOpacity
-              onPress={() => setWakeAccessKeyVisible(v => !v)}
-              style={{padding: 8}}
-            >
-              <Text style={{fontSize: 18}}>{wakeAccessKeyVisible ? '🙈' : '👁'}</Text>
-            </TouchableOpacity>
-          </View>
-
           <Text style={[styles.toggleLabel, {marginTop: 16}]}>Wake-Word</Text>
           <Text style={styles.toggleHint}>
-            Built-In: sofort verwendbar. "ARIA" als Custom-Keyword kommt spaeter
-            ueber Diagnostic-Upload.
+            Eigene Wake-Words via openWakeWord-Notebook trainierbar (gratis).
+            Custom-Upload ueber Diagnostic kommt in einer spaeteren Version.
           </Text>
           <View style={{flexDirection: 'row', flexWrap: 'wrap', gap: 6, marginTop: 8}}>
-            {BUILTIN_KEYWORDS.map(kw => (
+            {WAKE_KEYWORDS.map(kw => (
               <TouchableOpacity
                 key={kw}
                 style={[
@@ -728,7 +702,7 @@ const SettingsScreen: React.FC = () => {
                   styles.keywordChipText,
                   wakeKeyword === kw && styles.keywordChipTextActive,
                 ]}>
-                  {kw}
+                  {KEYWORD_LABELS[kw]}
                 </Text>
               </TouchableOpacity>
             ))}
@@ -740,8 +714,8 @@ const SettingsScreen: React.FC = () => {
             onPress={async () => {
               setWakeStatus('Initialisiere...');
               try {
-                const ok = await wakeWordService.configure(wakeAccessKey, wakeKeyword);
-                setWakeStatus(ok ? `✅ "${wakeKeyword}" bereit` : '❌ Fehlgeschlagen — Access Key pruefen');
+                const ok = await wakeWordService.configure(wakeKeyword);
+                setWakeStatus(ok ? `✅ "${KEYWORD_LABELS[wakeKeyword as keyof typeof KEYWORD_LABELS]}" bereit` : '❌ Init-Fehler — Logs pruefen');
               } catch (err: any) {
                 setWakeStatus('❌ ' + String(err?.message || err).slice(0, 80));
               }
@ -191,6 +191,19 @@ class AudioService {
|
||||||
private pcmBytesCollected: number = 0;
|
private pcmBytesCollected: number = 0;
|
||||||
private readonly PCM_MAX_CACHE_BYTES = 30 * 1024 * 1024; // 30MB
|
private readonly PCM_MAX_CACHE_BYTES = 30 * 1024 * 1024; // 30MB
|
||||||
|
|
||||||
|
// AudioFocus wird verzoegert freigegeben — wenn ARIA eine zweite Antwort
|
||||||
|
// direkt hinterherschickt (oder ein neuer Stream startet), bleibt Spotify
|
||||||
|
// pausiert. Ohne diese Verzoegerung springt Spotify im Mikro-Sekunden-Gap
|
||||||
|
// zwischen zwei Streams kurz wieder an.
|
||||||
|
private focusReleaseTimer: ReturnType<typeof setTimeout> | null = null;
|
||||||
|
private readonly FOCUS_RELEASE_DELAY_MS = 800;
|
||||||
|
|
||||||
|
// Conversation-Mode: solange aktiv (Wake-Word Status 'conversing' ODER
|
||||||
|
// wir wissen "ARIA spricht gerade in einem Multi-Turn-Dialog"), halten wir
|
||||||
|
// den AudioFocus DAUERHAFT. Der per-Stream-Release wird unterdrueckt,
|
||||||
|
// damit Spotify nicht in Render-Pausen oder zwischen Antworten zurueckkehrt.
|
||||||
|
private _conversationFocusActive: boolean = false;
|
||||||
|
|
||||||
// VAD State
|
// VAD State
|
||||||
private vadEnabled: boolean = false;
|
private vadEnabled: boolean = false;
|
||||||
private lastSpeechTime: number = 0;
|
private lastSpeechTime: number = 0;
|
||||||
|
|
@@ -205,6 +218,58 @@ class AudioService {
     this.recorder.setSubscriptionDuration(0.1); // 100ms Metering-Updates
   }

+  /** Release AudioFocus with a small delay: otherwise Spotify/YouTube
+   * briefly resume in the gap between two TTS streams (or when ARIA
+   * sends a second answer right away).
+   * In conversation mode (wake-word 'conversing') the release is
+   * suppressed entirely; focus is held for the whole conversation. */
+  private _releaseFocusDeferred(): void {
+    if (this._conversationFocusActive) {
+      this._cancelDeferredFocusRelease();
+      return;
+    }
+    this._cancelDeferredFocusRelease();
+    this.focusReleaseTimer = setTimeout(() => {
+      this.focusReleaseTimer = null;
+      if (this._conversationFocusActive) return;
+      AudioFocus?.release().catch(() => {});
+    }, this.FOCUS_RELEASE_DELAY_MS);
+  }
+
+  private _cancelDeferredFocusRelease(): void {
+    if (this.focusReleaseTimer) {
+      clearTimeout(this.focusReleaseTimer);
+      this.focusReleaseTimer = null;
+    }
+  }
+
+  /** Conversation mode starts → hold AudioFocus permanently (Spotify
+   * stays paused). Idempotent: calling it multiple times is safe. */
+  acquireConversationFocus(): void {
+    if (this._conversationFocusActive) return;
+    this._conversationFocusActive = true;
+    this._cancelDeferredFocusRelease();
+    console.log('[Audio] Conversation-Focus aktiv (Spotify bleibt gepaust)');
+    AudioFocus?.requestDuck().catch(() => {});
+  }
+
+  /** Conversation mode ends → focus may be released again (deferred, so
+   * an immediately following answer does not break anything). */
+  releaseConversationFocus(): void {
+    if (!this._conversationFocusActive) return;
+    this._conversationFocusActive = false;
+    console.log('[Audio] Conversation-Focus inaktiv');
+    this._releaseFocusDeferred();
+  }
+
+  /** Hard-stop TTS playback, e.g. when a phone call comes in.
+   * Also releases AudioFocus immediately so the ringtone is audible. */
+  haltAllPlayback(reason: string = ''): void {
+    console.log('[Audio] haltAllPlayback: %s', reason || '(no reason)');
+    this._conversationFocusActive = false;
+    this.stopPlayback();
+  }
+
   // --- Permissions ---

   async requestMicrophonePermission(): Promise<boolean> {
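The deferred-release logic above is a generic pattern worth seeing in isolation: cancel any pending release when focus is needed again, suppress per-stream releases while a conversation holds the focus, and release only after the grace period has elapsed with no new consumer. The sketch below is a minimal, self-contained variant; `FocusLike` and `DeferredFocus` are illustrative names, not part of this diff.

```ts
// Minimal sketch of the deferred AudioFocus release used above.
// FocusLike and DeferredFocus are illustrative names, not from the diff.
interface FocusLike {
  request(): Promise<void>;
  release(): Promise<void>;
}

class DeferredFocus {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private held = false; // conversation mode: suppress releases entirely

  constructor(private focus: FocusLike, private delayMs = 800) {}

  acquire(): void {
    this.cancel(); // a new consumer aborts any pending release
    this.focus.request().catch(() => {});
  }

  hold(): void { this.held = true; this.cancel(); }
  unhold(): void { this.held = false; this.releaseDeferred(); }

  releaseDeferred(): void {
    if (this.held) { this.cancel(); return; }
    this.cancel();
    this.timer = setTimeout(() => {
      this.timer = null;
      if (!this.held) this.focus.release().catch(() => {});
    }, this.delayMs);
  }

  private cancel(): void {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
  }
}
```

The 800ms delay is a trade-off: long enough to bridge the gap between back-to-back TTS streams, short enough that music resumes promptly once ARIA is actually done.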
@@ -305,6 +370,7 @@ class AudioService {
     this.setState('recording');

     // Pause other apps while recording (music, videos etc.)
+    this._cancelDeferredFocusRelease();
     AudioFocus?.requestExclusive().catch(() => {});

     // Enable VAD; silence duration comes from AsyncStorage (configurable in Settings).
@@ -328,11 +394,12 @@ class AudioService {
     };
     if (autoStop) {
       const vadSilenceMs = await loadVadSilenceMs();
-      console.log('[Audio] VAD-Stille:', vadSilenceMs, 'ms');
+      console.log('[Audio] startRecording: autoStop=true, VAD-Stille=%dms, MAX=%dms',
+        vadSilenceMs, MAX_RECORDING_MS);
       this.vadTimer = setInterval(() => {
         const silenceDuration = Date.now() - this.lastSpeechTime;
         if (silenceDuration >= vadSilenceMs) {
-          fireSilenceOnce(`VAD ${silenceDuration}ms Stille`);
+          fireSilenceOnce(`VAD ${silenceDuration}ms Stille (Schwelle=${vadSilenceMs}ms)`);
         }
       }, 200);
       // Emergency brake: force-stop after MAX_RECORDING_MS
@@ -386,8 +453,9 @@ class AudioService {
     await this.recorder.stopRecorder();
     this.recorder.removeRecordBackListener();

-    // Release AudioFocus so other apps may resume
-    AudioFocus?.release().catch(() => {});
+    // Release AudioFocus with a delay: the TTS answer is coming up,
+    // and Spotify must not pop up in the gap.
+    this._releaseFocusDeferred();

     const durationMs = Date.now() - this.recordingStartTime;
     const hadSpeech = this.speechDetected;
@@ -459,7 +527,13 @@ class AudioService {

   /** Receive one PCM chunk from an audio_pcm message.
    * silent=true → only cache, don't play back (e.g. when TTS is muted on this device).
-   * Returns the cache path (file://) on final=true, or '' if not cached. */
+   * Returns the cache path (file://) on final=true, or '' if not cached.
+   *
+   * The wrapper serializes consecutive chunk calls through a promise queue,
+   * because short streams hit a race otherwise: the final chunk could call
+   * `end()` BEFORE the previous `start()` had finished in the native module.
+   * The writer thread then saw endRequested=true without ever processing
+   * any chunks. */
+  private _pcmChunkQueue: Promise<any> = Promise.resolve();
   async handlePcmChunk(payload: {
     base64: string;
     sampleRate?: number;
@@ -468,6 +542,24 @@ class AudioService {
     chunk?: number;
     final?: boolean;
     silent?: boolean;
+  }): Promise<string> {
+    const p = this._pcmChunkQueue.then(() => this._handlePcmChunkImpl(payload)).catch(err => {
+      console.warn('[Audio] handlePcmChunk queued err:', err);
+      return '';
+    });
+    // Chain only on the side effect — callers still get the per-call result
+    this._pcmChunkQueue = p;
+    return p;
+  }
+
+  private async _handlePcmChunkImpl(payload: {
+    base64: string;
+    sampleRate?: number;
+    channels?: number;
+    messageId?: string;
+    chunk?: number;
+    final?: boolean;
+    silent?: boolean;
   }): Promise<string> {
     const silent = !!payload.silent;
     if (!silent && !PcmStreamPlayer) {
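The `_pcmChunkQueue` wrapper above is an instance of a small, reusable idiom: serialize async calls by chaining each onto the previous promise while still returning each caller its own result. A generic sketch (the `serialize` helper is illustrative, not part of the diff):

```ts
// Generic form of the promise-queue serialization used by handlePcmChunk.
// `serialize` is an assumed helper name, not from the diff.
function serialize<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  onError: (err: unknown) => R,
): (...args: A) => Promise<R> {
  let queue: Promise<unknown> = Promise.resolve();
  return (...args: A) => {
    // Each call waits for the previous one, even if that one failed.
    const p = queue.then(() => fn(...args)).catch(onError);
    queue = p; // chain on the settled result; callers still get their own promise
    return p;
  };
}

// Usage: chunks already arrive in order, but start()/end() in the native
// module must never overlap, so the handler is funneled through the queue.
const handleChunk = serialize(
  async (chunk: { final?: boolean }) => (chunk.final ? 'file://cache' : ''),
  () => '',
);
```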
@@ -510,6 +602,7 @@ class AudioService {
       this.pcmStreamActive = false;
       return '';
     }
+    this._cancelDeferredFocusRelease();
     AudioFocus?.requestDuck().catch(() => {});
   }
 }
@@ -528,11 +621,12 @@ class AudioService {
     if (isFinal) {
       if (!silent) {
-        // end() only resolves once the native writer thread is done
-        // (all samples played out); only then release AudioFocus, so
-        // Spotify/YouTube don't turn back up during the pre-roll tail.
+        // end() only resolves once the native writer thread is done
+        // (all samples played out); then release AudioFocus with a
+        // delay, so Spotify/YouTube don't turn back up in the micro-gap
+        // between two ARIA answers. If a new stream starts within
+        // FOCUS_RELEASE_DELAY_MS, the release is cancelled.
         try { await PcmStreamPlayer!.end(); } catch {}
-        AudioFocus?.release().catch(() => {});
+        this._releaseFocusDeferred();
       }
       this.pcmStreamActive = false;
@@ -636,8 +730,9 @@ class AudioService {
   private async _playNext(): Promise<void> {
     if (this.audioQueue.length === 0) {
       this.isPlaying = false;
-      // Give up AudioFocus → other apps back to full volume
-      AudioFocus?.release().catch(() => {});
+      // Give up AudioFocus with a delay: if another answer follows right
+      // away, Spotify stays paused.
+      this._releaseFocusDeferred();
       // All audio parts played → notify listeners
       this.playbackFinishedListeners.forEach(cb => cb());
       return;
@@ -645,6 +740,7 @@ class AudioService {

     // On first playback start: duck other apps
     if (!this.isPlaying) {
+      this._cancelDeferredFocusRelease();
       AudioFocus?.requestDuck().catch(() => {});
     }
     this.isPlaying = true;
@@ -730,7 +826,8 @@ class AudioService {
       this.pcmBytesCollected = 0;
       this.pcmMessageId = '';
     }
-    // Release AudioFocus
+    // Release AudioFocus immediately: the user cancelled explicitly
+    this._cancelDeferredFocusRelease();
     AudioFocus?.release().catch(() => {});
   }
@@ -0,0 +1,108 @@
+/**
+ * PhoneCall service: pauses TTS playback when the phone rings or a call
+ * is active. Native binding to PhoneCallModule.kt.
+ *
+ * On "ringing" or "offhook", audioService.haltAllPlayback() is called and
+ * ARIA goes silent immediately. After hang-up nothing happens automatically
+ * (audio does not come back); the user has to request the answer again
+ * manually (play button on the message).
+ *
+ * The READ_PHONE_STATE permission must be granted once by the user; if it
+ * is not, start() fails silently and everything else works as before.
+ */
+
+import {
+  NativeEventEmitter,
+  NativeModules,
+  PermissionsAndroid,
+  Platform,
+  ToastAndroid,
+} from 'react-native';
+import audioService from './audio';
+
+interface PhoneCallNative {
+  start(): Promise<boolean>;
+  stop(): Promise<boolean>;
+}
+
+const { PhoneCall } = NativeModules as { PhoneCall?: PhoneCallNative };
+
+type PhoneState = 'idle' | 'ringing' | 'offhook';
+
+class PhoneCallService {
+  private started: boolean = false;
+  private subscription: { remove: () => void } | null = null;
+  private lastState: PhoneState = 'idle';
+
+  async start(): Promise<boolean> {
+    if (this.started || !PhoneCall) return false;
+    if (Platform.OS !== 'android') return false;
+
+    // Request the runtime permission (only needed once)
+    try {
+      const granted = await PermissionsAndroid.request(
+        PermissionsAndroid.PERMISSIONS.READ_PHONE_STATE,
+        {
+          title: 'ARIA Cockpit — Anruf-Erkennung',
+          message: 'Damit ARIA bei einem eingehenden Anruf nicht weiterredet, '
+            + 'darf die App den Anruf-Status sehen (Klingeln/Aktiv/Aufgelegt). '
+            + 'Es werden keine Anrufdaten gelesen oder gespeichert.',
+          buttonPositive: 'Erlauben',
+          buttonNegative: 'Spaeter',
+        },
+      );
+      if (granted !== PermissionsAndroid.RESULTS.GRANTED) {
+        console.warn('[PhoneCall] READ_PHONE_STATE Permission abgelehnt');
+        return false;
+      }
+    } catch (err) {
+      console.warn('[PhoneCall] Permission-Anfrage gescheitert', err);
+    }
+
+    try {
+      const ok = await PhoneCall.start();
+      if (!ok) {
+        console.warn('[PhoneCall] Native start() lieferte false (Permission?)');
+        return false;
+      }
+      const emitter = new NativeEventEmitter(NativeModules.PhoneCall as any);
+      this.subscription = emitter.addListener('PhoneCallStateChanged', (e: { state: PhoneState }) => {
+        this._onStateChanged(e.state);
+      });
+      this.started = true;
+      console.log('[PhoneCall] Listener aktiv');
+      return true;
+    } catch (err: any) {
+      console.warn('[PhoneCall] start gescheitert:', err?.message || err);
+      return false;
+    }
+  }
+
+  async stop(): Promise<void> {
+    if (!this.started || !PhoneCall) return;
+    try {
+      this.subscription?.remove();
+      this.subscription = null;
+      await PhoneCall.stop();
+    } catch {}
+    this.started = false;
+    this.lastState = 'idle';
+  }
+
+  private _onStateChanged(state: PhoneState): void {
+    if (state === this.lastState) return;
+    console.log('[PhoneCall] State: %s → %s', this.lastState, state);
+    this.lastState = state;
+    if (state === 'ringing' || state === 'offhook') {
+      audioService.haltAllPlayback(`Telefon-State: ${state}`);
+      ToastAndroid.show(
+        state === 'ringing' ? 'Anruf — ARIA pausiert' : 'Im Gespraech — ARIA pausiert',
+        ToastAndroid.SHORT,
+      );
+    }
+    // idle: nothing automatic; the user must not re-trigger anything unintentionally
+  }
+}
+
+const phoneCallService = new PhoneCallService();
+export default phoneCallService;
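The service is safe to wire up unconditionally because every failure mode degrades to a no-op. A sketch of the wiring; the hook name and import path are assumptions, not part of this diff:

```ts
// Hypothetical wiring for phoneCallService; hook name and path are illustrative.
import { useEffect } from 'react';
import phoneCallService from './services/phonecall';

export function usePhoneCallGuard(): void {
  useEffect(() => {
    // Safe to call unconditionally: start() no-ops on iOS, when the
    // native module is missing, or when the permission is denied.
    phoneCallService.start().catch(() => {});
    return () => {
      phoneCallService.stop().catch(() => {});
    };
  }, []);
}
```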
@@ -29,6 +29,11 @@ class UpdateService {
   private downloading = false;

   constructor() {
+    // On startup, sweep old APK leftovers out of the cache: if this app is
+    // running, any previously downloaded APK was either installed already or
+    // left incomplete. Otherwise every update costs another 20-30MB on the phone.
+    this.cleanupOldApks().catch(() => {});
+
     // Listen for update_available messages
     rvs.onMessage((msg: RVSMessage) => {
       if (msg.type === 'update_available' as any) {
@@ -45,6 +50,30 @@ class UpdateService {
     });
   }

+  /** Sweeps old downloaded APK files out of the cache. */
+  private async cleanupOldApks(): Promise<void> {
+    try {
+      const files = await RNFS.readDir(RNFS.CachesDirectoryPath);
+      const apks = files.filter(f => /\.apk$/i.test(f.name));
+      let freed = 0;
+      for (const f of apks) {
+        try {
+          const size = parseInt(f.size as any, 10) || 0;
+          await RNFS.unlink(f.path);
+          freed += size;
+          console.log(`[Update] Alte APK geloescht: ${f.name} (${(size / 1024 / 1024).toFixed(1)}MB)`);
+        } catch (err: any) {
+          console.warn(`[Update] APK-Loeschen fehlgeschlagen: ${f.name} (${err?.message || err})`);
+        }
+      }
+      if (apks.length > 0) {
+        console.log(`[Update] Cleanup fertig: ${apks.length} APKs entfernt, ${(freed / 1024 / 1024).toFixed(1)}MB freigegeben`);
+      }
+    } catch (err: any) {
+      console.warn(`[Update] Cleanup-Fehler: ${err?.message || err}`);
+    }
+  }
+
   /** Check for an update at app start */
   checkForUpdate(): void {
     if (this.checking) return;
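The sweep runs at two call sites, on construction and right before a new download, and both share the same method. If more cache types ever need the same treatment, the pattern generalizes to one small helper; the sketch below is such a generalization (`sweepCache` is an assumed name, not from this diff):

```ts
// Generalized sketch of cleanupOldApks: delete cached files matching a
// pattern and report the freed bytes. `sweepCache` is an assumed helper.
import RNFS from 'react-native-fs';

async function sweepCache(pattern: RegExp): Promise<number> {
  let freed = 0;
  try {
    const files = await RNFS.readDir(RNFS.CachesDirectoryPath);
    for (const f of files.filter(f => pattern.test(f.name))) {
      try {
        const size = parseInt(String(f.size), 10) || 0;
        await RNFS.unlink(f.path);
        freed += size;
      } catch {
        // a file still in use (e.g. an installer reading it) is skipped
      }
    }
  } catch {
    // cache dir unreadable: nothing to free
  }
  return freed;
}

// sweepCache(/\.apk$/i) reproduces the APK sweep above.
```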
@@ -111,6 +140,10 @@ class UpdateService {
       });
     });

+    // Sweep old APKs from the cache before writing, in case several
+    // updates are pulled within one session
+    await this.cleanupOldApks();
+
     // Save the Base64 payload as an APK file
     const destPath = `${RNFS.CachesDirectoryPath}/${apkData.fileName}`;
     await RNFS.writeFile(destPath, apkData.base64, 'base64');
@@ -1,21 +1,26 @@
 /**
  * Gespraechsmodus / Wake Word Service
  *
+ * Wake-word engine: openWakeWord (https://github.com/dscripka/openWakeWord),
+ * fully on-device via ONNX Runtime in native Kotlin (see
+ * OpenWakeWordModule.kt + assets/openwakeword/). No API key, no cloud
+ * round-trip, not a cent of licensing fees.
+ *
  * Three states:
  *   off        — ear off, nothing running
- *   armed      — ear active, Porcupine listens passively for the wake word.
- *                The mic is held by Porcupine; the AudioRecorder is off.
- *   conversing — wake word triggered (or ear tap without wake word):
- *                active conversation. Porcupine pauses (frees the mic),
+ *   armed      — ear active, openWakeWord listens passively for the wake word.
+ *                The mic is held by OpenWakeWord; the AudioRecorder is off.
+ *   conversing — wake word triggered (or ear tap, manually):
+ *                active conversation. OpenWakeWord pauses (frees the mic),
  *                the AudioRecorder takes over for recording.
  *                After each ARIA answer the mic opens for X seconds
  *                (conversation window). Silence in the window → back to armed.
  *
- * Wake-word fallback: if no Picovoice access key is set, 'start' goes
- * straight to 'conversing' (classic conversation mode). 'endConversation'
- * then goes to 'off' instead of 'armed'.
+ * If the native module is unavailable (old app version, ONNX init error),
+ * 'start' goes straight to 'conversing' (classic direct-recording mode).
  */

+import { NativeEventEmitter, NativeModules, ToastAndroid } from 'react-native';
 import AsyncStorage from '@react-native-async-storage/async-storage';

 type WakeWordCallback = () => void;
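Condensed, the doc comment above describes a small state machine. The sketch below distills its transitions into one function as a reading aid; the names mirror the service, but this is not code from the diff:

```ts
// Reading aid: the transition logic from the doc comment above, condensed.
type WakeWordState = 'off' | 'armed' | 'conversing';
type WakeEvent = 'earTap' | 'wakeDetected' | 'windowSilence' | 'earOff';

function nextState(state: WakeWordState, ev: WakeEvent, nativeReady: boolean): WakeWordState {
  switch (ev) {
    case 'earTap':        // no-op unless off; fallback goes straight to conversing
      return state !== 'off' ? state : nativeReady ? 'armed' : 'conversing';
    case 'wakeDetected':  // only meaningful while passively listening
      return state === 'armed' ? 'conversing' : state;
    case 'windowSilence': // conversation window expired without speech
      return state === 'conversing' ? (nativeReady ? 'armed' : 'off') : state;
    case 'earOff':
      return 'off';
  }
}
```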
@@ -23,105 +28,113 @@ type StateCallback = (state: WakeWordState) => void;

 export type WakeWordState = 'off' | 'armed' | 'conversing';

-export const WAKE_ACCESS_KEY_STORAGE = 'aria_wake_access_key';
 export const WAKE_KEYWORD_STORAGE = 'aria_wake_keyword';

-/** Built-in keywords from Picovoice: pre-trained, usable immediately.
- * Custom keywords (e.g. "ARIA") need a .ppn file from the Picovoice
- * Console; uploadable via Diagnostic later. */
-export const BUILTIN_KEYWORDS = [
-  'jarvis',
+/** Available wake words; they correspond to the .onnx files in
+ * android/app/src/main/assets/openwakeword/. Custom keywords (own
+ * training via the openwakeword notebook) currently have to be bundled
+ * as assets; Diagnostic upload is phase 2. */
+export const WAKE_KEYWORDS = [
+  'hey_jarvis',
   'computer',
-  'picovoice',
-  'porcupine',
-  'bumblebee',
-  'terminator',
   'alexa',
-  'hey google',
-  'ok google',
-  'hey siri',
+  'hey_mycroft',
+  'hey_rhasspy',
 ] as const;
-export type BuiltinKeyword = typeof BUILTIN_KEYWORDS[number];
-export const DEFAULT_KEYWORD: BuiltinKeyword = 'jarvis';
+export type WakeKeyword = typeof WAKE_KEYWORDS[number];
+export const DEFAULT_KEYWORD: WakeKeyword = 'hey_jarvis';
+
+/** Helper mapping for display in the UI. */
+export const KEYWORD_LABELS: Record<WakeKeyword, string> = {
+  hey_jarvis: 'Hey Jarvis',
+  computer: 'Computer',
+  alexa: 'Alexa',
+  hey_mycroft: 'Hey Mycroft',
+  hey_rhasspy: 'Hey Rhasspy',
+};
+
+// Detection tuning; may become configurable in Settings later.
+const DEFAULT_THRESHOLD = 0.5;
+const DEFAULT_PATIENCE = 2;
+const DEFAULT_DEBOUNCE_MS = 1500;
+
+interface OpenWakeWordModule {
+  init(modelName: string, threshold: number, patience: number, debounceMs: number): Promise<boolean>;
+  start(): Promise<boolean>;
+  stop(): Promise<boolean>;
+  dispose(): Promise<boolean>;
+  isAvailable(): Promise<boolean>;
+}
+
+const { OpenWakeWord } = NativeModules as { OpenWakeWord?: OpenWakeWordModule };
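One useful property of pairing the `as const` array with `Record<WakeKeyword, string>`: adding a keyword without its label fails to compile. A sketch of adding a custom model; `hey_aria` is invented for illustration and is not among the bundled assets:

```ts
// Hypothetical extension: 'hey_aria' is an invented example, not a bundled model.
// Step 1: drop hey_aria.onnx into assets/openwakeword/.
// Step 2: extend the union; the compiler then forces step 3.
export const WAKE_KEYWORDS = [
  'hey_jarvis', 'computer', 'alexa', 'hey_mycroft', 'hey_rhasspy',
  'hey_aria',
] as const;
export type WakeKeyword = typeof WAKE_KEYWORDS[number];

// Step 3: Record<WakeKeyword, string> is exhaustive, so omitting the new
// label is a type error and UI chips can never render an undefined label.
export const KEYWORD_LABELS: Record<WakeKeyword, string> = {
  hey_jarvis: 'Hey Jarvis',
  computer: 'Computer',
  alexa: 'Alexa',
  hey_mycroft: 'Hey Mycroft',
  hey_rhasspy: 'Hey Rhasspy',
  hey_aria: 'Hey Aria',
};
```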
 class WakeWordService {
   private state: WakeWordState = 'off';
   private wakeCallbacks: WakeWordCallback[] = [];
   private stateCallbacks: StateCallback[] = [];

-  // Picovoice manager (lazy, since the native module is not in every build)
-  private porcupine: any = null;
-  private accessKey: string = '';
-  private keyword: string = DEFAULT_KEYWORD;
+  private keyword: WakeKeyword = DEFAULT_KEYWORD;
+  private nativeReady: boolean = false;
   private initInProgress: Promise<boolean> | null = null;
+  private eventSub: { remove: () => void } | null = null;

-  /** Call at app start: loads settings, builds Porcupine if a key is set. */
+  /** Call at app start: loads settings, initializes the native module. */
   async loadFromStorage(): Promise<void> {
     try {
-      const k = await AsyncStorage.getItem(WAKE_ACCESS_KEY_STORAGE);
       const w = await AsyncStorage.getItem(WAKE_KEYWORD_STORAGE);
-      this.accessKey = (k || '').trim();
-      this.keyword = (w || DEFAULT_KEYWORD).trim();
-      if (this.accessKey) {
-        // Pre-initialize; does not blow up if something is missing
-        await this.initPorcupine();
-      }
+      const wt = (w || DEFAULT_KEYWORD).trim() as WakeKeyword;
+      this.keyword = (WAKE_KEYWORDS as readonly string[]).includes(wt) ? wt : DEFAULT_KEYWORD;
+      await this.initNative();
     } catch (err) {
       console.warn('[WakeWord] loadFromStorage', err);
     }
   }

-  /** Settings change: new key or keyword. Re-init Porcupine. */
-  async configure(accessKey: string, keyword: string): Promise<boolean> {
-    this.accessKey = (accessKey || '').trim();
-    this.keyword = (keyword || DEFAULT_KEYWORD).trim();
-    await AsyncStorage.setItem(WAKE_ACCESS_KEY_STORAGE, this.accessKey);
-    await AsyncStorage.setItem(WAKE_KEYWORD_STORAGE, this.keyword);
-
-    // Stop the running instance
-    await this.disposePorcupine();
-    if (!this.accessKey) return false;
-
-    // Re-initialize
-    return this.initPorcupine();
+  /** Settings change: a different wake word. Re-init the native module. */
+  async configure(keyword: string): Promise<boolean> {
+    const next: WakeKeyword = (WAKE_KEYWORDS as readonly string[]).includes(keyword)
+      ? (keyword as WakeKeyword)
+      : DEFAULT_KEYWORD;
+    this.keyword = next;
+    await AsyncStorage.setItem(WAKE_KEYWORD_STORAGE, next);
+
+    // Stop the running instance + re-initialize
+    await this.disposeNative();
+    const ok = await this.initNative();
+    if (!ok) {
+      ToastAndroid.show(
+        `Wake-Word "${KEYWORD_LABELS[next]}" konnte nicht initialisiert werden — Logs pruefen`,
+        ToastAndroid.LONG,
+      );
+    }
+    return ok;
   }
-  private async initPorcupine(): Promise<boolean> {
+  private async initNative(): Promise<boolean> {
+    if (!OpenWakeWord) {
+      console.warn('[WakeWord] OpenWakeWord Native-Modul nicht verfuegbar — Direkt-Aufnahme-Fallback aktiv');
+      this.nativeReady = false;
+      return false;
+    }
     if (this.initInProgress) return this.initInProgress;
     this.initInProgress = (async () => {
       try {
-        const porcupineRN = require('@picovoice/porcupine-react-native');
-        const { PorcupineManager, BuiltInKeywords } = porcupineRN;
-        // Some Porcupine versions want the BuiltInKeywords enum (object
-        // with keys like JARVIS, COMPUTER, HEY_GOOGLE), others accept
-        // the string directly. Map with a string fallback:
-        const enumKey = this.keyword.toUpperCase().replace(/\s+/g, '_');
-        const kw = (BuiltInKeywords && BuiltInKeywords[enumKey]) || this.keyword;
-        console.log('[WakeWord] Porcupine init: keyword=%s (resolved=%s)',
-          this.keyword, typeof kw === 'string' ? kw : '[enum]');
-        this.porcupine = await PorcupineManager.fromBuiltInKeywords(
-          this.accessKey,
-          [kw],
-          (keywordIndex: number) => {
-            console.log('[WakeWord] Porcupine callback fired (index=%d)', keywordIndex);
-            this.onWakeDetected().catch(err =>
-              console.warn('[WakeWord] onWakeDetected crashed:', err));
-          },
-          // Error handler (when Porcupine crashes in a background thread,
-          // e.g. on an audio-engine conflict with audio-recorder-player)
-          (error: any) => {
-            console.warn('[WakeWord] Porcupine runtime error:', error?.message || error);
-            // Don't crash in a loop; state back to off so the user can
-            // work normally with the record button again
-            this.setState('off');
-            this.disposePorcupine().catch(() => {});
-          },
-        );
-        console.log('[WakeWord] Porcupine init OK (keyword=%s)', this.keyword);
+        await OpenWakeWord.init(this.keyword, DEFAULT_THRESHOLD, DEFAULT_PATIENCE, DEFAULT_DEBOUNCE_MS);
+        // Subscribe only once
+        if (!this.eventSub) {
+          const emitter = new NativeEventEmitter(NativeModules.OpenWakeWord);
+          this.eventSub = emitter.addListener('WakeWordDetected', () => {
+            console.log('[WakeWord] Native Detection-Event empfangen');
+            this.onWakeDetected().catch(err =>
+              console.warn('[WakeWord] onWakeDetected crashed:', err));
+          });
+        }
+        this.nativeReady = true;
+        console.log('[WakeWord] Init OK (model=%s)', this.keyword);
         return true;
-      } catch (err) {
-        console.warn('[WakeWord] Porcupine init fehlgeschlagen:', err);
-        this.porcupine = null;
+      } catch (err: any) {
+        console.warn('[WakeWord] Init fehlgeschlagen:', err?.message || err);
+        this.nativeReady = false;
         return false;
       } finally {
         this.initInProgress = null;
@@ -130,30 +143,39 @@ class WakeWordService {
     return this.initInProgress;
   }

-  private async disposePorcupine() {
-    if (this.porcupine) {
-      try { await this.porcupine.stop(); } catch {}
-      try { await this.porcupine.delete(); } catch {}
-      this.porcupine = null;
-    }
+  private async disposeNative(): Promise<void> {
+    if (!OpenWakeWord) return;
+    try { await OpenWakeWord.dispose(); } catch {}
+    this.nativeReady = false;
   }

   /** Ear button pressed: starts passive listening, or a conversation directly. */
   async start(): Promise<boolean> {
     if (this.state !== 'off') return true;
-    if (this.porcupine) {
-      // Passive listening via Porcupine
+    if (this.nativeReady && OpenWakeWord) {
       try {
-        await this.porcupine.start();
-        console.log('[WakeWord] armed — warte auf Wake Word "%s"', this.keyword);
+        await OpenWakeWord.start();
+        console.log('[WakeWord] armed — warte auf "%s"', this.keyword);
+        ToastAndroid.show(`Lausche auf "${KEYWORD_LABELS[this.keyword]}"`, ToastAndroid.SHORT);
         this.setState('armed');
         return true;
-      } catch (err) {
-        console.warn('[WakeWord] Porcupine start fehlgeschlagen — Fallback Direkt-Konversation:', err);
+      } catch (err: any) {
+        console.warn('[WakeWord] start fehlgeschlagen — Fallback Direkt-Aufnahme:',
+          err?.message || err);
+        ToastAndroid.show(
+          `Wake-Word-Start failed: ${err?.message || err}`,
+          ToastAndroid.LONG,
+        );
       }
+    } else {
+      console.warn('[WakeWord] Native-Modul nicht bereit — Direkt-Aufnahme-Fallback');
+      ToastAndroid.show(
+        'Wake-Word nicht aktiv — direkte Aufnahme startet (Mikro hoert mit)',
+        ToastAndroid.LONG,
+      );
     }
-    // Fallback: straight into the conversation
-    console.log('[WakeWord] Konversation startet sofort (kein Wake-Word)');
+    // Fallback: straight into a conversation
+    console.log('[WakeWord] Direkt-Aufnahme startet (kein Wake-Word)');
     this.setState('conversing');
     setTimeout(() => {
       if (this.state === 'conversing') {
@@ -166,20 +188,20 @@ class WakeWordService {
   /** Switch off completely (ear off) */
   async stop(): Promise<void> {
     console.log('[WakeWord] Ohr deaktiviert');
-    if (this.porcupine) {
-      try { await this.porcupine.stop(); } catch {}
+    if (this.nativeReady && OpenWakeWord) {
+      try { await OpenWakeWord.stop(); } catch {}
     }
     this.setState('off');
   }

-  /** Wake word triggered: pause Porcupine, start the conversation. */
+  /** Wake word triggered: pause the native module, start the conversation. */
   private async onWakeDetected(): Promise<void> {
     console.log('[WakeWord] Wake-Word "%s" erkannt!', this.keyword);
-    if (this.porcupine) {
-      try { await this.porcupine.stop(); } catch {}
+    ToastAndroid.show(`Wake-Word "${KEYWORD_LABELS[this.keyword]}" erkannt — sprich jetzt`, ToastAndroid.SHORT);
+    if (this.nativeReady && OpenWakeWord) {
+      try { await OpenWakeWord.stop(); } catch {}
     }
     this.setState('conversing');
-    // wait briefly so the microphone is free
     setTimeout(() => {
       if (this.state === 'conversing') {
         this.wakeCallbacks.forEach(cb => cb());
@@ -188,15 +210,16 @@ class WakeWordService {
   }

   /** End the conversation: the user said nothing in the window.
-   * With wake word: back to 'armed' (Porcupine on again).
+   * With wake word: back to 'armed' (listener on again).
    * Without: back to 'off'.
    */
   async endConversation(): Promise<void> {
     if (this.state !== 'conversing') return;
-    if (this.porcupine && this.accessKey) {
+    if (this.nativeReady && OpenWakeWord) {
       try {
-        await this.porcupine.start();
+        await OpenWakeWord.start();
         console.log('[WakeWord] Konversation zu Ende — zurueck zu armed');
+        ToastAndroid.show(`Lausche wieder auf "${KEYWORD_LABELS[this.keyword]}"`, ToastAndroid.SHORT);
         this.setState('armed');
         return;
       } catch (err) {
@@ -204,6 +227,7 @@ class WakeWordService {
       }
     }
     console.log('[WakeWord] Konversation zu Ende — Ohr aus');
+    ToastAndroid.show('Mikro aus', ToastAndroid.SHORT);
     this.setState('off');
   }
@@ -228,10 +252,10 @@ class WakeWordService {
   }

   hasWakeWord(): boolean {
-    return !!this.porcupine;
+    return this.nativeReady;
   }

-  getKeyword(): string {
+  getKeyword(): WakeKeyword {
     return this.keyword;
   }
@@ -551,6 +551,15 @@ class ARIABridge:
        # Influences the timeout for stt_request: on "loading" we wait longer,
        # because the model may still need ~1-2 min of downloading on the first request.
        self._remote_stt_ready: bool = False
+        # Pending files: when the app sends an image + text at the same time,
+        # two separate RVS events arrive ('file' and 'chat'). We buffer the
+        # files briefly and merge them with the following chat text into a
+        # single request to aria-core. Otherwise ARIA answers twice (once
+        # "waiting for instructions" for the file, once for the chat text).
+        # List of tuples: (file_path, name, file_type, size_kb, width, height)
+        self._pending_files: list[tuple[str, str, str, int, int, int]] = []
+        self._pending_files_flush_task: Optional[asyncio.Task] = None
+        self._PENDING_FILES_WINDOW_SEC: float = 0.8

    def initialize(self) -> None:
        """Initializes all components.
@@ -907,18 +916,13 @@ class ARIABridge:
            logger.info("[core] TTS unterdrueckt (Modus: %s)", self.current_mode.config.name)
            return

-        # Determine the voice: app override for this request > global default voice
+        # Determine the voice: app override (set by the last chat event) > global
+        # default voice. The override is NOT consumed per answer; otherwise a
+        # multi-turn answer from ARIA (tool use + final answer) would fall back
+        # to the old default voice from the second TTS call on. The override
+        # stays valid until the next chat event, which overwrites or clears it.
        xtts_voice = self._next_voice_override or getattr(self, 'xtts_voice', '')
-        # Consume the override (applies only to exactly this next answer)
-        if self._next_voice_override:
-            logger.info("[core] Nutze Voice-Override: %s", self._next_voice_override)
-            self._next_voice_override = None

-        # Take the speed from the app override as well (fallback 1.0)
        xtts_speed = self._next_speed_override or 1.0
-        if self._next_speed_override:
-            logger.info("[core] Nutze Speed-Override: %.2fx", self._next_speed_override)
-            self._next_speed_override = None

        tts_text = tts_text_preview or text
        if not tts_text:
@@ -942,7 +946,8 @@ class ARIABridge:
                },
                "timestamp": int(asyncio.get_event_loop().time() * 1000),
            })
-            logger.info("[core] XTTS-Request gesendet (%s): '%s'", xtts_voice or "default", tts_text[:60])
+            logger.info("[core] XTTS-Request gesendet (voice=%s, speed=%.2fx): '%s'",
+                        xtts_voice or "default", xtts_speed, tts_text[:60])
        except Exception as e:
            logger.error("[core] XTTS-Request fehlgeschlagen: %s — kein Audio", e)
@@ -1023,6 +1028,51 @@ class ARIABridge:
        except Exception as e:
            logger.debug("[session] Diagnostic nicht erreichbar (%s) — nutze '%s'", e, self._session_key)

+    def _build_pending_files_message(self, user_text: str) -> str:
+        """Builds an instruction for aria-core from the buffered files plus an
+        optional user text. Empty user_text → the 'waiting for instructions' variant."""
+        parts: list[str] = []
+        for fp, name, ftype, kb, w, h in self._pending_files:
+            dim = f" {w}x{h}px" if (w and h) else ""
+            kind = "Bild" if ftype.startswith("image/") else "Datei"
+            parts.append(f"- {kind}: {name}{dim} ({ftype}, {kb}KB) liegt unter {fp}")
+        files_summary = "\n".join(parts)
+        n = len(self._pending_files)
+        anhang = "Anhang" if n == 1 else "Anhaenge"
+        if user_text:
+            return (f"Stefan hat dir {n} {anhang} geschickt:\n{files_summary}\n\n"
+                    f"Er sagt dazu: \"{user_text}\"")
+        return (f"Stefan hat dir {n} {anhang} geschickt:\n{files_summary}\n\n"
+                f"Warte auf seine Anweisung was du damit tun sollst.")
+
+    async def _flush_pending_files_after(self, delay: float) -> None:
+        """If no chat text has arrived after `delay` seconds: send the files to
+        aria-core on their own (the 'waiting for instructions' variant)."""
+        try:
+            await asyncio.sleep(delay)
+        except asyncio.CancelledError:
+            return
+        if not self._pending_files:
+            return
+        text = self._build_pending_files_message("")
+        self._pending_files = []
+        self._pending_files_flush_task = None
+        await self.send_to_core(text, source="app-file")
+
+    async def _flush_pending_files_with_text(self, user_text: str) -> bool:
+        """If a chat text comes in while files are buffered: merge files + text
+        into a single aria-core message.
+        Returns True if merged (the caller must not send again)."""
+        if not self._pending_files:
+            return False
+        if self._pending_files_flush_task and not self._pending_files_flush_task.done():
+            self._pending_files_flush_task.cancel()
+            self._pending_files_flush_task = None
+        text = self._build_pending_files_message(user_text)
+        self._pending_files = []
+        await self.send_to_core(text, source="app-file+chat")
+        return True
+
    async def send_to_core(self, text: str, source: str = "bridge") -> None:
        """Sends text to aria-core (OpenClaw chat.send protocol)."""
        if self.ws_core is None:
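The three helpers above implement a debounce-and-merge: buffer events for a short window, cancel the pending flush when the companion event arrives, and emit exactly one downstream message either way. The same shape in TypeScript, purely as an illustration; the actual implementation is the Python above and these names are invented:

```ts
// Debounce-and-merge, the pattern behind _pending_files /
// _flush_pending_files_after / _flush_pending_files_with_text.
class PendingMerge<T> {
  private pending: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private windowMs: number,
    private emit: (items: T[], text: string) => void,
  ) {}

  addItem(item: T): void {
    this.pending.push(item);
    if (this.timer) clearTimeout(this.timer);
    // If no text arrives within the window, flush the items alone.
    this.timer = setTimeout(() => this.flush(''), this.windowMs);
  }

  /** A text arrived: merge it with any buffered items. Returns true if merged. */
  addText(text: string): boolean {
    if (this.pending.length === 0) return false;
    this.flush(text);
    return true;
  }

  private flush(text: string): void {
    if (this.timer) { clearTimeout(this.timer); this.timer = null; }
    const items = this.pending;
    this.pending = [];
    this.emit(items, text);
  }
}
```

The 0.8s window is the same trade-off as the audio-focus delay: long enough that the app's paired file+chat events land inside it, short enough that a file sent alone still reaches ARIA promptly.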
@@ -1168,19 +1218,30 @@ class ARIABridge:
            if sender in ("aria", "stt"):
                return
            text = payload.get("text", "")
-            # Remember the voice override for the next ARIA answer
-            voice_override = payload.get("voice", "")
-            if voice_override:
-                self._next_voice_override = voice_override
-                logger.info("[rvs] Voice-Override fuer naechste Antwort: %s", voice_override)
+            # Set the voice override for subsequent messages; valid until the
+            # next chat event. Empty string "" = explicitly the default voice
+            # (clear the override). Field not sent = leave the previous override
+            # untouched (e.g. when cancel_request or another service bypasses the app).
+            if "voice" in payload:
+                voice_override = payload.get("voice", "") or ""
+                self._next_voice_override = voice_override or None
+                logger.info("[rvs] Voice fuer Antworten: %s",
+                            self._next_voice_override or "(Default)")
            # Speed override (TTS playback speed, per device)
-            try:
-                speed = float(payload.get("speed", 0) or 0)
-                if 0.1 <= speed <= 5.0:
-                    self._next_speed_override = speed
-            except (TypeError, ValueError):
-                pass
+            if "speed" in payload:
+                try:
+                    speed = float(payload.get("speed", 0) or 0)
+                    self._next_speed_override = speed if 0.1 <= speed <= 5.0 else None
+                except (TypeError, ValueError):
+                    self._next_speed_override = None
            if text:
-                logger.info("[rvs] App-Chat: '%s'", text[:80])
-                await self.send_to_core(text, source="app")
+                # If files are currently buffered (image + text sent at the same
+                # time), merge them into a single request instead of two separate
+                # send_to_core calls.
+                merged = await self._flush_pending_files_with_text(text)
+                if merged:
+                    logger.info("[rvs] App-Chat (mit Anhaengen): '%s'", text[:80])
+                else:
+                    logger.info("[rvs] App-Chat: '%s'", text[:80])
+                    await self.send_to_core(text, source="app")
            return
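The `"voice" in payload` check gives the protocol three distinct cases instead of two: field absent (keep the previous override), empty string (reset to the default voice), non-empty (set a new override). A compact sketch of that tri-state decode; `applyVoiceOverride` is a hypothetical helper mirroring the handler above, not code from the diff:

```ts
// Tri-state override decode: absent field vs "" vs a concrete value.
function applyVoiceOverride(
  payload: { voice?: string },
  current: string | null,
): string | null {
  if (!('voice' in payload)) return current; // field absent: keep previous override
  const v = payload.voice || '';
  return v || null;                          // "" clears the override (default voice)
}
```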
@@ -1341,59 +1402,46 @@ class ARIABridge:
            await self.ws_core.send(raw_message)

        elif msg_type == "file":
-            # File from the app → forward to aria-core as a text message
+            # File from the app: save it + add it to the pending queue.
+            # It is merged with the following chat event (within
+            # PENDING_FILES_WINDOW) into a single aria-core request. Otherwise
+            # ARIA answers twice: once "waiting for instructions" for the file,
+            # once for the chat text.
            file_name = payload.get("name", "unbekannt")
            file_type = payload.get("type", "")
            file_b64 = payload.get("base64", "")
-            file_size = payload.get("size", 0)
            width = payload.get("width", 0)
            height = payload.get("height", 0)
            logger.info("[rvs] Datei empfangen: %s (%s, %dKB)",
                        file_name, file_type, len(file_b64) // 1365 if file_b64 else 0)

-            # Shared volume: /shared/ is mounted in the bridge AND in aria-core
            SHARED_DIR = "/shared/uploads"
            os.makedirs(SHARED_DIR, exist_ok=True)

-            if file_b64 and file_type.startswith("image/"):
-                # Save the image to the shared volume
+            if not file_b64:
+                text = f"Stefan hat eine Datei gesendet ({file_name}, {file_type}) aber die Daten sind leer angekommen."
+                await self.send_to_core(text, source="app-file")
+                return
+
+            if file_type.startswith("image/"):
                ext = ".jpg" if "jpeg" in file_type or "jpg" in file_type else ".png"
                safe_name = f"img_{int(asyncio.get_event_loop().time())}_{file_name.replace('/', '_')}"
                file_path = os.path.join(SHARED_DIR, safe_name if safe_name.endswith(ext) else safe_name + ext)
-                with open(file_path, "wb") as f:
-                    f.write(base64.b64decode(file_b64))
-                size_kb = len(file_b64) // 1365
-                logger.info("[rvs] Bild gespeichert: %s (%dKB)", file_path, size_kb)
-                # Send to aria-core FIRST (most important step)
-                text = (f"Stefan hat dir ein Bild geschickt: {file_name}"
-                        f"{f' ({width}x{height}px)' if width else ''}"
-                        f", {size_kb}KB."
-                        f" Das Bild liegt unter: {file_path}"
-                        f" Warte auf Stefans Anweisung was du damit tun sollst.")
-                await self.send_to_core(text, source="app-file")
-                # Then inform the app (optional, must not crash)
-                try:
-                    await self._send_to_rvs({
-                        "type": "file_saved",
-                        "payload": {"name": file_name, "serverPath": file_path, "mimeType": file_type},
-                        "timestamp": int(asyncio.get_event_loop().time() * 1000),
-                    })
-                except Exception as e:
-                    logger.warning("[rvs] file_saved konnte nicht an App gesendet werden: %s", e)
-            elif file_b64:
-                # Save any other file to the shared volume
+            else:
                safe_name = f"file_{int(asyncio.get_event_loop().time())}_{file_name.replace('/', '_')}"
                file_path = os.path.join(SHARED_DIR, safe_name)
-                with open(file_path, "wb") as f:
-                    f.write(base64.b64decode(file_b64))
-                size_kb = len(file_b64) // 1365
-                logger.info("[rvs] Datei gespeichert: %s (%dKB)", file_path, size_kb)
-                # Send to aria-core FIRST
-                text = (f"Stefan hat dir eine Datei geschickt: {file_name}"
-                        f" ({file_type}, {size_kb}KB)."
-                        f" Die Datei liegt unter: {file_path}"
-                        f" Warte auf Stefans Anweisung was du damit tun sollst.")
-                await self.send_to_core(text, source="app-file")
+            with open(file_path, "wb") as f:
+                f.write(base64.b64decode(file_b64))
+            size_kb = len(file_b64) // 1365
+            logger.info("[rvs] Datei gespeichert: %s (%dKB)", file_path, size_kb)
+
+            # Into the pending queue + flush timer (anti-spam buffering)
+            self._pending_files.append((file_path, file_name, file_type, size_kb, int(width or 0), int(height or 0)))
+            if self._pending_files_flush_task and not self._pending_files_flush_task.done():
+                self._pending_files_flush_task.cancel()
+            self._pending_files_flush_task = asyncio.create_task(
+                self._flush_pending_files_after(self._PENDING_FILES_WINDOW_SEC)
+            )

            try:
                await self._send_to_rvs({
                    "type": "file_saved",
@@ -1402,9 +1450,6 @@ class ARIABridge:
                })
            except Exception as e:
                logger.warning("[rvs] file_saved konnte nicht an App gesendet werden: %s", e)
-            else:
-                text = f"Stefan hat eine Datei gesendet ({file_name}, {file_type}) aber die Daten sind leer angekommen."
-                await self.send_to_core(text, source="app-file")

        elif msg_type == "file_request":
            # The app requests a file (re-download after the cache was cleared)
@@ -1443,17 +1488,18 @@ class ARIABridge:
            if not audio_b64:
                logger.warning("[rvs] Audio ohne Daten empfangen")
                return
-            # Voice override for the upcoming ARIA answer (chosen locally in the app)
-            voice_override = payload.get("voice", "")
-            if voice_override:
-                self._next_voice_override = voice_override
-                logger.info("[rvs] Voice-Override (via Audio): %s", voice_override)
-            try:
-                speed = float(payload.get("speed", 0) or 0)
-                if 0.1 <= speed <= 5.0:
-                    self._next_speed_override = speed
-            except (TypeError, ValueError):
-                pass
+            # Voice override for subsequent messages; same semantics as the chat event.
+            if "voice" in payload:
+                voice_override = payload.get("voice", "") or ""
+                self._next_voice_override = voice_override or None
+                logger.info("[rvs] Voice fuer Antworten (via Audio): %s",
+                            self._next_voice_override or "(Default)")
+            if "speed" in payload:
+                try:
+                    speed = float(payload.get("speed", 0) or 0)
+                    self._next_speed_override = speed if 0.1 <= speed <= 5.0 else None
+                except (TypeError, ValueError):
+                    self._next_speed_override = None
            logger.info("[rvs] Audio empfangen: %s, %dms, %dKB",
                        mime_type, duration_ms, len(audio_b64) // 1365)
            asyncio.create_task(self._process_app_audio(audio_b64, mime_type))
@@ -239,6 +239,8 @@ class F5Runner:

    def _infer_blocking(self, gen_text: str, ref_wav: str, ref_text: str,
                        speed: float = 1.0) -> tuple[np.ndarray, int]:
+        logger.info("infer() text=%d chars, speed=%.2f, cfg=%.2f, nfe=%d",
+                    len(gen_text), speed, self.cfg_strength, self.nfe_step)
        wav, sr, _ = self.model.infer(
            ref_file=ref_wav,
            ref_text=ref_text,
@@ -507,7 +509,8 @@ async def _do_tts(ws, runner: F5Runner, text: str, voice: str,
    ref_wav_str, ref_text = str(pair[0]), pair[1].read_text(encoding="utf-8").strip()

    sentences = split_sentences(text)
-    logger.info("F5-TTS: %d Satz(e), voice=%s (%s)", len(sentences), voice or "default", ref_wav_str)
+    logger.info("F5-TTS: %d Satz(e), voice=%s, speed=%.2fx (%s)",
+                len(sentences), voice or "default", speed, ref_wav_str)

    chunk_index = 0
    pcm_sr = TARGET_SR