Citation link (DOI)
10.26092/elib/5498

Personalizing Myoelectric Silent Speech Interfaces via Cross-Speaker Training and Voice Timbre Control

Publication date
2026-01-30
Authors
Scheck, Kevin
Supervisor
Schultz, Tanja
Reviewers
Schultz, Tanja
Nakamura, Satoshi
Abstract
Electromyography (EMG) signals, which measure muscle activity, are investigated for Silent Speech Interfaces (SSIs) to enable speech communication via silent articulation. The previous paradigm for EMG-to-Speech conversion relies on speaker-dependent models that predict acoustic features of speech from the same speaker who provides the EMG inputs. However, this approach limits SSI applications: 1) it cannot synthesize the personal voice of individuals unable to produce audible speech during EMG recording, 2) it suffers from data scarcity, requiring each speaker to record a sizable corpus, and 3) it yields unintelligible speech in low-latency settings.

The problem of converting EMG signals to speech in personal voices (1) is addressed by using voice conversion methods that disentangle phonetic and voice timbre information. The proposed voice-adaptive EMG-to-Speech models predict speech content features, mostly reflecting phonetic content, from EMG signals and combine them with reference audio of the target voice for speech synthesis. Further evaluations demonstrate that such models can be trained using EMG signals of silent speech only.
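The separation of content and timbre described above can be sketched as follows. This is a minimal shape-level illustration, not the thesis's implementation: the encoder stand-ins, dimensions, and function names (`predict_content_features`, `timbre_embedding`) are all assumptions.

```python
import numpy as np

# Illustrative dimensions (assumed, not from the thesis):
T, D_CONTENT, D_TIMBRE = 100, 256, 192  # frames, content dim, timbre dim

def predict_content_features(emg_frames: np.ndarray) -> np.ndarray:
    """Stand-in for the EMG encoder: maps EMG frames to speech content
    features, which mostly reflect phonetic content."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((emg_frames.shape[1], D_CONTENT)) * 0.01
    return emg_frames @ W

def timbre_embedding(reference_audio: np.ndarray) -> np.ndarray:
    """Stand-in for a speaker encoder: one fixed-size timbre vector
    summarizing the target voice from a reference clip."""
    return reference_audio.reshape(-1, D_TIMBRE).mean(axis=0)

emg = np.random.default_rng(1).standard_normal((T, 64))        # T frames of EMG features
ref = np.random.default_rng(2).standard_normal(10 * D_TIMBRE)  # reference audio, target voice

content = predict_content_features(emg)  # (T, D_CONTENT): what is said
timbre = timbre_embedding(ref)           # (D_TIMBRE,): whose voice it is
# Condition every content frame on the same timbre vector before synthesis.
synth_input = np.concatenate([content, np.tile(timbre, (T, 1))], axis=1)
assert synth_input.shape == (T, D_CONTENT + D_TIMBRE)
```

The key property is that the timbre vector comes from reference audio alone, so the EMG side never needs to carry voice identity.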

The data scarcity problem (2) is addressed by several studies. For this purpose, EMG models are pre-trained with other biosignals, unlabeled EMG signals, and labeled EMG signals of multiple speakers, i.e., cross-speaker training. In particular, cross-speaker training improves average speech synthesis intelligibility, while eliminating the need to train speaker-specific models.

To improve EMG-to-Speech in low-latency settings (3), this work presents an end-to-end model which outperforms previous low-latency baselines in speech intelligibility and naturalness while generating speech in less than 20 ms algorithmic latency. Furthermore, combining the previously outlined contributions, this work introduces a unified model which can convert EMG signals of multiple speakers to selectable voices.
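Algorithmic latency here denotes the delay imposed by framing and any lookahead the model requires, independent of compute time. A back-of-envelope sketch of that budget (frame parameters are assumed for illustration, not taken from the thesis):

```python
def algorithmic_latency_ms(frame_size: int, lookahead_frames: int, sample_rate: int) -> float:
    """Minimum delay before a frame's output can be emitted: the frame
    itself plus any future frames the model must observe first."""
    return 1000.0 * frame_size * (1 + lookahead_frames) / sample_rate

# Example: 256-sample frames at 16 kHz with no lookahead -> 16 ms,
# i.e. within a sub-20 ms algorithmic-latency budget.
print(algorithmic_latency_ms(256, 0, 16000))  # 16.0
```

Any lookahead frame adds a full frame duration, which is why strictly causal (end-to-end, no-lookahead) designs are the natural fit for such budgets.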
Keywords
Silent Speech Interfaces; Electromyography; Speech Synthesis; Voice Conversion; Deep Learning
Institution
Universität Bremen
Faculty
Fachbereich 03: Mathematik/Informatik (FB 03)
Institute
Cognitive Systems Lab (CSL)
Document type
Dissertation
License
All rights reserved
Language
English
Files
Name: Kevin-Scheck-Dissertation-Personalizing-Myoelectric-Silent-Speech-Interfaces-via-Cross-Speaker-Training-and-Voice-Timbre-Control.pdf
Size: 41.29 MB
Format: Adobe PDF
Checksum (MD5): 5e5c67e2b234f67f6deae352593d1490