Speech Processes for Brain-Computer Interfaces

Herff, Christian Emanuel

Zitierlink URN

https://nbn-resolving.de/urn:nbn:de:gbv:46-00105648-17

Speech Processes for Brain-Computer Interfaces

Veröffentlichungsdatum

2016-12-12

Autoren

Herff, Christian Emanuel

Betreuer

Schultz, Tanja

Gutachter

Krusienski, Dean

Schultz, Tanja

Zusammenfassung

Speech interfaces have become widely used and are integrated in many applications and devices. However, speech interfaces require the user to produce intelligible speech, which might be hindered by loud environments, concern to bother bystanders or the general in- ability to produce speech due to disabilities. Decoding a usera s imagined speech instead of actual speech would solve this problem. Such a Brain-Computer Interface (BCI) based on imagined speech would enable fast and natural communication without the need to actually speak out loud. These interfaces could provide a voice to otherwise mute people. This dissertation investigates BCIs based on speech processes using functional Near In- frared Spectroscopy (fNIRS) and Electrocorticography (ECoG), two brain activity imaging modalities on opposing ends of an invasiveness scale. Brain activity data have low signal- to-noise ratio and complex spatio-temporal and spectral coherence. To analyze these data, techniques from the areas of machine learning, neuroscience and Automatic Speech Recog- nition are combined in this dissertation to facilitate robust classification of detailed speech processes while simultaneously illustrating the underlying neural processes. fNIRS is an imaging modality based on cerebral blood flow. It only requires affordable hardware and can be set up within minutes in a day-to-day environment. Therefore, it is ideally suited for convenient user interfaces. However, the hemodynamic processes measured by fNIRS are slow in nature and the technology therefore offers poor temporal resolution. We investigate speech in fNIRS and demonstrate classification of speech processes for BCIs based on fNIRS. ECoG provides ideal signal properties by invasively measuring electrical potentials artifact- free directly on the brain surface. High spatial resolution and temporal resolution down to millisecond sampling provide localized information with accurate enough timing to capture the fast process underlying speech production. This dissertation presents the Brain-to- Text system, which harnesses automatic speech recognition technology to decode a textual representation of continuous speech from ECoG. This could allow to compose messages or to issue commands through a BCI. While the decoding of a textual representation is unparalleled for device control and typing, direct communication is even more natural if the full expressive power of speech - including emphasis and prosody - could be provided. For this purpose, a second system is presented, which directly synthesizes neural signals into audible speech, which could enable conversation with friends and family through a BCI. Up to now, both systems, the Brain-to-Text and synthesis system are operating on audibly produced speech. To bridge the gap to the final frontier of neural prostheses based on imagined speech processes, we investigate the differences between audibly produced and imagined speech and present first results towards BCI from imagined speech processes. This dissertation demonstrates the usage of speech processes as a paradigm for BCI for the first time. Speech processes offer a fast and natural interaction paradigm which will help patients and healthy users alike to communicate with computers and with friends and family efficiently through BCIs.

Schlagwörter

Speech

;

brain

;

ecog

;

fnirs

;

electrocorticography

;

functional near infrared spectroscopy

;

brain-computer interface

;

bci

;

speech production

;

automatic speech recognition

Institution

Universität Bremen

Fachbereich

Fachbereich 03: Mathematik/Informatik (FB 03)

Dokumenttyp

Dissertation

Zweitveröffentlichung

Nein

Lizenz

Sprache

Englisch

Dateien

Name

00105648-1.pdf

Size

17.92 MB

Format

Adobe PDF

Checksum

(MD5):57a51bb79ad03d2b43c6273a95bfd86e