Results 1 - 20 of 758
2.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 1783-1786, 2020 07.
Article in English | MEDLINE | ID: mdl-33018344

ABSTRACT

Children with cerebral palsy and complex communication needs face limitations in their use of access technologies (ATs). Speech recognition software and conventional ATs (e.g., mechanical switches) can be insufficient for those with speech impairment and limited control of voluntary motion. Automatic recognition of head movements represents a promising alternative. Previous studies have shown the robustness of head pose estimation algorithms with adult participants, but further research is needed before these methods can be used with children. An algorithm for head movement recognition was implemented and evaluated on videos recorded in a naturalistic environment while children played a video game. A face-tracking algorithm detected the main facial landmarks, head poses were then estimated with the Pose from Orthography and Scaling with Iterations (POSIT) algorithm, and three head movements were classified using Hidden Markov Models (HMMs). Preliminary classification results from videos of five typically developing children showed an accuracy of up to 95.6% in predicting head movements.
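The classification stage described above can be prototyped compactly. Below is a minimal sketch, not the authors' implementation: one Gaussian HMM per movement class is trained on sequences of estimated pose angles (yaw, pitch, roll), and a new sequence is assigned to the class whose model scores it highest. The `hmmlearn` library is assumed, and all variable names and array shapes are illustrative.

```python
# Sketch of HMM-based head-movement classification. Feature extraction
# (face tracking + POSIT pose estimation) is assumed to have produced
# per-video sequences of pose angles; `train_seqs` is a hypothetical input.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_movement_models(train_seqs, n_states=3):
    """train_seqs: dict mapping movement label -> list of (T_i, 3) pose arrays."""
    models = {}
    for label, seqs in train_seqs.items():
        X = np.vstack(seqs)               # hmmlearn expects stacked sequences
        lengths = [len(s) for s in seqs]  # plus the length of each sequence
        m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    """Label a new pose sequence by maximum log-likelihood across class HMMs."""
    return max(models, key=lambda label: models[label].score(seq))
```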


Subjects
Head Movements, Recognition (Psychology), Adult, Algorithms, Child, Face, Humans, Speech Recognition Software
4.
PLoS One ; 15(6): e0234908, 2020.
Article in English | MEDLINE | ID: mdl-32559211

ABSTRACT

Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could considerably improve stroke identification in large datasets, triage of critical clinical reports, and quality improvement efforts. In this study, we developed and evaluated a comprehensive framework for studying the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods in determining the presence, location, and acuity of ischemic stroke from radiographic text. We collected 60,564 Computed Tomography and Magnetic Resonance Imaging radiology reports from 17,864 patients at two large academic medical centers. We used standard techniques to featurize the unstructured text and developed neurovascular-specific GloVe word embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods for our three tasks (AUCs of 0.96, 0.98, and 0.93, respectively). Simpler NLP approaches (Bag of Words, BOW) performed best with interpretable algorithms (Logistic Regression) for identifying ischemic stroke (AUC 0.95), MCA location (AUC 0.96), and acuity (AUC 0.90). Similarly, GloVe and Recurrent Neural Networks (AUCs 0.92, 0.89, 0.93) generalized better in our external test set than BOW and Logistic Regression (AUCs 0.89, 0.86, 0.80) for stroke presence, location, and acuity, respectively. Our study provides a comprehensive assessment of NLP techniques for unstructured radiographic text. Our findings suggest that NLP/ML methods can discriminate stroke features in large data cohorts for both clinical and research investigations.
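For orientation, the interpretable baseline reported above (bag-of-words features with logistic regression) can be sketched in a few lines of scikit-learn. This is a hedged illustration rather than the paper's code: the loader `load_labeled_reports` is hypothetical, and the GloVe embeddings and recurrent networks are not reproduced.

```python
# Bag-of-words + logistic regression baseline for binary stroke presence,
# evaluated by AUC on a held-out split. `load_labeled_reports` is a
# hypothetical function returning (list_of_report_texts, binary_labels).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

reports, labels = load_labeled_reports()  # hypothetical loader
X_tr, X_te, y_tr, y_te = train_test_split(reports, labels, test_size=0.25)

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), min_df=5),  # unigram/bigram counts
    LogisticRegression(max_iter=1000),
)
clf.fit(X_tr, y_tr)
print("held-out AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```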


Subjects
Brain Ischemia/diagnostic imaging, Image Processing, Computer-Assisted/methods, Machine Learning, Speech Recognition Software, Stroke/diagnostic imaging, Humans, Patient Acuity
5.
AJOB Neurosci ; 11(2): 105-112, 2020.
Article in English | MEDLINE | ID: mdl-32228383

ABSTRACT

This article examines the ethical and policy implications of using voice computing and artificial intelligence to screen for mental health conditions in low-income and minority populations. The burden of mental illness falls disproportionately on these groups and is further exacerbated by increased barriers to psychiatric care. Advances in voice computing and artificial intelligence promise broader screening and more sensitive diagnostic assessments. Machine learning algorithms can identify vocal features that screen for depression. However, to screen for mental health pathology, computer algorithms must first account for the fundamental differences in vocal characteristics between low-income minority populations and other groups. While researchers have envisioned this technology as a beneficent tool, it could be repurposed to scale up discrimination or exploitation. Studies on the use of big data and predictive analytics demonstrate that low-income minority populations already face significant discrimination. This article urges researchers developing AI tools for vulnerable populations to consider the full ethical, legal, and social impact of their work. Without a coherent national framework of legal regulations and ethical guidelines to protect vulnerable populations, it will be difficult to limit AI applications to solely beneficial uses. Without such protections, vulnerable populations will rightfully be wary of participating in such studies, which in turn will undermine the robustness of the resulting tools. Thus, for research involving AI tools such as voice computing, it is in the research community's interest to demand more guidance and regulatory oversight from the federal government.


Subjects
Artificial Intelligence/ethics, Bioethics, Mental Disorders/diagnosis, Mentally Ill Persons, Minority Groups, Poverty, Speech Recognition Software/ethics, Humans
6.
J Clin Nurs ; 29(13-14): 2125-2137, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32243006

ABSTRACT

INTRODUCTION: Speech recognition technology (SRT) captures an individual's spoken words through a microphone and processes them into digital text on a computer. SRT is well established and continues to grow in popularity across the health disciplines. Many studies have examined the effects of SRT on nursing documentation; however, no previous systematic review (SR) of its effects on the accuracy and efficiency of nursing documentation was identified. AIMS AND METHODS: To systematically review the impact of speech recognition technology on the accuracy and efficiency of clinical nursing documentation. An SR was conducted of studies measuring the accuracy and efficiency (time to complete documentation) of SRT in nursing documentation. An extensive literature search covered Web of Science, CINAHL via EBSCOhost, Cochrane Library, Embase, MEDLINE and Google Scholar. Eligible papers were screened using the PRISMA checklist; the quality of each paper was critically appraised, and data were extracted and analysed/synthesised. RESULTS: A total of 10 studies were included. Various devices and systems have been used to examine the accuracy, efficiency and impact of SRT on nursing documentation. SRT had a positive impact, with significant gains in the accuracy and productivity of nursing documentation at the point of care. However, incorporating SRT systems demands substantial initial costs, training, and interface modifications tailored to individual healthcare units. CONCLUSIONS: Applied to nursing documentation, speech recognition technology could open a promising new interface for data entry at the point of care, though the full potential of the technology has not been explored. RELEVANCE TO CLINICAL PRACTICE: The compatibility and effectiveness of SRT with existing computer systems remain understudied. SRT training, prompt on-site technical support, maintenance and upgrades are essential to achieving high accuracy and efficiency (time to complete documentation) with SRT.


Subjects
Nursing Records, Speech Recognition Software, User-Computer Interface, Humans, Speech Perception
7.
Proc Natl Acad Sci U S A ; 117(14): 7684-7689, 2020 04 07.
Article in English | MEDLINE | ID: mdl-32205437

ABSTRACT

Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train them. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems, developed by Amazon, Apple, Google, IBM, and Microsoft, to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 hours of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems, as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies, such as using more diverse training datasets that include African American Vernacular English, to reduce these performance differences and ensure speech recognition technology is inclusive.
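The headline metric here, word error rate, is the word-level edit distance between a reference transcript and the ASR hypothesis, normalized by the number of reference words. A standard dynamic-programming implementation:

```python
# Word error rate: (substitutions + insertions + deletions) / reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ~ 0.33
```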


Subjects
Racism, Speech Recognition Software, Adult, African Americans, Automation, European Continental Ancestry Group, Humans, Language, Speech Perception
8.
Article in English | MEDLINE | ID: mdl-32092914

ABSTRACT

Mobile phone use while driving has become one of the leading causes of traffic accidents and poses a significant threat to public health. This study investigated the impact of speech-based texting and handheld texting (two difficulty levels of each task) on car-following performance, in terms of time headway and collision avoidance, and further examined the relationship between drivers' time headway increase strategies and the corresponding accident frequency. Fifty-three participants completed a car-following experiment in a driving simulator. A Generalized Estimating Equation (GEE) approach was used to develop a linear regression model for time headway and a binary logistic regression model for accident probability. The time headway model indicated that drivers compensated for the increased workload by increasing their time headway by 0.41 s and 0.59 s while conducting speech-based and handheld texting, respectively. The rear-end accident probability model showed that accident probability increased 2.34-fold and 3.56-fold during the speech-based and handheld texting tasks, respectively. Additionally, the greater the deceleration of the lead vehicle, the higher the probability of a rear-end accident. Drivers' compensation behaviors differed, and only a few drivers increased their time headway by 60% or more, the amount needed to fully offset the increased accident risk associated with mobile phone distraction. The findings provide a theoretical reference for traffic regulations on mobile phone use, driver safety education programs, and road safety public awareness campaigns. Moreover, the developed accident risk models may contribute to the development of a driving safety warning system.
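The modeling setup described, a binary logistic regression for rear-end accidents fitted with generalized estimating equations to handle repeated trials per driver, can be sketched with `statsmodels`. The data file and column names are assumptions about the experiment's per-trial records, not the authors' actual variables.

```python
# GEE logistic regression: accident (0/1) per trial, clustered by driver.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("car_following_trials.csv")  # hypothetical per-trial data
model = smf.gee(
    "accident ~ C(task) + lead_deceleration + time_headway",
    groups="driver_id",               # repeated measures within each driver
    data=df,
    family=sm.families.Binomial(),    # binary outcome -> logistic link
)
print(model.fit().summary())
```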


Subjects
Accidents, Traffic, Automobile Driving, Cell Phone Use, Cell Phone, Adult, Automobiles, Female, Humans, Male, Risk-Taking, Speech Recognition Software, Task Performance and Analysis, Young Adult
9.
Neural Netw ; 121: 186-207, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31568896

ABSTRACT

There is an essential requirement to support people with speech and communication disabilities, and brain-computer interfaces using electroencephalography (EEG) are applied to satisfy it. Many studies have used machine learning and deep neural networks (DNNs) to increase the brain signal detection rate, yet the techniques have notable limitations: classical machine learning works only in specific circumstances, and while DNNs extract features well and automatically, they suffer from overfitting and vanishing-gradient problems. Consequently, in this research, a deep network based on an autoencoder and a neural Turing machine (DN-AE-NTM) is designed to resolve these problems through the NTM's external memory; the DN-AE-NTM also copes with many kinds of signals at high detection rates. Data were collected with P300 EEG devices from several individuals under identical conditions. During the test, each individual was asked to skim images with one to six labels while focusing on only one image; the unattended images produce uninformative activity in the individual's brain, yielding unfamiliar signals. Besides the main P300 EEG dataset, EEG recordings of individuals with alcoholism and control individuals, as well as the EEGMMIDB, MNIST, and ORL datasets, were also tested. The proposed DN-AE-NTM classifies these datasets with average detection rates of 97.5%, 95%, 98%, 99.4%, and 99.1%, respectively, even in noisy situations where only 20% of the data are reliable and informative.
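As a rough illustration of the autoencoder stage only, the sketch below trains a small reconstruction network over flattened EEG epochs in PyTorch. The NTM external memory, the classifier, and the paper's actual layer sizes are all omitted; the shapes are placeholders.

```python
# Minimal EEG autoencoder sketch (reconstruction loss only, no NTM memory).
import torch
import torch.nn as nn

class EEGAutoencoder(nn.Module):
    def __init__(self, n_features=64 * 240, latent_dim=32):  # placeholder sizes
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_features))

    def forward(self, x):
        z = self.encoder(x)           # compressed representation of the epoch
        return self.decoder(z), z     # reconstruction + latent features

model = EEGAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 64 * 240)          # dummy batch of flattened EEG epochs
recon, _ = model(x)
loss = nn.functional.mse_loss(recon, x)
opt.zero_grad(); loss.backward(); opt.step()
```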


Subjects
Brain-Computer Interfaces, Brain/physiology, Deep Learning, Neural Networks, Computer, Speech Disorders/physiopathology, Speech Recognition Software, Electroencephalography/methods, Humans
10.
Healthc Manage Forum ; 33(1): 10-18, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31550922

ABSTRACT

Artificial Intelligence (AI) is evolving rapidly in healthcare, and various AI applications have been developed to solve some of the most pressing problems that health organizations currently face. It is crucial for health leaders to understand the state of AI technologies and the ways such technologies can be used to improve the efficiency, safety, and accessibility of health services in pursuit of value-based care. This article provides a guide to the fundamentals of AI technologies (i.e., machine learning, natural language processing, and AI voice assistants) and their proper use in healthcare. It also offers practical recommendations to help decision-makers develop an AI strategy that supports their digital healthcare transformation.


Subjects
Artificial Intelligence, Delivery of Health Care, Delivery of Health Care/methods, Delivery of Health Care/organization & administration, Humans, Machine Learning, Natural Language Processing, Robotics, Speech Recognition Software
11.
Audiol., Commun. res ; 25: e2237, 2020.
Article in Portuguese | LILACS | ID: biblio-1098093

ABSTRACT

Purpose: 1) To measure speech understanding in noise with the Naída CI Q70 in the omnidirectional microphone mode (T-Mic) and the adaptive directional microphone mode (UltraZoom) under reverberant, noisy conditions. 2) To measure the improvement in speech understanding that the Advanced Bionics (AB) Naída CI Q70 sound processor offers existing Harmony users. Methods: Seven adult unilateral cochlear implant (CI) recipients, all experienced users of the Harmony sound processor, participated in the study. Sentence recognition was evaluated in quiet in a reverberant room with the Harmony and Naída CI Q70 processors, and the effectiveness of the Naída CI Q70's UltraZoom directional microphone was evaluated in noise. Target stimuli were recorded Portuguese sentences presented from 0° azimuth; twenty-talker babble was presented at +5 dB SNR from ±90° azimuth. In addition to sentence recognition, the participants rated the clarity of sound and the difficulty of listening in the various test conditions. To evaluate outcomes under realistic acoustic conditions, tests were conducted in a non-sound-treated reverberant room (RT60 of 553 ms, noise floor of 42.7 dBA Leq). Results: Average sentence recognition in quiet in the reverberant room was 38.5% with the Harmony and 66.5% with the Naída CI Q70. Average sentence recognition in noise was 40.5% with the Naída CI Q70 without UltraZoom and 64.5% with UltraZoom. Subjective ratings of sound clarity and listening ease in noise did not differ between the test conditions. Conclusion: For experienced users of the Harmony sound processor, speech understanding in quiet in a reverberant room was significantly better with the Naída CI Q70, and the adaptive directional microphone technology (UltraZoom) enhanced speech perception in noise.


Subjects
Humans, Male, Female, Adult, Cochlear Implantation, Speech Recognition Software, Speech Acoustics, Speech Intelligibility, Speech Perception, Hearing Loss, Bilateral, Noise
12.
Artif Intell Med ; 100: 101706, 2019 09.
Article in English | MEDLINE | ID: mdl-31607340

ABSTRACT

Artificial intelligence (AI) will pave the way to a new era in medicine. However, currently available AI systems do not interact with patients, e.g., for anamnesis, and thus are used only by physicians for predictions in diagnosis or prognosis; such systems are nevertheless widely used, e.g., in diabetes or cancer prediction. In the current study, we developed an AI (a virtual doctor) that can interact with a patient autonomously through speech recognition and speech synthesis, which is particularly important for, e.g., rural areas, where low population density strongly limits the availability of primary medical care. As a proof of concept, the system predicts type 2 diabetes mellitus (T2DM) from non-invasive sensors using deep neural networks and provides an easy-to-interpret probability estimate of T2DM for a given patient. Beyond developing the AI itself, we also analyzed young people's acceptance of AI in healthcare to estimate the future impact of such a system.
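The prediction back end can be illustrated with a small probabilistic classifier over non-invasive measurements; the subject headings suggest features such as height, weight, body mass index, and waist circumference. This sketch uses a scikit-learn MLP rather than the paper's deep networks, and the feature loader and example values are hypothetical.

```python
# Probability-of-T2DM sketch: standardized features into a small MLP whose
# predict_proba output serves as the patient-facing probability estimate.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_patient_features()  # hypothetical: rows of [age, bmi, waist_cm]
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500))
clf.fit(X, y)

patient = [[54, 31.2, 104.0]]   # age, BMI, waist circumference (cm)
p = clf.predict_proba(patient)[0, 1]
print(f"Estimated probability of T2DM: {p:.0%}")
```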


Subjects
Decision Support Systems, Clinical, Deep Learning, Diabetes Mellitus, Type 2/diagnosis, User-Computer Interface, Artificial Intelligence, Body Height, Body Mass Index, Body Weight, Female, Humans, Male, Middle Aged, Neural Networks, Computer, Probability, Speech Recognition Software, Surveys and Questionnaires, Waist Circumference
14.
J Acoust Soc Am ; 146(3): 1615, 2019 09.
Article in English | MEDLINE | ID: mdl-31590492

ABSTRACT

Speech (syllable) rate estimation typically involves computing a feature contour, based on sub-band energies, that has strong local maxima (peaks) at syllable nuclei; the peaks are detected with the help of voicing decisions (VDs). While such a two-stage scheme works well in clean conditions, the estimated speech rate becomes less accurate in noisy conditions, particularly because of erroneous VDs and non-informative sub-bands at low signal-to-noise ratios (SNRs). This work proposes a technique that uses VDs in the peak detection strategy in an SNR-dependent manner, along with a data-driven sub-band pruning technique that sharpens the syllabic peaks of the feature contour in the presence of noise. The paper further generalizes both the peak detection and the sub-band pruning techniques to unknown-noise and/or unknown-SNR conditions. Experiments are performed in clean conditions and at 20, 10, and 0 dB SNR using the Switchboard, TIMIT, and CTIMIT corpora under five additive noises: white, car, high-frequency-channel, cockpit, and babble. Experiments are also carried out at unseen SNRs of -5 and 5 dB with four unseen additive noises: factory, subway, street, and exhibition. The proposed method outperforms the best existing techniques in both clean and noisy conditions across the three corpora.
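The first stage of such a scheme, peak picking on a smoothed sub-band energy contour, can be sketched as follows. The window size and threshold are illustrative, and the paper's SNR-dependent use of voicing decisions and its data-driven sub-band pruning are not reproduced.

```python
# Syllable-rate sketch: count local maxima of a smoothed energy contour.
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import find_peaks

def estimate_syllable_rate(energy, frame_rate_hz=100.0):
    """energy: per-frame sub-band energy contour (1-D array)."""
    smooth = uniform_filter1d(energy, size=15)      # ~150 ms smoothing window
    floor = smooth.mean() + 0.5 * smooth.std()      # illustrative peak threshold
    peaks, _ = find_peaks(smooth, height=floor, distance=10)  # >=100 ms apart
    return len(peaks) / (len(energy) / frame_rate_hz)  # syllables per second

rate = estimate_syllable_rate(np.abs(np.random.randn(1000)))  # dummy contour
```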


Subjects
Speech Recognition Software/standards, Signal-To-Noise Ratio, Speech Acoustics, Voice
15.
Br J Nurs ; 28(16): 1092-1093, 2019 Sep 12.
Article in English | MEDLINE | ID: mdl-31518527

ABSTRACT

Emeritus Professor Alan Glasper, University of Southampton, discusses a new strand of the Government's NHS Long Term Plan, which commits to embracing digital innovations, including voice-assisted technology, more fully.


Subjects
Biomedical Technology, Self Care, Speech Recognition Software, State Medicine/organization & administration, Humans, United Kingdom
16.
J Acoust Soc Am ; 146(2): EL184, 2019 08.
Article in English | MEDLINE | ID: mdl-31472587

ABSTRACT

Acoustic cues are characteristic patterns in the speech signal that provide lexical, prosodic, or additional information, such as speaker identity. In particular, acoustic cues related to linguistic distinctive features can be extracted and marked from the speech signal. These acoustic cues can be used to infer the intended underlying phoneme sequence in an utterance. This study describes a framework for labeling acoustic cues in speech, including a suite of canonical cue prediction algorithms that facilitates manual labeling and provides a standard for analyzing variations in the surface realizations. A brief examination of subsets of annotated speech data shows that labeling acoustic cues opens the possibility of detailed analyses of cue modification patterns in speech.


Subjects
Algorithms, Linguistics, Speech Acoustics, Speech Recognition Software, Cues (Psychology), Humans, Speech Perception
17.
Sensors (Basel) ; 19(17)2019 Aug 29.
Article in English | MEDLINE | ID: mdl-31470554

ABSTRACT

Integrating speech recognition technology into electronic health records (EHRs) has been studied in recent years. However, full adoption still faces challenges such as handling speech errors, transforming raw data into an understandable format, and controlling the transition from one field to the next with speech commands. To reduce errors, cost, and documentation time, we propose a smartphone-based dialogue system care record (DSCR) for nursing documentation. We describe the effects of the DSCR on (1) documentation speed, (2) documentation accuracy and (3) user satisfaction. We tested the application with 12 participants to examine its usability and feasibility. The evaluation shows that the DSCR collects data efficiently, achieving 96% documentation accuracy. Average documentation speed increased by 15% (P = 0.012) compared with traditional electronic forms (e-forms). Participants' average satisfaction rating was 4.8 with the DSCR versus 3.6 with e-forms on a scale of 1-5 (P = 0.032).
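The reported P-values imply a paired, within-subject comparison across the 12 participants. A sketch of that kind of analysis with `scipy` follows; the ratings below are invented for illustration and are not the study's data.

```python
# Paired comparison of satisfaction ratings (DSCR vs. e-forms) per participant.
from scipy.stats import wilcoxon

dscr_ratings  = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4, 5, 5]  # illustrative only
eform_ratings = [4, 3, 3, 4, 3, 3, 4, 4, 3, 3, 4, 3]  # illustrative only
stat, p = wilcoxon(dscr_ratings, eform_ratings)
print(f"Wilcoxon signed-rank: statistic={stat}, p={p:.3f}")
```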


Subjects
Data Collection/methods, Electronic Health Records, Language, Speech Recognition Software, User-Computer Interface
18.
Int J Med Inform ; 130: 103938, 2019 10.
Article in English | MEDLINE | ID: mdl-31442847

ABSTRACT

OBJECTIVE: To assess the role of speech recognition (SR) technology in clinicians' documentation workflows by examining use of, experience with and opinions about this technology. MATERIALS AND METHODS: We distributed a survey in 2016-2017 to 1731 clinician SR users at two large medical centers in Boston, Massachusetts and Aurora, Colorado. The survey asked about demographic and clinical characteristics, SR use and preferences, perceived accuracy, efficiency, and usability of SR, and overall satisfaction. Associations between outcomes (e.g., satisfaction) and factors (e.g., error prevalence) were measured using ordinal logistic regression. RESULTS: Most respondents (65.3%) had used their SR system for under one year. 75.5% of respondents estimated seeing 10 or fewer errors per dictation, but 19.6% estimated half or more of errors were clinically significant. Although 29.4% of respondents did not include SR among their preferred documentation methods, 78.8% were satisfied with SR, and 77.2% agreed that SR improves efficiency. Satisfaction was associated positively with efficiency and negatively with error prevalence and editing time. Respondents were interested in further training about using SR effectively but expressed concerns regarding software reliability, editing and workflow. DISCUSSION: Compared to other documentation methods (e.g., scribes, templates, typing, traditional dictation), SR has emerged as an effective solution, overcoming limitations inherent in other options and potentially improving efficiency while preserving documentation quality. CONCLUSION: While concerns about SR usability and accuracy persist, clinicians expressed positive opinions about its impact on workflow and efficiency. Faster and better approaches are needed for clinical documentation, and SR is likely to play an important role going forward.
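The analysis described, ordinal logistic regression of an ordered satisfaction rating on factors such as error prevalence and editing time, can be sketched with the `OrderedModel` class in `statsmodels`. The survey file and column names are assumptions about the dataset.

```python
# Ordinal logistic regression: 1-5 satisfaction as an ordered outcome.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("sr_survey.csv")  # hypothetical survey export
model = OrderedModel(
    df["satisfaction"],            # ordered 1-5 overall satisfaction rating
    df[["errors_per_dictation", "editing_time", "perceived_efficiency"]],
    distr="logit",
)
print(model.fit(method="bfgs", disp=False).summary())
```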


Subjects
Documentation/methods, Electronic Health Records/statistics & numerical data, Electronic Health Records/standards, Health Personnel/statistics & numerical data, Medical Errors/statistics & numerical data, Speech Recognition Software/statistics & numerical data, Speech/physiology, Adult, Aged, Boston, Female, Humans, Male, Middle Aged, Perception, Surveys and Questionnaires, Workflow
19.
Comput Intell Neurosci ; 2019: 4368036, 2019.
Article in English | MEDLINE | ID: mdl-31341467

ABSTRACT

Speech technologies have been developed for decades as a typical signal processing area, and the last decade has brought huge progress based on new machine learning paradigms. Owing not only to their intrinsic complexity but also to their relation to the cognitive sciences, speech technologies are now viewed as a prime example of an interdisciplinary knowledge area. This review of speech signal analysis and processing, the corresponding machine learning algorithms, and applied computational intelligence aims to give insight into several fields: speech production and auditory perception, the cognitive aspects of speech communication and language understanding, speech recognition and text-to-speech synthesis in greater detail, and, consequently, the main directions in the development of spoken dialogue systems. The article also discusses concepts and recent advances in speech signal compression, coding, and transmission, including cognitive speech coding. The main intention of this article is to highlight recent achievements and challenges arising from the new machine learning paradigms that have had an immense impact on speech signal processing over the last decade.


Subjects
Communication Aids for Disabled, Machine Learning, Speech Recognition Software, Humans
20.
Neural Comput ; 31(9): 1825-1852, 2019 09.
Article in English | MEDLINE | ID: mdl-31335291

ABSTRACT

There is extensive evidence that biological neural networks encode information in the precise timing of the spikes generated and transmitted by neurons, which offers several advantages over rate-based codes. Here we adopt a vector space formulation of spike train sequences and introduce a new liquid state machine (LSM) network architecture together with a new forward orthogonal regression algorithm for learning an input-output signal mapping or decoding brain activity. The proposed algorithm uses precise spike timing to select the presynaptic neurons relevant to each learning task. We show that using precise spike timing to train the LSM and to select the readout presynaptic neurons leads to a significant increase in performance on binary classification tasks, in decoding neural activity from multielectrode array recordings, and in a speech recognition task, compared with the standard architecture and training methods.
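The vector space formulation can be illustrated by mapping each spike train to a vector through exponential filtering of its spike times, after which trains can be compared with ordinary inner products. This sketch shows only that embedding; the LSM architecture and the forward orthogonal regression algorithm are not reproduced.

```python
# Embed a spike train as a sampled, exponentially filtered function.
import numpy as np

def spike_train_to_vector(spike_times, t_grid, tau=0.02):
    """Causal exponential kernel applied at each spike, sampled on t_grid."""
    v = np.zeros_like(t_grid)
    for s in spike_times:
        mask = t_grid >= s
        v[mask] += np.exp(-(t_grid[mask] - s) / tau)  # decay after each spike
    return v

t = np.linspace(0.0, 1.0, 1000)                       # 1 s sampled at 1 kHz
a = spike_train_to_vector([0.10, 0.35, 0.80], t)
b = spike_train_to_vector([0.12, 0.36, 0.75], t)
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # train similarity
```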


Subjects
Action Potentials/physiology, Algorithms, Machine Learning, Models, Neurological, Neural Networks, Computer, Humans, Machine Learning/trends, Speech Recognition Software/trends