Background: This work investigated machine learning methods for bio-sound analysis to identify different conditions automatically. Specifically, we conducted three studies to investigate the automatic identification of bulbar involvement in patients with amyotrophic lateral sclerosis (ALS) through voice analysis. Additionally, we performed a study to detect COVID-19 positive cases through the automatic identification of COVID-19 coughs.
The Northeast Amyotrophic Lateral Sclerosis Consortium (NEALS) bulbar sub-committee recently released a statement regarding the need for objective-based approaches to diagnose bulbar involvement in ALS patients. Bulbar involvement is a term used in ALS that refers to motor neuron impairment in the corticobulbar area of the brainstem, which leads to a dysfunction of speech and swallowing. One of the earliest symptoms of bulbar involvement is voice deterioration, characterised by grossly defective articulation, extremely slow and laborious speech, marked hypernasality and severe harshness. Bulbar involvement requires well-timed and carefully coordinated interventions. Early detection is therefore crucial to improving the quality of life and lengthening the life expectancy of those ALS patients who present this dysfunction. Recently, research efforts have focused on voice analysis to capture this dysfunction. Similarly, detecting COVID-19 easily remains a challenge. Rapid biological tests are not accurate enough. Success in the fight against new outbreaks depends not only on the efficiency of the tests used, but also on their cost, the time they take and the number of tests that can be performed at scale. Our proposal provides a solution to this challenge.
Methods: Three studies were carried out on the automated detection of bulbar involvement in patients with amyotrophic lateral sclerosis using machine learning and bio-sound analysis.
The first study consisted of a methodology for diagnosing bulbar involvement efficiently through the acoustic parameters of uttered vowels in Spanish. The method focused on the extraction of features from the phonatory subsystem—jitter, shimmer, harmonics-to-noise ratio, and pitch—from the utterance of the five Spanish vowels. Then, we used various supervised classification algorithms, preceded by principal component analysis of the features obtained.
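As an illustration of two of the phonatory features named above, the following minimal sketch computes local jitter and local shimmer under their usual definitions: the mean absolute difference between consecutive cycle periods (or peak amplitudes), divided by the mean period (or amplitude). The function names and the cycle measurements are hypothetical and are not taken from the study, which may have used a different toolchain or other variants of these measures.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("at least two glottal cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the pitch period (dimensionless ratio)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (dimensionless ratio)."""
    return local_perturbation(peak_amplitudes)


# Hypothetical measurements for a few consecutive glottal cycles of a sustained vowel.
periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds
amplitudes = [0.80, 0.78, 0.82, 0.80]        # arbitrary linear units

print(f"jitter (local):  {jitter_local(periods):.4f}")
print(f"shimmer (local): {shimmer_local(amplitudes):.4f}")
```

A perfectly periodic signal gives a value of zero for both measures; larger values indicate a less stable voice.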
In the second study, we designed a new methodology for the automatic detection of bulbar involvement based on the phonatory subsystem and time-frequency characteristics. The methodology focused on providing a set of 50 phonatory-subsystem and time-frequency features to detect this deficiency in males and females from the utterance of the five Spanish vowels. Then, multivariate analysis of variance was used to select the statistically significant features, and the supervised classification models most commonly used in clinical diagnosis were fitted to analyze their performance.
The third study provided a new methodology to detect this dysfunction automatically at early stages of the disease. The methodology focused on creating a voice fingerprint built from a pattern generated from the quasi-periodic components of a steady portion of the five Spanish vowels and from the five principal and independent components computed from this pattern. A set of statistically significant features was then obtained, and the most common supervised and semi-supervised classification models were implemented.
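The abstract does not say which semi-supervised algorithms were used. As one generic illustration of the idea, the sketch below shows self-training with a nearest-centroid rule: unlabeled subjects that lie clearly closer to one class centroid are given that label, the centroids are recomputed, and ambiguous subjects stay unlabeled. The function names, the `margin` parameter and the toy feature vectors are all assumptions made for this example.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Iteratively label the unlabeled points that are clearly nearer one class.

    labeled:   list of (features, label) pairs with labels 0 or 1
    unlabeled: list of feature vectors
    margin:    minimum difference between the two centroid distances
               for a point to count as confidently classified (assumed value)
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        c0 = centroid([x for x, y in labeled if y == 0])
        c1 = centroid([x for x, y in labeled if y == 1])
        confident, rest = [], []
        for x in pool:
            d0, d1 = math.dist(x, c0), math.dist(x, c1)
            if abs(d0 - d1) >= margin:
                confident.append((x, 0 if d0 < d1 else 1))
            else:
                rest.append(x)
        if not confident:          # nothing new to add: stop early
            break
        labeled.extend(confident)
        pool = rest
    return labeled, pool


# Toy data: label 0 = no bulbar involvement, label 1 = bulbar involvement (hypothetical).
seed = [([0.0, 0.0], 0), ([4.0, 4.0], 1)]
unknown = [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]]
augmented, still_unknown = self_train(seed, unknown)
print(len(augmented), "labeled subjects;", len(still_unknown), "left unlabeled")
```

Leaving ambiguous subjects unlabeled rather than forcing a decision is a deliberate choice in this sketch, since wrongly assigned labels would propagate into later rounds.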
Additionally, a fourth and final study was performed to design a freely available, quick and efficient methodology for the automatic detection of COVID-19 in raw audio files. The methodology was based on the automated extraction of time-frequency cough features and the selection of the most significant ones, which were then used to diagnose COVID-19 with a supervised machine-learning algorithm.
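The specific time-frequency features are not listed in this abstract. To give a concrete sense of what such a feature looks like, the sketch below computes the spectral centroid of each short frame of a signal, that is, the magnitude-weighted mean frequency over time. The choice of feature, the frame parameters and the synthetic test tone are assumptions for illustration only, and a naive DFT is used to keep the example free of dependencies.

```python
import cmath
import math


def frames(signal, size, hop):
    """Split a signal into overlapping fixed-length frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def centroid_track(signal, sample_rate, size=64, hop=32):
    """Spectral centroid of every frame: one value per time step."""
    return [spectral_centroid(f, sample_rate) for f in frames(signal, size, hop)]


# Synthetic stand-in for an audio recording: a 1000 Hz tone sampled at 8000 Hz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(128)]
print([round(c, 1) for c in centroid_track(tone, rate)])
```

For the pure tone used here, every frame's centroid should be close to 1000 Hz. A real cough recording would give a track that varies over time, and summary statistics of such tracks are typical inputs to a classifier.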
Results: In the first study, support vector machines performed better (Accuracy 95.8%) than the models analyzed in the related work. We also show how the model can improve human diagnosis, which can often misdiagnose bulbar involvement.
In the second study, we obtained a set of statistically significant features for males and females to capture this dysfunction. The Accuracy obtained (98.01% for females and 96.10% for males, both with random forest) outperformed the models of our first study and those found in the literature to date.
In the third study, random forest obtained the best Accuracy (93.5%) when comparing controls and ALS patients with bulbar involvement, and support vector machines obtained an Accuracy of 91.0% with a Specificity of 100.0% when directly comparing ALS patients with and without bulbar involvement. Our model also provided an alternative annotation of bulbar and non-bulbar subjects by means of semi-supervised machine-learning algorithms, which further improved the performance of our proposal.
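For readers less familiar with these metrics, the sketch below computes Accuracy and Specificity from predicted and true labels, and shows how a Specificity of 100% can coexist with a lower Accuracy: there are no false positives, but some positive cases are missed. The labels are invented toy data and do not reproduce the study's results. Treating bulbar involvement as the positive class is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all subjects classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def specificity(tp, fp, tn, fn):
    """Fraction of negative subjects correctly classified as negative."""
    return tn / (tn + fp)


# Toy labels (hypothetical): 1 = bulbar involvement, 0 = no bulbar involvement.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # one positive case is missed

counts = confusion_counts(y_true, y_pred)
print(f"accuracy    = {accuracy(*counts):.2f}")
print(f"specificity = {specificity(*counts):.2f}")
```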
In the fourth study, random forest performed better at detecting COVID-19 positive coughs than the other models analyzed, with an Accuracy close to 90%.
Conclusions: The results obtained are very encouraging and demonstrate the efficiency and applicability of machine learning bio-sound analysis for the automated detection of certain conditions. It may be an appropriate tool to help multidisciplinary clinical teams diagnose ALS, and in particular to improve the diagnosis of bulbar involvement. It could also support an early response to further COVID-19 outbreaks or other pandemics that may arise in the future.
The first study shows how the model can improve human diagnosis, which can often misdiagnose bulbar involvement.
Adding time-frequency features to the more classical phonatory-subsystem features increases the predictive capability of the machine learning models for detecting bulbar involvement. Studying men and women separately improved the results further.
The results obtained in improving the annotation of ALS patients in whom bulbar involvement had not yet been detected by current subjective approaches are very encouraging and demonstrate the efficiency and applicability of the methodology presented. It may be an appropriate tool for screening for bulbar involvement in the early stages of the disease.
Finally, the fourth study demonstrates the feasibility of automatically diagnosing COVID-19 from coughs and its applicability to detecting new outbreaks.