[9] used raw waveforms and applied long short-term memory (LSTM) networks and an attention model, obtaining an accuracy of 75.63% for dysarthria detection. Similarly, Hernandez et al. [22] used rhythm and voice quality or prosody features, with which a random forest (RF) classifier achieved an accuracy of 81.50%, ...