Intro Machine Learning
Aug 2025 - Dec 2025
This project introduces HERTS (Human Emotion Recognition Through Speech), a hybrid representation framework that combines interpretable prosodic features with multilingual embeddings from the wav2vec 2.0 XLSR-53 model. Using strict actor-based splits of the CREMA-D dataset, each audio clip was transformed into a single hybrid feature vector and evaluated with several classifiers (Logistic Regression, Random Forest, and a Multilayer Perceptron) on a six-class emotion recognition task. The hybrid MLP performed best, reaching 68% accuracy in the speaker-independent setting while requiring only a frozen XLSR encoder and a lightweight classifier. View the journal paper here.
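
As a rough illustration of the two ideas above (actor-based splitting and a hybrid feature vector), here is a minimal standard-library sketch. It is not the project's actual code. It relies on the CREMA-D filename convention, where the first underscore-separated field is the actor ID (for example `1001_DFA_ANG_XX.wav`). The function names, the 20% test fraction, the seed, and the use of mean pooling over frame embeddings are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def actor_id(filename):
    """Return the actor ID, the first field of a CREMA-D style filename."""
    return filename.split("_")[0]


def actor_split(filenames, test_fraction=0.2, seed=0):
    """Split clips so that no actor appears in both train and test.

    test_fraction and seed are illustrative defaults, not the project's values.
    """
    by_actor = defaultdict(list)
    for name in filenames:
        by_actor[actor_id(name)].append(name)

    # Shuffle actors (not clips) so that whole speakers are held out.
    actors = sorted(by_actor)
    random.Random(seed).shuffle(actors)
    n_test = max(1, round(len(actors) * test_fraction))
    test_actors = set(actors[:n_test])

    train = [f for a in actors if a not in test_actors for f in by_actor[a]]
    test = [f for a in actors if a in test_actors for f in by_actor[a]]
    return train, test


def hybrid_vector(prosodic, frame_embeddings):
    """Concatenate prosodic features with mean-pooled frame embeddings.

    Mean pooling over time is an assumed pooling choice for this sketch.
    """
    n_frames = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    pooled = [
        sum(frame[i] for frame in frame_embeddings) / n_frames
        for i in range(dim)
    ]
    return list(prosodic) + pooled


# Hypothetical clip names following the CREMA-D naming pattern.
clips = [
    "1001_DFA_ANG_XX.wav", "1001_IEO_HAP_HI.wav",
    "1002_DFA_SAD_XX.wav", "1003_TIE_NEU_XX.wav",
    "1004_IOM_FEA_XX.wav", "1005_MTI_DIS_XX.wav",
]
train_clips, test_clips = actor_split(clips)
```

Splitting by actor rather than by clip is what makes the evaluation speaker-independent: every voice in the test set is unseen during training. In a real pipeline, `frame_embeddings` would come from the frozen XLSR encoder and `prosodic` from a feature extractor, and the resulting vectors would be passed to the classifier.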



![[digital project]](https://cdn.prod.website-files.com/68c0bc3f109f56bcfb9fe18c/695680c271f029ddccc7fc1e_emotion_distribution_crema.png)
![[digital project]](https://cdn.prod.website-files.com/68c0bc3f109f56bcfb9fe18c/695680c45d05996572d83ae6_mlp_confusion_test.png)