Performance Evaluation
To evaluate the speech enhanced by the deep learning model, the following metrics were used to measure the gains in intelligibility and quality:
-
STOI:
Short-Time Objective Intelligibility (STOI) estimates the intelligibility of human speech by measuring the correlation between the short-time temporal envelopes of a reference (clean) audio signal and a degraded audio signal. STOI values typically range from 0 to 1, where 0 is the worst and 1 the best intelligibility. Because STOI is designed to relate monotonically to the fraction of words that listeners understand, its values are often read loosely as a percentage correct. The project includes a script, stoi, that calculates STOI.
-
PESQ:
Perceptual Evaluation of Speech Quality (PESQ) is the metric standardized by the International Telecommunication Union (ITU), in Recommendation ITU-T P.862, for analyzing the quality of a degraded signal with respect to a clean reference signal. PESQ applies an auditory transform to produce a loudness spectrum, then compares the loudness spectra of the clean reference signal and the degraded signal to produce a score in the range -0.5 to 4.5. This raw score is an objective estimate of the Mean Opinion Score (MOS). It can be further mapped to MOS-LQO (Listening Quality Objective), which is expressed on the 1-to-5 MOS scale. The source code for PESQ is available in the resources directory, with a Makefile for Linux-based systems. After compiling this codebase, PESQ scores can be calculated using the following command:
pesq <clean_aud> <degraded_aud> +<sampling rate>
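For the STOI item above, the idea of correlating short-time envelopes can be illustrated with a much simplified sketch. This is not the project's stoi script and not the full STOI algorithm, which additionally uses one-third octave band decomposition, normalization, and clipping. The function names, the non-overlapping RMS framing, and the frame length of 256 samples are assumptions made only for illustration.

```python
import math


def frame_rms(signal, frame_len):
    """Short-time envelope: RMS of each non-overlapping frame (illustrative choice)."""
    n_frames = len(signal) // frame_len
    return [
        math.sqrt(
            sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        )
        for i in range(n_frames)
    ]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return cov / norm if norm > 0 else 0.0


def envelope_correlation(clean, degraded, frame_len=256):
    """Correlate the short-time envelopes of a clean and a degraded signal.

    Hypothetical helper: a toy stand-in for the envelope comparison at the
    heart of STOI, not a replacement for a real STOI implementation.
    """
    n = min(len(clean), len(degraded))  # compare only the overlapping part
    return pearson(frame_rms(clean[:n], frame_len), frame_rms(degraded[:n], frame_len))
```

An undistorted signal compared with itself gives a correlation of 1, and added noise lowers the value, which mirrors how STOI rewards a degraded signal whose envelope tracks the clean one.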
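For the PESQ item above, a small Python sketch can assemble the command shown and map a raw PESQ score to MOS-LQO. The helper names and the default binary name `pesq` are assumptions; the mapping uses the narrow-band formula from ITU-T P.862.1, and the check on the sampling rate reflects the 8000 Hz and 16000 Hz modes of the ITU reference code. The sketch does not run the binary or parse its output, since the output format depends on the version compiled.

```python
import math

# Sampling rates accepted by the ITU reference implementation.
SUPPORTED_RATES = (8000, 16000)


def pesq_command(clean_path, degraded_path, sampling_rate, binary="pesq"):
    """Build the argument list for the command shown above.

    Hypothetical helper: `binary` is assumed to be the executable produced
    by the Makefile in the resources directory.
    """
    if sampling_rate not in SUPPORTED_RATES:
        raise ValueError(f"sampling rate must be one of {SUPPORTED_RATES}")
    return [binary, clean_path, degraded_path, f"+{sampling_rate}"]


def raw_pesq_to_mos_lqo(raw_score):
    """Map a raw PESQ score (-0.5 to 4.5) to MOS-LQO (narrow-band, ITU-T P.862.1)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


if __name__ == "__main__":
    print(pesq_command("clean.wav", "enhanced.wav", 16000))
    for raw in (-0.5, 2.0, 4.5):
        print(raw, round(raw_pesq_to_mos_lqo(raw), 3))
```

The argument list can be passed to `subprocess.run` once the binary is on the path. Because the mapping is monotonic, ranking enhanced files by raw score or by MOS-LQO gives the same order.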