Performance Evaluation

Tapan Sharma edited this page Jul 19, 2019 · 1 revision

Performance Metrics

To evaluate the speech produced by the deep learning enhancement model, the following metrics were used to measure gains in intelligibility and quality:

  1. STOI:
    Short-Time Objective Intelligibility (STOI) estimates speech intelligibility by measuring the correlation between the short-time temporal envelopes of a reference (clean) audio signal and a degraded audio signal. STOI values typically lie between 0 and 1, where 0 is the worst and 1 the best intelligibility. Because STOI is designed to correlate with the percentage of words that listeners understand correctly, its values are often read loosely as a percentage-correct figure. The STOI computation is available in the project in the script stoi.

  2. PESQ:
    Perceptual Evaluation of Speech Quality (PESQ) is the objective metric standardized by the International Telecommunication Union (ITU-T Recommendation P.862) for assessing the quality of a degraded signal relative to a clean reference signal. PESQ applies an auditory transform to obtain the loudness spectra of the two signals and compares them to produce a score in the range -0.5 to 4.5. This raw score is an objective prediction of the subjective Mean Opinion Score (MOS). It can further be mapped to MOS-LQO (Listening Quality Objective), which lies on the standard 1-to-5 MOS scale. The source code for PESQ is available in the resources directory with a Makefile for Linux-based systems. After compiling this codebase, PESQ scores can be calculated using the following command:
    pesq <clean_aud> <degraded_aud> +<sampling rate>
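
The PESQ invocation and the raw-score-to-MOS-LQO mapping can be sketched in Python as below. This is a minimal sketch, not part of the project: the helper names `pesq_command` and `raw_pesq_to_mos_lqo` are hypothetical, the restriction to 8000 and 16000 Hz reflects the sampling rates the ITU reference program is assumed to accept, and the mapping uses the logistic function from ITU-T P.862.1 for narrowband PESQ.

```python
import math


def pesq_command(clean_wav, degraded_wav, sample_rate, binary="pesq"):
    """Build the argument list for the compiled PESQ program.

    The argument order follows the command shown above. The binary name
    and the accepted sampling rates are assumptions about the build.
    """
    if sample_rate not in (8000, 16000):
        raise ValueError("sample_rate is assumed to be 8000 or 16000 Hz")
    return [binary, clean_wav, degraded_wav, f"+{sample_rate}"]


def raw_pesq_to_mos_lqo(raw_score):
    """Map a raw PESQ score (-0.5 to 4.5) to MOS-LQO (ITU-T P.862.1)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


if __name__ == "__main__":
    # File names are placeholders for illustration only.
    print(" ".join(pesq_command("clean.wav", "enhanced.wav", 16000)))
    for raw in (-0.5, 2.0, 4.5):
        print(f"raw {raw:+.1f} -> MOS-LQO {raw_pesq_to_mos_lqo(raw):.2f}")
```

The list returned by `pesq_command` can be passed to `subprocess.run` once the binary has been built. How the score is read back from the program's output depends on the compiled version, so that step is left out here.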

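The envelope-correlation idea behind STOI (item 1 above) can be illustrated with a much simplified sketch. This is not the project's stoi script and not a full STOI implementation: it skips the one-third octave band decomposition, silent-frame removal, and clipping steps, and works on a single broadband RMS envelope. The function names are hypothetical, and the defaults (256-sample frames with 50% overlap, 30-frame segments) are assumptions chosen to resemble the short-time analysis used by STOI.

```python
import math


def frame_rms(signal, frame_len, hop):
    """Crude temporal envelope: the RMS value of each frame."""
    envelope = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return envelope


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def envelope_correlation(clean, degraded, frame_len=256, hop=128, seg_frames=30):
    """Average correlation of short-time envelopes over sliding segments."""
    env_c = frame_rms(clean, frame_len, hop)
    env_d = frame_rms(degraded, frame_len, hop)
    n = min(len(env_c), len(env_d))
    if n < seg_frames:
        raise ValueError("signals are too short for one segment")
    scores = [
        pearson(env_c[i:i + seg_frames], env_d[i:i + seg_frames])
        for i in range(n - seg_frames + 1)
    ]
    return sum(scores) / len(scores)
```

A degraded signal whose envelope tracks the clean envelope closely scores near 1, while heavy noise flattens the envelope and lowers the score. Reported results should still come from a complete STOI implementation.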