About • Training results • Installation • How To Use • Credits • License
## About

This repository contains an implementation of the Deep Speech 2 Automatic Speech Recognition model, built from scratch with PyTorch. Deep Speech 2 is an end-to-end deep learning model designed for Automatic Speech Recognition (ASR). It uses a combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to process raw audio features (e.g., spectrograms) and predict the corresponding text transcriptions. The model follows the architecture described in Baidu's Deep Speech 2 paper.
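For orientation, here is a minimal PyTorch sketch of the CNN → bidirectional-RNN → linear layout described above. The layer counts, kernel sizes, and hidden dimensions below are illustrative assumptions, not the exact values used in this repository.

```python
import torch
import torch.nn as nn


class DeepSpeech2Sketch(nn.Module):
    """Illustrative Deep Speech 2-style model: Conv2d frontend, BiGRU stack, CTC head."""

    def __init__(self, n_feats: int = 128, n_tokens: int = 28, hidden: int = 512):
        super().__init__()
        # 2D convolutions over the (frequency, time) spectrogram
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),  # clipped ReLU used in the paper
            nn.Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5)),
            nn.BatchNorm2d(32),
            nn.Hardtanh(0, 20, inplace=True),
        )
        # the frequency axis is halved twice by the conv strides (assumes n_feats % 4 == 0)
        rnn_input = 32 * (n_feats // 4)
        # stack of bidirectional GRU layers over the time axis
        self.rnn = nn.GRU(rnn_input, hidden, num_layers=3,
                          batch_first=True, bidirectional=True)
        # per-frame projection to token log-probabilities
        self.head = nn.Linear(2 * hidden, n_tokens)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, n_feats, time)
        x = self.conv(spectrogram.unsqueeze(1))    # -> (batch, 32, n_feats // 4, time')
        x = x.flatten(1, 2).transpose(1, 2)        # -> (batch, time', 32 * n_feats // 4)
        x, _ = self.rnn(x)                         # -> (batch, time', 2 * hidden)
        return self.head(x).log_softmax(dim=-1)    # -> (batch, time', n_tokens)
```

As in the paper, such a model is trained with CTC loss (`nn.CTCLoss`) between the per-frame log-probabilities and the target transcription.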
## Training results

Tables with the model's outputs during training and evaluation can be viewed in the report. Overall, the model achieved an average WER of 0.44 and CER of 0.12 during evaluation.
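WER (word error rate) and CER (character error rate) are edit-distance metrics normalized by the length of the reference transcription, computed at the word and character level respectively. A minimal way to compute them, assuming the `torchmetrics` package is available (not necessarily the implementation used in this repository):

```python
from torchmetrics.text import CharErrorRate, WordErrorRate

prediction = ["the cat sat on the mat"]
reference = ["the cat sat on a mat"]

# word-level edit distance / number of reference words: 1 substitution out of 6 words
print(WordErrorRate()(prediction, reference))  # ≈ 0.17

# character-level edit distance / number of reference characters: 3 edits out of 20 chars
print(CharErrorRate()(prediction, reference))  # = 0.15
```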
## Installation

Follow these steps to install the project:

1. (Optional) Create and activate a new environment using `conda` or `venv` (+ `pyenv`).

   a. `conda` version:

   ```bash
   # create env
   conda create -n project_env python=PYTHON_VERSION

   # activate env
   conda activate project_env
   ```

   b. `venv` (+ `pyenv`) version:

   ```bash
   # create env
   ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

   # alternatively, using the default python version
   python3 -m venv project_env

   # activate env
   source project_env/bin/activate
   ```

2. Install all required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Install `pre-commit`:

   ```bash
   pre-commit install
   ```
## How To Use

To train a model, run the following command:

```bash
python3 train.py -cn=CONFIG_NAME HYDRA_CONFIG_ARGUMENTS
```

Where `CONFIG_NAME` is a config from `src/configs` and `HYDRA_CONFIG_ARGUMENTS` are optional Hydra override arguments.
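Hydra overrides are `key=value` pairs that change fields of the chosen config before training starts. As a rough illustration of how they are resolved (the config name and override key below are hypothetical, not necessarily configs that exist in `src/configs`), Hydra's Compose API produces the same merged config that the command line would:

```python
from hydra import compose, initialize

# Compose the config that `train.py -cn=baseline trainer.n_epochs=50` would receive.
# "baseline" and "trainer.n_epochs" are hypothetical names used only for illustration.
with initialize(version_base=None, config_path="src/configs"):
    cfg = compose(config_name="baseline", overrides=["trainer.n_epochs=50"])
    print(cfg.trainer.n_epochs)  # -> 50
```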
To run inference (evaluate the model or save predictions):
```bash
python3 inference.py HYDRA_CONFIG_ARGUMENTS
```
## Credits

This repository is based on a PyTorch Project Template.