Run `test_diarisation.py` from the project's root folder. Speaker IDs will show up in a text file under `data/processed/`.

The name of that file, as well as a number of other input parameters, is stored in a config file at `models/config.ini`. Please read it carefully, as it contains information critical for understanding the workflow.
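For reference, here is a minimal sketch of reading that config from Python with the standard `configparser` module; the section and key names below are hypothetical placeholders, so check the actual `models/config.ini` for the real parameters:

```python
from configparser import ConfigParser

config = ConfigParser()
config.read("models/config.ini")

# "diarisation" and "output_file" are placeholder names used only for illustration;
# the real section/key names live in models/config.ini itself.
output_file = config.get("diarisation", "output_file")
print(f"Speaker IDs will be written to data/processed/{output_file}")
```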
For the Flask app to use Bootstrap 4, install `bootstrap-flask`, not `flask-bootstrap`.
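Both packages are imported as `flask_bootstrap`, which is why having the wrong one installed breaks the app. A minimal registration sketch, assuming bootstrap-flask 1.x (where the extension class is still called `Bootstrap`):

```python
from flask import Flask
from flask_bootstrap import Bootstrap  # provided by bootstrap-flask, not flask-bootstrap

app = Flask(__name__)
Bootstrap(app)  # makes the Bootstrap 4 templates and helpers available to Jinja
```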
- `pyaudio` - for opening `.wav` files (see the playback sketch after this list)
- `keras`, `tensorflow` - for running the Celebrity Recognition model
- `bokeh` - for interactive visualization
- `flask`, `bootstrap-flask`, `flask-nav` - in case you'd like to run the demo in your browser
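As an illustration of the `pyaudio` dependency, a small sketch that plays back a `.wav` file (the file path is a placeholder):

```python
import wave
import pyaudio

wav_path = "data/raw/example.wav"  # placeholder path

wf = wave.open(wav_path, "rb")
pa = pyaudio.PyAudio()
stream = pa.open(
    format=pa.get_format_from_width(wf.getsampwidth()),
    channels=wf.getnchannels(),
    rate=wf.getframerate(),
    output=True,
)

# Stream the file to the default output device in small chunks
chunk = 1024
data = wf.readframes(chunk)
while data:
    stream.write(data)
    data = wf.readframes(chunk)

stream.stop_stream()
stream.close()
pa.terminate()
wf.close()
```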
- Figuring out if we can avoid downloading data and just use a pre-trained model from somewhere that produces embeddings with the similarity property -- this did not work, so we share our weights for the model trained on VoxCeleb1
- Initialize arguments and params as seen in `split_test.ipynb`
- Create the network using `network = src.vggvoxvlad.split.make_network(weight_path, args, input_dim=(257, None, 1), num_class=1251)`, where `weight_path` is a `.h5` file
- Create a list of dataframes using `result_list = src.vggvoxvlad.split.voxceleb1_split(path, network, split_seconds=3, n=3, win_length=400, sr=16000, hop_length=160, n_fft=512, spec_len=250, n_classes=1251)`, where `path` is a `.wav` file
- Each dataframe contains the headers `Time (s)`, `Speaker`, `Probability`, `Country` and `Gender`
- Each dataframe will display the three speakers with the highest predicted probability (to change this, change the `n` parameter)
- `voxceleb1_split()` will predict a speaker every three seconds by default (to change this, change the `split_seconds` parameter)
- To show a bar graph of the results, use `plot_split(result_list, num_speakers=2)`, where `num_speakers` is the actual number of speakers in the `.wav` file (a combined sketch of these steps follows this list)
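Putting the steps above together, a minimal end-to-end sketch (the weight and audio paths are placeholders, `args` stands for whatever argument object `split_test.ipynb` initializes, and `plot_split` is assumed to live in the same module):

```python
import src.vggvoxvlad.split as split

weight_path = "models/voxceleb1_vggvoxvlad.h5"  # placeholder: path to the shared .h5 weights
wav_path = "data/raw/interview.wav"             # placeholder: your audio file
args = ...                                      # initialize as shown in split_test.ipynb

# Build the VGGVox-VLAD network with the VoxCeleb1 classification head (1251 speakers)
network = split.make_network(weight_path, args, input_dim=(257, None, 1), num_class=1251)

# Predict a speaker every 3 seconds, keeping the top 3 candidates per window
result_list = split.voxceleb1_split(
    wav_path, network,
    split_seconds=3, n=3,
    win_length=400, sr=16000, hop_length=160,
    n_fft=512, spec_len=250, n_classes=1251,
)

# Bar graph of the predictions; num_speakers is the true number of speakers in the file
split.plot_split(result_list, num_speakers=2)
```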
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── flask_app      <- Scripts to create a Flask demo app
│   │   └── visualize.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── vggvoxvlad     <- Scripts to build and run the VGGVox-VLAD speaker model
│       └── split.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.testrun.org
- Tie a microphone recording module to the visualization via Bokeh server app
- Start working on a Flask application and think about how to deploy it
- Create a CLI wrapper for the set of scripts that takes a path to an audio file as an input argument, or the name of a preset `.wav` file, and launches a `bokeh` application locally on the user's machine. Rely on the `click` package for setting up arguments (see the sketch after this list)
- Create a quick `gif` animation of running the script in the terminal - use https://github.com/faressoft/terminalizer to help you create the `gif`. The gif would show the following:
  - Clone the repo
  - Run the CLI wrapper with the `--help` parameter, which would show some info about the script usage, as well as the possible values for the preset examples
  - `ls` in the terminal for a custom `wav`
  - Run the CLI on some custom `wav` file
- Clean this README, provide a nice overview of what the project does and how it achieves the goal - mention steps like:
  - VAD for voice activity detection
  - Speech activity
  - Diarization clustering
  - VoxCeleb v1 dataset, metadata example, how the validation data was prepared (point to the right folder in the repo that has the code and the weights) -- this step is quite important actually, as people might be interested in downloading our weights
  - How the embeddings and the final layer were trained
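For the CLI wrapper idea above, a possible `click`-based sketch (the preset names, file paths, and Bokeh app location are all hypothetical):

```python
import subprocess
import click

# Hypothetical mapping of preset names to bundled example files
PRESETS = {"zuck_cruz": "data/raw/zuck_cruz.wav"}

@click.command()
@click.option("--audio", "audio_path", type=click.Path(exists=True),
              help="Path to a custom .wav file.")
@click.option("--preset", type=click.Choice(sorted(PRESETS)),
              help="Name of a preset example .wav file.")
def main(audio_path, preset):
    """Run speaker diarisation on a .wav file and open the Bokeh demo locally."""
    if not (audio_path or preset):
        raise click.UsageError("Provide either --audio or --preset.")
    wav = audio_path or PRESETS[preset]
    # Launch the interactive Bokeh app; the app path below is a placeholder
    subprocess.run(
        ["bokeh", "serve", "--show", "src/flask_app/visualize.py", "--args", wav],
        check=True,
    )

if __name__ == "__main__":
    main()
```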
- Visualization - tie `wav` and `rttm` together, create a series of pngs -- done by Anton
- Tie the `png`s together into a video, concatenate it with the original video of Zuck fighting with Cruz (done by Anton), OR
- Get the `VoxCeleb`/`VoxCeleb2` data, calculate all the embeddings, store them with labels
- Using all above, select an appropriate model to be used as a feature extractor, and get embeddings of a given `.wav` in a sliding window fashion -- done by Dan
- Figure out how to tie metadata, and pull out Male/Female ids -- done by Dan
- Set up KNN in the embedding space with an appropriate metric / pick `top_n` from the prediction vector -- done by Dan & Anton (a toy sketch follows this list)
- Develop an interactive web-friendly visualization -- done by Anton
- Apply Speaker Classification to `.wav` in a rolling window, get top N predictions -- done by Dan
- Build a classifier of masculinity/femininity (could be simply an average of top 10 / N predictions' gender)
- Wrap everything into a nicer set of functions
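For context on the KNN step, a toy sketch over stored embeddings (all arrays below are synthetic placeholders, not project data; the real embeddings would come from the feature extractor above):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 512))       # placeholder speaker embeddings
speaker_ids = rng.integers(0, 1251, size=1000)  # placeholder VoxCeleb1-style labels

# Cosine distance is one reasonable metric for comparing speaker embeddings
knn = KNeighborsClassifier(n_neighbors=10, metric="cosine")
knn.fit(embeddings, speaker_ids)

# Embedding of one sliding-window chunk of the input .wav (placeholder)
query = rng.normal(size=(1, 512))

# Pick the top_n most probable speakers from the prediction vector
probs = knn.predict_proba(query)[0]
top_n = 3
top_speakers = knn.classes_[np.argsort(probs)[::-1][:top_n]]
print(top_speakers)
```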
Project based on the cookiecutter data science project template. #cookiecutterdatascience