Using Paper-2 functionality to perform live detection #3

Open
kubs0ne opened this issue Nov 21, 2020 · 3 comments

@kubs0ne

kubs0ne commented Nov 21, 2020

Hello,
I have a question: have you tried running this detector on live audio? I've tried to add this functionality to your program. Right now I have something like this:

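import joblib
import numpy as np
import pyaudio

# predict_probability(y, scaler) is the unchanged function from em_detection.py

scaler_filename = 'scaler.save'
scaler = joblib.load(scaler_filename)
p = pyaudio.PyAudio()

CHUNK = 4800                # samples per read (0.6 s of audio at 8 kHz)
FORMAT = pyaudio.paFloat32  # 32-bit float samples (4 bytes each)
CHANNELS = 1                # single channel for microphone
RATE = 8000                 # samples per second
stream = p.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    output=True,            # opened for playback too, although nothing is written to the stream
    frames_per_buffer=CHUNK
)
while True:
    data = stream.read(CHUNK)  # blocks until CHUNK samples have been recorded
    # np.fromstring is deprecated for binary data; frombuffer reads the raw bytes as float32
    decoded = np.frombuffer(data, dtype=np.float32)
    classes = predict_probability(decoded, scaler)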

Generally, the program works, but it's very slow and the detections are delayed. I think the biggest problem is the predict_probability(y, scaler) function, which I left unchanged from em_detection.py. Have you tried any approaches to detection from live audio? Also, how could I change predict_probability(y, scaler) to work better with live audio? I tried extracting a single MFCC from each chunk of live audio and predicting on that, but it didn't work.
Kind regards
Kuba

@sheelabhadra
Owner

Hi Kuba,
It's been a while since I worked on this project. I did try to perform detections from live audio using this script. However, I remember not being able to make it work flawlessly. Could you run this script and see if it solves your issue?

For detection from live audio, you have to make predictions over small chunks, and that will introduce some delay (probably on the order of milliseconds). My code does this naively by maintaining a running average of the predictions over the small chunks. A more principled way to do this would probably be to use an LSTM.
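
To illustrate the running-average idea, here is a minimal sketch. It is not the code from the script mentioned above: the `RunningAverage` class, the window length, and the `fake_predict` stand-in for `predict_probability(y, scaler)` are all hypothetical, chosen only so the example runs on its own without a microphone or a trained model.

```python
from collections import deque

import numpy as np


class RunningAverage:
    """Average the most recent `window` per-chunk probability vectors."""

    def __init__(self, window=5):
        # `window` is an illustrative value, not taken from the repository.
        self.history = deque(maxlen=window)

    def update(self, probs):
        # Store the newest prediction; the deque drops the oldest one
        # automatically once it holds `window` entries.
        self.history.append(np.asarray(probs, dtype=float))
        return np.mean(list(self.history), axis=0)


def fake_predict(chunk):
    # Hypothetical stand-in for predict_probability(y, scaler): maps the
    # chunk's mean energy to a two-class probability vector.
    energy = float(np.mean(chunk ** 2))
    p = energy / (1.0 + energy)
    return np.array([1.0 - p, p])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    smoother = RunningAverage(window=3)
    for i in range(6):
        # 800 samples is 0.1 s of audio at 8 kHz; random noise replaces
        # the microphone input here.
        chunk = rng.standard_normal(800).astype(np.float32)
        smoothed = smoother.update(fake_predict(chunk))
        print(i, smoothed)
```

In a live loop, the chunk would come from `stream.read(...)` instead of the random generator. Shorter chunks reduce the wait before each prediction, while a longer averaging window makes the output steadier but slower to react, so the two values have to be tuned together.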

Thanks,
Sheel

@kubs0ne
Author

kubs0ne commented Nov 24, 2020

Thank you for your response. Unfortunately, I can't run this script because of errors caused by incompatibilities between several Python packages. I also couldn't run it on Python 3.

@sheelabhadra
Owner

That's unfortunate, but understandable since the code was written in Python 2. You could try creating a virtual environment, installing the dependencies (cf. requirements.txt), and seeing if that solves the problem.
