All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Initial version of inference and switching logic taken from internal Emotech code
- Added `end_silence_length` to track the raw end silences.
- Added new pub function `validate_input` for the `VadSession` struct. The `process` function uses it to make sure input is valid in debug mode.
- Added a new method: `processed_duration`. A sample is considered processed once it has been seen by the Silero NN.
- API to get current start/end time of session audio
- Ability to trim the starting silence to keep buffer size down
- Potential OOM when handling long audio.
- Incorrect segments when processing whole files
- Made output deterministic by not eagerly processing frame remainders (silent padding may cause issues)
- Deleted `timestamp_ms` in `SpeechEnd`.
- Added `start_timestamp_ms`, `end_timestamp_ms`, and `samples` in `SpeechEnd`.
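
To illustrate the entries on frame remainders and `processed_duration`, here is a minimal sketch of how buffering incomplete frames can keep output independent of how the input is chunked. It is not the crate's implementation: `SketchSession`, `FRAME_SAMPLES`, and `SAMPLE_RATE` are hypothetical names and values chosen for the example, and the network call is stubbed out.

```rust
use std::time::Duration;

/// Hypothetical frame length in samples (an assumption for this sketch).
const FRAME_SAMPLES: usize = 512;
/// Hypothetical sample rate in Hz (an assumption for this sketch).
const SAMPLE_RATE: usize = 16_000;

/// Illustrative stand-in for a VAD session; not the real `VadSession`.
#[derive(Default)]
struct SketchSession {
    /// Samples received that do not yet form a complete frame.
    remainder: Vec<f32>,
    /// Number of samples that have been handed to the (stubbed) network.
    processed_samples: usize,
}

impl SketchSession {
    /// Buffers `input` and consumes complete frames only.
    /// Returns how many frames this call handed to the network.
    fn process(&mut self, input: &[f32]) -> usize {
        self.remainder.extend_from_slice(input);
        let frames = self.remainder.len() / FRAME_SAMPLES;
        let consumed = frames * FRAME_SAMPLES;
        // A real session would run the network on each complete frame here.
        // The trailing partial frame is kept rather than padded with silence.
        self.remainder.drain(..consumed);
        self.processed_samples += consumed;
        frames
    }

    /// Duration of audio that has been seen by the (stubbed) network.
    fn processed_duration(&self) -> Duration {
        Duration::from_millis((self.processed_samples * 1000 / SAMPLE_RATE) as u64)
    }
}
```

Under these assumptions, feeding 1024 samples in one call or as 700 followed by 324 yields the same processed duration, because a partial frame waits in the buffer until enough samples arrive to complete it.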