Issue: Unexpected Processing Time Behavior with clip_timestamps Parameter
Description:
I'm observing unexpected processing times when using MLX-Whisper on audio files, particularly when the clip_timestamps parameter is enabled. For example, processing a 40‑second audio file takes significantly longer (~27-28 seconds) with clip_timestamps enabled compared to just 3-4 seconds when disabled. This behavior was brought up in discussions previously, but I am seeing this consistently in my environment, so I am raising this as an issue.
40‑second audio file:
With clip_timestamps enabled: ~27-28 seconds processing time.
With clip_timestamps disabled: ~3-4 seconds processing time.
7‑second test file (ls_test.flac):
Processing time remains ~1-2 seconds, regardless of the clip_timestamps setting.
5‑minute audio file (small model):
Processing time is ~27 seconds with or without clip_timestamps.
5‑minute audio file (V3 Large Turbo/Turbo models):
Processing time increases to ~40 seconds.
This behavior seems inconsistent:
The 7‑second and 5‑minute test files perform similarly regardless of whether clip_timestamps is enabled, but the 40‑second file shows a dramatic increase in processing time when clip_timestamps is enabled. This suggests that processing times do not scale linearly with audio length when using clip_timestamps.
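To make the comparison concrete, the reported timings can be expressed as a real-time factor (processing time divided by audio duration). This is only a sketch: the processing times below are the midpoints of the ranges reported above, which is an assumption on my part, and the labels are illustrative.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


# (label, processing seconds, audio seconds); midpoints of the reported ranges.
runs = [
    ("7 s file, either setting", 1.5, 7.0),
    ("40 s file, clip_timestamps off", 3.5, 40.0),
    ("40 s file, clip_timestamps on", 27.5, 40.0),
    ("5 min file, small model", 27.0, 300.0),
    ("5 min file, turbo models", 40.0, 300.0),
]

rtf_by_label = {label: real_time_factor(p, a) for label, p, a in runs}

for label, rtf in rtf_by_label.items():
    print(f"{label}: RTF = {rtf:.2f}")
```

With these numbers, the 40‑second run with clip_timestamps enabled comes out at roughly 0.69, while the same file without it and the 5‑minute small-model run both sit near 0.09. That gap is the outlier this issue is about.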
Environment:
Hardware: M1 Pro with 32GB RAM
Models Tested:
MLX-Whisper Small (observed ~27 seconds for 5‑minute audio)
V3 Large Turbo/Turbo (observed ~40 seconds for 5‑minute audio)
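The environment section of the original post also names silero-vad next to clip_timestamps, which suggests the clips come from a voice-activity detector. Below is a minimal sketch of how such segments could be flattened into the `start,end,start,end,...` list that clip_timestamps takes. The segment format (a list of dicts with `start` and `end` in seconds) and the helper name are assumptions for illustration, not the code used in this report.

```python
from typing import Dict, List, Sequence


def vad_segments_to_clip_timestamps(
    segments: Sequence[Dict[str, float]],
) -> List[float]:
    """Flatten [{'start': s, 'end': e}, ...] into [s, e, s, e, ...] (seconds)."""
    clips: List[float] = []
    for seg in segments:
        start, end = float(seg["start"]), float(seg["end"])
        if end <= start:
            raise ValueError(f"segment end {end} must be after start {start}")
        clips.extend([start, end])
    return clips


# Hypothetical speech segments for a 40-second file.
example_segments = [{"start": 0.5, "end": 12.0}, {"start": 14.2, "end": 38.9}]
example_clips = vad_segments_to_clip_timestamps(example_segments)
print(example_clips)
```

The number of start/end pairs is worth recording alongside the timings, since it may matter more than the total audio length.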
Steps to Reproduce:
40‑Second File Test:
Process a 40‑second audio file with clip_timestamps enabled.
Observe processing time of ~27-28 seconds.
Process the same file with clip_timestamps disabled.
Observe processing time of ~3-4 seconds.
7‑Second Test File ([ls_test.flac](https://github.com/ml-explore/mlx-examples/blob/main/whisper/mlx_whisper/assets/ls_test.flac)):
Process with and without clip_timestamps.
Observe similar processing times (~1-2 seconds) in both cases.
5‑Minute File Test:
Process with the small model; observe ~27 seconds regardless of the clip_timestamps setting.
Process with V3 Large Turbo/Turbo models; observe ~40 seconds.
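The steps above can be timed with a small harness like the one below. It is a sketch, not the snippet originally attached to this issue: `time_transcription` and `compare_with_mlx_whisper` are hypothetical helper names, and the default model repo is an assumed example. The harness takes the transcription function as an argument so it can be exercised without a model.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def time_transcription(
    transcribe: Callable[..., Any],
    audio_path: str,
    clip_timestamps: Optional[List[float]] = None,
    repeats: int = 3,
) -> List[float]:
    """Call `transcribe` `repeats` times and return wall-clock seconds per call."""
    durations: List[float] = []
    for _ in range(repeats):
        kwargs: Dict[str, Any] = {}
        if clip_timestamps is not None:
            # Only pass the parameter when clipping is being tested.
            kwargs["clip_timestamps"] = clip_timestamps
        start = time.perf_counter()
        transcribe(audio_path, **kwargs)
        durations.append(time.perf_counter() - start)
    return durations


def compare_with_mlx_whisper(
    audio_path: str,
    clip_timestamps: List[float],
    model_repo: str = "mlx-community/whisper-small-mlx",  # assumed example repo
) -> Dict[str, List[float]]:
    """Time the same file with and without clip_timestamps."""
    import mlx_whisper  # imported lazily; requires the mlx-whisper package

    def run(path: str, **kwargs: Any) -> Any:
        return mlx_whisper.transcribe(path, path_or_hf_repo=model_repo, **kwargs)

    run(audio_path)  # warm-up call so model download and loading are not timed
    return {
        "disabled": time_transcription(run, audio_path),
        "enabled": time_transcription(run, audio_path, clip_timestamps=clip_timestamps),
    }
```

Calling `compare_with_mlx_whisper("audio_40s.wav", [0.5, 12.0, 14.2, 38.9])` (both arguments are placeholders) returns per-run timings for each setting. Repeating each measurement helps separate a one-off cost from a consistent slowdown.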
Questions/Concerns:
Unexpected Slowdown:
Is it expected that a 40‑second audio file takes ~27-28 seconds to process with clip_timestamps enabled, compared to just 3-4 seconds when disabled?
Bottlenecks and Optimizations:
Are there any known bottlenecks or configuration parameters in MLX-Whisper that can be adjusted to boost processing speed when clip_timestamps is enabled?
Model Comparisons:
The V3 Large Turbo/Turbo models are slower than the small model (~40 seconds versus ~27 seconds for a 5‑minute file). Should I compare these models to the regular Whisper Large model instead of the small model?
Is Whisper Turbo expected to be a faster alternative to the small model?
Any insights or suggestions for optimizing transcription speed, especially with clip_timestamps enabled, would be greatly appreciated.
Thank you!
esphoenixc changed the title from "Unexpected Processing Times for Short vs. Long Audio Files with MLX-Whisper with clip_timestamps enabled" to "Unexpected processing times for short vs. long audio files with mLX-whisper with clip_timestamps enabled" on Feb 14, 2025
Related discussion: #1275
Also listed under Environment: clip_timestamps, silero-vad