installed and test ffmpeg ok, but in whisper load audio error #1439
-
whisper run error in load audio(whisper) one@openSUSE:~> whisper 3body.mp4
/home/one/miniconda3/envs/whisper/lib/python3.9/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
def backtrace(trace: np.ndarray):
Traceback (most recent call last):
File "/home/one/miniconda3/envs/whisper/lib/python3.9/site-packages/whisper/audio.py", line 46, in load_audio
ffmpeg.input(file, threads=0)
File "/home/one/miniconda3/envs/whisper/lib/python3.9/site-packages/ffmpeg/_run.py", line 325, in run
raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/one/miniconda3/envs/whisper/bin/whisper", line 8, in <module>
sys.exit(cli())
File "/home/one/miniconda3/envs/whisper/lib/python3.9/site-packages/whisper/transcribe.py", line 437, in cli
result = transcribe(model, audio_path, temperature=temperature, **args)
File "/home/one/miniconda3/envs/whisper/lib/python3.9/site-packages/whisper/transcribe.py", line 121, in transcribe
mel = log_mel_spectrogram(audio, padding=N_SAMPLES)
File "/home/one/miniconda3/envs/whisper/lib/python3.9/site-packages/whisper/audio.py", line 130, in log_mel_spectrogram
audio = load_audio(audio)
File "/home/one/miniconda3/envs/whisper/lib/python3.9/site-packages/whisper/audio.py", line 51, in load_audio
raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 3.4.2 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 7 (SUSE Linux)
configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --incdir=/usr/include/ffmpeg --extra-cflags='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' --optflags='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --disable-openssl --enable-avresample --enable-libcdio --enable-gnutls --enable-ladspa --disable-cuda --disable-cuvid --enable-libass --enable-libbluray --enable-libcelt --enable-libcdio --enable-libdc1394 --enable-libfreetype --enable-libgsm --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libzvbi --enable-vaapi --enable-vdpau --enable-muxers --enable-demuxers --disable-encoders --disable-decoders --disable-decoder='mpeg4,h263,h264,hevc,vc1' --enable-encoder='apng,ass,ayuv,bmp,ffv1,ffvhuff,flac,gif,huffyuv,jpegls,libgsm,libmp3lame,libopenjpeg,libopus,libschroedinger,libspeex,libtheora,libtwolame,libvorbis,libvpx_vp8,libvpx_vp9,libwebp,libwebp_anim,mjpeg,mp2,mp2fixed,pam,pbm,pcm_alaw,pcm_f32be,pcm_f32le,pcm_f64be,pcm_f64le,pcm_mulaw,pcm_s16be,pcm_s16be_planar,pcm_s16le,pcm_s16le_planar,pcm_s24be,pcm_s24le,pcm_s24le_planar,pcm_s32be,pcm_s32le,pcm_s32le_planar,pcm_s8,pcm_s8_planar,pcm_u16be,pcm_u16le,pcm_u24be,pcm_u24le,pcm_u32be,pcm_u32le,pcm_u8,pcx,pgm,pgmyuv,png,ppm,sgi,srt,ssa,sunrast,targa,text,tiff,v210,v308,v408,v410,vorbis,xbm,xwd,y41p,yuv4,zlib,' --enable-decoder='ac3,ansi,apng,ass,ayuv,bmp,dirac,exr,ffv1,ffvhuff,ffwavesynth,flac,gif,gsm,huffyuv,libcelt,libgsm,libopenjpeg,libopus,libschroedinger,libspeex,libvorbis,libvpx_vp8,libvpx_vp9,mjpeg,mpeg1video,mpeg2video,,mp1,mp1float,mp2,mp2float,mp3,mp3float,opus,pam,pbm,pcm_alaw,pcm_bluray,pcm_dvd,pcm_f32be,pcm_f32le,pcm_f64be,pcm_f64le,pcm_mulaw,pcm_s16be,pcm_s16be_planar,pcm_s16le,pcm_s16le_planar,pcm_s24be,pcm_s24le,pcm_s24le_planar,pcm_s32be,pcm_s32le,pcm_s32le_planar,pcm_s8,pcm_s8_planar,pcm_u16be,pcm_u16le,pcm_u24be,pcm_u24le,pcm_u32be,pcm_u32le,pcm_u8,pcx,pgm,pgmyuv,pgssub,png,ppm,rawvideo,sgi,srt,ssa,sunrast,targa,text,theora,tiff,v210,v210x,v308,v408,v410,vorbis,vp3,vp5,vp6,vp6a,vp6f,vp8,vp9,webp,xbm,xwd,y41p,yuv4,zlib,'
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '3body.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf59.4.100
Duration: 00:02:15.38, start: -0.000500, bitrate: 1299 kb/s
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), none(tv, bt709), 852x480, 974 kb/s, SAR 640:639 DAR 16:9, 24 fps, 24 tbr, 90k tbn, 90k tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, 320 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:1 -> #0:0 (? (?) -> pcm_s16le (native))
This build of ffmpeg does not include a "aac" decoder needed for input stream #0:1. single test ffmpeg is ok(whisper) one@openSUSE:~> ffmpeg -i 3body.mp4 -vn -acodec copy ff.aac
ffmpeg version 3.4.2 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 7 (SUSE Linux)
configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --incdir=/usr/include/ffmpeg --extra-cflags='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' --optflags='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --disable-openssl --enable-avresample --enable-libcdio --enable-gnutls --enable-ladspa --disable-cuda --disable-cuvid --enable-libass --enable-libbluray --enable-libcelt --enable-libcdio --enable-libdc1394 --enable-libfreetype --enable-libgsm --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libzvbi --enable-vaapi --enable-vdpau --enable-muxers --enable-demuxers --disable-encoders --disable-decoders --disable-decoder='mpeg4,h263,h264,hevc,vc1' --enable-encoder='apng,ass,ayuv,bmp,ffv1,ffvhuff,flac,gif,huffyuv,jpegls,libgsm,libmp3lame,libopenjpeg,libopus,libschroedinger,libspeex,libtheora,libtwolame,libvorbis,libvpx_vp8,libvpx_vp9,libwebp,libwebp_anim,mjpeg,mp2,mp2fixed,pam,pbm,pcm_alaw,pcm_f32be,pcm_f32le,pcm_f64be,pcm_f64le,pcm_mulaw,pcm_s16be,pcm_s16be_planar,pcm_s16le,pcm_s16le_planar,pcm_s24be,pcm_s24le,pcm_s24le_planar,pcm_s32be,pcm_s32le,pcm_s32le_planar,pcm_s8,pcm_s8_planar,pcm_u16be,pcm_u16le,pcm_u24be,pcm_u24le,pcm_u32be,pcm_u32le,pcm_u8,pcx,pgm,pgmyuv,png,ppm,sgi,srt,ssa,sunrast,targa,text,tiff,v210,v308,v408,v410,vorbis,xbm,xwd,y41p,yuv4,zlib,' --enable-decoder='ac3,ansi,apng,ass,ayuv,bmp,dirac,exr,ffv1,ffvhuff,ffwavesynth,flac,gif,gsm,huffyuv,libcelt,libgsm,libopenjpeg,libopus,libschroedinger,libspeex,libvorbis,libvpx_vp8,libvpx_vp9,mjpeg,mpeg1video,mpeg2video,,mp1,mp1float,mp2,mp2float,mp3,mp3float,opus,pam,pbm,pcm_alaw,pcm_bluray,pcm_dvd,pcm_f32be,pcm_f32le,pcm_f64be,pcm_f64le,pcm_mulaw,pcm_s16be,pcm_s16be_planar,pcm_s16le,pcm_s16le_planar,pcm_s24be,pcm_s24le,pcm_s24le_planar,pcm_s32be,pcm_s32le,pcm_s32le_planar,pcm_s8,pcm_s8_planar,pcm_u16be,pcm_u16le,pcm_u24be,pcm_u24le,pcm_u32be,pcm_u32le,pcm_u8,pcx,pgm,pgmyuv,pgssub,png,ppm,rawvideo,sgi,srt,ssa,sunrast,targa,text,theora,tiff,v210,v210x,v308,v408,v410,vorbis,vp3,vp5,vp6,vp6a,vp6f,vp8,vp9,webp,xbm,xwd,y41p,yuv4,zlib,'
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '3body.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf59.4.100
Duration: 00:02:15.38, start: -0.000500, bitrate: 1299 kb/s
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), none(tv, bt709), 852x480, 974 kb/s, SAR 640:639 DAR 16:9, 24 fps, 24 tbr, 90k tbn, 90k tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, 320 kb/s (default)
Metadata:
handler_name : SoundHandler
File 'ff.aac' already exists. Overwrite ? [y/N] y
Output #0, adts, to 'ff.aac':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.83.100
Stream #0:0(und): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, 320 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:1 -> #0:0 (copy)
Press [q] to stop, [?] for help
size= 5322kB time=00:02:14.99 bitrate= 323.0kbits/s speed=6.7e+03x
video:0kB audio:5279kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.820309%
(whisper) one@openSUSE:~> echo $?
0 my system enva openSUSE vm with paththough gpu 1660s, and nvidia driver is installed (whisper) one@openSUSE:~> cat /etc/os-release
NAME="openSUSE Leap"
VERSION="15.4"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.4"
PRETTY_NAME="openSUSE Leap 15.4"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.4"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Leap"
LOGO="distributor-logo-Leap"
(whisper) one@openSUSE:~> sudo lspci |grep 1660
00:10.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660 SUPER] (rev a1)
(whisper) one@openSUSE:~> rpm -qa |grep ffmpeg
ffmpeg-3.4.2-150200.11.28.1.x86_64
ffmpegthumbs-21.12.3-bp154.1.35.x86_64
(whisper) one@openSUSE:~> pip list |grep whisper
openai-whisper 20230314
(whisper) one@openSUSE:~> python --version
Python 3.9.16
(whisper) one@openSUSE:~> whereis python
python: /home/one/miniconda3/envs/whisper/bin/python
(whisper) one@openSUSE:~> nvidia-smi
Sun Jun 11 09:25:19 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:00:10.0 Off | N/A |
| 0% 49C P8 24W / 125W | 0MiB / 6144MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
(whisper) one@openSUSE:~> |
Beta Was this translation helpful? Give feedback.
Answered by
phineas-pta
Jun 11, 2023
Replies: 2 comments 3 replies
-
can you run |
Beta Was this translation helpful? Give feedback.
3 replies
-
I got ffmpeg errors simply because I was using "complex" file name, with spaces and special characters. After renaming the source file to something more simple like source-file.mp4 the process completed. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
sorry the cmd should be
ffmpeg -nostdin -i 3body.mp4 -f s16le -ac 1 -acodec pcm_s16le -ar 16000 - > /dev/null
it almost the same command as in
whisper/audio.py
to load audio, but mine would only output error if any