Video decoding errors both on CPU / CUDA backends. #592
Comments
Hi @BernhardGlueck , thanks for the report.
Can you share which previous version of torchcodec this was working on? Also, to help with reproducing, can you share the name/path of one of the UCF101 videos that this fails on? Ideally, a full minimal reproducing example would be greatly helpful! Thanks
So I am trying to investigate myself a bit further...

```python
import torch.cuda
from torchcodec.decoders import VideoDecoder

print(torch.cuda.is_available())

device = torch.device('cpu')
decoder = VideoDecoder("v_UnevenBars_g18_c04.avi", device=str(device))
for frame in decoder:
    print(frame.data.shape)
```

This works fine.

```python
class VideoDataset(Dataset[tuple[torch.Tensor, torch.Tensor, torch.Tensor]]):
    def __init__(self,
                 root_dir: str,
                 class_mapping: ClassMapping,
                 fps: int,
                 max_frames: int,
                 device: torch.device,
                 max_items: int | None = None,
                 dtype: torch.dtype = torch.float,
                 video_extensions=(".mp4", ".avi", ".mov", ".mkv")):
        self.root_dir = root_dir
        self.class_mapping = class_mapping
        self.samples = get_videos_and_classes(root_dir, video_extensions)
        self.fps = fps
        self.device = device
        self.max_frames = max_frames
        self.max_items = max_items
        self.dtype = dtype
        if max_items is not None:
            self.samples = random.sample(self.samples, max_items)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        class_name, video_path = self.samples[index]
        try:
            decoder = VideoDecoder(video_path, device=str(self.device), seek_mode="approximate")
            source_fps = decoder.metadata.average_fps
            sampled_frames = self._sample_frames(decoder, source_fps, self.fps, self.max_frames)
            # Pad with zero frames up to max_frames; the mask marks the real frames.
            zero_frame = torch.zeros_like(sampled_frames[0])
            actual_frames = len(sampled_frames)
            while len(sampled_frames) < self.max_frames:
                sampled_frames.append(zero_frame)
            mask = torch.arange(self.max_frames) < actual_frames
            label = self.class_mapping.one_hot(class_name)
            video_tensor = torch.stack(sampled_frames, dim=0)
            return video_tensor.to(dtype=self.dtype), mask, label.to(dtype=self.dtype)
        except Exception as e:
            print(f"Failed: {video_path}, {e}")
            raise e
    def _sample_frames(self, decoder: VideoDecoder, source_fps: float, target_fps: float, max_frames: int):
        step = source_fps / target_fps  # Compute step size
        sampled_frames = []
        current_index = 0         # Tracks the current frame index
        next_frame_to_sample = 0  # The next frame index to keep
        for frame in decoder:
            if current_index >= next_frame_to_sample:
                # Keep this frame
                sampled_frames.append(frame)
                if len(sampled_frames) == max_frames:
                    break
                next_frame_to_sample += step  # Update the next frame index to sample
            current_index += 1  # Always increment the frame counter
        return sampled_frames
```
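For clarity, here is a small worked example of the index arithmetic in _sample_frames (the values are illustrative, not taken from the reporter's data): with source_fps = 25 and target_fps = 5, step = 5.0, so frame indices 0, 5, 10, ... are kept until max_frames is reached.

```python
# Illustrative stand-in for _sample_frames' index arithmetic.
source_fps, target_fps, max_frames = 25.0, 5.0, 4
step = source_fps / target_fps  # 5.0
kept, next_to_sample = [], 0.0
for i in range(100):  # stand-in for iterating over decoded frames
    if i >= next_to_sample:
        kept.append(i)
        if len(kept) == max_frames:
            break
        next_to_sample += step
print(kept)  # [0, 5, 10, 15]
```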
And the DataLoader setup:

```python
data_loader = DataLoader(
    datasets[split],
    batch_size=5,
    shuffle=False,
    num_workers=1,
    pin_memory=True
)
```

This fails in the _sample_frames loop when sampling from the DataLoader, on the exact same video file (reproducible). I attached the full project code files for your convenience (it's just a toy project and I am only testing the data loading right now).
I think the difference between the two code snippets above is the use of "approximate" mode. Can you try approximate mode outside of the DataLoader?
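A minimal way to check that, as a sketch (the filename is taken from the reporter's earlier snippet; "exact" is the decoder's default seek mode):

```python
from torchcodec.decoders import VideoDecoder

# Compare exact vs. approximate seeking on the same file, CPU only.
for seek_mode in ("exact", "approximate"):
    decoder = VideoDecoder("v_UnevenBars_g18_c04.avi", device="cpu", seek_mode=seek_mode)
    count = 0
    for frame in decoder:
        count += 1
    print(f"{seek_mode}: decoded {count} frames")
```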
Yes, that was it, thank you! I was under the impression that it would give me a performance improvement. The basic sample works perfectly, and the training code works now when I override the device to CPU; but if I leave the device as above, it fails as before (it does work on CPU now, since I removed the approximate seek mode).
In some instances, yes. But approximate mode relies entirely on the video's metadata for seeking: it assumes a constant frame rate and accurate metadata. If either assumption doesn't hold, you'll run into problems. See https://pytorch.org/torchcodec/stable/generated_examples/approximate_mode.html#which-mode-should-i-use for more. Regarding the problem with CUDA: we don't have access to the rest of your code and environment. For further help, please narrow the problem down to a chunk of code you can show us in its entirety that exhibits the behavior.
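One common cause of "RuntimeError: CUDA error: initialization error" inside DataLoader workers (not confirmed to be the cause here) is fork-based worker processes starting after CUDA has already been initialized in the parent process. A hedged sketch of switching to the "spawn" start method, using a stand-in dataset since the reporter's full code isn't available:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class StubDataset(Dataset):
    """Stand-in for the reporter's VideoDataset."""
    def __len__(self) -> int:
        return 4

    def __getitem__(self, index: int) -> torch.Tensor:
        return torch.zeros(3)

if __name__ == "__main__":
    # "spawn" workers start with a clean CUDA state, unlike "fork".
    loader = DataLoader(StubDataset(), batch_size=2, num_workers=1,
                        multiprocessing_context="spawn")
    for batch in loader:
        print(batch.shape)
```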
🐛 Describe the bug
After digging into this for a few days, I still was not able to fix it:
Environment:
Fedora 41, NVIDIA driver 570 with CUDA support (torch itself works fine in training with CUDA)
Python 3.12.8
Torch: 2.6.0+cu126
TorchVision: 0.21.0
TorchCodec: 0.2.1+cu126
FFmpeg: 7.1.1 with CUDA (cuvid, nvenc, nvdec) support
Dataset: UCF101
Minimal Code:
I get the following errors:
Using CPU device:
RuntimeError: Requested next frame while there are no more frames left to decode.
Using CUDA device:
RuntimeError: CUDA error: initialization error (raised in core.add_video_stream)
The videos (UCF101) decode fine with ffmpeg directly.
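For reference, one way to verify that a file decodes cleanly with ffmpeg alone (a sketch; the filename is the one from the earlier snippet, and the null muxer discards the decoded output):

```python
import subprocess

# Decode the whole file and discard the output; decode errors go to stderr.
result = subprocess.run(
    ["ffmpeg", "-v", "error", "-i", "v_UnevenBars_g18_c04.avi", "-f", "null", "-"],
    capture_output=True, text=True,
)
print("return code:", result.returncode)
print(result.stderr or "no decode errors")
```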
Also, this worked fine with a previous version of torchcodec, and I see the same issues on Ubuntu.
Any ideas what's going on?
Versions
(collect_env.py crashes; see the environment details above.)