Loading all splits of object detection dataset #940
Replies: 3 comments
-
Hi @DANISHFAYAZNAJAR 👋🏻 Let me convert this issue into a discussion and put it into the Q&A section.
-
Hi @DANISHFAYAZNAJAR 👋🏻 YOLO is the only data format that uses one file to store information about subsets like this. COCO, PASCAL, and other formats treat subsets as separate datasets. To keep the API consistent, we made YOLO behave the same. At this point, we do not plan to change that behavior. I recommend writing a utility using supervision datasets as building blocks.
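For reference, a minimal sketch of such a utility is shown below. It assumes the Roboflow-style YOLO folder layout from the original question (`train`/`valid`/`test` subfolders, each with `images` and `labels`, plus a shared `data.yaml`) and returns the splits as a plain dict; the function name `load_yolo_splits` is hypothetical and not part of the supervision API.

```python
from pathlib import Path

import supervision as sv


def load_yolo_splits(dataset_dir: str) -> dict:
    """Load every available YOLO split (train/valid/test) as a separate DetectionDataset."""
    root = Path(dataset_dir)
    splits = {}
    for split in ("train", "valid", "test"):
        images_dir = root / split / "images"
        labels_dir = root / split / "labels"
        # Only load splits that actually exist in the downloaded dataset.
        if images_dir.is_dir() and labels_dir.is_dir():
            splits[split] = sv.DetectionDataset.from_yolo(
                images_directory_path=str(images_dir),
                annotations_directory_path=str(labels_dir),
                data_yaml_path=str(root / "data.yaml"),
            )
    return splits


# Usage (dataset.location comes from the Roboflow download in the question below):
# splits = load_yolo_splits(dataset.location)
# splits["train"].classes  # e.g. ['dog', 'person']
```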
-
Hi @SkalskiP, thank you for your response. Indeed, we can write a function that loads the dataset splits: given the path to the dataset folder, it can look for the 'train', 'valid', and 'test' folders, create three datasets, merge them, and then split them again. However, merging and re-splitting may mix samples across the splits (see the sketch below). Furthermore, the merged dataset would not give us the same structure with three distinct splits that Hugging Face's Dataset provides.
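To make that concern concrete, a merge-and-re-split approach might look like the sketch below. It assumes your supervision version provides `DetectionDataset.merge` and `DetectionDataset.split`, and that `dataset.location` comes from the Roboflow download shown in the original question; because the re-split boundaries are random, the new splits no longer match the original train/valid partition.

```python
import supervision as sv

# Load two of the original splits separately (paths follow the YOLO layout).
train_ds = sv.DetectionDataset.from_yolo(
    images_directory_path=f"{dataset.location}/train/images",
    annotations_directory_path=f"{dataset.location}/train/labels",
    data_yaml_path=f"{dataset.location}/data.yaml",
)
valid_ds = sv.DetectionDataset.from_yolo(
    images_directory_path=f"{dataset.location}/valid/images",
    annotations_directory_path=f"{dataset.location}/valid/labels",
    data_yaml_path=f"{dataset.location}/data.yaml",
)

# Merge, then split again. The new split is drawn randomly from the merged pool,
# so samples from the original train and valid sets end up mixed together.
merged = sv.DetectionDataset.merge([train_ds, valid_ds])
new_train, new_valid = merged.split(split_ratio=0.8, random_state=42, shuffle=True)
```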
-
Question
Instead of loading just a single split, why can't we load the whole dataset with all of its splits?
```python
import roboflow
from roboflow import Roboflow
import supervision as sv

roboflow.login()

rf = Roboflow()
project = rf.workspace(WORKSPACE_ID).project(PROJECT_ID)
dataset = project.version(PROJECT_VERSION).download("yolov5")

ds = sv.DetectionDataset.from_yolo(
    images_directory_path=f"{dataset.location}/train/images",
    annotations_directory_path=f"{dataset.location}/train/labels",
    data_yaml_path=f"{dataset.location}/data.yaml"
)

ds.classes
# ['dog', 'person']
```
Additional
"I attempted to load my dataset, which is divided into training, validation, and testing sets. However, when I attempted to use sv.DetectionDataset.from_yolo to load the dataset, it required me to provide paths for the images folder, the annotations directory and data.yaml file. Is there a way to simply pass the main dataset folder path and have the function automatically load all the available splits?"
Beta Was this translation helpful? Give feedback.
All reactions