Coming soon!
Watch the demo video: MR_COGraphs_video_submission.mp4
Due to size restrictions, the full video can be downloaded here: 🎥 Download the video
We provide both small and large environments as USD files, which can be downloaded and opened in the Isaac Sim platform.
Once the environment is loaded, you can record rosbag files with the following commands:
# single-robot example
rosbag record /clock /robot1/camera_info_left \
/robot1/depth_left /robot1/odom /robot1/rgb_left \
/robot1/imu /robot1/scan /tf -O rosbag_name.bag
# two-robot example
rosbag record /clock /robot1/camera_info_left \
/robot1/depth_left /robot1/odom /robot1/rgb_left \
/robot1/imu /robot1/scan /robot2/camera_info_left \
/robot2/depth_left /robot2/odom /robot2/rgb_left \
/robot2/imu /robot2/scan /tf -O rosbag_name.bag
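To replay a recording for offline processing, the standard ROS playback command can be used, for example: rosbag play --clock rosbag_name.bag.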
We develop a ROS wrapper that extracts RGB-D sequences and ground-truth poses from the Replica Dataset and converts them into rosbag files.
For the Replica apartment2 environment, you can directly download the single-robot rosbag and the two-robot rosbag.
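A minimal sketch of the conversion idea behind the wrapper is shown below. It assumes a typical rendered Replica layout (an rgb/ and depth/ folder plus a traj.txt pose file) and topic names mirroring the recording commands above; the actual wrapper in the repository may differ.

```python
#!/usr/bin/env python
# Minimal sketch: convert a rendered Replica sequence into a rosbag.
# Assumptions (not the repository's code): RGB frames in rgb/frame%06d.jpg,
# depth in depth/depth%06d.png, ground-truth poses as 4x4 row-major matrices
# in traj.txt, and topic names mirroring the recording commands above.
import cv2
import numpy as np
import rosbag
import rospy
import tf.transformations as tft
from cv_bridge import CvBridge
from nav_msgs.msg import Odometry

def replica_to_rosbag(seq_dir, out_bag, rate_hz=30.0):
    bridge = CvBridge()
    poses = np.loadtxt(f"{seq_dir}/traj.txt").reshape(-1, 4, 4)
    with rosbag.Bag(out_bag, "w") as bag:
        for i, T in enumerate(poses):
            stamp = rospy.Time.from_sec(i / rate_hz)

            rgb = cv2.imread(f"{seq_dir}/rgb/frame{i:06d}.jpg")
            depth = cv2.imread(f"{seq_dir}/depth/depth{i:06d}.png", cv2.IMREAD_UNCHANGED)

            rgb_msg = bridge.cv2_to_imgmsg(rgb, encoding="bgr8")
            depth_msg = bridge.cv2_to_imgmsg(depth, encoding="16UC1")
            rgb_msg.header.stamp = depth_msg.header.stamp = stamp
            rgb_msg.header.frame_id = depth_msg.header.frame_id = "robot1/camera_left"

            # Publish the ground-truth pose as an Odometry message on /robot1/odom.
            odom = Odometry()
            odom.header.stamp = stamp
            odom.header.frame_id = "world"
            odom.pose.pose.position.x, odom.pose.pose.position.y, odom.pose.pose.position.z = T[:3, 3]
            (odom.pose.pose.orientation.x, odom.pose.pose.orientation.y,
             odom.pose.pose.orientation.z, odom.pose.pose.orientation.w) = tft.quaternion_from_matrix(T)

            bag.write("/robot1/rgb_left", rgb_msg, stamp)
            bag.write("/robot1/depth_left", depth_msg, stamp)
            bag.write("/robot1/odom", odom, stamp)

if __name__ == "__main__":
    replica_to_rosbag("replica/apartment_2", "replica_single_robot.bag")
```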
We integrate iPhones (iPhone 12 Pro or later) as sensors in our framework in two ways:
- Data Collection & Conversion: Captured data is processed and converted into rosbag files, as demonstrated in /r3d_to_ROS/r3d_to_rosbag.py.
- Real-time Streaming: RGB-D frames and pose information are continuously converted into ROS messages and published on the corresponding topics, enabling real-time COGraph construction (implemented in /r3d_to_ROS/record3d_ros.zip; see the sketch after this list).
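A minimal sketch of the streaming path is shown below. The frame grabber get_next_frame() is a hypothetical placeholder for the Record3D capture code shipped in record3d_ros.zip, and the topic names are assumptions that mirror the recording commands above.

```python
#!/usr/bin/env python
# Minimal sketch: publish iPhone RGB-D frames and camera poses to ROS topics
# for real-time COGraph construction. get_next_frame() is a hypothetical
# placeholder for the Record3D capture code in record3d_ros.zip, and the
# topic names are assumptions mirroring the recording commands above.
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseStamped

def stream_to_ros(get_next_frame, rate_hz=15.0):
    rospy.init_node("record3d_publisher")
    bridge = CvBridge()
    rgb_pub = rospy.Publisher("/robot1/rgb_left", Image, queue_size=1)
    depth_pub = rospy.Publisher("/robot1/depth_left", Image, queue_size=1)
    pose_pub = rospy.Publisher("/robot1/pose", PoseStamped, queue_size=1)
    rate = rospy.Rate(rate_hz)

    while not rospy.is_shutdown():
        # rgb: HxWx3 uint8, depth: HxW float32 metres, pose: (x, y, z, qx, qy, qz, qw)
        rgb, depth, pose = get_next_frame()
        stamp = rospy.Time.now()

        rgb_msg = bridge.cv2_to_imgmsg(rgb, encoding="rgb8")
        depth_msg = bridge.cv2_to_imgmsg(depth, encoding="32FC1")
        rgb_msg.header.stamp = depth_msg.header.stamp = stamp
        rgb_msg.header.frame_id = depth_msg.header.frame_id = "iphone_camera"

        pose_msg = PoseStamped()
        pose_msg.header.stamp = stamp
        pose_msg.header.frame_id = "world"
        (pose_msg.pose.position.x, pose_msg.pose.position.y, pose_msg.pose.position.z,
         pose_msg.pose.orientation.x, pose_msg.pose.orientation.y,
         pose_msg.pose.orientation.z, pose_msg.pose.orientation.w) = pose

        rgb_pub.publish(rgb_msg)
        depth_pub.publish(depth_msg)
        pose_pub.publish(pose_msg)
        rate.sleep()
```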
Below are the rosbag files collected from our real-world environment, a 9m × 9m space with three rooms:
The GPU utilization during the COGraph generation process is shown above. For detailed metrics, please refer to the log file gpu_usage_log.txt. We have also tested our system in a larger real-world setting featuring a corridor and three rooms. The illustration below depicts the nodes created by robot1 in the COGraph, along with the merged nodes contributed by robot1 and robot2.

To prepare the training data for the encoder and decoder, we follow these steps:
1. Download the dataset from the following URL: https://www.kaggle.com/c/imagenet-object-localization-challenge/data.
2. Input the file LOC_synset_mapping.txt into a large language model (GPT/Kimi) with the prompt: "Based on the information in the dataset, each line begins with a serial number, followed by the word it represents. Select the serial numbers and words that will definitely appear in a room, and output the results in the format of 'serial number + word (reason)'." The resulting file is imagenet_classes_in_house_last.txt.
3. Using the list of household objects in imagenet_classes_in_house_last.txt together with the annotation files from ILSVRC/Annotations, crop the corresponding objects from the images.
4. Feed the cropped images into CLIP to obtain features, which are then used to train the encoder and decoder (a sketch of steps 3-4 is given after this list).
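Steps 3 and 4 can be sketched as below. The snippet assumes the ILSVRC annotations are the standard PASCAL-VOC-style XML files from the Kaggle download and uses OpenAI's clip package; the directory layout, file names, and output format are illustrative assumptions, not the repository's exact code.

```python
# Minimal sketch of steps 3-4: crop objects listed in
# imagenet_classes_in_house_last.txt from ILSVRC images using the
# PASCAL-VOC-style XML annotations, then encode the crops with CLIP.
# File paths and the output format are assumptions, not the repo's code.
import glob
import xml.etree.ElementTree as ET

import clip          # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Synset ids (first token of each line) kept by the LLM filtering step.
with open("imagenet_classes_in_house_last.txt") as f:
    house_synsets = {line.split()[0] for line in f if line.strip()}

features, labels = [], []
for xml_path in glob.glob("ILSVRC/Annotations/CLS-LOC/train/*/*.xml"):
    root = ET.parse(xml_path).getroot()
    img_path = xml_path.replace("Annotations", "Data").replace(".xml", ".JPEG")
    image = Image.open(img_path).convert("RGB")

    for obj in root.findall("object"):
        synset = obj.find("name").text
        if synset not in house_synsets:
            continue
        box = obj.find("bndbox")
        crop = image.crop((int(box.find("xmin").text), int(box.find("ymin").text),
                           int(box.find("xmax").text), int(box.find("ymax").text)))
        with torch.no_grad():
            feat = model.encode_image(preprocess(crop).unsqueeze(0).to(device))
        features.append(feat.squeeze(0).cpu())
        labels.append(synset)

torch.save({"features": torch.stack(features), "labels": labels}, "clip_crop_features.pt")
```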
The object queries are generated in three steps:
1. Ranking Node Labels by Frequency in COGraph: Sort the node labels in the COGraph dataset by occurrence frequency and select the top 10 labels, referred to as "Appeared" in the Query Type (a frequency-counting sketch is given after this list).
2. Synonym Generation Using Large Language Models: Feed the 10 labels from step 1 into a large language model (GPT/Kimi) with the prompt "Find synonyms for these words" to generate a list of synonyms for each label, denoted as "Similar" in the Query Type.
3. Descriptive Phrase Generation Using Large Language Models: Feed the 10 labels from step 1 into a large language model (GPT/Kimi) with the prompt "Provide brief descriptions in English for these terms" to obtain a brief description of each label, denoted as "Descriptive" in the Query Type.
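Step 1 is a plain frequency count over the node labels; a minimal sketch, assuming the labels are available as a Python list of strings, is shown below.

```python
# Minimal sketch of step 1: rank COGraph node labels by frequency and keep
# the top 10 as the "Appeared" query set. node_labels is assumed to be a
# plain list of label strings extracted from the COGraph dataset.
from collections import Counter

def top_appeared_labels(node_labels, k=10):
    """Return the k most frequent node labels, most common first."""
    return [label for label, _ in Counter(node_labels).most_common(k)]

# Example:
# top_appeared_labels(["chair", "table", "chair", "lamp", "chair", "table"], k=2)
# -> ["chair", "table"]
```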