Robot doesn't offer ROS2 topics after some time, requiring reboot to come back #457
Comments
Hello! I haven't seen this behavior before, but I will try to replicate it. Can you tell me a little more about your RMW setup? Are you using Fast DDS or Cyclone and do you have a custom XML configuration on the Pi? |
I believe that error happens when Cyclone DDS has trouble due to a network change, so it is quite possible that log message is an indicator. Is something changing on your network? The log seems to indicate that your robot is losing a network connection. In my experience, Cyclone DDS is quite sensitive to changes in network settings and does not recover well when an interface goes down. |
Thanks for looking at this. We have lost messaging with the robot under both Cyclone and FastRTPS. Under Cyclone we run a custom configuration on the Pi and on the robot to specify the network interface and to prevent additional routing, as we have found this gives greater stability. @shamlian I agree on Cyclone and network adapters; this is why we rarely use it on the robot Wi-Fi side and stick to usb0. However, when the Wi-Fi is working we get more reliable performance than with FastRTPS. It would be good if we could disable the robot Wi-Fi and only have it enabled for hot-spotting. Robot RMW config override:
Humble (2.2)
On the Pi we have two configurations to shield the robot from excess ROS2 traffic, so the Pi-to-robot side has:
Pi Humble cyclone cfg
Thanks |
Hi @lukeopteran! I've run some tests and re-examined your logs. I was not able to replicate what you are seeing by just letting it sit idle for several hours. However, I was able to replicate the error you're seeing in the logs by removing the usb0 IP assignment from the netplan yaml on the Pi, applying the new netplan, restarting the Pi, and adding back and re-applying the usb0 settings. I ran this check to see if a disruption in that interface would cause an issue, and it did. My suspicion is that you may have a faulty USB-C connection. Are you able to test with a new cable and then upload a full copy of your logs? I'd love to capture the logs particularly right before you start getting the failed with retcode -1 errors. |
Just to add to this -- it would be extremely helpful if you could share logs from the robot when this problem begins to happen, so we can try to reproduce the problem here. You've provided a snippet, but having the context and the event would be extremely helpful. If you are uncomfortable uploading the logs here, you can email them to [email protected] and they will make their way to the Create 3 team. |
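To help pin down when the failures begin in a long log, a small script can scan for the error message and convert its embedded timestamp. The following is only a sketch: the pattern is modelled on the single log line quoted in this issue, and the function name `find_write_failures` is made up for illustration. Other firmware versions may format the line differently.

```python
import re
from datetime import datetime, timezone

# Pattern modelled on the one log line quoted in this issue (assumption:
# other firmware versions may lay the line out differently).
FAILURE_RE = re.compile(
    r"create-platform: (?P<epoch>\d+\.\d+) \[\d+\] tev: "
    r"ddsi_udp_conn_write to (?P<dest>\S+) failed with retcode (?P<retcode>-?\d+)"
)


def find_write_failures(lines):
    """Return (utc_time, destination, retcode) for every matching log line."""
    hits = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            # The number after "create-platform:" is treated as a Unix epoch time.
            when = datetime.fromtimestamp(float(match.group("epoch")), tz=timezone.utc)
            hits.append((when, match.group("dest"), int(match.group("retcode"))))
    return hits


# The sample line is copied from the original report.
sample = [
    "Nov 14 13:52:03 iRobot-A8BDD22042AB4DCBBA85072EAED55E2E user.notice "
    "create-platform: 1699969923.059798 [9] tev: ddsi_udp_conn_write to "
    "udp/239.255.0.1:9650 failed with retcode -1",
    "Nov 14 13:52:04 some unrelated line",
]
failures = find_write_failures(sample)
if failures:
    first_time, dest, retcode = failures[0]
    print(f"first failure at {first_time.isoformat()} -> {dest} (retcode {retcode})")
```

Passing the lines of a full log file (for example `open(path)`) instead of `sample` gives the time of the first failure, which indicates which part of the log to look at for the events leading up to it.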
We have multiple robots with this setup, with different USB-C leads, so I don't think we have a poor physical connection. It also happens after a while when the robots aren't moving. Here is a log file from just now; a reboot fixed it: |
Thanks for the update. It looks like these logs don't capture the point at which the connection stopped working, which may be hard to do with your setup since it is hard to predict exactly when it goes down. I've run a series of tests over the last 48 hours and am still unable to replicate the issue you are seeing. The Create 3 robot provided a topic list after 1 hour, 2 hours, and a full 24 hours with no reboot. Based on the information you've provided, there are two differences in our setups. First, I factory reset my Create 3 robot to stop it from attempting to join a Wi-Fi network. Second, I am not using any custom XML profiles for either the Create 3 or the Pi. I'm going to run some tests with your settings now, but while I do, it would be helpful if you could attempt to replicate my setup so we can try to further isolate the root of the issue. Edit: I should note I've been testing using Galactic and 20.04 since it sounded like your issues were persistent across both Humble and Galactic, but let me know if you're seeing more on one distribution vs another. |
Hi again, I ran another overnight test replicating all of the settings you have provided, and I'm still able to receive a topic list after 12-18 hours of inactivity. Since I was only able to replicate the errors you are seeing in the logs by disrupting or altering the usb0 interface, I still suspect this is the cause. If you have tried different cables, then my next thought would be it is some sort of OS issue that is putting that interface to sleep on the Pi after a certain period of time. Can you please provide additional details on how you've configured your Ubuntu image on your Pi? |
We had some more network-over-USB comms failures. I got a log covering the failure and the period after a reboot, when it was working again. Just restarting the application didn't work. We are running the latest Humble firmware (H2.6) and a custom Cyclone RMW config. Thanks, Luke (attached: after full restart working again.txt). Cyclone RMW config:
RPi using netplan cfg
|
I'm wondering if there is a problem with the physical link. How well is your cable strain-relieved? Could it be bouncing around? Is it in good shape? Also, on the Pi, are there any interesting messages in |
The robot wasn't moving when the fault occurred. I will check the logging. |
It has done it again. The robot is sat completely still, but has been on (charging) for a good while (>5 hours). I did a software restart of the Pi at one point a while ago. I get no ROS2 messages, but I can reach the Create3 web UI etc. fine through the same network route (Pi > usb0). Here are the messages in /var/log relating to the usb0 connection:
|
My experience has generally been that any restart of the Pi can disrupt the ROS 2 connection with the Create 3. I think it's a weird middleware/ROS 2 issue. Whenever I restart the Pi, I also reboot the Create 3 after doing so. |
Hi @brianabouchard, thanks, I have just found the same: restarting the Pi reliably generates the issue. Can the network stack on the robot be automatically restarted if it loses usb0 comms? It's always the same error (tev: ddsi_udp_conn_write to udp/239.255.0.1:9650 failed with retcode -1). The boot cycle of the robot is very long and it's quite easy to forget to do this. |
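On the Pi side, one way to at least detect this state automatically is to check periodically whether the robot's topics still show up in `ros2 topic list`. The sketch below is not an iRobot-provided tool. The topic names in `EXPECTED_TOPICS` are an assumption (pick topics your robot actually publishes, including any namespace), and the helper names are made up for illustration.

```python
import subprocess

# Assumed topic names; replace with topics your robot really publishes
# (add the namespace prefix if the robot is namespaced).
EXPECTED_TOPICS = {"/battery_state", "/odom"}


def missing_topics(topic_list_output, expected=EXPECTED_TOPICS):
    """Return the expected topics that do not appear in `ros2 topic list` output."""
    seen = {line.strip() for line in topic_list_output.splitlines() if line.strip()}
    return sorted(set(expected) - seen)


def query_topics(timeout_s=15.0):
    """Run `ros2 topic list` and return its stdout ('' on error or timeout)."""
    try:
        result = subprocess.run(
            ["ros2", "topic", "list"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return ""
    return result.stdout if result.returncode == 0 else ""


def robot_visible():
    """True when every expected topic is currently listed."""
    return not missing_topics(query_topics())
```

A boot-time service or cron job on the Pi could call `robot_visible()` every minute or so and log or raise an alert when it returns `False`. What to do in response (for example rebooting the robot) is left open here.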
I can't speak for the iRobot team and whether or not that would be possible. But as a workaround, it should be possible to set up a service that runs on boot and executes either |
thanks @brianabouchard will give that a go |
Hi @lukeopteran, the reason you don't see the topics all of the time is due to discovery issues and the inability of the Create 3 to parse these in time in a DDS network. We have made a video explaining why this issue happens, and how it can be solved by simply pulling and running a docker container. Although it's made for the TurtleBot 4, it applies to any Create 3 with a computing unit attached. The basic idea is separating the
I hope it helps!
https://www.youtube.com/watch?v=xmK2I0D5sas
Check out our docker image here
1. Requirements
Raspberry Pi configuration (use
Create 3 configuration (Access web server in a browser:
2.
|
We frequently have to reboot the robot (best via the UI) to discover the ROS2 topics. They are visible initially but then disappear after an hour or two.
I don't know if it is related to this message in the log files:
Nov 14 13:52:03 iRobot-A8BDD22042AB4DCBBA85072EAED55E2E user.notice create-platform: 1699969923.059798 [9] tev: ddsi_udp_conn_write to udp/239.255.0.1:9650 failed with retcode -1
Robot setup
Firmware: G5.2
Host: Pi 4 running Ubuntu 20.04, connected on usb0, ROS Galactic, running chrony as an NTP server