A Dummy’s Guide to Debugging ROS Systems.

The BLUEtongue 2.0 Rover being debugged during the 2016 ERC, with Team Members Assisting. (LTR): Timothy Chin, Sebastian Holzapfel, Simon Ireland, Nuno Das Neves, Harry J.E Day.

So you’ve written the ultimate ROS program: after thousands of lines of code your robot will finally achieve sentience and bring about the singularity!

One by one you launch your nodes. Each one bringing the Apocalypse ever closer. You hit enter on the last command. And. And, nothing happens. What went wrong? How will you find, and forever squash that bug that prevented your moment of triumph? This blog attempts to answer those questions, and more*.

At BLUEsat we’ve had our share of complicated ROS debugging problems. The best ones happen when you are half-way through a competition task, with time ticking on the clock. Although this article will also look at the more common situation of debugging in a less time pressured, and fire prone environment**.

Below are several tools and techniques that we have successfully deployed to debug our ROS environment.
Keep Calm And … FIRE!
You’ve probably heard this before but its very important when debugging to not jump to conclusions or apply fixes you haven’t tested properly. Google for example has a policy of rolling back changes on its services rather than trying to push a fix. A similar idea applies in a competition or time pressured situation: make sure you have thought through that patch that removes the “don’t kill humans” safety from your robot!  That being said, unfortunately a roll back is unlikely to be applicable in a competition situation, nor is it likely to be able to put out that fire you just started on your robot. So we can’t just copy Google, but we should think about what we are doing before we do it.

Basically any patches or configuration fixes you apply during such a situation is a calculated risk, and you should make sure you understand those risks before you do something. During the European Rover Challenge last year I found it was possible to tweak small settings, restart certain nodes, and re-calibrate systems; but it was too risky to power cycle the rover during a task due to the time it took to establish communication. Likewise restarting our drive systems or cameras was very disruptive to our pilot, so could only be done in certain situations where the damage done by not fixing the system could be worse. That being said, after a critical camera failed we did attempt to programmatically power cycle that device – the decision being that the camera was important enough to attempt such a risky move. (In the end we weren’t able to do this during the task, and our pilot managed to navigate the rover without the camera in question.)

In a non time pressured situation you can be more flexible. It is possible to test different options and see if they work. That is provided they don’t damage your robot. However a structured approach is often beneficial for those more complicated bugs. I often find that when I’m debugging an intermittent or difficult to detect problem that it is easy to end up lose track of what I’ve tried, or get results mixed up. A technique I’ve found to be very useful was to record what I was doing as I did it, especially if the problem includes sensor data. We had a number of problems with our Rover’s steering system when we first implemented our swerve drive and I found writing down ADC values and rotation readings in different situations really helped debug it (You can read more about how we use ADC’s in our steering system in one of our previous articles).

Basically the main point is too keep your head clear, and think through the consequences before you act. Know the risks and have your E-Stop button ready! Now lets look at some tools you can use to aid you in your debugging.

Debugging is often a team effort.…