gbeuzeboc
on 21 April 2023
Welcome to Part 4 of our “optimise your ROS snap” blog series. Make sure to read Part 3 first. This fourth part explains what dynamic library caching is. We will show how to use it to optimise ROS snaps and point out the pitfalls to watch for. Finally, we will apply it to our Gazebo snap and measure the performance impact.
Snaps are immutable. This means that every time we launch a snap, it executes the exact same instructions and strategies. A Linux system, by contrast, is meant to evolve over time, so it relies on mechanisms that support this evolution and modularity. While such mechanisms bring reliability to a system, they can also slow down our processes during launch.
Dynamic library caching with ld-cache
Here we are addressing a more advanced optimisation topic. Dynamic library caching for snaps has been discussed and explored in the forum. We are going to summarise what it is, apply it to our ROS snap and measure the results.
When our program loads a dynamic library, it must find it first. The first place the loader looks is rpath: a list of library directories stored directly in the binary at build time. If the library is not found via rpath, the loader searches each of the directories listed in the LD_LIBRARY_PATH environment variable. In the case at hand, Gazebo uses 148 libraries, and each of them potentially has to be searched for in 17 different paths on LD_LIBRARY_PATH. The third mechanism is runpath, which is similar to rpath but is searched after LD_LIBRARY_PATH. The last mechanism is the cache file located at /etc/ld.so.cache. The cache is essentially a lookup table of dynamic library filenames and their known locations.
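To get a feel for the cost of the LD_LIBRARY_PATH search, here is a minimal sketch that counts the distinct directories in a search path and multiplies by the number of libraries. The function names and the sample path are ours, and the result is only a worst-case bound that assumes every library is probed in every directory.

```sh
#!/bin/sh
# Count the distinct, non-empty directories in a colon-separated search path.
count_unique_dirs() {
    printf '%s\n' "$1" | tr ':' '\n' | awk 'NF && !seen[$0]++' | wc -l | tr -d ' '
}

# Worst case: every library is probed in every directory before being found.
worst_case_probes() {
    libs=$1
    dirs=$(count_unique_dirs "$2")
    echo $((libs * dirs))
}

# Shortened, illustrative path: the duplicate entry is counted once.
SAMPLE_PATH="/snap/gazebo/current/lib:/var/lib/snapd/lib/gl:/snap/gazebo/current/lib"
worst_case_probes 148 "$SAMPLE_PATH"   # prints 296
```

With the figures quoted above (148 libraries, 17 paths), the same bound is 148 × 17 = 2516 file lookups per launch.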
rpath can be modified, but the binary must be writable, which is not the case in a snap. rpath is also considered deprecated in favour of runpath. runpath similarly cannot be modified because our binaries are non-writable. The idea, then, is to fill the cache with every library available in our snap and to set LD_LIBRARY_PATH to an empty string. This way, we avoid wasting time searching the library paths at every launch.
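One way to observe how much searching actually happens is the glibc loader's LD_DEBUG=libs trace, which logs a "trying file=" line for each candidate it probes. The helper below is a small sketch of ours that counts those lines. The sample trace is illustrative, not captured from the Gazebo snap.

```sh
#!/bin/sh
# Count how many candidate files the loader tried.
# Expects LD_DEBUG=libs output on stdin.
count_probes() {
    grep -c 'trying file=' || true
}

# Illustrative trace for a single library lookup.
sample_trace='      4242: find library=libc.so.6 [0]; searching
      4242:  search path=/opt/demo/lib (LD_LIBRARY_PATH)
      4242:   trying file=/opt/demo/lib/libc.so.6
      4242:  search cache=/etc/ld.so.cache
      4242:   trying file=/lib/x86_64-linux-gnu/libc.so.6'
printf '%s\n' "$sample_trace" | count_probes   # prints 2
```

On a real system, the trace goes to stderr, so something like `LD_DEBUG=libs ls 2>&1 >/dev/null | count_probes` should give the number of probes for a given command.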
Optimisation
The first problem is that we cannot modify /etc/ld.so.cache in our snap, since our snap is immutable. To overcome this, we will use a layout to bind /etc/ld.so.cache to a file under $SNAP_DATA, a directory writable by our snap.
We declare the layout in our snapcraft.yaml as follows:
layout:
  /etc/ld.so.cache:
    bind-file: $SNAP_DATA/etc/ld.so.cache
Now we need two scripts: one to build our cache, and another to check that the cache is valid. We will store both scripts inside the snap/local directory.
The build-cache.sh script is the following:
#!/bin/bash -e
# Since this will be called by a hook, this script won’t have our application LD_LIBRARY_PATH
LD_LIBRARY_PATH="/snap/gazebo/current/opt/ros/snap/lib:/snap/gazebo/current/opt/ros/foxy/opt/yaml_cpp_vendor/lib:/snap/gazebo/current/opt/ros/foxy/lib/x86_64-linux-gnu:/snap/gazebo/current/opt/ros/foxy/lib:/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void:/snap/gazebo/current/lib:/snap/gazebo/current/usr/lib:/snap/gazebo/current/lib/x86_64-linux-gnu:/snap/gazebo/current/usr/lib/x86_64-linux-gnu:/snap/gazebo/current/kf5/lib/x86_64-linux-gnu:/snap/gazebo/current/kf5/usr/lib/x86_64-linux-gnu:/snap/gazebo/current/kf5/usr/lib:/snap/gazebo/current/kf5/lib:/snap/gazebo/current/kf5/usr/lib/x86_64-linux-gnu/dri:/var/lib/snapd/lib/gl:/snap/gazebo/current/kf5/usr/lib/x86_64-linux-gnu/pulseaudio"
# run ldconfig on our LD_LIBRARY_PATH lib dirs
IFS=':' read -ra PATHS <<< "$LD_LIBRARY_PATH"
mkdir -p "$SNAP_DATA/etc"
ldconfig -v -X -C "$SNAP_USER_DATA/snap-ld.so.cache" -f "$SNAP_DATA/etc/ld.so.conf" "${PATHS[@]}"
# replace the generated ld.so.cache with the one pointed by the bind
cat "$SNAP_USER_DATA/snap-ld.so.cache" > "$SNAP_DATA/etc/ld.so.cache"
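The hard-coded path list contains a duplicate entry, and some directories may be absent on a given machine. As a hedged sketch, a helper such as the hypothetical existing_unique_dirs below (not part of the snap) could filter the list before it is handed to ldconfig. It assumes directory names contain no newlines.

```sh
#!/bin/sh
# Print each directory of a colon-separated path once, skipping entries
# that do not exist on this system (hypothetical helper, not from the snap).
existing_unique_dirs() {
    printf '%s\n' "$1" | tr ':' '\n' | awk 'NF && !seen[$0]++' |
    while IFS= read -r dir; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example: "/" is listed twice and the last entry is assumed not to exist.
existing_unique_dirs "/:/:/nonexistent-demo-dir"
```

The filtered list could then replace the raw PATHS array in build-cache.sh. Whether that is worthwhile depends on how often the content snap layout changes.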
The check-cache.sh script makes sure that all our dependencies are properly found. The cache is built at the install and update steps (via hooks), but because the content sharing snap contains a lot of libraries, something could break. If something is indeed broken, we simply launch Gazebo using the old method. Gazebo is based on a plugin system, hence most of the libraries loaded at runtime are not linked at build time. This means that the plugins are unknown at build time and are searched for on the fly at every run. Additionally, everything in Gazebo is launched from a Ruby script that selects which library to load depending on the given command. For that reason, we decided to check the file $SNAP/opt/ros/snap/lib/libignition-gazebo3-gui.so, since it’s the most likely to change due to the Qt content sharing snap.
The check-cache.sh script is the following:
#!/bin/sh -e
# save the original LD_LIBRARY_PATH, and unset it to check the cache
ORIGINAL_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH=""
BINARY_TO_TEST="$SNAP/opt/ros/snap/lib/libignition-gazebo3-gui.so"
if [ -z "$BINARY_TO_TEST" ]; then
  echo "BINARY_TO_TEST unset, can't check the dynamic linker cache for correctness"
else
  # ldd prints "=> not found" for every dependency it cannot resolve
  if ldd "$BINARY_TO_TEST" | grep -q "=> not found"; then
    # We cannot regenerate the cache because we must be root.
    # So we use the LD_LIBRARY_PATH until the next hook is triggered
    export LD_LIBRARY_PATH="$ORIGINAL_LD_LIBRARY_PATH"
  fi
fi
# execute the next command in the chain
exec "$@"
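When the check fails, it helps to know which libraries are missing from the cache. The sketch below is our own helper, not part of the snap. It extracts the unresolved names from ldd output, and the sample output is made up for illustration.

```sh
#!/bin/sh
# Print the names of libraries that ldd reports as unresolved.
# Expects ldd output on stdin.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Illustrative ldd output with one unresolved dependency.
sample_ldd='	linux-vdso.so.1 (0x00007ffc0b5f2000)
	libQt5Core.so.5 => not found
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2a1c000000)'
printf '%s\n' "$sample_ldd" | missing_libs   # prints libQt5Core.so.5
```

In the script above, this could be run as `ldd "$BINARY_TO_TEST" | missing_libs` to log the names before falling back to the original LD_LIBRARY_PATH.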
As mentioned, /etc/ld.so.cache can only be modified by root. Here we decided simply not to use dynamic library caching when the ld.so.cache is incomplete. We could certainly consider adding an app entry that runs as root to regenerate the cache.
The build-cache script is called at install or update, and the check-cache script runs in the command-chain of the snap application. Thus, we add an ld-cache part to our snapcraft.yaml for the hooks and add the corresponding command-chain to our Gazebo app.
ld-cache:
  after: [kde-neon-extension]
  plugin: nil
  source: snap/local
  override-build: |
    KDE_CONTENT_SNAP=$(echo $SNAPCRAFT_CMAKE_ARGS | sed -n 's/.*\/snap\/\(.*\)-sdk.*/\1/p')
    mkdir $SNAPCRAFT_PART_INSTALL/bin
    cp *.sh $SNAPCRAFT_PART_INSTALL/bin
    mkdir -p $SNAPCRAFT_PART_INSTALL/snap/hooks
    # post refresh hook triggered at install and update
    ln -s ../../bin/build-cache.sh $SNAPCRAFT_PART_INSTALL/snap/hooks/post-refresh
    ln -s ../../bin/build-cache.sh $SNAPCRAFT_PART_INSTALL/snap/hooks/connect-plug-$KDE_CONTENT_SNAP
    ln -s ../../bin/build-cache.sh $SNAPCRAFT_PART_INSTALL/snap/hooks/disconnect-plug-$KDE_CONTENT_SNAP
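The sed expression in override-build is easy to misread, so here it is in isolation. It keeps whatever sits between /snap/ and -sdk in the CMake arguments. The sample argument string is a hypothetical value of ours, since the real one is provided by the kde-neon extension at build time.

```sh
#!/bin/sh
# Extract the content snap name from a CMake argument string,
# using the same sed expression as the ld-cache part.
content_snap_name() {
    printf '%s\n' "$1" | sed -n 's/.*\/snap\/\(.*\)-sdk.*/\1/p'
}

# Hypothetical value, for illustration only.
SAMPLE_ARGS="-DCMAKE_FIND_ROOT_PATH=/snap/kde-frameworks-5-99-qt-5-15-7-core20-sdk/current"
content_snap_name "$SAMPLE_ARGS"   # prints kde-frameworks-5-99-qt-5-15-7-core20
```

If the arguments contain no /snap/…-sdk segment, the function prints nothing, and the connect and disconnect hook names would end up with an empty suffix. A build script may want to guard against that.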
And for the command chain:
  gz:
+   command-chain: [bin/check-cache.sh]
    command: usr/bin/ruby $SNAP/opt/ros/snap/bin/ign
    plugs: [network, network-bind, home]
    extensions: [kde-neon, ros2-foxy]
Rebuilding the snap and launching it, we stumble upon an unexpected error:
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError' [gazebo.gz-1] what(): failed to initialize rcl init options: failed to find shared library 'rmw_fastrtps_cpp', at /tmp/binarydeb/ros-foxy-rmw-implementation-1.0.3/src/functions.cpp:73, at /tmp/binarydeb/ros-foxy-rcl-1.1.14/src/rcl/init_
The problem is that, for now, rcl-init from ROS 2 only looks in LD_LIBRARY_PATH for libraries and does not follow the standard library loading mechanism. The issue has been reported and is still open.
This means that Gazebo itself is fine, but ROS 2 libraries still need an LD_LIBRARY_PATH pointing to /snap/gazebo/current/opt/ros/foxy/lib, which will surely reduce the benefit of our optimisation.
We then have to modify our check-cache.sh script:
- export LD_LIBRARY_PATH=""
+ export LD_LIBRARY_PATH="/snap/gazebo/current/opt/ros/foxy/lib"
Results
Now we can build our snap and run it.
If we want to look at the content of our generated ld.so.cache, we can print it by pointing ldconfig at the cache file with -C:
ldconfig -C /var/snap/gazebo/current/etc/ld.so.cache -p
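The printed cache can be long, so a small filter helps when checking a single library. The sketch below is our own helper. It reads a cache listing on stdin and prints where a given library resolves. The sample entries are illustrative.

```sh
#!/bin/sh
# Print the resolved path of a library from an `ldconfig -p` style listing
# read on stdin. The exit status is non-zero when the library is not listed.
cache_lookup() {
    awk -v lib="$1" '$1 == lib { print $NF; found = 1; exit } END { exit !found }'
}

# Illustrative cache entries in the format printed by ldconfig.
sample_cache='	libignition-gazebo3.so.3 (libc6,x86-64) => /snap/gazebo/current/opt/ros/snap/lib/libignition-gazebo3.so.3
	libc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6'
printf '%s\n' "$sample_cache" | cache_lookup libc.so.6   # prints /lib/x86_64-linux-gnu/libc.so.6
```

Piping the real cache listing into cache_lookup gives a quick yes or no answer for a specific library.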
The results of this optimisation are the following:
Gazebo snap | Cold start (s) | Hot start (s) | RTF | .snap size | Installed snap size |
---|---|---|---|---|---|
Release | 6.06 | 2.72 | 4.39 | 232 M | 758 M |
Cleanup content sharing duplicates | 6.03 | 2.76 | 4.29 | 119 M | 427 M |
Ld-cache | 6.07 | 2.74 | 3.96 | 119 M | 427 M |
Ld-cache with empty LD_LIBRARY_PATH | 6.03 | 2.39 | NA | 119 M | 427 M |
Due to the rcl-init limitation, we see no benefit from dynamic library caching. When we launch Gazebo with an empty LD_LIBRARY_PATH (it crashes), we see a small hot start improvement of about 350 ms (2.74 s down to 2.39 s), but we cannot really know whether it’s due to the library caching or simply because not everything could be loaded.
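For reference, the hot start difference can be recomputed from the table values. The helper below is a trivial sketch of ours using awk for the floating point arithmetic.

```sh
#!/bin/sh
# Difference between two timings given in seconds, printed in milliseconds.
delta_ms() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (a - b) * 1000 }'
}

# Hot start: ld-cache (2.74 s) versus ld-cache with empty LD_LIBRARY_PATH (2.39 s).
delta_ms 2.74 2.39   # prints 350
```

The same calculation on the cold start column (6.07 s versus 6.03 s) gives 40 ms, which is likely within measurement noise.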
Unfortunately, dynamic library caching cannot be recommended for ROS 2 at the moment. This optimisation might still be interesting for other projects, but the cost of maintaining such custom scripts might be too high compared to the small benefit. It won’t be applied to the Gazebo snap.
Conclusion
This optimisation was not one of the simplest we have seen so far. A ROS snap might rely on many different dynamic library mechanisms (dynamic library linking, ROS plugins, Gazebo plugins), making the optimisation tricky. For a more classic C++ application relying only on dynamic library linking, the benefits could be greater. Even if this can’t be applied to our ROS snap, at least we explored an interesting topic about dynamic libraries.
Continue reading Part 5 of this series.