Generate multiple images at the same time in Yocto

A common task when using Yocto is to create multiple Linux images for a single target’s load. A very simple example is an emergency partition for fallback.

The easiest way is to go to your image recipe and add:

# my-image.bb
do_rootfs[depends] += "my-other-image:do_image_complete"

In plain English, this means the do_rootfs task now depends on the my-other-image recipe’s do_image_complete task. All images that inherit image.bbclass have the do_rootfs and do_image_complete tasks, so it is safe to depend on them.

Now every time you build your main image you also build “my-other-image”.

PS: A kind of top-level “meta” image recipe could be created that installs nothing and just depends on the other images, so that IMAGE_CMD or wic can take those images and generate a target-readable blob. Maybe one day I will post an example of such a recipe.
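
In the meantime, a minimal sketch of what such a recipe could look like (the recipe and image names are hypothetical, and the wic part is only hinted at):

# meta-image.bb (hypothetical) -- installs nothing, only aggregates other images
SUMMARY = "Top-level image that only pulls in other images"
LICENSE = "MIT"

inherit image

# keep this rootfs empty (depending on your package backend this may need tuning)
IMAGE_INSTALL = ""
IMAGE_FEATURES = ""
IMAGE_LINGUAS = ""

# make sure the real images are built first
do_rootfs[depends] += "my-image:do_image_complete my-other-image:do_image_complete"

# a wks file could then assemble the deployed images into one flashable blob
#IMAGE_FSTYPES = "wic"
#WKS_FILE = "my-target.wks"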

PPS: If you are thinking of fallback, do not reinvent the wheel and have a look at RAUC.

Debugging a coredump with gdb-multiarch

This post is a small reminder on how to use gdb-multiarch on my Ubuntu machine to debug coredumps from other arches/machines. I need to write it down because the information I found on the internet does not always lead to the whole stack trace being available, even when all the symbols and the external sysroot are available.

One of the suggestions is to use:

$ gdb-multiarch -e <cross-executable> -c <core>

[...]License[...]
[New LWP 5246]

warning: `/lib/libgcc_s.so.1': Shared library architecture i386:x86-64 is not compatible with target architecture armv7.

warning: .dynamic section for "/lib/libgcc_s.so.1" is not at the expected address (wrong library or version mismatch?)

warning: Could not load shared library symbols for 11 libraries, e.g. /lib/libpthread.so.0.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `/usr/bin/fluent-bit --config=/etc/fluent-bit/fluent-bit.conf'.
Program terminated with signal SIGABRT, Aborted.
#0  0xac9d10a6 in ?? ()

As can be seen, it is pretty useless and gdb-multiarch gets confused. Unfortunately, without running set sysroot <path to sysroot>, gdb will be unable to get a whole stack trace. But it gets worse. Even if you set the sysroot properly, for some reason it will not reload the debug symbols for the cross-executable. Instead it will only load them for the shared libraries, which is often quite useless and leaves the backtrace looking like the following:

# continuation of the previous session
(gdb) set sysroot /tmp/tmp/my-root/
Reading symbols from /tmp/tmp/my-root/lib/libpthread.so.0...
Reading symbols from /tmp/tmp/my-root/lib/.debug/libpthread-2.31.so...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /tmp/tmp/my-root/lib/libm.so.6...
Reading symbols from /tmp/tmp/my-root/lib/.debug/libm-2.31.so...
[...]
(gdb) bt
#0  __libc_do_syscall () at libc-do-syscall.S:49
#1  0xac9df676 in __libc_signal_restore_set (set=0xbeea6d78) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
#2  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xac9d0be2 in __GI_abort () at abort.c:79
#4  0x0750a49e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Pretty useless as well, and even misleading.

The correct way to do it is:

$ gdb-multiarch <cross-executable>
(gdb) set sysroot <path to rootfs+debugsymbols>
(gdb) core <path to core>
[...]proper backtrace shown[...]

Only then will the whole stack trace be available.
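
The same order can also be reproduced non-interactively, which is handy in scripts. A sketch, with the paths left as placeholders:

$ gdb-multiarch --batch \
    -ex "set sysroot <path to rootfs+debugsymbols>" \
    -ex "core-file <path to core>" \
    -ex "bt" \
    <cross-executable>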

PS: By default, gdb will often only show the stack trace of the thread that crashed, but there might be more threads worth inspecting. To show the stack trace of all threads, just do the following in gdb:

(gdb) thread apply all bt

rauc requiring an absolute keyring path

This is a post to remind myself why rauc extract fails with a strange error message: “Not enough substeps: check_bundle”, as also described in this github issue. The error is really not useful, and it comes up if the --keyring=<path> is not an absolute path. Even ~/<path> will trigger that error message. As the cherry on top, it core dumps on exit.
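
A hypothetical invocation illustrating the difference (the bundle, keyring, and output paths are made up):

# fails with "Not enough substeps: check_bundle" on rauc 1.2
$ rauc extract --keyring=keys/keyring.pem bundle.raucb extracted/

# works: the keyring is given as an absolute path
$ rauc extract --keyring=/home/user/keys/keyring.pem bundle.raucb extracted/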

This issue has since been fixed, but Ubuntu 20.04 ships rauc 1.2, which is still affected.

Development and Production gaps in Yocto

(The original article had typos because Grammarly did not work correctly and disabled my browser’s dictionary. My apologies.)

Yocto is great for building customized Linux distributions from source code. It is a very complex tool and needs to build a huge variety of software projects from source. It often happens that a Linux distribution image needs to be configured differently when building a developer version vs a production one. Often the production version is built in an automated way through a CI (continuous integration) server. This workflow gap between production and development has been a major driver of developer dissatisfaction. I am going to share my insights into the parts and drivers of this dissatisfaction, but also provide some hints on what works, from my experience.

The usual scenario triggering conflicts is: “It works on my machine but the verification step in the CI fails”. Such a scenario can have multiple origins:

  • Build time:
    • The developer is building with different configurations than the CI
    • A given recipe produces different package contents every time it is built
  • Run time:
    • A program has inherently non-deterministic behavior, even with the exact same binary. A garden-variety bug
    • The verification step is broken

Divergent configurations at build time

This situation is mostly unavoidable in the real world. The most common reasons for different configurations are:

  • Debug symbols
  • Debug tools
  • Builds with debug flags enabled
  • Secrets or security only enabled in production

Debug symbols

The presence of debug symbols is mostly benign and rarely leads to technical issues. Even so, it can result in:

  • Intellectual property leakage. This is not relevant from the point of view of product operation but can have devastating legal and business impacts
  • Overly large images. Some embedded devices do not have sufficient space, and insufficient space can manifest as boot failures which are hard to explain and very hard to debug, due to bootloader or flash physical constraints.
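
For reference, a minimal local.conf sketch of the knobs commonly used to keep debug symbols in developer builds only (whether this fits your setup is an assumption on my part):

# local.conf -- developer builds only
# pull the -dbg packages (debug symbols) into the image itself
EXTRA_IMAGE_FEATURES += "dbg-pkgs"

# or keep the image lean and generate a companion debug filesystem instead
IMAGE_GEN_DEBUGFS = "1"
IMAGE_FSTYPES_DEBUGFS = "tar.gz"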

Debug tools

Images with debug tools are also pretty common. Unless the debug tools are part of a critical function, they also very rarely lead to verification issues. They deserve consideration though, as they can affect Key Performance Indicators (KPIs) of development, deployment, and the physical device’s unit cost.

The main issues associated with including debug tools (or not) can be summarized as:

  • Excessively long build times. The more tools you put in your image, the more recipes and associated packages need to be built and installed. Iteration speed can slow to a crawl
  • Very hard real-world diagnostics due to lack of tools/APIs. This is especially true in embedded devices due to their one-purpose nature enforced by cryptographically verified read-only flash. This means that a developer needs to completely regenerate a new image with the required diagnostic tools
  • Increased security risks due to bigger system tampering surface
  • Increased costs due to redundant storage. In volume, extra storage can lead to massive expenses
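
In practice, the developer-only tooling is often pulled in with a couple of image features; a minimal local.conf sketch (choosing these particular features is an assumption):

# local.conf -- developer builds only
# debugging tools (gdbserver, strace), profiling tools, and convenience tweaks such as an empty root password
EXTRA_IMAGE_FEATURES += "tools-debug tools-profile debug-tweaks"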

Builds with debug flags enabled

Debug build flags can significantly impact system and application behavior, resulting in side effects like the stereotypical “but it works on my machine” (a Yocto-level example follows the list below):

  • Increased logging and diagnostics can change concurrent execution dynamics, often masking issues in concurrent code that are only visible in production
  • Release functionality depending on diagnostic flags, which in turn brings unvetted and untested states. Such unvetted situations can disable or bypass security features
  • Different optimization levels exposing or hiding latent bugs, resulting in a verification vs developer mismatch
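
On the Yocto side, the most common global switch for this is DEBUG_BUILD; a minimal local.conf sketch of a developer-only setting:

# local.conf -- developer builds only
# replaces the release optimization flags with debug-friendly ones (-Og -g ...)
DEBUG_BUILD = "1"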

This kind of mismatch is the most damaging from the organizational point of view. In the end, software is made by people, and the feeling that testing is unreliable may lead to the following negative outcomes:

  • QA team vs Dev vs Customer support blame games, leading to increased bug-fix turnaround
  • Decreased morale in the developer team due to the belief that the system is rigged against their productivity
  • Decreased confidence that the QA team is doing its job correctly and that quality checks are meaningful

Secrets only known in production

Generally, the mismatch of secrets between developer and release builds does not result in issues, although I had the misfortune of a situation where it impacted me: in development, it is normal for security systems to be disabled. Unfortunately, in production, the addition of an actual key to some bootloader security internals led to a buffer overflow that was very luckily detected in time. My changes were in a completely unrelated part, so I did not even know I would trigger it.

Divergent behavior at run time

The following is not necessarily Yocto-specific and is valid for software engineering in general.

Normal bugs visible only at release

Often it is hard to know if a given behavior seen for the first time in a production release is due to the different configurations used for release, or if it is just bad luck that it became evident in the release. These are the lessons I learned:

  • A bug’s manifestation is not the end. Fixing the bug is.
  • The identification of the bug often starts with analyzing the manifestation which can be made difficult by the lack of knowledge of what triggers the bug. This is where development/release mismatch adds another layer of uncertainty and effort.

Incorrect verification/test steps

This one highlights the old “who tests the tests?” question. My experience is that no one does, unless:

  • They fail to catch a bug
  • They fail when they should succeed
  • They fail in the release verification flow and succeed in the developer’s flow, or the other way around.

I am only going to write about the last point, where a flow mismatch leads to different test and verification outcomes.

There should be a single command that does all the setup, building, and testing, and that owns the solutions to all mismatches between the developer and release/CI flows (a sketch follows the list below). There are two reasons for this:

  • A single command means there is a technically enforced agreement and connection between release owners and developers.
  • There is only one way to run verification.
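
A minimal sketch of what such a single entry point could look like, assuming hypothetical helper scripts for each stage:

#!/bin/sh
# verify.sh -- the one command both developers and the CI run (all script names are hypothetical)
set -e

./scripts/setup-environment.sh   # fetch layers and prepare the build directory
./scripts/build-images.sh        # bitbake the same images the release uses
./scripts/run-tests.sh "$@"      # run the same verification suite as the CI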

In the real world, it happens that the whole verification flow would be too time-consuming for developers, but this fact already provides insights and avenues for improvement:

  • The single verification command logic could be smart enough to detect what tests are required. For example, there is no need to build any software if only the testing repository changed
  • Logical steps of the whole verification flow should be visible on the CI dashboard as well as callable locally.
  • A tool where the whole organization can give feedback:
    • The release team giving feedback on verification gaps or reliability
    • The developer team giving feedback on improved modularization

Final notes

This topic became quite a bit bigger and broader than initially intended, so if something is confusing or you have suggestions, let me know, as I would like to improve it.

Also, my experience is mostly in embedded devices, but I have a feeling that most of it is relevant to other areas of software engineering. It was also not easy to write, as it collects many disconnected experiences I have had over my career and projects.

u-boot exposing mass storage to usb interface

As previously stated, this blog is a place for me to record some notes on things that I learn in my day-to-day work.

One of the things I learned, and wished I had known earlier when doing embedded development, was how to easily flash certain files onto an embedded device’s storage card without needing to reflash the whole image.

It turns out to be very simple if your embedded board has USB support in u-boot. Running the following command in the u-boot command line:

ums 0 mmc 1

exposes MMC device 1 over USB gadget controller 0. See https://u-boot.readthedocs.io/en/latest/usage/ums.html for more details.

When successful, your development computer should detect a new mass storage device.
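
From there, copying individual files is just a matter of mounting the exposed device on the host. A sketch, with the device node and file names being hypothetical:

# on the development host
$ lsblk                          # the board shows up as a new block device, e.g. /dev/sdX
$ sudo mount /dev/sdX1 /mnt
$ sudo cp my-app /mnt/usr/bin/   # update only the files you need
$ sudo umount /mnt               # ensure everything is flushed before resetting the board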