Validating global systemd defaults through properties

On the topic of Infinite loops on crashing systemd units, I got to the point where I needed an automatic way to verify that my systems had the correct systemd unit defaults. Let’s assume we have the following configuration:

$ cat /etc/systemd/system.conf
[manager]
DefaultRestartSec=1min
$

It proved to not be straightforward because the documentation at the time was wrong. Quoting it:

For the manager itself, systemctl show will show all available properties. Those properties are documented in systemd-system.conf(5).

https://www.freedesktop.org/software/systemd/man/systemctl.html (2021/11/14)

This meant that in the case of DefaultRestartSec configured above and defined in systemd-system.conf(5), we could get the direct values with the following command:

$ systemctl show -p DefaultRestartSec
$

It returns nothing. Unfortunately, things are not as the manual stated. It also returns exit code 0, indicating there was no error. Luckily when using the above command in a test we would use its output to compare to our desired value and the test would fail.

We established the manual is wrong at the time of writing, but we still have the goal of testing that our system conforms to our mission’s defaults. Let’s check if there are systemd properties that are similar or derived from DefaultRestartSec:

$ systemctl show | grep DefaultRestart
DefaultRestartUSec=60sec
$

Notice the equivalent property! The only difference is that the property’s suffix is USec and not Sec. Also, notice that the suffixed scale on the name of the property or option may be inconsistent with the scale that is set or displayed. While not intellectually friendly, the reason for the different scales in the names has to do with historical and code reasons.

Properties are internally represented as dbus properties and this sets limits on the data types available as well as on backward compatibility. Properties are part of a public API so updating their names or synching their names to the systemd-system.conf options names is not really possible. I tried to inspect systemctl-show.c for a way to match properties to options but that would need hard coding of such mapping making it a maintenance issue.

Options on the other hand have user-facing characteristics so the scale suffix is naturally human-friendly. With that said, given that a system architect can set options in a variety of scales that are automatically converted to internal representations, it is baffling to me why at least time-related configuration options do not match.

With all that said, I submitted an accepted clarification to systemctl‘s manual which summarizes what an architect needs to know when trying to map between systemd‘s configuration options and properties:

systemctl show will show all available properties, most of which are derived or closely match the options described in systemd-system.conf

https://github.com/systemd/systemd/blob/10b1c3cd24f5f95e6e72caebdd6896e2eaf8b853/man/systemctl.xml#L1664

Taking the example of the configuration options mentioned in this post, DefaultRestartSec, DefaultStartLimitIntervalSec and DefaultStartLimitBurst:

DefaultRestartSecDefaultRestartUSec
DefaultStartLimitIntervalSecDefaultStartLimitIntervalUSec
DefaultStartLimitBurstDefaultStartLimitBurst

Accessing a process command line without ps

I work on embedded boards quite a lot, and often the ps command utility is a bare bones busybox stripped down version without options. One of the issues I faced while using it is that it truncates the process command line, and options that would allow it to display an untruncated output are not available.

I remembered that every active process has it’s entry in /proc/<PID>/cmdline and i tried to use it directly. Unfortunately the output of /proc/<PID>/cmdline contains the individual arguments separated by null characters(\000). The \000 character is printed as a @ in my terminal which is unusable to copy/paste. I also did not have a hexadecimal dumper to know what exactly was the character. Fortunately stackoverflow came to the rescue:

cat /proc/$PID/cmdline | tr '\000' ' ' # The solution i would come to
cat /proc/$PID/cmdline | xargs -0 echo # xargs showing it's versatility.

Infinite loops on crashing systemd units.

When using systemd it is often wise to set mission specific “global defaults”. In this article we will look at the case of “how often a unit can fail in a given amount of time” and it’s impacts.

systemd experts will create their units with properties set to the correct values for their specific units, but often these properties are not obvious or are forgotten by the unit’s creators. systemd comes with defaults as explained in systemd-system.conf manual but that does not mean they should not be evaluated for suitability to the system’s constrains and mission.

Let’s approach the requirement where we do not want a crashing service, respawning infinitely fast and thus compromising the whole system. This requirement is valid for any type of computing, be it embedded, server or even a desktop.

systemd gives us a quite a few options to achieve that requirement but let’s focus on DefaultRestartSec, DefaultStartLimitIntervalSec and DefaultStartLimitBurst. These system wide properties can be overridden in individual units, just in the unit files, the Default prefix is not used. From the manual:

DefaultStartLimitIntervalSec=, DefaultStartLimitBurst=

Configure the default unit start rate limiting, as configured per-service by StartLimitIntervalSec= and StartLimitBurst=. See systemd.service(5) for details on the per-service settings. DefaultStartLimitIntervalSec= defaults to 10s. DefaultStartLimitBurst= defaults to 5.

This means that the unit will start at most DefaultStartLimitBurstTimes times per DefaultStartLimitIntervalSec seconds. If a unit start is unsuccessful and matches the criteria described, then the unit will be set as failed and the unit will not start anymore over the defined period. Failed is not disabled. The DefaultRestartSec states how fast to restart the service inside the period defined by DefaultStartLimitIntervalSec.

With good values one can make sure the system will never be starved of resources due to an infinite crash loop. The “factory” default that comes from the systemd code is, 5(times)/10(seconds). This default by itself maybe fine, but i personally find the “factory” DefaultRestartSec=100ms” not very reasonable. In a worst case scenario, this means the unit will fail 5 times in half a second, with a “resting” period of 10 seconds.

Worst case scenario timeline for systemd restart factory defaults. Each letter F is a failure that takes 0.1s. Every 10(DefaultStartLimitIntervalSec) seconds the failure pattern illustrated can repeat itself.

If the unit’s start has side effects on the system, the system can become unusable. From a security point of view this can lead to a denial of service or infrastructure damage. Damage scenarios are:

  • A unit start side effect taking more than 2 seconds to recover from, leaving the system in a death spiral.
  • IO exhaustion, disk space exhaustion or extra infrastructure charges

To sum up, system architects should have a look at the worst-case side effect cleanup/handling of the units running. With this data, define good defaults that allow the system to continue operating degraded even when an infinite crash loop exists. Degraded here means that the service is still operating or has capacity to execute remedying measures.

u-boot exposing mass storage to usb interface

As previously stated, this blog is a place for me to record some notes on things that i learn on my day to day.

One of the things I learned and whished I knew when doing embedded development was how to easily flash certain files in an embedded device’s storage card, without needing to do whole image flashing.

It turns out to be very simple if your embedded board has usb support in u-boot. Running the following command in the u-boot command line:

ums 0 mmc 1

Leads to exposure of mmc1 to the usb-gadget interface 0. See https://u-boot.readthedocs.io/en/latest/usage/ums.html for more details

When successful, your development computer should detect new mass storage devices.

Forcing gerrit to regenerate a new review

Often in my work I have been asked how to detach a commit from an existing review in gerrit. The usual use-case is that somebody picked up some commit and then re-worked it significantly, making the original review, and review owner not applicable for follow-up.

Most regular git users will be confused because a git commit –amend will still end on the old review. The reason is that gerrit tracks changes not by commit hash but by metadata text called Change-Id. Think of it. If it did not have another mechanism besides the commit hash it would not know how to track a patchset review across rebases or amends

commit 71fbee6a7a41cbf8f59444fc48294e51c7cf9613 (HEAD)
Author: Paulo Neves <ptsneves@gmail.com>
Date:   Mon Sep 6 14:07:14 2021 +0200

    My reworked stuff
    
    Change-Id: Ifdbfff8fbd8ee217b07b7b053c8927ae2a1126f0

Most people just manually change a random character in the Change-Id and this will lead to a unique Change-Id. Personally the way i do, given I often have hooks that automatically add the Change-Id, is to just edit the commit message to delete the line completely. A post-commit hook will just re-generate a new Change-Id line in the commit message and when i push it to gerrit a fresh review will be generated.

sudo in system()

In Linux there are so many permission mechanisms, depending on exactly what you want to do, it dazzles the mind. There is suid, dbus policies, polkit, Linux capabilities, files attributes, PAM modules, SELinux and the list goes on. It does not surprise then, that choosing the correct approach can become paralyzing or give anxiety inducing.

In the end though, most of us will just use sudo and that is fine for fast scripts or manual interventions, but what about using it inside an unprevileged program? Better, while we are being pragmatic why not use good old system C stdlib function and call the application with it and sudo. At first look it might seems a bad idea as per manual:

       Do not use system() from a privileged program (a set-user-ID or
       set-group-ID program, or a program with capabilities) because
       strange values for some environment variables might be used to
       subvert system integrity.  For example, PATH could be manipulated
       so that an arbitrary program is executed with privilege.  Use the
       exec(3) family of functions instead, but not execlp(3) or
       execvp(3) (which also use the PATH environment variable to search
       for an executable).

       system() will not, in fact, work properly from programs with set-
       user-ID or set-group-ID privileges on systems on which /bin/sh is
       bash version 2: as a security measure, bash 2 drops privileges on
       startup.  (Debian uses a different shell, dash(1), which does not
       do this when invoked as sh.)

       Any user input that is employed as part of command should be
       carefully sanitized, to ensure that unexpected shell commands or
       command options are not executed.  Such risks are especially
       grave when using system() from a privileged program.

Summarizing, do not use system() if:

  • Invoking from a program with setuid or capabilities
  • The program is found or uses $PATH
  • Invoking Bash v2
  • User input is not sanitized and allowing for shell injection

Coming back to the use of sudo with system() we might think it a bad combination. In reality it is not so, as the corners are covered by sudo, a truly well thought out tool.

Regarding the setuid constraint, it is hardly going to be an issue as if you need to run sudo it is exactly because your program calling system(“sudo …”) is not privileged. Regardless the effective UID/GID is the same as the real GID.

The caveat regarding relying on $PATH for binary location is also made secure with 2 defenses:

  • Hard-code your system() call with an absolute path, like system(“/bin/echo Abra kadabra”) and allow that specific absolute path on the sudoers file. Use man sudoers for more information. The manual is truly high quality.
  • Delete your /etc/environment, as for example in Ubuntu it contains $PATH. Even though the manual refers to /etc/environment as the source of environment variables, the code shows sudo works as intended if it does not exist at all.

Below is an example for an entry of /etc/sudoers.d/myspecial_permission that illustrates the point made above:

%puny_user ALL=NOPASSWD: /sbin/ifconfig wlan1 up

In a correctly configured system this means that sudo will allow privileged execution of ifconfig only if it is called exactly as written in the sudoers configuration. For example:

system("/sbin/ifconfig wlan1 up"); //Would work
system("/sbin/ifconfig wlan2 up"); //Would not work

The issue with shell invocations deserves special emphasis. Try very hard to not use a shell inside sudo as if for any reason you need to pass user input to the shell command it can be trivial to craft user input that allows arbitrary privileged code execution. I would go as far as stating that if you have shell invocations inside sudo, then you are not serious about security. If you need shell work just write a script. A trivial example of an exploit for the record:

//input =  "&& reboot"
void sudo_myself(const char* input) {
   int cmd_len = 17 + strlen(input);
   const char buf = malloc(cmd_len);
   int r = snprintf(buf, cmd_len, "sudo bash -c 'echo Abra kadabra %s'); //check for r
   r = system(buf); 
}
sudo_myself("&& reboot"); Will reboot!

What the example above also illustrates is when taking unsanitized input, you need to consider if the user might do something you do not want. Thus, craft your system() call in tandem with the /etc/sudoers so that ideally the user can only input the argument expected and not inject unexpected behavior. In a sense this is a general concern with sudo usage, where system() does little to make it better or worse.

Finally keep in mind that it is likely there is an API that does the same work as the command you are passing to system() and sudo, but my experience is that due to the privilege escalation mitigations, it maybe quite cumbersome to implement in the best case, and in the worst case require other external system-wide configurations/permissions that may also have their own pitfalls. An example that immediately comes to my mind is making a change in some network interface. You likely need to set your application to have capabilities configured, which means you will not be able to just run your binary without some kind of installation process.

As usual if you find any inaccuracy let me know.

Awk: An example usage with the linux kernel command line

Recently I needed to have some boot loader information accessible in a user space application. This information can be a build id, hardware specific state or even cryptographic data. This kind of information is also usually passed by the device tree. It just so happens that modifying an in memory device tree is quite more work just passing this information to the kernel command line (Maybe this is a good idea for a future article). The kernel command line is also often in some boot environment variable so it is very easy to modify it.

The reason the linux kernel command line is really convenient, is that it is accessible in plain text in /proc/cmdline. The format of the kernel command line is generally quite simple separated with white space with the occasional key=value argument. Below is an example of an Ubuntu device’s /proc/cmdline:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic root=UUID=a754270e-6c98-417b-87ac-761765280a95 ro quiet splash vt.handoff=7

Let’s say i have a provisioning script that needs to know what BOOT_IMAGE is in use. I would use the following awk script

$ awk  'BEGIN { RS = "[ \n]"; FS = "="} $1 == "BOOT_IMAGE" { print $2 }' /proc/cmdline

The above awk script is definitely not adaptable to all the cases. Specifically it would not work if we would like to know the root UUID due to the field separator being repeated twice in the root record. Notice the words record and field as they are very important terms in awk, and specially in this article.

A record is a piece of data that can have many fields. In the above kernel command line example we have the following records:

$ awk  'BEGIN { RS = "[ \n]"}  { print $0 }' /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic
root=UUID=a054270e-6c98-417b-87ac-761765180a95
ro
quiet
splash
vt.handoff=7

There are 3 things that make the above expression work:

  • BEGIN
  • RS
  • print $0

The BEGIN rule is a rule that is evaluated before any other. Conversely, END (not shown here) is the last. Any separator settings should be set in the BEGIN so that they are available in subsequent rules. You cannot set separators and use the set value in the same rule.

As you can see, it split the input (kernel command line) by white space and line end. This is denoted by the magical RS or record separator. The RS can be assigned to a regular expression which makes it a very powerful string splitter. For this specific example it means: I want records that are the tokens resulting from splitting the input by a white pace character or a line end.

After the input has been divided into records, awk gives us placeholder variables for the fields it found on the record. Above, we use the $0 which contains the whole record, but $1..$n will give us the value for nth field found in the record.

With that knowledge in hand we can see that our BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic information is treated as a whole record. Even records without fields are printed out. You can already see how you would use awk for records without fields.

Let’s go back to the full expression and analyze the bold styled parts of it

$ awk 'BEGIN { RS = "[ \n]"; FS = "="} $1 == "BOOT_IMAGE" { print $2 }' /proc/cmdline
/boot/vmlinuz-5.4.0-80-generic
  • FS = “=”
  • $1 == “BOOT_IMAGE
  • print $2

FS stands for Field Separator and is what tells awk how to split a record. The FS = “=” statement might seem awkward but it is nothing but “assign the character = to FS”.

Without $1 == “BOOT_IMAGE” selecting the specific record we are looking for, the above FS statement would lead to weird results: Many of the records would not have 2 fields and empty lines would be printed.

As mentioned above the print $2 simply prints the second field of each record. As we set a filter to only print the record whose first field is BOOT_IMAGE, only one match will occur, with the result being the print of /boot/vmlinuz-5.4.0-80-generic. That’s it.

From the above explanation it is visible that you can create very powerful one liner commands with a tool that is almost guaranteed to exist in any UNIX environment. From server, desktops and most importantly to embedded environments with only busybox. To install bash, python or perl would have been impractical in most embedded environments or initramfs. Doing it though sed and grep would also not be so clear as awk exists specifically to process data, not to process/edit text or filter by regular expressions.

An ASP.net core validator to reject empty IEnumerables

This is a post on a recent problem I faced. As you may know from my previous post on Check suming axios downloaded files in jest I give very high priority to integration tests of backend/frontend communication. The story goes like:

I have a test which sets up files tracked by the backend. The test then launches a frontend action which POSTs a request with the list of files to delete. When the frontend finishes it’s thing, the test checks whether the file was indeed deleted on the backend. Turns out the frontend was saying everything was fine, but the test failed because the file was not actually deleted. The test failed successfully 🙂

To spare you the details the bug was that the frontend expected more files than the test was providing and there was an out of bound access, which in Javascript means undefined. When qs stringify finds a property with undefined value it just skips it with no error(I hate it, i mean what is the point of safe languages). This meant that my backend was receiving a POST request to delete files but the list was empty. Before the ValidationAttribute I am going to show you, this was a valid request and HTTP OK was sent to the frontend. The ValidationAttribute now makes it so there cannot be empty lists passed to the deleteFiles endpoint, and the frontend will get a notification that something went wrong.

 public class NonEmptyEnumerableValidator : ValidationAttribute {
            public override bool IsValid(object value) {
                if (!(value is IEnumerable enumerable))
                    return false;

                return enumerable.GetEnumerator().MoveNext();
            }
        }
 }
...
public async Task<IActionResult> DeleteMediaObjects([NonEmptyEnumerableValidator]IEnumerable<string> MediaIdList) { ... }

This validator makes sure the value exists, thus it is required, and needs to be a list with at least one element.

In some sources the GetEnumerator was used as a disposable but I found no evidence of that, so I am not going to be doing cargo cults. Correctly me if I am wrong though.

I also did not use enumerable.Any() because it is an internal method that is subject to be changed according to Jetbrains warnings. Fair enough.

systemd fallbacks to google ntp servers. Pay attention!

As the title suggests Google NTP servers are compiled by default in systemd. For common user desktops and even some servers this is harmless. For embedded or critical computing networks this is a little known phone home mechanism.

I wrote the “pay attention” in the title and decided to write about this topic because in my career more than once customers did security assessments and found devices with no business connecting to the internet, trying to connect to Google servers.

There are several hypothesis that can lead to the phone google scenario:

  • By default, systemd‘s build system has a ntp-servers option point to Google NTP servers. This will mean systemd will have Google servers hard coded as a fallback. Most people do not know of the Google hard code into binaries. After all how many people know meson and inspect the many options of systemd manually.
  • Most dhcp leases do not offer NTP servers, so systemd tries to use any NTP server. Often this means the one hard coded. In my opinion this is the most common reason the fallback is triggered.
  • Also running networkctl status -a, will not display any NTP server information.
  • Most people do not configure timesyncd services explicitly, and likely many people do not know that NTP servers are relevant to their machines.
  • timedatectl status -a states that the NTP service is active but does not display what NTP servers were used.

With all that said if you want to check what are the current NTP fallback servers you need to run:

$timedatectl show-timesync
FallbackNTPServers=ntp.ubuntu.com
ServerName=ntp.ubuntu.com
ServerAddress=91.189.91.157
RootDistanceMaxUSec=5s
PollIntervalMinUSec=32s
PollIntervalMaxUSec=34min 8s
PollIntervalUSec=34min 8s
NTPMessage={ Leap=0, Version=4, Mode=4, Stratum=2, Precision=-24, RootDelay=46.966ms, RootDispersion=22.445ms, Reference=84A36001, OriginateTimestamp=Thu 2021-07-29 15:36:17 CEST, ReceiveTimestamp=Thu 2021-07-29 15:36:17 CEST, TransmitTimestamp=Thu 2021-07-29 15:36:17 CEST, DestinationTimestamp=Thu 2021-07-29 15:36:17 CEST, Ignored=no PacketCount=100, Jitter=9.101ms }
Frequency=470668

As you can see above, the Ubuntu distribution is careful to change the default to ntp.ubuntu.com. Good on Canonical.

Getting the maximum number of characters in a path with a POSIX shell

I recently needed to do a fast count of the longest path’s string length possible relative to another path. This is useful for example on how to dimension some C buffer. I did not want to spend much time on this and came up with the following one liner (i love bash one liners):

for i in $(find); do echo $i | wc -c; done | sort -nu | tail -n1

You may need to subtract at least 2 characters the ./ but otherwise that is it. These are all basic commands available in a minimal UNIX environment like busybox.