Environmental variables for secrets. A simple threat analysis

One of my favorite ways of passing information to a program without leakage of these secrets is environmental variables. The rationale is:

  1. Reading from a plain text with no permissions set, will lead to any user in the system to be able to read this file which is unnaceptable.
  2. Reading from a plain text file with permissions only to the user running the process, may lead to the file being included by mistake in the version control system. It can also mean that the whole process may call a function to read the file. Better but still easy to make mistakes with.
  3. Passing the secrets as command line arguments leads to their complete exposure. Every user is able to access any process’ command lines. This means that even a lowly privileged user can in theory access /proc/<pid>/cmdline and inspect the secrets passed through command line arguments. This is further illustrated in another post: Accessing a process command line without ps. This is unnaceptable.
  4. The other way is for some environmental variables to be passed to the application. Often the environmental variables are passed from a file or command line which obviously leads to the problems described above. There are safe ways to pass environmental variables besides variables or command line and I will talk about them.

From the above methods, we can extract at the very least 2 requirements:

  1. The secrets should not be accessible to arbitrary parts of the process, at arbitrary states.
  2. The secrets should not be accessible by any other means than through the process.
    • The secret should not be accessible by any other user in the system, except for the root user. Without security enclaves it is not possible to hide anything from a root user. I will give some of the reasons further down.
    • The secret should not be accessible by the process’ user, except through the process that is supposed to use it.

Dockefiles/docker-compose.yml and systemd units are often interpreted in a privileged or sandboxed environment, so they actually provide a medium where you can define and write down your secrets and have them safely passed as environmental variables to the process consuming the secrets. Naturally, you need to make sure that any systemd units or Dockerfiles are not readable by anybody but the root user. This ensures compliance with the second requirement but not with (1).

The non-compliance with the first requirement is where I believe this post has something to offer.

I have seen a lot of support for passing secrets with environmental variables but not much advice on their compartmentalization inside the process. This is especially important because environmental variables are in fact global variables to a process. Any arbitrary process’ state can access environment variables. If for any reason a process is compromised, an attacker can just spawn a process or call the relevant standard library function and get the current process-wide environmental variables set. Most process spawnings pass the current process’ environmental variable so if there is a shell injection issue in your code you will have your secrets exposed.

Similarly, if an attacker can compromise your application so it dumps the current environmental variables your secrets will also be exposed. A common situation is a variable expansion issue like the ones possible in PHP. There $_ENV can be accessed and your secrets revealed. As stated environmental variables are super-global information.

One might argue that both the process spawn and running of environmental variable-related code requires a compromised process and that in such a situation one has lost any ability to protect itself. That is not true. Defense in depth is a thing, and I propose mitigating actions that can fulfill requirement (1):

  • Retrieve the secrets from the environmental variables as early as possible in the startup of your process.
  • Remove or set the the environmental variables with the secrets to blank.
  • Reduce the availability of secrets throughtout your application, making them available only to the parts that need them. This is may be hard to achieve in terms of architecture.

The above actions mean the shell injection attacks will be unable to inspect secrets from the parent process. It also means that in the case of a local code path allowing inspection of global variables or code execution, an inspection of the environmental variables will not reveal any secrets.

Naturally, if your application’s memory is compromised your secrets are revealable, but that is beside the point of this post, as is the case where the root account is compromised. If an attacker is a root user they can inspect arbitrary parts of the system’s memory. Similarly, if internal program memory is compromised an attacker can just read the secrets from memory directly.

In sum, to use environmental variables effectively for secret sharing:

  • Pass secrets as environmental variables from a more priviliged user than the process requiring the secrets.
  • Wipe the environmental variables as soon as you capture them into memory.
  • Keep availability of secrets as restricted as possible.

bash: infinite history and a nice prompt

Recently I deleted my .bashrc and .bash_history by mistake. I lost a good deal of dear commands plus aliases and needed to come up with another .bashrc to get infinite history and a nice prompt.

There are often one-off commands for some very specific tasks that are rarely used but very useful. Such one-offs often are forgotten and not explicitly saved. When they are required though, I am very happy to just make Ctrl-r on my terminal and write a few fragments that I remember and then they come up. It is like my own toolbox. For that, I rely on the fact that my .bash_history is infinite.

The prompt is also very useful in my day-to-day. Often I have very long-running terminal sessions that have commands whose timestamps and exit codes matter to me over the course of my tasks. Timestamps help me get the duration of the command by checking the timestamp of the command vs the next prompt. The exit code helps me understand if the command was successful, normally much later after it ran, and when I no longer remember some task’s details.

In the course of redoing my .bashrc I leaned on this StackOverflow post and a really nice site that allows one to generate custom PS1(prompt). The result is this

# Eternal bash history.
# ---------------------
# Undocumented feature which sets the size to "unlimited".
# http://stackoverflow.com/questions/9457233/unlimited-bash-history
export HISTSIZE=
export HISTTIMEFORMAT="[%F %T] "
# Change the file location because certain bash sessions truncate .bash_history file upon close.
# http://superuser.com/questions/575479/bash-history-truncated-to-500-lines-on-each-login
export HISTFILE=~/.bash_eternal_history
# Force prompt to write history after every command.
# http://superuser.com/questions/20900/bash-history-loss
export PS1="[\t \$?] \h:\w \[$(tput sgr0)\]"

Edit: You may need to manually sanitize your history from time to time if you by mistake input a secret into the command line. Such a secret will be visible in plain text. Model your threats.

Validating global systemd defaults through properties

On the topic of Infinite loops on crashing systemd units, I got to the point where I needed an automatic way to verify that my systems had the correct systemd unit defaults. Let’s assume we have the following configuration:

$ cat /etc/systemd/system.conf

It proved to not be straightforward because the documentation at the time was wrong. Quoting it:

For the manager itself, systemctl show will show all available properties. Those properties are documented in systemd-system.conf(5).

https://www.freedesktop.org/software/systemd/man/systemctl.html (2021/11/14)

This meant that in the case of DefaultRestartSec configured above and defined in systemd-system.conf(5), we could get the direct values with the following command:

$ systemctl show -p DefaultRestartSec

It returns nothing. Unfortunately, things are not as the manual stated. It also returns exit code 0, indicating there was no error. Luckily when using the above command in a test we would use its output to compare to our desired value and the test would fail.

We established the manual is wrong at the time of writing, but we still have the goal of testing that our system conforms to our mission’s defaults. Let’s check if there are systemd properties that are similar or derived from DefaultRestartSec:

$ systemctl show | grep DefaultRestart

Notice the equivalent property! The only difference is that the property’s suffix is USec and not Sec. Also, notice that the suffixed scale on the name of the property or option may be inconsistent with the scale that is set or displayed. While not intellectually friendly, the reason for the different scales in the names has to do with historical and code reasons.

Properties are internally represented as dbus properties and this sets limits on the data types available as well as on backward compatibility. Properties are part of a public API so updating their names or synching their names to the systemd-system.conf options names is not really possible. I tried to inspect systemctl-show.c for a way to match properties to options but that would need hard coding of such mapping making it a maintenance issue.

Options on the other hand have user-facing characteristics so the scale suffix is naturally human-friendly. With that said, given that a system architect can set options in a variety of scales that are automatically converted to internal representations, it is baffling to me why at least time-related configuration options do not match.

With all that said, I submitted an accepted clarification to systemctl‘s manual which summarizes what an architect needs to know when trying to map between systemd‘s configuration options and properties:

systemctl show will show all available properties, most of which are derived or closely match the options described in systemd-system.conf


Taking the example of the configuration options mentioned in this post, DefaultRestartSec, DefaultStartLimitIntervalSec and DefaultStartLimitBurst:


Accessing a process command line without ps

I work on embedded boards quite a lot, and often the ps command utility is a bare bones busybox stripped down version without options. One of the issues I faced while using it is that it truncates the process command line, and options that would allow it to display an untruncated output are not available.

I remembered that every active process has it’s entry in /proc/<PID>/cmdline and i tried to use it directly. Unfortunately the output of /proc/<PID>/cmdline contains the individual arguments separated by null characters(\000). The \000 character is printed as a @ in my terminal which is unusable to copy/paste. I also did not have a hexadecimal dumper to know what exactly was the character. Fortunately stackoverflow came to the rescue:

cat /proc/$PID/cmdline | tr '\000' ' ' # The solution i would come to
cat /proc/$PID/cmdline | xargs -0 echo # xargs showing it's versatility.

Infinite loops on crashing systemd units.

When using systemd it is often wise to set mission specific “global defaults”. In this article we will look at the case of “how often a unit can fail in a given amount of time” and it’s impacts.

systemd experts will create their units with properties set to the correct values for their specific units, but often these properties are not obvious or are forgotten by the unit’s creators. systemd comes with defaults as explained in systemd-system.conf manual but that does not mean they should not be evaluated for suitability to the system’s constrains and mission.

Let’s approach the requirement where we do not want a crashing service, respawning infinitely fast and thus compromising the whole system. This requirement is valid for any type of computing, be it embedded, server or even a desktop.

systemd gives us a quite a few options to achieve that requirement but let’s focus on DefaultRestartSec, DefaultStartLimitIntervalSec and DefaultStartLimitBurst. These system wide properties can be overridden in individual units, just in the unit files, the Default prefix is not used. From the manual:

DefaultStartLimitIntervalSec=, DefaultStartLimitBurst=

Configure the default unit start rate limiting, as configured per-service by StartLimitIntervalSec= and StartLimitBurst=. See systemd.service(5) for details on the per-service settings. DefaultStartLimitIntervalSec= defaults to 10s. DefaultStartLimitBurst= defaults to 5.

This means that the unit will start at most DefaultStartLimitBurstTimes times per DefaultStartLimitIntervalSec seconds. If a unit start is unsuccessful and matches the criteria described, then the unit will be set as failed and the unit will not start anymore over the defined period. Failed is not disabled. The DefaultRestartSec states how fast to restart the service inside the period defined by DefaultStartLimitIntervalSec.

With good values one can make sure the system will never be starved of resources due to an infinite crash loop. The “factory” default that comes from the systemd code is, 5(times)/10(seconds). This default by itself maybe fine, but i personally find the “factory” DefaultRestartSec=100ms” not very reasonable. In a worst case scenario, this means the unit will fail 5 times in half a second, with a “resting” period of 10 seconds.

Worst case scenario timeline for systemd restart factory defaults. Each letter F is a failure that takes 0.1s. Every 10(DefaultStartLimitIntervalSec) seconds the failure pattern illustrated can repeat itself.

If the unit’s start has side effects on the system, the system can become unusable. From a security point of view this can lead to a denial of service or infrastructure damage. Damage scenarios are:

  • A unit start side effect taking more than 2 seconds to recover from, leaving the system in a death spiral.
  • IO exhaustion, disk space exhaustion or extra infrastructure charges

To sum up, system architects should have a look at the worst-case side effect cleanup/handling of the units running. With this data, define good defaults that allow the system to continue operating degraded even when an infinite crash loop exists. Degraded here means that the service is still operating or has capacity to execute remedying measures.

u-boot exposing mass storage to usb interface

As previously stated, this blog is a place for me to record some notes on things that i learn on my day to day.

One of the things I learned and whished I knew when doing embedded development was how to easily flash certain files in an embedded device’s storage card, without needing to do whole image flashing.

It turns out to be very simple if your embedded board has usb support in u-boot. Running the following command in the u-boot command line:

ums 0 mmc 1

Leads to exposure of mmc1 to the usb-gadget interface 0. See https://u-boot.readthedocs.io/en/latest/usage/ums.html for more details

When successful, your development computer should detect new mass storage devices.

Forcing gerrit to regenerate a new review

Often in my work I have been asked how to detach a commit from an existing review in gerrit. The usual use-case is that somebody picked up some commit and then re-worked it significantly, making the original review, and review owner not applicable for follow-up.

Most regular git users will be confused because a git commit –amend will still end on the old review. The reason is that gerrit tracks changes not by commit hash but by metadata text called Change-Id. Think of it. If it did not have another mechanism besides the commit hash it would not know how to track a patchset review across rebases or amends

commit 71fbee6a7a41cbf8f59444fc48294e51c7cf9613 (HEAD)
Author: Paulo Neves <johny@gmail.com>
Date:   Mon Sep 6 14:07:14 2021 +0200

    My reworked stuff
    Change-Id: Ifdbfff8fbd8ee217b07b7b053c8927ae2a1126f0

Most people just manually change a random character in the Change-Id and this will lead to a unique Change-Id. Personally the way i do, given I often have hooks that automatically add the Change-Id, is to just edit the commit message to delete the line completely. A post-commit hook will just re-generate a new Change-Id line in the commit message and when i push it to gerrit a fresh review will be generated.

sudo in system()

In Linux there are so many permission mechanisms, depending on exactly what you want to do, it dazzles the mind. There is suid, dbus policies, polkit, Linux capabilities, files attributes, PAM modules, SELinux and the list goes on. It does not surprise then, that choosing the correct approach can become paralyzing or give anxiety inducing.

In the end though, most of us will just use sudo and that is fine for fast scripts or manual interventions, but what about using it inside an unprevileged program? Better, while we are being pragmatic why not use good old system C stdlib function and call the application with it and sudo. At first look it might seems a bad idea as per manual:

       Do not use system() from a privileged program (a set-user-ID or
       set-group-ID program, or a program with capabilities) because
       strange values for some environment variables might be used to
       subvert system integrity.  For example, PATH could be manipulated
       so that an arbitrary program is executed with privilege.  Use the
       exec(3) family of functions instead, but not execlp(3) or
       execvp(3) (which also use the PATH environment variable to search
       for an executable).

       system() will not, in fact, work properly from programs with set-
       user-ID or set-group-ID privileges on systems on which /bin/sh is
       bash version 2: as a security measure, bash 2 drops privileges on
       startup.  (Debian uses a different shell, dash(1), which does not
       do this when invoked as sh.)

       Any user input that is employed as part of command should be
       carefully sanitized, to ensure that unexpected shell commands or
       command options are not executed.  Such risks are especially
       grave when using system() from a privileged program.

Summarizing, do not use system() if:

  • Invoking from a program with setuid or capabilities
  • The program is found or uses $PATH
  • Invoking Bash v2
  • User input is not sanitized and allowing for shell injection

Coming back to the use of sudo with system() we might think it a bad combination. In reality it is not so, as the corners are covered by sudo, a truly well thought out tool.

Regarding the setuid constraint, it is hardly going to be an issue as if you need to run sudo it is exactly because your program calling system(“sudo …”) is not privileged. Regardless the effective UID/GID is the same as the real GID.

The caveat regarding relying on $PATH for binary location is also made secure with 2 defenses:

  • Hard-code your system() call with an absolute path, like system(“/bin/echo Abra kadabra”) and allow that specific absolute path on the sudoers file. Use man sudoers for more information. The manual is truly high quality.
  • Delete your /etc/environment, as for example in Ubuntu it contains $PATH. Even though the manual refers to /etc/environment as the source of environment variables, the code shows sudo works as intended if it does not exist at all.

Below is an example for an entry of /etc/sudoers.d/myspecial_permission that illustrates the point made above:

%puny_user ALL=NOPASSWD: /sbin/ifconfig wlan1 up

In a correctly configured system this means that sudo will allow privileged execution of ifconfig only if it is called exactly as written in the sudoers configuration. For example:

system("/sbin/ifconfig wlan1 up"); //Would work
system("/sbin/ifconfig wlan2 up"); //Would not work

The issue with shell invocations deserves special emphasis. Try very hard to not use a shell inside sudo as if for any reason you need to pass user input to the shell command it can be trivial to craft user input that allows arbitrary privileged code execution. I would go as far as stating that if you have shell invocations inside sudo, then you are not serious about security. If you need shell work just write a script. A trivial example of an exploit for the record:

//input =  "&& reboot"
void sudo_myself(const char* input) {
   int cmd_len = 17 + strlen(input);
   const char buf = malloc(cmd_len);
   int r = snprintf(buf, cmd_len, "sudo bash -c 'echo Abra kadabra %s'); //check for r
   r = system(buf); 
sudo_myself("&& reboot"); Will reboot!

What the example above also illustrates is when taking unsanitized input, you need to consider if the user might do something you do not want. Thus, craft your system() call in tandem with the /etc/sudoers so that ideally the user can only input the argument expected and not inject unexpected behavior. In a sense this is a general concern with sudo usage, where system() does little to make it better or worse.

Finally keep in mind that it is likely there is an API that does the same work as the command you are passing to system() and sudo, but my experience is that due to the privilege escalation mitigations, it maybe quite cumbersome to implement in the best case, and in the worst case require other external system-wide configurations/permissions that may also have their own pitfalls. An example that immediately comes to my mind is making a change in some network interface. You likely need to set your application to have capabilities configured, which means you will not be able to just run your binary without some kind of installation process.

As usual if you find any inaccuracy let me know.

Awk: An example usage with the linux kernel command line

Recently I needed to have some boot loader information accessible in a user space application. This information can be a build id, hardware specific state or even cryptographic data. This kind of information is also usually passed by the device tree. It just so happens that modifying an in memory device tree is quite more work just passing this information to the kernel command line (Maybe this is a good idea for a future article). The kernel command line is also often in some boot environment variable so it is very easy to modify it.

The reason the linux kernel command line is really convenient, is that it is accessible in plain text in /proc/cmdline. The format of the kernel command line is generally quite simple separated with white space with the occasional key=value argument. Below is an example of an Ubuntu device’s /proc/cmdline:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic root=UUID=a754270e-6c98-417b-87ac-761765280a95 ro quiet splash vt.handoff=7

Let’s say i have a provisioning script that needs to know what BOOT_IMAGE is in use. I would use the following awk script

$ awk  'BEGIN { RS = "[ \n]"; FS = "="} $1 == "BOOT_IMAGE" { print $2 }' /proc/cmdline

The above awk script is definitely not adaptable to all the cases. Specifically it would not work if we would like to know the root UUID due to the field separator being repeated twice in the root record. Notice the words record and field as they are very important terms in awk, and specially in this article.

A record is a piece of data that can have many fields. In the above kernel command line example we have the following records:

$ awk  'BEGIN { RS = "[ \n]"}  { print $0 }' /proc/cmdline

There are 3 things that make the above expression work:

  • RS
  • print $0

The BEGIN rule is a rule that is evaluated before any other. Conversely, END (not shown here) is the last. Any separator settings should be set in the BEGIN so that they are available in subsequent rules. You cannot set separators and use the set value in the same rule.

As you can see, it split the input (kernel command line) by white space and line end. This is denoted by the magical RS or record separator. The RS can be assigned to a regular expression which makes it a very powerful string splitter. For this specific example it means: I want records that are the tokens resulting from splitting the input by a white pace character or a line end.

After the input has been divided into records, awk gives us placeholder variables for the fields it found on the record. Above, we use the $0 which contains the whole record, but $1..$n will give us the value for nth field found in the record.

With that knowledge in hand we can see that our BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic information is treated as a whole record. Even records without fields are printed out. You can already see how you would use awk for records without fields.

Let’s go back to the full expression and analyze the bold styled parts of it

$ awk 'BEGIN { RS = "[ \n]"; FS = "="} $1 == "BOOT_IMAGE" { print $2 }' /proc/cmdline
  • FS = “=”
  • $1 == “BOOT_IMAGE
  • print $2

FS stands for Field Separator and is what tells awk how to split a record. The FS = “=” statement might seem awkward but it is nothing but “assign the character = to FS”.

Without $1 == “BOOT_IMAGE” selecting the specific record we are looking for, the above FS statement would lead to weird results: Many of the records would not have 2 fields and empty lines would be printed.

As mentioned above the print $2 simply prints the second field of each record. As we set a filter to only print the record whose first field is BOOT_IMAGE, only one match will occur, with the result being the print of /boot/vmlinuz-5.4.0-80-generic. That’s it.

From the above explanation it is visible that you can create very powerful one liner commands with a tool that is almost guaranteed to exist in any UNIX environment. From server, desktops and most importantly to embedded environments with only busybox. To install bash, python or perl would have been impractical in most embedded environments or initramfs. Doing it though sed and grep would also not be so clear as awk exists specifically to process data, not to process/edit text or filter by regular expressions.

An ASP.net core validator to reject empty IEnumerables

This is a post on a recent problem I faced. As you may know from my previous post on Check suming axios downloaded files in jest I give very high priority to integration tests of backend/frontend communication. The story goes like:

I have a test which sets up files tracked by the backend. The test then launches a frontend action which POSTs a request with the list of files to delete. When the frontend finishes it’s thing, the test checks whether the file was indeed deleted on the backend. Turns out the frontend was saying everything was fine, but the test failed because the file was not actually deleted. The test failed successfully 🙂

To spare you the details the bug was that the frontend expected more files than the test was providing and there was an out of bound access, which in Javascript means undefined. When qs stringify finds a property with undefined value it just skips it with no error(I hate it, i mean what is the point of safe languages). This meant that my backend was receiving a POST request to delete files but the list was empty. Before the ValidationAttribute I am going to show you, this was a valid request and HTTP OK was sent to the frontend. The ValidationAttribute now makes it so there cannot be empty lists passed to the deleteFiles endpoint, and the frontend will get a notification that something went wrong.

 public class NonEmptyEnumerableValidator : ValidationAttribute {
            public override bool IsValid(object value) {
                if (!(value is IEnumerable enumerable))
                    return false;

                return enumerable.GetEnumerator().MoveNext();
public async Task<IActionResult> DeleteMediaObjects([NonEmptyEnumerableValidator]IEnumerable<string> MediaIdList) { ... }

This validator makes sure the value exists, thus it is required, and needs to be a list with at least one element.

In some sources the GetEnumerator was used as a disposable but I found no evidence of that, so I am not going to be doing cargo cults. Correctly me if I am wrong though.

I also did not use enumerable.Any() because it is an internal method that is subject to be changed according to Jetbrains warnings. Fair enough.