Awk: An example usage with the Linux kernel command line

Recently I needed to make some boot loader information accessible to a user space application. This information can be a build ID, hardware-specific state or even cryptographic data. This kind of information is usually passed through the device tree. However, modifying an in-memory device tree is quite a bit more work than passing the information on the kernel command line (maybe a good topic for a future article). The kernel command line is also often stored in a boot environment variable, so it is very easy to modify.

The Linux kernel command line is really convenient because it is accessible in plain text in /proc/cmdline. Its format is generally simple: arguments separated by whitespace, with the occasional key=value argument. Below is an example of an Ubuntu device’s /proc/cmdline:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic root=UUID=a754270e-6c98-417b-87ac-761765280a95 ro quiet splash vt.handoff=7

Let’s say I have a provisioning script that needs to know which BOOT_IMAGE is in use. I would use the following awk script:

$ awk 'BEGIN { RS = "[ \n]"; FS = "=" } $1 == "BOOT_IMAGE" { print $2 }' /proc/cmdline

The above awk script does not fit every case. Specifically, it would not work as-is for the root UUID: the field separator appears twice in the root record, so $2 would only be UUID. Notice the words record and field, as they are very important terms in awk, and especially in this article.
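One way around the repeated separator is to skip FS altogether and cut each record at its first = with index() and substr(). The sketch below assumes that only the first = separates the key from its value, and that your awk treats a multi-character RS as a regular expression (GNU awk and mawk do). The function name cmdline_get is hypothetical; it reads /proc/cmdline unless another file, or - for standard input, is given as the second argument.

```shell
# cmdline_get KEY [FILE]: hypothetical helper, not a standard tool.
# Prints the value of KEY=VALUE arguments, splitting each argument only at
# its first "=" so values such as UUID=... stay intact. Reads FILE, "-" for
# standard input, or /proc/cmdline by default.
cmdline_get() {
    awk -v key="$1" '
        BEGIN { RS = "[ \n]" }                 # one record per argument
        {
            eq = index($0, "=")                # position of the first "="
            if (eq > 0 && substr($0, 1, eq - 1) == key)
                print substr($0, eq + 1)       # everything after it
        }' "${2:-/proc/cmdline}"
}

# Usage with the root argument from the example command line;
# prints UUID=a754270e-6c98-417b-87ac-761765280a95
printf 'BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic root=UUID=a754270e-6c98-417b-87ac-761765280a95 ro quiet\n' |
    cmdline_get root -
```

On a real system, cmdline_get root would read /proc/cmdline directly.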

A record is a piece of data that can have many fields. In the above kernel command line example we have the following records:

$ awk 'BEGIN { RS = "[ \n]" } { print $0 }' /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic
root=UUID=a754270e-6c98-417b-87ac-761765280a95
ro
quiet
splash
vt.handoff=7

There are 3 things that make the above expression work:

  • BEGIN
  • RS
  • print $0

The BEGIN rule runs before any input is read. Conversely, END (not shown here) runs after all input has been read. Separator settings should be made in BEGIN so that they are already in effect when the first record is read. You cannot set a separator and have it apply to the record currently being processed: assigning FS inside a regular rule only affects how the following records are split.
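To make the BEGIN/END ordering concrete, here is a small sketch that sets RS in BEGIN, counts records in the main rule and reports the total in END. The helper name count_args is hypothetical, and, like the article’s script, it assumes an awk that accepts a regular expression as RS (such as GNU awk or mawk).

```shell
# count_args [FILE]: hypothetical helper illustrating BEGIN and END.
count_args() {
    awk '
        BEGIN { RS = "[ \n]"; n = 0 }   # runs before any input is read
        { n++ }                          # runs once for every record
        END { print n }                  # runs after the last record
    ' "${1:-/proc/cmdline}"
}

# The example command line from above has 6 arguments; prints 6
printf 'BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic root=UUID=a754270e-6c98-417b-87ac-761765280a95 ro quiet splash vt.handoff=7\n' |
    count_args -
```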

As you can see, awk split the input (the kernel command line) at spaces and at the line end. This is controlled by the special RS, or record separator, variable. RS can be assigned a regular expression, which makes it a very powerful string splitter (strictly speaking, POSIX leaves a multi-character RS unspecified, but GNU awk and mawk, among others, treat it as a regular expression). For this specific example it means: I want records that are the tokens resulting from splitting the input at a space character or a line end.
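Because RS is a regular expression here, it can be widened. As a sketch, assuming an awk that treats a multi-character RS as a regex, the expression "[ \t\n]+" also accepts tabs and collapses runs of separators, so a command line with doubled spaces does not produce empty records. The helper name cmdline_args is hypothetical.

```shell
# cmdline_args [FILE]: hypothetical helper printing one kernel argument per
# line. The "+" collapses runs of spaces, tabs and newlines, and the
# pattern drops the empty record a leading separator would produce.
cmdline_args() {
    awk '
        BEGIN { RS = "[ \t\n]+" }
        $0 != "" { print }
    ' "${1:-/proc/cmdline}"
}

# Prints ro, quiet and splash on separate lines
printf ' ro  quiet\tsplash\n' | cmdline_args -
```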

After the input has been divided into records, awk splits each record into fields and makes them available as variables. Above, we use $0, which contains the whole record, while $1 through $NF give the individual fields, where NF is the number of fields in the current record.
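For contrast, with awk’s default RS (newline) and FS (runs of blanks), the whole command line would be a single record whose fields are the individual arguments. The sketch below, using the hypothetical helper name cmdline_fields, walks over $1 through $NF in that setting.

```shell
# cmdline_fields [FILE]: hypothetical helper. With the default RS (newline)
# and FS (runs of blanks), the whole command line is a single record and
# each argument is one of its fields, $1 through $NF.
cmdline_fields() {
    awk '{
        printf "%d fields\n", NF
        for (i = 1; i <= NF; i++)
            printf "$%d = %s\n", i, $i
    }' "${1:-/proc/cmdline}"
}

# Prints "3 fields", then "$1 = ro", "$2 = quiet" and "$3 = splash"
printf 'ro quiet splash\n' | cmdline_fields -
```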

With that knowledge in hand, we can see that our BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic information is treated as one whole record. Records that are plain flags rather than key=value pairs, such as ro or quiet, are printed too, so you can already see how you would use awk to check for such flags.
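As a sketch of such a flag check, the hypothetical helper cmdline_has below compares each whole record with the requested flag and reports the result through its exit status, which is convenient in a provisioning script. It assumes the same regex-capable RS as the examples above.

```shell
# cmdline_has FLAG [FILE]: hypothetical helper. Exits 0 if FLAG appears
# as a bare argument (no "=value"), 1 otherwise.
cmdline_has() {
    awk -v flag="$1" '
        BEGIN { RS = "[ \n]"; found = 0 }
        $0 == flag { found = 1 }     # a bare argument equal to FLAG
        END { exit !found }          # exit status 0 when found, 1 otherwise
    ' "${2:-/proc/cmdline}"
}

if printf 'ro quiet splash\n' | cmdline_has quiet -; then
    echo "quiet boot requested"
fi
```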

Let’s go back to the full expression and analyze the remaining parts of it, listed below its output:

$ awk 'BEGIN { RS = "[ \n]"; FS = "="} $1 == "BOOT_IMAGE" { print $2 }' /proc/cmdline
/boot/vmlinuz-5.4.0-80-generic

  • FS = "="
  • $1 == "BOOT_IMAGE"
  • print $2

FS stands for field separator and is what tells awk how to split a record into fields. The FS = "=" statement might look awkward, but it simply means “assign the character = to FS”.
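To see exactly how FS = "=" splits each record, and why the root record ends up with three fields, the following sketch prints NF followed by every field in brackets. The helper name show_fields is hypothetical, and it assumes an awk that accepts a regular expression as RS.

```shell
# show_fields [FILE]: hypothetical helper that prints, for each record,
# NF followed by every field in brackets, using "=" as field separator.
show_fields() {
    awk 'BEGIN { RS = "[ \n]"; FS = "=" }
         {
             out = NF ":"
             for (i = 1; i <= NF; i++)
                 out = out " [" $i "]"
             print out
         }' "${1:-/proc/cmdline}"
}

# Prints "3: [root] [UUID] [a754270e]" and then "1: [ro]"
printf 'root=UUID=a754270e ro\n' | show_fields -
```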

Without the pattern $1 == "BOOT_IMAGE" selecting the specific record we are looking for, print $2 would run for every record, with weird results: records such as ro have no second field, so empty lines would be printed, and the root record would print just UUID.
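The effect is easy to reproduce. The sketch below, wrapped in a hypothetical helper named no_pattern, is the article’s script with the $1 == "BOOT_IMAGE" pattern removed, run on the example command line.

```shell
# no_pattern [FILE]: hypothetical helper reproducing the script without
# the $1 == "BOOT_IMAGE" pattern, so print $2 runs for every record.
no_pattern() {
    awk 'BEGIN { RS = "[ \n]"; FS = "=" } { print $2 }' "${1:-/proc/cmdline}"
}

# Prints the kernel path, then UUID, three empty lines (ro, quiet,
# splash) and finally 7 (from vt.handoff=7)
printf 'BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic root=UUID=a754270e-6c98-417b-87ac-761765280a95 ro quiet splash vt.handoff=7\n' |
    no_pattern -
```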

As mentioned above, print $2 simply prints the second field of the record. Because the pattern only selects the record whose first field is BOOT_IMAGE, there is only one match, and the result is /boot/vmlinuz-5.4.0-80-generic. That’s it.

As the explanation above shows, you can create very powerful one-liners with a tool that is almost guaranteed to exist in any UNIX environment: servers, desktops and, most importantly, embedded environments with only BusyBox. Installing bash, Python or Perl would be impractical in most embedded environments or in an initramfs. Doing the same with sed and grep would also be less clear, since awk was designed to process record- and field-oriented data, while sed is a stream editor and grep filters lines by regular expression.
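Since POSIX leaves a multi-character RS unspecified, a variant that relies only on POSIX awk features may be useful when the awk at hand is unknown. The sketch below keeps the default RS and FS, so each kernel argument is a field of one record, and matches fields that start with KEY=. The helper name cmdline_get_posix is hypothetical.

```shell
# cmdline_get_posix KEY [FILE]: hypothetical helper using only POSIX awk
# features: default RS (newline) and FS (blanks), so each kernel argument
# is a field of a single record.
cmdline_get_posix() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, key "=") == 1)            # field starts with KEY=
                print substr($i, length(key) + 2)   # text after KEY=
    }' "${2:-/proc/cmdline}"
}

# Prints UUID=a754270e-6c98-417b-87ac-761765280a95
printf 'BOOT_IMAGE=/boot/vmlinuz-5.4.0-80-generic root=UUID=a754270e-6c98-417b-87ac-761765280a95 ro\n' |
    cmdline_get_posix root -
```

Like the earlier index()/substr() approach, this keeps everything after the first =, so it also handles the root UUID.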

Getting the maximum number of characters in a path with a POSIX shell

I recently needed to quickly find the length of the longest path below a given directory, measured relative to that directory. This is useful, for example, to decide how to dimension a C buffer. I did not want to spend much time on this and came up with the following one-liner (I love bash one-liners):

find . | while IFS= read -r i; do printf '%s\n' "$i" | wc -c; done | sort -nu | tail -n1

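The result includes the leading ./ and the trailing newline counted by wc -c, so subtract 3 to get the length of the relative path itself (or 2 if you want to keep one byte for the terminating NUL of a C string). Otherwise, that is it. These are all basic commands available in a minimal UNIX environment such as BusyBox.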
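Since awk is available in the same minimal environments, the per-path wc -c call can also be replaced by a single awk process that tracks the maximum length directly. The sketch below is an alternative, not the original one-liner; longest_path is a hypothetical name, and it assumes no file name contains a newline and that DIR is given without a trailing slash. LC_ALL=C makes length() count bytes rather than multibyte characters, which is what matters for a C buffer.

```shell
# longest_path [DIR]: hypothetical helper printing the length in bytes of
# the longest path below DIR (default "."), measured without the "DIR/"
# prefix. Assumes no newlines in file names and no trailing slash on DIR.
longest_path() {
    find "${1:-.}" | LC_ALL=C awk -v prefix="${1:-.}/" '
        index($0, prefix) != 1 { next }       # skip DIR itself
        {
            n = length($0) - length(prefix)   # length relative to DIR
            if (n > max)
                max = n
        }
        END { print max + 0 }                 # 0 for an empty directory
    '
}
```

For example, running longest_path in a directory containing only abc/defgh/ij prints 12, the length of abc/defgh/ij; add 1 for the terminating NUL when sizing a C buffer.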