Wednesday, November 7, 2012

Extract vmlinux with the power of the command line

While working on a problem today, I wanted to search the kernel image for some strings of text.  The kernel image, however, is compressed, which means a simple strings-and-grep on the image file won't find anything:

$ file /boot/vmlinuz-3.6.2-4.fc17.x86_64
/boot/vmlinuz-3.6.2-4.fc17.x86_64: Linux kernel x86 boot executable bzImage...
$ strings /boot/vmlinuz-3.6.2-4.fc17.x86_64 | grep microcode

Note that bzImage does not necessarily mean the kernel was compressed with bzip2!  The name stands for "big zImage", and the compression is typically done with gzip.

But a simple gunzip won't work either.  I'll actually use zcat since it doesn't expect a .gz suffix on the filename:

$ zcat /boot/vmlinuz-3.6.2-4.fc17.x86_64
gzip: /boot/vmlinuz-3.6.2-4.fc17.x86_64: not in gzip format

The kernel image is a self-extracting compressed file.  There's a small bit of code at the beginning of the file which extracts the compressed payload in the remainder of the file.  I need to strip that chunk of code from the beginning in order to use gunzip (or zcat).  But how many bytes do I need to strip?

Many file types start with a simple header including a "magic number" (just a well-known fixed number) to help identify and verify the file's type.  If I can find the gzip magic number somewhere in the vmlinuz file, I can strip all the bytes before the magic number.

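Files compressed with gzip start with the magic number 0x1F8B, followed by a compression method byte (0x08 for deflate) and a flags byte (0x00 when no optional header fields are present).  So the sequence I'm looking for is 0x1F8B0800.  A simple test can verify that.  Let's compress a small bit of data, send it to xxd (a hex dumper), and show the first 4 bytes: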

$ echo "hello world" | gzip -c | xxd -l4
0000000: 1f8b 0800                                ....
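
To make those four bytes a little less mysterious, here is a small sketch of how the fourth byte can vary.  It uses od instead of xxd only because od is available nearly everywhere; the helper name gzip_header and the temporary file are made up for the illustration.  The assumption being illustrated is that gzip stores no file name when it reads from a pipe, so the flags byte is 00, while compressing a named file may set a flag bit.

```shell
# Print the first four bytes of a gzip stream as one hex string.
# od stands in for xxd here purely for portability.
gzip_header() {
    od -An -tx1 -N4 | tr -d ' \n'
}

# Compressing from a pipe: no file name is stored, so the flags byte is 00.
from_pipe=$(echo "hello world" | gzip -c | gzip_header)
echo "$from_pipe"

# Compressing a named file: many gzip builds store the original name,
# which sets a flag bit, so the fourth byte may not be 00 here.
tmp=$(mktemp)
echo "hello world" > "$tmp"
from_file=$(gzip -c "$tmp" | gzip_header)
echo "$from_file"
rm -f "$tmp"
```

The kernel build compresses without a stored name, which is why searching for the full 1f 8b 08 00 sequence works in this post.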

Ok, so now I need to find 0x1F8B0800 somewhere in the vmlinuz file.  Let's start simple with xxd and grep:

$ xxd /boot/vmlinuz-3.6.2-4.fc17.x86_64 | grep "1f8b 0800"

Nothing?  Hmm, maybe it's not grouped the same way?  That is, maybe it's "__1f 8b08 00__"?  Fortunately, xxd can change the grouping using the -g option.  Let's use -g1 to just show 1 byte per group, i.e., "1f 8b 08 00".

$ xxd -g1 /boot/vmlinuz-3.6.2-4.fc17.x86_64 | grep "1f 8b 08 00"

Still nothing?  Maybe the magic number is split on a line break?  Let's try something shorter, just the first 3 bytes, with a line of context:

$ xxd -g1 /boot/vmlinuz-3.6.2-4.fc17.x86_64 | grep -C1 "1f 8b 08"
00044b0: 78 49 00 48 c7 c1 40 78 49 00 48 c1 e9 03 fd f3  xI.H..@xI.H.....
00044c0: 48 a5 fc 5e 48 8d 83 00 08 49 00 ff e0 1f 8b 08  H..^H....I......
00044d0: 00 00 00 00 00 02 03 ec dd 09 7c 14 e5 1d f8 ff  ..........|.....

Ah-hah, there it is!  The "1f 8b 08" is at the end of the second line and the "00" is at the beginning of the third line.  The byte 0x1f is on the line with offset 0x44c0 and it's 13, or 0xd, bytes into the line, so the magic number begins at offset 0x44cd.  Converting to decimal with the bc command (note: bc expects hex digits in all caps!):

$ echo 'ibase=16; 44CD' | bc
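
As a cross-check, the shell can do the same arithmetic itself.  This is just a sketch of the calculation described above (line offset 0x44c0 plus 13 bytes into the line); shell arithmetic accepts lowercase hex with a 0x prefix, so there is no need to uppercase anything.

```shell
# Line offset 0x44c0 plus 13 bytes into that line.
offset=$((0x44c0 + 13))

# Show the result in both decimal and hex.
printf 'decimal: %d  hex: 0x%x\n' "$offset" "$offset"
```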

So now I can use dd to extract the compressed image by skipping over the first 17613 bytes of the vmlinuz file:

$ dd bs=1 skip=17613 if=/boot/vmlinuz-3.6.2-4.fc17.x86_64 of=/tmp/vmlinux.gz
4814195+0 records in
4814195+0 records out
4814195 bytes (4.8 MB) copied, 6.28984 s, 765 kB/s
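
With bs=1, dd issues one read and one write per byte, which is why copying under 5 MB takes several seconds.  One possible alternative, sketched here on a small synthetic file rather than a real kernel, is tail -c +N.  It starts output at byte N counting from 1, so the argument is the offset plus one.  The file name fake-vmlinuz and the 5-byte prefix are invented for the illustration.

```shell
# Build a stand-in for vmlinuz: 5 junk bytes followed by a gzip stream.
workdir=$(mktemp -d)
{ printf 'STUB!'; echo "payload" | gzip -c; } > "$workdir/fake-vmlinuz"

skip=5   # offset of the gzip magic number in this synthetic file

# tail -c +N is 1-based, so skipping 5 bytes means starting at byte 6.
result=$(tail -c +$((skip + 1)) "$workdir/fake-vmlinuz" | gzip -d -c)
echo "$result"

rm -rf "$workdir"
```

For the kernel in this post, the equivalent would be tail -c +17614 on the vmlinuz file.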

And just to double-check the header:

$ xxd -g1 -l4 /tmp/vmlinux.gz
0000000: 1f 8b 08 00                                      ....

It looks good, so run it through gunzip:

$ gunzip /tmp/vmlinux.gz
gzip: /tmp/vmlinux.gz: decompression OK, trailing garbage ignored


But clearly this method is fragile, since line breaks and byte grouping make the magic number hard to find.  There must be a better way.  What I really need is a simple raw hex dump of the kernel (no offset at the beginning of each line, no byte grouping, no line breaks, no ASCII interpretation at the end of each line), and then I know I can find 1f8b0800 and its offset.

Perusing the xxd man page, I see the -g0 option will remove all grouping, and the -p option will give a plain hex dump (no offsets, no ASCII).  That's a good start:

$ xxd -g0 -p /boot/vmlinuz-3.6.2-4.fc17.x86_64

I can join each line to make one continuous string by deleting the newline characters with the tr command (and use head to only show the first 100 characters, i.e., 50 bytes of the kernel, instead of dumping megabytes of hex into my terminal):

$ xxd -g0 -p /boot/vmlinuz-3.6.2-4.fc17.x86_64 | tr -d '\n' | head -c 100

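The grep command has a -b option to print the byte offset of each match.  Combine it with the -o option, which prints only the matching part of the line, so the offset reported is that of the match itself rather than the start of the (single, very long) line: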

$ xxd -g0 -p /boot/vmlinuz-3.6.2-4.fc17.x86_64 | tr -d '\n' | grep -b -o 1f8b0800

Ah-hah, I have an offset of 35226!  But, remember, 1 byte is printed as 2 hex digits, so it has to be divided by 2.  And I only need the offset (the field before the colon).  I can do both with awk:

$ xxd -g0 -p /boot/vmlinuz-3.6.2-4.fc17.x86_64 | tr -d '\n' | grep -b -o 1f8b0800 | awk -F: '{print $1/2}'

Hooray, 17613 is the same offset I found the hard way above!
I can actually combine all of this into one big command:

$ dd bs=1 skip=$(xxd -g0 -p /boot/vmlinuz-3.6.2-4.fc17.x86_64 | \
                 tr -d '\n' | grep -b -o 1f8b0800 | awk -F: '{print $1/2}') \
     if=/boot/vmlinuz-3.6.2-4.fc17.x86_64 | gzip -d -c > /tmp/vmlinux
4814195+0 records in
4814195+0 records out
4814195 bytes (4.8 MB) copied, 6.00158 s, 802 kB/s

gzip: stdin: decompression OK, trailing garbage ignored
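
One caveat with searching a hex string: a match can start on an odd hex digit, straddling two bytes, in which case it isn't a real byte sequence in the file and dividing its offset by 2 gives nonsense.  There may also be more than one match, which would hand dd several numbers.  The sketch below guards against both by keeping only even offsets and taking the first one.  It runs on a synthetic file; find_gzip_offset is a hypothetical helper name, and od -v stands in for xxd -p.

```shell
# Print the byte offset of the first byte-aligned 1f8b0800 in a file.
find_gzip_offset() {
    od -An -v -tx1 "$1" | tr -d ' \n' | grep -b -o 1f8b0800 |
        awk -F: '$1 % 2 == 0 { print $1 / 2; exit }'
}

workdir=$(mktemp -d)
# Bytes 01 f8 b0 80 00 print as "01f8b08000", which contains "1f8b0800"
# starting at hex digit 1: a false match that is not byte-aligned.
# Then come two filler bytes, then a real gzip stream at byte offset 7.
{ printf '\001\370\260\200\000AB'; echo "payload" | gzip -c; } > "$workdir/fake-vmlinuz"

offset=$(find_gzip_offset "$workdir/fake-vmlinuz")
echo "$offset"

rm -rf "$workdir"
```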

And finally, I can search for my strings:

$ strings /tmp/vmlinux | grep microcode
microcode    : 0x%x
4Atom PSE erratum detected, BIOS microcode update recommended
6perf_event_intel: PEBS enabled due to microcode update
6perf_event_intel: PEBS disabled due to CPU errata, please upgrade
0mce: [Hardware Error]: PROCESSOR %u:%x TIME %llu SOCKET %u APIC %x
microcode %x


Update: Shortly after writing this blog post, I found that (GNU) grep can search binary files for hex values with a couple more options: -a to treat binary files as text, and -P to use Perl-compatible regular expressions.  This shortens the command to find the offset to the much simpler:

$ grep -a -b -o -P '\x1f\x8b\x08\x00' /boot/vmlinuz-3.6.2-4.fc17.x86_64 | awk -F: '{print $1}'

And if I use a couple of variables, it becomes much easier to read:

$ export VMLINUZ=/boot/vmlinuz-3.6.2-4.fc17.x86_64
$ SKIP=$(grep -aboP '\x1f\x8b\x08\x00' $VMLINUZ | awk -F: '{print $1}')
$ dd bs=1 skip=$SKIP if=$VMLINUZ | gzip -d -c > /tmp/vmlinux
4814195+0 records in
4814195+0 records out
4814195 bytes (4.8 MB) copied, 5.15674 s, 934 kB/s

gzip: stdin: decompression OK, trailing garbage ignored
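
Two things can trip up this shorter version.  In a UTF-8 locale, grep -P can treat \x8b as a character rather than a raw byte, and forcing the C locale is a common workaround.  And if the signature occurs more than once, $SKIP ends up holding several offsets.  The sketch below, which assumes GNU grep built with -P support, tries each candidate in turn and keeps the first one that gzip can actually decompress.  The helper name extract_first_gzip and the synthetic test file are made up for the illustration.

```shell
# Try each offset where the gzip signature appears and keep the first
# candidate that gzip can decompress.  GNU gzip exits with status 2 for
# warnings (such as trailing garbage), which is acceptable here.
extract_first_gzip() {
    image=$1
    out=$2
    for skip in $(LC_ALL=C grep -aboP '\x1f\x8b\x08\x00' "$image" | cut -d: -f1); do
        tail -c +$((skip + 1)) "$image" | gzip -d -c > "$out" 2>/dev/null
        status=$?
        if [ "$status" -eq 0 ] || [ "$status" -eq 2 ]; then
            echo "$skip"
            return 0
        fi
    done
    return 1
}

workdir=$(mktemp -d)
# A fake 10-byte "header" that merely looks like gzip, followed by a
# real gzip stream at byte offset 10.
{ printf '\037\213\010\000XXXXXX'; echo "payload" | gzip -c; } > "$workdir/fake-vmlinuz"

offset=$(extract_first_gzip "$workdir/fake-vmlinuz" "$workdir/vmlinux")
content=$(cat "$workdir/vmlinux")
echo "offset=$offset content=$content"

rm -rf "$workdir"
```

The first candidate at offset 0 is rejected because the bytes after the fake header are not valid deflate data, so the function moves on to the real stream.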


  1. Thank you for the excellent illustration! A bzimage that I just inspected has two gzip blocks. The first one is the kernel .config and the second one is a cpio archive.

  2. Yes, it is really great and helpful!!!

  3. Also, if you use `cut -d: -f1` instead of awk to extract the offset, the command becomes even shorter, easier to type (no '{}'), and requires fewer resources.