Friday, February 15, 2008

HP-UX aCC compiler bug

I've been working on and off for the last couple of days on a crash in a program that runs on HP-UX. Here is where the code was crashing (a simplified version of the code in question):

void PollData::toBinary(BinaryPollData *bpd) {
    memset(bpd, 0, sizeof(BinaryPollData));
    bpd->bpd_version = 4;
    bpd->cpu_pct = 5; // <--- CRASHES HERE
    ...
}

Now this is a bit strange, since I am just setting a couple of fields in a struct that was initialized to zero. Also, why does it crash when setting the "cpu_pct" field? Why not crash on the line before, when setting the "bpd_version" field?

It turns out I uncovered a compiler bug.

First, let's look at the struct definition itself:

#pragma pack (1)
struct BinaryPollData {
    short bpd_version;
    int cpu_pct;
    ...
};
#pragma pack ()

Why is the struct packed? Because it's used in a wire protocol that supports multiple platforms, and you want the structure layout to be consistent across compilers, CPU architectures, and word sizes (32 vs 64 bit).
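To make the effect of packing concrete, here is a minimal sketch comparing the two layouts. The struct names NaturalLayout and PackedLayout are hypothetical two-field stand-ins (the real BinaryPollData has more fields, elided above), and the numbers in the comments assume a 2-byte short, a 4-byte int, and a compiler that honors the pack pragma:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in with the compiler's default layout: padding is
// inserted after bpd_version so that cpu_pct lands on an int boundary.
struct NaturalLayout {
    short bpd_version;
    int   cpu_pct;
};

// Same fields, packed to 1-byte alignment as in the post: no padding.
#pragma pack (1)
struct PackedLayout {
    short bpd_version;
    int   cpu_pct;
};
#pragma pack ()

// Byte offset of cpu_pct in each layout.
inline std::size_t naturalCpuPctOffset() { return offsetof(NaturalLayout, cpu_pct); }
inline std::size_t packedCpuPctOffset()  { return offsetof(PackedLayout, cpu_pct); }
```

Under those assumptions the packed offset of cpu_pct is 2, which gives every platform the same layout but also means the field is no longer guaranteed to sit on a 4-byte boundary.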

Shamoun came by a couple of days ago, saw that I had the disassembled program on my screen, and half-jokingly asked, "How come it always comes down to the disassembly with you?" Well, here's why. After ruling out an invalid memory reference, inconsistent definition/use of the struct, use of memory after free, thread stack overflow, heap corruption, stack corruption, and a variable scoping conflict, and after trying with the optimizer turned off, I couldn't think of anything else to do but disassemble the function and break out the PA-RISC 1.1 Architecture and Instruction Set Reference Manual.

So here it is:

;;; void PollData::toBinary(BinaryPollData *bpd) {
0xfd920 stw %rp,-0x14(%sp)
0xfd924 stw,ma %r3,0x40(%sp)
0xfd928 stw %r4,-0x3c(%sp)
0xfd92c copy %r25,%r4
0xfd930 copy %r26,%r3
;;; memset(bpd, 0, sizeof(BinaryPollData));
0xfd934 ldi 0,%r25
0xfd938 ldi 0xce,%r24
0xfd93c call 0xfd8f8
0xfd940 copy %r4,%r26
;;; bpd->bpd_version = BPD_VERSION;
0xfd944 ldi 4,%r25
0xfd948 sth %r25,0(%r4)
;;; bpd->cpu_pct = 5;
0xfd94c ldi 5,%r26
0xfd950 stw %r26,2(%r4)
...

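The crash happens at 0xfd950. The value of register 4 is 0x76547bd8. See it yet? Perhaps the definition of STW would help: STW (Store Word) writes the 32-bit word in a general register to the effective address formed from a base register plus a displacement, and that address must be aligned on a word (4-byte) boundary.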

So the line in question:
0xfd950 stw %r26,2(%r4)
basically says "take the value in register 26 and store it in the memory location formed by taking the address in register 4 and adding 2." In other words, stick the value 5 into memory address 0x76547bda.

It boils down to the compiler not handling unaligned access properly. Register 4 holds the address of the struct (0x76547bd8), and the cpu_pct member sits at offset 2 within it. The resulting address, 0x76547bda, is not aligned on a 32-bit boundary, so the STW instruction raises a bus error when attempting to write the content of register 26 (0x05) to that address.
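The arithmetic is easy to check. The snippet below is just a sketch of that check, using the register value and offset from the disassembly; the helper isAligned and the constant names are mine, not anything from the program being debugged:

```cpp
#include <cassert>
#include <cstdint>

// Returns true when addr is a multiple of width (width must be a power of two).
constexpr bool isAligned(std::uint32_t addr, std::uint32_t width) {
    return (addr & (width - 1)) == 0;
}

constexpr std::uint32_t kBase   = 0x76547bd8;       // value of %r4 at the crash
constexpr std::uint32_t kOffset = 2;                // displacement in "stw %r26,2(%r4)"
constexpr std::uint32_t kTarget = kBase + kOffset;  // effective address of the store

static_assert(kTarget == 0x76547bda, "effective address of the stw");
static_assert(isAligned(kBase, 4), "the struct itself starts on a word boundary");
static_assert(!isAligned(kTarget, 4), "the stw target is not word-aligned");
static_assert(isAligned(kTarget, 2), "but it is halfword-aligned");
```

The last check is the interesting one: the address is fine for a 16-bit store, just not for a 32-bit one.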

What should it have done? Writing a small foo.c test shows that the code emitted by the compiler should have been something along the lines of:

0x2bfc ldi 5,%r31
0x2c00 extrw,u %r31,15,16,%r19
0x2c04 sth %r19,-0x3e(%sp)
0x2c08 sth %r31,-0x3c(%sp)

The above code breaks the value into two halfwords and stores them separately.
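For anyone who doesn't read PA-RISC assembly, here is roughly the same idea as a C++ sketch. The names Halfwords, splitWord and storeWordAsHalfwords are hypothetical helpers of mine, not part of the original program, and the byte-wise stores assume big-endian order (PA-RISC's byte order) so the result is the same on any host:

```cpp
#include <cassert>
#include <cstdint>

// Upper and lower 16-bit halves of a 32-bit word.
struct Halfwords {
    std::uint16_t high;
    std::uint16_t low;
};

// Splits value into halves, mirroring what the extrw,u instruction does for
// the upper half while the lower half stays in the original register.
inline Halfwords splitWord(std::uint32_t value) {
    Halfwords h;
    h.high = static_cast<std::uint16_t>(value >> 16);
    h.low  = static_cast<std::uint16_t>(value & 0xffffu);
    return h;
}

// Writes value to dst as two halfwords, upper half first (big-endian).
// Each halfword is written byte by byte, so dst needs no particular alignment.
inline void storeWordAsHalfwords(unsigned char* dst, std::uint32_t value) {
    const Halfwords h = splitWord(value);
    dst[0] = static_cast<unsigned char>(h.high >> 8);
    dst[1] = static_cast<unsigned char>(h.high & 0xff);
    dst[2] = static_cast<unsigned char>(h.low >> 8);
    dst[3] = static_cast<unsigned char>(h.low & 0xff);
}
```

Calling storeWordAsHalfwords(buffer + 2, 5) on a zeroed byte buffer writes the value at the same offset as cpu_pct without ever issuing a 32-bit store to a misaligned address.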