Thursday, February 8, 2007

Another bug in Perl core

Fun fun fun. Looks like I found another bug in Perl core today. Been trying to track down the source of a crash for the last couple of days. After finally getting Perl and all my modules compiled with debug info, got this stack trace (truncated by me):


(gdb) bt
#0 0x08179556 in Perl_vmess (my_perl=0x85e79d0, pat=0xf4c51060 "%s handle %s is not a DBI handle", args=0xf7526ad4) at util.c:1013
#1 0x08179b8b in S_vdie_croak_common (my_perl=0x85e79d0, pat=0xf4c51060 "%s handle %s is not a DBI handle", args=0xf7526ad4,
msglen=0xf7526ab0, utf8=0xf7526aac) at util.c:1088
#2 0x0817a214 in Perl_vcroak (my_perl=0x85e79d0, pat=0xf4c51060 "%s handle %s is not a DBI handle", args=0xf7526ad4) at util.c:1208
#3 0x0817a364 in Perl_croak_nocontext (pat=0xf4c51060 "%s handle %s is not a DBI handle") at util.c:1229
#4 0xf4c3beae in dbih_inner (orv=0x97651e4, what=0xf4c511ad "dbih_getcom") at DBI.xs:735
#5 0xf4c3c195 in dbih_getcom2 (hrv=0x97651e4, mgp=0x0) at DBI.xs:814
#6 0xf4c437f0 in XS_DBI_dispatch (my_perl=0x85e79d0, cv=0x912d130) at DBI.xs:2629
#7 0x08197119 in Perl_pp_entersub (my_perl=0x85e79d0) at pp_hot.c:2888
#8 0x081247c8 in S_call_body (my_perl=0x85e79d0, myop=0xf75270e0, is_eval=0) at perl.c:2361
#9 0x081244a1 in Perl_call_sv (my_perl=0x85e79d0, sv=0x912d130, flags=150) at perl.c:2282
#10 0x081a16bf in Perl_sv_clear (my_perl=0x85e79d0, sv=0x9ba5074) at sv.c:5110
#11 0x081a1f5f in Perl_sv_free (my_perl=0x85e79d0, sv=0x9ba5074) at sv.c:5356
#12 0x0819874b in do_clean_objs (my_perl=0x85e79d0, sv=0x912d850) at sv.c:394
#13 0x0819868a in S_visit (my_perl=0x85e79d0, f=0x81986b1 , flags=524288, mask=524288) at sv.c:340
#14 0x081988c3 in Perl_sv_clean_objs (my_perl=0x85e79d0) at sv.c:434
#15 0x08120859 in perl_destruct (my_perl=0x85e79d0) at perl.c:565


This is pretty enlightening, since it tells me exactly what line the segfault occurred on (util.c line 1013). And it looks pretty simple:

if (GvIO(PL_last_in_gv) && IoLINES(GvIOp(PL_last_in_gv))) {


That is, until you expand the macros, which means the above line is really:

if ((((my_perl->Tlast_in_gv)) && (((SV*)(my_perl->Tlast_in_gv))->sv_flags & 0xff) == SVt_PVGV && (((XPVGV*)((my_perl->Tlast_in_gv))->sv_any)->xgv_gp) ? ((((XPVGV*)((my_perl->Tlast_in_gv))->sv_any)->xgv_gp)->gp_io) : 0) && ((XPVIO*) (((((XPVGV*)((my_perl->Tlast_in_gv))->sv_any)->xgv_gp)->gp_io))->sv_any)->xio_lines) {

Also raises the question as to why nobody else has seen this. Did some poking with gdb, and found out that the first part of the if statement does evaluate to a non-null value. The segfault actually occurs in the second part. You see, GvIOp() actually translates to this:

(((((XPVGV*)((my_perl->Tlast_in_gv))->sv_any)->xgv_gp)->gp_io))


and then the result is taken and passed into IoLINES(), which just dereferences it like so:

->sv_any->xio_lines

BUT.... "->sv_any" is 0x0 according to the debugger. In other words, the attempt to offset for xio_lines is a null dereference.

(gdb) print ((((((XPVGV*)((my_perl->Tlast_in_gv))->sv_any)->xgv_gp)->gp_io))->sv_any)
$5 = (XPVIO *) 0x0

CRASH.

So, why doesn't this happen more often? Well, take a look again at the stack. The problem occurs when the Perl interpreter is trying to figure out what Op code and therefore what line number the error occurred on (that's what Perl_vmess does after all). But in this case, the error is actually occurring while shutting down DBI, and in particular in the XS portion of DBI (DBI.xs line 735). So the Perl interpreter cannot figure out what line it occurred on since the error occurred in native code.

Given all this, the fix looks pretty simple. Just don't use GvIOp() if its null.


if (GvIO(PL_last_in_gv) &&
GvIOp(PL_last_in_gv) &&
IoLINES(GvIOp(PL_last_in_gv))) {


Neato.

On a separate note , I still have to get to the bottom of why the handle is not a DBI handle (which is what prompted the error message in the first place), but at least it won't crash anymore...