Ticket #4 (closed defect: fixed)

Opened 5 years ago

Last modified 5 years ago

Compiling with GDC causes runtime segmentation faults or undefined symbols

Reported by: Pse
Assigned to: Pse
Priority: major
Milestone: Pre-release 8
Component: gtkd - classes
Version: TRUNK
Keywords: gdc segmentation fault
Cc:

Description (Last modified by JJR)

A tough one to crack. Compiling both libraries and programs seems to work fine using GDC. However, the resulting binaries segfault immediately when run:

Loaded lib = libgtk-x11-2.0.so
Loaded lib = libglib-2.0.so
Loaded lib = libatk-1.0.so
Loaded lib = libgobject-2.0.so
Loaded lib = libgdk-x11-2.0.so
Loaded lib = libgdk_pixbuf-2.0.so
Loaded lib = libgthread-2.0.so
Loaded lib = libpango-1.0.so
Segmentation fault (core dumped)

A quick gdb backtrace points to g_option_context_get_summary(). Probably more information will be needed to track this bug down.

Attachments

gdc_undefsymbol_workaround.sh (1.6 kB) - added by Pse on 12/20/07 09:43:01.
Workaround for GDC codegen bug. Fixes undefined symbols while compiling programs. Segmentation faults are not fixed by this.

Change History

12/19/07 13:01:21 changed by Pse

  • owner changed from JJR to Pse.

12/19/07 13:01:28 changed by Pse

  • status changed from new to assigned.

12/20/07 09:40:45 changed by Pse

  • summary changed from Compiling with GDC causes runtime segmentation faults to Compiling with GDC causes runtime segmentation faults or undefined symbols.

It seems we're dealing with two separate bugs. The first one is causing the segmentation faults described above. The other one is an apparently known codegen bug first described in Tango ticket #450. We can work around this bug in a similar fashion to the Tango guys, at least until gdc gets fixed. Since this is only an ugly hack, it'd be wiser to make no permanent changes to the codebase. Ergo, we now have this little script that makes the necessary changes to each file.

Of course, this doesn't fix the first problem...

12/20/07 09:43:01 changed by Pse

  • attachment gdc_undefsymbol_workaround.sh added.


12/21/07 07:41:30 changed by ShprotX

The reason is that strings, which are passed to gtk_init in GtkD.init(), are not null-terminated.

12/21/07 08:34:40 changed by Pse

I don't think that's the case, ShprotX, for two reasons:
1. There are no segmentation faults with DMD.
2. Even when properly terminated strings are passed, programs segfault.

12/22/07 11:12:39 changed by Pse

(follow-up: ↓ 15 ) 12/25/07 11:16:16 changed by JJR

A question: when you experience this problem, are you using dsss to compile the gtkd demos and library? (I also assume this is linux specific).

The reason I ask is because there is a similar problem with project guisterax: project guisterax compiles, links, and runs with no issues when built via a makefile (using gdc). The same project (using dsss with gdc) compiles and links with dsss, but the resulting binary segfaults on run in similar fashion to gtkD... right after the dynamic loading of the libraries. In fact, it seems to be the first call to a shared library function that causes the trouble (in the case of guisterax, it happens in SDL_Init). The libraries are being loaded correctly in both cases and the loaded functions appear to be valid (non-null), but I can't figure out what the problem is. Could be a faulty codegen also, but it's very hard to trace.

Incidentally, I never get the undefined symbol error in the link stage.

  • dsss 0.73 (tried 0.74 but that version won't build the project with dmd)
  • gdc 0.24
  • Opensuse Linux 10.3 (Intel Pentium M)
  • codebase: branches/experimental/alternate1

12/25/07 11:29:38 changed by Pse

I'm inclined to believe the undefined symbols bug is unrelated to the segmentation faults we're both seeing. You won't get any undefined symbols as long as you're not building a library with enums that have default values in them.
As to the segmentation faults, I was about to bring this up in the newsgroup. It does seem hard to trace. My backtraces usually show data that seems correct, though I've seen some stuff that may be questionable, such as different calls just before segfaulting (stack corruption?). I'm at a loss as to how to proceed here; I just hope some of the GDC guys can give us some pointers.

Merry Christmas!

12/25/07 11:34:29 changed by Pse

Forgot to say... I've been trying to come up with a simplified (and clean) test case for this problem (segfaults), but I haven't had any luck. I guess I'll just have to tell everybody to build the full library and test for themselves. Not nice.

(follow-up: ↓ 12 ) 12/25/07 11:38:33 changed by Pse

Forgot to say... again... that you won't get any undefined symbols while building the library (that is gtkD). You'll have to first build the library, then install it, and at last try to compile the demos. Cairo compiles cleanly as it doesn't seem to need to link against any faulty enums, but as soon as you get into building any of the other demos you'll get an undefined symbol.

(follow-up: ↓ 13 ) 12/25/07 11:50:44 changed by Pse

JJR, would you mind posting the arguments passed to gdc when building guisterax by makefile and by DSSS?

dsss build -v

We may find something interesting.

(in reply to: ↑ 10 ) 12/25/07 12:08:49 changed by JJR

Replying to Pse:

Forgot to say... again... that you won't get any undefined symbols while building the library (that is gtkD). You'll have to first build the library, then install it, and at last try to compile the demos. Cairo compiles cleanly as it doesn't seem to need to link against any faulty enums, but as soon as you get into building any of the other demos you'll get an undefined symbol.

Ah! right! That was the problem... I was just testing with the cairo example. :P

(in reply to: ↑ 11 ) 12/25/07 12:24:33 changed by JJR

Replying to Pse:

JJR, would you mind posting the arguments passed to gdc when building guisterax by makefile and by DSSS?

dsss build -v

We may find something interesting.

Sure thing, here's the link to the output of dsss: http://paste.dprogramming.com/dpkbxaat

and here's the guisterax makefile line (release version):

release: src/*.d
	gdc -frelease -o guisterax src/*.d src/derelict/util/*.d src/derelict/sdl/*.d -Isrc/ -Isrc/derelict -ldl

I see a "-q,-rdynamic" in the link stage passed to gdmd. I'm not familiar with those. The other difference is that gdmd is passed a -version=Posix automatically in the compile phase of timer.d.

The major difference could be in gdmd being used versus gdc.

12/25/07 12:35:33 changed by JJR

I think it's time to build a makefile for gtkd to help figure out this problem.

(in reply to: ↑ 7 ; follow-up: ↓ 16 ) 12/27/07 15:09:53 changed by Mike Wey

Replying to JJR:

A question: when you experience this problem, are you using dsss to compile the gtkd demos and library? (I also assume this is linux specific).

I just built libgtkd.a by hand, and it has the same segfault problem when trying the gtkD demo.

(in reply to: ↑ 15 ; follow-up: ↓ 17 ) 12/27/07 15:22:59 changed by Pse

Replying to Mike Wey:

I just built libgtkd.a by hand, and it has the same segfault problem when trying the gtkD demo.

Did you use gdmd or gdc?

(in reply to: ↑ 16 ; follow-up: ↓ 18 ) 12/28/07 08:16:24 changed by Mike Wey

Replying to Pse:

Did you use gdmd or gdc?

gdc

(in reply to: ↑ 17 ; follow-ups: ↓ 19 ↓ 20 ) 12/28/07 08:28:10 changed by Mike Wey

I can build the lib and the example with gdc 0.23 without any problems.

gcc version 4.1.1 20060524 ( (gdc 0.23, using dmd 1.007))

(in reply to: ↑ 18 ) 12/28/07 10:02:27 changed by Mike Wey

Another follow-up: if I comment out

notebook.appendPage(new TestTreeView1,"TreeView 1");

on line 319 of TestWindow.d in the gtkD example, I am able to build it with gdc 0.24.

gcc version 4.1.2 20070214 ( (gdc 0.24, using dmd 1.020))

(in reply to: ↑ 18 ) 12/28/07 12:56:49 changed by Pse

Replying to Mike Wey:

I can build the lib and the example with gdc 0.23 without any problems.

gcc version 4.1.1 20060524 ( (gdc 0.23, using dmd 1.007))

Meaning no segmentation faults when running the examples? Or no undefined symbols? Or both? I'll downgrade to gdc 0.23 and try it out.

(follow-up: ↓ 22 ) 12/28/07 14:34:27 changed by JJR

I missed these posts until now. So the problem still exists with gdc compilation by hand, but not with gdc 0.23. If there is a way to reduce this problem for gdc 0.24, that would be optimal. :P So I guess we can keep investigating TestWindow.d and see what's causing it?

The forum post indicated an issue with the doubly indirect pointer...this could go back to an issue with GtkD.init where the D char[][] is manually converted to a char**... maybe we should try to reimplement that method completely... this time using a malloc (or merely a local variable) instead of a new before passing the pointer to gtk_init. It could be related to the memory access.

Just a thought.
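For illustration, here is a minimal C sketch of the memory layout such a reimplementation would have to hand to gtk_init: a malloc'd array of NUL-terminated strings, itself terminated by a NULL pointer. The names d_slice, slices_to_argv and free_argv are hypothetical and exist only for this example; d_slice is a stand-in for a D char[] (length plus pointer, no trailing '\0'). This is not the GtkD implementation, only a sketch under those assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a D char[]: a length and a pointer, with no
   guarantee that the bytes are followed by a '\0'. */
typedef struct {
    size_t      length;
    const char *ptr;
} d_slice;

/* Builds a malloc'd, NULL-terminated argv whose entries are NUL-terminated
   copies of the given slices. Returns NULL if any allocation fails. */
char **slices_to_argv(const d_slice *args, size_t count)
{
    char **argv = malloc((count + 1) * sizeof *argv);
    if (argv == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        argv[i] = malloc(args[i].length + 1);
        if (argv[i] == NULL) {
            /* Undo the copies made so far before giving up. */
            while (i > 0)
                free(argv[--i]);
            free(argv);
            return NULL;
        }
        if (args[i].length > 0)
            memcpy(argv[i], args[i].ptr, args[i].length);
        argv[i][args[i].length] = '\0';   /* the terminator a D slice lacks */
    }
    argv[count] = NULL;                   /* C convention: argv[argc] == NULL */
    return argv;
}

/* Frees an array produced by slices_to_argv. */
void free_argv(char **argv)
{
    if (argv == NULL)
        return;
    for (char **p = argv; *p != NULL; p++)
        free(*p);
    free(argv);
}
```

The real GTK entry point is declared as void gtk_init(int *argc, char ***argv), so the caller passes the addresses of its own argc and argv variables. GTK may remove the options it recognizes from the array, so the caller should read argc again afterwards rather than assume it is unchanged.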

(in reply to: ↑ 21 ) 12/28/07 15:42:55 changed by Mike Wey

I think I tracked down the problem with the gtkD example: it comes from the char** in TestTreeView1.d. They are null when compiled with gdc-0.24, but are not null with gdc-0.23 and dmd 1.024.

See Bugzilla Issue 1751: http://d.puremagic.com/issues/show_bug.cgi?id=1751

I was able to get it down to:

import std.stdio;

void main()
{
	//prints true with gdc-0.24,
	//and false with gdc-0.23 and dmd 1.024
	writefln(test is null);
	assert(test, "test is only null in gdc 0.24");
}

char** test = ["hello", "test", "    "];

12/28/07 15:54:21 changed by Pse

I actually saw valid data being passed to gtk_init() when debugging with gdb. This is really odd. But in any case, this seems to nail it. It's amazing just how many different gdc bugs gtkD was able to hit at the same time. I mean, what about all the other projects? I'm sure someone must have hit something on his or her way, it's like a huge pile of stones on a tiny little roadway.

I'll finish some pending commits and get on with testing gdc-0.23. We may have gdc and docs ready for pre-release 8 after all.

(follow-up: ↓ 25 ) 12/28/07 16:18:59 changed by JJR

Yep, now gtkD just needs a workaround implemented for gdc 0.24. Thank you very much for tracking this down, Pse and Mike.

Concerning gtk_init(), I noticed the same as you, Pse. I think the data and pointer may be valid up to a certain point, but, later, the pointer may become invalid once inside the gtk library... not sure. But it would be good to see if a workaround will fix this too. Basically we need to try an alternate method of doing the same thing in GtkD.init.

I'm still not sure why the guisterax project works when built with gdc directly but not when built with dsss. I may start looking for another instance of a char** lurking somewhere in there. It's good to at least have some idea of what's going on.

(in reply to: ↑ 24 ; follow-up: ↓ 26 ) 12/28/07 18:11:55 changed by JJR

Regarding dsss builds versus plain gdc builds, the difference between success and failure is whether the -rdynamic flag is present (it works when the flag is absent; dsss passes this flag).

See discussion in this topic: http://www.dsource.org/forums/viewtopic.php?t=3475

(in reply to: ↑ 25 ; follow-up: ↓ 27 ) 12/28/07 19:00:07 changed by Mike Wey

Replying to JJR:

Regarding dsss builds versus plain gdc builds, the difference between success and failure is whether the -rdynamic flag is present (it works when the flag is absent; dsss passes this flag).

See discussion in this topic: http://www.dsource.org/forums/viewtopic.php?t=3475

When linking with the -rdynamic flag, not only does GtkD.init generate a segfault, but so does GtkD.setLocale(), for example. The -rdynamic flag might be messing up the linker in GtkD.

(in reply to: ↑ 26 ) 12/28/07 19:47:20 changed by JJR

Yes, this kind of makes sense given the info on -rdynamic. Since GtkD.init contains references to the dynamically loaded symbols (gtk_init, etc.), problems will always begin with every call to these symbols inside the GtkD classes. gtk_init, I believe, is pretty much the first shared library call that is made.

-rdynamic, as passed to the linker, seems to mess with how these symbols are registered in the dynamic global symbol table. I'm guessing this creates some sort of stack or register corruption as the calls progress inside the gtk libraries.

It's enough to know that this doesn't work and -rdynamic cannot be used in this situation; but if we were really curious what was going on, I guess we could compare stack and register outputs of versions of the binaries linked with and without -rdynamic (zerobugs shows all this information in fairly user-friendly manner).
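One plausible reading of the above, which this ticket does not confirm: a binding that loads a library at run time typically stores each function's address in a global function pointer, and if that pointer carries the same name as the C function, -rdynamic can export it from the executable and shadow the library's own symbol. The C sketch below shows the run-time loading pattern with the pointer kept out of the dynamic symbol table. The names loaded_strlen and load_symbols are hypothetical, and libc's strlen stands in for gtk_init so the example does not need GTK.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Pointer type for the function that is looked up at run time. */
typedef size_t (*strlen_fn)(const char *);

/* 'static' gives the pointer internal linkage, so it is never exported,
   with or without -rdynamic, and its name differs from the C symbol. */
static strlen_fn loaded_strlen;

/* Looks up "strlen" in the running program's global symbol scope.
   Returns 0 on success and -1 on failure. */
int load_symbols(void)
{
    /* dlopen(NULL, ...) returns a handle for the main program and the
       libraries it was loaded with. */
    void *handle = dlopen(NULL, RTLD_NOW);
    if (handle == NULL)
        return -1;

    /* POSIX-documented idiom for storing a dlsym result in a function
       pointer without a direct object-to-function pointer cast. */
    *(void **)(&loaded_strlen) = dlsym(handle, "strlen");
    return loaded_strlen != NULL ? 0 : -1;
}
```

After load_symbols() returns 0, calls go through the pointer, for example loaded_strlen("gtkD"). On older glibc versions the program has to be linked with -ldl.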

12/28/07 20:31:03 changed by Pse

I've added modified rebuild configuration files in r339. You can now use dsss to build gtkd without passing -rdynamic as an argument to the linker. Please test it by running:

dsss build -dc=gdc-posix.dsss

Or

dsss build -dc=gdc-posix-tango.dsss

...if you're using Tango.

If everything works well, I'll update the README files with a notice for those who want to use gdc.

12/29/07 00:06:47 changed by JJR

Actually there is no need to add new config files, thankfully. Also the library may be built with the -rdynamic option enabled (leave as is) because there is no link stage when building the library. The link stage is where -rdynamic takes effect. This means we just leave things as is in the trunk directory. GtkD users just do a "dsss build" as usual to make the library.

Now for the demos... In the demo section, we merely need to add "-no-export-dynamic" to the "buildflags" variable of the demo's dsss.conf and build them as usual with a "dsss build". Since the link stage now comes into effect, this is the time "-rdynamic" would activate if it were enabled. "-no-export-dynamic" disables it, and the demos compile as they should (well... up until the other bugs show up).

I've tested this on the cairo example and it works fine with gdc 0.24 and dsss. It's nice to know rebuild has this flag. :)

12/29/07 00:14:06 changed by Pse

Great finding, JJR. And you're right about the linking stage in gtkD, it's a library we're building after all. I'll get to changing the right dsss.conf files if you haven't changed them already. A proper warning message should be added to README_DSSS, as users will like to know why they are getting segfaults.

12/29/07 00:30:12 changed by JJR

Yep, go ahead and add the changes if you will. I have to "svn up" anyway, which might take a while over a slow modem connection :(. I guess all the demos' dsss.conf files will need the addition.

(follow-up: ↓ 34 ) 12/29/07 01:09:12 changed by Pse

In r340. Thanks a lot to both of you, JJR and Mike. I'll report this on the forums and leave this bug open till it's been properly tested that this works. Support for gdc remains limited to versions 0.22 and 0.23 for now. Proper notices have been added to the README files.

12/29/07 01:27:08 changed by Pse

  • version changed from 1.0 to HEAD.

(in reply to: ↑ 32 ) 12/29/07 10:16:30 changed by Mike Wey

Tested with dsss+gdc-0.23, dsss+gdc-0.24 and dsss+dmd 1.024 using r340.

dsss+gdc-0.23: All examples build. Only gtk/DrawRect gets an:

AssertError Failure gdk/Window.d(272) struct gdkWindow is null on constructor

The rest of the examples work.

dsss+gdc-0.24: Needs gdc_undefsymbol_workaround.sh. All examples build. gtkD/gtkDtests segfaults (the char** issue). gtk/DrawRect gets an:

AssertError Failure gdk/Window.d(272) struct gdkWindow is null on constructor

The rest of the examples work.

dsss+dmd 1.024: All examples build. Only gtk/DrawRect gets an:

AssertError Failure gdk/Window.d(272) struct gdkWindow is null on constructor

The rest of the examples work.

12/29/07 11:12:34 changed by Pse

Well, to be honest, that seems to be the expected result of that demo:

module gtk.DrawRect;

import gdk.Window;
import glib.ListG;
import glib.ListSG;
import glib.Source;


void main(char[][] args)
{
        (new Window(null)).drawRectangle(null, false, 0,0,0,0);
}

Ticket #12. Rewrite/disable non-working demos.

12/29/07 11:49:45 changed by JJR

Actually, drawRectangle appears to be there just to test the null constructor exception. After all, it's just drawing a rectangle with 0,0,0,0 size/coordinates.

So the options are:

  • make it a working example, or
  • leave it as is and rename it "exceptionTest" to indicate its actual purpose.

12/29/07 11:53:33 changed by JJR

  • description changed.

01/01/08 09:05:49 changed by Pse

  • status changed from assigned to closed.
  • resolution set to fixed.

I am setting this to fixed as there are no easily implementable fixes for gdc-0.24 and previous versions work fine with the added changes. If anyone feels gdc support needs more care, feel free to reopen this ticket.