kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Sun Jul 11, 2004 2:17 am Post subject: |
|
|
JJR wrote: | Program received signal SIGSEGV, Segmentation fault.
0x0806abad in _D3gcx3Gcx4markFPvPvZv()
I don't know what "mark" is... |
Just had a thought that it might be worth listing the code on either side of this address to see if we, or someone else, can figure out which module it's in. The additional symbols might shed some light on the matter (I presume gdb can list the assembly with symbolic info embedded?). |
|
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
|
Posted: Sun Jul 11, 2004 2:51 am Post subject: |
|
|
Hope you had a good time at the concert. I'm sorry I didn't get back to your last response. I took a little break myself...
I've yet to play with the static destructor (ie remove it). I'll do that shortly. But to answer your most recent post above...
I couldn't get gdb to give me the location of "mark" (I'm still not quite familiar with how to use it), but I was able to do some hunting of my own. Basically, I decided to hunt down anything related to "mark" in all the object files using "grep." I turned up nothing in mango, but then I realized that it links to libphobos.a also. So I decided to grep that too. It gave a positive find. I figured I should grep the phobos source also then. There I found several references to "mark" in the gc subdirectory of the phobos src: mark() is defined and called within gcx.d (./dmd/src/phobos/internal/gc). Apparently it's used to "mark any pointers into the GC pool."
Actually, now that I look at the symbol, its location makes complete sense. I see gcx and Gcx embedded in the output. And just maybe the Pv Pv is the mangling for the 2 "void *" arguments...
Could this be related to the problem? If so what mango object would be causing a crash in the garbage collector? Maybe I'm getting off track here; but nevertheless, I thought this could be a lead.
Later,
John |
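For reference, JJR's reading of the symbol can be checked by hand. The sketch below is an illustration written in current D, not code from the thread or from Phobos: `splitMangledName` is a hypothetical helper that peels the length-prefixed names off a `_D` symbol and leaves the type suffix untouched. It assumes the simple mangling form seen in the backtrace and does not handle templates or back references.

```d
import std.ascii : isDigit;
import std.stdio : writeln;

// Hypothetical helper: splits the qualified-name part of a "_D" symbol into
// its components. Whatever follows the names (the type signature) is
// returned unchanged through `rest`.
string[] splitMangledName(string symbol, out string rest)
{
    string[] parts;
    if (symbol.length < 2 || symbol[0 .. 2] != "_D")
    {
        rest = symbol;          // not a D symbol; nothing to split
        return parts;
    }

    size_t i = 2;
    while (i < symbol.length && isDigit(symbol[i]))
    {
        // Read the decimal length prefix, e.g. the "3" in "3gcx".
        size_t len = 0;
        while (i < symbol.length && isDigit(symbol[i]))
        {
            len = len * 10 + (symbol[i] - '0');
            ++i;
        }
        if (i + len > symbol.length)
            break;              // malformed input; stop splitting
        parts ~= symbol[i .. i + len];
        i += len;
    }
    rest = symbol[i .. $];
    return parts;
}

void main()
{
    string rest;
    auto parts = splitMangledName("_D3gcx3Gcx4markFPvPvZv", rest);
    writeln(parts); // ["gcx", "Gcx", "mark"]
    writeln(rest);  // FPvPvZv
}
```

In the D ABI's type mangling, F opens a D-linkage function type, each Pv is a pointer to void, Z closes the parameter list, and the trailing v is the void return type. That agrees with the guess about two "void *" arguments on gcx.Gcx.mark.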
|
|
|
csauls
Joined: 27 Mar 2004 Posts: 278
|
Posted: Sun Jul 11, 2004 6:15 am Post subject: |
|
|
The following will generate that error:
Code: |
void writeVar(Var var, TextWriter tw) {
    switch (var.type) {
        case CLEAR:
            tw.put(NCLEAR); // fails. NCLEAR is int.
            break;
        case INT:
            tw.put(NINT).put(var.i);
            break;
        case FLOAT:
            tw.put(NFLOAT).put(var.f);
            break;
        case STR:
            tw.put(NSTR).put(var.s);
            break;
        case OBJ:
            tw.put(NOBJ).put(var.i);
            break;
        case ERR:
            tw.put(NERR).put(var.i);
            break;
        case LIST:
            tw.put(NLIST).put(var.l.length);
            foreach (inout Var x; var.l) {
                writeVar(x, tw);
            }
            break;
    }
}
|
_________________ Chris Nicholson-Sauls |
|
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
|
Posted: Sun Jul 11, 2004 10:37 am Post subject: |
|
|
Strange... On Linux, Windows, or both? |
|
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Sun Jul 11, 2004 11:58 am Post subject: |
|
|
JJR wrote: | Hope you had a good time at the concert. I'm sorry I didn't get back to your last response. I took a little break myself...
I've yet to play with the static destructor (ie remove it). I'll do that shortly. But to answer your most recent post above...
I couldn't get gdb to give me the location of "mark" (I'm still not quite familiar with how to use it), but I was able to do some hunting of my own. Basically, I decided to hunt down anything related to "mark" in all the object files using "grep." I turned up nothing in mango, but then I realized that it links to libphobos.a also. So I decided to grep that too. It gave a positive find. I figured I should grep the phobos source also then. There I found several references to "mark" in the gc subdirectory of the phobos src: mark() is defined and called within gcx.d (./dmd/src/phobos/internal/gc). Apparently it's used to "mark any pointers into the GC pool."
Actually, now that I look at the symbol, its location makes complete sense. I see gcx and Gcx embedded in the output. And just maybe the Pv Pv is the mangling for the 2 "void *" arguments...
Could this be related to the problem? If so what mango object would be causing a crash in the garbage collector? Maybe I'm getting off track here; but nevertheless, I thought this could be a lead. |
Gig was good; thanks John. I prefer much of the early material ~ but Rush are consummate pros, so it's usually a reasonable show. The washing machines made another appearance this year ...
Good job on tracking down the GC mark method. Unfortunately that tells us only that memory has been corrupted.
The way out of this is to strip functionality until it executes cleanly, and then perhaps make some fine-grained changes. How about adding the s.cancel() method as noted yesterday, or just comment out the
Code: | ms.join (ia);
ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen); |
incrementally (in reverse) ? |
|
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Sun Jul 11, 2004 12:11 pm Post subject: |
|
|
csauls wrote: | The following will generate that error:
Code: |
void writeVar(Var var, TextWriter tw) {
    switch (var.type) {
        case CLEAR:
            tw.put(NCLEAR); // fails. NCLEAR is int.
            break;
        case INT:
            tw.put(NINT).put(var.i);
            break;
        case FLOAT:
            tw.put(NFLOAT).put(var.f);
            break;
        case STR:
            tw.put(NSTR).put(var.s);
            break;
        case OBJ:
            tw.put(NOBJ).put(var.i);
            break;
        case ERR:
            tw.put(NERR).put(var.i);
            break;
        case LIST:
            tw.put(NLIST).put(var.l.length);
            foreach (inout Var x; var.l) {
                writeVar(x, tw);
            }
            break;
    }
}
|
|
Chris, I moved this into a separate thread/topic. Hope you don't mind. |
|
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
|
Posted: Sun Jul 11, 2004 2:45 pm Post subject: |
|
|
kris wrote: | The way out of this is to strip functionality until it executes cleanly, and then perhaps make some fine-grained changes. How about adding the s.cancel() method as noted yesterday, or just comment out the
Code: | ms.join (ia);
ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen); |
incrementally (in reverse) ? |
s.cancel() does nothing to fix the segfault.
Commenting out...
ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen);
seems to be the only thing that keeps the segfault from occurring. |
|
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Sun Jul 11, 2004 2:56 pm Post subject: |
|
|
Okay; that's good! |
|
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
|
Posted: Sun Jul 11, 2004 3:44 pm Post subject: |
|
|
Ok, here's what I've done so far:
I uncommented this again (in unittest.d):
Code: | ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen); |
Below that, I added a call to s.cancel().
And in SocketListener.d I removed the try/catch block so that it now looks like this:
Code: | override int run()
{
    while (true) {
        // try {
        // wait for incoming content
        reader.read (buffer);
        // time to quit?
        if (quit || Socket.isCancelled())
            break;
        // invoke callback
        notify (buffer);
        // } catch (Object x) {
        // time to quit?
        // if (quit || Socket.isCancelled())
        //     break;
        //Stderr.put ("SocketListener: ").put(x.toString).cr();
        // }
    }
    printf("listener exit\n");
    return 0;
} |
With these changes, it appears that s.cancel() works...
output:
Code: | receive(): 1000 bytes
listener exit
1501 INFO mango.unittest - Done
Socket.d: static Destructor
SocketListener destructor |
No segfault occurs. If I remove s.cancel() from testMulticast(), the segfault returns.
Later,
John |
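The effect of s.cancel() can be pictured with a generic worker thread that is asked to stop and then joined before the program exits. This is a rough sketch in current D using core.thread, not Mango's SocketListener; the `quit` flag and the `stopWorkerCleanly` name are assumptions made purely for the illustration.

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;
import core.time : msecs;
import std.stdio : writeln;

// Hypothetical stop flag, standing in for the listener's quit/cancel state.
shared bool quit;

// Starts a worker, asks it to stop, and waits for it to finish.
// Returns true if the worker is no longer running afterwards.
bool stopWorkerCleanly()
{
    atomicStore(quit, false);
    auto worker = new Thread({
        // Poll the flag instead of blocking forever, so the loop can end.
        while (!atomicLoad(quit))
            Thread.sleep(10.msecs);
    });
    worker.start();

    atomicStore(quit, true); // request shutdown (the role s.cancel() plays)
    worker.join();           // wait until the loop has actually exited
    return !worker.isRunning;
}

void main()
{
    writeln(stopWorkerCleanly());
}
```

The point of the join is that the worker has left its loop before any teardown code runs, which is the ordering the output above suggests once s.cancel() is in place.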
|
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Sun Jul 11, 2004 6:16 pm Post subject: |
|
|
Well, you've unearthed a logical codegen bug in both the Win32 and linux versions of DMD: the break statement within the try/catch block does not exit the enclosing while loop at all; in fact it does absolutely nothing useful!
The documentation states that a finally clause will always be executed first before the "break target" is reached; however, there's no finally clause here. Win32 seems to handle this situation at program termination without much ado, but linux definitely does not like it at all.
The upshot is that reader.read() is probably called yet again even after the OS has interrupted it, since the break statement does not exit the loop properly. Can you make the code look like this and try again please John?
Code: | override int run()
{
    while (true)
        try {
            // wait for incoming content
            reader.read (buffer);
            // time to quit?
            if (quit || Socket.isCancelled())
                return 0; // <<<<<< change here <<<<<<<
            // invoke callback
            notify (buffer);
        } catch (Object x)
        {
            // time to quit?
            if (quit || Socket.isCancelled())
                return 0; // <<<<<< change here <<<<<<<
            printf ("SocketListener: %.*s\n", x.toString());
            //Stderr.put ("Exception: "~x.toString).cr();
        }
    return 0;
} |
The static destructor in Socket.d needs to be restored also. Even if this doesn't resolve it, you've still found a codegen error. |
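A small, library-free check of the behaviour described above might look like the following. It is a sketch in current D, not the reduced test case from this thread; `iterationsBeforeBreak` is a made-up name, and the cap of ten passes is only a safety net so the program terminates even if the break has no effect.

```d
import std.stdio : writeln;

// Counts loop iterations; the break inside try should leave the while loop
// on the third pass. Returns -1 if the loop is still running at pass ten.
int iterationsBeforeBreak()
{
    int iterations = 0;
    while (true)
    {
        try
        {
            ++iterations;
            if (iterations == 3)
                break;          // expected to exit the enclosing while loop
            if (iterations >= 10)
                return -1;      // safety net: the break did not take effect
        }
        catch (Exception e)
        {
            break;
        }
    }
    return iterations;
}

void main()
{
    writeln(iterationsBeforeBreak());
}
```

A compiler that handles break inside try correctly returns 3 here; a result of -1 would point to the loop carrying on past the break, as suspected for run().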
|
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
|
Posted: Sun Jul 11, 2004 6:43 pm Post subject: |
|
|
kris wrote: | Well, you've unearthed a logical codegen bug in both the Win32 and linux versions of DMD: the break statement within the try/catch block does not exit the enclosing while loop at all; in fact it does absolutely nothing useful! |
Ahh... I was wondering what was going on... in fact it looked to me like it was getting stuck in there somehow, but I couldn't put my finger on it.
kris wrote: | The documentation states that a finally clause will always be executed first before the "break target" is reached; however, there's no finally clause here. Win32 seems to handle this situation at program termination without much ado, but linux definitely does not like it at all.
The upshot is that reader.read() is probably called yet again even after the OS has interrupted it, since the break statement does not exit the loop properly. Can you make the code look like this and try again please John? |
I believe you are correct. Previously, I inserted a printf() just after the while() and found that it executes at least twice, probably resulting in the second and unintended execution of reader.read().
kris wrote: | The static destructor in Socket.d needs to be restored also. Even if this doesn't resolve it, you've still found a codegen error. |
I was confused as to why the thread was getting stuck there. Good to know at least that the "exit" is indeed the issue. At any rate, I'll change the code per your instructions and test it out.
Later,
John |
|
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Sun Jul 11, 2004 6:47 pm Post subject: |
|
|
Then, the next thing I would try would be an isolated test case, with no join(), send(), or anything else, like this:
Code: | void testSocketListener()
{
    void listen (IBuffer buffer)
    {
        printf ("listener received %d bytes\n", buffer.readable());
    }
    MulticastSocket ms = new MulticastSocket ();
    ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen);
} |
and if that segfaults, then I would change the Socket type to be a SocketConduit instead:
Code: | SocketConduit ms = new SocketConduit ();
ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen); |
and try again. If that fails, then I'd try running a contrived thread like so:
Code: | class MyThread : Thread
{
    override int run()
    {
        while (true) {}   // spin forever
        return 0;
    }
}
void testThread()
{
    MyThread mt = new MyThread;
    mt.start();
}
|
which will start a separate thread and then exit the program, causing the thread to be terminated while it's thrashing away in the while-loop. What I'd be looking to discover is whether linux Threads generally barf during program termination, whether it's an interrupted datagram read operation that causes the problem, whether it's any interrupted socket read operation, or whether it's some combination thereof. |
|
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
|
Posted: Sun Jul 11, 2004 6:50 pm Post subject: |
|
|
That did it! No seg faults anymore after replacing the "break" with "return 0;"
I made sure the destructors were operational also.
Later,
John |
|
|
|
JJR
Joined: 22 Feb 2004 Posts: 1104
|
Posted: Sun Jul 11, 2004 6:56 pm Post subject: |
|
|
My previous post was an answer to your first solution concerning the exit bug. That fixed the problem. Just clarifying since you posted some more things to try after I posted my success. :-p
Out of sync posts can get confusing... |
|
|
|
kris
Joined: 27 Mar 2004 Posts: 1494 Location: South Pacific
|
Posted: Sun Jul 11, 2004 6:58 pm Post subject: |
|
|
Fu$k! Don't you just hate Beta software? Mango included ...
Thanks John, for yet another bug worked out on linux. For those who use Mango on linux ~ you should know it would not be running without the tenacious efforts of JJR |
|
|
|
|
|