Mango Beta 9

kris

Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Jul 11, 2004 2:17 am

JJR wrote:
Program received signal SIGSEGV, Segmentation fault.
0x0806abad in _D3gcx3Gcx4markFPvPvZv()

I don't know what "mark" is...

Just had a thought: it might be worth listing the code on either side of this address to see if we, or someone else, can figure out which module it's in. The additional symbols might shed some light on the matter (I presume gdb can list the assembly with symbolic info embedded?).

JJR

Joined: 22 Feb 2004
Posts: 1104

Posted: Sun Jul 11, 2004 2:51 am

Hope you had a good time at the concert. I'm sorry I didn't get back to your last response. I took a little break myself...

I've yet to play with the static destructor (i.e. remove it). I'll do that shortly. But to answer your most recent post above...

I couldn't get gdb to give me the location of "mark" (I'm still not quite familiar with how to use it), but I was able to do some hunting of my own. Basically, I decided to hunt down anything related to "mark" in all the object files using "grep." I turned up nothing in mango, but then I realized that it links to libphobos.a also. So I decided to grep that too. It gave a positive find. I figured I should grep the phobos source also then. There I found several references to "mark" in the gc subdirectory of the phobos src: mark() is defined and called within gcx.d (./dmd/src/phobos/internal/gc). Apparently it's used to "mark any pointers into the GC pool."

Actually, now that I look at the symbol, its location makes complete sense. I see gcx and Gcx embedded in the output. And just maybe the Pv Pv is the mangling for the 2 "void *" arguments... :)
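The guess about the symbol can be checked by hand. What follows is a minimal sketch in present-day D, not anything from Phobos or Mango; the helper name splitQualifiedName is made up for illustration. It peels the length-prefixed identifiers off the front of the mangled name and leaves the type signature behind. In that remainder, F introduces a D-linkage function, each Pv is a pointer to void, Z closes the parameter list, and the trailing v is a void return type.

```d
import std.ascii : isDigit;
import std.stdio : writeln;

// Hypothetical helper: splits the length-prefixed identifiers that follow
// "_D" in a mangled name. The undecoded type signature is stored in `rest`.
string[] splitQualifiedName(string mangled, out string rest)
{
    assert(mangled.length >= 2 && mangled[0 .. 2] == "_D", "not a D symbol");
    string s = mangled[2 .. $];
    string[] parts;
    while (s.length && isDigit(s[0]))
    {
        // Read the decimal length prefix, e.g. the "3" in "3gcx".
        size_t len = 0;
        while (s.length && isDigit(s[0]))
        {
            len = len * 10 + (s[0] - '0');
            s = s[1 .. $];
        }
        parts ~= s[0 .. len]; // the identifier itself
        s = s[len .. $];
    }
    rest = s;
    return parts;
}

void main()
{
    string rest;
    auto parts = splitQualifiedName("_D3gcx3Gcx4markFPvPvZv", rest);
    writeln(parts); // ["gcx", "Gcx", "mark"]
    writeln(rest);  // FPvPvZv
}
```

So the symbol reads as module gcx, aggregate Gcx, function mark, taking two void* arguments and returning void, which matches the mark() found in gcx.d. Current D runtimes also ship a core.demangle module for this job; the hand-rolled version is only here to show the structure.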

Could this be related to the problem? If so what mango object would be causing a crash in the garbage collector? Maybe I'm getting off track here; but nevertheless, I thought this could be a lead.

Later,

John

csauls

Joined: 27 Mar 2004
Posts: 278

Posted: Sun Jul 11, 2004 6:15 am

The following will generate that error:
Code:

void writeVar(Var var, TextWriter tw) {
  switch (var.type) {
    case CLEAR:
      tw.put(NCLEAR); // fails.  NCLEAR is int.
      break;

    case INT:
      tw.put(NINT).put(var.i);
      break;

    case FLOAT:
      tw.put(NFLOAT).put(var.f);
      break;

    case STR:
      tw.put(NSTR).put(var.s);
      break;

    case OBJ:
      tw.put(NOBJ).put(var.i);
      break;

    case ERR:
      tw.put(NERR).put(var.i);
      break;

    case LIST:
      tw.put(NLIST).put(var.l.length);
      foreach (inout Var x; var.l) {
        writeVar(x, tw);
      }
      break;
  }
}

_________________
Chris Nicholson-Sauls

JJR

Joined: 22 Feb 2004
Posts: 1104

Posted: Sun Jul 11, 2004 10:37 am

Strange... On Linux, windows, or both?

kris

Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Jul 11, 2004 11:58 am

JJR wrote:
Hope you had a good time at the concert. I'm sorry I didn't get back to your last response. I took a little break myself...

I've yet to play with the static destructor (i.e. remove it). I'll do that shortly. But to answer your most recent post above...

I couldn't get gdb to give me the location of "mark" (I'm still not quite familiar with how to use it), but I was able to do some hunting of my own. Basically, I decided to hunt down anything related to "mark" in all the object files using "grep." I turned up nothing in mango, but then I realized that it links to libphobos.a also. So I decided to grep that too. It gave a positive find. I figured I should grep the phobos source also then. There I found several references to "mark" in the gc subdirectory of the phobos src: mark() is defined and called within gcx.d (./dmd/src/phobos/internal/gc). Apparently it's used to "mark any pointers into the GC pool."

Actually, now that I look at the symbol, its location makes complete sense. I see gcx and Gcx embedded in the output. And just maybe the Pv Pv is the mangling for the 2 "void *" arguments... :)

Could this be related to the problem? If so what mango object would be causing a crash in the garbage collector? Maybe I'm getting off track here; but nevertheless, I thought this could be a lead.

Gig was good; thanks John. I prefer much of the early material ~ but Rush are consummate pros, so it's usually a reasonable show. The washing machines made another appearance this year ...

Good job on tracking down the GC mark method. Unfortunately that tells us only that memory has been corrupted.

The way out of this is to strip functionality until it executes cleanly, and then perhaps make some fine-grained changes. How about adding the s.cancel() call as noted yesterday, or just commenting out the
Code:
        ms.join (ia);
        ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen);

incrementally (in reverse)?

kris

Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Jul 11, 2004 12:11 pm

csauls wrote:
The following will generate that error:
Code:

void writeVar(Var var, TextWriter tw) {
  switch (var.type) {
    case CLEAR:
      tw.put(NCLEAR); // fails.  NCLEAR is int.
      break;

    case INT:
      tw.put(NINT).put(var.i);
      break;

    case FLOAT:
      tw.put(NFLOAT).put(var.f);
      break;

    case STR:
      tw.put(NSTR).put(var.s);
      break;

    case OBJ:
      tw.put(NOBJ).put(var.i);
      break;

    case ERR:
      tw.put(NERR).put(var.i);
      break;

    case LIST:
      tw.put(NLIST).put(var.l.length);
      foreach (inout Var x; var.l) {
        writeVar(x, tw);
      }
      break;
  }
}

Chris, I moved this into a separate thread/topic. Hope you don't mind.

JJR

Joined: 22 Feb 2004
Posts: 1104

Posted: Sun Jul 11, 2004 2:45 pm

kris wrote:
The way out of this is to strip functionality until it executes cleanly, and then perhaps make some fine-grained changes. How about adding the s.cancel() call as noted yesterday, or just commenting out the
Code:
        ms.join (ia);
        ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen);

incrementally (in reverse)?


s.cancel() does nothing to fix the segfault.

Commenting out...

ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen);

seems to be the only thing that keeps the segfault from occurring.

kris

Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Jul 11, 2004 2:56 pm

Okay; that's good!

JJR

Joined: 22 Feb 2004
Posts: 1104

Posted: Sun Jul 11, 2004 3:44 pm

Ok, here's what I've done so far:

I uncommented this again (in unittest.d):

Code:
ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen);


Below that, I added:

Code:
s.cancel();


And in SocketListener.d I removed the try/catch block so that it now looks like this:

Code:
override int run()
{
        while (true) {
//              try {
                        // wait for incoming content
                        reader.read (buffer);

                        // time to quit?
                        if (quit || Socket.isCancelled())
                            break;

                        // invoke callback
                        notify (buffer);
//              } catch (Object x) {
//                      // time to quit?
//                      if (quit || Socket.isCancelled())
//                          break;
//                      //Stderr.put ("SocketListener: ").put(x.toString).cr();
//              }
        }
        printf("listener exit\n");
        return 0;
}


With these changes, it appears that s.cancel() works...

output:

Code:
receive(): 1000 bytes
listener exit

1501 INFO mango.unittest - Done
Socket.d: static Destructor
SocketListener destructor


No segfault occurs. If I remove s.cancel() from testMulticast(), the segfault returns.

Later,

John

kris

Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Jul 11, 2004 6:16 pm

Well, you've unearthed a logical codegen bug in both the Win32 and linux versions of DMD: the break statement within the try/catch block does not exit the enclosing while loop at all; in fact it does absolutely nothing useful!

The documentation states that a finally clause will always be executed before the "break target" is reached; however, there's no finally clause here. Win32 seems to handle this situation at program termination without much ado, but linux definitely does not like it at all.
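The documented ordering is easy to check in isolation. This is a minimal sketch in present-day D rather than the 2004 compiler under discussion, and traceBreak is a made-up name. It records the order in which the try body, the finally clause, and the code after the loop run when a break leaves a try/finally.

```d
import std.stdio : writeln;

// Hypothetical helper: records the order of execution around a `break`
// that leaves a try/finally nested in a loop.
string[] traceBreak()
{
    string[] trace;
    while (true)
    {
        try
        {
            trace ~= "try body";
            break;              // should leave the while loop...
        }
        finally
        {
            trace ~= "finally"; // ...but only after this has run
        }
    }
    trace ~= "after loop";
    return trace;
}

void main()
{
    writeln(traceBreak()); // ["try body", "finally", "after loop"]
}
```

With no finally clause at all, as in SocketListener, the break should simply transfer control to the statement after the loop.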

The upshot is that reader.read() is probably called yet again even after the OS has interrupted it, since the break statement does not exit the loop properly. Can you make the code look like this and try again please John?
Code:
        override int run()
        {
                while (true)
                       try {
                           // wait for incoming content
                           reader.read (buffer);

                           // time to quit?
                           if (quit || Socket.isCancelled())
                               return 0; // <<<<<< change here <<<<<<<
                           
                           // invoke callback                       
                           notify (buffer);

                           } catch (Object x)
                                   {
                                   // time to quit?
                                   if (quit || Socket.isCancelled())
                                       return 0; // <<<<<< change here <<<<<<<
                           
                                   printf ("SocketListener: %.*s\n", x.toString());
                                   //Stderr.put ("Exception: "~x.toString).cr();
                                   }
                return 0;
        }
}

The static destructor in Socket.d needs to be restored also. Even if this doesn't resolve it, you've still found a codegen error.
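For a bug report, the suspected codegen problem could be reduced to something with no sockets or threads at all. The following is only a sketch of such a reduction, written against a present-day D compiler. The function name countIterations and the cap of 5 are made up for illustration. It catches Exception where the original code catches Object, on the assumption that a modern compiler is used, since modern compilers do not accept catch (Object x).

```d
import std.stdio : writeln;

// Hypothetical reduction of SocketListener.run(): just a `break` inside
// try/catch inside `while (true)`. Returns how many times the body started.
int countIterations()
{
    int iterations = 0;
    while (true)
    {
        try
        {
            ++iterations;
            if (iterations >= 5)
                return iterations; // safety net so a miscompiled break cannot spin forever
            break;                 // expected to exit the loop on the first pass
        }
        catch (Exception e)
        {
            break;
        }
    }
    return iterations;
}

void main()
{
    // 1 means `break` left the loop; 5 means control fell through to the safety net.
    writeln(countIterations());
}
```

A compiler that handles the break correctly prints 1. Anything larger would match the behaviour described above, where the loop body runs again after the break.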

JJR

Joined: 22 Feb 2004
Posts: 1104

Posted: Sun Jul 11, 2004 6:43 pm

kris wrote:
Well, you've unearthed a logical codegen bug in both the Win32 and linux versions of DMD: the break statement within the try/catch block does not exit the enclosing while loop at all; in fact it does absolutely nothing useful!


Ahh... I was wondering what was going on... in fact it looked to me like it was getting stuck in there somehow, but I couldn't put my finger on it.

kris wrote:
The documentation states that a finally clause will always be executed before the "break target" is reached; however, there's no finally clause here. Win32 seems to handle this situation at program termination without much ado, but linux definitely does not like it at all.

The upshot is that reader.read() is probably called yet again even after the OS has interrupted it, since the break statement does not exit the loop properly. Can you make the code look like this and try again please John?


I believe you are correct. Previously, I inserted a printf() just after the while() and found that it executes at least twice, probably resulting in the second and unintended execution of reader.read().

kris wrote:
The static destructor in Socket.d needs to be restored also. Even if this doesn't resolve it, you've still found a codegen error.


I was confused as to why the thread was getting stuck there. Good to know at least that the "exit" is indeed the issue. At any rate, I'll change the code per your instructions and test it out.

Later,

John

kris

Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Jul 11, 2004 6:47 pm

Then, the next thing I would try is an isolated test case, with no join(), send(), or anything else, like this:

Code:
void testSocketListener()
{
        void listen (IBuffer buffer)
        {
                printf ("listener received %d bytes\n", buffer.readable());
        }

        MulticastSocket ms = new MulticastSocket ();
        ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen);
}

and if that segfaults, then I would change the Socket type to be a SocketConduit instead:
Code:
        SocketConduit ms = new SocketConduit ();
        ISocketListener s = new SocketListener (ms, new Buffer(1500), &listen);

and try again. If that fails, then I'd try running a contrived thread like so:

Code:
class MyThread : Thread
{
   override int run()
   {
      while (true);
      return 0;
   }
}

void testThread()
{
    MyThread mt = new MyThread;
    mt.start();
}

which will start a separate thread and then exit the program, causing the thread to be terminated while it's thrashing away in the while-loop. What I'd be looking to discover is whether linux Threads generally barf during program termination, whether it's an interrupted datagram read operation that causes the problem, whether it's any interrupted socket read operation, or whether it's some combination thereof.
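For anyone repeating the last experiment on a current toolchain, here is a rough equivalent. It is a sketch under assumptions, not the Thread class Mango was using in 2004: it relies on core.thread from today's D runtime, the names spin and startDaemonSpinner are made up, and the busy loop is swapped for a short sleep so the test does not peg a CPU. The thread is marked as a daemon because, as far as I know, the current runtime waits for non-daemon threads before main can finish, which would hide the exit-while-running case.

```d
import core.thread : Thread;
import core.time : msecs;
import std.stdio : writeln;

// Stand-in for the listener's run(): loops until the process goes away.
void spin()
{
    while (true)
        Thread.sleep(10.msecs);
}

// Hypothetical helper: starts the spinner as a daemon thread so that
// returning from main() is not blocked on joining it.
Thread startDaemonSpinner()
{
    auto t = new Thread(&spin);
    t.isDaemon = true; // set before start() so the runtime does not wait for it
    t.start();
    return t;
}

void main()
{
    auto t = startDaemonSpinner();
    Thread.sleep(50.msecs); // give the worker time to get going
    writeln("worker running: ", t.isRunning);
    // main returns here while spin() is still looping
}
```

If this exits cleanly but the socket variants do not, that points at the interrupted read rather than at thread teardown in general.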

JJR

Joined: 22 Feb 2004
Posts: 1104

Posted: Sun Jul 11, 2004 6:50 pm

That did it! No segfaults anymore after replacing the "break" with "return 0;"

I made sure the destructors were operational also.

:)

Later,

John

JJR

Joined: 22 Feb 2004
Posts: 1104

Posted: Sun Jul 11, 2004 6:56 pm

My previous post was an answer to your first solution concerning the exit bug. That fixed the problem. Just clarifying, since you posted some more things to try after I posted my success. :-p

Out of sync posts can get confusing...

kris

Joined: 27 Mar 2004
Posts: 1494
Location: South Pacific

Posted: Sun Jul 11, 2004 6:58 pm

Fu$k! Don't you just hate Beta software? Mango included ...

Thanks John, for yet another bug worked out on linux. For those who use Mango on linux ~ you should know it would not be running without the tenacious efforts of JJR.

All times are GMT - 6 Hours
Page 4 of 5