The Developer's Library for D

Simple Threads Crashing Randomly


Posted: 04/02/08 14:46:17

Hi, could anyone tell me whether I've got this code right? It sometimes exits correctly and sometimes segfaults when the client closes.

It's a server that sends "Hello World" over TCP, and the client replies "Hello Back". The client creates multiple threads that connect to the server; the number of threads is given on the client command line.

Either the client or the server will crash at random. According to gdb, the crash seems to happen inside tango.core.Thread.Thread.remove(Thread) or somewhere inside memcpy. I am using Linux 2.6.

Server:

import tango.core.Thread;
import tango.io.Stdout;
import tango.net.ServerSocket,
       tango.net.SocketConduit;

void main()
{
    auto server = new ServerSocket(new InternetAddress(10000), 5, true);
    scope(exit) delete server;

    uint conncount;
    while (true)
    {
        auto request = server.accept();
        //scope(exit) delete request;

        auto handler = new Handler(request);
        handler.sequence = conncount;
        handler.start();
        conncount++;
        Stdout(conncount).newline;
    }
}

class Handler : Thread
{
    static char[] data = "Hello World!!!\r\n";
    uint sent;
    uint received;
    uint sequence;

    SocketConduit request;

    this (SocketConduit sc)
    {
        this.request = sc;
        super(&run);
        // this.isDaemon = true;
    }

    void run()
    {
        try
        {
            do
            {
                auto lensent = request.output.write(data);
                //Stdout("sent: ")(lensent).newline;
                sent++;

                char[] buf;
                buf.length = 1024;
                auto len = request.input.read(buf);
                if (len == IConduit.Eof)
                    throw new Exception("Connection lost");

                buf.length = len;
                //Stdout("got: ")(buf).newline;
                received++;

                //if (sent %10000 == 0)
                //    Stdout.format("#{} sent {} times", sequence, sent).newline;
            } while (true);
        }
        catch (Exception e)
        {
            Stdout("Caught: ")(e).newline;
        }
    }
}

Client:

import tango.core.Thread;
import tango.io.Stdout;
import tango.net.ServerSocket,
       tango.net.SocketConduit;

import Integer = tango.text.convert.Integer;
import tango.time.StopWatch;

void main(char[][] args)
{
    if (args.length < 2)
        throw new Exception("Count not specified: " ~ args[0] ~ " count");

    uint numthreads = Integer.parse(args[1]);
    uint count;
    while (count < numthreads)
    {
        (new Client()).start();
        count++;
    }
    Stdout("started: ")(count).newline;
}

class Client : Thread
{
    SocketConduit client;
    StopWatch stopwatch;

    this()
    {
        client = new SocketConduit();
        client.connect(new InternetAddress("localhost", 10000));
        super(&run);
        // this.isDaemon = true;
    }

    void run()
    {
        try
        {
            stopwatch.start();
            do
            {
                char[64] response;
                auto len = client.input.read(response);
                if (len == IConduit.Eof)
                    throw new Exception("Connection lost");

                //Stdout("client got: ")(response[0..len]).newline;

                if (stopwatch.microsec > 10_000_000) { // exit loop if we've been busy for 10 seconds
                    break;
                }

                auto sentlen = client.output.write("Hello Back!");
                //Stdout("sent: ")(sentlen).newline;
            } while (true);
        }
        catch (Exception e)
        {
            //Stdout("Caught: ")(e).newline;
        }
        //Stdout("done").newline;
    }
}


Posted: 04/03/08 14:36:27

I have also tried using the new ThreadPool implementation for the server and hit the same problem: it crashes at seemingly random points in the code.

Server:

import tango.core.Thread;
import tango.core.ThreadPool;
import tango.io.Stdout;
import tango.net.SocketConduit;
import tango.net.ServerSocket;
import tango.net.InternetAddress;

struct Config_
{
    uint workers = 20;
    uint port = 10000;
}
Config_ Config;

void main()
{
    bool on = true;

    void handler(SocketConduit sc, in int sequence)
    {
        Stdout("handling: ")(sequence).newline;
        int len;
        char[] hello = "Hello World!!!\r\n";
        while (true)
        {
            len = sc.output.write(hello);

            char[] buf;
            buf.length = 1024;
            len = sc.input.read(buf);
            if (len == IConduit.Eof)
            {
                Stdout("Connection lost").newline;
                //request.shutdown();
                //sc.close();
                delete sc;
                return;
            }
            buf.length = len;
        }
    }

    // create a Thread Pool
    auto handlerworkers = new ThreadPool!(SocketConduit, int)(Config.workers);

    // create server socket
    auto server = new ServerSocket(new InternetAddress(Config.port));

    // Wait for connection; when we receive a connection, add its handler to the
    // handlerworkers.
    int count;
    while (on)
    {
        auto request = server.accept();
        handlerworkers.append(&handler, request, count);
        count++;
    }
}

Posted: 04/04/08 09:56:14

Update:
Backtrace from a server crash; in this case the client did not crash.

[Thread -1286501488 (LWP 10639) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1219359856 (LWP 10624)]
0x08060772 in gcx.GC.mallocNoSync() ()
Current language:  auto; currently c
(gdb) bt
#0  0x08060772 in gcx.GC.mallocNoSync() ()
#1  0x0806160b in gcx.GC.malloc() ()
#2  0x0805bfb1 in gc_malloc ()
#3  0x0805b1f0 in _d_arraysetlengthiT (ti=@0x806b138, newlength=1024, 
    p=0xb7520380) at lifetime.d:693
#4  0x08049b06 in server.Handler.run() (this=@0xb7d22a00) at server.d:44
#5  0x080643a7 in thread_entryPoint ()
#6  0xb7f68462 in start_thread () from /lib/i686/libpthread.so.0
#7  0xb7ef282e in clone () from /lib/i686/libc.so.6

This is a test case for a server that my company has built, for which I currently use forking. Forking wastes far too much memory, so a prompt response would be greatly appreciated.

Regards
Rory McGuire

Posted: 04/04/08 10:54:09

Hi Rory,

I've notified Sean of your post, hopefully he (or someone else capable) will look into your issue.

One detail I didn't see in your posts: is this GDC or DMD, and x86, x86_64 or PPC?

Posted: 04/04/08 11:35:17

Did a quick test of the server/client from the first post: both ran without segfaulting on Linux (x86) with both DMD and GDC, using Tango trunk.

Posted: 04/04/08 12:47:35

Hi, thank you for the prompt response. The code does work sometimes.

I get it to crash by running both programs under gdb and then running the client with 10 as the argument until it crashes; it usually takes only about 2 tries.

The SIGSEGV seems to happen while the server threads are exiting. The backtrace I posted is for the first version of the program (without ThreadPool).

I'm using gdc:
gdc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4.1.2/configure --prefix=/usr/local/staging/r24rc1/Rgcc --disable-shared --enable-languages=c,d --disable-shared
Thread model: posix
gcc version 4.1.2 20070214 (  (gdc 0.24, using dmd 1.020))

This is the backtrace for the server with ThreadPool, with full gdb output. I started clientthreaded with: ./clientthreaded 100

[rory@localhost stabilitytests]$ gdb threadworker
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/i686/libthread_db.so.1".

(gdb)  handle SIGUSR2 noprint
Signal        Stop      Print   Pass to program Description
SIGUSR2       No        No      Yes             User defined signal 2
(gdb)  handle SIGUSR1 noprint
Signal        Stop      Print   Pass to program Description
SIGUSR1       No        No      Yes             User defined signal 1
(gdb) r
Starting program: /home/rory/Development/test/networking/stabilitytests/threadworker 
[Thread debugging using libthread_db enabled]
[New Thread -1210083648 (LWP 14102)]
[New Thread -1211135088 (LWP 14105)]
[New Thread -1219527792 (LWP 14106)]
[New Thread -1227920496 (LWP 14107)]
[New Thread -1236313200 (LWP 14108)]
[New Thread -1244705904 (LWP 14109)]
[New Thread -1253098608 (LWP 14110)]
[New Thread -1261491312 (LWP 14111)]
[New Thread -1269884016 (LWP 14112)]
[New Thread -1278276720 (LWP 14113)]
[New Thread -1286669424 (LWP 14114)]
[New Thread -1295062128 (LWP 14115)]
[New Thread -1303454832 (LWP 14116)]
[New Thread -1311847536 (LWP 14117)]
[New Thread -1320240240 (LWP 14118)]
[New Thread -1328632944 (LWP 14119)]
[New Thread -1337025648 (LWP 14120)]
[New Thread -1345418352 (LWP 14121)]
[New Thread -1353811056 (LWP 14122)]
[New Thread -1362203760 (LWP 14123)]
[New Thread -1370596464 (LWP 14124)]
handling: 0
handling: 1
handling: 2
handling: 3
handling: 4
handling: 5
handling: 6
handling: 11
handling: 11
handling: 10
handling: 9

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1269884016 (LWP 14112)]
0x080620d2 in gcx.GC.mallocNoSync() ()
Current language:  auto; currently c
(gdb) bt
#0  0x080620d2 in gcx.GC.mallocNoSync() ()
#1  0x08062f6b in gcx.GC.malloc() ()
#2  0x0805d911 in gc_malloc ()
#3  0x0805cb50 in _d_arraysetlengthiT (ti=@0x806cc78, newlength=1024, 
    p=0xb44f130c) at lifetime.d:693
#4  0x08049d7a in threadworker.main() ()
#5  0x0804a6a5 in tango.core.ThreadPool.__T10ThreadPoolTC5tango3net13SocketConduit13SocketConduitTiZ.ThreadPool.doJob() ()
#6  0x08065d07 in thread_entryPoint ()
#7  0xb7f3f462 in start_thread () from /lib/i686/libpthread.so.0
#8  0xb7ec982e in clone () from /lib/i686/libc.so.6

Posted: 04/04/08 13:32:25

I'm still not able to reproduce your issue. I did, however, get a few asserts at line 147 in tango.io.Buffer, meaning its invariant was violated. I suspect that is because Stdout is being used from several threads at once, and it is not threadsafe. You should try tango.util.log.Trace for threadsafe logging (to rule that out).

Also, try the Zero debugger - it should be easier for you to pinpoint crashes there: http://www.zero-bugs.com
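The suspected cause above is several threads writing through one unlocked output buffer, which can leave its internal state inconsistent. A minimal sketch of the general remedy, routing every write through a single lock, is below. It is written for present-day D rather than Tango: it assumes druntime's core.thread, whose Thread(delegate), start() and join() calls mirror tango.core.Thread. LineBuffer, LockedLog and countLoggedLines are hypothetical names used only for illustration, not Tango APIs.

```d
import core.thread : Thread;

/// Hypothetical stand-in for a non-threadsafe output buffer:
/// append() mutates shared state with no locking of its own.
class LineBuffer
{
    private char[] data;
    private size_t lines;

    void append(const(char)[] text)
    {
        data ~= text;
        data ~= '\n';
        ++lines;
    }

    size_t lineCount() { return lines; }
}

/// Every caller goes through the same object monitor, so only one
/// thread at a time can touch the buffer's internals.
class LockedLog
{
    private LineBuffer buf;

    this() { buf = new LineBuffer; }

    void writeLine(const(char)[] text)
    {
        synchronized (this)
            buf.append(text);
    }

    size_t lineCount()
    {
        size_t n;
        synchronized (this)
            n = buf.lineCount();
        return n;
    }
}

/// Starts nThreads workers that each log perThread lines, waits for all
/// of them, and returns how many lines reached the buffer.
size_t countLoggedLines(size_t nThreads, size_t perThread)
{
    auto log = new LockedLog;
    Thread[] workers;

    foreach (i; 0 .. nThreads)
    {
        auto t = new Thread(() {
            foreach (j; 0 .. perThread)
                log.writeLine("handling request");
        });
        workers ~= t;
        t.start();
    }

    // Wait for every writer before reading the final count.
    foreach (t; workers)
        t.join();

    return log.lineCount();
}
```

Called from main, countLoggedLines(8, 1000) should return 8000 on every run; without the synchronized blocks the appends race and neither the count nor the buffer contents are guaranteed. In the Tango programs above, the same idea would mean wrapping each Stdout call in a synchronized block on one shared lock object, or switching to Trace as suggested.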

Posted: 04/04/08 13:53:03

Great, thanks again, will try your suggestions.

-Rory

Posted: 04/04/08 14:45:50

Thanks larsivi, I have started using Trace for the logging and that seems to fix the problem: I can have 6 clients, each trying to start 100 threads, and the server does not crash on startup or shutdown. Now I can see how Linux handles the threads. Great!!! Thank you!

Is there no way that Tango could give an error instead of a SEGFAULT?

-Rory

Posted: 04/04/08 18:22:21

Not easily. Stdout would have to be modified to detect that it was being used concurrently with no locking in place, at least insofar as detecting this error is concerned.
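To illustrate what such detection might involve, here is a minimal sketch of a debug guard that a writer enters and leaves around each operation. It uses an atomic compare-and-swap to notice when a second caller arrives while the first is still inside, and throws instead of letting the buffer be corrupted. It assumes present-day D's core.atomic (cas, atomicStore); UseGuard, ConcurrentUseException and CheckedBuffer are hypothetical names, not part of Tango.

```d
import core.atomic : atomicStore, cas;

/// Hypothetical exception: two callers were caught inside the same
/// unlocked section at the same time.
class ConcurrentUseException : Exception
{
    this(string msg) { super(msg); }
}

/// Hypothetical debug guard. enter() atomically flips `busy` from false
/// to true; if the flip fails, someone else is already inside, so we
/// throw instead of touching the shared state.
struct UseGuard
{
    private shared bool busy;

    void enter(string what)
    {
        if (!cas(&busy, false, true))
            throw new ConcurrentUseException(what ~ " used concurrently without a lock");
    }

    void leave()
    {
        atomicStore(busy, false);
    }
}

/// Hypothetical unlocked writer instrumented with the guard.
class CheckedBuffer
{
    private UseGuard guard;
    private char[] data;

    void append(const(char)[] text)
    {
        guard.enter("CheckedBuffer.append");
        // Registered only after a successful enter(), so a caller that
        // was rejected never clears the flag held by the other thread.
        scope (exit) guard.leave();
        data ~= text;
        data ~= '\n';
    }

    size_t length() const { return data.length; }
}
```

The guard only reports overlaps that actually occur during a run, so, like the original crash, it may take several tries to trigger. It also fires if the same thread re-enters the section recursively, and it adds two atomic operations to every write, which is part of why simply taking a lock (or using Trace) is the easier fix.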

Posted: 04/05/08 08:32:23

Right, well perhaps Tango could just have a HUGE warning that says which modules are not thread safe?

-Rory