Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

Ticket #1049 (closed defect: fixed)

Opened 8 months ago

Last modified 3 months ago

Semaphore.d unittest can fail

Reported by: fawzi Assigned to: sean
Priority: normal Milestone: 0.99.8
Component: Core Functionality Version: trunk
Keywords: triage Cc:

Description

the assertion at line 339 in Semaphore.d

                assert( numComplete == numConsumers );

can fail. Indeed the semaphore is notified, and the main thread yields, but then it just does a small loop and hopes the other threads are finished. As there is no explicit synchronization it might be that the threads are still running and the test fails. adding a yield at the loop before

            for( int i = numConsumers * 10; i > 0; --i )
            {
+               Thread.yield();
                synchronized( synComplete )
                {
                    if( numComplete == numConsumers )
                        break;
                }
            }

should make failure less likely (I was not able to reproduce the error with this change). Waiting some real time (not a loop) would be even better, but still without a hard guarantee.

Fawzi

Change History

04/17/08 08:16:09 changed by fawzi

I have looked at the code more in depth and it is messy, I do not really understand what it does test, as it is not a queue (the number of consumer and products has to be equal for it to work correctly if I understood it correctly). Anyway as test it might be ok, if the probability of failure remains small.

The test assumes that all threads are started before the consumer sets allProduced=true but no thread is finished. As the semaphore.notify(); does not guarantee that the thread starts immediately this conditions might fail.

To make it more unlikely one should make the running time of the consumer long and of the producer a little bit longer for example like this:

--- tango/core/sync/Semaphore.d (revision 3432)
+++ tango/core/sync/Semaphore.d (working copy)
@@ -291,7 +291,8 @@
                     if( allProduced )
                         break;
                 }
-
+               for (int i=0;i<200;i++)
+                  Thread.yield();
                 synchronized( synConsumed )
                 {
                     ++numConsumed;
@@ -314,6 +315,8 @@
                 Thread.yield();
             }

+           for (int i=0;i<20;i++)
+               Thread.yield();
             synchronized( synProduced )
             {
                 allProduced = true;
@@ -325,21 +328,24 @@
                 Thread.yield();
             }

-            for( int i = numConsumers * 10; i > 0; --i )
+            for( int i = numConsumers * 1000; i > 0; --i )
             {
+                Thread.yield();
                 synchronized( synComplete )
                 {
                     if( numComplete == numConsumers )
                         break;
                 }
             }

             synchronized( synComplete )
             {
                 assert( numComplete == numConsumers );
             }
-            assert( numConsumed == numToProduce );
-
+            synchronized( synConsumed )
+           {
+                   assert( numConsumed == numToProduce );
+            }
             assert( !semaphore.tryWait() );
             semaphore.notify();
             assert( semaphore.tryWait() );

This changes (that also add an extra synchronization) make the failure much less likely, also under load (10 tests concurrently) I did not see any failures.

05/17/08 17:23:41 changed by larsivi

Verified to fail (most of the time after installing a composite manager) with both gdc and dmd on linux.

05/27/08 17:31:45 changed by larsivi

  • keywords set to triage.

07/10/08 06:46:31 changed by larsivi

  • milestone changed from 0.99.7 to 0.99.8.

07/20/08 04:57:11 changed by larsivi

  • priority changed from minor to normal.

#1199 is marked as a duplicate and contains additional information.

07/26/08 15:47:28 changed by sean

  • status changed from new to closed.
  • resolution set to fixed.

(In [3813]) Made a few changes to imcrease the likelihood that the unit tests will succeed. This (hopefully) closes #1049

08/04/08 05:23:51 changed by larsivi

  • status changed from closed to reopened.
  • resolution deleted.

This still occasionally fails on line 345.

09/08/08 16:46:22 changed by sean

  • status changed from reopened to closed.
  • resolution set to fixed.

(In [3931]) Lengthened timeout for unittest. Hopefully this closes #1049