[[TOC()]]

= Threads Introduction =

== A Word of Warning ==

There is an awful lot of confusion and hype about threads. The enthusiasm for threads is largely driven by the growth of multi-core hardware, while the headaches are a natural consequence of the difficulties posed by threaded programs. Throughout this article, it is important to remember that though the basic concept of a thread is quite simple, threaded programming is very hard. Even when a solution to a threading problem seems obvious, you must remain skeptical. The best practice for threaded programming is relentless scrutiny and careful testing - do not rely solely on your intuition, since it is very often wrong.

== Why Threads ==

Threads provide two essential services: parallelism and concurrency. Of the two, parallelism is by far the simpler, and it makes for the best starting place.

The basic idea behind parallelism is that sometimes we can get stuff done faster by doing more things at once. We do this all the time in our day-to-day life. As an example, say you are cooking a meal of fish and rice. You are very hungry, so you want to eat as soon as possible. Therefore, you start the rice pot and put the fish in the oven so both will get cooked at the same time. This is far faster than first cooking the fish, then cooking the rice:

{{{
<----Cook Fish----> + <-----Cook Rice----->

             -- or --

<----Cook Fish---->
<-----Cook Rice----->
}}}

This basic strategy also works in programming. Suppose you want to sum two large arrays of integers. Here is an obvious first attempt:

{{{
#!d
void sum(inout int[] dest, int[] a, int[] b)
{
    for(int i = 0; i < dest.length; i++)
    {
        dest[i] = a[i] + b[i];
    }
}
}}}

Notice that each addition is completely independent of the others. In fact, we could even sum up each integer at the same time. Creating one thread per integer would be overkill, but a reasonable compromise is to have one thread handle the first half of the arrays while another handles the second half. Here is how we could do it with threads:

{{{
#!d
void sum(inout int[] dest, int[] a, int[] b)
{
    Thread first_half = new Thread(
    {
        for(int i = 0; i < dest.length / 2; i++)
        {
            dest[i] = a[i] + b[i];
        }
    });

    Thread second_half = new Thread(
    {
        for(int i = dest.length / 2; i < dest.length; i++)
        {
            dest[i] = a[i] + b[i];
        }
    });

    // Start both threads, then wait for both to finish.
    first_half.start();
    second_half.start();
    first_half.join();
    second_half.join();
}
}}}

That covers parallelism; the second service threads provide is concurrency. Concurrency means making progress on several tasks over the same period of time, even if only one of them is actually running at any given moment. One everyday example is juggling: a juggler keeps several balls in the air by giving each one a little attention in turn. It works, but it takes a lot of practice, it is easy to drop something, and it deliberately makes a simple job (holding a few balls) much more complicated.

A more useful example is homework. Say you have a mathematics book and a chemistry book to read, and homework due in both courses at different times. If you read the mathematics book from cover to cover and only then start on chemistry, the chemistry homework will be due before you ever open the chemistry book:

{{{
<-----Math----|-><------Chemistry----->
              |
    Chemistry Homework Due
}}}

You could also try the opposite, read your chemistry then your mathematics, but then you would be forgetting your math homework.

{{{
<----Chemistry--|-><-----Math----->
                |
          Math Homework Due
}}}

Neither of these situations is desirable. In the cooking example, we were able to solve our problem by doing two things simultaneously. Here that isn't an option (unless you can somehow figure out how to use one eye to read the chemistry book and the other to read the math book). The solution that most people use is to split up the reading into parts. Typically your early homework isn't going to require you to have read the entire textbook. So you read, say, 20 pages of chemistry, then 20 pages of math, and repeat until you are done. You must be able to remember your current page in each book in order for this trick to work, but fortunately that is not too difficult in this case. Your new reading schedule now looks something like this:

{{{
<-Chem-><-Math->|<-Chem-><-Math-> ...
                |
          Homework Due
}}}

You are effectively reading the two texts concurrently - but not in parallel. This type of concurrency is usually very helpful for structuring programs. Unlike the juggling example, here we are not going out of our way to make things more complicated. Writing concurrent programs with threads is a bit like juggling, while this second example corresponds to something completely different: Fibers.

== Using Fibers ==

Fibers are like threads in that they execute some function independently of the rest of the program. The key difference is that fibers are not parallel. You get to control when and where they are executed.
Here is an example:

{{{
#!d
private import tango.core.Thread, tango.io.Stdout;

void main()
{
    //Create a new fiber using the inline delegate syntax.
    Fiber f = new Fiber(
    {
        Stdout.print("-- I am here").newline;
        Fiber.yield();
        Stdout.print("-- Now I am here!").newline;
    });

    Stdout.print("Running f once").newline;
    f.call();
    Stdout.print("Running f again").newline;
    f.call();
}
}}}

This is basically a hello world program. When you run it, it will print out the following:

{{{
Running f once
-- I am here
Running f again
-- Now I am here!
}}}

For those who are well versed in functional programming, a Fiber is sort of like "call-with-current-continuation". When you execute the fiber, it runs until it yields. The next time you run it, it will pick up right where it left off. Here is another, more sophisticated example:

{{{
#!d
private import tango.core.Thread, tango.io.Stdout;

void main()
{
    Fiber f = new Fiber(
    {
        for(int i = 1; i < 10000; i++)
        {
            Stdout.print("i = ", i).newline;
            Fiber.yield();
        }
    });

    Stdout("First call").newline;
    f.call();
    Stdout("Second call").newline;
    f.call();
    Stdout("Third call").newline;
    f.call();
}
}}}

Which will print out:

{{{
First call
i = 1
Second call
i = 2
Third call
i = 3
}}}

At first this may seem like a strange concept, but there really isn't much to it. If it seems almost too simple, you've probably figured it out.

== A Case Study: Enumeration ==

Suppose you have a set of objects, like a list, and you want to iterate over all of them. How do you do it? This seemingly simple question has at least a hundred different answers in every programming language. Classically speaking, we call this "enumeration". Loops and recursion are just two of the ways we can enumerate things, but there are others. If we want to think in object-oriented terms, we could rephrase this problem using design patterns: what design pattern do you use to traverse the elements of a container? There are two classic patterns that solve this problem, iterators and visitors. Of the two, iterators work more like loops and visitors work more like recursion. We shall look at both of these methods separately before examining an even better method based on concurrency. But first, let's review the classic approaches.

Iterators are by far the most common of the three strategies we are going to examine. C++'s STL and Java's container libraries widely employ this technique. Going back to the list example, suppose you wish to provide code to enumerate all the elements in the list sequentially. You might write something like the following:

{{{
#!d
class Node
{
    Node next;
    int value;
}

public class ListIterator
{
    private Node cur;

    this(Node n) { cur = n; }

    public bool done() { return cur is null; }

    public void next()
    {
        if(!done)
        {
            cur = cur.next;
        }
    }

    public int value()
    {
        if(done)
        {
            return -1;
        }
        return cur.value;
    }
}

public class List
{
    private Node head;

    public ListIterator getIterator()
    {
        return new ListIterator(head);
    }

    ...
}
}}}

In this situation, we provide one simple interface for performing all sorts of operations on a list. This is good object-oriented design, and it works well in many situations. A common problem is to test whether a given container has some object inside it. To do this with a linked list, you would typically iterate over every element in the list until you either reached the end of the container or found what you were looking for. Using the iterator we just wrote, this is simple:

{{{
#!d
// Tests if the list l contains an element with value n.
bool contains(List l, int n)
{
    for(ListIterator it = l.getIterator; !it.done; it.next)
    {
        if(it.value == n)
        {
            return true; // Found it - no need to continue the iteration.
        }
    }
    return false; // Not in the list.
}
}}}

Of course, iterators aren't perfect. While their interface is very clean, the code behind them is often messy. In this example, the choice of a linked list was quite deliberate. What if we wanted to iterate over something more complicated? Suppose you had a binary tree and wanted to do an inorder traversal of its nodes - how would you write an inorder tree iterator? This turns out to be an awkward problem, and there aren't really any clean hand-written solutions. Fortunately, this is where the second design pattern comes in.

Visitors are a nice way to travel through the elements of a container without writing complicated and inefficient iterator code. This is a very common technique in the functional programming world, where it is often known as iteration by first-class functions. The basic idea is that we pass a function (in D, a delegate) to the container object, which then recursively applies it to each element in the container. To see how it works, let's write a binary tree inorder traversal using visitors:

{{{
#!d
public class BinaryTree
{
    private BinaryTree left, right;
    private int value;

    public void inorderTraversal(void delegate(int) visitor)
    {
        //Visit the left subtree
        if(left !is null) left.inorderTraversal(visitor);

        //Apply the visitor
        visitor(value);

        //Visit the right subtree
        if(right !is null) right.inorderTraversal(visitor);
    }

    ...
}
}}}

And to see how this works, here is a simple test function to sum all the elements in the tree:

{{{
#!d
int sum_tree(BinaryTree tree)
{
    int s = 0;
    tree.inorderTraversal((int v) { s += v; });
    return s;
}
}}}

The code for the traversal is very simple, but as before it comes at a price. While our implementation has gotten simpler, the interface is much more limited than before. What if we tried to implement "contains" as we did for the list, only this time operating on trees? Let's take a look at the result (and it won't be pretty):

{{{
#!d
bool contains(BinaryTree tree, int n)
{
    bool r = false;
    tree.inorderTraversal((int v) { if(n == v) r = true; });
    return r;
}
}}}

This is not very efficient at all. If n occurs early in the traversal, there is no way to abort. The poor computer is stuck visiting every node in the tree, even though it is a total waste of time. Of course, the visitor could be modified to support some kind of early-out escape code, but that doesn't really solve the underlying problem. What if you wanted to skip the first 3 elements of the tree? Or what if you wanted to check only every other element? The special cases and extra flags just keep stacking up.

Of these two patterns, neither is perfect. Iterators suffer from a clumsy implementation, while visitors have a terrible interface. If we look for the source of the problem, we can see why this is happening: with iterators it is the user that is in charge, while with visitors it is the container. Here is a diagram which illustrates the chain of execution:

{{{
Iterators:

  Client ---->   resume   +---Client---->
          \              /
     call  \            / return
            *---Server--*

Visitors:

            *---Client---*
      call /              \ return
          /                \
  --Server-->   resume   +--Server---->
}}}

Whoever gets control of the stack gets to write clean code. The other guy is stuck hacking out horrible state machines and fake stacks to get the job done.
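To make that concrete, here is a rough sketch of the kind of hand-rolled stack an inorder tree iterator needs. The class name {{{InorderIterator}}} is invented for illustration, and the code assumes it is compiled in the same module as {{{BinaryTree}}}, so that the private members are visible:

{{{
#!d
// A hand-written inorder iterator for the BinaryTree class above.
// The dynamic array "stack" plays the role that the call stack plays
// in the recursive visitor version.
class InorderIterator
{
    private BinaryTree[] stack;

    this(BinaryTree root)
    {
        pushLeftSpine(root);
    }

    // Push a node and all of its left descendants onto the stack.
    private void pushLeftSpine(BinaryTree node)
    {
        while(node !is null)
        {
            stack ~= node;
            node = node.left;
        }
    }

    public bool done()
    {
        return stack.length == 0;
    }

    public int value()
    {
        return stack[stack.length - 1].value;
    }

    public void next()
    {
        BinaryTree node = stack[stack.length - 1];
        stack.length = stack.length - 1;
        pushLeftSpine(node.right);
    }
}
}}}

The traversal logic is no longer a simple pair of recursive calls; it is smeared across the constructor, {{{next()}}} and a helper, and all the bookkeeping that the call stack would normally do for us has to be managed by hand.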
The solution to this problem is simply to give both sides their own stack - that way everyone gets to write clean code. With Fibers, this is possible. Let's look at that tree example one more time:

{{{
#!d
//to be written...
}}}

== Thread Based Concurrency ==

Concurrency can be the source of various problems.

=== Race conditions ===

If a counter is modified from more than one thread, problems can occur. The root cause is that most actions are not atomic: they can be interrupted by a thread switch. Incrementing a counter, for example, means loading the value from memory into a CPU register, incrementing it, and storing it back to memory. If another thread runs in between, that thread can load, modify and store the value as well; when the first thread resumes, it overwrites the value in memory, and the second thread's modification is lost. What makes this problem hard is that it occurs very rarely - the window in which one thread can interrupt the other and corrupt the data is only a tiny slice of time.

=== Deadlocks ===

If locking mechanisms are used to guard against such concurrency problems, a new problem can appear: deadlock. A deadlock is a scenario where one thread holds a lock and waits for a second lock, while another thread holds that second lock and waits for the first. When a deadlock occurs, it can be very hard to figure out what happened and why some threads are stuck.

== Synchronization ==

The D programming language supports the {{{synchronized}}} keyword. Sometimes this alone is not sufficient to solve a synchronization problem; for those cases, see the package {{{tango.core.sync}}}.

=== Mutex ===

'''Docs:''' [[docs(Mutex,tango.core.sync.Mutex.html)]]

A mutex is a lock with a boolean state. The {{{synchronized}}} keyword provides mutual exclusion for a section of code on a given object (the object reference is used as the mutex), '''but''' that lock cannot be taken or released manually. Mutex offers {{{lock()}}}, {{{unlock()}}} and {{{tryLock()}}} methods. A Mutex can also be used with the synchronized keyword.

{{{
#!d
{
    mutex.lock();
    scope(exit) mutex.unlock();
    // ...
}
}}}

is equivalent to this syntax

{{{
#!d
synchronized(mutex)
{
    // ...
}
}}}

'''Note:''' On Windows this implementation uses a {{{CriticalSection}}}; on Posix a {{{pthread_mutex}}} is used.

=== Semaphore ===

'''Docs:''' [[docs(Semaphore,tango.core.sync.Semaphore.html)]]

What is it?[[br]]
See [http://en.wikipedia.org/wiki/Semaphore_%28programming%29]

In general, a semaphore is like a mutex, but instead of a boolean state it holds an integer counter. The semaphore can be initialized with a number of "permits". A call to {{{wait()}}} decrements the permit counter if it is greater than zero. If the counter is already zero (no permits available), the call to {{{wait()}}} will block until a permit becomes available. This is the equivalent of the {{{lock()}}} call on a Mutex. Releasing a permit is done with the {{{notify()}}} method, which never blocks. If there are threads waiting for a permit, one of them (blocking in the {{{wait()}}} call) will be woken up. {{{notify()}}} is the equivalent of {{{unlock()}}} on a Mutex. There are also the variants {{{wait(Interval)}}}, which takes a timeout, and {{{tryWait()}}}, which never blocks and returns whether a permit was taken.

=== Condition ===

'''Docs:''' [[docs(Condition,tango.core.sync.Condition.html)]]

A condition variable is a mechanism that allows the testing and signalling of a condition from within a synchronized block or while holding a mutex lock.
The condition variable is associated with a mutex.

{{{
#!d
import tango.core.Thread;
import tango.core.sync.Mutex;
import tango.core.sync.Condition;

Mutex m = new Mutex;
Condition c = new Condition(m);
}}}

{{{
#!d
synchronized(m)
{
    (new Thread({
        doSomething();  // Executes while wait() blocks.
        c.notify();     // Causes one (in this case, the only) caller of wait() to return.
        // ...
    })).start();

    c.wait();  // Atomically m.unlock(), block until notify or notifyAll, then m.lock() before returning.
}
}}}

The {{{wait()}}} call blocks until {{{c.notify()}}} or {{{c.notifyAll()}}} is called.

=== Barrier ===

'''Docs:''' [[docs(Barrier,tango.core.sync.Barrier.html)]]

A barrier across which threads may only travel in groups of a specific size.

=== !ReadWriteMutex ===

'''Docs:''' [[docs(ReadWriteMutex,tango.core.sync.ReadWriteMutex.html)]]

A primitive for maintaining shared read access and mutually exclusive write access.

== Thread Local Storage (TLS) ==

'''Docs:''' [[docs(tango.core.Thread,tango.core.Thread.html)]]

What is TLS?[[br]]
See [http://en.wikipedia.org/wiki/Thread-local_storage]

The support in Thread is untyped and offers these functions:

{{{
#!d
class Thread
{
    // ...
    static uint  createLocal() { ... }
    static void  deleteLocal( uint key ) { ... }
    static void* getLocal( uint key ) { ... }
    static void* setLocal( uint key, void* val ) { ... }
    // ...
}
}}}

First, it is necessary to create a storage slot with {{{createLocal()}}}. The result is a key, which can be used in every thread to access the thread local storage. Upon creation this storage is available in every thread, but it is empty, holding a {{{null}}} value. From then on, every thread can use {{{Thread.setLocal(key,val)}}} to store untyped pointers ({{{void*}}}) and retrieve them with {{{Thread.getLocal(key)}}}. Each thread stores its own value. To release the resources used for TLS, call {{{Thread.deleteLocal(key)}}}.

There is also another way to use TLS: the {{{ThreadLocal(T)}}} class template. With this class, TLS can be used in a typed manner.

{{{
#!d
import tango.core.Thread;

alias ThreadLocal!(TLSData) MyData;

MyData data = new MyData( getSomeInitialTLSData() );

// retrieve
TLSData d = data.val();

// store
data.val( getSomeTLSData() );
}}}

'''Note:''' Only one instance of {{{ThreadLocal}}} is used for all threads.

== !ThreadGroup ==

'''Doc:''' [[docs(ThreadGroup,tango.core.Thread.html)]]

A !ThreadGroup is a simple container for threads. Besides {{{add()}}}, {{{remove()}}} and {{{create()}}}, it offers {{{opApply()}}}, so that foreach can be used to walk over all threads, and it also has a {{{joinAll()}}} method.

{{{
#!d
auto tg = new ThreadGroup();

tg.create({
    // work load for first thread
});

tg.create({
    // work load for second thread
});

tg.joinAll();
}}}
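To tie several of these pieces together, here is a small, self-contained sketch (the names {{{counter}}}, {{{lock}}} and {{{workers}}} are made up for this example) that revisits the shared-counter race condition from above: four threads increment one counter, a Mutex makes each increment atomic, and a !ThreadGroup starts and joins the workers.

{{{
#!d
import tango.core.Thread;
import tango.core.sync.Mutex;
import tango.io.Stdout;

void main()
{
    int counter = 0;          // shared state, guarded by the mutex below
    Mutex lock = new Mutex;

    auto workers = new ThreadGroup();
    for(int t = 0; t < 4; t++)
    {
        workers.create({
            for(int i = 0; i < 100000; i++)
            {
                synchronized(lock)  // a Mutex can be used with synchronized, as noted above
                {
                    counter++;      // the load/modify/store is now protected from interruption
                }
            }
        });
    }

    workers.joinAll();
    Stdout.formatln("counter = {}", counter);  // 400000 with the lock in place
}
}}}

Without the {{{synchronized}}} block, the final count would usually come out short, because increments from different threads would overwrite each other exactly as described in the race-condition section.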