Download Reference Manual
The Developer's Library for D
About Wiki Forums Source Search Contact

Changes from Version 1 of ChapterStorageCollection

Show
Ignore:
Author:
keinfarbton (IP: 85.180.21.52)
Timestamp:
10/06/08 19:49:21 (15 years ago)
Comment:

Copied collection wiki from ChapterStorage, befor starting the collection to container rework

Legend:

Unmodified
Added
Removed
Modified
  • ChapterStorageCollection

    v0 v1  
     1[[TOC()]] 
     2= The collection classes = 
     3Tango provides a number of collection classes, sometimes also called container classes. 
     4In tango they are located in the '''tango.util.collection''' package, 
     5which is home to a variety of templated containers. 
     6 
     7Collection classes store ''things'', and do so in different ways. 
     8For example, the user can choose between a list and an array to store the same 
     9information - each storage mechanism has its own pros and cons. 
     10The collection classes are templates, which means they need a type parameter specified 
     11for the ''things'' to  be stored. 
     12 
     13It is good practice to define an alias for the type one wants to use 
     14 
     15{{{ 
     16#!d 
     17alias LinkSeq!(int) MyIntList; 
     18 
     19auto list = new MyIntList; 
     20list.append(1); 
     21}}} 
     22 
     23== Types of containers == 
     24 
     25The following table is an overview of the existing collections. The first column list the kind of storage, the column heading list the kind of implementations: 
     26 
     27||            || '''Array''' || '''Link''' || '''Tree''' || '''Hash''' ||  
     28|| Bag        || [[docs(ArrayBag,tango.util.collection.ArrayBag.html,)]]    ||            || [[docs(TreeBag,tango.util.collection.TreeBag.html,)]]    ||            || 
     29|| Set        ||             ||            ||            || [[docs(HashSet,tango.util.collection.HashSet.html,)]]   || 
     30|| Sequence   || [[docs(ArraySeq,tango.util.collection.ArraySeq.html,)]]   || [[docs(LinkSeq,tango.util.collection.LinkSeq.html,)]], [[docs(CircularSeq,tango.util.collection.CircularSeq.html,)]] ||  ||            || 
     31|| Map        ||             || [[docs(LinkMap,tango.util.collection.LinkMap.html,)]]    || [[docs(TreeMap,tango.util.collection.TreeMap.html,)]]   || [[docs(HashMap,tango.util.collection.HashMap.html,)]]   || 
     32 
     33 
     34'''Bag''' 
     35  This is the simplest container. Throw everything in.  
     36  The sequence of elements is not defined.  
     37 
     38'''Set''' 
     39  In a set there are no duplicates. The distinction between Set and Bag is that the latter may have duplicates. 
     40 
     41'''Sequence''' 
     42  A container that will store the elements in the sequence they were added. 
     43 
     44'''Map''' 
     45  An associative container. It has a set of keys. Each key is unique and is mapped to 
     46  a value. Both the key and the value type are template parameters. 
     47 
     48'''Arrays''' 
     49  They can access their members very fast and without the need for any special sequence. 
     50  But inserting an element sometimes requires the array to be resized. Resizing the array  
     51  is very slow. The array then needs to be reallocated and all data must be copied over. 
     52 
     53'''Linked lists''' 
     54  These containers increase their size with constant cost.  
     55  Whether prepending, inserting, or appending, 
     56  every element is allocated and linked in constant time.  
     57  The drawback is that finding an element in the middle of a linked list 
     58  requires iteration over the list. For big data collections, this is very slow. 
     59 
     60'''Trees''' 
     61  Containers that use a binary tree to access their members.  
     62  The cost for insertion and the retrieval is low. 
     63 
     64'''Hashes''' 
     65  Hashes are the fastest container in general, but they need much memory.  
     66  Also the elements need to support a good '''toHash()''' implementation. 
     67 
     68== Model and Method Overview == 
     69 * [http://www.dsource.org/projects/tango/browser/trunk/doc/images/collection.png?format=raw Relationships between the collection classes] 
     70 * [http://www.dsource.org/projects/tango/browser/trunk/doc/images/methods.png?format=raw Methods exposed by the collection classes] 
     71 
     72== Create collections == 
     73 
     74The easiest form is to instantiate directly and use an auto variable to hold the reference. 
     75{{{ 
     76#!d 
     77    auto list2 = new LinkSeq!( int ); 
     78}}} 
     79 
     80In most cases a collection type is use more than once. 
     81So it is good to have an alias. 
     82{{{ 
     83#!d 
     84    alias LinkSeq!( char[] ) StrList; 
     85    StrList list1 = new StrList; 
     86}}} 
     87 
     88== Add and remove elements == 
     89 
     90Each container type has it own set of methods. 
     91{{{ 
     92#!d 
     93    list1.append( "The first item in list1" ); 
     94    list2.prepend( 2 ); 
     95}}} 
     96See the [http://www.dsource.org/projects/tango/browser/trunk/doc/images/methods.png?format=raw Methods exposed by the collection classes] to get a good overview. 
     97 
     98 
     99== Visiting contained elements == 
     100 
     101The typical, and simplest pattern is 
     102 
     103{{{ 
     104#!d 
     105    foreach (key, value; map) 
     106             Stdout.formatln ("Key: {0}, Value: {1}", key, value); 
     107}}} 
     108 
     109One variation is using only 'value' instead of both 'key' and 'value'. 
     110Both of these utilize a slightly more efficient mechanism to visit each contained element (or element pair), 
     111as they can avoid heap activity when instantiating an '''Iterator'''. 
     112 
     113Collections also expose iterators, supporting both ''foreach'' and a familiar '''more()'''/'''next()''' pattern. 
     114Each container exposes method(s) to instantiate an appropriate '''Iterator''' 
     115   
     116{{{ 
     117#!d 
     118    auto it = map.keys(); 
     119 
     120    // The call to .more returns true if there are more elements, 
     121    // and switches the iterator to the next one if available. 
     122    while(it.more){ 
     123          char[] key;  
     124          auto value = it.get(key); 
     125          Stdout.formatln ("Key: {0}, Value:{1}", key, value); 
     126    } 
     127}}} 
     128 
     129For associative containers, the iterator's '''get''' method has an out parameter for key retrival. 
     130 
     131The '''keys()''' method is used to also get a set of the key values. For the 'normal' containers, like '''!LinkSeq''' etc. the '''elements()''' method is used to get a conventional iterator. Each container also supports ''foreach()'' directly. 
     132 
     133=== Asynchronous mutation during iteration === 
     134 
     135Iterators are sensitive to container changes whilst they operate, and will throw an exception when this occurs. This it to ensure the client sees a consistent view of the container for the lifetime of the iterator. 
     136 
     137=== Caution with iterators === 
     138 
     139It is currently not possible to modify the container while iterating over it with ''foreach'' or an iterator. To delete certain elements from a container it is necessary to do it in two steps. First find the elements to delete, then do the deletion 
     140 
     141{{{ 
     142#!d 
     143// 1. Build a list of elements to delete 
     144auto dellist = new LinkSeq!(int); 
     145foreach( int i; container ){ 
     146  if( i >= 3 && i < 7 ){ 
     147    // container.remove( i ); /* NOT POSSIBLE */ 
     148    dellist.append( i ); 
     149  } 
     150} 
     151// 2. Iterate over this deletion list and 
     152// delete the items in the original container. 
     153foreach( int i; dellist ){ 
     154    container.remove( i ); 
     155} 
     156}}} 
     157 
     158This may be rectified in the future.  
     159 
     160== !InterleavingIterator == 
     161 
     162This more complex ''Iterator'' shows why iterators are still useful, even if we have the ''foreach'' construct. 
     163 
     164'''[[docs(InterleavingIterators, tango.util.collection.iterator.InterleavingIterator.html,)]]''' allow you to combine the elements of two different enumerations as if they were one enumeration before they are seen by their ''consumers''. 
     165This sometimes allows one to avoid use of a collection object, to temporarily combine two sets of Collection '''elements()''' that need to be collected together for common processing. 
     166 
     167The elements are revealed (via '''get()''') in a purely interleaved fashion, alternating between the first and second enumerations unless one of them has been exhausted, in which case all remaining elements of the other are revealed until it too is exhausted.  
     168 
     169{{{ 
     170#!d 
     171import tango.util.collection.ArrayBag; 
     172import tango.util.collection.iterator.InterleavingIterator; 
     173 
     174void testInterleavingIterator(){ 
     175    int[] result; 
     176 
     177    // Create and fill the containers 
     178    auto l1 = new LinkSeq!(int); 
     179    auto l2 = new ArraySeq!(int); 
     180    for( int i = 0; i < 6; i+=2 ){ 
     181        l1.append( i ); 
     182        l2.append( i+1 ); 
     183    } 
     184 
     185    // define the InterleavingIterator 
     186    auto it = new InterleavingIterator!(int)(l1.elements, l2.elements); 
     187 
     188    // use the InterleavingIterator to iterate over the container 
     189    while (it.more) 
     190        result ~= it.get; 
     191 
     192    // test the result 
     193    assert(result == [0, 1, 2, 3, 4, 5], "testInterleavingIterator"); 
     194} 
     195 
     196}}} 
     197 
     198== !FilteringIterator == 
     199 
     200'''[[docs(FilteringIterators,tango.util.collection.iterator.FilteringIterator.html,)]]''' allow you to filter out elements from other enumerations before they are seen by their ''consumers'' (i.e. the callers of '''get'''). 
     201 
     202{{{ 
     203#!d 
     204import tango.util.collection.LinkSeq; 
     205import tango.util.collection.ArrayBag; 
     206import tango.util.collection.iterator.FilteringIterator; 
     207 
     208void testFilteringIterator(){ 
     209    int[] result; 
     210    alias ArrayBag!(int) IntBag; 
     211 
     212    // Create and fill the container 
     213    auto ib = new IntBag; 
     214    for( int i = 0; i < 20; i++ ){ 
     215        ib.add( i ); 
     216    } 
     217 
     218    // define the FilteringIterator with a function literal 
     219    auto it = new FilteringIterator!(int)( ib.elements(), (int i){  
     220        return i >= 3 && i < 7;  
     221    }); 
     222 
     223    // use the FilteringIterator to iterate over the container 
     224    while (it.more()) 
     225        result ~= it.get(); 
     226 
     227    // test the result 
     228    assert( result == [ 3, 4, 5, 6 ], "testFilteringIterator" ); 
     229} 
     230}}} 
     231 
     232== Comparators == 
     233 
     234Sorted containers like trees, need to compare the elements. The algorithm that is used for this comparison can be user specified. E.g. strings can be sorted in different ways. If you want a bag to have its addIf using your compare method, this can be done like this 
     235 
     236{{{ 
     237#!d 
     238    // Create a bag that is not case sensitive 
     239    auto nameSet = new TreeBag!(char[])( null,  
     240      new class() Comparator!(char[]){ 
     241        int compare( char[] first, char[] second ){ 
     242            return Ascii.icompare( first, second ); 
     243        } 
     244    }); 
     245 
     246    nameSet.addIf( "Alice" ); 
     247    nameSet.addIf( "Bob" ); 
     248    nameSet.addIf( "aliCe" ); // will not be added 
     249}}} 
     250 
     251== Screeners == 
     252 
     253Each container has the possibility to have a predicate as a so called screener. The definition can be found in the '''Collection''' class. 
     254 
     255{{{ 
     256#!d 
     257alias bool delegate(T) Predicate; 
     258}}} 
     259 
     260This delegate will be called, each time a new element is to be inserted into the container. If the screener returns false, the container throws an '''!IllegalElementException''' to the caller. 
     261 
     262{{{ 
     263#!d 
     264    // Create a restricted container 
     265    auto ratioSamples = new ArraySeq!(float)( (float v){ 
     266        // restrict element to the range 0.0 .. 0.99999.. 
     267        return v >= 0.0f && v < 1.0f; 
     268    }); 
     269 
     270    // try to insert some values 
     271    try{ 
     272        ratioSamples.append( 0.0f );      // OK 
     273        ratioSamples.append( 0.5f );      // OK 
     274        ratioSamples.append( 0.99999f );  // OK 
     275        ratioSamples.append( 1.0f );      // Bang 
     276        // will never get here 
     277        assert( false ); 
     278    } catch( IllegalElementException e ){ 
     279    } 
     280}}} 
     281 
     282To test if an element is allowed for a certain container, use the {{{bool allows( T )}}} method, which is part of the Collection class. 
     283 
     284 
     285== Bag and Set == 
     286 
     287[[docs(Bags,tango.util.collection.model.Bag.html,)]] are collections supporting multiple occurrences of elements. 
     288[[docs(Sets,tango.util.collection.model.Set.html,)]] do only support unique elements. 
     289 
     290In both, the sequence of elements is not defined. An exception is !TreeBag, there the elements are ordered in their natural way. Or the user supplies a Comparator object. 
     291 
     292Both '''bags''' support the '''addIf()''' method. 
     293So they can be used like '''sets'''. 
     294 
     295[[docs(HashSet,tango.util.collection.model.HashSet.html,)]] uses a hash algorithm to find potential duplicates very fast, but this algorithm can use a lot of memory. 
     296 
     297 
     298== Sequence, the ordered storage == 
     299Sequences are indexed, sequentially ordered collections. Indices are always in the range 0 .. size() -1. All accesses by index are checked, raising exceptions if the index falls out of range. 
     300 
     301The '''elements()''' enumeration for all sequences is guaranteed to be traversed (via '''more()/get()''') in sequential order. 
     302 
     303 * [[docs(Seq,tango.util.collection.model.Seq.html,)]] is an interface for all sequences 
     304 * [[docs(ArraySeq,tango.util.collection.ArraySeq.html,)]] implementing class, uses an array. Best used when the storage size is not changed too often, or a fast element access per index is needed. 
     305 * [[docs(LinkSeq,tango.util.collection.LinkSeq.html,)]] implementing class, uses element nodes which are allocated dynamically and linked with each other. This implementation can easy append elements and remove elements from the head (take()). Making random access to elements per index is slow, because it is neccessary to follow the links from the beginning of the storage. 
     306 * [[docs(CircularSeq,tango.util.collection.CircularSeq.html,)]] is similar to !LinkSeq. But this class uses double linked cells. So all operations are equal in operation, nevertheless the are started from the beginning or end of the list. 
     307 
     308== Map, the associative Storage == 
     309[[docs(Maps,tango.util.collection.model.Map.html,)]] maintain keyed elements. Any kind of Object or primitive type may serve as a key for an element. Each key is unique and is associated with one value. 
     310 
     311{{{ 
     312#!d 
     313    import tango.store.HashMap; 
     314    // ... 
     315    alias HashMapT!(char[], char[]) StrStrMap; 
     316 
     317    auto map = new StrStrMap(); 
     318    map.add("key1", "value1"); 
     319    char[] key = "key1"; 
     320    Stdout.format ("Key: {0}, Value:{1}", key, map.get(key)).newline; 
     321}}} 
     322 
     323= Hash vs Tree = 
     324'''Sets''' and '''Maps''' are available with tree or hash implementation. When should one choose which?  
     325 
     326Using a tree means it is necessary to navigate a tree structure to find an element. The number of stages to traverse is around O(log2(n)).  
     327 
     328When using a hash, an array is used to store a given number of buckets. A hash function is used to calculate a hash from the object value. A hash is a n-bit integer value, which should be mostly unique for each object. The same content of an object must always calculate the same hash value. This value is used when the bucket to store or retrieve the element is chosen.  
     329 
     330In the best case, this algorithm can find the element directly (if the bucket contains exactly one element). That would be access in constant time or O(1). But if the bucket holds many elements, then (depending on the bucket implementation) this can lead to a linear search. In a good hash this should not happen. To ensure the buckets are not too full, the array for them should be chosen big. This means, hash needs much more memory than tree and a good quality hash function for the stored object, to be efficient. 
     331 
     332= D builtin arrays and hashes vs. !ArraySeq and !HashMap = 
     333 
     334D already supports builtin dynamic arrays and hashes.  
     335Why should one use a template container for these container types? 
     336 
     337There is nothing against the builtin arrays/hashes, use them if they have all you need. 
     338Be aware of the additional features the tango container !ArraySeq/!HashMap has: 
     339 * A screener function to restrict the inserted elements to a configurable set. 
     340 * Iterators that can implement various behaviour. 
     341 * the tango collections are compatible between each other,  
     342   because they implement common interfaces.  
     343   So it is theoretical possible to exchange a !ArraySeq container with a !LinkSeq container, 
     344   if you only use the methods from View. 
     345 
     346= In Review = 
     347 
     348Don't reinvent the wheel, use containers where ever possible. 
     349 
     350Choose the appropriate container class for you needs. 
     351 
     352== Suggestion for this documentation == 
     353 * Show the use of procedure 
     354 
     355== References == 
     356 * For further readings on hash tables, see this [http://en.wikipedia.org/wiki/Hash_table Wikipedia-article] 
     357 
     358== Illustrations == 
     359 
     360[[Image(source:/trunk/doc/images/collection.png)]] 
     361 
     362 
     363 
     364[[Image(source:/trunk/doc/images/methods.png)]] 
     365 
     366== User Comments == 
     367 
     368[[EmbedReplies(DocComments,ChapterStorage)]]