
Changes between Version 38 and Version 39 of ChapterClustering

larsivi (IP:
02/10/08 23:10:03 (11 years ago)



  • ChapterClustering

    v38 v39  
    1 [wiki:ClusteringComments Leave Comments, Critiques, and Suggestions Here] 
    32= Tango Clusters = 
    5150This embedded QOS is geared towards high efficiency, and makes a point of avoiding any heap activity in all but the one case where it becomes a necessity (cache hosting). It leverages both TCP/IP and multicast for transmission purposes, and disk files for queue storage. The embedded QOS is affectionately known as ''Tina'' and, to use it, an application would import '''' 
    53 = Key Notions =
     52== Key Notions ==
    5554As with any toolkit, there are a few key ideas to become familiar with in order to get the best out of it. There are only a few to deal with and we’ll address these in this section, beginning with the ''channel'': 
    57 == Channel == 
     56=== Channel === 
    5958Ever used a publish/subscribe system? If so, the notion of a channel will likely be familiar. It is a named entity through which messages are transported and delivered. Clustered applications subscribe (or listen) to a channel, and publish (or write) on a channel. Without a channel, there is no way in which to communicate with the cluster. You obtain one by asking the QOS to create one on your behalf, and then utilize it from that point forward. In reality, various utility classes exposed by the model will perform channel creation in the background for you. However, you are free to utilize a channel directly if the need arises. 
    6362When a message is sent on a channel of a given name, only those listening on the same channel are candidates to receive that message. The channel name must be identical in both places for communication to occur: this provides the basis for segregating different types of messages for different purposes. In practice, we’ve found it highly convenient to use dot-notation for channel names – '''' for example – and to use channels to differentiate between different data types, or aggregates thereof. In fact, channels are a good way of representing a class in the D programming language – one channel for each distributed class – which nicely segregates differing content from one another and notably simplifies the transmission of aggregate data across the cluster itself. This is a good point at which to segue into ''message'': 
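To make the name-matching rule concrete, here is a minimal in-process sketch in Python. It is not the Tango API: `ChannelRegistry`, `subscribe`, `publish`, and the dot-notation channel names are all hypothetical, and no networking is involved. It only illustrates that a message published under one channel name reaches listeners registered under exactly that name.

```python
from collections import defaultdict


class ChannelRegistry:
    """Hypothetical stand-in for a cluster: routes messages by exact channel name."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, channel, listener):
        # Register a callable to be invoked for every message on `channel`.
        self._listeners[channel].append(listener)

    def publish(self, channel, message):
        # Deliver only to listeners whose channel name matches exactly.
        # .get() avoids creating empty entries for unknown channels.
        for listener in self._listeners.get(channel, []):
            listener(message)


registry = ChannelRegistry()
orders, invoices = [], []
registry.subscribe("shop.orders", orders.append)
registry.subscribe("shop.invoices", invoices.append)

registry.publish("shop.orders", {"id": 1})
registry.publish("shop.Orders", {"id": 2})  # different name: nobody is listening
```

After this runs, `orders` holds only the first message and `invoices` stays empty, because the second publish used a channel name that differs in case.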
    65 == Message == 
     64=== Message === 
    6766Messages are the basis of all cluster content. When you send something to a queue it is in the form of a message. When retrieving content from a cache, you will receive a message. When executing a task within the cluster, it is represented by a message. When asynchronous multicast bulletins are distributed across the cluster, they are message instances. Everything in the cluster is a message. 
    8887Other than registration requirements, each message is a standard D class and operates in the normal fashion. It just has the additional abilities to appear and optionally ''behave'' upon other machines in the cluster. 
    90 == Queue == 
     89=== Queue === 
    9291A queue is a stash of messages. Each queue is identified by its channel name, thus each channel being queued will have a distinct queue instance. Messages are placed into the queue(s) via channel activity and retrieved in a similar manner. These latter two operations represent synchronous activity. Alternatively, message consumers (channel subscribers) can listen asynchronously for message activity, and have messages fed to them when queue activity occurs. Both approaches have their utility and the choice is yours to make. 
    9897Queues are persistent; they survive power failures. 
    100 == Cache == 
     99=== Cache === 
    102101Cache hosts store messages in much the same way as an associative array, or hash table, does. Messages are isolated by channel name, and are addressable by a key value. In Tango, this key is an array of the char type (char[]). 
    108107Cache instances are intended to be temporary only, thus the ''Tina'' implementation does not persist them. 
    110 == Task == 
     109=== Task === 
    112111A task is an executable message, and executes outside of the invoking process. In general, it will appear on one of the available task servers (in the cluster) and execute there before returning to the caller with results. This is a synchronous execution model. For a decoupled execution model, the task can be sent to a queue and hosted there until a subscriber retrieves and executes it. Replies from the decoupled model would generally be sent back via another queue, in the same manner as generic queue messages are replied to (see Queue above). 
    116115Task messages are also distinct in that they should be ''registered'' with the cluster. This means that the task message is an integral part of each task server, such that it can be executed there. In practice, there are two principal options available: statically link the task messages into each task server, or dynamically distribute and link them into each task server. Please note that dynamic linking is not currently available to D on all platforms, so the default ''Tina'' implementation takes the former route for now – registering with a task server is a matter of an import and a method call. 
    118 == Bulletin == 
     117=== Bulletin === 
    120119A bulletin is a notification-style message sent to all cluster participants, leveraging the most efficient underlying mechanisms available. These messages are limited in size (generally less than 1KB), and are intended to be simple and lightweight in nature. The Tina QOS uses bulletins for cluster discovery, queue activity notification, and cache coherence, with multicast as the distribution mechanism. When a bulletin is sent, ''all'' listeners on the same channel will receive it.  
    130129When a notification occurs, the arrival context and incoming message are made available via a parameter passed to the listener. In most notification cases, the arriving message is a single entity representing the notification itself. However, a queue notification will result in one or more queued messages being delivered. 
    132 = Client Usage =
     131== Client Usage ==
    134133In this section we’ll take a look at how to use the cluster features through code examples. The first step is to import an appropriate cluster. For these examples we’ll be using the Tina QOS provided, but for other implementations one would import the relevant package instead. Note that we’ll focus on the client side here, and the server side in a following section. 
    136 == Cache Client == 
     135=== Cache Client === 
    138137In this example we show how to use the cluster as a distributed cache. There are a number of operations available, though the general idea is illustrated here. Note that we pass the command-line arguments to the join() methods: this configures the cache with the full set of valid cache instances available. Unlike other facilities, cache instances are not self-discovering. 
    166 == Bulletin Client == 
     165=== Bulletin Client === 
    168167How to send and receive notifications across the cluster. These are sent to every listener on the specific broadcast channel. Take note that we create a callback function and pass it to the cluster as our bulletin consumer. 
    201 == Queue Pull Client == 
     200=== Queue Pull Client === 
    203202How to set up and use a queue in synchronous mode. We just place something into our queue and retrieve it: 
    226 == Queue Push Client == 
     225=== Queue Push Client === 
    228227Illustrates how to set up and use a queue in asynchronous mode. We provide a listener delegate to the cluster, invoked when subscribed content arrives in a queue (from anywhere on the cluster). 
    265 == Queue Reply Client == 
     264=== Queue Reply Client === 
    267266In this variation we queue a message in the cluster, receive it via a listener, reply to that message on a different channel and, finally, receive the reply. There are two listeners in this example: 
    301 == Task Client == 
     300=== Task Client === 
    303302Cluster task execution generally comprises three participants. First we create the task itself, generally in a distinct module. In this case we're demonstrating the use of an ''expression'' task: 
    348 = Tina =
     347== Tina ==
    350349Tina is the default QOS implementation, providing three distinct servers for handling each of queue, cache, and task requests. Source code is provided in the form of a toolkit, and one is expected to configure each server to specific needs. However, there are also example programs supplied, through which a working server can be constructed via a simple compilation. These examples are trivial front-ends to the server functionality, so there should be little difficulty in getting going. For example, here is ''qserver.d'' in full: 
    371370Each of these servers has a set of command-line options for configuring the amount of log data emitted and the server port number. If neither is specified, an appropriate default will be set. All cluster examples reside in the ''tango/example/cluster'' folder, and the modules therein are referred to by name in the following discussion. 
    373 == Queue Server == 
     372=== Queue Server === 
    375374The queue is straightforward to configure: compile the example module ''qserver.d'' and start it up. Each queue is written to a (distinct) file in the directory where the server is started from. This means that two queue-server instances cannot be started from the same directory, since the queue files are not shared. To instantiate multiple queue-servers on a single machine, start them from different directories. 
    377 == Cache Server == 
     376=== Cache Server === 
    379378The cache is also straightforward to configure: compile the example module ''cserver.d'' and start it up. When using Tina, cache clients require a set of ''server:port'' combinations in order to identify the set of valid cache servers. This is needed due to the nature of the distribution algorithm in use, which requires knowledge of all servers. If, for example, not all cache-instances were running when a client started, the cluster-wide cache would potentially be viewed differently by that particular client than another. Thus, when each cache-server is started, make a note of the port selected or configure it on a specific port. This list should be provided to the cache-clients when they are started. 
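The following Python sketch illustrates why every client needs the same server list. The distribution algorithm Tina actually uses is not described here, so the hash-modulo scheme, the `pick_server` helper, and the host names below are assumptions made purely for illustration.

```python
import zlib


def pick_server(key, servers):
    """Map a key to one server using a simple hash-modulo scheme (an assumption)."""
    if not servers:
        raise ValueError("at least one cache server is required")
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]


full_list = ["host-a:5001", "host-b:5002", "host-c:5003"]
partial_list = full_list[:2]  # a client that was not told about the third server

keys = ["key-%d" % i for i in range(100)]

# Keys for which the two clients would look on different servers.
disagreements = [
    k for k in keys
    if pick_server(k, full_list) != pick_server(k, partial_list)
]
```

Any key that the fully configured client places on the third server is looked up elsewhere by the client with the shorter list, so the two clients see different cache contents for that key.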
    387386In general, it would be considered good practice to isolate each task, or group of tasks, into distinct modules – if for no other reason than maintenance and ease of isolation. 
    389 == Logging == 
     388=== Logging === 
    391390The servers in Tina all use the Tango logging subsystem to report activity. By default the content is logged to the console only, but by adjusting the server configuration one can direct the log to various other targets, including files and so on. Each server is provided with a logger instance by the hosting application, and this is where such configuration should take place (adding an ''appender'', etc). Please see the documentation on logging for further details. 
    393 = Tech Notes =
     392== Tech Notes ==
    395394These are programming concerns which may help you get the most out of the cluster toolkit. 
    397 == Threads == 
     396=== Threads === 
    399398Cluster listeners are asynchronous by nature, being processed on a separate thread from the main program. When a bulletin notification arrives (''push''), a delegate provided by the client is invoked with sufficient information to retrieve the incoming message(s).  
    401400It is up to the client to take appropriate measures so that a notification is handled correctly when it arrives, given that the application is inherently multi-threaded at that point. We will likely add a module to convert these asynchronous notifications into events, once the event subsystem is in place. In that case, all asynchronous notifications would effectively be converted into synchronous notifications instead. 
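One common way to cope with listener callbacks arriving on another thread is to have the callback do nothing except hand the message to a thread-safe queue, which the main thread drains at its own pace. The Python sketch below shows that pattern only. `on_bulletin` and `fake_notifier` are hypothetical names, and the notifier thread is a stand-in for whatever thread the cluster would use.

```python
import queue
import threading

inbox = queue.Queue()  # thread-safe hand-off between listener and main thread


def on_bulletin(message):
    # Hypothetical listener: runs on the notification thread, so it only enqueues.
    inbox.put(message)


def fake_notifier():
    # Stand-in for the cluster's notification thread (an assumption for this sketch).
    for n in range(3):
        on_bulletin("bulletin-%d" % n)


worker = threading.Thread(target=fake_notifier)
worker.start()
worker.join()

# The main thread drains the queue and processes messages one at a time.
received = []
while not inbox.empty():
    received.append(inbox.get())
```

Because all processing happens on the main thread, application state touched while handling a message needs no further locking.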
    403 == Message Slicing == 
     402=== Message Slicing === 
    405404IO within Tina is multi-threaded. Rather than share a single set of IO buffers, each channel instance has its own set. This sidesteps any issues regarding thread-contention & synchronization, and enables Tina to avoid heap-allocation entirely for all network activity. This significantly reduces the memory footprint of your applications, avoids a common point of thread contention, removes clustering as a potential instigator of garbage collection, and generally limits the load placed upon the host computer. 
    409408This may become an issue where a client intends to store the message locally for a period of time, rather than process it immediately. The design trades off a large saving in GC pressure against the potential need for some message ''cloning'' as and when necessary – the act of copying an incoming message such that it is no longer considered transient. The message class has a clone() method specifically for this purpose, and it should be used accordingly. 
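The need for cloning can be illustrated with a small Python analogy. It assumes, as the section title suggests, that an incoming message aliases the channel's reusable buffer. The `receive` function and the 16-byte buffer are hypothetical, and `bytes(...)` plays the role that clone() plays for a real message.

```python
buffer = bytearray(16)  # stand-in for a channel's reusable IO buffer


def receive(payload):
    # Overwrite the shared buffer in place and return a view onto it (no copy).
    view = memoryview(buffer)
    view[: len(payload)] = payload
    return view[: len(payload)]


first = receive(b"alpha")
kept_view = first           # transient: still aliases the channel buffer
kept_clone = bytes(first)   # "clone": an independent copy that is safe to store

receive(b"omega")           # the next arrival reuses the same buffer
```

After the second arrival, `kept_view` reflects the new content while `kept_clone` still holds the original, which is why a message that must outlive its notification should be cloned first.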
    411 == Message Constraints == 
     410=== Message Constraints === 
    413412In order to successfully send a message it should generally be self-contained. That is – wherever a message is re-instantiated, the representation of it should not require the influence of any third party - it should support what's known as a default-constructor. 
    419418Shipping and executing unregistered tasks on the cluster will result in a remote exception, returned to the caller. However we expect to add a facility to install and register tasks dynamically, subject to potential security concerns. 
    421 == Registration and Hosting == 
     420=== Registration and Hosting === 
    423422Upon receipt of each incoming message, a cluster client requires a class instance to ''host'' the content. In most cases, the host is selected from the message registry where all your application message types were previously enrolled. This is not required for task messages, since the outgoing message instance is used to host the result also. For other message types though, the host is required. Instead of depending upon the registry, an application may manually supply an appropriate host as part of a cluster request. This can be convenient in some advanced uses, especially where the channel name maps directly to a specific message type (a one-to-one mapping between the channel and a message class). 
     424== Translations == 
     426 * [!502060A314B1A145!1601.entry Chinese] 
     428== User Comments ==