Last published: March 1, 2010 by 'mkobetic'
GridDrone is a generic package meant to be loaded on any number of clients. Once loaded, the client image is supposed to contact a system running a GridController, at which point it is added to that controller's drone pool.
The purpose of a GridController is to execute a job consisting of a number of tasks. It distributes the tasks among its drones, coordinates their execution and collects the results. A task therefore goes through the following phases:
1) Configuration: The drone is loaded with task-specific code and told what to do (#do:* variants)
2) Execution: The drone is instructed to execute the task, along with any parameters specific to the given task run (#go* variants)
3) Result Collection: The drone forwards the result of the task run using a callback. By default the callback is targeted at the controller using the predefined selector #reply:from:in:. Note that in step 2 the drone can be instructed to use a different reply destination and selector.
Steps 2 and 3 can be executed multiple times without reconfiguring the drone.
To allow stateful drones while remaining generic, each drone instance has its own namespace, and the task definition is compiled in it. Variables in that namespace are therefore functionally equivalent to drone instance variables.
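The following is a minimal Python sketch of the idea, not the Smalltalk implementation. It illustrates configuring a drone once, running it repeatedly, and keeping state in a namespace private to each drone. The names `Drone`, `configure`, `go` and `TASK_SOURCE` are hypothetical stand-ins for the #do:* and #go* variants, whose real signatures are not given here.

```python
class Drone:
    """Toy stand-in for a drone: one private namespace per instance."""

    def __init__(self):
        self.namespace = {}   # plays the role of the drone's own namespace
        self.task = None

    def configure(self, source):
        # Phase 1: compile the task definition inside this drone's namespace.
        exec(source, self.namespace)
        self.task = self.namespace["task"]

    def go(self, *args):
        # Phase 2: run the configured task; repeatable without reconfiguring.
        return self.task(*args)


# Hypothetical task definition. `total` lives in the drone's namespace,
# so it behaves like a drone instance variable.
TASK_SOURCE = """
total = 0

def task(n):
    global total
    total += n
    return total
"""

a, b = Drone(), Drone()
a.configure(TASK_SOURCE)
b.configure(TASK_SOURCE)

first = a.go(5)    # state starts accumulating in drone a
second = a.go(7)   # second run sees the state left by the first
other = b.go(1)    # drone b has its own, independent state
```

Because each drone compiles the same source into a separate namespace, `a` and `b` accumulate independently even though the task code is identical.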
To write a GridController, one is expected to subclass the generic GridController, implement the #run method that configures the drones and runs the individual tasks, and extend the #reply:from:in: callback to do whatever is expected with the incoming results.
To execute a job, instantiate a grid controller and invoke its #run method, then launch the necessary number of drones, telling each of them to connect to that specific controller (this should be doable with a clean image and command-line arguments).
The following scheme is implemented to help synchronize the controller with incoming results. Any drone that connects to the controller and is accepted is added to the controller's "pool". To run a job task, the controller takes a set of drones out of the pool (using #next), configures them and starts them. If it is desirable to wait for completion of the sub-tasks, the controller can suspend itself on an internal semaphore (using #wait). As the drones send their results back, the result callback implementation should invoke #return: to put each drone back into the pool. When the last drone returns to the pool, the semaphore is signaled, resuming the main controller thread.
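The pool and semaphore scheme can be modelled in a few lines of Python. This is only a sketch under assumptions: the class and the method names `add`, `next_drones`, `give_back`, `reply` and `wait` are hypothetical analogues of the pool, #next, #return:, the reply callback and #wait, and drones are simulated with local threads rather than remote images.

```python
import threading


class Controller:
    """Toy model of the drone pool and the completion semaphore."""

    def __init__(self):
        self.pool = []                       # idle drones
        self.out = 0                         # drones currently working
        self.lock = threading.Lock()
        self.done = threading.Semaphore(0)   # signaled when the last drone returns
        self.results = []

    def add(self, drone):
        with self.lock:
            self.pool.append(drone)

    def next_drones(self, count):
        # Analogue of #next: take drones out of the pool.
        with self.lock:
            taken, self.pool = self.pool[:count], self.pool[count:]
            self.out += len(taken)
        return taken

    def give_back(self, drone):
        # Analogue of #return: put the drone back; signal on the last one.
        with self.lock:
            self.pool.append(drone)
            self.out -= 1
            last = self.out == 0
        if last:
            self.done.release()

    def reply(self, result, drone):
        # Analogue of the result callback: record the result, then return the drone.
        with self.lock:
            self.results.append(result)
        self.give_back(drone)

    def wait(self):
        # Analogue of #wait: suspend until all taken drones are back.
        self.done.acquire()


def start_drone(controller, drone_id, n):
    # Simulated drone: computes n squared and calls back.
    def work():
        controller.reply(n * n, drone_id)
    threading.Thread(target=work).start()


controller = Controller()
for drone_id in range(3):
    controller.add(drone_id)

for drone_id in controller.next_drones(3):
    start_drone(controller, drone_id, drone_id + 1)

controller.wait()
results = sorted(controller.results)
```

The count of outstanding drones is set when they are taken from the pool, before any of them starts, so an early reply cannot signal the semaphore prematurely.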
The semaphore can optionally be replaced with a delay (using #timeout:), in which case the controller can be woken up by the timeout rather than by completion. The #cancel method can be used to notify the drones still working on their tasks that their results are no longer of interest.
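A timed wait followed by cancellation might look like the Python sketch below. It is an illustration only: `TimedController`, `start`, `wait(timeout=...)` and `cancel` are hypothetical analogues of #timeout: and #cancel, and the drones are simulated with local threads that poll a shared cancellation flag.

```python
import threading


class TimedController:
    """Toy model of waiting with a timeout and cancelling stragglers."""

    def __init__(self):
        self.lock = threading.Lock()
        self.outstanding = 0
        self.done = threading.Semaphore(0)
        self.cancelled = threading.Event()   # set by cancel()
        self.results = []

    def start(self, work, count):
        self.outstanding = count
        threads = [threading.Thread(target=work, args=(self, i))
                   for i in range(count)]
        for thread in threads:
            thread.start()
        return threads

    def reply(self, result):
        with self.lock:
            self.results.append(result)
            self.outstanding -= 1
            last = self.outstanding == 0
        if last:
            self.done.release()

    def wait(self, timeout=None):
        # Returns True on completion, False if the timeout expired first.
        return self.done.acquire(timeout=timeout)

    def cancel(self):
        # Tell drones still working that results are no longer wanted.
        self.cancelled.set()


def slow_task(controller, index):
    # Drone 0 answers at once; the others work until they are cancelled.
    if index == 0:
        controller.reply(index)
        return
    controller.cancelled.wait()


controller = TimedController()
threads = controller.start(slow_task, 3)

completed = controller.wait(timeout=0.2)
if not completed:
    controller.cancel()

for thread in threads:
    thread.join()
```

Here only one of the three simulated drones replies, so the wait ends by timeout and the controller cancels the remaining two.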