A tree of spill data is passed to each worker for every spill. A given worker knows only about the event and the information (see Data Structure) associated with its parents: run information and global information such as filenames. A worker is either a "map" or a "reduce" worker (see Design), and every worker is a class. Within the map step, processors are chained together in the style of pipeline programming: each worker modifies the data slightly and then passes it on to the next worker.
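The chaining of map workers can be sketched as follows. This is only an illustration under assumptions: the spill is modelled as a plain dict, and the worker classes `ShiftTimes` and `DropNegativeTimes` as well as the helper `run_map_chain` are hypothetical names that do not come from the framework.

```python
class ShiftTimes:
    """Hypothetical map worker: shifts every hit time by a fixed offset."""

    def __init__(self, offset):
        self.offset = offset

    def Process(self, spill):
        spill["times"] = [t + self.offset for t in spill["times"]]
        return spill


class DropNegativeTimes:
    """Hypothetical map worker: removes hit times below zero."""

    def Process(self, spill):
        spill["times"] = [t for t in spill["times"] if t >= 0]
        return spill


def run_map_chain(workers, spill):
    """Pass one spill through each worker in order (hypothetical helper)."""
    for worker in workers:
        # The output of one worker is the input of the next.
        spill = worker.Process(spill)
    return spill


chain = [ShiftTimes(-5), DropNegativeTimes()]
result = run_map_chain(chain, {"spill_number": 1, "times": [2, 7, 10]})
print(result)  # {'spill_number': 1, 'times': [2, 5]}
```

Each worker only sees the spill handed to it, so workers can be reordered or swapped without touching the others.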

Calls to Processors

There are three phases to the life of a worker: birth, life, and death.


Each worker can have a function called Birth(). It is recommended to put initialization code into Birth() rather than the constructor, so that the worker can be used many times.
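A minimal sketch of why initialization outside the constructor helps with reuse. The class `HitCounter`, its `threshold` parameter, and the dict-shaped spill are assumptions made for illustration, and the initialization hook is named `Birth()` here.

```python
class HitCounter:
    """Hypothetical map worker that counts hits above a threshold."""

    def __init__(self, threshold):
        # The constructor only stores configuration.
        self.threshold = threshold

    def Birth(self):
        # Per-run state is created here, so calling Birth() again
        # gives a clean worker without building a new object.
        self.count = 0

    def Process(self, spill):
        self.count += sum(1 for e in spill["energies"] if e > self.threshold)
        spill["hits_so_far"] = self.count
        return spill


counter = HitCounter(threshold=1.0)

counter.Birth()
first = counter.Process({"energies": [0.5, 1.5, 2.0]})

counter.Birth()  # reuse the same object for a second run
second = counter.Process({"energies": [3.0]})

print(first["hits_so_far"], second["hits_so_far"])  # 2 1
```

Had the counter been set in the constructor, the second run would have started from the first run's total.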


For maps: For each spill, Process(spill) is called and a spill is returned.
For reducers: The collection of spills is passed to Process(spill[]) and some value is returned. This value can be a number, the data passed as input, a history, or any other type.
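The difference between the two Process() signatures can be sketched like this. The classes `ScaleEnergies` and `TotalEnergy` and the dict-shaped spills are hypothetical, chosen only to show a map returning a spill and a reducer returning a number.

```python
class ScaleEnergies:
    """Hypothetical map worker: one spill in, one spill out."""

    def __init__(self, factor):
        self.factor = factor

    def Process(self, spill):
        scaled = [e * self.factor for e in spill["energies"]]
        return {**spill, "energies": scaled}


class TotalEnergy:
    """Hypothetical reduce worker: a collection of spills in, one value out."""

    def Process(self, spills):
        return sum(sum(s["energies"]) for s in spills)


spills = [{"energies": [1.0, 2.0]}, {"energies": [3.0]}]

mapper = ScaleEnergies(2.0)
mapped = [mapper.Process(s) for s in spills]  # called once per spill

total = TotalEnergy().Process(mapped)  # called once on all spills
print(total)  # 12.0
```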


When the map-reduce no longer needs the worker, Death() is called.
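Putting the three phases together, the order of calls on a single map worker might look like the sketch below. The driver `run_worker` and the class `RecordingWorker` are hypothetical, and wrapping the processing in `try`/`finally` so that Death() still runs after an error is a design choice of this example, not something the framework is known to do.

```python
class RecordingWorker:
    """Hypothetical worker that records which of its methods were called."""

    def __init__(self):
        self.calls = []

    def Birth(self):
        self.calls.append("Birth")

    def Process(self, spill):
        self.calls.append("Process")
        return spill

    def Death(self):
        self.calls.append("Death")


def run_worker(worker, spills):
    """Hypothetical driver: birth once, process every spill, death once."""
    worker.Birth()
    try:
        return [worker.Process(s) for s in spills]
    finally:
        # Assumption of this sketch: clean up even if Process() raised.
        worker.Death()


worker = RecordingWorker()
out = run_worker(worker, [{"spill_number": 1}, {"spill_number": 2}])
print(worker.calls)  # ['Birth', 'Process', 'Process', 'Death']
```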

Updated by Tunnell, Christopher over 10 years ago · 6 revisions