h1. Online reconstruction overview

{{>toc}}

"Architecture diagrams":http://micewww.pp.rl.ac.uk/attachments/855/20120315-OnlineReconstructionArchitecture.ppt

h2. Introduction

The MAUS framework executes four types of component. Each component creates or manipulates spills. In this context, a spill is a JSON document representing information about particles.

* Inputters - these input or generate spills e.g. read live DAQ data and convert it into spills, read archived DAQ data and convert it into spills, or generate simulated spill data.
* Transforms - these analyse the data within the spill in various ways, derive new data and add this to the spill.
* Mergers - these summarise data within a series of successive spills and output new spills with the summarised data.
* Outputters - these take in merged spills and output spills, for example saving them as JSON files or, where applicable, extracting image data from them and saving this as image files.

Basically, the framework repeats the following loop until there are no more spills (a minimal sketch of this loop follows the list):

* Reads a spill from an inputter.
* Transforms the spill.
* Passes the transformed spill to a merger.
* Passes a merged spill to an outputter.
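
A minimal, runnable Python-style sketch of this loop is below. The @Component@ and @Inputter@ classes and their @birth@/@process@/@death@/@emitter@ methods are placeholders intended only to show how spills flow between the four component types, not the actual MAUS API.

<pre>
# Hypothetical sketch of the framework's main loop, not the actual MAUS code.
class Component(object):
    """Placeholder for a transform, merger or outputter."""
    def birth(self, config_json):
        pass
    def process(self, spill):
        return spill
    def death(self):
        pass

class Inputter(Component):
    def emitter(self):
        yield '{"run_num": 1}'  # placeholder spill (a JSON document)

configuration = "{}"  # JSON configuration document (placeholder)
inputter, transform, merger, outputter = Inputter(), Component(), Component(), Component()

# Each component is initialised with the configuration before use.
for component in (inputter, transform, merger, outputter):
    component.birth(configuration)

# The main loop: read, transform, merge, output, one spill at a time.
for spill in inputter.emitter():
    spill = transform.process(spill)    # derive new data and add it to the spill
    merged = merger.process(spill)      # summarise data across successive spills
    outputter.process(merged)           # e.g. write JSON documents or image files

# Allow each component to clean up.
for component in (inputter, transform, merger, outputter):
    component.death()
</pre>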

h2. Parallel transformation of spills

As each spill is independent, spills can be transformed in parallel. Celery, a distributed asynchronous task queue for Python, is used to implement parallel transformation of spills. Celery uses RabbitMQ as a message broker to handle communications between clients and Celery worker nodes.

A Celery worker executes transforms. When a worker starts up it registers with RabbitMQ - the RabbitMQ task broker can be local or remote. By default Celery spawns N sub-processes, where N is the number of CPUs on the node, though N can be explicitly set by the user. Each sub-process can execute one transform at a time, and multiple Celery workers can be deployed, each with one or more sub-processes. Celery therefore allows MAUS to execute highly parallelised transformation of spills.
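
A minimal sketch of the client-side Celery setup, assuming a RabbitMQ broker on the local machine (the application name, broker URL and result back-end here are illustrative, not the actual MAUS configuration):

<pre>
from celery import Celery

# Point the Celery client at the RabbitMQ message broker; workers register
# with the same broker when they start up.
app = Celery("maus_example", broker="amqp://guest@localhost//", backend="rpc://")

# A worker would typically be started on each node with something like:
#   celery -A maus_example worker --concurrency=8
# where --concurrency overrides the default of one sub-process per CPU.
</pre>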

For MAUS, each Celery worker needs a complete MAUS deployment running on the same node as the worker. The MAUS distributed execution framework configures Celery as follows (a sketch of the client side of this broadcast follows the list):

* The framework gets the names of the transforms the user wants to apply e.g. @MapPyGroup(MapPyBeamMaker, MapCppSimulation, MapCppTrackerDigitization)@. This is termed a transform specification.
* A Celery broadcast is invoked, passing the transform specification, the MAUS configuration and a configuration ID (e.g. the client's process ID).
* Celery broadcasts are received by all Celery workers registered with the RabbitMQ message broker.
* On receipt of the broadcast, each Celery worker:
** Checks that the framework's MAUS version is the same as the worker's. If not then an error is returned to the client.
** Pushes the transform specification down to each sub-process.
** Waits for the sub-processes to confirm receipt.
** If all sub-processes update correctly then a success message is returned to the framework.
** If any sub-process fails to update then a failure message, with details, is returned to the framework.
* Each Celery sub-process:
** Invokes @death@ on the existing transforms, to allow for clean-up to be done.
** Updates its configuration.
** Creates new transforms as specified in the transform specification.
** Invokes @birth@ on these with the new configuration.
** Confirms with the Celery worker that the update has been done.
* Celery workers and sub-processes catch any exceptions they can, to prevent the sub-processes or, more seriously, the Celery worker itself from crashing in an unexpected way.
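
A hedged sketch of the client side of such a broadcast, assuming a custom worker control command named @birth@ has been registered on the workers (the command name, argument names and configuration shown here are illustrative, not necessarily those used by MAUS):

<pre>
import json
import os

# "app" is the Celery application configured with the RabbitMQ broker above.
maus_configuration = {}  # placeholder for the MAUS configuration document

# Broadcast the transform specification and configuration to every worker
# registered with the broker, and wait for each worker to reply.
replies = app.control.broadcast(
    "birth",
    arguments={
        "transform": "MapPyGroup(MapPyBeamMaker, MapCppSimulation)",
        "configuration": json.dumps(maus_configuration),
        "config_id": os.getpid(),
    },
    reply=True,
    timeout=300,
)

# Each reply identifies a worker and whether it (and its sub-processes)
# updated successfully; a missing or failed reply is treated as an error.
for reply in replies:
    print(reply)
</pre>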

MAUS uses Celery to transform spills as follows (a sketch of submitting and polling a transform job follows the list):

* The framework gets the next spill from its input.
* A Celery client-side proxy is used to submit the spill to Celery. It returns an object which can be used to poll the status of the "job".
* The client-side proxy forwards the spill to RabbitMQ.
* RabbitMQ forwards this to an available Celery worker. If none are available then the job is queued.
* The Celery worker picks an available sub-process.
* The sub-process executes the current transform on the spill.
* The result spill is returned to the Celery worker and from there back to RabbitMQ.
* The framework regularly polls the status of the transform job until its status is either successful, in which case the result spill is available, or failed, in which case the error is recorded but execution continues.
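
A minimal sketch of this submit-and-poll pattern using Celery's task API (the task name @execute_transform@ and its body are assumptions for illustration, not the actual MAUS task):

<pre>
import time

from celery import Celery

# Broker and result back-end as in the earlier sketch.
app = Celery("maus_example", broker="amqp://guest@localhost//", backend="rpc://")

@app.task
def execute_transform(spill_json):
    """Run the currently-configured transform on one spill (placeholder body)."""
    return spill_json  # a real worker would return the transformed spill

# Client side: submit a spill and poll the returned AsyncResult.
spill_json = '{"run_num": 1}'  # the spill to transform, as a JSON string
async_result = execute_transform.delay(spill_json)

while not async_result.ready():   # poll until the job succeeds or fails
    time.sleep(0.1)               # the framework does other work between polls

if async_result.successful():
    transformed_spill = async_result.result
else:
    print("Transform failed:", async_result.result)  # record the error, carry on
</pre>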

h2. Document-oriented database

After spills have been transformed, a document-oriented database, MongoDB, is used by the framework to store the transformed spills. This database represents the interface between the input-transform and merge-output phases of a spill processing workflow.

The framework is given the name of a collection of spills and reads these in order of the dates and times they were added to the database. It passes each spill to a merger and then takes the output of the merger and passes it to an outputter.

Use of a database allows the input-transform part of a workflow to be separate from the merge-output part. It also allows them to operate concurrently - one process can input and transform spills, another can merge transformed spills and output the merged results. This also allows many merge-output workflows to use the same transformed data, for example to generate multiple types of histogram from the same data.

h2. Histogram mergers

Histogram mergers take in spills and, from the data within the spills, update histograms. They regularly output one or more histograms (either on a spill-by-spill basis or every N spills, where N is configurable). The histogram is output in the form of a JSON document which includes (a sketch of such a document follows the list):

* A list of keywords.
* A description of the histogram.
* A tag which can be used to name a file when the histogram is saved. The tags can also be auto-numbered if the user wants.
* An image type e.g. EPS, PNG, JPG, or PDF. The image type is selected by the user.
* The image data itself in a base64-encoded format.
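
A hedged sketch of how such a document might be assembled (the field names here are illustrative; the real documents are produced by the merger classes listed below):

<pre>
import base64

# Read a histogram image previously rendered by, say, PyROOT or matplotlib,
# and base64-encode it so that it can be embedded in a JSON document.
with open("tof_hits.png", "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("ascii")

histogram_document = {
    "keywords": ["TOF", "hits"],
    "description": "TOF hit counts for the current run",
    "tag": "tof_hits",       # used to build the output file name
    "image_type": "png",     # EPS, PNG, JPG or PDF
    "data": encoded_image,   # the image itself, base64-encoded
}
</pre>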

Histogram mergers do not display or save the histograms. That is the responsibility of other components.

Example histogram mergers, and generic super-classes from which to build these, currently exist for histograms drawn using PyROOT (@ReducePyTOFPlot@ and @ReducePyROOTHistogram@) and matplotlib (@ReducePyHistogramTDCADCCounts@ and @ReducePyMatplotlibHistogram@).

h2. Saving images

An outputter (@OutputPyImage@) allows the JSON documents output by histogram mergers to be saved. The user can specify the directory where the images are saved and a file name prefix for the files. The tag in the JSON document is also used to create a file name.

The outputter extracts the base64-encoded image data, decodes it and saves it in a file. It also saves the JSON document (minus the image data) in an associated meta-data file. A minimal sketch of this decode-and-save step follows.
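
This sketch assumes the document layout shown earlier (field names such as @tag@, @image_type@ and @data@ are illustrative) and writes the image file plus a meta-data file alongside it:

<pre>
import base64
import copy
import json
import os

def save_histogram(document, directory, prefix):
    """Decode the base64 image in 'document' and write image and meta-data files."""
    file_stem = "%s%s" % (prefix, document["tag"])
    image_path = os.path.join(directory, "%s.%s" % (file_stem, document["image_type"]))
    meta_path = os.path.join(directory, "%s.json" % file_stem)

    # Write the decoded image data.
    with open(image_path, "wb") as image_file:
        image_file.write(base64.b64decode(document["data"]))

    # Write the meta-data document, minus the (large) image data.
    meta_data = copy.deepcopy(document)
    del meta_data["data"]
    with open(meta_path, "w") as meta_file:
        json.dump(meta_data, meta_file)
</pre>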

h2. Web front-end

The web front-end allows histogram images to be viewed. It is implemented in Django, a Python web framework. Django ships with its own lightweight web server, or it can be run under the Apache web server.

The web front-end serves up histogram images from a directory and supports keyword-based searches for images whose file names contain those keywords.

The web pages refresh dynamically, so updated images deposited into the image directory can be automatically presented to users.

The interface between the online reconstruction framework and the web front-end is just a set of image files and their accompanying JSON meta-data documents (though the web front-end can also render images without any accompanying JSON meta-data).

h2. Design details

h3. Run numbers

Each spill will be part of a run and have an associated run number. Run numbers are assumed to be as follows:

* -N : Monte Carlo simulation of run N
* 0 : pure Monte Carlo simulation
* +N : run N

h3. Transforming spills from an input stream (Input-Transform)

This is the algorithm used to transform spills from an input stream:
<pre>
CLEAR document store
run_number = None
WHILE an input spill is available
  GET next spill
  IF spill does not have a run number
    # Assume pure MC
    spill_run_number = 0
  IF (spill_run_number != run_number)
    # We've changed run.
    IF spill is NOT a start_of_run spill
      WARN user of missing start_of_run spill
    WAIT for current Celery tasks to complete
      WRITE result spills to document store
    run_number = spill_run_number
    CONFIGURE Celery by DEATHing current transforms and BIRTHing new transforms
  TRANSFORM spill using Celery
  WRITE result spill to document store
DEATH Celery worker transforms
</pre>

If there is no initial @start_of_run@ spill (or no @spill_num@ in the spills) in the input stream (as can occur when using @simple_histogram_example.py@ or @simulate_mice.py@) then @spill_run_number@ will be @0@, @run_number@ will be @None@ and a Celery configuration will be done before the first spill needs to be transformed.

Spills are inserted into the document store in the order of their return from Celery workers. This may not be in sync with the order in which they were originally read from the input stream.

h3. Merging spills and passing results to an output stream (Merge-Output)

This is the algorithm used to merge spills and pass the results to an output stream:
<pre>
run_number = None
end_of_run = None
is_birthed = FALSE
last_time = 01/01/1970
WHILE TRUE
  READ spills added since last_time from document store
  UPDATE last_time to the date of the most recent spill read
  FOR EACH spill read
    IF spill IS an end_of_run spill
      end_of_run = spill
    IF spill_run_number != run_number
      IF is_birthed
        IF end_of_run == None
          end_of_run = {"daq_event_type":"end_of_run", "run_num":run_number}
        SEND end_of_run to merger
        DEATH merger and outputter
      BIRTH merger and outputter
      run_number = spill_run_number
      end_of_run = None
      is_birthed = TRUE
    MERGE and OUTPUT spill
SEND end_of_run to merger
DEATH merger and outputter
</pre>

The Input-Transform policy of waiting for the processing of spills from a run to complete before starting to process spills from a new run means that all spills from run N-1 are guaranteed to have an earlier time stamp than spills from run N.

@is_birthed@ is used to ensure that there is no BIRTH-DEATH-BIRTH redundancy on receipt of the first spill from the document store.

h3. Document store

Spills are stored in documents in a collection in the document store.

Documents are of the form @{"_id":ID, "date":DATE, "doc":SPILL}@ where (a sketch using pymongo follows the list):

* ID: the index of this document in the chain of those successfully transformed. It has no significance beyond being unique in an execution of the Input-Transform loop which deposits the spill. It is not equal to the spill_num. A Python @string@.
* DATE: the date and time, to the millisecond, recording when the document was added. A Python @timestamp@.
* DOC: the spill document. A Python @string@ holding a valid JSON document.
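
A hedged sketch of this interface using pymongo (the database and collection names are illustrative; MAUS wraps this behind its own document-store API):

<pre>
import datetime
import json

import pymongo

client = pymongo.MongoClient("localhost", 27017)
collection = client["mausdb"]["somehost_12345"]  # illustrative collection name

# Input-Transform side: deposit a transformed spill.
spill = {"run_num": 1, "daq_event_type": "physics_event"}
collection.insert_one({"_id": "1",
                       "date": datetime.datetime.utcnow(),
                       "doc": json.dumps(spill)})

# Merge-Output side: read spills added since last_time, oldest first.
last_time = datetime.datetime(1970, 1, 1)
for document in collection.find({"date": {"$gt": last_time}}).sort("date", pymongo.ASCENDING):
    spill = json.loads(document["doc"])
</pre>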

h4. Collection names

For Input-Transform (a sketch of this rule follows the lists):

* If the configuration parameter @doc_collection_name@ is @None@, @""@, or @auto@ then @HOSTNAME_PID@ is used, where @HOSTNAME@ is the machine name and @PID@ is the process ID.
* Otherwise the value of @doc_collection_name@ is used.
* @doc_collection_name@ has the default value @spills@.

For Merge-Output:

* If the configuration parameter @doc_collection_name@ is @None@, @""@, or undefined then an error is raised.
* Otherwise the value of @doc_collection_name@ is used.
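
A minimal sketch of the Input-Transform rule, assuming the host name and process ID are combined as described (the helper function name is illustrative):

<pre>
import os
import socket

def get_collection_name(doc_collection_name="spills"):
    """Return the document store collection name used by Input-Transform."""
    if doc_collection_name in (None, "", "auto"):
        # e.g. "somehost_12345"
        return "%s_%s" % (socket.gethostname(), os.getpid())
    return doc_collection_name
</pre>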

h3. Miscellaneous comments

* Currently Celery timeouts are not used; transforming a spill takes as long as it takes.
* Celery's retry-tasks-on-failure option is not used. If the transformation of a spill fails the first time, it cannot be expected to succeed on a retry.
* If memory leaks arise, e.g. from C++ code, look at Celery rate limits, which allow the time, or number of tasks, before a sub-process is killed and respawned to be defined. Soft rate limits would allow @death@ to be run on the transforms first.