h1. Online Reconstruction Overview

"Manchego" is the set of online reconstruction components of MAUS. Manchego stands for "Manchego is Analysis No-nonsense Control room Helping Existentially Guided Onlineness".

The "Overview talk":https://micewww.pp.rl.ac.uk/attachments/816/20120209-MAUS-SSI-Status.ppt (09/02/12) contains a summary of the main concepts and architecture.

h2. Introduction

The MAUS framework consists of four types of component. Each component creates or manipulates spills. A spill is a JSON document representing information about particles.

* Inputters - these input or generate spills e.g. read DAQ data live and convert it into spills, read archived DAQ data and convert it into spills, or generate simulated spill data.
* Transforms - these analyse the data within a spill, derive new data and add it to the spill (a sketch of a transform follows this list).
* Mergers - these summarise data within a series of successive spills and output new spills containing the summarised data.
* Outputters - these output spills, for example saving them as JSON files or, where applicable, extracting image data from them and saving this as image files.
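The following minimal sketch shows the general shape of a transform, assuming the @birth@/@process@/@death@ life-cycle described later on this page. The class name, configuration parameter and spill fields are purely illustrative, not an actual MAUS module.

<pre>
import json

class MapPyExample(object):  # hypothetical transform
    """Illustrative transform: reads a spill, derives a value, adds it back."""

    def birth(self, config_document):
        # Configure the transform from the JSON configuration document.
        config = json.loads(config_document)
        self._scale = config.get("example_scale_factor", 1.0)  # assumed parameter
        return True

    def process(self, spill_document):
        # Derive new data from the spill and add it to the spill.
        spill = json.loads(spill_document)
        spill["example_derived_value"] = self._scale * len(spill.get("digits", []))
        return json.dumps(spill)

    def death(self):
        # Clean up before the transform is discarded.
        return True
</pre>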
h2. Parallel transformation of spills

As each spill is independent, spills can be transformed in parallel.

Celery, a distributed asynchronous task queue for Python, is used to support parallel transformation of spills. Celery uses RabbitMQ as a message broker to handle communication between clients and Celery worker nodes.

A Celery worker executes one or more sub-processes, each of which executes tasks (in the case of MAUS, the transforms). By default Celery spawns N sub-processes, where N is the number of CPUs on the host, but N can be explicitly set by the user.

Multiple Celery workers can be deployed, each with one or more sub-processes. Celery therefore supports highly-parallelisable applications.

MAUS configures Celery as follows:

* The framework uses reflection to get the names of the transforms the user wants to use e.g. @MapPyGroup(MapPyBeamMaker, MapCppSimulation, MapCppTrackerDigitization)@. This is termed a "transform specification".
* A Celery broadcast is invoked, passing the transform specification, the MAUS configuration and a configuration ID (e.g. the client's process ID).
* Celery broadcasts are received by all Celery workers registered with the RabbitMQ message broker.
* On receipt of the broadcast, each Celery worker:
** Checks that the client's MAUS version is the same as the worker's. If not, an error is returned to the client.
** Forces the transform specification down to each sub-process.
** Waits for the sub-processes to confirm receipt.
** If all sub-processes update correctly then a success message is returned to the client.
** If any sub-process fails to update then a failure message, with details, is returned to the client.
** Each Celery sub-process (a sketch of this update follows the list):
*** Invokes @death@ on the existing transforms, to allow clean-up to be done.
*** Updates its configuration to be the one received.
*** Creates new transforms as specified in the transform specification.
*** Invokes @birth@ on these with the new configuration.
*** Confirms with the Celery worker that the update has been done.
** Celery workers and sub-processes catch any exceptions they can, to prevent the sub-processes or, more seriously, the Celery worker itself crashing in an unexpected way.
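A minimal sketch of the update performed within each sub-process is given below; the function names are hypothetical (in particular @create_transform@, which is assumed to map a transform specification on to the corresponding transform classes), not the actual MAUS worker code.

<pre>
def update_transforms(current_transform, transform_specification, config_document):
    """Replace the sub-process's current transform using a new specification.

    current_transform       -- the transform object currently held.
    transform_specification -- e.g. "MapPyGroup(MapPyBeamMaker, MapCppSimulation)".
    config_document         -- the MAUS configuration as a JSON string.
    """
    # Allow the existing transform to clean up before it is discarded.
    current_transform.death()
    # Build the new transform from the specification (create_transform is
    # a hypothetical factory helper).
    new_transform = create_transform(transform_specification)
    # Configure the new transform with the received configuration.
    new_transform.birth(config_document)
    return new_transform
</pre>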
MAUS uses Celery as follows:

* A Celery client-side proxy is used to submit the spill to Celery. The proxy returns an object which can be used to poll the status of the "job" (see the sketch after this list).
* The client-side proxy forwards the spill to RabbitMQ.
* RabbitMQ forwards this to a Celery worker.
* The Celery worker picks a sub-process.
* The sub-process executes the current transform on the spill.
* The result spill is returned to the Celery worker and from there back to RabbitMQ.
* The MAUS framework regularly polls the status of the transform job until its status is successful, in which case the result spill is available, or failed, in which case the error is recorded but execution continues.
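The client-side pattern might look like the sketch below, assuming a Celery task named @execute_transform@ (the task name and its arguments are illustrative, not the actual MAUS task):

<pre>
import time

def transform_spill(spill_document, config_id):
    """Submit a spill to Celery and poll until the task succeeds or fails."""
    # delay() publishes the task to RabbitMQ and returns an AsyncResult proxy.
    result = execute_transform.delay(spill_document, config_id)
    while not result.ready():
        time.sleep(0.1)  # poll at a modest rate
    if result.successful():
        return result.result  # the transformed spill document
    # On failure the error (result.traceback) is recorded by the caller
    # and execution continues with the next spill.
    return None
</pre>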
h2. Document-oriented database

After spills have been transformed, a document-oriented database, MongoDB, is used to store the transformed spills. The database represents the interface between the input-transform and merge-output phases of a spill processing workflow.

Use of a database allows many merge-output clients to use the same data.

The MAUS framework is given the name of a collection of spills and reads these in order of the dates and times they were added to the database. It passes each spill to a merger and then takes the output of the merger and passes it to an outputter.
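A sketch of that read pattern using pymongo is shown below; the variable names are assumptions, and the document fields match the "Document store" description later on this page.

<pre>
import pymongo

def read_new_spills(collection, last_time):
    """Return spills added after last_time, oldest first, plus the new cut-off."""
    cursor = collection.find({"date": {"$gt": last_time}}).sort("date", pymongo.ASCENDING)
    spills = []
    for document in cursor:
        spills.append(document["doc"])  # the spill JSON document
        last_time = document["date"]    # remember the latest date seen
    return spills, last_time
</pre>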
h2. Histogram mergers

Histogram mergers take in spills and, from the data within the spills, update histograms. They regularly output one or more histograms (either on a spill-by-spill basis or every N spills, where N is configurable). The histogram is output in the form of a JSON document which includes (an illustrative example is given below):

* A list of keywords.
* A description of the histogram.
* A tag which can be used to name a file when the histogram is saved.
* An image type e.g. EPS, PNG, JPG, etc. The image type is a configuration option.
* The image data itself in base64-encoded format.

Histogram mergers do not display or save the histograms; they merely output the image and its meta-data.

Example histogram mergers, and generic super-classes from which to build these, currently exist for histograms drawn using PyROOT (@ReducePyTOFPlot@ and @ReducePyROOTHistogram@) and matplotlib (@ReducePyHistogramTDCADCCounts@ and @ReducePyMatplotlibHistogram@).
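An illustrative JSON document of this form is sketched below; the exact field names are assumptions based on the list above, not a definitive MAUS schema:

<pre>
{
  "keywords": ["TOF", "timing"],
  "description": "TOF station timing histogram",
  "tag": "tof_times",
  "image_type": "png",
  "data": "iVBORw0KGgoAAAANSUhEUg..."
}
</pre>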
h2. Saving images

An outputter (@OutputPyImage@) allows the JSON documents output by histogram mergers to be saved. The user can configure the directory where the images are saved and a file name prefix for the files. The tag in the JSON document is also used for the file name.

The outputter extracts the base64-encoded image data, decodes it and saves it in a file. It also saves the JSON document (minus the image data) in an associated meta-data file.
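A minimal sketch of this save step is given below, reusing the illustrative field names from the example above; it is not the actual @OutputPyImage@ implementation.

<pre>
import base64
import json
import os

def save_image_document(document, directory, prefix):
    """Write the decoded image and its meta-data (minus the image data) to files."""
    base_name = prefix + document["tag"]
    image_path = os.path.join(directory, base_name + "." + document["image_type"])
    meta_path = os.path.join(directory, base_name + ".json")
    # Decode and save the image itself.
    with open(image_path, "wb") as image_file:
        image_file.write(base64.b64decode(document["data"]))
    # Save the meta-data, dropping the bulky base64 image data.
    meta_data = dict((key, value) for key, value in document.items() if key != "data")
    with open(meta_path, "w") as meta_file:
        json.dump(meta_data, meta_file)
</pre>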
h2. Web front-end

A web front-end allows histogram images to be viewed in real time. The MAUS web front-end is implemented in Django, a Python web framework. Django ships with its own lightweight web server or can be run under the Apache web server.

The web front-end serves up histogram images from a directory and supports keyword-based searches for images of particular types.

Each web page dynamically refreshes, so updated images deposited into the image directory are automatically presented to users.

The interface between the online reconstruction framework and the web front-end is just a set of image files and their accompanying JSON meta-data documents (though the web front-end can render images without any accompanying JSON meta-data).
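A sketch of a keyword search over that directory interface is given below, again reusing the illustrative field names from the histogram merger example; it is not the actual Django view code.

<pre>
import glob
import json
import os

def find_images(image_directory, keyword):
    """Return paths of images whose accompanying meta-data mentions the keyword."""
    matches = []
    for meta_path in glob.glob(os.path.join(image_directory, "*.json")):
        with open(meta_path) as meta_file:
            meta_data = json.load(meta_file)
        if keyword in meta_data.get("keywords", []):
            # Assume the image sits alongside its meta-data file.
            matches.append(os.path.splitext(meta_path)[0] + "." + meta_data["image_type"])
    return matches
</pre>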
h2. Design in detail

h3. Run numbers

Run numbers are assumed to be as follows:

* -N : Monte Carlo simulation of run N
* 0 : pure Monte Carlo simulation
* +N : run N
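Expressed as a small helper (purely illustrative), the convention reads:

<pre>
def describe_run_number(run_number):
    """Interpret a run number according to the convention above."""
    if run_number == 0:
        return "pure Monte Carlo simulation"
    if run_number < 0:
        return "Monte Carlo simulation of run %d" % -run_number
    return "run %d" % run_number
</pre>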
h3. Transforming spills from an input stream (Input-Transform)

This is the algorithm used to transform spills from an input stream:

<pre>
CLEAR document store
run_number = None
WHILE an input spill is available
  GET next spill
  IF spill does not have a run number
    # Assume pure MC
    spill_run_number = 0
  IF (spill_run_number != run_number)
    # We've changed run.
    IF spill is NOT a start_of_run spill
      WARN user of missing start_of_run spill
    WAIT for current Celery tasks to complete
      WRITE result spills to document store
    run_number = spill_run_number
    CONFIGURE Celery by DEATHing current transforms and BIRTHing new transforms
  TRANSFORM spill using Celery
  WRITE result spill to document store
DEATH Celery worker transforms
</pre>

If there is no initial @start_of_run@ spill (or no @spill_num@ in the spills) in the input stream (as can occur when using @simple_histogram_example.py@ or @simulate_mice.py@) then @spill_run_number@ will be 0, @run_number@ will be None and a Celery configuration will be done before the first spill needs to be transformed.

Spills are inserted into the document store in the order of their return from Celery workers. This may not be in sync with the order in which they were originally read from the input stream.
h3. Merging spills and passing results to an output stream (Merge-Output)

This is the algorithm used to merge spills and pass the results to an output stream:

<pre>
run_number = None
end_of_run = None
is_birthed = FALSE
last_time = 01/01/1970
WHILE TRUE
  READ spills added since last_time from document store
  IF spill IS "end_of_run"
    end_of_run = spill
  IF spill_run_number != run_number
    IF is_birthed
      IF end_of_run == None
        end_of_run = {"daq_event_type":"end_of_run", "run_num":run_number}
      SEND end_of_run to merger
      DEATH merger and outputter
    BIRTH merger and outputter
    run_number = spill_run_number
    end_of_run = None
    is_birthed = TRUE
  MERGE and OUTPUT spill
SEND END_OF_RUN block to merger
DEATH merger and outputter
</pre>

The Input-Transform policy of waiting for the processing of spills from a run to complete before starting to process spills from a new run means that all spills from run N-1 are guaranteed to have a time stamp earlier than spills from run N.

@is_birthed@ is used to ensure that there is no BIRTH-DEATH-BIRTH redundancy on receipt of the first spill.
h3. Document store

Spills are stored in documents in a collection in the document store.

Documents are of the form @{"_id":ID, "date":DATE, "doc":SPILL}@ where (a sketch of writing such a document follows this list):

* ID: index of this document in the chain of those successfully transformed. It has no significance beyond being unique within an execution of the Input-Transform loop above. It is not equal to the spill_num. (Python @string@)
* DATE: date and time, to the millisecond, noting when the document was added. (Python @timestamp@)
* SPILL: the spill document. (Python @string@ holding a valid JSON document)
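A sketch of constructing and inserting such a document with pymongo is given below; the collection handle and the way the index and spill are obtained are illustrative only:

<pre>
import datetime

def write_spill(collection, index, spill_document):
    """Insert a transformed spill into the document store collection."""
    document = {
        "_id": str(index),                   # unique within this Input-Transform run
        "date": datetime.datetime.utcnow(),  # when the document was added
        "doc": spill_document                # the spill as a JSON string
    }
    collection.insert_one(document)
</pre>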
h4. Collection names

For Input-Transform:

* If configuration parameter @doc_collection_name@ is @None@, @""@, or @auto@ then @HOSTNAME_PID@, where @HOSTNAME@ is the machine name and @PID@ the process ID, is used.
* Otherwise the value of @doc_collection_name@ is used.
* @doc_collection_name@ has default value @spills@.

For Merge-Output:

* If configuration parameter @doc_collection_name@ is @None@, @""@, or undefined then an error is raised.
* Otherwise the value of @doc_collection_name@ is used.
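The name resolution described above might be sketched as follows; the function is illustrative (MAUS reads @doc_collection_name@ from its configuration):

<pre>
import os
import socket

def resolve_collection_name(doc_collection_name, merge_output=False):
    """Return the document store collection name for a given configuration value."""
    if merge_output:
        if doc_collection_name in (None, ""):
            raise ValueError("doc_collection_name must be defined for Merge-Output")
        return doc_collection_name
    if doc_collection_name in (None, "", "auto"):
        # Input-Transform: use machine name plus process ID.
        return "%s_%s" % (socket.gethostname(), os.getpid())
    return doc_collection_name
</pre>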
h3. Miscellaneous

* Currently Celery timeouts are not used; transforming a spill takes as long as it takes.
* Celery's retry-tasks-on-failure option is not used. If the transformation of a spill fails the first time it cannot be expected to succeed on a retry.
* If memory leaks arise, e.g. from C++ code, look at Celery's rate limits, which allow the time, or the number of tasks, before a sub-process is killed and respawned to be defined. Soft rate limits would allow @death@ to be run on the transforms first (see the sketch below).
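If such limits were ever needed, a worker configuration along the following lines could bound the lifetime of each sub-process. The setting names are Celery's own configuration options of that era; whether these are the right knobs and values for MAUS is an assumption, not a tested recommendation.

<pre>
# celeryconfig.py - sketch only
CELERYD_MAX_TASKS_PER_CHILD = 100   # respawn a sub-process after 100 tasks
CELERYD_TASK_SOFT_TIME_LIMIT = 300  # soft limit (seconds): the task can clean up first
CELERYD_TASK_TIME_LIMIT = 360       # hard limit (seconds): the sub-process is killed
</pre>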