h1. Online Reconstruction Overview

"Overview talk":https://micewww.pp.rl.ac.uk/attachments/816/20120209-MAUS-SSI-Status.ppt - 09/02/12 - contains a summary of the main concepts and architecture.

h2. Run numbers

Run numbers are assumed to be as follows:

* -N : Monte Carlo simulation of run N
* 0 : pure Monte Carlo simulation
* +N : run N
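
As a minimal illustration of this convention, here is a small Python sketch (the helper function is illustrative, not part of MAUS):

<pre>
def describe_run(run_number):
    """Interpret a run number according to the convention above."""
    if run_number < 0:
        return "Monte Carlo simulation of run %d" % -run_number
    elif run_number == 0:
        return "pure Monte Carlo simulation"
    else:
        return "run %d" % run_number
</pre>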

h2. Transforming spills from an input stream (Input-Transform)

This is the algorithm used to transform spills from an input stream:

<pre>
CLEAR document store
run_number = None
WHILE an input spill is available
  GET next spill
  IF spill does not have a run number
    # Assume pure MC
    spill_run_number = 0
  ELSE
    spill_run_number = run number from spill
  IF (spill_run_number != run_number)
    # We've changed run.
    IF spill is NOT a start_of_run spill
      WARN user of missing start_of_run spill
    WAIT for current Celery tasks to complete
      WRITE result spills to document store
    run_number = spill_run_number
    CONFIGURE Celery by DEATHing current transforms and BIRTHing new transforms
  TRANSFORM spill using Celery
  WRITE result spill to document store
DEATH Celery worker transforms
</pre>

If there is no initial @start_of_run@ spill (or no @spill_num@ in the spills) in the input stream (as can occur when using @simple_histogram_example.py@ or @simulate_mice.py@) then @spill_run_number@ will be 0, @run_number@ will be None, and a Celery configuration will be done before the first spill needs to be transformed.

Spills are inserted into the document store in the order of their return from Celery workers. This may not be in sync with the order in which they were originally read from the input stream.
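
A rough Python sketch of this loop is given below. The @doc_store@ object, the @transform_spill@ Celery task and the @configure_workers@ helper are illustrative placeholders rather than the actual MAUS classes, and error handling is omitted:

<pre>
import json
import logging

def input_transform(input_stream, doc_store, transform_spill, configure_workers):
    """Sketch of the Input-Transform loop; the interfaces are placeholders."""
    doc_store.clear()
    run_number = None
    pending = []  # Celery AsyncResults whose output has not yet been written
    for spill in input_stream:
        # Assume pure MC if the spill carries no run number.
        spill_run_number = spill.get("run_num", 0)
        if spill_run_number != run_number:
            if spill.get("daq_event_type") != "start_of_run":
                logging.warning("Missing start_of_run spill")
            # Wait for tasks from the previous run and save their results.
            for result in pending:
                doc_store.write(result.get())
            pending = []
            run_number = spill_run_number
            # DEATH the current transforms and BIRTH new ones on the workers.
            configure_workers(run_number)
        pending.append(transform_spill.delay(json.dumps(spill)))
    for result in pending:
        doc_store.write(result.get())
</pre>

For simplicity the sketch collects results in submission order; as noted above, the real loop writes spills in the order they are returned by the workers.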

h2. Merging spills and passing results to an output stream (Merge-Output)

This is the algorithm used to merge spills and pass the results to an output stream:

<pre>
run_number = None
end_of_run = None
is_birthed = FALSE
last_time = 01/01/1970
WHILE TRUE
  READ spills added since last_time from document store
  FOR EACH spill read
    IF spill IS "end_of_run"
      end_of_run = spill
    IF spill_run_number != run_number
      IF is_birthed
        IF end_of_run == None
          end_of_run = {"daq_event_type":"end_of_run", "run_num":run_number}
        SEND end_of_run to merger
        DEATH merger and outputter
      BIRTH merger and outputter
      run_number = spill_run_number
      end_of_run = None
      is_birthed = TRUE
    MERGE and OUTPUT spill
SEND end_of_run to merger
DEATH merger and outputter
</pre>

The Input-Transform policy of waiting for the processing of spills from a run to complete before starting to process spills from a new run means that all spills from run N-1 are guaranteed to have an earlier timestamp than spills from run N.

@is_birthed@ is used to ensure that there is no BIRTH-DEATH-BIRTH redundancy on receipt of the first spill.
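
A rough Python sketch of this loop is given below, assuming a @doc_store@ object with a @get_since(collection, time)@ method returning documents ordered by date; that interface, and the @merger@ and @outputter@ objects, are illustrative placeholders rather than the actual MAUS classes:

<pre>
import json
import time

def merge_output(doc_store, collection, merger, outputter):
    """Sketch of the Merge-Output loop; the interfaces are placeholders."""
    run_number = None
    end_of_run = None
    is_birthed = False
    last_time = 0  # i.e. 01/01/1970
    while True:
        for doc in doc_store.get_since(collection, last_time):
            last_time = doc["date"]
            spill = json.loads(doc["doc"])
            if spill.get("daq_event_type") == "end_of_run":
                end_of_run = spill
            spill_run_number = spill.get("run_num", 0)
            if spill_run_number != run_number:
                if is_birthed:
                    if end_of_run is None:
                        end_of_run = {"daq_event_type": "end_of_run",
                                      "run_num": run_number}
                    merger.process(json.dumps(end_of_run))
                    merger.death()
                    outputter.death()
                # BIRTH the merger and outputter for the new run
                # (configuration details omitted).
                merger.birth(spill_run_number)
                outputter.birth(spill_run_number)
                run_number = spill_run_number
                end_of_run = None
                is_birthed = True
            # MERGE and OUTPUT the spill.
            outputter.save(merger.process(doc["doc"]))
        time.sleep(1)  # poll the document store periodically
</pre>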

h2. Document store

Spills are stored in documents in a collection in the document store.

Documents are of the form @{"_id":ID, "date":DATE, "doc":SPILL}@ where:

* ID: index of this document in the chain of those successfully transformed. It has no significance beyond being unique within an execution of the Input-Transform loop above. It is not equal to the spill_num. (Python @string@)
* DATE: date and time, to the millisecond, noting when the document was added. (Python @timestamp@)
* DOC: spill document. (Python @string@ holding a valid JSON document)
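
For example, a document might look like this (all values illustrative, and the spill content abridged):

<pre>
{"_id": "42",
 "date": <date-time of insertion, to the millisecond>,
 "doc": '{"run_num": 1234, "spill_num": 7, ...}'}
</pre>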

h3. Collection names

For Input-Transform,

* If configuration parameter @doc_collection_name@ is @None@, @""@, or @auto@ then @HOSTNAME_PID@, where @HOSTNAME@ is the machine name and @PID@ the process ID, is used.
* Otherwise the value of @doc_collection_name@ is used.
* @doc_collection_name@ has default value @spills@.

For Merge-Output,

* If configuration parameter @doc_collection_name@ is @None@, @""@, or undefined then an error is raised.
* Otherwise the value of @doc_collection_name@ is used.
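
A small Python sketch combining both rules (the function and the way the configuration is accessed are illustrative only):

<pre>
import os
import socket

def get_collection_name(config, merge_output=False):
    """Resolve doc_collection_name according to the rules above (sketch)."""
    name = config.get("doc_collection_name")
    if merge_output:
        if name in (None, ""):
            raise ValueError("doc_collection_name must be defined for Merge-Output")
        return name
    if name in (None, "", "auto"):
        return "%s_%s" % (socket.gethostname(), os.getpid())
    return name
</pre>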

h2. Miscellaneous

* Currently Celery timeouts are not used; transforming a spill takes as long as it takes.
* The Celery retry-on-failure option for tasks is not used: if the transformation of a spill fails the first time, it cannot be expected to succeed on a retry.
* If memory leaks arise, e.g. from C++ code, look at the Celery worker limits (task time limits and maximum tasks per child), which allow the time a task may run, or the number of tasks a worker sub-process executes before it is killed and respawned, to be defined. Soft time limits would allow @death@ to be run on the transforms first.
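
For reference, a hypothetical @celeryconfig.py@ fragment using the Celery 2.x-era names for these settings; the option names and values are illustrative and should be checked against the Celery version in use:

<pre>
# Respawn a worker sub-process after it has executed this many tasks,
# bounding the impact of any memory leaks.
CELERYD_MAX_TASKS_PER_CHILD = 100

# Hard time limit: kill a task outright after this many seconds.
CELERYD_TASK_TIME_LIMIT = 600

# Soft time limit: raise SoftTimeLimitExceeded in the task first, giving
# the transform a chance to run death() before the hard limit is reached.
CELERYD_TASK_SOFT_TIME_LIMIT = 540
</pre>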