
Revision 11 (Rogers, Chris, 06 June 2013 08:02) → Revision 12/42 (Rogers, Chris, 06 June 2013 08:02)

h1. MLCR Deployment 

 There are three installations of MAUS in the MICE Local Control Room: 

 * The current version @.maus_release@ (bound to lp:maus) 
 * The previous version @.maus_release_old@ (bound to lp:maus) 
 * A development copy @.maus_control-room@ (bound to lp:~maus-mlcr/maus/control-room/) 

The current and previous versions are updated by leapfrogging. At the moment only the control-room version is used for online reconstruction, since the control-room code was never merged into the trunk. Note that the installations are in hidden folders (prefixed by '.'); to see them do <pre>ls -a</pre> There is a soft link that points to the "default" version (for use by shifters). This should usually point at the current version @.maus_release@.
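
As a sketch of this layout, rehearsed in a throwaway temporary directory (the soft-link name @maus@ is an assumption; check the real name on the MLCR machines with @ls -la@):

```python
# Rehearsal of the hidden-install layout in a temporary directory.
# The soft-link name "maus" is an assumption, not confirmed by the wiki.
import os
import tempfile

os.chdir(tempfile.mkdtemp())
for d in (".maus_release", ".maus_release_old", ".maus_control-room"):
    os.mkdir(d)
os.symlink(".maus_release", "maus")   # "default" version for shifters
print(os.readlink("maus"))            # prints ".maus_release"
```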

 h2. New version of main MAUS release 

 To update to a new version: 

 # Get permission from MOM 
 # First move the current MAUS install onto the fall back release 
 ## Do @mv .maus_release_old .maus_release_old_@ 
 ## Do @mv .maus_release .maus_release_old@ 
## Reconfigure the old install
### Do @cd .maus_release_old; ./configure@ and source the environment script
 ## Check it is okay - do @bash tests/unit_tests.bash; bash tests/application_tests.bash@ 
 # Next get the current MAUS release copy (or a version as specified by MAUS experts) 
 ## Do @bzr checkout lp:maus .maus_release@ or @bzr checkout lp:maus -rMAUS-vx.y.z .maus_release@ 
 ## Build the code @cd .maus_release; ./install_build_test.bash@ 
 ## Run the integration tests @bash tests/application_tests.bash >& test.log@ 
## Check that the online libraries are running okay: @python tests/integration/test_distributed_processing/
## Run @tests/integration/test_analyze_data_online/@ then bring up a browser and look at pretty plots.
 ## Check that the integration tests passed; check that analyze_data_online and test_multi_vs_single_threaded tests did not skip. 
# Check that the updated installation functions correctly by following the shifter instructions
 ** A cosmics run is ideal 
 # Add a note in the ELog 
 # Email MOGUL head and MOM to let them know that something changed 
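
The leapfrog moves in step 2 can be rehearsed with empty stand-in directories before touching the real installs (on the MLCR machines you would of course do this in the real install area, then rebuild and re-test as above):

```python
# Dry-run of the leapfrog moves using empty stand-in directories
# in a temporary area.
import os
import tempfile

os.chdir(tempfile.mkdtemp())
os.mkdir(".maus_release")
os.mkdir(".maus_release_old")
os.rename(".maus_release_old", ".maus_release_old_")  # park the old fall-back
os.rename(".maus_release", ".maus_release_old")       # current becomes the fall-back
print(sorted(os.listdir(".")))  # .maus_release is now free for the new checkout
```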

 h2. New version of control room branch 

 Then update the control-room branch: 

 * Check that the release was merged into the control-room branch 
 * Do <pre>bzr update</pre> 
* Check the README file to confirm that this is indeed the new version of the code
* Commit any changes. If appropriate, merge them back into lp:maus/merge
* Check that the updated installation functions correctly by following the shifter instructions


 If you want to use the Apache web server, then: 

* Clear out the web server media directories,
<pre>
cd /home/mice/
rm -rf media/thumbs/*
rm -rf media/raw/*
</pre>
* As super user, restart Apache,
<pre>
/usr/local/apache2/bin/apachectl restart
</pre>
* Check the logs,
<pre>
more /usr/local/apache2/logs/error_log
</pre>
* You should see something like,
<pre>
[Mon Mar 12 12:41:16 2012] [notice] Apache/2.2.22 (Unix) mod_wsgi/3.3
Python/2.7.2 configured -- resuming normal operations
</pre>
* If you get a warning like,
<pre>
httpd: Syntax error on line 55 of /usr/local/apache2/conf/httpd.conf:
Cannot load /usr/local/apache2/modules/ into server:
cannot open shared object file: No such file or directory
</pre>
** Then unset @MAUS_ROOT_DIR@ and try again:
<pre>
/usr/local/apache2/bin/apachectl restart
</pre>

h2. Go to the web site

 http://localhost:9000/maus/ if using Django web server. 

 http://localhost:80/maus/ if using Apache web server. 

 You should see a MAUS page listing no histograms. 

 h2. Run a quick web front-end test 

 Copy some sample image files into the web front-end media directory: 
<pre>
cp images/* media/raw
</pre>
 Refresh the web page. 

 Type @sample@ into the search form. 

 A new page should appear with two histograms. 

 Now, delete the images and the thumbnails that would have been auto-generated: 
<pre>
rm -rf media/thumbs/*
rm -rf media/raw/*
</pre>

 h2. Terminal 2 - Celery worker terminal 

 Start up a Celery worker that will use up to 8 cores: 

<pre>
cd /home/mice/
celeryd -c 8 -l INFO --purge
</pre>

 Wait for the Celery worker to start. This may take a minute or two.  

 When it is initialised it will display a start up message that will look similar to this: 
<pre>
The 'CELERY_AMQP_TASK_RESULT_EXPIRES' setting is scheduled for deprecation in
version 2.5 and removal in version v3.0.       CELERY_TASK_RESULT_EXPIRES
[2012-03-12 12:19:31,252: WARNING/MainProcess] discard: Erased 0 message from the queue.
[2012-03-12 12:19:31,253: WARNING/MainProcess] -------------- celery@miceonrec01a v2.5.1
---- **** -----
--- * ***    * -- [Configuration]
-- * - **** ---     . broker:        amqp://maus@localhost:5672/maushost
- ** ----------     . loader:        celery.loaders.default.Loader
- ** ----------     . logfile:       [stderr]@INFO
- ** ----------     . concurrency: 8
- ** ----------     . events:        OFF
- *** --- * ---     . beat:          OFF
-- ******* ----
--- ***** ----- [Queues]
 --------------     . celery:        exchange:celery (direct) binding:celery
  . mauscelery.maustasks.MausGenericTransformTask
[2012-03-12 12:19:31,269: INFO/MainProcess] MAUS version: MAUS release version 0.1.4
[2012-03-12 12:19:31,286: INFO/PoolWorker-1] child process calling
[2012-03-12 12:19:31,289: INFO/PoolWorker-1] Setting MAUS ErrorHandler to raise
[2012-03-12 12:19:31,307: INFO/PoolWorker-2] child process calling
[2012-03-12 12:19:31,311: INFO/PoolWorker-2] Setting MAUS ErrorHandler to raise
[2012-03-12 12:19:31,320: INFO/PoolWorker-3] child process calling
[2012-03-12 12:19:31,324: INFO/PoolWorker-3] Setting MAUS ErrorHandler to raise
[2012-03-12 12:19:31,325: INFO/PoolWorker-4] child process calling
[2012-03-12 12:19:31,329: INFO/PoolWorker-4] Setting MAUS ErrorHandler to raise
[2012-03-12 12:19:31,340: INFO/PoolWorker-5] child process calling
[2012-03-12 12:19:31,343: INFO/PoolWorker-5] Setting MAUS ErrorHandler to raise
[2012-03-12 12:19:31,356: INFO/PoolWorker-6] child process calling
[2012-03-12 12:19:31,359: INFO/PoolWorker-6] Setting MAUS ErrorHandler to raise
[2012-03-12 12:19:31,372: INFO/PoolWorker-7] child process calling
[2012-03-12 12:19:31,375: INFO/PoolWorker-7] Setting MAUS ErrorHandler to raise
[2012-03-12 12:19:31,389: INFO/PoolWorker-8] child process calling
[2012-03-12 12:19:31,392: INFO/PoolWorker-8] Setting MAUS ErrorHandler to raise
[2012-03-12 12:19:31,390: WARNING/MainProcess] celery@miceonrec01a has started.
</pre>
 Note that one Celery pool worker will be created for each of the 8 cores. 

 You can ignore any warnings like: 
<pre>
*** Break *** write on a pipe with no one to read it
</pre>

 h2. Run a quick Celery and MongoDB test 

 In terminal 3, test that Celery and the document store, MongoDB, are available. First check that the Celery worker has spawned 8 sub-processes: 

<pre>
$ ps -a
  PID TTY            TIME CMD
22903 pts/1      00:00:01 celeryd
22910 pts/1      00:00:00 celeryd
22911 pts/1      00:00:00 celeryd
22912 pts/1      00:00:00 celeryd
22913 pts/1      00:00:00 celeryd
22914 pts/1      00:00:00 celeryd
22915 pts/1      00:00:00 celeryd
22916 pts/1      00:00:00 celeryd
22917 pts/1      00:00:00 celeryd
</pre>
 There should be 9 entries. One will be the parent Celery process, and there should be 8 sub-processes. 
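
Rather than eyeballing the listing, the count can be checked mechanically. The helper below is illustrative, not part of MAUS; it is run here on an abridged three-line sample of @ps -a@ output, whereas on the MLCR you would feed it the full listing and expect 9:

```python
# Count the process entries whose command column is "celeryd".
# SAMPLE is an abridged stand-in for real "ps -a" output.
SAMPLE = """\
22903 pts/1 00:00:01 celeryd
22910 pts/1 00:00:00 celeryd
22911 pts/1 00:00:00 celeryd
"""

def count_celeryd(ps_output):
    """Return how many lines end with the command name 'celeryd'."""
    return sum(1 for line in ps_output.splitlines()
               if line.split() and line.split()[-1] == "celeryd")

print(count_celeryd(SAMPLE))   # 3 in this sample; expect 9 on the MLCR
```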

Now run a simple spill processing example:
<pre>
$ ./bin/examples/ -type_of_dataflow=multi_process
</pre>
Four spills should be processed and then the program will just sit there, at which point you can press CTRL-C.

Check there are histograms and JSON documents with meta-data:
<pre>
$ ls
</pre>

Check the database contains the associated documents:
<pre>
$ ./bin/utilities/ --database ALL
Database: mausdb
  Collection: spills : 4 documents (5776 bytes 5 Kb 0 Mb)
Database: local
  No collections
</pre>
 If this is the case then all is well! 

 h2. Terminal 3 - input-transform terminal 

 Now, start up the MAUS input-transform process: 

 If you haven't already, run this: 

<pre>
export DATE_DB_MYSQL_USER=daq
export DATE_DB_MYSQL_PWD=daq
export DATE_DB_MYSQL_HOST=miceacq07
export DATE_SITE=/dateSite
export DATE_HOSTNAME=`hostname`
</pre>

 Now, start up the input-transform process which continuously reads the data, transforms it and stores it in a MongoDB database: 

<pre>
source /home/mice/
$MAUS_ROOT_DIR/bin/user/ -type_of_dataflow=multi_process_input_transform
</pre>

 h2. Terminal 4 - merge-output terminal 

 Start up the MAUS merge-output process, which continuously reads the transformed data from MongoDB, merges it to update histograms and outputs these histograms: 

<pre>
source /home/mice/
source /home/mice/
$MAUS_ROOT_DIR/bin/user/ -type_of_dataflow=multi_process_merge_output
</pre>

 h2. What you should see 

 These are examples of the sort of output you can expect to see during online operation, *if all is running well*. 

 For errors and possible recovery options, see [[MAUSCeleryRabbitMQRecovery|Distributed spill transformation troubleshooting and recovery]]. 

 h3. Web page 

 Once operation begins, the MAUS web page will display a list of available images. If you type a keyword e.g. @tof@ into the search box then a new page will appear with the images. This page will automatically refresh as the images are updated. 

 h3. Terminal 2 - Celery worker terminal 

The Celery worker terminal displays information every time a spill is received by the Celery worker and the worker starts to execute it, e.g.:

<pre>
[2012-03-12 14:41:34,497: INFO/MainProcess] Got task from broker:
[2012-03-12 14:41:34,533: INFO/PoolWorker-4] None[92dd453d-549e-4ed8-9d3e-f9e820056e2c]:
Task invoked by
</pre>

 The Celery worker displays the host that submitted the task and the process ID. 

 When the transform workers have completed and the Celery worker is ready to return the result spill, this will be logged e.g.: 

<pre>
[2012-03-12 14:41:34,548: INFO/MainProcess]
Task mauscelery.maustasks.MausGenericTransformTask[92dd453d-549e-4ed8-9d3e-f9e820056e2c]
succeeded in 0.0160460472107s: '{"daq_data":null,"daq_event_type":"end_of_bur...
</pre>

 h3. Terminal 3 - input-transform terminal 

 After MAUS starts up: 

<pre>
Welcome to MAUS:
        Process ID (PID): 23685
        Program Arguments: ['./bin/user/', '-type_of_dataflow=multi_process_input_transform']
        Version: MAUS release version 0.1.4
</pre>

 The MAUS framework will check for active Celery workers e.g.: 

<pre>
Purging document store
Checking for active Celery nodes...
Number of active Celery nodes: 1
</pre>

 If there are no Celery workers available the MAUS framework will exit. 

 If there are Celery workers then the input worker will be configured. 

 It will then enter a loop which operates as follows: 

 * Read a spill from the input worker. 
 * Check for a new run. 
 * If a new run is detected then, 
 ** Wait for any transforms currently being executed by Celery to complete. 
 ** Death the transformers. 
 ** Birth the transformers. 
 * Submit a spill to a Celery worker for execution of the transform (map) workers. 
 * Check the status of the Celery worker. 
 * Get the result spill from the Celery worker when it is available. 
 * Put the result spill into the MongoDB database. 
 * Check spills remaining from the input worker. 

 For example: 

<pre>
INPUT: read next spill
Spills input: 1 Processed: 0 Failed 0
New run detected...waiting for current processing to complete
---------- RUN 3386 ----------
Configuring Celery nodes and birthing transforms...
Celery nodes configured!
TRANSFORM: processing spills
Task ID: ddbf57a0-17a3-4abd-8b10-d340fbfd31b9
1 spills left in buffer
 Celery task ddbf57a0-17a3-4abd-8b10-d340fbfd31b9 SUCCESS
   SAVING to collection spills (with ID 1)
Spills input: 1 Processed: 1 Failed 0
INPUT: read next spill
Spills input: 2 Processed: 1 Failed 0
Task ID: 95b0e098-952d-4da4-8b9c-39dd2c2a514a
1 spills left in buffer
INPUT: read next spill
Spills input: 3 Processed: 1 Failed 0
Task ID: 426ffdfa-11e7-43e8-a36a-ad07e5aebe3a
1 spills left in buffer
INPUT: read next spill
Spills input: 4 Processed: 1 Failed 0
Task ID: 0d83b05d-00f6-4269-ae1f-c2e592d8d01e
</pre>

 The input buffer always has 1 spill until the final spill is processed. 

The MAUS framework maintains a count of the spills it has read from its input, those successfully transformed, and those for which the transform failed. The count of processed spills is used as an index for the results in MongoDB, but note that this is not the same as the "spill_num" which occurs within a spill.
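
The input-transform loop above can be sketched as follows; every name in the sketch is an illustrative stand-in, not the real MAUS, Celery or MongoDB API:

```python
# Sketch of the input-transform loop. submit() and store() stand in for
# Celery task submission and the MongoDB insert respectively.

def run_input_transform(spills, submit, store):
    """Read spills, rebirth transforms on a run change, submit each
    spill for transformation and store the result."""
    current_run = None
    processed = 0
    for spill in spills:
        if spill["run"] != current_run:
            # Real system: wait for in-flight Celery tasks to complete,
            # then death and re-birth the transforms on every node.
            current_run = spill["run"]
        result = submit(spill)    # stand-in for Celery task submission
        store(processed, result)  # the processed count is the MongoDB index,
        processed += 1            # distinct from the spill's own "spill_num"
    return processed

# Usage with in-memory stand-ins:
db = {}
n = run_input_transform(
    spills=[{"run": 3386, "spill_num": i} for i in range(4)],
    submit=lambda s: dict(s, transformed=True),
    store=db.__setitem__,
)
print(n, db[0]["transformed"])   # 4 True
```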

 When all the spills from the input have been processed, a message is printed: 
<pre>
Requesting Celery nodes death transforms...
Celery node transforms deathed!
TRANSFORM: transform tasks completed
INPUT: Death
</pre>

 h3. Terminal 4 - merge-output terminal 

 After MAUS starts up: 

<pre>
Welcome to MAUS:
        Process ID (PID): 23700
        Program Arguments: ['./bin/user/', '-type_of_dataflow=multi_process_merge_output']
        Version: MAUS release version 0.1.4
-------- MERGE OUTPUT --------
</pre>

 It will then enter a loop which interleaves: 

 * Read spills added to MongoDB since the last read. 
 * For each spill, 
 ** Check for a new run. 
 ** If a new run is detected then, 
 *** Death the mergers and outputters. 
 *** Birth the mergers and outputters. 
 ** Pass the spill to the merger. 
 ** Pass the result to the outputter. 

 For example: 

<pre>
Read spill 1 (dated 2012-03-12 14:50:28.628000)
Change of run detected
---------- START RUN 0 ----------
BIRTH merger ReducePyTOFPlot.ReducePyTOFPlot
BIRTH outputer OutputPyImage.OutputPyImage
Executing Merge for spill 1
Executing Output for spill 1
Spills processed: 1
Read spill 2 (dated 2012-03-12 14:50:28.657000)
Executing Merge for spill 2
Executing Output for spill 2
Spills processed: 2
Read spill 3 (dated 2012-03-12 14:50:28.686000)
Executing Merge for spill 3
Executing Output for spill 3
Spills processed: 3
Read spill 4 (dated 2012-03-12 14:50:28.711000)
Executing Merge for spill 4
Executing Output for spill 4
Spills processed: 4
</pre>

 The spill number shown is the spill ID as described above. 
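
The merge-output loop can be sketched in the same spirit; again the names are illustrative stand-ins, not the real MAUS merger and outputter classes:

```python
# Sketch of the merge-output loop. Each batch stands in for the spills
# added to MongoDB since the last read; merge() and output() stand in
# for the merger and outputter workers.

def run_merge_output(batches, merge, output):
    """For each newly-read spill, rebirth the merger/outputter on a run
    change, then merge the spill and output the result."""
    current_run = None
    processed = 0
    for batch in batches:                 # spills added since the last read
        for spill in batch:
            if spill["run"] != current_run:
                # Real system: death then birth the mergers/outputters.
                current_run = spill["run"]
            output(merge(spill))          # merger updates histograms,
            processed += 1                # outputter writes the images
    return processed

# Usage with in-memory stand-ins:
images = []
n = run_merge_output(
    batches=[[{"run": 0, "id": 1}, {"run": 0, "id": 2}],
             [{"run": 0, "id": 3}, {"run": 0, "id": 4}]],
    merge=lambda s: ("histogram-after-spill", s["id"]),
    output=images.append,
)
print(n, len(images))   # 4 4
```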

At present this loop does not terminate. So, if you are sure that all the spills have been processed, press CTRL-C. The mergers and outputters will be deathed before the program exits.

 h3. Terminal 1 - web server terminal (if using Django web server) 

 After the Django web server starts up: 

<pre>
Validating models...
0 errors found
Django version 1.3.1, using settings 'mausweb.settings'
Development server is running at http://localhost:9000/
Quit the server with CONTROL-C.
</pre>

This will display a list of HTTP requests that have been received, e.g.:

<pre>
[12/Mar/2012 14:55:21] "GET /maus/ HTTP/1.0" 200 5335
[12/Mar/2012 14:55:41] "GET /maus/imagetof_hit_y.eps/png HTTP/1.0" 200 10647
</pre>

 Do not worry if the image names or paths are different. 

 Occasionally you may see a stack trace (see issue #811) - this usually means that the web server has been interrupted mid-request.