Distributed spill transformation clients and tools

MAUS comes with a number of example offline and online reconstruction and simulation scripts. These are in the bin/ directory.

Distributed spills and MAUS scripts

Scripts that use the MAUS framework (Go.py) to execute input-transform-merge-output actions on spills are configured to use distributed spill processing via command-line arguments.

If you have set up RabbitMQ, Celery and MongoDB for distributed spill processing, you can tell MAUS scripts to use them by passing one of the following flags:

  • -type_of_dataflow=multi_process - MAUS will input and transform spills using Celery and put the results in MongoDB. When there are no more input spills, the transformed spills will be read from MongoDB, merged and output.
  • -type_of_dataflow=multi_process_input_transform - MAUS will input and transform spills using Celery and put the results in MongoDB. When there are no more input spills, the framework will exit.
  • -type_of_dataflow=multi_process_merge_output - MAUS will read transformed spills from MongoDB, merge these and output the merged results.

Recommended configuration for online reconstruction or small data sets

Run two instances of the script, one configured with multi_process_input_transform and the other with multi_process_merge_output. This allows spills to be input and transformed while previously transformed spills are being merged and output. For example:

$ ./bin/user/reconstruct_daq.py -type_of_dataflow=multi_process_input_transform \
-daq_data_file="03386.000" -daq_data_path=/home/user/data/ 

and
$ ./bin/user/reconstruct_daq.py -type_of_dataflow=multi_process_merge_output
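
The two invocations above could also be started together from a small Python driver. The following is a minimal sketch, not part of MAUS: the helper names build_commands and run_concurrently are hypothetical, and it assumes it is run from the MAUS root directory so that ./bin/user/reconstruct_daq.py resolves.

```python
import subprocess

SCRIPT = "./bin/user/reconstruct_daq.py"


def build_commands(daq_data_file, daq_data_path):
    """Return the argument lists for the two cooperating MAUS instances."""
    input_transform = [
        SCRIPT,
        "-type_of_dataflow=multi_process_input_transform",
        "-daq_data_file=" + daq_data_file,
        "-daq_data_path=" + daq_data_path,
    ]
    merge_output = [SCRIPT, "-type_of_dataflow=multi_process_merge_output"]
    return input_transform, merge_output


def run_concurrently(commands):
    """Start every command without waiting, then collect the exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


# Example (hypothetical usage, requires a working MAUS installation):
# exit_codes = run_concurrently(build_commands("03386.000", "/home/user/data/"))
```

Note that wait() blocks until each process exits, so if the merge-output instance is left running, the call will not return until that instance is stopped.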

Recommended configuration for small data sets

Either the above configuration or a single script invocation using multi_process can be used, e.g.

$ ./bin/user/reconstruct_daq.py -type_of_dataflow=multi_process \
 -daq_data_file="03386.000" -daq_data_path=/home/user/data/ 

Configuring MongoDB document store usage

By default the MongoDB document store is used. This is set via the configuration parameter:

doc_store_class = "docstore.MongoDBDocumentStore.MongoDBDocumentStore" 

in src/common_py/ConfigurationDefaults.py.
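
The value is a dotted path of the form MODULE.CLASS. How MAUS resolves it is not described here; a common way to turn such a string into a class in Python is sketched below. The function name load_class is hypothetical, and a standard-library class is used so that the sketch runs without MAUS installed.

```python
import importlib


def load_class(dotted_path):
    """Import the module part of dotted_path and return the named class."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class; with MAUS on the Python path the
# same call could be given the doc_store_class value instead (an assumption).
ordered_dict_class = load_class("collections.OrderedDict")
```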

By default, MAUS uses a MongoDB server at localhost:27017, a database named mausdb and a collection named spills. These defaults are set in src/common_py/ConfigurationDefaults.py and can be overridden with the following command-line flags:

  • -mongodb_host - MongoDB host name.
  • -mongodb_port - MongoDB port number.
  • -mongodb_database_name - MongoDB database name. This will be created by MAUS if it does not already exist.
  • -doc_collection_name - document collection name. This will be created by MAUS in the database named above if it does not already exist.

If you specify a value of -doc_collection_name="auto" when using multi_process_input_transform or multi_process then a collection name will be auto-generated of the form HOSTNAME-PROCESSID e.g. maus.epcc.ed.ac.uk-24542.
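
A name of the HOSTNAME-PROCESSID form can be built as in the sketch below. This only illustrates the format; the function name auto_collection_name is hypothetical and MAUS's own implementation may differ, for example in whether it uses the short or the fully qualified host name (socket.getfqdn() returns the latter).

```python
import os
import socket


def auto_collection_name():
    """Build a HOSTNAME-PROCESSID style name for the current process."""
    return "%s-%d" % (socket.gethostname(), os.getpid())


print(auto_collection_name())
```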

Using an in-memory document store

If running a script using the multi_process option then you can use an in-memory document store instead of MongoDB. This is done by providing the argument:

-doc_store_class="docstore.InMemoryDocumentStore.InMemoryDocumentStore" 

For example,
$ ./bin/user/reconstruct_daq.py -type_of_dataflow=multi_process -daq_data_file="03386.000" \
-daq_data_path=/home/user/data/ \
-doc_store_class="docstore.InMemoryDocumentStore.InMemoryDocumentStore" 

This is only recommended if processing a very small set of spills.

Configuring the OutputPyImage image directory

The default output directory for image files created by OutputPyImage is either:

  • The value of the environment variable MAUS_WEB_MEDIA_RAW, if this variable is set. This variable is set by the MAUS web front end. This default allows OutputPyImage to deposit images ready for serving by the MAUS web front end.
  • The current directory in which MAUS is running if the environment variable is not set.

You can explicitly set another directory using the -image_directory=DIRECTORY command-line flag.
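
The order of precedence described above can be sketched as follows. This is an illustration rather than the OutputPyImage code: the function name resolve_image_directory is hypothetical, and treating an empty MAUS_WEB_MEDIA_RAW value as "not set" is an assumption.

```python
import os


def resolve_image_directory(image_directory=None):
    """Pick the image directory in the order described above."""
    if image_directory:
        # Explicit -image_directory=DIRECTORY value takes priority.
        return image_directory
    web_media_raw = os.environ.get("MAUS_WEB_MEDIA_RAW")
    if web_media_raw:
        # Set by the MAUS web front end.
        return web_media_raw
    # Fall back to the directory in which the process is running.
    return os.getcwd()
```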

Other scripts

Summarise MongoDB databases - summarise_mongodb.py

This prints out the collections in a MongoDB database, along with the number of documents in each collection and the memory these occupy. The syntax is:

$ ./bin/utilities/summarise_mongodb.py -h
usage: summarise_mongodb.py [-h] [--url URL] [--database DATABASE]

Summarise the contents of MongoDB.

optional arguments:
  -h, --help           show this help message and exit
  --url URL            MongoDB URL
  --database DATABASE  Database name or ALL for all

The default URL is the MongoDB default of localhost:27017 and the default database is mausdb.

For example:

$ ./bin/utilities/summarise_mongodb.py
Database: mausdb
  Collection: spills : 174 documents (145732000 bytes 142316 Kb 138 Mb)

To print information about another database, use --database NAME, e.g.
$ ./bin/utilities/summarise_mongodb.py --database mydb
Database: mydb
  Collection: spills : 10 documents (71928796 bytes 70242 Kb 68.5 Mb)

To print information about all databases, use --database ALL, e.g.
$ ./bin/utilities/summarise_mongodb.py --database ALL
Database: mausdb
  Collection: spills : 174 documents (145732000 bytes 142316 Kb 138 Mb)
Database: mydb
  Collection: spills : 10 documents (71928796 bytes 70242 Kb 68.5 Mb)

Delete MongoDB collections and databases - delete_mongodb.py

This client can be used to delete a collection from a MongoDB database or to delete an entire database (including all its collections). The syntax is:

$ ./bin/utilities/delete_mongodb.py -h
usage: delete_mongodb.py [-h] [--url URL] [--database DATABASE]
                         [--collection COLLECTION]

Delete a collection from MongoDB.

optional arguments:
  -h, --help            show this help message and exit
  --url URL             MongoDB URL
  --database DATABASE   Database
  --collection COLLECTION
                        Collection

The default URL is the MongoDB default of localhost:27017 and the default database is mausdb.

For example, to delete the collection spills from the database mausdb, use

$ ./bin/utilities/delete_mongodb.py --database mausdb --collection spills

To delete the entire database mausdb, use

$ ./bin/utilities/delete_mongodb.py --database mausdb 
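
The same two operations can be performed directly from Python with pymongo. The sketch below is not the implementation of delete_mongodb.py; the function name delete_from_mongodb is hypothetical, and the client argument is assumed to behave like pymongo.MongoClient, whose drop_database method and per-database drop_collection method are used.

```python
def delete_from_mongodb(client, database, collection=None):
    """Drop one collection, or the whole database if no collection is given.

    Returns a short string saying which kind of object was dropped.
    """
    if collection is None:
        client.drop_database(database)
        return "database"
    client[database].drop_collection(collection)
    return "collection"


# Example (assumes pymongo is installed and a server is listening locally):
# from pymongo import MongoClient
# delete_from_mongodb(MongoClient("localhost", 27017), "mausdb", "spills")
```

Both operations are irreversible, so it may be worth running summarise_mongodb.py first to confirm what the database contains.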

Updated by Jackson, Mike over 11 years ago · 6 revisions