Distributed spill transformation clients and tools¶
MAUS comes with a number of example offline and online reconstruction and simulation scripts. These are in the bin/ directory.
Distributed spills and MAUS scripts¶
Scripts that use the MAUS framework (Go.py) to execute input-transform-merge-output actions on spills are configured to use distributed spill processing via command-line arguments.
If you have set up RabbitMQ, Celery and MongoDB for distributed spill processing then you can tell MAUS scripts to use them by passing one of the following flags:
- -type_of_dataflow=multi_process: MAUS will input and transform spills using Celery and put the results in MongoDB. When there are no more input spills, the transformed spills will be read from MongoDB, merged and output.
- -type_of_dataflow=multi_process_input_transform: MAUS will input and transform spills using Celery and put the results in MongoDB. When there are no more input spills the framework will exit.
- -type_of_dataflow=multi_process_merge_output: MAUS will read transformed spills from MongoDB, merge these and output the merged results.
Recommended configuration for online reconstruction or large data sets¶
Run two instances of the script, one configured with multi_process_input_transform and the other with multi_process_merge_output. This allows spills to be input and transformed while previously transformed spills are being merged and output. For example:
$ ./bin/user/reconstruct_daq.py -type_of_dataflow=multi_process_input_transform \
    -daq_data_file="03386.000" -daq_data_path=/home/user/data/
and
$ ./bin/user/reconstruct_daq.py -type_of_dataflow=multi_process_merge_output
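The two invocations above can also be started from a small wrapper script. The following is a minimal sketch, not part of MAUS: the helper names build_command and run_concurrently are hypothetical, and the only flags used are the ones shown above. How and when the merge-output instance exits is not covered here, so the wait in run_concurrently may block until you stop that process yourself.

```python
import subprocess

# Path taken from the examples above; adjust to your MAUS checkout.
SCRIPT = "./bin/user/reconstruct_daq.py"


def build_command(dataflow, **options):
    """Build an argument list in the -name=value style used by MAUS scripts.

    This is a hypothetical helper for illustration only.
    """
    args = [SCRIPT, "-type_of_dataflow=%s" % dataflow]
    # Sort the options so the resulting command line is deterministic.
    for name, value in sorted(options.items()):
        args.append("-%s=%s" % (name, value))
    return args


def run_concurrently(commands):
    """Start every command without waiting, then collect the exit codes."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


input_transform = build_command(
    "multi_process_input_transform",
    daq_data_file="03386.000",
    daq_data_path="/home/user/data/",
)
merge_output = build_command("multi_process_merge_output")
```

Calling run_concurrently([input_transform, merge_output]) from the MAUS root directory would then start both instances side by side.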
Recommended configuration for small data sets¶
Either the above or a single script invocation using multi_process can be used, e.g.
$ ./bin/user/reconstruct_daq.py -type_of_dataflow=multi_process \
    -daq_data_file="03386.000" -daq_data_path=/home/user/data/
Configuring MongoDB document store usage¶
By default the MongoDB document store is used. This is set via the configuration parameter:
doc_store_class = "docstore.MongoDBDocumentStore.MongoDBDocumentStore"
in src/common_py/ConfigurationDefaults.py.
By default, a MongoDB server at localhost:27017, a database named mausdb, and a collection named spills are used. These defaults are set in src/common_py/ConfigurationDefaults.py and can be overridden with the following command-line flags:
- -mongodb_host: MongoDB host name.
- -mongodb_port: MongoDB port number.
- -mongodb_database_name: MongoDB database name. This will be created by MAUS if it does not already exist.
- -doc_collection_name: document collection name. This will be created by MAUS in the database named above if it does not already exist.
If you specify a value of -doc_collection_name="auto" when using multi_process_input_transform or multi_process then a collection name of the form HOSTNAME-PROCESSID will be auto-generated, e.g. maus.epcc.ed.ac.uk-24542.
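To illustrate the HOSTNAME-PROCESSID form, here is a minimal sketch of how such a name could be built. The function auto_collection_name is hypothetical, and the use of socket.gethostname() and os.getpid() is an assumption; the source only states the resulting format, not how MAUS obtains the host name or process ID.

```python
import os
import socket


def auto_collection_name(hostname=None, pid=None):
    """Return a collection name of the form HOSTNAME-PROCESSID.

    Hypothetical helper: the lookups below are an assumption about how
    the two parts might be obtained, not MAUS's actual implementation.
    """
    if hostname is None:
        hostname = socket.gethostname()
    if pid is None:
        pid = os.getpid()
    return "%s-%d" % (hostname, pid)


# Reproduces the example name given in the text.
example = auto_collection_name("maus.epcc.ed.ac.uk", 24542)
```

Because the process ID is part of the name, each input-transform run gets its own collection, so a separate merge-output instance would need to be given that generated name explicitly.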
Using an in-memory document store¶
If running a script using the multi_process option then you can use an in-memory document store instead of MongoDB. This is done by providing the argument:
-doc_store_class="docstore.InMemoryDocumentStore.InMemoryDocumentStore"
For example,
$ ./bin/user/reconstruct_daq.py -type_of_dataflow=multi_process -daq_data_file="03386.000" \
    -daq_data_path=/home/user/data/ \
    -doc_store_class="docstore.InMemoryDocumentStore.InMemoryDocumentStore"
This is only recommended if processing a very small set of spills.
Configuring the OutputPyImage image directory¶
The default output directory for image files created by OutputPyImage is either:
- The value of the environment variable MAUS_WEB_MEDIA_RAW, if this variable is set. This variable is set by the MAUS web front-end. This default allows OutputPyImage to deposit images ready for serving by the MAUS web front-end.
- The current directory in which MAUS is running, if the environment variable is not set.
You can explicitly set another directory using the -image_directory=DIRECTORY command-line flag.
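The order of precedence described above (explicit flag, then environment variable, then current directory) can be sketched as follows. The function resolve_image_directory is hypothetical and only mirrors the rules stated in the text; treating an empty MAUS_WEB_MEDIA_RAW as unset is an assumption.

```python
import os


def resolve_image_directory(image_directory=None, environ=None):
    """Pick the image output directory using the rules described above.

    Hypothetical helper for illustration; not the OutputPyImage code.
    """
    if environ is None:
        environ = os.environ
    # 1. An explicit -image_directory value always wins.
    if image_directory:
        return image_directory
    # 2. Otherwise use MAUS_WEB_MEDIA_RAW if it is set.
    #    Assumption: an empty value is treated the same as unset.
    web_media_raw = environ.get("MAUS_WEB_MEDIA_RAW")
    if web_media_raw:
        return web_media_raw
    # 3. Fall back to the directory MAUS is running in.
    return os.getcwd()
```

Passing the environment in as a dictionary keeps the function easy to check without modifying the real process environment.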
Other scripts¶
Summarise MongoDB databases - summarise_mongodb.py¶
This prints out the collections in a MongoDB database, along with the number of documents in each collection and the memory these occupy. The syntax is:
$ ./bin/utilities/summarise_mongodb.py -h
usage: summarise_mongodb.py [-h] [--url URL] [--database DATABASE]

Summarise the contents of MongoDB.

optional arguments:
  -h, --help           show this help message and exit
  --url URL            MongoDB URL
  --database DATABASE  Database name or ALL for all
The default URL is the MongoDB default of localhost:27017 and the default database is mausdb.
For example:
$ ./bin/utilities/summarise_mongodb.py
Database: mausdb
Collection: spills : 174 documents (145732000 bytes 142316 Kb 138 Mb)
If you want to print information about another database then use --database NAME, e.g.
$ ./bin/utilities/summarise_mongodb.py --database mydb
Database: mydb
Collection: spills : 10 documents (71928796 bytes 70242 Kb 68.5 Mb)
For all databases use --database "ALL", e.g.
$ ./bin/utilities/summarise_mongodb.py --database ALL
Database: mausdb
Collection: spills : 174 documents (145732000 bytes 142316 Kb 138 Mb)
Database: mydb
Collection: spills : 10 documents (71928796 bytes 70242 Kb 68.5 Mb)
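For readers who want the same information from their own Python code, the following is a minimal sketch of a comparable summary. It is not the source of summarise_mongodb.py. The function summarise is hypothetical and expects a pymongo MongoClient-like object; it relies on pymongo's list_database_names(), list_collection_names() and the MongoDB collStats command, and reading the "size" field as the memory figure is an assumption about what the script reports.

```python
def summarise(client, database="mausdb"):
    """Return {database: {collection: (document_count, size_in_bytes)}}.

    Hypothetical helper. `client` is expected to behave like
    pymongo.MongoClient; "ALL" selects every database, as in the script.
    """
    if database == "ALL":
        database_names = client.list_database_names()
    else:
        database_names = [database]
    summary = {}
    for database_name in database_names:
        db = client[database_name]
        collections = {}
        for collection_name in db.list_collection_names():
            # collStats reports the document count and data size in bytes.
            stats = db.command("collstats", collection_name)
            collections[collection_name] = (stats["count"], stats["size"])
        summary[database_name] = collections
    return summary
```

With pymongo installed and a server running, summarise(pymongo.MongoClient("localhost", 27017)) would query the default mausdb database. Note that with "ALL" a real server also lists its internal databases, such as admin and local.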
Delete MongoDB collections and databases - delete_mongodb.py¶
This client can be used to delete a collection from a MongoDB database or to delete an entire database (including all its collections). The syntax is:
$ ./bin/utilities/delete_mongodb.py -h
usage: delete_mongodb.py [-h] [--url URL] [--database DATABASE]
                         [--collection COLLECTION]

Delete a collection from MongoDB.

optional arguments:
  -h, --help            show this help message and exit
  --url URL             MongoDB URL
  --database DATABASE   Database
  --collection COLLECTION
                        Collection
The default URL is the MongoDB default of localhost:27017 and the default database is mausdb.
For example, to delete collection spills from database mausdb use:
$ ./bin/utilities/delete_mongodb.py --database mausdb --collection spills
To delete the entire database mausdb, omit the --collection argument:
$ ./bin/utilities/delete_mongodb.py --database mausdb
Updated by Jackson, Mike about 11 years ago · 6 revisions