MAUSDocumentCacheConfiguration, revision 10 (Jackson, Mike, 15 March 2012 17:26)

h1. How to configure MongoDB as a document cache 

 MAUS can use the MongoDB (http://www.mongodb.org/) document-oriented database to cache spills that have been transformed until they are ready to be merged. A MongoDB server holds zero or more databases; each database holds one or more collections, and each collection holds zero or more documents. MongoDB is schema-free: the documents in a collection can all have the same structure or have different structures.
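 To make the server, database, collection and document hierarchy concrete, here is a toy model in plain Python. It is only an illustration: nothing in it talks to MongoDB, and the document fields are hypothetical. @mausdb@ and @spills@ are the MAUS default database and collection names given in the configuration section of this page.

```python
# Toy model of MongoDB's storage hierarchy using plain Python containers.
# Illustration only: nothing here talks to a MongoDB server.
# server -> databases -> collections -> documents

# "mausdb" and "spills" are the MAUS defaults; the document fields are hypothetical.
server = {
    "mausdb": {                     # a database
        "spills": [                 # a collection
            # MongoDB is schema-free, so documents in the same
            # collection may have different fields.
            {"_id": 1, "spill_number": 1, "hits": [3, 5, 8]},
            {"_id": 2, "spill_number": 2, "error": "no hits"},
        ],
    },
}


def field_sets(collection):
    """Return the set of field names used by each document in a collection."""
    return [set(doc) for doc in collection]


spills = server["mausdb"]["spills"]
print(len(spills))          # 2
print(field_sets(spills))   # the two documents use different field sets
```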

h2. Installing MongoDB

 MongoDB can be installed using @yum@ as follows. 

  * Log in as a super-user by using @sudo su -@ or @su@. 
  * Edit (or create) @/etc/yum.repos.d/10gen.repo@ and add the following lines:
 <pre> 
 [10gen] 
 name=10gen Repository 
 baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686 
 gpgcheck=0 
 </pre> 
  * Run  
 <pre> 
 $ yum install mongo-10gen 
  ... 
  mongo-10gen           i686           2.0.1-mongodb_1             10gen            28 M 
  ... 
 $ yum install mongo-10gen-server 
  ... 
  mongo-10gen-server         i686         2.0.1-mongodb_1            10gen         5.4 M 
 ... 
 </pre> 
  * Start the server 
 <pre> 
 $ /sbin/service mongod start 
 Starting mongod: forked process: 4357 
                                                            [    OK    ] 
 all output going to: /var/log/mongo/mongod.log 
 $ /sbin/service mongod status 
 mongod (pid 4357) is running... 
 </pre> 
 (as an alternative to @service mongod@ you can use @/etc/init.d/mongod@) 

 By default, MongoDB listens on port @27017@, so a locally running server is reachable at @localhost:27017@.
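 Once the server is started, a quick way to check from Python that something is accepting connections on the default port is a plain TCP connection attempt. This is a sketch using only the standard library (Python 3); @mongod_port_open@ is a hypothetical helper name, and a successful result only shows that some process is listening on the port, not that the process is MongoDB.

```python
import socket


def mongod_port_open(host="localhost", port=27017, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    This only shows that a process is listening; it does not confirm
    that the process is MongoDB.
    """
    try:
        # create_connection tries each address the host resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved.
        return False


if __name__ == "__main__":
    print(mongod_port_open())
```

 If this returns @False@, check @/sbin/service mongod status@ and the log in @/var/log/mongo/mongod.log@.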

h2. MongoDB and Python

 MAUS uses pymongo (http://api.mongodb.org/python/current/), which provides a Python API to MongoDB. pymongo is automatically downloaded and installed when you build MAUS.
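 To confirm that pymongo is importable from the Python interpreter you run MAUS with, a check like the following can be used. It is a sketch: @pymongo_version@ is a hypothetical helper, and it relies only on a plain import and the @pymongo.version@ string that pymongo exposes.

```python
def pymongo_version():
    """Return pymongo's version string, or None if pymongo cannot be imported."""
    try:
        import pymongo
    except ImportError:
        return None
    return pymongo.version


if __name__ == "__main__":
    version = pymongo_version()
    if version is None:
        print("pymongo is not importable; check your MAUS build")
    else:
        print("pymongo " + version)
```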

h2. MongoDB connection and MAUS

 By default, MAUS is set up to use a MongoDB database running locally.

 If you need to change this, or make other configuration changes, the supported configuration parameters are as follows:

  * Document store class name. This mandatory parameter specifies the MAUS Python module that handles interaction with MongoDB. The parameter and value need to be:
 <pre> 
 doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore" 
 </pre> 
  * MongoDB host. This optional parameter specifies the MongoDB host. If omitted then the default of @localhost@ is used. To override this value do: 
 <pre> 
 mongodb_host="maus.org.uk" 
 </pre> 
  * MongoDB port. This optional parameter specifies the MongoDB port. If omitted then the default of @27017@ is used. To override this value do: 
 <pre> 
 mongodb_port=12345 
 </pre> 
  * MongoDB database name. This optional parameter specifies the database within MongoDB to use. If omitted then the default of @mausdb@ is used. To override this value do:
 <pre> 
 mongodb_database_name="someotherdbname"  
 </pre> 
  ** Note that if the database is not present in MongoDB it will be created automatically. 
  * MongoDB collection name. This optional parameter specifies the collection within the MongoDB database to use. If omitted then the default of @spills@ is used. To override this value do:
 <pre> 
 mongodb_collection_name="someothercollectionname"  
 </pre> 
  ** Note that if the collection is not present in the database it will be created automatically.
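 The parameters and defaults listed above can be summarised as a small Python sketch. @MONGODB_DEFAULTS@, @resolve_mongodb_settings@ and @open_collection@ are hypothetical names used for illustration only; they are not part of MAUS, which reads these values from its own configuration. @open_collection@ assumes a pymongo release that provides @MongoClient@ (2.4 or later); older releases used @pymongo.Connection@ instead.

```python
# Hypothetical helpers mirroring the documented parameters and defaults.
# Not part of MAUS; MAUS reads these values from its own configuration.
MONGODB_DEFAULTS = {
    "mongodb_host": "localhost",
    "mongodb_port": 27017,
    "mongodb_database_name": "mausdb",
    "mongodb_collection_name": "spills",
}


def resolve_mongodb_settings(config):
    """Return MongoDB settings, filling in defaults for omitted optional keys."""
    if "doc_store_class" not in config:
        raise KeyError("doc_store_class is mandatory")
    settings = dict(MONGODB_DEFAULTS)
    # Only the documented optional keys override the defaults.
    settings.update({k: config[k] for k in MONGODB_DEFAULTS if k in config})
    settings["doc_store_class"] = config["doc_store_class"]
    return settings


def open_collection(settings):
    """Return the configured pymongo collection (requires pymongo >= 2.4)."""
    from pymongo import MongoClient  # imported lazily so the helpers above work without pymongo
    client = MongoClient(settings["mongodb_host"], settings["mongodb_port"])
    # MongoDB creates the database and collection on first write if absent.
    database = client[settings["mongodb_database_name"]]
    return database[settings["mongodb_collection_name"]]
```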

 For example, to run the simple histogram example using MongoDB as the document cache:
 <pre> 
 $ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process 
 </pre> 

h2. In-memory document cache

 MAUS also supports a simple in-memory document cache. This can only be used with the @multi_process@ data flow (not with its subtypes @multi_process_input_transform@ or @multi_process_merge_output@). For example:

 <pre> 
 $ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \ 
 -doc_store_class="InMemoryDocumentStore.InMemoryDocumentStore" 
 </pre>
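
 Conceptually, an in-memory document cache is a mapping from document IDs to documents held in one process's memory. The class below is a hypothetical minimal sketch of that idea; its name and methods are assumptions and do not describe the interface of MAUS's @InMemoryDocumentStore@. Because such a cache lives inside a single process, it also suggests why the in-memory option is limited to @multi_process@: presumably a cache filled by one process cannot be read by a separately run merge-output process.

```python
class InMemoryCacheSketch(object):
    """Hypothetical minimal in-memory document cache keyed by document ID.

    Illustration only: names and methods are assumptions, not the
    interface of MAUS's InMemoryDocumentStore.
    """

    def __init__(self):
        self._docs = {}  # document ID -> document, held in this process only

    def put(self, doc_id, document):
        """Store (or replace) a document under doc_id."""
        self._docs[doc_id] = document

    def get(self, doc_id):
        """Return the document for doc_id, or None if absent."""
        return self._docs.get(doc_id)

    def delete(self, doc_id):
        """Remove doc_id if present; missing IDs are ignored."""
        self._docs.pop(doc_id, None)

    def __len__(self):
        return len(self._docs)


if __name__ == "__main__":
    cache = InMemoryCacheSketch()
    cache.put("spill-1", {"spill_number": 1})  # transformed spill awaiting merge
    print(cache.get("spill-1"))                # {'spill_number': 1}
    cache.delete("spill-1")                    # removed once merged
    print(len(cache))                          # 0
```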