Revision 4 (Jackson, Mike, 20 December 2011 10:53) → Revision 5/15 (Jackson, Mike, 21 December 2011 14:07)

h1. Document Cache Configuration 

*Please note that at present this only applies to commit "693":http://bazaar.launchpad.net/~michaelj-h/maus/devel/revision/693 and above of "Mike Jackson":http://micewww.pp.rl.ac.uk/users/74's BZR branch.*

MAUS can use a database to cache JSON documents until they are ready for processing. For example, the outputs from transforms (maps) can be cached until they are ready for merging (reduce). Two databases are currently supported: MongoDB and CouchDB.
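
The caching pattern described above can be illustrated with a small, self-contained Python sketch. It uses a plain dictionary in place of MongoDB or CouchDB. The class @InMemoryDocumentStore@, its @put@/@get@/@ids@ methods and the @transform@/@merge@ functions are hypothetical illustrations of the idea, not the interface of the real MAUS document store classes.

```python
import json


class InMemoryDocumentStore:
    """Hypothetical stand-in for a database-backed document cache.

    Not the MAUS API: it only mimics "store JSON documents until the
    merge step asks for them" using a dictionary.
    """

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, document):
        # Store the document serialised as JSON, as a database would.
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id):
        # Deserialise the cached JSON document.
        return json.loads(self._docs[doc_id])

    def ids(self):
        # IDs of all cached documents, in a stable order.
        return sorted(self._docs)


def transform(spill):
    # Hypothetical map step: summarise one spill.
    return {"spill_number": spill["spill_number"], "n_digits": len(spill["digits"])}


def merge(store):
    # Hypothetical reduce step: combine every cached map output.
    return sum(store.get(doc_id)["n_digits"] for doc_id in store.ids())


store = InMemoryDocumentStore()
spills = [{"spill_number": i, "digits": list(range(i + 1))} for i in range(3)]
for spill in spills:
    # Map outputs are cached until the reduce step is ready for them.
    store.put(str(spill["spill_number"]), transform(spill))

print(merge(store))
```

In MAUS the dictionary is replaced by the database configured below, which lets the map and reduce steps run in separate processes.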

h2. MongoDB

MongoDB (http://www.mongodb.org/) is a document-oriented database. Each MongoDB server holds 0 or more databases. Each database holds 1 or more collections and each collection holds 0 or more documents. MongoDB is schema free - the documents can all have the same structure or different structures.

h3. Installing MongoDB

MongoDB can be installed using @yum@ as follows.

* Log in as root.
* Create (or edit) @/etc/yum.repos.d/10gen.repo@ and add the lines
<pre>
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
gpgcheck=0
</pre>
* Run
<pre>
$ yum install mongo-10gen
...
mongo-10gen                i686         2.0.1-mongodb_1            10gen         28 M
...
$ yum install mongo-10gen-server
...
mongo-10gen-server         i686         2.0.1-mongodb_1            10gen         5.4 M
...
</pre>
* Start the server
<pre>
$ /sbin/service mongod start
Starting mongod: forked process: 4357
                                                           [    OK    ]
all output going to: /var/log/mongo/mongod.log
$ /sbin/service mongod status
mongod (pid 4357) is running...
</pre>
(As an alternative to @service mongod@ you can use @/etc/init.d/mongod@.)

By default MongoDB is available on @localhost@, port @27017@.
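
Once the server has been started, a quick way to confirm that something is listening on the default port is a plain TCP connection attempt. The sketch below uses only the Python standard library. The @is_listening@ helper is hypothetical and not part of MAUS, and a successful connection only shows that the port is open, not that the process behind it is @mongod@.

```python
import socket


def is_listening(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Generic reachability check (hypothetical helper, not MAUS code);
    27017 is the MongoDB default port quoted above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unknown.
        return False


if __name__ == "__main__":
    print("mongod port reachable:", is_listening())
```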

h3. MongoDB and Python

MAUS uses pymongo (http://api.mongodb.org/python/current/). This is installed when you build MAUS.

h3. MongoDB and MAUS

To use MongoDB with MAUS you need to provide the following configuration parameters:

* Document store class name. This mandatory parameter specifies the Python module and class that will handle interaction with MongoDB. The parameter and value must be:
<pre>
doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore"
</pre>
* MongoDB host. This optional parameter specifies the MongoDB host. If omitted then the default of @localhost@ is used. To override this value do:
<pre>
mongodb_host="maus.org.uk"
</pre>
* MongoDB port. This optional parameter specifies the MongoDB port. If omitted then the default of @27017@ is used. To override this value do:
<pre>
mongodb_port=12345
</pre>
* MongoDB database name. This optional parameter specifies the database within MongoDB to use. If omitted then the default of @mausdb@ is used. To override this value do:
<pre>
mongodb_database_name="someotherdbname"
</pre>
** Note that if the database is not present in MongoDB it will be created automatically.
* MongoDB collection name. This optional parameter specifies the collection within the MongoDB database to use. If omitted then the default of @spills@ is used. To override this value do:
<pre>
mongodb_collection_name="someothercollectionname"
</pre>
** Note that if the collection is not present in the database it will be created automatically.
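
The defaults listed above can be summarised as a lookup table. The following sketch is a hypothetical helper (not MAUS code) that merges user overrides onto the documented defaults, assuming the configuration is available as a Python dictionary. It only shows which values take effect when a parameter is omitted and that @doc_store_class@ is mandatory.

```python
# Defaults as documented above; the helper itself is hypothetical.
MONGODB_DEFAULTS = {
    "mongodb_host": "localhost",
    "mongodb_port": 27017,
    "mongodb_database_name": "mausdb",
    "mongodb_collection_name": "spills",
}

MONGODB_STORE_CLASS = "MongoDBDocumentStore.MongoDBDocumentStore"


def resolve_mongodb_settings(config):
    """Return the effective MongoDB settings for a configuration dict."""
    # doc_store_class is the only mandatory parameter.
    if config.get("doc_store_class") != MONGODB_STORE_CLASS:
        raise ValueError("doc_store_class must be %r" % MONGODB_STORE_CLASS)
    settings = dict(MONGODB_DEFAULTS)
    for key in MONGODB_DEFAULTS:
        if key in config:
            # A user-supplied value overrides the default.
            settings[key] = config[key]
    return settings


user_config = {"doc_store_class": MONGODB_STORE_CLASS, "mongodb_port": 12345}
print(resolve_mongodb_settings(user_config))
```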

Here is an example of running the simple histogram example using MongoDB as the document cache:
<pre>
$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
-doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore"
</pre>

h2. CouchDB

CouchDB (http://couchdb.apache.org/) is a document-oriented database. Each CouchDB server holds 0 or more databases and each database holds 0 or more documents. CouchDB is schema free - the documents can all have the same structure or different structures.

h3. Installing CouchDB

CouchDB can be installed using @yum@ as follows.

* Log in as root.
* Run
<pre>
$ yum install couchdb
...
couchdb            i386            1.0.1-2.el5.rf            rpmforge            749 k
...
</pre>
* Start the server
<pre>
$ /sbin/service couchdb start
Starting database server couchdb                           [    OK    ]
$ /sbin/service couchdb status
Apache CouchDB is running as process 6723, time to relax.
</pre>
(As an alternative to @service couchdb@ you can use @/etc/init.d/couchdb@.)

By default CouchDB is available on http://localhost:5984/.
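
CouchDB answers an HTTP GET on its root URL with a small JSON welcome document such as @{"couchdb":"Welcome","version":"1.0.1"}@, which makes a convenient check that the server is up. The sketch below uses only the Python standard library. The @parse_welcome@ and @couchdb_version@ helpers are hypothetical and not part of MAUS or couchdb-python.

```python
import json
import urllib.request


def parse_welcome(body):
    """Return the version reported in CouchDB's root welcome document."""
    info = json.loads(body)
    if info.get("couchdb") != "Welcome":
        raise ValueError("not a CouchDB welcome message: %r" % info)
    return info.get("version")


def couchdb_version(url="http://localhost:5984/", timeout=5.0):
    """GET the server root and return the reported version string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_welcome(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print("CouchDB version:", couchdb_version())
    except OSError as err:  # urllib.error.URLError subclasses OSError
        print("CouchDB not reachable:", err)
```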

h3. CouchDB and Python

MAUS uses couchdb-python (http://code.google.com/p/couchdb-python/). This is installed when you build MAUS.

h3. CouchDB and MAUS

To use CouchDB with MAUS you need to provide the following configuration parameters:

* Document store class name. This mandatory parameter specifies the Python module and class that will handle interaction with CouchDB. The parameter and value must be:
<pre>
doc_store_class="CouchDBDocumentStore.CouchDBDocumentStore"
</pre>
* CouchDB URL. This optional parameter specifies the CouchDB URL. If omitted then the default of @localhost:5984@ is used. To override this value do:
<pre>
couchdb_url="http://maus.org.uk:5984"
</pre>
* CouchDB database name. This optional parameter specifies the database within CouchDB to use. If omitted then the default of @mausdb@ is used. To override this value do:
<pre>
couchdb_database_name="someotherdbname"
</pre>
** Note that if the database is not present in CouchDB it will be created automatically.
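
In CouchDB's HTTP API each database is addressed as a path under the server URL, and a database is created with an HTTP PUT to that path. The sketch below is a hypothetical helper (not MAUS or couchdb-python code) that shows which database URL the @couchdb_url@ and @couchdb_database_name@ parameters above resolve to. The @http://@ scheme on the default URL is an assumption added so the value is a complete URL.

```python
import urllib.parse
import urllib.request

# Documented defaults; the "http://" scheme is an assumption.
COUCHDB_DEFAULTS = {
    "couchdb_url": "http://localhost:5984",
    "couchdb_database_name": "mausdb",
}


def couchdb_database_url(config):
    """Return the HTTP URL of the database selected by the configuration."""
    base = config.get("couchdb_url", COUCHDB_DEFAULTS["couchdb_url"]).rstrip("/")
    name = config.get("couchdb_database_name",
                      COUCHDB_DEFAULTS["couchdb_database_name"])
    # Database names may contain "/", which must be percent-encoded.
    return base + "/" + urllib.parse.quote(name, safe="")


def create_database_request(config):
    """Build (but do not send) the PUT request that creates the database."""
    return urllib.request.Request(couchdb_database_url(config), method="PUT")


print(couchdb_database_url({}))
print(couchdb_database_url({"couchdb_url": "http://maus.org.uk:5984",
                            "couchdb_database_name": "someotherdbname"}))
```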

Here is an example of running the simple histogram example using CouchDB as the document cache:
<pre>
$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
-doc_store_class="CouchDBDocumentStore.CouchDBDocumentStore"
</pre>
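
If the example needs to be launched from a Python script rather than a shell, the command above can be expressed as an argument list. This is a hypothetical sketch, not part of MAUS; it assumes it is run from the MAUS root directory. Because no shell is involved, the quotes around the class name are dropped.

```python
import shlex
import subprocess

# Hypothetical helper (not part of MAUS) for the command shown above.
SCRIPT = "./bin/examples/simple_histogram_example.py"


def histogram_example_command(doc_store_class, script=SCRIPT):
    """Return argv for a multi-process run with the given document store."""
    return [
        script,
        "-type_of_dataflow=multi_process",
        # No shell parses this, so the value needs no extra quoting.
        "-doc_store_class=" + doc_store_class,
    ]


def run_histogram_example(doc_store_class):
    """Run the example (from the MAUS root); raises CalledProcessError on failure."""
    subprocess.run(histogram_example_command(doc_store_class), check=True)


if __name__ == "__main__":
    cmd = histogram_example_command("CouchDBDocumentStore.CouchDBDocumentStore")
    print(shlex.join(cmd))
```

Passing @"MongoDBDocumentStore.MongoDBDocumentStore"@ instead selects MongoDB as the document cache.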