Jackson, Mike, 20 December 2011 10:53


Document Cache Configuration

Please note that, at present, this applies only to commit 693 and later of Mike Jackson's BZR branch.

MAUS can use a database to cache JSON documents until they are ready for processing. For example, the outputs of transforms (maps) can be cached until they are ready to be merged (reduced). Two databases are currently supported: CouchDB and MongoDB.
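The caching idea can be illustrated with a toy in-memory stand-in for a document cache. The class, its put/get/ids methods, and the transform and merge functions below are hypothetical and are not the MAUS document store interface. In a real configuration, CouchDB or MongoDB takes the place of the dictionary.

```python
"""Toy in-memory document cache (hypothetical interface, not the MAUS API)."""
import json


class InMemoryDocumentCache:
    """Holds JSON documents keyed by an ID until a merge step reads them."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        # Store a serialised copy, much as a document database would.
        self._docs[doc_id] = json.dumps(doc)

    def get(self, doc_id):
        # Return a fresh Python object decoded from the stored JSON.
        return json.loads(self._docs[doc_id])

    def ids(self):
        return sorted(self._docs)


def transform(spill):
    # Hypothetical map step: count the hits in one spill.
    return {"spill_number": spill["spill_number"], "n_hits": len(spill["hits"])}


def merge(cache):
    # Hypothetical reduce step: total the hit counts of every cached document.
    return sum(cache.get(doc_id)["n_hits"] for doc_id in cache.ids())


cache = InMemoryDocumentCache()
spills = [
    {"spill_number": 0, "hits": [1, 2, 3]},
    {"spill_number": 1, "hits": [4, 5]},
]
for spill in spills:
    # Map outputs are cached as they are produced...
    cache.put(spill["spill_number"], transform(spill))

# ...and only read back when the merge step is ready for them.
print(merge(cache))  # prints 5
```

Keeping transforms and merges connected only through the cache is what lets them run in separate processes.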

CouchDB

CouchDB (http://couchdb.apache.org/) is a document-oriented database. Each CouchDB server holds zero or more databases, and each database holds zero or more documents. CouchDB is schema-free: the documents can all share the same structure or have different structures.

Installing CouchDB

CouchDB can be installed using yum as follows.

  • Log in as root
  • Run
    $ yum install couchdb
     ...
     couchdb          i386          1.0.1-2.el5.rf          rpmforge          749 k
     ...
    
  • Start the server
    $ /sbin/service couchdb start
    Starting database server couchdb
    $ /sbin/service couchdb status
    Apache CouchDB is running as process 6723, time to relax.
    

    (As an alternative to service couchdb, you can use /etc/init.d/couchdb.)

By default CouchDB is available on http://localhost:5984/.
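A quick way to confirm the server is up is to fetch the root URL, as curl http://localhost:5984/ would. CouchDB answers with a small JSON greeting such as {"couchdb":"Welcome","version":"1.0.1"}. The sketch below does this with the Python 3 standard library. The function names are illustrative and are not part of MAUS.

```python
import json
import urllib.error
import urllib.request


def is_couchdb_welcome(body):
    """Return True if a response body looks like CouchDB's root greeting."""
    try:
        info = json.loads(body)
    except ValueError:  # not JSON at all
        return False
    return isinstance(info, dict) and info.get("couchdb") == "Welcome"


def couchdb_is_running(url="http://localhost:5984/", timeout=5.0):
    """GET the CouchDB root URL and check for the greeting document."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_couchdb_welcome(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error, etc.
        return False


if __name__ == "__main__":
    print(couchdb_is_running())
```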

CouchDB and Python

MAUS uses couchdb-python (http://code.google.com/p/couchdb-python/), which is installed when you build MAUS.

CouchDB and MAUS

To use CouchDB with MAUS you need to provide the following configuration parameters:

  • Document store class name. This mandatory parameter specifies the Python module and class that handle interaction with CouchDB. It must be set to:
    doc_store_class="CouchDBDocumentStore.CouchDBDocumentStore" 
    
  • CouchDB URL. This optional parameter specifies the CouchDB URL. If it is omitted, the default of localhost:5984 is used. To override it, set:
    couchdb_url="http://maus.org.uk:5984" 
    
  • CouchDB database name. This optional parameter specifies the database within CouchDB to use. If it is omitted, the default of mausdb is used. To override it, set:
    couchdb_database_name="someotherdbname" 
    
    • Note that if the database is not present in CouchDB it will be created automatically.
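The following sketch shows how these parameters might be resolved against their defaults and then used with couchdb-python. The helper names couchdb_settings and open_couchdb_database are hypothetical, not MAUS functions. The http:// scheme on the default URL is an assumption, since this page gives the default only as localhost:5984. The couchdb package is imported inside the function so that the settings helper can be tried without it installed.

```python
# Documented defaults; the http:// scheme on the URL is an assumption.
COUCHDB_DEFAULTS = {
    "couchdb_url": "http://localhost:5984",
    "couchdb_database_name": "mausdb",
}


def couchdb_settings(config):
    """Merge user-supplied parameters over the defaults (hypothetical helper)."""
    if config.get("doc_store_class") != "CouchDBDocumentStore.CouchDBDocumentStore":
        raise ValueError("doc_store_class must name the CouchDB document store")
    settings = dict(COUCHDB_DEFAULTS)
    for key in COUCHDB_DEFAULTS:
        if config.get(key) is not None:
            settings[key] = config[key]
    return settings


def open_couchdb_database(settings):
    """Connect with couchdb-python, creating the database if it is missing."""
    import couchdb  # lazy import: only needed when actually connecting

    server = couchdb.Server(settings["couchdb_url"])
    name = settings["couchdb_database_name"]
    if name in server:
        return server[name]
    return server.create(name)
```

For example, open_couchdb_database(couchdb_settings({"doc_store_class": "CouchDBDocumentStore.CouchDBDocumentStore"})) would return the mausdb database on a local server, creating it first if needed.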

Here is an example of running the simple histogram example using CouchDB as the document cache:

$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
-doc_store_class="CouchDBDocumentStore.CouchDBDocumentStore" 

MongoDB

MongoDB (http://www.mongodb.org/) is a document-oriented database. Each MongoDB server holds zero or more databases. Each database holds one or more collections, and each collection holds zero or more documents. MongoDB is schema-free: the documents can all share the same structure or have different structures.

Installing MongoDB

MongoDB can be installed using yum as follows.

  • Log in as root
  • Edit /etc/yum.repos.d/10gen.repo and add the following lines:
    [10gen]
    name=10gen Repository
    baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
    gpgcheck=0
    
  • Run
    $ yum install mongo-10gen
     ...
     mongo-10gen         i686         2.0.1-mongodb_1           10gen          28 M
     ...
    $ yum install mongo-10gen-server
     ...
     mongo-10gen-server       i686       2.0.1-mongodb_1          10gen       5.4 M
     ...
    
  • Start the server
    $ /sbin/service mongod start
    Starting mongod: forked process: 4357
                                                               [  OK  ]
    all output going to: /var/log/mongo/mongod.log
    $ /sbin/service mongod status
    mongod (pid 4357) is running...
    

    (As an alternative to service mongod, you can use /etc/init.d/mongod.)

By default MongoDB listens on localhost, port 27017.
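MongoDB uses its own binary wire protocol on this port rather than HTTP, so a minimal liveness check is simply to open a TCP connection to it. This standard-library sketch (the function name is illustrative) only shows that something is listening on the port, not that it is MongoDB.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout.
        return False
```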

MongoDB and Python

MAUS uses pymongo (http://api.mongodb.org/python/current/), which is installed when you build MAUS.

MongoDB and MAUS

To use MongoDB with MAUS you need to provide the following configuration parameters:

  • Document store class name. This mandatory parameter specifies the Python module and class that handle interaction with MongoDB. It must be set to:
    doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore" 
    
  • MongoDB host. This optional parameter specifies the MongoDB host. If it is omitted, the default of localhost is used. To override it, set:
    mongodb_host="maus.org.uk" 
    
  • MongoDB port. This optional parameter specifies the MongoDB port. If it is omitted, the default of 27017 is used. To override it, set:
    mongodb_port=12345
    
  • MongoDB database name. This optional parameter specifies the database within MongoDB to use. If it is omitted, the default of mausdb is used. To override it, set:
    mongodb_database_name="someotherdbname" 
    
    • Note that if the database is not present in MongoDB it will be created automatically.
  • MongoDB collection name. This optional parameter specifies the collection within the MongoDB database to use. If it is omitted, the default of spills is used. To override it, set:
    mongodb_collection_name="someothercollectionname" 
    
    • Note that if the collection is not present in the database it will be created automatically.
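As a sketch, these parameters could be merged over their defaults and passed to pymongo as follows. The names mongodb_settings and open_mongodb_collection are hypothetical, not MAUS functions. The code uses pymongo.MongoClient, which appeared in pymongo 2.4; the pymongo 2.0 era of this page used pymongo.Connection instead. MongoDB creates databases and collections lazily on the first insert, which is consistent with the automatic creation noted above.

```python
# Documented defaults for the MongoDB document store parameters.
MONGODB_DEFAULTS = {
    "mongodb_host": "localhost",
    "mongodb_port": 27017,
    "mongodb_database_name": "mausdb",
    "mongodb_collection_name": "spills",
}


def mongodb_settings(config):
    """Merge user-supplied parameters over the defaults (hypothetical helper)."""
    if config.get("doc_store_class") != "MongoDBDocumentStore.MongoDBDocumentStore":
        raise ValueError("doc_store_class must name the MongoDB document store")
    settings = dict(MONGODB_DEFAULTS)
    for key in MONGODB_DEFAULTS:
        if config.get(key) is not None:
            settings[key] = config[key]
    # Accept the port as a string too, e.g. when it arrives from a command line.
    settings["mongodb_port"] = int(settings["mongodb_port"])
    return settings


def open_mongodb_collection(settings):
    """Return the configured collection; nothing is created until a write."""
    import pymongo  # lazy import: only needed when actually connecting

    client = pymongo.MongoClient(settings["mongodb_host"], settings["mongodb_port"])
    database = client[settings["mongodb_database_name"]]
    return database[settings["mongodb_collection_name"]]
```

With pymongo 3 or later, calling insert_one({"spill_number": 1}) on the returned collection would store a document and, on first use, create the database and collection.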

Here is an example of running the simple histogram example using MongoDB as the document cache:

$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
-doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore" 

Updated by Jackson, Mike over 12 years ago · 4 revisions