How to configure MongoDB as a document cache

MAUS can use the MongoDB (http://www.mongodb.org/) document-oriented database to cache transformed spills until they are ready to be merged. A MongoDB server holds 0 or more databases. Each database holds 1 or more collections, and each collection holds 0 or more documents. MongoDB is schema-free: the documents in a collection can all share the same structure or have different structures.
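To make the hierarchy concrete, the following is a minimal plain-Python model of it, not MongoDB or MAUS code. The names mausdb and spills are the MAUS defaults described later on this page; the document fields and the helper functions are hypothetical and chosen only for illustration.

```python
# Plain-Python model of: server -> databases -> collections -> documents.
# "mausdb" and "spills" are the MAUS defaults; the document fields are made up.
server = {
    "mausdb": {
        "spills": [
            {"_id": 1, "spill_number": 1, "payload": "first spill"},
            # Schema-free: this document carries an extra field.
            {"_id": 2, "spill_number": 2, "payload": "second spill", "note": "extra"},
        ],
    },
}


def count_documents(server, database, collection):
    """Count documents, treating a missing database or collection as empty."""
    return len(server.get(database, {}).get(collection, []))


def field_sets(server, database, collection):
    """Return the distinct sets of field names used by the documents."""
    docs = server.get(database, {}).get(collection, [])
    return {frozenset(doc) for doc in docs}


if __name__ == "__main__":
    print(count_documents(server, "mausdb", "spills"))
    print(len(field_sets(server, "mausdb", "spills")))
```

Two distinct field sets in one collection is what "schema-free" means in practice: nothing forces the documents to agree on their structure.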

Set up MongoDB

MongoDB can be installed using yum as follows.

  • Log in as a super-user by using sudo su - or su.
  • Edit /etc/yum.repos.d/10gen.repo and add the lines
    [10gen]
    name=10gen Repository
    baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
    gpgcheck=0
    
  • Run
    $ yum install mongo-10gen
     ...
     mongo-10gen         i686         2.0.1-mongodb_1           10gen          28 M
     ...
    $ yum install mongo-10gen-server
     ...
     mongo-10gen-server       i686       2.0.1-mongodb_1          10gen       5.4 M
    ...
    
  • Start the server
    $ /sbin/service mongod start
    Starting mongod: forked process: 4357
                                                               [  OK  ]
    all output going to: /var/log/mongo/mongod.log
    $ /sbin/service mongod status
    mongod (pid 4357) is running...
    

    (as an alternative to service mongod you can use /etc/init.d/mongod)

By default MongoDB listens on localhost, port 27017.
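One way to confirm that something is listening on the default port is to open a TCP connection from Python. The sketch below uses only the standard library; the helper name port_is_open is hypothetical, and a successful connection only shows that some process accepts connections on that port, not that it is MongoDB.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved.
        return False


if __name__ == "__main__":
    print("mongod reachable:", port_is_open())
```

If this reports False while /sbin/service mongod status says the server is running, the server may be configured to use a different port or bind address.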

Set up pymongo

pymongo (http://api.mongodb.org/python/current/) provides a Python API to MongoDB. pymongo is automatically downloaded and installed when you build MAUS.
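To check that pymongo is visible to the Python interpreter you are using, a short script such as the following may help. It is a minimal sketch: the function name pymongo_available is hypothetical, and the script only inspects the import path without contacting a MongoDB server.

```python
import importlib.util


def pymongo_available():
    """Return True if the pymongo package can be found on the import path."""
    return importlib.util.find_spec("pymongo") is not None


if __name__ == "__main__":
    if pymongo_available():
        import pymongo
        # pymongo exposes its version string as pymongo.version.
        print("pymongo version:", pymongo.version)
    else:
        print("pymongo not found on the import path")
```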

Set up MongoDB connection

By default MAUS is set up to use a MongoDB database running locally.

If you need to change this, or make other configuration changes, then the supported configuration parameters are as follows:

  • Document store class name. This mandatory parameter specifies the MAUS Python module that handles interaction with MongoDB. The parameter and value need to be:
    doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore" 
    
  • MongoDB host. This optional parameter specifies the MongoDB host. If omitted then the default of localhost is used. To override this value do:
    mongodb_host="maus.org.uk" 
    
  • MongoDB port. This optional parameter specifies the MongoDB port. If omitted then the default of 27017 is used. To override this value do:
    mongodb_port=12345
    
  • MongoDB database name. This optional parameter specifies the database within MongoDB to use. If omitted then the default of mausdb is used. To override this value do:
    mongodb_database_name="someotherdbname" 
    
    • Note that if the database is not present in MongoDB it will be created automatically.
  • MongoDB collection name. This optional parameter specifies the collection within the MongoDB database to use. If omitted then the default of spills is used. To override this value do:
    mongodb_collection_name="someothercollectionname" 
    
    • Note that if the collection is not present in the database it will be created automatically.
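The parameters above behave as a set of defaults with optional overrides. The following sketch illustrates that idea in plain Python; DEFAULTS and resolve_mongodb_config are hypothetical names and are not part of MAUS, which reads these values from its own configuration.

```python
# Values taken from the parameter list above.
DEFAULTS = {
    "doc_store_class": "MongoDBDocumentStore.MongoDBDocumentStore",
    "mongodb_host": "localhost",
    "mongodb_port": 27017,
    "mongodb_database_name": "mausdb",
    "mongodb_collection_name": "spills",
}


def resolve_mongodb_config(overrides=None):
    """Merge user overrides onto the defaults and return the result."""
    overrides = overrides or {}
    # Reject misspelled parameter names instead of silently ignoring them.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown parameters: %s" % ", ".join(sorted(unknown)))
    config = dict(DEFAULTS)
    config.update(overrides)
    if not isinstance(config["mongodb_port"], int):
        raise TypeError("mongodb_port must be an integer")
    return config


if __name__ == "__main__":
    # Override only the host and port; the other values keep their defaults.
    print(resolve_mongodb_config({"mongodb_host": "maus.org.uk",
                                  "mongodb_port": 12345}))
```

Rejecting unknown names is a design choice of this sketch: a typo such as mongodb_prot would otherwise leave the default port in use without any warning.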

Run a quick test

Run the MAUS MongoDB integration tests:

$ python tests/integration/test_distributed_processing/test_docstore/test_MongoDBDocumentStore.py 
............
----------------------------------------------------------------------
Ran 12 tests in 76.781s

OK

Clear the MongoDB cache

MongoDB data can become corrupted, for example if the machine on which it is running crashes (power outage, etc.). To wipe all existing data from the database, run the following as root:

$ /sbin/service mongod stop
$ rm /var/lib/mongo/*
$ /sbin/service mongod start
$ /sbin/service mongod status

Because MongoDB is used only as a transient cache, this is probably a safe operation.

Updated by Rogers, Chris almost 8 years ago · 15 revisions