MAUSDocumentCacheConfiguration » History » Version 6

Jackson, Mike, 22 February 2012 10:50

h1. Document Cache Configuration

MAUS can use a database to cache JSON documents until they are ready for processing. For example, the outputs of transforms (maps) can be cached until they are ready for merging (reduce). One database is currently supported: MongoDB.

h2. MongoDB

MongoDB (http://www.mongodb.org/) is a document-oriented database. Each MongoDB server holds 0 or more databases. Each database holds 1 or more collections, and each collection holds 0 or more documents. MongoDB is schema-free: the documents in a collection can all have the same structure or can have different structures.
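
To make the hierarchy concrete, here is a plain-Python stand-in built from nested dictionaries and lists. It is only an illustration of the structure, not the MongoDB or pymongo API. The names @mausdb@ and @spills@ are the MAUS defaults given on this page; the document fields are hypothetical.

```python
import json

# Plain-Python model of the hierarchy: server -> databases -> collections -> documents.
# This only illustrates the structure; it is not how MongoDB stores or serves data.
server = {
    "mausdb": {              # a database
        "spills": [          # a collection, modelled as a list of documents
            {"_id": 1, "spill": {"digits": [1, 2, 3]}},   # hypothetical fields
            {"_id": 2, "errors": ["bad_json_document"]},  # different structure, same collection
        ]
    }
}


def count_documents(server_model):
    """Return the total number of documents across all databases and collections."""
    return sum(len(docs) for db in server_model.values() for docs in db.values())


# Each document can be serialised to JSON regardless of its structure.
serialised = [json.dumps(doc, sort_keys=True) for doc in server["mausdb"]["spills"]]
```

The two documents share a collection but have no fields in common apart from @_id@, which is what schema-free means in practice.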

h3. Installing MongoDB

MongoDB can be installed using @yum@ as follows.

 * Log in as root
 * Edit @/etc/yum.repos.d/10gen.repo@ and add the lines
<pre>
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
gpgcheck=0
</pre>
 * Run
<pre>
$ yum install mongo-10gen
 ...
 mongo-10gen         i686         2.0.1-mongodb_1           10gen          28 M
 ...
$ yum install mongo-10gen-server
 ...
 mongo-10gen-server       i686       2.0.1-mongodb_1          10gen       5.4 M
 ...
</pre>
 * Start the server
<pre>
$ /sbin/service mongod start
Starting mongod: forked process: 4357
                                                           [  OK  ]
all output going to: /var/log/mongo/mongod.log
$ /sbin/service mongod status
mongod (pid 4357) is running...
</pre>
(As an alternative to @service mongod@ you can use @/etc/init.d/mongod@.)

By default MongoDB is available on @localhost@, port @27017@.
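
A quick way to confirm that something is listening on that port is a plain TCP connection attempt. The sketch below uses only the Python standard library; the function name @mongodb_port_is_open@ is hypothetical. It shows only that the port accepts connections, not that the listener is actually MongoDB.

```python
import socket


def mongodb_port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the with-statement closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False
```

Calling @mongodb_port_is_open()@ with no arguments checks the default host and port given above.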

h3. MongoDB and Python

MAUS uses pymongo (http://api.mongodb.org/python/current/) to communicate with MongoDB. It is installed when you build MAUS.

h3. MongoDB and MAUS

To use MongoDB with MAUS you need to provide the following configuration parameters:

 * Document store class name. This mandatory parameter specifies a Python module that will handle interaction with MongoDB. The parameter and value need to be:
<pre>
doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore"
</pre>
 * MongoDB host. This optional parameter specifies the MongoDB host. If omitted, the default of @localhost@ is used. To override this value, set:
<pre>
mongodb_host="maus.org.uk"
</pre>
 * MongoDB port. This optional parameter specifies the MongoDB port. If omitted, the default of @27017@ is used. To override this value, set:
<pre>
mongodb_port=12345
</pre>
 * MongoDB database name. This optional parameter specifies the database within MongoDB to use. If omitted, the default of @mausdb@ is used. To override this value, set:
<pre>
mongodb_database_name="someotherdbname"
</pre>
 ** Note that if the database is not present in MongoDB it will be created automatically.
 * MongoDB collection name. This optional parameter specifies the collection within the MongoDB database to use. If omitted, the default of @spills@ is used. To override this value, set:
<pre>
mongodb_collection_name="someothercollectionname"
</pre>
 ** Note that if the collection is not present in the database it will be created automatically.
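
The way the optional parameters fall back to their defaults can be sketched in a few lines of Python. This is an illustration only, not the MAUS configuration code: @MONGODB_DEFAULTS@ and @resolve_doc_store_config@ are hypothetical names, while the parameter names and default values are the ones listed above.

```python
# Defaults as documented above; the dictionary and function names are hypothetical.
MONGODB_DEFAULTS = {
    "mongodb_host": "localhost",
    "mongodb_port": 27017,
    "mongodb_database_name": "mausdb",
    "mongodb_collection_name": "spills",
}


def resolve_doc_store_config(user_config):
    """Return a complete configuration in which user values override the defaults."""
    if "doc_store_class" not in user_config:
        # doc_store_class is the only mandatory parameter.
        raise KeyError("doc_store_class is mandatory")
    resolved = dict(MONGODB_DEFAULTS)  # copy so the defaults are never modified
    resolved.update(user_config)       # user-supplied values take precedence
    return resolved


config = resolve_doc_store_config({
    "doc_store_class": "MongoDBDocumentStore.MongoDBDocumentStore",
    "mongodb_port": 12345,
})
```

Here only the port is overridden, so @config@ still carries the default host, database name and collection name.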

Here is an example of running the simple histogram example using MongoDB as the document cache:
<pre>
$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
-doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore"
</pre>