MAUSDocumentCacheConfiguration » History » Version 4

Jackson, Mike, 20 December 2011 10:53

h1. Document Cache Configuration

*Please note that at present this only applies to commit "693":http://bazaar.launchpad.net/~michaelj-h/maus/devel/revision/693 and above of "Mike Jackson":http://micewww.pp.rl.ac.uk/users/74's BZR branch.*

MAUS can use a database to cache JSON documents until they are ready for processing. An example of this is caching the outputs from transforms (maps) until they are ready for merging (reduce). Two databases are currently supported: CouchDB and MongoDB.

h2. CouchDB

CouchDB (http://couchdb.apache.org/) is a document-oriented database. Each CouchDB server holds 0 or more databases. Each database holds a collection of 0 or more documents. CouchDB is schema free: the documents can all have the same structure or have different structures.

h3. Installing CouchDB

CouchDB can be installed using @yum@ as follows.

 * Log in as root
 * Run
<pre>
$ yum install couchdb
 ...
 couchdb          i386          1.0.1-2.el5.rf          rpmforge          749 k
 ...
</pre>
 * Start the server
<pre>
$ /sbin/service couchdb start
Starting database server couchdb
$ /sbin/service couchdb status
Apache CouchDB is running as process 6723, time to relax.
</pre>
(As an alternative to @service couchdb@ you can use @/etc/init.d/couchdb@.)

By default CouchDB is available on http://localhost:5984/.
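
One way to confirm that the server is answering on the default URL is to request its root resource, which returns a small JSON welcome document containing the server version. The following is a minimal sketch and not part of MAUS. It assumes Python 3; the function name @couchdb_version@ and its @opener@ parameter are hypothetical, introduced only for this illustration.

```python
import json
from urllib.request import urlopen


def couchdb_version(url="http://localhost:5984/", opener=urlopen, timeout=5):
    """Return the version string reported by the CouchDB root resource.

    `opener` is a hypothetical hook (defaulting to urllib's urlopen) so the
    function can be exercised without a running server.
    """
    with opener(url, timeout=timeout) as response:
        info = json.loads(response.read().decode("utf-8"))
    # The root document has the form {"couchdb": "Welcome", "version": "..."}.
    return info.get("version")
```

Calling @couchdb_version()@ with no arguments queries the default URL given above and raises an error from @urllib@ if the server is not reachable.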

h3. CouchDB and Python

MAUS uses couchdb-python (http://code.google.com/p/couchdb-python/). This is installed when you build MAUS.

h3. CouchDB and MAUS

To use CouchDB with MAUS you need to provide the following configuration parameters:

 * Document store class name. This mandatory parameter specifies a Python module that will handle interaction with CouchDB. The parameter and value must be:
<pre>
doc_store_class="CouchDBDocumentStore.CouchDBDocumentStore"
</pre>
 * CouchDB URL. This optional parameter specifies the CouchDB URL. If omitted then the default of @localhost:5984@ is used. To override this value do:
<pre>
couchdb_url="http://maus.org.uk:5984"
</pre>
 * CouchDB database name. This optional parameter specifies the database within CouchDB to use. If omitted then the default of @mausdb@ is used. To override this value do:
<pre>
couchdb_database_name="someotherdbname"
</pre>
 ** Note that if the database is not present in CouchDB it will be created automatically.
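
The defaulting and create-if-missing behaviour described in the list above can be sketched as follows. This is an illustration only, not the code of @CouchDBDocumentStore@. The helper names @couchdb_settings@ and @get_or_create_database@ are hypothetical, and the default URL is written with an explicit @http://@ scheme as an assumption. The @server@ argument is expected to behave like a couchdb-python @couchdb.Server@, which supports the @in@ test, item lookup by database name, and @create(name)@.

```python
# Defaults as documented above; the "http://" scheme is an assumption.
COUCHDB_DEFAULTS = {
    "couchdb_url": "http://localhost:5984",
    "couchdb_database_name": "mausdb",
}


def couchdb_settings(config):
    """Overlay user-supplied CouchDB parameters on the documented defaults."""
    settings = dict(COUCHDB_DEFAULTS)
    for key in COUCHDB_DEFAULTS:
        if key in config:
            settings[key] = config[key]
    return settings


def get_or_create_database(server, name):
    """Return the named database, creating it first if it does not exist.

    `server` should behave like couchdb.Server from couchdb-python.
    """
    if name in server:
        return server[name]
    return server.create(name)
```

With couchdb-python installed, a typical call would pass @couchdb.Server(settings["couchdb_url"])@ as @server@ and @settings["couchdb_database_name"]@ as @name@.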

Here is an example of running the simple histogram example using CouchDB as the document cache:
<pre>
$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
-doc_store_class="CouchDBDocumentStore.CouchDBDocumentStore"
</pre>

h2. MongoDB

MongoDB (http://www.mongodb.org/) is a document-oriented database. Each MongoDB server holds 0 or more databases. Each database holds 1 or more collections and each collection 0 or more documents. MongoDB is schema free: the documents can all have the same structure or have different structures.

h3. Installing MongoDB

MongoDB can be installed using @yum@ as follows.

 * Log in as root
 * Edit @/etc/yum.repos.d/10gen.repo@ and add the lines
<pre>
[10gen]
name=10gen Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686
gpgcheck=0
</pre>
 * Run
<pre>
$ yum install mongo-10gen
 ...
 mongo-10gen         i686         2.0.1-mongodb_1           10gen          28 M
 ...
$ yum install mongo-10gen-server
 ...
 mongo-10gen-server       i686       2.0.1-mongodb_1          10gen       5.4 M
 ...
</pre>
 * Start the server
<pre>
$ /sbin/service mongod start
Starting mongod: forked process: 4357
                                                           [  OK  ]
all output going to: /var/log/mongo/mongod.log
$ /sbin/service mongod status
mongod (pid 4357) is running...
</pre>
(As an alternative to @service mongod@ you can use @/etc/init.d/mongod@.)

By default MongoDB listens on @localhost@, port @27017@. Note that this port speaks MongoDB's native protocol rather than HTTP.
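
A simple way to check that something is listening on the default MongoDB port is to attempt a TCP connection. The sketch below is illustrative and not part of MAUS; the function name @port_is_open@ is hypothetical. It only shows that the port accepts connections, not that the listener is actually @mongod@.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. connection refused) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling @port_is_open()@ with no arguments tests the default host and port given above.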

h3. MongoDB and Python

MAUS uses pymongo (http://api.mongodb.org/python/current/). This is installed when you build MAUS.

h3. MongoDB and MAUS

To use MongoDB with MAUS you need to provide the following configuration parameters:

 * Document store class name. This mandatory parameter specifies a Python module that will handle interaction with MongoDB. The parameter and value must be:
<pre>
doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore"
</pre>
 * MongoDB host. This optional parameter specifies the MongoDB host. If omitted then the default of @localhost@ is used. To override this value do:
<pre>
mongodb_host="maus.org.uk"
</pre>
 * MongoDB port. This optional parameter specifies the MongoDB port. If omitted then the default of @27017@ is used. To override this value do:
<pre>
mongodb_port=12345
</pre>
 * MongoDB database name. This optional parameter specifies the database within MongoDB to use. If omitted then the default of @mausdb@ is used. To override this value do:
<pre>
mongodb_database_name="someotherdbname"
</pre>
 ** Note that if the database is not present in MongoDB it will be created automatically.
 * MongoDB collection name. This optional parameter specifies the collection within the MongoDB database to use. If omitted then the default of @spills@ is used. To override this value do:
<pre>
mongodb_collection_name="someothercollectionname"
</pre>
 ** Note that if the collection is not present in MongoDB it will be created automatically.
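
The parameters in the list above, with their documented defaults, can be sketched as follows. This is an illustration only, not the code of @MongoDBDocumentStore@. The helper names @mongodb_settings@ and @open_collection@ are hypothetical, and @open_collection@ assumes a recent pymongo that provides @pymongo.MongoClient@ (older releases used a different connection class). With pymongo, a database and collection obtained this way come into existence on the first write, which matches the notes on automatic creation.

```python
# Defaults as documented above.
MONGODB_DEFAULTS = {
    "mongodb_host": "localhost",
    "mongodb_port": 27017,
    "mongodb_database_name": "mausdb",
    "mongodb_collection_name": "spills",
}


def mongodb_settings(config):
    """Overlay user-supplied MongoDB parameters on the documented defaults."""
    settings = dict(MONGODB_DEFAULTS)
    # Ignore keys that are not MongoDB parameters.
    settings.update({k: v for k, v in config.items() if k in MONGODB_DEFAULTS})
    return settings


def open_collection(settings):
    """Return the configured collection (assumes pymongo.MongoClient exists)."""
    import pymongo  # imported lazily so mongodb_settings works without pymongo

    client = pymongo.MongoClient(settings["mongodb_host"], settings["mongodb_port"])
    database = client[settings["mongodb_database_name"]]
    return database[settings["mongodb_collection_name"]]
```

For example, @open_collection(mongodb_settings({"mongodb_port": 12345}))@ would use port 12345 and the default host, database, and collection names.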

Here is an example of running the simple histogram example using MongoDB as the document cache:
<pre>
$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
-doc_store_class="MongoDBDocumentStore.MongoDBDocumentStore"
</pre>