How to set up Celery and RabbitMQ¶
MAUS can be used with the Celery asynchronous distributed task queue (http://celeryproject.org/) to allow transform (map) steps to be executed on multiple processors in parallel.
Celery uses RabbitMQ (http://www.rabbitmq.com/) as a broker to dispatch jobs to processing nodes.
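In this model the MAUS client publishes each spill as a Celery task message to RabbitMQ, and Celery workers on the processing nodes consume the messages, apply the transform and send back the result. The minimal sketch below shows that pattern using the generic Celery API rather than the MAUS task definitions; the broker URL (user, password, vhost and hostname) and the transform body are illustrative assumptions only.
# Generic sketch of the Celery/RabbitMQ dispatch pattern, not MAUS code.
# The broker URL reuses the example user/password/vhost created below and
# the example hostname from this page; adjust to your own installation.
from celery import Celery

app = Celery("sketch",
             broker="amqp://maus:suam@maus.epcc.ed.ac.uk:5672/maushost")

@app.task
def transform(spill):
    # A real MAUS worker would run the configured map modules on the spill.
    return spill
A client then dispatches work with transform.delay(spill); MAUS drives this for you when -type_of_dataflow=multi_process is selected, as shown later on this page.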
Set up Celery¶
Celery is a Python tool which is automatically downloaded and installed when you build MAUS.
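Once MAUS has been built and env.sh sourced, a quick way to confirm that the bundled Celery is on the Python path is to import it and print its version, for example:
$ python -c "import celery; print(celery.__version__)"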
Set up RabbitMQ¶
To install RabbitMQ on Scientific Linux or Red Hat:
- Log in as a super-user, using sudo su - or su.
- Run:
$ yum install rabbitmq-server
Installing:
 rabbitmq-server  noarch  2.2.0-1.el5    epel     890 k
Installing for dependencies:
 erlang           i386    R12B-5.10.el5  epel      39 M
 unixODBC         i386    2.2.11-7.1     sl-base  832 k
- Check that /usr/sbin is in the PATH:
$ echo $PATH
...
- If not, then add it:
$ export PATH=$PATH:/usr/sbin
- Start the RabbitMQ server:
$ /sbin/service rabbitmq-server start
- Create a MAUS username and password pair e.g.
$ rabbitmqctl add_user maus suam
Creating user "maus" ...
...done.
- Create a MAUS virtual host:
$ rabbitmqctl add_vhost maushost
Creating vhost "maushost" ...
...done.
- Set the permissions for the user on this host:
$ rabbitmqctl set_permissions -p maushost maus ".*" ".*" ".*"
Setting permissions for user "maus" in vhost "maushost" ...
...done.
- Check it is running OK:
$ /sbin/service rabbitmq-server status
Status of all running nodes...
Node 'rabbit@maus' with Pid 1377: running
done.
Default port:
By default RabbitMQ uses port 5672. If you want worker nodes outside your firewall to use the RabbitMQ broker then you will need to open this port.
For more information see the RabbitMQ documentation (http://www.rabbitmq.com/).
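If workers on another machine report connection problems, a quick way to check that the broker port is reachable from that machine is a short Python socket test; the hostname below is the example RabbitMQ host used elsewhere on this page, so substitute your own.
# Check that the RabbitMQ broker port is reachable from a worker node.
# Replace the hostname with the machine running rabbitmq-server.
import socket

socket.create_connection(("maus.epcc.ed.ac.uk", 5672), timeout=5)
print("RabbitMQ port 5672 is reachable")
If the port is blocked, the call raises an error such as "Connection refused" or times out.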
Configure nodes as Celery workers¶
Ensure the MAUS software is deployed on the nodes you want to use as workers and that you have run
$ source env.sh
Within the MAUS software directory, edit src/common_py/mauscelery/celeryconfig.py
and change
BROKER_HOST = "localhost"
to the full hostname of the host on which RabbitMQ was deployed, e.g.
BROKER_HOST = "maus.epcc.ed.ac.uk"
Run a quick test¶
Start the Celery workers on each node:
$ celeryd -l INFO --purge
Wait for the Celery workers on each node to start. This may take a minute or two.
- You should get output ending like
[2012-04-27 11:51:38,391: WARNING/MainProcess] celery@miceonrec01a has started.
- If you get output like
[2012-04-27 11:43:59,611: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 4 seconds...
[2012-04-27 11:44:03,611: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 6 seconds...
it means the worker cannot connect to RabbitMQ: check that rabbitmq-server is running and that BROKER_HOST points at the right machine.
Now, on any node with MAUS deployed, run
$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
    -doc_store_class="docstore.InMemoryDocumentStore.InMemoryDocumentStore"
The client should show information on spills being passed to Celery and the results returned.
You can also run the MAUS Celery integration tests:
$ python tests/integration/test_distributed_processing/test_celery.py
A lot of messages will be printed. However, the run should end with:
----------------------------------------------------------------------
Ran 11 tests in 76.748s

OK
More information¶
See the MAUS pages on: