How to set up Celery and RabbitMQ

MAUS can be used with the Celery asynchronous distributed task queue (http://celeryproject.org/) to allow transform (map) steps to be executed on multiple processors in parallel.

Celery uses RabbitMQ (http://www.rabbitmq.com/) as a broker to dispatch jobs to processing nodes.

Set up Celery

Celery is a Python tool which is automatically downloaded and installed when you build MAUS.

Set up RabbitMQ

To install on Scientific Linux or RedHat you should:

  • Log in as a super-user by using sudo su - or su.
  • Run:
    $ yum install rabbitmq-server
    
    Installing:
     rabbitmq-server        noarch        2.2.0-1.el5          epel           890 k
    Installing for dependencies:
     erlang                 i386          R12B-5.10.el5        epel            39 M
     unixODBC               i386          2.2.11-7.1           sl-base        832 k
    
  • Check that /usr/sbin is in the PATH:
    $ echo $PATH
    ...
    
  • If not, then add it:
    $ export PATH=$PATH:/usr/sbin
    
  • Start the RabbitMQ server:
    $ /sbin/service rabbitmq-server start
    
  • Create a MAUS username and password pair, e.g. user maus with password suam:
    $ rabbitmqctl add_user maus suam
    Creating user "maus" ...
    ...done.
    
  • Create a MAUS virtual host:
    $ rabbitmqctl add_vhost maushost
    Creating vhost "maushost" ...
    ...done.
    
  • Set the permissions for the user on this host:
    $ rabbitmqctl set_permissions -p maushost maus ".*" ".*" ".*" 
    Setting permissions for user "maus" in vhost "maushost" ...
    ...done.
    
  • Check it is running OK:
    $ /sbin/service rabbitmq-server status
    Status of all running nodes...
    Node 'rabbit@maus' with Pid 1377: running
    done.
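
The user, virtual host and permission steps above can also be scripted. The following is a minimal sketch, not part of MAUS: the function names rabbitmq_setup_commands and run_setup are hypothetical, and only the rabbitmqctl sub-commands already shown above are used. By default it only prints the commands; with dry_run=False it must be run as a super-user on the RabbitMQ host.

```python
import subprocess


def rabbitmq_setup_commands(user, password, vhost):
    """Return the rabbitmqctl commands from the steps above, in order."""
    return [
        ["rabbitmqctl", "add_user", user, password],
        ["rabbitmqctl", "add_vhost", vhost],
        # Grant configure, write and read permissions on everything in the vhost.
        ["rabbitmqctl", "set_permissions", "-p", vhost, user, ".*", ".*", ".*"],
    ]


def run_setup(user, password, vhost, dry_run=True):
    """Print the commands (dry_run=True) or execute them (dry_run=False)."""
    for cmd in rabbitmq_setup_commands(user, password, vhost):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check_call raises CalledProcessError if rabbitmqctl fails.
            subprocess.check_call(cmd)


if __name__ == "__main__":
    # Example values taken from the steps above.
    run_setup("maus", "suam", "maushost")
```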
    

Default port:

By default RabbitMQ uses port 5672. If you want worker nodes outside your firewall to use the RabbitMQ broker then you will need to open this port.
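
To check from a worker node whether the broker port can be reached through the firewall, a plain TCP connection test is usually enough. This is a minimal sketch using only the Python standard library; the function name broker_reachable is hypothetical, and the test only shows that something is listening on the port, not that it is RabbitMQ or that the MAUS credentials are accepted.

```python
import socket


def broker_reachable(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Replace "localhost" with the full hostname of the RabbitMQ host.
    print(broker_reachable("localhost"))
```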

For more information see:

Configure nodes as Celery workers

Ensure the MAUS software is deployed on the nodes you want to use as workers and that you have run

$ source env.sh

Within the MAUS software directory, edit src/common_py/mauscelery/celeryconfig.py and change

BROKER_HOST = "localhost" 

to the full hostname of the host on which RabbitMQ was deployed, e.g.
BROKER_HOST = "maus.epcc.ed.ac.uk" 
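
For orientation, the broker settings as a whole might look like the sketch below. The setting names are the ones used by Celery 2.x; that the MAUS celeryconfig.py contains all of them, and with these values, is an assumption. The values are simply the examples used on this page and must match what was given to rabbitmqctl.

```python
# Sketch of Celery 2.x-style broker settings (assumed layout, example values).
BROKER_HOST = "maus.epcc.ed.ac.uk"  # full hostname of the RabbitMQ host
BROKER_PORT = 5672                  # RabbitMQ default port
BROKER_USER = "maus"                # user created with rabbitmqctl add_user
BROKER_PASSWORD = "suam"            # password given to rabbitmqctl add_user
BROKER_VHOST = "maushost"           # vhost created with rabbitmqctl add_vhost
```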

Run a quick test

Start the Celery workers on each node
$ celeryd -l INFO --purge

Wait for the Celery workers on each node to start. This may take a minute or two.
  • You should get output ending like
    [2012-04-27 11:51:38,391: WARNING/MainProcess] celery@miceonrec01a has started.
    
  • If you get output like
    [2012-04-27 11:43:59,611: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 4 seconds...
    [2012-04-27 11:44:03,611: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 6 seconds...
    

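    then the worker cannot connect to RabbitMQ. Check that the RabbitMQ server is running and that BROKER_HOST points to the host on which it was deployed.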
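
When starting workers on many nodes it can help to scan the celeryd output automatically for the two cases above. This is a minimal sketch; the function name classify_worker_line and its return values are hypothetical, and the matching relies only on the message fragments shown in the sample output above.

```python
def classify_worker_line(line):
    """Classify one line of celeryd output as started, broker_unreachable or other."""
    if "has started" in line:
        return "started"
    if "Connection refused" in line:
        return "broker_unreachable"
    return "other"


if __name__ == "__main__":
    sample = ("[2012-04-27 11:51:38,391: WARNING/MainProcess] "
              "celery@miceonrec01a has started.")
    print(classify_worker_line(sample))
```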

Now, on any node with MAUS deployed, run

$ ./bin/examples/simple_histogram_example.py -type_of_dataflow=multi_process \
-doc_store_class="docstore.InMemoryDocumentStore.InMemoryDocumentStore"

The client should show information on spills being passed to Celery and the results returned.

You can also run the MAUS Celery integration tests:

$ python tests/integration/test_distributed_processing/test_celery.py

A lot of messages will be printed. However, the run should end with:
----------------------------------------------------------------------
Ran 11 tests in 76.748s

OK

More information

See the MAUS pages on,

Updated by Rogers, Chris over 8 years ago · 18 revisions