2015-07-14-MEMO+Computing


MAUS

  • Speed-up implementation is nearly complete
    • No more string conversions between the Input, Map and Output stages
    • Reconstruction side is done, including offline reducers
    • MC side is nearly done (less critical for data-taking)
    • Tests show reconstruction speed is approximately the data-taking rate
    • Some string conversions remain in the current online-reconstruction framework because of Celery and MongoDB
      • We now have a plan to change the online-reconstruction framework to pass data to reducers via ROOT sockets and drop Celery-based multiprocessing
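The string-conversion point can be illustrated with a toy Python sketch. It is not MAUS code: the stage functions, the dict-based spill layout and the use of JSON as the string format are assumptions made purely for illustration. Both runners produce the same result, but the string-based one serialises and re-parses the whole spill at every hand-off, which is the kind of overhead the speed-up removes.

```python
import json
from typing import Callable, Dict, List

# Hypothetical spill layout: a plain dict holding a list of hits.
Spill = Dict[str, object]
Stage = Callable[[Spill], Spill]


def map_count_hits(spill: Spill) -> Spill:
    """Hypothetical map stage: count the hits in the spill."""
    spill["n_hits"] = len(spill["hits"])
    return spill


def map_total_charge(spill: Spill) -> Spill:
    """Hypothetical map stage: sum the charge of all hits."""
    spill["total_charge"] = sum(hit["charge"] for hit in spill["hits"])
    return spill


def run_with_strings(spill: Spill, stages: List[Stage]) -> Spill:
    """Old style: convert to a JSON string and parse it back at every hand-off."""
    document = json.dumps(spill)
    for stage in stages:
        document = json.dumps(stage(json.loads(document)))
    return json.loads(document)


def run_in_memory(spill: Spill, stages: List[Stage]) -> Spill:
    """New style: pass the same in-memory object straight from stage to stage."""
    for stage in stages:
        spill = stage(spill)
    return spill
```

The in-memory runner mutates and returns the spill it is given, so no copy is made between stages; in the real framework the equivalent objects are ROOT/C++ data structures rather than Python dicts.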

Offline processing

  • Encountered two issues with offline processing
    • At the current MAUS processing speed, reconstruction of some high-rate runs took longer than 24 hours, and jobs running that long on the RAL queue were killed automatically (due to proxy expiration)
      • Janusz has been working with the RAL grid team and has a solution for renewing proxies so that jobs can run beyond 24 hours
      • This will not be an issue once the sped-up MAUS is released
    • To avoid the 24-hour limit on the RAL queue, we submitted jobs to the Imperial Tier-2 queue, but the jobs eventually died there because of a memory leak
      • Chris Rogers has fixed some of the leaks; Adam Dobbs is investigating any remaining ones
    • With the MAUS speed-up, reconstruction can keep up with data-taking, so independently of the grid we should be able to reconstruct data "live"
      • Implementation still to be worked out: which machine, whether it stays in the MLCR, etc.

Online

  • Important remaining item: DAQ feedback to EPICS to alert shifters to data corruption.
    • The unpacker catches errors
    • Ed Overton has improvements to the tracker unpacking that catch corrupt data from the tracker readout
    • These errors need to be communicated to EPICS/Run Control so that an alarm can be raised
      • Rhys Gardener has had a conversation with Pierrick, and is working on providing the necessary input to RC

Infrastructure

  • Nagios monitoring of the file compactor and data mover chain has been implemented
  • A Nagios mirror has been set up and tested (so that the status page is visible outside MICENet)
    • Needs approval to confirm that it does not break any RAL computing guidelines
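As a hypothetical example of what a check on the data mover chain might look like, the Python sketch below follows the standard Nagios plugin convention: one status line on stdout and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The directory argument, the age thresholds and the idea of judging health by the newest file's modification time are assumptions, not a description of the checks actually deployed.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_newest_file(directory, warn_age, crit_age, now=None):
    """Return (exit_code, message) from the age in seconds of the newest file.

    Assumption: a stalled compactor/mover shows up as no recent files in
    `directory`; warn_age and crit_age are illustrative thresholds.
    """
    now = time.time() if now is None else now
    try:
        files = [entry for entry in os.scandir(directory) if entry.is_file()]
        if not files:
            return CRITICAL, f"no files in {directory}"
        newest = max(files, key=lambda entry: entry.stat().st_mtime)
        age = now - newest.stat().st_mtime
    except OSError as err:
        return UNKNOWN, f"cannot read {directory}: {err}"
    if age >= crit_age:
        code = CRITICAL
    elif age >= warn_age:
        code = WARNING
    else:
        code = OK
    return code, f"newest file {newest.name} is {age:.0f} s old"


def main(argv):
    """Parse DIR WARN_SECONDS CRIT_SECONDS, print one status line, return the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_data_mover.py DIR WARN_SECONDS CRIT_SECONDS")
        return UNKNOWN
    try:
        warn_age, crit_age = float(argv[1]), float(argv[2])
    except ValueError:
        print("UNKNOWN - thresholds must be numbers")
        return UNKNOWN
    code, message = check_newest_file(argv[0], warn_age, crit_age)
    print(f"{LABELS[code]} - {message}")
    return code
```

To run it as a plugin, the script's entry point would call `sys.exit(main(sys.argv[1:]))`, e.g. `check_data_mover.py /some/outgoing/dir 3600 7200` (script name, path and thresholds hypothetical).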