Bug #1330

Online reconstruction not catching change of run number in histogram production.

Added by Hunt, Christopher over 7 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Online reconstruction
Target version:
Start date: 05 August 2013
Due date:
% Done: 100%
Estimated time:
Workflow:

Description

On rare occasions, after stopping and starting the DAQ, the online reconstruction fails to pick up the change in run number. The histograms are not updated and the reducer log file does not report the start of the next run. Restarting the Online Recon solves the issue.

Unfortunately, the log files are overwritten when the Online Recon is restarted, so none have been saved. However, the issue was noted during the changeover from run 4993 to run 4994, when "scalers.json" contained:

{ "run_number": 4993, "maus_event_type": "RunFooter"}

During the running period of 2nd to 5th August this behavior was recorded 3 times.

Please append log files and run numbers if this happens again!
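To make the symptom easier to spot, here is a minimal sketch of a staleness check based on the "scalers.json" content quoted above. The function name `recon_is_stale` and the idea of passing in the DAQ's current run number are assumptions for illustration only; nothing like this is known to exist in MAUS.

```python
import json


def recon_is_stale(scalers_text, daq_run_number):
    """Return True if the reducer output reports an older run than the DAQ.

    scalers_text    -- contents of scalers.json as a string
    daq_run_number  -- run number the DAQ is currently taking (assumed known)
    """
    doc = json.loads(scalers_text)
    return doc["run_number"] < daq_run_number


# The content recorded during the 4993 to 4994 changeover:
observed = '{ "run_number": 4993, "maus_event_type": "RunFooter"}'
print(recon_is_stale(observed, 4994))
```

With the DAQ on run 4994 and the reducer still reporting the RunFooter of 4993, the check returns True, which matches the failure described here.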


Files

celeryd.log (2.27 MB) Hunt, Christopher, 05 August 2013 16:42
maus-input-transform.log (1.34 MB) Hunt, Christopher, 05 August 2013 16:42
maus-web.log (30.8 KB) Hunt, Christopher, 05 August 2013 16:42
mongodb.log (41.9 KB) Hunt, Christopher, 05 August 2013 16:42
reconstruct_daq_ckov_reducer.log (534 KB) Hunt, Christopher, 05 August 2013 16:42
reconstruct_daq_scalars_reducer.log (1.06 MB) Hunt, Christopher, 05 August 2013 16:43
reconstruct_daq_tof_reducer.log (535 KB) Hunt, Christopher, 05 August 2013 16:43
reconstruct_monitor_reducer.log (639 KB) Hunt, Christopher, 05 August 2013 16:43
micewww_target.tar.gz (156 Bytes) Rogers, Chris, 05 August 2013 17:01

Related issues

Related to MAUS - Bug #1328: Memory leak in MAUS online recon (Closed; Rogers, Chris; 02 August 2013)

#1

Updated by Hunt, Christopher over 7 years ago

Occurred during the changeover from run 4996 to run 4997, at approximately 16:35 on 5th August. Log files are attached.

#3

Updated by Rogers, Chris over 7 years ago

Timestamps indicate that the reconstruction has not updated its log files since 16:35. It is now 16:50 and data taking continues. maus-input-transform.log received one spill from run 4997 and then nothing more, which points to a hang in the death(...) process...

#4

Updated by Rogers, Chris over 7 years ago

#5

Updated by Rogers, Chris over 7 years ago

Looks like one of the end-of-run celery.tasks is getting stuck in PENDING. I will add a timeout and see if that fixes things.

#6

Updated by Rogers, Chris over 7 years ago

I tried adding a timeout, but for some reason it didn't take. So at the end of a run, I sleep for 10 seconds (hard-coded) and then kill all remaining reconstruction events by hand. This is tested at integration level, but I would really like to add unit tests to input_transform.py because, to be honest, getting this working is horrid without unit tests.
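For readers following along, the workaround described in this update (wait a fixed grace period, then kill whatever has not finished) can be sketched roughly as below. This is not the code committed to input_transform.py. The names `kill_stragglers` and `FakeResult` are hypothetical, and the sketch only assumes that each task handle exposes `ready()` and `revoke(terminate=...)`, as celery's `AsyncResult` does in recent versions. The sleep function is injectable, which is one way the behaviour could be unit tested without waiting 10 seconds.

```python
import time


def kill_stragglers(results, grace_seconds=10.0, sleep=time.sleep):
    """Wait a fixed grace period, then revoke every task that is not finished.

    results -- iterable of task handles with ready() and revoke(terminate=...)
    Returns the list of handles that were revoked.
    """
    sleep(grace_seconds)
    revoked = []
    for result in results:
        if not result.ready():
            # Task never left PENDING (or is still running): give up on it.
            result.revoke(terminate=True)
            revoked.append(result)
    return revoked


class FakeResult:
    """Hypothetical stand-in for a task handle so the sketch runs without a broker."""

    def __init__(self, name, done):
        self.name = name
        self.done = done
        self.revoked = False

    def ready(self):
        return self.done

    def revoke(self, terminate=False):
        self.revoked = True


finished = FakeResult("spill-1", done=True)
stuck = FakeResult("end-of-run", done=False)
killed = kill_stragglers([finished, stuck], grace_seconds=0)
print([r.name for r in killed])
```

Only the task that is still not ready after the grace period is revoked; tasks that finished in time are left alone.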

#7

Updated by Rogers, Chris over 7 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

Fixed in r990

#8

Updated by Rajaram, Durga over 7 years ago

  • Target version changed from Future MAUS release to MAUS-v0.7.1
