I ran the online recon overnight and received the following error:
Ending job Clearing Globals
Traceback (most recent call last):
  File "/home/mice/MAUS/.maus_control-room/bin/online/reconstruct_daq_tof_reducer.py", line 71, in <module>
    run()
  File "/home/mice/MAUS/.maus_control-room/bin/online/reconstruct_daq_tof_reducer.py", line 68, in run
    MAUS.Go(my_input, my_map, reducer, output_worker, data_cards)
  File "/home/mice/MAUS/.maus_control-room/src/common_py/Go.py", line 131, in __init__
    self.get_job_footer())
  File "/home/mice/MAUS/.maus_control-room/src/common_py/framework/merge_output.py", line 281, in execute
    raise DocumentStoreException(exc)
docstore.DocumentStore.DocumentStoreException: Exception when using document store: cursor id '7857742081767663573' not valid at server
What does "OperationFailure: cursor id not valid at server" mean? Cursors in MongoDB can time out on the server if they have been open for a long time without any operations being performed on them. When the client then tries to iterate the cursor, an OperationFailure exception is raised; in MAUS this surfaces as the DocumentStoreException in the traceback above.
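A minimal, self-contained sketch of the retry-and-resume pattern that avoids losing data when a server-side cursor times out. This is not MAUS or pymongo code: `FakeStore` and the local `CursorNotFound` class stand in for a real MongoDB collection and `pymongo.errors.CursorNotFound`, so the timeout can be simulated without a database.

```python
# Illustrative sketch only - FakeStore and CursorNotFound are stand-ins
# for a real MongoDB collection and pymongo.errors.CursorNotFound.

class CursorNotFound(Exception):
    """Stand-in for the server-side 'cursor id ... not valid' error."""

class FakeStore:
    """Serves documents in _id order; the first cursor 'times out'
    after two documents to mimic the server invalidating it."""
    def __init__(self, docs):
        self.docs = sorted(docs, key=lambda d: d["_id"])
        self.first_pass = True

    def find_since(self, last_id):
        served = 0
        for doc in self.docs:
            if last_id is not None and doc["_id"] <= last_id:
                continue  # already seen on a previous cursor
            if self.first_pass and served == 2:
                self.first_pass = False
                raise CursorNotFound("cursor id not valid at server")
            served += 1
            yield doc

def iterate_with_retry(store):
    """Re-issue the query after a cursor timeout, resuming from the
    last _id seen so no document is skipped or repeated."""
    last_id = None
    while True:
        try:
            for doc in store.find_since(last_id):
                last_id = doc["_id"]
                yield doc
            return  # cursor exhausted normally
        except CursorNotFound:
            continue  # cursor timed out server-side; re-query and resume

store = FakeStore([{"_id": i} for i in range(5)])
ids = [doc["_id"] for doc in iterate_with_retry(store)]
print(ids)  # [0, 1, 2, 3, 4]
```

With a real pymongo cursor the same idea applies: catch `pymongo.errors.CursorNotFound` and re-run the query from the last `_id` processed.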
May be related to a crash in the DAQ that happened during the same run (possibly the DAQ crapped out at some point, and then MAUS sat waiting for spills and eventually gave up). MAUS should, however, run indefinitely without receiving data.
Updated by Rogers, Chris almost 11 years ago
Email from Yagmur:
It threw a fit around 18:30, FATAL!!! from V1290(GEO1): Trigger mismatch(nEvts 0!=3224). I started it up again. I first tried using run control (which Pierrick had restarted remotely) but it froze during the initial configuration dialog.
Looking at time stamps on the log file, it looks like MAUS stopped around 19:09. Annoyingly, I accidentally overwrote the full log file (sorry).
Updated by Rogers, Chris over 10 years ago
Looks like there are occasional problems in reading the database during merge_output cycle.
I edited MongoDBDocumentStore to raise a DocumentStoreError in the case that get or get_since hits an error. I changed the call structure in merge_output to handle the DocumentStoreError and attempt to continue iteration - i.e. if the get_since() call fails, merge_output will ignore the failure and wait for new data rather than crashing.
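A hypothetical sketch of the change described above (the class and function names are illustrative, not the actual MAUS merge_output code): a failed get_since() is logged and the polling loop carries on waiting for new data instead of raising out of the loop.

```python
# Illustrative sketch only - FlakyDocStore and merge_output_loop are
# not the real MAUS classes; they just demonstrate the tolerate-and-
# retry behaviour described in the comment above.
import time

class DocumentStoreException(Exception):
    """Stand-in for docstore.DocumentStore.DocumentStoreException."""

class FlakyDocStore:
    """Fails on the first call, then returns data, to exercise the
    error-handling path."""
    def __init__(self):
        self.calls = 0

    def get_since(self, last_time):
        self.calls += 1
        if self.calls == 1:
            raise DocumentStoreException("cursor id not valid at server")
        return [{"spill": self.calls}]

def merge_output_loop(docstore, max_iterations=3, poll_seconds=0.0):
    """Poll the document store; log store errors and keep waiting
    for new data instead of crashing."""
    merged = []
    for _ in range(max_iterations):
        try:
            merged.extend(docstore.get_since(None))
        except DocumentStoreException as exc:
            # Ignore the failure and try again on the next poll
            print("docstore error, will retry:", exc)
        time.sleep(poll_seconds)
    return merged

store = FlakyDocStore()
result = merge_output_loop(store)
print(result)  # [{'spill': 2}, {'spill': 3}]
```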
In the same set of changes I also changed the KeyboardInterrupt (ctrl-c) handling so that merge_output will finish processing any data in the docstore before exiting. If there is a backlog this can cause a problem, and the user will have to use SIGKILL (e.g. kill -9 <pid> from the command line).
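A hypothetical sketch of that ctrl-c behaviour (illustrative names, not the actual MAUS code): on KeyboardInterrupt the loop finishes processing the queued backlog before returning, which is why a large backlog can delay exit until a SIGKILL.

```python
# Illustrative sketch only - drain_on_interrupt is not the real MAUS
# merge_output handler, just a demonstration of draining a backlog
# after a KeyboardInterrupt instead of exiting immediately.

def drain_on_interrupt(backlog, process):
    """Process items from backlog; on ctrl-c, drain the rest first."""
    processed = []
    try:
        while backlog:
            # Peek, process, then pop, so an interrupt mid-processing
            # leaves the current item in the backlog for the drain.
            processed.append(process(backlog[0]))
            backlog.pop(0)
    except KeyboardInterrupt:
        # Finish whatever is still queued before exiting; a large
        # backlog here is what can make the process slow to stop.
        while backlog:
            processed.append(process(backlog.pop(0)))
    return processed

# Simulate a ctrl-c arriving during the first item's processing
calls = {"n": 0}
def flaky_process(item):
    calls["n"] += 1
    if calls["n"] == 1:
        raise KeyboardInterrupt  # stand-in for a real ctrl-c
    return item * 2

result = drain_on_interrupt([1, 2, 3], flaky_process)
print(result)  # [2, 4, 6]
```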