Bug #981

MongoDB ran out of address space

Added by Rogers, Chris over 11 years ago. Updated over 10 years ago.

Status: Closed
Priority: Low
Assignee:
Category: Online reconstruction
Target version:
Start date: 27 April 2012
Due date:
% Done: 100%
Estimated time:
Workflow: New Issue

Description

Running in the control room, we are getting an error like:

[miceonrec01a] reducers > ./reconstruct_daq_single_station_reducer.py -type_of_dataflow=multi_process_merge_output
Error in <TGClient::TGClient>: can't open display "isis57139.sci.rl.ac.uk:0.0", switching to batch mode...
 In case you run from a remote ssh session, reconnect with ssh -Y
MAUS running in online mode
Welcome to MAUS:
    Process ID (PID): 10152
    Program Arguments: ['./reconstruct_daq_single_station_reducer.py', '-type_of_dataflow=multi_process_merge_output']
    Version: MAUS release version 0.2.2
INITIATING EXECUTION
-------- MERGE OUTPUT --------
Traceback (most recent call last):
  File "./reconstruct_daq_single_station_reducer.py", line 74, in <module>
    run()
  File "./reconstruct_daq_single_station_reducer.py", line 71, in run
    MAUS.Go(my_input, my_map, reducer, output_worker, data_cards) 
  File "/home/mice/MAUS/.maus_control-room/src/common_py/Go.py", line 121, in __init__
    executor.execute()
  File "/home/mice/MAUS/.maus_control-room/src/common_py/framework/merge_output.py", line 215, in execute
    raise DocumentStoreException(exc)
docstore.DocumentStore.DocumentStoreException: Exception when using document store: database error: can't map file memory - mongo requires 64 bit build for larger datasets

The workaround is to use a different MongoDB database name, but this is a bit clunky... and probably means something has gone wrong somewhere. The relevant change is that the tracker is now in place, so data sizes are indeed larger.
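As a hedged sketch of the workaround: MAUS reads its configuration from Python datacards, and `mongodb_database_name` is the configuration parameter named later in this thread. The value here is purely illustrative.

```python
# Datacards override (sketch): pointing the online reconstruction at a
# fresh database name sidesteps the full 32-bit store.
# mongodb_database_name is the MAUS configuration parameter; the value
# below is illustrative, not a real run name.
mongodb_database_name = "maus-online-run2"
```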

#1

Updated by Rogers, Chris over 11 years ago

Additional symptom - MongoDB went into some horrible lock state and had to be forcibly killed and restarted...

#2

Updated by Tunnell, Christopher over 11 years ago

Is this a repeated problem, i.e. does it happen every time?

#3

Updated by Rogers, Chris over 11 years ago

It didn't occur until after we were running (on test random noise) for about an hour. Now it happens every time unless I change the mongodb_database_name configuration parameter. My worry is that:

  • somewhere the data is getting cached on disk and we run out of disk space in a couple of weeks.
  • there is a load issue that will cause the online recon to crash in a nasty way every hour or so (so that even a Ctrl-C restart doesn't fix it).
#4

Updated by Jackson, Mike over 11 years ago

I did some Googling. Deep within the MongoDB FAQ - http://www.mongodb.org/display/DOCS/FAQ - it says:
---
What are the 32-bit limitations?

MongoDB uses memory-mapped files. When running on a 32-bit operating system, the total storage size for the server (data, indexes, everything) is 2gb. If you are running on a 64-bit os, there is virtually no limit to storage size. Thus 64 bit production deployments are recommended. See the blog post for more information. One other note: journaling is not on by default in the 32 bit binaries as journaling uses extra memory-mapped views.
---
The blog post cited is http://blog.mongodb.org/post/137788967/32-bit-limitations, in which users ask why this limitation is buried in the FAQ.
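The arithmetic behind the FAQ's warning can be made concrete. The sketch below is illustrative, not MAUS code: `MONGO_32BIT_LIMIT`, `is_64bit_process` and `spills_until_full` are hypothetical names, and the 1 MiB spill size is an assumption, but the 2 GiB cap is the documented 32-bit limit.

```python
import struct

# A 32-bit mongod memory-maps every database file into a single 32-bit
# address space, so total storage (data + indexes) is capped at ~2 GiB.
MONGO_32BIT_LIMIT = 2 * 1024 ** 3  # bytes

def is_64bit_process():
    """True when this interpreter runs with 64-bit pointers."""
    return struct.calcsize("P") * 8 == 64

def spills_until_full(bytes_per_spill, bytes_used=0):
    """Rough number of spills that still fit under the 32-bit cap."""
    return max(0, (MONGO_32BIT_LIMIT - bytes_used) // bytes_per_spill)

# With ~1 MiB spills and an empty store, the cap is hit after 2048 spills:
print(spills_until_full(1024 ** 2))  # -> 2048
```

At one spill per second, 2048 spills is barely half an hour of running, which matches the observation above that the crash appeared after about an hour of random-noise data taking.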

Three possible approaches:

  • Use a 64-bit deployment on a separate server if you have such a server.
  • Look at http://www.mongodb.org/display/DOCS/Sharding.
  • At present src/framework/merge_output.py reads spills for merging and outputting but leaves the spills in MongoDB, because N merge-output clients may be reading from it. Introducing something like a MergeOutputGroup, containing N merger-outputter pairs, would let a single merge-output client replace the N clients. merge_output.py could then delete spills from MongoDB once they have been passed to the MergeOutputGroup, reducing the rate at which the database fills up.
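The third option above can be sketched in a few lines. This is a hypothetical illustration of the proposed structure, not the real merge_output.py: `MergeOutputGroup` and `drain_spills` are invented names, and a plain dict stands in for the MongoDB document store.

```python
class MergeOutputGroup:
    """Sketch of the proposal: one object owning N merger/outputter
    pairs, so a single client is the only reader of the docstore."""

    def __init__(self, pairs):
        self.pairs = pairs  # list of (merger, outputter) callables

    def process(self, spill):
        for merger, outputter in self.pairs:
            outputter(merger(spill))

def drain_spills(store, group):
    """Hand each spill to the group, then delete it from the store so
    the database stops accumulating. `store` is a dict stand-in."""
    for spill_id in list(store):
        group.process(store.pop(spill_id))

# Toy usage: the merger is the identity, the outputter collects spills.
merged = []
group = MergeOutputGroup([(lambda spill: spill, merged.append)])
store = {"spill_0": {"n": 0}, "spill_1": {"n": 1}}
drain_spills(store, group)
print(len(merged), len(store))  # -> 2 0
```

Because only the single group-owning client reads the store, deleting a spill after `process` cannot race against another reader, which is exactly why the N-client layout could not delete safely.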
#5

Updated by Tunnell, Christopher over 11 years ago

Thanks Mike for finding that!

We're moving to 64-bit anyway, Chris. Ask Matt Robinson and maybe he can prioritize the move? He's supposed to be coming down to RAL next week. Linda says the intention is to back up the other non-MAUS onrec machine, move that to 64-bit, and then we can install MongoDB on it, I guess?

I'm sure Alex can figure something else clever out, but this is just a proposed solution of minimal work for the software people (because Matt does the heavy lifting).

Chris: is the tracker zero-suppressed? I don't think the CKOV is, and that (I would guess) produces more data than the tracker outputs. If you're worried about the database filling up every couple of weeks, then just tie the end-of-run signal to a flush of the database?
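The end-of-run flush suggested above could look something like the following. This is a speculative sketch: `FakeDocumentStore` is an in-memory stand-in (the real store would be MongoDB), and the `daq_event_type` / `end_of_run` keys are assumptions about the spill layout, not the verified MAUS schema.

```python
class FakeDocumentStore:
    """In-memory stand-in for the online MongoDB document store."""
    def __init__(self):
        self.databases = {"maus-online": ["spill_0", "spill_1"]}

    def drop_database(self, name):
        self.databases.pop(name, None)

def flush_on_end_of_run(store, database_name, spill):
    """Drop the online database when an end-of-run spill arrives.
    The spill keys used here are assumed, not the real MAUS schema."""
    if spill.get("daq_event_type") == "end_of_run":
        store.drop_database(database_name)
        return True
    return False

store = FakeDocumentStore()
flush_on_end_of_run(store, "maus-online", {"daq_event_type": "end_of_run"})
print("maus-online" in store.databases)  # -> False
```

Tying the flush to the DAQ's end-of-run signal bounds the database size at one run's worth of spills, rather than letting it grow across runs.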

#6

Updated by Rogers, Chris over 11 years ago

One of the things they want to test is zero suppression in the tracker. Yes, they will be moving to 64-bit, probably not one week before a user run. Thanks for the advice guys, will try to figure something out!

#7

Updated by Rogers, Chris over 11 years ago

  • Priority changed from Normal to Urgent
#8

Updated by Rogers, Chris over 11 years ago

  • Priority changed from Urgent to Low

I lowered the priority because this is probably fixed now that we have a 64-bit server. We still need to check, though (note the load test issue tracked elsewhere).

#9

Updated by Rogers, Chris over 10 years ago

  • Status changed from Open to Closed
  • Assignee changed from Richards, Alexander to Rogers, Chris
  • % Done changed from 0 to 100

Fixed

#10

Updated by Rajaram, Durga over 10 years ago

  • Target version changed from Future MAUS release to MAUS-v0.5.5
