Bug #1844

Massive memory leak when running over input root files

Added by Pidcott, Celeste almost 7 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
27 April 2016
Due date:
% Done:
100%
Estimated time:
Workflow:
New Issue

Description

For run 7469, I ran TrackMatching over the official MC file (for which I did observe a large memory leak, with usage reaching 5-6GB, but it did manage to run over the complete root file). I then tried to run my PID mapper over the root file this had produced, and found that by the time 31/200 events had been processed, memory usage had already reached 2.6GB and my laptop could not process the whole file. I stripped back my mapper so that it didn't actually do anything, and still the memory leak persisted. My simulate script used InputCppRootData, MapCppGlobalPID, ReducePyDoNothing, and OutputCppRoot.
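For reference, a minimal sketch of that kind of run script, assuming the standard MAUS.Go driver interface; the datacard names and file names below are placeholders rather than the exact ones used for this test:

```python
# Minimal sketch of the chain described above, assuming the standard MAUS
# Python driver (MAUS.Go). Datacard names and file names are placeholders.
import io
import MAUS

def run():
    datacards = io.StringIO(u"""
input_root_file_name = "trackmatched_7469.root"   # assumed datacard name
output_root_file_name = "pid_out_7469.root"       # assumed datacard name
""")
    inputter = MAUS.InputCppRootData()   # read spills back from the ROOT file
    mapper = MAUS.MapCppGlobalPID()      # the PID mapper under test
    reducer = MAUS.ReducePyDoNothing()   # no reduction
    outputter = MAUS.OutputCppRoot()     # write the processed spills out again
    MAUS.Go(inputter, mapper, reducer, outputter, datacards)

if __name__ == "__main__":
    run()
```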

The steps I've taken to look into this (and their results) are:

1) I switched to InputCppRoot, the leak remained the same.
2) I tried MapPyDoNothing with InputCppRoot, the leak reached ~6GB on the first event and I killed the process for the sake of my laptop.
3) I tried MapPyDoNothing with InputCppRootData; again the leak reached ~6GB on the first event before I killed it. Interestingly, it also ran far more slowly than it does with other mappers: in the time I allowed it to run on that one event, a different mapper would have gone through 20+ events.
4) I ran a simulation with all of the local recon mappers, plus GlobalReconImport, GlobalTrackMatching and my stripped-back MapCppGlobalPID included in the mapper chain, and while there was a slow memory leak (roughly 80MB over 800 spills) it was nowhere near as catastrophic as the leak when using an input root file.

If anyone wants to try and reproduce the problem using the same input root file I've been using, it can be found here: https://files.warwick.ac.uk/cepidcott/browse


Files

reproc.py (2.14 KB) Rajaram, Durga, 28 April 2016 12:41
#1

Updated by Rajaram, Durga almost 7 years ago

On the other hand, when I run against an 'official' MC sample that was produced with 2.1.0, I do not see this large memory consumption.
http://reco.mice.rl.ac.uk/MAUS-v2.1.0/MC/mc_3mm200_07469_4.root

Yes, there is a very slow leak, but nothing nearly as crazy as what you get.

Which version of MAUS did you use to produce the trackmatch root file?
Was it a standard release? [if so, which version?]
Was it a revision of the trunk? [if so, which rev#?]

When I run against that file, yes I see very large memory usage.

But when I run with the trunk, I also get some ROOT complaints like:

TStreamerInfo::CompareContent:0: RuntimeWarning: The following data member of
the on-file layout version 2100 of class 'MAUS::Spill' differs from
the in-memory layout version 2100:
   MAUS::MCEventPArray* _mc; //
vs
   vector<MAUS::MCEvent*>* _mc; //
TStreamerInfo::CompareContent:0: RuntimeWarning: The following data member of
the on-file layout version 2100 of class 'MAUS::Spill' differs from
the in-memory layout version 2100:
   MAUS::ReconEventPArray* _recon; //
vs
   vector<MAUS::ReconEvent*>* _recon; //

The script I used to do it is attached: reproc.py

Note: if you use MapPyDoNothing, as you did in some of your tests, you will see a severe slowdown and possibly a leak. MapPyDoNothing passes JSON and so does string conversions on every event. You don't want it.
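As an illustration of that cost (plain Python, not MAUS code): round-tripping a sizeable event through a JSON string on every spill, which is roughly what the Py* pass-through modules force, is far more expensive than handing the object straight through:

```python
# Illustration only (plain Python, not MAUS): converting a large event to a
# JSON string and back on every spill is expensive compared with a pass-through.
import json
import time

# A stand-in "spill" containing a few hundred thousand numbers.
spill = {"recon_events": [{"track_points": list(range(1000))} for _ in range(300)]}

start = time.time()
for _ in range(10):                  # ten "spills"
    doc = json.dumps(spill)          # data structure -> JSON string
    spill_copy = json.loads(doc)     # JSON string -> data structure
print("json round trips: %.2f s" % (time.time() - start))

start = time.time()
for _ in range(10):
    spill_copy = spill               # pass-through, no conversion
print("pass-through:     %.4f s" % (time.time() - start))
```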

#2

Updated by Pidcott, Celeste almost 7 years ago

I was using rev#1073 of the merge branch.

To check nothing was obviously going amiss with my particular branch, I ran your script on mc_3mm200_07469_4.root and I too only saw a slow leak.

trackmatched_7469.root was produced by running a script containing MapCppGlobalReconImport and MapCppGlobalTrackMatching over mc_3mm200_07469_4.root, and, as I said, when I ran those I saw a large memory leak.
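For completeness, that two-mapper chain would look something like this in a run script (a sketch assuming the usual MAUS.Go driver and MapPyGroup chaining; the datacard names are assumed placeholders):

```python
# Sketch of the recon-import + trackmatching chain described above, assuming
# the standard MAUS driver and MapPyGroup. Datacard names are placeholders.
import io
import MAUS

def run():
    datacards = io.StringIO(u"""
input_root_file_name = "mc_3mm200_07469_4.root"      # assumed datacard name
output_root_file_name = "trackmatched_7469.root"     # assumed datacard name
""")
    mappers = MAUS.MapPyGroup()
    mappers.append(MAUS.MapCppGlobalReconImport())    # build global events
    mappers.append(MAUS.MapCppGlobalTrackMatching())  # the mapper under suspicion
    MAUS.Go(MAUS.InputCppRootData(), mappers,
            MAUS.ReducePyDoNothing(), MAUS.OutputCppRoot(), datacards)

if __name__ == "__main__":
    run()
```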

I've just tried running only MapCppGlobalReconImport over mc_3mm200_07469_4.root and there was only a slow leak, and running your script over the output of that again produced only a slow leak, whereas running trackmatching over it reproduces a large (though not quite system-crashing) leak. So this points to this actually being a problem introduced by trackmatching (or at least the version of trackmatching in my branch; Jan has very recently made some changes).

How a leak that occurs while running trackmatching could then lead to the terrible leak when running a different mapper over the trackmatching output file, I don't know. I'll raise this with Jan and see if the problem persists in the most up-to-date version of his code.

#3

Updated by Dobbs, Adam almost 7 years ago

Interesting.

Celeste, please could you or Jan monitor the memory use when running the trackmatching mapper and see if it produces a memory leak, before we try to do anything with reading back in root files (a simple way to monitor this is sketched at the end of this comment).

Durga, it sounds like we need to create DoNothing cpp modules...

AD
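One lightweight way to do that monitoring, as a sketch (plain Python using the standard resource module, not MAUS-specific; process_spill is a hypothetical stand-in for the real per-spill work):

```python
# Sketch: track the process's peak resident memory as spills are processed.
# Uses only the standard library; process_spill() is a hypothetical stand-in
# for the real per-spill work (e.g. running the trackmatching mapper).
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process in MB (ru_maxrss is KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

def process_spill(spill):
    return spill  # hypothetical placeholder for the real per-spill processing

for i, spill in enumerate(range(100)):   # stand-in for the spill loop
    process_spill(spill)
    if i % 10 == 0:
        sys.stderr.write("spill %d: peak RSS %.1f MB\n" % (i, peak_rss_mb()))
```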

#4

Updated by Rogers, Chris over 6 years ago

Durga, it sounds like we need to create DoNothing cpp modules...

I am confused. I think that Celeste is saying that InputCppRoot and InputCppRootData produce a memory leak? Whereas presumably InputCppDAQOfflineData does not? So that means there is a problem in InputCppRoot.

#5

Updated by Dobbs, Adam over 6 years ago

Exactly. My comment about the Python DoNothing modules was because they still use JSON and are super slow.

#6

Updated by Pidcott, Celeste over 6 years ago

Rogers, Chris wrote:

Durga, it sounds like we need to create DoNothing cpp modules...

I am confused. I think that Celeste is saying that InputCppRoot and InputCppRootData produce a memory leak? Whereas presumably InputCppDAQOfflineData does not? So that means there is a problem in InputCppRoot.

I take back some of my accusations about InputCppRoot and InputCppRootData, which were based on my not seeing a memory leak when I ran all mappers (including trackmatching and my PID mapper) in a chain. I was using beam settings that rarely got any muons through the trackers, so trackmatching wasn't actually doing anything. Running a pencil beam brings back the leak, so I was wrong to say that the trackmatching leak went away in simulation.

The fact that a root file produced by a leaky mapper could then cause a leak when running a normally non-leaky mapper over it is still mysterious though.

#7

Updated by Rogers, Chris over 6 years ago

There is a deep copy of the MAUS data structure at the beginning of each mapper - so if you put leaky classes into the data structure, you will continue to leak every time you subsequently call a mapper.
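A toy illustration of why that compounds (plain Python, not the real MAUS classes; LeakyBranch is a made-up stand-in for a member whose copies are never released):

```python
# Toy illustration (plain Python, not the real MAUS data structure): if an
# object in the data structure "leaks" when it is copied, every per-mapper
# deep copy adds to the damage, so memory keeps growing in each later module.
import copy

_never_freed = []  # stands in for allocations that are never deleted

class LeakyBranch(object):
    """Hypothetical stand-in for a data-structure member with a leaky copy."""
    def __init__(self, payload):
        self.payload = payload
    def __deepcopy__(self, memo):
        clone = LeakyBranch(list(self.payload))
        _never_freed.append(clone)        # the copy is retained forever
        return clone

spill = {"global_event": LeakyBranch([0.0] * 100000)}
for mapper_call in range(5):              # one deep copy per mapper in the chain
    spill = copy.deepcopy(spill)
print("retained copies: %d" % len(_never_freed))  # grows with every mapper call
```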

#8

Updated by Dobbs, Adam over 6 years ago

I suppose we have to do that if we want to retain the ability to multithread spills at some point in the future...

#9

Updated by Rogers, Chris over 6 years ago

Not really. The reason was just to make it easier to keep backwards compatibility with json stuff.

So the abstract structure is: (data type 1) -> deepcopy to data type 2 -> do something -> deepcopy to data type 3 -> do something -> ...

One could add logic that says "if data type 2 == data type 3, don't deepcopy", but that is not the way it was implemented.
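A sketch of that pipeline shape and the proposed short-circuit (illustrative only; the names are made up and this is not the MAUS API):

```python
# Illustrative pipeline sketch (made-up names, not the MAUS API): each module
# declares the representation it wants; the driver copies on every hand-off,
# but could skip the copy when the wanted type already matches the current one.
import copy

def run_chain(data, modules, skip_redundant_copies=False):
    current_type = type(data)
    for wanted_type, process in modules:
        if skip_redundant_copies and wanted_type is current_type:
            pass                        # proposed optimisation: reuse in place
        else:
            data = copy.deepcopy(data)  # current behaviour: copy on every step
            # (a real driver would also convert between representations here)
        data = process(data)
        current_type = wanted_type
    return data

# Hypothetical usage: three modules that all work on the same dict type.
modules = [(dict, lambda d: d), (dict, lambda d: d), (dict, lambda d: d)]
result = run_chain({"spill": list(range(1000))}, modules, skip_redundant_copies=True)
```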

#10

Updated by Dobbs, Adam over 6 years ago

Hmmmm, I might try and implement that, deep copies are expensive...

#11

Updated by Pidcott, Celeste over 6 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

Found out that this is a more insidious issue that isn't related to the inputters; Jan and Melissa are currently investigating.
