Project

General

Profile

Bug #1278

Potential memory leak when running tracker reconstruction

Added by Rogers, Chris over 8 years ago. Updated over 8 years ago.

Status:
Closed
Priority:
Normal
Category:
Data Structure
Target version:
Start date:
22 May 2013
Due date:
% Done:

100%

Estimated time:
Workflow:
New Issue

Description

There appears to be a memory leak when running tracker reconstruction - see #1277

This is visible when running against tracker devel branch.


Files

process_monitor.py (902 Bytes) process_monitor.py Rogers, Chris, 23 May 2013 08:47

Related issues

Related to MAUS - Support #1277: MAUS Tracker Recon: Speed and Memory UsageOpenHunt, Christopher20 May 2013

Actions
#1

Updated by Rogers, Chris over 8 years ago

  1. took datacards.py attached to #1277.
  2. adjusted number of spills datacard to 1000 spills
  3. adjusted tracker_devel uncommenting the tracker recon mappers in bin/simulate_mice.py
         # SciFi
    - #   my_map.append(MAUS.MapCppTrackerMCDigitization()) # SciFi electronics model
    -#    my_map.append(MAUS.MapCppTrackerRecon()) # SciFi Recon
    +    my_map.append(MAUS.MapCppTrackerMCDigitization()) # SciFi electronics model
    +    my_map.append(MAUS.MapCppTrackerRecon()) # SciFi Recon
    
  4. added a small utility to monitor process, output below
    • %cpu is percentage of one cpu in use
    • %mem is percentage of memory in use
    • etime is execution time

Clearly there is a memory leak

26165 {'%mem': '1.9', '%cpu': '93.2', 'etime': '00:17'}
26165 {'%mem': '1.9', '%cpu': '94.8', 'etime': '00:22'}
26165 {'%mem': '1.9', '%cpu': '95.8', 'etime': '00:27'}
26165 {'%mem': '1.9', '%cpu': '96.4', 'etime': '00:32'}
26165 {'%mem': '1.9', '%cpu': '96.9', 'etime': '00:37'}
26165 {'%mem': '2.0', '%cpu': '97.3', 'etime': '00:42'}
26165 {'%mem': '2.0', '%cpu': '97.6', 'etime': '00:47'}
26165 {'%mem': '2.0', '%cpu': '97.9', 'etime': '00:52'}
26165 {'%mem': '2.0', '%cpu': '98.0', 'etime': '00:57'}
26165 {'%mem': '2.1', '%cpu': '98.2', 'etime': '01:02'}
26165 {'%mem': '2.1', '%cpu': '98.3', 'etime': '01:07'}
26165 {'%mem': '2.1', '%cpu': '98.5', 'etime': '01:12'}
26165 {'%mem': '2.1', '%cpu': '98.6', 'etime': '01:17'}
26165 {'%mem': '2.2', '%cpu': '98.6', 'etime': '01:22'}
26165 {'%mem': '2.2', '%cpu': '98.7', 'etime': '01:27'}
26165 {'%mem': '2.2', '%cpu': '98.8', 'etime': '01:32'}
26165 {'%mem': '2.2', '%cpu': '98.9', 'etime': '01:37'}
26165 {'%mem': '2.3', '%cpu': '98.9', 'etime': '01:42'}
26165 {'%mem': '2.3', '%cpu': '99.0', 'etime': '01:47'}
26165 {'%mem': '2.3', '%cpu': '99.0', 'etime': '01:52'}
26165 {'%mem': '2.3', '%cpu': '99.1', 'etime': '01:57'}
26165 {'%mem': '2.4', '%cpu': '99.1', 'etime': '02:02'}
26165 {'%mem': '2.4', '%cpu': '99.2', 'etime': '02:07'}
26165 {'%mem': '2.4', '%cpu': '99.2', 'etime': '02:12'}
26165 {'%mem': '2.4', '%cpu': '99.2', 'etime': '02:17'}
26165 {'%mem': '2.5', '%cpu': '99.3', 'etime': '02:22'}
26165 {'%mem': '2.5', '%cpu': '99.3', 'etime': '02:27'}
26165 {'%mem': '2.5', '%cpu': '99.3', 'etime': '02:32'}
26165 {'%mem': '2.5', '%cpu': '99.3', 'etime': '02:37'}
26165 {'%mem': '2.6', '%cpu': '99.4', 'etime': '02:42'}
26165 {'%mem': '2.6', '%cpu': '99.4', 'etime': '02:47'}
26165 {'%mem': '2.6', '%cpu': '99.4', 'etime': '02:52'}
26165 {'%mem': '2.6', '%cpu': '99.4', 'etime': '02:57'}
26165 {'%mem': '2.7', '%cpu': '99.4', 'etime': '03:02'}
26165 {'%mem': '2.7', '%cpu': '99.5', 'etime': '03:07'}
26165 {'%mem': '2.7', '%cpu': '99.5', 'etime': '03:12'}
26165 {'%mem': '2.7', '%cpu': '99.5', 'etime': '03:17'}
26165 {'%mem': '2.8', '%cpu': '99.5', 'etime': '03:22'}
26165 {'%mem': '2.8', '%cpu': '99.5', 'etime': '03:27'}
26165 {'%mem': '2.8', '%cpu': '99.5', 'etime': '03:32'}
#2

Updated by Rogers, Chris over 8 years ago

  1. I reverted bin/simulate_mice.py i.e. replaced comments to remove any SciFi reconstruction
  2. process monitor output below

Looks like the memory leak is specific to tracker recon...

26361 {'%mem': '1.8', '%cpu': '95.5', 'etime': '00:12'}
26361 {'%mem': '1.8', '%cpu': '96.9', 'etime': '00:17'}
26361 {'%mem': '1.8', '%cpu': '97.6', 'etime': '00:22'}
26361 {'%mem': '1.8', '%cpu': '98.1', 'etime': '00:27'}
26361 {'%mem': '1.8', '%cpu': '98.4', 'etime': '00:32'}
26361 {'%mem': '1.8', '%cpu': '98.7', 'etime': '00:37'}
26361 {'%mem': '1.8', '%cpu': '98.8', 'etime': '00:42'}
26361 {'%mem': '1.8', '%cpu': '99.0', 'etime': '00:47'}
26361 {'%mem': '1.8', '%cpu': '99.1', 'etime': '00:52'}
26361 {'%mem': '1.8', '%cpu': '99.2', 'etime': '00:57'}
26361 {'%mem': '1.8', '%cpu': '99.3', 'etime': '01:02'}
26361 {'%mem': '1.8', '%cpu': '99.3', 'etime': '01:07'}
26361 {'%mem': '1.8', '%cpu': '99.4', 'etime': '01:12'}
26361 {'%mem': '1.8', '%cpu': '99.4', 'etime': '01:17'}
26361 {'%mem': '1.8', '%cpu': '99.5', 'etime': '01:22'}
26361 {'%mem': '1.8', '%cpu': '99.5', 'etime': '01:27'}
26361 {'%mem': '1.8', '%cpu': '99.5', 'etime': '01:32'}
26361 {'%mem': '1.8', '%cpu': '99.6', 'etime': '01:37'}
26361 {'%mem': '1.8', '%cpu': '99.6', 'etime': '01:42'}
26361 {'%mem': '1.8', '%cpu': '99.6', 'etime': '01:47'}
26361 {'%mem': '1.8', '%cpu': '99.7', 'etime': '01:52'}
26361 {'%mem': '1.8', '%cpu': '99.7', 'etime': '01:57'}
26361 {'%mem': '1.8', '%cpu': '99.7', 'etime': '02:02'}
26361 {'%mem': '1.8', '%cpu': '99.7', 'etime': '02:07'}
26361 {'%mem': '1.8', '%cpu': '99.7', 'etime': '02:12'}
26361 {'%mem': '1.8', '%cpu': '99.7', 'etime': '02:17'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '02:22'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '02:27'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '02:32'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '02:37'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '02:42'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '02:47'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '02:52'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '02:57'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '03:02'}
26361 {'%mem': '1.8', '%cpu': '99.8', 'etime': '03:07'}
26361 {'%mem': '1.8', '%cpu': '99.9', 'etime': '03:12'}
26361 {'%mem': '1.8', '%cpu': '99.9', 'etime': '03:17'}
26361 {'%mem': '1.9', '%cpu': '99.9', 'etime': '03:22'}
#3

Updated by Santos, Edward over 8 years ago

Tracker Recon or Json Processors, right? When you add the tracker reconstruction you also add objects to the recon branch which need to be processed.

#4

Updated by Rogers, Chris over 8 years ago

I guess my point is that this is not common to the whole datastructure, only some aspect of tracker implementation (still may be an issue with the framework of course).

I had a look at MapCppTrackerDigitization and found that save_json and read_in_json were calling CppToJson and JsonToCpp, without deleting the allocated memory. Note that the ObjectProcessor does specify that memory allocated by CppToJson and JsonToCpp is owned by the caller, so caller has responsibility to delete the allocated memory.

#5

Updated by Rogers, Chris over 8 years ago

So I pushed a fix to MapCppTrackerMCDigitization

bzr+ssh://bazaar.launchpad.net/~chris-rogers/maus/1278/

I now handle the memory properly. Ah, I noticed the unit test is spitting it back (I think my error handling is screwed up in MapCppTrackerMCDigitization::read_in_json)

#6

Updated by Rogers, Chris over 8 years ago

The process monitor returns for 1278

29276 {'%mem': '1.8', '%cpu': '99.3', 'etime': '00:10'}
29276 {'%mem': '1.8', '%cpu': '99.6', 'etime': '00:15'}
29276 {'%mem': '1.8', '%cpu': '99.7', 'etime': '00:20'}
29276 {'%mem': '1.8', '%cpu': '99.8', 'etime': '00:25'}
29276 {'%mem': '1.8', '%cpu': '99.9', 'etime': '00:30'}
29276 {'%mem': '1.8', '%cpu': '99.9', 'etime': '00:35'}
29276 {'%mem': '1.8', '%cpu': '99.9', 'etime': '00:40'}
29276 {'%mem': '1.8', '%cpu': '99.9', 'etime': '00:45'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '00:50'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '00:55'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:00'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:05'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:10'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:15'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:20'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:25'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:30'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:35'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:40'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:45'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:50'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '01:55'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:00'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:05'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:10'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:15'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:20'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:25'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:30'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:35'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:40'}
29276 {'%mem': '1.8', '%cpu': '100', 'etime': '02:45'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '02:50'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '02:55'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '03:00'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '03:05'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '03:10'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '03:15'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '03:20'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '03:25'}
29276 {'%mem': '1.9', '%cpu': '100', 'etime': '03:30'}

So maybe there is still some problem, but it is certainly much reduced

#7

Updated by Rogers, Chris over 8 years ago

I realise that the way I modified the code, there is still an assignment (=) operator - I don't know if SciFi datastructure has implemented assignment operator to handle assignment correctly. For example, copying cross-references in the same data structure needs to be handled very carefully to maintain the integrity of the data.

#8

Updated by Santos, Edward over 8 years ago

The assignment should work correctly, I'll give this a spin and let you know. Just need to finish off the changes in the clustering/mice module handling first.

#9

Updated by Rogers, Chris over 8 years ago

Note I pushed another revision - bzr+ssh://bazaar.launchpad.net/~chris-rogers/maus/1278/ now passes all of the unit tests that the tracker_devel branch passed. (There were a couple of fails in the revision I picked up).

Adam, do you want me to do the same thing to MapCppTrackerRecon? Or would you rather tackle that one?

#10

Updated by Rogers, Chris over 8 years ago

  • Assignee changed from Rogers, Chris to Santos, Edward

Hand-off to Ed (or should it be Adam?) to do the other tracker modules.

#11

Updated by Santos, Edward over 8 years ago

Fixed by revision 979 of tracker_devel.Thanks, Chris.

#12

Updated by Rogers, Chris over 8 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100
#13

Updated by Rogers, Chris over 8 years ago

  • Target version changed from Future MAUS release to MAUS-v0.5.4

Also available in: Atom PDF