Bug #1642
Tests failing due to out of memory error in test_latest_geometry.py
100%
Description
Hi,

The /tmp/suds permission got fixed. The same test then failed further down the road. I was able to reproduce the failure on heplnv157 a couple of times [ once running by hand ], but it passes on my SL6 box.

Digging further, the culprit seems to be "conf3" in tests/integration/test_utilities/test_geometry/test_latest_geometry.py [ which actually runs
python bin/utilities/geometry_validation.py --configuration_file tests/integration/test_utilities/test_geometry/conf3
-- the channel validation ]. This is taking up too much memory and is getting killed on heplnv157, resulting in the parent test's failure.

From /var/log/messages:
Mar 13 01:18:33 heplnv157 kernel: Out of memory: Kill process 44623 (python) score 756 or sacrifice child
Mar 13 01:18:33 heplnv157 kernel: Killed process 44623, UID 601, (python) total-vm:2708008kB, anon-rss:1528392kB, file-r

Durga

On Mar 12, 2015, at 12:19 PM, Adam Dobbs <a.dobbs07@imperial.ac.uk> wrote:
> Hi All,
>
> I am seeing a strange test break during the integration tests on heplnv157:
>
> Run the xboa test ... SKIP: No xboa installed in third party
> ======================================================================
> FAIL: Test latest geometry downloads, builds and runs
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/var/lib/jenkins/jenkins_root/workspace/MAUS_integration_tests/label/heplnv157/tests/integration/test_utilities/test_geometry/test_latest_geometry.py", line 68, in test_geometry
>     self.assertTrue(validate_geometry())
> AssertionError: False is not true
> -------------------- >> begin captured stdout << ---------------------
> Validating geometry id 48
>
> It seems to be a geometry issue, but Ryan doesn't believe it is one of his changes. Chris, could this have anything to do with the xboa update?
>
> Thanks,
>
> Ad
> ************************************************************
> Adam Dobbs
> High Energy Physics group, Imperial College
> Room 514, Blackett Lab, Prince Consort Rd, London, SW7 2AZ
> Landline: 02075 947796
> Mobile: 07974 734371
> Email: adobbs@imperial.ac.uk
> Twitter: ajdobbs
> Web: http://www3.imperial.ac.uk/people/a.dobbs07
> ************************************************************
Updated by Rogers, Chris over 8 years ago
- File test_latest_geometry.log added
Log of memory size...
Updated by Rogers, Chris over 8 years ago
I made the test just run 3 events (with 5 mm step size). It also logs memory usage. In test now under MAUS_rogers, but there is a queue.
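As a rough sketch of what logging memory usage around a test step could look like: the helper names below are hypothetical and not taken from the MAUS test, and the sketch assumes a Unix host where the standard-library resource module is available.

```python
import resource
import sys


def peak_rss_kb():
    """Return this process's peak resident set size in kB.

    ru_maxrss is reported in kB on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024
    return peak


def log_memory(label, stream=sys.stdout):
    """Write one line with the current peak RSS and return the value logged."""
    peak = peak_rss_kb()
    stream.write("%s: peak RSS %d kB\n" % (label, peak))
    return peak


if __name__ == "__main__":
    log_memory("before")
    data = [0] * 1000000  # stand-in for the real work
    log_memory("after")
```

If the validation runs as a child process, as in the description, resource.RUSAGE_CHILDREN would be the figure to read instead; it only covers children that have terminated and been waited for.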
Updated by Rogers, Chris over 8 years ago
- File test_latest_geometry.log added
NB: the test passed on my laptop, with the memory footprint as per the link (second test_latest_geometry.log).
The downside of this patch is that it means we no longer have a geometry validated by the test server automatically. I can fix that by doing e.g. something magic using the MAUS_geometry_download test job.
Updated by Rogers, Chris over 8 years ago
Hmm, it still failed. I removed conf3 from the test (which makes plots of the cooling channel geometry), as this seems to be causing the problem. It looks like the problem is caused by the size of the ROOT objects (just a bunch of TGraphs, histograms, etc.), not by the memory footprint of the steps. Not sure I quite believe it, but I can fix that.
So, the quick fix goes in the trunk and hopefully gets the release out; the longer fix is to:
- Clean up ROOT objects after each plot.
- Make a separate test that has a smaller step size and is driven from the MAUS_geometry test server job.
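The first item can be illustrated in plain Python. This is only a sketch of the pattern (build one plot, release it, then move on), using a hypothetical FakePlot class in place of the real TGraphs and histograms. In PyROOT the objects may be owned by ROOT rather than by Python, so the real fix would likely need ROOT's own cleanup, which is not shown here.

```python
import gc
import weakref


class FakePlot(object):
    """Hypothetical stand-in for the graphs and histograms behind one plot."""

    def __init__(self, name):
        self.name = name
        self.points = [0.0] * 10000


def make_plots(names):
    """Build plots one at a time, releasing each before starting the next.

    Returns weak references so a caller can check that nothing is kept alive.
    """
    refs = []
    for name in names:
        plot = FakePlot(name)
        # ... draw and save the plot here ...
        refs.append(weakref.ref(plot))
        del plot      # drop the only strong reference
        gc.collect()  # also clear any reference cycles
    return refs


if __name__ == "__main__":
    refs = make_plots(["plot_a", "plot_b", "plot_c"])
    print("plots still alive:", sum(r() is not None for r in refs))
```

The point of the design is that peak memory is then set by the largest single plot rather than by the sum of all plots.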
Updated by Rogers, Chris over 8 years ago
I disabled conf3, which was causing the problems, in the trunk. Now back in test. I will leave the issue open so I can get the longer-term solution sorted (as per details above).
Updated by Rogers, Chris about 8 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Looks like this is okay for now...