Bug #1862
NaN in tracker reco
100%
Description
Spotted by Ao... looking at run 8155 MAUS v2.5.1 I get lots of NaN in the scifitrackpoints. E.g.:
- spill 4 particle_number 3
- spill 6 particle_number 23
- etc
track_point.pos() and track_point.mom() both appear to be NaN in station 1. I didn't check the other stations (or space points, etc.). Possibly related to #1849, which never got closed.
Updated by Rogers, Chris about 7 years ago
I checked, the offensive line:
momentum = sqrt(energy*energy - _muon_mass_sq);
is still in src/common_cpp/Recon/Kalman/MAUSSciFiPropagators.cc
line 328.
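For reference, an expression of that form returns NaN whenever the energy passed in is below the muon mass, because the quantity under the square root goes negative. A minimal sketch of the effect follows; the constant and function names are made up for illustration and are not copied from MAUSSciFiPropagators.cc:

```cpp
#include <cmath>

// Muon mass in MeV/c^2 (approximately 105.658); the names here are
// illustrative assumptions, not the MAUS identifiers.
const double kMuonMass = 105.658;
const double kMuonMassSq = kMuonMass * kMuonMass;

// Same form as the quoted line: p = sqrt(E^2 - m^2).
// Returns NaN when energy < kMuonMass.
double momentum_from_energy(double energy) {
  return std::sqrt(energy * energy - kMuonMassSq);
}

// True when the radicand is negative, or when energy is itself NaN
// (any comparison with NaN is false, so the negation catches it).
bool radicand_is_bad(double energy) {
  return !(energy * energy - kMuonMassSq >= 0.0);
}
```

So an energy of 100 MeV gives NaN, while 200 MeV gives a normal value.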
Updated by Liu, Ao about 7 years ago
Also, x, y, z and px, py, pz can be very large numbers, as in run 08155:
Our imagination can go to infinity but our muons can't.
SpillID EventID TrackerID StationID PlaneID x y z Px Py Pz
2769 34 1 5 0 -19016941770.2 5994995909.49 16925268.3285 6440998578.57 16583414399.7 18067872.4572
2769 34 1 5 2 -1.36369711081e+11 48243931437.9 117763556.894 29250301611.2 82786911530.4 87009011.4192
3654 27 0 5 1 -6305129629.36 3129439765.66 15353760.1413 1857056444.61 3823714203.6 10894942.3321
3654 27 0 5 2 -15202528541.6 7464526463.44 36737286.708 4529709958.36 9231297429.47 26264890.8252
4467 38 0 5 1 -225260846.974 113326119.698 566944.222242 66922700.2968 136423038.51 388096.238564
4467 38 0 5 2 -551463361.284 273868136.43 1356137.65813 166286149.867 334613255.834 950253.853919
5214 20 0 5 0 -75123402.3906 51344551.9568 242392.692188 31060792.2995 45738145.659 122697.143968
6633 12 0 5 0 234604.971679 66833900.0064 230845.832242 40399679.4457 319643.888219 32247.1138718
6633 12 0 5 1 35517.3943792 67214723.6627 232245.505513 40709899.4378 200510.824241 32888.9331182
6633 12 0 5 2 -165710.29334 67538933.425 233463.054473 40985601.4258 79698.869571 33507.8715531
11808 6 1 5 2 -176723349.132 -100602975.653 280695.137544 -61090861.1077 107308509.953 4047.79269746
12129 59 0 5 0 -89876150.1496 40486280.5122 219234.685664 22336893.613 53417001.6347 155107.763026
12129 59 0 5 1 -309613602.024 136983195.637 713021.152544 79283303.4822 186021820.65 539061.821731
12129 59 0 5 2 -310578034.421 134893652.488 707025.33294 81820980.103 188411879.465 544740.805683
13239 6 1 5 2 -4493977325.71 2664709805.87 3187418.97784 1617687406.26 2727203731.61 3584462.21842
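Values like these are finite, so a NaN test alone will not catch them. A rough sketch of a sanity check that also rejects out-of-range magnitudes is below. The struct and the limits are hypothetical: the thread does not state what bounds would be appropriate, so they are left as arguments for the caller to choose.

```cpp
#include <cmath>

// Hypothetical container for one row of the table above.
struct TrackPointRow {
  double x, y, z;     // position components
  double px, py, pz;  // momentum components
};

// Returns true only if every component is finite and within the
// caller-supplied limits. max_abs_pos and max_abs_mom are placeholders.
bool is_physical(const TrackPointRow& tp, double max_abs_pos, double max_abs_mom) {
  const double pos[3] = {tp.x, tp.y, tp.z};
  const double mom[3] = {tp.px, tp.py, tp.pz};
  for (double v : pos) {
    if (!std::isfinite(v) || std::fabs(v) > max_abs_pos) return false;
  }
  for (double v : mom) {
    if (!std::isfinite(v) || std::fabs(v) > max_abs_mom) return false;
  }
  return true;
}
```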
Updated by Liu, Ao about 7 years ago
Dear all,
I tried to apply some filters to get rid of the previous NAN numbers, and it turned out to be a good test:
!!!! When pos() and mom() of the scifitrackpoints are normal,
!!!! pos_error() and mom_error() can be NAN sometimes.
Best regards,
Ao
Updated by Rajaram, Durga about 7 years ago
Rogers, Chris wrote:
I checked, the offensive line:
[...]
is still in
src/common_cpp/Recon/Kalman/MAUSSciFiPropagators.cc
line 328.
Paul, Adam -- can we get this fixed in the trunk?
Updated by Kyberd, Paul about 7 years ago
- Workflow changed from New Issue to Under Review
Running MAUS 2.5.0 reconstruction on raw data 08155.000 with the correct CDB geometry under Ubuntu 14.04.
Job completes with no errors.
Running Durga's check_scifi.py reveals no NaNs.
Will ask for 8155 to be reprocessed, and will check the full file.
Updated by Rajaram, Durga about 7 years ago
- File 08155_debug.log added
- with the trunk -- but should get the same thing with 2.5.0
- did
mkdir geo-08155
python bin/utilities/download_geometry.py -geometry_download_by "run_number" -geometry_download_run_number 8155 -geometry_download_directory geo-08155
python bin/analyze_data_offline.py -daq_data_path /mice/data -daq_data_file 08155.000 -simulation_geometry_filename geo-08155/ParentGeometryFile.dat --Number_of_DAQ_Events=16 >& 08155_debug.log
I see NaNs in the output -- log attached, snippet below
.....
tp:mom: x: 4.14646 y: 1.00939 z: 66.3132
tp:mom: x: 4.20808 y: 1.09294 z: 67.0911
tp:mom: x: 4.26369 y: 1.17301 z: 67.8686
tp:mom: x: -4.4967 y: -0.454565 z: 68.6824
tp:mom: x: -4.69619 y: -0.562667 z: 69.426
tp:mom: x: -4.89135 y: -0.671098 z: 70.1546
tp:mom: x: nan y: -nan z: nan
tp:mom: x: nan y: -nan z: nan
tp:mom: x: nan y: -nan z: nan
.... -truncated-
Updated by Rajaram, Durga about 7 years ago
The debug output I just quoted was from inserting a print statement in
src/common_cpp/Recon/Kalman/MAUSTrackWrapper.cc after line 387:
386 new_point->set_pos(pos);
387 new_point->set_mom(mom);
388 std::cerr << "tp:mom: x: " << mom.x() << " y: " << mom.y() << " z: " << mom.z() << std::endl;
Updated by Kyberd, Paul about 7 years ago
The error seems to have its origin in the momentum state at line 328 of MAUSSciFiPropagators, but I am not sure why.
Solution is to look at the values of the tracks in MapCppTrackerRecon.cc and if any of the values of momentum or position are NaN not to add them to the event.
Doing longer check of effectiveness of fix.
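A rough sketch of that kind of output-side filter is below. The `Vec3`, `TrackPoint` and `Track` types are simplified stand-ins invented for the example; they are not the MAUS classes, and this is not the code that went into MapCppTrackerRecon.cc.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical, simplified stand-ins for the MAUS track data types.
struct Vec3 { double x, y, z; };
struct TrackPoint { Vec3 pos, mom; };
struct Track { std::vector<TrackPoint> points; };

// True if any component of the vector is NaN.
bool has_nan(const Vec3& v) {
  return std::isnan(v.x) || std::isnan(v.y) || std::isnan(v.z);
}

// True if any track point on the track has a NaN position or momentum.
bool track_has_nan(const Track& t) {
  return std::any_of(t.points.begin(), t.points.end(),
                     [](const TrackPoint& p) {
                       return has_nan(p.pos) || has_nan(p.mom);
                     });
}

// Returns only the tracks that contain no NaN values; the rest are
// dropped rather than being added to the event.
std::vector<Track> keep_clean_tracks(const std::vector<Track>& tracks) {
  std::vector<Track> kept;
  for (const Track& t : tracks) {
    if (!track_has_nan(t)) kept.push_back(t);
  }
  return kept;
}
```

Note that this only hides the symptom: the NaN has already been produced upstream by the time the check runs.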
Updated by Rajaram, Durga about 7 years ago
This does get rid of NaNs in the output (I only checked trackpoint->mom(); I haven't checked pos() yet).
However -- a few things:
- MapCppTrackerRecon is, I believe, now deprecated. I note that Adam has refactored the tracker reco modules in the latest trunk. So, this fix will have to go into the appropriate module TrackFit(?) [ Is this correct, Adam? ]
- Isn't the more robust fix to just not calculate a square root of a negative quantity in the first place?
- i.e. why not just trap it at the source and throw an exception or handle it somehow before NaNs propagate?
src/common_cpp/Recon/Kalman/MAUSSciFiPropagators.cc :: line 328
Updated by Kyberd, Paul about 7 years ago
- Workflow changed from Under Review to Awaiting Merge
I check both position and momentum, so both should be OK.
If Adam says what replaces MapCppTrackerRecon.cc, I can move the code there.
What to do about the NaN? That depends on what is causing it. Working out the source of the problem
is rather hard, given the lack of code comments. The NaN comes from data which is passed into the routine and
not from some operation during the execution of the routine. Throwing an exception is an option, but that still means
catching the exception and making a suitable decision, which is again slightly difficult since the exception
architecture of MAUS is not obviously documented in the MAUS user guide. If anyone could point me at some
documentation, I will see whether I can find a better solution via an exception.
I don't like simply throwing out these events without correcting the problem or understanding their source
and possibly logging the failures. If someone would point me at the error logging for MAUS, I will add something.
I agree the solution is not completely satisfactory, but any alternative will take rather more work to implement
and, since my teaching is restarting, will take an unpredictably long time to complete.
Updated by Dobbs, Adam about 7 years ago
Some comments:
- MapCppTrackerRecon is indeed now deprecated, the replacements are:
- MapCppTrackerClusterRecon
- MapCppTrackerSpacePointRecon
- MapCppTrackerPatternRecognition
- MapCppTrackerTrackFit - the fix should go in here.
- For the solution:
"Isn't the more robust fix to just not calculate a square root of a negative quantity in the first place?"
I agree.
"The NaN comes from data which is passed into the routine and not from some operation during the execution of the routine."
You mean the routine gets handed negative momentum values? If so, I think it is probably reasonable to fix the code in MAUSSciFiPropagators.cc to deal with them in a more useful way, e.g. an exception leading to the track being rejected when it is caught, in the first instance. Obviously after that we want to know why we are seeing these values in the first place, but it is OK if that takes longer to figure out.
- The exception architecture wasn't written by me and I agree is not well documented, so I will leave others to comment (practical advice: find other bits of code that already use it and copy)
- Error logging: ditto
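To make the "exception leading to the track being rejected" idea concrete, here is a minimal sketch. It uses `std::domain_error` from the standard library as a placeholder, not the MAUS exception class, and `checked_momentum` and `fit_all` are hypothetical names. Each input energy stands in for one track candidate.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

const double kMuonMassSq = 105.658 * 105.658;  // MeV^2, illustrative value

// Guarded form of p = sqrt(E^2 - m^2): throws instead of returning NaN.
double checked_momentum(double energy) {
  const double radicand = energy * energy - kMuonMassSq;
  if (!(radicand >= 0.0)) {  // negated comparison also catches NaN input
    throw std::domain_error("energy below muon mass in momentum calculation");
  }
  return std::sqrt(radicand);
}

// Hypothetical caller: computes the momentum for each candidate and
// rejects (skips and counts) any candidate for which the guard throws.
std::vector<double> fit_all(const std::vector<double>& energies, int* n_rejected) {
  std::vector<double> momenta;
  for (double e : energies) {
    try {
      momenta.push_back(checked_momentum(e));
    } catch (const std::domain_error&) {
      ++(*n_rejected);  // a real fix might also log the failure here
    }
  }
  return momenta;
}
```

Counting the rejections at the catch site gives a natural place to hook in logging once the MAUS error-logging mechanism is identified.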
Updated by Rajaram, Durga almost 7 years ago
Any update on when we'll get the fix for this into MAUS?
Updated by Kyberd, Paul almost 7 years ago
At present I am having trouble duplicating the error - and am having trouble finding much time
with teaching commitments. They decrease this week and I will put more time into the problem.
No estimate
Updated by Kyberd, Paul almost 7 years ago
- File MapCppTrackerRecon.cc added
Modified MapCppTrackerRecon.cc so that if any values in a track are NaN, the track is not added to the event.
Checked using Durga's check_scifi.py, and all seems OK. If Durga has time to run the recon and he and/or Ao
can see that this does fix the problem, I will push it for the next release.
Updated by Rajaram, Durga almost 7 years ago
Paul -- sorry about the delay.
I can't test this though -- MapCppTrackerRecon has been deprecated since 2.5 or so, so the fix has to go into the appropriate
replacement module (see #note-11)
Updated by Dobbs, Adam over 6 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
Should be fixed in 2.9.0.