Bug #1862

NaN in tracker reco

Added by Rogers, Chris over 4 years ago. Updated almost 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Tracker
Target version:
Start date: 09 August 2016
Due date:
% Done: 100%
Estimated time:
Workflow: Awaiting Merge

Description

Spotted by Ao. Looking at run 8155 with MAUS v2.5.1, I get lots of NaNs in the scifitrackpoints, e.g.:

  • spill 4 particle_number 3
  • spill 6 particle_number 23
  • etc

track_point.pos() and track_point.mom() both appear to be NaN in station 1. I didn't check the other stations (or space points, etc.). Possibly related to #1849, which never got closed.


Files

08155_debug.log (116 KB), Rajaram, Durga, 27 August 2016 18:39
MapCppTrackerRecon.cc (19.6 KB), Kyberd, Paul, 10 November 2016 19:01

Related issues

Related to MAUS - Bug #1915: Still inf/nan tracks in tracker recon (Closed; Hunt, Christopher; 23 February 2017)
Related to MAUS - Bug #1849: NaN in tracker recon (Closed; Hunt, Christopher; 18 May 2016)

#1

Updated by Rogers, Chris over 4 years ago

I checked, the offensive line:

          momentum = sqrt(energy*energy - _muon_mass_sq);

is still in src/common_cpp/Recon/Kalman/MAUSSciFiPropagators.cc line 328.
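
For reference, a minimal sketch of why that line can produce NaN: whenever the energy is below the muon mass, the argument of sqrt is negative and std::sqrt returns NaN. The names below are illustrative stand-ins and not the MAUS code itself; the mass value (about 105.658, in MeV/c^2) and the units are assumptions.

```cpp
#include <cmath>

// Hypothetical stand-in for the member _muon_mass_sq: the muon mass
// (about 105.658 MeV/c^2, an assumed unit convention) squared.
const double muon_mass_sq = 105.658 * 105.658;

// Same arithmetic as the quoted line, with no check on the argument.
double unguarded_momentum(double energy) {
  return std::sqrt(energy * energy - muon_mass_sq);
}

// True when the square-root argument is negative, i.e. |energy| < mass,
// which is the case where unguarded_momentum() returns NaN.
bool argument_is_negative(double energy) {
  return energy * energy - muon_mass_sq < 0.0;
}
```

Note that a NaN energy passed in also gives a NaN momentum, although the argument is then not negative in the sense of the comparison above, so a sign check alone would not catch it.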

#2

Updated by Liu, Ao over 4 years ago

Also, x, y, z and px, py, pz can be very large numbers, as in run 08155:

Our imagination can go to infinity but our muons can't.

    SpillID EventID TrackerID StationID PlaneID x y z Px Py Pz
    2769 34 1 5 0 -19016941770.2 5994995909.49 16925268.3285 6440998578.57 16583414399.7 18067872.4572
    2769 34 1 5 2 -1.36369711081e+11 48243931437.9 117763556.894 29250301611.2 82786911530.4 87009011.4192
    3654 27 0 5 1 -6305129629.36 3129439765.66 15353760.1413 1857056444.61 3823714203.6 10894942.3321
    3654 27 0 5 2 -15202528541.6 7464526463.44 36737286.708 4529709958.36 9231297429.47 26264890.8252
    4467 38 0 5 1 -225260846.974 113326119.698 566944.222242 66922700.2968 136423038.51 388096.238564
    4467 38 0 5 2 -551463361.284 273868136.43 1356137.65813 166286149.867 334613255.834 950253.853919
    5214 20 0 5 0 -75123402.3906 51344551.9568 242392.692188 31060792.2995 45738145.659 122697.143968
    6633 12 0 5 0 234604.971679 66833900.0064 230845.832242 40399679.4457 319643.888219 32247.1138718
    6633 12 0 5 1 35517.3943792 67214723.6627 232245.505513 40709899.4378 200510.824241 32888.9331182
    6633 12 0 5 2 -165710.29334 67538933.425 233463.054473 40985601.4258 79698.869571 33507.8715531
    11808 6 1 5 2 -176723349.132 -100602975.653 280695.137544 -61090861.1077 107308509.953 4047.79269746
    12129 59 0 5 0 -89876150.1496 40486280.5122 219234.685664 22336893.613 53417001.6347 155107.763026
    12129 59 0 5 1 -309613602.024 136983195.637 713021.152544 79283303.4822 186021820.65 539061.821731
    12129 59 0 5 2 -310578034.421 134893652.488 707025.33294 81820980.103 188411879.465 544740.805683
    13239 6 1 5 2 -4493977325.71 2664709805.87 3187418.97784 1617687406.26 2727203731.61 3584462.21842
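
Values like these are finite, so a NaN test alone would not flag them. A rough sketch of a plausibility cut that would is given here; the two limits are placeholder numbers chosen only for illustration, not values taken from MAUS, and the function names are hypothetical.

```cpp
#include <cmath>

// Placeholder limits, NOT values from MAUS; tune to the real detector.
const double kMaxAbsPosition = 1.0e5;  // same units as x, y, z in the dump
const double kMaxAbsMomentum = 1.0e3;  // same units as Px, Py, Pz in the dump

// True if the value is finite and lies within +/- limit.
bool within(double value, double limit) {
  return std::isfinite(value) && std::fabs(value) <= limit;
}

// True if all six components of a track point pass the cut.
bool plausible_point(double x, double y, double z,
                     double px, double py, double pz) {
  return within(x, kMaxAbsPosition) && within(y, kMaxAbsPosition) &&
         within(z, kMaxAbsPosition) && within(px, kMaxAbsMomentum) &&
         within(py, kMaxAbsMomentum) && within(pz, kMaxAbsMomentum);
}
```

With these placeholder limits every row in the table above fails the cut, since each one has at least one position or momentum component beyond the limit.
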
#3

Updated by Liu, Ao over 4 years ago

Dear all,

I tried applying some filters to get rid of the previous NaN values, and it turned out to be a good test:

!!!! When pos() and mom() of the scifitrackpoints are normal,
!!!! pos_error() and mom_error() can sometimes be NaN.

Best regards,
Ao
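
The observation above suggests that any check has to cover the error vectors as well as the central values. A small sketch of such a diagnostic follows; the structs are hypothetical stand-ins for a track point, and modelling pos_error() and mom_error() as three-component vectors is an assumption.

```cpp
#include <cmath>
#include <string>

// Hypothetical plain structs standing in for a SciFi track point.
struct Vec3 { double x, y, z; };
struct PointSummary { Vec3 pos, mom, pos_error, mom_error; };

// True if all three components are finite (neither NaN nor inf).
bool all_finite(const Vec3& v) {
  return std::isfinite(v.x) && std::isfinite(v.y) && std::isfinite(v.z);
}

// Returns the name of the first non-finite field, or "" if all are finite.
std::string first_bad_field(const PointSummary& p) {
  if (!all_finite(p.pos)) return "pos";
  if (!all_finite(p.mom)) return "mom";
  if (!all_finite(p.pos_error)) return "pos_error";
  if (!all_finite(p.mom_error)) return "mom_error";
  return "";
}
```

Reporting which field failed, rather than a plain yes/no, makes it easier to see whether the central values or only the errors are affected.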

#4

Updated by Rajaram, Durga over 4 years ago

Rogers, Chris wrote:

I checked, the offensive line:

[...]

is still in src/common_cpp/Recon/Kalman/MAUSSciFiPropagators.cc line 328.

Paul, Adam -- can we get this fixed in the trunk?

#5

Updated by Kyberd, Paul over 4 years ago

  • Workflow changed from New Issue to Under Review

Running MAUS 2.5.0 reconstruction on raw data 08155.000 with the correct CDB geometry under Ubuntu 14.04.
Job completes with no errors.
Running Durga's check_scifi.py reveals no NaNs.
Will ask for 8155 to be reprocessed, and will check the full file.

#6

Updated by Rajaram, Durga over 4 years ago

I am able to reproduce NaNs
  • with the trunk -- but should get the same thing with 2.5.0
  • did
    mkdir geo-08155
    python bin/utilities/download_geometry.py -geometry_download_by "run_number" -geometry_download_run_number 8155 -geometry_download_directory geo-08155
    python bin/analyze_data_offline.py -daq_data_path /mice/data -daq_data_file 08155.000 -simulation_geometry_filename geo-08155/ParentGeometryFile.dat --Number_of_DAQ_Events=16 >& 08155_debug.log
    

I see NaNs in the output -- log attached, snippet below

.....
tp:mom: x: 4.14646 y: 1.00939 z: 66.3132
tp:mom: x: 4.20808 y: 1.09294 z: 67.0911
tp:mom: x: 4.26369 y: 1.17301 z: 67.8686
tp:mom: x: -4.4967 y: -0.454565 z: 68.6824
tp:mom: x: -4.69619 y: -0.562667 z: 69.426
tp:mom: x: -4.89135 y: -0.671098 z: 70.1546
tp:mom: x: nan y: -nan z: nan
tp:mom: x: nan y: -nan z: nan
tp:mom: x: nan y: -nan z: nan
....
-truncated-
#7

Updated by Rajaram, Durga over 4 years ago

The debug output I just quoted was from inserting a print statement in

src/common_cpp/Recon/Kalman/MAUSTrackWrapper.cc
after line:387

386       new_point->set_pos(pos);
387       new_point->set_mom(mom);
388       std::cerr << "tp:mom: x: " << mom.x() << " y: " << mom.y() << " z: " << mom.z() << std::endl;
#8

Updated by Kyberd, Paul over 4 years ago

The error seems to originate in the momentum state at line 328 of MAUSSciFiPropagators.cc, but I am not sure why.
The solution is to check the track values in MapCppTrackerRecon.cc and, if any momentum or position value is NaN, not add the track to the event.
Doing a longer check of the effectiveness of the fix.
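
A minimal sketch of the kind of filter described above, for illustration only. The attached MapCppTrackerRecon.cc is not reproduced here; the Track and TrackPoint structs and the function names are hypothetical stand-ins for the MAUS data structures.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <vector>

// Hypothetical stand-ins for the MAUS track and track point classes.
struct TrackPoint { double pos[3]; double mom[3]; };
struct Track { std::vector<TrackPoint> points; };

// True if every position and momentum component is finite.
bool point_is_finite(const TrackPoint& tp) {
  for (int i = 0; i < 3; ++i) {
    if (!std::isfinite(tp.pos[i]) || !std::isfinite(tp.mom[i])) return false;
  }
  return true;
}

// True if all points of the track pass the check.
bool track_is_finite(const Track& t) {
  return std::all_of(t.points.begin(), t.points.end(), point_is_finite);
}

// Copies only the tracks whose points are all finite.
std::vector<Track> keep_finite_tracks(const std::vector<Track>& tracks) {
  std::vector<Track> kept;
  std::copy_if(tracks.begin(), tracks.end(), std::back_inserter(kept),
               track_is_finite);
  return kept;
}
```

Using std::isfinite rather than std::isnan means that inf values are dropped as well, which goes slightly beyond the NaN-only check described in the comment.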

#9

Updated by Rajaram, Durga over 4 years ago

This does get rid of NaNs in the output [I only checked trackpoint->mom(); I haven't checked pos() yet].

However -- a few things:

  • MapCppTrackerRecon is, I believe, now deprecated. I note that Adam has refactored the tracker reco modules in the latest trunk. So, this fix will have to go into the appropriate module TrackFit(?) [ Is this correct, Adam? ]
  • Isn't the more robust fix to just not calculate a square root of a negative quantity in the first place?
    • i.e. why not just trap it at the source and throw an exception or handle it somehow before NaNs propagate?
      src/common_cpp/Recon/Kalman/MAUSSciFiPropagators.cc :: line 328 
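
One way to trap the problem at the source, sketched under assumptions: the function name and signature are hypothetical and are not the actual propagator code. The caller is told explicitly when no real momentum exists instead of receiving a NaN.

```cpp
#include <cassert>
#include <cmath>

// Returns true and fills *momentum when the result is a real, finite number;
// returns false and leaves *momentum untouched otherwise.
bool momentum_from_energy(double energy, double mass_sq, double* momentum) {
  assert(momentum != nullptr);
  const double arg = energy * energy - mass_sq;
  // Rejects negative arguments and also NaN or inf coming in via energy.
  if (!std::isfinite(arg) || arg < 0.0) return false;
  *momentum = std::sqrt(arg);
  return true;
}
```

A return flag keeps the decision about what to do with a bad state (skip the point, reject the track, log it) with the caller; throwing an exception is the other obvious option.
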
#10

Updated by Kyberd, Paul over 4 years ago

  • Workflow changed from Under Review to Awaiting Merge

I check both position and momentum so they should both be OK.
If Adam says what replaces MapCppTrackerRecon.cc, I can move the code there.
What to do about the NaN? That depends on what is causing it. Working out the source of the problem
is rather hard, given the lack of code comments. The NaN comes from data which is passed into the routine and
not from some operation during the execution of the routine. Throwing an exception is an option, but that still means
catching the exception and making a suitable decision, which is again slightly difficult to do since the exception
architecture of MAUS is not obviously documented in the MAUS user guide. If anyone could point me at some
documentation, I will see if I can find a better solution via an exception.

I don't like simply throwing out these events without correcting the problem or understanding their source
and possibly logging the failures. If someone would point me at the error logging for MAUS, I will add something.

I agree the solution is not completely satisfactory, but any alternative will take rather more work to implement
and, since my teaching is restarting, will take an unpredictably long time to complete.

#11

Updated by Dobbs, Adam over 4 years ago

Some comments:

  • MapCppTrackerRecon is indeed now deprecated, the replacements are:
    • MapCppTrackerClusterRecon
    • MapCppTrackerSpacePointRecon
    • MapCppTrackerPatternRecognition
    • MapCppTrackerTrackFit - the fix should go in here.
  • For the solution:

"Isn't the more robust fix to just not calculate a square root of a negative quantity in the first place?"

I agree.

"The Nan comes from data which is passed into the routine and not from some operation during the execution of the routine."

You mean the routine gets handed negative momentum values? If so I think it probably is reasonable to fix the code in MAUSSciFiPropagators.cc to deal with them in a more useful way e.g. an exception leading to the track being rejected when it is caught, in the first instance. Obviously after that we want to know why we are seeing these values in the first place, but it is ok if that takes longer to figure out.
  • The exception architecture wasn't written by me and I agree is not well documented, so I will leave others to comment (practical advice: find other bits of code that already use it and copy)
  • Error logging: ditto
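
The "exception leading to the track being rejected" idea could look roughly like the sketch below. It uses std::domain_error from the standard library as a stand-in, since the MAUS exception and logging classes are not shown in this thread; all function names are hypothetical, and one energy per candidate track is a simplification.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Throws instead of returning NaN. std::domain_error is a stand-in for
// whatever exception type MAUS would use.
double checked_momentum(double energy, double mass_sq) {
  const double arg = energy * energy - mass_sq;
  if (!std::isfinite(arg) || arg < 0.0) {
    throw std::domain_error("energy below rest mass or not finite");
  }
  return std::sqrt(arg);
}

// Hypothetical caller: keeps the momenta that could be computed and counts
// the rejected candidates so the number can be logged, not silently dropped.
std::vector<double> fit_all(const std::vector<double>& energies,
                            double mass_sq, std::size_t* n_rejected) {
  assert(n_rejected != nullptr);
  std::vector<double> momenta;
  *n_rejected = 0;
  for (std::size_t i = 0; i < energies.size(); ++i) {
    try {
      momenta.push_back(checked_momentum(energies[i], mass_sq));
    } catch (const std::domain_error&) {
      ++(*n_rejected);  // reject this candidate and carry on with the rest
    }
  }
  return momenta;
}
```

Counting the rejections gives a hook for the error logging asked about earlier in the thread, without deciding here which logging facility to use.
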
#12

Updated by Rajaram, Durga over 4 years ago

Any update on when we'll get the fix for this into MAUS?

#13

Updated by Kyberd, Paul over 4 years ago

At present I am having trouble duplicating the error - and am having trouble finding much time
with teaching commitments. They decrease this week and I will put more time into the problem.
No estimate

#14

Updated by Kyberd, Paul over 4 years ago

Modified MapCppTrackerRecon.cc so that if any values in a track are NaN, the track is not added to the event.
Checked using Durga's check_scifi.py, and all seems OK. If Durga has time to run the recon and he and/or Ao
can see that this does fix the problem, I will push it for the next release.

#15

Updated by Rajaram, Durga over 4 years ago

Paul -- sorry about the delay.
I can't test this though -- MapCppTrackerRecon has been deprecated since 2.5 or so, so the fix has to go into the appropriate
replacement module (see #note-11)

#16

Updated by Dobbs, Adam almost 4 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

Should be fixed in 2.9.0.
