Feature #1993

Runs with poor Tracker track-finding efficiency

Added by Nebrensky, Henry over 3 years ago. Updated over 2 years ago.

Status: Closed
Priority: Normal
Start date: 25 November 2019
Due date: 30 December 2020
% Done: 100%
Estimated time:

Description

Paolo's system performance talk at CM55 found a tiny number of runs with lower-than-usual track-finding efficiency (96% instead of 98%).

> Do you have the run numbers for those outliers in the Tracker plot? I can add trawling the eLog to my list of things to do in quiet moments.

a couple of outliers in field, on downstream tracks:

8549
8695

a few others in straight tracks:

10162
10161
10178

In general, between 17/11/2017 and 20/11/2017 there are relatively low efficiencies in straight tracks. Also, averaging the helical ones, there is a general negative trend.

cheers,

Paolo

I volunteered to poke around in eLog, etc. and see if there's a possible hardware reason.


Files

MICEtracker-historical_analysis_plot.pdf (67.6 KB) - Tracker historical analysis plot - Nebrensky, Henry, 26 November 2019 09:48
10161.pdf (25.8 KB) - Events per spill - Franchini, Paolo, 12 December 2019 15:15
10162.pdf (36.5 KB) - Events per spill - Franchini, Paolo, 12 December 2019 15:15
#3

Updated by Nebrensky, Henry over 3 years ago

  • Description updated (diff)
  • Due date deleted (31 December 2019)
#4

Updated by Nebrensky, Henry over 3 years ago

Run Number: 8549

Run Type: Special Run
Start Date: 2016-11-21 18:28:27.193332
End Date: 2016-11-21 20:15:43.256838
Start Notes: More on H1c
End Notes: Piecing out
ToF1 Triggers 126623
ToF2 Triggers 22007

Shift summary: 21/11/16 - 16:00-00:00 (210)

18:23: Taking more data on the standard 10-140+M3-Test1, task H1c (run 8548)
--> ISIS went down directly, bad run

18:28: Resume task H1c (run 8549)
--> Run completed, 22007 TOF2 triggers

Looks like it was straightforward at the time, but earlier in the day there'd been concern about the tracker - Shift summary 209 ~10 a.m. had:

AFE Heater TKD Cryostat 1 L-cassete R-board heater 1 keeps dropping to 10 counts setting off warning - silence current

and eLog entry 4215 (VLSB unpacking, 11:49) and 4216 (Tracker channels); and then later more AFE alarms:

Paper logbook 1 p.118:

21 Nov 20:30 - Cryo 3 L-cassette heater 1 dropped below 5 counts
21 Nov 23:25 - Cryo 2.3[sic] L-cassette temperature alarm
22 Nov. morning: one of the AFE heaters is going over 9.07 K

but cryos 1 & 2 are upstream, not downstream. Ed was MOM, so I'd expect the Tracker to have been behaving.

This one's quite the mystery...

#5

Updated by Nebrensky, Henry over 3 years ago

Run Number: 8695

Run Type: Special Run
Start Date: 2016-12-03 08:40:16.774454
End Date: 2016-12-03 08:43:48.198914
Start Notes: 3-140 H4a
End Notes: 3-140 H4a
ToF1 Triggers 206
ToF2 Triggers 121

Shift summary: 3 Dec 2016 - 8 am - 4 pm (232)

- Target and beam loss information is not being logged automatically. For now, target depth = 35.45 mm, target user delay = 10.80 ms
- SSD vacuum: after a very stable morning, the pressure has begun (as of ~14:00) to oscillate slightly around 1.6e-7 torr. Peak pressure is still below the 5e-7 torr threshold for calling MOM.

...

08:39 - Begin run 8695. Consulting with MOM re: beam loss. Stopped run 8695 almost immediately.

Nothing in main eLog. Paper logbook 1 p.141:

6:39-6:45 Alarm on Cryostat 2 R-cassette R-board heater 1
No beam-loss numbers coming through into EPICS/striptool, [hence the shifters consulting MOM].

Cryo 2 would be upstream, not downstream. Melissa was MOM, so the Tracker should have been behaving.
Is this just low stats? We're talking about O(3 vs 2) ropey tracks.

#6

Updated by Nebrensky, Henry over 3 years ago

Run Number: 10160

Run Type: Special Run
Start Date: 2017-11-18 20:55:28.836129
End Date: 2017-11-18 21:00:40.786841
Start Notes: start s15d
End Notes: DAQ errors and weird onmon output
ToF1 Triggers 45371
ToF2 Triggers 2552

Run Number: 10161

Run Type: Special Run
Start Date: 2017-11-18 21:02:29.893087
End Date: 2017-11-18 21:28:46.969404
Start Notes: start s15d
End Notes: end run - possible bad DAQ
ToF1 Triggers 260279
ToF2 Triggers 14581

Run Number: 10162

Run Type: Special Run
Start Date: 2017-11-18 21:32:41.137724
End Date: 2017-11-18 22:19:12.764422
Start Notes: start s15d
End Notes: stop s15d
ToF1 Triggers 414720
ToF2 Triggers 23474

Shift summary 451 :

19:49 Start run 10159, task S15d
19:51 Very high trigger rate at this setting so increase BCD to 37.25 mm.
20:50 Stop run 10159, 360k triggers
20:55 Start run 10160, task S15d
21:00 Stop run 10160 because of strange OnMon display: VLSB ADC, half empty.
      21k triggers
      Alarm from error in file compactor for run 10159
21:02 Start run 10161, task S15d
21:15 OnMon and OnRec displays both look weird. Have rung Durga and he says to carry
      on taking data while he investigates.
21:28 Durga rings to say he has restarted OnRec. Current run may have DAQ errors.
      At same time OnMon display comes back as normal.
      Stop run 10161, 119k triggers.
21:32 Start run 10162, task S15d
21:40 Still getting OnRec errors, display empty, and OnMon plots look strange again. 
22:19 ISIS off, stop run 10162, 188k triggers
      Durga is here and is restarting DATE. He has done a pulser run and everything
      now looks OK.
22:35 Start run 10164, task S15b

and also main eLog 21:11 5291

For run 10160, onmon was showing histograms below. Note that half the tracker readout appears to be missing. We stopped and restarted the run and it might have cured itself... note that file compactor failed on this run also. Now waiting for a reasonable statistic to come through for 10161...

and 23:28 5292

Durga came in and resolved the issue. Looks like somehow DATE had lost contact with miceraid5; so that onrec and onmon could no longer see the output. Following DATE restart, everything looks okay again. The offline reconstruction seems okay and data seems to be unaffected.

Paper logbook 2 p.164:

20:20 TKU Cryo 1 upper alarm
21:00 start 10160: DAQ errors and file compactor error. Start 10161
21:20 OnMon/OnRec plots empty. Called Durga, who advises restarting run
22:15 Durga restarts DATE.

#7

Updated by Nebrensky, Henry over 3 years ago

Run Number: 10177

Run Type: Special Run
Start Date: 2017-11-19 21:14:52.586175
End Date: 2017-11-19 21:30:12.502278
Start Notes: start s15d after restart DAQ
End Notes: end s15d - suspect daq issues
ToF1 Triggers 146512
ToF2 Triggers 8313

Run Number: 10178

Run Type: Special Run
Start Date: 2017-11-19 21:42:20.809091
End Date: 2017-11-19 22:12:20.704985
Start Notes: start 15d - still worried about daq
End Notes: stop 15d - isis down
ToF1 Triggers 275527
ToF2 Triggers 15524

Run Number: 10179

Run Type: Pulser Data
Start Date: 2017-11-19 22:16:29.084323
End Date: 2017-11-19 22:18:27.501915
Start Notes: test pulser
End Notes: test pulser ok
ToF1 Triggers 2823
ToF2 Triggers 204

Shift summaries 453

Occasional Alarms
...
1529 TKD Major alarm, excessive He!
...
Regular Alarms
Night shift advise temperature on Tracker Cryo 1 Upper is creeping up, currently Yellow and silenced;
later it wanders slowly back down and wanders back up again - hand over with yellow alarm silenced

and 454

Run: 10175 - CDB checked yes
 29818 triggers on s15d - 30k acc./75k req.; 86 triggers/spill (36.95 mm BCD -> 3 V loss)
As we are getting fewer than half of the requested triggers, we have restarted the run with the target one notch out.

Run: 10176 - CDB checked yes
 47394 triggers on s15d - 47k acc./105k req.; 95 triggers/spill (37.25 mm BCD -> 2.3 V loss)
so we are getting more triggers per spill, with less loss.
Unfortunately we somehow caused DATE to stop sending the data to OnMon and OnRec, and had to restart DATE.

Run: 10177 - CDB checked yes
 65751 triggers on s15d -  66k acc./147k req.; 91 triggers/spill (37.25 mm BCD -> 2.3 V loss)
The DAQ is not actually reading out every spill - it is recording 148 triggers/spill for its accepted spills.

We have called Durga to take a look.
Note the high rate of reported trigger requests (Run10178.png).

Run: 10178 - CDB checked yes - Test run
Run: 10179 - CDB checked yes - Test pulser run
Run: 10180 - CDB checked yes - Test run, changing dip depth

It looks like the GDC can get overwhelmed by the number of accepted triggers it has to deal with - the threshold seems to be in the 185-190 / spill region.
After this it can't finish everything (Evt Building, writing to disk, writing to EPICS etc.) before the next spill.
Put another way, if the accepted/requested trigger ratio (= live time) is worse than 50% then it will conk out.

The data that actually gets recorded is expected to be OK.

Run: 10180 - CDB checked 
498154 triggers on s15d -  498k acc./905k req.; 119 triggers/spill (37.85 mm BCD -> 1.5 V loss)
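
As a quick numerical check of the live-time rule of thumb above, here is a minimal sketch that recomputes the accepted/requested ratio from the trigger counts quoted in summary 454 (the pairs below are just the "acc./req." figures listed above, and the 0.5 cut-off is only the suggested rule of thumb, not a measured limit):

# Rough live-time check using the "acc./req." trigger counts from summary 454.
# The 0.5 threshold is only the rule of thumb suggested above, not a measured limit.
runs = {
    10175: (30_000, 75_000),
    10176: (47_000, 105_000),
    10177: (66_000, 147_000),
    10180: (498_000, 905_000),
}

for run, (accepted, requested) in sorted(runs.items()):
    live_time = accepted / requested
    note = "below 50% - expect the GDC to struggle" if live_time < 0.5 else "above 50%"
    print(f"Run {run}: live time = {live_time:.2f} ({note})")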

main eLog 5293 21:16

We saw the same issue again tonight. Durga reminded me how to restart DATE via the phone.

Separately there's a specific concern about heater counts, 23:38 5295 and 23:40 5296, but this regards TKU, not TKD.

Paper logbook 2 p.166:

15:29 TKD major alarm - too much Helium! Looks like garbage readback
19:30 TKU comms error on Cryo 2 temperatures
21:10 DAQ stopped communicating with OnRec and OnMon. Restarted DAQ.
23:00 Durga left after fixing DAQ. We were overwhelming the DAQ with too much data.

Note to self: still need to check when the (DAQ) issue starts... is the 17th still with helicals? Is it a fault/change in the DAQ, or purely the rate, that's causing this?
Why does losing spills affect the recon efficiency, and only in TKD? (Is it really getting all the data, or does the start of the next spill cause truncation? What empties the LDC buffers?) Also, I suppose, I should check whether the poor recon is there across all spills in the run, or whether it's just a 100% record averaged out with a few spills of complete garbage (0%) at the end as the DAQ falls over (see the toy example below).
Cross-reference with PPS logbook - controlled entries for the Helium flush?
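
For scale on that last point, a toy averaging example (the spill mix below is made up purely for illustration, not taken from any real run): a run that reconstructs perfectly for almost every spill but has a handful of completely dead spills would still average out to the few-percent dip seen in the outliers.

# Toy example only: how a few dead spills dilute an otherwise-perfect run.
# The spill mix is invented for illustration; no real run numbers are used.
n_good_spills, n_dead_spills = 960, 40   # hypothetical: 96% good spills, 4% garbage
eff_good, eff_dead = 1.00, 0.00          # perfect reconstruction vs. nothing reconstructed

run_average = (n_good_spills * eff_good + n_dead_spills * eff_dead) / (n_good_spills + n_dead_spills)
print(f"Run-averaged track-finding efficiency: {run_average:.0%}")   # prints 96%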

DAQ crate videos from 19th Nov. 2017 (in the MICE photo album):
http://mice.iit.edu/mico/jpg/hn/20171101/MICEHall1001.AVI
http://mice.iit.edu/mico/jpg/hn/20171101/MICEHall1002.AVI

#8

Updated by Nebrensky, Henry over 3 years ago

> Also, averaging the helical ones, there is a general negative trend.

I think that is visible by eye across MICEtracker-historical_analysis_plot.pdf, and - interestingly - is sort of continuous across the Tracker warm-up in mid-2017.
Is the degradation there in the straight tracks? It looks like it might need number-crunching to pull it out, because the points lie so closely on top of each other.

#9

Updated by Franchini, Paolo over 3 years ago

Just to clarify: I run on reconstructed events, and basically the code loops over the spills, selecting events that pass a set of cuts (including a simple muon PID) and counting the events that have at least one straight or one helical track reconstructed.
Plotting the events per spill for a couple of runs with very high rate, we can see that there is no drop at the end of the run and the spills with zero events are uniformly distributed across the run.
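
Not the actual analysis code, but a minimal sketch of that kind of per-spill loop, assuming a generic container of spills in which each event carries a (hypothetical) muon-PID flag and counts of reconstructed straight and helical tracks; the field names are illustrative only:

# Illustrative only: loop over spills, apply a simple muon-PID cut, and count
# events with at least one reconstructed straight or helical track per spill.
def events_per_spill(spills):
    counts = []
    for spill in spills:
        n_tracked = 0
        for event in spill:
            if not event.get("passes_muon_pid", False):   # simple muon PID cut (hypothetical flag)
                continue
            if event.get("n_straight_tracks", 0) > 0 or event.get("n_helical_tracks", 0) > 0:
                n_tracked += 1
        counts.append(n_tracked)
    return counts

Plotting the returned counts against spill number is then enough to see whether the zero-event spills cluster at the end of a run or are spread uniformly through it, as described above.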

Of course, different beams and cooling-channel settings have been considered, so all the considerations might be biased by the "difficulty" of reconstructing a specific beam in a specific field with a specific absorber (e.g. different scattering, different performance of the reconstruction across the different beamlines, etcetera). This is convolved with any degradation of the Tracker performance.

A more detailed study of "DAQ vs reconstruction" should consider the reconstruction process rather than the reconstructed quantities.

#10

Updated by Nebrensky, Henry over 2 years ago

  • % Done changed from 0 to 100
  • Status changed from Open to Closed
  • Due date set to 30 December 2020

I think this has come to a natural halt.
