Issues arising from Cycle 2017/01

Plucked from the eLog.
Issues already fixed are in italics.


During the run (28 May - 1 June)

  • Massive number of bogus alarms (e.g. on a PV that is inValid):
    • "Tracker 1 Cryostat - vacuum gauge - alarm as reading is inValid"
    • "I have disabled the "QPS Signals", "QPS Interlocks" and "Interlocks" trees on the SSU Alarm Handler, to match the SSD Alarm Handler State."
    • "I've also disabled the alarm on the Neutron Monitor - which is no longer connected so gives a permanent error."
    • "AC5 status disabled in ENV alarm handler"
    • "Henry has disabled the following in the alarm handler in consultation with MOM-Ed:
      • TKU: GHe, Cryo1 vacuum
      • TKD: GHe, Cryo1/2 (really 3/4) vacuum, Cryo2-LCRB-temp2 (actually Cryo4LC-temp2), Cryo2-LCRB-AFEheater2 (also mislabelled)"
    • "I've disabled the Q4 and Q7 MPS water flows in the Alarm Handler because of the false alarms. These are instantaneous readings of zero instead of 9 l/min, which is not physical" see below
    • FCD Load Cells - see below
    • What is the plan for SSU CC1 and 2? see below
  • "On Run Status, the Integrated TOF1 (requested) Triggers and Requested Triggers numbers differ by 2: 76810 vs 76812."
    • On various runs we've seen differences of up to 10 triggers. Why?
  • "SSU Alarm Handler DISABLED in all channels"
  • "U/S Tracker - LED system isn't working; possibly a switch got flipped."
    • Need better support for data fibres
    • Shifters not aware that the Tracker was missing from the readout. Problem was visible in the OnMon display, but there are no reference plots available.
  • "Run Control is hung up; restarted in separate terminal"
  • "Ed has checked the Tracker Helium and thinks it may have run out...
    Ed & Ajit have fixed the Tracker flush readback - this ended up with a Tracker IOC restart (TrackerFlush-20170529-1445.png)."
    • Tracker volume flush read-back broken
  • "We had an OnMon error about 50 spills in, showing a "SERIOUS" exception, so restarted the run."
    • "On 9356 we got an offset of 1 between TOF1 triggers and Requested Triggers already somewhen in the first 26 spills. There was NOT a trigger mismatch error from OnMon."
      • Need guidance and how-to on what to look for in OnMon and DATE, including logs/outputs
  • "Will also check ... reported water leak in trench. "
    "About 13:00 Ajit restarted the Environment IOC - this should have got the air temperature and other sensors going."
    • SecurityProbe system not being read out by the IOC - needs some sort of heartbeat alarm
  • "Noticed power supply for EMR Single Anode PMT # 26 was tripped off and probably had been for a while."
    "Noticed also that in the Detectors ALH the entire EMR tree is Disabled, although Ed did put the Detectors into Running yesterday."
    • EMR SAPMT down on or before 10 May
    • Not clear when the ALH EMR tree was turned off. Who is responsible for checking the Alarm Handlers over before running: experts or C&M?
    • MOM checklist should include checking that the State Machines are in the appropriate states. (Which states are those?)
  • "EPICS reports that A/C unit 2 - which I believe is actually the reading from unit 3 - is reporting 26.3°C suggesting that the East end of the Hall is getting warm"
    • We need to understand the Hall temperatures: numbers presently range from 22 to 27 °C
  • "Roof water temperature caused a red alarm, upper limit. - Returned to normal before a cause to be found. Assumed to be a measurement fluctuation."
    • We see glitches in the roof water PV
  • "I've disabled the Q4 and Q7 MPS water flows in the Alarm Handler because of the false alarms ... I am suspicious of the fact that we only get these false alarms on those two supplies"
  • "We have selected "LH2 Empty" in Run Control: it STILL wrote "LH2" into the CDB! CDB viewer's reporting "LH2" as the material and "Empty" as the shape which I think is the wrong way round - not sure if this is the viewer or Run Control."
  • "Controlled Entry for Josef, Mark and Colin to inspect the LH2 and FC for the quench test. They have re-connected the FC load cells so the FC alarm handler is now clear as well."
    • How should the load cells have been disconnected?
    • If equipment is changed such that it generates a continual bogus alarm, system experts should be held responsible for removing it from the Alarm Handler
  • "I just realised that one reason the Alarm Handler is such a pain to use is that the software is incapable of scrolling and making a beep noise at the same time!"
  • "We have pulled out the target one notch to 35.75 mm BCD, as the losses are now much spikier since the work ISIS did this morning"
    • Raised with ISIS...
  • "There was a major alarm from DATE at about 1360 spills/ 83k triggers, but by the time we found it we couldn't see any cause or error message. No sign of a trip in the beamloss strip chart. The transfer of the ISIS beamloss numbers from ISIS into EPICS is still very unpredictable - sometimes we get the values updated on 2 successive spills, sometimes the values are stay frozen for 9 or 10 spills."
  • "the limits on SSU CC3 High-Side Pressure have been adjusted to get rid of the frequent alarms we were getting a couple of days ago. (now just to get CC1 and 2 sorted out)."
    • What is the plan for SSU CC1 and 2?
  • "Decay solenoid supply pressure PI21 - fluctuated back to normal"
    • DS needs gas cleaning and TLC
  • "Focus coil liquid He level is back to 93% readout ( had been at 0 for about an hour)"
    • The infamous FC level gauge issue...
  • "Noticed there was a step-change in the FC insulating vacuum about an hour ago"
    • FC insulating vacuum burps saga
  • Repeating alarms need their thresholds adjusting; putting them on a strip-chart makes it easier to see if there's a real problem
    • "SSU CC3 High-Side Pressure"
    • DS PSU voltage?
    • "Q4 and Q7 are showing a consistent warning on the MPS water flow" #1880
    • "alarm from Q2 deltaT" #1880
    • "Q8 reports temperature change warning alarms." #1880
    • "Q8 has a warning on the voltage" #1880
    • "Supply air pressure reach 6.8 Bar before returning to normal."
    • "Alarms in beam line: quad delta T touches limit" : #1880
  • "discovered during this that the overtemperature alarms had been LEFT SILENCED at some point, without this being passed on to the next shift."
    • Shifters are silencing alarms rather than acknowledging them, and/or not expanding the tree to identify the individual alarm
      • e.g. if you silence the Proton Absorber Status because it alarms when changing Beamline settings, how will you know if it droops open during the run (cf. December)?
    • Status of Shift Operation Checklist?
  • "Heartbeat alarms for miceecserv1 and miceraid 5. The help pop-ups flash up an illegible error message."
    • In Alarm Handlers, we need working pop-ups ...
    • Changed during run?
  • "Check Nagios:
    miceraid5: DISK CRITICAL - free space: /data 150728 MB (9% inode=99%):"
  • "After acknowledging the miceraid5 alarm, the PV remains shown as red, but the alarm keeps returning. This seems to be because the Heartbeat re-sets the status to zero before making the next test, which fails again. Probably, this is Wrong as it operates differently to the majority of other alarms."
    • Is the Heartbeat test operation correct?
  • "temperature warnings from the CS1 VME crate."
    • Need to clarify the crate temperatures: are they really warmer than expected? If not, tweak the limits
  • "MOM/DC Configuration Checklist" omitted
    • MOM checklist?
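
The heartbeat behaviour questioned above for miceraid5 (the status is reset to zero before each test, so an acknowledged alarm keeps coming back) can be contrasted with a latching scheme, in which a failing test keeps the alarm raised and an acknowledgement survives until the test next passes. The following is only a minimal Python sketch of that idea. The class and method names are hypothetical; this is not the EPICS Alarm Handler API, nor a description of how the MICE heartbeat is actually implemented.

```python
class LatchedHeartbeat:
    """Hypothetical latching heartbeat alarm (illustration only)."""

    def __init__(self):
        self.in_alarm = False       # True while the heartbeat test is failing
        self.acknowledged = False   # True once an operator has acknowledged

    def update(self, test_passed):
        """Record the result of one heartbeat test."""
        if test_passed:
            # Only a passing test clears the alarm and the acknowledgement.
            self.in_alarm = False
            self.acknowledged = False
        elif not self.in_alarm:
            # First failure after a healthy period: raise a fresh alarm.
            self.in_alarm = True
            self.acknowledged = False
        # Otherwise the test is still failing: keep the existing state,
        # including any acknowledgement, rather than resetting to zero.

    def acknowledge(self):
        """Operator acknowledges the current alarm, if there is one."""
        if self.in_alarm:
            self.acknowledged = True

    def needs_attention(self):
        """True if there is an alarm that nobody has acknowledged yet."""
        return self.in_alarm and not self.acknowledged
```

In this sketch an acknowledged failure stays visible as an alarm but does not re-alarm on every test; a new alarm is raised only after the heartbeat has recovered and then failed again.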


  • Beamstop Restraint possibly has a stripped bolt
  • Tweak MPS temperature-difference limits #1880
  • Target laser saga, and control PC / R78
  • Question from ISIS about the MICE 3 ms spill gate - why not 1 ms?
    • Possible trade-off between inefficiency from dead-time and activation of the ring for no data
    • How are the good muons distributed within the MICE spill?
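
The Q4 and Q7 MPS water-flow false alarms listed earlier were described as instantaneous readings of zero instead of 9 l/min. One possible way to suppress single-sample glitches of that kind, without disabling the alarm altogether, is to require several consecutive out-of-limit samples before alarming. The Python sketch below illustrates this. The function name, the 5 l/min limit and the three-sample requirement used in the note after it are illustrative assumptions, not the actual MICE settings or alarm-handler configuration.

```python
def debounced_low_alarm(samples, low_limit, min_consecutive=3):
    """Return one alarm flag per sample.

    A flag is True only once at least `min_consecutive` samples in a row
    have been below `low_limit`, so an isolated glitch is ignored.
    """
    flags = []
    run_length = 0  # number of consecutive below-limit samples so far
    for value in samples:
        if value < low_limit:
            run_length += 1
        else:
            run_length = 0
        flags.append(run_length >= min_consecutive)
    return flags
```

With a limit of 5 l/min, the readings 9, 9, 0, 9, 9 never raise an alarm, whereas 9, 0, 0, 0, 0 raises one from the third consecutive zero onwards. The cost is a delay of a few samples before a genuine loss of flow is reported.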

