Additional issues arising from Cycle 2017/01
Plucked from the eLog.
Issues already fixed are in italics.
Actions/assignments are in bold.
- A/C Unit 3 alarm: what do we do when John's away? P Masterton, ISIS (need to chase units 3 and 4 - Colin?)
- "On Run Status, the Integrated TOF1 (requested) Triggers and Requested Triggers numbers differ by 2: 76810 vs 76812."
- On various runs we've seen differences of up to 10 triggers. Why? Need to check that the number of reported integrated triggers matches the number of triggers in the file: Action DR
- "U/S Tracker - LED system isn't working; possibly a switch got flipped." Check during run-up: MUchida and PKyberd
- Tracker volume flush read-back broken: add to checklist for run-up. Action SB. Start helium flush next week; check read-back in test runs: MUchida
- MOM checklist should include checking that the State Machines are in the appropriate states. (which states are those?)
AKurup will write a procedure on how to restart the SMs and the ALH and then mask out known false alarms. Action AKurup to produce a list of the states appropriate for running, for MOM to check; MOM to work through it at the start of running. Action SBoyd to modify the MOM checklist.
- "EPICS reports that A/C unit 2 - which I believe is actually the reading from unit 3 - is reporting 26.3°C suggesting that the East end of the Hall is getting warm"
- We need to understand the Hall temperatures: readings presently range from 22 to 27 °C. A range of 5 °C is not huge. Check with D.C. which are the sensitive items; sensitive equipment should have local monitoring anyway.
- "We have selected "LH2 Empty" in Run Control: it STILL wrote "LH2" into the CDB! CDB viewer's reporting "LH2" as the material and "Empty" as the shape which I think is the wrong way round - not sure if this is the viewer or Run Control."
Need to check the CDB viewer displays correctly (DR/JM): it's correct. Need to update the CDB for Neon (which is what we actually had): action stands, DR
- "discovered during this that the overtemperature alarms had been LEFT SILENCED at some point, without this being passed on to the next shift."
- Shifters are silencing alarms rather than acknowledging them, and/or not expanding the tree to identify the individual alarm: MOM/SteveBoyd
- e.g. If you silence the Proton Absorber Status because it alarms when changing Beamline settings, how will you know if it droops open during the run (c.f. December)?
- Status of Shift Operation Checklist? (SBoyd): improve via the checklist. Action SB to add an alarm check to the shift changeover checklist
- "MOM/DC Configuration Checklist" omitted: add to MOM/DC handover. Action SB
Controls and Monitoring
- We noticed today that the TKU, TKD, SSU, SSD and FC Alarm Handlers were all running but disabled ('D').
This should now be fixed: State Machines and SM IOCs restarted; the ALH groups have masking "Force PVs", which have been reset. Action AK.
- Calibration of compressed air pressure sensors: done.
- Readouts from A/C units 2 and 3 swapped.
- Massive number of bogus alarms (e.g. on a PV that is inValid):
- "Tracker 1 Cryostat - vacuum gauge - alarm as reading is inValid" - Tracker alarms need to be re-enabled; MUchida to notify DC when ready, probably next week.
- "I have disabled the "QPS Signals", "QPS Interlocks" and "Interlocks" trees on the SSU Alarm Handler, to match the SSD Alarm Handler State." - should be covered in the SM fix. AK needs to look at removing the alarms on the QPS; SSU should generally match SSD; fix via spreadsheets.
- "I've also disabled the alarm on the Neutron Monitor - which is no longer connected so gives a permanent error." - Taken out completely.
- "AC5 status disabled in ENV alarm handler" - Unit will not be networked; PVs to be removed from spreadsheet.
- "Henry has disabled the following in the alarm handler in consultation with MOM-Ed: TKU: GHe, Cryo1 vacuum; TKD: GHe, Cryo1/2 (really 3/4) vacuum, Cryo2-LCRB-temp2 (actually Cryo4LC-temp2), Cryo2-LCRB-AFEheater2 (also mislabelled)"
GHe monitoring boxes restarted; vacuum gauges replaced and the readout isn't working (AK/CMw); temp and AFE: need to check again; mislabelling fixed
- "I've disabled the Q4 and Q7 MPS water flows in the Alarm Handler because of the false alarms ... I am suspicious of the fact that we only get these false alarms on those two supplies" - Comes from the readback glitches. Time filter settings have been re-applied and should prevent the alarms; HN to check present alarm status.
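The time-filter approach mentioned above can be thought of as a persistence check: an alarm is raised only if the out-of-range condition holds for at least a hold-off time, so a single-sample readback glitch never alarms. The following is a minimal Python sketch under stated assumptions, not the ALH/spreadsheet mechanism actually in use; the function name, the `(timestamp, value)` sample format, the limit and the hold-off are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

def filtered_alarm_times(
    samples: Iterable[Tuple[float, float]],
    is_bad: Callable[[float], bool],
    hold_off_s: float,
) -> List[float]:
    """Return the times at which a time-filtered alarm would be raised.

    samples:    (timestamp in seconds, value) pairs in time order (hypothetical format).
    is_bad:     predicate that is True when a single reading is out of range.
    hold_off_s: the bad condition must persist at least this long before alarming.
    """
    raised: List[float] = []
    bad_since: Optional[float] = None  # start of the current bad stretch, if any
    alarmed = False                    # has this bad stretch already raised an alarm?
    for t, value in samples:
        if is_bad(value):
            if bad_since is None:
                bad_since = t
            if not alarmed and t - bad_since >= hold_off_s:
                raised.append(t)
                alarmed = True
        else:
            # Any good reading resets the filter, so short glitches never alarm.
            bad_since = None
            alarmed = False
    return raised
```

For example, with a hypothetical low-flow limit of 10 and a 5 s hold-off, a one-sample dip at t = 1 s is ignored, while a dip lasting from t = 10 s onwards alarms once, at the first sample taken 5 s or more into it.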
- FCD Load Cells (see below): HW disconnect, not a C&M thing.
- "the limits on SSU CC3 High-Side Pressure have been adjusted to get rid of the frequent alarms we were getting a couple of days ago. (now just to get CC1 and 2 sorted out)." What is the plan for SSU CC1 and 2? Need to check the cable: comms issue between compressor and IOC.
- "SSU Alarm Handler DISABLED in all channels" - should be covered in the SM fix.
- "Run Control is hung up; restarted in separate terminal" - Run Control has long-standing, deep-seated issues: unfixable. Use it slowly.
- "Will also check ... reported water leak in trench. "
- "About 13:00 Ajit restarted the Environment IOC - this should have got the air temperature and other sensors going."
- SecurityProbe system not being read out by the IOC: needs some sort of heartbeat alarm. Check the read-back status of the PV via Nagios: Action PF/AK. Stale information from the SecurityProbe system has safety implications.
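A heartbeat of the kind suggested for the SecurityProbe readout amounts to a staleness test: alarm when a channel's last update is older than some maximum age, whatever its value. The sketch below is a minimal Python illustration under assumptions; the dictionary of last-update times, the channel names and the 60 s limit are hypothetical, and it is not the Nagios or IOC mechanism actually in use.

```python
import time
from typing import Dict, List, Optional

def stale_channels(
    last_updates: Dict[str, float],
    max_age_s: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return, sorted by name, the channels whose last update is older than max_age_s.

    last_updates maps a channel name to the Unix time of its most recent
    readout (hypothetical input; in practice this might come from record
    timestamps on the IOC or from a Nagios check).
    """
    if now is None:
        now = time.time()
    # A channel is stale if no fresh readout has arrived within max_age_s,
    # regardless of whether its last value looked healthy.
    return sorted(name for name, t in last_updates.items() if now - t > max_age_s)
```

For example, with `now=120.0` and a 60 s limit, a channel last updated at t = 10 s is reported as stale while one updated at t = 100 s is not. The same timestamp comparison is one way of approaching the "maybe look at time stamp" question for the heartbeat alarms.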
- "Noticed power supply for EMR Single Anode PMT # 26 was tripped off and probably had been for a while."
- "Noticed also that in the Detectors ALH the entire EMR tree is Disabled, although Ed did put the Detectors into Running yesterday." - should be covered in the SM fix
- Not clear when the ALH EMR tree was turned off. Who is responsible for checking the Alarm Handlers over before running: experts or C&M? - see below
- MOM checklist should include checking that the State Machines are in the appropriate states. (Which states are those?) AKurup will write a procedure on how to restart the SMs and the ALH and then mask out known false alarms; MOM to work through it at the start of running; MOM to hold the list of known issues; SBoyd to oversee MOMs.
- "Roof water temperature caused a red alarm, upper limit. - Returned to normal before a cause to be found. Assumed to be a measurement fluctuation."
- We see glitches in the roof water PV
MICE-SER-H2O-01:TPrimary, 06:06 on 31 May 2017. Checked the Archiver: the glitch in the output variable isn't accompanied by any glitch in its inputs. SBoyd: ask shifters to keep an eye on the water (stripcharts?).
- "We have selected "LH2 Empty" in Run Control: it STILL wrote "LH2" into the CDB! CDB viewer's reporting "LH2" as the material and "Empty" as the shape which I think is the wrong way round - not sure if this is the viewer or Run Control." Need to check what is being written out (and displayed) by Run Control (AK): done.
- "I just realised that one reason the Alarm Handler is such a pain to use is that the software is incapable of scrolling and making a beep noise at the same time!" Test if scrolling and beeping works on miceiocpctest (via VNC?). On miceiocpctest you cannot scroll while it is beeping (PF). Can't fix.
- "There was a major alarm from DATE at about 1360 spills/ 83k triggers, but by the time we found it we couldn't see any cause or error message. No sign of a trip in the beamloss strip chart. The transfer of the ISIS beamloss numbers from ISIS into EPICS is still very unpredictable - sometimes we get the values updated on 2 successive spills, sometimes the values are stay frozen for 9 or 10 spills." - Can't see a 10-spill plateau in the Archiver, so we could lose an ISIS trip from our own monitoring. Needs beam to investigate.
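One way to quantify the "frozen for 9 or 10 spills" behaviour offline is to look for runs of identical consecutive per-spill values. This is a minimal Python sketch, assuming the per-spill beamloss readings have already been exported (for instance from the Archiver) as a plain list; the function name and the run-length threshold are hypothetical. A genuinely constant beam loss would also be flagged, so the output only identifies candidate plateaus.

```python
from typing import List, Sequence, Tuple

def frozen_runs(values: Sequence[float], min_len: int) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of identical consecutive
    values whose length is at least min_len.

    Exact equality is intended: a value that was never refreshed is repeated
    bit-for-bit, whereas a live reading normally changes from spill to spill.
    """
    runs: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the data or when the value changes.
        if i == len(values) or values[i] != values[start]:
            length = i - start
            if length >= min_len:
                runs.append((start, length))
            start = i
    return runs
```

For example, `frozen_runs([1, 2, 2, 2, 3, 3, 4], 3)` reports the run of three 2s starting at index 1; with `min_len` set to 9 it would pick out plateaus like the ones described in the shift log.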
- Repeating alarms need their thresholds adjusting; putting them on a strip-chart makes it easier to see if there's a real problem. AKurup will update via spreadsheets, then look out for known alarms during start-up.
- "SSU CC3 High-Side Pressure"
- DS PSU voltage: check with Mike
- "Q4 and Q7 are showing a consistent warning on the MPS water flow" #1880
- "alarm from Q2 deltaT" #1880
- "Q8 reports temperature change warning alarms." #1880
- "Q8 has a warning on the voltage" #1880
- "Supply air pressure reach 6.8 Bar before returning to normal." - increase limits: Hi 7.0 bar, HiHi 7.2 bar
- "Alarms in beam line: quad delta T touches limit" : #1880
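The effect of the proposed supply-air pressure limits (Hi 7.0 bar, HiHi 7.2 bar) can be checked with a simple classifier. This is a minimal sketch under assumptions: it follows what we understand to be the EPICS analog-record convention that a limit is reached when the value is greater than or equal to it, it assumes Hi maps to MINOR and HiHi to MAJOR severity, and it ignores hysteresis. The function name is hypothetical.

```python
def pressure_severity(value_bar: float, high: float = 7.0, hihi: float = 7.2) -> str:
    """Classify a supply-air pressure reading (bar) against Hi/HiHi limits.

    Assumes Hi -> MINOR and HiHi -> MAJOR (an assumption about how the
    severities are configured); hysteresis is ignored.
    """
    if value_bar >= hihi:
        return "MAJOR"
    if value_bar >= high:
        return "MINOR"
    return "NO_ALARM"
```

With these limits the 6.8 bar excursion reported by the shift would be classified as `"NO_ALARM"`, while readings of 7.0 bar and 7.2 bar would give `"MINOR"` and `"MAJOR"` respectively.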
- "Heartbeat alarms for miceecserv1 and miceraid 5. The help pop-ups flash up an illegible error message." - Now Fixed
- In Alarm Handlers, we need working pop-ups ...
- Changed during run?
- "After acknowledging the miceraid5 alarm, the PV remains shown as red, but the alarm keeps returning. This seems to be because the Heartbeat re-sets the status to zero before making the next test, which fails again. Probably, this is Wrong as it operates differently to the majority of other alarms."
- Is the Heartbeat test operation correct? AK/Paolo to see if it can be made consistent. It then also needs to respond to later Nagios alerts on the same server; maybe look at the time stamp.
- "temperature warnings from the CS1 VME crate."
- Need to clarify the temperatures of the crates: are they really warmer than expected? If not, tweak the limits. AK to track down the appropriate expert. There is an action on getting the crates cleaned out: refer the action to D.C. or Craig; recommend cleaning out the crates before the run.
- Beamstop circuit 1 leak - tested, not failed this time
- Beamstop Restraint possibly has a stripped bolt: bolts replaced; updated documentation passed to Steve Boyd to put on the wiki. Need to arrange BLOC refreshers; SB to check the documentation is posted.
- "We have pulled out the target one notch to 35.75 mm BCD, as the losses are now much spikier since the work ISIS did this morning"
- Beamline Co-ordinator has raised with ISIS... HN to remind Bryan
- DS Shield Pressures: DS team cleaning gas over summer
- "Decay solenoid supply pressure PI21 - fluctuated back to normal"
- DS needs gas cleaning and TLC: DS team cleaning gas over summer
- DS PSU voltage alarm: check with Mike; widen limits
- Tweak MPS temperatures difference limits: #1880
- Target laser saga, and control PC / R78 - saga continues. Don't expect laser readout into EPICS this cycle.
- Question from ISIS about the MICE 3ms spill gate - why not 1 ms?
- Possible balance between inefficiency from dead-time vs. activation of ring for no data
- How are the good muons distributed within the MICE spill? Won't be answered before cycle.