MOM Procedures: what to do after a Power Cut

Assess Consequences (draft)

(Need a list of what to check and how, and whom in MICE in toto to inform)

Bear in mind that MICE is now spread across the RAL campus: the first thing to do is to find out which areas have been affected. This will probably have to be done directly, without relying on the network, as the network may be down.

Get an overview of the entire situation first: it is easy to get bogged down restarting a particular system while ignoring the consequences of obscure dependencies (e.g. of the DS on building air).

In no particular order:

  • R78: test target and associated equipment; should it be stopped/parked in case of follow-up glitches?
  • R9: controls/compressors/pumps?
  • MICE Hall
    • Neutron Monitor per se, trench ODH (and any other safety-related monitoring)
    • Decay Solenoid (what needs to be done if it's tripped off, and how soon?)
    • Air Receiver (feeds DS controls - will last for only 1-2 hrs after input air is lost)
    • HV crates (for obvious damage)
    • air conditioning
  • Rack room (important PCs should be on UPS)
    • crates (for obvious damage)
    • air conditioning (takes about an hour for rack room to start overheating)
  • Front-end servers (in R1):
    • micewww services: MICEmine, eLog, webcams, ConfigDB interface
      • Not urgent ... but need up-to-date local paper copy of relevant MICEmine pages!
    • mousehole
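
The two time limits quoted in the list above (the Air Receiver and the rack room air conditioning) can be turned into clock times once the moment of the power cut is known. The following is only a minimal sketch: it assumes that both input air and air conditioning were lost at the instant of the cut, and it takes the lower bound of the "1-2 hrs" figure to be safe. The names `MARGINS` and `deadlines` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta

# Margins taken from the checklist above.
# ASSUMPTION: the lower bound of "1-2 hrs" is used for the Air Receiver,
# and "about an hour" is treated as exactly one hour for the rack room.
MARGINS = {
    "Air Receiver (feeds DS controls)": timedelta(hours=1),
    "Rack room air conditioning": timedelta(hours=1),
}


def deadlines(power_lost_at):
    """Return (item, time by which to act) pairs, soonest first.

    ASSUMPTION: input air and air conditioning were both lost at
    power_lost_at; adjust if they failed later.
    """
    return sorted(
        ((item, power_lost_at + margin) for item, margin in MARGINS.items()),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Example time only; substitute the actual time of the power cut.
    for item, when in deadlines(datetime(2020, 1, 1, 9, 30)):
        print(f"{when:%H:%M}  {item}")
```

The margins are kept in one table so that they can be corrected in a single place once better figures are known.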

Restarting

What to do when power is back and stable

MLCR Computer Systems

Please read through all of the following, AND everything it references, BEFORE you start.

Unfortunately, the network switch stack for micenet will not start correctly on power-up. Hence:
  • If the power is restored quickly and the Network Rack UPS is still running, confirm that the micenet switches are operating correctly (see eLog 1606).
  • If the Network Rack UPS has already shut down, contact a networking expert first to determine the correct restart procedure before doing anything else.

The order below is deliberate; seek advice before continuing if you run into trouble.

  1. Switch on those monitors in MLCR connected to miceecserv, cagateway and the DAQ KVM.
  2. Switch on the UPSs. All should show mains power available and no load. It is best to give them 30 minutes to build up a reserve of power.
    1. When the Server Rack UPS is switched on, the DAQ KVM units will come on immediately and a login prompt will appear on the DAQ screens in the MLCR.
    2. When the Network Rack UPS is switched on, the network switches will come on immediately. The micenet switches will take a while to self-test and then try to form a stack. Wait five full minutes and then check that they have done so (see eLog 1606). Most likely this will fail; contact Craig, Henry or Chris Brew before proceeding further, and follow the expert's guidance for the Network Rack (see above).
  3. Switch on the computers one at a time, in the order below. Before switching each one on, select the appropriate KVM channel on the right-hand DAQ monitor; let it finish booting, noting any error messages, before moving to the next.
    1. Server Rack
      1. micestore
      2. miceserv1
    2. Other racks
      1. miceecserv
      2. miceecserv2
      3. miceiocpc1 (no monitor connection)
    3. Other critical servers
      1. miceiocpc2 (no monitor connection)
      2. cagateway
      3. micecss1
  4. Any one of the miceopipc (for debugging CAM)
  5. If power is confirmed stable:
    1. ConfigDB
    2. targetctl
    3. trackerctl
    4. DAQ
      1. LDCs: miceacq06 to miceacq10
      2. GDCs: miceraid4 and miceraid5
    5. OnRec monitors, then OnRec PCs in any order (the Online KVM switch should come on automatically when the first machine is powered up)
    6. miceacq05
    7. remaining OPI PCs
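
Since the order above is deliberate and the web services may be unavailable during a restart, it may help to keep the step 3 sequence as data from which a paper tick-list can be printed in advance. The following is a minimal sketch only: the host names and groupings are transcribed from step 3 above, while `BOOT_ORDER`, `NO_MONITOR` and `checklist` are hypothetical names introduced for illustration.

```python
# Power-on order for step 3, transcribed from the list above.
# The order of both the groups and the hosts within them is significant.
BOOT_ORDER = [
    ("Server Rack", ["micestore", "miceserv1"]),
    ("Other racks", ["miceecserv", "miceecserv2", "miceiocpc1"]),
    ("Other critical servers", ["miceiocpc2", "cagateway", "micecss1"]),
]

# Hosts marked "(no monitor connection)" in the list above.
NO_MONITOR = {"miceiocpc1", "miceiocpc2"}


def checklist():
    """Return one printable tick-box line per host, in power-on order."""
    lines = []
    step = 0
    for group, hosts in BOOT_ORDER:
        for host in hosts:
            step += 1
            note = " (no monitor connection)" if host in NO_MONITOR else ""
            lines.append(f"[ ] {step}. {group}: {host}{note}")
    return lines


if __name__ == "__main__":
    print("\n".join(checklist()))
```

Steps 4 and 5 are deliberately left out of the sketch, because they depend on a judgement (whether power is confirmed stable) that should be made by a person.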

All other systems (e.g. BL, DS, PPS) should be left to the system experts.
Do NOT attempt to restart miceioc5 (Decay Solenoid controls) except under the supervision of a DS expert.

For BLOC, restarting beamline control systems:
  1. Any one of the miceopipc (for debugging CAM)
  2. Check that micecss1 is OK
    1. Confirm that terminal server boxes ts1 to ts4 respond to "ping"
    2. Restart the VME IOCs
  • Target: on "Target Drive", actively set the voltage to zero and press reset for all interlocks, including "Enable Frame Lower", to clear apparent glitches
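
The terminal server check in the BLOC list above (ts1 to ts4 responding to "ping") could be scripted roughly as follows. This is a sketch under stated assumptions, not an established MICE tool: it assumes that `ts1` to `ts4` resolve as host names from the machine it is run on, and that the local `ping` accepts the Linux iputils options `-c` (count) and `-W` (timeout in seconds). The names `responds` and `unreachable` are hypothetical.

```python
import subprocess

# ASSUMPTION: the terminal server boxes resolve under these short names.
TERMINAL_SERVERS = ["ts1", "ts2", "ts3", "ts4"]


def responds(host, run=subprocess.run):
    """Return True if a single ping to host is answered.

    ASSUMPTION: Linux iputils ping, where -c 1 sends one packet and
    -W 2 waits at most two seconds for the reply.
    """
    result = run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(hosts=TERMINAL_SERVERS, run=subprocess.run):
    """Return the hosts that did not answer, in the order given."""
    return [host for host in hosts if not responds(host, run)]
```

Calling `unreachable()` returns an empty list when all four boxes answer; any names it does return should be investigated before restarting the VME IOCs. The `run` parameter exists only so that the function can be exercised without touching the network.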