Current configuration¶
Role | Slave | Master |
Hostname | heplnv119.pp.rl.ac.uk | heplnv118.pp.rl.ac.uk |
IP address | 130.246.244.119 | 130.246.244.118 |
OS | CentOS7 | CentOS7 |
Public interface | http://cdb.mice.rl.ac.uk | |
Ports open | 5432 | 5432 |
cdbviewer : tomcat running on heplnv153
cdb : tomcat running on heplnm072
Old configuration¶
Role | Slave | Standby to be | Master |
Hostname | configdb.micenet.rl.ac.uk | heplnv150.pp.rl.ac.uk | heplnm072.pp.rl.ac.uk |
IP address | 172.16.246.25 | 130.246.92.150 | 130.246.47.72 |
OS | SL5 | SL7 | SL6 |
Public interface | http://cdb.mice.rl.ac.uk http://heplnv152.pp.rl.ac.uk:4443/cdbviewer | | |
Ports open | 5432 | 5432 | 5432, 8080 |
Ancient configuration during the running¶
Role | Master | Standby | Standby |
Hostname | configdb.micenet.rl.ac.uk | heplnm069.pp.rl.ac.uk | heplnm072.pp.rl.ac.uk |
IP address | 172.16.246.25 | 130.246.47.69 | 130.246.47.72 |
OS | SL5 | SL6 | SL6 |
Public interface | (http://heplnv156.pp.rl.ac.uk:4443/cdbviewer/) | http://cdb.mice.rl.ac.uk http://heplnv152.pp.rl.ac.uk:4443/cdbviewer | |
Ports open on MiceNet | 5432, 8080, 22 | 5432, 8080 |
On heplnm069 and heplnm072:
bash-4.1$ which psql
/usr/pgsql-9.1/bin/psql
CDB Failover procedure¶
On the standby server (heplnm069 or heplnm072)¶
Check that the standby is in sync (the receive and replay locations should match):
su postgres
psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
 receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
 0/8F11DF00       | 0/8F11DF00      | t
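The comparison of the two locations can be scripted. The following is a minimal sketch, not part of the original procedure: the function names are hypothetical, and it assumes the locations are the "X/Y" hexadecimal strings shown in the output above. It only orders the two values and does not try to compute a byte lag.

```python
def parse_xlog_location(text):
    """Split an 'X/Y' xlog location into a pair of integers (both parts are hex)."""
    log_id, offset = text.strip().split("/")
    return int(log_id, 16), int(offset, 16)


def standby_in_sync(receive_location, replay_location):
    """Return True when everything the standby has received has also been replayed."""
    return parse_xlog_location(replay_location) >= parse_xlog_location(receive_location)


# Values copied from the sample psql output.
print(standby_in_sync("0/8F11DF00", "0/8F11DF00"))
```

Comparing the parsed pairs rather than the raw strings avoids mistakes when the hexadecimal parts have different lengths.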
On the master server (configdb)¶
If the master is still up and running, check that the standby is being synchronized (there should be one wal sender process per standby):
su postgres
ps aux | grep postgres | grep sender
postgres 3375 0.0 0.0 158824 2968 ? Ss Oct31 0:03 postgres: wal sender process postgres 130.246.47.69(38367) streaming 0/8F11DF00
postgres 5215 0.0 0.0 158824 2988 ? Ss Nov10 0:01 postgres: wal sender process postgres 130.246.47.72(34951) streaming 0/8F11DF00
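To read the streaming position of each standby without scanning the process list by eye, the wal sender lines can be parsed. This is a sketch only: the regular expression and the function name are assumptions based on the two sample lines above, and other PostgreSQL versions may format the process title differently.

```python
import re

# Matches e.g. "wal sender process postgres 130.246.47.69(38367) streaming 0/8F11DF00"
WAL_SENDER = re.compile(
    r"wal sender process\s+(?P<user>\S+)\s+(?P<host>[\d.]+)\((?P<port>\d+)\)\s+"
    r"streaming\s+(?P<location>[0-9A-Fa-f]+/[0-9A-Fa-f]+)"
)


def wal_senders(ps_output):
    """Map each standby IP address to the xlog location it is streaming."""
    return {m.group("host"): m.group("location") for m in WAL_SENDER.finditer(ps_output)}


sample = (
    "postgres 3375 0.0 0.0 158824 2968 ? Ss Oct31 0:03 postgres: wal sender process "
    "postgres 130.246.47.69(38367) streaming 0/8F11DF00\n"
    "postgres 5215 0.0 0.0 158824 2988 ? Ss Nov10 0:01 postgres: wal sender process "
    "postgres 130.246.47.72(34951) streaming 0/8F11DF00\n"
)
print(wal_senders(sample))
```

A standby missing from the result, or one whose location differs from the others, is the one to investigate before stopping the master.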
/var/lib/pgsql/data/pg_xlog/archive
source /etc/cron.weekly/pg-base-backup
source /etc/cron.weekly/pg-dump
Stop postgres as root
service pgsql-cdb stop
On the standby server (heplnm069 or heplnm072)¶
Promote the standby to master by creating the trigger file named in /var/lib/pgsql/data/recovery.conf
su postgres
touch /var/lib/pgsql/data/failover
Check it (recovery_status should now be f):
psql -c 'select pg_is_in_recovery() "recovery_status";'
 recovery_status
-----------------
 f
Check /opt/mice/etc/cdb-server/cdb.props
emacs -nw /opt/mice/etc/cdb-server/cdb.props
server.name=MICE Production Server - Master
db.url=jdbc:postgresql://localhost:5432/
db.name=cdb
db.user=mice
db.pwd=****
db.superUser=supermouse
db.superPwd=****
Check if /var/lib/tomcat5/webapps/cdb.war or /var/lib/tomcat/webapps/cdb.war is the correct one (copied from the former master).
Check if /var/lib/pgsql/data/pg_hba.conf contains everything you need to communicate with the other machine
# "local" is for Unix domain socket connections only
local cdb mice,supermouse md5
local cdb postgres md5
local cdb all reject
local all all peer
# IPv4 local connections:
host cdb mice,supermouse 127.0.0.1/32 md5
host cdb mice 130.246.92.152/32 md5
host cdb mice 130.246.92.156/32 md5
host cdb all 0.0.0.0/0 reject
host replication postgres 172.16.246.25/22 trust
host replication postgres 130.246.47.69/22 trust
host replication postgres 130.246.47.72/22 trust
host replication postgres 127.0.0.1/32 trust
host all all 127.0.0.1/32 ident
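Whether a given machine is allowed to replicate from this server can be checked against the file contents. The sketch below is a simplification under stated assumptions: the function name is hypothetical, it only looks at "host replication" lines with an address in CIDR form, and it ignores the user column. PostgreSQL itself uses the first matching line, which the loop imitates.

```python
import ipaddress


def replication_allowed(hba_text, standby_ip):
    """Return True if the first matching 'host replication' line does not reject standby_ip."""
    ip = ipaddress.ip_address(standby_ip)
    for line in hba_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 5 and fields[0] == "host" and fields[1] == "replication":
            # strict=False accepts entries such as 130.246.47.69/22 that have host bits set
            if ip in ipaddress.ip_network(fields[3], strict=False):
                return fields[4] != "reject"
    return False


hba = """\
host cdb all 0.0.0.0/0 reject
host replication postgres 172.16.246.25/22 trust
host replication postgres 130.246.47.69/22 trust
host replication postgres 127.0.0.1/32 trust
"""
print(replication_allowed(hba, "130.246.47.72"))
```

Note that the /22 masks above cover the whole subnet of each machine, so a single entry already admits its neighbours on the same network.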
As root, restart postgres
service pgsql-cdb restart
and start the dormant tomcat
service tomcat start
On the old primary server (configdb)¶
Make sure that postgres has been stopped.
Clean /data
cd /var/lib/pgsql/data
rm -rf pg_xlog/*
as root
umount /var/lib/pgsql/data/pg_xlog
su postgres
cd /var/lib/pgsql/
rm -rf data/*
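Take a base backup from the new master. The address written below as 130.246.47.72(69) stands for 130.246.47.72 or 130.246.47.69; use the one that was promoted.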
/usr/pgsql-9.1/bin/pg_basebackup -h 130.246.47.72(69) -D /var/lib/pgsql/data -U postgres -v -P
Create /var/lib/pgsql/data/recovery.conf (or copy from /var/lib/pgsql/recovery.conf.template)
standby_mode = 'on'
primary_conninfo = 'host=130.246.47.72(69) port=5432 user=postgres'
trigger_file = '/var/lib/pgsql/data/failover'
restore_command = 'cp /var/lib/pgsql/data/pg_xlog/archive/%f "%p"'
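Because the only value that changes between failover and switchback is the address of the current master, the file can be generated rather than edited by hand. This is a sketch under assumptions: the function name and its defaults are hypothetical, and the four settings are exactly those listed above.

```python
def render_recovery_conf(primary_host, data_dir="/var/lib/pgsql/data", port=5432, user="postgres"):
    """Return the text of a recovery.conf pointing the standby at primary_host."""
    lines = [
        "standby_mode = 'on'",
        f"primary_conninfo = 'host={primary_host} port={port} user={user}'",
        f"trigger_file = '{data_dir}/failover'",
        f"restore_command = 'cp {data_dir}/pg_xlog/archive/%f \"%p\"'",
    ]
    return "\n".join(lines) + "\n"


# Example: the promoted server is heplnm072 (130.246.47.72).
print(render_recovery_conf("130.246.47.72"), end="")
```

The output would still need to be written to recovery.conf as the postgres user before the server is started.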
Recreate the mount point for pg_xlog
mv /var/lib/pgsql/data/pg_xlog /tmp/
mkdir /var/lib/pgsql/data/pg_xlog
as root
mount -a
su postgres
chmod 700 /var/lib/pgsql/data
cp -rp /tmp/pg_xlog/* /var/lib/pgsql/data/pg_xlog/
mkdir /var/lib/pgsql/data/pg_xlog/archive
As root, start postgres as a standby
service pgsql-cdb start
On any other remaining standby server (heplnm069 or heplnm072)¶
emacs -nw /var/lib/postgresql/9.2/main/recovery.conf
recovery_target_timeline = 'latest'
service pgsql-cdb restart
C&M configuration¶
http://configdb.micenet.rl.ac.uk and http://172.16.246.25 are hardcoded in several places and should be changed to http://heplnm069.pp.rl.ac.uk or http://heplnm072.pp.rl.ac.uk:
- Run Control:
iocTops/RunControl/get_tags.py: blm_super = BeamlineSuperMouse("http://configdb.micenet.rl.ac.uk:8080")
iocTops/RunControl/set_cdb_beamline_for_tag.py: blm_super = BeamlineSuperMouse("http://configdb.micenet.rl.ac.uk:8080")
iocTops/RunControl/iocBoot/iocRunControl/st.cmd:epicsEnvSet("CDB_SERVER","http://configdb.micenet.rl.ac.uk:8080")
iocTops/RunControl/RunControlApp/src/RunControl.c: else if (!strcmp(server,"configdb.micenet.rl.ac.uk")) {
iocTops/RunControl/get_cdb_beamline_for_tag.py: blm_super = BeamlineSuperMouse("http://configdb.micenet.rl.ac.uk:8080")
- Other EPICS bits:
Config/ProcLauncher/MICE-SM.bash:export CDB_SERVER=http://configdb.micenet.rl.ac.uk
iocTops/BeamLine/iocBoot/iocBeamLine/st.cmd:epicsEnvSet("CDB_SERVER","http://configdb.micenet.rl.ac.uk:8080")
Software/StateMachineConfig/NOTES: MCDB see http://micewww.pp.rl.ac.uk/projects/configdb/wiki#Python-Client
Software/UtilityScripts/Soft_IOC_Launcher/MICE-SM.bash:export CDB_SERVER=http://configdb.micenet.rl.ac.uk
Software/UtilityScripts/Soft_IOC_Launcher/RunControl.bash:export CDB_SERVER=http://configdb.micenet.rl.ac.uk:8080
Software/StateMachineConfig/convert/cdb_configuration.py: "PROD": 'http://172.16.246.25:8080',
- Non-EPICS stuff?
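The hardcoded addresses listed above could be retargeted with a small helper instead of editing each file by hand. This is only a sketch: the function names are hypothetical, it works on text already read from a file, and it replaces bare host names too (as needed for the strcmp in RunControl.c) while leaving ports and paths untouched.

```python
OLD_HOSTS = ("configdb.micenet.rl.ac.uk", "172.16.246.25")


def has_hardcoded_host(text):
    """Return True if the text mentions one of the old master addresses."""
    return any(old in text for old in OLD_HOSTS)


def retarget(text, new_host):
    """Replace every occurrence of the old master addresses with new_host."""
    for old in OLD_HOSTS:
        text = text.replace(old, new_host)
    return text


line = 'epicsEnvSet("CDB_SERVER","http://configdb.micenet.rl.ac.uk:8080")'
print(retarget(line, "heplnm069.pp.rl.ac.uk"))
```

Calling retarget again with configdb.micenet.rl.ac.uk as the new host does not undo the change automatically, so keeping the original files under version control is the safer way to revert.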
Switchback¶
On the new primary (heplnm069 or heplnm072)¶
Check that the standby (configdb) is in sync with it:
su postgres
psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
As root, stop postgres
service pgsql-cdb stop
Stop tomcat
service tomcat stop
On the new standby (configdb)¶
Before promoting, check that the standby is completely in sync
psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
Promote the standby back to master
touch /var/lib/pgsql/data/failover
Check /var/lib/pgsql/data/pg_hba.conf
The archive directory should already be present; create it if it is missing
mkdir /var/lib/pgsql/data/pg_xlog/archive
Restart as root
service pgsql-cdb restart
Restart tomcat
service tomcat restart
On the new primary (heplnm069 or heplnm072)¶
Stop postgres
service pgsql-cdb stop
Clean /data
cd /var/lib/pgsql/data
rm -rf pg_xlog/*
as root
umount /var/lib/pgsql/data/pg_xlog
su postgres
cd /var/lib/pgsql/
rm -rf data/*
Take a backup of the new server
su postgres
/usr/pgsql-9.1/bin/pg_basebackup -h 172.16.246.25 -D /var/lib/pgsql/data -U postgres -v -P
Create /var/lib/pgsql/data/recovery.conf (or copy from /var/lib/pgsql/recovery.conf.template)
standby_mode = 'on'
primary_conninfo = 'host=172.16.246.25 port=5432 user=postgres'
trigger_file = '/var/lib/pgsql/data/failover'
restore_command = 'cp /var/lib/pgsql/data/pg_xlog/archive/%f "%p"'
Recreate the mount point for pg_xlog
mv /var/lib/pgsql/data/pg_xlog /tmp/
mkdir /var/lib/pgsql/data/pg_xlog
as root
mount -a
su postgres
cp -rp /tmp/pg_xlog/* /var/lib/pgsql/data/pg_xlog/
mkdir /var/lib/pgsql/data/pg_xlog/archive
As root, start this server (until now the primary) as the new standby
service pgsql-cdb start
C&M configuration¶
- Restore the previous configuration using:
http://configdb.micenet.rl.ac.uk:8080
Updated by Franchini, Paolo over 2 years ago · 118 revisions