Nb: this document is out of date. We now have master and slave as described, with master in the MLCR and slave on PPD.
1st pair of raid disks – OS
2nd pair of raid disks – postgres including all of its data files except write ahead logs
3rd pair of raid disks – write ahead logs (transaction logs)
N.B. The mount point for the logs is below that for the database in the directory tree.
/dev/mapper/VolGroup01-pgsql on /var/lib/pgsql type ext3 (rw,noatime) /dev/mapper/VolGroup02-pg_xlog on /var/lib/pgsql/data/pg_xlog type ext3 (rw,noatime)
Instructions for Recovery After Failure - Draft¶
Ensure that the postmaster is stopped:
/sbin/service postgresql stop
The Write Ahead Logs are stored on a different raid array from the rest of the database files so we maybe able to recover from these local files. Depending on where we are starting from we need to either reinstall postges or clean out the data directory. NB the Write Ahead Logs are contained in a sub directory of the data directory so if we are trying to recover from the local copies of the Write Ahead Logs then make sure we do not delete them when cleaning the rest of the data directory. As the user
cd ~/data rm -rf `ls | grep -v xlog`
Still in the data directory restore the latest tar of the data directory, this should have been stored in the
tar -xzpf ~/data/pg_xlog/archive/db-base-backup-YYYY-MM-DD.gz --strip-components 4
If you are NOT using the locally stored Write Ahead Logs then you may have to create the
~/data/pg_xlog/archive_status directories. The Write Ahead Logs will then need to be copied from the remote backup to
To ensure our recovery is up to date we need to copy and rename the 'current' Write Ahead Log we saved earlier. This file has to be renamed so that it is next in the sequence of the archived Write Ahead Logs. For example if the highest log name is
00000001000000000000002D then we:
cp ~/data/pg_xlog/archive/current ~/data/pg_xlog/archive/00000001000000000000002E
Now create the file
restore_command = 'cp -p ~/data/pg_xlog/archive/%f %p'
We are now ready to recover the database.
/sbin/service postgresql start
Finally check the latest log file in
~/data/pg_log/ to make sure there are no error messages.
N.B. When restoring from an older backup it only restores up to the point at which the next backup was made. So if we have backup tars for January, February and March and we use the January backup as the base for our recovery then the database will end up in state that it was in immediately before the February backup.
Get a second CDB machine so we will have the additional safety of master and slave. The new machine will sit in the control room and act as master, while the current CDB will become the slave.
The new machine will have the same configuration of raid disks and the current CDB.
Master - get a file system level backup which would then be stored on the 3rd pair of raid disks. Set up archiving of the write ahead logs on the 3rd pair of raid disks. In addition do a periodic (monthly?)
pg_dumpall only storing the latest dump on the 3rd pair of raid disks. This gives us two local restore solution.
Slave – Backup as per the master. In addition Henrys script would continue to copy the backups off of the slave.
The master and slave CDB machines will be in different buildings. The master will be capable of complete recovery via the write ahead logs, not sure yet but the slave will probably be a fraction behind so will be able to do an almost complete recovery via write ahead logs. Both will be capable of doing a recovery from the dump which maybe up to one month old. If things have gone really really wrong we will have the backups that have been archived off site.
The current CDB (future slave) also hosts the redmine database. So if we had to restore the master from the slave that would potentially add the redmine database which would then have to be manually deleted from the master, no great problem but something to be aware of.