Bug #1464

CDB down again

Added by Rajaram, Durga about 8 years ago. Updated almost 8 years ago.

Status: Open
Priority: Urgent
Category: Server
Start date: 09 May 2014
% Done: 0%

Description

The CDB server is down again:

Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

Apache/2.2.3 (Scientific Linux) Server at cdb.mice.rl.ac.uk Port 80


Related issues

Related to Configuration Database - Bug #1519: Method to get calibrations by run is broken | Closed | Martyniak, Janusz | 11 July 2014

Has duplicate Configuration Database - Bug #1481: CDB down | Open | Martyniak, Janusz | 03 June 2014

#1

Updated by Rogers, Chris almost 8 years ago

I added an entry to root's crontab to restart the server. For now (testing phase) it restarts hourly; I will move this to nightly if it looks like it is running smoothly.

#2

Updated by Rogers, Chris almost 8 years ago

The cron job fired and the CDB service appears unaffected. I will move it to nightly at the end of the day if there are no further problems.

#3

Updated by Rogers, Chris almost 8 years ago

I moved the crontab to a nightly restart.
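For reference, a nightly root crontab entry matching the 03:03 schedule mentioned later in this thread could be added without duplicating an existing line using something like the sketch below. The helper name `add_cron_entry` is hypothetical, and the restart command is assumed to be the same `/etc/init.d/tomcat5 restart` used in the later check script. The hourly testing schedule would use `0 * * * *` instead of `3 3 * * *`.

```bash
#!/bin/bash
# Sketch: add a nightly Tomcat restart to root's crontab without duplicating it.
# Assumptions: 03:03 schedule and restart command as used elsewhere in this thread;
# add_cron_entry is a hypothetical helper name.

# Read a crontab on stdin, drop any line identical to ENTRY, then append ENTRY.
add_cron_entry() {
    local entry="$1"
    grep -vxF -- "$entry" || true   # grep exits 1 when no lines remain; that is fine
    printf '%s\n' "$entry"
}

# Usage (as root):
#   crontab -l 2>/dev/null | add_cron_entry '3 3 * * * /etc/init.d/tomcat5 restart' | crontab -
```

Running the usage line twice leaves a single copy of the entry, at the end of the crontab.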

#4

Updated by Rajaram, Durga almost 8 years ago

Hm... it's down again [ ~8 p.m. BST, Thursday, July 21 ]
Does the nightly cron job revive it if it's down? We'll see...

#5

Updated by Rajaram, Durga almost 8 years ago

Not sure if the cron brought it back to life or if it was human intervention, but it's back up now [ 4 a.m. BST, July 22 ]

#6

Updated by Rogers, Chris almost 8 years ago

My monitoring indicates it went down at 15:49 and came back up at 03:04 (the cron job runs at 03:03). So it was the cron job. Humm.

#7

Updated by Rajaram, Durga almost 8 years ago

Down again.
Maybe it needs an hourly cron job or a background poll to kick-start it when it's down, or else a more permanent solution.

#8

Updated by Rogers, Chris almost 8 years ago

It's pretty crap if it can only last twelve hours before failing (it went down at 13:49 and I didn't get round to giving it a kick). I modified the cron job to run:

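#!/bin/bash
# Fetch the CDB front page; restart Tomcat if it does not respond.
# wget exits non-zero on connection failures and HTTP errors (e.g. 503).
if ! wget -q --output-document=/dev/null http://cdb.mice.rl.ac.uk/cdb/; then
    echo "restarting tomcat"
    /etc/init.d/tomcat5 restart
else
    echo "tomcat ok"
fi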

It's a bit of a hack, and we will certainly need something better before serious data taking. What about the MLCR instance of tomcat?
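One way the check above might be made a little less fragile is sketched below. None of this is from the thread: it assumes a single wget attempt with a 30-second timeout, so that a hung Tomcat cannot hold the check for wget's defaults of up to 20 tries and a 900-second read timeout, and it splits the logic into hypothetical functions `cdb_is_up` and `check_and_restart` so the restart command can be swapped out when testing.

```bash
#!/bin/bash
# Sketch of a hardened CDB check. Assumptions (not from the thread): one wget
# attempt with a 30 s timeout; hypothetical function names.

# Succeeds if URL gives a successful response (redirects are followed) within
# the timeout. wget exits non-zero on refused connections, timeouts and HTTP
# errors such as the 503 "Service Temporarily Unavailable" page.
cdb_is_up() {
    wget -q --tries=1 --timeout=30 --output-document=/dev/null "$1"
}

# Usage: check_and_restart URL [restart command...]
# The restart command defaults to the init script used in this thread.
check_and_restart() {
    local url="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- /etc/init.d/tomcat5 restart
    fi
    if cdb_is_up "$url"; then
        echo "tomcat ok"
    else
        echo "restarting tomcat"
        "$@"
    fi
}

# Example: check_and_restart http://cdb.mice.rl.ac.uk/cdb/
```

When run every five minutes from cron, the call could also be wrapped in flock(1), e.g. `flock -n /var/lock/tomcat_check.lock <script>`, so that a slow check never overlaps the next run; the lock path is likewise an assumption.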

#9

Updated by Rogers, Chris almost 8 years ago

I should say that I put it as an entry under /etc/cron.d:

*/5 * * * * root /usr/bin/tomcat_restart.bash &> /dev/null

#10

Updated by Rogers, Chris almost 8 years ago

I increased the postgres logging verbosity; it now logs every SQL query to the DB. Let's see if there is any rhyme or reason to the CDB OutOfMemory errors. (CDB itself should also log, but that requires code hacking which I am reluctant to do.)
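In PostgreSQL this kind of logging is normally enabled with `log_statement = 'all'` in postgresql.conf (assuming that is the setting changed here). To look for a pattern, the hypothetical helper below counts logged statements per hour. It assumes `log_line_prefix` starts with a `%t` or `%m` timestamp, so that each line begins `YYYY-MM-DD HH:MM:SS`; the hourly counts can then be compared with the times the server went down.

```bash
#!/bin/bash
# Sketch: count "LOG:  statement:" lines per hour in PostgreSQL log files.
# Assumption: each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp
# (log_line_prefix beginning with %t or %m). queries_per_hour is hypothetical.
queries_per_hour() {
    # Print "date HH:00" for each statement line, then count identical hours.
    awk '/LOG:  statement:/ { print $1, substr($2, 1, 2) ":00" }' "$@" | sort | uniq -c
}

# Example (log path is only an illustration):
#   queries_per_hour /var/lib/pgsql/data/pg_log/postgresql-Thu.log
```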

#11

Updated by Rogers, Chris almost 8 years ago

Note that the postgres log grew by 5 MB on Thursday, so there is a fair bit going on (probably mostly MAUS tests, maybe some C+M work as well?). There have been no further downtimes; whenever the restart script restarts Tomcat, it should leave a file like /tmp/tomcat_restart_DATE_TIME.
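A minimal sketch of how the restart branch could leave such a marker, and how the markers could be counted afterwards. The thread only gives the pattern `/tmp/tomcat_restart_DATE_TIME`, so the `YYYYMMDD_HHMMSS` format and the helper names `mark_restart` and `count_restarts` are assumptions.

```bash
#!/bin/bash
# Hypothetical helpers; the timestamp format is an assumption.

# Create DIR/tomcat_restart_YYYYMMDD_HHMMSS (DIR defaults to /tmp) and print its path.
mark_restart() {
    local dir="${1:-/tmp}"
    local marker
    marker="$dir/tomcat_restart_$(date +%Y%m%d_%H%M%S)"
    touch "$marker" && printf '%s\n' "$marker"
}

# Print how many restart markers exist in DIR (default /tmp).
count_restarts() {
    local dir="${1:-/tmp}"
    find "$dir" -maxdepth 1 -name 'tomcat_restart_*' -type f | wc -l | tr -d ' '
}
```

Calling `mark_restart` just before `/etc/init.d/tomcat5 restart` means `ls -l /tmp/tomcat_restart_*` lists every automatic restart with its time.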

#12

Updated by Martyniak, Janusz almost 8 years ago

A simple reload test of the CDB web service (WS) on Tomcat 7 reported a memory leak. Likely sources are DB handles and XML parsing, both of which we use. I'm preparing profiling tests for Tomcat/CDB to see which Java classes cause the leaks.
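The thread does not say which profiler was used. One low-tech option, sketched here as an assumption, is to take class histograms with the JDK's `jmap -histo:live <pid>` (which forces a full GC first, so only reachable objects are counted) before and after a batch of reloads, then compare them. The hypothetical `histo_growth` helper prints the classes whose instance counts grew most between the two snapshots.

```bash
#!/bin/bash
# Sketch: compare two `jmap -histo` snapshots and list the classes whose
# instance counts grew most. histo_growth is a hypothetical helper.
#
# Data lines in the histogram look like:
#    1:         45678        3456789  java.lang.String
# i.e. rank, #instances, #bytes, class name.
histo_growth() {
    local before="$1" after="$2" top="${3:-10}"
    awk '
        $1 !~ /^[0-9]+:$/ { next }                      # skip header and total lines
        FILENAME == ARGV[1] { count[$4] = $2; next }    # remember first snapshot
        { delta = $2 - count[$4]; if (delta > 0) print delta, $4 }
    ' "$before" "$after" | sort -rn | head -n "$top"
}

# Example (PID of the Tomcat JVM assumed known):
#   jmap -histo:live "$PID" > before.txt
#   ...reload the CDB web service several times...
#   jmap -histo:live "$PID" > after.txt
#   histo_growth before.txt after.txt 20
```

Classes that keep growing across repeated reloads (for example JDBC connection or XML parser classes) would be the first candidates to inspect in a full profiler.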
