Warning: Don’t shut down that Windows 2003 Server! May 1, 2006

Posted by Matsu in Information Technology, Microsoft, Technology, Windows/Microsoft.

Recently, I wrote this post about the Liebert 30kVA UPS at work that failed. It had 5-year-old batteries (30 of them) so the manufacturer recommended that we replace the batteries. If that did not fix the problem, then they would send a technician out to diagnose the problem (for only $1,000).

Well, today we installed the new batteries. Then, we shut down all of the servers (approximately 40 of them) to cycle the power and bring the UPS up normally. The UPS had been running in bypass mode all of this time.

The good news? We performed a clean shutdown of all servers without any problems.

The bad news? One of the Windows 2003 servers won't boot now. The server says that the operating system is missing some critical files (there appears to be some damage to or corruption of executables or driver files). It's just a mission-critical e-mail server for all employees, so it's not like we can just ignore the problem and leave it offline. So, right now my network engineers are working hard to find and replace the damaged or missing operating system files. We do have a clean backup of the boot volume, so we could restore from tape if we had to, but that's just such a pain.

Can someone please explain to me how Microsoft's best-in-class server technology can fail to boot after a clean power-down? What happened to the file system while it was powered off? I know, a million things can go wrong with hardware and software, but really, isn't the operating system more robust than this? It seems like I've gone back in time to the days of NT 4 and the 'blue screen of death'. Back then, the only fix was to reinstall the operating system. Since the auto-repair function doesn't seem to solve the problem, we may end up doing just that.

Grrrrrrr.

This just proves the point that even the most controlled change in a server room introduces risk. You can never assume that equipment will survive a reboot.

Comments»

1. Richie - May 10, 2006

99.9% Uptime Guarantee … with stats like that why bother with making a system capable of rebooting…

