Galin Iliev's blog

Software Architecture & Development

13 disasters for production web sites and their solutions by Omar Al Zabir

Omar Al Zabir, a great software developer and architect from Bangladesh, wrote an article about disasters that hit production web sites and how to solve them. It is very interesting because it describes how to deal with problems (and, even worse, disasters) in production.

Here is the list:

  1. Hard drive crashed, burned, got corrupted several times
  2. Controller malfunctions and corrupts all disks in the same controller
  3. RAID malfunction
  4. CPU overheated and burned out
  5. Firewall went down
  6. Remote Desktop stopped working after a patch installation
  7. Remote Desktop max connection exceeded. Cannot login anymore to servers
  8. Database got corrupted while we were moving the production database from one server to another over the network
  9. One developer deleted the production database accidentally while doing routine work
  10. Support crew at hosting service formatted our running production server instead of a corrupted server that we asked to format
  11. Windows got corrupted and was not working until we reinstalled
  12. DNS goes down
  13. Internet backbone goes down in different part of the world

Here is the article: 13 disasters for production web sites and their solutions

Note: You can also take a look at his website. It is very impressive, with a Win2k look and feel.