(Ping! Zine Web Hosting Magazine) – Today, most businesses rely heavily on computerized systems for day-to-day running. The more reliant businesses are on computerized systems, the larger the damage should they fail.
Fortunately, there are ways of making these systems more resilient, or at least quicker to restore in case of failure:
Ensuring high availability
Firstly, hardware redundancy is a great way of minimizing the effect a failure will have on your business; keep a spare for each hardware component.
Within a rack of servers, allow for two sources of power to feed the equipment in case one power source should fail. This will ensure mission-critical servers keep running provided the servers have dual power supplies themselves.
Duplicating network switches is another way to build in protection against failure. Most servers allow multiple network connections on a single network card, which can provide a backup route if one switch goes down; for full redundancy, however, a second network card is required.
Servers can also be grouped in a high-availability cluster, allowing the same application to run across multiple physical machines. A single server can then fail without taking the application down, and you can perform maintenance or upgrades on one machine without taking your applications offline.
When servers are clustered, data is usually stored on a centralized SAN. Accessible from all nodes, the SAN is configured with a redundant array of disks and often includes hardware redundancy across controllers, power supplies and network cards.
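To put rough numbers on the redundancy described above, the sketch below estimates how availability improves when a component is duplicated. It assumes failures are independent, which is optimistic for components that share a rack or a power feed, and the 99% figure is purely illustrative. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes failures are independent: the group is down only when
    every copy is down at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    return 1.0 - (1.0 - component_availability) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected downtime per (non-leap) year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # illustrative figure, not a measured value
    for copies in (1, 2, 3):
        combined = parallel_availability(single, copies)
        hours = expected_downtime_hours(combined)
        print(f"{copies} component(s): {combined:.6f} available, "
              f"about {hours:.2f} hours down per year")
```

Real components rarely fail independently, so treat the result as an upper bound and not as a prediction.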
But what happens when disaster strikes and wipes out the building your IT systems are in?
Planning for the worst
Although most enterprises think about redundancy and put measures in place to avoid outages, smaller businesses (and even larger ones) often overlook disasters and their consequences. What happens if your office or server room is no longer there?
Disaster recovery (DR), in this instance, depends on having a secondary DR site with either a complete or partial replica of your IT systems. Then, if the worst happens, you can switch to the backup system and be back online within a few hours with little data loss.
How is this achieved?
- Take nightly backups of your data and store them on a different site to your infrastructure.
- Create a second live environment and continuously mirror both your server configurations and the data stored on the SAN to it.
- A third option would be to adopt a cloud setup where, in the event of a disaster, the data center could migrate all the virtual machines over to another site. This would allow you to maintain almost zero downtime with little or no data loss.
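The first option, a nightly backup stored away from your infrastructure, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production tool: `offsite_dir` stands in for wherever your off-site storage is mounted, the function names are hypothetical, and scheduling (for example, a nightly cron job) is left out.

```python
import hashlib
import shutil
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def nightly_backup(source_dir: Path, staging_dir: Path, offsite_dir: Path) -> Path:
    """Archive source_dir, copy the archive off site and verify the copy.

    offsite_dir is assumed to be a mount point for storage at another
    site; here it is treated as an ordinary directory.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    staging_dir.mkdir(parents=True, exist_ok=True)
    offsite_dir.mkdir(parents=True, exist_ok=True)

    # Build a compressed, timestamped archive locally first.
    archive = staging_dir / f"backup-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(source_dir), arcname=source_dir.name)

    # Copy it off site, then confirm both copies are byte-identical.
    offsite_copy = offsite_dir / archive.name
    shutil.copy2(archive, offsite_copy)
    if sha256_of(archive) != sha256_of(offsite_copy):
        raise IOError("off-site copy does not match the local archive")
    return offsite_copy
```

A call such as `nightly_backup(Path("/srv/data"), Path("/var/backups"), Path("/mnt/offsite"))` would produce one verified archive per run; the paths shown are placeholders. Checking the checksum after copying catches a corrupted transfer, but only a periodic test restore proves the backup is actually usable.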
Maintaining high availability and redundancy, as well as planning for disaster recovery, is not just about the systems; it is also about your staff and how you will deploy them if there is no office to go to.
Have a DR plan in place; review and rehearse it regularly. This will allow everyone to be more prepared should anything happen. Also, review your business processes to work out which systems and people each one needs to keep running.
For companies of any size, just a few hours’ downtime is enough to lose a sizable amount of revenue and brand credibility, which can take a long time to recover. Planning for the worst is an essential part of ensuring your business’s success. And even if you think you have a business continuity plan, when was the last time you read it and did a dry run?
Tips for HA and DR
It is important to bear in mind that both of these processes are required for a fully resilient system. HA is good for small failures of individual components, whereas DR helps in the event of catastrophic failure.
- When planning your system, ensure you have at least an n+1 configuration
- Make sure you have redundant RAID in your servers and SANs (ideally RAID-6 or better)
- Ensure your hardware has redundant power supplies and controller cards where possible
- Use dual power feeds, fed from separate power sources (including at least one UPS)
- Cluster your servers and/or replicate data for minimal recovery time.
- Plan a backup strategy involving off-site backups. This can be achieved by backing up to tape and storing the tapes securely off site
- Either build a second complete environment in a geographically diverse location or build a cut-down system with only key mission-critical systems. Replicate your data to whichever of these you build
- Consider using cloud for your DR environment to save capital expenditure (cloud services are usually paid for monthly)
- Test your DR plan. Nominate key members of staff to carry out the procedure on an isolated network and test that core permissions and access work as designed. Load test where possible. This test should be carried out at least annually or preferably every six months
- Update your DR plan as your system grows and changes to ensure the plan is still viable.
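Two of the tips above come down to simple arithmetic, sketched here as a rough sizing aid. The sketch assumes that n+1 means one more server than peak load strictly requires, and that a RAID-6 array gives up the capacity of two disks to parity. The function names and the figures in the usage note are illustrative only.

```python
import math


def servers_needed_n_plus_1(peak_load: float, capacity_per_server: float) -> int:
    """Servers required to carry peak_load, plus one spare (n+1).

    Both arguments must use the same unit, e.g. requests per second.
    """
    if peak_load <= 0 or capacity_per_server <= 0:
        raise ValueError("load and capacity must be positive")
    return math.ceil(peak_load / capacity_per_server) + 1


def raid6_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID-6 array of equal-sized disks.

    RAID 6 stores two disks' worth of parity, so the array survives
    any two simultaneous disk failures.
    """
    if disk_count < 4:
        raise ValueError("RAID 6 needs at least four disks")
    return (disk_count - 2) * disk_size_tb
```

For example, a peak load of 900 requests per second on servers that each handle 250 gives `servers_needed_n_plus_1(900, 250) == 5`, and eight 4 TB disks in RAID 6 give `raid6_usable_tb(8, 4.0) == 24.0` TB of usable space.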
If you have further tips from your own experience, then tweet me on @David_4D
About the author
David Barker is the technical director of 4D Hosting, having founded the company in 1999 when he was 14. In 2007 he bought an industrial unit on the outskirts of London and set up 4D Data Centers as a colocation and connectivity supplier.
In 2013, 4D Hosting re-launched with a focus on providing premium hosting packages and 24/7 support from its own engineers to SME enterprises, business consultants and professionals.