How to avoid and manage IT system failure
Coined over 50 years ago, the term ‘IT Disaster Recovery’ remains more relevant than ever as businesses and their customers increasingly depend on computing.
Advances in technology and software mean IT is much more reliable than it has been in the past, offering high-quality, resilient platforms. However, these platforms still depend on having the right resource to maintain systems and secure operating environments, which some small businesses cannot afford. This has driven many small businesses towards cloud-based and Software as a Service (SaaS) alternatives, which push responsibility for systems maintenance onto third-party providers. In turn, this has changed the nature of the IT risks faced by small businesses.
We have partnered with Inoni, business continuity experts, for a series of articles on business risk and continuity planning aimed at small to medium-sized businesses, including how to avoid and manage IT system failure.
Is IT system failure a risk to my business?
Many businesses rely on dedicated servers: networked computing resources that many individuals can access to collect, process and provide business-critical data. Servers can exist physically, sometimes on-premises, and increasingly virtually, in the cloud, in the form of rented computing power and storage.
Each route carries different types of risk. If you keep IT in-house, you retain control, but on the flip side, it’s up to your IT team to rectify local disruptions, errors and failures. Alternatively, by outsourcing IT services, you entrust an important business capability to third parties, whose own risks you stand to inherit.
How to reduce the risk of IT system failure
Clearly, no two organisations are the same – each with different business propositions, risk appetites and technology requirements – so there is no one-size-fits-all solution for reducing every small business’s IT systems risk to zero. However, the following may help:
- Keep an inventory of all IT-delivered services your business relies on, internal and external
- Identify which of these are business-critical and how long your business can survive without them
- Analyse how IT can fail for each critical service and estimate the worst-case time to repair
- Prepare business workarounds and messages for use during recovery
- Check you always have enough guaranteed expert IT resource to fix any IT failure within an acceptable time
- Apply best-practice methodologies for all in-house developed or changed software
- Think ahead: plan IT upgrade paths so you optimise resilience and capacity
- Design and build an exact live replica for all business-critical IT resources
- Aim for best-practice protection for all critical IT equipment (power, security, environments)
- Similarly, protect local and wide-area network equipment so IT remains accessible
- Keep IT spaces immaculately clean and free of non-essential materials and clutter
- Obtain Service Level Agreements (SLAs) for all externally-delivered services
- Include IT Disaster Recovery in your business continuity plan.
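For businesses that keep their inventory in a script or spreadsheet export, the first three steps above (inventory, criticality and worst-case repair time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Service` record, the `recovery_gaps` function, the service names and all the hour figures are hypothetical and would need replacing with your own data.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    external: bool                  # delivered by a third party?
    max_tolerable_hours: float      # how long the business can survive without it
    worst_case_repair_hours: float  # estimated worst-case time to repair


def recovery_gaps(services):
    """Return (name, shortfall_hours) for every service whose worst-case
    repair time exceeds the outage the business can tolerate, largest first."""
    gaps = [
        (s.name, s.worst_case_repair_hours - s.max_tolerable_hours)
        for s in services
        if s.worst_case_repair_hours > s.max_tolerable_hours
    ]
    return sorted(gaps, key=lambda gap: gap[1], reverse=True)


# Illustrative entries only; the names and hours are made up.
inventory = [
    Service("Email (hosted)", True, 8, 4),
    Service("Order system (in-house)", False, 4, 24),
    Service("Accounts package (SaaS)", True, 48, 72),
]

for name, shortfall in recovery_gaps(inventory):
    print(f"{name}: repair could overrun tolerance by {shortfall:g} hours")
```

Any service the sketch flags is one where a workaround, a replica or a stronger SLA is likely to be worth considering first.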
Read about how to manage cyber and data risks to your business.
How to respond to an IT system failure
It’s clear that a catastrophic IT failure can have an immediate and direct bearing on business continuity, and the two need a joined-up response. It can help to think of an IT system failure as an extended business continuity scenario with specialised tactics and procedures, including:
- If an IT failure is detected by any member of staff, escalate to IT, ideally via a 24x7 IT service desk
- Escalate from IT to senior management immediately, providing time to prepare
- Use prepared diagnostics or dashboards to identify the source of failure, if not self-evident
- Assess the time to repair, then adapt and communicate workaround instructions
- Manage communications with customers and all other interested parties
- Identify and apply resource recovery procedures (runbooks) and configuration data
- Fail-over to redundant or replica resources if available, or replace and deploy backups
- Contact and mobilise contracted vendors, hosting providers, agents or suppliers
- Relocate the computer room, re-equip and re-connect if necessary
- Restore IT services, test, and release to users. Manage the re-input of workaround data.
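As one possible form of the "prepared diagnostics" mentioned above, a small script can check which services still accept network connections, which helps narrow down the source of a failure. The sketch below uses only Python's standard `socket` module. The function names `is_reachable` and `unreachable_services` are hypothetical, and a successful connection only shows that a service is listening, not that it is working correctly.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


def unreachable_services(targets, timeout=3.0):
    """Given {label: (host, port)}, return the labels that did not respond."""
    return [
        label
        for label, (host, port) in targets.items()
        if not is_reachable(host, port, timeout)
    ]
```

To use it, pass a dictionary that maps a label for each business-critical service to its own host name and port, taken from your inventory. The labels returned are the services to investigate first.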
IT failures will always be a risk, but by being prepared, you can ensure your business is as well protected as possible.
We are working together with Inoni to bring you insight into resilience, risk and continuity planning to help make your business stronger. If you feel your business would benefit from specialist support to develop your Business Continuity Plan, please send an email to our partners Inoni, who can explain the services they offer.
If you feel you need support with your Business Insurance needs, please get in touch with your local NFU Mutual agency office.