How to Plan your Disaster Recovery RPO and RTO Metrics
Are your organization's acceptable recovery times (RTO and RPO) agreed upon with business owners? This is not a decision to be made by IT. IT's role is to consult with the business, then implement and support the solution that meets the business requirements.
Equilibrium recommends interviewing the business owners during DR planning engagements to finalize a backup and recovery strategy as a team effort.
Don’t be surprised to see high expectations from the business owners or the board, especially when it comes to RPO (acceptable data loss). They probably assume everything is backed up and recoverable with no data loss in the event of a major disaster. I have seen many discussions and debates over RPO where application owners are not on the same page as IT. When it comes to RTO and RPO, a typical meeting starts with explaining the technical terms: RPO and RTO.
Recovery Time Objective (RTO)
Recovery Time Objective (RTO) is the time measured from a disaster event until the service is fully restored and functional for the end user. This is the period during which systems are down, in other words “the downtime”. The total expected recovery time (RTO) is the sum of all of the following steps.
From the time disaster is declared:
1. Recall backup media
2. Travel time for on-call engineers
3. Bring up infrastructure
4. Restore data
5. Bring up services
6. Configure application
7. Test and validate
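As a rough illustration of how the steps above add up, here is a minimal Python sketch. The durations are hypothetical placeholders, not recommendations, and the function names `estimate_rto` and `meets_target` are invented for this example. It also assumes the steps run one after another; steps that can overlap would shorten the total.

```python
from datetime import timedelta

# Hypothetical estimates for each recovery step; substitute your own figures.
RECOVERY_STEPS = {
    "Recall backup media": timedelta(hours=4),
    "Travel time for on-call engineers": timedelta(hours=2),
    "Bring up infrastructure": timedelta(hours=6),
    "Restore data": timedelta(hours=8),
    "Bring up services": timedelta(hours=2),
    "Configure application": timedelta(hours=3),
    "Test and validate": timedelta(hours=3),
}


def estimate_rto(steps):
    """Sum the step durations, assuming the steps run sequentially."""
    return sum(steps.values(), timedelta())


def meets_target(steps, target):
    """Return True if the estimated recovery time fits within the agreed RTO."""
    return estimate_rto(steps) <= target


total = estimate_rto(RECOVERY_STEPS)
print(f"Estimated RTO: {total.total_seconds() / 3600:.0f} hours")
print("Meets 24-hour target:", meets_target(RECOVERY_STEPS, timedelta(hours=24)))
```

With these placeholder numbers the steps total 28 hours, so a 24-hour RTO agreed with the business would not be achievable without shortening or overlapping some steps.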
It is best to have the debate and agree upon RPO and RTO proactively, before a disaster strikes. Once a disaster happens it is too late, and your organization is at risk of losing days of past work (data) and days of future operations (downtime).
Recovery Point Objective (RPO)
After recovery time is understood, business owners must next be educated on RPO.
Recovery Point Objective (RPO) is the maximum amount of recent data loss the business is willing to accept.
This is typically measured from the last backup (the one used for recovery) until the disaster event. For example, with a nightly backup, up to 24 hours of data could be lost, so the RPO would be 24 hours.
Data that was never backed up cannot be recovered. If the business requires a lower RPO, backups need to be more frequent and made with an appropriate solution. RPO also describes the worst-case scenario and requires offsite backups for the number to be accurate.
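The relationship between backup frequency and RPO can be sketched in a few lines of Python. This is only an illustration: the function names, the timestamps and the 4-hour target are hypothetical, and the sketch assumes the most recent backup is intact and available offsite.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_interval):
    """A disaster just before the next backup loses almost one full interval."""
    return backup_interval


def data_loss(last_good_backup, disaster_time):
    """Data lost in a specific incident: everything written since the last usable backup."""
    if disaster_time < last_good_backup:
        raise ValueError("disaster_time must not be earlier than last_good_backup")
    return disaster_time - last_good_backup


def meets_rpo(backup_interval, rpo_target):
    """Return True if the backup schedule can satisfy the agreed RPO."""
    return worst_case_data_loss(backup_interval) <= rpo_target


nightly = timedelta(hours=24)
rpo_target = timedelta(hours=4)  # hypothetical figure agreed with the business

print("Nightly backups meet a 4-hour RPO:", meets_rpo(nightly, rpo_target))
print("Hourly backups meet a 4-hour RPO:", meets_rpo(timedelta(hours=1), rpo_target))

# Hypothetical incident: last backup at 01:00, disaster at 17:30 the same day.
lost = data_loss(datetime(2024, 1, 10, 1, 0), datetime(2024, 1, 10, 17, 30))
print("Data lost in this incident:", lost)
```

The check uses the worst case rather than the average, because the RPO is a promise about the maximum loss the business will accept.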
Equilibrium has facilitated countless sessions covering RPO and RTO. We lead these meetings and help executives and IT reach a common understanding that is both affordable and achievable.
Don't assume you can skip the RTO/RPO discussion just because you have a virtual or redundant environment. Virtualization, vMotion, clustering and replication all help reduce RTO and RPO, but they cannot replace the discussion. The benefits of the discussion are many. For example, if the business cannot agree upon RTO and RPO limits, a robust technology can be leveraged to achieve a low RPO/RTO (real time, seconds, 1 hour, etc.) to cover all bases.
Equilibrium likes to take advantage of these meetings with key decision makers to discuss order-of-magnitude budget estimates. Budget is a major factor, and one of the intangibles, in the discussion.
Do you need help finalizing your RPO and RTO metrics for your applications? Let's Talk.
Author: Khaja Moizuddin
Diagrams: Richie Proca
Equilibrium IT Solutions, Inc.