DSM Disaster Recovery as a Service

by Brian Banaszynski

Overview

DSM’s Disaster Recovery services provide recovery of IT infrastructure to support business operations in the event of a disaster that causes partial or total loss of the primary production site. DSM’s cloud service serves as the recovery site and replication target, and plays a key role in a business continuity plan by enabling Clients to quickly recover critical business applications after a disaster.

DSM’s Disaster Recovery systems use technology from providers such as Zerto and IBM (mainframe) to replicate client data continuously, so that systems can be recovered quickly after an unforeseen disruption.

 DSM custom-tailors its solutions to meet client needs, including almost any recovery time objective (RTO), recovery point objective (RPO), and compliance requirement. Even when the client is not facing a disaster, DSM DRaaS can add value by providing temporary servers that the client can spin up to accelerate a project’s time to release.

DSM’s data center facilities are designed with servers compartmentalized for greater security. With almost three decades of experience in high-risk weather geographies, DSM utilizes an inland data center network, and is one of the few service providers secure enough for the most security-minded state agencies.

Service Delivery

Using Zerto replication, with VPN encryption for data in transit, all in-scope servers will be replicated to DSM’s data center facility and stored powered down (at rest) on DSM’s SAN storage. The servers at rest can be spun up in accordance with the recovery point objective (RPO) and recovery time objective (RTO) requirements in the service agreement. DSM will provide a methodology for the initial seed of the data. Once the seed has been completed, a DSM engineer will import the seeded data onto DSM’s cloud storage and initiate replication. In the event of a “Declaration of Disaster” by the client, these replica machines will be brought online and set to active production mode within the agreed-upon RTO/RPO.
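
To make the RPO and RTO terms concrete, the following is a minimal sketch of how a recovery could be checked against the agreed targets. It is an illustration only, not part of DSM’s or Zerto’s tooling: the class and function names are hypothetical, and the 15-minute RPO and 4-hour RTO are example values, since actual targets come from the service agreement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets from the service agreement (values used below are illustrative)."""
    rpo: timedelta  # maximum tolerable window of data loss
    rto: timedelta  # maximum tolerable time to restore service


def meets_rpo(last_checkpoint: datetime, disaster_time: datetime,
              objectives: RecoveryObjectives) -> bool:
    """True if the gap between the last replicated checkpoint and the disaster fits the RPO."""
    return disaster_time - last_checkpoint <= objectives.rpo


def meets_rto(declared_at: datetime, online_at: datetime,
              objectives: RecoveryObjectives) -> bool:
    """True if the time from disaster declaration to servers online fits the RTO."""
    return online_at - declared_at <= objectives.rto


# Hypothetical targets and timestamps, for illustration only.
objectives = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
disaster = datetime(2024, 1, 10, 9, 0)

# Last checkpoint was 10 minutes before the disaster: within a 15-minute RPO.
print(meets_rpo(datetime(2024, 1, 10, 8, 50), disaster, objectives))  # True

# Servers came online 5.5 hours after declaration: outside a 4-hour RTO.
print(meets_rto(disaster, datetime(2024, 1, 10, 14, 30), objectives))  # False
```

The recovery time measured in a real test corresponds to what this document later calls the Recovery Time Actual (RTA).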

The Disaster Recovery site will be connected to the primary data center via a point-to-point link, as pictured in the figure on the following page.

 TYPICAL DRAAS DEPLOYMENT

 DSM Disaster Recovery services are built upon the philosophy that the chances of a successful recovery are increased by having a documented plan and testing it often. DSM staff will design a comprehensive recovery plan and partner with the client’s team for successful testing.

 

Service delivery methodology is described on the following pages.

SERVICE DELIVERY METHODOLOGY:

 

 Phase 1: Discover

The Discovery Phase focuses on gathering critical information required to develop an effective recovery strategy. DSM will work with the Client to specify business requirements and systems details for this engagement.  

 The steps required to complete Phase 1 are outlined below:

  • Validate DRaaS scope, RTO, and RPO objectives
  • Collect Zerto Configuration information
    • Utilize the DSM Zerto Implementation Guide
  • Review DRaaS requirements with Client
  • Conduct discovery of server(s) and dependencies
    • Firewall Rules
    • External DNS Records
    • Third Party integrations
  • Bandwidth requirements validation


Phase 2: Design

The goal of the Design Phase is to transform the data gathered in the Discovery Phase into a detailed blueprint of the recovery architecture. 

The steps required to complete Phase 2 are outlined below:

  • Identify business application tiers and establish server recovery priorities
    • Zerto VM Protection Groups
  • Define the recovery solution architecture
    • Replication Network
    • Failover Network
    • Boot Order
  • Create diagram of solution
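
The design elements above (application tiers, protection groups, and boot order) can be pictured as a small data model. The sketch below is an assumption-laden illustration and does not use Zerto’s API: `ProtectedVM`, `ProtectionGroup`, and `recovery_sequence` are hypothetical names, and the server names and tier numbers are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectedVM:
    name: str
    boot_order: int  # lower numbers boot first within their group


@dataclass
class ProtectionGroup:
    name: str
    tier: int  # lower tiers are recovered first
    vms: list[ProtectedVM] = field(default_factory=list)


def recovery_sequence(groups: list[ProtectionGroup]) -> list[str]:
    """Order VMs by group tier, then by boot order inside each group."""
    ordered = []
    for group in sorted(groups, key=lambda g: g.tier):
        for vm in sorted(group.vms, key=lambda v: v.boot_order):
            ordered.append(f"{group.name}/{vm.name}")
    return ordered


# Hypothetical inventory: identity services first, then the ERP database before its app server.
groups = [
    ProtectionGroup("erp", tier=2, vms=[ProtectedVM("erp-app", 2), ProtectedVM("erp-db", 1)]),
    ProtectionGroup("identity", tier=1, vms=[ProtectedVM("dc01", 1)]),
]
print(recovery_sequence(groups))
# ['identity/dc01', 'erp/erp-db', 'erp/erp-app']
```

Writing the priorities down in a structured form like this makes it easier to review the intended order with application owners before it is configured in the replication tooling.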

 

Phase 3: Deploy

During the Deploy Phase, all previous planning is put into action to build the solution and validate recoverability against the plan.

The steps required to complete Phase 3 are outlined below:

  • Client on-premises Zerto deployment
  • DRaaS infrastructure provisioning
  • Establish Site-to-Site Replication VPN
  • Pair Client site to DSM
  • Create Zerto Virtual Protection Groups (VPG)
  • Develop test plans and timelines
  • Zerto failover testing
  • Troubleshooting
  • Documentation

   

Phase 4: Post Implementation - Client Responsibility

The Post-Implementation Phase addresses lifecycle management of the recovery program. During this phase, DSM works with the Client to ensure that the solution keeps pace with changes in the environment and continues to meet the needs of the organization. 

 The steps required during Phase 4 are outlined below:

  • Monitor and report on changes to the production environment
  • Identify production-level changes impacting recovery
  • Regularly review and update server dependency mapping, server recovery procedures, and changes to the recovery configuration
  • Update restoration procedures, run-books, DR configurations, and other documentation
  • Develop and present recommendations and remediation plans

 Scope of Services

Under this agreement, DSM’s responsibility is to provide the infrastructure (compute and storage) needed to recover the agreed-upon critical servers and to give the Client team access to it.

 DSM also assumes responsibility for:

  • Service delivery regarding the host hardware and networking architecture
  • Providing monthly Microsoft Services Provider License Agreement (SPLA) licensing for Windows Server and SQL Server on live servers in the event of a disaster. Licensing costs apply only to servers running in production, not to servers at rest.

DSM’s Disaster Recovery solutions are intended to restore essential operations following a disruptive event, and include the activity scope and deliverables displayed in the following table.

IMPLEMENTATION DELIVERABLES:

 

| Activity / Deliverable | Description and Purpose |
| --- | --- |
| Project Schedule and Management | A project schedule will be created to establish a written delivery timeline that will be mutually agreed upon and confirmed at the start of the project. Weekly project status reports will be sent to the Client. |
| Solution As-Built Documentation | DSM will provide documentation of the solution, including architecture diagrams, configuration, support requests and escalation, division of responsibilities, and backup schedules including retention. |
| Restoration Procedures | DSM will document the restore procedures it uses to recover the Client’s in-scope servers and data to the recovery infrastructure. |
| Test and Disaster | DSM will execute restoration procedures after a disaster declaration and inform the Client once the recovery platform is accessible. The Client is responsible for all facets of the application layer including, but not limited to, documentation of restore processes and application updates to ensure DR performance. |
| Change Management | Change management is the responsibility of the Client. All changes to the production environment for the supported systems that require changes to the DR platform must be reported to DSM. All changes to the replication platform are the responsibility of DSM and will be coordinated with the Client to conform to the change management process. |
| Ongoing Replication Management | DSM will monitor replication to the DR site and provide daily success/failure status reports. DSM will assist the Client with remediation of any replication failures caused by DSM. Remediation of replication failures caused by the Client environment is billed to the Client at DSM’s time and materials rate. |

   

DSM Responsibilities

  1. DSM will enable one test failover per year, included in the monthly fee, and will provide one week for the Client to test and validate applications and services. Additional tests or additional testing time will be billed under the agreed-upon DR pricing schedule.
  2. The bandwidth estimate is based upon a 20% Client data change rate, which will be validated in Phase 1 of the project deployment.
  3. When new critical servers need to be protected, the Client will notify DSM so that the servers can be set up for replication. That work will be performed at an additional charge.
  4. DSM requires that the Client have one location for data replication, served by a connectivity partner that is on-net and routed to DSM’s data center, with sufficient connectivity at that location to replicate the critical servers at the estimated bandwidth. Otherwise, arrangements must be made to increase speed to recommended levels.
  5. DSM is responsible for delivery of the service specified in this work order; however, RTO and RPO can be impacted by many factors. The technologies identified in this proposal are capable of meeting the planned targets, but factors outside DSM’s control, such as LAN, WAN, connectivity, and change rates, could limit the ability to recover in the stipulated timeframe or to the targeted recovery point. The Recovery Time Actual (RTA) will be identified during validation testing.
  6. DSM scopes compute and storage capacity based upon Client-provided information. If the actual figures differ from what was provided, DSM reserves the right to revise the proposal to reflect the actual data.
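
The 20% change-rate assumption in item 2 can be turned into a rough bandwidth figure. The following is a minimal sketch under stated assumptions, not DSM’s sizing method: it assumes the change rate applies per 24-hour period (the period is not stated above), that changes are spread evenly across that window, and that no compression is applied. The function name, its parameters, and the 5,000 GB example are hypothetical.

```python
def required_mbps(protected_gb: float, change_rate: float = 0.20,
                  window_hours: float = 24.0, headroom: float = 1.0) -> float:
    """Estimate the sustained bandwidth (Mbps) needed to replicate changed data.

    protected_gb: total size of protected servers, in decimal gigabytes
    change_rate:  fraction of data that changes per window (ASSUMED per day)
    window_hours: length of the window the change rate applies to
    headroom:     multiplier for bursts and protocol overhead (1.0 = none)
    """
    if protected_gb < 0 or not 0 <= change_rate <= 1:
        raise ValueError("protected_gb must be >= 0 and change_rate within [0, 1]")
    if window_hours <= 0 or headroom <= 0:
        raise ValueError("window_hours and headroom must be positive")
    changed_megabits = protected_gb * change_rate * 8_000  # 1 GB = 8,000 megabits
    return changed_megabits / (window_hours * 3600) * headroom


# Illustrative: 5,000 GB protected, 20% daily change, no headroom.
print(round(required_mbps(5_000), 1))  # 92.6
```

Real change is rarely uniform over the day, so a figure like this is a floor for planning; the discovery-phase validation mentioned in item 2 is what establishes the actual requirement.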

 

Client Responsibilities

To facilitate provision of services, the Client agrees to the following responsibilities:

  1. The Client is required to have all servers on agreed-upon operating systems. If server operating systems are to change, the Client must notify DSM so that any necessary modifications can be made.
  2. The Client is responsible for all licensing not explicitly included as part of the service to the Client.
  3. The Client is responsible for all local backups.
  4. Initial and annual DR testing must be scheduled with DSM 60 days in advance. Once the test environment is made available, the Client will have one (1) week to complete testing.
  5. The Client is responsible for all application-level testing, updating, and validation.


