Backups and Disaster Recovery
The database is backed up in full every night, and the backups are replicated to a separate AWS Availability Zone, a distinct location engineered to be insulated from failures in other zones. The file system is also captured in a full nightly snapshot that is replicated to a separate Availability Zone. Seven days of snapshots are retained.
If a single machine fails, we recover the backup snapshot to a secondary system in the same facility. We use the same procedure to move instances for load optimization, so it is part of our routine operations rather than strictly a recovery operation. If an entire Availability Zone becomes unavailable, the copies are spun up in a different zone using current EC2 templates. Recovery audits are performed once a month, and an offsite DR playback is run twice a year. The playback entails setting up an Amazon template instance, recovering the content and database, and verifying production capability. Recovery time is dictated by the amount of instance data backed up.
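The seven-day retention rule can be illustrated with a short sketch. This is not the production tooling; the function name, the example dates, and the choice to treat a snapshot as expired once it is at least seven days old are assumptions made for illustration.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # "Seven days of snapshots" from the text above


def expired_snapshots(taken_at, now, retention=RETENTION):
    """Return the snapshot timestamps that have aged out of the retention window.

    Assumption: a snapshot expires once it is at least `retention` old.
    """
    cutoff = now - retention
    return sorted(t for t in taken_at if t <= cutoff)


# Hypothetical example: ten nightly snapshots, the newest taken "now".
now = datetime(2024, 1, 10, 2, 0)
nightly = [now - timedelta(days=d) for d in range(10)]
to_delete = expired_snapshots(nightly, now)
kept = [t for t in nightly if t not in to_delete]
print(len(kept), "kept;", len(to_delete), "expired")
```

With one snapshot per night, this keeps the seven most recent snapshots and flags anything older for deletion.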
Server Specific
Servers are set up in duplicate pairs, with each server of a pair residing in a separate zone. A load balancer instance in the first zone is configured for fail-over on top of the 'least connect' setting, which accounts for performance. Should a server fail, traffic is routed to the server that is up. Because the load balancer is itself a point of failure, a standby load balancer instance in the second zone rotates in to continue routing inbound traffic should there be a zone failure. This is managed by another server that monitors load balancer health and re-assigns the elastic IP address in an event-driven fail-over process. Downtime should be minimal: it consists of the time the monitor takes to detect that the load balancer is down plus the time AWS takes to respond to the elastic IP re-assignment request.
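The combination of fail-over and 'least connect' routing can be sketched as follows. This is a minimal illustration of the selection rule only, not the load balancer's actual configuration; the class, the server names, and the connection counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    zone: str
    healthy: bool = True
    active_connections: int = 0


def pick_backend(backends):
    """Choose the healthy backend with the fewest active connections.

    Unhealthy backends are skipped (fail-over); returns None if all are down.
    """
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.active_connections)


# Hypothetical duplicate pair, one server per zone.
pair = [
    Backend("web-a", "zone-1", active_connections=12),
    Backend("web-b", "zone-2", active_connections=7),
]
normal_choice = pick_backend(pair)    # web-b has fewer connections
pair[1].healthy = False               # simulate web-b failing
failover_choice = pick_backend(pair)  # all traffic now goes to web-a
```

Filtering on health first and applying least-connections second means a failed server never receives traffic, no matter how few connections it holds.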
File System Specific
Windows 2008 Server Shadow Copy is used to take snapshots of the file system twice a day. Up to 2 weeks of file versions are kept. This has been useful for restores in response to file or directory issues, whether user or server based. Historically, restores have been done when content pushed to the web servers broke the site or was published before its approved date for exposure to markets. In these cases, CrownPeak has restored a previous version of the site through shadow copy and locked the directory from further publishing until the event has been investigated and a solution put in place. Restores are completed within 20 minutes of the request being received, with the actual time depending on the amount of data in the directory being restored.
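Choosing which version to restore after a bad publish amounts to finding the latest snapshot taken before the incident. The sketch below illustrates that selection only; it does not call Shadow Copy, and the snapshot times (07:00 and 19:00) and incident time are hypothetical.

```python
from datetime import datetime, timedelta


def restore_point(snapshots, bad_publish_at):
    """Return the most recent snapshot taken strictly before the bad publish.

    Returns None if no retained snapshot predates the incident.
    """
    earlier = [s for s in snapshots if s < bad_publish_at]
    return max(earlier) if earlier else None


# Hypothetical schedule: two snapshots a day for 14 days, i.e. 28 versions.
start = datetime(2024, 3, 1, 7, 0)
snapshots = [start + timedelta(hours=12 * i) for i in range(28)]

incident = datetime(2024, 3, 5, 15, 30)  # content pushed in error
chosen = restore_point(snapshots, incident)
print("restore from", chosen)
```

With two snapshots a day, the restored version is never more than about twelve hours older than the bad publish, provided the incident falls inside the two-week window.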
Regional Fail-over for Web Hosting
Base fail-over is in place: the current web site configuration and data are backed up to a west coast S3 container. As an option, this data can be backed up in encrypted format. Golden Images of web servers and load balancers are stored on the east coast and can be ramped up and put in place, with customer IT being notified of the new elastic IP address to point to. Customers may choose to have that server run as the new primary rather than changing the record back.
Extended Regional Fail-over is available as an option. A more general solution that has been put in place for customers has servers in 2 regions, with DNS fail-over used to switch to the secondary systems. Should one region go down due to failure, a monitor picks it up and initiates an A record DNS change. With DNS set to a low TTL, propagation of the new record sends traffic to the backup server until the primary is brought back up, at which point the monitor sends the request to change the A record to point traffic back to the primary server.
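The monitor's behavior can be sketched as a small state machine that decides when the A record should change. This is an illustration under stated assumptions, not the monitor actually deployed: the class name, the threshold of three consecutive failed checks, and the IP addresses (taken from the reserved documentation ranges) are all hypothetical, and the DNS update itself is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class DnsFailover:
    primary_ip: str
    secondary_ip: str
    failure_threshold: int = 3  # assumed: consecutive failures before switching
    failures: int = field(default=0, init=False)
    active_ip: str = field(default="", init=False)

    def __post_init__(self):
        # Traffic starts on the primary server.
        self.active_ip = self.primary_ip

    def observe(self, primary_healthy):
        """Record one health check of the primary.

        Returns the new A record value if it must change, otherwise None.
        """
        if primary_healthy:
            self.failures = 0
            target = self.primary_ip  # fail back once the primary is up
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                target = self.secondary_ip
            else:
                target = self.active_ip  # not enough evidence yet
        if target != self.active_ip:
            self.active_ip = target
            return target
        return None


monitor = DnsFailover("192.0.2.10", "198.51.100.10")
checks = [True, False, False, False, False, True]
changes = [monitor.observe(ok) for ok in checks]
```

Requiring several consecutive failures before switching is a design choice that avoids flapping on a single missed check. The low TTL then bounds how long resolvers keep serving the old record after each change.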
CrownPeak also offers global DNS load balancing, or traffic management, where traffic is routed to each server based on geographic location. This can be added on top of the simple fail-over option, with the intent of having the second server act in a capacity above that of a warm standby. For DNS fail-over, CrownPeak can host the DNS for the web domains, or set up and host domains that the customer IT department would alias to in their domain record setup.
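Geographic routing layered on top of fail-over can be sketched as a lookup with a fallback. The region codes, addresses, and function name below are hypothetical, and a real traffic management service would derive the client's region from the resolver or client address rather than take it as an argument.

```python
def resolve(client_region, servers, healthy):
    """Return the server IP for a client's region, falling back if it is down.

    servers: mapping of region code to server IP (hypothetical values below).
    healthy: set of server IPs currently passing health checks.
    """
    preferred = servers.get(client_region)
    if preferred in healthy:
        return preferred
    # Fail-over: use any other healthy server, in configured order.
    for ip in servers.values():
        if ip in healthy:
            return ip
    return None


servers = {"NA": "192.0.2.10", "EU": "198.51.100.10"}
all_up = set(servers.values())

eu_normal = resolve("EU", servers, all_up)            # served from the EU server
eu_failover = resolve("EU", servers, {"192.0.2.10"})  # EU down, served from NA
```

Because both servers take live traffic in normal operation, the second server does more than a warm standby would, while still absorbing all traffic if the other region fails.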