Cloud Broadcasting - Resilience

In the last article on Cloud Broadcasting we looked at the concepts of “Cloud Washed” and “Cloud Born” and the considerations vendors must weigh when delivering true cloud systems. In this article, we look more closely at resilience and cloud system uptime.

To get the best uptime from a cloud-based system, software should be based on the HTTP (Hypertext Transfer Protocol) client-server model and accessed through a web browser. One of the reasons web browsers have become so popular is that the application software lives on the server, which is under the control of the service provider, facilitating easier and more reliable software upgrades.

Service providers have more control over the back-end part of the software, such as database servers, and have the ability to spin up new instances and allocate resources to meet peak demand. Advances in web standards such as HTML5 and CSS give better graphics display and control handling.

Load Balancers

Cloud providers such as Amazon Web Services (AWS) take this model one step further and encourage the use of Load Balancers. These are a single point of entry for HTTP/IP traffic and work by splitting the messages between web servers. The load balancer keeps a record of TCP client-server connections so it knows where to send later packets belonging to the same connection.
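To make the idea concrete, the following is a minimal sketch of a load balancer that shares new connections between servers in turn and remembers where each existing connection was sent. It is an illustration only, not how AWS implements its load balancers; the class name `LoadBalancer`, the server names and the round-robin policy are all assumptions made for the example.

```python
from itertools import cycle


class LoadBalancer:
    """Toy load balancer: round-robin for new connections, sticky for known ones."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # endless round-robin over the servers
        self._connections = {}              # (client_ip, client_port) -> server

    def route(self, client_ip, client_port):
        """Return the server that should receive traffic for this connection."""
        key = (client_ip, client_port)
        if key not in self._connections:
            # New connection: hand it to the next server in the rotation.
            self._connections[key] = next(self._next_server)
        return self._connections[key]

    def close(self, client_ip, client_port):
        """Forget a connection once the client has disconnected."""
        self._connections.pop((client_ip, client_port), None)


# Hypothetical server names and documentation-range client addresses.
lb = LoadBalancer(["web-1", "web-2"])
first = lb.route("203.0.113.10", 50001)   # new connection
second = lb.route("203.0.113.11", 50002)  # another new connection
again = lb.route("203.0.113.10", 50001)   # same connection as `first`
```

Here the two new connections land on different servers, while the repeated lookup returns the same server as before because the connection is already in the table.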

Load balancers provide another valuable function: they allow servers to be physically separated across locations, thus improving resilience. AWS achieves this through its High Availability (HA) infrastructure. Essentially, two instances are created behind a load balancer and each server is in a different availability zone (AZ), defined by AWS as a datacentre on a different flood plain to other datacentres.

Different availability zones in regions.

AWS spreads its services throughout the globe, split by geographic area, giving resilience and localization for improved network access. Each region is completely independent and consists of multiple AZs, and each zone can be thought of as a datacentre. Although they are physically separated, the zones within a region are linked by high-speed, low-latency networks.

Smooth Software Upgrades

The locations of the datacentres are a closely guarded secret. A region may consist of more than two AZs; Virginia in the USA has four and Frankfurt in Germany has two. AZs are identified by names such as us-west-1a and us-west-1b for Northern California. Load balancers split traffic equally between zones within a region, and multiple servers can be enabled in each zone.

Another advantage of load balancers is that they provide a smooth process for software upgrades without any downtime. Servers are no longer upgraded in the traditional way. Once a software release is available, a new server is spun up with the appropriate operating system, the new software is installed on it, and the whole system is copied. Amazon refers to this copy as an Amazon Machine Image (AMI); creating a new server from this AMI will exactly clone the original.

Cloud Scaling

If we have a service running one instance in eu-central-1a and another in eu-central-1b, a third server running the new software could be spun up in eu-central-1a. Through the software dashboard, the first server in eu-central-1a will have all incoming traffic disabled, and when it has finished processing its current jobs it can be switched off. The same procedure is repeated for eu-central-1b, and when complete both original servers are deleted. The system is thus upgraded without any downtime, a procedure called “rip and replace” in AWS terms.
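The sequence can be sketched as a small simulation. This is a simplified model under stated assumptions rather than AWS tooling: the `Server` record and the `rip_and_replace` function are hypothetical names, and waiting for running jobs to finish is reduced to a single line.

```python
from dataclasses import dataclass


@dataclass
class Server:
    zone: str
    version: str
    accepting_traffic: bool = True
    active_jobs: int = 0


def rip_and_replace(pool, new_version):
    """Replace each old server in turn; return the lowest in-service count seen."""
    lowest_in_service = sum(s.accepting_traffic for s in pool)
    for old in [s for s in pool if s.version != new_version]:
        # 1. Spin up the replacement in the same zone before touching the old one.
        pool.append(Server(old.zone, new_version))
        # 2. Stop sending new traffic to the old server.
        old.accepting_traffic = False
        # 3. Stand-in for waiting until its current jobs have finished.
        old.active_jobs = 0
        # 4. Switch off and delete the old server.
        pool[:] = [s for s in pool if s is not old]
        in_service = sum(s.accepting_traffic for s in pool)
        lowest_in_service = min(lowest_in_service, in_service)
    return lowest_in_service


pool = [
    Server("eu-central-1a", "1.0", active_jobs=3),
    Server("eu-central-1b", "1.0"),
]
lowest = rip_and_replace(pool, "1.1")
```

Because each replacement is in service before its predecessor is drained, the number of servers accepting traffic never drops below the two we started with.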

AMIs form the basis of scaling within AWS. When a new server is needed, the application software simply spins up a new instance with the current AMI and then switches it online, making it available for use. Once user demand subsides, the application software simply deletes one of the server instances, leaving the vendor to pay only for the time the server was in use.
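As a rough illustration of the scaling decision, the sketch below works out how many instances are needed for a given load and how many instance-hours that adds up to. The function name `desired_instances`, the capacity of 200 requests per second per instance, the hourly demand figures and the minimum of two instances (one per zone) are all assumptions chosen for the example, not AWS values.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance, min_instances=2):
    """Instances needed for the load, never fewer than the resilience minimum."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, needed)


# Assumed demand for three consecutive hours: quiet, peak, quiet again.
hourly_demand = [100, 950, 400]
plan = [desired_instances(rps, capacity_per_instance=200) for rps in hourly_demand]

# The vendor pays per instance-hour, so cost follows the plan, not the peak.
instance_hours = sum(plan)
```

With these figures the plan runs two instances in the quiet hours and five during the peak, nine instance-hours in total, compared with fifteen if five servers were left running throughout.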

Cloud Washed software cannot take advantage of this automation and would instead rely on a developer or engineer to detect the peak demand, manually spin up new servers and enable them, and then remember to disable them once the peak demand has gone. Failing to do so will result in high cloud costs.

Load balancer providing resilience over different availability zones.

Cloud Born software is fully automated and will detect peak demand, spin up new servers and switch them off again, all without any human intervention. Usually, advanced monitoring and alarm systems are integrated into the software to make systems engineers aware of any changes. The cost of allocating additional resources is directly proportional to the demand placed on the system by its clients. Assuming the correct costing model has been adopted, the costs will be directly proportional to sales, with minimal overhead and setup costs.

In-built Monitoring

Users can easily transfer AMIs between zones in a region, allowing server instances to be launched quickly. However, if you need to move an AMI to another region, for example from Ohio to Singapore, then the transfer could take a few hours. By doing this, AWS is effectively discouraging users from moving AMIs between regions.

Load balancers are relatively intelligent and can detect whether an attached instance is healthy or not. If the server starts to drop packets, perhaps due to overloading or a software bug, the load balancer will detect this and stop sending it traffic. It will continue to test the server and start sending messages again once it recovers.
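One common way to implement this kind of check is to count consecutive probe results before changing a server's status, so that a single lost packet does not take a server out of service. The sketch below assumes that approach; the `HealthTracker` class and its thresholds of two failed probes to mark a server unhealthy and three successful probes to restore it are illustrative assumptions, not AWS defaults.

```python
class HealthTracker:
    """Tracks one server's health from a stream of probe results."""

    def __init__(self, unhealthy_after=2, healthy_after=3):
        self.unhealthy_after = unhealthy_after  # failed probes before removal
        self.healthy_after = healthy_after      # good probes before reinstatement
        self.healthy = True
        self._streak = 0  # consecutive probes that contradict the current state

    def record(self, probe_ok):
        """Record one probe result and return whether the server is healthy."""
        if probe_ok == self.healthy:
            self._streak = 0  # probe agrees with the current state
        else:
            self._streak += 1
            threshold = self.unhealthy_after if self.healthy else self.healthy_after
            if self._streak >= threshold:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy


tracker = HealthTracker()
probes = [True, False, False, True, True, True]
history = [tracker.record(ok) for ok in probes]
# history == [True, True, False, False, False, True]
```

The server is taken out of rotation after the second failed probe and only returns once three successive probes have succeeded, and the load balancer keeps probing throughout.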

Load balancers and availability zones provide a simple, cheap method of improving resilience in cloud infrastructures. Cloud Born systems take advantage of this to meet peak user demands and improve performance without human intervention, thus reducing costs and improving response times. Cloud Washed solutions can still take advantage of these systems, but they will be slower and more expensive because they depend on manual intervention.
