Cloud Best Practices - Part 1
Moving to cloud computing is more than just a technical challenge: it has the potential to address the needs of the broadcaster’s business as a whole. And whether a broadcaster decides to move completely to the cloud or adopt a hybrid approach, best practices should be at the forefront of their minds.
Although cybersecurity is important for on-prem datacenters, the introduction of cloud computing has escalated how we must approach security. It’s not that security has never been an issue for broadcasters, as it certainly has; it’s just that with traditional broadcast infrastructures, media assets were physical devices that had to be obtained by breaking into a building or intercepting a delivery. Now criminals no longer need access to a physical asset stored on videotape, so we must be more vigilant, spend far more time thinking about security, and make sure the relevant systems are in place.
At the heart of cloud computing are scalability and flexibility, which in turn lead to dynamic computer systems that increase and decrease resources to meet the needs of the business. But this flexibility also has implications for resilience that must be addressed.
With cybersecurity and resilience at the heart of any broadcast cloud infrastructure, it soon becomes clear that the best practices adopted to meet these requirements must be built into the system at the beginning of the design, and not as an afterthought. Consequently, many of these principles are driven by the business requirements.
Business Continuity
The days when we could bury our heads in the sand and pretend we could build systems that don’t fail are well and truly gone. Instead, we must take a more pragmatic approach and assume something will go wrong, as today’s agile working practices do. This should be taken one step further: every component in each workflow should be analyzed, assumed to fail, and given some form of remedy to fix it when it does.
The same methodology applies to security. No matter how well designed a security system is, there is always a small chance that a breach will occur, and this is especially true for broadcasters, who are high-profile targets for international cybercriminals. Adopting a zero-trust approach will significantly reduce the risk of a security breach, but the very act of allowing users access to a system will always weaken it slightly.
Core to making any broadcaster’s system secure and resilient is business continuity: having a plan in place for every workflow that can fail. This also includes how a broadcaster backs up media assets, as care must be taken over how files are synchronized to the backing store. For example, if the on-prem storage is mirrored to the cloud storage and an on-prem file is infected with a virus, then at some point the infected file will be mirrored to the cloud and could compromise the backup. One way of alleviating this is to provide incremental backups to the cloud storage so that historic copies of the files can be retrieved. This is potentially an expensive method of working, as more storage will be used, but that cost must be weighed against the cost of losing the asset.
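The incremental approach can be sketched in a few lines of Python. This is a simplified, hypothetical illustration rather than a production backup tool (it is not tied to any cloud SDK): each backup run creates a new timestamped snapshot instead of overwriting the mirror, so a clean historic copy of an asset survives even if the latest on-prem file is infected.

```python
import os
import shutil
import time

def incremental_backup(src_dir, backup_root, keep=5):
    """Copy src_dir into a new timestamped snapshot directory.

    Unlike a straight mirror, older snapshots are retained, so a file
    that is later corrupted or infected on-prem can still be recovered
    from an earlier snapshot. Only the newest `keep` snapshots are kept
    to bound storage costs.
    """
    # Nanosecond timestamp gives each snapshot a unique, sortable name.
    snapshot = os.path.join(backup_root, str(time.time_ns()))
    shutil.copytree(src_dir, snapshot)

    # Prune the oldest snapshots beyond the retention limit.
    snapshots = sorted(os.listdir(backup_root))
    for old in snapshots[:-keep]:
        shutil.rmtree(os.path.join(backup_root, old))
    return snapshot
```

The `keep` parameter is where the cost trade-off described above lives: a larger retention window means more recoverable history, but more storage billed.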
Employing cloud technologies is an exercise in risk management, as there are many solutions for workflow resilience, storage resilience, and backup strategies. Moving to the cloud completely, or as part of a hybrid approach, therefore allows broadcasters to take a deep look into their business continuity needs and build the required system. In other words, the technical workflow requirements of cloud deployment work together with defining the business needs and parameters.
Availability Zones
Public cloud service providers use the concept of availability zones, and these can be thought of as hardware backup systems that not only provide resilience, but also have the potential to improve latency. For example, an AWS availability zone will consist of at least two physically separate buildings with mirrored infrastructure (from the point of view of the user) so that if one building fails, then the other will take up the load.
Within each availability zone, each facility has independent power, cooling, and networking infrastructure, and the facilities connect to each other using high-speed private networks with very low latency. Although facilities connect and share resources within an availability zone, and a region may have multiple availability zones, no availability zone is shared between regions.
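The resilience benefit of multiple zones comes from spreading workloads across them. The following short sketch (zone names and instance labels are invented for illustration, and real deployments would use the provider's placement tooling) shows the idea: with round-robin placement across zones, losing a single zone removes only a fraction of capacity.

```python
def spread_across_zones(instances, zones):
    """Assign instances to availability zones round-robin.

    With n instances over z zones, a single-zone failure takes out at
    most ceil(n / z) instances, rather than the whole deployment.
    """
    placement = {zone: [] for zone in zones}
    for i, instance in enumerate(instances):
        placement[zones[i % len(zones)]].append(instance)
    return placement
```

For example, five playout instances spread over two zones leaves three in one zone and two in the other, so neither zone's failure stops the service entirely.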
Load balancing solves the problem of scaling out resources while presenting a single destination IP address to the user. When trying to deliver greater speed for a web-type application, the natural tendency may be to scale up the webserver, that is, increase its processing and storage capacity. This is both expensive and cumbersome, as the server will always reach a natural limit, and when traffic demand is low the broadcaster will find themselves with a very expensive server sitting idle. Scaling out instead increases the number of moderately sized servers, usually through virtualization, so that capacity can also be reduced when traffic demand is low.
Availability zones facilitate scaling out of server and storage resources so machines can be added and removed as required. Furthermore, the load balancing service can itself scale out so that it does not become a single point of failure.
Figure 1 – Load balancing provides a single IP address for services running on multiple servers using RESTful methodologies. Load balancers can also be scaled out so they do not become a single point of failure.
Diversifying Infrastructure
A natural adaptation of availability zones is diversifying infrastructure. This works with both on- and off-prem systems as well as considering the use of multiple cloud service providers. A broadcaster could use one cloud provider for their main system with another cloud provider being available, but not used, for their secondary backup.
In the true agile method of working, scripts would be available that could spin up cloud provider two at a moment’s notice. This is not a trivial task, and maintaining two cloud infrastructures, even if only one is in use, is time consuming and demanding of DevOps time, but it does provide incredible resilience.
One of the challenges broadcasters face with this strategy is keeping the number of vendor-specific cloud resources to a minimum. For example, a database design may be specific to cloud provider one and incompatible with the equivalent database from cloud provider two, causing an incompatibility in the code base. Common APIs go a long way toward resolving this, but broadcasters cannot assume an SQL database working on one cloud provider will port to another.
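One common way to keep vendor-specific code to a minimum is to hide each provider behind a shared interface, so workflow code never touches a provider SDK directly and switching providers means swapping a single adapter. The following is a minimal, hypothetical sketch; the class names and the in-memory stand-in are invented for illustration:

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Vendor-neutral storage interface. Workflow code depends only on
    this, so moving between cloud providers means replacing one
    adapter class rather than rewriting the code base."""

    @abstractmethod
    def put(self, key, data):
        ...

    @abstractmethod
    def get(self, key):
        ...

class InMemoryStore(ObjectStore):
    """Stand-in adapter for illustration. A real deployment would wrap
    a specific provider's storage client behind the same two methods."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]
```

The discipline this buys is exactly the portability discussed above: the vendor-specific surface area is confined to the adapters, so the dual-provider strategy becomes a matter of maintaining two adapters rather than two code bases.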
Consequently, maintaining multiple cloud vendors requires a lot of effort, and this must be balanced against the costs involved.
Operating diversity within a broadcaster’s own infrastructure is much more straightforward, even with multi-vendor hardware, as they have more control over the equipment they procure. Even this has its challenges, as some equipment, such as network routers and switches, often carries vendor lock-in through engineer training and support. Providing diversity in the internal network through multi-vendor equipment is therefore not as straightforward as it may first seem.
Care must also be taken with external network providers, especially when considering diverse routing. Due to the logistical challenges of laying and installing cables across oceans, under roads, and through buildings, only a relatively small number of companies own and administer the physical cable. This leads to a business model where many different service providers may be sharing the same physical cable without realizing it. The broadcaster therefore cannot assume that the two network vendors chosen for diverse routing are using physically separate networks. Understanding the true diversity of network interconnectivity requires intensive due diligence as part of procurement so broadcasters can truly understand the risk they are taking.