Cloud Best Practices - Part 1
Moving to cloud computing is more than just a technical challenge; it has the potential to address the needs of the broadcaster’s entire business. And whether a broadcaster decides to move completely to the cloud or adopt a hybrid approach, best practices should be at the forefront of their minds.
Although cybersecurity is important for on-prem datacenters, the introduction of cloud computing has changed how we must approach security. It’s not that security was never an issue for broadcasters, as it certainly was; it’s just that with traditional broadcast infrastructures, media assets were physical devices that had to be obtained by breaking into a building or physically intercepting a delivery. Now we must be more vigilant, as criminals no longer need access to a physical asset stored on video tape. Consequently, we must spend much more time thinking about security and making sure the relevant systems are in place.
At the heart of cloud computing are scalability and flexibility, which in turn lead to dynamic computer systems that increase and decrease resources to meet the needs of the business. But this flexibility also has potential implications for resilience that must be addressed.
With cybersecurity and resilience at the heart of any broadcast cloud infrastructure, it soon becomes clear that the best practices adopted to meet these requirements must be built into the system at the beginning of the design, and not as an afterthought. Consequently, many of these principles are driven by the business requirements.
Business Continuity
The days when we could bury our heads in the sand and pretend we could build systems that don’t fail are well and truly gone. Instead, we must take a more pragmatic approach and assume something will go wrong, as today’s agile working practices already do. This should be taken one step further: every component in each workflow should be analyzed on the assumption that it will fail, with some form of remedy in place to fix it when it does.
The same methodology applies to security. No matter how well designed a security system is, there is always a small chance that a breach will occur, and this is especially true for broadcasters, who are high-profile targets for international cybercriminals. Adopting a zero-trust approach will significantly reduce the risk of a security breach, but the very act of allowing users access to a system will always weaken it slightly.
Core to making any broadcaster’s system secure and resilient is business continuity: have a plan in place for every workflow that can fail. This also includes how a broadcaster backs up media assets, as care must be taken with how files are synchronized to the backing store. For example, if the on-prem storage is mirrored to the cloud storage and an on-prem file is infected with a virus, then at some point it will be mirrored to the cloud storage and could infect the cloud backup. A method of alleviating this is to provide incremental backups to the cloud storage so that historic copies of the files can be retrieved. This is potentially an expensive method of working, as more storage will be used, but the cost must be weighed against the cost of losing the asset.
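The difference between a destructive mirror and an incremental backup can be illustrated with a minimal sketch. Here an in-memory dictionary stands in for cloud object storage (a hypothetical substitute for a provider’s object-versioning feature), showing how keeping historic versions lets a known-clean copy be retrieved after an infected file has been mirrored upwards:

```python
import time
from collections import defaultdict

class VersionedStore:
    """Toy object store that keeps every historic version of each file,
    mimicking incremental/versioned backups rather than a destructive mirror."""

    def __init__(self):
        # path -> list of (timestamp, data), oldest first
        self._versions = defaultdict(list)

    def put(self, path, data, timestamp=None):
        # Append a new version instead of overwriting the previous one.
        self._versions[path].append((timestamp or time.time(), data))

    def latest(self, path):
        return self._versions[path][-1][1]

    def restore_before(self, path, cutoff):
        """Return the newest version written before `cutoff` --
        e.g. the last known-clean copy before an infection was mirrored."""
        clean = [data for ts, data in self._versions[path] if ts < cutoff]
        return clean[-1] if clean else None

store = VersionedStore()
store.put("media/asset.mxf", b"clean content", timestamp=100)
store.put("media/asset.mxf", b"infected content", timestamp=200)  # virus mirrored up

# The mirror now holds the infected copy...
assert store.latest("media/asset.mxf") == b"infected content"
# ...but the historic version survives and can be restored.
assert store.restore_before("media/asset.mxf", cutoff=150) == b"clean content"
```

The storage cost is visible even in this sketch: every `put` grows the version list, which is the trade-off the broadcaster must weigh against the value of the asset.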
Employing cloud technologies is an exercise in risk management, as there are many solutions to workflow resilience, storage resilience, and backup strategies. Therefore, moving to the cloud completely, or as part of a hybrid approach, allows broadcasters to take a deep look at their business continuity needs and build the required system. In other words, defining the technical workflow requirements of a cloud deployment goes hand in hand with defining the business needs and parameters.
Availability Zones
Public cloud service providers use the concept of availability zones, which can be thought of as hardware backup systems that not only provide resilience but also have the potential to improve latency. For example, an AWS Region consists of multiple physically separate availability zones with mirrored infrastructure (from the point of view of the user), so that if one zone fails, another can take up the load.
Each availability zone has independent power, cooling, and networking infrastructure, and the zones within a region connect to each other over high-speed private networks with very low latency. Although availability zones connect and share within a region, no availability zone is shared between different regions.
Load balancing solves the problem of scaling out resources while presenting a single destination IP address to the user. When trying to deliver greater speed for a web-type application, the natural tendency may be to scale up the webserver, that is, to increase its processing and storage capacity. This is both expensive and cumbersome, as the server will always reach a natural limit. Also, when traffic demand is low, the broadcaster will find themselves with a very expensive server sitting idle. Scaling out instead increases the number of moderately sized servers, usually through virtualization, so that their number can be reduced again when traffic demand is low.
Availability zones facilitate scaling out of server and storage resources so more machines can be added and removed as required. Furthermore, the load balancing service itself can also scale out so that it doesn’t become a single point of failure.
Figure 1 – Load balancing provides a single IP address for services running on multiple servers using RESTful methodologies. Load balancers can also be scaled out so they do not become a single point of failure.
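The scale-out behavior described above can be sketched with a toy round-robin balancer: one entry point distributes requests across a pool of backends, and the pool can grow or shrink as demand changes. This is an illustrative assumption of the simplest possible policy, not how any particular cloud load balancing service is implemented:

```python
class RoundRobinBalancer:
    """Toy load balancer: presents a single entry point while distributing
    requests across a pool of backends that can grow or shrink."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._index = 0

    def next_backend(self):
        # Cycle through whatever backends are currently in the pool.
        backend = self.backends[self._index % len(self.backends)]
        self._index += 1
        return backend

    def scale_out(self, backend):
        # Demand rises: add another moderately sized server.
        self.backends.append(backend)

    def scale_in(self, backend):
        # Demand falls: remove a server from the rotation.
        self.backends.remove(backend)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
assert [lb.next_backend() for _ in range(4)] == [
    "10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.2"
]

lb.scale_out("10.0.0.3")  # a third server joins the rotation
```

In a real deployment the balancer itself would also be replicated, for exactly the reason given in the figure caption.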
Diversifying Infrastructure
A natural extension of availability zones is diversifying infrastructure. This applies to both on- and off-prem systems, and includes the use of multiple cloud service providers. A broadcaster could use one cloud provider for their main system, with another cloud provider available, but not used, as their secondary backup.
In the true agile method of working, scripts would be available that could spin up cloud provider-two at the drop of a hat. This is not a trivial task, and maintaining two cloud infrastructures, even if only one is being used, is time consuming and demanding of DevOps time, but it does provide incredible resilience.
One of the challenges broadcasters face with this strategy is keeping the number of cloud vendor-specific resources to a minimum. For example, a database design may be specific to cloud provider-one and not compatible with the equivalent database from cloud provider-two, which causes an incompatibility in the code base. Common APIs go a long way to resolving this, but broadcasters cannot assume an SQL database working on one cloud provider will port to another.
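One common way to keep vendor-specific code to a minimum is a thin abstraction layer: workflow code depends on a neutral interface, and each provider gets its own adapter behind it. The sketch below is hypothetical — the interface, class names, and in-memory adapters are illustrative stand-ins for real provider SDK wrappers:

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Vendor-neutral interface: workflow code depends on this,
    never directly on a specific cloud provider's SDK."""

    @abstractmethod
    def upload(self, key, data): ...

    @abstractmethod
    def download(self, key): ...

class ProviderOneStore(ObjectStore):
    """Stub adapter; a real one would wrap provider-one's SDK calls."""

    def __init__(self):
        self._objects = {}

    def upload(self, key, data):
        self._objects[key] = data

    def download(self, key):
        return self._objects[key]

class ProviderTwoStore(ProviderOneStore):
    """Stub adapter for the backup provider: same interface,
    different backend behind it."""

def archive_asset(store: ObjectStore, key, data):
    # Workflow code is written once, against the interface only.
    store.upload(key, data)
    return store.download(key) == data

# Failing over to the second provider then needs no workflow code changes.
assert archive_asset(ProviderOneStore(), "asset.mxf", b"essence")
assert archive_asset(ProviderTwoStore(), "asset.mxf", b"essence")
```

The limits of this pattern are exactly those noted above: where providers differ in semantics rather than syntax (for example, database consistency behavior), the adapter cannot paper over the gap.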
Consequently, maintaining multiple cloud vendors requires a lot of effort, and this must be balanced against the costs involved.
Operating diversity within a broadcaster’s own infrastructure is much more straightforward, even with hardware from multiple vendors, as broadcasters have more control over the equipment they procure. Even this has its challenges, as some equipment, such as network routers and switches, often carries vendor lock-in through engineer training and support. Providing diversity in the internal network through multi-vendor equipment is therefore not as straightforward as it may first seem.
Care must also be taken with external network providers, especially when considering diverse routing. Due to the logistical challenges of laying and installing cables across oceans, under roads, and through buildings, only a relatively small number of companies own and administer the physical cable, leading to a business model where many different service providers may be sharing the same physical cable without realizing it. Therefore, the broadcaster cannot assume that the two network vendors they have chosen for diverse routing are using physically separate networks. Understanding the true diversity of network interconnectivity requires the broadcaster to conduct intensive due diligence as part of their procurement, so they can truly understand the risk they are taking.