Ethernet Basics for Studio Video Over IP

Back in the day, the analog waveform monitor and vectorscope were the essential tools of the trade for video engineers. Fast-forward a few decades and signals that were once based on pulses have been replaced by digital SDI signals — and soon, those SDI signals will be replaced by Ethernet packets. With the new SMPTE ST 2110 standard for uncompressed IP video and audio about to come online, engineers need to learn all they can about the standard called Ethernet.

This article offers an overview of Studio Video over IP (SVIP) in the uncompressed domain using Ethernet, with the goal of making these concepts a bit easier for engineers to understand and, more importantly, to command.

Ethernet Unbundled

An Ethernet network carrying IP traffic is organized into subnetworks (subnets). Each device has a four-octet IP address (for example, the private address 192.168.1.10) paired with a network mask, or netmask, such as 255.255.255.0, which marks how much of the address identifies the subnet and how much identifies the individual device. The basics of any subnet are a common set of addresses that can “speak” with one another directly; with this netmask, that means any two addresses that share the same first three octets, e.g. 192.168.1.X. Here, the addresses from 192.168.1.1 to 192.168.1.254 are the usable addresses of a subnet called 192.168.1.0, traditionally known as a class C private network.
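The netmask arithmetic can be sketched in a few lines of Python. This is only an illustration of the example above: the helper names network_of and same_subnet are made up for this sketch, and the standard-library ipaddress module is used as a cross-check.

```python
import ipaddress


def network_of(address: str, netmask: str) -> str:
    """Return the subnet address by AND-ing each address octet with the mask octet."""
    addr_octets = [int(octet) for octet in address.split(".")]
    mask_octets = [int(octet) for octet in netmask.split(".")]
    return ".".join(str(a & m) for a, m in zip(addr_octets, mask_octets))


def same_subnet(first: str, second: str, netmask: str) -> bool:
    """Two addresses share a subnet when the masked results are identical."""
    return network_of(first, netmask) == network_of(second, netmask)


# Cross-check with the standard library: list the usable addresses of the subnet.
subnet = ipaddress.ip_network("192.168.1.0/255.255.255.0")
hosts = list(subnet.hosts())

if __name__ == "__main__":
    print(network_of("192.168.1.10", "255.255.255.0"))                   # 192.168.1.0
    print(same_subnet("192.168.1.10", "192.168.1.30", "255.255.255.0"))  # True
    print(same_subnet("192.168.1.10", "192.168.2.30", "255.255.255.0"))  # False
    print(hosts[0], hosts[-1], len(hosts))                               # 192.168.1.1 192.168.1.254 254
```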

The “ping” command on the Command Line Interface (CLI) operates by sending Internet Control Message Protocol (ICMP) Echo Request packets to the target host and waiting for an ICMP Echo Reply. To continue the 192.168.1.X example, if the computer with address 192.168.1.10 attempts to “ping” an address of 192.168.1.30, it will receive a response from that interface, provided a device is online at that address. However, if it attempts to “ping” 192.168.2.30, it will not get a response because the target is in a different subnet, unless another device on the network acts as the gateway between the two subnets.
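The decision a host makes before sending a packet, whether that packet is a ping or a video stream, can be modeled roughly as follows. This is a simplified sketch, not how any particular operating system implements it; the function name next_hop and the gateway address 192.168.1.1 are assumptions chosen for illustration.

```python
import ipaddress
from typing import Optional


def next_hop(source_interface: str, destination: str,
             gateway: Optional[str] = None) -> Optional[str]:
    """Return the address the packet is handed to, or None if it cannot be delivered.

    source_interface is written as address/prefix, e.g. "192.168.1.10/24".
    """
    interface = ipaddress.ip_interface(source_interface)
    target = ipaddress.ip_address(destination)
    if target in interface.network:
        # Same subnet: the packet goes straight to the target.
        return str(target)
    # Different subnet: only a gateway (if one is configured) can forward it.
    return gateway


if __name__ == "__main__":
    print(next_hop("192.168.1.10/24", "192.168.1.30"))                 # 192.168.1.30
    print(next_hop("192.168.1.10/24", "192.168.2.30"))                 # None
    print(next_hop("192.168.1.10/24", "192.168.2.30", "192.168.1.1"))  # 192.168.1.1
```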

Most networks are configured as Local Area Networks (LANs), but they can also have larger scope in the form of Wide Area Networks (WANs). Some companies connect themselves using a WAN to carry data from different subnets within their private network. A station in Denver might have a need to deliver its Ethernet data to a station in New York, for example. The Denver LAN subnet might be 192.168.1.0 but the one in New York might be 192.168.20.0. The WAN would make it possible for these two subnets to communicate.

Ethernet for Multicast Video

In the Ethernet unicast environment, one address acts as the sender and another address acts as the listener. However, the requirement for a sender to connect with multiple listeners calls for multicast.

Typically, SVIP uses IP multicast addresses ranging from 239.0.0.0 to 239.255.255.255, along with port numbers, to carry video data to multiple receivers. The rules of the subnet still apply: receivers and senders need to be in the same subnet for a stream to reach its listeners, unless the network is set up to route multicast between subnets. The protocol that joins the listener with the sender's stream is the Internet Group Management Protocol (IGMP). Ethernet switches support it through a feature called IGMP snooping, which lets the switch forward a multicast stream only to the ports whose receivers have asked for it.
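From the receiver's side, joining a multicast group looks roughly like the Python sketch below. The group address 239.1.1.10 and port 5004 are hypothetical values picked for illustration, and the helper names are made up. The socket calls themselves are the standard ones; when the operating system processes the IP_ADD_MEMBERSHIP option it sends the IGMP membership report that a snooping switch listens for.

```python
import ipaddress
import socket

GROUP = "239.1.1.10"  # hypothetical multicast group, chosen for illustration
PORT = 5004           # hypothetical UDP port, chosen for illustration


def in_multicast_range(address: str) -> bool:
    """True if the address falls in 239.0.0.0 to 239.255.255.255."""
    return ipaddress.ip_address(address) in ipaddress.ip_network("239.0.0.0/8")


def membership_request(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte request: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Open a UDP socket and join the group (requires a multicast-capable network)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what prompts the OS to send the IGMP membership report.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


if __name__ == "__main__":
    print(in_multicast_range(GROUP))       # True
    print(len(membership_request(GROUP)))  # 8
```

Calling open_receiver() and then sock.recvfrom() would deliver the datagrams of the stream, assuming a sender is actually transmitting to that group and port.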

Virtual Local Area Networks (VLANs) are used to keep traffic isolated and confined. Think of VLANs as small switches that can be used independently. For example, if there is a good reason to keep all of the video in one VLAN, this might be a way to separate the video from the audio and data. It might also be important to keep certain video confined to studios; in this manner, an entire studio might be kept in a separate VLAN even though the switch might also serve another studio or production area.
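The “small switches” idea can be modeled in a few lines. The port numbers and VLAN IDs below are hypothetical, and the sketch deliberately ignores trunk ports and any routing between VLANs; it only shows that a frame entering on one port can leave on ports in the same VLAN and nowhere else.

```python
# Hypothetical port-to-VLAN assignment on one physical switch:
# ports 1-2 serve Studio A video, ports 3-4 Studio B video, port 5 audio.
PORT_VLAN = {1: 10, 2: 10, 3: 20, 4: 20, 5: 30}


def reachable_ports(ingress_port: int, port_vlan=PORT_VLAN) -> list:
    """Return the ports a frame arriving on ingress_port may be forwarded to."""
    vlan = port_vlan[ingress_port]
    return sorted(port for port, vid in port_vlan.items()
                  if vid == vlan and port != ingress_port)


if __name__ == "__main__":
    print(reachable_ports(1))  # [2]
    print(reachable_ports(3))  # [4]
    print(reachable_ports(5))  # []
```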

Bandwidth Considerations

Ethernet, like analog video, started with rather small bandwidths carried on coaxial cable and twisted-pair wire technology. As the demand for Ethernet gathered momentum in the IT industry, the bandwidth capacity grew from 10 to 100 Mb/s, and then up to 1 Gb/s, primarily using twisted-pair cabling. When Ethernet took the next jump from 1 Gb/s to 10 Gb/s, fiber became more practical than copper twisted pairs. Today, Ethernet typically utilizes multimode fiber for runs under 350 feet (OM-3 or OM-4) and single-mode fiber for runs up to 10,000 feet (OS-1 or OS-2).

The ability to switch the signal as close to real time as possible is critical for live production events such as sports and news shows, and the low latency thresholds required by live broadcasting demand the use of uncompressed video. When the industry moves from SDI to IP, the use of uncompressed video over IP networks will be critical for achieving the close-to-real-time latency that we are used to with SDI.

In this new and emerging world of uncompressed SVIP, the payloads are the same as their coaxial counterparts (1.5 Gb/s for HD-SDI and 3.0 Gb/s for 3G-SDI) and are therefore driving the need for 10 Gb/s Ethernet ports. At this capacity, the port can accommodate more than one HD signal; in fact, it can handle up to six HD signals (1.5 Gb/s x 6 = 9 Gb/s) or three 3G signals (3.0 Gb/s x 3 = 9 Gb/s). On enterprise-grade 10 Gb/s Ethernet switches, the backbone is rated in terabits per second so that the switch has enough bandwidth to carry plenty of SD, HD, and 3G signals.
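The port arithmetic above can be written out as a short calculation. The function name streams_per_port is made up for this sketch, the rates are the rounded figures used in this article, and packet header overhead is ignored, so treat the results as a first-pass estimate rather than a capacity plan.

```python
PORT_MBPS = 10_000    # 10 Gb/s Ethernet port
HD_SDI_MBPS = 1_500   # nominal HD-SDI payload, as rounded in the text
SDI_3G_MBPS = 3_000   # nominal 3G-SDI payload, as rounded in the text


def streams_per_port(port_mbps: int, stream_mbps: int) -> int:
    """Whole number of streams that fit in the port (header overhead ignored)."""
    return port_mbps // stream_mbps


def headroom_mbps(port_mbps: int, stream_mbps: int) -> int:
    """Capacity left over once the port carries as many streams as fit."""
    return port_mbps - streams_per_port(port_mbps, stream_mbps) * stream_mbps


if __name__ == "__main__":
    print(streams_per_port(PORT_MBPS, HD_SDI_MBPS))  # 6
    print(streams_per_port(PORT_MBPS, SDI_3G_MBPS))  # 3
    print(headroom_mbps(PORT_MBPS, HD_SDI_MBPS))     # 1000
```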

The Foundation: OSI

The Open Systems Interconnection (OSI) model is the foundation for all Ethernet technologies; therefore, it’s important to explain how OSI is used in the application of SVIP. The first layer is the Physical (PHY) layer, the second is the data link layer, where Ethernet sits with its formal addressing scheme, and the third is the IP layer that defines all the rules that apply to the Internet Protocol.

OSI is a reference model for how applications can communicate over a network. The purpose of the reference model is to guide vendors and developers so the digital communication products and software programs they create will interoperate and to facilitate clear comparisons among communications tools.

The packet structure continues at the next layer up in the form of the User Datagram Protocol (UDP). UDP datagrams in turn carry packets defined by the Real-time Transport Protocol (RTP), used both to sequence and to time-stamp the audio, video, or data streams. RTP gives every packet a sequence number and a timestamp, so the receiver can put the packets comprising a given audio, video, or data stream back in the correct order and place each packet in time relative to the others. Once time-stamped, the streams have some context with respect to one another and can be distinguished by their time differences. This also creates a relationship between the video, audio and data streams so they can be synchronized.
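To make the sequencing and time-stamping concrete, here is a minimal Python sketch that reads the 12-byte fixed RTP header (as laid out in RFC 3550) from the front of a UDP payload. The names RtpHeader, parse_rtp_header and in_sequence_order are made up for this sketch, and the payload type 96 and other field values in the usage lines are arbitrary test data. The reordering helper ignores sequence-number wraparound at 65535, which a real receiver would have to handle.

```python
import struct
from typing import List, NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header from the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte header")
    # Network byte order: two single bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,            # top two bits of the first byte
        marker=bool(second & 0x80),    # top bit of the second byte
        payload_type=second & 0x7F,    # remaining seven bits
        sequence=sequence,
        timestamp=timestamp,
        ssrc=ssrc,
    )


def in_sequence_order(packets: List[bytes]) -> List[bytes]:
    """Sort received packets by sequence number (wraparound not handled)."""
    return sorted(packets, key=lambda packet: parse_rtp_header(packet).sequence)


if __name__ == "__main__":
    # Two made-up packets that arrived out of order.
    late = struct.pack("!BBHII", 0x80, 96, 8, 93_000, 0x1234) + b"second"
    early = struct.pack("!BBHII", 0x80, 96, 7, 90_000, 0x1234) + b"first"
    for packet in in_sequence_order([late, early]):
        header = parse_rtp_header(packet)
        print(header.sequence, header.timestamp)
```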

In Summary

While there are many facets to the use of Ethernet, only a few key pieces (IP, UDP, and the RTP time-stamped packets) relate to elementary audio, video, and data streams. Understanding this construction takes a good share of the sting out of learning about the new Video and Audio Over IP topologies.

The larger and more complicated piece is the network architecture that must consider aggregated payloads, software-defined networking (SDN), signal management, VLANs, and the host of best practices currently being used in the IT world.

Scott Barella is chief technology officer for Utah Scientific and a member of the board of directors for the Alliance for IP Media Solutions (AIMS), where he also serves as deputy chairman of the Technical Working Group.
