Getting The Whole Picture With Volumetric Video Capture
Shooting at one of Japan’s largest photography studios at Sony’s headquarters in Tokyo.
With talk of an impending Metaverse, and with augmented reality graphics increasingly becoming part of mainstream television broadcasts and live eSports events, top graphics artists are looking past traditional 3D animation and virtual environments and turning to fully rendered 360-degree 2K and 4K video for a more captivating effect.
Indeed, the world of Volumetric Video Capture (VC) is taking off in a big way across the globe, allowing celebrities and sports stars to virtually appear on TV shows when they are not in the room. It’s also helping sports teams, fashion brands, creative agencies and feature film productions present characters in ways never seen before.
The first on-air example of VC on a live broadcast came during a Madonna performance at the 2019 Billboard Music Awards (designed by London-based production company Dimension Studio), although the technology was first demonstrated publicly at the 2016 CES Show. Madonna performed her track on stage alongside four volumetrically captured holograms of herself, integrated into the choreography.
A Technology Built For The Metaverse
Merging perfectly with the apps and environments being created for the Metaverse, which will be populated with digital representations of ourselves, VC is a technique that converts a three-dimensional space, object, or environment into moving video (or a still frame) in real time, using an array of cameras mounted around a subject. Once digitized, this captured object can be transferred to the web, mobile, or virtual worlds and viewed in 3D.
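Conceptually, each captured instant ends up as a textured 3D mesh that a player can render from any angle. Here is a minimal sketch of how such a frame might be represented in code; the structure and field names are illustrative assumptions, not Dimension’s or MRMC’s actual format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class VolumetricFrame:
    """One captured moment: a textured 3D mesh viewable from any angle."""
    vertices: np.ndarray  # (N, 3) xyz points reconstructed from the depth cameras
    faces: np.ndarray     # (M, 3) triangle indices forming the surface mesh
    uvs: np.ndarray       # (N, 2) texture coordinates into the color map
    texture: np.ndarray   # (H, W, 3) RGB texture stitched from the color cameras
    timestamp: float      # capture time in seconds, shared across the whole rig

# A volumetric clip is simply a time-ordered sequence of such frames; the
# player renders whichever viewpoint the user chooses at playback time.
clip: list[VolumetricFrame] = []
```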
This “free-viewpoint” video technology, as some call it, captures a specific moment in time and allows it to be rotated and viewed from any viewpoint. This is hugely valuable for editors and post-production professionals, who now have far more information (image data) to work with.
“Volumetric capture is a 3D video of a specific moment in time,” said Sara Gamble, Head of Volumetric Solutions at Mark Roberts Motion Control (MRMC), based in the UK. Gamble’s background is in acquisition and cameras at Nikon, which acquired MRMC in 2016; she now heads up MRMC’s special projects, including its growing VC initiative. “So what VC brings to sports analysis is the ability to recreate an exact moment in time, because it’s captured by video.”
Capturing this much image data has also been called an editor’s dream because there are so many angles and images to work with. It allows editors to choose from more than 100 camera angles of an image, either in post production or later down the delivery chain, and manipulate it in creative ways. And if you want to focus on a different point, or don’t think you got the shot quite right, you don’t have to reshoot the talent. It’s all there in the original VC material.
And what makes volumetric video interesting for end users is that the final product does not have a set viewpoint, so they can watch and interact with it from all angles by rotating it. This significantly enhances the viewer experience, heightening their sense of immersion and engagement.
VC Is Not VR
The difference between 360-degree video and volumetric video is the depth provided with volume. In a 360-degree video, users can only view the video from a single, constant depth. With volumetric video, the end-user can play director and control how far in or out they want to explore the scene.
Madonna performing at the 2019 Billboard Music Awards, dancing live with four volumetrically captured holograms of herself from different points in her career.
In the past, production teams have been forced to integrate 2D video into a virtual reality (VR) or augmented reality (AR) experience. Now that they can capture a 3D view of the object, the end-user can have a 1-on-1 experience right in their living room with an athlete, artist, or entertainer.
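One way to picture the difference: a 360-degree player exposes only rotation, while a volumetric player exposes position as well. The sketch below illustrates that distinction using assumed, simplified names; it is not any particular player’s API.

```python
from dataclasses import dataclass

@dataclass
class CameraPose:
    x: float      # virtual camera position
    y: float
    z: float
    yaw: float    # rotation about the vertical axis, in degrees
    pitch: float  # tilt up/down, in degrees

def pose_360(yaw: float, pitch: float) -> CameraPose:
    # 360-degree video: rotation only; position is locked at the capture point.
    return CameraPose(0.0, 0.0, 0.0, yaw, pitch)

def pose_volumetric(x: float, y: float, z: float,
                    yaw: float, pitch: float) -> CameraPose:
    # Volumetric video: the viewer also controls position, so they can dolly
    # in and out or walk all the way around the subject.
    return CameraPose(x, y, z, yaw, pitch)
```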
The demand is increasing as producers find new ways to use the technology. MRMC is working with Dimension Studio, which built the first VC studio in 2017. Dimension soon signed on as the first Microsoft Mixed Reality Partner, joining a new program that brings together companies that design, build, deploy, and operate mixed reality solutions. According to its website, the organizations in this program have advanced mixed reality offerings and expertise in cloud services, AI, IoT, and SaaS applications, and they are actively growing a mixed reality business.
VC Hits The Road
Citing increasing demand, Dimension Studio has now partnered with Nikon/MRMC on its Polymotion Stage system, a mobile VC studio that combines Dimension’s proprietary stitching software with a three-section expanding articulated lorry. Inside is a studio housing 106 tightly synchronized cameras arrayed on the walls, ceiling and even the floor: 53 DSLRs that record the color required for the .png texture map, and 53 infrared (IR) cameras that record depth and position in space for creating the VC mesh. In the mobile stage, an additional four cameras (2 IR and 2 RGB) can be placed on the floor shooting upwards, ensuring greater detail when capturing movements that require the head to face down. The rig is also equipped with state-of-the-art motion capture and prop tracking, offering a high level of accuracy.
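Each RGB camera is effectively paired with an IR depth camera, and frames from the two must line up in time before a mesh and texture can be built. Below is a hypothetical sketch of that pairing step; the timestamp format and sync tolerance are illustrative assumptions, not MRMC’s actual pipeline.

```python
SYNC_TOLERANCE_S = 0.001  # assumed: paired cameras fire within 1 ms of each other

def pair_frames(rgb_frames, ir_frames):
    """Match each color frame to the depth frame captured at the same instant.

    rgb_frames / ir_frames: lists of (timestamp, image) tuples from one
    RGB/IR camera pair. The RGB image feeds the .png texture map; the IR
    frame contributes the depth samples used to build that moment's mesh.
    """
    pairs = []
    for (t_rgb, color), (t_ir, depth) in zip(rgb_frames, ir_frames):
        if abs(t_rgb - t_ir) <= SYNC_TOLERANCE_S:
            pairs.append((color, depth))
    return pairs
```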
VC also involves audio, so there are four overhead microphones inside the truck for recording sound. Lavalier mics can also be incorporated to capture broadcast-quality sound, and directional sound recording is available as well.
The rig’s first outing was at the 2019 Open at Royal Portrush golf tournament in Northern Ireland, for Sky Sports. Dimension Studio provided staff on site to run the equipment and oversaw the cloud processing from inside the MRMC truck.
Having the expandable mobile unit on site allowed the show’s producers to capture 360-degree renditions of the players at a standard 30 fps (the system can also capture at 60 fps and above for specific requirements) as they warmed up before the event. Players stopped in and took a swing, and the moving images (output as MP4 files) were uploaded to the Microsoft Azure cloud, where the various camera feeds were stitched together and delivered to the client within 48 hours.
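The workflow hands captured takes to Azure for stitching. As a rough illustration of that hand-off, here is a minimal sketch that uploads one take to Azure Blob Storage using Microsoft’s azure-storage-blob Python client; the connection string, container and blob names are assumptions for the example.

```python
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Connection string, container and blob path are illustrative assumptions.
service = BlobServiceClient.from_connection_string("<your-connection-string>")
blob = service.get_blob_client(container="vc-captures",
                               blob="open/player01/take03.mp4")

with open("take03.mp4", "rb") as f:
    blob.upload_blob(f, overwrite=True)  # hand the take off for cloud stitching
```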
The MRMC Polymotion Stage mobile rig includes everything needed to capture VC video, which is then sent to the cloud for processing.
The Polymotion Stage truck and Dimension’s London studio each have a capture volume 8 feet in diameter. Most scenes can be accommodated within these volumes: the team can comfortably film one, two or three people together, or film them individually and composite them into the scene afterwards. For example, Dimension filmed over 30 actors and reassembled them in a Viking longboat for the 2019 short film “Virtual Viking - The Ambush.”
Perfecting The VC Workflow
In 2K or 4K, the system captures RAW data at 10 GB/s, and there is typically a lot of it, requiring huge amounts of cloud-based storage. Up to an hour of footage can be processed in one day, after which the data needs to be transferred onto a local server farm. The final processed 360-degree VC images can then be incorporated into TV coverage (linear and mobile) to add sizzle to the telecast. The VC images can be played back in multiple game engines, as well as in AR, VR, standard broadcast, and live TV.
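To put that 10 GB/s figure in perspective, a quick back-of-the-envelope calculation shows why cloud storage is essential:

```python
RATE_GB_PER_S = 10  # RAW capture rate quoted above

per_minute_gb = RATE_GB_PER_S * 60         # 600 GB for one minute of capture
per_hour_tb = RATE_GB_PER_S * 3600 / 1000  # 36 TB for a full hour of footage
print(f"{per_minute_gb} GB per minute, {per_hour_tb} TB per hour")
```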
The volumetric capture studio on board the Polymotion Stage truck includes 106 cameras, full lighting and audio.
Currently, this VC workflow can’t be done live (it takes 48 hours to render a scene), but there are reportedly people working on live playback. The issue is rendering all of the camera feeds as a single 3D scene in the cloud, which takes time.
The service is not cheap, as clients are charged a day rate for using the capture studio and then a per-minute cloud processing fee. However, without this workflow, the production team might not have been able to get all of the players’ images on screen in such a new and creative way. It was so successful with viewers that the Polymotion Stage system was used during the 2021 Open as well.
“We’re using 106 cameras now but we could add more if we wanted to change the dynamics,” said Gamble. “This allows us to capture a depth map as well as color. So, it’s a true 3D video representative of their size, movements and everything we need to produce a fully formed VC version of that person. The end result is a true likeness of the person, not just a 2D version.”
VC Experimentation In The Lab
Sony Corp. is also heavily involved with VC technology and is actively working with broadcasters as well as filmmakers to harness its potential power.
A single frame from a fully rendered volumetric video file can be manipulated in a variety of ways for creative effect.
“With the current omnidirectional visualization technology [e.g., virtual reality], you can view 360 degrees around you from a certain viewpoint by wearing a headset and moving your head to look around,” said Yoichi Hirota, a researcher at Sony Corporation’s Tokyo-based R&D Center Laboratory 09.
“However, it is not possible to move around objects and view them from behind,” he said. “This is the major difference from VR content created with CG. A more flexible point of view will be essential for immersing users in VR environments. In order to provide an immersive experience, it will also be necessary to improve the basic qualities of the video itself such as the resolution and framerate. This in turn will require the handling of larger amounts of data, so I believe we have a lot of work to do on both the video and display sides.”
MRMC’s Gamble agrees completely.
“When you talk about the metaverse, VC will certainly play a large role in how people interact in real time in a digital environment,” she said. “It’s bringing new opportunities to life that we’ve never seen before. For us VC is the future, but it’s just the beginning. Every time we do a shoot we learn more.”