Author: Christian Timmerer, email@example.com
Affiliation: Alpen-Adria-Universität (AAU) Klagenfurt, Austria & Bitmovin Inc., San Francisco, CA, USA
The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.
The 126th MPEG meeting concluded on March 29, 2019 in Geneva, Switzerland with the following topics:
- Three Degrees of Freedom Plus (3DoF+) – MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video
- Neural Network Compression for Multimedia Applications – MPEG evaluates responses to the Call for Proposal and kicks off its technical work
- Low Complexity Enhancement Video Coding – MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development
- Point Cloud Compression – MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage
- MPEG Media Transport (MMT) – MPEG approves 3rd Edition of Final Draft International Standard
- MPEG-G – MPEG-G standards reach Draft International Standard for Application Program Interfaces (APIs) and Metadata technologies
The corresponding press release of the 126th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/126
Three Degrees of Freedom Plus (3DoF+)
MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video
MPEG’s support for 360-degree video (also referred to as omnidirectional video) is achieved using the Omnidirectional Media Format (OMAF) and Supplemental Enhancement Information (SEI) messages for High Efficiency Video Coding (HEVC). This combination uses the tiling feature of HEVC to implement 3DoF applications and services, e.g., users consuming 360-degree content with a head-mounted display (HMD). However, rendering flat 360-degree video may cause visual discomfort when objects close to the viewer are rendered. The interactive parallax feature of Three Degrees of Freedom Plus (3DoF+) will provide viewers with visual content that more closely mimics natural vision, but within a limited range of viewer motion.
At its 126th meeting, MPEG received five responses to the Call for Proposals (CfP) on 3DoF+ Visual. Subjective evaluations showed that adding the interactive motion parallax to 360-degree video will be possible. Based on the subjective and objective evaluation, a new project was launched, which will be named Metadata for Immersive Video. A first version of a Working Draft (WD) and corresponding Test Model (TM) were designed to combine technical aspects from multiple responses to the call. The current schedule for the project anticipates Final Draft International Standard (FDIS) in July 2020.
Research aspects: Subjective evaluations in the context of 3DoF+, and of immersive media services in general, are actively researched within the multimedia research community (e.g., ACM SIGMM/SIGCHI, QoMEX), resulting in a plethora of research papers. One apparent open issue is the gap between scientific/fundamental research on the one hand and standards developing organizations (SDOs) and industry fora on the other, which often address the same problem space but sometimes adopt different methodologies, approaches, tools, etc. However, MPEG (like other SDOs) often organizes public workshops, and there will be one during the next meeting, on July 10, 2019 in Gothenburg, Sweden, on “Coding Technologies for Immersive Audio/Visual Experiences”. Further details are available here.
Neural Network Compression for Multimedia Applications
MPEG evaluates responses to the Call for Proposal and kicks off its technical work
Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors, or image and video coding. The trained neural networks for these applications contain a large number of parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to a number of clients that use them in applications (e.g., mobile phones, smart cameras) requires a compressed representation of neural networks.
At its 126th meeting, MPEG analyzed nine technologies submitted by industry leaders as responses to the Call for Proposals (CfP) for Neural Network Compression. These technologies address compressing neural network parameters in order to reduce their size for transmission and to improve the efficiency of using them, while not reducing, or only moderately reducing, their performance in specific multimedia applications.
After a formal evaluation of submissions, MPEG identified three main technology components in the compression pipeline, which will be further studied in the development of the standard. A key conclusion is that with the proposed technologies, a compression to 10% or less of the original size can be achieved with no or negligible performance loss, where this performance is measured as classification accuracy in image and audio classification, matching rate in visual descriptor matching, and PSNR reduction in image coding. Some of these technologies also result in the reduction of the computational complexity of using the neural network or can benefit from specific capabilities of the target hardware (e.g., support for fixed point operations).
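To get a feel for where size reductions of this kind can come from, the following minimal sketch applies uniform scalar quantization to a list of weights that are assumed to be stored as 32-bit floats. It illustrates one generic technique only and is not a description of the technologies submitted to MPEG; the function names (`quantize`, `dequantize`, `size_ratio`) and the randomly generated weights are assumptions made for this example.

```python
import random


def quantize(weights, bits=8):
    """Uniformly quantize floats to signed integer codes of the given bit width."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero step size
    levels = 2 ** (bits - 1) - 1                   # e.g. 127 for 8 bits
    scale = max_abs / levels                       # quantization step size
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def size_ratio(bits, original_bits=32):
    """Size of the quantized weights relative to the original representation."""
    return bits / original_bits


# Hypothetical weights standing in for a trained layer.
random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(1000)]

codes, scale = quantize(weights, bits=8)
restored = dequantize(codes, scale)

# The reconstruction error of uniform quantization is bounded by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"relative size: {size_ratio(8):.0%}, max error: {max_error:.6f}")
```

With 8-bit codes the parameters alone shrink to 25% of their 32-bit size. Reaching 10% corresponds to an average of about 3.2 bits per weight, which would require coarser quantization or further steps such as pruning or entropy coding; whether the task performance survives such steps has to be measured per application, as the evaluation above did.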
Research aspects: This topic has been addressed already in previous articles here and here. An interesting observation after this meeting is that apparently the compression efficiency is remarkable, specifically as the performance loss is negligible for specific application domains. However, results are based on certain applications and, thus, general conclusions regarding the compression of neural networks as well as how to evaluate its performance are still subject to future work. Nevertheless, MPEG is certainly leading this activity which could become more and more important as more applications and services rely on AI-based techniques.
Low Complexity Enhancement Video Coding
MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development
MPEG started a new work item referred to as Low Complexity Enhancement Video Coding (LCEVC), which will be added as part 2 of the MPEG-5 suite of codecs. The new standard is aimed at bridging the gap between two successive generations of codecs by providing a codec-agile extension to existing video codecs that improves coding efficiency and can be readily deployed via software upgrade and with sustainable power consumption.
The target is to achieve:
- coding efficiency close to High Efficiency Video Coding (HEVC) Main 10 by leveraging Advanced Video Coding (AVC) Main Profile and
- coding efficiency close to upcoming next generation video codecs by leveraging HEVC Main 10.
This coding efficiency should be achieved while maintaining overall encoding and decoding complexity lower than that of the leveraged codecs (i.e., AVC and HEVC, respectively) when used in isolation at full resolution. This target has been met, and one of the responses to the CfP will serve as the starting point and test model for the standard, whose development is expected to be completed in 2020.
Research aspects: In addition to VVC and EVC, LCEVC is now the third video coding project within MPEG addressing requirements and needs going beyond HEVC. As usual, research mainly focuses on compression efficiency, but a general trend in video coding appears to favor software-based solutions over pure hardware coding tools. As such, complexity (at both encoder and decoder) and power efficiency are becoming important additional factors to take into account. Other issues are related to business aspects, which are typically discussed elsewhere, e.g., here.
Point Cloud Compression
MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage
MPEG’s Geometry-based Point Cloud Compression (G-PCC) standard addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is appropriate especially for sparse point clouds.
MPEG’s Video-based Point Cloud Compression (V-PCC) addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images with video compression techniques.
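The projection idea can be illustrated with a minimal sketch that orthographically projects integer 3D points onto a single plane and keeps the nearest depth per pixel. This is a simplified illustration under stated assumptions, not the V-PCC algorithm itself; the function name `project_to_depth_map`, the choice of the XY plane, and the sample points are hypothetical.

```python
def project_to_depth_map(points, width, height):
    """Orthographically project integer (x, y, z) points onto the XY plane.

    Each pixel stores the smallest z value (the nearest point) that maps to it;
    pixels that no point maps to stay None.
    """
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if 0 <= x < width and 0 <= y < height:
            current = depth[y][x]
            if current is None or z < current:
                depth[y][x] = z
    return depth


# Hypothetical sample points; two of them share the pixel (0, 0).
points = [(0, 0, 5), (0, 0, 2), (1, 0, 7), (2, 1, 3)]
depth_map = project_to_depth_map(points, width=3, height=2)

for row in depth_map:
    print(row)
```

The resulting 2D array can be treated like an image, which is what makes video compression tools applicable. Note that a single projection like this drops occluded points (here the point at depth 5), so the sketch should not be mistaken for a complete codec.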
G-PCC provides a generalized approach, which directly codes the 3D geometry to exploit any redundancy found in the point cloud itself and is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.
Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for presenting immersive volumetric data. The current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio of up to 10:1, and lossy coding with acceptable quality reaches ratios of up to 35:1.
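To put these ratios into perspective, the following back-of-the-envelope sketch computes per-frame sizes for a hypothetical point cloud. Only the 10:1 and 35:1 ratios come from the text above; the point count and the per-point layout (three 32-bit coordinates plus three 8-bit color components) are assumptions chosen for illustration.

```python
def raw_frame_bytes(num_points, bytes_per_point):
    """Uncompressed size of one point cloud frame in bytes."""
    return num_points * bytes_per_point


def compressed_bytes(raw_bytes, ratio):
    """Size after compression at the given ratio (e.g. 10 for 10:1)."""
    return raw_bytes / ratio


# Assumed layout: 3 x 32-bit coordinates + 3 x 8-bit color components.
BYTES_PER_POINT = 3 * 4 + 3 * 1
NUM_POINTS = 1_000_000  # hypothetical frame size

raw = raw_frame_bytes(NUM_POINTS, BYTES_PER_POINT)
lossless = compressed_bytes(raw, 10)  # up to 10:1, lossless intra-frame
lossy = compressed_bytes(raw, 35)     # up to 35:1, lossy at acceptable quality

print(f"raw:      {raw / 1e6:.2f} MB per frame")
print(f"lossless: {lossless / 1e6:.2f} MB per frame")
print(f"lossy:    {lossy / 1e6:.2f} MB per frame")
```

Under these assumptions a 15 MB frame shrinks to about 1.5 MB (lossless) or roughly 0.43 MB (lossy) in the best case, since both ratios are stated as upper bounds.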
Research aspects: After V-PCC, MPEG has now promoted G-PCC to CD but, in principle, the same research aspects are relevant as discussed here. Thus, coding efficiency is the number one performance metric, but coding complexity and power consumption also need to be considered to enable industry adoption. Systems technologies and adaptive streaming are actively researched within the multimedia research community, specifically ACM MM and ACM MMSys.
MPEG Media Transport (MMT)
MPEG approves 3rd Edition of Final Draft International Standard
MMT 3rd edition will introduce two aspects:
- enhancements for mobile environments and
- support for Content Delivery Networks (CDNs).
The support for multipath delivery will enable delivery of services over more than one network connection concurrently, which is specifically useful for mobile devices that can support more than one connection at a time.
Additionally, support for intelligent network entities involved in media services (i.e., Media Aware Network Entity (MANE)) will make MMT-based services adapt to changes of the mobile network faster and better. Because support for load balancing is an important feature of CDN-based content delivery, messages for DNS management, media resource update, and media request are being added in this edition.
Ongoing developments within MMT will add support for the usage of MMT over QUIC (Quick UDP Internet Connections) and support of FCAST in the context of MMT.
Research aspects: Multimedia delivery/transport is still an important issue, specifically as multimedia data on the internet is increasing much faster than network bandwidth. In particular, the multimedia research community (i.e., ACM MM and ACM MMSys) is looking into novel approaches and tools utilizing existing/emerging protocols/techniques like HTTP/2, HTTP/3 (QUIC), WebRTC, and Information-Centric Networking (ICN). One question remains, however: what is the next big thing in multimedia delivery/transport? We are currently in a phase where tools like HTTP adaptive streaming (HAS) have reached maturity, and the multimedia research community is eager to work on new topics in this domain.