Author: Christian Timmerer
Affiliation: Alpen-Adria-Universität (AAU) Klagenfurt, Austria & Bitmovin Inc., San Francisco, CA, USA
The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects.
The MPEG press release comprises the following topics:
- Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level
- MPEG-G standards reach Committee Draft for metadata and APIs
- MPEG issues Calls for Visual Test Material for Immersive Applications
- Internet of Media Things (IoMT) reaches Committee Draft level
- MPEG finalizes its Media Orchestration (MORE) standard
At the end I will also briefly summarize what else happened with respect to DASH, CMAF, OMAF as well as discuss future aspects of MPEG.
Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level
The Committee Draft (CD) for CDVA has been approved at the 121st MPEG meeting, which is the first formal step of the ISO/IEC approval process for a new standard. This will become a new part of MPEG-7 to support video search and retrieval applications (ISO/IEC 15938-15).
Managing and organizing the quickly increasing volume of video content is a challenge for many industry sectors, such as media and entertainment or surveillance. One example task is scalable instance search, i.e., finding content containing a specific object instance or location in a very large video database. This requires video descriptors which can be efficiently extracted, stored, and matched. Standardization enables extracting interoperable descriptors on different devices and using software from different providers, so that only the compact descriptors instead of the much larger source videos can be exchanged for matching or querying. The CDVA standard specifies descriptors that fulfil these needs and includes (i) the components of the CDVA descriptor, (ii) its bitstream representation and (iii) the extraction process. The final standard is expected to be finished in early 2019.
CDVA introduces a new descriptor based on features which are output from a Deep Neural Network (DNN). CDVA is robust against viewpoint changes and moderate transformations of the video (e.g., re-encoding, overlays), it supports partial matching and temporal localization of the matching content. The CDVA descriptor has a typical size of 2–4 KBytes per second of video. For typical test cases, it has been demonstrated to reach a correct matching rate of 88% (at 1% false matching rate).
Research aspects: There are probably endless research aspects in the visual descriptor space ranging from validation of the achieved to results so far to further improving informative aspects with the goal to increase correct matching rate (and consequently decreasing the false matching rating). In general, however, the question is whether there’s a need for descriptors in the era of bandwidth-storage-computing over-provisioning and the raising usage of artificial intelligence techniques such as machine learning and deep learning.
MPEG-G standards reach Committee Draft for metadata and APIs
In my previous report I introduced the MPEG-G standard for compression and transport technologies of genomic data. At the 121st MPEG meeting, metadata and APIs reached CD level. The former – metadata – provides relevant information associated to genomic data and the latter – APIs – allow for building interoperable applications capable of manipulating MPEG-G files. Additional standardization plans for MPEG-G include the CDs for reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-4), which are planned to be issued at the next 122nd MPEG meeting with the objective of producing Draft International Standards (DIS) at the end of 2018.
Research aspects: Metadata typically enables certain functionality which can be tested and evaluated against requirements. APIs allow to build applications and services on top of the underlying functions, which could be a driver for research projects to make use of such APIs.
MPEG issues Calls for Visual Test Material for Immersive Applications
I have reported about the Omnidirectional Media Format (OMAF) in my previous report. At the 121st MPEG meeting, MPEG was working on extending OMAF functionalities to allow the modification of viewing positions, e.g., in case of head movements when using a head-mounted display, or for use with other forms of interactive navigation. Unlike OMAF which only provides 3 degrees of freedom (3DoF) for the user to view the content from a perspective looking outwards from the original camera position, the anticipated extension will also support motion parallax within some limited range which is referred to as 3DoF+. In the future with further enhanced technologies, a full 6 degrees of freedom (6DoF) will be achieved with changes of viewing position over a much larger range. To develop technology in these domains, MPEG has issued two Calls for Test Material in the areas of 3DoF+ and 6DoF, asking owners of image and video material to provide such content for use in developing and testing candidate technologies for standardization. Details about these calls can be found at https://mpeg.chiariglione.org/.
Research aspects: The good thing about test material is that it allows for reproducibility, which is an important aspect within the research community. Thus, it is more than appreciated that MPEG issues such a call and let’s hope that this material will become publicly available. Typically this kind of visual test material targets coding but it would be also interesting to have such test content for storage and delivery.
Internet of Media Things (IoMT) reaches Committee Draft level
The goal of IoMT is is to facilitate the large-scale deployment of distributed media systems with interoperable audio/visual data and metadata exchange. This standard specifies APIs providing media things (i.e., cameras/displays and microphones/loudspeakers, possibly capable of significant processing power) with the capability of being discovered, setting-up ad-hoc communication protocols, exposing usage conditions, and providing media and metadata as well as services processing them. IoMT APIs encompass a large variety of devices, not just connected cameras and displays but also sophisticated devices such as smart glasses, image/speech analyzers and gesture recognizers. IoMT enables the expression of the economic value of resources (media and metadata) and of associated processing in terms of digital tokens leveraged by the use of blockchain technologies.
Research aspects: The main focus of IoMT is APIs which provides easy and flexible access to the underlying device’ functionality and, thus, are an important factor to enable research within this interesting domain. For example, using these APIs to enable communicates among these various media things could bring up new forms of interaction with these technologies.
MPEG finalizes its Media Orchestration (MORE) standard
MPEG “Media Orchestration” (MORE) standard reached Final Draft International Standard (FDIS), the final stage of development before being published by ISO/IEC. The scope of the Media Orchestration standard is as follows:
- It supports the automated combination of multiple media sources (i.e., cameras, microphones) into a coherent multimedia experience.
- It supports rendering multimedia experiences on multiple devices simultaneously, again giving a consistent and coherent experience.
- It contains tools for orchestration in time (synchronization) and space.
MPEG expects that the Media Orchestration standard to be especially useful in immersive media settings. This applies notably in social virtual reality (VR) applications, where people share a VR experience and are able to communicate about it. Media Orchestration is expected to allow synchronizing the media experience for all users, and to give them a spatially consistent experience as it is important for a social VR user to be able to understand when other users are looking at them.
Research aspects: This standard enables the social multimedia experience proposed in literature. Interestingly, the W3C is working on something similar referred to as timing object and it would be interesting to see whether these approaches have some commonalities.
What else happened at the MPEG meeting?
DASH is fully in maintenance mode and we are still waiting for the 3rd edition which is supposed to be a consolidation of existing corrigenda and amendments. Currently only minor extensions are proposed and conformance/reference software is being updated. Similar things can be said for CMAF where we have one amendment and one corrigendum under development. Additionally, MPEG is working on CMAF conformance. OMAF has reached FDIS at the last meeting and MPEG is working on reference software and conformance also. It is expected that in the future we will see additional standards and/or technical reports defining/describing how to use CMAF and OMAF in DASH.
Regarding the future video codec, the call for proposals is out since the last meeting as announced in my previous report and responses are due for the next meeting. Thus, it is expected that the 122nd MPEG meeting will be the place to be in terms of MPEG’s future video codec. Speaking about the future, shortly after the 121st MPEG, Leonardo Chiariglione published a blog post entitled “a crisis, the causes and a solution”, which is related to HEVC licensing, Alliance for Open Media (AOM), and possible future options. The blog post certainly caused some reactions within the video community at large and I think this was also intended. Let’s hope it will galvanice the video industry — not to push the button — but to start addressing and resolving the issues. As pointed out in one of my other blog posts about what to care about in 2018, the upcoming MPEG meeting in April 2018 is certainly a place to be. Additionally, it highlights some conferences related to various aspects also discussed in MPEG which I’d like to republish here:
- QoMEX — Int’l Conf. on Quality of Multimedia Experience — will be hosted in Sardinia, Italy from May 29-31, which is THE conference to be for QoE of multimedia applications and services. Submission deadline is January 15/22, 2018.
- MMSys — Multimedia Systems Conf. — and specifically Packet Video, which will be on June 12 in Amsterdam, The Netherlands. Packet Video is THE adaptive streaming scientific event 2018. Submission deadline is March 1, 2018.
- Additionally, you might be interested in ICME (July 23-27, 2018, San Diego, USA), ICIP (October 7-10, 2018, Athens, Greece; specifically in the context of video coding), and PCS (June 24-27, 2018, San Francisco, CA, USA; also in the context of video coding).
- The DASH-IF academic track hosts special events at MMSys (Excellence in DASH Award) and ICME (DASH Grand Challenge).
- MIPR — 1st Int’l Conf. on Multimedia Information Processing and Retrieval — will be in Miami, Florida, USA from April 10-12, 2018. It has a broad range of topics including networking for multimedia systems as well as systems and infrastructures.