Authors / Contributors: Luca Rossetto, Ivan Giangreco, Ralph Gasser and Heiko Schuldt; Affiliation: Department of Mathematics and Computer Science, University of Basel; Editors: Mathias Lux and Marco Bertini
vitrivr is an open-source retrieval system capable of processing multimedia documents such as images, videos, music, and 3D models. It supports a wealth of content-based features for multiple modalities and comes with a ready-to-use Docker image and user interface. We focus on the second version of the vitrivr stack, which comprises ADAMPro 2.0.0, Cineast 2.0.0, and the user interface Vitrivr NG 1.0.0. All components of vitrivr are available under an MIT license.
The vitrivr stack consists of three components: the database system ADAMPro, the retrieval engine Cineast, and the Vitrivr NG user interface. The following describes these components in more detail.
ADAMPro is an Apache Spark-based data management system for storing and retrieving large multimedia collections. It follows a modular architecture that stores logical metadata (i.e., contextual information) and content metadata (i.e., extracted feature vectors) in a polystore fashion on various storage engines. It provides a large set of index structures for fast similarity search, including Locality-Sensitive Hashing (LSH), Spectral Hashing (SH), Product Quantization (PQ), and the Vector Approximation-File (VA-File). Since similarity queries are often long-running, ADAMPro supports progressive queries that provide the user with streaming result lists by returning (possibly imprecise) results as soon as they become available. ADAMPro is accessible via a gRPC API.
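The idea behind progressive queries can be illustrated with a minimal Python sketch. This is not ADAMPro's implementation or API: the function name `progressive_knn`, the chunked linear scan, and the Euclidean distance are all assumptions chosen for illustration. The sketch scans the collection in chunks and yields the best results found so far after each chunk, so early result lists may be imprecise and later ones refine them.

```python
import heapq
import math
from typing import Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def progressive_knn(
    query: Vector,
    collection: List[Tuple[str, Vector]],
    k: int,
    chunk_size: int,
) -> Iterator[List[Tuple[float, str]]]:
    """Hypothetical helper: yield the k nearest neighbours found so far
    after each scanned chunk of the collection."""
    # Max-heap emulated with negated distances: entries are (-distance, doc_id).
    best: List[Tuple[float, str]] = []
    for start in range(0, len(collection), chunk_size):
        for doc_id, vector in collection[start:start + chunk_size]:
            distance = euclidean(query, vector)
            if len(best) < k:
                heapq.heappush(best, (-distance, doc_id))
            elif distance < -best[0][0]:
                # Closer than the current worst candidate: replace it.
                heapq.heapreplace(best, (-distance, doc_id))
        # Intermediate (possibly imprecise) result list, nearest first.
        yield sorted((-neg, doc_id) for neg, doc_id in best)


if __name__ == "__main__":
    data = [("a", [0.0, 0.0]), ("b", [5.0, 5.0]), ("c", [1.0, 0.0]), ("d", [0.5, 0.0])]
    for step, results in enumerate(progressive_knn([0.0, 0.0], data, k=2, chunk_size=2)):
        print(step, [doc_id for _, doc_id in results])
```

A client consuming such a stream can display the first list immediately and update the display as refined lists arrive, which is the user-facing benefit described above.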
Cineast is a modular content-based retrieval engine. It supports processing and retrieval of multimedia documents from several domains such as images, videos, music, and 3D models. It accomplishes this through its multitude of feature modules, each of which is responsible for a different aspect of similarity between documents. Cineast is written in Java and supports all major operating systems. It uses FFmpeg for media decoding, which enables it to process a wide range of multimedia document formats. Cineast is designed to use ADAMPro as a data store, but it can also be used without it, for example on installations where only extraction is performed or on deployments with small collection sizes. It can be accessed via a RESTful API or a WebSocket API.
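How several feature modules might contribute to one ranked result list can be sketched as follows. This is an illustration under stated assumptions, not Cineast's actual code (Cineast is written in Java): the weighted-average fusion, the function name `fuse_scores`, and the two toy modules are hypothetical. Each module is modelled as a function that returns a similarity score in [0, 1] per document for one aspect of similarity.

```python
from typing import Callable, Dict, List, Tuple

# A feature module maps a query to per-document similarity scores in [0, 1].
FeatureModule = Callable[[str], Dict[str, float]]


def fuse_scores(
    query: str,
    modules: Dict[str, Tuple[FeatureModule, float]],
) -> List[Tuple[str, float]]:
    """Hypothetical late fusion: weighted average of per-module scores.
    Documents missing from a module's result contribute 0 for that module."""
    total_weight = sum(weight for _, weight in modules.values())
    if total_weight <= 0:
        raise ValueError("at least one module needs a positive weight")
    combined: Dict[str, float] = {}
    for module, weight in modules.values():
        for doc_id, score in module(query).items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    ranked = [(doc_id, score / total_weight) for doc_id, score in combined.items()]
    # Highest fused score first; ties broken by document id for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy modules with fixed scores (assumed values, for illustration only).
def color_module(query: str) -> Dict[str, float]:
    return {"img1": 0.9, "img2": 0.2}


def edge_module(query: str) -> Dict[str, float]:
    return {"img1": 0.5, "img3": 0.8}


if __name__ == "__main__":
    modules = {"color": (color_module, 1.0), "edge": (edge_module, 1.0)}
    for doc_id, score in fuse_scores("example-query", modules):
        print(doc_id, round(score, 2))
```

Adjusting the weights lets a user emphasize one aspect of similarity, such as color, over another without changing the modules themselves.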
Vitrivr NG, the user interface of vitrivr, integrates a large variety of media types and allows users to apply different query modes to them. It uses the Angular framework and is written in TypeScript 2.1. Communication with Cineast takes place through the aforementioned RESTful and WebSocket APIs. The UI is designed to be modular and extensible so that it supports all media and query types currently supported by the underlying stack and can be easily adapted to specialized use cases. The primary responsibilities of Vitrivr NG are to enable a user to express a query and to present the retrieved results in a consistent and clearly arranged manner.
Getting started with vitrivr
There are several ways to deploy the three components and combine them into the full vitrivr stack. Depending on the requirements at hand, this can be done in either a local or distributed fashion, natively or virtualized. We provide detailed instructions on the existing deployment modes on vitrivr.org. To simplify the setup of the full vitrivr stack, we also provide a completely self-contained Docker image containing all components of the stack.
After installation, the next step is feature extraction with Cineast, which supports a wealth of features for image, audio, video and 3D content out of the box but can also be extended to support new ones. ADAMPro is then used to create index structures for efficient retrieval, and Vitrivr NG, the user interface, supports queries across multiple modalities. A step-by-step guide is available on vitrivr.org.
What can you do with vitrivr?
The vitrivr stack is a powerful and flexible framework for multimedia processing and retrieval across several modalities. It can be used to manage, search and browse large, heterogeneous multimedia collections. Due to its extensibility, it can be easily tailored to particular needs and applications. Its extensibility in terms of features allows software engineers and scientists to use it as a framework and playground to design, implement and test multimedia retrieval and management techniques for different, domain-specific applications. Potential use cases are:
Digital arts and media creation: In the creation of all art, but especially in larger digital art productions, management of assets is an important and non-trivial task. In the production of animated films, for example, a vast number of individual objects are required to make a final scene. These assets encompass all types of media. If not meticulously annotated, dealing with such a large set of media items from different domains becomes an almost insurmountable task. Similar problems occur when interacting with large repositories from external providers such as stock-image vendors, which still rely heavily on human-annotated keywords. A retrieval system with cross-media retrieval capabilities could simplify access to specific items of interest from such vast collections.
Cultural heritage: Nowadays, the documentation of our cultural heritage in museums, archives and libraries involves not only text and images but also more and more videos, audio and 3D models (e.g. in archeology). The amount of data owned by the different institutions is vast, and managing it becomes more challenging. In particular, the task of annotating this data, which is the key to classical retrieval approaches, becomes ever more daunting. In this vein, content-based retrieval facilitated by vitrivr may offer new ways to retrieve and browse such mixed collections of cultural heritage objects.
Additive manufacturing: Additive manufacturing is a novel technology that drives different domains in the industry. It is based on 3D model templates of real-world objects that are then assembled by a 3D printing device. Databases like Thingiverse offer an extensive repository of such templates which users can access and contribute to. Query-by-Example could offer a completely new approach to finding 3D models that fit one’s requirements and may complement the classical approach based on Boolean retrieval.
All in all, wherever large media collections must be handled and managed, vitrivr can be a potential fit. Hence, vitrivr is not just a ready-to-use software product but also a platform to build upon and to extend in order to satisfy domain-specific requirements and needs.
In this column, we provided an overview of the vitrivr stack and its capabilities. To the best of our knowledge, vitrivr is currently the only integrated retrieval solution that simultaneously supports audio, video, images, and 3D models. It is still under active development; the current focus is to integrate the different media types more tightly in order to increase the cross-modal retrieval capabilities. With the vitrivr stack, we hope to offer the community a basis for future research in many areas and domains of multimedia retrieval. Contributions are always highly welcome.
ROSSETTO, L.; GIANGRECO, I.; TANASE, C.; SCHULDT, H. vitrivr: A Flexible Retrieval Stack Supporting Multiple Query Modes for Searching in Multimedia Collections. In: ACM International Conference on Multimedia (ACM MM), 2016.