Please use this identifier to cite or link to this item:
http://hdl.handle.net/2307/40546
Title: | Semantic processing of multimedia data and applications | Authors: | COLANGELO, FEDERICO | Advisor: | NERI, ALESSANDRO | Keywords: | NEURAL LEARNING MACHINE SEMANTIC |
Issue Date: | 4-Apr-2018 | Publisher: | Università degli studi Roma Tre | Abstract: | Systems able to process multimedia data automatically are now of primary interest, because volumes of data that human operators cannot handle are generated every day. Managing this kind of data becomes increasingly difficult when it must be processed according to human-level attributes. Understanding human perception is a complex task, and it is difficult to implement algorithms that extract human-level attributes from multimedia data. Since many such attributes lack an analytical definition, it is desirable to capture how they emerge from the raw data in a data-driven fashion, mining associations between raw values and high-level attributes. However, multimedia data is characterized by high dimensionality, which causes numerical problems that hinder the performance of many state-of-the-art techniques. Novel models are thus being developed for the estimation of human-level attributes from multimedia data and, consequently, new attributes can be leveraged to perform complex tasks. In this context, the two most important research questions that we attempt to answer are how to estimate the semantic attributes of multimedia content and how to leverage them to build systems that can assist humans in harnessing the stream of data. In this thesis, the first question is addressed in the domain of audio surveillance. This task deals with the detection and classification of selected audio events in contexts characterized by non-relevant background events as well as unstructured noise, with critical constraints on the false rejection rate for the detection phase. A novel model for the classification and detection of critical audio events is proposed, based on the aforementioned requirements. The second question is addressed in the visual domain, more specifically in the context of unsupervised video orchestration.
Here, a method is shown that combines different types of high-level attributes in order to enhance the quality of the viewers' experience. More specifically, the proposed method leverages frame-based aesthetic value estimation, as well as automatic estimation of the quality of camera changes through a Markov model, combined through a multi-objective optimization algorithm. In both cases, the proposed methods show satisfactory results, contributing to the growing field of semantic data processing. | URI: | http://hdl.handle.net/2307/40546 | Access Rights: | info:eu-repo/semantics/openAccess |
Appears in Collections: | X_Dipartimento di Ingegneria T - Tesi di dottorato |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
thesis_colangelo.pdf | | 2.98 MB | Adobe PDF | View/Open |
Page view(s): 133 (checked on Nov 21, 2024)
Download(s): 62 (checked on Nov 21, 2024)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.