Semantic processing of multimedia data and applications

COLANGELO, FEDERICO

Please use this identifier to cite or link to this item: http://hdl.handle.net/2307/40546

Title:	Semantic processing of multimedia data and applications
Authors:	COLANGELO, FEDERICO
Advisor:	NERI, ALESSANDRO
Keywords:	NEURAL LEARNING MACHINE SEMANTIC
Issue Date:	4-Apr-2018
Publisher:	Università degli studi Roma Tre
Abstract:	Systems able to process multimedia data automatically are now of primary interest, since volumes of data that cannot be handled by human operators are generated every day. Managing such data becomes increasingly difficult when it must be processed according to human-level attributes: understanding human perception is a complex task, and it is difficult to implement algorithms that extract human-level attributes from multimedia data. Since many such attributes lack an analytical definition, it is desirable to learn how they emerge from the raw data in a data-driven fashion, mining associations between raw values and high-level attributes. However, multimedia data is characterized by high dimensionality, which causes numerical problems that hinder the performance of many state-of-the-art techniques. Novel models are thus being developed for the estimation of human-level attributes from multimedia data and, consequently, new attributes can be leveraged to perform complex tasks. In this context, the two most important research questions that we attempt to answer are how to estimate the semantic attributes of multimedia content and how to leverage them to build systems that can assist humans in harnessing the stream of data. In this thesis, the first question is addressed in the domain of audio surveillance. This task deals with the detection and classification of selected audio events in contexts characterized by non-relevant background events as well as unstructured noise, with critical constraints on the false rejection rate in the detection phase. A novel model for the classification and detection of critical audio events is proposed, based on the aforementioned requirements. The second question is addressed in the visual domain, more specifically in the context of unsupervised video orchestration. Here, a method is shown that combines different types of high-level attributes in order to enhance the quality of the viewers' experience. More specifically, the proposed method leverages frame-based estimation of aesthetic values, as well as automatic estimation of the quality of camera changes through a Markov model, combined through a multi-objective optimization algorithm. In both cases, the proposed methods show satisfactory results, contributing to the growing field of semantic data processing.
URI:	http://hdl.handle.net/2307/40546
Access Rights:	info:eu-repo/semantics/openAccess
Appears in Collections:	X_Dipartimento di Ingegneria T - Tesi di dottorato

Files in This Item:

File	Description	Size	Format
thesis_colangelo.pdf		2.98 MB	Adobe PDF


Page view(s):	236 (checked on Feb 24, 2026)
Download(s):	78 (checked on Feb 24, 2026)
