Use this identifier to cite or link to this document: http://hdl.handle.net/2307/40546
Title: Semantic processing of multimedia data and applications
Authors: COLANGELO, FEDERICO
Advisor: NERI, ALESSANDRO
Keywords: NEURAL
LEARNING
MACHINE
SEMANTIC
Publication date: 4 Apr 2018
Publisher: Università degli studi Roma Tre
Abstract: Systems that can process multimedia data automatically are of primary interest today, because data is now generated every day in volumes that human operators cannot handle. Managing this data becomes increasingly difficult when it must be processed according to human-level attributes. Understanding human perception is a complex task, and it is difficult to implement algorithms that extract human-level attributes from multimedia data. Since many such attributes lack an analytical definition, it is desirable to learn how they emerge from the raw data in a data-driven fashion, mining associations between raw values and high-level attributes. However, multimedia data is characterized by high dimensionality, which causes numerical problems that hinder the performance of many state-of-the-art techniques. Novel models are therefore being developed to estimate human-level attributes from multimedia data, and the attributes they make available can in turn be leveraged to perform complex tasks. In this context, the two most important research questions that we attempt to answer are how to estimate the semantic attributes of multimedia content and how to leverage them to build systems that assist humans in harnessing the stream of data. In this thesis, the first question is addressed in the domain of audio surveillance. This task deals with the detection and classification of selected audio events in contexts characterized by non-relevant background events and unstructured noise, with critical constraints on the false rejection rate in the detection phase. A novel model for the detection and classification of critical audio events is proposed, based on these requirements. The second question is addressed in the visual domain, specifically in the context of unsupervised video orchestration. Here, a method is presented that combines different types of high-level attributes to enhance the quality of the viewers' experience. The method combines frame-based estimation of aesthetic value with automatic estimation of the quality of camera changes through a Markov model, using a multi-objective optimization algorithm. In both cases, the proposed methods show satisfactory results, contributing to the growing field of semantic data processing.
URI: http://hdl.handle.net/2307/40546
Access rights: info:eu-repo/semantics/openAccess
Appears in collections: X_Dipartimento di Ingegneria
T - Tesi di dottorato

Files in this document:
File                  Size     Format
thesis_colangelo.pdf  2.98 MB  Adobe PDF
All documents archived in DSpace are protected by copyright. All rights reserved.