Big biomedical data modeling for knowledge extraction with machine learning techniques

Cappelli, Eleonora

Use this identifier to cite or link to this document: http://hdl.handle.net/2307/40921

Title:	Big biomedical data modeling for knowledge extraction with machine learning techniques
Authors:	Cappelli, Eleonora
Advisor:	Torlone, Riccardo
Referees:	Elloumi, Mourad; Swiercz, Aleksandra
Keywords:	BIOINFORMATICS DATA STANDARDIZATION
Publication date:	20-Apr-2020
Publisher:	Università degli studi Roma Tre
Abstract:	Background. Over the last ten years, the volume of biomedical data produced daily by Next Generation DNA Sequencing (NGS) techniques has doubled every seven months. Genomics now plays a major role in the field of Big Data because of the large amount of biomedical data being produced, analyzed, and stored in many public databases. These data are currently stored by many different organizations, and their acquisition methods are highly distributed and involve heterogeneous formats. Methods. This dissertation addresses the problem of biomedical data heterogeneity by proposing new standardization methods and pipelines that make it easy to integrate genomic and clinical cancer data from different NGS experiments. Moreover, novel methods for querying these data are defined: (i) use cases of the GenoMetric Query Language, a high-level domain-specific query language, are presented to demonstrate the efficiency of the data standardization in terms of information retrieval; (ii) a new data model that minimizes the amount of redundant information is defined, allowing the creation of Application Programming Interfaces (APIs) for data retrieval; (iii) methods for discovering and querying large datasets through taxonomy-based methodologies are proposed. Finally, thanks to biomedical data standardization, machine learning techniques can be readily applied to the analysis and interpretation of genomic data. In particular, knowledge extraction experiments on big biomedical cancer datasets are shown, with promising performance and models.
Results. The main results of the dissertation are new software tools and methods: (i) OpenGDC, which automatically standardizes and extends genomic and clinical cancer data; the OpenGDC software is freely available at http://geco.deib.polimi.it/opengdc/, and a publicly accessible repository containing the homogenized and enhanced data (more than 1.5 TB) has been released; (ii) OpenOmics, which provides a flexible collection of Application Programming Interfaces (APIs), with a set of implemented endpoints available at http://bioinformatics.iasi.cnr.it/openomics/api/routes; and an ontological software layer that allows users to interact with experimental data and metadata without knowledge of their representation schema; (iii) new software pipelines for gene-oriented data preprocessing, and a large knowledge base of classification results (datasets, logic formulas, performance, and statistics) obtained by applying different machine learning algorithms to a large repository of publicly available RNA sequencing and DNA methylation cancer data; (iv) CamurWeb, a web service that aims to make the CAMUR machine learning software easily accessible and usable. Conclusions. The aim of the dissertation is to provide tools for the management and analysis of Big Biomedical Data and to define a framework for standardization, querying, and knowledge extraction from clinical and genomic data. The experimental results obtained confirm the soundness of the proposed approaches.
URI:	http://hdl.handle.net/2307/40921
Access rights:	info:eu-repo/semantics/openAccess
Appears in collections:	T - Doctoral theses

Files in this document:

File	Description	Size	Format
Tesi_Cappelli_Eleonora.pdf		4.22 MB	Adobe PDF


