Please use this identifier to cite or link to this item: http://hdl.handle.net/2307/40921
Title: Big biomedical data modeling for knowledge extraction with machine learning techniques
Author: Cappelli, Eleonora
Advisor: Torlone, Riccardo
Referees: Elloumi, Mourad
Swiercz, Aleksandra
Keywords: BIOINFORMATICS
DATA STANDARDIZATION
Issue Date: 20-Apr-2020
Publisher: Università degli studi Roma Tre
Abstract: Background. Over the last ten years, the amount of biomedical data produced daily by Next Generation DNA Sequencing (NGS) techniques has doubled every seven months. Genomics now plays a major role in the field of Big Data because of the large amount of biomedical data being produced, analyzed, and stored in many public databases. This data is currently stored by many different organizations, and its acquisition methods are highly distributed and involve heterogeneous formats. Methods. This dissertation addresses the problem of biomedical data heterogeneity by proposing new standardization methods and pipelines that make it easy to integrate genomic and clinical cancer data from different NGS experiments. Moreover, novel methods for querying the data are defined: (i) use cases of the GenoMetric Query Language, a high-level domain-specific query language, are presented to demonstrate the efficiency of the data standardization in terms of information retrieval; (ii) a new data model that minimizes the amount of redundant information is defined, allowing the creation of an Application Programming Interface (API) for data retrieval; (iii) methods for discovering and querying large datasets through taxonomy-based methodologies are proposed. Finally, thanks to biomedical data standardization, machine learning techniques can be easily applied to the analysis and interpretation of genomic data. In particular, knowledge extraction experiments on big biomedical cancer datasets are shown, with promising performance and models.
Results. The main results of the dissertation are new software tools and methods: (i) OpenGDC, which automatically standardizes and extends genomic and clinical cancer data; the OpenGDC software is freely available at http://geco.deib.polimi.it/opengdc/, and a publicly accessible repository containing the homogenized and enhanced data (more than 1.5 TB) has also been released; (ii) OpenOmics, which provides a flexible collection of Application Programming Interfaces (APIs), with a set of implemented endpoints available at http://bioinformatics.iasi.cnr.it/openomics/api/routes; an ontological software layer that lets users interact with experimental data and metadata without knowing their representation schema is also presented; (iii) new software pipelines for gene-oriented data preprocessing, and a large knowledge base of classification results (datasets, logic formulas, performance, and statistics) obtained by applying different machine learning algorithms to a large repository of publicly available RNA sequencing and DNA methylation data of cancer; (iv) CamurWeb, a web service that aims to make the CAMUR machine learning software easily accessible and usable. Conclusions. The aim of the dissertation is to provide tools for the management and analysis of Big Biomedical Data and to enable the definition of a framework for standardization, querying, and knowledge extraction from clinical and genomic data. The experimental results obtained confirm the soundness of the proposed approaches.
URI: http://hdl.handle.net/2307/40921
Access Rights: info:eu-repo/semantics/openAccess
Appears in Collections: T - Tesi di dottorato

Files in This Item:
File: Tesi_Cappelli_Eleonora.pdf
Size: 4.22 MB
Format: Adobe PDF

All items stored in DSpace are protected by copyright.