Adeegso tilmaantan si aad u carrabbaabdo ama ugu samayso link qoraalkan http://hdl.handle.net/2307/4551
Cinwaan: Pattern recognition and data mining in graphs and strings
Qore: Drovandi, Guido
Tifaftire: Bertolazzi, Paola
Dibueege: Apostolico, Alberto
Blazewicz, Jacek
Taariikhda qoraalka: 26-Mar-2011
Tifaftire: Università degli studi Roma Tre
Abstract: The objective of the dissertation is to study methods to manage and to retrieve relevant information in graphs and strings, contributing to the advancement of research in this promising field. In particular, we focus our attention on two types of data: graphs and strings. The first part of the dissertation deals with the problem of counting subgraphs, also called motifs. It was conjectured that, in some cases, the problem could be solved in polynomial time. However, we show that this is not true giving proofs of NP-completeness. We also propose a method to obtain an approximation of the number of occurrences of topological colored motifs, that are motifs in which each node has a different color. In the second part, we consider the compression problem of graphs and strings, in particular those related with sequences of integers. A new compression scheme for graphs is introduced. To obtain a compact representation of a graph, some particular subgraphs are exploited; however, we do not search explicitly for these subgraphs. The proposed scheme is suitable for the Web Graph, the graph of the URLs over the World Wide Web. To any extent, the proposed method allows to compress other kind of graphs in contrast with many previous works that need the URLs for the compression phase. We also introduce an extension of the graph compression method proposed that exploits the high number of reciprocal edges that characterize social networks. After the compression, the graph is represented by a string of characters that must be stored on the disk. A convenient way is to use prefix codes, that are variable length codes in which a code is not prefix of any other. We give the definition of a new prefix code that is suitable for the compression of a stream of integers following a power law distribution. We show analytically that this encoding follows the entropy of a zeta distribution (one of the most famous power law probability distribution) better than codes proposed in the literature. Besides, we propose a method to build prefix codes for an arbitrary probability distribution of the integers. To deal with this problem, we introduce a general prefix code and we study its properties. We test the method creating prefix codes for different distributions and comparing the compression results with codes presented in the literature. In this case the results are as good as, orbetter than those available in the literature. Moreover, we show that using the general code proposed in the previous chapter it is possible to give a common description to many of the prefix codes proposed in the literature. This result is possible thanks to the introduction of a new method to encode a finite set of integers. The last part deals with the problem of find information in huge sets of strings. A logic mining method is applied to generate logic formulas to distinguish among different classes of strings. The method is applied with success in the analysis of biological data, such as DNA sequences; in particular, it is able to find relevant genes that distinguish among Alzheimer diseased and healthy mice.
URI : http://hdl.handle.net/2307/4551
Xuquuqda Gelitaanka: info:eu-repo/semantics/openAccess
Wuxuu ka dhex muuqdaa ururinnada:X_Dipartimento di Informatica e automazione
T - Tesi di dottorato

Fayl ku dhex jira qoraalkan:
Fayl Sifayn BaacFayl
Pattern Recognition and Data Mining in Graphs and Strings.pdf856.72 kBAdobe PDFMuuji/fur
Muuji xogta qoraalka Ku tali qoraalkan

Page view(s)

156
Last Week
0
Last month
0
checked on Nov 21, 2024

Download(s)

65
checked on Nov 21, 2024

Google ScholarTM

Check


Dhammaan qoraallada lagu kaydiyay DSpace waxay u dhowrsanyihiin xuquuqda qoraha.