Open Research Newcastle

Mining disjunctive patterns in biomedical data sets

thesis
posted on 2025-05-10, 08:02 authored by Renato Vimieiro
Frequent itemset mining is one of the most studied problems in data mining. Since Agrawal et al. (1993) introduced the problem, several advances, both theoretical and practical, have been achieved. In spite of that, there are still many unresolved issues to be tackled before frequent pattern mining can be claimed a cornerstone approach in data mining (Han et al., 2007). Here, we investigate issues related to: (1) the (un)suitability of frequent itemset mining algorithms to identify patterns in biomedical data sets; and (2) the limited expressiveness of such patterns, since, in the vast majority of cases, frequent itemsets are exclusively conjunctions. Our ultimate goal in this thesis is to improve methods for frequent pattern mining in such a way that they provide alternative, insightful solutions for mining biomedical data sets. Specifically, we provide efficient tools for mining disjunctive patterns in biomedical data sets. We tackle the problem of mining disjunctive patterns on three different fronts: (1) disjunctive minimal generators; (2) disjunctive closed patterns; and (3) quasi-CNF emerging patterns. We then propose three different algorithms, one for each task above: TitanicOR, Disclosed, and QCEP. While the first two aim for more descriptive patterns, the third is more predictive. These algorithms are proposed as an attempt to cover the different sources of data sets coming from biomedical research. TitanicOR is more suitable for identifying patterns in data sets containing physiological, biochemical, or medical record information. Disclosed was designed to exploit the characteristics of microarray gene expression data sets, which usually contain many features but only a few samples. Finally, QCEP is the only algorithm to consider data sets with class label information. We conducted experiments with both synthetic and real-world data sets to assess the performance of our algorithms. Our experiments show that our algorithms outperformed the state-of-the-art algorithms in each of those categories of patterns.

Year awarded

2012

Thesis category

  • Doctoral Degree

Degree

Doctor of Philosophy (PhD)

Supervisors

Moscato, Pablo (University of Newcastle); Berretta, Regina (University of Newcastle)

Language

  • en, English

College/Research Centre

Faculty of Engineering and Built Environment

School

School of Electrical Engineering and Computer Science

Rights statement

Copyright 2012 Renato Vimieiro
