The technological advancements of recent years have led to information systems pervading all areas of life and make it possible to gather large amounts of data conveniently and affordably. Key to our information society is the transformation of the mere data stored in these comprehensive databases into information and knowledge. One research area dedicated to this goal is data mining, whose task is to extract previously unknown patterns from such data sources automatically or semi-automatically. The subject of this thesis is the mining task of clustering, which aims to group objects by their similarity, so that similar objects are grouped together while dissimilar ones are separated.
Since modern storage systems no longer impose practical limitations, data can be captured in its full complexity instead of being restricted to a small, selected set of aspects. For such complex data, identifying just a single clustering is often not sufficient. Instead, multiple, alternative, and valid clusterings can be identified for a single dataset, each highlighting different aspects of the data. The paradigm of multi-view clustering, also referred to as alternative clustering, is dedicated to explicitly discovering such a diverse set of multiple, alternative clusterings in order to find all patterns hidden in the data.
A second observation for complex data sources, which usually store many characteristics for each object, is that similar objects often cannot be found when all of these characteristics are considered at once. While clustering based on all attributes, i.e., in the full space, is futile, valuable cluster patterns can be found for subsets of attributes, i.e., in subspace projections. This problem is tackled by approaches of the subspace clustering paradigm, which aim to uncover clustering structures hidden in subspace projections and automatically determine a set of relevant attributes for each cluster.
In this thesis, we highlight fundamental parallels between the two paradigms of multi-view clustering and subspace clustering, since both account for the possibility that objects belong to multiple clusters simultaneously. Building on these parallels, we present several approaches that exploit synergy effects by combining both paradigms to find multiple, alternative clusterings in subspace projections of the data.
Alternative Clustering in Subspace Projections
Modern storage systems make it possible to capture data in its full complexity. As an implication for the data mining task of clustering, multiple, alternative, and valid clusterings can be identified for a single dataset. A second observation is that clustering based on all attributes, i.e., in the full space, is futile, whereas valuable cluster patterns can be found for subsets of attributes.
This thesis contributes novel methods for detecting multiple, alternative clusterings in subspace projections of the data.