Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering
Hans-Peter Kriegel, Peer Kröger, Arthur Zimek
Article No.: 1
As a prolific research area in data mining, subspace clustering and related problems have induced a vast quantity of proposed solutions. However, many publications compare a new proposition, if at all, with one or two competitors, or even...
Semi-analytical method for analyzing models and model selection measures based on moment analysis
Amit Dhurandhar, Alin Dobra
Article No.: 2
In this article we propose a moment-based method for studying models and model selection measures. By focusing on the probabilistic space of classifiers induced by the classification algorithm rather than on that of datasets, we obtain efficient...
Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms for frequent closed set mining are now available. Generalizing such a task to n-ary...
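For readers new to the binary-relation case that this abstract starts from, the following is a minimal brute-force sketch of frequent closed set mining. It is an illustration only: it enumerates every candidate set, so it is neither complete-and-efficient in the sense the abstract describes nor the n-ary generalization the article proposes. The function name frequent_closed_sets and the min_support parameter are hypothetical. A set is treated as closed when it equals the intersection of all transactions (rows of the binary relation) that contain it.

```python
from itertools import combinations

def frequent_closed_sets(transactions, min_support):
    """Brute-force sketch: return {closed_set: support} for every closed set
    whose support is at least min_support. Hypothetical helper, exponential cost."""
    rows = [frozenset(t) for t in transactions]      # one row per object
    items = sorted(set().union(*rows))               # all attributes in the relation
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            covering = [r for r in rows if candidate <= r]
            if len(covering) < min_support:
                continue                             # not frequent, skip
            # Closure = items shared by every row containing the candidate.
            closure = frozenset.intersection(*covering)
            if closure == candidate:
                result[candidate] = len(covering)
    return result
```

For example, with rows {a, b, c}, {a, b}, {a, c} and min_support 2, the set {b} is not closed because every row containing b also contains a, so only {a}, {a, b} and {a, c} are reported.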
DOLPHIN: An efficient algorithm for mining distance-based outliers in very large datasets
Fabrizio Angiulli, Fabio Fassetti
Article No.: 4
In this work, a novel distance-based outlier detection algorithm named DOLPHIN is presented. It works on disk-resident datasets, and its I/O cost corresponds to the cost of sequentially reading the input dataset file twice. It is both...
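To make the notion of a distance-based outlier concrete, here is a minimal in-memory sketch assuming one common definition: a point is an outlier if fewer than k other points lie within distance R of it. This is deliberately the naive nested-loop approach, not DOLPHIN; it does not read data from disk and does not achieve the two-scan I/O cost described in the abstract. The function name distance_based_outliers and the parameters k and radius are hypothetical.

```python
import math

def distance_based_outliers(points, k, radius):
    """Naive O(n^2) sketch (NOT DOLPHIN): return indices of points that have
    fewer than k other points within Euclidean distance `radius`."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = 0
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= radius:
                neighbors += 1
                if neighbors >= k:
                    break  # enough neighbors found: p is an inlier
        if neighbors < k:
            outliers.append(i)
    return outliers
```

On the points (0, 0), (0, 1), (1, 0), (1, 1), (10, 10) with k = 2 and radius = 1.5, only the isolated point at index 4 is reported. The early break is the same kind of pruning that lets practical algorithms stop examining a point once it is known to be an inlier.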
Bellwether analysis: Searching for cost-effective query-defined predictors in large databases
Bee-Chung Chen, Raghu Ramakrishnan, Jude W. Shavlik, Pradeep Tamma
Article No.: 5
How to mine massive datasets is a challenging problem with great potential value. Motivated by this challenge, much effort has concentrated on developing scalable versions of machine learning algorithms. However, the cost of mining large datasets...