Leakage in data mining: Formulation, detection, and avoidance
Shachar Kaufman, Saharon Rosset, Claudia Perlich, Ori Stitelman
Article No.: 15
Deemed “one of the top ten data mining mistakes”, leakage is the introduction of information about the data mining target that should not be legitimately available to mine from. In addition to our own industry experience with real-life...
Summarizing data succinctly with the most informative itemsets
Michael Mampaey, Jilles Vreeken, Nikolaj Tatti
Article No.: 16
Knowledge discovery from data is an inherently iterative process. That is, what we know about the data greatly determines our expectations, and therefore, what results we would find interesting and/or surprising. Given new knowledge about the...
Triangle listing is one of the fundamental algorithmic problems whose solution has numerous applications especially in the analysis of complex networks, such as the computation of clustering coefficients, transitivity, triangular...
Multisource domain adaptation and its application to early detection of fatigue
Rita Chattopadhyay, Qian Sun, Wei Fan, Ian Davidson, Sethuraman Panchanathan, Jieping Ye
Article No.: 18
We consider the characterization of muscle fatigue through a noninvasive sensing mechanism such as Surface ElectroMyoGraphy (SEMG). While changes in the properties of SEMG signals with respect to muscle fatigue have been reported in the...
Substantial improvements in the set-covering projection classifier CHIRP (composite hypercubes on iterated random projections)
Leland Wilkinson, Anushka Anand, Tuan Nhon Dang
Article No.: 19
In Wilkinson et al. we introduced a new set-covering random projection classifier that achieved average error lower than that of other classifiers in the Weka platform. This classifier was based on an L∞ norm...