Probabilistic Reframing for Cost-Sensitive Regression
Article No.: 17
Common-day applications of predictive models usually involve the full use of the available contextual information. When the operating context changes, one may fine-tune the by-default (incontextual) prediction or may even abstain from predicting a...
MDL4BMF: Minimum Description Length for Boolean Matrix Factorization
Pauli Miettinen, Jilles Vreeken
Article No.: 18
Matrix factorizations—where a given data matrix is approximated by a product of two or more factor matrices—are powerful data mining tools. Among other tasks, matrix factorizations are often used to separate global structure from...
Feature selection is widely used in preparing high-dimensional data for effective data mining. The explosive popularity of social media produces massive and high-dimensional data at an unprecedented rate, presenting new challenges to feature...
Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees
Matteo Riondato, Eli Upfal
Article No.: 20
The tasks of extracting (top-K) Frequent Itemsets (FIs) and Association Rules (ARs) are fundamental primitives in data mining and database applications. Exact algorithms for these problems exist and are widely used, but their running time...
We examine the problem of identifying social circles, or sets of cohesive and mutually aware nodes surrounding an initial query set, in directed graphs where the complete graph is not known beforehand. This problem differs from local community...
Random Projections for Linear Support Vector Machines
Saurabh Paul, Christos Boutsidis, Malik Magdon-Ismail, Petros Drineas
Article No.: 22
Let X be a data matrix of rank ρ, whose rows represent n points in d-dimensional space. The linear support vector machine constructs a hyperplane separator that maximizes the 1-norm soft margin. We develop a new oblivious...
Consider a social network and suppose that we are only given the number of common friends between each pair of users. Can we reconstruct the underlying network? Similarly, consider a set of documents and the words that appear in them. If we...