Introduction to the Special Issue of Best Papers in ACM SIGKDD 2014
Wei Wang, Jure Leskovec
Article No.: 33
Product Selection Problem: Improve Market Share by Learning Consumer Behavior
Silei Xu, John C. S. Lui
Article No.: 34
It is often crucial for manufacturers to decide what products to produce so that they can increase their market share in an increasingly fierce market. To decide which products to produce, manufacturers need to analyze the consumers’...
Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake...
Multi-modal similarity search has attracted considerable attention to meet the need of information retrieval across different types of media. To enable efficient multi-modal similarity search in large-scale databases, researchers have recently started to...
Section: Special Issue on SIGKDD 2014
Guest Editorial: Special Issue on Connected Health at Big Data Era (BigChat): A TKDD Special Issue
Hanghang Tong, Fei Wang, Munmun De Choudhury, Zoran Obradovic
Article No.: 37
Kernelized Information-Theoretic Metric Learning for Cancer Diagnosis Using High-Dimensional Molecular Profiling Data
Feiyu Xiong, Moshe Kam, Leonid Hrebien, Beilun Wang, Yanjun Qi
Article No.: 38
With the advancement of genome-wide monitoring technologies, molecular expression data have become widely used for diagnosing cancer through tumor or blood samples. When mining molecular signature data, the process of comparing samples through an...
Multiple types of heterogeneity including label heterogeneity and feature heterogeneity often co-exist in many real-world data mining applications, such as diabetes treatment classification, gene functionality prediction, and brain image analysis....
Finding the densest subgraph in a single graph is a fundamental problem that has been extensively studied. In many emerging applications, there exist dual networks. For example, in genetics, it is important to use protein interactions to...
Biomedical Ontology Quality Assurance Using a Big Data Approach
Licong Cui, Shiqiang Tao, Guo-Qiang Zhang
Article No.: 41
This article presents recent progress made in using a scalable cloud computing environment, Hadoop and MapReduce, to perform ontology quality assurance (OQA), and points to areas of future opportunity. The standard sequential approach used for...
Section: Special Issue on SIGKDD 2014
Less is More: Building Selective Anomaly Ensembles
Shebuti Rayana, Leman Akoglu
Article No.: 42
Ensemble learning for anomaly detection has been barely studied, due to difficulty in acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have been studied and...
Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing
Yada Zhu, Jingrui He
Article No.: 43
Recent years have witnessed a data explosion in semiconductor manufacturing due to advances in instrumentation and storage techniques. The large amount of data associated with process variables monitored over time forms a rich reservoir of...
Inferring Dynamic Diffusion Networks in Online Media
Maryam Tahani, Ali M. A. Hemmatyar, Hamid R. Rabiee, Maryam Ramezani
Article No.: 44
Online media play an important role in information societies by providing a convenient infrastructure for different processes. Information diffusion, a fundamental process taking place on social and information networks, has been...
Association rule mining was first introduced to examine patterns among frequent items. The original motivation for seeking these rules arose from the need to examine customer purchasing behaviour in supermarket transaction data. It seeks to identify...
CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering
Wei Cheng, Zhishan Guo, Xiang Zhang, Wei Wang
Article No.: 46
Multi-view graph clustering aims to enhance clustering performance by integrating heterogeneous information collected in different domains. Each domain provides a different view of the data instances. Leveraging cross-domain information has been...
Spatial-Proximity Optimization for Rapid Task Group Deployment
Chih-Ya Shen, De-Nian Yang, Wang-Chien Lee, Ming-Syan Chen
Article No.: 47
Spatial proximity is one of the most important factors for the quick deployment of task groups in various time-sensitive missions. This article proposes a new spatial query, Spatio-Social Team Query (SSTQ), that forms a strong task...
Micro-blogs have been increasingly used by the public to express their opinions, and by organizations to detect public sentiment about social events or public policies. In this article, we examine and identify the key problems of this field,...
Large graphs are prevalent in many applications and enable a variety of information dissemination processes, e.g., meme, virus, and influence propagation. How can we optimize the underlying graph structure to affect the outcome of such...