ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on SIGKDD 2014, Special Issue on BIGCHAT and Regular Papers, Volume 10 Issue 4, July 2016

Section: Special Issue on SIGKDD 2014

Introduction to the Special Issue of Best Papers in ACM SIGKDD 2014
Wei Wang, Jure Leskovec
Article No.: 33
DOI: 10.1145/2936718

Product Selection Problem: Improve Market Share by Learning Consumer Behavior
Silei Xu, John C. S. Lui
Article No.: 34
DOI: 10.1145/2753764

It is often crucial for manufacturers to decide what products to produce so that they can increase their market share in an increasingly fierce market. To decide which products to produce, manufacturers need to analyze the consumers’...

Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach
Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, Shiqiang Yang
Article No.: 35
DOI: 10.1145/2746403

Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake...

Heterogeneous Translated Hashing: A Scalable Solution Towards Multi-Modal Similarity Search
Ying Wei, Yangqiu Song, Yi Zhen, Bo Liu, Qiang Yang
Article No.: 36
DOI: 10.1145/2744204

Multi-modal similarity search has attracted considerable attention to meet the need of information retrieval across different types of media. To enable efficient multi-modal similarity search in large-scale databases recently, researchers start to...

Guest Editorial: Special Issue on Connected Health at Big Data Era (BigChat): A TKDD Special Issue
Hanghang Tong, Fei Wang, Munmun De Choudhury, Zoran Obradovic
Article No.: 37
DOI: 10.1145/2912122

Kernelized Information-Theoretic Metric Learning for Cancer Diagnosis Using High-Dimensional Molecular Profiling Data
Feiyu Xiong, Moshe Kam, Leonid Hrebien, Beilun Wang, Yanjun Qi
Article No.: 38
DOI: 10.1145/2789212

With the advancement of genome-wide monitoring technologies, molecular expression data have become widely used for diagnosing cancer through tumor or blood samples. When mining molecular signature data, the process of comparing samples through an...

Jointly Modeling Label and Feature Heterogeneity in Medical Informatics
Pei Yang, Hongxia Yang, Haoda Fu, Dawei Zhou, Jieping Ye, Theodoros Lappas, Jingrui He
Article No.: 39
DOI: 10.1145/2768831

Multiple types of heterogeneity including label heterogeneity and feature heterogeneity often co-exist in many real-world data mining applications, such as diabetes treatment classification, gene functionality prediction, and brain image analysis....

Mining Dual Networks: Models, Algorithms, and Applications
Yubao Wu, Xiaofeng Zhu, Li Li, Wei Fan, Ruoming Jin, Xiang Zhang
Article No.: 40
DOI: 10.1145/2785970

Finding the densest subgraph in a single graph is a fundamental problem that has been extensively studied. In many emerging applications, there exist dual networks. For example, in genetics, it is important to use protein interactions to...

Biomedical Ontology Quality Assurance Using a Big Data Approach
Licong Cui, Shiqiang Tao, Guo-Qiang Zhang
Article No.: 41
DOI: 10.1145/2768830

This article presents recent progresses made in using scalable cloud computing environment, Hadoop and MapReduce, to perform ontology quality assurance (OQA), and points to areas of future opportunity. The standard sequential approach used for...

Less is More: Building Selective Anomaly Ensembles
Shebuti Rayana, Leman Akoglu
Article No.: 42
DOI: 10.1145/2890508

Ensemble learning for anomaly detection has been barely studied, due to difficulty in acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have been studied and...

Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing
Yada Zhu, Jingrui He
Article No.: 43
DOI: 10.1145/2875427

Recent years have witnessed data explosion in semiconductor manufacturing due to advances in instrumentation and storage techniques. The large amount of data associated with process variables monitored over time form a rich reservoir of...

Inferring Dynamic Diffusion Networks in Online Media
Maryam Tahani, Ali M. A. Hemmatyar, Hamid R. Rabiee, Maryam Ramezani
Article No.: 44
DOI: 10.1145/2882968

Online media play an important role in information societies by providing a convenient infrastructure for different processes. Information diffusion that is a fundamental process taking place on social and information networks has been...

Unsupervised Rare Pattern Mining: A Survey
Yun Sing Koh, Sri Devi Ravana
Article No.: 45
DOI: 10.1145/2898359

Association rule mining was first introduced to examine patterns among frequent items. The original motivation for seeking these rules arose from need to examine customer purchasing behaviour in supermarket transaction data. It seeks to identify...

CGC: A Flexible and Robust Approach to Integrating Co-Regularized Multi-Domain Graph for Clustering
Wei Cheng, Zhishan Guo, Xiang Zhang, Wei Wang
Article No.: 46
DOI: 10.1145/2903147

Multi-view graph clustering aims to enhance clustering performance by integrating heterogeneous information collected in different domains. Each domain provides a different view of the data instances. Leveraging cross-domain information has been...

Spatial-Proximity Optimization for Rapid Task Group Deployment
Chih-Ya Shen, De-Nian Yang, Wang-Chien Lee, Ming-Syan Chen
Article No.: 47
DOI: 10.1145/2818714

Spatial proximity is one of the most important factors for the quick deployment of the task groups in various time-sensitive missions. This article proposes a new spatial query, Spatio-Social Team Query (SSTQ), that forms a strong task...

Featuring, Detecting, and Visualizing Human Sentiment in Chinese Micro-Blog
Zhiwen Yu, Zhitao Wang, Liming Chen, Bin Guo, Wenjie Li
Article No.: 48
DOI: 10.1145/2821513

Micro-blog has been increasingly used for the public to express their opinions, and for organizations to detect public sentiment about social events or public policies. In this article, we examine and identify the key problems of this field,...

Eigen-Optimization on Large Graphs by Edge Manipulation
Chen Chen, Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos
Article No.: 49
DOI: 10.1145/2903148

Large graphs are prevalent in many applications and enable a variety of information dissemination processes, e.g., meme, virus, and influence propagation. How can we optimize the underlying graph structure to affect the outcome of such...