Latest Articles

## Kernelized Information-Theoretic Metric Learning for Cancer Diagnosis Using High-Dimensional Molecular Profiling Data

With the advancement of genome-wide monitoring technologies, molecular expression data have become...

## Jointly Modeling Label and Feature Heterogeneity in Medical Informatics

Multiple types of heterogeneity including label heterogeneity and feature heterogeneity often co-exist in many real-world data mining applications,...

## Mining Dual Networks

Finding the densest subgraph in a single graph is a fundamental problem that has been extensively studied. In many emerging applications, there exist dual networks. For example, in genetics, it is important to use protein interactions to interpret genetic interactions. In this application, one network represents physical interactions among nodes,...

## Less is More

Ensemble learning for anomaly detection has barely been studied, owing to the difficulty of acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have long been studied and used effectively. Our work taps into this gap and builds a new ensemble approach for anomaly...

## Co-Clustering Structural Temporal Data with Applications to Semiconductor Manufacturing

Recent years have witnessed data explosion in semiconductor manufacturing due to advances in instrumentation and storage techniques. The large amount...

## Inferring Dynamic Diffusion Networks in Online Media

Online media play an important role in information societies by providing a convenient infrastructure for different processes. Information diffusion...

## Unsupervised Rare Pattern Mining

Association rule mining was first introduced to examine patterns among frequent items. The original motivation for seeking these rules arose from the need to examine customer purchasing behaviour in supermarket transaction data. It seeks to identify combinations of items, or itemsets, whose presence in a transaction affects the likelihood of the...
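As a rough illustration of the association rule measures this teaser alludes to, the sketch below computes the standard support and confidence of a rule over a toy set of transactions. The basket contents and the function names `support` and `confidence` are hypothetical and are not taken from the article; the example only shows how a rare itemset can have low support yet high confidence.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) from transaction counts."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    sup_a = support(antecedent, transactions)
    if sup_a == 0:
        return 0.0
    return support(both, transactions) / sup_a


# Hypothetical supermarket baskets (illustrative data only).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"caviar", "champagne"}),
]

frequent_support = support({"bread", "milk"}, transactions)
rare_support = support({"caviar", "champagne"}, transactions)
rare_confidence = confidence({"caviar"}, {"champagne"}, transactions)
```

Here the caviar and champagne itemset appears in only one of five baskets, so a frequency threshold would discard it, even though the rule from caviar to champagne holds in every basket that contains caviar.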

## CGC

Multi-view graph clustering aims to enhance clustering performance by integrating heterogeneous information collected in different domains. Each domain provides a different view of the data instances. Leveraging cross-domain information has been demonstrated to be an effective way to achieve better clustering results. Despite the previous success,...

### New options for ACM authors to manage rights and permissions for their work

ACM introduces a new publishing license agreement, an updated copyright transfer agreement, and a new author-pays option which allows for perpetual open access through the ACM Digital Library. For more information, visit the ACM Author Rights webpage.

The ACM Transactions on Knowledge Discovery from Data (TKDD) publishes original archival papers in the area of knowledge discovery from data and closely related disciplines. The majority of the papers that appear in TKDD are expected to address the logical and technical foundations of knowledge discovery and data mining.

##### Forthcoming Articles
Sampling for Nyström Extension based Spectral Clustering: Incremental Perspective and Novel Analysis

Sampling is the key aspect of Nyström extension based spectral clustering. Traditional sampling schemes select the set of landmark points as a whole and focus on how to lower the matrix approximation error. However, matrix approximation error does not have a direct impact on clustering performance. In this paper, we propose a sampling framework from an incremental perspective, i.e., the landmark points are selected one by one, and each next point to be sampled is determined by the previously selected landmark points. Incremental sampling builds explicit relationships among landmark points, so they work well together and provide a theoretical guarantee on clustering performance. We provide two novel analysis methods and propose two schemes for selecting the next point within the framework. The first scheme is based on clusterability analysis, which provides a better guarantee on clustering performance than schemes based on matrix approximation error analysis. The second scheme is based on loss analysis, which maximizes the predictive ability of the landmark points on the (implicit) labels of the unsampled points. Experimental results on a wide range of benchmark datasets demonstrate the superiority of our proposed incremental sampling schemes over existing sampling schemes.

Shop Type Recommendation Leveraging the Data from Social Media and Location-based Services

It is an important yet challenging task for investors to determine the most suitable type of shop (e.g., restaurant, fashion, etc.) for a newly opened store. Traditional ways are predominantly field surveys and empirical estimation, which are not effective as they lack shop-related data. As social media and location-based services (LBS) are becoming more and more pervasive, user-generated data from these platforms is providing rich information not only about individual consumption experiences but also about shop attributes. In this paper, we investigate the recommendation of shop types for a given location, by leveraging heterogeneous data that are mainly historical user preferences and location context from social media and LBS. Our goal is to select the most suitable shop type, seeking to maximize the number of customers served from a candidate set of types. We propose a novel bias learning matrix factorization method with feature fusion for shop popularity prediction. Features are defined and extracted from two perspectives: location, where features are closely related to location characteristics, and commercial, where features are about the relationships between shops in the neighborhood. Experimental results show that the proposed method outperforms state-of-the-art solutions.

Listwise Learning to Rank from Crowds

Learning to rank has received great attention in recent years as it plays a crucial role in many applications such as information retrieval and data mining. The existing concept of learning to rank assumes that each training instance is associated with a reliable label. However, in practice, this assumption does not necessarily hold, as it may be infeasible or remarkably expensive to obtain reliable labels for many learning to rank applications. Therefore, a feasible approach is to collect labels from crowds and then learn a ranking function from the crowdsourced labels. This study explores listwise learning to rank with crowdsourced labels obtained from multiple annotators, who may be unreliable. A new probabilistic ranking model is first proposed by combining two existing models. Subsequently, a ranking function is trained with a proposed maximum likelihood learning approach, which estimates ground-truth labels and annotator expertise and learns the ranking function iteratively. In practical crowdsourced machine learning, valuable prior information (e.g., professional grades) about the annotators is normally attainable. Therefore, this study also investigates learning to rank from crowd labels when prior information on the expertise of the involved annotators is available. In particular, three basic types of prior information are investigated, and corresponding learning algorithms are introduced. The proposed algorithms are tested on both synthetic and real-world data. Results reveal that the maximum likelihood approach significantly outperforms the average approach, and its results are comparable to those of the learning model that uses reliable labels. The results of the investigation further indicate that prior information is helpful in inferring both ranking functions and the expertise degrees of annotators.

Heterogeneous Translated Hashing: A Scalable Solution towards Multi-modal Similarity Search

Multi-modal similarity search has attracted considerable attention to meet the need for information retrieval across different types of media. To enable efficient multi-modal similarity search in large-scale databases, researchers have recently started to study multi-modal hashing. Most of the existing methods are applied to search across multi-views among which explicit correspondence is provided. Given a multi-modal similarity search task, we observe that abundant multi-view data can be found on the Web which can serve as an auxiliary bridge. In this paper, we propose a Heterogeneous Translated Hashing (HTH) method that incorporates such an auxiliary bridge, not only to improve current multi-view search but also to enable similarity search across heterogeneous media which have no direct correspondence. HTH provides more flexible and discriminative ability by embedding heterogeneous media into different Hamming spaces, compared to almost all existing methods, which map heterogeneous data into a common Hamming space. We formulate a joint optimization model to learn hash functions embedding heterogeneous media into different Hamming spaces, and a translator aligning different Hamming spaces. Extensive experiments on two real-world datasets, one a publicly available Flickr dataset and the other the MIRFLICKR-Yahoo Answers dataset, highlight the effectiveness and efficiency of our algorithm.

#### Introduction to the Special Issue of Best Papers in ACM SIGKDD 2014

World Knowledge as Indirect Supervision for Document Clustering

One of the key obstacles in making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. We consider a framework that uses world knowledge as indirect supervision. World knowledge is general-purpose knowledge, which is not designed for any specific domain. The key challenges are then how to adapt the world knowledge to domains and how to represent it for learning. In this paper, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and represent the data with world knowledge as a heterogeneous information network. Then we propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. In the experiments, we use two existing knowledge bases as our sources of world knowledge. One is Freebase, which is collaboratively collected knowledge about entities and their organizations. The other is YAGO2, a knowledge base that is automatically extracted from Wikipedia and maps knowledge to the linguistic knowledge base WordNet. Experimental results on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform the state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.

Scalable Clustering by Iterative Partitioning and Point Attractor Representation

Clustering very large data sets while preserving cluster quality remains a challenging data mining task to date. In this paper, we propose an effective scalable clustering algorithm for large data sets that builds upon the concept of synchronization. Inherited from the powerful concept of synchronization, the proposed algorithm, CIPA (Clustering by Iterative Partitioning and Point Attractor Representations), is capable of handling very large data sets by iteratively partitioning them into thousands of subsets and clustering each subset separately. Using dynamic clustering by synchronization, each subset is then represented by a set of point attractors and outliers. Finally, CIPA identifies the cluster structure of the original data set by clustering the newly generated data set consisting of point attractors and outliers from all subsets. We demonstrate that our new scalable clustering approach has several attractive benefits: (a) CIPA faithfully captures the cluster structure of the original data by iteratively performing clustering on each separate subset instead of using any sampling or statistical summarization technique. (b) It allows clustering very large data sets efficiently with high cluster quality. (c) CIPA is parallelizable and also suitable for distributed data. Extensive experiments demonstrate the effectiveness and efficiency of our approach.

#### Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees

Fast Sampling for Time-Varying Determinantal Point Processes

A Determinantal Point Process (DPP) is a stochastic model that assigns each subset of a base dataset a probability proportional to the subset's degree of diversity. It has been shown that DPPs are particularly appropriate for subset selection and summarization of data (e.g., news, videos), where diversity among the selected subset is preferred, something that other conventional models cannot offer. However, DPP inference algorithms have a polynomial time complexity, which makes it difficult to handle large and time-varying datasets, especially when real-time processing is required. To address this limitation, we developed a fast sampling algorithm for DPPs that takes advantage of the nature of some time-varying data, where the changes in the data between successive time stamps are relatively small, such as updates to news corpora and the evolution of communication networks. The proposed algorithm is built upon the simplification of marginal density functions over successive time stamps and the sequential Monte Carlo (SMC) sampling technique. Evaluations on both a real-world news dataset and the Enron corpus confirm the efficiency of the proposed algorithm.
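To make the "probability proportional to diversity" idea concrete, here is a minimal sketch of the standard L-ensemble form of a DPP, in which a subset S has probability det(L_S) / det(L + I). It is not the fast sampling algorithm described in the abstract. The kernel values and the helper names `det`, `submatrix`, and `dpp_probabilities` are illustrative assumptions, and the exhaustive enumeration is only feasible for tiny ground sets.

```python
from itertools import combinations


def det(matrix):
    """Determinant via Gaussian elimination; the empty matrix has determinant 1."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def submatrix(kernel, indices):
    """Rows and columns of `kernel` restricted to `indices`."""
    return [[kernel[i][j] for j in indices] for i in indices]


def dpp_probabilities(kernel):
    """P(S) = det(L_S) / det(L + I) for every subset S of the ground set."""
    n = len(kernel)
    shifted = [[kernel[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    normalizer = det(shifted)
    return {
        subset: det(submatrix(kernel, subset)) / normalizer
        for size in range(n + 1)
        for subset in combinations(range(n), size)
    }


# Hypothetical similarity kernel: items 0 and 1 are near-duplicates, item 2 is unrelated.
L = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
probs = dpp_probabilities(L)
```

With this kernel the pair of near-duplicates, `probs[(0, 1)]`, receives much less mass than the diverse pair `probs[(0, 2)]`, which is the behaviour that makes DPPs attractive for summarization.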

Leveraging Neighbor Attributes for Classification in Sparsely-Labeled Networks

Many analysis tasks involve linked nodes, such as people connected by friendship links. Research on "link-based classification" (LBC) has studied how to leverage these connections to improve classification accuracy. Most such prior research has assumed the provision of a densely-labeled training network. Instead, this article studies the common and challenging case when LBC must use a single sparsely-labeled network for both learning and inference, a case where existing methods often yield poor accuracy. To address this challenge, we introduce a novel method that enables prediction via "neighbor attributes," which were briefly considered by early LBC work but then abandoned due to perceived problems. We then explain, using both extensive experiments and loss decomposition analysis, how using neighbor attributes often significantly improves accuracy. We further show that using appropriate semi-supervised learning (SSL) is essential to obtaining the best accuracy in this domain, and that the gains of neighbor attributes remain across a range of SSL choices and data conditions. Finally, given the challenges of label sparsity for LBC and the impact of neighbor attributes, we show that multiple previous studies must be re-considered, including studies regarding the best model features, the impact of noisy attributes, and strategies for active learning.

Product Selection Problem: Improve Market Share by Learning Consumer Behavior

It is often crucial for manufacturers to decide what products to produce so that they can increase their market share in an increasingly fierce market. To decide which products to produce, manufacturers need to analyze the consumers' requirements and how consumers make their purchase decisions so that the new products will be competitive in the market. In this paper, we first present a general distance-based product adoption model to capture consumers' purchase behavior. Using this model, various distance metrics can be used to describe different real-life purchase behavior. We then provide a learning algorithm to decide which set of distance metrics one should use when we are given some accessible historical purchase data. Based on the product adoption model, we formalize the $k$ most marketable products (or $k$-MMP) selection problem and formally prove that the problem is NP-hard. To tackle this problem, we propose an efficient greedy-based approximation algorithm with a provable solution guarantee. Using submodularity analysis, we prove that our approximation algorithm can achieve at least 63% of the optimal solution. We apply our algorithm on both synthetic datasets and real-world datasets (TripAdvisor.com), and show that our algorithm can easily achieve five or more orders of magnitude of speedup over the exhaustive search and achieve about 96% of the optimal solution on average. Our experiments also demonstrate the robustness of our distance metric learning method, and illustrate how one can adopt it to improve the accuracy of product selection.
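The 63% figure is consistent with the classical 1 - 1/e bound for greedily maximizing a monotone submodular function under a cardinality constraint. The sketch below illustrates that greedy pattern on a simplified coverage objective. The coverage model, the candidate data, and the function name `greedy_select` are assumptions made for illustration, and they stand in for, rather than reproduce, the paper's distance-based adoption model.

```python
import math


def greedy_select(candidates, k):
    """Pick up to k candidates, each time taking the largest marginal gain.

    `candidates` maps a product name to the set of consumers who would adopt it.
    The objective (number of distinct consumers covered) is monotone submodular.
    """
    chosen = []
    covered = set()
    for _ in range(k):
        remaining = [name for name in candidates if name not in chosen]
        best = max(remaining, key=lambda name: len(candidates[name] - covered),
                   default=None)
        if best is None or not (candidates[best] - covered):
            break  # no candidate adds any new consumer
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


# Classical worst-case ratio of greedy to optimal for this class of objectives.
GREEDY_GUARANTEE = 1.0 - 1.0 / math.e

# Hypothetical products and the consumers each one would win.
candidates = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {1, 2},
}
chosen, covered = greedy_select(candidates, k=2)
```

In this toy instance the greedy choice first takes product A and then C, since C adds two new consumers where B adds only one. The bound in `GREEDY_GUARANTEE` is a worst case; the abstract reports about 96% of optimal on average in practice.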

The Convergence Behavior of Naive Bayes on Large Sparse Datasets

Large and sparse datasets with many missing values, such as user behaviors over a large number of items, are common in the big data era. Classification in such datasets is an important topic for machine learning and data mining. In practice, naive Bayes is still a popular classification algorithm for large sparse datasets, as its time and space complexity scales linearly with the number of non-missing values. However, several important questions about the behavior of naive Bayes are yet to be answered. For example, how do different missing-data mechanisms, data sparsity, and the number of attributes systematically affect the learning curves and convergence? In this paper, we address several common missing-data mechanisms and propose novel data generation methods based on these mechanisms. We generate large and sparse data systematically, and study the entire AUC (Area Under ROC Curve) learning curve and convergence behavior of naive Bayes. We not only report several important experimental observations, but also provide detailed theoretical studies. Our empirical and theoretical results provide a useful guideline for classifying large sparse datasets with naive Bayes.
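The linear scaling claim can be illustrated with a small Bernoulli naive Bayes that stores each instance as a dictionary of observed features only and simply skips missing ones. This is a minimal sketch under stated assumptions: the class name `SparseNaiveBayes`, the Laplace smoothing constant, and the toy data are hypothetical, and the paper's own data generation methods and missing-data mechanisms are not reproduced.

```python
import math
from collections import defaultdict


class SparseNaiveBayes:
    """Bernoulli naive Bayes over rows given as {feature: 0 or 1} dictionaries.

    Features absent from a row are treated as missing and ignored, so both
    training and scoring touch only the non-missing values.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing (assumed value)
        self.class_counts = defaultdict(int)
        # counts[label][feature] = [times observed as 0, times observed as 1]
        self.counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for feature, value in row.items():
                self.counts[label][feature][value] += 1
        return self

    def log_scores(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)
            for feature, value in row.items():
                zeros, ones = self.counts[label].get(feature, (0, 0))
                matching = ones if value else zeros
                score += math.log((matching + self.alpha)
                                  / (zeros + ones + 2 * self.alpha))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical sparse training data: most features are unobserved in each row.
rows = [{"a": 1, "b": 1}, {"a": 1}, {"b": 0, "c": 1}, {"c": 1}]
labels = ["pos", "pos", "neg", "neg"]
model = SparseNaiveBayes().fit(rows, labels)
```

Calling `model.predict({"a": 1})` scores only the single observed feature, so the cost of a prediction grows with the number of observed values rather than with the total number of attributes.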

Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes, judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake followers, manipulating the social network, to botnet members performing distributed denial of service attacks, disturbing the network traffic graph. We propose a fast and effective method, CATCHSYNC, which exploits two of the tell-tale signs left in graphs by fraudsters: (a) synchronized behavior: suspicious nodes have extremely similar behavior patterns, because they are often required to perform some task together (such as follow the same user); and (b) rare behavior: their connectivity patterns are very different from the majority. We introduce novel measures to quantify both concepts (synchronicity and normality) and we propose a parameter-free algorithm that works on the resulting synchronicity-normality plots. Thanks to careful design, CATCHSYNC has the following desirable properties: (a) it is scalable to large datasets, being linear in the graph size; (b) it is parameter free; and (c) it is side-information-oblivious: it can operate using only the topology, without needing labeled data or timing information, while still being capable of using side information, if available. We applied CATCHSYNC on three large, real datasets (a 1-billion-edge Twitter social graph and 3-billion-edge and 12-billion-edge Tencent Weibo social graphs) and several synthetic ones; CATCHSYNC consistently outperforms existing competitors, both in detection accuracy (by 36% on Twitter and 20% on Tencent Weibo) and in speed.

#### Parallel Field Ranking

Latent Time-Series Motifs

Motifs are the most repetitive/frequent patterns of a time-series. The discovery of motifs is crucial for practitioners in order to understand and interpret the phenomena occurring in sequential data. Currently, motifs are searched among series sub-sequences, aiming at selecting the most frequently occurring ones. Search-based methods, which try out series sub-sequences as motif candidates, are currently believed to be the best methods for finding the most frequent patterns. However, this paper proposes an entirely new perspective on finding motifs. We demonstrate that searching is non-optimal since the domain of motifs is restricted, and instead we propose a principled optimization approach able to find optimal motifs. We treat the occurrence frequency as a function and time-series motifs as its parameters; we therefore learn the optimal motifs that maximize the frequency function. In contrast to searching, our method is able to discover the most repetitive patterns (hence optimal), even in cases where they do not explicitly occur as sub-sequences. Experiments on several real-life time-series datasets show that the motifs found by our method are far more frequent than the ones found through searching, for exactly the same distance threshold.
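The claim that the best motif need not occur as a sub-sequence can be seen in a toy setting. The sketch below defines a simple frequency function as the number of sliding windows within a Euclidean distance threshold of a candidate motif. This counting definition, the helper names `windows` and `frequency`, and the toy series are assumptions for illustration; the paper's actual frequency function and optimization procedure are not shown.

```python
import math


def windows(series, length):
    """All contiguous sub-sequences of the given length."""
    return [tuple(series[i:i + length]) for i in range(len(series) - length + 1)]


def frequency(motif, series, threshold):
    """Number of windows whose Euclidean distance to `motif` is within `threshold`."""
    count = 0
    for window in windows(series, len(motif)):
        distance = math.sqrt(sum((m - w) ** 2 for m, w in zip(motif, window)))
        if distance <= threshold:
            count += 1
    return count


# Hypothetical series alternating between two levels.
series = [0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 2.0]
threshold = 1.5

searched = frequency((0.0, 0.0), series, threshold)  # an actual sub-sequence
latent = frequency((1.0, 1.0), series, threshold)    # never occurs in the series
```

In this example the candidate `(1.0, 1.0)` never appears in the series, yet it lies within the threshold of every window, whereas the observed sub-sequence `(0.0, 0.0)` matches only its own two occurrences.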

#### Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices

Modeling of Geographical Dependencies for Real Estate Ranking

It is traditionally a challenge for home buyers to understand, compare and contrast the investment values of estates. While a number of estate appraisal methods have been developed to value real properties, the performances of these methods have been limited by the traditional data sources for estate appraisal. With the development of new ways of collecting estate-related mobile data, there is a potential to leverage geographic dependencies of estates for enhancing estate appraisal. Indeed, the geographic dependencies of the investment value of an estate can be from the characteristics of its own neighborhood (individual), the values of its nearby estates (peer), and the prosperity of the affiliated latent business area (zone). To this end, in this paper, we propose a geographic method, named ClusRanking, for estate appraisal by leveraging the mutual enforcement of ranking and clustering power. ClusRanking is able to exploit geographic individual, peer, and zone dependencies in a probabilistic ranking model. Specifically, we first extract the geographic utility of estates from geography data, estimate the neighborhood popularity of estates by mining taxicab trajectory data, and model the influence of latent business areas. Also, we fuse these three influential factors and predict real estate investment value. Moreover, we simultaneously consider individual, peer and zone dependencies, and derive an estate-specific ranking likelihood as the objective function. Furthermore, we propose an improved method named CR-ClusRanking by incorporating checkin information as a regularization term which reduces the performance volatility of estate ranking system. Finally, we conduct a comprehensive evaluation with the real estate related data of Beijing, and the experimental results demonstrate the effectiveness of our proposed methods.

Greedily improving our own closeness centrality in a network

Closeness centrality is a well-known measure of the importance of a vertex within a given complex network. Having high closeness centrality can have a positive impact on the vertex itself: hence, in this paper we consider the optimisation problem of determining how much a vertex can increase its centrality by creating a limited number of new edges incident to it. We will consider both the undirected and the directed graph case. In both cases, we first prove that the optimisation problem does not admit a polynomial-time approximation scheme (unless P=NP), and we then propose a greedy approximation algorithm (with an almost tight approximation ratio), whose performance is then tested on synthetic graphs and real-world networks.
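A greedy strategy of this kind can be sketched as follows for the undirected case: in each round, try every possible new edge from the target vertex, keep the one that raises its centrality the most, and repeat. The sketch uses the harmonic variant of closeness (the sum of reciprocal distances), which is an assumption here since the abstract does not state the exact definition; the function names and the path graph are likewise illustrative, and no claim is made that this matches the paper's implementation.

```python
from collections import deque


def harmonic_closeness(adj, source):
    """Sum of 1/d(source, u) over all vertices u reachable from source (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sum(1.0 / d for node, d in dist.items() if node != source)


def greedy_improve(adj, source, budget):
    """Greedily add up to `budget` edges incident to `source`."""
    adj = {node: set(neighbors) for node, neighbors in adj.items()}  # work on a copy
    added = []
    for _ in range(budget):
        best_target, best_score = None, harmonic_closeness(adj, source)
        for target in sorted(adj):
            if target == source or target in adj[source]:
                continue
            adj[source].add(target)
            adj[target].add(source)
            score = harmonic_closeness(adj, source)
            adj[source].discard(target)
            adj[target].discard(source)
            if score > best_score:
                best_target, best_score = target, score
        if best_target is None:
            break  # no remaining edge improves the centrality
        adj[source].add(best_target)
        adj[best_target].add(source)
        added.append(best_target)
    return added, harmonic_closeness(adj, source)


# Hypothetical path graph 0 - 1 - 2 - 3 - 4 - 5; vertex 0 may add one edge.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
added, score = greedy_improve(path, source=0, budget=1)
```

On this path the best single edge for vertex 0 goes to vertex 4, which brings vertices 3, 4, and 5 within two hops. Each round costs one breadth-first search per candidate edge, so a practical implementation on large networks would need a faster way to evaluate marginal gains.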

#### Batch Mode Active Sampling based on Marginal Probability Distribution Matching

Convex Sparse PCA for Unsupervised Feature Analysis

Principal component analysis (PCA) has been widely applied to dimensionality reduction and data pre-processing for different applications in engineering, biology and social science. Classical PCA and its variants seek linear projections of the original variables to obtain a low-dimensional feature representation with maximal variance. One limitation is that it is very difficult to interpret the results of PCA. In addition, classical PCA is vulnerable to certain noisy data. In this paper, we propose a convex sparse principal component analysis (CSPCA) algorithm and apply it to feature analysis. First we show that PCA can be formulated as a low-rank regression optimization problem. Building on this formulation, $l_{2,1}$-norm minimization is incorporated into the objective function to make the regression coefficients sparse and thereby robust to outliers. In addition, based on the sparse model used in CSPCA, an optimal weight is assigned to each of the original features, which in turn provides the output with good interpretability. With the output of our CSPCA, we can effectively analyze the importance of each feature under the PCA criteria. The objective function is convex, and we propose an iterative algorithm to optimize it. We apply the CSPCA algorithm to feature selection and conduct extensive experiments on six different benchmark datasets. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art unsupervised feature selection algorithms.
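For reference, the $l_{2,1}$-norm mentioned in this abstract is commonly defined as shown below. The symbols $W$ (a $d \times m$ coefficient matrix) and $w^{i}$ (its $i$-th row) are notation assumed here and are not taken from the abstract.

$$
\|W\|_{2,1} \;=\; \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{m} W_{ij}^{2}} \;=\; \sum_{i=1}^{d} \|w^{i}\|_{2}
$$

Because the penalty sums the Euclidean norms of whole rows, minimizing it tends to drive entire rows of $W$ to zero, which is why this norm is often used when each row corresponds to one original feature.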

Permanence and Community Structure in Complex Networks

The goal of community detection algorithms is to identify densely-connected units within large networks. An implicit assumption is that all the constituent nodes belong equally to their associated community. As a result, to date, efforts have been primarily driven to identify communities as a whole, rather than understanding by how much an individual node belongs to its community. In this paper, we argue that the belongingness of nodes in a community is not uniform. We quantify the degree of belongingness of a vertex within a community by a new vertex-based metric called permanence. The central idea of permanence is based on the observation that the strength of membership of a vertex to a community depends upon two factors: (i) the distribution of the connectivity of the vertex to individual communities, and (ii) how tightly the vertex is connected internally. We present the formulation of permanence based on these two quantities. We demonstrate that compared to other existing metrics, the change in permanence is more commensurate to the level of perturbation in ground-truth communities. We discuss how permanence can help us understand and utilize the structure and evolution of communities in a network. We further show that permanence is an excellent metric for identifying communities. We show that the process of maximizing permanence (abbreviated as MaxPerm) produces meaningful communities that concur with the ground-truth community structure of the networks more accurately than eight other popular community detection algorithms. Finally, we provide mathematical proofs to demonstrate the correctness of finding communities by maximizing permanence. In particular, we show that the communities obtained by this method are (i) less affected by the changes in vertex-ordering, and (ii) more resilient to resolution limit, degeneracy of solutions and asymptotic growth of values.

### Bibliometrics

First Name Last Name Award
John Canny ACM Doctoral Dissertation Award (1987)
Carlos A. Castillo ACM Senior Member (2014)
Chris Clifton ACM Senior Member (2006)
Graham R. Cormode ACM Distinguished Member (2013)
Benjamin Fung ACM Senior Member (2013)
John E Hopcroft ACM Karl V. Karlstrom Outstanding Educator Award (2008); ACM A. M. Turing Award (1986)
Piotr Indyk ACM Paris Kanellakis Theory and Practice Award (2012)
Jon Kleinberg ACM AAAI Allen Newell Award (2014); ACM-Infosys Foundation Award in the Computing Sciences (2008)
Chih-Jen Lin ACM Distinguished Member (2011); ACM Senior Member (2010)
Sethuraman Panchanathan ACM Senior Member (2009)
Jian Pei ACM Senior Member (2007)
Domenico Sacca ACM Senior Member (2007)
Qiang Yang ACM Distinguished Member (2011)
Ben Yanbin Zhao ACM Distinguished Member (2015)

First Name Last Name Paper Counts
Christos Faloutsos 11
Jieping Ye 7
Tao Li 5
Philip Yu 4
Jian Pei 4
Shenghuo Zhu 4
Huan Liu 4
Aristides Gionis 4
Heng Huang 3
Yun Chi 3
Evimaria Terzi 3
Yihong Gong 3
John Lui 3
Feiping Nie 3
Hong Cheng 3
Dingding Wang 3
Fabio Fassetti 3
Christopher Jermaine 3
Lise Getoor 3
John Hopcroft 3
Fabrizio Angiulli 3
Jure Leskovec 3
Hui Xiong 3
Malik Magdon-Ismail 3
Mingsyan Chen 3
Jilles Vreeken 3
Zhihua Zhou 3
Lei Tang 3
Yasushi Sakurai 3
Jilei Tian 2
Ping Luo 2
B Prakash 2
Andrea Esuli 2
Antonella Guzzo 2
Shinjae Yoo 2
Yuru Lin 2
Ian Davidson 2
Guofei Jiang 2
Jiawei Han 2
Ruoming Jin 2
Zhiwen Yu 2
Daniel Kifer 2
Antônio Loureiro 2
Hanghang Tong 2
Xiang Zhang 2
Laks Lakshmanan 2
Jimeng Sun 2
Don Towsley 2
Sucheta Soundarajan 2
Enhong CHEN 2
Qi Liu 2
Belle Tseng 2
Vivekanand Gopalkrishnan 2
Jie Tang 2
Xiaoli Fern 2
Eugene Agichtein 2
Charalampos Tsourakakis 2
Dantong Yu 2
Carlotta Domeniconi 2
Sanjay Ranka 2
Jiliang Tang 2
Srinivasan Parthasarathy 2
Bin Guo 2
Hari Sundaram 2
Joydeep Ghosh 2
Wei Fan 2
Dino Pedreschi 2
Hao Huang 2
Hong Qin 2
Yehuda Koren 2
Pinghui Wang 2
Heikki Mannila 2
Panayiotis Tsaparas 2
Jianhui Chen 2
Yu Zhang 2
Fabrizio Sebastiani 2
Arthur Zimek 2
Michalis Vazirgiannis 2
Jin Huang 2
Geoffrey Webb 2
Indrajit Bhattacharya 2
Xiao Yu 2
Junzhou Zhao 2
Xiaohong Guan 2
Panagis Magdalinos 2
A Patterson 1
Manolis Kellis 1
Carlos Castillo 1
Tianbing Xu 1
Sanmay Das 1
Amit Dhurandhar 1
Beechung Chen 1
Elizabeth Chang 1
Li Wan 1
Weekeong Ng 1
Sethuraman Panchanathan 1
Yu Lei 1
Maria Sapino 1
Shipeng Yu 1
Zhiting Hu 1
Pedro Melo 1
Yuan Jiang 1
Qinbao Song 1
Michele Coscia 1
Yi Wang 1
Lian Duan 1
Bruno Ribeiro 1
Siyuan Liu 1
Aminul Islam 1
Michael Mampaey 1
Matteo Riondato 1
Charles Elkan 1
Jaideep Srivastava 1
João Gama 1
Julian McAuley 1
Carlos Guestrin 1
Naonori Ueda 1
Qi Lou 1
Wei Fan 1
Tomoharu Iwata 1
Xifeng Yan 1
Feiyu Xiong 1
Shiqiang Tao 1
Guoqiang Zhang 1
Fei Wang 1
Kosuke Hashimoto 1
Nobuhisa Ueda 1
Lei Zhang 1
Jie Tang 1
Ricardo Campello 1
Xianchao Zhang 1
Haiqin Yang 1
Aparna Varde 1
Shuhui Wang 1
Jeffrey Chan 1
Michael Houle 1
Luigi Pontieri 1
Bingrong Lin 1
Francesco Bonchi 1
Pedro Vaz De Melo 1
Wei Ding 1
Dimitrios Gunopulos 1
Daxin Jiang 1
Muna Al-Razgan 1
Mohsen Bayati 1
Peilin Zhao 1
Noman Mohammed 1
Chao Liu 1
Dacheng Tao 1
Jaideep Vaidya 1
Collin Stultz 1
Boleslaw Szymanski 1
Maguelonne Teisseire 1
Paolo Boldi 1
Lini Thomas 1
Sachindra Joshi 1
Tharam Dillon 1
Yixin Chen 1
Xuanhong Dang 1
Kasim Candan 1
Yong Ge 1
S Upham 1
Thomas Porta 1
Hongzhi Yin 1
Dora Erdős 1
Joydeep Ghosh 1
Kaiyuan Zhang 1
Carlos Ordonez 1
Fosca Giannotti 1
James Cheng 1
Li Zheng 1
U Kang 1
Peter Christen 1
Raymond Wong 1
Aniket Chakrabarti 1
Saurabh Kataria 1
Irwin King 1
Ling Liu 1
Huilei He 1
Hua Wang 1
Fei Zou 1
Francesco Gullo 1
Virgílio Almeida 1
Christos Faloutsos 1
Laiwan Chan 1
Gianluigi Greco 1
Guimei Liu 1
Nitin Agarwal 1
Kunta Chuang 1
Anthony Tung 1
Sigal Sina 1
S Muthukrishnan 1
Sarit Kraus 1
Chris Ding 1
Lior Rokach 1
Dityan Yeung 1
Amin Saberi 1
Matthew Rattigan 1
Limin Yao 1
Kristina Lerman 1
Cheukkwong Lee 1
Olvi Mangasarian 1
Chris Clifton 1
Mohammed Zaki 1
Jennifer Dy 1
Shaojun Wang 1
Loïc Cerf 1
Henry Tan 1
Min Ding 1
Jennifer Neville 1
Gensheng Zhang 1
Yiming Yang 1
Ayan Acharya 1
Sreangsu Acharyya 1
Arnold Boedihardjo 1
Changtien Lu 1
Zhiqiang Xu 1
Geoffrey Barbier 1
Christophe Giraud-Carrier 1
Kaiming Ting 1
Zhongfei Zhang 1
Matthew Rowe 1
Edward Chang 1
Bruno Abrahão 1
Xiaolin Wang 1
Tingting Gao 1
Kazumi Saito 1
Longjie Li 1
ChengXiang Zhai 1
Dong Xin 1
Christian Böhm 1
Dafna Shahaf 1
Stephen Fienberg 1
Raviv Raich 1
Bilson Campana 1
Vibhor Rastogi 1
Deng Cai 1
Yanjun Qi 1
Theodoros Lappas 1
Wenjie Li 1
Leman Akoglu 1
Chen Chen 1
Munmun De Choudhury 1
T Murali 1
Kiyoko Aoki-Kinoshita 1
Ravi Janardan 1
Sudhir Kumar 1
Siqi Shen 1
Lei Li 1
Xinran He 1
Tiancheng Lou 1
Giacomo Berardi 1
Zhu Wang 1
Xiaotong Zhang 1
Han Liu 1
Kathleen Carley 1
Xiaodan Song 1
Guna Seetharaman 1
Yasuhiro Fujiwara 1
Wei Wang 1
ChienWei Chen 1
Weiyin Loh 1
Shumo Chu 1
Ming Li 1
Jeffrey Erman 1
Daniel Dunlavy 1
Christos Doulkeridis 1
Joao Duarte 1
David Dominguez-Sal 1
Christo Wilson 1
Ben Zhao 1
Steven Skiena 1
Hiroshi Motoda 1
Danai Koutra 1
Chris Volinsky 1
Andreas Krause 1
Hsiangfu Yu 1
Binbin Lin 1
Johannes Gehrke 1
Leonid Hrebien 1
Pei Yang 1
Li Li 1
Denian Yang 1
Zhishan Guo 1
Yunsing Koh 1
Yijuan Lu 1
Feng Liu 1
Yufeng Wang 1
Ernest Garcia 1
Shamkant Navathe 1
Cheng Zeng 1
Atreya Srivathsan 1
Tong Sun 1
Rezwan Ahmed 1
Wei Wei 1
Duygu Ucar 1
Wei Fan 1
Mustafa Bilgic 1
Ben Kao 1
David Cheung 1
Christopher Leckie 1
Seekiong Ng 1
Hong Xie 1
Kui Yu 1
Ron Eyal 1
Avi Rosenfeld 1
Asaf Shabtai 1
Shifeng Weng 1
Kun Liu 1
Dmitry Pavlov 1
Raymond Ng 1
Piotr Indyk 1
Christopher Carothers 1
Anne Laurent 1
Satyanarayana Valluri 1
Ashish Verma 1
Jérémy Besson 1
Raghu Ramakrishnan 1
Rong Ge 1
Byronju Gao 1
Li Tu 1
Saharon Rosset 1
Claudia Perlich 1
Ramana Kompella 1
Vasileios Kandylas 1
Salvatore Ruggieri 1
Jing Zhang 1
Rodrigo Alves 1
Juhua Hu 1
Giulio Rossetti 1
Yanchi Liu 1
Songhua Xu 1
Duo Zhang 1
Tuannhon Dang 1
Chengkai Li 1
Timothy De Vries 1
Yu Jin 1
Eric Xing 1
Albert Bifet 1
Xiaoming Li 1
Josep Brunat 1
Jiang Bian 1
Claudia Plant 1
Jiayu Pan 1
Brandon Westover 1
Eamonn Keogh 1
Yubao Wu 1
Hamid Rabiee 1
Fernando Kuipers 1
Dick Epema 1
Min Wang 1
Linpeng Tang 1
Michael Lyu 1
Dityan Yeung 1
Evangelos Papalexakis 1
Nicholas Sidiropoulos 1
George Karypis 1
Jilei Tian 1
Davoud Moulavi 1
Koji Hino 1
Qiang Qu 1
Masaru Kitsuregawa 1
Jenwei Huang 1
James Bailey 1
Xiang Zhang 1
Jianping Zhang 1
Manas Somaiya 1
Graham Cormode 1
Maya Bercovitch 1
Bin Li 1
Marc Maier 1
Mohamed Bouguessa 1
Mingxi Wu 1
Benjamin Fung 1
Ye Chen 1
John Canny 1
Dominique Laurent 1
Yeowwei Choong 1
Luca Becchetti 1
Ying Cui 1
Meghana Deodhar 1
Keli Xiao 1
Bo Long 1
Hans Kriegel 1
Martin Ester 1
Ling Feng 1
Kuan Zhang 1
Vetle Torvik 1
Luigi Moccia 1
Edoardo Serra 1
Claudio Schifanella 1
Nesreen Ahmed 1
Min Wang 1
Ali Pınar 1
Michail Vlachos 1
Ling Chen 1
Yang Liu 1
Chunxiao Xing 1
Dechuan Zhan 1
Saurabh Paul 1
Jose Hernández-Orallo 1
Rainer Gemulla 1
Guangtao Wang 1
Xueying Zhang 1
Yiping Ke 1
William Street 1
Lionel Ni 1
Gunjan Gupta 1
Diana Inkpen 1
Shuiwang Ji 1
Eli Upfal 1
Ruggero Pensa 1
Evrim Acar 1
Yang Zhou 1
Charu Aggarwal 1
Ben London 1
Jirong Wen 1
Joseph Ruiz MD 1
Masahiro Kimura 1
Neil Shah 1
Alexander Ihler 1
Kaiwei Chang 1
Forrest Briggs 1
Gustavo Batista 1
Qiang Zhu 1
Philip Yu 1
Jure Leskovec 1
Jon Kleinberg 1
Hongxia Yang 1
Haoda Fu 1
Dawei Zhou 1
Jingrui He 1
Liming Chen 1
Shebuti Rayana 1
Wei Wang 1
Michalis Faloutsos 1
Naren Ramakrishnan 1
Qi Tian 1
Jennifer Neary 1
Minoru Kanehisa 1
Alexandru Iosup 1
Reza Zafarani 1
Francesco Lupia 1
Nima Mirbakhsh 1
Antti Ukkonen 1
John Salerno 1
Nitin Kumar 1
Xindong Wu 1
Flip Korn 1
Ying Wang 1
Ke Wang 1
Benoît Dumoulin 1
Xiuyao Song 1
John Gums 1
Yin Zhang 1
Zhongfei Zhang 1
Yunxin Zhao 1
Jude Shavlik 1
Qian Sun 1
Xiaohui Lu 1
Domenico Saccà 1
Zheng Wang 1
Johannes Schneider 1
Bin Cui 1
Chengqi Zhang 1
Juanzi Li 1
Christos Boutsidis 1
Bingsheng Wang 1
Chris Ding 1
Jing Zhang 1
Scott Burton 1
Hui Ke 1
Qingyan Yang 1
Patrick Haffner 1
Zhili Zhang 1
Tamara Kolda 1
Jie Wang 1
Karthik Subbian 1
Yulan He 1
Galileo Namata 1
John Frenzel MD 1
Hua Duan 1
Yandong Liu 1
Erheng Zhong 1
Wei Fan 1
Qiang Yang 1
Joshua Vogelstein 1
Qiaozhu Mei 1
Suresh Iyengar 1
Jiawei Han 1
Ashwin Machanavajjhala 1
Beilun Wang 1
Chihya Shen 1
Zhitao Wang 1
Jingrui He 1
Ali Hemmatyar 1
Wei Cheng 1
Saurav Sahay 1
Lei Zou 1
Luming Zhang 1
Jian Wang 1
Manos Papagelis 1
Ruud Van De Bovenkamp 1
Wei Peng 1
Clyde Giles 1
Xiaowen Ding 1
Jörg Sander 1
Siyuan Liu 1
Charles Ling 1
Mengling Feng 1
Maria Halkidi 1
David Gleich 1
Steven Hoi 1
David Jensen 1
Glenn Fung 1
Zeeshan Syed 1
Kamalakar Karlapalem 1
Dale Schuurmans 1
Peer Kröger 1
Céline Robardet 1
Jean Boulicaut 1
Zengjian Hu 1
Boaz Ben-Moshe 1
Neil Smalheiser 1
Shachar Kaufman 1
Ori Stitelman 1
Leland Wilkinson 1
Hockhee Ang 1
Steven Hoi 1
Weekeong Ng 1
Xiao Jiang 1
Lyle Ungar 1
Franco Turini 1
Luan Tang 1
Quanquan Gu 1
Xintao Wu 1
Petros Drineas 1
Tengfei Bao 1
Brook Wu 1
Dimitrios Mavroeidis 1
James Cheng 1
Nikolaj Tatti 1
José Balcázar 1
Sanjay Chawla 1
Jianyong Wang 1
Chun Li 1
Feitony Liu 1
Nick Duffield 1
Jinpeng Wang 1
Arnau Prat-Pérez 1
Josep Larriba-Pey 1
Risa Myers 1
Qingtian Zeng 1
Robert Kleinberg 1
Zhi Yang 1
Yafei Dai 1
Victor Lee 1
Brian Gallagher 1
John Hutchins 1
Taneli Mielikäinen 1
Ji Liu 1
Manuel Gomez-Rodriguez 1
Sethuraman Panchanathan 1
Abdullah Mueen 1
Yizhou Sun 1
Xiaofei He 1
Muthuramakrishnan Venkitasubramaniam 1
Moshe Kam 1
Jieping Ye 1
Licong Cui 1
Xiaofeng Zhu 1
Ying Jin 1
Hiroshi Mamitsuka 1
Sitaram Asur 1
Jerry Kiernan 1
Kevin Yip 1
Wei Zheng 1
Zhenxing Wang 1
Dan Simovici 1
Hao Wang 1
Yuval Elovici 1
Ming Lin 1
Changshui Zhang 1
Ravi Konuru 1
Fan Guo 1
Edward Wild 1
Murat Kantarcıoğlu 1
John Guttag 1
Marc Plantevit 1
Shantanu Godbole 1
Alin Dobra 1
Binay Bhattacharya 1
Bin Zhou 1
Anushka Anand 1
Yicheng Tu 1
Siddharth Gopal 1
Alice Leung 1
Renato Assunção 1
Pauli Miettinen 1
Eduardo Hruschka 1
Hongliang Fei 1
Jun Huan 1
Baoxing Huai 1
Hengshu Zhu 1
Pritam Gundecha 1
Lei Chen 1
Jinlin Chen 1
Ana Appel 1
Dino Ienco 1
Rosa Meo 1
Subhabrata Sen 1
Jeffreyxu Yu 1
Zhen Guo 1
Yashu Liu 1
Waynexin Zhao 1
Faming Lu 1
Andrew Mehler 1
Stephen North 1
Seungil Huh 1
Chojui Hsieh 1
Chihjen Lin 1
Zheng Wang 1
Thanawin Rakthanmanon 1
Jesin Zakaria 1
Kedar Bellare 1
Brandon Norick 1
Jiawei Han 1
Ming Ji 1
Wangchien Lee 1
Sri Ravana 1
Sougata Mukherjea 1
Ashwin Ram 1
Liang Hong 1
Venu Satuluri 1
Hunghsuan Chen 1
Rose Yu 1
Yan Liu 1
Yao Zhang 1
Zhanpeng Fang 1
Yang Zhou 1
Xinjiang Lu 1
Dengyong Zhou 1
Jing Peng 1
Ming Zhang 1
Biru Dai 1
Haojun Zhang 1
Limsoon Wong 1
Hungleng Chen 1
Zhenjie Zhang 1
Divesh Srivastava 1
Aisling Kelliher 1
Paul Castro 1
Anon Plangprasopchok 1
Shengrui Wang 1
Patrick Hung 1
Ganesh Ramesh 1

Affiliation Paper Counts
Oracle Corporation 1
Lanzhou University 1
Northeastern University 1
Research Organization of Information and Systems National Institute of Informatics 1
University of Malaya 1
University of Milan 1
Temple University 1
Syracuse University 1
University of Queensland 1
Curtin University of Technology, Perth 1
University of Roma La Sapienza 1
University of the Saarland 1
Institute of Mathematics and Informatics Lithuanian 1
Amazon.com, Inc. 1
Harvard School of Engineering and Applied Sciences 1
Ariel University Center of Samaria 1
Siemens USA 1
Microsoft Research Asia 1
Innopolis University 1
Ryukoku University 1
Cemagref 1
University of Michigan 1
Anhui University 1
University of Ontario Institute of Technology 1
Universite de Cergy-Pontoise 1
Princeton University 1
Queens College, City University of New York 1
University of Arkansas - Fayetteville 1
Yale University 1
University of Auckland 1
University of Missouri-Columbia 1
John F. Kennedy School of Government 1
The University of North Carolina at Charlotte 1
University of South Florida Tampa 1
Valley Laboratory 1
University of Salford 1
Hong Kong Polytechnic University 1
Australian National University 1
University of Texas at Dallas 1
University of Vermont 1
Nanjing University of Science and Technology 1
Washington University in St. Louis 1
HP Labs 1
BBN Technologies 1
Air Force Research Laboratory Information Directorate 1
University of Shizuoka 1
MITRE Corporation 1
Norwegian University of Science and Technology 1
Indian Institute of Science 1
Zhejiang Wanli University 1
Aston University 1
University of Southern California, Information Sciences Institute 1
John Carroll University 1
Brigham and Women's Hospital 1
University of Toronto 1
De Montfort University 1
Wright State University 1
Singapore Management University 1
Air Force Research Laboratory 1
IBM 1
Universite Montpellier 2 Sciences et Techniques 1
Nanjing University of Aeronautics and Astronautics 1
University of Connecticut 1
Industrial Technology Research Institute of Taiwan 1
Hong Kong Red Cross Blood Transfusion Service 1
Nokia USA 1
Universite Claude Bernard Lyon 1 1
Lancaster University 1
Osaka University 1
University of Iowa 1
University of California, Berkeley 1
Shanghai Jiaotong University 1
Wright-Patterson AFB 1
Eli Lilly and Company 1
Swiss Federal Institute of Technology, Zurich 1
Lawrence Livermore National Laboratory 1
Stevens Institute of Technology 1
Jerusalem College of Technology 1
University of California, Los Angeles 1
National Taiwan University of Science and Technology 1
Max Planck Institute for Informatics 2
Hefei University of Technology 2
Zhejiang University 2
Institute of High Performance Computing, Singapore 2
Johns Hopkins University 2
Tel Aviv University 2
University of Minnesota System 2
University of Houston 2
The University of Hong Kong 2
Brigham Young University 2
The University of Western Ontario 2
Brown University 2
Montclair State University 2
Hong Kong Baptist University 2
Renmin University of China 2
University of California, Davis 2
Drexel University 2
University of Texas M. D. Anderson Cancer Center 2
University of Kansas Lawrence 2
University of Quebec in Outaouais 2
Institute for Systems and Computer Engineering of Porto 2
University of Virginia 2
University of Massachusetts Boston 2
University of Tokyo 2
University of Michigan Ann Arbor 2
Nokia 2
University of Athens 2
IBM Zurich Research Laboratory 2
Kent State University 2
University of California, San Diego 2
Rutgers University 2
Istituto di Scienza e Tecnologie dell'Informazione A. Faedo 2
Qatar Computing Research Institute 2
International Institute of Information Technology Hyderabad 3
Shandong University of Science and Technology 3
Bar-Ilan University 3
Dalian University of Technology 3
Rice University 3
University of Pennsylvania 3
University of California, Irvine 3
University of Sao Paulo 3
The University of British Columbia 3
University of Kentucky 3
INSA Lyon 3
George Mason University 3
Xerox Corporation 3
Binghamton University State University of New York 3
Italian National Research Council 3
University of Sydney 3
Microsoft 3
University of Melbourne 3
University of California, Santa Barbara 3
Wuhan University 3
University of Southern California 3
University of Alberta 3
Rutgers University-Newark Campus 4
Emory University 4
Institute for Infocomm Research, A-Star, Singapore 4
Brookhaven National Laboratory 4
Universitat Politecnica de Catalunya 4
IBM Research 4
University of Antwerp 4
National University of Singapore 4
Athens University of Economics and Business 4
Monash University 4
Boston University 4
Massachusetts Institute of Technology 4
Ben-Gurion University of the Negev 4
University of Pisa 4
Yahoo Research Barcelona 4
Aalto University 4
Case Western Reserve University 5
Pennsylvania State University 5
University of Texas at San Antonio 5
Ohio State University 5
Purdue University 5
Kyoto University 5
University of Turin 5
Oregon State University 5
Sandia National Laboratories 5
Microsoft Research 5
New Jersey Institute of Technology 5
University of Technology Sydney 5
The University of North Carolina at Chapel Hill 5
Delft University of Technology 6
AT&T Laboratories Florham Park 6
University of Massachusetts Amherst 6
Nippon Telegraph & Telephone 6
Ludwig Maximilian University of Munich 6
University of Minnesota Twin Cities 6
Yahoo Inc. 6
Hong Kong University of Science and Technology 7
University of Florida 7
Peking University 7
University of Science and Technology of China 7
Georgia Institute of Technology 7
University of Maryland 7
Virginia Tech 7
University of California, Riverside 7
Federal University of Minas Gerais 7
Nanjing University 7
Northwestern Polytechnical University China 8
Nanyang Technological University 8
Stanford University 8
University of Texas at Austin 8
IBM Thomas J. Watson Research Center 8
Xi'an Jiaotong University 8
Stony Brook University 8
Yahoo Research Labs 8
National Taiwan University 9
Florida International University 9
University of Illinois at Chicago 10
Rensselaer Polytechnic Institute 11
Cornell University 12
Simon Fraser University 12
University of Calabria 12
University of Texas at Arlington 13
University of Illinois at Urbana-Champaign 15
NEC Laboratories America, Inc. 15
Chinese University of Hong Kong 17
Tsinghua University 18
Carnegie Mellon University 29
Arizona State University 43

### ACM Transactions on Knowledge Discovery from Data (TKDD) Archive

#### 2016

Volume 10 Issue 4, June 2016 (Issue-in-Progress)
Volume 10 Issue 3, February 2016

#### 2015

Volume 10 Issue 2, October 2015
Volume 10 Issue 1, July 2015
Volume 9 Issue 4, June 2015
Volume 9 Issue 3, April 2015 TKDD Special Issue (SIGKDD'13)

#### 2014

Volume 9 Issue 2, November 2014
Volume 9 Issue 1, October 2014
Volume 8 Issue 4, October 2014
Volume 8 Issue 3, June 2014
Volume 8 Issue 2, June 2014
Volume 8 Issue 1, February 2014 CASIN Special Issue

#### 2013

Volume 7 Issue 4, November 2013
Volume 7 Issue 3, September 2013 Special Issue on ACM SIGKDD 2012
Volume 7 Issue 2, July 2013
Volume 7 Issue 1, March 2013

#### 2012

Volume 6 Issue 4, December 2012 Special Issue on the Best of SIGKDD 2011
Volume 6 Issue 3, October 2012
Volume 6 Issue 2, July 2012
Volume 6 Issue 1, March 2012
Volume 5 Issue 4, February 2012

#### 2011

Volume 5 Issue 3, August 2011
Volume 5 Issue 2, February 2011

#### 2010

Volume 5 Issue 1, December 2010
Volume 4 Issue 4, October 2010
Volume 4 Issue 3, October 2010
Volume 4 Issue 2, May 2010
Volume 4 Issue 1, January 2010

#### 2009

Volume 3 Issue 4, November 2009
Volume 3 Issue 3, July 2009
Volume 3 Issue 2, April 2009
Volume 3 Issue 1, March 2009
Volume 2 Issue 4, January 2009

#### 2008

Volume 2 Issue 3, October 2008
Volume 2 Issue 2, July 2008
Volume 2 Issue 1, March 2008
Volume 1 Issue 4, January 2008

#### 2007

Volume 1 Issue 3, December 2007
Volume 1 Issue 2, August 2007
Volume 1 Issue 1, March 2007