Community detection in social networks has become a popular research topic during the last decade. A variety of algorithms exist for partitioning the network graph into communities. However, they mostly assume that partial or complete information about the network graph is available, which is not feasible in many cases. In this paper, we focus on detecting communities by exploiting their diffusion information. We utilize a Conditional Random Field (CRF) to model the behavior of the diffusion process. The proposed method (CoDi) does not require any prior knowledge of the network structure or of specific properties of the communities. Furthermore, in contrast to structure-based community detection methods, it is able to identify hidden communities. The experimental results indicate considerable improvements in detecting communities in terms of accuracy, scalability, and agreement with real cascade information.
Large graphs arise in a number of contexts, and understanding their structure and extracting information from them is an important research area. Early algorithms for mining communities have focused on the global structure and often run in time proportional to the size of the entire graph. Nowadays, as we often explore networks with billions of vertices and seek communities of only a few hundred members, it is crucial to shift our attention from the macroscopic to the microscopic structure of large networks. A growing body of work adopts local expansion methods to identify community members from a few exemplary seed members. In this paper, we propose a novel approach for finding overlapping communities called LEMON (Local Expansion via Minimum One Norm). The algorithm finds a community by seeking a sparse vector in the span of the local spectra such that the seeds are in its support. We show that LEMON achieves the highest detection accuracy among state-of-the-art proposals. Its running time depends on the size of the community rather than that of the entire graph, and the algorithm is easy to implement and highly parallelizable. We further provide a theoretical analysis of the local spectral properties, bounding the tightness of the extracted community in terms of the eigenvalues of the graph Laplacian. Moreover, given that networks differ in nature, a comprehensive analysis of how well the local expansion approach uncovers communities in different networks is still lacking. We thoroughly evaluate our approach on both synthetic and real-world datasets across different domains, and analyze the empirical variations that arise when applying our method to inherently different networks in practice. In addition, we provide heuristics on how seed set quality and quantity affect performance.
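The core step described above — seeking a minimum-one-norm vector in the span of a local spectral subspace with the seeds in its support — can be phrased as a small linear program. The following is a minimal sketch of that idea, not the authors' implementation: the toy graph, the seed set, the use of a few lazy random-walk vectors as a stand-in for the local spectra, and the support threshold are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative 6-node graph: a triangle {0,1,2} bridged to a triangle {3,4,5}.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
n = A.shape[0]
seeds = [0]

# A few steps of a seed-started lazy random walk serve as the local
# spectral subspace (a simplified stand-in for LEMON's local spectra).
P = (A / A.sum(axis=1, keepdims=True) + np.eye(n)) / 2.0
v = np.zeros(n)
v[seeds] = 1.0
cols = []
for _ in range(3):
    v = P.T @ v
    cols.append(v / np.linalg.norm(v, 1))
V = np.array(cols).T  # n x k basis of the local span

# Sparse community indicator:
#   min ||y||_1  s.t.  y = V x,  y >= 0,  y[seed] >= 1
# Since y >= 0, ||y||_1 = 1^T V x, which is linear in x.
c = V.sum(axis=0)
A_ub = -V.copy()          # -(Vx) <= 0  enforces y >= 0
b_ub = np.zeros(n)
b_ub[seeds] = -1.0        # -(Vx)_seed <= -1  enforces y_seed >= 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * V.shape[1])

y = V @ res.x
community = np.where(y > 1e-6)[0]  # support of the sparse vector
```

The one-norm objective plays the role of a convex surrogate for sparsity: mass concentrates on the seed's side of the bridge, and the support of `y` is read off as the detected community.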
In many commercial campaigns, we observe a trade-off between the number of customers satisfied by the company and the profit gained. Merely satisfying as many customers as possible, or merely maximizing the profit, is not desirable. To this end, in this paper we propose a new problem called k-Satisfiability Assignment for Maximizing the Profit (k-SAMP), where k is a non-negative integer supplied by the user. Given a set P of products and a set O of customers, k-SAMP asks for an assignment between P and O such that at least k customers are satisfied and the profit incurred by the assignment is maximized. Although this problem is closely related to two classic computer science problems, namely maximum weight matching and maximum matching, the techniques developed for those problems cannot be adapted to k-SAMP. In this work, we design a novel algorithm called Adjust for the k-SAMP problem. Given an assignment A, Adjust iteratively increases the profit of A by adjusting appropriate matches in A while keeping at least k customers satisfied. We prove that Adjust returns a global optimum. Extensive experiments verify the efficiency of Adjust.
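To make the k-SAMP objective concrete, here is a brute-force reference solver for a toy instance. This is not the paper's Adjust algorithm, only a specification of what any solver must return; the profit matrix, the satisfaction relation, and the restriction to one-to-one assignments are illustrative assumptions (the paper's assignment model may be more general).

```python
from itertools import permutations

# Toy instance (illustrative data): 3 products, 3 customers.
# profit[p][o] is the profit of assigning product p to customer o;
# satisfies[p][o] says whether customer o is satisfied by product p.
profit = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 2, 4],
]
satisfies = [
    [True,  False, False],
    [False, True,  False],
    [False, False, True],
]
k = 2

def brute_force_ksamp(profit, satisfies, k):
    """Enumerate all one-to-one product->customer assignments and
    return (max profit, assignment) among those satisfying >= k
    customers; assignment[p] is the customer given product p."""
    n = len(profit)
    best_val, best_assign = None, None
    for perm in permutations(range(n)):
        sat = sum(satisfies[p][o] for p, o in enumerate(perm))
        if sat < k:
            continue  # constraint: at least k customers satisfied
        val = sum(profit[p][o] for p, o in enumerate(perm))
        if best_val is None or val > best_val:
            best_val, best_assign = val, perm
    return best_val, best_assign
```

On this instance the feasible assignments are exactly those keeping at least two matched customers satisfied, and the maximizer happens to satisfy all three. An algorithm like Adjust replaces the factorial enumeration with iterative profit-improving adjustments that preserve feasibility.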
The seamless integration of community discovery and role assignment has recently been proposed as an unsupervised approach to the exploratory analysis of networks, aimed at unveiling the participation of nodes in multiple overlapping communities along with the roles played therein. One limitation of these prototypical research efforts is that the formation of ties is not truly realistic, since it does not account for a fundamental aspect of link establishment in real-world networks, i.e., the explicative reasons that cause interactions among nodes. Such reasons can be abstractly interpreted as generic requirements of nodes that are met by other nodes, and essentially pertain both to the nodes themselves and to their different interaction contexts (i.e., the respective communities and roles). In this manuscript, we present two new model-based machine-learning approaches, wherein community discovery and role assignment are tightly integrated and simultaneously performed through approximate posterior inference in Bayesian mixed-membership models of directed networks. The two proposed models account for the explicative reasons governing link establishment in terms of node-specific and contextual latent interaction factors. The former are inherently characteristic of nodes, while the latter are characterizations of nodes in the context of the individual communities and roles. The generative process of the devised models assigns nodes to communities with respective roles and connects them through directed links, which are probabilistically governed by their node-specific and contextual interaction factors. The two proposed models differ in the impact of the contextual interaction factors on link generation. We develop MCMC algorithms implementing approximate posterior inference and parameter estimation within our models.
Finally, we demonstrate their superiority in community compactness and link prediction through extensive comparative experiments on real-world and synthetic networks.
Data streams are a new class of data that is becoming pervasively important in a wide range of applications, from sensor networks and environmental monitoring to finance. In this paper, we propose a novel framework for the online characterisation and diagnosis of the evolution of multidimensional streaming data, which incorporates Recursive Wavelet Density Estimators into the context of Velocity Density Estimation. In the proposed framework, changes in streaming data are characterised by local and global evolution coefficients. In addition, for analysing changes in the correlation structure of the data, we propose a recursive implementation of the Pearson correlation coefficient using exponential discounting. Two visualisation tools, namely temporal and spatial velocity profiles, are extended in the context of our framework. The proposed method has three main advantages over previous approaches: 1) the memory required is minimal and independent of any window size; 2) it has a significantly lower computational complexity; and 3) it enables fast diagnosis of data evolution across all dimensions, and across relevant combinations of dimensions, with only one pass over the data. With the help of three examples, we show the framework's relevance in a change-detection context and its potential for real-world applications.
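The recursive, exponentially discounted Pearson correlation mentioned above can be realised by maintaining discounted moment sums, so each update is O(1) and no window is stored. This is a minimal sketch under assumed design choices, not necessarily the paper's exact recursion: the forgetting factor `lam` and the moment-based formulation are illustrative.

```python
import math

class DiscountedPearson:
    """Exponentially discounted Pearson correlation over a stream.

    Maintains discounted sums of x, y, x^2, y^2, xy and a discounted
    weight; the correlation is recovered from these moments. Memory is
    six scalars, independent of any window size.
    """

    def __init__(self, lam=0.98):
        self.lam = lam  # forgetting factor in (0, 1]: smaller forgets faster
        self.w = self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        """O(1) recursive update on arrival of a new pair (x, y)."""
        l = self.lam
        self.w   = l * self.w   + 1.0
        self.sx  = l * self.sx  + x
        self.sy  = l * self.sy  + y
        self.sxx = l * self.sxx + x * x
        self.syy = l * self.syy + y * y
        self.sxy = l * self.sxy + x * y

    def corr(self):
        """Discounted Pearson correlation from the running moments."""
        mx, my = self.sx / self.w, self.sy / self.w
        cov = self.sxy / self.w - mx * my
        vx = self.sxx / self.w - mx * mx
        vy = self.syy / self.w - my * my
        if vx <= 0.0 or vy <= 0.0:
            return 0.0  # degenerate stream: correlation undefined
        return cov / math.sqrt(vx * vy)
```

For example, feeding a stream with an exact linear relation `y = 2x + 1` drives the estimate to 1, while recent samples dominate older ones at a rate set by `lam`, which is what allows changes in the correlation structure to be diagnosed online.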