ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 3 Issue 3, July 2009

Author name disambiguation in MEDLINE
Vetle I. Torvik, Neil R. Smalheiser
Article No.: 11
DOI: 10.1145/1552303.1552304

Background: We recently described “Author-ity,” a model for estimating the probability that two articles in MEDLINE, sharing the same author name, were written by the same individual. Features include shared title words, journal...

Stream data clustering based on grid density and attraction
Li Tu, Yixin Chen
Article No.: 12
DOI: 10.1145/1552303.1552305

Clustering real-time stream data is an important and challenging problem. Existing algorithms such as CluStream are based on the k-means algorithm. These clustering algorithms have difficulties finding clusters of arbitrary shapes and...

Link spam target detection using page farms
Bin Zhou, Jian Pei
Article No.: 13
DOI: 10.1145/1552303.1552306

Currently, most popular Web search engines adopt some link-based ranking methods such as PageRank. Driven by the huge potential benefit of improving rankings of Web pages, many tricks have been attempted to boost page rankings. The most common...

Density-based clustering of data streams at multiple resolutions
Li Wan, Wee Keong Ng, Xuan Hong Dang, Philip S. Yu, Kuan Zhang
Article No.: 14
DOI: 10.1145/1552303.1552307

In data stream clustering, it is desirable to have algorithms that are able to detect clusters of arbitrary shape, clusters that evolve over time, and clusters with noise. Existing stream data clustering algorithms are generally based on an...