Document clustering algorithm using modified K-means

Madhura Vishnu Phatak; Dr Ranjana Anup Agrawal

doi:10.1049/cp.2012.2553

Profiles Research Units Publications

Conferences

Document clustering algorithm using modified K-means

Published in Institution of Engineering and Technology

2012

DOI: 10.1049/cp.2012.2553

Volume: 2012

Issue: CP652

Pages: 294 - 296

Abstract

Document clustering is the task of grouping a set of documents into clusters so that the documents in the same cluster are similar to each other than to those in other clusters. One of the applications of document clustering is in web search engine retrieval system to help the users find relevant information quicker, and allow them to focus their search in the appropriate direction. K-means is a commonly used algorithm for document clustering, but it has some disadvantages. The main limitations of K-means are: 1) The number of clusters K has to be given as input and 2) Based on the initializations it converges to different local minima. 3) It is slow and cannot be used for large number of data points.4) It cannot handle empty clusters. In this paper, we have developed a novel algorithm to eliminate all these basic drawbacks of K-means.

About the journal

Journal	IET Conference Publications
Publisher	Institution of Engineering and Technology

Authors (2)

Madhura Vishnu Phatak
- School of Computer Engineering & Technology
- Engineering and Technology
Dr Ranjana Anup Agrawal
- School of Computer Engineering & Technology
- Engineering and Technology

ABOUT

ACADEMICS

@MIT-WPU

ADMISSIONS/ PLACEMENTS

MISCELLANEOUS