Clustering with tag for web data by using parallel PSO
Vitthal Sadashiv Gutte, Pooja V. Mundhe
Published in Sofia Academic Publications
2018
Volume: 118
Issue: 24
Pages: 1-15
Abstract
In recent times, the World Wide Web has become a collection of billions of web pages, growing exponentially by means of public transport, social media, online shopping, blogs, etc. This rapidly generated data, stored for long durations, results in big data. Big data can be in structured, unstructured, or semi-structured format. The exponential growth of web pages generates huge volumes of data that are beyond the capability of relational databases. Analysis of such large data cannot be handled easily using traditional data mining tools. Thus, data mining is being researched intensively and combined with the soft computing domain, which uses mathematical algorithms to segment data and analyze the probability of future events. In this paper, we discuss the evolutionary clustering technique Particle Swarm Optimization (PSO) applied to web data that is fetched with the help of a crawler and preprocessed by removing stop words and stemming. An improved numerical statistic, Term Frequency-Inverse Document Frequency (TF-IDF), is then applied to the preprocessed data to derive the importance of a word with respect to a set of documents and to overcome the traditional TF-IDF issue of inter-class consideration. A parallel PSO clustering technique is applied to this data to obtain optimized clusters with higher accuracy and to minimize computational time while preserving the compactness of the intermolecular distance between particles.
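The pipeline named in the abstract (TF-IDF weighting followed by PSO clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the documents are already crawled, stop-word filtered, and stemmed, and it uses the standard TF-IDF formula rather than the improved variant the paper proposes. The function names (tfidf, fitness, pso_cluster), the fitness measure (mean distance to the nearest centroid), and all parameter values are hypothetical choices made for illustration. Particle fitness values are computed through a thread pool only to show that the evaluations are independent of each other; no speedup is claimed.

```python
import math
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tfidf(docs):
    """Standard TF-IDF over non-empty tokenized docs (not the paper's improved variant)."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    n = len(docs)
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] / len(d) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(centroids, vectors):
    """Index of the nearest centroid for every document vector."""
    return [min(range(len(centroids)), key=lambda k: dist(v, centroids[k]))
            for v in vectors]


def fitness(centroids, vectors):
    """Assumed objective: mean distance to the nearest centroid (lower is better)."""
    return sum(min(dist(v, c) for c in centroids) for v in vectors) / len(vectors)


def pso_cluster(vectors, k, n_particles=8, iters=30,
                w=0.7, c1=1.5, c2=1.5, seed=0, workers=4):
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Each particle encodes k candidate centroids, seeded from random documents.
    pos = [[list(rng.choice(vectors)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [fitness(p, vectors) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            # Standard velocity and position update for every particle.
            for i in range(n_particles):
                for j in range(k):
                    for d in range(dim):
                        r1, r2 = rng.random(), rng.random()
                        vel[i][j][d] = (w * vel[i][j][d]
                                        + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                        + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                        pos[i][j][d] += vel[i][j][d]
            # Fitness evaluations are independent, so they are mapped over workers.
            fits = list(pool.map(lambda p: fitness(p, vectors), pos))
            for i, f in enumerate(fits):
                if f < pbest_fit[i]:
                    pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], f
                if f < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], f
    return gbest, gbest_fit


if __name__ == "__main__":
    docs = [["web", "page", "crawler"], ["web", "crawler", "page"],
            ["swarm", "particle", "optimization"], ["particle", "swarm", "cluster"]]
    vocab, vectors = tfidf(docs)
    centroids, score = pso_cluster(vectors, k=2)
    print(assign(centroids, vectors), score)
```

In this sketch each particle holds a full set of k centroids, so the swarm searches over complete clusterings. In CPython, threads will not speed up this CPU-bound loop; a process pool or a distributed framework would be the usual substitute, and the abstract does not say which parallel model the paper uses.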
About the journal
Journal: International Journal of Pure and Applied Mathematics
Publisher: Sofia Academic Publications
Open Access: 0