Recent Advances in Computer Science and Communications

ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Research Article

A Method for Webpage Classification Based on URL Using Clustering

Author(s): Sunita*, Gurvinder Singh and Vijay Rana

Volume 14, Issue 2, 2021

Published on: 12 June, 2019

Page: [442 - 447] Pages: 6

DOI: 10.2174/2213275912666190612143913

Abstract

Background: Pattern mining is the process of extracting useful information from a large dataset. One sub-field of web mining is the sequential extraction of noisy data from user queries, which is considered together with redundancy handling. The redundancy handling mechanism employed in the existing literature is known as ambiguity handling. The clustering mechanisms employed in existing systems include k-means, semantic search, and incremental growth of the internet.

Aims: The proposed work comprises an analysis of techniques used to extract useful URLs to replace noisy data.

Methods: We consider noisy data extraction from user queries together with redundancy handling. The redundancy handling mechanism employed in the existing literature is known as ambiguity handling. The clustering mechanisms used in existing systems include k-means and semantic search. These mechanisms are static, which degrades performance in terms of execution time. This paper suggests a mechanism for improving performance.

Results: The MPV (Most-Probable-Values) clustering and N-gram techniques considered in the existing literature can be further improved using the research methodology described in this paper.

Conclusion: In the proposed system, results are based on MPV clustering combined with N-gram techniques. N-gram analysis examines the occurrences of a word or phrase across all query data. Results are reported in terms of execution time and the number of URLs retrieved for web page classification.
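
The N-gram step mentioned above can be illustrated with a minimal sketch. It assumes word-level N-grams and whitespace tokenization, neither of which the abstract confirms. The function names and the sample queries are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-token tuples found in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(queries, n=2):
    """Count how often each word n-gram occurs across all queries."""
    counts = Counter()
    for query in queries:
        # Assumption: lowercase and split on whitespace; a real system
        # would likely use a more careful tokenizer.
        tokens = query.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical sample queries, used only for illustration.
queries = [
    "cheap flights to delhi",
    "cheap flights to mumbai",
    "flights to delhi today",
]
bigram_counts = count_ngrams(queries, n=2)
```

Under these assumptions, phrases that recur across many queries (here, "flights to") get the highest counts and could serve as features for a later clustering stage. How the paper itself combines these counts with MPV clustering is not specified in the abstract.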

Keywords: Ambiguity, clustering, noisy data, user query, URL, webpage.

© 2024 Bentham Science Publishers