Construction and Reduction Methods of Web Spam Identification Index System

Yuancheng Li; Rong Huang; Xiangqian Nie

Abstract

Background: With the rapid development of the Internet, the volume of web spam has increased dramatically in recent years, wasting search engine storage and computing power on a massive scale. To identify web spam effectively, the content features, link features, hidden features and quality features of web pages are integrated to establish a web spam identification index system. However, the resulting index system is high-dimensional and its indexes are highly correlated.

Methods: A stacked autoencoder neural network (SAE), an improved form of the autoencoder, is used to reduce the web spam identification index system.

Results: The experimental results show that the method effectively reduces the number of web spam indexes and significantly improves the recognition rate in subsequent detection work.

Conclusion: An autoencoder-based method for reducing web spam indexes is proposed in this paper. The experimental results show that it greatly reduces the time and space complexity of the subsequent web spam detection model.

Keywords: Autoencoder, index reduction, stacked autoencoder neural network, web spam, identification index system, detection model.
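
Illustrative sketch: the abstract does not give the network architecture, activation functions, loss or training schedule, so the following is only a minimal example of how a stacked autoencoder can reduce a set of correlated indexes. It assumes sigmoid encoders, linear decoders, a mean squared reconstruction loss and greedy layer-wise training with plain gradient descent. The function names (train_autoencoder, stacked_encode), the layer sizes and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=300, lr=0.1, seed=0):
    """Train one autoencoder layer and return its encoder (W, b).

    Assumed setup (not from the paper): sigmoid encoder, linear decoder,
    loss = (1 / 2n) * sum of squared reconstruction errors,
    full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encode
        X_hat = H @ W2 + b2               # decode
        err = (X_hat - X) / n             # dLoss/dX_hat
        grad_W2 = H.T @ err
        grad_b2 = err.sum(axis=0)
        dH = (err @ W2.T) * H * (1.0 - H)  # backprop through the sigmoid
        grad_W1 = X.T @ dH
        grad_b1 = dH.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return W1, b1


def stacked_encode(X, layer_sizes):
    """Greedy layer-wise training: each layer is trained on the codes
    produced by the previous one. Returns the final low-dimensional codes
    and the list of encoder parameters."""
    codes = X
    encoders = []
    for size in layer_sizes:
        W, b = train_autoencoder(codes, size)
        encoders.append((W, b))
        codes = sigmoid(codes @ W + b)
    return codes, encoders


# Synthetic stand-in for an index matrix: 200 pages, 20 correlated indexes
# generated from 3 latent factors, then min-max scaled to [0, 1].
rng = np.random.default_rng(42)
latent = rng.random((200, 3))
X = latent @ rng.random((3, 20)) + 0.05 * rng.random((200, 20))
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

reduced, encoders = stacked_encode(X, layer_sizes=[10, 5])
print(reduced.shape)
```

In this sketch the reduced matrix would replace the original index matrix as input to whatever detection model follows; a supervised fine-tuning pass over the stacked encoders is a common extension but is omitted here.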




DOI: https://dx.doi.org/10.2174/2213275912666181127130120
Print ISSN: 2213-2759
Online ISSN: 1874-4796
Publisher: Bentham Science Publishers

Recent Patents on Computer Science
