Background: Patents suggest that groundwater contaminated with chemicals, bacteria, oils, or gases causes many types of disease in people. Fresh, clean water plays a significant role in human life. Water samples were collected from different regions of the Kadapa district, Andhra Pradesh.
Method: Water samples were collected in plastic bottles with tight caps that had been washed with distilled water. In total, 57 samples were collected and analyzed in the laboratory for physico-chemical properties: EC (Electrical Conductivity), pH, TH (Total Hardness), TDS (Total Dissolved Solids), Ca, Cl, and F. In this paper, K-means clustering, K-medoids clustering, and hierarchical clustering are used to group the sampled regions by water quality. Outlier analysis is then performed to identify interesting patterns.
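As an illustration of the clustering step, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the implementation used in the study: the six rows of EC, TDS, and TH values are made up, the choice of three features and of the starting centroids is an assumption, and the helper names `standardize` and `kmeans` are hypothetical.

```python
import math

def standardize(rows):
    """Z-score each column so that no single parameter dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for a constant column.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points, k, init_indices, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, until labels stop changing."""
    centroids = [list(points[i]) for i in init_indices]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up (EC, TDS, TH) readings for six hypothetical samples.
samples = [
    [500, 320, 150],
    [520, 335, 160],
    [1500, 960, 420],
    [1550, 990, 440],
    [3000, 1920, 900],
    [3100, 1985, 930],
]

# Assumption: start from samples 0, 2 and 4 to keep the run deterministic.
labels, centroids = kmeans(standardize(samples), k=3, init_indices=[0, 2, 4])
print(labels)
```

Standardizing first is a design choice of this sketch: EC and TDS take much larger numeric values than pH or F, so unscaled Euclidean distances would be driven almost entirely by those parameters.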
Results: According to the calculated WQI (Water Quality Index) values, all the collected samples are suitable for drinking. The WQI classification of the collected data yields 13 Poor, 13 Good, and 31 Excellent tuples. K-means clustering forms 3 clusters of sizes 8, 17, and 32.
According to the outlier analysis, the region Pullareddypet (Sample No. 7) has the highest EC, TH, and TDS values among the 57 collected water samples. The region Veerapalli (Sample No. 37) has the highest fluoride value, 3.58, among all 57 samples.
Conclusion: The unsupervised learning methods K-means clustering, K-medoids clustering, and hierarchical clustering are described for the physico-chemical parameter data of the collected water samples. The cluster analysis results were compared with the calculated WQI values. The three clusters overlap one another to a small degree. In the study area, only Excellent, Good, and Poor category tuples were found for drinking purposes. Outlier analysis is then described using the box plot method and the K-means clustering method. Outlier analysis using K-means clustering extracts various interesting hidden patterns from the data and is useful when the data set is very large.
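The box plot method mentioned above can be sketched as follows. This is a minimal illustration that assumes the conventional 1.5 × IQR whisker rule, which the text does not specify. Only the value 3.58 is taken from the reported results; every other reading is made up, and the function name `boxplot_outliers` is hypothetical.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return the indices of values outside the box plot whiskers,
    i.e. below Q1 - whisker*IQR or above Q3 + whisker*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Made-up fluoride readings; only 3.58 (index 6) comes from the text.
fluoride = [0.4, 0.6, 0.5, 0.7, 0.8, 0.6, 3.58, 0.5, 0.9]

outlier_idx = boxplot_outliers(fluoride)
print(outlier_idx)  # -> [6]
```

With these illustrative numbers, the quartiles are 0.5 and 0.8, so the whiskers sit at 0.05 and 1.25 and only the 3.58 reading is flagged.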