Preface
Page: ii-ii (1)
Author: Biswadip Basu Mallik, Kirti Verma, Rahul Kar, Ashok Kumar Shaw and Sardar M. N. Islam (Naz)
DOI: 10.2174/9789815124842123010002
The Role of Mathematics in Data Science: Methods, Algorithms, and Computer Programs
Page: 1-23 (23)
Author: Rashmi Singh*, Neha Bhardwaj and Sardar M. N. Islam (Naz)
DOI: 10.2174/9789815124842123010004
PDF Price: $15
Abstract
The field of data science relies heavily on mathematical analysis. A solid
foundation in certain branches of mathematics is essential for every data scientist,
whether already working in the field or planning to enter it. Whatever the area of
focus, be it data science, machine learning engineering, business intelligence
development, data architecture, or another specialization, it is important to examine
the various kinds of mathematical prerequisites and insights and how they are applied in
the field of data science. Machine learning algorithms, data analysis, and data
modelling all require mathematics. Mathematics is not the only qualification for a data
science education and profession, but it is often the most significant one. Identifying
business problems and translating them into mathematical ones is a crucial phase in a
data scientist's workflow. In this study, we describe the different areas of mathematics
utilized in data science so that mathematics and data science can be understood together.
Kalman Filter: Data Modelling and Prediction
Page: 24-50 (27)
Author: Arnob Sarkar and Meetu Luthra*
DOI: 10.2174/9789815124842123010005
PDF Price: $15
Abstract
We provide here an analysis of the Kalman filter, which has wide applications
in experimental and observational fields. The Kalman filter is a data fusion algorithm,
a mathematical tool based on estimation theory. It is essentially a set of
mathematical equations that provides a computational mechanism for estimating the
state of a discrete process from noisy data. Indeed, observation and data analysis are
key aspects of any theory. To make any set of data useful, one has to minimize
the error/noise by taking into consideration various aspects such as the estimated
(theoretical) values, the measured values, experimental errors, and the estimated
errors. We show here how this can be done using the Kalman filtering technique. The
Kalman filter is a tool that takes observational data and refines it to
identify the best possible values of the parameters involved. The Kalman filter and its
variants, such as the extended Kalman filter, have wide applications, mainly in the field
of communication, e.g., in GPS (global positioning system) receivers, in radio
equipment for filtering and removing noise, and in laptop trackpads,
image processing, face recognition, and many more.
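The predict/update cycle the abstract describes can be sketched for the simplest case: a scalar constant estimated from noisy measurements. The state model, the noise variances, and the synthetic data below are illustrative assumptions, not taken from the chapter:

```python
import numpy as np

def kalman_1d(zs, q=1e-5, r=0.1 ** 2, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a constant-state model.

    zs: noisy measurements; q: process-noise variance;
    r: measurement-noise variance; x0, p0: initial state and covariance.
    """
    x, p = x0, p0
    estimates = []
    for z in zs:
        # Predict: the state is assumed constant, so only the
        # uncertainty grows, by the process noise q.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # corrected state estimate
        p = (1 - k) * p          # corrected error covariance
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(0)
true_value = 1.25
zs = true_value + rng.normal(0.0, 0.1, size=200)  # noisy observations
est = kalman_1d(zs)
print(est[-1])  # settles close to the underlying value
```

The estimates fluctuate far less than the raw measurements, which is exactly the noise-minimisation role described above; the gain `k` is the weight the filter places on each new measurement relative to its own prediction.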
The Role of Mathematics and Statistics in the Field of Data Science and its Application
Page: 51-67 (17)
Author: Sathiyapriya Murali* and Priya Panneer
DOI: 10.2174/9789815124842123010006
PDF Price: $15
Abstract
Mathematics is the rock-solid foundation of everything that happens in
science, and it is also extremely important in the field of data science, since
mathematical ideas help discover models and facilitate the development of algorithms.
But the concepts they present and the tools they enable are not the only reasons statistics
and mathematics are so crucial to data science: beyond the fundamentals of calculus, discrete
mathematics, and linear algebra, there is a particular type of mathematical
reasoning that is necessary to grasp data. For the implementation of algorithms in data
science, a thorough understanding of the various principles of probability and statistics
is essential. Machine learning is one of the many modern data science techniques that
has a strong mathematical base. The evidence presented in this chapter backs up our
claim that mathematics and statistics are the fields that offer the greatest tools and
approaches for extracting structure from data. For newcomers coming to data science
from other professions, mathematical proficiency is crucial.
Bag of Visual Words Model - A Mathematical Approach
Page: 68-79 (12)
Author: Maheswari*
DOI: 10.2174/9789815124842123010007
PDF Price: $15
Abstract
Information extraction from images is now incredibly valuable for many
new innovations. Even though there are several simple methods for extracting
information from images, feasibility and accuracy are critical. One of the simplest
and most significant processes is feature extraction from the images. Many scientific
approaches are derived by experts from the extracted features for a better
conclusion of their work. Mathematical procedures, like scientific methods, play an
important role in image analysis. The Bag of Visual Words (BoVW) [1, 2, 3] model is one of
them, and it helps to figure out how similar a group of images is. In the Bag of Visual
Words model, the images are characterised by a set of visual words, which are
subsequently aggregated into a histogram per image [4]. The difference between
histograms depicts the similarity among the images. The reweighting methodology known as
Term Frequency - Inverse Document Frequency (TF-IDF) [5] refines this procedure.
The overall weighting [6] for all words in each histogram is calculated before
reweighting. In the traditional way, the images are transformed into a matrix
called the cost matrix. It is constructed through two mathematical measures: the Euclidean
distance and the cosine distance. The main purpose of finding these distances is to detect
the similarity between the histograms. The histograms are normalized, both
distances are calculated, and a visual representation is generated. The two
mathematical methods are compared to see which one is more appropriate for checking
resemblance. The strategy identified as the optimal solution based on the findings aids
in digital-signature fraud detection, image processing, and image classification.
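The pipeline the abstract describes, TF-IDF reweighting followed by normalised-histogram comparison under both distances, can be sketched on toy data. The three visual-word histograms and the five-word vocabulary below are invented for illustration:

```python
import numpy as np

# Toy visual-word histograms for three images over a 5-word vocabulary.
H = np.array([[4., 0., 1., 3., 2.],    # image A
              [3., 1., 1., 2., 2.],    # image B, similar content to A
              [0., 5., 0., 0., 4.]])   # image C, different content

# TF-IDF reweighting: down-weight visual words that occur in many images.
df = (H > 0).sum(axis=0)               # document frequency of each word
idf = np.log(len(H) / df)              # inverse document frequency
H = H * idf

# L1-normalise each histogram so the distances are comparable.
H = H / H.sum(axis=1, keepdims=True)

def euclidean(h1, h2):
    return float(np.linalg.norm(h1 - h2))

def cosine_distance(h1, h2):
    return 1.0 - float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2)))

a, b, c = H
print(euclidean(a, b), euclidean(a, c))              # A is closer to B ...
print(cosine_distance(a, b), cosine_distance(a, c))  # ... under both measures
```

Note that the word occurring in all three images receives an IDF of zero and drops out entirely, which is the refinement TF-IDF contributes before the histogram distances are taken.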
A Glance Review on Data Science and its Teaching: Challenges and Solutions
Page: 80-91 (12)
Author: Srinivasa Rao Gundu, Charanarur Panem* and J. Vijaylaxmi
DOI: 10.2174/9789815124842123010008
PDF Price: $15
Abstract
The term “data science” has become more popular in recent years, with a
growing number of people embracing it. Only a small minority of people, on the other
hand, are able to offer a clear explanation of what the term refers to when it is used in
context. With no defined term, it is difficult for organizations devoted to the
collaboration, utilization, and application of Data Science to communicate and
understand one another.
As a result of technological advancements, it has become increasingly difficult to
define and practice Data Science in a way that is compatible with how it was previously
considered and understood.
Specifically, we could now set out to develop definitions of Data Science that are
representative of current academic and industrial interpretations and perceptions, map
these perspectives to newer domains of Data Science, and then determine whether or
not this mapping translates into an effective practical curriculum for academics.
Aspects that differentiate data science include how it is now used and how it is
projected to be used in the future; data science is also characterized by its ability to
forecast the future.
Optimization of Various Costs in Inventory Management using Neural Networks
Page: 92-104 (13)
Author: Prerna Sharma* and Bhim Singh
DOI: 10.2174/9789815124842123010009
PDF Price: $15
Abstract
Inventory optimisation is the process of maintaining the right quantity of
inventory to meet demand while minimising logistics costs and avoiding frequent
inventory problems, including stock-outs, overstocking, and backorders. A two-warehouse
system is considered here: one warehouse has a finite capacity and is referred to as the
owned warehouse (OW), located near the market, while the other has an unlimited
capacity and is referred to as the rented warehouse (RW), located away from the market.
Here, lowering the overall cost is the goal. Neural networks are employed in this work
to minimise the cost. The findings produced by the neural networks are compared with
those of a conventional mathematical model, and the results indicate that the neural
networks outperform the conventional mathematical model in optimising the outcomes.
Supervised machine learning algorithms such as neural networks are best understood in
the context of function approximation. The benefits and drawbacks of the two-warehouse
approach compared with the single-warehouse plan will also be covered. In this chapter,
we investigate cost optimisation using neural networks and compare the outcomes of the
two approaches.
Cyber Security in Data Science and its Applications
Page: 105-115 (11)
Author: M. Varalakshmi* and I. P. Thulasi
DOI: 10.2174/9789815124842123010010
PDF Price: $15
Abstract
The implementation of data science in cyber security, to help protect
against attacks and to improve the ability to counter cyber threats, has many
benefits. Indeed, data science has changed cyber security, and its impact has been
profound and transformative. Cyber security uses data science to keep digital devices,
services, systems, and software safe from cyberattacks. Here, we discuss cyber
security data science, its present-day uses in the cyber security field, and data-driven
intelligent management systems that can safeguard our systems from cyber-attacks.
Artificial Neural Networks for Data Processing: A Case Study of Image Classification
Page: 116-127 (12)
Author: Jayaraj Ramasamy*, R. N. Ravikumar and S. Shitharth
DOI: 10.2174/9789815124842123010011
PDF Price: $15
Abstract
An Artificial Neural Network (ANN) is a data processing paradigm inspired
by the way organic nervous systems, such as the brain, process data. The innovative
structure of the information processing system is a crucial component of this paradigm:
the network is made up of a huge number of highly interconnected processing elements
(neurons) that operate in parallel to solve a given problem. Neural networks handle
data in much the same manner as the human brain does. They are not programmed to
execute a specific task; like humans, an ANN learns by example. Through a learning
process, an ANN is trained for a specific application, such as pattern recognition or
data categorization. In biological systems, learning involves changes to the synaptic
connections between neurons, and the same is true for ANNs. Artificial Neural Networks
are used for classification, regression, and clustering. The stages of image processing
are classified as preprocessing, feature extraction, and classification: the ANN is
provided with the extracted features, and its output is the classification. This chapter
provides an overview of Artificial Neural Networks (ANN), their operation, and their
training. It also explains their applications and benefits. An Artificial Neural Network
has been used to classify the MNIST dataset.
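The learning-by-example loop described above, a forward pass followed by backpropagation of the error through the synaptic weights, can be sketched with a tiny NumPy network. A small synthetic task (XOR) stands in for image data here; the layer sizes, learning rate, and iteration count are illustrative choices, not the chapter's:

```python
import numpy as np

rng = np.random.default_rng(42)

# Tiny stand-in task: XOR, which requires at least one hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 sigmoid units and one sigmoid output unit.
W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, losses = 1.0, []
for _ in range(5000):
    # Forward pass through both layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).mean()))
    # Backward pass: propagate the error back through the weights
    # (the "changes to the synaptic connections" of the learning process).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(losses[0], losses[-1])  # training error falls as the network learns
```

The same structure, with more layers and units and with pixel intensities as inputs, is what an MNIST classifier uses; only the scale differs.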
Carbon Emission Assessment by Applying Clustering Technique to World’s Emission Datasets
Page: 128-143 (16)
Author: Nitin Jaglal Untwal*
DOI: 10.2174/9789815124842123010012
PDF Price: $15
Abstract
Greenhouse gas emissions mostly include carbon dioxide as the major
component. The CO2 level is increasing day by day, which is a great cause of worry for
the future of the world's environment. The reason why greenhouse gas levels increase in
the environment must be assessed and controlled. Greenhouse gases have a heat-trapping
capacity. The rise in GHG emissions is driven by a rise in numerous activities,
including transportation, power production, agriculture, business, and residential use,
which are the main drivers of the increase in GHG levels in the atmosphere.
Nitrous oxide, methane, and carbon dioxide are all part of the GHG portfolio.
Deforestation, traffic, and soil degradation all contribute to an increase in CO2. As a
result of burning biomass and urban trash, methane levels are also rising.
Chlorofluorocarbons are also rising due to refrigeration and industrial operations.
Keeping the above concerns in mind, the researcher decided to conduct the study
titled Carbon Emission Assessment by Applying Clustering Technique to World
Emission Datasets using Python Programming. The study considers a period of 269
years (1750-2019). The study is carried out in five steps: data fetching in Python
programming, feature engineering, standardization, and clustering. The study generates
six clusters. Cluster one contains 220 countries; cluster two includes Russia, France,
Germany, China, Europe (others), America (others), and Asia Pacific; cluster three
includes the United Kingdom; cluster four includes the United States; cluster five
includes the EU-28; and cluster six includes Malawi.
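The pipeline described (fetch the data, engineer features, standardise, cluster) can be sketched with a minimal k-means implementation. The two-dimensional synthetic "emission" data, the number of clusters, and the deterministic centre initialisation below are illustrative assumptions, not the study's:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical stand-in for per-country emission features:
# a large group of low emitters and a small group of high emitters.
low = rng.normal([1.0, 2.0], 0.2, size=(50, 2))
high = rng.normal([8.0, 9.0], 0.2, size=(10, 2))
X = np.vstack([low, high])

# Standardisation step: zero mean, unit variance per feature.
X = (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, iters=50):
    # Deterministic initialisation from k spread-out observations,
    # chosen here only for reproducibility of the sketch.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centre moves to the mean of its points.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

labels, centers = kmeans(X, k=2)
print(np.bincount(labels))  # sizes of the recovered clusters
```

On real emission datasets the same loop runs over many features and a larger k (six in the study); standardising first keeps features with large absolute ranges from dominating the distance.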
A Machine Learning Application to Predict Customer Churn: A Case in Indonesian Telecommunication Company
Page: 144-161 (18)
Author: Agus Tri Wibowo, Andi Chaerunisa Utami Putri, Muhammad Reza Tribosnia, Revalda Putawara and M. Mujiya Ulkhaq*
DOI: 10.2174/9789815124842123010013
PDF Price: $15
Abstract
This study aims to develop a churn prediction model which can assist
telecommunication companies in predicting the customers who are most likely to
churn. The model is developed by employing machine learning techniques on big data
platforms. Customer churn is one of the most critical issues, especially in high-investment
telecommunication companies. Accordingly, the companies are looking for
ways to identify customers likely to churn and take the necessary actions to reduce
churn. To accomplish the objective of the study, it first compares eight machine
learning techniques, i.e., ridge classifier, gradient boosting, adaptive boosting, bagging
classifier, k-nearest neighbour (kNN), decision tree, logistic regression, and random
forest. Using five performance evaluation metrics (i.e., accuracy, AUC score,
precision, recall, and F-score), kNN is selected since it outperforms the
other techniques. Second, the selected technique is used to predict the likelihood of
customers churning.
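The evaluation step can be sketched as follows: a plain NumPy k-nearest-neighbour classifier scored with accuracy, precision, recall, and F-score. The synthetic two-feature customer data and the choice of k below are illustrative, not the study's:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for customer features (e.g. tenure, monthly
# charges): churners and non-churners drawn from shifted Gaussians.
stay = rng.normal([2.0, 1.0], 0.5, size=(80, 2))
churn = rng.normal([4.0, 3.0], 0.5, size=(40, 2))
X = np.vstack([stay, churn])
y = np.array([0] * 80 + [1] * 40)

# Hold out every fourth customer for evaluation.
test_idx = np.arange(0, len(X), 4)
train_idx = np.setdiff1d(np.arange(len(X)), test_idx)

def knn_predict(X_train, y_train, X_test, k=5):
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(d)[:k]]     # labels of the k nearest
        preds.append(np.bincount(nearest).argmax())  # majority vote
    return np.array(preds)

pred = knn_predict(X[train_idx], y[train_idx], X[test_idx])
true = y[test_idx]

# The four threshold metrics named in the study (AUC needs scores, not labels).
tp = int(((pred == 1) & (true == 1)).sum())
fp = int(((pred == 1) & (true == 0)).sum())
fn = int(((pred == 0) & (true == 1)).sum())
accuracy = float((pred == true).mean())
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(accuracy, precision, recall, f1)
```

Recall matters most in this setting: a missed churner (false negative) is a lost customer, while a false positive only costs a retention offer.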
A State-Wise Assessment of Greenhouse Gases Emission in India by Applying K-mean Clustering Technique
Page: 162-176 (15)
Author: Nitin Jaglal Untwal*
DOI: 10.2174/9789815124842123010014
PDF Price: $15
Abstract
India is a vast country with variations in geography as well as in population
density. Pollution in India is increasing day by day, and greenhouse gas emissions
are on the rise due to various activities like agriculture, industry, power generation,
transportation, etc. Carbon dioxide (CO2), carbon monoxide (CO), and methane (CH4)
are the major elements of greenhouse gases. The emission of greenhouse gases poses
various threats to the environment and to health. The states of India have been under
development since independence, and various activities are on the rise. The states do not
have balanced growth as far as the industrial and agricultural sectors are concerned:
the powerhouses of industrial growth are the states of Maharashtra and Gujarat. The
population density is also unevenly scattered across India. The states contribute
differently to greenhouse gas emissions, and it is difficult for the government to make
category-wise policy for the control of these emissions. The classification of
states into different categories will help in the strategic formulation of policy and
strategy for different states depending on their greenhouse gas emissions and the per
capita analysis of these emissions. The per capita greenhouse gas emission is calculated
by dividing the total emissions by the total population. After analyzing the above
problem, the researchers decided to conduct the study titled A State-wise
Assessment of Greenhouse Gas Emission in India by Applying the K-mean Clustering
Technique using Python Programming. The research is carried out in five steps: feature
extraction and engineering, data extraction, standardizing and scaling, identification
of clusters, and cluster formation. The study period is 2020, and the data selected for
analysis is yearly state-wise data for the different Indian states, taken from the
Kaggle database. Findings: the k-means algorithm (cluster analysis using Python
programming) classifies the states of India into three clusters. Cluster one includes 16
states of India, viz. Arunachal, Assam, Bihar, Himachal Pradesh, Jammu & Kashmir,
Jharkhand, Madhya Pradesh, Manipur, Meghalaya, Mizoram, Odisha, Rajasthan,
Sikkim, Tripura, Uttar Pradesh, and Uttarakhand. Cluster two includes eight states of
India, viz. Andhra Pradesh, Goa, Gujarat, Karnataka, Kerala, Maharashtra, Tamil Nadu,
and West Bengal. Cluster three includes four states of India, viz. Haryana, Nagaland,
Punjab, and Chhattisgarh. The major contributors to greenhouse gas emissions are in
cluster three, the medium-range emission states are grouped in cluster two,
and the minimum-range emission states are included in cluster one.
Data Mining Techniques: New Avenues for Heart Disease Prediction
Page: 177-185 (9)
Author: Soma Das*
DOI: 10.2174/9789815124842123010015
PDF Price: $15
Abstract
The medical management sector assembles a large volume of unexposed
data on the health status of patients. At times this hidden data can be useful in
diagnosing diseases and making effective decisions. To provide an appropriate way
out and to plan a diagnostic system based on this information, the newest data mining
strategies are nowadays in use. In this study, a thorough review has been done
of an effective heart disease prediction system (EHDPS) designed with a neural
network for predicting the risk level of cardiovascular disease. The study
focused on the observation of various medical parameters, namely age, height, weight,
BMI, sex, blood pressure, cholesterol, and obesity. Based on this study, a concept map
has been designed of the ways to make predictions for individuals with heart disease
with the help of the EHDPS. The study assembled considerable information about the
multilayer perceptron neural network with back propagation as the algorithm for data
analysis. The current review may be significant in establishing knowledge of the
association between health factors and the risk level of heart disease. The study
also suggests means of early intervention and prevention of the medical emergencies
posed by the late detection of cardiovascular diseases, especially in the context of
post-COVID-19 complications.
Data Science and Healthcare
Page: 186-200 (15)
Author: Armel Djangone*
DOI: 10.2174/9789815124842123010016
PDF Price: $15
Abstract
Data science is often used as an umbrella term for various techniques
for extracting insights and knowledge from complex structured and unstructured data.
It often relies on a large amount of data (big data) and the application of different
mathematical methods, including computer vision, NLP (natural language
processing), and data mining techniques. Advances in data science have resulted in a
wider variety of algorithms specialized for different applications and industries, such
as healthcare, finance, marketing, supply chain management, and general
administration. Specifically, data science methods have shown promise in addressing
key healthcare challenges and helping healthcare practitioners and leaders make
data-driven decisions. This chapter focuses on healthcare issues and how data
science can help solve these issues. The chapter surveys different approaches to
defining data science and why any organization should use it, presents the different
skills required of an effective healthcare data scientist, and discusses healthcare
leaders' behaviors that impact their organizational processes.
Subject Index
Page: 201-205 (5)
Author: Biswadip Basu Mallik, Kirti Verma, Rahul Kar, Ashok Kumar Shaw and Sardar M. N. Islam (Naz)
DOI: 10.2174/9789815124842123010017
Introduction
Advanced Mathematical Applications in Data Science comprehensively explores the crucial role mathematics plays in the field of data science. Each chapter is contributed by scientists, researchers, and academicians. The 13 chapters cover a range of mathematical concepts utilized in data science, enabling readers to understand the intricate connection between mathematics and data analysis. The book covers diverse topics, including machine learning models, the Kalman filter, data modeling, artificial neural networks, clustering techniques, and more, showcasing the application of advanced mathematical tools for effective data processing and analysis. With a strong emphasis on real-world applications, the book offers a deeper understanding of the foundational principles behind data analysis and its numerous interdisciplinary applications. This reference is an invaluable resource for graduate students, researchers, academicians, and learners pursuing a research career in mathematical computing or completing advanced data science courses.

Key Features:
- Comprehensive coverage of advanced mathematical concepts and techniques in data science
- Contributions from established scientists, researchers, and academicians
- Real-world case studies and practical applications of mathematical methods
- Focus on diverse areas, such as image classification, carbon emission assessment, customer churn prediction, and healthcare data analysis
- In-depth exploration of data science's connection with mathematics, computer science, and artificial intelligence
- Scholarly references for each chapter
- Suitable for readers with high school-level mathematical knowledge, making it accessible to a broad audience in academia and industry