Type 2 diabetes mellitus is a chronic disease associated with serious complications, comorbidities and considerable costs. The number of people with diabetes mellitus is expected to increase by 40% in the next decade, due to prolonged life expectancy, the ageing of the population and developments in the health care sector.
Diagnosis of diabetes mellitus (DM) may be done either manually by a medical practitioner or by an automatic device. Each approach has benefits and drawbacks. The main advantage of manual diagnosis is that it needs no machine assistance for DM detection and relies on the specialist expertise of the medical professional. However, the symptoms of DM in its initial phase are often so subtle that even an experienced doctor cannot fully identify them. Thanks to advances in Machine Learning (ML) and Artificial Intelligence (AI), detecting and diagnosing the disease at an early stage with an automated program has become more feasible and efficient than manual DM recognition.
Our objective is to develop an efficient supervised Machine Learning model that predicts Type 2 diabetes mellitus with high accuracy using K-Nearest Neighbors (KNN).
Why KNN in particular for achieving our objective?
The KNN algorithm is one of the most popular classification algorithms and is widely used across different applications. K-Nearest Neighbor is an example of instance-based learning, in which the training dataset is stored so that a new, unclassified record can be classified simply by comparing it to the most similar records in the training set. A distance function is used to determine which members of the training set are closest to an unknown test instance. Also, because of its simplicity, KNN is easy to modify for more complicated classification problems. For instance, KNN is particularly well suited to objects that have many class labels.
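The idea described above can be sketched in a few lines of plain Python. This is only an illustrative, from-scratch version, not the implementation used in this article: the function name `knn_classify`, the choice of Euclidean distance, and the six two-feature records are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training records."""
    # Distance function: Euclidean distance from the query to every stored record.
    distances = [
        (math.dist(record, query), label)
        for record, label in zip(train_X, train_y)
    ]
    # Keep the k closest records (smallest distances first).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those neighbours is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature records, made up purely for illustration.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["negative", "negative", "negative", "positive", "positive", "positive"]

print(knn_classify(train_X, train_y, (1.8, 1.6), k=3))  # prints "negative"
print(knn_classify(train_X, train_y, (6.2, 6.4), k=3))  # prints "positive"
```

Note that there is no training step beyond storing the records: all the work happens at prediction time. With two classes, an odd k avoids tied votes.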
Developing the model -
Step 1 : Data Collection
We will be using the Pima Indians Diabetes Database. The data set can be found on Kaggle:
This dataset consists of several medical predictor (independent) variables and one target (dependent) variable, Outcome. The independent variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
Step 2 : Data Preparation
In this step we replace any missing values that may be present in the data set with appropriate values and also format the data according to our requirements as shown below:
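One possible sketch of this kind of clean-up, using pandas, is given here. It rests on assumptions that may differ from the actual preparation in the repository: that a zero in a measurement column such as Glucose, BloodPressure or BMI stands for "not recorded" (a common convention for this dataset), and that the column median is an appropriate replacement. The four rows are made up for illustration and are not real patient data.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame that borrows a few of the dataset's column names.
df = pd.DataFrame({
    "Glucose": [148, 85, 0, 89],
    "BloodPressure": [72, 66, 64, 0],
    "BMI": [33.6, 26.6, 23.3, 0.0],
    "Outcome": [1, 0, 1, 0],
})

# Assumption: a zero in these columns means the value was not recorded.
zero_as_missing = ["Glucose", "BloodPressure", "BMI"]

cleaned = df.copy()
for col in zero_as_missing:
    # Mark zeros as missing, then fill them with the median of the observed values.
    observed = cleaned[col].replace(0, np.nan)
    cleaned[col] = observed.fillna(observed.median())

print(cleaned)
```

The median is used here because it is less sensitive to extreme values than the mean; either choice is a judgment call. Columns where zero is a legitimate value (for example a pregnancy count or the Outcome label) are deliberately left untouched.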
Step 3 : Model Selection
The model selected is the KNN algorithm. We import the libraries required for the model before loading the data, so that the implementation can proceed quickly. The full list of libraries used can be found in the GitHub repository linked at the end of the article.
Step 4 : Model Implementation
We randomly split the data set into training and test sets, fit the model on the training set, and calculate the accuracy on both sets for each value of k from 1 to 10.
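A minimal sketch of this loop follows, assuming scikit-learn is the library in use (the article does not name it here). To keep the example runnable without the Kaggle file, it generates a synthetic eight-feature data set with `make_classification`; the 75/25 split, the stratification and the `random_state` values are likewise assumptions, and the accuracies it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; swap in the real feature columns and the Outcome column.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Random split into training and test sets (assumed 75% / 25%).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

train_accuracy = []
test_accuracy = []
for k in range(1, 11):
    # Fit on the training set only, then score on both sets.
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_accuracy.append(model.score(X_train, y_train))
    test_accuracy.append(model.score(X_test, y_test))

for k, (train_acc, test_acc) in enumerate(zip(train_accuracy, test_accuracy), start=1):
    print(f"k={k:2d}  train={train_acc:.3f}  test={test_acc:.3f}")
```

At k = 1 the training accuracy is perfect because every training point is its own nearest neighbour, which is why the test accuracy, not the training accuracy, is the figure to compare across values of k.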
For easier visualization, the training and test accuracies are plotted against the values of k that were tried.
Step 5 : Model Evaluation
The graph shows that the model reaches its best accuracy at k = 9, so we calculate the training set accuracy and test set accuracy for that value of k.
Disease diagnosis is one of the successful applications of Machine Learning. Detecting this serious disease at an early stage, combined with proper medication, helps reduce the rate of diabetes.
Hence, we developed an efficient supervised Machine Learning model for predicting Type 2 diabetes mellitus and experimentally tested its accuracy for different values of k in the K-nearest neighbor model on the Pima Indians Diabetes dataset. The highest accuracy was 79%, indicating the competence of the model.
The GitHub Repository link for this model :