Create training_data_source.txt
--Training Data Set Information--
Sourced from https://www.kaggle.com/datasets/mathchi/diabetes-data-set?resource=download
About Dataset
Context
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict, based on diagnostic measurements, whether a patient has diabetes.

Content
Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skin fold thickness (mm)
Insulin: 2-hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 or 1)
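The column list above maps directly onto a typed record. The following is a minimal sketch, assuming the data ships as a CSV file whose header row uses exactly these nine names. The description does not state the file layout, so `parse_rows`, the choice of which columns are decimals, and the sample row are all illustrative assumptions.

```python
import csv
import io

# Column names in the order given in the description above.
COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]
# Assumption: these two columns hold decimals; the rest are whole numbers.
FLOAT_COLUMNS = {"BMI", "DiabetesPedigreeFunction"}


def parse_rows(handle):
    """Yield one dict per CSV row with every field converted to a number."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        record = {}
        for name in COLUMNS:
            value = float(row[name])
            record[name] = value if name in FLOAT_COLUMNS else int(value)
        # The description says the class variable is 0 or 1.
        if record["Outcome"] not in (0, 1):
            raise ValueError(f"unexpected Outcome: {record['Outcome']}")
        yield record


# Made-up sample row for illustration only; not taken from the dataset.
SAMPLE_CSV = ",".join(COLUMNS) + "\n2,120,70,30,100,28.5,0.45,33,0\n"
records = list(parse_rows(io.StringIO(SAMPLE_CSV)))
print(records[0])
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.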
Sources:
(a) Original owners: National Institute of Diabetes and Digestive and
    Kidney Diseases
(b) Donor of database: Vincent Sigillito ([email protected])
    Research Center, RMI Group Leader
    Applied Physics Laboratory
    The Johns Hopkins University
    Johns Hopkins Road
    Laurel, MD 20707
    (301) 953-6231
(c) Date received: 9 May 1990

Past Usage:
1. Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C., &
   Johannes, R. S. (1988). Using the ADAP learning algorithm to forecast
   the onset of diabetes mellitus. In Proceedings of the Symposium
   on Computer Applications and Medical Care (pp. 261-265). IEEE
   Computer Society Press.

The diagnostic, binary-valued variable investigated is whether the
patient shows signs of diabetes according to World Health Organization
criteria (i.e., if the 2-hour post-load plasma glucose was at least
200 mg/dl at any survey examination or if found during routine medical
care). The population lives near Phoenix, Arizona, USA.

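The labelling rule quoted above can be written as a small predicate. This is only a sketch of the rule as worded here: the function and parameter names are hypothetical, and it is not a way to recompute the class variable from the recorded glucose attribute, since the description does not say that attribute is the reading the label was based on.

```python
# Threshold taken from the description: at least 200 mg/dl, 2 hours post-load.
WHO_TWO_HOUR_THRESHOLD_MG_DL = 200


def shows_signs_of_diabetes(two_hour_glucose_readings, found_in_routine_care=False):
    """Return 1 if any survey reading reaches the threshold, or if the
    condition was found during routine medical care; otherwise return 0."""
    reached = any(
        reading >= WHO_TWO_HOUR_THRESHOLD_MG_DL
        for reading in two_hour_glucose_readings
    )
    return 1 if (reached or found_in_routine_care) else 0
```

Passing all survey readings for one patient mirrors the "at any survey examination" wording: a single qualifying reading is enough.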
Results: Their ADAP algorithm makes a real-valued prediction between
0 and 1. This was transformed into a binary decision using a cutoff of
0.448. Using 576 training instances, the sensitivity and specificity
of their algorithm were both 76% on the remaining 192 instances.
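The evaluation described above (threshold a real-valued score, then measure sensitivity and specificity on held-out instances) can be sketched as follows. This does not implement ADAP itself. The helper names are hypothetical, the scores and labels are made up, and treating a score exactly equal to the cutoff as positive is an assumption the description does not settle.

```python
CUTOFF = 0.448  # cutoff reported in the description


def binarize(scores, cutoff=CUTOFF):
    """Map real-valued predictions in [0, 1] to 0/1 decisions.

    Assumption: a score equal to the cutoff counts as positive.
    """
    return [1 if score >= cutoff else 0 for score in scores]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores and labels for illustration only.
scores = [0.9, 0.3, 0.5, 0.1]
labels = [1, 1, 0, 0]
predictions = binarize(scores)
sens, spec = sensitivity_specificity(labels, predictions)
print(predictions, sens, spec)
```

Sensitivity is the share of true positives among actual positives, and specificity is the share of true negatives among actual negatives. Both are undefined (division by zero here) if the evaluation set lacks one of the two classes.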
Relevant Information:
Several constraints were placed on the selection of these instances from
a larger database. In particular, all patients here are females at
least 21 years old of Pima Indian heritage. ADAP is an adaptive learning
routine that generates and executes digital analogs of perceptron-like
devices. It is a unique algorithm; see the paper for details.
Number of Instances: 768
|
55 |
+
Number of Attributes: 8 plus class
|
56 |
+
For Each Attribute: (all numeric-valued)
|
57 |
+
Number of times pregnant
|
58 |
+
Plasma glucose concentration a 2 hours in an oral glucose tolerance test
|
59 |
+
Diastolic blood pressure (mm Hg)
|
60 |
+
Triceps skin fold thickness (mm)
|
61 |
+
2-Hour serum insulin (mu U/ml)
|
62 |
+
Body mass index (weight in kg/(height in m)^2)
|
63 |
+
Diabetes pedigree function
|
64 |
+
Age (years)
|
65 |
+
Class variable (0 or 1)
|
66 |
+
Missing Attribute Values: Yes
|
67 |
+
Class Distribution: (class value 1 is interpreted as "tested positive for
|
68 |
+
diabetes")
|
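The description states that missing attribute values exist but not how they are encoded. A commonly reported convention for this dataset is that a zero in a physiological measurement stands for "not recorded". The sketch below adopts that convention as an explicit assumption. The column selection, the helper names, and the sample records are illustrative and not taken from the file.

```python
# Assumption: a zero in any of these columns marks a missing measurement.
ZERO_MEANS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def mark_missing(record):
    """Return a copy of the record with assumed-missing zeros replaced by None."""
    return {
        name: (None if name in ZERO_MEANS_MISSING and value == 0 else value)
        for name, value in record.items()
    }


def count_missing(records):
    """Count, per column, how many records have an assumed-missing value."""
    counts = {name: 0 for name in ZERO_MEANS_MISSING}
    for record in map(mark_missing, records):
        for name in ZERO_MEANS_MISSING:
            if record[name] is None:
                counts[name] += 1
    return counts


# Made-up records for illustration only.
sample = [
    {"Pregnancies": 0, "Glucose": 110, "BloodPressure": 0, "SkinThickness": 0,
     "Insulin": 0, "BMI": 31.2, "DiabetesPedigreeFunction": 0.3, "Age": 25, "Outcome": 0},
    {"Pregnancies": 3, "Glucose": 140, "BloodPressure": 80, "SkinThickness": 28,
     "Insulin": 0, "BMI": 0, "DiabetesPedigreeFunction": 0.6, "Age": 41, "Outcome": 1},
]
missing_counts = count_missing(sample)
print(missing_counts)
```

Zeros in Pregnancies and Outcome are left untouched because zero is a legitimate value for both.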