jrosenzw committed on
Commit
282083d
1 Parent(s): 0b4ab86

Create training_data_source.txt

  1. training_data_source.txt +68 -0
training_data_source.txt ADDED
@@ -0,0 +1,68 @@
+ --Training Data Set Information--
+ Sourced from https://www.kaggle.com/datasets/mathchi/diabetes-data-set?resource=download
+ About Dataset
+ Context
+ This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict based on diagnostic measurements whether a patient has diabetes.
+
+ Content
+ Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
+
+ Pregnancies: Number of times pregnant
+ Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
+ BloodPressure: Diastolic blood pressure (mm Hg)
+ SkinThickness: Triceps skin fold thickness (mm)
+ Insulin: 2-Hour serum insulin (mu U/ml)
+ BMI: Body mass index (weight in kg/(height in m)^2)
+ DiabetesPedigreeFunction: Diabetes pedigree function
+ Age: Age (years)
+ Outcome: Class variable (0 or 1)
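The attribute list above can be turned into a small header and type check. The following is a minimal sketch, assuming the data arrives as CSV text whose header uses exactly the attribute names listed above; the helper name `read_rows` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Column names as listed in the attribute description above.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def read_rows(text):
    """Parse CSV text, check the header, and convert every field to float."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for record in reader:
        row = {name: float(record[name]) for name in EXPECTED_COLUMNS}
        # The class variable is documented as 0 or 1.
        if row["Outcome"] not in (0.0, 1.0):
            raise ValueError(f"Outcome must be 0 or 1, got {row['Outcome']}")
        rows.append(row)
    return rows


# Made-up rows for illustration only; these are NOT records from the dataset.
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,100,28.5,0.35,30,0
4,160,80,30,150,33.1,0.60,45,1
"""

rows = read_rows(SAMPLE)
print(len(rows), rows[0]["Glucose"])
```

Failing early on an unexpected header keeps a renamed or reordered column from silently shifting values into the wrong attribute.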
+ Sources:
+ (a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases
+ (b) Donor of database: Vincent Sigillito ([email protected])
+ Research Center, RMI Group Leader
+ Applied Physics Laboratory
+ The Johns Hopkins University
+ Johns Hopkins Road
+ Laurel, MD 20707
+ (301) 953-6231
+ (c) Date received: 9 May 1990
+
+ Past Usage:
+ 1. Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C., &
+ Johannes, R. S. (1988). Using the ADAP learning algorithm to forecast
+ the onset of diabetes mellitus. In Proceedings of the Symposium
+ on Computer Applications and Medical Care (pp. 261-265). IEEE
+ Computer Society Press.
+
+ The diagnostic, binary-valued variable investigated is whether the
+ patient shows signs of diabetes according to World Health Organization
+ criteria (i.e., if the 2 hour post-load plasma glucose was at least
+ 200 mg/dl at any survey examination or if found during routine medical
+ care). The population lives near Phoenix, Arizona, USA.
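The labeling rule quoted above can be written as a short predicate. This is only a sketch of the rule as worded here: the function name, its parameters, and the example readings are hypothetical, and it makes no claim about how the original study recorded its examinations.

```python
# Threshold quoted in the text above: 2 hour post-load plasma glucose, in mg/dl.
WHO_GLUCOSE_THRESHOLD_MG_DL = 200


def shows_signs_of_diabetes(survey_glucose_mg_dl, found_in_routine_care=False):
    """Return 1 if any survey reading reaches the threshold, or if diabetes
    was found during routine medical care; otherwise return 0."""
    reached = any(
        value >= WHO_GLUCOSE_THRESHOLD_MG_DL for value in survey_glucose_mg_dl
    )
    return 1 if (reached or found_in_routine_care) else 0


# Made-up readings for illustration only.
print(shows_signs_of_diabetes([140, 185]))        # no reading reaches 200
print(shows_signs_of_diabetes([140, 200]))        # one reading is at least 200
print(shows_signs_of_diabetes([140], True))       # found during routine care
```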
+
+ Results: Their ADAP algorithm makes a real-valued prediction between
+ 0 and 1. This was transformed into a binary decision using a cutoff of
+ 0.448. Using 576 training instances, the sensitivity and specificity
+ of their algorithm were both 76% on the remaining 192 instances.
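To make the evaluation above concrete, the sketch below thresholds real-valued scores at 0.448 and computes sensitivity and specificity. It does not implement ADAP. The scores and labels are made up, the function names are hypothetical, and treating a score exactly equal to the cutoff as positive is an assumption, since the text does not say which side the boundary falls on.

```python
CUTOFF = 0.448                  # cutoff quoted in the text above
N_TRAIN, N_TEST = 576, 192      # split sizes quoted in the text above
N_TOTAL = 768                   # number of instances in the dataset


def to_binary(scores, cutoff=CUTOFF):
    """Map real-valued predictions in [0, 1] to 0/1 decisions.
    Assumption: a score equal to the cutoff counts as positive."""
    return [1 if score >= cutoff else 0 for score in scores]


def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores and labels for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.5, 0.3, 0.2, 0.6, 0.1]
predictions = to_binary(scores)
sensitivity, specificity = sensitivity_specificity(labels, predictions)
print(predictions, sensitivity, specificity)
```

Both metrics are reported because the cutoff trades one against the other: lowering it catches more true positives at the cost of more false positives.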
+ Relevant Information:
+ Several constraints were placed on the selection of these instances from
+ a larger database. In particular, all patients here are females at
+ least 21 years old of Pima Indian heritage. ADAP is an adaptive learning
+ routine that generates and executes digital analogs of perceptron-like
+ devices. It is a unique algorithm; see the paper for details.
+ Number of Instances: 768
+ Number of Attributes: 8 plus class
+ For Each Attribute: (all numeric-valued)
+ Number of times pregnant
+ Plasma glucose concentration at 2 hours in an oral glucose tolerance test
+ Diastolic blood pressure (mm Hg)
+ Triceps skin fold thickness (mm)
+ 2-Hour serum insulin (mu U/ml)
+ Body mass index (weight in kg/(height in m)^2)
+ Diabetes pedigree function
+ Age (years)
+ Class variable (0 or 1)
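The body mass index attribute in the list above is defined by a formula, weight in kg divided by the square of height in m. A minimal sketch of that formula follows; the function name and the example values are hypothetical, and the dataset itself stores only the resulting BMI, not weight or height.

```python
def body_mass_index(weight_kg, height_m):
    """BMI as defined above: weight in kg / (height in m)^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Made-up measurements for illustration only.
print(round(body_mass_index(70.0, 1.75), 2))
```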
+ Missing Attribute Values: Yes
+ Class Distribution: (class value 1 is interpreted as "tested positive for diabetes")
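The description says only that missing attribute values exist, not how they are encoded. The sketch below assumes, as is commonly reported for this dataset but not stated in this file, that a 0 in a measurement where zero is physiologically implausible stands for a missing value. The column choice, the function names, and the sample rows are all hypothetical.

```python
# Assumption (not stated in this file): a 0 in these columns means "missing".
ZERO_MEANS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def mark_missing(rows, columns=ZERO_MEANS_MISSING):
    """Return copies of the rows with assumed-missing zeros replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for name in columns:
            if new_row[name] == 0:
                new_row[name] = None
        cleaned.append(new_row)
    return cleaned


def count_missing(rows, columns=ZERO_MEANS_MISSING):
    """Count None entries per column after mark_missing has been applied."""
    return {name: sum(1 for row in rows if row[name] is None) for name in columns}


# Made-up rows for illustration only; these are NOT records from the dataset.
sample_rows = [
    {"Glucose": 120, "BloodPressure": 70, "SkinThickness": 0, "Insulin": 0, "BMI": 28.5},
    {"Glucose": 160, "BloodPressure": 0, "SkinThickness": 30, "Insulin": 150, "BMI": 33.1},
]
missing_counts = count_missing(mark_missing(sample_rows))
print(missing_counts)
```

Pregnancies and Outcome are left out of the column list because 0 is a legitimate value for both.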