saritha5 committed on
Commit
23e666c
·
1 Parent(s): 556549f

Update README.md

  1. README.md +68 -0
README.md CHANGED
@@ -8,6 +8,7 @@ sdk_version: 1.15.2
8
  app_file: app.py
9
  pinned: false
10
  ---
 
11
 
12
  ### What is an Anomaly?
13
 
@@ -23,5 +24,72 @@ Our primary objective was to train a anomaly detection model that would help us
23
 
24
  So, in a superstore, anomalies can be a sudden upsurge in sales or a negative profit. Any amount of negative profit is an anomaly.
25
 
26
 
27
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
8
  app_file: app.py
9
  pinned: false
10
  ---
11
+ # Anomaly Detection eCommerce - Superstore
12
 
13
  ### What is an Anomaly?
14
 
 
24
 
25
  So, in a superstore, anomalies can be a sudden upsurge in sales or a negative profit. Any amount of negative profit is an anomaly.
26
 
27
+ ### Isolation Forest
28
+
29
+ Isolation Forests (IF), like Random Forests, are built on decision trees. Since there are no pre-defined labels here, it is an unsupervised model.
30
+
31
+ Isolation Forests are built on the observation that anomalies are the data points that are “few and different”.
32
+
33
+ In an Isolation Forest, randomly sub-sampled data is processed in a tree structure based on randomly selected features. The samples that travel deeper into the tree are less likely to be anomalies, as they required more cuts to isolate them. Conversely, samples that end up in shorter branches indicate anomalies, as it was easier for the tree to separate them from other observations.
34
+
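The "fewer cuts means more anomalous" idea can be illustrated with a small from-scratch sketch. This is not the project's code and not a full Isolation Forest: it works on a single feature, counts random splits directly instead of building reusable trees, and all names (`isolation_depth`, `average_depth`) and the synthetic sales values are assumptions made for illustration.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random cuts needed until `point` is alone in its partition."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:  # only duplicates left, cannot split further
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the cut that still contains the point.
    if point < split:
        side = [x for x in data if x < split]
    else:
        side = [x for x in data if x >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def average_depth(point, data, n_trees=200, sample_size=64, seed=0):
    """Average isolation depth of `point` over many random sub-samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        # Simplification: the scored point is added to every sub-sample.
        total += isolation_depth(point, sample + [point], rng)
    return total / n_trees


# Synthetic, made-up sales values clustered around 200.
gen = random.Random(42)
sales = [gen.gauss(200, 50) for _ in range(500)]

print("typical sale :", average_depth(200.0, sales))
print("extreme sale :", average_depth(5000.0, sales))
```

The extreme value is usually separated by the very first cut, so its average depth is much smaller than that of a value in the middle of the cluster.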
35
+ ### The Flow
36
+
37
+ The code starts with essential data visualisation to understand the dataset, which comprises 21 features and 9994 rows. This is followed by basic preprocessing and further visualisation to get a sense of how the data is distributed and to better understand the anomalies.
38
+
39
+ ![MyImage](https://raw.githubusercontent.com/whodoibenow/Anomaly-Detection---eCommerce/main/Screenshot%202022-09-05%20at%203.43.39%20PM.png)
40
+ ![Plot](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.43.49%20PM.png)
41
+
42
+ Further, we inspected a few random points to learn the format of the input data. We found a few data points with negative profit, which is an anomaly, and some extremely large sales, which can also be anomalies.
43
+
44
+ ![My Image](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.45.20%20PM.png)
45
+
46
+ After the exploratory analysis, we scaled the data with a Min-Max Scaler.
47
+
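Min-max scaling maps each feature to the range [0, 1] via `(x - min) / (max - min)`. The sketch below shows the arithmetic in plain Python; the helper name `min_max_scale` and the sample values are assumptions for illustration, not taken from the project's code.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up sample values for two features.
sales = [10.0, 122.184, 1000.184, 2000.184]
profit = [-50.0, 0.0, 25.0, 150.0]

scaled_sales = min_max_scale(sales)
scaled_profit = min_max_scale(profit)

print(scaled_sales)
print(scaled_profit)
```

Scaling each feature separately keeps a large-valued column such as sales from dominating distance-based detectors such as KNN.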
48
+ Next, we ran outlier detection using Cluster Based Local Outlier Factor (CBLOF), Histogram Based Outlier Detection (HBOS), K-Nearest Neighbours (KNN), and Isolation Forest (IF). We did not want to settle on one algorithm without testing the others.
49
+
50
+ ### Cluster Based Local Outlier Factor (CBLOF)
51
+
52
+ With Cluster Based Local Outlier Factor (CBLOF), we were able to detect 100 Outliers and 9894 Inliers.
53
+
54
+ ![My Image](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.44.15%20PM.png)
55
+
56
+ ### Histogram Based Outlier Detection (HBOS)
57
+
58
+ With Histogram Based Outlier Detection (HBOS), we were able to detect 90 Outliers and 9904 Inliers.
59
+
60
+ ![My Image](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.44.26%20PM.png)
61
+
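As a rough illustration of the histogram idea, the sketch below scores values of a single feature by how empty their histogram bin is. It is a simplified stand-in written for this README, not the implementation used in the project; the function name `hbos_scores`, the bin count, and the sample values are assumptions.

```python
import math


def hbos_scores(values, n_bins=10):
    """Score each value by the rarity of its histogram bin (higher = rarer)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data

    def bin_of(v):
        # Clamp so the maximum value falls into the last bin.
        return min(int((v - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for v in values:
        counts[bin_of(v)] += 1

    tallest = max(counts)
    # log(tallest / count): 0 for the fullest bin, larger for sparse bins.
    return [math.log(tallest / counts[bin_of(v)]) for v in values]


# Made-up sales: 50 ordinary values and one extreme value.
values = [100.0 + 2 * i for i in range(50)] + [5000.0]
scores = hbos_scores(values)

print("score of an ordinary value:", scores[0])
print("score of the extreme value:", scores[-1])
```

For several features, one such score would be computed per feature and the scores summed, which treats the features as independent.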
62
+ ### Isolation Forest (IF)
63
+
64
+ With Isolation Forest (IF), we were able to detect 100 Outliers and 9894 Inliers.
65
+
66
+ ![My Image](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.44.35%20PM.png)
67
+
68
+ ### K-Nearest Neighbours (KNN)
69
+
70
+ With K-Nearest Neighbours (KNN), we were able to detect 91 Outliers and 9903 Inliers.
71
+
72
+ ![My Image](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.44.50%20PM.png)
73
+
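A distance-based detector of this kind can be sketched as follows: each point is scored by the distance to its k-th nearest neighbour, and the highest-scoring fraction is flagged. The helper names (`knn_scores`, `flag_top_fraction`), the value of `k`, the contamination value, and the toy points are all assumptions for illustration.

```python
import math


def knn_scores(points, k=3):
    """Outlier score = distance from each point to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_top_fraction(scores, contamination):
    """Flag roughly the `contamination` fraction of points with the highest scores."""
    n_outliers = max(1, round(contamination * len(scores)))
    threshold = sorted(scores, reverse=True)[n_outliers - 1]
    return [s >= threshold for s in scores]


# Toy (sales, profit) pairs: a tight grid of 20 points plus one far-away point.
points = [(x / 10, y / 10) for x in range(5) for y in range(4)] + [(5.0, -3.0)]

scores = knn_scores(points, k=3)
flags = flag_top_fraction(scores, contamination=0.05)

print("flagged points:", [p for p, f in zip(points, flags) if f])
```

With 9994 rows, a contamination of 0.01 would correspond to `round(0.01 * 9994) = 100` flagged points, which is consistent with the CBLOF and IF counts reported here; whether that setting was actually used is an assumption.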
74
+
75
+ ### Predictions
76
+
77
+ We finally chose Isolation Forest as our model to predict outliers. We therefore saved the Isolation Forest model in pickle format to reuse it for predictions.
78
+
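The save-and-reload step can be sketched with Python's `pickle` module. The example below uses a plain dictionary as a hypothetical stand-in for the fitted model, and its `sales_threshold` of 900.0 is invented purely for illustration; it is not derived from the trained Isolation Forest. A fitted estimator object would be dumped and loaded with the same two calls.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model (NOT the real Isolation Forest).
model = {"name": "threshold-stand-in", "sales_threshold": 900.0}


def predict(model, sales):
    """Toy prediction rule: flag sales above the stand-in threshold."""
    return "ANOMALY" if sales > model["sales_threshold"] else "NOT ANOMALY"


# Save the model to disk in pickle format.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later (for example inside app.py), load it back and predict.
with open(path, "rb") as f:
    loaded = pickle.load(f)

for sales in (122.184, 2000.184, 1000.184):
    print(sales, "->", predict(loaded, sales))
```

Pickle files can execute arbitrary code when loaded, so only unpickle model files from a trusted source.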
79
+ We then tested the model on several arbitrary sales figures, as follows:
80
+
81
+
82
+ Here, we used 122.184 units of sales to check whether it is an anomaly. Model predicted: NOT ANOMALY
83
+ ![My Image](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.56.20%20PM.png)
84
+
85
+
86
+ Here, we used 2000.184 units of sales to check whether it is an anomaly. Model predicted: ANOMALY
87
+ ![My Image](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.56.32%20PM.png)
88
+
89
+
90
+ Here, we used 1000.184 units of sales to check whether it is an anomaly. Model predicted: ANOMALY
91
+ ![My Image](https://github.com/whodoibenow/Anomaly-Detection---eCommerce/raw/main/Screenshot%202022-09-05%20at%203.56.49%20PM.png)
92
+
93
+
94
 
95
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference