Update README.md
README.md
CHANGED
@@ -8,6 +8,7 @@ sdk_version: 1.15.2
app_file: app.py
pinned: false
---
# Anomaly Detection eCommerce - Superstore

### What is an Anomaly?

@@ -23,5 +24,72 @@ Our primary objective was to train a anomaly detection model that would help us

So, in a superstore, anomalies can be a sudden upsurge in sales or a negative profit. Any amount of negative profit is an anomaly.

### Isolation Forest

Isolation Forests (IF), similar to Random Forests, are built on decision trees. Since there are no pre-defined labels here, it is an unsupervised model.

Isolation Forests are built on the observation that anomalies are the data points that are “few and different”.

In an Isolation Forest, randomly sub-sampled data is processed in a tree structure based on randomly selected features. Samples that travel deeper into the tree are less likely to be anomalies, since more cuts were required to isolate them. Conversely, samples that end up in shorter branches indicate anomalies, since it was easier for the tree to separate them from the other observations.

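The link between path length and anomaly can be illustrated with a minimal pure-Python sketch. It is not the model used in this project: it works on a single feature, uses hypothetical sales values, and reports the raw average number of cuts rather than the normalised anomaly score a full Isolation Forest computes. The helper names `isolation_depth` and `mean_depth` are made up for illustration.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random cuts needed to isolate `point` from `data` (1-D values)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:  # all remaining values are identical, so no further split is possible
        return depth
    cut = rng.uniform(lo, hi)
    # Keep only the side of the cut that contains the point.
    if point < cut:
        side = [v for v in data if v < cut]
    else:
        side = [v for v in data if v >= cut]
    return isolation_depth(point, side, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, sample_size=64, seed=0):
    """Average isolation depth of `point` over many random sub-samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += isolation_depth(point, sample + [point], rng)
    return total / n_trees

# Hypothetical sales figures: most are near 100, and we probe one far-away value.
data_rng = random.Random(42)
sales = [data_rng.gauss(100, 15) for _ in range(500)]
typical_depth = mean_depth(100.0, sales)
extreme_depth = mean_depth(2000.0, sales)
print(f"typical sale: {typical_depth:.1f} cuts, extreme sale: {extreme_depth:.1f} cuts")
```

The extreme value is usually separated by the very first cut, while a typical value needs several cuts, which is the "shorter branch" behaviour described above.
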
### The Flow

The work started with essential data visualisation to understand the dataset, which comprises 21 features and 9994 rows. This was followed by basic preprocessing and further visualisation to get a sense of how the data is distributed and to better understand the anomalies.




Next, we inspected a few random data points to learn the format of the input data. We found a few data points with negative profit, which is an anomaly, and some extremely large sales, which can also be anomalies.



After the exploratory analysis, we transformed the data with a Min-Max Scaler.

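As a rough illustration of what min-max scaling does, the sketch below rescales each column to the range 0 to 1 with `(v - min) / (max - min)`. The function name `min_max_scale` and the sample columns are hypothetical; in practice a library scaler such as scikit-learn's `MinMaxScaler` would typically be used, and the source does not say which implementation was chosen.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sales and profit columns on very different scales.
sales = [10.0, 260.0, 510.0]
profit = [-50.0, 0.0, 50.0]
scaled_sales = min_max_scale(sales)
scaled_profit = min_max_scale(profit)
print(scaled_sales)
print(scaled_profit)
```

After scaling, both columns span the same range, so neither dominates distance-based or histogram-based detectors simply because of its units.
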
Next, we ran outlier detection using Cluster Based Local Outlier Factor (CBLOF), Histogram Based Outlier Score (HBOS), K-Nearest Neighbours (KNN), and Isolation Forest (IF). We did not want to settle on one algorithm without testing the others.

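Detectors of this kind generally assign an anomaly score to every row and then flag the highest-scoring fraction as outliers. The sketch below shows that last step under stated assumptions: the scores are random stand-ins, the function name `label_outliers` is hypothetical, and the 1% contamination value is a guess that happens to be close to the counts reported in this README, not a setting confirmed by the source.

```python
import random

def label_outliers(scores, contamination=0.01):
    """Flag the highest-scoring fraction of rows as outliers (1), the rest as inliers (0)."""
    n_outliers = round(len(scores) * contamination)
    if n_outliers == 0:
        return [0] * len(scores)
    # The threshold is the n-th largest score; ties at the threshold are flagged too.
    threshold = sorted(scores, reverse=True)[n_outliers - 1]
    return [1 if s >= threshold else 0 for s in scores]

rng = random.Random(0)
scores = [rng.random() for _ in range(9994)]  # stand-in anomaly scores, one per row
labels = label_outliers(scores, contamination=0.01)
print(labels.count(1), "outliers,", labels.count(0), "inliers")
```

Because every detector scores rows differently, the same contamination fraction can flag different rows, which is one reason to compare several algorithms.
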
### Cluster Based Local Outlier Factor (CBLOF)

With Cluster Based Local Outlier Factor (CBLOF), we detected 100 outliers and 9894 inliers.



### Histogram Based Outlier Score (HBOS)

With Histogram Based Outlier Score (HBOS), we detected 90 outliers and 9904 inliers.



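The idea behind a histogram-based score can be sketched in a few lines. This is a simplified, single-feature version with equal-width bins, written only for illustration: the full method builds one histogram per feature and sums the log scores, and the function name `hbos_scores` and the sales values are hypothetical.

```python
import math

def hbos_scores(values, n_bins=10):
    """Score each value by the rarity of its histogram bin (higher = more anomalous)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data
    # Assign each value to an equal-width bin; the maximum falls in the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    tallest = max(counts)
    # Normalise bin heights so the tallest is 1, then score with log(1 / height).
    return [math.log(tallest / counts[b]) for b in bins]

# Hypothetical sales: nine ordinary orders and one very large one.
sales = [95, 100, 102, 98, 101, 99, 103, 97, 100, 2000]
scores = hbos_scores(sales)
print([round(s, 2) for s in scores])
```

Values that share a crowded bin score 0, while the lone value in a sparse bin gets the highest score.
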
### Isolation Forest (IF)

With Isolation Forest (IF), we detected 100 outliers and 9894 inliers.



### K-Nearest Neighbours (KNN)

With K-Nearest Neighbours (KNN), we detected 91 outliers and 9903 inliers.



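One common way to use nearest neighbours for outlier detection is to score each point by its distance to its k-th nearest neighbour. The sketch below is a brute-force, single-feature version for illustration only; the function name `knn_scores`, the choice of `k=3`, and the sales values are assumptions, not details taken from the project.

```python
def knn_scores(values, k=3):
    """Score each value by its distance to its k-th nearest neighbour (higher = more isolated)."""
    scores = []
    for i, v in enumerate(values):
        # Distances to every other point, excluding the point itself.
        distances = sorted(abs(v - other) for j, other in enumerate(values) if j != i)
        scores.append(distances[k - 1])
    return scores

# Hypothetical sales: nine ordinary orders and one very large one.
sales = [95, 100, 102, 98, 101, 99, 103, 97, 100, 2000]
scores = knn_scores(sales, k=3)
print(scores)
```

Points inside the dense group have close neighbours and small scores, while the large order is far from everything else and scores highest. The brute-force loop is quadratic in the number of rows, so it is only meant for small examples.
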
### Predictions

We finally chose Isolation Forest as our model for predicting outliers, and saved the fitted model in pickle format for later use in predictions.

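Saving and reloading a model with pickle might look like the sketch below. The "model" here is a hypothetical stand-in (a plain dictionary with an illustrative cutoff, plus a `predict` helper), and the file name `model.pkl` is an assumption; a fitted estimator object would be dumped and loaded with the same two calls.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted detector; the cutoff is for illustration only.
model = {"name": "stand-in detector", "cutoff": 900.0}

def predict(model, sales):
    """Return the label for one sales value using the stand-in model."""
    return "ANOMALY" if sales > model["cutoff"] else "NOT ANOMALY"

# Save the model to disk.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, for example when the app starts, load it back and predict.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(predict(loaded, 122.184))
print(predict(loaded, 2000.184))
```

Loading the pickled object once at startup avoids retraining the model on every prediction request.
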
We used several arbitrary sales figures to test the model's anomaly predictions, as follows:

For this, we used 122.184 units of sales to check if it's an anomaly. Model predicted: NOT ANOMALY


For this, we used 2000.184 units of sales to check if it's an anomaly. Model predicted: ANOMALY


For this, we used 1000.184 units of sales to check if it's an anomaly. Model predicted: ANOMALY


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference