improve docs
app.py
CHANGED
@@ -152,29 +152,38 @@ with gr.Blocks() as demo:
     # High-level title and description
     gr.Markdown(
         """
-        #
+        # Introduction to Gradient Boosting

         This Space demonstrates how to train a [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#gradientboostingclassifier) from **scikit-learn** on **tabular datasets** hosted on the [Hugging Face Hub](https://huggingface.co/datasets).

+        Gradient Boosting is an ensemble machine learning technique that combines many weak learners (usually small decision trees) in an iterative, stage-wise fashion to create a stronger overall model.
+        In each step, the algorithm fits a new weak learner to the current errors of the combined ensemble, effectively allowing the model to focus on the hardest-to-predict data points.
+        By repeatedly adding these specialized trees, Gradient Boosting can capture complex patterns and deliver high predictive accuracy, especially on tabular data.
+
+        **Put simply, Gradient Boosting makes a big deal out of small anomalies!**
+
         **Purpose**:
-        -
+        - Easily explore hyperparameters (_learning_rate, n_estimators, max_depth_) and quickly train an ML model on real data.
         - Visualise model performance via confusion matrix heatmap and a feature importance plot.

-        **
+        **Notes**:
+        - The dataset must have a **"train"** split with tabular columns (i.e., no nested structures).
+        - Large datasets may take time to download/train.
+        - The confusion matrix helps you see how predictions compare to ground-truth labels. The diagonal cells show correct predictions; off-diagonal cells indicate misclassifications.
+        - The feature importance plot shows which features the model relies on the most for its predictions.
+
+        ---
+
+        **Usage**:
         1. Select one of the suggested datasets from the dropdown _or_ enter any valid dataset from the [Hugging Face Hub](https://huggingface.co/datasets).
         2. Click **Load Columns** to retrieve the column names from the dataset's **train** split.
         3. Choose exactly _one_ **Label column** (the target) and one or more **Feature columns** (the inputs).
         4. Adjust hyperparameters (learning_rate, n_estimators, max_depth, test_size).
         5. Click **Train & Evaluate** to train a Gradient Boosting model and see its accuracy, feature importances, and confusion matrix.

-        ---
-        **Please Note**:
-        - The dataset must have a **"train"** split with tabular columns (i.e., no nested structures).
-        - Large datasets may take time to download/train.
-        - The confusion matrix helps you see how predictions compare to ground-truth labels. The diagonal cells show correct predictions; off-diagonal cells indicate misclassifications.
-        - The feature importance plot shows which features the model relies on the most for its predictions.
-
         You are now a machine learning engineer, congratulations 🤗
+
+        ---
         """
     )
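The workflow the new description documents (pick features and a label, split off a test set, train a `GradientBoostingClassifier`, then inspect accuracy, feature importances, and the confusion matrix) can be sketched outside the Space. This is a minimal illustration, not the Space's actual code: it substitutes scikit-learn's bundled iris data for a Hub dataset so it runs offline, and uses the same hyperparameters the UI exposes (`learning_rate`, `n_estimators`, `max_depth`, `test_size`).

```python
# Sketch of the train-and-evaluate step described in the docs,
# using sklearn's built-in iris data in place of a Hub dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size corresponds to the UI's train/test split slider.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The other three UI hyperparameters map directly onto the estimator.
clf = GradientBoostingClassifier(
    learning_rate=0.1, n_estimators=100, max_depth=3, random_state=42
)
clf.fit(X_train, y_train)

preds = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, preds))
# One importance weight per feature column; this is what the
# feature importance plot visualises.
print("feature importances:", clf.feature_importances_)
# Rows = true labels, columns = predicted labels;
# the diagonal holds the correct predictions.
print(confusion_matrix(y_test, preds))
```

The Space's heatmap and bar plot are just renderings of the last two arrays printed here.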