ZennyKenny committed
Commit e72fe9d · verified · 1 Parent(s): 3992b65

improve docs

Files changed (1): app.py (+19 −10)
app.py CHANGED
@@ -152,29 +152,38 @@ with gr.Blocks() as demo:
     # High-level title and description
     gr.Markdown(
         """
-        # Interactive Gradient Boosting Demo
+        # Introduction to Gradient Boosting

        This Space demonstrates how to train a [GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html#gradientboostingclassifier) from **scikit-learn** on **tabular datasets** hosted on the [Hugging Face Hub](https://huggingface.co/datasets).

+        Gradient Boosting is an ensemble machine learning technique that combines many weak learners (usually small decision trees) in an iterative, stage-wise fashion to create a stronger overall model.
+        In each step, the algorithm fits a new weak learner to the current errors of the combined ensemble, effectively allowing the model to focus on the hardest-to-predict data points.
+        By repeatedly adding these specialized trees, Gradient Boosting can capture complex patterns and deliver high predictive accuracy, especially on tabular data.
+
+        **Put simply, Gradient Boosting makes a big deal out of small anomalies!**
+
        **Purpose**:
-        - Easy explore hyperparameters (_learning_rate, n_estimators, max_depth_) and quickly train an ML model on real data.
+        - Easily explore hyperparameters (_learning_rate, n_estimators, max_depth_) and quickly train an ML model on real data.
        - Visualise model performance via confusion matrix heatmap and a feature importance plot.

-        **How to Use**:
+        **Notes**:
+        - The dataset must have a **"train"** split with tabular columns (i.e., no nested structures).
+        - Large datasets may take time to download/train.
+        - The confusion matrix helps you see how predictions compare to ground-truth labels. The diagonal cells show correct predictions; off-diagonal cells indicate misclassifications.
+        - The feature importance plot shows which features the model relies on the most for its predictions.
+
+        ---
+
+        **Usage**:
        1. Select one of the suggested datasets from the dropdown _or_ enter any valid dataset from the [Hugging Face Hub](https://huggingface.co/datasets).
        2. Click **Load Columns** to retrieve the column names from the dataset's **train** split.
        3. Choose exactly _one_ **Label column** (the target) and one or more **Feature columns** (the inputs).
        4. Adjust hyperparameters (learning_rate, n_estimators, max_depth, test_size).
        5. Click **Train & Evaluate** to train a Gradient Boosting model and see its accuracy, feature importances, and confusion matrix.

-        ---
-        **Please Note**:
-        - The dataset must have a **"train"** split with tabular columns (i.e., no nested structures).
-        - Large datasets may take time to download/train.
-        - The confusion matrix helps you see how predictions compare to ground-truth labels. The diagonal cells show correct predictions; off-diagonal cells indicate misclassifications.
-        - The feature importance plot shows which features the model relies on the most for its predictions.
-
        You are now a machine learning engineer, congratulations 🤗
+
+        ---
        """
    )
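The train-and-evaluate flow the updated docstring describes can be sketched with plain scikit-learn. This is a minimal standalone sketch, not the Space's actual code: it uses scikit-learn's bundled iris data instead of a Hugging Face Hub dataset so it runs offline, and the variable names are illustrative rather than taken from app.py.

```python
# Minimal sketch of the workflow described in the docstring:
# split tabular data, train a GradientBoostingClassifier with the
# hyperparameters the Space exposes, then inspect accuracy, the
# confusion matrix, and feature importances.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in for a tabular Hub dataset (features X, label y).
X, y = load_iris(return_X_y=True)

# test_size mirrors the slider in the Space.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = GradientBoostingClassifier(
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
    n_estimators=100,    # number of boosting stages (weak learners)
    max_depth=3,         # depth of each individual tree
    random_state=42,
)
clf.fit(X_train, y_train)

preds = clf.predict(X_test)
acc = accuracy_score(y_test, preds)       # overall accuracy shown by the Space
cm = confusion_matrix(y_test, preds)      # diagonal = correct predictions
importances = clf.feature_importances_    # basis of the feature importance plot
```

Each added tree is fitted to the residual errors of the ensemble so far, which is the stage-wise behaviour the new docstring paragraph explains.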