---
library_name: transformers
license: mit
base_model: roberta-base
tags:
- generated_from_trainer
model-index:
- name: roberta-student-fine-tuned
  results: []
language:
- en
metrics:
- exact_match
---

# roberta-student-fine-tuned

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on a dataset provided by Kim Taeuk (김태욱), NLP instructor at Hanyang University. The model was trained for multi-intent detection using the BlendX dataset, focusing on complex utterances containing multiple intents.

It achieves the following results on the evaluation set:
- Loss: 0.0053
- Exact Match Accuracy: 0.9075

## Model description

The model is based on roberta-base, a robust transformer model pretrained on a large corpus of English text. Fine-tuning was conducted on a specialized dataset for multi-intent detection in utterances with complex intent structures.

### Model Architecture
- **Base Model:** roberta-base
- **Task:** Multi-Intent Detection
- **Languages:** English

### Strengths
- High accuracy on the evaluation data.
- Capable of detecting multiple intents within a single utterance.

### Limitations
- Fine-tuned on a specific dataset; performance may vary on other tasks.
- Limited to English text.

## Intended uses & limitations

### Use Cases
- Multi-intent detection tasks such as customer service queries, virtual assistants, and dialogue systems.
- Academic research and educational projects.

### Limitations
- May require additional fine-tuning for domain-specific applications.
- Not designed for multilingual tasks.

## Training and evaluation data

The model was trained on the BlendX dataset, a multi-intent detection benchmark focusing on realistic combinations of user intents in task-oriented dialogues.

### Data Details:
The training data is derived from the BlendX dataset. While the full BlendX dataset contains instances with a varying number of intents (from 1 to 3), the simplified dataset used for this assignment only includes instances with exactly two intents.

## Dataset License and Source

The dataset used for training this model is licensed under the **[GNU General Public License v2](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)**.

### Important Notes:
- Any use, distribution, or modification of this dataset must comply with the terms of the GPL v2 license.
- The dataset source and its original license can be found in its [official GitHub repository](https://github.com/HYU-NLP/BlendX/).
- **Dataset File:** [Download Here](https://huggingface.co/datasets/Meruem/BlendX_simplified/resolve/main/BlendX_simplified.json)

### Dataset Format:
- **File Type:** JSON
- **Size:** 28,815 training samples, 1,513 validation samples
- **Data Fields:**
  - `split` (string): Indicates whether the sample belongs to the training or validation set.
  - `utterance` (string): The text input containing multiple intents.
  - `intent` (list of strings): The associated intents.
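### Loading the Dataset

The sketch below shows one way to load the JSON file linked above and split it using the fields documented in this card. The file layout (a flat list of records) and the exact values of the `split` field (`"train"` / `"validation"`) are assumptions, not confirmed by the dataset card.

```python
# Minimal sketch: download the simplified BlendX JSON file and split it by the
# `split` field. Layout and split values are assumptions (see note above).
import json
import urllib.request

URL = ("https://huggingface.co/datasets/Meruem/BlendX_simplified/"
       "resolve/main/BlendX_simplified.json")

with urllib.request.urlopen(URL) as response:
    records = json.load(response)

train = [r for r in records if r["split"] == "train"]
valid = [r for r in records if r["split"] == "validation"]
print(len(train), len(valid))  # expected: 28815 1513

sample = train[0]
print(sample["utterance"])  # an utterance combining two intents
print(sample["intent"])     # the list of associated intent labels
```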
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- warmup_steps: 200
- num_epochs: 20
- save_total_limit: 3
- weight_decay: 0.01
- eval_strategy: epoch
- save_strategy: epoch
- metric_for_best_model: eval_exact_match_accuracy
- load_best_model_at_end: True
- dataloader_pin_memory: True
- fp16: False
- greater_is_better: True

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Exact Match Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------------------:|
| 0.0723        | 1.0   | 2297  | 0.0720          | 0.0                  |
| 0.0576        | 2.0   | 4594  | 0.0516          | 0.0                  |
| 0.0328        | 3.0   | 6891  | 0.0264          | 0.0839               |
| 0.015         | 4.0   | 9188  | 0.0141          | 0.6907               |
| 0.0086        | 5.0   | 11485 | 0.0092          | 0.8771               |
| 0.0046        | 6.0   | 13782 | 0.0069          | 0.8929               |
| 0.0027        | 7.0   | 16079 | 0.0061          | 0.9002               |
| 0.0018        | 8.0   | 18376 | 0.0059          | 0.8936               |
| 0.0012        | 9.0   | 20673 | 0.0056          | 0.8995               |
| 0.0009        | 10.0  | 22970 | 0.0053          | 0.9075               |
| 0.0007        | 11.0  | 25267 | 0.0055          | 0.9055               |
| 0.0005        | 12.0  | 27564 | 0.0061          | 0.8976               |
| 0.0004        | 13.0  | 29861 | 0.0057          | 0.9061               |

### Framework versions

- Transformers 4.47.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

## Improvement Perspectives

To achieve better results, several improvement strategies could be explored:

- **Model Capacity Expansion:** Test larger models such as roberta-large.
- **Batch Size Increase:** Use larger batches for more stable updates.
- **Gradient Accumulation:** Tune the number of update steps over which gradients are accumulated before each backward/update pass.
- **Learning Rate Management:**
  - Experiment with other schedules (e.g., polynomial decay) or dynamic adjustment.
  - Further reduce the learning rate.
- **Enhanced Preprocessing:**
  - Test data augmentation techniques such as random masking or synonym replacement.
  - Reduce the imbalance between intent categories.
  - Weight the loss according to how well each category is represented.
  - Use another dataset.
- **Longer Training Duration:** Increase the number of epochs and refine the stopping criteria for more precise convergence.
- **Model Ensembling:** Combine multiple models to improve prediction robustness.
- **Advanced Attention Mechanisms:** Test models using hierarchical attention or enhanced multi-head architectures.
- **Metric:** Choose the evaluation metric best suited to the problem (an exact-match sketch is given below).

These strategies require significant computational resources and extended training time but offer substantial potential for performance improvement.
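For reference, the exact-match accuracy reported above could be computed as in the following minimal sketch. It assumes the intents are encoded as multi-hot label vectors and that predictions are obtained by thresholding per-label sigmoid probabilities at 0.5; the `exact_match_accuracy` helper and the threshold are illustrative and not confirmed by the training code.

```python
# Minimal sketch of a multi-label exact-match metric.
# Assumptions (see note above): multi-hot labels, sigmoid threshold at 0.5.
import numpy as np

def exact_match_accuracy(logits: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of utterances whose full predicted intent set equals the gold set."""
    probs = 1.0 / (1.0 + np.exp(-logits))      # independent sigmoid per intent label
    preds = (probs >= threshold).astype(int)   # per-intent binary decisions
    return float((preds == labels).all(axis=1).mean())

# Toy example: 2 utterances, 4 possible intent labels, 2 gold intents each.
logits = np.array([[ 3.2, -2.1,  4.0, -1.5],
                   [-0.8,  2.5, -3.0,  1.1]])
labels = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1]])
print(exact_match_accuracy(logits, labels))  # 1.0: both predicted intent sets match exactly
```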