---
library_name: transformers
license: mit
base_model: roberta-base
tags:
- generated_from_trainer
model-index:
- name: roberta-student-fine-tuned
  results: []
language:
- en
metrics:
- exact_match
---

# roberta-student-fine-tuned

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on a dataset provided by Kim Taeuk (김태욱), NLP instructor at Hanyang University. The model was trained for multi-intent detection using the BlendX dataset, focusing on complex utterances containing multiple intents.

It achieves the following results on the evaluation set:
- Loss: 0.0053
- Exact Match Accuracy: 0.9075

## Model description

The model is based on roberta-base, a robust transformer model pretrained on a large corpus of English text. Fine-tuning was conducted on a specialized dataset for multi-intent detection in utterances with complex intent structures.

### Model Architecture
- **Base Model:** roberta-base
- **Task:** Multi-Intent Detection
- **Languages:** English

### Strengths
- High accuracy on the evaluation data.
- Capable of detecting multiple intents within a single utterance.

### Limitations
- Fine-tuned on a specific dataset; performance may vary on other tasks.
- Limited to English text.

## Intended uses & limitations

### Use Cases
- Multi-intent detection tasks such as customer service queries, virtual assistants, and dialogue systems.
- Academic research and educational projects.

### Limitations
- May require additional fine-tuning for domain-specific applications.
- Not designed for multilingual tasks.

## Training and evaluation data

The model was trained on the BlendX dataset, a multi-intent detection benchmark focusing on realistic combinations of user intents in task-oriented dialogues.

### Data Details:
The training data is derived from the BlendX dataset. While the full BlendX dataset contains instances with a varying number of intents (from 1 to 3), the simplified dataset used for this assignment only includes instances with exactly two intents.

## Dataset License and Source

The dataset used for training this model is licensed under the **[GNU General Public License v2](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)**.

### Important Notes:
- Any use, distribution, or modification of this dataset must comply with the terms of the GPL v2 license.
- The dataset source and its original license can be found in its [official GitHub repository](https://github.com/HYU-NLP/BlendX/).
- **Dataset File:** [Download Here](https://huggingface.co/datasets/Meruem/BlendX_simplified/resolve/main/BlendX_simplified.json)

### Dataset Format:
- **File Type:** JSON
- **Size:** 28,815 training samples, 1,513 validation samples
- **Data Fields:**
  - `split` (string): Indicates whether the sample belongs to the training or validation set.
  - `utterance` (string): The text input containing multiple intents.
  - `intent` (list of strings): The associated intents.
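### Loading the Dataset

The sketch below shows one way to load the JSON file linked above and split it using the fields documented in this card. The file layout (a flat list of records) and the exact values of the `split` field (`"train"` / `"validation"`) are assumptions, not confirmed by the dataset card.

```python
# Minimal sketch: download the simplified BlendX JSON file and split it by the
# `split` field. Layout and split values are assumptions (see note above).
import json
import urllib.request

URL = ("https://huggingface.co/datasets/Meruem/BlendX_simplified/"
       "resolve/main/BlendX_simplified.json")

with urllib.request.urlopen(URL) as response:
    records = json.load(response)

train = [r for r in records if r["split"] == "train"]
valid = [r for r in records if r["split"] == "validation"]
print(len(train), len(valid))  # expected: 28815 1513

sample = train[0]
print(sample["utterance"])  # an utterance combining two intents
print(sample["intent"])     # the list of associated intent labels
```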
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- warmup_steps: 200
- num_epochs: 20
- save_total_limit: 3
- weight_decay: 0.01
- eval_strategy: epoch
- save_strategy: epoch
- metric_for_best_model: eval_exact_match_accuracy
- load_best_model_at_end: True
- dataloader_pin_memory: True
- fp16: False
- greater_is_better: True

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Exact Match Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------------------:|
| 0.0723        | 1.0   | 2297  | 0.0720          | 0.0                  |
| 0.0576        | 2.0   | 4594  | 0.0516          | 0.0                  |
| 0.0328        | 3.0   | 6891  | 0.0264          | 0.0839               |
| 0.015         | 4.0   | 9188  | 0.0141          | 0.6907               |
| 0.0086        | 5.0   | 11485 | 0.0092          | 0.8771               |
| 0.0046        | 6.0   | 13782 | 0.0069          | 0.8929               |
| 0.0027        | 7.0   | 16079 | 0.0061          | 0.9002               |
| 0.0018        | 8.0   | 18376 | 0.0059          | 0.8936               |
| 0.0012        | 9.0   | 20673 | 0.0056          | 0.8995               |
| 0.0009        | 10.0  | 22970 | 0.0053          | 0.9075               |
| 0.0007        | 11.0  | 25267 | 0.0055          | 0.9055               |
| 0.0005        | 12.0  | 27564 | 0.0061          | 0.8976               |
| 0.0004        | 13.0  | 29861 | 0.0057          | 0.9061               |

### Framework versions

- Transformers 4.47.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0

## Improvement Perspectives

To achieve better results, several improvement strategies could be explored:

- **Model Capacity Expansion:** Test larger models such as roberta-large.
- **Batch Size Increase:** Use larger batches for more stable updates.
- **Gradient Accumulation:** Tune the number of update steps over which gradients are accumulated before each backward/update pass.
- **Learning Rate Management:**
  - Experiment with other schedules (e.g., polynomial decay) or dynamic adjustment.
  - Further reduce the learning rate.
- **Enhanced Preprocessing:**
  - Test data augmentation techniques such as random masking or synonym replacement.
  - Reduce the imbalance between intent categories.
  - Weight the loss according to how well each category is represented.
  - Use another dataset.
- **Longer Training Duration:** Increase the number of epochs and refine the stopping criteria for more precise convergence.
- **Model Ensembling:** Combine multiple models to improve prediction robustness.
- **Advanced Attention Mechanisms:** Test models using hierarchical attention or enhanced multi-head architectures.
- **Metric:** Choose the evaluation metric best suited to the problem (an exact-match sketch is given below).

These strategies require significant computational resources and extended training time but offer substantial potential for performance improvement.
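For reference, the exact-match accuracy reported above could be computed as in the following minimal sketch. It assumes the intents are encoded as multi-hot label vectors and that predictions are obtained by thresholding per-label sigmoid probabilities at 0.5; the `exact_match_accuracy` helper and the threshold are illustrative and not confirmed by the training code.

```python
# Minimal sketch of a multi-label exact-match metric.
# Assumptions (see note above): multi-hot labels, sigmoid threshold at 0.5.
import numpy as np

def exact_match_accuracy(logits: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of utterances whose full predicted intent set equals the gold set."""
    probs = 1.0 / (1.0 + np.exp(-logits))      # independent sigmoid per intent label
    preds = (probs >= threshold).astype(int)   # per-intent binary decisions
    return float((preds == labels).all(axis=1).mean())

# Toy example: 2 utterances, 4 possible intent labels, 2 gold intents each.
logits = np.array([[ 3.2, -2.1,  4.0, -1.5],
                   [-0.8,  2.5, -3.0,  1.1]])
labels = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 1]])
print(exact_match_accuracy(logits, labels))  # 1.0: both predicted intent sets match exactly
```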