VRLLab/TurkishBERTweet-Lora-HS

AliNajafi committed on Dec 29, 2023

Commit 3d3dfa2 • 1 Parent(s): e9121ac

Update README.md


Files changed (1)

README.md +4 -27


@@ -1,5 +1,4 @@
 ---
 language_creators:
 - unknown
 language:
@@ -16,9 +15,9 @@ task_categories:
 - unknown
 task_ids:
 - unknown
 ---
 #### Table of contents
 1. [Introduction](#introduction)
 2. [Main results](#results)
@@ -58,7 +57,6 @@ git clone [email protected]:ViralLab/TurkishBERTweet.git
 cd TurkishBERTweet
 python -m venv venv
 source venv/bin/activate
 pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
 pip install peft
 pip install transformers
@@ -68,10 +66,8 @@ pip install transformers
 ## <a name="preprocess"></a> Twitter Preprocessor
 ```python
 from Preprocessor import preprocess
 text = """Lab'ımıza "viral" adını verdik çünkü amacımız disiplinler arası sınırları aşmak ve aralarında yeni bağlantılar kurmak! 🔬 #ViralLab
 https://varollab.com/"""
 preprocessed_text = preprocess(text)
 print(preprocessed_text)
 ```
@@ -87,15 +83,11 @@ lab'ımıza "viral" adını verdik çünkü amacımız disiplinler arası sını
 import torch
 from transformers import AutoTokenizer, AutoModel
 from Preprocessor import preprocess
 tokenizer = AutoTokenizer.from_pretrained("VRLLab/TurkishBERTweet")
 turkishBERTweet = AutoModel.from_pretrained("VRLLab/TurkishBERTweet")
 text = """Lab'ımıza "viral" adını verdik çünkü amacımız disiplinler arası sınırları aşmak ve aralarında yeni bağlantılar kurmak! 💥🔬 #ViralLab #DisiplinlerArası #YenilikçiBağlantılar"""
 preprocessed_text = preprocess(text)
 input_ids = torch.tensor([tokenizer.encode(preprocessed_text)])
 with torch.no_grad():
     features = turkishBERTweet(input_ids)  # Models outputs are now tuples
 ```
@@ -109,16 +101,13 @@ from peft import (
     PeftModel,
     PeftConfig,
 )
 from transformers import (
     AutoModelForSequenceClassification,
     AutoTokenizer)
 from Preprocessor import preprocess
 peft_model = "VRLLab/TurkishBERTweet-Lora-SA"
 peft_config = PeftConfig.from_pretrained(peft_model)
 # loading Tokenizer
 padding_side = "right"
 tokenizer = AutoTokenizer.from_pretrained(
@@ -126,21 +115,17 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 if getattr(tokenizer, "pad_token_id") is None:
     tokenizer.pad_token_id = tokenizer.eos_token_id
 id2label_sa = {0: "negative", 2: "positive", 1: "neutral"}
 turkishBERTweet_sa = AutoModelForSequenceClassification.from_pretrained(
     peft_config.base_model_name_or_path, return_dict=True, num_labels=len(id2label_sa), id2label=id2label_sa
 )
 turkishBERTweet_sa = PeftModel.from_pretrained(turkishBERTweet_sa, peft_model)
 sample_texts = [
     "Viral lab da insanlar hep birlikte çalışıyorlar. hepbirlikte çalışan insanlar birbirlerine yakın oluyorlar.",
     "americanin diplatlari turkiyeye gelmesin 😤",
     "Mark Zuckerberg ve Elon Musk'un boks müsabakası süper olacak! 🥷",
     "Adam dun ne yediğini unuttu"
     ]
 preprocessed_texts = [preprocess(s) for s in sample_texts]
 with torch.no_grad():
     for s in preprocessed_texts:
@@ -161,16 +146,13 @@ from peft import (
     PeftModel,
     PeftConfig,
 )
 from transformers import (
     AutoModelForSequenceClassification,
     AutoTokenizer)
 from Preprocessor import preprocess
 peft_model = "VRLLab/TurkishBERTweet-Lora-HS"
 peft_config = PeftConfig.from_pretrained(peft_model)
 # loading Tokenizer
 padding_side = "right"
 tokenizer = AutoTokenizer.from_pretrained(
@@ -178,32 +160,26 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 if getattr(tokenizer, "pad_token_id") is None:
     tokenizer.pad_token_id = tokenizer.eos_token_id
 id2label_hs = {0: "No", 1: "Yes"}
 turkishBERTweet_hs = AutoModelForSequenceClassification.from_pretrained(
     peft_config.base_model_name_or_path, return_dict=True, num_labels=len(id2label_hs), id2label=id2label_hs
 )
 turkishBERTweet_hs = PeftModel.from_pretrained(turkishBERTweet_hs, peft_model)
 sample_texts = [
     "Viral lab da insanlar hep birlikte çalışıyorlar. hepbirlikte çalışan insanlar birbirlerine yakın oluyorlar.",
     "kasmayin artik ya kac kere tanik olduk bu azgin tehlikeli \u201cmultecilerin\u201d yaptiklarina? bir afgan taragindan kafasi tasla ezilip tecavuz edilen kiza da git boyle cihangir solculugu yap yerse?",
     ]
 preprocessed_texts = [preprocess(s) for s in sample_texts]
 with torch.no_grad():
     for s in preprocessed_texts:
         ids = tokenizer.encode_plus(s, return_tensors="pt")
-        label_id = best_model_hs(**ids).logits.argmax(-1).item()
         print(id2label_hs[label_id],":", s)
 ```
 ```output
 No : viral lab da insanlar hep birlikte çalışıyorlar. hepbirlikte çalışan insanlar birbirlerine yakın oluyorlar.
 Yes : kasmayin artik ya kac kere tanik olduk bu azgin tehlikeli “multecilerin” yaptiklarina? bir afgan taragindan kafasi tasla ezilip tecavuz edilen kiza da git boyle cihangir solculugu yap yerse?
 ```
@@ -221,3 +197,4 @@ Yes : kasmayin artik ya kac kere tanik olduk bu azgin tehlikeli “multecilerin
 ## Acknowledgments
 We thank [Fatih Amasyali](https://avesis.yildiz.edu.tr/amasyali) for providing access to Tweet Sentiment datasets from Kemik group.
 This material is based upon work supported by the Google Cloud Research Credits program with the award GCP19980904. We also thank TUBITAK (121C220 and 222N311) for funding this project.

@@ -1,5 +1,4 @@
 ---
 language_creators:
 - unknown
 language:
@@ -16,9 +15,9 @@ task_categories:
 - unknown
 task_ids:
 - unknown
+widget:
+- text: "bugün <mask> hissediyorum"
 ---
 #### Table of contents
 1. [Introduction](#introduction)
 2. [Main results](#results)
@@ -178,32 +160,26 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 if getattr(tokenizer, "pad_token_id") is None:
     tokenizer.pad_token_id = tokenizer.eos_token_id
 id2label_hs = {0: "No", 1: "Yes"}
 turkishBERTweet_hs = AutoModelForSequenceClassification.from_pretrained(
     peft_config.base_model_name_or_path, return_dict=True, num_labels=len(id2label_hs), id2label=id2label_hs
 )
 turkishBERTweet_hs = PeftModel.from_pretrained(turkishBERTweet_hs, peft_model)
 sample_texts = [
     "Viral lab da insanlar hep birlikte çalışıyorlar. hepbirlikte çalışan insanlar birbirlerine yakın oluyorlar.",
     "kasmayin artik ya kac kere tanik olduk bu azgin tehlikeli \u201cmultecilerin\u201d yaptiklarina? bir afgan taragindan kafasi tasla ezilip tecavuz edilen kiza da git boyle cihangir solculugu yap yerse?",
     ]
 preprocessed_texts = [preprocess(s) for s in sample_texts]
 with torch.no_grad():
     for s in preprocessed_texts:
         ids = tokenizer.encode_plus(s, return_tensors="pt")
+        label_id = turkishBERTweet_hs(**ids).logits.argmax(-1).item()
         print(id2label_hs[label_id],":", s)
 ```
 ```output
 No : viral lab da insanlar hep birlikte çalışıyorlar. hepbirlikte çalışan insanlar birbirlerine yakın oluyorlar.
 Yes : kasmayin artik ya kac kere tanik olduk bu azgin tehlikeli “multecilerin” yaptiklarina? bir afgan taragindan kafasi tasla ezilip tecavuz edilen kiza da git boyle cihangir solculugu yap yerse?
 ```
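
The one code change in this commit swaps `best_model_hs` for `turkishBERTweet_hs` in the prediction line. In the lines shown, only `turkishBERTweet_hs` is ever assigned, so the old name would presumably raise a `NameError`. The sketch below reproduces that failure and the argmax-to-label step without loading the real model. `fake_logits`, `predict_label`, and `old_line` are hypothetical helpers, and the scores are made up for illustration.

```python
# Hypothetical stand-ins so the sketch runs without torch or a model download.
id2label_hs = {0: "No", 1: "Yes"}


def fake_logits(text):
    """Pretend classifier: returns made-up scores [no_score, yes_score]."""
    return [0.2, 0.8] if "tehlikeli" in text else [0.9, 0.1]


def predict_label(logits, id2label):
    """Take the index of the largest logit (argmax) and map it to its label."""
    label_id = max(range(len(logits)), key=logits.__getitem__)
    return id2label[label_id]


def old_line():
    # Mirrors the removed line: the name `best_model_hs` is never assigned here.
    return best_model_hs  # noqa: F821


try:
    old_line()
    old_line_failed = False
except NameError:
    old_line_failed = True

for s in ["viral lab da insanlar hep birlikte", "bu azgin tehlikeli"]:
    print(predict_label(fake_logits(s), id2label_hs), ":", s)
```

With the real model, `turkishBERTweet_hs(**ids).logits.argmax(-1).item()` plays the role of the `max(...)` call above, and `id2label_hs` is the same lookup.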