ifuseok committed on
Commit
e85b59a
•
1 Parent(s): 9c44b63

Create README.md

Files changed (1)
  1. README.md +64 -0
README.md ADDED
@@ -0,0 +1,64 @@
+ # Pretrained ELECTRA Language Model for Korean (bw-electra-base-discriminator)
+
+ ## Usage
+
+ ### Load Model and Tokenizer
+
+ ```python
+ from transformers import ElectraModel, TFElectraModel, ElectraTokenizer
+
+ # TensorFlow
+ model = TFElectraModel.from_pretrained("ifuseok/bw-electra-base-discriminator")
+
+ # PyTorch: from_tf=True loads the TensorFlow checkpoint (needs TensorFlow installed)
+ # model = ElectraModel.from_pretrained("ifuseok/bw-electra-base-discriminator", from_tf=True)
+
+ # Keep the original casing, as in the discriminator examples
+ tokenizer = ElectraTokenizer.from_pretrained("ifuseok/bw-electra-base-discriminator", do_lower_case=False)
+ ```
+
+ ### Tokenizer example
+ ```python
+ from transformers import ElectraTokenizer
+
+ tokenizer = ElectraTokenizer.from_pretrained("ifuseok/bw-electra-base-discriminator")
+ # Korean input, in English: "We are releasing the Big Wave ELECTRA model."
+ tokenizer.tokenize("[CLS] Big Wave ELECTRA 모델을 공개합니다. [SEP]")
+ ```
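
`ElectraTokenizer` uses the same WordPiece scheme as BERT: each word is split greedily into the longest pieces found in the vocabulary, and `##` marks a piece that continues a word. The sketch below illustrates that matching loop in plain Python. The `toy_vocab` set and the `wordpiece` helper are hypothetical names made up for illustration; the model's real vocabulary is different, and the library's implementation also handles normalization and punctuation splitting.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, NOT the vocabulary shipped with the model.
toy_vocab = {"big", "wave", "elect", "##ra"}

for word in ["big", "wave", "electra", "xyz"]:
    print(word, wordpiece(word, toy_vocab))
```

Words that are in the vocabulary come back whole, while other words are either split into a leading piece plus `##` continuations or replaced by the unknown token.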
+
+ ### Example using ElectraForPreTraining (Torch)
+ ```python
+ import torch
+ from transformers import ElectraForPreTraining, ElectraTokenizer
+
+ # from_tf=True loads the TensorFlow checkpoint into the PyTorch model
+ discriminator = ElectraForPreTraining.from_pretrained("ifuseok/bw-electra-base-discriminator", from_tf=True)
+ tokenizer = ElectraTokenizer.from_pretrained("ifuseok/bw-electra-base-discriminator", do_lower_case=False)
+
+ # Original sentence: "I don't want to do anything."
+ sentence = "아무것도 하기가 싫다."
+ # Last word swapped from "싫다" (dislike) to "좋다" (like)
+ fake_sentence = "아무것도 하기가 좋다."
+
+ fake_tokens = tokenizer.tokenize(fake_sentence)
+ fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
+
+ discriminator_outputs = discriminator(fake_inputs)
+ # Positive logit -> 1 (predicted replaced), otherwise 0 (predicted original)
+ predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)
+
+ # Drop the [CLS] and [SEP] positions so predictions align with fake_tokens
+ print(list(zip(fake_tokens, predictions.tolist()[0][1:-1])))
+ ```
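
The `predictions` line above maps each logit to a 0/1 flag: `sign` gives -1, 0 or +1, adding 1 and halving gives 0, 0.5 or 1, and rounding yields the flag, where 1 means the discriminator predicts that the token was replaced. Below is a minimal sketch of the same arithmetic in plain Python. The logits and token strings are made-up placeholders, not real model output, and `sign` and `to_flags` are hypothetical helper names.

```python
def sign(x):
    """Return -1.0, 0.0 or 1.0, like torch.sign applied to a single value."""
    return float((x > 0) - (x < 0))


def to_flags(logits):
    """Map logits to 0/1 flags: 1 where the logit is positive, else 0."""
    # Python's round, like torch.round, rounds 0.5 to the nearest even value (0),
    # so a logit of exactly 0 is flagged as 0.
    return [round((sign(x) + 1) / 2) for x in logits]


# Hypothetical logits for [CLS], three tokens and [SEP]; not real model output.
logits = [-3.1, -2.4, -0.8, 1.7, -4.0]
tokens = ["tok_a", "tok_b", "tok_c"]  # placeholder token strings

flags = to_flags(logits)
# Drop the first and last positions ([CLS] and [SEP]) before pairing with tokens.
aligned = list(zip(tokens, flags[1:-1]))
print(aligned)
```

In effect the expression is a threshold at zero, so `logits > 0` cast to a number would give the same flags.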
+
+ ### Example using ElectraForPreTraining (TensorFlow)
+ ```python
+ import tensorflow as tf
+ from transformers import TFElectraForPreTraining, ElectraTokenizer
+
+ discriminator = TFElectraForPreTraining.from_pretrained("ifuseok/bw-electra-base-discriminator")
+ tokenizer = ElectraTokenizer.from_pretrained("ifuseok/bw-electra-base-discriminator", do_lower_case=False)
+
+ # Original sentence: "I don't want to do anything."
+ sentence = "아무것도 하기가 싫다."
+ # Last word swapped from "싫다" (dislike) to "좋다" (like)
+ fake_sentence = "아무것도 하기가 좋다."
+
+ fake_tokens = tokenizer.tokenize(fake_sentence)
+ fake_inputs = tokenizer.encode(fake_sentence, return_tensors="tf")
+
+ discriminator_outputs = discriminator(fake_inputs)
+ # Positive logit -> 1 (predicted replaced), otherwise 0 (predicted original)
+ predictions = tf.round((tf.sign(discriminator_outputs[0]) + 1) / 2).numpy()
+
+ # Drop the [CLS] and [SEP] positions so predictions align with fake_tokens
+ print(list(zip(fake_tokens, predictions.tolist()[0][1:-1])))
+ ```