---
widget:
- text: "create array containing the maximum value of respective elements of array `[2, 3, 4]` and array `[1, 5, 2]`"
- text: "check if all elements in list `mylist` are identical"
- text: "enable debug mode on flask application `app`"
- text: "getting the length of `my_tuple`"
- text: 'find all files in directory "/mydir" with extension ".txt"'
---

# MarianCG: A TRANSFORMER MODEL FOR AUTOMATIC CODE GENERATION

MarianCG is a transformer model that generates code from natural language, addressing the code generation problem with highly accurate results. This work demonstrates the impact of using the Marian machine translation model for code generation: our implementation shows that a machine translation model can operate as a code generation model. With it we set a new state of the art on the CoNaLa dataset, reaching 10.2 exact match accuracy and a BLEU score of 34.43 on the code generation problem.
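
The exact match metric quoted above is simply the fraction of generated snippets that equal their reference snippet exactly. A minimal sketch in plain Python (the function name and sample strings are illustrative, not taken from the MarianCG codebase):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match their reference string exactly."""
    assert len(predictions) == len(references)
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)

# Illustrative example with made-up model outputs
preds = ["len(my_tuple)", "app.run(debug=True)", "max(a, b)"]
refs = ["len(my_tuple)", "app.run(debug=True)", "numpy.maximum(a, b)"]
print(exact_match_accuracy(preds, refs))  # 2 of 3 match exactly
```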

The MarianCG model, its implementation, the training code, and the generated output are available at this repository:
https://github.com/AhmedSSoliman/MarianCG-NL-to-Code


The CoNaLa dataset for code generation is available at
https://huggingface.co/datasets/AhmedSSoliman/CoNaLa-Large

This model is available on the Hugging Face Hub at https://huggingface.co/AhmedSSoliman/MarianCG-CoNaLa-Large
```python
# Load the model and tokenizer from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# model_name = "AhmedSSoliman/MarianCG-NL-to-Code"
model = AutoModelForSeq2SeqLM.from_pretrained("AhmedSSoliman/MarianCG-CoNaLa-Large")
tokenizer = AutoTokenizer.from_pretrained("AhmedSSoliman/MarianCG-CoNaLa-Large")

# Input (natural language) and output (Python code)
NL_input = "create array containing the maximum value of respective elements of array `[2, 3, 4]` and array `[1, 5, 2]`"
output = model.generate(**tokenizer(NL_input, padding="max_length", truncation=True, max_length=512, return_tensors="pt"))
output_code = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_code)
```
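
For comparison, the CoNaLa-style target for another of the widget prompts above, "check if all elements in list `mylist` are identical", can be written in plain Python. This snippet illustrates the kind of code the model aims to generate; it is not the model's actual output, and the helper name is hypothetical:

```python
def all_identical(mylist):
    """Return True if every element of mylist equals the first one."""
    return all(x == mylist[0] for x in mylist)

print(all_identical([2, 2, 2]))  # True
print(all_identical([2, 3, 2]))  # False
```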

This model is also available as a Gradio demo on Hugging Face Spaces: https://huggingface.co/spaces/AhmedSSoliman/MarianCG-CoNaLa-Large


---
Tasks:
- Translation
- Code Generation
- Text2Text Generation
- Text Generation
---