---
language: en
tags:
- Explain code
- Code Summarization
- Summarization

license: mit
---


# Gemini

For an in-depth look at our model and methods, please see our blog [here](https://www.describe-ai.com/gemini).

## Model description

Gemini is a transformer based on Google's T5 model. The model is pre-trained on approximately 800k code/description pairs and then fine-tuned on 10k synthetically generated, higher-level explanations. Gemini can summarize and explain short to medium code snippets in:

- Python
- JavaScript (mostly vanilla JS, though it can also handle frameworks such as React)
- Java
- Ruby
- Go

It outputs a description in English.

## Intended uses & limitations

Without any additional fine-tuning, Gemini can explain code in a sentence or two, and it typically performs best on Python and JavaScript. We recommend using Gemini for simple code explanation, documentation, or producing more synthetic data to improve its explanations.

### How to use

You can use this model directly with a pipeline for Text2Text generation, as shown below:

```python
from transformers import pipeline

# Load the model as a text2text-generation pipeline
summarizer = pipeline('text2text-generation', model='describeai/gemini-small')
code = "print('hello world!')"

# Generate a short description using beam search
response = summarizer(code, max_length=100, num_beams=3)
print("Summarized code: " + response[0]['generated_text'])
```

This should yield something along the lines of:

```
Summarized code: The following code is greeting the world.
```
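For the documentation and synthetic-data uses mentioned earlier, a small wrapper around the pipeline call can be convenient. The sketch below is illustrative only: `describe_snippets` is a hypothetical helper, not part of the model or of `transformers`, and it assumes `summarizer` behaves like the pipeline above, returning a list of dicts with a `generated_text` key.

```python
def describe_snippets(summarizer, snippets, max_length=100, num_beams=3):
    """Return code/description pairs for a list of code strings.

    `summarizer` is assumed to be a text2text-generation pipeline, or any
    callable with the same return shape (hypothetical helper, not an official API).
    """
    pairs = []
    for code in snippets:
        # For a single input string, the result is a list holding one dict
        response = summarizer(code, max_length=max_length, num_beams=num_beams)
        description = response[0]["generated_text"].strip()
        pairs.append({"code": code, "description": description})
    return pairs
```

With the `summarizer` from the example above, `describe_snippets(summarizer, ["print('hello world!')"])` would return a one-element list whose dict holds the original code and the generated description.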

### Model sizes

- Gemini: 770 Million Parameters
- Gemini-Small (this repo): 220 Million Parameters


### Limitations

Gemini may produce overly simplistic descriptions that don't cover the entire code snippet. We suspect that more training data would mitigate this and produce better results.
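One possible workaround (an assumption, not something evaluated here) is to split a longer Python file into its top-level functions and classes and describe each piece separately, so that every input stays short. The sketch below uses only the standard-library `ast` module; `split_functions` is a hypothetical helper name.

```python
import ast


def split_functions(source):
    """Return (name, source_text) for each top-level function or class.

    Hypothetical helper: it only splits the source and does not call the model.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        # Keep only top-level definitions; imports and loose statements are skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks
```

Each returned `source_text` can then be passed to the pipeline on its own. Whether this actually improves the descriptions would need to be checked on real code.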


### About Us

At Describe.ai, we are focused on building Artificial Intelligence systems that can understand language as well as humans. While this is a long path, we plan to contribute our findings to our API and to the open-source community.