language: en
tags:
- Explain code
- Code Summarization
- Summarization
license: mit
Gemini
For an in-depth understanding of our model and methods, please see our blog here
Model description
Gemini is a transformer based on Google's T5 model. The model is pre-trained on approximately 800k code/description pairs and then fine-tuned on 10k synthetically generated higher-level explanations. Gemini can summarize and explain short to medium code snippets in:
- Python
- JavaScript (mostly vanilla JS, though it can also handle frameworks like React)
- Java
- Ruby
- Go
It outputs a description in English.
Intended uses & limitations
Without any additional fine-tuning, Gemini can explain code in a sentence or two and typically performs best on Python and JavaScript. We recommend using Gemini for simple code explanation, for documentation, or for producing more synthetic data to improve its explanations.
How to use
You can use this model directly with a pipeline for Text2Text generation, as shown below:
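from transformers import pipeline
# Load the model into a text2text-generation pipeline.
summarizer = pipeline('text2text-generation', model='describeai/gemini')
# The code snippet to explain, passed as a plain string.
code = "print('hello world!')"
# Use beam search with 3 beams and cap the generated description at 100 tokens.
response = summarizer(code, max_length=100, num_beams=3)
print("Summarized code: " + response[0]['generated_text'])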
Which should yield something along the lines of:
Summarized code: The following code is greeting the world.
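To explain several snippets in one go, the single call above can be wrapped in a small helper. This is only a sketch: `load_summarizer` and `explain_snippets` are hypothetical names introduced here for illustration, not part of the model or of transformers, and the helper simply repeats the same per-snippet call with the same `max_length` and `num_beams` settings.

```python
from typing import Callable, Dict, List


def load_summarizer(model_name: str = "describeai/gemini"):
    """Hypothetical helper: build the same pipeline as in the example above."""
    # Imported lazily so the rest of this module works without transformers.
    from transformers import pipeline

    return pipeline("text2text-generation", model=model_name)


def explain_snippets(
    summarizer: Callable[..., List[Dict[str, str]]],
    snippets: List[str],
    max_length: int = 100,
    num_beams: int = 3,
) -> List[str]:
    """Hypothetical helper: return one description per snippet, in input order."""
    descriptions = []
    for code in snippets:
        # Same call shape as the single-snippet example: one string in,
        # a list with one {'generated_text': ...} dict out.
        response = summarizer(code, max_length=max_length, num_beams=num_beams)
        descriptions.append(response[0]["generated_text"].strip())
    return descriptions
```

A typical call would be `explain_snippets(load_summarizer(), ["print('hello world!')", "total = sum(range(10))"])`. Passing the summarizer in as an argument means the model is loaded once and reused for every snippet.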
Model sizes
- Gemini: 770 million parameters
- Gemini-Small (this repo): 220 million parameters
Limitations
Gemini may produce overly simplistic descriptions that don't cover the entire code snippet. We suspect that more training data would mitigate this and produce better results.
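Since the model is aimed at short to medium snippets, one possible workaround is to split a larger Python file into its top-level functions and explain each one separately. The sketch below is an assumption on our part, not a method described by the authors, and we have not measured whether it improves the descriptions. `split_top_level_functions` is a hypothetical helper built only on the standard-library `ast` module (Python 3.8+).

```python
import ast
from typing import Dict


def split_top_level_functions(source: str) -> Dict[str, str]:
    """Hypothetical helper: map each top-level function name to its source text."""
    tree = ast.parse(source)
    functions = {}
    for node in tree.body:
        # Only module-level functions are collected; classes, imports and
        # nested functions are left out of this sketch.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                functions[node.name] = segment
    return functions
```

Each value in the returned dictionary can then be passed to the summarizer as its own snippet, giving one short description per function.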
About Us
At Describe.ai, we are focused on building Artificial Intelligence systems that can understand language as well as humans. While this is a long path, we plan to contribute our findings to our API and to the open source community.