---
language:
- hi
- gu
- pa
- as
- ta
- mr
- bn
- te
- ml
- kn
---
Indic-Sentence-Completion

---
license: other
---

# Details
This model may not be used commercially. It is Bloom-3B fine-tuned on several Indian languages:
- Gujarati
- Marathi
- Bengali
- Punjabi
- Kannada
- Malayalam
- Telugu
- Tamil
- Hindi

# Architecture
Same as Bloom-3B: the model is decoder-only.

# Motivation behind the model fine-tuning
- The model can be fine-tuned further for any downstream task that requires the Indian languages listed above.
- PEFT LoRA is advised for that fine-tuning.
- It can be stacked with an encoder, if needed, for any sequence-to-sequence task that requires these languages.

# Example of getting inference from the model

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Hub ID (or local directory) containing the model files
    model_directory = "autopilot-ai/Indic-sentence-completion"

    tokenizer = AutoTokenizer.from_pretrained(model_directory)

    # Load the causal-LM weights in 8 bits; device_map="auto" places them
    # on the available device(s). Keep the AutoModelForCausalLM instance:
    # reloading with AutoModel would drop the language-modeling head that
    # generate() needs.
    model = AutoModelForCausalLM.from_pretrained(
        model_directory,
        load_in_8bit=True,
        device_map="auto",
    )

    # Tokenize a Gujarati prompt and move the tensors to the model's device
    batch = tokenizer("હેલો કેમ છો?", return_tensors="pt").to(model.device)

    with torch.cuda.amp.autocast():
        output_tokens = model.generate(**batch, max_new_tokens=10)

    print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))

## To run the above code snippet (in 8 bits), make sure to install the following

    pip install accelerate bitsandbytes
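
# Why LoRA keeps fine-tuning cheap (illustrative arithmetic)

The advice to use PEFT LoRA can be made concrete with a rough parameter count. The sketch below is not taken from the model's training code. It assumes Bloom-3B-like dimensions (hidden size 2560, 30 layers), assumes that only the fused attention projection of each layer is adapted, and picks a hypothetical rank of 8. The helper name `lora_trainable_params` is made up for this example.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int, n_layers: int) -> int:
    """Weights added by LoRA when one (d_in x d_out) matrix per layer is adapted.

    LoRA freezes the original weight W and learns a low-rank update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    per_matrix = rank * d_in + d_out * rank
    return per_matrix * n_layers


# Assumed Bloom-3B-like dimensions (check the model's config before relying on them)
HIDDEN = 2560          # assumed hidden size
N_LAYERS = 30          # assumed number of transformer blocks
QKV_OUT = 3 * HIDDEN   # fused query/key/value projection output size
RANK = 8               # hypothetical LoRA rank

# Weights in the adapted matrices themselves (biases ignored)
full = HIDDEN * QKV_OUT * N_LAYERS
# Weights LoRA would train instead
lora = lora_trainable_params(HIDDEN, QKV_OUT, RANK, N_LAYERS)

print(f"Adapted matrices, full fine-tuning: {full:,} weights")
print(f"LoRA at rank {RANK}: {lora:,} trainable weights "
      f"({100 * lora / full:.2f}% of the adapted matrices)")
```

Under these assumptions LoRA trains well under 1% of the weights in the adapted matrices, which is why it suits a 3B-parameter model on modest hardware. With the `peft` library, this setup would roughly correspond to a `LoraConfig` with `r=8` and `target_modules=["query_key_value"]`. Treat both values as starting points to tune, not as settings recommended by the model authors.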