license: apache-2.0
Ask2Democracy project
What is the baizemocracy-lora-7B-cfqa model?
This is an open-source chat model fine-tuned with LoRA, inspired by the Baize project. It was trained on the Baize datasets together with the ask2democracy-cfqa-salud-pension dataset, which contains almost 4k instructions for answering questions grounded in contexts relevant to citizen concerns and public debate in Spanish. Two model variants were trained during the Somos NLP 2023 Hackathon: one focused on conversational style and one focused on context. This model is the conversational variant, tuned for a more conversational way of asking questions (see the pre-processing section below). The other variant, Baizemocracy-contextfocused, is oriented toward retrieval augmented with context.
Testing is a work in progress. We decided to share both model variants with the community in order to involve more people in experimenting with what works best and in finding other possible use cases.
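The conversational training format follows the Baize `[|Human|]` / `[|AI|]` turn markers, with the Spanish preamble used in the pre-processing code below. A minimal sketch of how an inference prompt in that format could be assembled (the helper name `build_prompt` and the sample question are illustrative, not part of the released code):

```python
# Sketch: assemble a Baize-style conversational prompt as used in training.
# The preamble and [|Human|]/[|AI|] markers match the pre-processing code
# in this card; build_prompt itself is a hypothetical helper.
def build_prompt(question, history=()):
    prompt = "La conversación entre un humano y un asistente de IA."
    for human_turn, ai_turn in history:  # optional prior turns
        prompt += "\n[|Human|] " + human_turn
        prompt += "\n[|AI|] " + ai_turn
    prompt += "\n[|Human|] " + question
    prompt += "\n[|AI|] "  # the model continues from here
    return prompt

print(build_prompt("¿Cómo funciona el sistema de pensiones?"))
```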
- Developed by:
- 🇨🇴 Jorge Henao
- 🇨🇴 David Torres
Training Parameters
- Base Model: LLaMA-7B
- Training Epoch: 1
- Batch Size: 16
- Maximum Input Length: 512
- Learning Rate: 2e-4
- LoRA Rank: 8
- Updated Modules: All Linears
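Assuming the Hugging Face `peft` library was used for the LoRA fine-tuning, the parameters above map roughly to a configuration like the following. The `target_modules` list spelling out LLaMA's linear projections, and the `lora_alpha` and `lora_dropout` values, are assumptions not stated in this card:

```python
from peft import LoraConfig

# Sketch of a LoRA config matching the parameters above ("LoRA Rank: 8",
# "Updated Modules: All Linears"). target_modules, lora_alpha and
# lora_dropout are assumed, not documented here.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```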
Training Dataset
- Ask2Democracy-cfqa-salud-pension (3,806)
- Stanford Alpaca (51,942)
- Quora Dialogs (54,456)
- StackOverflow Dialogs (57,046)
About pre-processing
Each record is reformatted into a Baize-style conversational transcript, appending a follow-up turn that asks the model to classify the answer's topics:

```python
def format_instruction_without_context(example):
    # Keep the original question as the topic label
    example["topic"] = example["input"]
    # Build a Baize-style conversation transcript
    input = "La conversación entre un humano y un asistente de IA."
    input += "\n[|Human|] " + example["input"]
    input += "\n[|AI|] " + example["output"]
    if len(example["topics"]) > 0:
        topics = ", ".join(example["topics"])
        # Add a follow-up turn asking for a topic classification
        input += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
        input += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    example["input"] = input
    return example
```
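To make the transformation concrete, for a hypothetical record with question "¿Qué cubre el plan de beneficios en salud?", an answer, and topics `["salud", "pensiones"]`, the function above rewrites `example["input"]` into a transcript like:

```
La conversación entre un humano y un asistente de IA.
[|Human|] ¿Qué cubre el plan de beneficios en salud?
[|AI|] (respuesta original del dataset)
[|Human|] ¿En cuáles tópicos clasificarías su respuesta?
[|AI|] Aquí una lista de tópicos: salud, pensiones.
```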
More details can be found in the Ask2Democracy GitHub repository.