Model Description

A model that can generate Honeycomb Queries.

fine-tuned by Hamel Husain

How to Get Started with the Model

Make sure you install all dependencies

pip install transformers==transformers==4.36.2 datasets==2.15.0 peft==0.6.0 accelerate==0.24.1 bitsandbytes==0.41.3.post2 safetensors==0.4.1 scipy==1.11.4 sentencepiece==0.1.99 protobuf==4.23.4 --upgrade

Next, load the dependencies.

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
model_id='hamel/hc-mistral-qlora-6'
model = AutoPeftModelForCausalLM.from_pretrained(model_id).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

Next, define a function that can help you with the prompt (alpaca style). It is important that you follow this prompt exactly, because this is how the model was trained.

def prompt(nlq, cols):
    return f"""[INST] <<SYS>>
Honeycomb AI suggests queries based on user input and candidate columns.
<</SYS>>

User Input: {nlq}

Candidate Columns: {cols}
[/INST]
"""

def prompt_tok(nlq, cols):
    _p = prompt(nlq, cols)
    input_ids = tokenizer(_p, return_tensors="pt", truncation=True).input_ids.cuda()
    out_ids = model.generate(input_ids=input_ids, max_new_tokens=5000, 
                          do_sample=False)
    return tokenizer.batch_decode(out_ids.detach().cpu().numpy(), 
                                  skip_special_tokens=True)[0][len(_p):]

Next, make predictions

nlq = "Exception count by exception and caller"
cols = ['error', 'exception.message', 'exception.type', 'exception.stacktrace', 'SampleRate', 'name', 'db.user', 'type', 'duration_ms', 'db.name', 'service.name', 'http.method', 'db.system', 'status_code', 'db.operation', 'library.name', 'process.pid', 'net.transport', 'messaging.system', 'rpc.system', 'http.target', 'db.statement', 'library.version', 'status_message', 'parent_name', 'aws.region', 'process.command', 'rpc.method', 'span.kind', 'serializer.name', 'net.peer.name', 'rpc.service', 'http.scheme', 'process.runtime.name', 'serializer.format', 'serializer.renderer', 'net.peer.port', 'process.runtime.version', 'http.status_code', 'telemetry.sdk.language', 'trace.parent_id', 'process.runtime.description', 'span.num_events', 'messaging.destination', 'net.peer.ip', 'trace.trace_id', 'telemetry.instrumentation_library', 'trace.span_id', 'span.num_links', 'meta.signal_type', 'http.route']

out = prompt_tok(nlq, cols)
print(out)

Training Details

See this wandb run

Training Data

~90k synthetically generated honeycomb queries. This data is located in alpaca_synth_queries.jsonl.

Training Procedure

Used axolotl, see this config.

Downloads last month
16
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no pipeline_tag.

Model tree for hamel/hc-mistral-qlora-6

Adapter
(1773)
this model