Dolly answers on its own.

#15
by Iamexperimenting - opened

Hi team,

I followed this example https://notebooks.databricks.com/demos/llm-dolly-chatbot/index.html# to build the same kind of chatbot. I used the same template; my PDF documents relate to a commercial domain (specification details about electronics products, for example).

I ran a test to check whether, when I ask out-of-scope questions, the Dolly V2 model returns 'I don't know' or answers on its own.

Example question:

  1. How many players play in a basketball match?

I expect the model to respond 'I don't know', since these details aren't present in my knowledge base (in my PDFs), but unfortunately the model answers on its own. Could you please tell me how to stop this?

Please find the code below:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
from langchain import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain
 
def build_qa_chain():
  torch.cuda.empty_cache()
  model_name = "databricks/dolly-v2-7b" # can use dolly-v2-3b or dolly-v2-7b for smaller model and faster inferences.
 
  # Increase max_new_tokens for a longer response
  # Other settings might give better results! Play around
  instruct_pipeline = pipeline(model=model_name, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", 
                               return_full_text=True, max_new_tokens=256, top_p=0.25, top_k=5)
  # Note: if you use dolly-v2-12b, or a smaller model on a GPU with less than 24GB of RAM, load in 8-bit. This requires %pip install bitsandbytes
  # instruct_pipeline = pipeline(model=model_name, trust_remote_code=True, device_map="auto", model_kwargs={'load_in_8bit': True})
  # For GPUs without bfloat16 support, like the T4 or V100, use torch_dtype=torch.float16 below
  # model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
 
  # Defining our prompt content.
  # langchain will load our similar documents as {context}
  template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
 
  Instruction: 
  You are a question answering agent and your job is to provide the best answer.
  Use only information in the following paragraphs to answer the question at the end. Explain the answer with reference to these paragraphs. If you don't know, say that you do not know.
 
  {context}
 
  Question: {question}
 
  Response:
  """
  prompt = PromptTemplate(input_variables=['context', 'question'], template=template)
 
  hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)
  # Set verbose=True to see the full prompt:
  return load_qa_chain(llm=hf_pipe, chain_type="stuff", prompt=prompt, verbose=True)
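
For context, the chain is then invoked roughly like this (a sketch of the calling code only; the vector store name db and the helper answer_question are illustrative placeholders, assuming a Chroma index built from the PDFs as in the demo):

qa_chain = build_qa_chain()

def answer_question(question):
  # Retrieve the most similar PDF chunks; langchain injects them into the prompt as {context}.
  similar_docs = db.similarity_search(question, k=2)
  result = qa_chain({"input_documents": similar_docs, "question": question})
  return result["output_text"]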

@srowen can you please help me here?

Databricks org

That isn't how LLMs work: they were trained on vast amounts of text from the internet. You can try to write a prompt that encourages the model to respond only with reference to the text you supply, or try fine-tuning.
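
Beyond tightening the prompt wording, one practical complement (not part of the demo, shown here only as a sketch with assumed names and an assumed threshold) is to check the retriever's similarity scores first and skip the LLM entirely when nothing in the PDFs is close enough:

def answer_or_decline(db, qa_chain, question, max_distance=1.0):
  # Chroma returns (document, distance) pairs; lower distance means more similar.
  docs_and_scores = db.similarity_search_with_score(question, k=2)
  relevant = [doc for doc, score in docs_and_scores if score <= max_distance]
  if not relevant:
    # Nothing relevant was retrieved, so don't let the model improvise an answer.
    return "I don't know."
  return qa_chain({"input_documents": relevant, "question": question})["output_text"]

The threshold needs tuning for your embeddings, and the model can still drift even with relevant context, so combining this gate with the prompt constraint (or fine-tuning, as suggested above) gives the best chance of a consistent "I don't know".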
