# Mid-term project: NVIDIA Report chatbot

In the following notebook we'll build RAG pipelines that will allow us to interactively retrieve information from the report "NVIDIA 10-k Filings". We will further use Ragas to evaluate component-wise metrics, as well as end-to-end metrics about the performance of our RAG pipelines.

## Set Environment Variables

Let's set up our OpenAI API key so we can leverage their API later on.

In [87]:
import os
import openai
from openai import AsyncOpenAI  # importing openai for API usage
import chainlit as cl  # importing chainlit for our app
from chainlit.prompt import Prompt, PromptMessage  # importing prompt tools
from chainlit.playground.providers import ChatOpenAI  # importing ChatOpenAI tools

from getpass import getpass

openai.api_key = getpass("Please provide your OpenAI Key: ")
os.environ["OPENAI_API_KEY"] = openai.api_key
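
If the key is already exported in your shell, prompting again is unnecessary. The following is a minimal sketch of an alternative, not part of the original notebook: `load_api_key` is a hypothetical helper that reads the environment first and only prompts when the variable is missing. The `env` and `prompt` parameters are assumptions added so the logic can be exercised without a real key.

```python
import os
from getpass import getpass


def load_api_key(env=None, prompt=getpass, name="OPENAI_API_KEY"):
    """Return the key from the environment, prompting only when it is missing.

    `env` defaults to os.environ; any dict-like object can be passed instead.
    """
    env = os.environ if env is None else env
    key = env.get(name)
    if not key:
        # Fall back to an interactive prompt and cache the value for later cells.
        key = prompt(f"Please provide your {name}: ")
        env[name] = key
    return key


# Demonstration with a plain dict standing in for os.environ (no real key involved).
fake_env = {"OPENAI_API_KEY": "sk-example"}
print(load_api_key(env=fake_env))
```

In the notebook itself you would call `load_api_key()` with no arguments, so that `os.environ` is used and `getpass` only runs when needed.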



## Building our RAG pipeline

### Creating an Index

The largest change you'll notice (apart from some import changes) is that our old favourite chains are once again bundled in easy-to-use abstractions.

We can still create custom chains using LCEL - but we can also be more confident that our pre-packaged chains are created using LCEL under the hood.

#### Loading Data

In [88]:
from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader(
    "NVIDIA_report.pdf",
)

documents = loader.load()

In [89]:
documents[0].metadata

{'source': 'data/NVIDIA_report.pdf',
 'file_path': 'data/NVIDIA_report.pdf',
 'page': 0,
 'total_pages': 96,
 'format': 'PDF 1.4',
 'title': '0001045810-24-000029',
 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group',
 'subject': 'Form 10-K filed on 2024-02-21 for the period ending 2024-01-28',
 'keywords': '0001045810-24-000029; ; 10-K',
 'creator': 'EDGAR Filing HTML Converter',
 'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0',
 'creationDate': "D:20240221173732-05'00'",
 'modDate': "D:20240221173744-05'00'",
 'trapped': '',
 'encryption': 'Standard V2 R3 128-bit RC4'}

In [90]:
len(documents)

96

#### Transforming Data

Now that we've loaded the report (as 96 documents, one per PDF page) - let's split it into smaller pieces so we can more effectively leverage it with our retrieval chain!

We'll start with the classic: `RecursiveCharacterTextSplitter`.

In [91]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 100
)

documents = text_splitter.split_documents(documents)

Let's confirm we've split our document.

In [92]:
len(documents)

438

In [93]:
documents[1]

Document(page_content='Title of each class\nTrading Symbol(s)\nName of each exchange on which registered\nCommon Stock, $0.001 par value per share\nNVDA\nThe Nasdaq Global Select Market\nSecurities registered pursuant to Section 12(g) of the Act:\nNone\nIndicate by check mark if the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act.    Yes ☐ No ☒\nIndicate by check mark if the registrant is not required to file reports pursuant to Section 13 or Section 15(d) of the Act.    Yes ☐ No ☒\nIndicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter\nperiod that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days. Yes ☒ No ☐', metadata={'source': 'data/NVIDIA_report.pdf', 'file_path': 'data/NVIDIA_report.pdf', 'page': 0, 'total_pages': 9
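
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of a fixed-width sliding window. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks); `sliding_window_split` is a hypothetical helper that only illustrates how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width chunks that overlap by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_split(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is the role the 100-character overlap plays in the notebook: text near a chunk boundary appears in both neighbouring chunks.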

#### Loading OpenAI Embeddings Model

We will use OpenAI's `text-embedding-3-small` for this task.

In [94]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

#### Creating a FAISS VectorStore

Now that we have documents - we'll need a place to store them alongside their embeddings.

In [95]:
from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_documents(documents, embeddings)

2024-03-13 18:24:10 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


#### Creating a Retriever

To complete our index, all that's left to do is expose our vectorstore as a retriever:

In [96]:
retriever = vector_store.as_retriever()
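
Conceptually, a vector-store retriever embeds the query and returns the stored chunks whose embeddings are closest to it. The sketch below illustrates that idea in plain Python under stated assumptions: the three-dimensional vectors are made up, cosine similarity is used for readability (the FAISS index built by LangChain may rank by a different distance), and `top_k` with its default `k=4` is a hypothetical helper rather than the library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=4):
    """Return the texts of the k entries most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index of (text, embedding) pairs; real embeddings have far more dimensions.
toy_index = [
    ("revenue by segment", [0.9, 0.1, 0.0]),
    ("executive officers", [0.1, 0.9, 0.0]),
    ("risk factors", [0.0, 0.2, 0.9]),
]

print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
```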

#### Testing our Retriever

Now that we've gone through the trouble of creating our retriever - let's see it in action!

In [97]:
retrieved_documents = retriever.invoke("What is this document about?")

2024-03-13 18:24:13 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [98]:
for doc in retrieved_documents:
  print(doc)

page_content='23.1*\nConsent of PricewaterhouseCoopers LLP\n24.1*\nPower of Attorney (included in signature page)\n31.1*\nCertification of Chief Executive Officer as required by Rule 13a-14(a) of the Securities Exchange Act of 1934\n31.2*\nCertification of Chief Financial Officer as required by Rule 13a-14(a) of the Securities Exchange Act of 1934\n32.1#*\nCertification of Chief Executive Officer as required by Rule 13a-14(b) of the Securities Exchange Act of 1934\n32.2#*\nCertification of Chief Financial Officer as required by Rule 13a-14(b) of the Securities Exchange Act of 1934\n97.1+*\nCompensation Recovery Policy, as amended and restated November 30, 2023\n101.INS*\nXBRL Instance Document\n101.SCH*\nXBRL Taxonomy Extension Schema Document\n101.CAL*\nXBRL Taxonomy Extension Calculation Linkbase Document\n101.DEF*\nXBRL Taxonomy Extension Definition Linkbase Document\n101.LAB*\nXBRL Taxonomy Extension Labels Linkbase Document\n101.PRE*\nXBRL Taxonomy Extension Presentation Linkbase 

### Creating a RAG Chain


#### Creating a Prompt Template


In [164]:
from langchain.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context. If you cannot answer the question with the context, please respond with 'I don't know':

Context:
{context}

Question:
{question}
"""

prompt = ChatPromptTemplate.from_template(template)
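
To see what the model actually receives, the template can be filled by hand. This is a rough sketch using plain `str.format`: the two chunk strings are invented, and joining them with blank lines is an assumption (in the chain, LangChain decides how the retrieved documents are rendered into `{context}`).

```python
template = """Answer the question based only on the following context. If you cannot answer the question with the context, please respond with 'I don't know':

Context:
{context}

Question:
{question}
"""

# Hypothetical retrieved chunks; in the notebook these come from the retriever.
chunks = ["Chunk one text.", "Chunk two text."]

filled = template.format(
    context="\n\n".join(chunks),
    question="What is this document about?",
)
print(filled)
```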

#### Setting Up our Basic QA Chain

Now we can instantiate our basic RAG chain!

We'll use LCEL directly just to see an example of it - but you could just as easily use an abstraction here to achieve the same goal!

We'll also make sure to pass our retrieved context through to the output - which is critical for Ragas, since several metrics are computed on the retrieved chunks.

In [100]:
from operator import itemgetter

from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

primary_qa_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

retrieval_augmented_qa_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into the retriever
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    # "context"  : re-assigned unchanged via RunnablePassthrough.assign, which also forwards
    #              every other key from the previous step (here, "question")
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": prompt | primary_qa_llm, "context": itemgetter("context")}
)

Above we have a RAG chain that first uses Python's `itemgetter` to extract the "question" from the input, passing it to the retriever while also keeping the original "question" intact. `RunnablePassthrough.assign` then forwards both keys to the next step, leaving the "context" (the retriever's output for the question) unchanged. Finally, the "context" and "question" are used to fill the prompt for ChatOpenAI, which generates the "response"; the "context" is returned alongside it so that Ragas can inspect the retrieved chunks.
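
The data flow described above can be traced with ordinary functions. The sketch below is an approximation of the three steps, not LCEL itself: `fake_retriever` and `fake_llm` are hypothetical stand-ins for `retriever` and `prompt | primary_qa_llm`, and the dictionaries mirror what each stage of the chain hands to the next.

```python
from operator import itemgetter


def fake_retriever(question):
    """Hypothetical stand-in for the vector-store retriever."""
    return [f"chunk relevant to: {question}"]


def fake_llm(prompt_text):
    """Hypothetical stand-in for the prompt piped into the chat model."""
    return f"answer derived from {len(prompt_text)} prompt characters"


def run_chain(inputs):
    get_question = itemgetter("question")
    # Step 1: fan the question out into a retrieved "context" and the original "question".
    step1 = {
        "context": fake_retriever(get_question(inputs)),
        "question": get_question(inputs),
    }
    # Step 2: pass everything through, re-assigning "context" to itself unchanged.
    step2 = {**step1, "context": itemgetter("context")(step1)}
    # Step 3: build the prompt, call the model, and keep the context next to the response.
    prompt_text = f"Context:\n{step2['context']}\n\nQuestion:\n{step2['question']}"
    return {"response": fake_llm(prompt_text), "context": step2["context"]}


result = run_chain({"question": "Who is the CEO?"})
print(result)
```

Returning the context together with the response is the design choice that matters later: the evaluation loop needs both the answer and the chunks it was based on.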

Let's test it out!

In [160]:
question = "What is the provided document about?"

result = retrieval_augmented_qa_chain.invoke({"question" : question})

print(result["response"].content)

2024-03-13 19:37:45 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 19:37:46 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
I don't know.


In [102]:
question = "Who is the E-VP, Operations - and how old are they?"

result = retrieval_augmented_qa_chain.invoke({"question" : question})

print(result["response"].content)
print(result["context"])

2024-03-13 18:24:15 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 18:24:16 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Debora Shoquist is the Executive Vice President of Operations, and she is 69 years old.
[Document(page_content='Minnesota, an M.S.E.E. degree from the California Institute of Technology and an M.B.A. degree from Harvard Business School.\nDebora Shoquist joined NVIDIA in 2007 as Senior Vice President of Operations and in 2009 became Executive Vice President of Operations. Prior to NVIDIA,\nMs. Shoquist served from 2004 to 2007 as Executive Vice President of Operations at JDS Uniphase Corp., a provider of communications test and measurement\nsolutions and optical products for the telecommunications industry. She served from 2002 to 2004 as Senior Vice President and General Manager of the Electro-\nOptics business at Coherent, Inc., a manufacturer of commercial and scientific laser equipment. P

In [103]:
question = "What is the gross carrying amount of Total Amortizable Intangible Assets for Jan 29, 2023?"

result = retrieval_augmented_qa_chain.invoke({"question" : question})

print(result["response"].content)
print(result["context"])

2024-03-13 18:24:16 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 18:24:17 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
$3,539 million
[Document(page_content='The following table outlines the estimated future amortization expense related to the net carrying amount of intangible assets as of January 28, 2024:\nFuture Amortization Expense\n \n(In millions)\nFiscal Year:\n \n2025\n$\n555 \n2026\n261 \n2027\n150 \n2028\n37 \n2029\n9 \n2030 and thereafter\n100 \nTotal\n$\n1,112 \n64', metadata={'source': 'data/NVIDIA_report.pdf', 'file_path': 'data/NVIDIA_report.pdf', 'page': 63, 'total_pages': 96, 'format': 'PDF 1.4', 'title': '0001045810-24-000029', 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group', 'subject': 'Form 10-K filed on 2024-02-21 for the period ending 2024-01-28', 'keywords': '0001045810-24-000029; ; 10-K', 'creator': 'EDGAR Filing HTML Converter', 'producer': 'EDGRpdf Service w/ EO.Pdf

## Synthetic Dataset Generation for Evaluation using Ragas

Ragas is a powerful library that lets us evaluate our RAG pipeline by collecting input/output/context triplets and obtaining metrics relating to a number of different aspects of our RAG pipeline.

We'll be evaluating on every core metric today, but in order to do that - we'll need to create a test set. Luckily for us, Ragas can do that directly!

### Synthetic Test Set Generation

We can leverage Ragas' [`Synthetic Test Data generation`](https://docs.ragas.io/en/stable/concepts/testset_generation.html) functionality to generate our own synthetic QC pairs - as well as a synthetic ground truth - quite easily!

> NOTE: This process will use `gpt-3.5-turbo-16k` as the base generator and `gpt-4` as the critic - if you're attempting to create a lot of samples please be aware of cost, as well as rate limits.

In [104]:
loader = PyMuPDFLoader(
    "NVIDIA_report.pdf",
)

eval_documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 400
)

eval_documents = text_splitter.split_documents(eval_documents)

We deliberately split the documents with different parameters (`chunk_size = 1500`, `chunk_overlap = 400`) when creating our synthetic data. If the test questions were generated from exactly the same chunks that sit in the index, retrieval would be artificially easy, because each question would map cleanly onto one stored chunk. Using a different chunking strategy keeps the evaluation closer to real usage, where questions do not line up with chunk boundaries, and can reveal strengths or weaknesses that would otherwise stay hidden.

In [105]:
len(eval_documents)

340

In [132]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

generator = TestsetGenerator.with_openai()

testset = generator.generate_with_langchain_docs(eval_documents, test_size=10, distributions={simple: 0.2, reasoning: 0.2, multi_context: 0.6})

embedding nodes:   0%|          | 0/680 [00:00<?, ?it/s]

2024-03-13 18:36:36 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

Generating:   0%|          | 0/10 [00:00<?, ?it/s]

2024-03-13 18:38:28 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-13 18:38:28 - retrying evolution: 0 times

Ragas provides a synthetic Q&A data generation module that can cover different levels of complexity. First, 'simple' questions are generated, with seeding used to ensure diversity. Then the original questions may undergo "evolutions" that make them more complex. The evolved questions might require reasoning to answer ('reasoning' questions) or information spread across multiple chunks ('multi_context' questions). This simulates the variability in queries that production RAG systems might receive.

> NOTE: Ragas documentation on this generation process [here](https://docs.ragas.io/en/stable/concepts/testset_generation.html).
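
As a quick sanity check on the `distributions` argument, the weights can be turned into expected question counts. This is a sketch under assumptions: `expected_counts` is a hypothetical helper, plain strings stand in for the Ragas evolution objects, and simple rounding is assumed (Ragas may allocate questions differently for sizes that do not divide evenly).

```python
def expected_counts(test_size, distribution):
    """Map evolution weights to the number of questions each type should receive."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("distribution weights should sum to 1")
    return {name: round(test_size * weight) for name, weight in distribution.items()}


counts = expected_counts(10, {"simple": 0.2, "reasoning": 0.2, "multi_context": 0.6})
print(counts)
```

For `test_size=10` this gives 2 simple, 2 reasoning and 6 multi-context questions, which is the split visible in the `evolution_type` column of the generated test set.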

Let's look at the output:

In [133]:
testset.test_data[0]

DataRow(question='What is the revenue contribution of the Compute & Networking segment for fiscal year 2024?', contexts=['represented approximately 19% of total revenue for fiscal year 2024, attributable to the Compute & Networking segment.\nOur estimated Compute & Networking demand is expected to remain concentrated.\nThere were no customers with 10% or more of total revenue for fiscal years 2023 and 2022.\nGross Profit and Gross Margin\nGross profit consists of total revenue, net of allowances, less cost of revenue. Cost of revenue consists primarily of the cost of semiconductors, including wafer\nfabrication, assembly, testing and packaging, board and device costs, manufacturing support costs, including labor and overhead associated with such\npurchases, final test yield fallout, inventory and warranty provisions, memory and component costs, tariffs, and shipping costs. Cost of revenue also includes\nacquisition-related costs, development costs for license and service arrangements, 

### Generating Responses with RAG Pipeline

Now that we have some question/context (QC) pairs and some ground truths, let's evaluate our RAG pipeline using Ragas. We'll start by extracting the questions and ground truths from the test set we created, which first means converting the test set into a Pandas DataFrame.

In [134]:
test_df = testset.to_pandas()

In [135]:
test_df

| | question | contexts | ground_truth | evolution_type | episode_done |
|---|---|---|---|---|---|
| 0 | What is the revenue contribution of the Comput... | `[represented approximately 19% of total revenu...` | The revenue contribution of the Compute & Netw... | simple | True |
| 1 | What is the purpose of entering into foreign c... | `[Table of Contents\nNVIDIA Corporation and Sub...` | The purpose of entering into foreign currency ... | simple | True |
| 2 | What are the potential impacts on our income t... | `[is reduced, our provision for income taxes, r...` | The potential impacts on our income taxes and ... | reasoning | True |
| 3 | What is NVIDIA Corporation's income tax policy... | `[Table of Contents\nNVIDIA Corporation and Sub...` | | reasoning | True |
| 4 | What could happen to our business if we don't ... | `[covered by insurance may be large, which coul...` | Our business could face legal action or reputa... | multi_context | True |
| 5 | What expenses for impacted employees are inclu... | `[– Risks Related to Regulatory, Legal, Our Sto...` | Expenses for financial support to impacted emp... | multi_context | True |
| 6 | "How can NVIDIA AI and Omniverse help build Ea... | `[the top supercomputer, on the Green500 list.\...` | NVIDIA AI and NVIDIA Omniverse platforms can h... | multi_context | True |
| 7 | What role does AI play in modern technology wi... | `[Table of Contents\nPart I\nItem 1. Business\n...` | AI plays a significant role in modern technolo... | multi_context | True |
| 8 | What are the potential consequences of quality... | `[Table of Contents\ntransitions, and we may be...` | The potential consequences of quality or produ... | multi_context | True |
| 9 | What was the percentage change in center reven... | `[Center revenue growth of 217% and lower net i...` | | multi_context | True |


In [136]:
test_questions = test_df["question"].values.tolist()
test_groundtruths = test_df["ground_truth"].values.tolist()

Now we'll run our RAG pipeline on the generated questions to produce responses - we'll also need to collect the retrieved contexts for each question.

We'll do this in a simple loop to see exactly what's happening!

In [137]:
answers = []
contexts = []

for question in test_questions:
  response = retrieval_augmented_qa_chain.invoke({"question" : question})
  answers.append(response["response"].content)
  contexts.append([context.page_content for context in response["context"]])

2024-03-13 18:39:00 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 18:39:01 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

Now we can wrap our information in a Hugging Face dataset for use in the Ragas library.

In [138]:
from datasets import Dataset

response_dataset = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})
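
Before wrapping the lists in a dataset, it can help to confirm that they line up. The check below is a small sketch, not a Ragas or Hugging Face API: `check_ragas_columns` is a hypothetical helper, and the column names are simply the ones used in the cell above.

```python
def check_ragas_columns(data):
    """Validate a dict of evaluation columns and return the number of rows."""
    required = ("question", "answer", "contexts", "ground_truth")
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    lengths = {name: len(data[name]) for name in required}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    # Each row's contexts should be a list of plain strings, one per retrieved chunk.
    for row, ctx in enumerate(data["contexts"]):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            raise ValueError(f"contexts[{row}] must be a list of strings")
    return lengths["question"]


# Toy data with the same shape as the notebook's lists.
toy = {
    "question": ["q1", "q2"],
    "answer": ["a1", "a2"],
    "contexts": [["chunk 1", "chunk 2"], ["chunk 3"]],
    "ground_truth": ["gt1", "gt2"],
}
print(check_ragas_columns(toy))
```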

Let's take a peek and see what that looks like!

In [139]:
response_dataset[0]

{'question': 'What is the revenue contribution of the Compute & Networking segment for fiscal year 2024?',
 'answer': 'The revenue contribution of the Compute & Networking segment for fiscal year 2024 is $32,016 million.',
 'contexts': ['United States\n$\n26,966 \n$\n8,292 \n$\n4,349 \nTaiwan\n13,405 \n6,986 \n8,544 \nChina (including Hong Kong)\n10,306 \n5,785 \n7,111 \nOther countries\n10,245 \n5,911 \n6,910 \nTotal revenue\n$\n60,922 \n$\n26,974 \n$\n26,914 \nRevenue from sales to customers outside of the United States accounted for 56%, 69%, and 84% of total revenue for fiscal years 2024, 2023, and 2022,\nrespectively. The increase in revenue to the United States for fiscal year 2024 was primarily due to higher U.S.-based Compute & Networking segment demand.\nSales to one customer represented 13% of total revenue for fiscal year 2024, which was attributable to the Compute & Networking segment. No customer\nrepresented 10% or more of total revenue for fiscal years 2023 and 2022.\nTh

## Evaluating our Pipeline with Ragas

Now that we have our response dataset, we can get into evaluation!
First, we'll import the desired metrics, then we can use them to evaluate our created dataset.
Check out the specific metrics we'll be using in the Ragas documentation:

- [Faithfulness](https://docs.ragas.io/en/stable/concepts/metrics/faithfulness.html)
- [Answer Relevancy](https://docs.ragas.io/en/stable/concepts/metrics/answer_relevance.html)
- [Context Precision](https://docs.ragas.io/en/stable/concepts/metrics/context_precision.html)
- [Context Recall](https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html)
- [Answer Correctness](https://docs.ragas.io/en/stable/concepts/metrics/answer_correctness.html)
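
To give a feel for how one of these metrics behaves, here is a sketch of context precision as the Ragas documentation describes it, to the best of my understanding: precision@k is averaged over the ranks at which a relevant chunk appears. The relevance verdicts below are hypothetical (in Ragas they come from an LLM judge), and `context_precision` is an illustrative helper, not the library's implementation.

```python
def context_precision(relevance):
    """Average precision@k over the ranks that hold a relevant chunk.

    relevance[i] is 1 if the chunk at rank i + 1 was judged relevant, else 0.
    """
    hits = 0
    weighted = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted += hits / rank  # precision@rank at this relevant position
    return weighted / hits if hits else 0.0


print(context_precision([1, 1, 1, 1]))  # every retrieved chunk relevant
print(context_precision([0, 0, 1, 0]))  # only the third-ranked chunk relevant
```

The metric rewards ranking: a single relevant chunk scores 1.0 at rank 1 but only 1/3 at rank 3, even though the same information was retrieved.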


In [140]:
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    answer_correctness,
    context_recall,
    context_precision,
)

metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    answer_correctness,
]

All that's left to do is call "evaluate" and away we go!

In [141]:
results = evaluate(response_dataset, metrics)

Evaluating:   0%|          | 0/50 [00:00<?, ?it/s]

2024-03-13 18:39:16 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

In [142]:
print(results)

{'faithfulness': 1.0000, 'answer_relevancy': 0.7688, 'context_recall': 0.8000, 'context_precision': 0.6472, 'answer_correctness': 0.6008}


In [143]:
results_df = results.to_pandas()
results_df

| | question | answer | contexts | ground_truth | faithfulness | answer_relevancy | context_recall | context_precision | answer_correctness |
|---|---|---|---|---|---|---|---|---|---|
| 0 | What is the revenue contribution of the Comput... | The revenue contribution of the Compute & Netw... | `[United States\n$\n26,966 \n$\n8,292 \n$\n4,34...` | The revenue contribution of the Compute & Netw... | 1.0 | 1.0 | 1.0 | 0.638889 | 0.742716 |
| 1 | What is the purpose of entering into foreign c... | To mitigate the impact of foreign currency mov... | `[comprehensive income or loss and reclassified...` | The purpose of entering into foreign currency ... | 1.0 | 0.947885 | 1.0 | 1.0 | 0.723586 |
| 2 | What are the potential impacts on our income t... | The potential impacts on income taxes and fina... | `[adverse tax impacts, which may materially imp...` | The potential impacts on our income taxes and ... | 1.0 | 0.958867 | 1.0 | 1.0 | 0.61902 |
| 3 | What is NVIDIA Corporation's income tax policy... | I don't know. | `[Table of Contents\nNVIDIA Corporation and Sub...` | | | 0.0 | 0.0 | 0.0 | 0.198187 |
| 4 | What could happen to our business if we don't ... | Our business could face legal action or reputa... | `[greater direct costs, including costs associa...` | Our business could face legal action or reputa... | 1.0 | 0.952945 | 1.0 | 1.0 | 0.74706 |
| 5 | What expenses for impacted employees are inclu... | Financial support and charitable activity expe... | `[Macroeconomic Factors\nMacroeconomic factors,...` | Expenses for financial support to impacted emp... | 1.0 | 0.90975 | 1.0 | 0.583333 | 0.747494 |
| 6 | "How can NVIDIA AI and Omniverse help build Ea... | NVIDIA AI and Omniverse can help build Earth-2... | `[television graphics.\nThe NVIDIA RTX platform...` | NVIDIA AI and NVIDIA Omniverse platforms can h... | 1.0 | 0.953163 | 1.0 | 0.333333 | 0.746826 |
| 7 | What role does AI play in modern technology wi... | AI plays a significant role in modern technolo... | `[marking the “Big Bang” moment of AI. We intro...` | AI plays a significant role in modern technolo... | 1.0 | 1.0 | 1.0 | 1.0 | 0.53813 |
| 8 | What are the potential consequences of quality... | The potential consequences of quality or produ... | `[lead times. Qualification time for new produc...` | The potential consequences of quality or produ... | 1.0 | 0.965887 | 1.0 | 0.916667 | 0.746463 |
| 9 | What was the percentage change in center reven... | I don't know. | `[acquisition-related costs, development costs ...` | | | 0.0 | 0.0 | 0.0 | 0.1982 |
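
The aggregate scores printed earlier are consistent with a mean over the per-row values that skips missing entries. The sketch below reproduces two of them from the table; treating the empty `faithfulness` cells in rows 3 and 9 as NaN is an assumption, and `nan_mean` is a hypothetical helper rather than a Ragas function.

```python
import math


def nan_mean(values):
    """Mean of the values that are not NaN; NaN if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan


nan = math.nan
# Per-row scores copied from the results table (rows 0 to 9).
faithfulness_scores = [1.0, 1.0, 1.0, nan, 1.0, 1.0, 1.0, 1.0, 1.0, nan]
answer_relevancy_scores = [
    1.0, 0.947885, 0.958867, 0.0, 0.952945,
    0.90975, 0.953163, 1.0, 0.965887, 0.0,
]

print(round(nan_mean(faithfulness_scores), 4))
print(round(nan_mean(answer_relevancy_scores), 4))
```

This matters when reading the headline numbers: the faithfulness of 1.0 is computed over eight rows only, because the two "I don't know." answers have no faithfulness score, whereas those same rows count as zeros in answer relevancy, context recall and context precision.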


## Making Adjustments to our RAG Pipeline

Now that we have established a baseline - we can see how any changes impact our pipeline's performance!

Let's modify our retriever and see how that impacts our Ragas metrics!

In [165]:
from langchain.retrievers import MultiQueryRetriever

advanced_retriever = MultiQueryRetriever.from_llm(retriever=retriever, llm=primary_qa_llm)
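
The idea behind multi-query retrieval can be sketched without any library: an LLM rewrites the question several ways, each variant is retrieved separately, and the results are merged without duplicates. Everything below is a hypothetical stand-in (`fake_generate_queries`, `fake_retrieve` and the tiny corpus), and the sketch assumes only the generated variants are searched, not the original question.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each query variant and return the unique union, in first-seen order."""
    seen = set()
    merged = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def fake_generate_queries(question):
    """Stand-in for the LLM call that rephrases the question three ways."""
    return [f"{question} (rephrased {i})" for i in range(1, 4)]


def fake_retrieve(query):
    """Stand-in retriever keyed on the variant number at the end of the query."""
    corpus = {
        "1": ["doc A", "doc B"],
        "2": ["doc B", "doc C"],
        "3": ["doc A", "doc D"],
    }
    return corpus[query[-2]]


docs = multi_query_retrieve("What is the document about?", fake_generate_queries, fake_retrieve)
print(docs)
```

The trade-off is cost versus coverage: each question now triggers an extra LLM call and several embedding lookups, in exchange for a wider and less phrasing-sensitive set of candidate chunks.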

We'll also re-create our RAG pipeline using the abstractions that come packaged with LangChain v0.1.0!

First, let's create a chain to "stuff" our documents into our context!

In [173]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain import hub

retrieval_qa_prompt = hub.pull("langchain-ai/retrieval-qa-chat")

document_chain = create_stuff_documents_chain(primary_qa_llm, retrieval_qa_prompt)

Next, we'll create the retrieval chain!

In [174]:
from langchain.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(advanced_retriever, document_chain)

In [175]:
response = retrieval_chain.invoke({"input": "What is the provided document about?"})

2024-03-13 20:04:07 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-13 20:04:07 - Generated queries: ['1. Can you summarize the content of the document?', '2. What information does the document contain?', "3. Could you give me an overview of the document's main topics?"]
2024-03-13 20:04:07 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:07 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:08 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:09 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [176]:
print(response["answer"])

The provided document is an Annual Report on Form 10-K for NVIDIA Corporation. It includes various sections such as Business, Risk Factors, Cybersecurity, Market for Registrant’s Common Equity, Financial Statements, Directors and Executive Officers, Executive Compensation, and other relevant information required by the Securities and Exchange Commission (SEC).


In [177]:
response = retrieval_chain.invoke({"input": "What is the gross carrying amount of Total Amortizable Intangible Assets for Jan 29, 2023?"})

2024-03-13 20:04:12 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-13 20:04:12 - Generated queries: ['1. How much is the total carrying value of Amortizable Intangible Assets as of January 29, 2023?', '2. What is the total gross amount of Amortizable Intangible Assets that can be amortized as of January 29, 2023?', '3. Can you provide the total value of Amortizable Intangible Assets that are eligible for amortization on January 29, 2023?']
2024-03-13 20:04:12 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:12 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:13 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:14 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [178]:
print(response["answer"])

The gross carrying amount of Total Amortizable Intangible Assets for Jan 29, 2023, was $3,539 million.


Well, just from those responses this chain *feels* better - but let's see how it performs on our eval!

Let's do the same process we did before to collect our pipeline's contexts and answers.

In [179]:
answers = []
contexts = []

for question in test_questions:
  response = retrieval_chain.invoke({"input" : question})
  answers.append(response["answer"])
  contexts.append([context.page_content for context in response["context"]])

2024-03-13 20:04:16 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-13 20:04:16 - Generated queries: ['1. How much revenue does the Compute & Networking segment contribute in fiscal year 2024?', '2. What is the financial impact of the Compute & Networking segment on the revenue for fiscal year 2024?', '3. Can you provide information on the revenue generated by the Compute & Networking segment in fiscal year 2024?']
2024-03-13 20:04:16 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:16 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:17 - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-03-13 20:04:18 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-13 20:04:19 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2024-03-13 20:04:19 - Generated queries: ['1. W

Now we can convert this into a dataset, just like we did before.

In [180]:
response_dataset_advanced_retrieval = Dataset.from_dict({
    "question" : test_questions,
    "answer" : answers,
    "contexts" : contexts,
    "ground_truth" : test_groundtruths
})

Let's evaluate on the same metrics we did for the first pipeline and see how it does:

In [181]:
advanced_retrieval_results = evaluate(response_dataset_advanced_retrieval, metrics)

Evaluating:   0%|          | 0/50 [00:00<?, ?it/s]

2024-03-13 20:05:06 - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

In [182]:
advanced_retrieval_results_df = advanced_retrieval_results.to_pandas()
advanced_retrieval_results_df

| | question | answer | contexts | ground_truth | faithfulness | answer_relevancy | context_recall | context_precision | answer_correctness |
|---|---|---|---|---|---|---|---|---|---|
| 0 | What is the revenue contribution of the Comput... | The revenue contribution of the Compute & Netw... | `[United States\n$\n26,966 \n$\n8,292 \n$\n4,34...` | The revenue contribution of the Compute & Netw... | 1.0 | 0.995541 | 1.0 | 0.638889 | 0.741987 |
| 1 | What is the purpose of entering into foreign c... | The purpose of entering into foreign currency ... | `[comprehensive income or loss and reclassified...` | The purpose of entering into foreign currency ... | | 0.984503 | 1.0 | 1.0 | 0.748117 |
| 2 | What are the potential impacts on our income t... | Changes in tax laws globally and in foreign ju... | `[adverse tax impacts, which may materially imp...` | The potential impacts on our income taxes and ... | 1.0 | 0.930095 | 1.0 | 1.0 | 0.741731 |
| 3 | What is NVIDIA Corporation's income tax policy... | Based on the provided context, NVIDIA Corporat... | `[Table of Contents\nNVIDIA Corporation and Sub...` | | 1.0 | 0.956239 | 0.0 | 0.0 | 0.181602 |
| 4 | What could happen to our business if we don't ... | If the business does not address climate chang... | `[greater direct costs, including costs associa...` | Our business could face legal action or reputa... | 1.0 | 0.933921 | 1.0 | 1.0 | 0.438464 |
| 5 | What expenses for impacted employees are inclu... | The operating expenses for fiscal year 2024 in... | `[employees in the region who primarily support...` | Expenses for financial support to impacted emp... | 1.0 | 0.897351 | 1.0 | 1.0 | 0.530663 |
| 6 | "How can NVIDIA AI and Omniverse help build Ea... | NVIDIA AI and Omniverse can help build Earth-2... | `[television graphics.\nThe NVIDIA RTX platform...` | NVIDIA AI and NVIDIA Omniverse platforms can h... | 1.0 | 0.940676 | 0.5 | 0.5 | 0.888234 |
| 7 | What role does AI play in modern technology wi... | AI plays a significant role in modern technolo... | `[underlying technology by using a variety of s...` | AI plays a significant role in modern technolo... | 0.875 | 1.0 | 1.0 | 1.0 | 0.44978 |
| 8 | What are the potential consequences of quality... | Quality or production issues could potentially... | `[lead times. Qualification time for new produc...` | The potential consequences of quality or produ... | 1.0 | 0.924797 | 1.0 | 0.804167 | 0.903615 |
| 9 | What was the percentage change in center reven... | The Data Center revenue growth for fiscal year... | `[acquisition-related costs, development costs ...` | | | 0.0 | 0.0 | 0.25 | 0.179007 |


## Evaluating our Adjusted Pipeline Against Our Baseline

Now we can compare our results and see what directional changes occurred. Let's refresh with our initial metrics.

In [183]:
results

{'faithfulness': 1.0000, 'answer_relevancy': 0.7688, 'context_recall': 0.8000, 'context_precision': 0.6472, 'answer_correctness': 0.6008}

And see how our advanced retrieval changed the scores:

In [184]:
advanced_retrieval_results

{'faithfulness': 0.9844, 'answer_relevancy': 0.8563, 'context_recall': 0.7500, 'context_precision': 0.7193, 'answer_correctness': 0.5803}
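
The aggregate scores can be cross-checked against the per-question table above. Two of the ten rows have no faithfulness value, and the reported 0.9844 matches the mean of the eight rows that do. The sketch below reproduces that arithmetic with values copied from the table; `nan_mean` is a hypothetical helper, and the conclusion that missing scores are skipped during aggregation is an inference from these numbers rather than something the notebook states.

```python
import math

# Per-question scores copied from the table above (missing values as NaN)
faithfulness = [1.0, math.nan, 1.0, 1.0, 1.0, 1.0, 1.0, 0.875, 1.0, math.nan]
context_recall = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.5, 1.0, 1.0, 0.0]


def nan_mean(values):
    """Average the values that are present, skipping NaN entries."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


print(nan_mean(faithfulness))    # 0.984375
print(nan_mean(context_recall))  # 0.75
```

One consequence is that faithfulness is effectively averaged over eight questions here, not ten, so it is not measured on quite the same set of rows as the other metrics.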

In [185]:
import pandas as pd

df_original = pd.DataFrame(list(results.items()), columns=['Metric', 'Baseline'])
df_comparison = pd.DataFrame(list(advanced_retrieval_results.items()), columns=['Metric', 'MultiQueryRetriever with Document Stuffing'])

df_merged = pd.merge(df_original, df_comparison, on='Metric')

df_merged['Delta'] = df_merged['MultiQueryRetriever with Document Stuffing'] - df_merged['Baseline']

df_merged

| | Metric | Baseline | MultiQueryRetriever with Document Stuffing | Delta |
|---|---|---|---|---|
| 0 | faithfulness | 1.0 | 0.984375 | -0.015625 |
| 1 | answer_relevancy | 0.76885 | 0.856312 | 0.087463 |
| 2 | context_recall | 0.8 | 0.75 | -0.05 |
| 3 | context_precision | 0.647222 | 0.719306 | 0.072083 |
| 4 | answer_correctness | 0.600768 | 0.58032 | -0.020448 |
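
One rough way to read the Delta column is to compare each change with the weight of a single test question. The per-question table has ten rows and every metric lies between 0 and 1, so one question flipping from 0 to 1 would move a mean by 0.1. The sketch below uses values copied from the table; `smaller_than_one_question` and `N_QUESTIONS` are hypothetical names, and this is a back-of-the-envelope check, not a significance test.

```python
baseline = {
    "faithfulness": 1.0, "answer_relevancy": 0.76885, "context_recall": 0.8,
    "context_precision": 0.647222, "answer_correctness": 0.600768,
}
advanced = {
    "faithfulness": 0.984375, "answer_relevancy": 0.856312, "context_recall": 0.75,
    "context_precision": 0.719306, "answer_correctness": 0.58032,
}
N_QUESTIONS = 10  # rows 0-9 in the per-question table


def smaller_than_one_question(baseline, advanced, n_questions):
    """Flag metrics whose change is smaller than one question's share of the mean."""
    step = 1.0 / n_questions
    return {m: abs(advanced[m] - baseline[m]) < step for m in baseline}


flags = smaller_than_one_question(baseline, advanced, N_QUESTIONS)
for metric, small in flags.items():
    delta = advanced[metric] - baseline[metric]
    print(f"{metric:20s} delta={delta:+.4f}  below 1/n: {small}")
```

Every metric is flagged as below that threshold, which suggests a larger test set would be needed before treating any of these differences as meaningful.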


Surprisingly, Ragas rated the two RAG systems, with and without advanced retrieval, very similarly: every metric moved by less than 0.09 in either direction. Yet a very basic question asked at the beginning of this notebook, "What is the provided document about?", was answered only by the RAG system that used advanced retrieval. A likely explanation is that the question was informal and unspecific: perfectly valid, but loosely worded. The advanced retriever rewrote it in several different ways, and with those rewrites the RAG system could answer it well. In real-world use, RAG systems are often queried with questions that were typed quickly and without much thought, so the advanced retriever would likely make a real difference there. Ragas would not pick this up, however, because gpt-3.5-turbo will not generate poorly formulated test questions unless specifically prompted to do so (and that would require a very good prompt!).
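
The effect described above can be illustrated with a toy version of multi-query retrieval. This is a sketch under loud assumptions, not how LangChain's retriever is implemented: the rewrites are written by hand instead of generated by an LLM, retrieval is plain keyword overlap instead of embedding search, and `tokenize`, `keyword_retrieve`, `multi_query_retrieve`, and the three-line corpus are all hypothetical.

```python
import re

corpus = [
    "NVIDIA annual report on form 10-K for fiscal year 2024",
    "Compute and Networking segment revenue grew",
    "Risk factors include supply chain constraints",
]


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def keyword_retrieve(query, docs, k=2):
    """Return up to k docs sharing at least one token with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d)), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def multi_query_retrieve(queries, docs, k=2):
    """Retrieve for each query variant and merge the results without duplicates."""
    seen, merged = set(), []
    for query in queries:
        for doc in keyword_retrieve(query, docs, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


original = "What is the provided document about?"
# Hand-written stand-ins for the rewrites an LLM would generate
variants = [original,
            "What does this annual report cover?",
            "Which company filed this form 10-K?"]

print(keyword_retrieve(original, corpus))      # []
print(multi_query_retrieve(variants, corpus))  # ['NVIDIA annual report on form 10-K for fiscal year 2024']
```

The loosely worded original shares no vocabulary with the corpus, while the rewrites do, so only the merged retrieval returns a document.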

In [186]:
user_template = """{input}
Think through your response step by step.
"""


@cl.on_chat_start  # marks a function that will be executed at the start of a user session
async def start_chat():
    settings = {
        "model": "gpt-3.5-turbo",
        "temperature": 1.0,
        "max_tokens": 500,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0,
    }

    cl.user_session.set("settings", settings)


@cl.on_message  # marks a function that should be run each time the chatbot receives a message from a user
async def main(message: cl.Message):
    settings = cl.user_session.get("settings")

    client = AsyncOpenAI(
        api_key=os.environ.get("OPENAI_API_KEY"),
    )

    print(message.content)

    prompt = Prompt(
        provider=ChatOpenAI.id,
        messages=[
            PromptMessage(
                role="system",
                template=template,
                formatted=template,
            ),
            PromptMessage(
                role="user",
                template=user_template,
                formatted=user_template.format(input=message.content),
            ),
        ],
        inputs={"input": message.content},
        settings=settings,
    )

    print([m.to_openai() for m in prompt.messages])

    msg = cl.Message(content="")

    # Call OpenAI
    async for stream_resp in await client.chat.completions.create(
        messages=[m.to_openai() for m in prompt.messages], stream=True, **settings
    ):
        token = stream_resp.choices[0].delta.content
        if not token:
            token = ""
        await msg.stream_token(token)

    # Update the prompt object with the completion
    prompt.completion = msg.content
    msg.prompt = prompt

    # Send and close the message stream
    await msg.send()
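
The streaming loop in `main` can be tried without calling the API. The sketch below is a stand-in, not the Chainlit or OpenAI code itself: `fake_stream` is a hypothetical async generator that mimics only the `choices[0].delta.content` shape the loop reads, and `collect_stream` joins tokens into a string where the app would call `msg.stream_token`.

```python
import asyncio
from types import SimpleNamespace


async def fake_stream(tokens):
    """Yield chunks shaped like the ones read in the loop above."""
    for tok in tokens:
        delta = SimpleNamespace(content=tok)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async def collect_stream(stream):
    """Concatenate streamed tokens, treating a missing token as empty."""
    parts = []
    async for chunk in stream:
        token = chunk.choices[0].delta.content or ""
        parts.append(token)
    return "".join(parts)


# Run as a script; inside a notebook, `await collect_stream(...)` directly instead
text = asyncio.run(collect_stream(fake_stream(["Hello", None, " world"])))
print(text)  # Hello world
```

The `or ""` guard plays the same role as the `if not token` check in the app: a chunk without content contributes nothing to the message.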
