---
title: Haystack-7-wonders
emoji: 🚀
colorFrom: indigo
colorTo: red
sdk: docker
app_file: app.py
pinned: false
---

# Welcome! 

This chatbot uses RAG to answer questions about the Seven Wonders of the Ancient World. 

Here are sample questions you can ask it:

1. What is the Great Pyramid of Giza?
2. What are the Hanging Gardens of Babylon?
3. What is the Temple of Artemis at Ephesus?
4. What is the Statue of Zeus at Olympia?
5. What is the Mausoleum at Halicarnassus?
6. Where were the Hanging Gardens of Babylon?
7. Why did people build the Great Pyramid of Giza?
8. What did the Colossus of Rhodes look like?
9. Why did people visit the Temple of Artemis?
10. What is the importance of the Colossus of Rhodes?
11. What happened to the Tomb of Mausolus?
12. How did the Colossus of Rhodes collapse?

## How is it built?

### Poetry package management

This project uses [Poetry](https://python-poetry.org/) for package management.

It uses [this `pyproject.toml` file](pyproject.toml).

To install dependencies:

```bash
pip install poetry
poetry install
```
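For orientation, a minimal `pyproject.toml` for a setup like this might look as follows. The repo's real [pyproject.toml](pyproject.toml) is the source of truth; the package names and version constraints below are only an illustrative sketch:

```toml
[tool.poetry]
name = "haystack-7-wonders"
version = "0.1.0"
description = "RAG chatbot about the Seven Wonders of the Ancient World"

[tool.poetry.dependencies]
# Versions are placeholders; pin to what the real pyproject.toml specifies.
python = "^3.10"
farm-haystack = "*"
chainlit = "*"
datasets = "*"
```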

### Data source

The data comes from the [Seven Wonders dataset](https://huggingface.co/datasets/bilgeyucel/seven-wonders) on Hugging Face.

### Method

The chatbot's retrieval mechanism is built with Retrieval-Augmented Generation (RAG) using [Haystack](https://haystack.deepset.ai/tutorials/22_pipeline_with_promptnode), and its user interface is built with [Chainlit](https://docs.chainlit.io/overview). Answers are generated with OpenAI's GPT-3.5-turbo.

### Pipeline steps (Haystack)

The full script is in [app.py](app.py).

1. Initialize the in-memory document store

```python
from haystack.document_stores import InMemoryDocumentStore

# Initialize an in-memory document store with BM25 retrieval enabled
document_store = InMemoryDocumentStore(use_bm25=True)
```
2. Load the dataset from Hugging Face

```python
from datasets import load_dataset

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
```

3. Transform the documents and load them into the document store

```python
document_store.write_documents(dataset)
```
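Each record written to the store needs a `content` field holding the passage text; other fields become document metadata. The sketch below illustrates that shape with made-up rows (the `meta` contents are assumptions, not fields of the actual dataset):

```python
# Hedged sketch: the document-dict shape a Haystack in-memory store expects.
# "content" carries the passage text; "meta" here is an illustrative assumption.
rows = [
    {"content": "The Great Pyramid of Giza is the oldest of the Seven Wonders."},
    {"content": "The Colossus of Rhodes was a statue of the sun god Helios."},
]
docs = [{"content": r["content"], "meta": {"source": "seven-wonders"}} for r in rows]
print(len(docs))  # 2
```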
4. Initialize a RAG prompt

```python
from haystack.nodes import AnswerParser, PromptTemplate

rag_prompt = PromptTemplate(
    prompt="""Synthesize a brief answer from the following text for the given question.
              Provide a clear and concise response that summarizes the key points and information presented in the text.
              Your answer should be in your own words and be no longer than 50 words.
              \n\n Related text: {join(documents)} \n\n Question: {query} \n\n Answer:""",
    output_parser=AnswerParser(),
)
```
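At query time Haystack fills `{join(documents)}` with the retrieved passages and `{query}` with the user's question. The plain-Python snippet below only mimics that substitution to show what the rendered prompt roughly looks like; it is not Haystack's template renderer:

```python
# Illustration only: mimic how the template placeholders get filled at run time.
documents = [
    "The Temple of Artemis stood in the city of Ephesus.",
    "Visitors came to see its scale and its works of art.",
]
query = "Why did people visit the Temple of Artemis?"

prompt = (
    "Synthesize a brief answer from the following text for the given question.\n\n"
    f"Related text: {' '.join(documents)}\n\n"
    f"Question: {query}\n\nAnswer:"
)
print(prompt)
```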

5. Set up the nodes using GPT-3.5-turbo

```python
from haystack.nodes import BM25Retriever, PromptNode

# Set up the retriever and the prompt node
retriever = BM25Retriever(document_store=document_store, top_k=2)
pn = PromptNode("gpt-3.5-turbo",
                api_key=MY_API_KEY,
                model_kwargs={"stream": False},
                default_prompt_template=rag_prompt)

```

6. Build the pipeline

```python
from haystack import Pipeline

# Set up the pipeline: retriever feeds into the prompt node
pipe = Pipeline()
pipe.add_node(component=retriever, name="retriever", inputs=["Query"])
pipe.add_node(component=pn, name="prompt_node", inputs=["retriever"])
```
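Conceptually, the two-node pipeline first ranks documents against the query and then hands the top `top_k=2` hits to the prompt. The toy stand-in below illustrates that flow with simple keyword overlap in place of BM25 and no LLM call; it is a sketch of the idea, not the Haystack implementation:

```python
def toy_retrieve(query, docs, top_k=2):
    # Rank documents by shared lowercase words with the query
    # (a crude stand-in for BM25 scoring).
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:top_k]

docs = [
    "The Colossus of Rhodes collapsed during an earthquake in 226 BC.",
    "The Great Pyramid of Giza was built as a tomb for the pharaoh Khufu.",
    "The Hanging Gardens of Babylon may have been purely legendary.",
]
hits = toy_retrieve("How did the Colossus of Rhodes collapse?", docs)
print(hits[0])  # the Colossus passage ranks first
```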

### Connecting the pipeline to Chainlit

```python
import chainlit as cl

@cl.on_message
async def main(message: str):
    # Use the pipeline to get a response
    output = pipe.run(query=message)

    # Create a Chainlit message with the response
    response = output['answers'][0].answer
    msg = cl.Message(content=response)

    # Send the message to the user
    await msg.send()
```

### Run the application

```bash
poetry run chainlit run app.py --port 7860
```