Update README.md
README.md
CHANGED
@@ -52,6 +52,7 @@ queries = [
     get_detailed_instruct_query(task, 'Cis-acting lncRNAs control the expression of genes that are positioned in the vicinity of their transcription sites.'),
     get_detailed_instruct_query(task, 'Forkhead 0 (fox0) transcription factors are involved in apoptosis.')
 ]
+
 # No need to add instruction for retrieval documents
 documents = [
     get_detailed_instruct_passage("Gene regulation by the act of long non-coding RNA transcription Long non-protein-coding RNAs (lncRNAs) are proposed to be the largest transcript class in the mouse and human transcriptomes. Two important questions are whether all lncRNAs are functional and how they could exert a function. Several lncRNAs have been shown to function through their product, but this is not the only possible mode of action. In this review we focus on a role for the process of lncRNA transcription, independent of the lncRNA product, in regulating protein-coding-gene activity in cis. We discuss examples where lncRNA transcription leads to gene silencing or activation, and describe strategies to determine if the lncRNA product or its transcription causes the regulatory effect."),
@@ -60,16 +61,18 @@ documents = [
 input_texts = queries + documents
 
 max_length = 512
+
 # Tokenize the input texts
 batch_dict = tokenizer(input_texts, max_length=max_length, padding=True, truncation=True, return_tensors='pt')
 
 model.eval()
 with torch.no_grad():
-
-
+    outputs = model(**batch_dict)
+    embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
 ```
 
 Then similarity scores between the different sentences are obtained with a dot product between the embeddings:
+
 ```python
 scores = (embeddings[:2] @ embeddings[2:].T)
 print(scores.tolist())