CitationIntentLLM
Collection of finetuned models for Citation Intent Classification
A fine-tuned model for Citation Intent Classification, based on Qwen 2.5 14B Instruct and trained on the SciCite dataset.
GGUF Version: https://huggingface.co/sknow-lab/Qwen2.5-14B-CIC-SciCite-GGUF
| Class | Definition |
| --- | --- |
| Background information | The citation states, mentions, or points to the background information giving more context about a problem, concept, approach, topic, or importance of the problem in the field. |
| Method | Making use of a method, tool, approach, or dataset. |
| Result comparison | Comparison of the paper's results/findings with the results/findings of other work. |
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sknow-lab/Qwen2.5-14B-CIC-SciCite"

# Load the fine-tuned model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
system_prompt = """
# CONTEXT #
You are an expert researcher tasked with classifying the intent of a citation in a scientific publication.
########
# OBJECTIVE #
You will be given a sentence containing a citation. You must classify the intent of the citation by assigning it to one of three predefined classes.
########
# CLASS DEFINITIONS #
The three (3) possible classes are the following: "background information", "method", "results comparison."
1 - background information: The citation states, mentions, or points to the background information giving more context about a problem, concept, approach, topic, or importance of the problem in the field.
2 - method: Making use of a method, tool, approach, or dataset.
3 - results comparison: Comparison of the paper’s results/findings with the results/findings of other work.
########
# RESPONSE RULES #
- Analyze only the citation marked with the @@CITATION tag.
- Assign exactly one class to each citation.
- Respond only with the exact name of one of the following classes: "background information", "method", or "results comparison".
- Do not provide any explanation or elaboration.
"""
test_citing_sentence = "Activated PBMC are the basis of the standard PBMC blast assay for HIV-1 neutralization, whereas the various GHOST and HeLa cell lines have all been used in neutralization assays @@CITATION@@."
user_prompt = f"""
{test_citing_sentence}
### Question: Which is the most likely intent for this citation?
a) background information
b) method
c) results comparison
### Answer:
"""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Response: method
```
Details about the system prompts and query templates can be found in the paper.
You may need a cleanup function to extract the predicted label from the model output. You can find ours on GitHub.
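The repository's own cleanup function is on GitHub; as a rough sketch only (the matching rules below are assumptions, not the repository's implementation), a normalizer could lowercase the output and map it onto one of the three class names:

```python
import re
from typing import Optional

# The three class names the prompt asks the model to emit verbatim.
LABELS = ["background information", "method", "results comparison"]

def extract_label(response: str) -> Optional[str]:
    """Hypothetical cleanup sketch: map raw model output to a class name.

    Returns the matched label, or None if no class can be recognized.
    """
    text = response.strip().lower()
    # Exact match first -- the expected case given the response rules.
    if text in LABELS:
        return text
    # Otherwise look for the first class name mentioned anywhere in the
    # output (e.g. "Answer: method.").
    for label in LABELS:
        if re.search(re.escape(label), text):
            return label
    return None
```

For example, `extract_label("Answer: method.")` returns `"method"`, while unrecognized output yields `None`.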
```bibtex
@misc{koloveas2025llmspredictcitationintent,
    title={Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs},
    author={Paris Koloveas and Serafeim Chatzopoulos and Thanasis Vergoulis and Christos Tryfonopoulos},
    year={2025},
    eprint={2502.14561},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2502.14561},
}
```