CitationIntentLLM
Collection of finetuned models for Citation Intent Classification
A fine-tuned model for Citation Intent Classification, based on Qwen 2.5 14B Instruct and trained on the SciCite dataset.
GGUF Version: https://huggingface.co/sknow-lab/Qwen2.5-14B-CIC-SciCite-GGUF
| Class | Definition |
| --- | --- |
| Background information | The citation states, mentions, or points to the background information giving more context about a problem, concept, approach, topic, or importance of the problem in the field. |
| Method | Making use of a method, tool, approach, or dataset. |
| Result comparison | Comparison of the paper's results/findings with the results/findings of other work. |
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sknow-lab/Qwen2.5-14B-CIC-SciCite"

# Load the fine-tuned model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
system_prompt = """
# CONTEXT #
You are an expert researcher tasked with classifying the intent of a citation in a scientific publication.
########
# OBJECTIVE #
You will be given a sentence containing a citation. You must classify the intent of the citation by assigning it to one of three predefined classes.
########
# CLASS DEFINITIONS #
The three (3) possible classes are the following: "background information", "method", "results comparison."
1 - background information: The citation states, mentions, or points to the background information giving more context about a problem, concept, approach, topic, or importance of the problem in the field.
2 - method: Making use of a method, tool, approach, or dataset.
3 - results comparison: Comparison of the paper’s results/findings with the results/findings of other work.
########
# RESPONSE RULES #
- Analyze only the citation marked with the @@CITATION tag.
- Assign exactly one class to each citation.
- Respond only with the exact name of one of the following classes: "background information", "method", or "results comparison".
- Do not provide any explanation or elaboration.
"""
test_citing_sentence = "Activated PBMC are the basis of the standard PBMC blast assay for HIV-1 neutralization, whereas the various GHOST and HeLa cell lines have all been used in neutralization assays @@CITATION@@."
user_prompt = f"""
{test_citing_sentence}
### Question: Which is the most likely intent for this citation?
a) background information
b) method
c) results comparison
### Answer:
"""
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Response: method
```
Details about the system prompts and query templates can be found in the paper.
You may need a cleanup function to extract the predicted label from the model output. You can find ours on GitHub.
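The repository's own cleanup function is on GitHub; as a rough sketch only (the matching rules below are assumptions, not the repository's implementation), a normalizer could lowercase the output and map it onto one of the three class names:

```python
import re
from typing import Optional

# The three class names the prompt asks the model to emit verbatim.
LABELS = ["background information", "method", "results comparison"]

def extract_label(response: str) -> Optional[str]:
    """Hypothetical cleanup sketch: map raw model output to a class name.

    Returns the matched label, or None if no class can be recognized.
    """
    text = response.strip().lower()
    # Exact match first -- the expected case given the response rules.
    if text in LABELS:
        return text
    # Otherwise look for the first class name mentioned anywhere in the
    # output (e.g. "Answer: method.").
    for label in LABELS:
        if re.search(re.escape(label), text):
            return label
    return None
```

For example, `extract_label("Answer: method.")` returns `"method"`, while unrecognized output yields `None`.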
```bibtex
@misc{koloveas2025llmspredictcitationintent,
    title={Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs},
    author={Paris Koloveas and Serafeim Chatzopoulos and Thanasis Vergoulis and Christos Tryfonopoulos},
    year={2025},
    eprint={2502.14561},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2502.14561},
}
```