ELAINE-medLLM - Built with Llama 3 8B

ELAINE (EngLish-jApanese-chINesE)-medLLM is a trilingual (English, Japanese, Chinese) large language model adapted to the biomedical domain, based on Llama-3-8B. The training dataset was carefully curated in terms of volume and diversity to adapt the model to the biomedical domain and endow trilingual capability while preserving the knowledge and abilities of the base model. Training follows a two-stage path: continued pre-training followed by supervised fine-tuning (SFT). ELAINE-medLLM exhibits superior trilingual capabilities compared to existing bilingual or multilingual medical LLMs without severely sacrificing the base model's capability.

Model Details

  • Model type: Please refer to the Llama 3 GitHub repository for details on the model architecture.
  • Language(s): English, Japanese, Chinese
  • Library: DeepSpeed
  • Tokenizer: Please refer to the Llama 3 blog for details on the tokenizer.
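A minimal usage sketch with the Hugging Face transformers library. The model ID is taken from this repository; the plain question/answer prompt format and the generation settings are assumptions for illustration, not an official template.

```python
# Hedged sketch: querying ELAINE-medLLM via transformers.
MODEL_ID = "kenyano/Llama3-ELAINE-medLLM-8B"

def build_prompt(question: str) -> str:
    # Plain instruction-style prompt; the exact SFT template is an assumption.
    return f"Question: {question}\nAnswer:"

def generate(question: str, max_new_tokens: int = 128) -> str:
    # Imported lazily so the sketch can be read without the model downloaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("What are common symptoms of influenza?"))
```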

Model Performance

Evaluation Benchmarks

The evaluation benchmark datasets and evaluation code can be obtained from this GitHub site. The details of the benchmarks are as follows.

English evaluation benchmarks

Japanese evaluation benchmarks

  • IgakuQA
    • We concatenate the original exam data from 2018 to 2022 into a single JSON file.
  • JJSIMQA
  • DenQA
    • It contains exam problems and answers from the Japan National Dentistry Examination for the past two years (2023 and 2024), extracted from the official website of the Ministry of Health, Labour and Welfare in Japan (https://www.mhlw.go.jp/stf/english/index.html).
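The per-year merging described for IgakuQA can be sketched as follows. File names and record fields here are hypothetical; the released benchmark defines its own schema.

```python
# Hedged sketch: concatenate per-year exam JSON files (each holding a list
# of question records) into a single merged JSON file.
import json
from pathlib import Path

def concat_exams(src_dir: str, out_file: str) -> int:
    """Merge every *.json file in src_dir into one list; return its length."""
    merged = []
    for path in sorted(Path(src_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            merged.extend(json.load(f))
    with open(out_file, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return len(merged)
```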

Chinese evaluation benchmarks

Training Datasets

Continued pre-training

For continued pre-training, we collected English, Japanese, and Chinese text in the biomedical domain. The collected text falls into six categories: 1) scientific papers, 2) medical guidelines, 3) biomedical web text, 4) biomedical textbooks, 5) PubMed abstracts, and 6) PubMed Central (PMC) archives. For the Japanese PubMed abstracts, we used the original English PubMed abstracts translated into Japanese. We used only openly licensed text, except for the Japanese biomedical papers from J-STAGE.

Instruction supervised fine-tuning

We collected various conversational QA datasets in the biomedical domain from different data sources. For English, we used Medical Meadow from MedAlpaca, and the HealthCareMagic and iCliniq datasets used in ChatDoctor. For Chinese and English, we adopted the augmented QA dataset from HuatuoGPT-2. For Japanese, we used existing general-domain Alpaca datasets translated into Japanese.

Results

English benchmark

| model_name | MMLU | MedMCQA | MedQA | MedQA-4op | PubMedQA | Avg |
|---|---|---|---|---|---|---|
| google_gemma-7b | 63.65 | 49.81 | 43.38 | 48.82 | 71.52 | 55.44 |
| meta-llama_Llama-2-7b-hf | 45.02 | 36.84 | 30.13 | 36.59 | 49.90 | 39.70 |
| meta-llama_Meta-Llama-3-8B | 71.22 | 56.97 | 52.60 | 57.89 | 69.70 | 61.68 |
| tokyotech-llm_Llama-3-Swallow-8B-v0.1 | 65.96 | 51.27 | 45.90 | 52.92 | 61.01 | 55.41 |
| Llama3-ELAINE-medLLM-8B | 67.80 | 54.55 | 50.47 | 57.73 | 67.27 | 59.56 |
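The Avg column in these tables is the unweighted mean of the per-benchmark scores, rounded to two decimals. This can be checked against, say, the ELAINE-medLLM row of the English benchmark:

```python
# Reproduce the Avg column: unweighted mean, rounded to two decimals.
def benchmark_avg(scores: list[float]) -> float:
    return round(sum(scores) / len(scores), 2)

# ELAINE-medLLM row of the English benchmark.
elaine_en = [67.80, 54.55, 50.47, 57.73, 67.27]
print(benchmark_avg(elaine_en))  # 59.56
```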

Japanese benchmark

| model_name | DenQA | IgakuQA | JJSIMQA | Avg |
|---|---|---|---|---|
| google_gemma-7b | 18.60 | 29.02 | 18.90 | 22.17 |
| meta-llama_Llama-2-7b-hf | 10.63 | 17.64 | 8.13 | 12.13 |
| meta-llama_Meta-Llama-3-8B | 18.88 | 35.09 | 23.52 | 25.83 |
| tokyotech-llm_Llama-3-Swallow-8B-v0.1 | 22.24 | 42.21 | 27.25 | 30.57 |
| Llama3-ELAINE-medLLM-8B | 22.38 | 44.06 | 29.45 | 31.96 |

Chinese benchmark

| model_name | CMExam | MedQA | MedQA-4op | Avg |
|---|---|---|---|---|
| google_gemma-7b | 36.34 | 40.54 | 43.03 | 39.97 |
| meta-llama_Llama-2-7b-hf | 24.33 | 25.02 | 29.61 | 26.32 |
| meta-llama_Meta-Llama-3-8B | 40.30 | 44.96 | 51.15 | 45.47 |
| tokyotech-llm_Llama-3-Swallow-8B-v0.1 | 36.19 | 40.89 | 48.00 | 41.69 |
| Llama3-ELAINE-medLLM-8B | 46.03 | 52.50 | 58.23 | 52.25 |

Risks and Limitations

The models released here are still in the early stages of our research and development and have not been tuned to ensure that outputs align with human intent and safety considerations.

Acknowledgements

We thank Meta Research for releasing Llama 3 under a generous open license.

Authors

  • Ken Yano
  • Zheheng Luo
  • Jimin Huang
  • Qianqian Xie
  • Masaki Asada
  • Chenhan Yuan
  • Kailai Yang
  • Makoto Miwa
  • Sophia Ananiadou
  • Jun'ichi Tsujii

Contact

How to cite

If you find our work helpful, please feel free to cite this paper.

@inproceedings{yano-etal-2025-elaine,
    title = "{ELAINE}-med{LLM}: Lightweight {E}nglish {J}apanese {C}hinese Trilingual Large Language Model for Bio-medical Domain",
    author = "Yano, Ken  and
      Luo, Zheheng  and
      Huang, Jimin  and
      Xie, Qianqian  and
      Asada, Masaki  and
      Yuan, Chenhan  and
      Yang, Kailai  and
      Miwa, Makoto  and
      Ananiadou, Sophia  and
      Tsujii, Jun{'}ichi",
    editor = "Rambow, Owen  and
      Wanner, Leo  and
      Apidianaki, Marianna  and
      Al-Khalifa, Hend  and
      Eugenio, Barbara Di  and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.313/",
    pages = "4670--4688",
}