arXiv:2408.07246

Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM

Published on Aug 14, 2024 · Submitted by qq8933 on Aug 15, 2024
Abstract

In this technical report, we propose ChemVLM, the first open-source multimodal large language model dedicated to the field of chemistry, designed to bridge the gap between chemical image understanding and chemical text analysis. Built on the ViT-MLP-LLM architecture, we use ChemLLM-20B as the foundation language model, giving the model strong capabilities in understanding and using chemical text knowledge, and InternViT-6B as a powerful image encoder. We curated high-quality data from the chemical domain, including molecules, reaction formulas, and chemistry examination questions, and compiled it into a bilingual multimodal question-answering dataset. We evaluate our model on several open-source benchmarks and three custom evaluation sets. Experimental results show that the model performs strongly, achieving state-of-the-art results on five of the six tasks. Our model is available at https://huggingface.co/AI4Chem/ChemVLM-26B.
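
For readers who want a concrete picture of the ViT-MLP-LLM design described above (InternViT-6B as the image encoder, an MLP projector, and ChemLLM-20B as the language model), here is a minimal, generic sketch of that composition pattern. It is not the authors' implementation; the module names, hidden sizes, and projector depth are illustrative assumptions.

```python
# Minimal sketch of a ViT-MLP-LLM composition (illustrative only; module names,
# hidden sizes, and projector depth are assumptions, not the authors' code).
import torch
import torch.nn as nn

class VisionLanguageBridge(nn.Module):
    def __init__(self, vit_hidden=3200, llm_hidden=6144):
        super().__init__()
        # Two-layer MLP that projects vision-encoder features into the LLM's
        # token-embedding space, as in common ViT-MLP-LLM designs.
        self.projector = nn.Sequential(
            nn.Linear(vit_hidden, llm_hidden),
            nn.GELU(),
            nn.Linear(llm_hidden, llm_hidden),
        )

    def forward(self, image_features, text_embeddings):
        # image_features: (batch, num_patches, vit_hidden) from the vision encoder
        # text_embeddings: (batch, seq_len, llm_hidden) from the LLM's embedding layer
        visual_tokens = self.projector(image_features)
        # Prepend projected visual tokens to the text sequence before the LLM decoder.
        return torch.cat([visual_tokens, text_embeddings], dim=1)
```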

Community

Paper author · Paper submitter

🚀 Introducing ChemVLM, the first open-source multimodal large language model dedicated to chemistry!
🌟 Comparable performance to commercial models and specialized OCR models, but with dialogue capabilities!
✨2B/26B Models Here! https://huggingface.co/AI4Chem/ChemVLM-26B
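
Below is a minimal sketch of loading the released 26B checkpoint with transformers. It assumes the repository ships custom modeling code registered via auto_map (hence trust_remote_code=True) and that the generic AutoModel path works; the exact chat/inference API is not confirmed here.

```python
# Hedged sketch: load the released checkpoint from the Hugging Face Hub.
# Assumptions: custom modeling code in the repo (trust_remote_code=True) and a
# bf16-capable multi-GPU setup; the actual inference entry point may differ.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "AI4Chem/ChemVLM-26B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 26B parameters: use bf16 and shard across GPUs
    trust_remote_code=True,
    device_map="auto",
).eval()
```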




Models citing this paper: 2

Datasets citing this paper: 0


Spaces citing this paper: 0


Collections including this paper: 2