This model was converted to GGUF format from [`prithivMLmods/LatexMind-2B-Codec`](https://huggingface.co/prithivMLmods/LatexMind-2B-Codec) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/prithivMLmods/LatexMind-2B-Codec) for more details on the model.

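
For reference, the GGUF-my-repo space drives llama.cpp's own conversion tooling. The sketch below shows roughly what an equivalent local conversion looks like; it assumes a checked-out llama.cpp tree, the script names and flags may differ between llama.cpp versions, and the output filenames are placeholders rather than the names used in this repo.

```bash
# Download the original model from the Hub.
huggingface-cli download prithivMLmods/LatexMind-2B-Codec --local-dir LatexMind-2B-Codec

# Convert the Hugging Face checkpoint to an f16 GGUF (run from the llama.cpp root).
python convert_hf_to_gguf.py LatexMind-2B-Codec \
  --outfile latexmind-2b-codec-f16.gguf \
  --outtype f16

# Quantize the f16 GGUF down to a smaller format such as Q4_K_M.
./llama-quantize latexmind-2b-codec-f16.gguf latexmind-2b-codec-q4_k_m.gguf Q4_K_M
```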

---

The LatexMind-2B-Codec model is a fine-tuned version of Qwen2-VL-2B-Instruct, optimized for Optical Character Recognition (OCR), image-to-text conversion, and mathematical expression extraction with LaTeX formatting. This model integrates a conversational approach with visual and textual understanding to handle multi-modal tasks effectively.

Key Enhancements:

- SoTA understanding of images with various resolutions & aspect ratios: LatexMind-2B-Codec achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Advanced LaTeX extraction: The model specializes in extracting structured mathematical expressions from images and documents, converting them into LaTeX format for precise rendering and further computation.
- Understanding long-duration videos (20min+): LatexMind-2B-Codec can process videos over 20 minutes long, enabling high-quality video-based question answering, mathematical solution explanation, and educational content creation.
- Agent capabilities for automated operations: With complex reasoning and decision-making abilities, the model can be integrated with mobile devices, robots, and assistive technologies to automate tasks based on visual and textual inputs.
- Multilingual Support: To serve global users, in addition to English and Chinese, the model supports text recognition inside images across multiple languages, including European languages, Japanese, Korean, Arabic, Vietnamese, etc.

This model is particularly effective in retrieving mathematical notations and equations from scanned documents, whiteboard images, and handwritten notes, ensuring accurate conversion to LaTeX code for further academic and computational applications.

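
As a concrete illustration of the behavior described above (the equation here is a hypothetical example, not output captured from the model), an image of the quadratic formula would be expected to come back as LaTeX along these lines:

```latex
% Hypothetical target output for an image containing the quadratic formula;
% the exact markup the model emits may differ (e.g. display vs. inline math).
x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
```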

---

## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

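
A minimal usage sketch follows. The flags shown are standard llama.cpp options, but the repo id and GGUF filename below are placeholders; substitute the id of this repository and the actual `.gguf` filename it contains.

```bash
# Install llama.cpp (provides the llama-cli and llama-server binaries).
brew install llama.cpp

# Run the model straight from the Hugging Face Hub with llama-cli.
# <this-repo-id> and the --hf-file value are placeholders.
llama-cli --hf-repo <this-repo-id> \
  --hf-file latexmind-2b-codec-q4_k_m.gguf \
  -p "Convert this equation to LaTeX: x squared plus two x plus one equals zero"

# Or expose it as an OpenAI-compatible server with a 2048-token context.
llama-server --hf-repo <this-repo-id> \
  --hf-file latexmind-2b-codec-q4_k_m.gguf \
  -c 2048
```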