geekyrakshit/medrag
mratanusarkar committed on Oct 17, 2024

Commit d822059 (1 parent: 6526b2f)

update: docs with lib sources to help find kwargs

Files changed (4)

docs/document_loader/text_loader/marker_text_loader.md CHANGED

@@ -1,3 +1,23 @@
 ## Load text from PDF files (using Marker)
 ::: medrag_multi_modal.document_loader.text_loader.marker_text_loader

 ## Load text from PDF files (using Marker)
+??? note "Note"
+    **Underlying Library:** `marker-pdf`
+    Convert PDF to markdown quickly and accurately using a pipeline of deep learning models.
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.text_loader import MarkerTextLoader
+    ```
+    For details and available `**kwargs`, please refer to the sources below.
+    **Sources:**
+    - [DataLab](https://www.datalab.to)
+    - [GitHub](https://github.com/VikParuchuri/marker)
+    - [PyPI](https://pypi.org/project/marker-pdf/)
 ::: medrag_multi_modal.document_loader.text_loader.marker_text_loader
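
The note added in this file says that extra `**kwargs` are handed to the underlying library. The diff does not show the loader's signature, so the following is only a minimal sketch of the forwarding pattern. `SketchTextLoader`, `fake_convert`, and the parameters `max_pages` and `langs` are hypothetical stand-ins, not the `medrag_multi_modal` or `marker-pdf` API.

```python
from typing import Any, Callable, Dict


def fake_convert(path: str, max_pages: int = 0, langs: str = "en") -> Dict[str, Any]:
    """Hypothetical stand-in for an underlying PDF-to-text function."""
    return {"path": path, "max_pages": max_pages, "langs": langs}


class SketchTextLoader:
    """Hypothetical loader that forwards extra keyword arguments untouched."""

    def __init__(self, path: str, backend: Callable[..., Dict[str, Any]]) -> None:
        self.path = path
        self.backend = backend

    def load(self, **kwargs: Any) -> Dict[str, Any]:
        # Anything the loader does not consume itself goes straight to the backend.
        return self.backend(self.path, **kwargs)


loader = SketchTextLoader("paper.pdf", backend=fake_convert)
result = loader.load(max_pages=2)  # `langs` keeps the backend's default
print(result)
```

With this pattern a misspelled keyword surfaces as a `TypeError` raised by the backend rather than by the loader, which is one reason the notes point readers to the upstream sources for the accepted names.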

docs/document_loader/text_loader/pdfplumber_text_loader.md CHANGED

@@ -1,3 +1,22 @@
 ## Load text from PDF files (using PDFPlumber)
 ::: medrag_multi_modal.document_loader.text_loader.pdfplumber_text_loader

 ## Load text from PDF files (using PDFPlumber)
+??? note "Note"
+    **Underlying Library:** `pdfplumber`
+    Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.text_loader import PDFPlumberTextLoader
+    ```
+    For details and available `**kwargs`, please refer to the sources below.
+    **Sources:**
+    - [GitHub](https://github.com/jsvine/pdfplumber)
+    - [PyPI](https://pypi.org/project/pdfplumber/)
 ::: medrag_multi_modal.document_loader.text_loader.pdfplumber_text_loader
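
Besides reading the linked sources, one way to find which keyword arguments a library function accepts is to inspect its signature at runtime. The sketch below uses only the standard library. `accepted_kwargs` is a hypothetical helper and `fake_extract_text` is a stand-in with illustrative parameter names; in practice you would pass the real function from the underlying library.

```python
import inspect
from typing import Any, Callable, List


def accepted_kwargs(func: Callable[..., Any]) -> List[str]:
    """Return the names of optional parameters that can be passed by keyword."""
    params = inspect.signature(func).parameters.values()
    return [
        p.name
        for p in params
        if p.default is not inspect.Parameter.empty
        and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    ]


def fake_extract_text(page: str, x_tolerance: float = 3, y_tolerance: float = 3) -> str:
    """Hypothetical stand-in for a library text-extraction function."""
    return page


names = accepted_kwargs(fake_extract_text)
print(names)
```

Functions that themselves take `**kwargs` hide their real options from this kind of inspection, so the upstream documentation remains the authoritative list.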

docs/document_loader/text_loader/pymupdf4llm_text_loader.md CHANGED

@@ -1,3 +1,23 @@
 ## Load text from PDF files (using PyMuPDF4LLM)
 ::: medrag_multi_modal.document_loader.text_loader.pymupdf4llm_text_loader

 ## Load text from PDF files (using PyMuPDF4LLM)
+??? note "Note"
+    **Underlying Library:** `pymupdf4llm`
+    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.text_loader import PyMuPDF4LLMTextLoader
+    ```
+    For details and available `**kwargs`, please refer to the sources below.
+    **Sources:**
+    - [Docs](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/)
+    - [GitHub](https://github.com/pymupdf/PyMuPDF)
+    - [PyPI](https://pypi.org/project/pymupdf4llm/)
 ::: medrag_multi_modal.document_loader.text_loader.pymupdf4llm_text_loader

docs/document_loader/text_loader/pypdf2_text_loader.md CHANGED

@@ -1,3 +1,23 @@
 ## Load text from PDF files (using PyPDF2)
 ::: medrag_multi_modal.document_loader.text_loader.pypdf2_text_loader

 ## Load text from PDF files (using PyPDF2)
+??? note "Note"
+    **Underlying Library:** `PyPDF2`
+    A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.text_loader import PyPDF2TextLoader
+    ```
+    For details and available `**kwargs`, please refer to the sources below.
+    **Sources:**
+    - [Docs](https://pypdf2.readthedocs.io/en/3.x/)
+    - [GitHub](https://github.com/py-pdf/pypdf)
+    - [PyPI](https://pypi.org/project/PyPDF2/)
 ::: medrag_multi_modal.document_loader.text_loader.pypdf2_text_loader