mratanusarkar committed
Commit d822059 · 1 parent: 6526b2f

update: docs with lib sources to help find kwargs

docs/document_loader/text_loader/marker_text_loader.md CHANGED
@@ -1,3 +1,23 @@
  ## Load text from PDF files (using Marker)
 
+ ??? note "Note"
+     **Underlying Library:** `marker-pdf`
+
+     Convert PDF to markdown quickly and accurately using a pipeline of deep learning models.
+
+     You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.text_loader import MarkerTextLoader
+     ```
+
+     For details and available `**kwargs`, please refer to the sources below.
+
+     **Sources:**
+
+     - [DataLab](https://www.datalab.to)
+     - [GitHub](https://github.com/VikParuchuri/marker)
+     - [PyPI](https://pypi.org/project/marker-pdf/)
+
  ::: medrag_multi_modal.document_loader.text_loader.marker_text_loader
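
The `**kwargs` pass-through mentioned in the note can be pictured with a minimal sketch. Everything here is a hypothetical stand-in: `fake_convert_page`, `SketchTextLoader`, and `extract_page_data` are illustrative names, not the `MarkerTextLoader` or `marker-pdf` API. The sketch only assumes that a loader hands extra keyword arguments to the underlying conversion call unchanged.

```python
from typing import Any, Dict, Optional


def fake_convert_page(
    text: str, uppercase: bool = False, max_chars: Optional[int] = None
) -> str:
    """Hypothetical stand-in for an underlying library's page converter."""
    if uppercase:
        text = text.upper()
    if max_chars is not None:
        text = text[:max_chars]
    return text


class SketchTextLoader:
    """Hypothetical loader that forwards extra keyword arguments untouched."""

    def __init__(self, pages: Dict[int, str]) -> None:
        self.pages = pages

    def extract_page_data(self, page_idx: int, **kwargs: Any) -> Dict[str, Any]:
        # Everything in **kwargs goes straight to the underlying converter,
        # so its options can be tuned without changing the loader itself.
        text = fake_convert_page(self.pages[page_idx], **kwargs)
        return {"page_idx": page_idx, "text": text}


loader = SketchTextLoader({0: "Anatomy of the heart"})
default_page = loader.extract_page_data(0)
tuned_page = loader.extract_page_data(0, uppercase=True, max_chars=7)
```

In this sketch, a keyword the stand-in converter does not accept raises `TypeError`, which is one reason to check the underlying library's documentation before passing options through.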
docs/document_loader/text_loader/pdfplumber_text_loader.md CHANGED
@@ -1,3 +1,22 @@
  ## Load text from PDF files (using PDFPlumber)
 
+ ??? note "Note"
+     **Underlying Library:** `pdfplumber`
+
+     Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.
+
+     You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.text_loader import PDFPlumberTextLoader
+     ```
+
+     For details and available `**kwargs`, please refer to the sources below.
+
+     **Sources:**
+
+     - [GitHub](https://github.com/jsvine/pdfplumber)
+     - [PyPI](https://pypi.org/project/pdfplumber/)
+
  ::: medrag_multi_modal.document_loader.text_loader.pdfplumber_text_loader
docs/document_loader/text_loader/pymupdf4llm_text_loader.md CHANGED
@@ -1,3 +1,23 @@
  ## Load text from PDF files (using PyMuPDF4LLM)
 
+ ??? note "Note"
+     **Underlying Library:** `pymupdf4llm`
+
+     PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+
+     You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.text_loader import PyMuPDF4LLMTextLoader
+     ```
+
+     For details and available `**kwargs`, please refer to the sources below.
+
+     **Sources:**
+
+     - [Docs](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/)
+     - [GitHub](https://github.com/pymupdf/PyMuPDF)
+     - [PyPI](https://pypi.org/project/pymupdf4llm/)
+
  ::: medrag_multi_modal.document_loader.text_loader.pymupdf4llm_text_loader
docs/document_loader/text_loader/pypdf2_text_loader.md CHANGED
@@ -1,3 +1,23 @@
  ## Load text from PDF files (using PyPDF2)
 
+ ??? note "Note"
+     **Underlying Library:** `PyPDF2`
+
+     A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
+
+     You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.text_loader import PyPDF2TextLoader
+     ```
+
+     For details and available `**kwargs`, please refer to the sources below.
+
+     **Sources:**
+
+     - [Docs](https://pypdf2.readthedocs.io/en/3.x/)
+     - [GitHub](https://github.com/py-pdf/pypdf)
+     - [PyPI](https://pypi.org/project/PyPDF2/)
+
  ::: medrag_multi_modal.document_loader.text_loader.pypdf2_text_loader
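
Besides reading the linked sources, one way to discover which keyword options a function accepts is to inspect its signature with the standard library. The sketch below is only an illustration: `keyword_options` is a hypothetical helper and `example_extract_text` is a made-up stand-in, so substitute the real function from whichever underlying library you are using. Functions that themselves accept `**kwargs`, or that are implemented in C, may not reveal all of their options this way.

```python
import inspect
from typing import Any, Callable, Dict


def keyword_options(func: Callable[..., Any]) -> Dict[str, Any]:
    """Map each parameter of `func` that has a default to that default."""
    options: Dict[str, Any] = {}
    for name, param in inspect.signature(func).parameters.items():
        # Parameters without a default are required, so skip them.
        if param.default is not inspect.Parameter.empty:
            options[name] = param.default
    return options


def example_extract_text(page: str, tolerance: int = 3, layout: bool = False) -> str:
    """Hypothetical stand-in for a text extraction function."""
    return page


options = keyword_options(example_extract_text)
print(options)
```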