Commit d822059 (parent: 6526b2f)

update: docs with lib sources to help find kwargs

docs/document_loader/text_loader/marker_text_loader.md
CHANGED
@@ -1,3 +1,23 @@
 ## Load text from PDF files (using Marker)
 
+??? note "Note"
+    **Underlying Library:** `marker-pdf`
+
+    Convert PDF to markdown quickly and accurately using a pipeline of deep learning models.
+
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.text_loader import MarkerTextLoader
+    ```
+
+    For details and available `**kwargs`, please refer to the sources below.
+
+    **Sources:**
+
+    - [DataLab](https://www.datalab.to)
+    - [GitHub](https://github.com/VikParuchuri/marker)
+    - [PyPI](https://pypi.org/project/marker-pdf/)
+
 ::: medrag_multi_modal.document_loader.text_loader.marker_text_loader
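The note states that `**kwargs` are forwarded to the underlying library. The minimal sketch below shows that pass-through pattern. It assumes a wrapper that holds a backend extraction function and hands any extra keyword arguments to it unchanged. `HypotheticalTextLoader`, its `load_data` method, and `fake_backend_extract` are illustrative names invented here. They are not the `MarkerTextLoader` API or a real PDF library call.

```python
from typing import Any, Callable, Dict


def fake_backend_extract(path: str, *, page_separator: str = "---", strip: bool = False) -> Dict[str, Any]:
    """Hypothetical stand-in for a PDF library's text extraction call.

    It reads no file; it echoes the options it received so the
    forwarding behaviour is easy to observe.
    """
    return {"path": path, "page_separator": page_separator, "strip": strip}


class HypotheticalTextLoader:
    """Illustrative wrapper (not the medrag_multi_modal API) that forwards kwargs."""

    def __init__(self, backend: Callable[..., Dict[str, Any]]) -> None:
        self.backend = backend

    def load_data(self, path: str, **kwargs: Any) -> Dict[str, Any]:
        # Caller-supplied keyword arguments are passed through untouched,
        # so the backend's own defaults apply to anything omitted.
        return self.backend(path, **kwargs)


loader = HypotheticalTextLoader(fake_backend_extract)
result = loader.load_data("paper.pdf", strip=True)
print(result)  # {'path': 'paper.pdf', 'page_separator': '---', 'strip': True}
```

In this sketch the backend declares its options as keyword-only parameters. A misspelled option such as `strp=True` therefore raises a `TypeError` instead of being silently dropped. Whether a real backend behaves the same way depends on its own signature.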

docs/document_loader/text_loader/pdfplumber_text_loader.md
CHANGED
@@ -1,3 +1,22 @@
 ## Load text from PDF files (using PDFPlumber)
 
+??? note "Note"
+    **Underlying Library:** `pdfplumber`
+
+    Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
+
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.text_loader import PDFPlumberTextLoader
+    ```
+
+    For details and available `**kwargs`, please refer to the sources below.
+
+    **Sources:**
+
+    - [GitHub](https://github.com/jsvine/pdfplumber)
+    - [PyPI](https://pypi.org/project/pdfplumber/)
+
 ::: medrag_multi_modal.document_loader.text_loader.pdfplumber_text_loader
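The commit message says the sources were added to help find the available kwargs. Alongside the linked docs, Python's standard-library `inspect.signature` can list the parameters a backend callable accepts by keyword. The helper `keyword_params` and the stand-in `demo_extract_text` below are hypothetical and written for illustration only. They do not come from pdfplumber or `medrag_multi_modal`.

```python
import inspect
from typing import Callable, List


def keyword_params(func: Callable) -> List[str]:
    """Return the names of parameters that can be supplied by keyword.

    A ``**kwargs`` catch-all is reported as ``"**<name>"`` because the keys
    it accepts cannot be read from the signature.
    """
    names = []
    for param in inspect.signature(func).parameters.values():
        if param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            names.append(param.name)
        elif param.kind == inspect.Parameter.VAR_KEYWORD:
            names.append("**" + param.name)
        # Positional-only and *args parameters cannot be passed by keyword, so they are skipped.
    return names


def demo_extract_text(path, *, x_tolerance=3, y_tolerance=3, layout=False, **extra):
    """Hypothetical backend function used only to exercise keyword_params."""
    return ""


print(keyword_params(demo_extract_text))
# ['path', 'x_tolerance', 'y_tolerance', 'layout', '**extra']
```

A function may declare only a `**kwargs` catch-all. In that case its accepted option names cannot be read from the signature, and the linked documentation and source code remain the better reference.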

docs/document_loader/text_loader/pymupdf4llm_text_loader.md
CHANGED
@@ -1,3 +1,23 @@
 ## Load text from PDF files (using PyMuPDF4LLM)
 
+??? note "Note"
+    **Underlying Library:** `pymupdf4llm`
+
+    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.text_loader import PyMuPDF4LLMTextLoader
+    ```
+
+    For details and available `**kwargs`, please refer to the sources below.
+
+    **Sources:**
+
+    - [Docs](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/)
+    - [GitHub](https://github.com/pymupdf/PyMuPDF)
+    - [PyPI](https://pypi.org/project/pymupdf4llm/)
+
 ::: medrag_multi_modal.document_loader.text_loader.pymupdf4llm_text_loader
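A wrapper could also use the signature to reject unknown options before forwarding them. This gives a clearer error that lists the accepted names. The sketch below assumes the backend is a plain Python callable with an inspectable signature. `check_kwargs` and `demo_backend` are hypothetical names and are not part of `medrag_multi_modal` or PyMuPDF4LLM.

```python
import inspect
from typing import Any, Callable, Dict


def check_kwargs(func: Callable, kwargs: Dict[str, Any]) -> None:
    """Raise TypeError if ``kwargs`` contains names ``func`` cannot accept.

    If ``func`` declares a ``**kwargs`` catch-all, any name may be valid,
    so nothing is checked and the function returns silently.
    """
    params = inspect.signature(func).parameters.values()
    if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
        return
    allowed = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    unknown = sorted(set(kwargs) - allowed)
    if unknown:
        raise TypeError(
            f"{func.__name__}() got unexpected keyword argument(s) {unknown}; "
            f"accepted: {sorted(allowed)}"
        )


def demo_backend(path, *, pages=None, strip=False):
    """Hypothetical backend with a fixed set of keyword options."""
    return ""


check_kwargs(demo_backend, {"pages": [0, 1]})  # accepted, returns None

try:
    check_kwargs(demo_backend, {"page": [0]})  # typo: "page" instead of "pages"
except TypeError as err:
    print(err)
    # demo_backend() got unexpected keyword argument(s) ['page']; accepted: ['pages', 'path', 'strip']
```

The check is skipped entirely when the backend declares `**kwargs`, because any option name could then be valid.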

docs/document_loader/text_loader/pypdf2_text_loader.md
CHANGED
@@ -1,3 +1,23 @@
 ## Load text from PDF files (using PyPDF2)
 
+??? note "Note"
+    **Underlying Library:** `pypdf2`
+
+    A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
+
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.text_loader import PyPDF2TextLoader
+    ```
+
+    For details and available `**kwargs`, please refer to the sources below.
+
+    **Sources:**
+
+    - [Docs](https://pypdf2.readthedocs.io/en/3.x/)
+    - [GitHub](https://github.com/py-pdf/pypdf)
+    - [PyPI](https://pypi.org/project/PyPDF2/)
+
 ::: medrag_multi_modal.document_loader.text_loader.pypdf2_text_loader