Commit e6f2eb8
1 Parent(s): 04ea7bb
update: docs with lib sources and notes
- docs/document_loader/image_loader/fitzpil_img_loader.md +19 -0
- docs/document_loader/image_loader/marker_img_loader.md +18 -0
- docs/document_loader/image_loader/pdf2image_img_loader.md +23 -0
- docs/document_loader/image_loader/pdfplumber_img_loader.md +19 -0
- docs/document_loader/image_loader/pymupdf_img_loader.md +20 -0
docs/document_loader/image_loader/fitzpil_img_loader.md
CHANGED
@@ -1,3 +1,22 @@
 # Load images from PDF files (using Fitz & PIL)
 
+??? note "Note"
+    **Underlying Library:** `fitz` & `pillow`
+
+    Extract images from PDF files using `fitz` and `pillow`.
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.image_loader import FitzPILImageLoader
+    ```
+
+    For more details, please refer to the sources below.
+
+    **Sources:**
+
+    - [Docs](https://pymupdf.readthedocs.io/en/latest/intro.html)
+    - [GitHub](https://github.com/pymupdf/PyMuPDF)
+    - [PyPI](https://pypi.org/project/PyMuPDF/)
+    - [PyPI](https://pypi.org/project/pillow/)
+
 ::: medrag_multi_modal.document_loader.image_loader.fitzpil_img_loader
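A minimal sketch of what "extract images using `fitz` and `pillow`" can look like when calling the two libraries directly. It is not taken from `FitzPILImageLoader`; the function names and the file-naming scheme are hypothetical, and `fitz` is assumed to be the import name provided by the PyMuPDF package.

```python
import io


def image_filename(page_number: int, image_index: int, ext: str) -> str:
    """Build a file name such as 'page3_image0.png' (hypothetical naming scheme)."""
    return f"page{page_number}_image{image_index}.{ext}"


def extract_embedded_images(pdf_path: str, page_number: int) -> dict:
    """Map a generated file name to a PIL image for each image embedded on one page.

    `page_number` is zero-indexed, as in PyMuPDF.
    """
    # Imported lazily so this module loads even without the optional dependencies.
    import fitz  # assumption: installed via the PyMuPDF package
    from PIL import Image

    images = {}
    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        for index, info in enumerate(page.get_images(full=True)):
            xref = info[0]  # first tuple entry is the image's cross-reference number
            raw = doc.extract_image(xref)  # dict holding the bytes and the file extension
            name = image_filename(page_number, index, raw["ext"])
            images[name] = Image.open(io.BytesIO(raw["image"]))
    return images
```

Calling `extract_embedded_images("report.pdf", 0)` would return the images stored inside the first page, not a rendering of the page itself.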
docs/document_loader/image_loader/marker_img_loader.md
CHANGED
@@ -1,3 +1,21 @@
 # Load images from PDF files (using Marker)
 
+??? note "Note"
+    **Underlying Library:** `marker-pdf`
+
+    Extract images from PDF files using `marker-pdf`.
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.image_loader import MarkerImageLoader
+    ```
+
+    For details, please refer to the sources below.
+
+    **Sources:**
+
+    - [DataLab](https://www.datalab.to)
+    - [GitHub](https://github.com/VikParuchuri/marker)
+    - [PyPI](https://pypi.org/project/marker-pdf/)
+
 ::: medrag_multi_modal.document_loader.image_loader.marker_img_loader
docs/document_loader/image_loader/pdf2image_img_loader.md
CHANGED
@@ -1,3 +1,26 @@
 # Load images from PDF files (using PDF2Image)
 
+!!! danger "Warning"
+    Unlike other image extraction methods in `document_loader.image_loader`, this loader does not extract embedded images from the PDF.
+    Instead, it creates a snapshot image version of each selected page from the PDF.
+
+??? note "Note"
+    **Underlying Library:** `pdf2image`
+
+    Render PDF pages as images using `pdf2image`.
+
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.image_loader import PDF2ImageLoader
+    ```
+
+    For details and available `**kwargs`, please refer to the sources below.
+
+    **Sources:**
+
+    - [Docs](https://pdf2image.readthedocs.io/en/latest/)
+    - [GitHub](https://github.com/Belval/pdf2image)
+    - [PyPI](https://pypi.org/project/pdf2image/)
+
 ::: medrag_multi_modal.document_loader.image_loader.pdf2image_img_loader
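The warning in this file distinguishes page snapshots from embedded-image extraction. A rough sketch of the snapshot behaviour, written against `pdf2image` directly rather than `PDF2ImageLoader`: the helper names, the output naming scheme, and the default `dpi` are assumptions made for illustration, and `pdf2image` needs Poppler installed on the system.

```python
from pathlib import Path


def snapshot_names(stem: str, first_page: int, last_page: int) -> list:
    """Name one PNG per page in an inclusive, 1-indexed page range (hypothetical scheme)."""
    return [f"{stem}_page{n}.png" for n in range(first_page, last_page + 1)]


def snapshot_pages(pdf_path: str, out_dir: str, first_page: int, last_page: int, dpi: int = 200) -> list:
    """Render each selected page to a PNG file and return the saved paths."""
    # Imported lazily so this module loads even when pdf2image is not installed.
    from pdf2image import convert_from_path

    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Each returned item is a PIL image of a whole rendered page.
    pages = convert_from_path(pdf_path, dpi=dpi, first_page=first_page, last_page=last_page)
    names = snapshot_names(Path(pdf_path).stem, first_page, last_page)

    saved = []
    for name, page in zip(names, pages):
        target = target_dir / name
        page.save(target, "PNG")
        saved.append(target)
    return saved
```

Because every output is a full-page rendering, text and layout are captured along with any figures, which is the difference the warning points out.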
docs/document_loader/image_loader/pdfplumber_img_loader.md
CHANGED
@@ -1,3 +1,22 @@
 # Load images from PDF files (using PDFPlumber)
 
+??? note "Note"
+    **Underlying Library:** `pdfplumber`
+
+    Extract images from PDF files using `pdfplumber`.
+
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.image_loader import PDFPlumberImageLoader
+    ```
+
+    For details, please refer to the sources below.
+
+    **Sources:**
+
+    - [GitHub](https://github.com/jsvine/pdfplumber)
+    - [PyPI](https://pypi.org/project/pdfplumber/)
+
 ::: medrag_multi_modal.document_loader.image_loader.pdfplumber_img_loader
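As an illustration of the kind of output tuning the note mentions, here is a sketch that crops each embedded image region with `pdfplumber` at a chosen resolution. It uses `pdfplumber` directly; which keyword arguments `PDFPlumberImageLoader` actually forwards is not stated in this diff, so `resolution` and both function names are assumptions.

```python
def clamp_bbox(bbox: tuple, page_bbox: tuple) -> tuple:
    """Clip an (x0, top, x1, bottom) box to the page so cropping stays within bounds."""
    x0, top, x1, bottom = bbox
    page_x0, page_top, page_x1, page_bottom = page_bbox
    return (max(x0, page_x0), max(top, page_top), min(x1, page_x1), min(bottom, page_bottom))


def crop_embedded_images(pdf_path: str, page_number: int, resolution: int = 150) -> list:
    """Return a PIL image for each image region found on one zero-indexed page."""
    # Imported lazily so this module loads even when pdfplumber is not installed.
    import pdfplumber

    crops = []
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        for meta in page.images:
            # Image metadata carries the region's coordinates on the page.
            region = (meta["x0"], meta["top"], meta["x1"], meta["bottom"])
            bbox = clamp_bbox(region, page.bbox)
            # Render only the cropped region; a higher resolution gives a larger image.
            crops.append(page.crop(bbox).to_image(resolution=resolution).original)
    return crops
```

Clamping is a defensive choice: an image that overhangs the page edge would otherwise produce a bounding box that the crop step may reject.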
docs/document_loader/image_loader/pymupdf_img_loader.md
CHANGED
@@ -1,3 +1,23 @@
 # Load images from PDF files (using PyMuPDF)
 
+??? note "Note"
+    **Underlying Library:** `pymupdf`
+
+    PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+
+    You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+    Use it in our library with:
+    ```python
+    from medrag_multi_modal.document_loader.image_loader import PyMuPDFImageLoader
+    ```
+
+    For details, please refer to the sources below.
+
+    **Sources:**
+
+    - [Docs](https://pymupdf.readthedocs.io/en/latest/)
+    - [GitHub](https://github.com/pymupdf/PyMuPDF)
+    - [PyPI](https://pypi.org/project/PyMuPDF/)
+
 ::: medrag_multi_modal.document_loader.image_loader.pymupdf_img_loader
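For comparison with the loader documented above, a sketch of saving a page's embedded images as PNG files with PyMuPDF alone. It does not reproduce `PyMuPDFImageLoader`; the function names and output naming are hypothetical, and `import pymupdf` assumes a recent PyMuPDF release (older releases expose the same API as `fitz`).

```python
from pathlib import Path


def needs_rgb_conversion(n_components: int, alpha: int) -> bool:
    """True when a pixmap has more than three colour channels (for example CMYK)."""
    return n_components - alpha > 3


def save_page_images(pdf_path: str, page_number: int, out_dir: str) -> list:
    """Write every image embedded on one zero-indexed page to a PNG and return the paths."""
    # Imported lazily so this module loads even when PyMuPDF is not installed.
    import pymupdf

    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    with pymupdf.open(pdf_path) as doc:
        for index, info in enumerate(doc[page_number].get_images(full=True)):
            pix = pymupdf.Pixmap(doc, info[0])  # info[0] is the image's xref
            if needs_rgb_conversion(pix.n, pix.alpha):
                # PNG cannot store CMYK, so convert to RGB before saving.
                pix = pymupdf.Pixmap(pymupdf.csRGB, pix)
            target = target_dir / f"page{page_number}_image{index}.png"
            pix.save(str(target))
            saved.append(target)
    return saved
```

The colour-space check matters mainly for print-oriented PDFs, where CMYK images are common.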