Commit e6f2eb8 · committed by mratanusarkar · 1 parent: 04ea7bb

update: docs with lib sources and notes

docs/document_loader/image_loader/fitzpil_img_loader.md CHANGED
@@ -1,3 +1,22 @@
  # Load images from PDF files (using Fitz & PIL)
  
+ ??? note "Note"
+     **Underlying Library:** `fitz` & `pillow`
+
+     Extract images from PDF files using `fitz` and `pillow`.
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.image_loader import FitzPILImageLoader
+     ```
+
+     For more details, please refer to the sources below.
+
+     **Sources:**
+
+     - [Docs](https://pymupdf.readthedocs.io/en/latest/intro.html)
+     - [GitHub](https://github.com/kastman/fitz)
+     - [PyPI](https://pypi.org/project/fitz/)
+     - [PyPI](https://pypi.org/project/pillow/)
+
  ::: medrag_multi_modal.document_loader.image_loader.fitzpil_img_loader
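
To illustrate what "extract images from PDF files using `fitz` and `pillow`" can look like at the library level, here is a minimal sketch. It is not taken from `FitzPILImageLoader`, whose implementation this diff does not show. `fitz` is the import name of PyMuPDF, and `Page.get_images` and `Document.extract_image` are PyMuPDF calls. The function names `image_filename` and `extract_embedded_images`, and the file naming scheme, are hypothetical.

```python
import io


def image_filename(page_idx: int, image_idx: int, ext: str) -> str:
    """Build a file name for the n-th image on a page (hypothetical naming scheme)."""
    return f"page{page_idx}_img{image_idx}.{ext.lstrip('.').lower()}"


def extract_embedded_images(pdf_path: str, page_idx: int) -> dict:
    """Return {file name: PIL image} for the images embedded in one page (0-based index)."""
    # Third-party imports stay local so image_filename works without them installed.
    import fitz  # PyMuPDF
    from PIL import Image

    images = {}
    with fitz.open(pdf_path) as doc:
        page = doc[page_idx]
        for image_idx, info in enumerate(page.get_images(full=True)):
            xref = info[0]  # first tuple item is the image's cross-reference number
            extracted = doc.extract_image(xref)  # dict holding raw bytes and an extension
            pil_image = Image.open(io.BytesIO(extracted["image"]))
            pil_image.load()  # decode now rather than lazily
            images[image_filename(page_idx, image_idx, extracted["ext"])] = pil_image
    return images
```

Calling `extract_embedded_images("paper.pdf", 0)` on a PDF of your own would return one PIL image per image object referenced by the first page. A page without embedded images yields an empty dictionary.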
docs/document_loader/image_loader/marker_img_loader.md CHANGED
@@ -1,3 +1,21 @@
  # Load images from PDF files (using Marker)
  
+ ??? note "Note"
+     **Underlying Library:** `marker-pdf`
+
+     Extract images from PDF files using `marker-pdf`.
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.image_loader import MarkerImageLoader
+     ```
+
+     For details, please refer to the sources below.
+
+     **Sources:**
+
+     - [DataLab](https://www.datalab.to)
+     - [GitHub](https://github.com/VikParuchuri/marker)
+     - [PyPI](https://pypi.org/project/marker-pdf/)
+
  ::: medrag_multi_modal.document_loader.image_loader.marker_img_loader
docs/document_loader/image_loader/pdf2image_img_loader.md CHANGED
@@ -1,3 +1,26 @@
  # Load images from PDF files (using PDF2Image)
  
+ !!! danger "Warning"
+     Unlike other image extraction methods in `document_loader.image_loader`, this loader does not extract embedded images from the PDF.
+     Instead, it creates a snapshot image version of each selected page from the PDF.
+
+ ??? note "Note"
+     **Underlying Library:** `pdf2image`
+
+     Extract images from PDF files using `pdf2image`.
+
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.image_loader import PDF2ImageLoader
+     ```
+
+     For details and available `**kwargs`, please refer to the sources below.
+
+     **Sources:**
+
+     - [Docs](https://pdf2image.readthedocs.io/en/latest/)
+     - [GitHub](https://github.com/Belval/pdf2image)
+     - [PyPI](https://pypi.org/project/pdf2image/)
+
  ::: medrag_multi_modal.document_loader.image_loader.pdf2image_img_loader
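
The warning in this file says the loader snapshots whole pages instead of extracting embedded images. A minimal sketch of that behaviour with `pdf2image` directly might look like the following. It assumes a 0-based page index on the caller's side, while `convert_from_path` takes 1-based, inclusive `first_page` and `last_page` arguments. The helper names `page_range_kwargs` and `render_page_snapshot` are hypothetical and not part of `PDF2ImageLoader`.

```python
def page_range_kwargs(page_idx: int, dpi: int = 200) -> dict:
    """Translate a 0-based page index into pdf2image's 1-based, inclusive page range."""
    if page_idx < 0:
        raise ValueError("page_idx must be non-negative")
    return {"first_page": page_idx + 1, "last_page": page_idx + 1, "dpi": dpi}


def render_page_snapshot(pdf_path: str, page_idx: int, dpi: int = 200):
    """Render one whole page to a PIL image; embedded images are not extracted separately."""
    # Local import: pdf2image also needs Poppler installed on the system.
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, **page_range_kwargs(page_idx, dpi))
    return pages[0]
```

Because the result is a raster of the full page, text and figures end up in the same image. Raising `dpi` trades memory and time for sharper snapshots.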
docs/document_loader/image_loader/pdfplumber_img_loader.md CHANGED
@@ -1,3 +1,22 @@
  # Load images from PDF files (using PDFPlumber)
  
+ ??? note "Note"
+     **Underlying Library:** `pdfplumber`
+
+     Extract images from PDF files using `pdfplumber`.
+
+     You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.image_loader import PDFPlumberImageLoader
+     ```
+
+     For details, please refer to the sources below.
+
+     **Sources:**
+
+     - [GitHub](https://github.com/jsvine/pdfplumber)
+     - [PyPI](https://pypi.org/project/pdfplumber/)
+
  ::: medrag_multi_modal.document_loader.image_loader.pdfplumber_img_loader
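
One plausible way to get images out of a page with `pdfplumber` is to crop the page to each image's bounding box and rasterise the crop. The sketch below shows that approach under stated assumptions; it is not the code of `PDFPlumberImageLoader`. `page.images`, `page.crop`, and `to_image(resolution=...)` are pdfplumber calls. `clamp_bbox` and `crop_page_images` are hypothetical helpers, and clamping is added because cropping outside the page area can raise an error.

```python
def clamp_bbox(bbox, page_bbox):
    """Clip (x0, top, x1, bottom) to the page so the crop stays inside the page area."""
    x0, top, x1, bottom = bbox
    px0, ptop, px1, pbottom = page_bbox
    return (max(x0, px0), max(top, ptop), min(x1, px1), min(bottom, pbottom))


def crop_page_images(pdf_path: str, page_idx: int, resolution: int = 150) -> list:
    """Return one PIL image per image region found on a page (0-based index)."""
    import pdfplumber  # local import so clamp_bbox works without the dependency

    crops = []
    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_idx]
        for img in page.images:
            bbox = clamp_bbox(
                (img["x0"], img["top"], img["x1"], img["bottom"]), page.bbox
            )
            if bbox[0] >= bbox[2] or bbox[1] >= bbox[3]:
                continue  # region lies entirely outside the page; nothing to crop
            # .original is the PIL image behind pdfplumber's PageImage wrapper
            crops.append(page.crop(bbox).to_image(resolution=resolution).original)
    return crops
```

Here `resolution` plays the role of a tunable output setting: higher values give larger, sharper crops at the cost of rendering time.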
docs/document_loader/image_loader/pymupdf_img_loader.md CHANGED
@@ -1,3 +1,23 @@
  # Load images from PDF files (using PyMuPDF)
  
+ ??? note "Note"
+     **Underlying Library:** `pymupdf`
+
+     PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
+
+     You can interact with the underlying library and fine-tune the outputs via `**kwargs`.
+
+     Use it in our library with:
+     ```python
+     from medrag_multi_modal.document_loader.image_loader import PyMuPDFImageLoader
+     ```
+
+     For details, please refer to the sources below.
+
+     **Sources:**
+
+     - [Docs](https://pymupdf.readthedocs.io/en/latest/)
+     - [GitHub](https://github.com/pymupdf/PyMuPDF)
+     - [PyPI](https://pypi.org/project/PyMuPDF/)
+
  ::: medrag_multi_modal.document_loader.image_loader.pymupdf_img_loader