I left a few comments on GitHub as gnusupport. My question is: does it extract the text the way a human reads it on the page, or only the digital text layer embedded in the file?
In particular, here is a book that I would like to convert to text for learning purposes:
https://www.startyourowngoldmine.com/files/books/sampling/Sampling-Series-No-1-3.pdf
I know it is already a PDF, but I do not know how to extract its text in order to make a new, clean PDF.
Running pdftotext on it gives nonsense like this, which is not usable:
4
Arizona State Bureau of Mines
31dMS HUM MDVS Nl lDd 5/ 3b>Β± SIHL
Kb
s
1
I<0
1
I know that pdfitdown is for creating PDFs, not for extracting text from them, but do you perhaps know a way to extract the text as it is meant to be read?
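For scanned books like this one, the usual approach is to ignore the embedded text layer, render each page to an image, and run OCR on the image. Below is a minimal sketch of that idea, assuming the Python packages pdf2image and pytesseract are installed, along with the poppler utilities and the tesseract binary they call. The function name `ocr_pdf` and its defaults are only illustrative, not part of any existing tool.

```python
"""Sketch: OCR a scanned PDF page by page (assumes pdf2image + pytesseract)."""


def ocr_pdf(pdf_path, first_page=None, last_page=None, dpi=300, lang="eng"):
    """Render PDF pages to images and run Tesseract OCR on each one.

    Hypothetical helper. Needs poppler (used by pdf2image) and the
    tesseract binary (used by pytesseract) installed on the system.
    """
    # Imported inside the function so this module still loads
    # on machines where the OCR stack is not installed.
    from pdf2image import convert_from_path
    import pytesseract

    # Rasterize the selected pages; a higher dpi usually helps OCR accuracy.
    pages = convert_from_path(
        pdf_path, dpi=dpi, first_page=first_page, last_page=last_page
    )
    # OCR each page image independently.
    texts = [pytesseract.image_to_string(page, lang=lang) for page in pages]
    # Separate pages with a form feed, the same separator pdftotext uses.
    return "\f".join(texts)


if __name__ == "__main__":
    import sys

    print(ocr_pdf(sys.argv[1]))
```

If the goal is a new PDF with a clean text layer rather than plain text, the ocrmypdf command-line tool does a similar job; its `--force-ocr` option re-OCRs pages that already carry a bad text layer, and `--rotate-pages` tries to fix page orientation.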