Hello, I’m starting a new course and the materials are all view-only PDFs. For convenience, I often use online services to convert images to text (even ChatGPT 4 does it). Does anybody know of some kind of self-hosted OCR converter to turn screenshots into text?
Thanks
tesseract-ocr? You can install it via apt or a similar package manager. paperless-ngx has built-in OCR, but I don’t think it would fit your needs.
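For screenshots, a rough sketch of how that could look from a script. It assumes the `tesseract` binary is installed and on the PATH and that the English language data is available. The helper names `build_tesseract_command` and `ocr_image` are made up for illustration, not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the command line for a one-off OCR run (hypothetical helper)."""
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file.
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Return the text tesseract recognizes in an image file."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise if tesseract exits with an error
    )
    return result.stdout
```

Calling `ocr_image("screenshot.png")` on a saved screenshot should then return the recognized text as a string (the file name is only an example).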
I will check it out.
Windows 11 has this built in if you take a screenshot
Didn’t know that. I use Flameshot for screenshots. I will take a look, thanks.
You could spin up paperless-ngx, or use PDF24 Creator. Beware: paperless deletes the file from its consume folder once it has been imported.
I used paperless-ngx before and it works pretty well.
I will check it out. I have Stirling-PDF and I see it also has OCR support.
Nextcloud AIO (all-in-one) comes with full-text search installed, which brings Tesseract to Nextcloud. So you can let tesseract-ocr run over all documents, and they will then be searchable with Elasticsearch.
I’m not sure I understand you correctly. Do you want to apply OCR to PDFs or to screenshots?
For PDFs there’s the excellent ocrmypdf, which paperless-ngx uses under the hood.
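A minimal sketch of driving it from a script, assuming the `ocrmypdf` command is installed and on the PATH. The helper names and file names are hypothetical. `--skip-text` and `--sidecar` are ocrmypdf options, but whether they suit the course PDFs is a guess.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, sidecar_txt=None, lang="eng"):
    """Build an ocrmypdf command line (hypothetical helper)."""
    # --skip-text leaves pages that already have a text layer untouched.
    cmd = ["ocrmypdf", "-l", lang, "--skip-text"]
    if sidecar_txt is not None:
        # --sidecar also writes the text recognized by OCR to a plain text file.
        cmd += ["--sidecar", str(sidecar_txt)]
    cmd += [str(input_pdf), str(output_pdf)]
    return cmd


def ocr_pdf(input_pdf, output_pdf, sidecar_txt=None, lang="eng"):
    """Run ocrmypdf, producing a searchable copy of the input PDF."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    subprocess.run(
        build_ocrmypdf_command(input_pdf, output_pdf, sidecar_txt, lang),
        check=True,  # raise if ocrmypdf exits with an error
    )
```

For example, `ocr_pdf("lecture.pdf", "lecture_ocr.pdf", sidecar_txt="lecture.txt")` would write a searchable PDF plus a text file. If the viewer blocks downloading the PDF entirely, the screenshot route is probably the only one left.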