Automatic text recognition (OCR) for images inside PDF documents, PowerPoint presentations and ZIP archives

Text stored in image formats such as JPG, PNG, TIFF or GIF (e.g. scans, photos or screenshots) cannot be found by standard full-text search. The search engine Open Semantic Search therefore enriches the metadata of images, such as filename, format and size, with the results of automatic text recognition (OCR).

Because much information is out of reach of full-text search when it is stored as graphics embedded in PDF documents or PowerPoint presentations (e.g. screenshots instead of text), the OCR enhancer of Open Semantic Search also extracts images from PDF files for automatic text recognition.

Since release 14.08.10, Open Semantic Search extracts and enriches images such as scans and screenshots not only from PDF documents but also from PowerPoint presentations and ZIP archives.
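
To illustrate the idea of pulling images out of a container before OCR, here is a minimal sketch in Python. It is not the actual Open Semantic Search code: the function name extract_images and the extension list are assumptions made for this example. It relies on the fact that ZIP archives, and also PowerPoint files in the .pptx format (which are ZIP containers), can be read with the standard zipfile module. The older binary .ppt format is not covered.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption for this sketch: the image formats named in the text above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif"}


def extract_images(container):
    """Return {member name: image bytes} for every image in a ZIP container.

    `container` may be a file path or a binary file-like object.
    Hypothetical helper, not part of Open Semantic Search.
    """
    images = {}
    with zipfile.ZipFile(container) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            # ZIP member names always use forward slashes.
            suffix = PurePosixPath(info.filename).suffix.lower()
            if suffix in IMAGE_EXTENSIONS:
                images[info.filename] = archive.read(info)
    return images
```

The returned image bytes could then be handed to an OCR engine, and the recognized text added to the index entry of the containing document. That step is left out here because it depends on the OCR tool in use.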