Nabeel Sulieman

OCR with Tesseract


A couple of weeks ago I needed to extract text from images. I did a little research and found Tesseract to be a very easy-to-use tool for this purpose.

Installation is simple: apt-get install tesseract-ocr on Debian or Ubuntu, or brew install tesseract on a Mac with Homebrew. As an example, here is an image:

[Image: the text "Hello World!"]

The result:

$ tesseract test-text.jpg - -l eng
Helle world /

My exclamation point was wrongly interpreted as a slash, but otherwise it worked quite well!
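
If you want to run the same command from a script, here is a minimal Python sketch that wraps it with the standard library. The function names build_tesseract_command and ocr_image are my own, made up for illustration, and the sketch assumes the tesseract binary is already installed and on your PATH.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the command line used above: tesseract <image> - -l <lang>."""
    # "-" as the output base tells tesseract to write the text to stdout.
    return ["tesseract", image_path, "-", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on an image and return the recognized text."""
    # Assumption: the tesseract CLI is installed; fail early if it is not.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Calling ocr_image("test-text.jpg") should give back the same text the command printed above, misread characters included.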