Tesseract-ios is an Objective-C wrapper for Tesseract OCR.
This project would not exist without Ângelo Suzuki's blog post. A lot of the code came from his article.
- iOS SDK 6.0, iOS 5.0+ (there is no support for armv6)
- Tesseract and Leptonica libraries from the tesseract-ios-lib repo.
- Download tesseract-ios-lib and put it somewhere in your project.
- Put the Classes content (from this repo) somewhere in your project.
- Go to your project settings, and ensure that C++ Standard Library => Compiler Default.
Here is the default workflow to extract text from an image:
- Instantiate Tesseract with data path and language
- Set the delegate (only needed for recognizing with progress update)
- Set variables (character set, …)
- Set the image to analyze
- Start recognition
- Get recognized text
- Clear
#import "Tesseract.h"
Tesseract* tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
[tesseract setVariableValue:@"0123456789" forKey:@"tessedit_char_whitelist"];
[tesseract setImage:[UIImage imageNamed:@"image_sample.jpg"]];
[tesseract recognize];
NSLog(@"%@", [tesseract recognizedText]);
[tesseract clear];
#import "Tesseract.h"

- (void)viewDidLoad {
    [super viewDidLoad];

    Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
    // Register for progress callbacks before starting recognition.
    [tesseract setTessDelegate:self];
    [tesseract setVariableValue:@"0123456789" forKey:@"tessedit_char_whitelist"];
    [tesseract setImage:[UIImage imageNamed:@"image_sample.jpg"]];
    [tesseract recognizeWithProgressUpdate];
    NSLog(@"%@", [tesseract recognizedText]);
    [tesseract clear];
}

/*
 * This delegate method is called every time the progress changes.
 */
- (void)progressUpdate:(NSUInteger)progress {
    // NSUInteger differs in width between architectures, so cast before formatting.
    NSLog(@"progress: %lu%%", (unsigned long)progress);
}
- (id)initWithDataPath:(NSString *)dataPath language:(NSString *)language
Initialize a new Tesseract instance.
dataPath: a path, relative to the application bundle, to the directory containing the .traineddata files. You can find these files in the Tesseract downloads section.
language: language used for recognition. Example: eng. Tesseract will search for an eng.traineddata file in the dataPath directory.
Returns nil if instantiation failed.
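Because the initializer can return nil, it is worth checking the result before configuring the instance. The following is only a sketch: the helper name MakeEnglishTesseract is hypothetical, and the log message names a missing data file as one possible cause, which is an assumption rather than documented behavior.

```objc
#import <Foundation/Foundation.h>
#import "Tesseract.h"

// Hypothetical helper (not part of this project): returns an instance
// set up for English, or nil if the wrapper could not be initialized.
static Tesseract *MakeEnglishTesseract(void) {
    Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
    if (tesseract == nil) {
        // Assumption: a missing tessdata/eng.traineddata is a likely cause.
        NSLog(@"Tesseract could not be initialized; check that tessdata/eng.traineddata is in the bundle.");
        return nil;
    }
    return tesseract;
}
```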
- (void)setVariableValue:(NSString *)value forKey:(NSString *)key
Set the Tesseract variable key to value. See http://www.sk-spell.sk.cx/tesseract-ocr-en-variables for a complete (but not up-to-date) list.
For instance, use tessedit_char_whitelist to restrict characters to a specific set.
- (void)setImage:(UIImage *)image
Set the image to recognize.
- (BOOL)setLanguage:(NSString *)language
Override the language defined with -initWithDataPath:language:.
- (BOOL)recognize
Start text recognition. You might want to launch this process in the background with NSObject's -performSelectorInBackground:withObject:.
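The background approach mentioned above could look roughly like the sketch below. OCRViewController, -recognizeImage: and -showText: are hypothetical names used only for illustration, ARC is assumed, and the Tesseract instance is created on the background thread because this document does not say whether an instance may be shared across threads.

```objc
#import <UIKit/UIKit.h>
#import "Tesseract.h"

// Hypothetical view controller used only for illustration.
@interface OCRViewController : UIViewController
@end

@implementation OCRViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    UIImage *image = [UIImage imageNamed:@"image_sample.jpg"];
    // Runs -recognizeImage: on a new background thread.
    [self performSelectorInBackground:@selector(recognizeImage:) withObject:image];
}

// Executed on the background thread.
- (void)recognizeImage:(UIImage *)image {
    @autoreleasepool {
        Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
        [tesseract setImage:image];
        [tesseract recognize];
        NSString *text = [tesseract recognizedText];
        [tesseract clear];
        // Hand the result back to the main thread before touching any UI.
        [self performSelectorOnMainThread:@selector(showText:) withObject:text waitUntilDone:NO];
    }
}

// Executed on the main thread.
- (void)showText:(NSString *)text {
    NSLog(@"%@", text);
}

@end
```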
- (BOOL)recognizeWithProgressUpdate
Start text recognition with progress updates. Implement the - (void)progressUpdate:(NSUInteger)progress delegate method in order to show a progress bar or to log the progress.
- (NSString *)recognizedText
Get the text extracted from the image.
- (void)clear
Clears the Tesseract object after text has been recognized from an image, which prevents memory leaks.
- (void)progressUpdate:(NSUInteger)progress
Delegate method called on each progress update during recognition. progress is in the range [0, 100].
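As an illustration of driving a progress bar from this callback, here is a minimal sketch. ProgressViewController and its progressView property are hypothetical, and the delegate protocol conformance is left out because the protocol name is not given in this document. The hop to the main queue is a precaution, since the thread on which the callback arrives is not specified here.

```objc
#import <UIKit/UIKit.h>

// Hypothetical view controller used only for illustration.
@interface ProgressViewController : UIViewController
// Assumed to be created and added to the view hierarchy elsewhere.
@property (nonatomic, strong) UIProgressView *progressView;
@end

@implementation ProgressViewController

- (void)progressUpdate:(NSUInteger)progress {
    // Map the [0, 100] range onto the [0.0, 1.0] range UIProgressView expects.
    float fraction = MIN(progress, (NSUInteger)100) / 100.0f;
    // UIKit views should only be updated from the main thread.
    dispatch_async(dispatch_get_main_queue(), ^{
        [self.progressView setProgress:fraction animated:YES];
    });
}

@end
```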