Python Tesseract.clear Examples

Programming Language: Python

Namespace/Package Name: tesserwrap

Class/Type: Tesseract

Method/Function: clear

Examples at hotexamples.com: 4

Python Tesseract.clear - 4 examples found. These are the top-rated real-world Python examples of tesserwrap.Tesseract.clear extracted from open source projects.

Frequently Used Methods

Tesseract(9)

ocr_image(7)

clear(4)

get_text(3)

set_page_seg_mode(3)

set_variable(3)

get_mean_confidence(2)

get_utf8_text(2)

set_image(2)

get_words(1)

Example #1

File: tesseract.py Project: CodeForAfrica/aleph

def extract_image_data(data, languages=None):
    """Extract text from a binary string of data."""
    tessdata_prefix = get_config('TESSDATA_PREFIX')
    if tessdata_prefix is None:
        raise IngestorException("TESSDATA_PREFIX is not set, OCR won't work.")
    languages = get_languages_iso3(languages)
    text = Cache.get_ocr(data, languages)
    if text is not None:
        return text
    try:
        img = Image.open(StringIO(data))
    except DecompressionBombWarning as dce:
        log.debug("Image too large: %r", dce)
        return None
    except IOError as ioe:
        log.info("Unknown image format: %r", ioe)
        return None
    # TODO: play with contrast and sharpening the images.
    extractor = Tesseract(tessdata_prefix, lang=languages)
    extractor.set_page_seg_mode(PageSegMode.PSM_AUTO_OSD)
    text = extractor.ocr_image(img)
    extractor.clear()
    log.debug('OCR done: %s, %s characters extracted',
              languages, len(text))
    Cache.set_ocr(data, languages, text)
    return text
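
In Example #1, extractor.clear() is skipped if ocr_image() raises. One way to guarantee the cleanup is a small context manager. The sketch below is an illustration only: `cleared` is a hypothetical helper that is not part of tesserwrap, and `FakeEngine` is a stand-in for tesserwrap.Tesseract so the snippet runs without an OCR installation.

```python
from contextlib import contextmanager


@contextmanager
def cleared(engine):
    """Yield `engine`, then call engine.clear() even if the body raises.

    Hypothetical helper for illustration; not part of tesserwrap.
    """
    try:
        yield engine
    finally:
        engine.clear()


class FakeEngine(object):
    """Stand-in for tesserwrap.Tesseract (assumption, for demonstration)."""

    def __init__(self):
        self.clear_calls = 0

    def ocr_image(self, img):
        # Simulate an OCR call that fails part-way through.
        raise IOError("simulated OCR failure")

    def clear(self):
        self.clear_calls += 1


engine = FakeEngine()
try:
    with cleared(engine) as extractor:
        extractor.ocr_image(None)
except IOError:
    pass  # the error still propagates; clear() has already run
```

With a real engine the same pattern would presumably read `with cleared(Tesseract(tessdata_prefix, lang=languages)) as extractor:`, using the constructor call shown in Example #1.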

Example #2

File: books.py Project: haf/making-the-computer-see-ndc-2014

def ocr_text(img):
    '''Perform OCR on the image.'''
    tr = Tesseract(lang='eng')
    tr.clear()
    pil_image = pil.Image.fromarray(img)
    tr.set_image(pil_image)
    utf8_text = tr.get_text()
    return utf8_text

Example #3

File: ocr.py Project: amnet04/ALECMAPREADER1

def ocr(img, idioma):
    # Convert the input array to a PIL image for tesserwrap.
    ocr_img = Image.fromarray(img)
    # `idioma` is the Tesseract language code.
    tess = Tesseract(lang=idioma)
    tess.set_image(ocr_img)
    pattern = re.compile('[a-zA-Z0-9]')
    text = tess.get_utf8_text().splitlines()
    # Keep non-empty lines containing at least one ASCII letter or digit.
    text = [x for x in text if x != '' and pattern.search(x)]
    tess.clear()
    return text
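
The line filter in Example #3 can be tried on its own, without OCR. The sketch below reuses the same regular expression on a made-up sample string; `keep_alnum_lines` and the sample text are hypothetical names and data introduced here for illustration. Note that the pattern matches ASCII characters only, so a line consisting solely of accented or other non-ASCII letters is dropped.

```python
import re

# Same character class as in Example #3: ASCII letters and digits only.
ALNUM = re.compile('[a-zA-Z0-9]')


def keep_alnum_lines(text):
    """Return the non-empty lines that contain an ASCII letter or digit."""
    return [line for line in text.splitlines()
            if line and ALNUM.search(line)]


# Made-up OCR-like output: a normal line, an empty line, a punctuation-only
# line, a line with a single non-ASCII letter, and another normal line.
sample = u"Bogot\u00e1 1950\n\n---\n\u00f1\nmapa 12"
result = keep_alnum_lines(sample)
# The empty line, the "---" line and the lone non-ASCII letter are removed.
```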

Example #4

File: scratchpad.py Project: haf/making-the-computer-see-ndc-2014

def ocr_text(img):
    tr = Tesseract(lang='eng')
    tr.clear()
    pil_image = pil.Image.fromarray(img)
    # Turn off OCR word dictionaries
    tr.set_variable('load_system_dawg', "F")
    tr.set_variable('load_freq_dawg', "F")
    tr.set_variable('tessedit_pageseg_mode', "7")  # treat image as single line ('-psm' is a CLI flag, not a variable)
    tr.set_variable('tessedit_char_whitelist', "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
    tr.set_image(pil_image)
    utf8_text = tr.get_text()
    return unicode(utf8_text)