Overview:
GUI frontend that converts Scan Tailor TIFF output into an OCR'ed, searchable DjVu file.
Supported OS:
Only tested in Windows XP.
by Nod5 - Free Software GPL3 - AutoHotkey
Known issues:
Old software: not tested on Windows 10 or with the latest version of Tesseract.
The OCR step can in some cases miss a character, which makes all subsequent OCR words one character off. That bug needs fixing before this tool is fit for use again.
How to use:
Drag and drop a file onto a command.
The first command takes a .tiff as input,
operates on all .tiff files in the dropped file's folder, and
outputs an OCR'ed, searchable .djvu file.
- for use on .tiff from Scan Tailor
- operates on *all* .tiff in same folder as dropped file
- uses -lossy setting to minimize djvu file size
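The tool's internals are not documented here, but the folder-wide conversion described above can be approximated with DjVuLibre's own command-line programs. The following is a minimal sketch, assuming `cjb2` (DjVuLibre's bitonal encoder, with its `-lossy` option) and `djvm -c` (which bundles pages into one document) are installed and on PATH. The function name and the output file name `book.djvu` are hypothetical, the OCR step is left out, and this is not necessarily how TiffDjvuOcr itself does it.

```python
from pathlib import Path


def build_djvu_commands(dropped_file, output_name="book.djvu", lossy=True):
    """Return the command lines that would convert every .tif/.tiff next to
    dropped_file into one multipage DjVu. Nothing is executed here."""
    folder = Path(dropped_file).parent
    # Collect all TIFF pages in the same folder as the dropped file, in name order.
    pages = sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() in (".tif", ".tiff")
    )
    commands = []
    page_djvus = []
    for page in pages:
        page_djvu = page.with_suffix(".djvu")
        cmd = ["cjb2"]
        if lossy:
            cmd.append("-lossy")  # smaller output, as in the default mode
        cmd += [str(page), str(page_djvu)]
        commands.append(cmd)
        page_djvus.append(str(page_djvu))
    if page_djvus:
        # Bundle the single-page files into one document.
        commands.append(["djvm", "-c", str(folder / output_name)] + page_djvus)
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Note that `cjb2` only handles black-and-white images, so this sketch assumes Scan Tailor was set to black-and-white output.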
Dependencies (try the latest Windows binary versions):
1. DjVuLibre, djvu.sourceforge.net
2. Tesseract 3, https://github.com/tesseract-ocr/tesseract
check ReadMe/FAQ on site; two downloads needed:
tesseract-3.00.win32.zip
eng.traineddata.gz (unpack and put in subfolder tesseract-ocr\tessdata )
Command line use:
 TiffDjvuOcr.exe "C:.tif"          = all .tif in folder C: to .djvu with OCR
 TiffDjvuOcr.exe noocr "C:.tif"    = all .tif in folder C: to .djvu without OCR
 TiffDjvuOcr.exe "C:.djvu"         = do OCR on a.djvu
 TiffDjvuOcr.exe gettif "C:.djvu"  = extract multipage .tif from a.djvu
 TiffDjvuOcr.exe img "C:.jpg"      = single image file to .djvu
 TiffDjvuOcr.exe join "C:.djvu"    = join all .djvu in C: into one
 TiffDjvuOcr.exe noloss "C:.tiff"  = all .tif in folder C: to .djvu with no-loss setting (bigger file; use if the smaller lossy .djvu shows character errors)
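For scripted use, the modes listed above can be wrapped in a small helper. This is only a sketch, assuming TiffDjvuOcr.exe is on PATH and accepts exactly the keywords shown in the list. The helper names are hypothetical, and whether the tool reports failure through its exit code is an assumption.

```python
import subprocess

# Mode keywords taken from the command list above; "" means the default mode.
MODES = {"", "noocr", "gettif", "img", "join", "noloss"}


def tiffdjvuocr_args(path, mode="", exe="TiffDjvuOcr.exe"):
    """Build the argument list for one TiffDjvuOcr call."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    args = [exe]
    if mode:
        args.append(mode)
    args.append(path)
    return args


def run_tiffdjvuocr(path, mode=""):
    """Run the tool; raises CalledProcessError on a non-zero exit code
    (assumes the tool signals errors that way)."""
    subprocess.run(tiffdjvuocr_args(path, mode), check=True)
```

Passing the arguments as a list lets `subprocess` handle quoting, so paths with spaces need no manual quotes.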
md5 hashes:
50bc4f32bd7e1b91311bf725a65dc416 TiffDjvuOcr.ahk
36d2633fdecbe4502fdbb49d0babed06 TiffDjvuOcr.exe
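The hashes above can be checked after download. A minimal sketch using Python's standard `hashlib` module; the function names are illustrative, and the expected values are copied from the list above.

```python
import hashlib

# Expected MD5 hashes, copied from the list above.
EXPECTED = {
    "TiffDjvuOcr.ahk": "50bc4f32bd7e1b91311bf725a65dc416",
    "TiffDjvuOcr.exe": "36d2633fdecbe4502fdbb49d0babed06",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """True if the file at path matches the published hash for name."""
    return md5_of_file(path) == EXPECTED[name]
```

For example, `verify("TiffDjvuOcr.exe", "TiffDjvuOcr.exe")` should return True for an unmodified download. MD5 only guards against accidental corruption, not deliberate tampering.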
Changelog:
v110305 New commands: to .djvu no-loss, join .djvu, img to .djvu; AutoHotkey_L compatible.
v101013 ImageMagick no longer needed; now using Tesseract 3; fixed error at OCR on pages with no text
v100605 Perl no longer needed for processing tesseract output (thanks ewemoa!)
v100404 first release
- Create Date February 21, 2018
- Last update 2018-02-21 17:00:21
- Last Updated February 23, 2018

 
					
