Overview:
GUI frontend that converts Scan Tailor TIFF output to an OCR'ed, searchable DjVu file.
Supported OS:
Tested only on Windows XP.
by Nod5 - Free Software GPL3 - AutoHotkey
Known issues:
This is old software; it has not been tested on Windows 10 or with the latest version of Tesseract.
The OCR step can in some cases miss a character, which shifts all subsequent OCR words one character off. This bug needs to be fixed before the tool is fit for use again.
How to use:
Drag and drop a file onto a command.
The first command takes a .tiff as input, operates on all .tiff files in the dropped file's folder, and outputs an OCR'ed, searchable .djvu file.
- for use on .tiff from Scan Tailor
- operates on *all* .tiff in same folder as dropped file
- uses -lossy setting to minimize djvu file size
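The behaviour listed above could be sketched roughly as follows. This is not the tool's AutoHotkey source, only an illustrative Python sketch that builds the command lines without running them. It assumes the DjVuLibre programs `cjb2` (with its `-lossy` option) and `djvm` are the ones involved, that the Scan Tailor output is bitonal, and that the result is named `out.djvu`; the function name `plan_conversion` is hypothetical. The OCR step is left out.

```python
from pathlib import PureWindowsPath


def plan_conversion(dropped_file, tiff_names, lossy=True):
    """Build DjVuLibre commands for every .tif/.tiff beside dropped_file.

    tiff_names is the folder listing, passed in so the sketch needs no disk access.
    The commands are returned, not executed.
    """
    folder = PureWindowsPath(dropped_file).parent
    # Process *all* TIFF files in the folder, in name order.
    pages = sorted(n for n in tiff_names if n.lower().endswith((".tif", ".tiff")))

    commands = []
    page_files = []
    for name in pages:
        src = folder / name
        dst = src.with_suffix(".djvu")
        cmd = ["cjb2"]
        if lossy:
            cmd.append("-lossy")  # smaller output, small risk of character errors
        cmd += [str(src), str(dst)]
        commands.append(cmd)
        page_files.append(str(dst))

    if page_files:
        # Assumption: single-page files are bundled with "djvm -c".
        commands.append(["djvm", "-c", str(folder / "out.djvu")] + page_files)
    return commands
```

Each returned list could be passed to `subprocess.run` on a machine where DjVuLibre is installed.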
Dependencies (try the latest Windows binary version):
1. DjVuLibre, djvu.sourceforge.net
2. Tesseract 3, https://github.com/tesseract-ocr/tesseract
check the ReadMe/FAQ on the site; two downloads are needed:
tesseract-3.00.win32.zip
eng.traineddata.gz (unpack and put in subfolder tesseract-ocr\tessdata)
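A quick way to check that the dependencies are in place might look like the sketch below. It is an assumption that the tools are found through PATH (the tool itself may look in its own subfolders instead), and the executable names `tesseract`, `cjb2` and `djvm` are guesses at what is needed; `missing_tools` and `has_eng_data` are hypothetical helper names.

```python
import shutil
from pathlib import Path


def missing_tools(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def has_eng_data(tesseract_dir):
    """True if the unpacked English language file sits in the tessdata subfolder."""
    return (Path(tesseract_dir) / "tessdata" / "eng.traineddata").is_file()


# Example (assumed executable names and an assumed install folder):
#   print(missing_tools(["tesseract", "cjb2", "djvm"]))
#   print(has_eng_data(r"C:\tools\tesseract-ocr"))
```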
Command line use:
TiffDjvuOcr.exe "C:.tif" = all .tif in folder C: to .djvu with OCR
TiffDjvuOcr.exe noocr "C:.tif" = all .tif in folder C: to .djvu
TiffDjvuOcr.exe "C:.djvu" = do OCR on a.djvu
TiffDjvuOcr.exe gettif "C:.djvu" = extract multipage .tif from a.djvu
TiffDjvuOcr.exe img "C:.jpg" = single image file to .djvu
TiffDjvuOcr.exe join "C:.djvu" = join all .djvu in C: into one
TiffDjvuOcr.exe noloss "C:.tiff" = all .tif in folder C: to .djvu with no-loss setting (bigger file; use if the smaller djvu has character errors)
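For scripted use, the command forms above can be wrapped in a small helper. The sketch below only assembles the argument list; the mode names come from the list above, while `build_argv` and the example path are hypothetical, and `TiffDjvuOcr.exe` is assumed to be on PATH or in the working directory.

```python
# Mode keywords taken from the command list; no keyword means "convert with OCR".
MODES = {"noocr", "gettif", "img", "join", "noloss"}


def build_argv(path, mode=None, exe="TiffDjvuOcr.exe"):
    """Return the argument list for one TiffDjvuOcr call."""
    if mode is not None and mode not in MODES:
        raise ValueError("unknown mode: " + repr(mode))
    argv = [exe]
    if mode is not None:
        argv.append(mode)
    argv.append(path)
    return argv


# Example (hypothetical path), to be run on Windows next to the .exe:
#   import subprocess
#   subprocess.run(build_argv(r"C:\scans\0001.tif", "noloss"), check=True)
```

Passing the arguments as a list avoids manual quoting of paths that contain spaces.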
md5 hashes:
50bc4f32bd7e1b91311bf725a65dc416 TiffDjvuOcr.ahk
36d2633fdecbe4502fdbb49d0babed06 TiffDjvuOcr.exe
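The hashes above can be checked after download with a few lines of Python. This is a minimal sketch using the standard `hashlib` module; the helper names are hypothetical, and the files are assumed to be saved under their original names.

```python
import hashlib

# Published hashes, copied from the list above.
EXPECTED = {
    "TiffDjvuOcr.ahk": "50bc4f32bd7e1b91311bf725a65dc416",
    "TiffDjvuOcr.exe": "36d2633fdecbe4502fdbb49d0babed06",
}


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, name):
    """Compare a downloaded file against the published hash for `name`."""
    return md5_of_file(path) == EXPECTED[name]
```

A match indicates the download is not corrupted; MD5 is not a safeguard against deliberate tampering.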
Changelog:
v110305 New commands: to .djvu no-loss , join .djvu , img to .djvu; Autohotkey_L compatible.
v101013 ImageMagick no longer needed; now using Tesseract 3; fixed error at ocr on pages with no text
v100605 Perl no longer needed for processing tesseract output (thanks ewemoa!)
v100404 first release
- Create Date February 21, 2018
- Last update 2018-02-21 17:00:21
- Last Updated February 23, 2018