OCR to UTF8



On Mon, 21 Feb 2005 18:43:07 -0700, ray staggered into the Black Sun and said: snip

kooka doesn't do OCR; it calls a number of standalone programs to do that. Yeah, yeah, I know, but precision is important.

Hrm. Well, check this out:

Omnipage 10.0 commercial engine--

Business inventory is likely to expand moderately to take care of a rising volume of business. In no major lines are inventories out of line with sales. Residential construction has been among the weaker elements of...

Typereader 6.0 commercial engine--

Business inventory is likely to expand moderately to take care of a rising volume of business. In no major lines are inventories out of line with sales. Residential construction has been among the weaker elements of...

GNU ocrad 0.10 Free engine--

Business inveotory is likcly to expand modeately to take care o a rising voluloe o business. In no major lines ae inventories out oF lioe with sales. Residential con5truction has been amon6 the weaker elemeots oF...

...this is on a very clean image with no skew and well-formed characters. The commercial engines used were several years old; ocrad was the latest stable version available (0.10) according to portage. I'll try the other engines kooka uses--maybe tomorrow. Enabling spellcheck with aspell didn't help much--the thing wanted me to go through each mangled word and manually choose an appropriate new word for each one!
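For a rough number on the gap between those samples, here is a minimal sketch that counts mismatched words between two OCR outputs of the same excerpt. It assumes the commercial output can stand in as the reference text, and that both outputs split into the same number of words (true for the excerpts above, but a real comparison would need proper alignment). The function name `word_errors` is made up for illustration.

```python
def word_errors(reference: str, candidate: str) -> tuple[int, int]:
    """Return (mismatched words, total words) for two same-length word sequences."""
    ref_words = reference.split()
    cand_words = candidate.split()
    if len(ref_words) != len(cand_words):
        # Dropped or merged words would need a real alignment step (e.g. edit distance).
        raise ValueError("word counts differ; this naive comparison does not apply")
    wrong = sum(r != c for r, c in zip(ref_words, cand_words))
    return wrong, len(ref_words)


# Excerpts copied from the post; the commercial output is ASSUMED to be correct.
REFERENCE = (
    "Business inventory is likely to expand moderately to take care of "
    "a rising volume of business. In no major lines are inventories out "
    "of line with sales. Residential construction has been among the "
    "weaker elements of..."
)
OCRAD = (
    "Business inveotory is likcly to expand modeately to take care o "
    "a rising voluloe o business. In no major lines ae inventories out "
    "oF lioe with sales. Residential con5truction has been amon6 the "
    "weaker elemeots oF..."
)

if __name__ == "__main__":
    wrong, total = word_errors(REFERENCE, OCRAD)
    print(f"{wrong} of {total} words differ")
```

Word-level counting is crude (one wrong character spoils the whole word), but it roughly tracks how many words a human would have to fix by hand.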

OCR is a difficult problem. I have some professional interest in it, and though I've never worked on an OCR engine, I have some familiarity with the problems and pitfalls involved.

Like another poster said, they can match addresses to a canonical database of valid addresses. That narrows the possible matches down considerably, making the problem much easier. No such constraint is present in the general OCR problem.
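To illustrate why a canonical list helps so much, here is a small sketch that snaps a noisy OCR string to the closest entry in a known list, using Python's standard `difflib`. The address list, the function name `snap_to_canonical`, and the 0.6 cutoff are all made up for the example; this is not a claim about how any real address-reading system works.

```python
import difflib
from typing import Optional

# Hypothetical canonical database; a real one would be far larger.
CANONICAL_ADDRESSES = [
    "12 Main Street",
    "12 Maple Street",
    "48 Oak Avenue",
]


def snap_to_canonical(ocr_text: str,
                      canonical: list[str],
                      cutoff: float = 0.6) -> Optional[str]:
    """Return the canonical entry most similar to ocr_text, or None if nothing is close."""
    # get_close_matches ranks candidates by SequenceMatcher ratio, best first.
    matches = difflib.get_close_matches(ocr_text, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Same kinds of mistakes ocrad made above: n -> o, S -> 5.
    print(snap_to_canonical("12 Maio 5treet", CANONICAL_ADDRESSES))
```

With only a handful of valid answers, a couple of wrong characters still leave one clear winner. General text has no such list to snap to, which is why a spellchecker ends up asking about every mangled word.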

--
Matt G
There is no Darkness in Eternity-But only Light too dim for us to see
Brainbench MVP for Linux Admin
mail: TRAP + SPAN don't belong


