I am experimenting with OCR (Optical Character Recognition),
which means reading text data from an image. I searched the web a lot
and found two solutions:
1.) Using Tessnet2 (a .NET wrapper for Google's Tesseract OCR engine)
2.) Using MODI (Microsoft Office Document Imaging Library)
I have not been able to try MODI so far, as this library comes with MS Office 2007 or XP, which I cannot get hold of at the moment.
I tried Tessnet2 and it gave me about 98% correct results, but it only read letters; it could not read digits.
Below are the steps I followed to use it.
1.) Download the Tessnet2 binary from the link below
http://www.pixel-technology.com/freeware/tessnet2/
2.) Add a reference to Tessnet2_32.dll (for a 32-bit OS) or Tessnet2_64.dll (for a 64-bit OS)
in your Visual Studio project
3.) Download the language data definition file (tesseract-2.00.eng.tar.gz; I used English) from the link below
http://code.google.com/p/tesseract-ocr/downloads/list
4.) Unzip the downloaded archive and keep all of its files in a directory named 'tessdata'.
Place this directory in your application's bin/Debug folder.
For example, in my case I put it here: "D:\TanviDoc\OCRApp\OCRApp\bin\Debug\tessdata"
5.) Below is the sample code to do OCR
using System;
using System.Collections.Generic;
using System.Drawing;

namespace TesseractConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the image to recognize (change this path to your own image).
            Bitmap bmp = new Bitmap(@"C:\Documents and Settings\lak\Desktop\quotes_7a.jpg");
            tessnet2.Tesseract ocr = new tessnet2.Tesseract();

            // Optional: restrict recognition to digits only.
            // ocr.SetVariable("tessedit_char_whitelist", "0123456789");

            // A null path makes the engine look for 'tessdata' next to the executable (see step 4).
            ocr.Init(null, "eng", false);

            // Optional: recognize only a region of the image.
            // List<tessnet2.Word> r1 = ocr.DoOCR(bmp, new Rectangle(792, 247, 130, 54));

            // Rectangle.Empty means the whole image is processed.
            List<tessnet2.Word> r1 = ocr.DoOCR(bmp, Rectangle.Empty);

            // Print the recognized text line by line.
            int lc = tessnet2.Tesseract.LineCount(r1);
            for (int i = 0; i < lc; i++)
            {
                Console.WriteLine("Line {0} = {1}", i, tessnet2.Tesseract.GetLineText(r1, i));
            }

            // Print each recognized word with its confidence value.
            foreach (tessnet2.Word word in r1)
                Console.WriteLine("{0}:{1}", word.Confidence, word.Text);
        }
    }
}
6.) Run this, and you'll find the image converted into text.
The above technique did not convert digits from the image to text, which I still have to figure out.
I also still have to experiment with MODI (Microsoft Office Document Imaging Library) and figure out which one is better.
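One thing that may be worth trying for the digits problem is the character whitelist variable. The sketch below is untested and is only an assumption about what might help: it sets "tessedit_char_whitelist" to the ten digits before calling Init, so the engine is limited to digit characters. The image path is hypothetical, and the class name DigitsOnlySketch is made up for illustration; it is meant to live in its own console project so that it does not clash with the Program class above.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

namespace TesseractConsole
{
    class DigitsOnlySketch
    {
        static void Main(string[] args)
        {
            // Hypothetical path: replace with an image that contains digits.
            using (Bitmap bmp = new Bitmap(@"C:\temp\digits.jpg"))
            {
                tessnet2.Tesseract ocr = new tessnet2.Tesseract();

                // Assumption: limiting the character set to digits may help
                // the engine recognize them. Set the variable before Init.
                ocr.SetVariable("tessedit_char_whitelist", "0123456789");
                ocr.Init(null, "eng", false);

                // Recognize the whole image and print each word with its confidence value.
                List<tessnet2.Word> words = ocr.DoOCR(bmp, Rectangle.Empty);
                foreach (tessnet2.Word word in words)
                    Console.WriteLine("{0}:{1}", word.Confidence, word.Text);
            }
        }
    }
}
```

With the whitelist in place, letters in the image will presumably be dropped or misread as digits, so this would only suit images (or regions passed to DoOCR) that contain nothing but numbers.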
Hi Tanvi, I'm also having problems extracting digits from images. I wonder whether you have solved the issue and could help me.
Hi, no. I tried a lot but couldn't get that to work, so
I used the Microsoft Office Document Imaging Library instead, and the results are very good.
If you want to use it, you can check the post below:
http://www.dotnetissues.com/search/label/OCR%20using%20MODI
Thank you very much. I read the other post about MODI just after asking you.
MODI is far better than Tessnet. It is also easy to use.
It reads both digits and characters.
DEAN
Without installing Microsoft Office, how can I use it?
Any clues?