SilkTest provides a 4Test include file, OCR.inc, that contains two 4Test functions to perform
optical character recognition (OCR). One function converts a bitmap file to text, while the
other allows you to pass in a window identifier and extract the text in the window (or a region
of the window). To use the 4Test OCR functions, include OCR.inc in your test script (or
include file) or in the 'Use Files' field in the Runtime Options dialog. To include the function
documentation in the library browser, add OCR.txt to the 'Help files for library browser' field in
the General Options dialog.

The 4Test functions call functions in a Segue DLL that extends the third-party Textract DLL
from Structu Rise. The Textract DLL uses a font pattern database file to recognize text for
certain sizes and styles of certain fonts that are specified in an initialization file. Although a
default version of the font pattern database is installed with SilkTest, it is strongly
recommended that you configure the font pattern database to include the fonts used by your
application, as explained in the INSTRUCTIONS FOR GENERATING THE FONT PATTERN
DATABASE section toward the bottom of this document.

THE SILKTEST OCR MODULE

The following files comprise the OCR module provided with SilkTest. All of the files
must reside in the SilkTest directory:

• Exgui.exe: Utility to generate the font pattern database. Can also be used as
a standalone utility for text recognition.
• OCR.inc: The 4Test include file that provides high-level 4Test functions
based on the functions in SgOcrLib.dll.
• SgOcrLib.dll: Segue extension of Textract.dll that provides higher-level
functions.
• SgOcrLib.inc: Declares the functions in SgOcrLib.dll so that they can be
called in 4Test.
• SgOcrPattern.pat: Font pattern database that controls text conversion.
• Textract.dll: Textract OCR DLL provided by Structu Rise.
• Textract.ini: Initialization file for the Textract OCR DLL. Two sections contain
settings that may require modification:
o In the [Options] section, the "Database Path" setting must point to the
OCR pattern file, '<SilkTest directory>\SgOcrPattern.pat'.
o The [Recognition] section contains settings that control which fonts
are used to generate the pattern file.
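
Because all of the files must reside in the SilkTest directory, a quick check of that directory can save debugging time. The following is a minimal sketch in Python, not part of the SilkTest distribution; the helper name missing_module_files is hypothetical, and the file names are taken from the list above.

```python
import os

# File names copied from the module list above.
OCR_MODULE_FILES = [
    "Exgui.exe",
    "OCR.inc",
    "SgOcrLib.dll",
    "SgOcrLib.inc",
    "SgOcrPattern.pat",
    "Textract.dll",
    "Textract.ini",
]


def missing_module_files(silktest_dir):
    """Return the OCR module files that are not present in silktest_dir.

    Hypothetical helper for illustration; silktest_dir is whatever
    directory SilkTest is installed in on the machine being checked.
    """
    return [
        name
        for name in OCR_MODULE_FILES
        if not os.path.isfile(os.path.join(silktest_dir, name))
    ]
```

An empty list means every file in the list was found; any names returned are the files that still need to be copied into the directory.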

THE 4TEST OCR FUNCTIONS

OCR.inc includes the following 4Test functions for OCR:

function: iRet = OcrGetTextFromBmp (sOcrText, sBitmapFile)


returns: iRet: Result length. If conversion fails, then an E_OCR
exception will be raised. INTEGER.
parameter: sOcrText: The result of converting the bitmap to text.
NULL if conversion failed. OUT. STRING.
parameter: sBitmapFile: The bitmap (.bmp) file to convert. STRING.
notes: Convert a bitmap file into text. The conversion uses the
preconfigured pattern file, which is specified in the
textract.ini file (Database Path setting).

function: iRet = OcrGetTextFromWnd (sText, wWindow[, rCapture])


returns: iRet: Result length. If conversion fails, then an E_OCR
exception will be raised. INTEGER.
parameter: sText: The result of converting a bitmap of the window to
text. NULL if conversion failed. OUT. STRING.
parameter: wWindow: The window that will be the source of the bitmap
to be converted to text. WINDOW.
parameter: rCapture: The capture region. OPTIONAL. RECT.
notes: Convert a bitmap of a window (or area within a window)
into text. The conversion uses the preconfigured
pattern file, which is specified in the textract.ini file
(Database Path setting).

The sample test script (ocrtest.t) includes a testcase (shown below) that extracts the text from
a Microsoft Word document. Microsoft Office controls are recognized as custom windows
(CustomWin) by SilkTest, so you cannot use the 4Test GetText() method to get the text.
However, you can use the OcrGetTextFromWnd() function to capture a bitmap of the
document window and convert it to text. Notice that if necessary, the testcase will scroll
through the document and capture multiple bitmaps.

[+] testcase GetOcrAPIDocText (STRING sDocument)
    [ ] MSWord.SetActive ()
    [ ]
    [ ] // Open the specified document.
    [ ] MSWord.OpenDoc (sDocument)
    [ ]
    [ ] // Capture each page in succession.
    [ ]
    [ ] // Start at the top and page down until the bottom is reached
    [ ] LIST OF STRING lsResults = {}
    [ ] STRING sResult = NULL
    [ ] INTEGER iMaxPos, iCurPos, iLastPos = -1
    [ ] INTEGER iResLen, iTotalLen = 0
    [ ]
    [+] withoptions
        [ ] BindAgentOption (OPT_REQUIRE_ACTIVE, FALSE)
        [ ] BindAgentOption (OPT_VERIFY_ACTIVE, FALSE)
        [ ]
        [ ] TheDoc.ScrollBarV.ScrollToMin ()
        [ ] iCurPos = TheDoc.ScrollBarV.GetPosition ()
        [ ] iMaxPos = TheDoc.ScrollBarV.GetRange ().iMax
        [ ]
        [+] while TRUE
            [ ] // If we are capturing the first page, then eliminate the flashing cursor
            [ ] // by highlighting the current character. Otherwise, page down to capture
            [ ] // the next page.
            [ ] MSWord.SetActive ()
            [+] if sResult == NULL
                [ ] // First page
                [ ] MSWord.TypeKeys ("<Shift-Right>")
            [+] else
                [ ] // Page down
                [+] withoptions
                    [ ] BindAgentOption (OPT_REQUIRE_ACTIVE, FALSE)
                    [ ] BindAgentOption (OPT_VERIFY_ACTIVE, FALSE)
                    [ ] TheDoc.ScrollBarV.ScrollByPage (1)
                [ ] iLastPos = iCurPos
                [ ] iCurPos = TheDoc.ScrollBarV.GetPosition ()
                [ ]
                [ ] // If scrolling did not change the scrollbar position, then
                [ ] // we have reached the bottom. Also, if we scrolled up instead
                [ ] // of down by paging down, then we have reached the bottom.
                [+] if iCurPos <= iLastPos
                    [ ] break
            [ ]
            [ ] // Convert the bitmap for the current view.
            [ ] iResLen = OcrGetTextFromWnd (sResult, TheDoc.CurrentView)
            [ ]
            [+] if sResult != NULL
                [ ] ListAppend (lsResults, sResult)
                [ ] iTotalLen = iTotalLen + Len (sResult)
        [ ]
    [ ]
    [ ] ResPrintList ("Document text ({iTotalLen} chars)", lsResults)
    [ ]

If neither of the 4Test functions in OCR.inc provides the functionality that you need, you can
call the DLL functions in the Segue DLL, SgOcrLib.dll, directly using the function declarations
in SgOcrLib.inc. The DLL functions are documented at the top of SgOcrLib.inc, and the
functions in OCR.inc can serve as an example of how to use the DLL functions.

INSTRUCTIONS FOR GENERATING THE FONT PATTERN DATABASE

Use the Exgui.exe utility to generate the pattern file (SgOcrPattern.pat). Text conversion
relies on Windows fonts, so every font that may appear in the text to be converted must be
processed into the pattern file before any conversion is performed. Before generating the
pattern file, configure the fonts, sizes, and styles to include in the Textract.ini file. Open
Textract.ini and adjust the following settings in the [Recognition] section:

• Include1: List of fonts that should be converted
• Exclude: List of fonts that should not be converted
• Italic: 1 - convert italic characters; 0 - exclude them
• Bold: 1 - convert bold characters; 0 - exclude them
• Underlined: 1 - convert underlined characters; 0 - exclude them
• Sizes: Range of font sizes (<min>-<max>) that should be converted

After Textract.ini has been adjusted, open the Exgui.exe application and click the "Build font
pattern database" button. When the "Textract - Build Font Base" dialog appears, click OK and
wait for the pattern file to be generated. The file is saved with the name and path specified in the
"Database Path" setting in the [Options] section of the Textract.ini file.

MORE INFORMATION ABOUT SGOCRLIB.DLL

Copy the following files into the directory where the executable that calls the DLL (e.g.,
Partner.exe) resides:
• SgOcrLib.dll
• SgOcrPattern.pat
• Textract.dll
• Textract.ini
For convenient pattern file generation, it is recommended that you also copy the Exgui.exe file
to that directory.

Open the Textract.ini file and modify the Database Path setting (in the [Options] section) to
point to the correct location of SgOcrPattern.pat.
Have the application (e.g., SilkTest) load SgOcrLib.dll, which contains the functions explained
in the following section.
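
A wrong Database Path is an easy mistake to make after copying the files to a new directory. The following Python sketch reads the setting back so it can be compared with the actual location of SgOcrPattern.pat. It assumes Textract.ini is plain key=value INI text readable by the standard configparser module; the function name database_path_from_ini is hypothetical.

```python
import configparser


def database_path_from_ini(ini_text):
    """Return the "Database Path" value from the [Options] section.

    Assumption: Textract.ini is plain key=value INI text. Surrounding
    double quotes, if present, are stripped from the value.
    """
    # interpolation=None keeps any '%' characters in the path literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    value = parser.get("Options", "Database Path")
    return value.strip().strip('"')
```

The returned path should name SgOcrPattern.pat in the directory the files were copied to.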

Interface

OcrInit
int OcrInit(void** ppOcr)

Call once for initialization of the OCR library. This function must be called in advance of any
text conversion.

Return values:
 0   Successful
-1   Invalid parameter; ppOcr = 0
-2   Could not load the Textract DLL
-3   Some functions of the Textract DLL could not be found
-4   Internal initialization of the Textract DLL failed; possible
     Textract.ini file misconfiguration

OcrTerminate
int OcrTerminate(void* pOcr)

Call at the end to clean up the library.

Return values:
 0   Successful
-1   Invalid parameter; pOcr = 0

OcrConvert
int OcrConvert(void* pOcr, TString sFile)

Call to convert the bmp file into text. The conversion result can be retrieved with the
OcrGetResult function. The conversion uses the preconfigured pattern file that is specified in
the textract.ini file (Database Path).

Return values:
 0   Successful
-1   Invalid parameter; pOcr = 0
-2   Library not initialized; call OcrInit first
-3   Text conversion failed

OcrGetResult
char* OcrGetResult(void* pOcr)

Call to retrieve the result of the prior text conversion.

Return values:
Converted text   Successful
0                No result

OcrGetResultLen
unsigned int OcrGetResultLen(void* pOcr)

Call to retrieve the result length of the prior text conversion.

Return values:
Result length
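
When these functions are called from a test harness, the numeric return codes are easier to act on once translated into the messages documented above. The following Python sketch is a plain lookup table and does not load or call the DLL; the names OCR_RETURN_CODES and describe_return_code are hypothetical, and the messages are taken from the return value lists in this section.

```python
# Return codes as documented for each function in this section.
OCR_RETURN_CODES = {
    "OcrInit": {
        0: "Successful",
        -1: "Invalid parameter; ppOcr = 0",
        -2: "Could not load the Textract DLL",
        -3: "Some functions of the Textract DLL could not be found",
        -4: "Internal initialization of the Textract DLL failed; "
            "possible Textract.ini file misconfiguration",
    },
    "OcrTerminate": {
        0: "Successful",
        -1: "Invalid parameter; pOcr = 0",
    },
    "OcrConvert": {
        0: "Successful",
        -1: "Invalid parameter; pOcr = 0",
        -2: "Library not initialized; call OcrInit first",
        -3: "Text conversion failed",
    },
}


def describe_return_code(function_name, code):
    """Translate a documented integer return code into its message."""
    codes = OCR_RETURN_CODES[function_name]
    return codes.get(code, "Undocumented return code %d" % code)
```

OcrGetResult and OcrGetResultLen are left out of the table because they return the converted text and its length rather than a status code.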

Pattern file generation

Use the Exgui.exe application to generate the appropriate pattern file. Text conversion
relies on Windows fonts, so the fonts in use must be processed into the pattern file before any
conversion is performed. First configure those fonts, their sizes, and their styles in the
Textract.ini file. Open Textract.ini and adjust the following settings in the [Recognition] section:

• Include1: list the fonts that should be converted
• Exclude: list the fonts that should not be converted
• Italic: specify 1 or 0 to control whether italic fonts are scanned
• Bold: specify 1 or 0 to control whether bold fonts are scanned
• Underlined: specify 1 or 0 to control whether underlined fonts are scanned
• Sizes: specify the range of font sizes to be converted

Then open the Exgui.exe application and click the “Build font pattern database...” button.
Click OK and wait for the pattern file to be generated. The generated file will have the name
and location specified in the Database Path setting in the [Options] section of the Textract.ini file.
