
Scan Tailor User Guide

Scan Tailor is an interactive tool for post-processing of scanned pages. It gives the ability to
cut or crop pages, compensate for skew angle, and add / delete content fields and margins,
among others. You begin with raw scans, and end up with TIFF files that are ready for printing
or for assembly into a PDF or DjVu file.
Scanning, OCR, and the building of single-file multi-page documents are not included in the
project objectives.
The program is developed for Windows, GNU / Linux, and other Unix-like systems such as
Mac OS X.

Installation and first start


For Windows
Versions for Windows come in the form of an installer. Install the program and run through
the self-explanatory menu.
For GNU / Linux and Mac OS X
If your distribution does not have Scan Tailor in its repositories (the only distribution I
know of that does is Alt Linux), then you have to build it from source. After building, you can
run Scan Tailor via Alt + F2, using the command "scantailor" (without the quotes), or, if you
don't have a Gnome environment, you can type the same command on the command line. On Mac
OS X, it currently must be run from the terminal after building and installation.
Once launched, the main program window looks like this:

In the central panel we see the items "New Project ...", "Open Project ..." and a list of recent
projects, if any.

Create Project
So, let's create our first project. First, we need the raw material - Scan Tailor requires the
presence of at least one image file in the project. This source material can be scanned or
photographed pages of a book or a magazine. Follow these tips for scanning to get the best
results.
Click on "New Project ..." to open the "Project Files" dialog.

"Input Directory" - the folder where the original scans are located.
"Output Directory" - the folder where processed scans are stored.
"Files Not In Project" - files in the folder specified as the Input Directory that have not yet
been added to the project, or files that were added and have since been removed from it.
"Files In Project" - files added to the project, although not necessarily from the current
Input Directory. A project can take files from several directories: you may select files from
one directory, then change the Input Directory, select files from there, and so on.
The buttons ">>" and "<<" between "Files Not In Project" and "Files In Project" move files
from one list to another. This will only move highlighted files. It is possible to highlight all
the files via "Select All", or individually - via Ctrl + Click, or a range - through Shift + Click
as per standard selection procedures.
Once you've selected the files to use in the project (supported file types are *.tif, *.tiff, *.png,
*.jpg, *.jpeg), click OK. Usually at this stage the process of creating the project ends, but if
project files with an unspecified or clearly incorrect DPI have been selected then a "Fix DPI"
dialogue window will open.

A brief explanation of DPI: DPI stands for "dots per inch" and defines the correspondence
between physical dimensions (inches, centimeters) and size in pixels. Pixels are the points
which make up digital images. For example, in the screenshot above we see two scans with a
size of 2816 x 2112 pixels. If we knew the DPI of the scans, we could calculate their physical
size. If, for example, we assume that their DPI is 300 x 300 (300 horizontal by 300 vertical),
then the physical dimensions are:
(2816 / 300) x (2112 / 300) = approximately 9.4 x 7.0 inches (23.8 x 17.9 cm)
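
The same arithmetic can be written as a short Python sketch. The function name
physical_size is made up for this illustration and is not part of Scan Tailor.

```python
CM_PER_INCH = 2.54


def physical_size(width_px, height_px, dpi_x, dpi_y):
    """Return (width, height) in inches for the given pixel size and DPI."""
    return width_px / dpi_x, height_px / dpi_y


# The two scans from the example: 2816 x 2112 pixels, assumed to be 300 x 300 DPI.
width_in, height_in = physical_size(2816, 2112, 300, 300)
print(f"{width_in:.1f} x {height_in:.1f} inches")
print(f"{width_in * CM_PER_INCH:.1f} x {height_in * CM_PER_INCH:.1f} cm")
```

Doubling the assumed DPI to 600 would halve both physical dimensions, which is why a wrong DPI
value leads to wrong page sizes later on.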
Here the DPI is determined by the scan settings, and scanning software usually produces
files with the correct DPI. However, the value can easily be lost during further manipulation
of the image, in which case it must be restored before working with Scan Tailor. The most
common DPI values are 300 and 600. The horizontal and vertical components of the DPI are
usually the same.
For digital camera images you may need to measure the physical dimensions of the area that
was photographed, then divide the number of pixels along one dimension of the image by the
corresponding length in inches to obtain the DPI (dots or pixels per inch). Keep this in mind
before you get rid of a book, though you can often look up a book's physical dimensions
online. DPI values below 150 will generally not lead to satisfactory results.
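
As a rough sketch of that calculation in Python: the function name estimate_dpi and the
measurements used below are invented for illustration only.

```python
CM_PER_INCH = 2.54


def estimate_dpi(pixels, length_inches):
    """Estimate DPI from a pixel count and the measured physical length in inches."""
    if length_inches <= 0:
        raise ValueError("measured length must be positive")
    return pixels / length_inches


# Hypothetical photo: 2816 pixels across an area measured as 8.0 inches wide.
dpi = estimate_dpi(2816, 8.0)
print(f"estimated DPI: {dpi:.0f}")

# The same function works for a measurement in centimeters after conversion.
dpi_from_cm = estimate_dpi(2816, 20.32 / CM_PER_INCH)

if dpi < 150:
    print("below 150 DPI: results will probably not be satisfactory")
```

If the horizontal and vertical estimates differ noticeably, the measured area probably does
not match the photographed area exactly.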
Consider again the "Fix DPI" dialog. The "Need Fixing" tab lists only those files whose DPI is
not specified or is clearly wrong. The "All Pages" tab (not to be confused with the "All pages"
row in the list) shows every file in the project, and the DPI of any of them can be changed
there. If we know that all the files in the project are 300 x 300 DPI, we can set the DPI for
all files at once. To do this, highlight the row that says "All pages" with the triangle/arrow
next to it (not the tab that says "All Pages"), select the correct DPI, and click Apply. You can
also specify the DPI for groups of files with the same pixel size, and for individual files.
Files that have been fixed disappear from the "Need Fixing" tab. When the tab is completely
empty, click OK. This completes the creation of the project.
The DPI settings which you specify are not written to the files - they are only stored in the
Scan Tailor project file.
At the "Fix Orientation" and "Split Pages" stages you can add files to the project or remove
them from it using the pop-up menu that appears when you right-click in the preview window
(the small thumbnails):

When you try to add a file with an invalid DPI, Scan Tailor will ask you to fix it:

Keep in mind that there is currently no way to change the DPI for a file in the already created
project, or any way to add a group of files to an existing project. This feature is expected in
future versions.

The concept of processing scans in Scan Tailor


First, you need to understand the general concept of the program. Processing of the scanned
pages in Scan Tailor is like a factory conveyor. There are stages, each of which deals with
some specific manipulation of the image to produce the final product.
Let's look at the main program window:

At the top left is the list of stages (1), which, together with the preview window (4), can be
thought of as a pipeline. It is important to understand that, as on a regular assembly line, if
the product has reached a certain stage, then all the previous stages must have been completed
in strict sequence. At any time you can return to one of the previous stages to see, and if
necessary correct, the operation that was applied to the image at that point.
The round "Play" button to the right of each stage name starts batch processing, like
switching on the conveyor described above. Pressing "Play" causes the pages, one by one, to
pass through all stages of processing up to and including the current stage. "Pass through all
stages" does not necessarily mean "be processed". For example, if a scan has already been
split, it will not be split again when it reaches the "Split Pages" stage, unless you have
changed something important such as its orientation. You can stop batch processing at any
stage with the large round "Stop" button that appears in the main work area (3) during
processing.
None of the stages except the last one, "Output", actually writes the resulting image to disk.
All stages except the last are purely analytical.
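
The conveyor idea can be modelled with a small Python sketch. This is a conceptual
illustration only, not Scan Tailor's actual implementation: the functions run_batch and
invalidate_after, the "done" dictionary, and the assumption that a change at one stage makes
all later stages stale are invented for the example. The stage names are the ones used in this
guide.

```python
STAGES = ["Fix Orientation", "Split Pages", "Deskew",
          "Select Content", "Page Layout", "Output"]


def run_batch(pages, current_stage, done):
    """Pass every page through all stages up to and including current_stage.

    `done` maps (page, stage) to True for results that are still valid.
    Returns the (page, stage) pairs that actually had to be processed.
    """
    last = STAGES.index(current_stage)
    processed = []
    for page in pages:
        for stage in STAGES[:last + 1]:
            if done.get((page, stage)):
                continue  # the page passes through, but nothing is redone
            done[(page, stage)] = True
            processed.append((page, stage))
    return processed


def invalidate_after(page, stage, done):
    """Assumption: a change at `stage` makes every later stage stale for that page."""
    for later in STAGES[STAGES.index(stage) + 1:]:
        done.pop((page, later), None)


done = {}
first = run_batch(["p1", "p2"], "Split Pages", done)   # everything is processed
second = run_batch(["p1", "p2"], "Split Pages", done)  # nothing left to do
invalidate_after("p1", "Fix Orientation", done)        # e.g. p1 was rotated by hand
third = run_batch(["p1", "p2"], "Split Pages", done)   # only p1 is split again
print(len(first), len(second), third)
```

In this model the second run does no work at all, which mirrors the remark that passing
through a stage does not necessarily mean being processed.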
The rest of the main window:
(2) Options for the current processing stage.
(3) Main working area, which displays the current image as well as tools for manipulating it.
(5) Tool tips - prompts on how to use the tool or item under the mouse pointer.
(6) The "Keep the active page in the field of view" button. When pressed, it causes the
program to keep the active page visible in the preview window (4).

Program menu
File menu
New Project ...        Ctrl+N
Open Project ...       Ctrl+O
Save Project           Ctrl+S
Save Project As ...
Close Project          Ctrl+W
Quit                   Ctrl+Q

These are mostly self-explanatory, except for the item "Close Project". This command closes
the project, but not the whole program. The main window returns to the way it was
immediately after starting the program. The shortcut keys are listed to the right of each item
in the File menu. For example, holding down the Ctrl key and pressing S will save the
project.
Tools menu
Debug mode is intended only for developers. After a single page has been processed, tabs
showing the intermediate results of processing appear in the central area.

Processing stages
Fix orientation
At this stage it is possible to rotate scans by multiples of 90 degrees, i.e. to correct
sideways or upside-down scans.
This is a manual stage because the program does not know how to determine the correct
orientation of scans - the user must do this. This also means that using batch processing at this
stage is useless. It is therefore worth making sure that all initial scans have the same
orientation, if possible; mixed orientations make this stage less automatic and more
time-consuming.
The settings panel for this stage appears as follows:

Rotate

The buttons with yellow arrows rotate scans 90 degrees in one direction or the other. A green
pointer shows which way the scan is currently oriented. The "Reset" button returns the scan to
its initial position, in which the green pointer points upward.
Apply to

This makes it possible to rotate not only the current page (which is done automatically), but
also others. The dialog is opened by clicking on the "Apply to ..." button and appears as
follows:

The first two options in the list do not require explanation. "This page and the following
ones" applies the rotation of the current page to the current page and to every page after it.
"Every other page" applies the rotation to either every even or every odd page, depending on
whether the current page is even or odd. The last two options are enabled only if two or more
pages are selected in the preview pane. To use "Every other selected page", the selected pages
must be contiguous; Shift + Click is often the most convenient way to select a contiguous
range.
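
To make the two less obvious options concrete, here is a small Python sketch that lists
which pages each option would affect. The function names are invented for this example, pages
are numbered from 1, and the behavior is modelled on the description above rather than on
Scan Tailor's source code.

```python
def this_page_and_following(current, total):
    """Pages affected by "This page and the following ones"."""
    return list(range(current, total + 1))


def every_other_page(current, total):
    """Pages affected by "Every other page": all pages with the parity of the current one."""
    first = 1 if current % 2 == 1 else 2
    return list(range(first, total + 1, 2))


# A hypothetical project of 9 pages.
print(this_page_and_following(7, 9))
print(every_other_page(4, 9))
print(every_other_page(3, 9))
```

"Every other page" is useful when, for example, all left-hand pages were scanned upside down
and all right-hand pages were scanned correctly.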

Split Pages
This stage determines whether the page(s) need to be split.
The control parameters at this stage look like this:

Type of division from left to right:


1. A scan of a single page, without any part of the adjacent page. Such scans are usually
obtained from specialized book scanners or from photographs.
2. A scan of a single page that also captures part of the adjacent page.
3. A two-page scan.
The type of division is determined automatically, but can be set manually with the
"Change ..." button. The type of division can be applied to all pages at once or to
individual pages. The same button can be used to switch back to automatic selection of the
type of division. The dividing line can also be determined automatically or positioned
manually, but it cannot be applied to other pages. It is worth quickly checking each page in
the preview pane to make sure the splits have been applied correctly - pictures on a page can
sometimes confuse the split operation.
To increase the chances of correct automatic determination of the type of division and of the
dividing line, try to follow these rules when scanning:

If you scan one page at a time, do it in portrait rather than landscape orientation.

When scanning, choose the rawest mode available, i.e. one in which the scanner will not try
to improve or compress anything.

Examples of good scans:

Example of a bad scan (preprocessed by the scanner software):

Deskew
This stage determines the angle by which the page needs to be rotated for the text lines to be
properly horizontal. Since the compensation is a simple rotation, distortions such as
keystoning or page curl cannot be corrected at this stage. The rotation angle is determined
automatically, but you can also set it manually.
Here is the working area in this mode:

Images can be rotated by dragging the round handles at the edges.


Here is what the options panel looks like:

You can also explicitly specify the rotation angle in degrees. Positive angles rotate the
image clockwise, negative angles counter-clockwise. For fine adjustment of the angle it may be
convenient to click in the text portion of the angle input field and then use the mouse wheel
to adjust the value.
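
The sign convention can be checked with a few lines of Python. This is only a geometric
sketch: it assumes image coordinates in which y grows downward (as on screen), and the
functions rotate_point and correction_angle, as well as the baseline coordinates, are invented
for the example. It does not reproduce Scan Tailor's automatic skew detection.

```python
import math


def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate (x, y) about (cx, cy); with y pointing down, positive angles are clockwise."""
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


def correction_angle(p1, p2):
    """Angle in degrees that makes the line from p1 to p2 horizontal."""
    (x1, y1), (x2, y2) = p1, p2
    return -math.degrees(math.atan2(y2 - y1, x2 - x1))


# Hypothetical text baseline that drops 20 pixels over a width of 1000 pixels,
# i.e. the page is skewed slightly clockwise.
left, right = (100.0, 500.0), (1100.0, 520.0)
angle = correction_angle(left, right)
_, new_y = rotate_point(*right, *left, angle)
print(f"rotate by {angle:.2f} degrees; right end of the baseline is now at y = {new_y:.1f}")
```

The correction comes out negative (counter-clockwise), as expected for a page that is skewed
clockwise. Because every point is rotated by the same angle, lines that converge (keystone)
or bend (curl) stay that way.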

Select Content

This stage determines the rectangular region containing the useful content (shaded in
color). Why do we need to define this area? First, to determine the size of the output page:
margins are added around the content, and the outer boundary of those margins determines the
size of the output page. Second, so that the final images do not show the fold line or other
debris from the edges. Strictly speaking, what happens to debris that falls in the margins
depends on the mode selected at the output stage; in most modes the margins are filled with
white.
If the content area is identified incorrectly, you can adjust individual pages manually by
placing the mouse pointer over an edge and dragging it as needed. Occasionally Scan Tailor may
find content that does not exist or, conversely, fail to select content where it should. In
this case you can create or delete the region manually by right-clicking on the image and
selecting the appropriate menu item.

Page Layout

At this stage you may adjust the margins added to the content box. There are two types of
margins - hard and soft.
Hard margins lie between the solid lines and are set by the user. You can either drag any
solid line, whether an inner or an outer edge, or enter the margins as numerical values.

Soft margins lie between the solid line and the dotted line. They are added automatically to
bring the page to the same size as the other pages. If you see a dotted line, it means that
somewhere in the project there is a page with that width (content area + hard margins), and a
page (possibly a different one) with that height.
In this way a single large page causes soft margins to appear on all the other pages, unless
alignment is turned off for them.

The alignment options determine whether soft margins are added and, if so, on which sides.
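
The effect of one large page on all the others can be illustrated with a short Python sketch.
It assumes that every page is brought up to the largest width and the largest height found in
the project; the function soft_margins and the page sizes are invented for the example, and
the sketch does not model the alignment options (how the extra space is distributed around the
page).

```python
def soft_margins(pages):
    """Return the extra (width, height) that soft margins must add to each page.

    `pages` is a list of (width, height) sizes of content area + hard margins,
    in any consistent unit.
    """
    target_width = max(width for width, _ in pages)
    target_height = max(height for _, height in pages)
    return [(target_width - width, target_height - height) for width, height in pages]


# Hypothetical project, sizes in millimeters: the widest and the tallest page differ.
pages = [(150, 220), (160, 210), (140, 200)]
for size, extra in zip(pages, soft_margins(pages)):
    print(size, "-> soft margins add", extra)
```

Here the target size is 160 x 220, so the third page receives 20 units of soft margin in each
direction, even though no single page in the project is that large.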

Output
At this stage the output files are created from the images and written to the disk. The resultant
images also appear in the central window of the program.

Unlike the other stages, the "Output" stage becomes available only after all pages have passed
the "Select Content" and "Page Layout" stages. This is because the sizes of the output pages
depend on each other: for example, if one large page is found, the margins of all the other
pages grow (this is described in more detail in the Page Layout section). It is therefore
important to know the final page sizes, and these can only be known once every page has been
through "Select Content" and "Page Layout". Why is the "Select Content" stage necessary?
Because all options under "Page Layout" are either set manually or take default values.
Configure the following:

Output Resolution (DPI) - you can manually specify a resolution for the output files:

Please note that although asymmetric DPI (unequal horizontal and vertical DPI) is currently
supported, this support may be withdrawn at some stage.
The default is 600 DPI. In some cases it is 300.
Mode - Selects the output mode of the pages:

Black and White mode hardly requires explanation, but note that no grey tones are available,
so it is not suitable for pages with pictures or for some drawings. There is an option to
"despeckle", and to increase or decrease the line thickness (i.e. of the text). In general it
is best not to despeckle if the image is reasonably clean, as despeckling can result in the
loss of some portions of text. This may be compensated for to a degree by increasing the line
thickness, but it is worth experimenting on a few pages before applying the settings to the
entire project.
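
Why despeckling can eat into text is easier to see in a toy example. The Python sketch below
removes every group of connected black pixels that is smaller than a given size. This is a
generic illustration of the idea and an assumption on our part; the guide does not describe
the algorithm Scan Tailor actually uses, and the function despeckle and its min_size parameter
are invented for the example.

```python
from collections import deque


def despeckle(grid, min_size):
    """Return a copy of grid with 4-connected groups of 1s smaller than min_size cleared."""
    if not grid:
        return []
    height, width = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            # Collect one connected group of black pixels with a breadth-first search.
            group, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                group.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(group) < min_size:
                for cy, cx in group:
                    result[cy][cx] = 0
    return result


# 1 = black, 0 = white: a letter "i" (dot and stem) plus one stray speck on the right.
page = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
for row in despeckle(page, min_size=2):
    print(row)
```

The speck is removed, but so is the dot of the "i", because both are single isolated pixels.
Small punctuation marks and thin strokes are at risk in the same way.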
The "Color / Grayscale" mode has additional settings:

Margins can be filled with white or left as is. If the margins are filled in white, then the option
to equalise lighting also becomes available. This option normalizes the background color,
bringing it to white, and normalizes contrast, increasing it in the shaded areas.
Mixed mode is intended for projects whose scans contain half-tone images (grayscale or
color). Pictures are detected automatically and output as they are, just as in
"Color / Grayscale" mode with equalised lighting enabled. The rest of the page is output in
black and white.
Automatic picture detection works well enough, but if a picture merges smoothly into the
background the result may be unsatisfactory. In this case you must create and adjust the
picture zones manually. Note that picture zones can only be created in Mixed mode.
Output all files at once
To do this, run batch processing through the menu. The files will be stored in the output
directory you chose when you created the project. Unfortunately, unlike at other stages, pages
that have already been output will be output again, even if nothing in them has changed. This
behavior is expected to be removed in future versions. In the meantime, if you need to tweak a
few pages after running a batch, it is better not to run the entire batch again, but to adjust
those few pages by hand. Output of the current page begins immediately on switching to the
"Output" stage, or on moving to another page while at this stage.

The format of output files


Output is in TIFF format. Black and White mode uses G4Fax compression; other modes use
LZW compression. Both G4Fax and LZW compress without loss of quality.
