
About this Programmer's Guide

This documentation describes programming essentials you need to build Java and other applications that use
Aspose.Words. This section provides information about key programming concepts, as well as code samples and detailed
explanations.

At the top level, the Programmer's Guide is split into chapters that cover the following main feature areas of
Aspose.Words:
Converting Documents
Programming with the Document Object Model
Mail Merge and Reporting
Rendering and Printing
Platforms and Interoperability
Most chapters begin with one or more introductory topics that explain what is possible in general and why things are done
the way they are. The procedural (also known as How-To) topics follow. The How-To topics usually take one real
programming task and show how it can be done with Aspose.Words step by step.
This Programmer's Guide is a living document, and we are improving and adding content all the time. Feel free to comment on the
quality of the documentation and suggest where more information is needed. Your comments are appreciated and help to
improve this documentation in the same way your requests help to shape Aspose.Words features.
Overview of Aspose.Words in Java
The Aspose.Words API in Java

While trying to keep the API as straightforward and clear as possible, we decided to recognize and honor the
common development practices of the platform. Therefore, Aspose.Words for Java follows coding guidelines
widely accepted by Java developers.
Packages

All classes and methods used in Aspose.Words for Java are contained in one package.
Package in Aspose.Words for Java
com.aspose.words
Classes

Where possible, class, method and property names match those found in Microsoft Word Automation.
Class Name in Aspose.Words for Java
Document
Paragraph
Enumerations

Enumerations are ported to Java as classes with public integer constants.
Constant in Aspose.Words for Java
BorderType.LEFT
TextFormFieldType.DATE_TEXT
ProtectionType.ALLOW_ONLY_COMMENTS

The main reason why we did not use Java enums is to stay compatible with J2SE 1.4.2 as Java enums appeared only in J2SE
5.0.

All constants are integer values in Aspose.Words for Java. Wherever the .NET version uses an enumerated type for a
parameter, return value or property, the Java version uses an int instead. In such cases, the documentation for the
parameter specifies which class contains the constants applicable to it.
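To make the int-constant convention concrete, here is a minimal, self-contained sketch of the pattern in plain Java. The classes TextAlignment and ParagraphSketch and their values are hypothetical and are not part of Aspose.Words; they only mimic how a constant class such as BorderType is paired with an int parameter, and how a .NET property becomes a get/set method pair.

```java
// Hypothetical "enum-like" class in the pre-Java 5 style: a non-instantiable
// class holding public int constants. Names and values are illustrative only.
final class TextAlignment {
    public static final int LEFT = 0;
    public static final int CENTER = 1;
    public static final int RIGHT = 2;

    private TextAlignment() {
        // No instances: the class only namespaces the constants.
    }

    /** Returns a readable name for a constant, or "UNKNOWN" for any other int. */
    public static String toName(int value) {
        switch (value) {
            case LEFT:   return "LEFT";
            case CENTER: return "CENTER";
            case RIGHT:  return "RIGHT";
            default:     return "UNKNOWN";
        }
    }
}

// Hypothetical API class: the property is exposed as a getter/setter pair,
// and its parameter is a plain int whose valid values live in TextAlignment.
final class ParagraphSketch {
    private int alignment = TextAlignment.LEFT;

    /** @param value one of the TextAlignment constants */
    public void setAlignment(int value) {
        alignment = value;
    }

    public int getAlignment() {
        return alignment;
    }
}
```

The trade-off of this design is that the compiler does not stop you from passing an arbitrary int such as 42, so it is worth checking the documentation of each parameter to see which constant class applies.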
Methods

Method names follow the accepted practices for the Java platform.
Method Name in Aspose.Words for Java
Document.save
CompositeNode.getChildNodes

Several methods had to be renamed because they conflicted with methods of the Java runtime. For example, the
clone method, which would otherwise be named Document.clone, was renamed to Document.deepClone in Java.
Properties

All properties found in classes within Aspose.Words for Java are implemented as getter and setter methods.
The original property name has "get" and "set" prefixes added to it.
Getter and Setter in Aspose.Words for Java
Font.getBold, Font.setBold
PageSetup.getLeftMargin, PageSetup.setLeftMargin
Indexed Properties

Indexed properties appear as get() and set() methods in most cases.
Getter and Setter in Aspose.Words for Java
StyleCollection.get(int)
StyleCollection.get(String)
Events

Events in Aspose.Words for Java are implemented as callbacks (listeners). For example, to subscribe to the
field merging event, you create your own class implementing the IFieldMergingCallback interface.
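The callback model can be sketched without the library. In the following plain-Java example, MergeFieldCallback, TinyMerger and UpperCaseNames are hypothetical names that only illustrate the shape of the pattern: the caller implements an interface and registers an instance, and the library calls it back instead of raising a .NET event. With Aspose.Words itself you would implement IFieldMergingCallback and register it through the document's MailMerge object (MailMerge.setFieldMergingCallback).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; it is NOT the Aspose.Words type.
interface MergeFieldCallback {
    /** Called once per field; returns the text to insert for that field. */
    String onFieldMerging(String fieldName, String value);
}

// Hypothetical event source: instead of raising an event, it calls the single
// registered callback object, if one was set.
final class TinyMerger {
    private MergeFieldCallback callback;

    public void setFieldMergingCallback(MergeFieldCallback callback) {
        this.callback = callback;
    }

    public List<String> merge(String[] names, String[] values) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < names.length; i++) {
            // Default behavior: insert the value unchanged.
            String text = values[i];
            if (callback != null) {
                text = callback.onFieldMerging(names[i], values[i]);
            }
            out.add(text);
        }
        return out;
    }
}

// User-side class: "subscribing" means implementing the interface.
final class UpperCaseNames implements MergeFieldCallback {
    public String onFieldMerging(String fieldName, String value) {
        return "Name".equals(fieldName) ? value.toUpperCase() : value;
    }
}
```

Registering the listener is then a single call: merger.setFieldMergingCallback(new UpperCaseNames()).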
Implementations of Internal Interfaces

In Java, all members that implement interface methods must be public methods of the class. This makes some
methods that were not intended to be visible appear in the public API of Aspose.Words for Java. We will
include a corresponding remark on all such methods or will try to remove them from the documentation
completely.

For example, the public Border class implements internal interface IComplexAttr, and the Merge method is
visible in the public API. You should not use such methods. In this case you cannot use this method at all
because the IComplexAttr interface is not public and its declaration is not available to you.

About Document Conversions in Aspose.Words


The ability to easily and reliably convert documents from one format to another is one of the four main feature areas of
Aspose.Words (the other three being: document object model, rendering and mail merge).

Almost any task that you want to perform with Aspose.Words involves loading or saving a document in some format.
The LoadFormat enumeration specifies all load or import formats supported by Aspose.Words. The SaveFormat
enumeration specifies all save or export formats supported by Aspose.Words. Aspose.Words can convert a document from
any load format into any save format making the total number of possible conversions very large.
Converting from one document format to another in Aspose.Words is very easy and can be accomplished using just two lines
of code:
1. Load your document into a Document object using one of its constructors. By default, Aspose.Words will even
auto-detect the file format for you.
2. Invoke one of the Document.Save methods on the Document object and specify the desired output format.
Document Load Overview


The Document class represents a document loaded into memory. Document has several overloaded
constructors allowing you to create a blank document or to load it from a file or stream.
Creating a New Document

Call the Document constructor without parameters to create a new blank document:
Example
Shows how to create a blank document. Note that the blank document contains one section and one paragraph.
Java
Document doc = new Document();


The document paper size is PaperSize.Letter by default.
If you want to generate a document programmatically, the most reasonable step after creation is to use
DocumentBuilder to add document contents.
Example
Shows how to create and build a document using a document builder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.writeln("Hello World!");

doc.save(getMyDir() + "DocumentBuilderAndSave Out.docx");

Opening from a File

Pass a file name as a string to the Document constructor that accepts a string to open an existing document
from a file:
Example
Opens a document from a file.
Java
// Open a document. The file is opened read only and only for the duration of the constructor.
Document doc = new Document(getMyDir() + "Document.doc");

Opening from a Stream

Simply pass a stream object that contains a document to the Document constructor accepting an InputStream:
Example
Opens a document from a stream.
Java
// Open the stream. Read only access is enough for Aspose.Words to load a document.
InputStream stream = new FileInputStream(getMyDir() + "Document.doc");

// Load the entire document into memory.
Document doc = new Document(stream);

// You can close the stream now; it is no longer needed because the document is in memory.
stream.close();

// ... do something with the document

Opening Encrypted Documents


You can open Word documents encrypted with a password. To do that, use the special constructor overload,
which accepts a LoadOptions object. This object contains the LoadOptions.Password property which specifies
the password string.
Example
Loads a Microsoft Word document encrypted with a password.
Java
Document doc = new Document(getMyDir() + "Document.LoadEncrypted.doc", new
LoadOptions("qwerty"));

Document Save Overview

Use the Document.Save method for saving a document. There are overloads that allow saving a document to a
file or stream. The document can be saved in any save format supported by Aspose.Words. For the list of all
supported save formats see the SaveFormat enumeration.
Saving to a File

Simply use the Document.Save method with a file name as a string. Aspose.Words will infer the save format
from the file extension that you specify.
Example
Saves a document to a file.
Java
doc.save(getMyDir() + "Document.OpenFromFile Out.doc");

Saving to a Stream

You pass a stream object to the Document.Save method. When you save to a stream, you must specify the
save format explicitly using the SaveFormat enumeration.
Example
Shows how to save a document to a stream.
Java
Document doc = new Document(getMyDir() + "Document.doc");

ByteArrayOutputStream dstStream = new ByteArrayOutputStream();
doc.save(dstStream, SaveFormat.DOCX);

// If you want to read the result into a Document object again, in Java you need to get the
// data bytes and wrap them into an input stream.
ByteArrayInputStream srcStream = new ByteArrayInputStream(dstStream.toByteArray());
Document docFromStream = new Document(srcStream);

Specifying Save Options

There are Document.Save method overloads that accept a SaveOptions object. This should be an object of a
class derived from the SaveOptions class. Each save format has a corresponding class that holds save options
for that save format, for example there is PdfSaveOptions for the SaveFormat.Pdf save format.

More info about using save options is coming soon.
Example
Shows how to set save options before saving a document to HTML.
Java
Document doc = new Document(getMyDir() + "Rendering.doc");

// This is the directory we want the exported images to be saved to.
File imagesDir = new File(getMyDir(), "Images");

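// The folder specified needs to exist and should be empty.
// File.delete() cannot remove a folder that still contains files, so delete its contents instead.
if (imagesDir.exists())
{
File[] oldFiles = imagesDir.listFiles();
if (oldFiles != null)
for (File oldFile : oldFiles)
oldFile.delete();
}
else
{
imagesDir.mkdirs();
}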

// Set an option to export form fields as plain text, not as HTML input elements.
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.HTML);
options.setExportTextInputFormFieldAsText(true);
options.setImagesFolder(imagesDir.getPath());

doc.save(getMyDir() + "Document.SaveWithOptions Out.html", options);

Working with Digital Signatures

A digital signature is used to authenticate a document to establish that the sender of the document is who they
say they are and the content of the document has not been tampered with.

Aspose.Words supports documents with digital signatures and provides access to them, allowing you to detect
digital signatures on a document. At present, digital signatures are supported on DOC,
OOXML and ODT documents.
Digital Signatures are not Preserved on Open and Save
An important point to note is that a document loaded and then saved using Aspose.Words will lose any
digital signatures signed on the document. This is by design, as a digital signature ensures that the content
has not been modified and furthermore authenticates the identity of who signed the document. These
principles would be invalidated if the original signatures were carried over to the resulting document.

Because of this, if you process documents uploaded to a server, you may unknowingly strip the signatures
from those documents. It is therefore best to check for digital signatures on a document and take the
appropriate action if any are found; for example, an alert can be sent to the client informing them that the
document they are passing contains digital signatures which will be lost if it is processed.
Example
Shows how to check a document for digital signatures before loading it into a Document object.
Java
// The path to the document which is to be processed.
String filePath = getMyDir() + "Document.Signed.docx";

FileFormatInfo info = FileFormatUtil.detectFileFormat(filePath);
if (info.hasDigitalSignature())
{
System.out.println(java.text.MessageFormat.format(
"Document {0} has digital signatures, they will be lost if you open/save this document with Aspose.Words.",
new File(filePath).getName()));
}

The code above uses the FileFormatUtil.DetectFileFormat method to detect whether a document contains
digital signatures without loading the document first. This provides an efficient and safe way to check a
document for signatures before processing it. When executed, the method returns a FileFormatInfo
object which provides the property FileFormatInfo.HasDigitalSignature.
This property returns true if the document contains one or more digital signatures. It's important to note
that this method does not validate the signatures; it only determines whether signatures are present. Validating
and signing documents will be supported in Aspose.Words for Java sometime in the future.
Digital Signatures on Macros (VBA Projects)
Digital signatures on macros cannot be accessed or signed. This is because Aspose.Words does not
directly deal with macros in a document. However, digital signatures on macros are preserved when
exporting the document back to any Word format. These signatures can be preserved on VBA code
because the binary content of the macros is not changed even if the document itself is modified.
How to Convert a Document to PDF

To convert a document to PDF simply invoke the Document.Save method and specify a file name with the
.pdf extension.
Example
Converts a whole document from DOC to PDF using default options.
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.save(getMyDir() + "Document.Doc2PdfSave Out.pdf");


getMyDir() is a function that returns the path to the directory which holds the input document for the
example. In other words, the constructor is taking an absolute file path to the document on disk.


How to Convert a Document to MHTML and Email

Aspose.Words can save any document in MHTML (Web Archive) format. This makes it very easy to use
Aspose.Words and Aspose.Email together to generate email messages with rich content. For example, you can
load a predefined DOC, OOXML or RTF document into Aspose.Words, fill it with data, save as MHTML and then
convert to any mail format supported by Aspose.Email.
Download the complete Save MHTML and Email sample's source code.
Example
Shows how to save any document from Aspose.Words as MHTML and create an Outlook MSG file from it
using Aspose.Email.
Java
// Load the document into Aspose.Words.
String srcFileName = dataDir + "DinnerInvitationDemo.doc";
Document doc = new Document(srcFileName);

// Save to an output stream in MHTML format.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
doc.save(outputStream, SaveFormat.MHTML);

// Load the MHTML stream back into an input stream for use with Aspose.Email.
ByteArrayInputStream inputStream = new
ByteArrayInputStream(outputStream.toByteArray());

// Create an Aspose.Email MIME email message from the stream.
MailMessage message = MailMessage.load(inputStream, MessageFormat.getMht());
message.setFrom(new MailAddress("your_from@email.com"));
message.getTo().add("your_to@email.com");
message.setSubject("Aspose.Words + Aspose.Email MHTML Test Message");

// Save the message in Outlook MSG format.
message.save(dataDir + "Message Out.msg",
MailMessageSaveType.getOutlookMessageFormat());

How to Convert an Image to PDF

You can download the complete source code of the ImageToPdf sample here.
This article shows how to create a PDF document from an image using Aspose.Words. While converting
images to PDF is not a main feature of Aspose.Words, this example shows how easy it is to do with
Aspose.Words.

To make this code work, you need to add the Aspose.Words JAR to your project. The javax.imageio and
java.awt.image packages used by the sample are part of the standard JDK.
The code below converts single-frame images, such as JPEG, PNG or BMP, as well as multi-frame
GIF images to PDF.
Example
Converts an image into a PDF document.
Java
package ImageToPdf;

import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.File;
import java.net.URI;

import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.BreakType;
import com.aspose.words.PageSetup;
import com.aspose.words.ConvertUtil;
import com.aspose.words.RelativeHorizontalPosition;
import com.aspose.words.RelativeVerticalPosition;
import com.aspose.words.WrapType;


class Program
{
public static void main(String[] args) throws Exception
{
// Sample infrastructure.
URI exeDir = Program.class.getResource("").toURI();
String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

convertImageToPdf(dataDir + "Test.jpg", dataDir + "TestJpg Out.pdf");
convertImageToPdf(dataDir + "Test.png", dataDir + "TestPng Out.pdf");
convertImageToPdf(dataDir + "Test.bmp", dataDir + "TestBmp Out.pdf");
convertImageToPdf(dataDir + "Test.gif", dataDir + "TestGif Out.pdf");
}

/**
* Converts an image to PDF using Aspose.Words for Java.
*
* @param inputFileName File name of input image file.
* @param outputFileName Output PDF file name.
*/
public static void convertImageToPdf(String inputFileName, String outputFileName)
throws Exception
{
// Create Aspose.Words.Document and DocumentBuilder.
// The builder makes it simple to add content to the document.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Load images from the disk using the appropriate reader.
// The file formats that can be loaded depend on the image readers available on the machine.
ImageInputStream iis = ImageIO.createImageInputStream(new File(inputFileName));
ImageReader reader = ImageIO.getImageReaders(iis).next();
reader.setInput(iis, false);

try
{
// Get the number of frames in the image.
int framesCount = reader.getNumImages(true);

// Loop through all frames.
for (int frameIdx = 0; frameIdx < framesCount; frameIdx++)
{
// Insert a section break before each new page, in case of a multi-frame image.
if (frameIdx != 0)
builder.insertBreak(BreakType.SECTION_BREAK_NEW_PAGE);

// Select active frame.
BufferedImage image = reader.read(frameIdx);

// We want the size of the page to be the same as the size of the image.
// Convert pixels to points to size the page to the actual image size.
PageSetup ps = builder.getPageSetup();

ps.setPageWidth(ConvertUtil.pixelToPoint(image.getWidth()));
ps.setPageHeight(ConvertUtil.pixelToPoint(image.getHeight()));

// Insert the image into the document and position it at the top left corner of the page.
builder.insertImage(
image,
RelativeHorizontalPosition.PAGE,
0,
RelativeVerticalPosition.PAGE,
0,
ps.getPageWidth(),
ps.getPageHeight(),
WrapType.NONE);
}
}

finally {
if (iis != null) {
iis.close();
reader.dispose();
}
}

// Save the document to PDF.
doc.save(outputFileName);
}
}

How to Convert a Document to EPUB

An EPUB document (short for electronic publication) is an HTML-based format commonly used for electronic book
distribution. This format is fully supported in Aspose.Words for exporting electronic books compatible with the
majority of reading devices. This article shows how to convert a simple Microsoft Word document to EPUB
with a few lines of code. It also demonstrates what a sample document looks like after being converted to
EPUB using Aspose.Words.
Converting a Document to EPUB
Example
Converts a document to EPUB using default save options.
Java
// Open an existing document from disk.
Document doc = new Document(getMyDir() + "Document.EpubConversion.doc");

// Save the document in EPUB format.
doc.save(getMyDir() + "Document.EpubConversion Out.epub");

Specifying Save Options
You can specify a number of options by passing an instance of HtmlSaveOptions to the
Document.Save(String, SaveOptions) method. The code snippet below shows a few of them in action.
Example
Converts a document to EPUB with save options specified.
Java
// Open an existing document from disk.
Document doc = new Document(getMyDir() + "Document.EpubConversion.doc");

// Create a new instance of HtmlSaveOptions. This object allows us to set options that control
// how the output document is saved.
HtmlSaveOptions saveOptions = new HtmlSaveOptions();

// Specify the desired encoding.
saveOptions.setEncoding(Charset.forName("UTF-8"));

// Specify at which elements to split the internal HTML. Each split starts a new HTML part within
// the EPUB, which limits the size of each part. This is useful for readers that cannot read
// HTML files larger than a certain size, e.g. 300 KB.
saveOptions.setDocumentSplitCriteria(DocumentSplitCriteria.HEADING_PARAGRAPH);

// Specify that we want to export document properties.
saveOptions.setExportDocumentProperties(true);

// Specify that we want to save in EPUB format.
saveOptions.setSaveFormat(SaveFormat.EPUB);

// Export the document as an EPUB file.
doc.save(getMyDir() + "Document.EpubConversion Out.epub", saveOptions);

A Sample Conversion
In the next few paragraphs we'll review the results of a sample document converted to EPUB format. The
screenshots below show the key features.
Since EPUB is a publishing format for electronic books, it's apparent that the most important features will
involve text. At a glance, we can see that the text and all key features in the EPUB output look identical to the
source document.
The picture below shows the key text formatting features after conversion to EPUB.

The wide range of paragraph formatting settings used in the following example is reproduced just as in the
source document.

The following picture shows that tables are rendered well despite their complexity.

Even complex lists from the source document are exported well to EPUB.

Images are essential for most publications and can be aligned differently on the screen.

This picture shows a table of contents generated from the source document, exported as inline text
with working hyperlinks. The same headings that make up the TOC in the source document are
exported to the navigation pane in the EPUB for easy navigation.

EPUB File Validation
The EPUB documents produced by Aspose.Words pass validation, which means that EPUB standards are
adhered to and there are no errors in the EPUB.
Even though passing validation doesn't guarantee that every device or EPUB viewer will display your
document in exactly the same way, it does give your document the best chance of being
viewed as close as possible to how it was originally intended.
The picture below shows the report from one of the validation services on the document we just converted.

Meta-data in EPUB Files
Meta-data is additional information, such as Author Name, Title, Comments, etc., added to a file that's not
visible in the content of the file itself.
Word document formats have special properties dedicated to such metadata, and these can be exported to
EPUB files as well. Metadata fields are often required by distributors and e-book stores as keywords for
their search engines and to provide information about books to customers.
The picture below shows the metadata after the conversion.


How to Detect the File Format

Sometimes it is necessary to detect the format of a document file before opening because the file extension
does not guarantee that the file content is appropriate.

For example, a document may be saved with the wrong extension or no extension at all. Therefore, if you are
not sure what the actual content of the file is and want to avoid throwing an exception, you can use the
FileFormatUtil.DetectFileFormat method. This is a static method that accepts either a file name or stream
object that contains the file data. The method returns a FileFormatInfo object that contains the detected
information about the file type.
Example
Shows how to use the FileFormatUtil class to detect the document format and other features of the
document.
Java
FileFormatInfo info = FileFormatUtil.detectFileFormat(getMyDir() + "Document.doc");
System.out.println("The document format is: " +
FileFormatUtil.loadFormatToExtension(info.getLoadFormat()));
System.out.println("Document is encrypted: " + info.isEncrypted());
System.out.println("Document has a digital signature: " + info.hasDigitalSignature());
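To see why content-based detection is more reliable than the file extension, the following plain-JDK sketch inspects only the first few bytes of a file for well-known signatures (OLE2 compound files, ZIP packages, RTF and PDF). It is an illustration, not how FileFormatUtil.detectFileFormat is implemented: the class name SignatureSniffer is hypothetical, and the sketch cannot distinguish DOCX from ODT or EPUB, because all of them are ZIP packages. Telling those apart requires looking inside the package, which a dedicated detector handles for you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Illustration only: guesses a container type from the leading bytes.
final class SignatureSniffer {
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    private static final byte[] ZIP = { 'P', 'K', 3, 4 };
    private static final byte[] RTF = { '{', '\\', 'r', 't', 'f' };
    private static final byte[] PDF = { '%', 'P', 'D', 'F' };

    private SignatureSniffer() {
    }

    /** Reads at most 8 bytes from the stream and classifies them. */
    static String sniff(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        return sniff(Arrays.copyOf(head, n));
    }

    /** Classifies the given leading bytes of a file. */
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2 compound file (e.g. DOC, DOT)";
        if (startsWith(head, ZIP))  return "ZIP package (e.g. DOCX, ODT, EPUB)";
        if (startsWith(head, RTF))  return "RTF";
        if (startsWith(head, PDF))  return "PDF";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because it reads at most eight bytes, a check like this is cheap, but a positive result does not prove that the rest of the file is valid.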

How to Check Format Compatibility
When you are dealing with multiple documents in various file formats, you may need to separate out
those files that can be processed by Aspose.Words from those that cannot. You may also want to know
why some of the documents cannot be processed.

If you attempt to load a file into a Document object and Aspose.Words cannot recognize the file format
or the format is not supported, Aspose.Words will throw an exception. You can catch those exceptions
and analyze them, but Aspose.Words also provides a specialized method that lets you quickly determine
the file format without loading the document and risking an exception.
This article describes how you can check the format compatibility of all files in the selected folder and
sort them by file format into appropriate subfolders.
Solution
To do this, we will work through the following steps in the code:
1. Get the collection of all files in the selected folder.
2. Loop through the collection.
3. For each file:
1. Check the file format.
2. Display the check results.
3. Move the file to the appropriate folder.
The following files are used in this sample. The file name is on the left and its description is on the right.
To test supported file formats:
Input Document Type
Test File (docx).docx Office Open XML WordprocessingML document without macros.
Test File (docm).docm Office Open XML WordprocessingML document with macros.
Test File (doc).doc Microsoft Word 97 - 2003 document.
Test File (rtf).rtf Rich Text Format document.
Test File (dot).dot Microsoft Word 97 - 2003 template
Test File (dotx).dotx Office Open XML WordprocessingML template.
Test File (HTML).html HTML document.
Test File (MHTML).mhtml MHTML (Web archive) document.
Test File (WordML).xml Microsoft Word 2003 WordprocessingML document.
Test File (odt).odt OpenDocument Text format (OpenOffice Writer).
Test File (XML).xml FlatOPC OOXML Document.
To test encrypted documents:
Input Document Type
Test File (enc).doc Encrypted Microsoft Word 97 - 2003 document.
Test File (enc).docx Encrypted Office Open XML WordprocessingML document.
Unsupported file formats:
Input Document Type
Test File (pre97).doc Microsoft Word 95 document.
Test File (JPG).jpg JPEG image file.
The Code
As we're dealing with the contents of a folder, the first thing we need to do is get the collection of all
files in this folder using the listFiles method of the java.io.File class:
Example
Get the list of all files in the dataDir folder.
Java
File[] fileList = new java.io.File(dataDir).listFiles();

When all the files are collected, the rest of the work is done by a single method within the Aspose.Words
component FileFormatUtil.DetectFileFormat. The FileFormatUtil.DetectFileFormat method checks
the file format, but note that it only checks the file format, it does not validate the file format. This means
that there is no guarantee that the file will be opened even if FileFormatUtil.DetectFileFormat returns
that it is one of the supported formats. This is because the FileFormatUtil.DetectFileFormat method
reads only partial data of the file format, enough to check the file format, but not enough for complete
validation.

The following code loops through the collected list of files, checks the file format of each file, displays
them in the console and moves each file into the appropriate folder:
Example
Check each file in the folder and move it to the appropriate subfolder.
Java
// Loop through all found files.
for (File file : fileList)
{
if (file.isDirectory())
continue;

// Extract and display the file name without the path.
String nameOnly = file.getName();
System.out.print(nameOnly);

// Check the file format and move the file to the appropriate folder.
String fileName = file.getPath();
FileFormatInfo info = FileFormatUtil.detectFileFormat(fileName);

// Display the document type.
switch (info.getLoadFormat())
{
case LoadFormat.DOC:
System.out.println("\tMicrosoft Word 97-2003 document.");
break;
case LoadFormat.DOT:
System.out.println("\tMicrosoft Word 97-2003 template.");
break;
case LoadFormat.DOCX:
System.out.println("\tOffice Open XML WordprocessingML Macro-Free Document.");
break;
case LoadFormat.DOCM:
System.out.println("\tOffice Open XML WordprocessingML Macro-Enabled Document.");
break;
case LoadFormat.DOTX:
System.out.println("\tOffice Open XML WordprocessingML Macro-Free Template.");
break;
case LoadFormat.DOTM:
System.out.println("\tOffice Open XML WordprocessingML Macro-Enabled Template.");
break;
case LoadFormat.FLAT_OPC:
System.out.println("\tFlat OPC document.");
break;
case LoadFormat.RTF:
System.out.println("\tRTF format.");
break;
case LoadFormat.WORD_ML:
System.out.println("\tMicrosoft Word 2003 WordprocessingML format.");
break;
case LoadFormat.HTML:
System.out.println("\tHTML format.");
break;
case LoadFormat.MHTML:
System.out.println("\tMHTML (Web archive) format.");
break;
case LoadFormat.ODT:
System.out.println("\tOpenDocument Text.");
break;
case LoadFormat.OTT:
System.out.println("\tOpenDocument Text Template.");
break;
case LoadFormat.DOC_PRE_WORD_97:
System.out.println("\tMS Word 6 or Word 95 format.");
break;
case LoadFormat.UNKNOWN:
default:
System.out.println("\tUnknown format.");
break;
}

// Now copy the document into the appropriate folder.
if (info.isEncrypted())
{
System.out.println("\tAn encrypted document.");
fileCopy(fileName, new File(encryptedDir, nameOnly).getPath());
}
else
{
switch (info.getLoadFormat())
{
case LoadFormat.DOC_PRE_WORD_97:
fileCopy(fileName, new File(pre97Dir, nameOnly).getPath());
break;
case LoadFormat.UNKNOWN:
fileCopy(fileName, new File(unknownDir, nameOnly).getPath());
break;
default:
fileCopy(fileName, new File(supportedDir, nameOnly).getPath());
break;
}
}
}


The files are placed into the appropriate subfolders by the additional method fileCopy, which copies each file
from the source location to the new location.
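The sample's own fileCopy method is not shown here. A minimal sketch with the same two String parameters, assuming Java 7 or later for java.nio.file, could look like this; the FileUtil class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; only the fileCopy(String, String) shape is taken
// from the calls in the sample above.
final class FileUtil {
    private FileUtil() {
    }

    /**
     * Copies the file at sourcePath to destPath. Missing parent folders of the
     * destination are created, and an existing destination file is overwritten.
     */
    static void fileCopy(String sourcePath, String destPath) throws IOException {
        Path dest = Paths.get(destPath);
        Path parent = dest.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.copy(Paths.get(sourcePath), dest, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

If you want the files to be moved rather than copied, Files.move with the same REPLACE_EXISTING option can be used in place of Files.copy.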
End Result
The sample moves all the files to subfolders and displays the following log:

How to Convert a Document to a Byte Array

This article shows how to serialize a Document object to obtain a byte array representing the Document, and
then how to deserialize the byte array to obtain a Document object again. This technique is often required
when storing a document in a database or preparing a Document for transmission across the web.

Please note that an Aspose.Words Document object cannot be serialized using built-in Java serialization
techniques; instead, it can be serialized using the method detailed below.
The Code
The simplest way to serialize a Document object is to first save it to a ByteArrayOutputStream
using the overload of the Document.Save method that accepts a stream and a SaveFormat.
The toByteArray method is then called on the stream, which returns an array of bytes representing the
document.

The chosen save format is important to ensure the highest fidelity is retained upon saving and reloading
into the Document object. For this reason, an OOXML format is suggested.
The steps above are then reversed to load the bytes back into a Document object.
Example
Shows how to convert a document object to an array of bytes and back into a document object again.
Java
// Load the document.
Document doc = new Document(getMyDir() + "Document.doc");

// Create a new memory stream.
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
// Save the document to stream.
doc.save(outStream, SaveFormat.DOCX);

// Convert the document to byte form.
byte[] docBytes = outStream.toByteArray();

// The bytes are now ready to be stored/transmitted.

// Now reverse the steps to load the bytes back into a document object.
ByteArrayInputStream inStream = new ByteArrayInputStream(docBytes);

// Load the stream into a new document object.
Document loadDoc = new Document(inStream);
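Sometimes the document bytes arrive as an InputStream rather than a byte array, for example from a network connection or a database driver. The following plain-JDK helper, whose class and method names are hypothetical, buffers such a stream into a byte array that can then be wrapped in a ByteArrayInputStream as in the example above. On Java 9 and later, InputStream.readAllBytes() does the same job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; useful on Java versions that lack InputStream.readAllBytes().
final class StreamBytes {
    private StreamBytes() {
    }

    /** Reads the stream to its end and returns all bytes read. The stream is not closed. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 at end of stream; otherwise the number of bytes read.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

The caller stays responsible for closing the source stream, which matches the advice earlier in this guide that the stream can be closed once the Document has been loaded.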

How to Load and Save a Document to a Database

You can download the complete source code of the DocumentInDB sample.
One of the tasks you may need to perform when working with documents is storing and retrieving
Document objects to and from a database. For example, this would be necessary if you were
implementing any type of content management system, where all previous versions of documents
need to be stored in a database. The ability to store documents in the database is also
extremely useful when your application provides a web-based service.

This sample shows how to store a document in a database and then load it back into a Document object
for further work. For the sake of simplicity, the file name is used as the key to store and fetch
documents from the database. The Documents table contains two columns. The first column, FileName, is stored
as a String and is used to identify the documents. The second column, FileContent, is stored as a BLOB
object, which stores the document object in byte form.
Solution
We will do the following to store, read and delete a document in the database:
Steps to Store a Document into the Database:
1. Save the source document into a ByteArrayOutputStream. This allows us to get the content of the
document as an array of bytes.
2. Store the array of bytes into a database field.
Steps to Read a Document from the Database:
1. Select the record that contains the document data as an array of bytes.
2. Load the array of bytes from the data record into a ByteArrayInputStream.
3. Create a Document object that will load the document from the input stream.
The following Word document is used in this sample:


The Code
Three methods are implemented in this sample; they are described in detail below:
The storeToDatabase method stores the Document object in the database.
The readFromDatabase method reads the stored Document object from the database.
The deleteFromDatabase method deletes the record containing the specified Document from the
database.
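The contract of these three methods can be sketched without a database. The following minimal sketch uses a hypothetical InMemoryDocumentStore class, with a HashMap standing in for the Documents table (FileName to FileContent), which can be handy as a test double. One assumed difference: storing the same file name twice overwrites the previous bytes, whereas the INSERT INTO used by the sample adds a second row (or fails, if FileName is a unique key).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the Documents table, mirroring
// storeToDatabase, readFromDatabase and deleteFromDatabase.
final class InMemoryDocumentStore {
    private final Map<String, byte[]> rows = new HashMap<>();

    // Like storeToDatabase: keep the document bytes under its file name.
    void store(String fileName, byte[] content) {
        rows.put(fileName, content.clone());
    }

    // Like readFromDatabase: fail loudly if there is no matching record.
    byte[] read(String fileName) {
        byte[] content = rows.get(fileName);
        if (content == null)
            throw new IllegalArgumentException(
                "Could not find any record matching the document \"" + fileName + "\" in the database.");
        return content.clone();
    }

    // Like deleteFromDatabase: returns true if a record was removed.
    boolean delete(String fileName) {
        return rows.remove(fileName) != null;
    }
}
```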
Example
Helper methods used to connect to and execute queries on a database.
Java
/**
* Utility function that creates a connection to the Database.
*/
public static void createConnection(String dataBasePath) throws Exception
{
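// Load a DB driver that is used by the demos.
// Note: the JDBC-ODBC bridge (sun.jdbc.odbc.JdbcOdbcDriver) was removed in Java 8;
// on newer JDKs, use a JDBC driver for your database instead.
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");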

// The path to the database on the disk.
File dataBase = new File(dataBasePath);

// Compose connection string.
String connectionString = "jdbc:odbc:DRIVER={Microsoft Access Driver (*.mdb)};" +
"DBQ=" + dataBase + ";UID=Admin";
// Create a connection to the database.
mConnection = DriverManager.getConnection(connectionString);
}

/**
* Executes a query on the database.
*/
protected static ResultSet executeQuery(String query) throws Exception
{
return createStatement().executeQuery(query);
}

/**
* Creates a new database statement.
*/
public static Statement createStatement() throws Exception
{
return mConnection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
ResultSet.CONCUR_READ_ONLY);
}

First, a connection to the database is created by calling the createConnection method. In this sample we
are using a Microsoft Access .mdb database to store an Aspose.Words document. A document is loaded
into memory and then converted into a byte array using the general method described in the previous section.
Storing a Document into the Database
Example
Stores the document to the specified database.
Java
public static void storeToDatabase(Document doc) throws Exception
{
// Save the document to an OutputStream object.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
doc.save(outputStream, SaveFormat.DOC);

// Get the filename from the document.
String fileName = new File(doc.getOriginalFileName()).getName();

// Create the SQL command.
String commandString = "INSERT INTO Documents (FileName, FileContent) VALUES(?, ?)";

// Prepare the statement to store the data into the database.
PreparedStatement statement = mConnection.prepareStatement(commandString);

// Add the parameter value for FileName.
statement.setString(1, fileName);

// Add the parameter value for FileContent.
statement.setBinaryStream(2, new ByteArrayInputStream(outputStream.toByteArray()),
outputStream.size());

// Execute the command. Connection.commit() throws an SQLException in auto-commit
// mode (the JDBC default), so commit explicitly only when auto-commit is off.
statement.execute();
if (!mConnection.getAutoCommit())
mConnection.commit();
}

The next step is the most important: specifying commandString, the SQL expression that does
all the work. To store the Document in the database, the INSERT INTO command is used, specifying the
table along with the values of the two record fields FileName and FileContent. To avoid
additional parameters, the file name is taken from the Document object itself. The FileContent field
is assigned the bytes from the ByteArrayOutputStream, which contains the binary representation of the
stored document.

The remaining lines of code execute the command, which stores the Aspose.Words document in the
database.
Retrieving a Document from the Database
Example
Retrieves and returns the document from the specified database using the filename as a key to fetch the
document.
Java
public static Document readFromDatabase(String fileName) throws Exception
{
// Create the SQL command. The file name is passed as a parameter rather than
// concatenated into the SQL, so names containing quotes are handled safely.
String commandString = "SELECT * FROM Documents WHERE FileName=?";

// Retrieve the results from the database.
PreparedStatement statement = mConnection.prepareStatement(commandString);
statement.setString(1, fileName);
ResultSet resultSet = statement.executeQuery();

// Move to the first record, and throw an exception if no matching record was found.
if (!resultSet.next())
throw new IllegalArgumentException(MessageFormat.format(
"Could not find any record matching the document \"{0}\" in the database.", fileName));

// The document is stored in byte form in the FileContent column.
// Retrieve these bytes of the first matching record to a new buffer.
byte[] buffer = resultSet.getBytes("FileContent");

// Wrap the bytes from the buffer into a new ByteArrayInputStream object.
ByteArrayInputStream newStream = new ByteArrayInputStream(buffer);

// Read the document from the input stream.
Document doc = new Document(newStream);

// Return the retrieved document.
return doc;

}

First, the SQL command SELECT * FROM is used to fetch the appropriate record based on the
file name.

The matching data is then returned from the database in a ResultSet object, using the connection created
at the start of the application. The ResultSet is checked to ensure that the requested record actually
exists. For the final steps, the process used in storing the document is reversed and the
bytes are deserialized and loaded back into a Document object. The Document object is then returned
from the method, and the caller saves it to disk under an appropriate name.
Deleting a Document from the Database
The final method is deleteFromDatabase. It is straightforward, as it does not manipulate the
Document object.
Example
Deletes the document from the database, using the file name to find the record.
Java
public static void deleteFromDatabase(String fileName) throws Exception
{
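// Create the SQL command, passing the file name as a parameter.
String commandString = "DELETE * FROM Documents WHERE FileName=?";

// Execute the command.
PreparedStatement statement = mConnection.prepareStatement(commandString);
statement.setString(1, fileName);
statement.executeUpdate();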
}

Using the existing connection, the SQL command DELETE * FROM is executed with the file name
identifying the record that contains the Document. The document is then deleted from the
database.

To illustrate all three methods we have called them consecutively:
Example
Stores the document to a database, then reads the same document back again, and finally deletes the
record containing the document from the database.
Java
// Store the document to the database.
storeToDatabase(doc);
// Read the document from the database and store the file to disk.
Document dbDoc = readFromDatabase(FILE_NAME);

// Save the retrieved document to disk.
String baseName = new File(FILE_NAME).getName();
String newFileName = baseName.substring(0, baseName.lastIndexOf(".")) + " from DB" +
baseName.substring(baseName.lastIndexOf("."));
dbDoc.save(dataDir + newFileName);

// Delete the document from the database.
deleteFromDatabase(FILE_NAME);
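The sample builds the output name with substring(lastIndexOf(".")), which throws a StringIndexOutOfBoundsException if FILE_NAME has no extension. A small JDK-only helper that keeps the base name and the extension apart, and tolerates a missing extension, could look like this; DbFileNames is a hypothetical name, not part of the sample.

```java
import java.io.File;

final class DbFileNames {
    // Hypothetical helper: "Reports/Test File.doc" -> "Test File from DB.doc".
    static String fromDbName(String path) {
        String name = new File(path).getName();
        int dot = name.lastIndexOf('.');
        // No extension (or a name such as ".hidden"): just append the suffix.
        if (dot <= 0)
            return name + " from DB";
        return name.substring(0, dot) + " from DB" + name.substring(dot);
    }
}
```

With this helper, the save call in the sample becomes dbDoc.save(dataDir + DbFileNames.fromDbName(FILE_NAME)).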

End Result
When the code above is executed, the following document is stored in and then retrieved from the database,
and the retrieved copy appears in the Data folder:

How to Load Plain Text (TXT) Files

You can download the complete source code of the LoadTxt sample.

Aspose.Words allows you to import plain text data the same way as other document formats, by using the
Document constructor.
Example
Loads a plain text file into an Aspose.Words.Document object.
Java
package LoadTxt;

import java.io.File;
import java.net.URI;

import com.aspose.words.Document;


class Program
{
public static void main(String[] args) throws Exception
{
// Sample infrastructure.
URI exeDir = Program.class.getResource("").toURI();
String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

// The encoding of the text file is automatically detected.
Document doc = new Document(dataDir + "LoadTxt.txt");

// Save as any Aspose.Words supported format, such as DOCX.
doc.save(dataDir + "LoadTxt Out.docx");
}
}

Text Import Features
Plain text is a basic format that does not require an advanced text processor to be viewed or edited.
However, some plain text files attempt to represent more complex formatting, such as lists and
indentation. For example, a list might be represented as a series of lines, each starting with the same
character.
Aspose.Words attempts to detect such features and load them into the new document as their equivalent
Microsoft Word features instead of as plain text.
The table below shows the key features of the text import engine:
Feature Details
Text encoding
The following encodings are supported:
Latin1.
BigEndianUnicode.
UTF-16.
UTF-7.
UTF-8.
Import of ordered lists
Arabic numbers followed by a dot or right parenthesis, e.g. 1. or 2). Multilevel lists are supported
only when a dot is used.
Uppercase or lowercase Latin letters followed by a dot or right parenthesis, e.g. a. or b).
Import of unordered lists
Unordered lists are imported from consecutive lines that start with any of the following
characters: *, --, o, .
Paragraph indentation
Left indent and first line indent are detected and imported for paragraphs that use an appropriate
number of space characters at the beginning of the paragraph.
Paragraph detection
Rules for detecting the start of a new paragraph:
The next line's left indent is not equal to the current paragraph's left indent.
An empty line starts a new paragraph.
Any detected list starts a new paragraph.
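To make the list rules in the table more concrete, here is a rough, hypothetical classifier for a single line of text. It is not Aspose.Words' actual import engine: it recognizes only '*', '-' and 'o' as bullets (an assumed subset of the characters listed above), and it does not check that list items appear on consecutive lines.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the list-marker rules; not the real import engine.
final class ListMarkers {
    enum Kind { ORDERED, UNORDERED, NONE }

    // "1." and "2)", multilevel "1.2." (dots only), or a Latin letter with "." or ")".
    private static final Pattern ORDERED = Pattern.compile(
        "\\s*(?:\\d+(?:\\.\\d+)*\\.|\\d+\\)|[A-Za-z][.)])\\s+\\S.*");

    // Assumed bullet subset: '*', '-' or 'o', followed by whitespace and text.
    private static final Pattern UNORDERED = Pattern.compile("\\s*[*\\-o]\\s+\\S.*");

    static Kind classify(String line) {
        if (ORDERED.matcher(line).matches()) return Kind.ORDERED;
        if (UNORDERED.matcher(line).matches()) return Kind.UNORDERED;
        return Kind.NONE;
    }
}
```

Note that "1.2) Item" is classified as NONE, matching the rule that multilevel numbering is recognized only with dots.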

Sample Conversion
Sample input (a plain text file)

Output Document
The result of loading the text file into Aspose.Words and saving it as a DOCX document is shown below.
Notice that the leading spaces are interpreted as indentation, and the lists are loaded as proper Word
lists.
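The paragraph detection rules from the table can be sketched as a loop over the input lines. This is a simplified, hypothetical illustration rather than the engine's real behavior: it measures indentation as the number of leading spaces, takes a paragraph's left indent from its first line (so first-line indents are not modeled), and uses a deliberately crude list-item check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the paragraph detection rules for plain text import.
final class TextParagraphs {
    // Counts leading spaces; assumes tabs were already expanded.
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') i++;
        return i;
    }

    // Crude list-item check: "1." / "1)" or "*" / "-" followed by whitespace.
    static boolean looksLikeListItem(String line) {
        return line.trim().matches("(?:\\d+[.)]|[*\\-])\\s+.*");
    }

    static List<String> split(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = null;
        int currentIndent = -1;
        for (String line : lines) {
            if (line.trim().isEmpty()) {          // rule: an empty line ends the paragraph
                if (current != null) paragraphs.add(current.toString());
                current = null;
                continue;
            }
            int indent = indentOf(line);
            boolean startNew = current == null
                || indent != currentIndent        // rule: the left indent changed
                || looksLikeListItem(line);       // rule: a list item starts a new paragraph
            if (startNew) {
                if (current != null) paragraphs.add(current.toString());
                current = new StringBuilder(line.trim());
                currentIndent = indent;
            } else {
                current.append(' ').append(line.trim()); // continuation line
            }
        }
        if (current != null) paragraphs.add(current.toString());
        return paragraphs;
    }
}
```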


Object Model Overview

This section describes the Aspose.Words Document Object Model's (DOM) main classes and relationships. By
using the classes of the Aspose.Words DOM, you can obtain detailed programmatic access to document
elements and formatting.
The Aspose.Words DOM is an in-memory representation of a Microsoft Word document. The
Aspose.Words DOM allows you to programmatically read, manipulate and modify a Word document's
content and formatting.
A sample document showing how it appears in Microsoft Word.

The tree of objects is created when the above document is read into the Aspose.Words DOM.

Document, Section, Paragraph, Table, Shape, Run and all other ellipses on this diagram are
Aspose.Words objects that represent Word document elements. The objects are organized into a tree. The
illustration also shows that the objects in the document tree have various properties.
The document tree in Aspose.Words follows the Composite Design Pattern:
All node classes ultimately derive from the Node class, the base class in the Aspose.Words DOM.
Nodes that can contain other nodes, for example Section and Paragraph, derive from the
CompositeNode class, which in turn derives from Node.
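The following miniature, written with hypothetical MiniNode, MiniLeaf and MiniComposite classes (not the Aspose.Words types), shows the shape of the Composite pattern described above: every element is a node, only composites can hold children, and code such as countNodes treats leaves and composites uniformly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every element of the tree is a node with a parent.
abstract class MiniNode {
    MiniComposite parent;
    final String type;

    MiniNode(String type) { this.type = type; }

    boolean isComposite() { return false; }
}

// A leaf, comparable in role to a Run.
class MiniLeaf extends MiniNode {
    final String text;

    MiniLeaf(String type, String text) { super(type); this.text = text; }
}

// A node that can contain other nodes; child management lives only here.
class MiniComposite extends MiniNode {
    final List<MiniNode> children = new ArrayList<>();

    MiniComposite(String type) { super(type); }

    @Override boolean isComposite() { return true; }

    <T extends MiniNode> T appendChild(T child) {
        child.parent = this;
        children.add(child);
        return child;
    }
}

final class CompositeDemo {
    // Counts a node and all of its descendants, treating leaves and composites uniformly.
    static int countNodes(MiniNode node) {
        int count = 1;
        if (node.isComposite())
            for (MiniNode child : ((MiniComposite) node).children)
                count += countNodes(child);
        return count;
    }
}
```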
Node Classes

When Aspose.Words reads a Word document into memory, objects of different types are created to represent
various document elements. Every run of text, paragraph, table and section is a node, and even the document
itself is a node. Aspose.Words defines a class for every type of document node.

The following illustration is a UML class diagram that shows inheritance between node classes of the
Aspose.Words Document Object Model (DOM). The names of abstract classes are in italics. Note that the
Aspose.Words DOM also contains non-node classes, such as Style, PageSetup and Font, that do not participate
in the inheritance and are not shown on this diagram.



The following table lists Aspose.Words node classes and their short descriptions.
Aspose.Words Class Category Description
Document Document A document object that, as the root of the document tree, provides access to the entire Word document.
Section Document A section object that corresponds to one section in a Word document.
Body Document A container for the main text of a section (main text story).
HeaderFooter Document A container for text of a particular header or footer inside a section.
GlossaryDocument Document Represents the root entry for a glossary document within a Word document.
BuildingBlock Document Represents a glossary document entry such as a Building Block, AutoText or an AutoCorrect entry.
Paragraph Text A paragraph of text, contains inline nodes.
Run Text A run of text with consistent formatting.
BookmarkStart Text A beginning of a bookmark marker.
BookmarkEnd Text An end of a bookmark marker.
FieldStart Text A special character that designates the start of a Word field.
FieldSeparator Text A special character that separates the field code from the field result.
FieldEnd Text A special character that designates the end of a Word field.
FormField Text A form field.
SpecialChar Text A special character that is not one of the more specific special character types.
Table Tables A table in a Word document.
Row Tables A row of a table.
Cell Tables A cell of a table row.
Shape Shapes An image, shape, textbox or an OLE object in a Word document.
GroupShape Shapes A group of shapes.
DrawingML Shapes Represents a DrawingML shape, picture, chart or diagram in the document.
Footnote Annotations A footnote or endnote in a Word document, contains text of the footnote.
Comment Annotations A comment in a Word document, contains text of the comment.
CommentRangeStart Annotations Denotes the start of a region of text which has a comment associated with it.
CommentRangeEnd Annotations Denotes the end of a region of text which has a comment associated with it.
SmartTag Markup Represents a smart tag around one or more inline structures within a paragraph.
CustomXmlMarkup Markup Represents Custom XML markup around certain structures in the document.
StructuredDocumentTag Markup Represents a structured document tag (content control) within a document.
OfficeMath Math Represents an Office math object such as a function, equation or matrix.


The following table lists Aspose.Words base node classes that help to form the class hierarchy.
Class Description
Node Abstract base class for all nodes of a Word document. Provides basic functionality of a child node.
CompositeNode Base class for nodes that can contain other nodes. Provides operations to access, insert, remove
and select child nodes.
Story Text of a Word document is stored in several stories (independent flows of text). This is a base
class for section-level stories: Body and HeaderFooter.
InlineStory Base class for inline-level nodes that can contain a story: Comment and Footnote.
Inline Base class for inline-level nodes that consist of a single run of text with font formatting.
DocumentBase Abstract base class for a main document and glossary document of a Word document.
Distinguish Nodes by NodeType
Although the class of a node is sufficient to distinguish different nodes from each other,
Aspose.Words provides the NodeType enumeration to simplify some API tasks, such as selecting nodes
of a specific type.

The type of each node can be obtained using the Node.NodeType property (getNodeType() in Java). This
property returns a NodeType enumeration value. For example, a paragraph node (represented by the Paragraph
class) returns NodeType.PARAGRAPH, a table node (represented by the Table class) returns NodeType.TABLE,
and so on.
Example
The following example shows how to use the NodeType enumeration.
Java
Document doc = new Document();

// Returns NodeType.DOCUMENT
int type = doc.getNodeType();

Logical Levels in a Document
This documentation sometimes refers to a group of node classes as belonging to a "level" in a document,
for example "block-level" or "inline-level" (also known as "inline") nodes. The distinction of levels in a
document is purely logical and is not explicitly expressed by inheritance or other means in the
Aspose.Words DOM.

The level of the node is used to describe where in the document tree the node would typically occur. The
following table lists the logical node levels, descriptions and the classes that belong to each level.
Node Level Classes Description
Document level
Section
The top-level Document node contains only Section objects. A Section is a container for stories
(independent flows of text) for the main text and optionally headers and footers.
Block level
Paragraph, Table, StructuredDocumentTag, CustomXmlMarkup
Tables and paragraphs are block-level elements and contain other elements. Custom markup nodes
can contain nested block-level nodes.
Inline level
Run, FormField, SpecialChar, FieldChar, FieldStart, FieldSeparator, FieldEnd, Shape, GroupShape,
Comment, Footnote, CommentRangeStart, CommentRangeEnd, DrawingML, SmartTag, StructuredDocumentTag,
CustomXmlMarkup, BookmarkStart and BookmarkEnd
Inline nodes occur inside a Paragraph and represent the actual content of the document. Footnote,
Comment and Shape can contain block-level elements. Custom markup nodes can contain nested
inline-level elements.

Document Tree Navigation
Tree Overview

Aspose.Words represents a document as a tree of nodes. An integral feature of the tree is the ability to
navigate between the nodes. This section shows how to explore and navigate the document tree in
Aspose.Words.
When the sample fax document presented earlier is opened in DocumentExplorer (an example project that
is available on GitHub under "ViewersAndVisualizers"), it shows the tree of nodes exactly as it is represented in
Aspose.Words:


The nodes in the tree are said to have relationships between them. A node that contains another node is a
parent and the contained node is a child. Children of the same parent are sibling nodes. The Document
node is always the root node.
The nodes that can contain other nodes derive from the CompositeNode class and all nodes ultimately
derive from the Node class. The two base classes provide common methods and properties to navigate
and modify the tree structure.
The following UML class diagram shows the classes and methods we are going to explore in the
remainder of this topic:


The UML object diagram below shows several nodes of the fax sample document and how they are
connected to each other via the parent, child and sibling properties:


Parent Node
Each node has a parent that is specified by the Node.ParentNode property. A node does not have a
parent node (Node.ParentNode is null) when a node has just been created and not yet added to the
tree, or if it has been removed from the tree. You can remove a node from its parent by calling
Node.Remove.

The parent node of the root Document node is always null.
Example
Shows how to access the parent node.
Java
// Create a new empty document. It has one section.
Document doc = new Document();

// The section is the first child node of the document.
Node section = doc.getFirstChild();

// The section's parent node is the document.
System.out.println("Section parent is the document: " + (doc ==
section.getParentNode()));

Owner Document
It is important to mention that a node always belongs to a particular document, even if it was just
created or has been removed from the tree. The document to which the node belongs is returned by
the Node.Document property.
A node always belongs to a document, because some vital document-wide structures such as styles
and lists are stored in the Document node. For example, it is not possible to have a Paragraph
without a Document because each paragraph has a style assigned to it and the style is defined
globally for the document.
This rule is enforced when creating any new nodes. For instance, a new Paragraph to be added
directly to the DOM requires a document object to be passed to its constructor. This is the document
to which the paragraph belongs.
When creating a new paragraph using DocumentBuilder, the builder always has a Document object
linked to it through the DocumentBuilder.Document property.
Example
Shows that when you create any node, it requires a document that will own the node.
Java
// Create a new empty document.
Document doc = new Document();

// Creating a new node of any type requires a document passed into the constructor.
Paragraph para = new Paragraph(doc);

// The new paragraph node does not yet have a parent.
System.out.println("Paragraph has no parent node: " + (para.getParentNode() == null));

// But the paragraph node knows its document.
System.out.println("Both nodes' documents are the same: " + (para.getDocument() ==
doc));

// The fact that a node always belongs to a document allows us to access and modify
// properties that reference the document-wide data such as styles or lists.
para.getParagraphFormat().setStyleName("Heading 1");

// Now add the paragraph to the main text of the first section.
doc.getFirstSection().getBody().appendChild(para);

// The paragraph node is now a child of the Body node.
System.out.println("Paragraph has a parent node: " + (para.getParentNode() != null));

Child Nodes
The most efficient way to access child nodes of a CompositeNode is via the
CompositeNode.FirstChild and CompositeNode.LastChild properties, which return the first and last
child nodes respectively. If there are no child nodes, null is returned.
CompositeNode also provides the CompositeNode.ChildNodes collection that allows indexed or
enumerated access to the children. The CompositeNode.ChildNodes property is a live collection of
nodes. This means that whenever the document is changed (nodes removed or inserted), the
CompositeNode.ChildNodes collection is automatically updated. Node collections are discussed in
detail in later topics.
If a node has no children, then CompositeNode.ChildNodes returns an empty collection. You can
check whether a CompositeNode contains any child nodes using the CompositeNode.HasChildNodes
property.
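The difference between a live collection and a snapshot can be illustrated with standard JDK collections. This is only an analogy and assumes nothing about how NodeCollection is implemented: Collections.unmodifiableList returns a view that reflects later changes to the backing list, while copying into a new ArrayList freezes its contents at that moment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// JDK analogy for "live" versus snapshot collections.
final class LiveVsSnapshot {
    // Returns {size of live view, size of snapshot} after the backing list changes.
    static int[] sizesAfterAppend() {
        List<String> children = new ArrayList<>(Arrays.asList("Run", "Run"));
        List<String> live = Collections.unmodifiableList(children); // read-only view: sees later changes
        List<String> snapshot = new ArrayList<>(children);          // copy: frozen at this point
        children.add("Shape");                                      // the "document" changes
        return new int[] { live.size(), snapshot.size() };
    }
}
```

A practical consequence: code that removes nodes while looping over a live collection usually loops over a copy instead, because otherwise the collection changes underneath the loop.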
Example
Shows how to enumerate immediate children of a CompositeNode using the enumerator provided by the ChildNodes
collection.
Java
NodeCollection children = paragraph.getChildNodes();
for (Node child : (Iterable<Node>) children)
{
// Paragraph may contain children of various types such as runs, shapes and so on.
if (child.getNodeType() == NodeType.RUN)
{
// Say we found the node that we want, do something useful.
Run run = (Run)child;
System.out.println(run.getText());
}
}

Example
Shows how to enumerate immediate children of a CompositeNode using indexed access.
Java
NodeCollection children = paragraph.getChildNodes();
for (int i = 0; i < children.getCount(); i++)
{
Node child = children.get(i);

// Paragraph may contain children of various types such as runs, shapes and so on.
if (child.getNodeType() == NodeType.RUN)
{
// Say we found the node that we want, do something useful.
Run run = (Run)child;
System.out.println(run.getText());
}
}

Sibling Nodes
You can obtain the node immediately preceding or following a certain node using
Node.PreviousSibling and Node.NextSibling, respectively. If a node is the last child of its parent,
then the Node.NextSibling property is null. Conversely, if the node is the first child of its parent, the
Node.PreviousSibling property is null.

Note that because child nodes are internally stored in a singly linked list in Aspose.Words,
Node.NextSibling is more efficient than Node.PreviousSibling.
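Why NextSibling is cheaper than PreviousSibling follows from the singly linked list mentioned above. The following hypothetical ChainNode class (an illustration, not the Aspose.Words implementation) keeps only forward links: the next sibling is a single field access, while the previous sibling must be found by scanning from the parent's first child.

```java
// Hypothetical node whose children form a singly linked list.
final class ChainNode {
    final String name;
    ChainNode parent, firstChild, nextSibling;

    ChainNode(String name) { this.name = name; }

    // Appends by walking to the last child; a real implementation would likely cache it.
    ChainNode appendChild(ChainNode child) {
        child.parent = this;
        if (firstChild == null) {
            firstChild = child;
            return child;
        }
        ChainNode last = firstChild;
        while (last.nextSibling != null) last = last.nextSibling;
        last.nextSibling = child;
        return child;
    }

    // O(1): just follow the forward link.
    ChainNode getNextSibling() { return nextSibling; }

    // O(n): there is no backward link, so scan from the parent's first child.
    ChainNode getPreviousSibling() {
        if (parent == null || parent.firstChild == this) return null;
        ChainNode node = parent.firstChild;
        while (node.nextSibling != this) node = node.nextSibling;
        return node;
    }
}
```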
Example
Shows how to efficiently visit all direct and indirect children of a composite node.
Java
public void recurseAllNodes() throws Exception
{
// Open a document.
Document doc = new Document(getMyDir() + "Node.RecurseAllNodes.doc");

// Invoke the recursive function that will walk the tree.
traverseAllNodes(doc);
}

/**
* A simple function that will walk through all children of a specified node
recursively
* and print the type of each node to the screen.
*/
public void traverseAllNodes(CompositeNode parentNode) throws Exception
{
// This is the most efficient way to loop through immediate children of a node.
for (Node childNode = parentNode.getFirstChild(); childNode != null; childNode =
childNode.getNextSibling())
{
// Do some useful work.
System.out.println(Node.nodeTypeToString(childNode.getNodeType()));

// Recurse into the node if it is a composite node.
if (childNode.isComposite())
traverseAllNodes((CompositeNode)childNode);
}
}

Typed Access to Children and Parent
So far, we have discussed the properties that return one of the base types, Node or CompositeNode.
You will have noticed that you might have to cast the values to the concrete class of the node, such
as Run or Paragraph.
Extensive casting or explicit type conversion is often considered a code smell in object-oriented
code. However, casting is not always bad; sometimes a bit of casting is necessary. We found that you
cannot completely avoid casting when working with an object model that is a Composite, like the
Aspose.Words DOM.
To reduce the need for casting, most Aspose.Words classes provide properties and collections
that allow strictly typed access. There are three basic patterns for typed access:
A parent node exposes typed FirstXXX and LastXXX properties. For example, Document has
Document.FirstSection and Document.LastSection properties. Similarly, Table has Table.FirstRow
and Table.LastRow properties and so on.
A parent node exposes a typed collection of child nodes, for example Document.Sections,
Body.Paragraphs and so on.
A child node provides typed access to its parent, for example Run.ParentParagraph,
Paragraph.ParentSection etc.
Typed properties are merely useful shortcuts that sometimes allow easier access than the generic
properties inherited from Node.ParentNode and CompositeNode.FirstChild .
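Typed collections such as Body.Paragraphs essentially move the cast from the caller into the library. The idea can be sketched in plain Java with a generic filter; the ofType helper below is hypothetical and uses Class.isInstance and Class.cast so that the call site needs no explicit casts.

```java
import java.util.ArrayList;
import java.util.List;

final class TypedAccess {
    // Hypothetical helper: returns the elements that are instances of type, already cast.
    static <T> List<T> ofType(List<?> children, Class<T> type) {
        List<T> result = new ArrayList<>();
        for (Object child : children)
            if (type.isInstance(child))
                result.add(type.cast(child));
        return result;
    }
}
```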
Example
Demonstrates how to use typed properties to access nodes of the document tree.
Java
// Quick typed access to the first child Section node of the Document.
Section section = doc.getFirstSection();

// Quick typed access to the Body child node of the Section.
Body body = section.getBody();

// Quick typed access to all Table child nodes contained in the Body.
TableCollection tables = body.getTables();

for (Table table : tables)
{
// Quick typed access to the first row of the table.
if (table.getFirstRow() != null)
table.getFirstRow().remove();

// Quick typed access to the last row of the table.
if (table.getLastRow() != null)
table.getLastRow().remove();
}

Design Patterns in Aspose.Words

For a better understanding of the Aspose.Words object model, the design patterns used in the public interfaces
are described here. Links to online descriptions of the patterns are provided where possible, but for the
most thorough coverage, see the Gang of Four (GoF) book if the pattern is one of theirs.
Document Object Model is a Composite
General Composite related ideas:
Node is the base class for all nodes.
CompositeNode is the base class for composite nodes.
In our implementation, the base Node class does not have the child management methods in its interface.
The child management methods appear only in CompositeNode.
We found that removing the child management methods from the base class made the interfaces much
cleaner and did not introduce much extra type casting.
See also the description of the Composite pattern on Wikipedia.
Aspose.Words specific:
Many methods and properties of Node and CompositeNode were designed to be similar to
XmlDocument , XmlNode and XmlElement intentionally to help shorten the learning curve.
The Aspose.Words.Document class is the root node for a complete Word document.
A node always belongs to a Document even if it is "detached" from the tree and does not have a parent
node. This is needed because the node might have some formatting properties that are valid only in the
context of a specific Document .
When moving or copying nodes between different documents you need to use Document.ImportNode
before you can insert a node from a different document.
CompositeNode.ChildNodes , CompositeNode.GetChildNodes return NodeCollection , which is a
wrapper that represents a selection of nodes as a live collection.
Document.Sections , Section.HeadersFooters , Story.Paragraphs and so on are typed-wrapper collections
that derive from NodeCollection and provide typed access to a selection of nodes of a specific type.
DocumentBuilder is a Builder for a Composite
Generally, it is easy to work with the document tree directly, inserting and removing nodes where you want
them.
Example
Creates and adds a paragraph node.
Java
Document doc = new Document();

Paragraph para = new Paragraph(doc);

Section section = doc.getLastSection();
section.getBody().appendChild(para);

However, there are cases where creating a document element directly is not so straightforward and it is
better to have some utility that will do the creation for you. For example to create a Word field several
nodes need to be inserted, and you should make sure they are all in an appropriate state: FieldStart, Run
for the field code, FieldSeparator, one or more Run nodes for the field result and FieldEnd. Inserting a
form field is even more complex; it needs a complete Word field as well as FormField, BookmarkStart
and BookmarkEnd nodes.

DocumentBuilder is the tool that makes the process of building a document simpler. There are two
groups of methods: to move the cursor to a node where you want to do the building, and to insert
something at the cursor.
Although DocumentBuilder does not exactly fulfill the intent of the Builder pattern (the builder pattern
is used to enable the creation of a variety of complex objects from one source object), we still call it
Builder because that is what it does.
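To illustrate why a builder helps here, the hypothetical MiniBuilder below records node types as strings instead of creating real nodes. A single insertField call emits the whole FieldStart, code Run, FieldSeparator, result Run, FieldEnd sequence described above, so callers cannot leave a field half-built. In Aspose.Words itself, this job is done by DocumentBuilder.insertField.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder: each insertXxx call appends every node the element needs, in order.
final class MiniBuilder {
    private final List<String> nodes = new ArrayList<>();

    MiniBuilder insertRun(String text) {
        nodes.add("Run(" + text + ")");
        return this;
    }

    // A field is five nodes that must stay consistent: start, code, separator, result, end.
    MiniBuilder insertField(String code, String result) {
        nodes.add("FieldStart");
        nodes.add("Run(" + code + ")");
        nodes.add("FieldSeparator");
        nodes.add("Run(" + result + ")");
        nodes.add("FieldEnd");
        return this;
    }

    List<String> build() {
        return new ArrayList<>(nodes);
    }
}
```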
Range is a Facade for a Composite
A text document with a complex structure and formatting such as a Microsoft Word document is hard to
represent in an easy and user-friendly object model.

We chose to represent it as a tree of nodes because it gives the users of Aspose.Words what they want:
detailed access to the document content in a reasonably familiar environment (an XmlDocument-like API),
and it makes it possible for us to actually implement it (unlike an API similar to Microsoft Word Automation,
which we wanted initially).
Therefore, you have the tool to examine and modify Word files, but it turns out that some operations on "flat
text" are quite hard to do with a "tree model". Such seemingly easy things as find and replace, or deleting a
paragraph or a section break, can require significant effort to traverse the tree, split and join tree nodes
and so on.
The Range class (although still in its infancy) is designed to hide the "tree look" of the model behind a
"flat text" interface. For example, Range provides find and replace functionality that can search and
replace across different Run, Paragraph, Table etc. nodes, and it hides a lot behind the scenes as it has
to cut, move and join nodes of the tree as it goes. We think Range is clearly a Facade pattern.
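The core trick behind a "flat text" facade can be sketched with a hypothetical FlatTextView that treats each run as a plain string: it concatenates the run texts, searches the combined text, and maps the match back to the runs it spans. A real Range must also split and join nodes to perform a replacement, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "flat text" view over a list of run texts.
final class FlatTextView {
    private final StringBuilder flat = new StringBuilder();
    private final List<Integer> runStarts = new ArrayList<>();

    FlatTextView(List<String> runs) {
        for (String run : runs) {
            runStarts.add(flat.length()); // offset where this run begins in the flat text
            flat.append(run);
        }
    }

    // Index of the last run that starts at or before the given flat-text offset.
    int runAt(int offset) {
        int index = 0;
        for (int i = 0; i < runStarts.size(); i++)
            if (runStarts.get(i) <= offset)
                index = i;
        return index;
    }

    // Returns {firstRun, lastRun} spanned by the first match, or null if there is no match.
    int[] findRuns(String text) {
        int start = text.isEmpty() ? -1 : flat.indexOf(text);
        if (start < 0)
            return null;
        return new int[] { runAt(start), runAt(start + text.length() - 1) };
    }
}
```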
More Facades for Various Document Elements
Bookmark is a Facade that allows you to work with two nodes BookmarkStart and BookmarkEnd as a
single entity.
DocumentVisitor is a Visitor
The Visitor pattern is famous for allowing new operations to be added to an existing object model
without modifying that model.

Just derive from DocumentVisitor and override the VisitXXX methods, such as
DocumentVisitor.VisitParagraphStart and DocumentVisitor.VisitRun, that receive the calls for the
desired nodes. Then call Node.Accept on the node from which you want to start the enumeration; the visitor
is called for that node and all of its child nodes. You can even return a value from your VisitXXX methods
to indicate how the enumeration should continue.
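The mechanics can be shown without the library. The stdlib-only sketch below is a hypothetical miniature of the pattern, not the Aspose.Words classes:

- Node.accept drives the traversal.
- The Visitor base class has overridable visit methods.
- The Action enum plays the role of the VisitorAction values that Aspose.Words visit methods return.

TextCollector then adds a new operation (text extraction) without changing any node class.

```java
import java.util.ArrayList;
import java.util.List;

public class VisitorDemo {

    // Stand-in for the values a visit method returns to steer the traversal.
    enum Action { CONTINUE, SKIP_THIS_NODE, STOP }

    interface Node {
        // Returns false if the visitor asked to stop the whole traversal.
        boolean accept(Visitor v);
    }

    // Base visitor: every method is a no-op that continues; override only what you need.
    static class Visitor {
        Action visitParagraphStart(Paragraph p) { return Action.CONTINUE; }
        Action visitParagraphEnd(Paragraph p) { return Action.CONTINUE; }
        Action visitRun(Run r) { return Action.CONTINUE; }
    }

    static final class Run implements Node {
        final String text;

        Run(String text) { this.text = text; }

        @Override
        public boolean accept(Visitor v) { return v.visitRun(this) != Action.STOP; }
    }

    static final class Paragraph implements Node {
        final List<Node> children = new ArrayList<>();

        Paragraph(Node... kids) { children.addAll(List.of(kids)); }

        @Override
        public boolean accept(Visitor v) {
            Action a = v.visitParagraphStart(this);
            if (a == Action.STOP) return false;
            if (a == Action.CONTINUE) { // SKIP_THIS_NODE skips the children
                for (Node child : children) {
                    if (!child.accept(v)) return false;
                }
            }
            return v.visitParagraphEnd(this) != Action.STOP;
        }
    }

    // A new operation added without touching Run or Paragraph.
    static final class TextCollector extends Visitor {
        final StringBuilder text = new StringBuilder();

        @Override
        Action visitRun(Run r) { text.append(r.text); return Action.CONTINUE; }

        @Override
        Action visitParagraphEnd(Paragraph p) { text.append('\n'); return Action.CONTINUE; }
    }

    public static void main(String[] args) {
        Paragraph p1 = new Paragraph(new Run("Hello, "), new Run("world."));
        Paragraph p2 = new Paragraph(new Run("Second paragraph."));
        TextCollector collector = new TextCollector();
        for (Paragraph p : List.of(p1, p2)) p.accept(collector);
        System.out.print(collector.text);
    }
}
```

Returning STOP from any visit method ends the whole traversal. Returning SKIP_THIS_NODE from visitParagraphStart skips that paragraph's runs. Whether the matching end method is still called is a choice made by this sketch, not documented Aspose.Words behavior.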
We also extensively use DocumentVisitor ourselves:
All export converters inside Aspose.Words (DOC, HTML and PDF) are implemented as document visitors.
The internal field and bookmark finders and the revision-accepting engine are all implemented as
document visitors.
Document Overview
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

The Document class is central to Aspose.Words. It represents a document and provides various document
properties and methods, such as saving or protecting the document.

Whatever you want to do with Aspose.Words (create a new document from scratch, open a template for
mail merge, or get different parts of a document), use the Document class as your starting point. The
Document object contains all content and formatting, styles, built-in and custom properties, and the
MailMerge object that is used for mail merge.
Document allows you to retrieve text, bookmarks and form fields for the whole document or for
separate sections.
Document contains a collection of Section objects, so you can obtain a particular section or manipulate
sections, for example by copying or moving them.
Document can be saved at any time to a file or stream. A document can also be sent to a client
browser.
Aspose 2002-2014. All Rights Reserved.
Working with Document Properties
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

Document properties allow some useful information to be stored along with the document. There are system
(built-in) and user-defined (custom) properties. Built-in properties contain such things as the document title,
the author's name, document statistics, and so on. Custom properties are just name-value pairs where the user
defines both the name and the value.

You can use document properties in your document automation project to store useful information along with
the document, such as when the document was received, processed or time-stamped.
Accessing Document Properties in Microsoft Word
You can access document properties in Microsoft Word by using the File | Properties menu.

Accessing Document Properties in Aspose.Words
To access document properties in Aspose.Words do the following:
To obtain built-in document properties, use Document.BuiltInDocumentProperties.
To obtain custom document properties, use Document.CustomDocumentProperties.
Document.BuiltInDocumentProperties returns a Properties.BuiltInDocumentProperties object and
Document.CustomDocumentProperties returns a Properties.CustomDocumentProperties object.
Both objects are collections of Properties.DocumentProperty objects. These objects can be obtained
through the indexer (the get method in Java) either by name or by index. Properties.BuiltInDocumentProperties
additionally provides access to the document properties via a set of typed properties that return values of
the appropriate type. Properties.CustomDocumentProperties allows adding or removing document
properties from the document.
Example
Enumerates through all built-in and custom properties in a document.
Java
String fileName = getMyDir() + "Properties.doc";
Document doc = new Document(fileName);

System.out.println(MessageFormat.format("1. Document name: {0}", fileName));

System.out.println("2. Built-in Properties");
for (DocumentProperty prop : doc.getBuiltInDocumentProperties())
System.out.println(MessageFormat.format("{0} : {1}", prop.getName(),
prop.getValue()));

System.out.println("3. Custom Properties");
for (DocumentProperty prop : doc.getCustomDocumentProperties())
System.out.println(MessageFormat.format("{0} : {1}", prop.getName(),
prop.getValue()));

The Properties.DocumentProperty class allows you to get the name, value, and type of the document
property:
To get the name of a property, use Properties.DocumentProperty.Name.
To get the value of a property, use Properties.DocumentProperty.Value.
Properties.DocumentProperty.Value returns an Object, but there is a set of methods that return
the value of the property converted to a particular type.
To get the type of a property, use DocumentProperty.Type. This returns one of the PropertyType
enumeration values. Once you know the property's type, you can use one of the
DocumentProperty.ToXXX methods, such as DocumentProperty.ToString and DocumentProperty.ToInt,
to obtain the value as the appropriate type instead of getting DocumentProperty.Value.
Updating Built-In Document Properties
While Microsoft Word automatically updates some document properties when needed, Aspose.Words
never automatically changes any properties. For example, Microsoft Word updates the time the document
was last printed and last saved, as well as statistical properties (word, paragraph and character counts, etc.).

Aspose.Words does not update any properties automatically, but provides a method for updating some
statistical built-in document properties. Call the Document.UpdateWordCount method to recalculate
and update the BuiltInDocumentProperties.Characters,
BuiltInDocumentProperties.CharactersWithSpaces, BuiltInDocumentProperties.Words and
BuiltInDocumentProperties.Paragraphs properties in the BuiltInDocumentProperties collection. This
ensures they are synchronized with changes made after the document was opened or created.

Note that Aspose.Words never updates the BuiltInDocumentProperties.Lines and
BuiltInDocumentProperties.Pages properties.
Adding or Removing Document Properties
You cannot add or remove built-in document properties in Aspose.Words; you can only change their
values.

To add custom document properties in Aspose.Words, use CustomDocumentProperties.Add passing
the name of the new property and the value of the appropriate type. The method returns the newly created
DocumentProperty object.
Example
Checks if a custom property with a given name exists in a document and adds a few more custom document
properties.
Java
Document doc = new Document(getMyDir() + "Properties.doc");

CustomDocumentProperties props = doc.getCustomDocumentProperties();

if (props.get("Authorized") == null)
{
props.add("Authorized", true);
props.add("Authorized By", "John Smith");
props.add("Authorized Date", new Date());
props.add("Authorized Revision",
doc.getBuiltInDocumentProperties().getRevisionNumber());
props.add("Authorized Amount", 123.45);
}

To remove custom properties, use DocumentPropertyCollection.Remove passing it the name of the
property to remove.
Example
Removes a custom document property.
Java
Document doc = new Document(getMyDir() + "Properties.doc");

doc.getCustomDocumentProperties().remove("Authorized Date");

Cloning a Document
Added by hammad, last edited by Caroline von Schmalensee on Oct 29, 2013

If you need to generate hundreds or thousands of documents from a single document, load the document
into memory once, clone it, and populate the cloned document with your data. This speeds up the generation
of documents because there is no need to load and parse the document from file every time. Cloning is done
with the Document.Clone method (deepClone in Java), which performs a deep copy of the Document and returns it.
Example
Shows how to deep clone a document.
Java
Document doc = new Document(getMyDir() + "Document.doc");
Document clone = doc.deepClone();
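The load-once, clone-many workflow described above can be sketched without the library. In this stdlib-only toy, the hypothetical TemplateDoc class stands in for a parsed Document, and its deepClone method stands in for Document.deepClone. The {{Key}} placeholder syntax and the populate helper are illustrative assumptions, not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CloneTemplateDemo {

    // Hypothetical stand-in for a parsed document: a list of paragraph strings.
    static final class TemplateDoc {
        final List<String> paragraphs;

        TemplateDoc(List<String> paragraphs) {
            this.paragraphs = new ArrayList<>(paragraphs); // own, mutable copy
        }

        // Deep copy: strings are immutable, so copying the list copies the whole "document".
        TemplateDoc deepClone() {
            return new TemplateDoc(paragraphs);
        }

        // Fills hypothetical {{Key}} placeholders with the given values.
        void populate(Map<String, String> values) {
            for (int i = 0; i < paragraphs.size(); i++) {
                String text = paragraphs.get(i);
                for (Map.Entry<String, String> e : values.entrySet()) {
                    text = text.replace("{{" + e.getKey() + "}}", e.getValue());
                }
                paragraphs.set(i, text);
            }
        }
    }

    public static void main(String[] args) {
        // Load (parse) the template once...
        TemplateDoc template = new TemplateDoc(List.of("Dear {{Name}},", "Order {{Order}} has shipped."));

        List<Map<String, String>> rows = List.of(
                Map.of("Name", "Alice", "Order", "A-1"),
                Map.of("Name", "Bob", "Order", "B-2"));

        // ...then clone it per data row instead of re-reading the file.
        for (Map<String, String> values : rows) {
            TemplateDoc copy = template.deepClone();
            copy.populate(values);
            System.out.println(copy.paragraphs);
        }
        System.out.println(template.paragraphs); // the template itself is unchanged
    }
}
```

In main, the template is built once and every data row gets an independent copy. The template is still unchanged after the loop.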

Protecting Documents
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

When a document is protected, the user can make only limited changes, such as adding annotations, making
revisions, or completing a form.

Even if a document is protected with a password, Aspose.Words does not require the password to open,
modify or unprotect this document.
When you use Aspose.Words to protect a document, you have the option of keeping the existing
password or specifying a new password.
If you need to make sure the document is really protected from changes, consider digitally signing the
document. Aspose.Words supports detecting digital signatures for DOC, OOXML and ODT
documents. Aspose.Words also preserves a digital signature applied to the VBA project (macros)
contained in a document. For further details see the Working with Digital Signatures article.

Documents protected in Microsoft Word can be easily unprotected even by users without a password. When a
document is protected, it can be opened in Microsoft Word and saved as an RTF or WordprocessingML document,
and then the protection password can be removed using Notepad or any plain text editor. The user can then
open the document again in Microsoft Word and save it as an unprotected DOC.
Protecting a Document

Use the Document.Protect(ProtectionType) method to protect a document from changes. This method
accepts a ProtectionType parameter. To also set a password, use the Document.Protect(ProtectionType, String)
overload, which takes the password as its second parameter.
Example
Shows how to protect a document.
Java
Document doc = new Document();
doc.protect(ProtectionType.ALLOW_ONLY_FORM_FIELDS, "password");

Unprotecting a Document

Calling Document.Unprotect unprotects the document even if it has a protection password.
Example
Shows how to unprotect any document. Note that the password is not required.
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.unprotect();

Getting the Protection Type

You can retrieve the type of document protection by getting the value of the Document.ProtectionType
property.
Example
Shows how to get the protection type currently set in the document.
Java
Document doc = new Document(getMyDir() + "Document.doc");
int protectionType = doc.getProtectionType();

Accessing Styles
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

You can get a collection of styles defined in the document using the Document.Styles property. This collection
holds both the built-in and user-defined styles in a document. A particular style can be obtained by its
name/alias, style identifier, or index.

Styles and formatting are discussed in more detail later in this documentation.
Example
Shows how to get access to the collection of styles defined in the document.
Java
Document doc = new Document();
StyleCollection styles = doc.getStyles();

for (Style style : styles)
System.out.println(style.getName());

Getting Document Variables
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

You can get a collection of document variables using the Document.Variables property. Variable names and
values are strings.
Example
Shows how to enumerate over document variables.
Java
Document doc = new Document(getMyDir() + "Document.doc");

for (java.util.Map.Entry entry : doc.getVariables())
{
String name = entry.getKey().toString();
String value = entry.getValue().toString();

// Do something useful.
System.out.println(MessageFormat.format("Name: {0}, Value: {1}", name, value));
}

Manage Tracking Changes
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

This article outlines how Aspose.Words supports the Track Changes feature of Microsoft Word.

The Track Changes feature (also called Reviewing) in Microsoft Word allows you to track changes to content
and formatting made by users. When you turn this feature on, all inserted, deleted and modified elements of
the document will be visually highlighted with information about who, when and what was changed. The
objects that carry the information about what was changed are called "tracking changes" or "revisions".
The Comments feature in Microsoft Word is also related to tracking changes. It allows a user to add a
comment to any fragment of text in the document. Note that comments are completely independent
of tracked changes.
Aspose.Words Preserves Comments and Revisions
When you use Aspose.Words to open a Microsoft Word document and then save it, all comments and
revisions in the document are preserved.
Accept Revisions
The Document.AcceptAllRevisions method allows you to "accept" all revisions in the document. Calling
this method is similar to selecting "Accept All Changes" in Microsoft Word. Aspose.Words will actually
delete fragments that were "delete revisions", retain fragments that were "insert revisions" and apply
formatting changes. Note that comments are not affected during this operation.

In Aspose.Words, you can accept all tracked changes in the document by calling
Document.AcceptAllRevisions.
Example
Shows how to accept all tracking changes in the document.
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.acceptAllRevisions();


You can also check if a document has any tracking changes using the Document.HasRevisions property.
Programmatically Access Revisions
A Word document can contain insert, delete and formatting-change revisions. Aspose.Words allows
you to programmatically detect certain types of revisions.

The Inline.IsInsertRevision and Inline.IsDeleteRevision properties available for the Run and
Paragraph objects allow you to detect whether this object was inserted or deleted in Microsoft Word
while change tracking was enabled.
The Document.HasRevisions property returns true if the document has at least one revision.
The Document.TrackRevisions property indicates whether revision tracking will be enabled when the
document is opened in Microsoft Word; set it to true to enable tracking.

Note that this setting does not affect the changes made to the document using Aspose.Words. Changes made
to the document using Aspose.Words are never tracked as revisions.
Programmatically Access Comments
Comments are represented in the document tree as objects of the Comment class. You can add, delete or
modify comments programmatically like any other node in Aspose.Words Document Object Model.
Comment is a composite node and can contain paragraphs and tables that constitute the text of the
comment. The Comment class also provides access to the name and initials of the author of the comment.
Setting View Options
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

You can control a document's view when it is opened in Microsoft Word. For example, you may want to switch
to the print layout or change the zoom value. Use the Settings.ViewOptions property of the Document object
to set the view options.
Example
The following code shows how to make sure the document is displayed in page layout view at 50% zoom
when opened in Microsoft Word.
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.getViewOptions().setViewType(ViewType.PAGE_LAYOUT);
doc.getViewOptions().setZoomPercent(50);
doc.save(getMyDir() + "Document.SetZoom Out.doc");

Sections Overview
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

This topic discusses how to work programmatically with document sections using Aspose.Words. Working with
sections is very useful when it comes to document generation. You can combine documents, build up an
output document from several sections copied from multiple template documents or remove unneeded
sections depending on some application logic, effectively filtering a common template document to a specific
scenario.

A Word document can contain one or more sections. At the end of the section, there is a section break that
separates one section from the next in a document. Each section has its own set of properties that specify page
size, orientation, margins, the number of text columns, headers and footers and so on.
Sections in Microsoft Word
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

In Microsoft Word, you can easily split the document into sections by adding a section break in the place where
you want to start a new section. To join a section in the document with the next one, delete the
section break between them.
Inserting a Section Break in Microsoft Word
A section break is a mark you insert to show the end of a section. A section break stores the section
formatting elements, such as the margins, page orientation, headers and footers, and sequence of page
numbers.
Just insert section breaks to divide the document into sections, and then format each section the way you
want. For example, format a section as a single column for the introduction of a report, and then format
the following section as two columns for the report's body text.
To insert a section break, do the following:
1. Click where you want to insert a section break.
2. On the Insert menu, click Break .


3. Under Section break types, click the option that describes where you want the new section to begin.


The following types of section breaks can be inserted:
Next page inserts a section break and starts the new section on the next page.
Continuous inserts a section break and starts the new section on the same page.
Odd page or Even page inserts a section break and starts the new section on the next odd-numbered or
even-numbered page.
Deleting a Section Break in Microsoft Word
When you delete a section break, you also delete the section formatting for the text above it. That text
becomes part of the following section, and it assumes the formatting of that section.
1. Select the section break you want to delete.

If you are in print layout view or outline view and do not see the section break, display hidden text by
clicking Show/Hide on the Standard toolbar.
2. Press DELETE.
Sections in Aspose.Words
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

Sections of the document are represented by the Section and SectionCollection classes. Section objects are
immediate children of the Document node and can be accessed via the Document.Sections property.
Obtaining a Section
Each section is represented by a Section object that can be obtained from the Document.Sections
collection by the index.
Example
Shows how to access a section at the specified index.
Java
Document doc = new Document(getMyDir() + "Document.doc");
Section section = doc.getSections().get(0);

Adding a Section
The Document object provides the section collection that can be accessed by using Document.Sections.
This returns a SectionCollection object containing the document's sections. You can then use the
SectionCollection.Add method on this object to add a section to the end of the document.
Example
Shows how to add a section to the end of the document.
Java
Document doc = new Document(getMyDir() + "Document.doc");
Section sectionToAdd = new Section(doc);
doc.getSections().add(sectionToAdd);


The section that is to be added or inserted must not belong to an existing document. It must be either cloned
from another section or removed from a document before it can be added.
Deleting a Section
In the same way as discussed above, the document's sections are retrieved by using Document.Sections.
You can then use SectionCollection.Remove(Node) to remove a specified section or
SectionCollection.RemoveAt to remove a section at the specified index.
Example
Shows how to remove a section at the specified index.
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.getSections().removeAt(0);


In addition, you can use SectionCollection.Clear to remove all the sections from the document.
Example
Shows how to remove all sections from a document.
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.getSections().clear();

Adding Section Content
If you want to copy and insert just the main text of a section excluding the section separator and section
properties, use Section.PrependContent or Section.AppendContent passing a Section object for the
content being copied. No new section is created, and headers and footers are not copied. The former method
inserts a copy of the content at the beginning of the section, while the latter inserts a copy of the content at
the end of the section.
Example
Shows how to append content of an existing section. The number of sections in the document remains the
same.
Java
Document doc = new Document(getMyDir() + "Section.AppendContent.doc");

// This is the section that we will append and prepend to.
Section section = doc.getSections().get(2);

// This copies content of the 1st section and inserts it at the beginning of the
// specified section.
Section sectionToPrepend = doc.getSections().get(0);
section.prependContent(sectionToPrepend);

// This copies content of the 2nd section and inserts it at the end of the specified
// section.
Section sectionToAppend = doc.getSections().get(1);
section.appendContent(sectionToAppend);

Deleting Section Content
To delete the main text of a section, use Section.ClearContent.
Example
Shows how to delete main content of a section.
Java
Document doc = new Document(getMyDir() + "Document.doc");
Section section = doc.getSections().get(0);
section.clearContent();


To delete the headers and footers in a section, call Section.ClearHeadersFooters.
Example
Clears content of all headers and footers in a section.
Java
Document doc = new Document(getMyDir() + "Document.doc");
Section section = doc.getSections().get(0);
section.clearHeadersFooters();

Cloning a Section
Use the Section.Clone method (deepClone in Java) to create a duplicate of a particular section.
Example
Shows how to create a duplicate of a particular section.
Java
Document doc = new Document(getMyDir() + "Document.doc");
Section cloneSection = doc.getSections().get(0).deepClone();

Copying Sections between Documents
Fully or partially copying one document into another is a very popular task. Here is a "pattern" to
implement this.

Before any node from another document can be inserted, it must be imported using
Document.ImportNode. The Document.ImportNode method makes a copy of the original node and
updates all internal document-specific attributes such as lists and styles to make them valid in the
destination document.
Example
Shows how to copy sections between documents.
Java
Document srcDoc = new Document(getMyDir() + "Document.doc");
Document dstDoc = new Document();

Section sourceSection = srcDoc.getSections().get(0);
Section newSection = (Section)dstDoc.importNode(sourceSection, true);
dstDoc.getSections().add(newSection);


Sometimes it is necessary to avoid section breaks in the destination document. In this case, you can use
Section.AppendContent instead of SectionCollection.Add.
Tables Overview
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

Tables are a common element found in Word documents. They allow large amounts of information to be
organized and displayed cleanly in a grid-like structure with rows and columns. They are also frequently used as
a page layout tool, and they are a better alternative to tabbed data (with tab stops) because they allow much
better control over the design and layout of the content.

You can lay out content which is to be kept in a fixed position by using a borderless table. While you would
normally have plain text in a table, you can also put other content in cells, such as images or even other tables.
This is a common example of a table found in a Microsoft Word document:


A table is composed of elements such as Cell, Row and Column. These concepts are
common to all tables in general, whether they come from a Microsoft Word document or an HTML
document.


Tables in Aspose.Words are fully supported. You are able to freely edit, change, add and remove
tables. Rendering of tables with high fidelity is also supported.
Tables in Microsoft Word
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

All versions of Microsoft Word provide special commands for inserting and working with tables. The exact
location of these differs between older and newer versions of Microsoft Word but they are all present. These
are some of the more common tasks required when working with tables in Microsoft Word.
Inserting a Table in Microsoft Word
To insert a table in Microsoft Word 2003 and earlier:
1. Click the Table menu from the top toolbar.
2. Click Insert and then Table.
3. Fill in the appropriate values and click OK to insert the table.
To insert a table in Microsoft Word 2007 and later:
1. Click the Insert tab.
2. Choose the Tables drop down menu.
3. Select Insert Table.


4. Fill in the appropriate values and click OK to insert the table.


Removing a Table or Table Elements in Microsoft Word
To remove a table or individual table elements in Microsoft Word 2003 and earlier:
1. Click inside the table in the position that you want.
2. Click the Table menu from the top toolbar.
3. Click Delete.
4. Choose the menu item of the element you want to delete. For instance, choosing Table will remove the entire
table from the document.
To remove a table or individual table elements in Microsoft Word 2007 and later:
1. Click inside the table at the desired position.
2. The Layout tab should appear. Click this tab.
3. Click the Delete drop down menu.
4. Choose the menu item of the element you want to delete. For instance choosing Delete Table will remove
the entire table from the document.


Merging Cells in a Table in Microsoft Word
1. Select the cells to be merged by dragging the cursor over the cells.
2. Right click on the selection.
3. Select Merge Cells from the popup menu.


Using the AutoFit feature in Microsoft Word
To use the AutoFit feature to automatically size a table in Microsoft Word:
1. Right click anywhere inside the desired table.
2. Select AutoFit from popup menu.


3. Select the desired AutoFit option.


1. AutoFit to Contents fits the table around content.
2. AutoFit to Window resizes the table so it fills the available page width between the left and right
margins.
3. Fixed Column Width sets each column width to an absolute value. This means that even if the content
within the cells changes, the width of each column in the table stays the same.
Tables in Aspose.Words
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

A table from any document loaded into Aspose.Words is imported as a Table node. A table can be found as a
child of the main body of text, an inline story such as a comment or footnote, or within a cell as a nested table.
Furthermore, tables can be nested inside other tables up to any depth.

A Table node does not contain any real content - instead it is a container for other such nodes which make up
the content:
A Table contains many Row nodes. A Table exposes all the normal members of a node which allows
you to freely move, modify and remove the table in the document.
A Row represents a single row of a table and contains many Cell nodes. Additionally a Row provides
members which define how a row is displayed, for example the height and alignment.
A Cell is what contains the true content seen in a table and is made up of Paragraph and other block level
nodes. Additionally cells can contain further nested tables.
This relationship is best seen by inspecting the structure of a Table node in a document using
DocumentExplorer.


You can see in the diagram above that the document contains a table that consists of one row, which in
turn consists of two cells. Each of the two cells contains a paragraph, which is the container of the
formatted text in a cell. In Aspose.Words for Java, all table-related classes and properties are contained
in the com.aspose.words package, like all other classes of the library.
Notice also that the table is followed by an empty paragraph. It is a requirement for a Microsoft
Word document to have at least one paragraph after a table. This paragraph separates consecutive tables;
without it, such consecutive tables would be joined together into one. This behavior is identical in
both Microsoft Word and Aspose.Words.
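The containment rules described in this section can be modeled with plain Java. A Table holds Rows, a Row holds Cells, and a Cell holds Paragraphs and possibly nested Tables. The following stdlib-only sketch (Java 16+) uses hypothetical Table, Row, Cell and Paragraph types, not the Aspose.Words classes. It adds two recursive helpers, nestingDepth and collectText (also hypothetical), to show that content lives only in cells and that tables can nest to any depth.

```java
import java.util.ArrayList;
import java.util.List;

public class TableTreeDemo {

    // Hypothetical block-level node: a paragraph or a table.
    interface Block {}

    static final class Paragraph implements Block {
        final String text;
        Paragraph(String text) { this.text = text; }
    }

    // A cell holds the real content: paragraphs and, possibly, nested tables.
    static final class Cell {
        final List<Block> blocks = new ArrayList<>();
        Cell add(Block b) { blocks.add(b); return this; }
    }

    // A row is only a container of cells.
    static final class Row {
        final List<Cell> cells = new ArrayList<>();
        Row add(Cell c) { cells.add(c); return this; }
    }

    // A table holds no content itself, only rows.
    static final class Table implements Block {
        final List<Row> rows = new ArrayList<>();
        Table add(Row r) { rows.add(r); return this; }

        // 1 for a table without nested tables; each level of nesting adds 1.
        int nestingDepth() {
            int deepestChild = 0;
            for (Row row : rows)
                for (Cell cell : row.cells)
                    for (Block b : cell.blocks)
                        if (b instanceof Table nested)
                            deepestChild = Math.max(deepestChild, nested.nestingDepth());
            return 1 + deepestChild;
        }

        // Collects paragraph text depth-first, descending into nested tables.
        void collectText(List<String> out) {
            for (Row row : rows)
                for (Cell cell : row.cells)
                    for (Block b : cell.blocks) {
                        if (b instanceof Paragraph p) out.add(p.text);
                        else if (b instanceof Table t) t.collectText(out);
                    }
        }
    }

    public static void main(String[] args) {
        Table inner = new Table().add(new Row().add(new Cell().add(new Paragraph("Inner"))));
        Table outer = new Table().add(new Row()
                .add(new Cell().add(new Paragraph("Left")))
                .add(new Cell().add(new Paragraph("Right")).add(inner)));
        List<String> text = new ArrayList<>();
        outer.collectText(text);
        System.out.println("Depth: " + outer.nestingDepth() + ", text: " + text);
    }
}
```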
Creating Tables Overview
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

Aspose.Words provides several different methods to create new tables in a document. This article
presents the full details of how to insert formatted tables using each technique as well as a
comparison of each technique at the end of the article.

A newly created table is given defaults similar to those used in Microsoft Word:
Table Property | Default in Aspose.Words
Border Style | Single
Border Width | 1/2 pt
Border Color | Black
Left and Right Padding | 5.4 pts
AutoFit Mode | AutoFit to Window
Allow AutoFit | True

A table can be inline, where it is tightly positioned, or floating, where it can be positioned anywhere on
the page. By default, Aspose.Words always creates inline tables.
Inserting a Table using DocumentBuilder
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

In Aspose.Words, a table is normally inserted using DocumentBuilder. The following methods are used
to build a table and to insert content into its cells:
DocumentBuilder.startTable
DocumentBuilder.insertCell
DocumentBuilder.endRow
DocumentBuilder.endTable
DocumentBuilder.writeln
Operation                     Description
DocumentBuilder.startTable    Starts building a new table at the current cursor position. The table is created empty and has no rows or cells yet.
DocumentBuilder.insertCell    Inserts a new row and cell into the table.
DocumentBuilder.writeln       Writes some text into the current cell.
DocumentBuilder.insertCell    Appends a new cell at the end of the current row.
DocumentBuilder.writeln       Writes some text into the current cell (now the second cell).
DocumentBuilder.endRow        Instructs the builder to end the current row and to begin a new row with the next call to DocumentBuilder.insertCell.
DocumentBuilder.insertCell    Creates a new row and inserts a new cell.
DocumentBuilder.writeln       Inserts some text into the first cell of the second row.
DocumentBuilder.endTable      Finishes building the table. The builder cursor now points outside the table, ready to insert content after the table.
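The sequence of calls in the table above can be modeled as a small state machine. The following sketch is hypothetical and uses only the Java standard library; TableBuildTrace is not an Aspose.Words class, and it records only how many cells end up in each row. It also lets insertCell start a table implicitly, which is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the DocumentBuilder call sequence described above.
// It tracks only the number of cells in each row; it is not the Aspose.Words API.
class TableBuildTrace {
    private final List<Integer> cellsPerRow = new ArrayList<>();
    private boolean inTable;
    private boolean rowOpen;

    // Starts an empty table: no rows or cells exist yet.
    void startTable() {
        inTable = true;
    }

    // Appends a cell to the current row, first starting a new row if the
    // previous one was ended or if this is the first cell of the table.
    void insertCell() {
        inTable = true; // assumption: a cell may start a table implicitly
        if (!rowOpen) {
            cellsPerRow.add(0);
            rowOpen = true;
        }
        int last = cellsPerRow.size() - 1;
        cellsPerRow.set(last, cellsPerRow.get(last) + 1);
    }

    // Ends the current row; the next insertCell call begins a new row.
    void endRow() {
        rowOpen = false;
    }

    // Finishes the table and returns the cell count of each row.
    List<Integer> endTable() {
        if (!inTable) throw new IllegalStateException("No table is being built");
        inTable = false;
        rowOpen = false;
        return new ArrayList<>(cellsPerRow);
    }
}
```

Replaying the calls in the order listed in the table (two cells, endRow, one cell, endTable) yields the counts [2, 1]: two cells in the first row and one in the second.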

Example
Shows how to create a simple table using DocumentBuilder with default formatting.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// We call this method to start building the table.
builder.startTable();
builder.insertCell();
builder.write("Row 1, Cell 1 Content.");

// Build the second cell
builder.insertCell();
builder.write("Row 1, Cell 2 Content.");
// Call the following method to end the row and start a new row.
builder.endRow();

// Build the first cell of the second row.
builder.insertCell();
builder.write("Row 2, Cell 1 Content");

// Build the second cell.
builder.insertCell();
builder.write("Row 2, Cell 2 Content.");
builder.endRow();

// Signal that we have finished building the table.
builder.endTable();

// Save the document to disk.
doc.save(getMyDir() + "DocumentBuilder.CreateSimpleTable Out.doc");

The result of the above code is a table inserted into the document which contains four cells and some
text.

Example
Shows how to create a formatted table using DocumentBuilder
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Table table = builder.startTable();

// Make the header row.
builder.insertCell();

// Set the left indent for the table. Table wide formatting must be applied after
// at least one row is present in the table.
table.setLeftIndent(20.0);

// Set height and define the height rule for the header row.
builder.getRowFormat().setHeight(40.0);
builder.getRowFormat().setHeightRule(HeightRule.AT_LEAST);

// Some special features for the header row.
builder.getCellFormat().getShading().setBackgroundPatternColor(new Color(198, 217,
241));
builder.getParagraphFormat().setAlignment(ParagraphAlignment.CENTER);
builder.getFont().setSize(16);
builder.getFont().setName("Arial");
builder.getFont().setBold(true);

builder.getCellFormat().setWidth(100.0);
builder.write("Header Row,\n Cell 1");

// We don't need to specify the width of this cell because it's inherited from the previous cell.
builder.insertCell();
builder.write("Header Row,\n Cell 2");

builder.insertCell();
builder.getCellFormat().setWidth(200.0);
builder.write("Header Row,\n Cell 3");
builder.endRow();

// Set features for the other rows and cells.
builder.getCellFormat().getShading().setBackgroundPatternColor(Color.WHITE);
builder.getCellFormat().setWidth(100.0);
builder.getCellFormat().setVerticalAlignment(CellVerticalAlignment.CENTER);

// Reset height and define a different height rule for table body
builder.getRowFormat().setHeight(30.0);
builder.getRowFormat().setHeightRule(HeightRule.AUTO);
builder.insertCell();
// Reset font formatting.
builder.getFont().setSize(12);
builder.getFont().setBold(false);

// Build the other cells.
builder.write("Row 1, Cell 1 Content");
builder.insertCell();
builder.write("Row 1, Cell 2 Content");

builder.insertCell();
builder.getCellFormat().setWidth(200.0);
builder.write("Row 1, Cell 3 Content");
builder.endRow();

builder.insertCell();
builder.getCellFormat().setWidth(100.0);
builder.write("Row 2, Cell 1 Content");

builder.insertCell();
builder.write("Row 2, Cell 2 Content");

builder.insertCell();
builder.getCellFormat().setWidth(200.0);
builder.write("Row 2, Cell 3 Content.");
builder.endRow();
builder.endTable();

doc.save(getMyDir() + "DocumentBuilder.CreateFormattedTable Out.doc");

The result is a table which is formatted with some cell shading and different cell alignment.

Example
Shows how to insert a nested table using DocumentBuilder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Build the outer table.
Cell cell = builder.insertCell();
builder.writeln("Outer Table Cell 1");

builder.insertCell();
builder.writeln("Outer Table Cell 2");

// This call is important in order to create a nested table within the first table
// Without this call the cells inserted below will be appended to the outer table.
builder.endTable();

// Move to the first cell of the outer table.
builder.moveTo(cell.getFirstParagraph());

// Build the inner table.
builder.insertCell();
builder.writeln("Inner Table Cell 1");
builder.insertCell();
builder.writeln("Inner Table Cell 2");

builder.endTable();

doc.save(getMyDir() + "DocumentBuilder.InsertNestedTable Out.doc");

This will produce a table within another table. This is often referred to as a nested table.

Inserting a Table Directly into the Document Object Model
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

Additionally, you can insert tables directly into the DOM at a particular node position. The same table defaults
are used as when using a DocumentBuilder to create a table.
To build a new table from scratch without the use of DocumentBuilder, first create a new Table node
using the appropriate constructor, then add it to the document tree.

Note that the table will initially be completely empty (i.e., it contains no rows yet). To build the table, you
will first need to add the appropriate child nodes.
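The idea of filling in missing child nodes can be sketched with a plain Java tree. This is a hypothetical model, not Aspose.Words code: SketchNode and ensureMinimumTable are illustrative names, and the sketch only checks the first row and first cell, which is enough to turn an empty table into the smallest valid one (a row containing a cell containing a paragraph).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified node tree used only to illustrate the
// "table starts empty" rule. Names are illustrative, not Aspose.Words classes.
class SketchNode {
    final String type; // "TABLE", "ROW", "CELL" or "PARAGRAPH"
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String type) {
        this.type = type;
    }

    // Appends a child and returns it, so calls can be chained down the tree.
    SketchNode append(SketchNode child) {
        children.add(child);
        return child;
    }

    // Adds the smallest valid structure below a table: one row containing one
    // cell containing one paragraph. Existing children are left untouched.
    static void ensureMinimumTable(SketchNode table) {
        if (table.children.isEmpty()) table.append(new SketchNode("ROW"));
        SketchNode firstRow = table.children.get(0);
        if (firstRow.children.isEmpty()) firstRow.append(new SketchNode("CELL"));
        SketchNode firstCell = firstRow.children.get(0);
        if (firstCell.children.isEmpty()) firstCell.append(new SketchNode("PARAGRAPH"));
    }
}
```

Calling ensureMinimumTable a second time changes nothing, because every level already has a child.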
Example
Shows how to insert a table using the constructors of nodes.
Java
Document doc = new Document();

// We start by creating the table object. Note how we must pass the document object
// to the constructor of each node. This is because every node we create must belong
// to some document.
Table table = new Table(doc);
// Add the table to the document.
doc.getFirstSection().getBody().appendChild(table);

// Here we could call ensureMinimum to create the rows and cells for us. This method is used
// to ensure that the specified node is valid; in this case a valid table should have at least one
// row and one cell, therefore this method creates them for us.

// Instead we will handle creating the row and cell ourselves. This would be the best way to do this
// if we were creating a table inside an algorithm, for example.
Row row = new Row(doc);
row.getRowFormat().setAllowBreakAcrossPages(true);
table.appendChild(row);

// We can now apply any auto fit settings.
table.autoFit(AutoFitBehavior.FIXED_COLUMN_WIDTHS);

// Create a cell and add it to the row
Cell cell = new Cell(doc);
cell.getCellFormat().getShading().setBackgroundPatternColor(Color.BLUE);
cell.getCellFormat().setWidth(80);

// Add a paragraph to the cell as well as a new run with some text.
cell.appendChild(new Paragraph(doc));
cell.getFirstParagraph().appendChild(new Run(doc, "Row 1, Cell 1 Text"));

// Add the cell to the row.
row.appendChild(cell);

// We would then repeat the process for the other cells and rows in the table.
// We can also speed things up by cloning existing cells and rows.
row.appendChild(cell.deepClone(false));
row.getLastCell().appendChild(new Paragraph(doc));
row.getLastCell().getFirstParagraph().appendChild(new Run(doc, "Row 1, Cell 2 Text"));

doc.save(getMyDir() + "Table.InsertTableUsingNodes Out.doc");

Inserting a Clone of an Existing Table
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

Often you have an existing table in a document and would like to add a copy of this table, then apply some
modifications. The easiest way to duplicate a table while retaining all formatting is to clone the table node
using the Table.deepClone method.
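The difference between the deep clone used here, deepClone(true), and the shallow clone deepClone(false) used in the previous article can be shown with a small hypothetical tree in plain Java. CloneSketch is an illustrative class, not part of Aspose.Words; it copies only a label, whereas cloning a real node also carries over that node's formatting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node illustrating deep versus shallow copies, in the
// spirit of deepClone(true) and deepClone(false). Not an Aspose.Words class.
class CloneSketch {
    final String label;
    final List<CloneSketch> children = new ArrayList<>();

    CloneSketch(String label) {
        this.label = label;
    }

    // Adds a child and returns this node, so siblings can be chained.
    CloneSketch add(CloneSketch child) {
        children.add(child);
        return this;
    }

    // isCloneChildren = true copies the whole subtree; false copies only this node.
    CloneSketch copy(boolean isCloneChildren) {
        CloneSketch result = new CloneSketch(label);
        if (isCloneChildren) {
            for (CloneSketch child : children) {
                result.children.add(child.copy(true));
            }
        }
        return result;
    }

    // Counts this node and all of its descendants.
    int countNodes() {
        int count = 1;
        for (CloneSketch child : children) count += child.countNodes();
        return count;
    }
}
```

A deep copy is fully independent of the original: clearing its children does not affect the source table.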
Example
Shows how to make a clone of a table in the document and insert it after the original table.
Java
Document doc = new Document(getMyDir() + "Table.SimpleTable.doc");

// Retrieve the first table in the document.
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Create a clone of the table.
Table tableClone = (Table)table.deepClone(true);

// Insert the cloned table into the document after the original
table.getParentNode().insertAfter(tableClone, table);

// Insert an empty paragraph between the two tables or else they will be combined into one
// upon save. This has to do with document validation.
table.getParentNode().insertAfter(new Paragraph(doc), table);

doc.save(getMyDir() + "Table.CloneTableAndInsert Out.doc");


If the new table is to include different content, you will need to clear the existing content from the table
first.
Example
Shows how to remove all content from the cells of a cloned table.
Java
for (Cell cell : (Iterable<Cell>) tableClone.getChildNodes(NodeType.CELL, true))
cell.removeAllChildren();


The same technique can be used to add copies of an existing row to a table.
Example
Shows how to make a clone of the last row of a table and append it to the table.
Java
Document doc = new Document(getMyDir() + "Table.SimpleTable.doc");

// Retrieve the first table in the document.
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Clone the last row in the table.
Row clonedRow = (Row)table.getLastRow().deepClone(true);

// Remove all content from the cloned row's cells. This makes the row ready for
// new content to be inserted into.
for(Cell cell : clonedRow.getCells())
cell.removeAllChildren();

// Add the row to the end of the table.
table.appendChild(clonedRow);

doc.save(getMyDir() + "Table.AddCloneRowToTable Out.doc");


If you want to create tables in a document that grow dynamically with each record from your data
source, the above method is not advised.
Instead, the desired output is achieved more easily by using Mail Merge with Regions. You can learn more
about this technique under Mail Merge with Regions Explained.
Inserting a Table from HTML
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

Aspose.Words supports inserting content into a document from an HTML source by using the
DocumentBuilder.insertHtml method. The input can be a full HTML page or just a partial snippet. Using this
method, we can insert tables into our document by using table elements, e.g. <table>, <tr>, <td>.
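When the table content comes from data rather than a fixed string, the HTML passed to insertHtml can be generated. The helper below is a hypothetical sketch in plain Java (HtmlTableSketch is not an Aspose.Words class); it escapes &, < and > so that cell text is not mistaken for markup, and it accepts rows of different lengths.

```java
// Hypothetical helper that turns a 2D array of strings into
// <table>/<tr>/<td> markup of the kind passed to insertHtml.
class HtmlTableSketch {
    // Escapes '&' first so that the entities added afterwards stay intact.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Builds one <tr> per row and one <td> per cell.
    static String toHtmlTable(String[][] rows) {
        StringBuilder html = new StringBuilder("<table>");
        for (String[] row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(escape(cell)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</table>").toString();
    }
}
```

The returned string could then be passed to builder.insertHtml in the same way as the literal string in the example.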
Example
Shows how to insert a table in a document from a string containing HTML tags.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert the table from HTML. Note that AutoFitSettings does not apply to tables
// inserted from HTML.
builder.insertHtml("<table>" +
"<tr>" +
"<td>Row 1, Cell 1</td>" +
"<td>Row 1, Cell 2</td>" +
"</tr>" +
"<tr>" +
"<td>Row 2, Cell 1</td>" +
"<td>Row 2, Cell 2</td>" +
"</tr>" +
"</table>");

doc.save(getMyDir() + "DocumentBuilder.InsertTableFromHtml Out.doc");

Comparison of Insertion Techniques
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

As described in the previous articles, Aspose.Words provides several methods for inserting new tables
into a document. Each has its advantages and disadvantages, so the choice of which to use often
depends on your situation. The summary below gives an idea of each technique.

DocumentBuilder (DocumentBuilder.startTable)
Advantages: Standard method of inserting tables and other document content.
Disadvantages: It is sometimes hard to create many varieties of tables at the same time with the same
instance of the builder.

Table (the Table constructor)
Advantages: Fits in better with surrounding code that creates and inserts nodes directly into the DOM
without the use of DocumentBuilder.
Disadvantages: The table is created blank. Before most operations are performed, Table.ensureMinimum
must be called (or the child nodes added manually) to create any missing child nodes.

Cloning (Table.deepClone)
Advantages: Can create a copy of an existing table while retaining all formatting on rows and cells.
Disadvantages: The appropriate child nodes must be removed before the table is ready for use.

From an HTML source (DocumentBuilder.insertHtml)
Advantages: Can create a new table from HTML source, e.g. the <table>, <tr> and <td> tags.
Disadvantages: Not all possible formatting on a Microsoft Word table can be applied in HTML.
Formatting Overview
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

Different formatting can be applied to each element of a table. For instance, table formatting is applied
over the entire table, while row formatting affects only particular rows, and so on.

Aspose.Words provides a rich API to retrieve and apply formatting to a table. You can use the Table node
and the RowFormat and CellFormat objects to set formatting.
Applying Formatting on the Table Level
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

To apply formatting to a table, you can use the properties available on the corresponding Table node. Visual
overviews of table formatting features in Microsoft Word and their corresponding properties in Aspose.Words are
given below.



Example
Shows how to apply an outline border to a table.
Java
Document doc = new Document(getMyDir() + "Table.EmptyTable.doc");
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Align the table to the center of the page.
table.setAlignment(TableAlignment.CENTER);

// Clear any existing borders from the table.
table.clearBorders();

// Set a green border around the table but not inside.
table.setBorder(BorderType.LEFT, LineStyle.SINGLE, 1.5, Color.GREEN, true);
table.setBorder(BorderType.RIGHT, LineStyle.SINGLE, 1.5, Color.GREEN, true);
table.setBorder(BorderType.TOP, LineStyle.SINGLE, 1.5, Color.GREEN, true);
table.setBorder(BorderType.BOTTOM, LineStyle.SINGLE, 1.5, Color.GREEN, true);

// Fill the cells with a solid green color.
table.setShading(TextureIndex.TEXTURE_SOLID, Color.GREEN, Color.GREEN);

doc.save(getMyDir() + "Table.SetOutlineBorders Out.doc");

Example
Shows how to build a table with all borders enabled (grid).
Java
Document doc = new Document(getMyDir() + "Table.EmptyTable.doc");
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Clear any existing borders from the table.
table.clearBorders();

// Set a green border around and inside the table.
table.setBorders(LineStyle.SINGLE, 1.5, Color.GREEN);

doc.save(getMyDir() + "Table.SetAllBorders Out.doc");


Note that before you apply table properties, there must be at least one row present in the table. This means
that when building a table using DocumentBuilder, such formatting must be done after the first call to
DocumentBuilder.insertCell, and when inserting nodes directly into the DOM, it must be done after adding the
first row to the table.
Applying Formatting on the Row Level
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

Formatting on the row level is controlled using the RowFormat property of the Row, accessed with Row.getRowFormat().

Example
Shows how to modify formatting of a table row.
Java
Document doc = new Document(getMyDir() + "Table.Document.doc");
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Retrieve the first row in the table.
Row firstRow = table.getFirstRow();

// Modify some row level properties.
firstRow.getRowFormat().getBorders().setLineStyle(LineStyle.NONE);
firstRow.getRowFormat().setHeightRule(HeightRule.AUTO);
firstRow.getRowFormat().setAllowBreakAcrossPages(true);

Applying Formatting on the Cell Level
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

Formatting on the cell level is controlled using the CellFormat property of the Cell, accessed with Cell.getCellFormat().



Example
Shows how to modify formatting of a table cell.
Java
Document doc = new Document(getMyDir() + "Table.Document.doc");
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Retrieve the first cell in the table.
Cell firstCell = table.getFirstRow().getFirstCell();

// Modify some cell level properties.
firstCell.getCellFormat().setWidth(30); // in points
firstCell.getCellFormat().setOrientation(TextOrientation.DOWNWARD);
firstCell.getCellFormat().getShading().setForegroundPatternColor(Color.GREEN);

Applying Borders and Shading
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

Borders and shading can be applied either table-wide using Table.setBorder, Table.setBorders and
Table.setShading, or to particular cells only by using CellFormat.getBorders and CellFormat.getShading. Additionally,
borders can be set on a row by using RowFormat.getBorders; however, shading cannot be applied in this way.



Example
Shows how to format a table and its cells with different borders and shading.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Table table = builder.startTable();
builder.insertCell();

// Set the borders for the entire table.
table.setBorders(LineStyle.SINGLE, 2.0, Color.BLACK);
// Set the cell shading for this cell.
builder.getCellFormat().getShading().setBackgroundPatternColor(Color.RED);
builder.writeln("Cell #1");

builder.insertCell();
// Specify a different cell shading for the second cell.
builder.getCellFormat().getShading().setBackgroundPatternColor(Color.GREEN);
builder.writeln("Cell #2");

// End this row.
builder.endRow();

// Clear the cell formatting from previous operations.
builder.getCellFormat().clearFormatting();

// Create the second row.
builder.insertCell();

// Create larger borders for the first cell of this row. This will be different
// compared to the borders set for the table.
builder.getCellFormat().getBorders().getLeft().setLineWidth(4.0);
builder.getCellFormat().getBorders().getRight().setLineWidth(4.0);
builder.getCellFormat().getBorders().getTop().setLineWidth(4.0);
builder.getCellFormat().getBorders().getBottom().setLineWidth(4.0);
builder.writeln("Cell #3");

builder.insertCell();
// Clear the cell formatting from the previous cell.
builder.getCellFormat().clearFormatting();
builder.writeln("Cell #4");

doc.save(getMyDir() + "Table.SetBordersAndShading Out.doc");

Specifying Row Heights
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

The height of a table row is controlled using the height and height rule properties. These can be set
differently for each row in the table, which gives fine control over the height of each row.

In Aspose.Words these are represented by the RowFormat.Height and RowFormat.HeightRule
properties of the given Row (set in Java with RowFormat.setHeight and RowFormat.setHeightRule).
HeightRule Value                  Description
Auto (HeightRule.AUTO)            This is the default height rule given to a new row. Technically this means that no height rule is defined. The row is sized to fit the largest content within the cells of the row.
At Least (HeightRule.AT_LEAST)    With this setting the height of the row will grow to accommodate the content of the row, but will never be smaller than the size specified in RowFormat.Height.
Exactly (HeightRule.EXACTLY)      The size of the row is set exactly to the value found in RowFormat.Height and does not grow to fit content.
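The three rules can be summarized as a function of the specified height and the height the content needs. The sketch below is a hypothetical, simplified model in plain Java (HeightRuleSketch and RowHeightSketch are illustrative names, not the HeightRule class itself). All values are in points, and the sketch ignores cell padding, borders and other layout details.

```java
// Hypothetical stand-in for the three height rules described above.
enum HeightRuleSketch { AUTO, AT_LEAST, EXACTLY }

class RowHeightSketch {
    // Returns the height a row would end up with under each rule.
    static double effectiveHeight(HeightRuleSketch rule, double specifiedHeight,
                                  double contentHeight) {
        switch (rule) {
            case AUTO:
                // No rule: the row fits its content; the specified height is ignored.
                return contentHeight;
            case AT_LEAST:
                // Grows with the content but never drops below the specified height.
                return Math.max(specifiedHeight, contentHeight);
            case EXACTLY:
                // Fixed height that does not grow to fit the content.
                return specifiedHeight;
            default:
                throw new IllegalArgumentException("Unknown rule: " + rule);
        }
    }
}
```

For example, with a specified height of 40 pt, AT_LEAST gives 40 pt for 25 pt of content and 55 pt for 55 pt of content.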


The simplest way to set row height is using DocumentBuilder. Using the appropriate RowFormat properties
you can set a default height setting or apply a different height for each row in the table.
Example
Shows how to create a table that contains a single cell and apply row formatting.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Table table = builder.startTable();
builder.insertCell();

// Set the row formatting
RowFormat rowFormat = builder.getRowFormat();
rowFormat.setHeight(100);
rowFormat.setHeightRule(HeightRule.EXACTLY);
// These formatting properties are set on the table and are applied to all rows in the table.
table.setLeftPadding(30);
table.setRightPadding(30);
table.setTopPadding(30);
table.setBottomPadding(30);

builder.writeln("I'm a wonderful formatted row.");

builder.endRow();
builder.endTable();

Working with Table Styles
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

A table style defines a set of formatting that can be easily applied to a table. Formatting such as borders,
shading, alignment and font can be set in a table style and applied to many tables for a consistent appearance.
Aspose.Words supports applying a table style to a table and also reading properties of any table style.
Table styles are preserved during loading and saving in the following ways:
* Table styles in DOCX and WordML formats are preserved when loading and saving to these formats.

* Table styles are preserved when loading and saving in the DOC format (but not to any other format).

* When exporting to other formats, rendering or printing, table styles are expanded to direct formatting on
the table so all formatting is preserved.

Currently you cannot create new table styles. You can only apply a built-in table style, or a custom table
style which already exists in the document, to a table.


Applying a Table Style
In Aspose.Words you can apply a table style by using any of the Table.setStyle, Table.setStyleIdentifier and
Table.setStyleName methods.

You can also choose which features of the table style to apply, for example first column, last column and
banded rows. These are listed under the TableStyleOptions enumeration and are applied by using the
Table.setStyleOptions method. The TableStyleOptions values can be combined with the bitwise OR operator
to apply several features at once.
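Bitwise combination works because each option occupies its own bit. The sketch below shows the pattern with hypothetical constants in a class named StyleOptionFlags; the bit values are illustrative and are not the actual TableStyleOptions values.

```java
// Hypothetical flag constants; each option occupies its own bit, so options
// can be combined with | and tested with &. Not the real TableStyleOptions values.
final class StyleOptionFlags {
    static final int NONE = 0;
    static final int FIRST_ROW = 1;
    static final int LAST_ROW = 1 << 1;
    static final int FIRST_COLUMN = 1 << 2;
    static final int LAST_COLUMN = 1 << 3;
    static final int ROW_BANDS = 1 << 4;
    static final int COLUMN_BANDS = 1 << 5;

    private StyleOptionFlags() {
    }

    // True if every bit of 'flag' is set in 'options'.
    static boolean has(int options, int flag) {
        return (options & flag) == flag;
    }

    // Returns 'options' with the bits of 'flag' cleared.
    static int without(int options, int flag) {
        return options & ~flag;
    }
}
```

Combining FIRST_COLUMN | ROW_BANDS | FIRST_ROW, as in the example below, sets three independent bits that can each be tested or cleared without affecting the others.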
Example
Shows how to build a new table with a table style applied.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Table table = builder.startTable();
// We must insert at least one row first before setting any table formatting.
builder.insertCell();
// Set the table style used based on the unique style identifier.
// Note that not all table styles are available when saving as .doc format.
table.setStyleIdentifier(StyleIdentifier.MEDIUM_SHADING_1_ACCENT_1);
// Apply which features should be formatted by the style.
table.setStyleOptions(TableStyleOptions.FIRST_COLUMN | TableStyleOptions.ROW_BANDS |
TableStyleOptions.FIRST_ROW);
table.autoFit(AutoFitBehavior.AUTO_FIT_TO_CONTENTS);

// Continue with building the table as normal.
builder.writeln("Item");
builder.getCellFormat().setRightPadding(40);
builder.insertCell();
builder.writeln("Quantity (kg)");
builder.endRow();

builder.insertCell();
builder.writeln("Apples");
builder.insertCell();
builder.writeln("20");
builder.endRow();

builder.insertCell();
builder.writeln("Bananas");
builder.insertCell();
builder.writeln("40");
builder.endRow();

builder.insertCell();
builder.writeln("Carrots");
builder.insertCell();
builder.writeln("50");
builder.endRow();

doc.save(getMyDir() + "DocumentBuilder.SetTableStyle Out.docx");


Aspose.Words also provides a method, Document.expandTableStylesToDirectFormatting, which takes the formatting
found on a table style and expands it onto the rows and cells of the table as direct formatting. This method
will not override any formatting that is already applied directly to the table through row or cell formatting.
Example
Shows how to expand the formatting from styles onto the rows and cells of the table as direct formatting.
Java
Document doc = new Document(getMyDir() + "Table.TableStyle.docx");

// Get the first cell of the first table in the document.
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);
Cell firstCell = table.getFirstRow().getFirstCell();

// First print the color of the cell shading. This should be empty as the current shading
// is stored in the table style.
Color cellShadingBefore =
firstCell.getCellFormat().getShading().getBackgroundPatternColor();
System.out.println("Cell shading before style expansion: " + cellShadingBefore);

// Expand table style formatting to direct formatting.
doc.expandTableStylesToDirectFormatting();

// Now print the cell shading after expanding table styles. A blue background pattern color
// should have been applied from the table style.
Color cellShadingAfter =
firstCell.getCellFormat().getShading().getBackgroundPatternColor();
System.out.println("Cell shading after style expansion: " + cellShadingAfter);
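The precedence rule described above, where existing direct formatting is kept and only missing attributes are taken from the style, can be sketched with plain Java maps. StyleExpansionSketch and the attribute names are hypothetical; this models the rule only, not how Aspose.Words stores formatting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of expanding style formatting into direct formatting.
// Formatting is a map from attribute name to value; names are illustrative.
class StyleExpansionSketch {
    // Copies every style attribute that the element does not already set
    // directly. Direct formatting is never overwritten, and neither input
    // map is modified.
    static Map<String, String> expand(Map<String, String> styleFormatting,
                                      Map<String, String> directFormatting) {
        Map<String, String> result = new LinkedHashMap<>(directFormatting);
        for (Map.Entry<String, String> entry : styleFormatting.entrySet()) {
            result.putIfAbsent(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

If the style sets both shading and bold but the cell already sets bold directly, the expanded result takes the shading from the style and keeps the cell's own bold value.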

Extracting Plain Text from a Table
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

Like any other node in Aspose.Words, a Table has access to a Range object. Using this object, you can call
methods over the entire table range to extract the table as plain text. The Range.getText method is used for this
purpose.
Example
Shows how to print the text range of a table.
Java
Document doc = new Document(getMyDir() + "Table.SimpleTable.doc");

// Get the first table in the document.
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// The range text will include control characters such as "\a" for a cell.
// You can call toString(SaveFormat.TEXT) on the desired node to find the plain text.

// Print the plain text range of the table to the screen.
System.out.println("Contents of the table: ");
System.out.println(table.getRange().getText());
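If you need the individual cell values rather than the raw range text, the control characters can be used as separators. The helper below is a hypothetical sketch in plain Java. Following the comment in the example, it assumes that each cell's text ends with the "\a" character (char 7), and it trims surrounding whitespace such as paragraph breaks. Because consecutive markers are skipped, a genuinely empty cell is skipped too. Verify these assumptions against the ControlChar class documentation for your Aspose.Words version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for post-processing Range text; not an Aspose.Words API.
class RangeTextSketch {
    // Assumed cell-end marker ("\a", char 7).
    static final char CELL_END = (char) 7;

    static List<String> splitCells(String rangeText) {
        List<String> cells = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : rangeText.toCharArray()) {
            if (c == CELL_END) {
                // Consecutive markers (e.g. a cell end followed by a row end)
                // leave an empty segment, which is skipped.
                if (current.length() > 0) {
                    cells.add(current.toString().trim()); // trim drops '\r'
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        // Any text after the last marker is ignored.
        return cells;
    }
}
```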


The same technique can be used to extract the content of individual rows or cells of a table.
Example
Shows how to print the text range of row and cell elements.
Java
// Print the contents of the first row to the screen.
System.out.println("\nContents of the row: ");
System.out.println(table.getFirstRow().getRange().getText());

// Print the contents of the last cell in the table to the screen.
System.out.println("\nContents of the cell: ");
System.out.println(table.getLastRow().getLastCell().getRange().getText());

Replacing Text in a Table
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

Using a table's range object, you can replace text within the table. However, there are currently restrictions
which prevent replacements involving special characters, so care must be taken to ensure that the
replacement does not span more than one paragraph or cell. If a replacement is made that spans multiple
nodes, such as paragraphs or cells, an exception is thrown.

Normally, text replacement should be done at the cell level (per cell) or at the paragraph level.
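One way to respect this restriction is to check the strings before calling replace on a cell's range. The sketch below is hypothetical and uses plain Java strings in place of a Range. It assumes that paragraph breaks appear as '\r' and cell ends as char 7 in range text, and it rejects any search or replacement string containing them.

```java
// Hypothetical pre-check for a per-cell replacement; not an Aspose.Words API.
class ReplaceGuardSketch {
    private static final char PARAGRAPH_BREAK = '\r'; // assumed paragraph break
    private static final char CELL_END = (char) 7;    // assumed cell-end marker

    // True if the text stays within a single paragraph and cell.
    static boolean isSafe(String text) {
        return text.indexOf(PARAGRAPH_BREAK) < 0 && text.indexOf(CELL_END) < 0;
    }

    // Replaces text in one cell's text, refusing strings that would
    // cross a paragraph or cell boundary.
    static String replaceInCell(String cellText, String find, String replacement) {
        if (!isSafe(find) || !isSafe(replacement)) {
            throw new IllegalArgumentException(
                    "Search and replacement text must stay within one paragraph");
        }
        return cellText.replace(find, replacement);
    }
}
```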
Example
Shows how to replace all instances of a string of text in a table and cell.
Java
Document doc = new Document(getMyDir() + "Table.SimpleTable.doc");

// Get the first table in the document.
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Replace any instances of our string in the entire table.
table.getRange().replace("Carrots", "Eggs", true, true);
// Replace any instances of our string in the last cell of the table only.
table.getLastRow().getLastCell().getRange().replace("50", "20", true, true);

doc.save(getMyDir() + "Table.ReplaceCellText Out.docx");

Finding the Index of Table Elements
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

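Finding the index of any node involves gathering all child nodes of the element's type from the parent node,
then using the NodeCollection.indexOf method to find the index of the desired node in the collection. For
direct children, such as a row in a table or a cell in a row, you can call indexOf on the parent node itself.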
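The gather-then-search approach can be sketched with a plain Java tree. IndexSketchNode is a hypothetical class, not an Aspose.Words node. Its collect method walks the subtree in document order, similar in spirit to a deep getChildNodes(NodeType.TABLE, true) call, so a table nested inside a cell is counted between the tables before and after it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node tree showing the "collect, then indexOf" approach.
class IndexSketchNode {
    final String type;
    final List<IndexSketchNode> children = new ArrayList<>();

    IndexSketchNode(String type) {
        this.type = type;
    }

    // Appends a child and returns it, so nested structures can be built in a chain.
    IndexSketchNode add(IndexSketchNode child) {
        children.add(child);
        return child;
    }

    // Returns all descendants (not this node) of the given type, in document order.
    List<IndexSketchNode> collect(String wantedType) {
        List<IndexSketchNode> found = new ArrayList<>();
        collectInto(wantedType, found);
        return found;
    }

    // Pre-order traversal: a node is recorded before its own descendants.
    private void collectInto(String wantedType, List<IndexSketchNode> found) {
        for (IndexSketchNode child : children) {
            if (child.type.equals(wantedType)) found.add(child);
            child.collectInto(wantedType, found);
        }
    }
}
```

With a body containing a table (whose cell holds a nested table) followed by a second top-level table, collect("TABLE") returns three tables, and indexOf gives the nested one index 1 and the second top-level one index 2.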
Finding the Index of a Table in a Document
Example
Retrieves the index of a table in the document.
Java
NodeCollection allTables = doc.getChildNodes(NodeType.TABLE, true);
int tableIndex = allTables.indexOf(table);

Finding the Index of a Row in a Table
Example
Retrieves the index of a row in a table.
Java
int rowIndex = table.indexOf(row);

Finding the Index of a Cell in a Row
Example
Retrieves the index of a cell in a row.
Java
int cellIndex = row.indexOf(cell);

Working with Columns
Added by hammad, last edited by Adam Skelton on Oct 14, 2012

In both Word documents and the Aspose.Words Document Object Model, there is no concept of a column.
By design, table rows in Microsoft Word are completely independent, and the basic properties and operations
exist only on the rows and cells of the table. This gives tables some interesting attributes:
Each row in a table can have a completely different number of cells.
Vertically, the cells of each row can have different widths.
It is possible to join tables with differing row formats and cell counts.


Any operations performed on columns in Microsoft Word are in fact shortcut methods which
modify the cells of the rows collectively in such a way that they appear to be applied to columns.
Tables are represented in Aspose.Words with the same structure of rows and cells.
In the Aspose.Words Document Object Model, a Table node is made up of Row nodes, which in turn
contain Cell nodes. There is no native support for columns.
You can still perform such operations on columns by iterating through the same cell index of each row
of a table. The code below makes such operations easier by providing a facade class which collects the
cells that make up a column of a table.
Example
Demonstrates a facade object for working with a column of a table.
Java
import com.aspose.words.*;
import java.util.ArrayList;

/**
* Represents a facade object for a column of a table in a Microsoft Word document.
*/
class Column
{
private Column(Table table, int columnIndex)
{
if (table == null)
throw new IllegalArgumentException("table");

mTable = table;
mColumnIndex = columnIndex;
}

/**
* Returns a new column facade from the table and supplied zero-based index.
*/
public static Column fromIndex(Table table, int columnIndex)
{
return new Column(table, columnIndex);
}

/**
* Returns the cells which make up the column.
*/
public Cell[] getCells()
{
ArrayList columnCells = getColumnCells();
return (Cell[])columnCells.toArray(new Cell[columnCells.size()]);
}

/**
* Returns the index of the given cell in the column.
*/
public int indexOf(Cell cell)
{
return getColumnCells().indexOf(cell);
}

/**
* Inserts a brand new column before this column into the table.
*/
public Column insertColumnBefore()
{
Cell[] columnCells = getCells();

if (columnCells.length == 0)
throw new IllegalArgumentException("Column must not be empty");

// Create a clone of this column.
for(Cell cell : columnCells)
cell.getParentRow().insertBefore(cell.deepClone(false), cell);

// This is the new column.
Column column = new Column(columnCells[0].getParentRow().getParentTable(),
mColumnIndex);

// We want to make sure that the cells are all valid to work with (have at
// least one paragraph).
for (Cell cell : column.getCells())
cell.ensureMinimum();

// Increase the index which this column represents since there is now one
// extra column in front.
mColumnIndex++;

return column;
}

/**
* Removes the column from the table.
*/
public void remove()
{
for (Cell cell : getCells())
cell.remove();
}

/**
* Returns the text of the column.
*/
public String toTxt() throws Exception
{
StringBuilder builder = new StringBuilder();

for (Cell cell : getCells())
builder.append(cell.toString(SaveFormat.TEXT));

return builder.toString();
}

/**
* Provides an up-to-date collection of cells which make up the column represented
by this facade.
*/
private ArrayList getColumnCells()
{
ArrayList columnCells = new ArrayList();

for (Row row : mTable.getRows())
{
Cell cell = row.getCells().get(mColumnIndex);
if (cell != null)
columnCells.add(cell);
}

return columnCells;
}

private int mColumnIndex;
private Table mTable;
}
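Because rows are independent, getColumnCells simply takes the cell at the same index from every row and skips rows that are too short. The plain-Java sketch below models that behavior with rows as lists of strings (a hypothetical model, not Aspose.Words nodes), together with an analogue of insertColumnBefore, to show how a "column" behaves when rows have different cell counts.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnSketch {
    // Returns the cells at columnIndex from every row that is wide enough.
    // Rows with fewer cells are skipped, mirroring the null check in getColumnCells.
    static List<String> columnCells(List<List<String>> rows, int columnIndex) {
        List<String> cells = new ArrayList<>();
        for (List<String> row : rows) {
            if (columnIndex >= 0 && columnIndex < row.size())
                cells.add(row.get(columnIndex));
        }
        return cells;
    }

    // Inserts an empty cell before columnIndex in every row that has that cell,
    // much as insertColumnBefore inserts a cloned, emptied cell before each column cell.
    static void insertColumnBefore(List<List<String>> rows, int columnIndex) {
        for (List<String> row : rows) {
            if (columnIndex >= 0 && columnIndex < row.size())
                row.add(columnIndex, "");
        }
    }
}
```

A short row simply contributes nothing to a column it does not reach, which is why the facade never needs a separate column object in the document model.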

Example
Shows how to insert a blank column into a table.
Java
// Get the second column in the table.
Column column = Column.fromIndex(table, 1);

// Create a new column to the left of this column.
// This is the same as using the "Insert Column Before" command in Microsoft Word.
Column newColumn = column.insertColumnBefore();

// Add some text to each of the column cells.
for (Cell cell : newColumn.getCells())
cell.getFirstParagraph().appendChild(new Run(doc, "Column Text " +
newColumn.indexOf(cell)));

Example
Shows how to get the plain text of a table column.
Java
// Get the first column in the table.
Column column = Column.fromIndex(table, 0);

// Print the plain text of the column to the screen.
System.out.println(column.toTxt());

Example
Shows how to remove a column from a table in a document.
Java
Document doc = new Document(getMyDir() + "Table.Document.doc");
Table table = (Table)doc.getChild(NodeType.TABLE, 1, true);

// Get the third column from the table and remove it.
Column column = Column.fromIndex(table, 2);
column.remove();

doc.save(getMyDir() + "Table.RemoveColumn Out.doc");

Working with Merged Cells
Added by hammad, last edited by Caroline von Schmalensee on Feb 15, 2013

Several cells in a table can be merged together into a single cell. This is useful when rows require a title or large
blocks of text that span the width of the table, which can only be achieved by merging cells in the table
into a single cell. Aspose.Words supports merged cells when working with all input formats, including when
importing HTML content.
In Aspose.Words, merged cells are represented by CellFormat.HorizontalMerge and
CellFormat.VerticalMerge. The CellFormat.HorizontalMerge property describes if the cell is part
of a horizontal merge of cells. Likewise the CellFormat.VerticalMerge property describes if the cell
is a part of a vertical merge of cells.
The values of these properties define the merge behavior of cells:
The first cell in a sequence of merged cells has CellMerge.First.
Any subsequent merged cells have CellMerge.Previous.
A cell which is not merged has CellMerge.None.
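To make the First/Previous/None rule concrete, the sketch below groups the cells of one row into the visible merged spans implied by their horizontal merge values. The CellMergeFlag enum and the mergedSpans helper are hypothetical plain-Java stand-ins; in Aspose.Words itself the values are the CellMerge integer constants returned by CellFormat.getHorizontalMerge().

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the NONE / FIRST / PREVIOUS merge values.
enum CellMergeFlag { NONE, FIRST, PREVIOUS }

class MergeSpanSketch {
    // Returns {startIndex, cellCount} for each visible cell in the row.
    // FIRST and NONE each open a new span; every PREVIOUS cell extends the span before it.
    static List<int[]> mergedSpans(CellMergeFlag[] flags) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < flags.length; i++) {
            if (flags[i] == CellMergeFlag.PREVIOUS && !spans.isEmpty())
                spans.get(spans.size() - 1)[1]++;
            else
                spans.add(new int[] { i, 1 });
        }
        return spans;
    }
}
```

For the flags FIRST, PREVIOUS, PREVIOUS, NONE this yields two visible cells: one spanning three grid cells and one ordinary cell.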



Sometimes when you load an existing document, cells in a table will appear merged even though they are in
fact one long cell. Microsoft Word is known to export merged cells in this way at times. This can cause
confusion when attempting to work with individual cells, and there appears to be no particular pattern as to
when it happens.

Checking if a Cell is Merged
To check if a cell is part of a sequence of merged cells, we simply check the
CellFormat.HorizontalMerge and CellFormat.VerticalMerge properties.
Example: Getting the Merge Type
Prints the horizontal and vertical merge type of a cell.
Java
public void checkCellsMerged() throws Exception
{
Document doc = new Document(getMyDir() + "Table.MergedCells.doc");

// Retrieve the first table in the document.
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

for (Row row : table.getRows())
{
for (Cell cell : row.getCells())
{
System.out.println(printCellMergeType(cell));
}
}

}

// Requires java.text.MessageFormat to be imported.
public String printCellMergeType(Cell cell)
{
boolean isHorizontallyMerged = cell.getCellFormat().getHorizontalMerge() !=
CellMerge.NONE;
boolean isVerticallyMerged = cell.getCellFormat().getVerticalMerge() !=
CellMerge.NONE;
String cellLocation = MessageFormat.format("R{0}, C{1}",
cell.getParentRow().getParentTable().indexOf(cell.getParentRow()) + 1,
cell.getParentRow().indexOf(cell) + 1);

if (isHorizontallyMerged && isVerticallyMerged)
return MessageFormat.format("The cell at {0} is both horizontally and vertically merged.",
cellLocation);
else if (isHorizontallyMerged)
return MessageFormat.format("The cell at {0} is horizontally merged.",
cellLocation);
else if (isVerticallyMerged)
return MessageFormat.format("The cell at {0} is vertically merged.",
cellLocation);
else
return MessageFormat.format("The cell at {0} is not merged.", cellLocation);
}

Merging Cells in a Table
The same technique is used to set the merge behavior on the cells in a table. When building a table with
merged cells using DocumentBuilder, you need to set the appropriate merge type for each cell. You must also
remember to clear the merge setting afterwards; otherwise all subsequent cells in the table will become merged. This can
be done by setting the value of the appropriate merge property to CellMerge.None.
Example: Merging Cells Horizontally
Creates a table with two rows with cells in the first row horizontally merged.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertCell();
builder.getCellFormat().setHorizontalMerge(CellMerge.FIRST);
builder.write("Text in merged cells.");

builder.insertCell();
// This cell is merged to the previous and should be empty.
builder.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS);
builder.endRow();

builder.insertCell();
builder.getCellFormat().setHorizontalMerge(CellMerge.NONE);
builder.write("Text in one cell.");

builder.insertCell();
builder.write("Text in another cell.");
builder.endRow();
builder.endTable();

Example: Merging Cells Vertically
Creates a table with two columns with cells merged vertically in the first column.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertCell();
builder.getCellFormat().setVerticalMerge(CellMerge.FIRST);
builder.write("Text in merged cells.");

builder.insertCell();
builder.getCellFormat().setVerticalMerge(CellMerge.NONE);
builder.write("Text in one cell");
builder.endRow();

builder.insertCell();
// This cell is vertically merged to the cell above and should be empty.
builder.getCellFormat().setVerticalMerge(CellMerge.PREVIOUS);

builder.insertCell();
builder.getCellFormat().setVerticalMerge(CellMerge.NONE);
builder.write("Text in another cell");
builder.endRow();
builder.endTable();


There are different ways to start a table. The code snippets above use builder.insertCell(); another
method is to call builder.startTable(). Either approach starts a new table.
Read more: Inserting a Table using DocumentBuilder.
In other situations where a builder is not used, such as when working with an existing table, merging cells in
this way may not be as simple.
Instead, we can wrap the basic operations involved in applying merge properties to cells into a
method which makes the task much easier. This method is similar to the Merge method in Microsoft Word
Automation, which is called to merge a range of cells in a table.
The code below merges the range of cells in a table from the given start cell to the end cell. This
range can span multiple rows and columns.
Example: Merging all Cells in a Range
A method which merges all cells of a table in the specified range of cells.
Java
// Requires java.awt.Point and java.awt.Rectangle to be imported.
/**
* Merges the range of cells found between the two specified cells both horizontally
* and vertically. Can span over multiple rows.
*/
public static void mergeCells(Cell startCell, Cell endCell)
{
Table parentTable = startCell.getParentRow().getParentTable();

// Find the row and cell indices for the start and end cell.
Point startCellPos = new Point(startCell.getParentRow().indexOf(startCell),
parentTable.indexOf(startCell.getParentRow()));
Point endCellPos = new Point(endCell.getParentRow().indexOf(endCell),
parentTable.indexOf(endCell.getParentRow()));
// Create the range of cells to be merged based on these indices. Normalize the
// corners so that the end cell may come before the start cell.
Rectangle mergeRange = new Rectangle(Math.min(startCellPos.x, endCellPos.x),
Math.min(startCellPos.y, endCellPos.y),
Math.abs(endCellPos.x - startCellPos.x) + 1, Math.abs(endCellPos.y -
startCellPos.y) + 1);

for (Row row : parentTable.getRows())
{
for(Cell cell : row.getCells())
{
Point currentPos = new Point(row.indexOf(cell), parentTable.indexOf(row));

// Check if the current cell is inside our merge range then merge it.
if (mergeRange.contains(currentPos))
{
if (currentPos.x == mergeRange.x)
cell.getCellFormat().setHorizontalMerge(CellMerge.FIRST);
else
cell.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS);

if (currentPos.y == mergeRange.y)
cell.getCellFormat().setVerticalMerge(CellMerge.FIRST);
else
cell.getCellFormat().setVerticalMerge(CellMerge.PREVIOUS);
}
}
}
}
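The geometry inside mergeCells can be checked without a document. The sketch below applies the same rule to a plain grid: within the rectangle spanned by the two cells, the left-most column gets FIRST horizontally, the top row gets FIRST vertically, and every other cell gets PREVIOUS. MergeRangeSketch and its "H/V" string flags are hypothetical; unlike mergeCells, which leaves cells outside the range untouched, this model starts from an unmerged grid and reports those cells as NONE/NONE.

```java
class MergeRangeSketch {
    // Returns flags[row][col] as "horizontal/vertical", e.g. "FIRST/PREVIOUS",
    // after merging the rectangle between (startRow, startCol) and (endRow, endCol).
    static String[][] mergeFlags(int rows, int cols,
                                 int startRow, int startCol, int endRow, int endCol) {
        // Normalize the corners so the end cell may lie before the start cell.
        int top = Math.min(startRow, endRow), bottom = Math.max(startRow, endRow);
        int left = Math.min(startCol, endCol), right = Math.max(startCol, endCol);

        String[][] flags = new String[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                boolean inside = r >= top && r <= bottom && c >= left && c <= right;
                if (!inside) {
                    flags[r][c] = "NONE/NONE";
                    continue;
                }
                String horizontal = (c == left) ? "FIRST" : "PREVIOUS";
                String vertical = (r == top) ? "FIRST" : "PREVIOUS";
                flags[r][c] = horizontal + "/" + vertical;
            }
        }
        return flags;
    }
}
```

For the usage example below (cells [2][2] to [3][3]), the top-left cell of the range is FIRST in both directions and the bottom-right cell is PREVIOUS in both.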

Example: Merging Cells between Two Cells
Merges the range of cells between the two specified cells.
Java
// We want to merge the range of cells found in between these two cells.
Cell cellStartRange = table.getRows().get(2).getCells().get(2);
Cell cellEndRange = table.getRows().get(3).getCells().get(3);

// Merge all the cells between the two specified cells into one.
mergeCells(cellStartRange, cellEndRange);

Specifying Rows to Repeat on Subsequent Pages as Header Rows
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

A table can designate one or more of its first rows as header rows. If the table spans
several pages, these rows are repeated at the top of the table on each page.

In Microsoft Word this option is found under Table Properties as Repeat row as header on subsequent pages.
Using this option you can choose to repeat a single row or several rows of a table.
A single header row must be the first row in the table. When multiple header rows are used, they must be
consecutive, starting from the first row, and they must all fit on one page.
In Aspose.Words you can apply this setting by using the RowFormat.HeadingFormat property.

Note that heading rows do not work in nested tables. That is, if you have a table within another table, this
setting will have no effect on the inner table. This is a limitation of Microsoft Word, not of Aspose.Words.
Example
Shows how to build a table which includes heading rows that repeat on subsequent pages.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Table table = builder.startTable();
builder.getRowFormat().setHeadingFormat(true);
builder.getParagraphFormat().setAlignment(ParagraphAlignment.CENTER);
builder.getCellFormat().setWidth(100);
builder.insertCell();
builder.writeln("Heading row 1");
builder.endRow();
builder.insertCell();
builder.writeln("Heading row 2");
builder.endRow();

builder.getCellFormat().setWidth(50);
builder.getParagraphFormat().clearFormatting();

// Insert some content so the table is long enough to continue onto the next page.
for (int i = 0; i < 50; i++)
{
builder.insertCell();
builder.getRowFormat().setHeadingFormat(false);
builder.write("Column 1 Text");
builder.insertCell();
builder.write("Column 2 Text");
builder.endRow();
}

doc.save(getMyDir() + "Table.HeadingRow Out.doc");
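The rule that repeated heading rows must start at the first row and be consecutive can be expressed as a small check. The sketch below takes one HeadingFormat flag per row (a hypothetical boolean array rather than Aspose.Words RowFormat objects) and, under this reading of the rule, counts how many leading rows would repeat; a heading flag that appears after a non-heading row is not counted.

```java
class HeadingRowSketch {
    // Counts the consecutive rows, starting at row 0, whose heading flag is set.
    // Under the rule described above, only these rows repeat on later pages.
    static int repeatingHeaderRows(boolean[] headingFlags) {
        int count = 0;
        while (count < headingFlags.length && headingFlags[count])
            count++;
        return count;
    }
}
```

For the table built in the example (two heading rows followed by 50 rows with HeadingFormat set to false), the count is 2.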

Keeping Tables and Rows from Breaking across Pages
Added by hammad, last edited by tahir manzoor on Jun 26, 2014

There are times when the contents of a table should not be split across pages. For instance, when there is a
title above a table, the title and the table should always be kept together on the same page to preserve the proper
appearance.
There are two separate techniques that are useful to achieve this:
Allow Row to Break across Pages, which is applied to the rows of a table.
Keep with Next, which is applied to paragraphs in table cells.
We will use the table below in our example. By default, neither technique is applied to it; notice
how the content in the middle row is split across the page.


Keeping a Row from Breaking across Pages
This involves preventing the content inside the cells of a row from being split across pages. In
Microsoft Word this can be found under Table Properties as the option Allow Row to break across
Pages.
In Aspose.Words this is found under the RowFormat object of a Row as the property
RowFormat.AllowBreakAcrossPages.
Example
Shows how to disable rows breaking across pages for every row in a table.
Java
// Disable breaking across pages for all rows in the table.
for (Row row : table.getRows())
row.getRowFormat().setAllowBreakAcrossPages(false);

The result is the contents of each row are no longer split across the page. The table will only split across
the page at the start of a row instead of in the middle of a row.

Keeping a Table from Breaking across Pages
To stop a table from splitting across the page we need to state that we wish the content contained
within the table to stay together. In Microsoft Word this involves selecting the table and enabling
Keep with Next under Paragraph Format.
In Aspose.Words the technique is the same. Each paragraph inside the cells of the table should have
ParagraphFormat.KeepWithNext set to true. The exception is the last paragraph of each cell in the last row,
which should be left set to false.
Example
Shows how to set a table to stay together on the same page.
Java
// To keep a table from breaking across a page we need to enable KeepWithNext
// for every paragraph in the table except for the last paragraphs in the last
// row of the table.
for (Cell cell : (Iterable<Cell>) table.getChildNodes(NodeType.CELL, true))
{
// Call this method if the table's cells were created on the fly:
// a newly created cell does not contain a paragraph.
cell.ensureMinimum();
for (Paragraph para : cell.getParagraphs())
if (!(cell.getParentRow().isLastRow() && para.isEndOfCell()))
para.getParagraphFormat().setKeepWithNext(true);
}

The table is no longer split across the page and the entire table is moved to the next page instead.
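The selection rule used in the loop above (every paragraph gets KeepWithNext except the final paragraph of each cell in the last row) can be sketched over a plain model. Here a table is a hypothetical list of rows, each row a list of cells, and each cell is represented only by its paragraph count; the method returns the KeepWithNext value each paragraph would receive. Treating an empty cell as having one paragraph mimics what ensureMinimum() is used for above.

```java
import java.util.ArrayList;
import java.util.List;

class KeepWithNextSketch {
    // rows.get(r).get(c) is the number of paragraphs in cell c of row r.
    // Returns, per row and per cell, the KeepWithNext value for each paragraph.
    static List<List<boolean[]>> keepWithNextFlags(List<List<Integer>> rows) {
        List<List<boolean[]>> result = new ArrayList<>();
        for (int r = 0; r < rows.size(); r++) {
            boolean lastRow = (r == rows.size() - 1);
            List<boolean[]> rowFlags = new ArrayList<>();
            for (int paragraphCount : rows.get(r)) {
                // Like ensureMinimum(): every cell has at least one paragraph.
                int count = Math.max(paragraphCount, 1);
                boolean[] flags = new boolean[count];
                for (int p = 0; p < count; p++) {
                    boolean endOfCell = (p == count - 1);
                    flags[p] = !(lastRow && endOfCell);
                }
                rowFlags.add(flags);
            }
            result.add(rowFlags);
        }
        return result;
    }
}
```

Only the end-of-cell paragraphs of the last row stay false, so the final line of the table is the only place where Word is allowed to let the following content go to another page.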

Joining and Splitting Tables
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

A table in the Aspose.Words Document Object Model is made up of independent rows
and cells, which makes joining or splitting tables easy.

To split a table or join it with another table, we simply need to move rows from one
table to another.
Combining Two Tables into One
The rows from the second table simply need to be shifted to the end of the first table and the container of
the second table deleted.
Example
Shows how to combine the rows from two tables into one.
Java
// Load the document.
Document doc = new Document(getMyDir() + "Table.Document.doc");

// Get the first and second table in the document.
// The rows from the second table will be appended to the end of the first table.
Table firstTable = (Table)doc.getChild(NodeType.TABLE, 0, true);
Table secondTable = (Table)doc.getChild(NodeType.TABLE, 1, true);

// Append all rows from the second table to the end of the first.
// Due to the design of tables, even tables with different cell counts and widths can be
// joined into one table.
while (secondTable.hasChildNodes())
firstTable.getRows().add(secondTable.getFirstRow());

// Remove the empty table container.
secondTable.remove();

doc.save(getMyDir() + "Table.CombineTables Out.doc");
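As we understand Aspose.Words node insertion, adding a row that already belongs to a table first removes it from that table, which is why the while loop above ends once the second table is empty. The same move-until-empty pattern is sketched below with plain Java deques of row labels standing in for tables (a hypothetical model, not Aspose.Words types).

```java
import java.util.ArrayDeque;
import java.util.Deque;

class JoinTablesSketch {
    // Moves all rows from second to the end of first, in order, leaving second empty.
    static void join(Deque<String> first, Deque<String> second) {
        // Mirrors: while (secondTable.hasChildNodes()) firstTable.getRows().add(secondTable.getFirstRow());
        while (!second.isEmpty())
            first.addLast(second.pollFirst());
    }
}
```

Taking rows from the front of the second table and appending them to the back of the first keeps every row in its original order.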

Split a Table into Two Separate Tables
We first need to pick the row at which to split the table. Once we know this, we can create two tables from
the original table by following these simple steps:
1. Create a clone of the table without cloning children to hold the moved rows and insert it after the original
table.
2. Starting from the specified row move all subsequent rows to this second table.
Example
Shows how to split a table into two tables at a specific row.
Java
// Load the document.
Document doc = new Document(getMyDir() + "Table.SimpleTable.doc");

// Get the first table in the document.
Table firstTable = (Table)doc.getChild(NodeType.TABLE, 0, true);

// We will split the table at the third row (inclusive).
Row row = firstTable.getRows().get(2);

// Create a new container for the split table.
Table table = (Table)firstTable.deepClone(false);

// Insert the container after the original.
firstTable.getParentNode().insertAfter(table, firstTable);

// Add a buffer paragraph to ensure the tables stay apart.
firstTable.getParentNode().insertAfter(new Paragraph(doc), firstTable);

Row currentRow;

do
{
currentRow = firstTable.getLastRow();
table.prependChild(currentRow);
}
while (currentRow != row);

doc.save(getMyDir() + "Table.SplitTable Out.doc");
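The do/while loop above moves rows from the end of the original table to the front of the new one until the chosen row has moved, so the split row and everything after it keep their original order. Below is a plain-Java sketch of the same loop over deques of row labels (hypothetical, with unique labels assumed). Unlike the snippet above, it first checks that the row actually belongs to the table, since otherwise the loop would run out of rows without ever reaching it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SplitTableSketch {
    // Moves rows from the end of original into a new table until splitRow has moved.
    // splitRow and every row after it end up in the returned table, in original order.
    static Deque<String> splitAt(Deque<String> original, String splitRow) {
        if (!original.contains(splitRow))
            throw new IllegalArgumentException("splitRow is not in the table");
        Deque<String> second = new ArrayDeque<>();
        String current;
        do {
            current = original.pollLast();   // like firstTable.getLastRow()
            second.addFirst(current);        // like table.prependChild(currentRow)
        } while (!current.equals(splitRow));
        return second;
    }
}
```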

About Field Update
Added by hammad, last edited by Caroline von Schmalensee on Jun 05, 2013

Fields in a document are like placeholders where some useful data can be inserted. For example, a field can be
a page reference, formula or a mail merge field. A field in a Microsoft Word document consists of a field code
and field result. The field code is an instruction about how the field result needs to be updated or calculated.
An application that processes a document and encounters a field might have the functionality to interpret the
instructions contained in the field code and update the field result with a new value. This is called field update.
Usually, a field inserted in Microsoft Word already contains an up-to-date value. For example,
if the field is a formula or a page number, it will contain the correct calculated value for the given
version of the document. But if you have an application that generates or modifies a document with
fields, for example by combining two documents or populating a document with data, then for the document to be
useful, all fields should ideally be updated.
Fields in Microsoft Word documents are complex. There are over 50 field types (each needs its own
result calculation procedure), formulas and expressions, bookmarks and references, functions and
various switches. Fields can also be nested.
Aspose.Words is a class library designed for server-side processing of Microsoft Word documents and
supports fields in the following ways:
All fields in a document are preserved during open/save and conversions.
It is possible to update results of some of the most popular fields.
Fields Supported during Update
Calculation of the following fields is supported in the current version of Aspose.Words:
= (formula field)
ADDRESSBLOCK
AUTHOR
COMPARE
CREATEDATE
DATE
DOCPROPERTY
DOCVARIABLE
GREETINGLINE
IF
INCLUDETEXT
MERGEFIELD
MERGEREC
MERGESEQ
NEXT
NEXTIF
NUMPAGES
PAGE
PAGEREF
REF
SECTION
SECTIONPAGES
SEQ
SET
STYLEREF
TIME
TITLE
TOA
TOC (including TOT and TOF)
TC
Sophisticated Parsing
Aspose.Words follows the way Microsoft Word processes fields and as a result it correctly handles:
Nested fields:
IF { =OR({ COMPARE { =2.5 +PRODUCT(3,5 ,8.4) } > 4}, { =2/2 }) } = 1 "Credit not acceptable" "Credit acceptable"
Field argument can be a result of a nested field.
Fields can be nested within a field code as well as in the field result.
Spaces/no spaces, quotes/no quotes, escape characters in fields, etc.: MERGEFIELD \f"Text after""Field \n\ame with \" and \\\ and \\\*"\bTextBefor\e
Fields that span across multiple paragraphs.
Formula Fields
Aspose.Words provides a comprehensive implementation of the formula engine and supports the following:
Arithmetic and logical operators:
=(54+4*(6-77)-(5))+(-6-5)/4/5
Functions:
=ABS(-01.4)+2.645/(5.6^3.5)+776457 \# "#,##0"
References to bookmarks: =IF(C>4, 5,ABS(A)*.76) +3.85
Number formatting switches:
=00000000 \# "$#,##0.00;($#,##0.00)"
The following functions in expressions are supported: ABS, AND, AVERAGE, COUNT, DEFINED,
FALSE, IF, INT, MAX, MIN, MOD, NOT, OR, PRODUCT, ROUND, SIGN, SUM, TRUE.
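For a sense of what these expressions compute, the arithmetic examples above can be re-evaluated in plain Java. Two points are assumptions of this sketch rather than statements about Aspose.Words internals: ^ in a field formula is treated as exponentiation (Math.pow, since Java's ^ is XOR), and java.text.DecimalFormat is used only as a rough analogue of the \# numeric picture switch; rounding of exact halves, for example, may differ. FormulaSketch and picture are hypothetical names.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class FormulaSketch {
    // =(54+4*(6-77)-(5))+(-6-5)/4/5
    // Doubles are used because Java int division would truncate (-11 / 4 / 5 == 0).
    static double arithmeticExample() {
        return (54.0 + 4.0 * (6.0 - 77.0) - 5.0) + (-6.0 - 5.0) / 4.0 / 5.0;
    }

    // =ABS(-01.4)+2.645/(5.6^3.5)+776457, with the field operator ^ mapped to Math.pow.
    static double functionExample() {
        return Math.abs(-1.4) + 2.645 / Math.pow(5.6, 3.5) + 776457;
    }

    // Rough analogue of a numeric picture, using US symbols so the grouping
    // separator is a comma regardless of the default locale.
    static String picture(double value, String pattern) {
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(value);
    }
}
```

With these definitions the first expression evaluates to -235.55, and the second to about 776458.406, which the pattern "#,##0" renders as 776,458. The pattern "$#,##0.00;($#,##0.00)" shows the positive;negative sections: 0 becomes $0.00 and -1234.5 becomes ($1,234.50).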
IF and COMPARE Fields
The following IF and COMPARE expressions, all of which Aspose.Words can calculate, give an idea of how
powerful this feature is:
IF 3 > 5.7^4+MAX(4,3) True False
IF "abcd" > "abc" True False
IF "?ab*" = "1abdsg" True False
IF 4 = "2*2" True False
COMPARE 3+5/34 < 4.6/3/2
DATE and TIME Fields
Aspose.Words supports all date and time formatting switches available in Microsoft Word. Some
examples are:
DATE \@ "d-MMM-yy"
DATE \@ "d/MM/yyyy h:mm am/pm"
Mail Merge Fields
Aspose.Words imposes no limit on the complexity of mail merge fields in your documents. It supports
nested IF and formula fields and can even calculate the merge field's name using a formula.
Some examples of mail merge fields that Aspose.Words supports:
Mail merge field switches: MERGEFIELD FirstName \* FirstCap \b "Mr. "
Nested merge fields in a formula:
IF { MERGEFIELD Value1 } >= { MERGEFIELD Value2 } True False
Calculate the name of the merge field at runtime:
MERGEFIELD { IF { MERGEFIELD Value1 } >= { MERGEFIELD Value2 } FirstName"LastName" }
Conditional move to next record in the data source:
NEXTIF { MERGEFIELD Value1 } <= { =IF(-2.45 >= 6*{ MERGEFIELD Value2 }, 2, -.45) }
Format Switches
A field in a document can have formatting switches that specify how the resulting value should be
formatted. Aspose.Words supports the following format switches:
\@ - date and time formatting
\# - number formatting
\* Caps
\* FirstCap
\* Lower
\* Upper
\* CHARFORMAT - formats the result according to the formatting of the first character of the field code.
\* MERGEFORMAT - formats the result according to how the old result was formatted.
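The four case switches can be approximated in a few lines of Java. The sketch below is an illustration, not the Aspose.Words implementation: apply is a hypothetical helper, Caps and FirstCap only upper-case the first character of words and leave the remaining characters unchanged (an assumption of this sketch), and CHARFORMAT and MERGEFORMAT are not modeled because they concern character formatting rather than the text itself.

```java
import java.util.Locale;

class CaseSwitchSketch {
    // Hypothetical helper: applies a simplified Caps, FirstCap, Upper or Lower switch.
    static String apply(String result, String switchName) {
        switch (switchName) {
            case "Upper":
                return result.toUpperCase(Locale.ROOT);
            case "Lower":
                return result.toLowerCase(Locale.ROOT);
            case "FirstCap":
                return capitalize(result, false);
            case "Caps":
                return capitalize(result, true);
            default:
                throw new IllegalArgumentException("Unsupported switch: " + switchName);
        }
    }

    // Upper-cases the first character of the first word, or of every word when allWords is true.
    // Words are separated by whitespace; all other characters are left unchanged.
    private static String capitalize(String text, boolean allWords) {
        StringBuilder sb = new StringBuilder(text.length());
        boolean atWordStart = true;
        boolean capitalizedOnce = false;
        for (char ch : text.toCharArray()) {
            if (Character.isWhitespace(ch)) {
                atWordStart = true;
                sb.append(ch);
                continue;
            }
            if (atWordStart && (allWords || !capitalizedOnce)) {
                sb.append(Character.toUpperCase(ch));
                capitalizedOnce = true;
            } else {
                sb.append(ch);
            }
            atWordStart = false;
        }
        return sb.toString();
    }
}
```

For example, FirstCap turns "weekly report on sales." into "Weekly report on sales.", while Caps turns "luis alverca" into "Luis Alverca".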
How to Update Fields
Added by hammad, last edited by Adam Skelton on Mar 19, 2014

When a document is loaded, Aspose.Words mimics the behavior of Microsoft Word with the option to
automatically update fields switched off. The behavior can be summarized as follows:
When you open/save a document the fields remain intact.
You can explicitly update all fields in a document (e.g. rebuild TOC) when you need to.
When you print/render to PDF or XPS the fields related to page-numbering in headers/footers are
updated.
When you execute mail merge all fields are updated automatically.
Update Fields Programmatically
To explicitly update fields in the whole document, simply call Document.UpdateFields.

To update fields contained in part of a document, obtain a Range object and call the Range.UpdateFields
method. In Aspose.Words, you can obtain a Range for any node in the document tree, such as Section,
HeaderFooter, Paragraph, etc., using the Node.Range property.
You can update the result of a single field by calling Field.Update.
Automatic Update of Page-Related Fields during Rendering
When you convert a document to a fixed-page format such as PDF or XPS, Aspose.Words
automatically updates the page layout-related fields, such as PAGE and PAGEREF, found in the
headers and footers of the document. This mimics the behavior of Microsoft Word when printing a
document.

If you want to update all other fields in the document, then you need to call Document.UpdateFields
before rendering the document.
Example
Shows how to update all fields before rendering a document.
Java
Document doc = new Document(getMyDir() + "Rendering.doc");

// This updates all fields in the document.
doc.updateFields();

doc.save(getMyDir() + "Rendering.UpdateFields Out.pdf");

Automatic Field Update during Mail Merge
When you execute a mail merge, all fields in the document will be automatically updated. This is because
mail merge is a case of a field update. The program encounters a mail merge field and needs to update its
result, which involves grabbing the value from the data source and inserting it into the field. The logic is
of course more complicated, for example, when the end of the document/mail merge region is reached but
there is still further data to be merged, then the region needs to be duplicated and the new set of fields
updated.
Differences in Field Update in Aspose.Words 10.0 and Above
Added by hammad, last edited by Adam Skelton on Feb 01, 2012

Starting from Aspose.Words 10.0, some fields are updated slightly differently due to an internal
reworking of the field evaluation engine. These changes allow fields to be updated more accurately
and bring field updating in Aspose.Words closer to how field update behaves in Microsoft Word.

With the new engine, the general behavior of field update remains the same, with exceptions
that mostly affect how the Document.UpdateFields and Document.UpdatePageLayout methods
behave. These important changes are detailed below:
Calling UpdateFields Now Updates All Field Types
In previous versions calling Document.UpdateFields or Range.UpdateFields would update only regular
fields such as IF or DOCPROPERTY and not page-layout related fields such as PAGE or NUMPAGES.
Newer versions will now update both the regular and page-layout related fields.

When Document.UpdateFields or Range.UpdateFields is called all fields are updated over the entire
document/range. This may involve building the document layout if a page-layout related field like the
PAGE field is encountered during the update.
Example
Shows how to update all fields in a document.
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.updateFields();

A similar process occurs when a single field is updated using the Field.Update method. If this field is a
regular field then only this field is updated as normal. However if this field is related to the page layout
then the document layout is rebuilt and it is updated along with all other page related fields found in
headers or footers.

These changes in field evaluation may potentially cause results that differ from previous versions on
certain documents when executing the same field update code.
Calling UpdatePageLayout Now Only Updates Page-Layout Related Fields in Headers and Footers
In previous versions a call to Document.UpdatePageLayout was required in order to update fields in the
document like PAGE and PAGEREF. In the current version this functionality is handled by
Document.UpdateFields which updates all types of fields as discussed above.

Document.UpdatePageLayout is still used to build or rebuild the document layout when a document is
to be rendered. When this method is called or a document is rendered (i.e. saved to PDF, XPS, printed
etc.) the document layout is built. In previous versions this process would update all page-layout related
fields, however in the current version these fields are automatically updated only in the headers and
footers of the document.
These changes to how fields are updated upon document layout are required and match how Microsoft
Word updates fields. This now allows a document to be rendered without any fields in the main body
being updated which is how fields are evaluated in Microsoft Word.
If the old functionality of updating page-related fields in the entire document when rendering is desired
then an explicit call to Document.UpdateFields is required before saving the document.
Example
Shows how to update all fields before rendering a document.
Java
Document doc = new Document(getMyDir() + "Rendering.doc");

// This updates all fields in the document.
doc.updateFields();

doc.save(getMyDir() + "Rendering.UpdateFields Out.pdf");

All Types of Fields Encountered during Mail Merge are Updated
Previously only non page-related fields encountered during mail merge would be updated. Now all fields
inside a mail merge region (or in the whole body if not using mail merge regions) are updated, including
page-related fields.

As with Document.UpdateFields, this may cause the document layout to be built if page-related fields
are encountered. This behavior mimics how Microsoft Word handles field update during mail merge and
ensures that after mail merge has been executed all fields are up to date with correct values.
Fields Overview
Added by Adam Skelton, last edited by Adam Skelton on Mar 19, 2014

Fields in a document are like placeholders where useful data can be inserted. For example, a field can be a
page reference, formula or a mail merge field. A field in a Microsoft Word document consists of a field code
and a field result. The field code is an instruction about how the field result needs to be updated or calculated.
An application that processes a document and encounters a field might have the functionality to interpret the
instructions contained in the field code and update the field result with a new value. This is called field update.
Usually a field, when inserted in Microsoft Word, already contains an up to date value. For example, if
the field is a formula or a page number, it will contain a correct calculated value for the given version of
the document. But if you have an application that generates or modifies a document with fields (for
example combines two documents or populates with data) then for the document to be useful, all fields
should ideally be updated.
A field consists of:

The field start and separator nodes are used to encompass the content which makes up the field code
(normally as plain text)
The field separator and field end encompass the field result. This can be made up of various types of
content ranging from runs of text to paragraphs to tables.
Some fields may not have a separator which means the entire content makes up the field code.
The field code defines the behavior of the field and consists of the field identifier and often other
parameters such as the field name and switches.
The field result contains the most recent evaluation of the field and is what is displayed to the user.
Some fields may not have a field result and thus will not display anything in the document. Likewise,
a field that has not been updated yet may also have no field result.
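The start/separator/end layout described above can be illustrated with a flat token list. In this sketch the markers are hypothetical string tokens rather than the FieldStart, FieldSeparator and FieldEnd nodes Aspose.Words uses, and only a single, non-nested field is handled: text between the start and the separator is the field code, text between the separator and the end is the field result, and a field without a separator is all code.

```java
import java.util.List;

class FieldTokensSketch {
    static final String START = "<START>";
    static final String SEPARATOR = "<SEPARATOR>";
    static final String END = "<END>";

    // Returns {fieldCode, fieldResult} for one non-nested field.
    // The result is empty when the field has no separator.
    static String[] codeAndResult(List<String> tokens) {
        StringBuilder code = new StringBuilder();
        StringBuilder result = new StringBuilder();
        boolean inField = false;
        boolean inResult = false;
        for (String token : tokens) {
            if (token.equals(START)) {
                inField = true;
            } else if (token.equals(SEPARATOR)) {
                inResult = true;
            } else if (token.equals(END)) {
                break;
            } else if (inField) {
                (inResult ? result : code).append(token);
            }
        }
        return new String[] { code.toString().trim(), result.toString() };
    }
}
```

Real documents can nest fields inside both the code and the result, so a full parser would need to track nesting depth; this sketch deliberately leaves that out.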
Here is a view of how a field is stored in Aspose.Words, produced with the DocumentExplorer example,
which can be found on GitHub.

Aspose.Words is a class library designed for server-side processing of Microsoft Word documents and
supports fields in the following ways:
All fields in a document are preserved during open/save and conversions.
It is possible to update results of some of the most popular fields.
Fields in Microsoft Word
Added by Adam Skelton, last edited by Adam Skelton on Mar 19, 2014

Fields in Microsoft Word documents are complex. There are over 50 field types (each needs its own result
calculation procedure), formulas and expressions, bookmarks and references, functions and various switches.
Fields can also be nested.
Normally, when a document is opened, the field result (the value of the field) is shown for all fields in the
document. You can toggle between displaying field results and field codes for all fields in Microsoft Word by
pressing ALT+F9.
(Figure: the same field displayed as its field code and as its field result.)

Inserting Fields in Microsoft Word
To insert a field in Microsoft Word:
1. Click the Insert tab.
2. Click the Quick Parts drop-down menu.
3. Select Field.
4. A dialog box appears in which you can enter the details of the field. On the left side is a list of the
available fields, and on the right side you can visually edit the properties of the field.
5. Alternatively, you can click the Field Codes button to write out the field code directly.
6. Switches can also be inserted by using the Options button.
7. Using either method, fill in the desired fields with the appropriate information, then click OK.
8. The field is inserted into the document at the current cursor position.

Updating Fields in Microsoft Word
To update a single field in Microsoft Word:
1. Move the caret into the field that you want to update.
2. Press F9 to update the field.
To update all fields in Microsoft Word:
1. Press Ctrl+A to select all the content in the document.
2. Press F9 to update all of the fields found within the selection.
Switching Between Display of Field Code and Field Result
To toggle field codes of a single field in Microsoft Word:
1. Move the caret into the desired field.
2. Press SHIFT+F9 to toggle the field code just for this field.
To toggle field codes of all fields in Microsoft Word:
1. Press ALT+F9
Converting Fields to Static Text in Microsoft Word
To convert a dynamic field to static text in Microsoft Word:
1. Move the caret into the field that you want to convert.
2. Press Ctrl+Shift+F9 to convert the field to static text.
Removing a Field in Microsoft Word
To remove a field in Microsoft Word:
1. Select the entire content making up the field. If field codes are displayed then the opening and ending
braces need to be selected as well.
2. Press Delete to remove the entire field.
Fields in Aspose.Words

When a document is loaded into Aspose.Words, the fields of the document are loaded into the Aspose.Words
Document Object Model as a set of separate components (nodes). A single field is loaded as a collection of
FieldStart, FieldSeparator and FieldEnd nodes along with the content in between these nodes. If a field does
not have a field result then there will be no FieldSeparator node. All of these nodes are always found inline (as
children of Paragraph or SmartTag).
The content that makes up the field code is stored as Run nodes between the FieldStart and
FieldSeparator nodes. The field result is stored between the FieldSeparator and FieldEnd nodes and can be
made up of various types of content. Normally the field result contains just text made up of Run
nodes; however, the FieldEnd node can be located in a completely different paragraph, in which case
the field result also includes block-level nodes such as Table and Paragraph nodes.
In Aspose.Words, each of the FieldXXX nodes derives from FieldChar. This class provides the
FieldChar.FieldType property, which identifies the type of field that the node belongs to.
For example, FieldType.FIELD_MERGE_FIELD represents a merge field in the document.

Some fields in a Word document are not imported into Aspose.Words as a collection of FieldXXX
nodes. For instance, LINK and INCLUDEPICTURE fields are imported into Aspose.Words as Shape or
DrawingML objects. These objects provide properties to work with the image data normally stored in
these fields.
Form fields are also imported into Aspose.Words as their own special class. The FormField class
represents a form field in a Word document and provides additional methods that are particular to
a form field.
Fields Supported during Update
Calculation of the following fields is supported in the current version of Aspose.Words:
= (formula field)
ADDRESSBLOCK
AUTHOR
COMPARE
CREATEDATE
DATE
DOCPROPERTY
DOCVARIABLE
GREETINGLINE
IF
INCLUDETEXT
MERGEFIELD
MERGEREC
MERGESEQ
NEXT
NEXTIF
NUMPAGES
PAGE
PAGEREF
REF
SECTION
SECTIONPAGES
SEQ
SET
STYLEREF
TIME
TITLE
TOA
TOC (including TOT and TOF)
TC
Sophisticated Parsing
Aspose.Words follows the way Microsoft Word processes fields and as a result it correctly handles:
Nested fields:
IF { =OR({ COMPARE { =2.5 +PRODUCT(3,5 ,8.4) } > 4}, { =2/2 }) } = 1 "Credit not acceptable" "Credit acceptable"
Field argument can be a result of a nested field.
Fields can be nested within a field code as well as in the field result.
Spaces/no spaces, quotes/no quotes, escape characters in fields etc.:
MERGEFIELD \f"Text after""Field \n\ame with \" and \\\ and \\\*"\bTextBefor\e
Fields that span across multiple paragraphs.
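To illustrate how a nested field can supply an argument to the field that contains it, the sketch below lists the fields of a brace-notation field code in the order a depth-first update would need them, innermost first. It assumes the { } notation used in this guide, where braces stand for field start and end markers rather than typed characters, and it does not treat braces inside quoted text specially. NestedFields is a hypothetical helper, not part of Aspose.Words.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class NestedFields {
    /**
     * Returns the code of every { ... } group in the order a depth-first update
     * would need them: each inner field before the field that contains it.
     * Braces stand in for field start/end markers; quoted text is not special-cased.
     */
    static List<String> evaluationOrder(String fieldText) {
        List<String> order = new ArrayList<>();
        Deque<Integer> openBraces = new ArrayDeque<>();
        for (int i = 0; i < fieldText.length(); i++) {
            char c = fieldText.charAt(i);
            if (c == '{') {
                openBraces.push(i);
            } else if (c == '}') {
                if (openBraces.isEmpty()) {
                    throw new IllegalArgumentException("unbalanced '}' at index " + i);
                }
                int start = openBraces.pop();
                // A group closes only after every group nested inside it has closed.
                order.add(fieldText.substring(start + 1, i).trim());
            }
        }
        if (!openBraces.isEmpty()) {
            throw new IllegalArgumentException("unbalanced '{'");
        }
        return order;
    }

    public static void main(String[] args) {
        String text = "{ IF { PAGE } <> { NUMPAGES } \"See Next Page\" \"Last Page\" }";
        evaluationOrder(text).forEach(System.out::println);
    }
}
```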
Formula Fields
Aspose.Words provides a comprehensive implementation of the formula engine and supports the following:
Arithmetic and logical operators:
=(54+4*(6-77)-(5))+(-6-5)/4/5
Functions:
=ABS(-01.4)+2.645/(5.6^3.5)+776457 \# "#,##0"
References to bookmarks:
=IF(C>4, 5,ABS(A)*.76) +3.85
Number formatting switches:
=00000000 \# "$#,##0.00;($#,##0.00)"
The following functions in expressions are supported: ABS, AND, AVERAGE, COUNT, DEFINED,
FALSE, IF, INT, MAX, MIN, MOD, NOT, OR, PRODUCT, ROUND, SIGN, SUM, TRUE.
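As a rough illustration of what a formula engine has to do, here is a minimal recursive-descent evaluator for the arithmetic part of a formula field. It is a sketch under stated assumptions, not the Aspose.Words engine: it uses conventional precedence (^ binds tighter than * and /, which bind tighter than + and -), supports only ABS, MAX, MIN, SUM and PRODUCT, and does not handle bookmark references, comparisons or format switches. FormulaSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal recursive-descent evaluator for formula-field-style arithmetic (sketch only). */
final class FormulaSketch {
    private final String s;
    private int pos;

    private FormulaSketch(String s) { this.s = s; }

    static double eval(String formula) {
        String t = formula.replace(" ", "");
        if (t.startsWith("=")) t = t.substring(1);
        FormulaSketch p = new FormulaSketch(t);
        double value = p.expression();
        if (p.pos != t.length()) {
            throw new IllegalArgumentException("unexpected '" + t.charAt(p.pos) + "' at " + p.pos);
        }
        return value;
    }

    // expression := term (('+' | '-') term)*
    private double expression() {
        double v = term();
        while (peek() == '+' || peek() == '-') {
            char op = s.charAt(pos++);
            double rhs = term();
            v = op == '+' ? v + rhs : v - rhs;
        }
        return v;
    }

    // term := unary (('*' | '/') unary)*
    private double term() {
        double v = unary();
        while (peek() == '*' || peek() == '/') {
            char op = s.charAt(pos++);
            double rhs = unary();
            v = op == '*' ? v * rhs : v / rhs;
        }
        return v;
    }

    // unary := '-' unary | power
    private double unary() {
        if (peek() == '-') { pos++; return -unary(); }
        return power();
    }

    // power := primary ('^' unary)?   (right-associative in this sketch)
    private double power() {
        double base = primary();
        if (peek() == '^') { pos++; return Math.pow(base, unary()); }
        return base;
    }

    // primary := number | '(' expression ')' | NAME '(' expression (',' expression)* ')'
    private double primary() {
        if (peek() == '(') {
            pos++;
            double v = expression();
            expect(')');
            return v;
        }
        int start = pos;
        if (Character.isLetter(peek())) {
            while (Character.isLetter(peek())) pos++;
            String name = s.substring(start, pos).toUpperCase();
            expect('(');
            List<Double> args = new ArrayList<>();
            args.add(expression());
            while (peek() == ',') { pos++; args.add(expression()); }
            expect(')');
            return apply(name, args);
        }
        while (Character.isDigit(peek()) || peek() == '.') pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        return Double.parseDouble(s.substring(start, pos));
    }

    private static double apply(String name, List<Double> a) {
        switch (name) {
            case "ABS": return Math.abs(a.get(0));
            case "MAX": return a.stream().mapToDouble(Double::doubleValue).max().getAsDouble();
            case "MIN": return a.stream().mapToDouble(Double::doubleValue).min().getAsDouble();
            case "SUM": return a.stream().mapToDouble(Double::doubleValue).sum();
            case "PRODUCT": return a.stream().mapToDouble(Double::doubleValue).reduce(1.0, (x, y) -> x * y);
            default: throw new IllegalArgumentException("unsupported function " + name);
        }
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(eval("=(54+4*(6-77)-(5))+(-6-5)/4/5"));
    }
}
```

For example, the first arithmetic expression listed above evaluates to approximately -235.55 with this sketch.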
IF and COMPARE Fields
The following IF and COMPARE expressions, all of which Aspose.Words can calculate, give an idea of how
powerful this feature is:
IF 3 > 5.7^4+MAX(4,3) True False
IF "abcd" > "abc" True False
IF "?ab*" = "1abdsg" True False
IF 4 = "2*2" True False
COMPARE 3+5/34 < 4.6/3/2
DATE and TIME Fields
Aspose.Words supports all date and time formatting switches available in Microsoft Word. Some
examples are:
DATE \@ "d-MMM-yy"
DATE \@ "d/MM/yyyy h:mm am/pm"
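Word date-time pictures are close to the patterns of java.text.SimpleDateFormat, which makes it easy to experiment with them in plain Java. The sketch below converts a small subset of \@ pictures into SimpleDateFormat patterns. WordDatePicture is a hypothetical helper, and the mapping (am/pm to a, ddd and dddd to weekday names) is an assumption that covers only the examples above; it does not reproduce Word's handling of letter case in the am/pm marker.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

/** Sketch: translate a small subset of Word \@ pictures to SimpleDateFormat patterns. */
final class WordDatePicture {
    static String toJavaPattern(String wordPicture) {
        // d, M, y, h, H and m already mean the same thing in both syntaxes.
        return wordPicture
                .replace("AM/PM", "a")     // Java's AM/PM marker text comes from the locale
                .replace("am/pm", "a")
                .replace("dddd", "EEEE")   // full weekday name
                .replace("ddd", "EEE");    // abbreviated weekday name
    }

    static String format(Date date, String wordPicture, Locale locale) {
        return new SimpleDateFormat(toJavaPattern(wordPicture), locale).format(date);
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(2011, Calendar.JANUARY, 5, 14, 7).getTime();
        System.out.println(format(date, "d-MMM-yy", Locale.US));
        System.out.println(format(date, "d/MM/yyyy h:mm am/pm", Locale.US));
    }
}
```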
Mail Merge Fields
Aspose.Words imposes no limit on the complexity of mail merge fields in your documents. It supports
nested IF and formula fields and can even calculate the name of a merge field using a formula.
Some examples of mail merge fields that Aspose.Words supports:
Mail merge field switches:
MERGEFIELD FirstName \* FirstCap \b "Mr. "
Nested merge fields in a formula:
IF { MERGEFIELD Value1 } >= { MERGEFIELD Value2 } True False
Calculate the name of the merge field at runtime:
MERGEFIELD { IF { MERGEFIELD Value1 } >= { MERGEFIELD Value2 } FirstName"LastName" }
Conditional move to next record in the data source:
NEXTIF { MERGEFIELD Value1 } <= { =IF(-2.45 >= 6*{ MERGEFIELD Value2 }, 2, -.45) }
Format Switches
A field in a document can have formatting switches that specify how the resulting value should be
formatted. Aspose.Words supports the following format switches:
\@ - date and time formatting
\# - number formatting
\* Caps
\* FirstCap
\* Lower
\* Upper
\* CHARFORMAT - format the result according to the first character of the field code.
\* MERGEFORMAT - format the result according to how the old result is formatted.
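The sketch below approximates two kinds of format switch with JDK classes: \# numeric pictures via java.text.DecimalFormat, whose pattern syntax (#, 0, grouping comma, positive;negative sections) overlaps with Word's, and the \* Caps, FirstCap, Lower and Upper text switches. FormatSwitchSketch is hypothetical, the rounding mode is an assumption, and Aspose.Words has its own implementation of these switches.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Sketch of \# numeric pictures and \* text-case switches using only the JDK. */
final class FormatSwitchSketch {
    static String number(double value, String picture, Locale locale) {
        DecimalFormat format = new DecimalFormat(picture, DecimalFormatSymbols.getInstance(locale));
        format.setRoundingMode(RoundingMode.HALF_UP); // assumption: round half away from zero
        return format.format(value);
    }

    static String textCase(String text, String switchName) {
        switch (switchName) {
            case "Upper": return text.toUpperCase(Locale.ROOT);
            case "Lower": return text.toLowerCase(Locale.ROOT);
            case "FirstCap": return capitalize(text);
            case "Caps": {
                // Capitalize the first letter of every space-separated word; other letters unchanged.
                String[] words = text.split(" ", -1);
                for (int i = 0; i < words.length; i++) words[i] = capitalize(words[i]);
                return String.join(" ", words);
            }
            default: throw new IllegalArgumentException("unsupported switch: " + switchName);
        }
    }

    private static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(number(-1234.5, "$#,##0.00;($#,##0.00)", Locale.US));
        System.out.println(textCase("luis alverca", "Caps"));
    }
}
```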
Date and Number Formatting in Fields
When Aspose.Words calculates a field result, it often needs to parse a string into a number or date value
and also to format it back to a string.

By default, Aspose.Words uses the current thread locale to perform parsing and formatting when
calculating field values during field update and mail merge. The FieldOptions class provides further
control over which locale is used during field update.
By default, the FieldOptions.FieldUpdateCultureSource property is set to
FieldUpdateCultureSource.CURRENT_THREAD, which formats fields using the current thread locale.
This property can be set to FieldUpdateCultureSource.FIELD_CODE so that the language set for the field code
of each field is used for formatting instead.
Formatting using the Current Thread's Locale
To control the locale used during field calculation, call Locale.setDefault with the locale of
your choice before invoking field calculation.
Example
Shows how to change the culture used in formatting fields during update.
Java
// Store the current locale so it can be restored once mail merge is complete.
Locale currentCulture = Locale.getDefault();
// Set the default locale to German so dates and numbers are formatted using this locale during mail merge.
Locale.setDefault(new Locale("de", "DE"));
try {
    // Execute mail merge.
    doc.getMailMerge().execute(new String[]{"Date"}, new Object[]{new Date()});
} finally {
    // Restore the original locale, even if mail merge throws.
    Locale.setDefault(currentCulture);
}


Using the current locale to format fields allows a system to easily and consistently control how all fields in the
document are formatted during field update.
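When you set the default locale yourself, it is worth restoring it in a finally block so that an exception during field update does not leave the JVM with the wrong locale. Note that Locale.setDefault changes the default for the whole JVM, not only for the calling thread, so other threads formatting at the same time are affected too. LocaleScope below is a hypothetical helper that uses only the JDK.

```java
import java.text.NumberFormat;
import java.util.Locale;
import java.util.function.Supplier;

final class LocaleScope {
    /**
     * Runs an action with the JVM default locale temporarily replaced, then restores
     * the previous general and FORMAT defaults, even if the action throws.
     */
    static <T> T withDefaultLocale(Locale locale, Supplier<T> action) {
        Locale previous = Locale.getDefault();
        Locale previousFormat = Locale.getDefault(Locale.Category.FORMAT);
        Locale.setDefault(locale); // sets all categories, JVM-wide
        try {
            return action.get();
        } finally {
            Locale.setDefault(previous);
            Locale.setDefault(Locale.Category.FORMAT, previousFormat);
        }
    }

    public static void main(String[] args) {
        // NumberFormat.getInstance() picks up the default FORMAT locale.
        String german = withDefaultLocale(Locale.GERMANY, () -> NumberFormat.getInstance().format(1234.5));
        String us = withDefaultLocale(Locale.US, () -> NumberFormat.getInstance().format(1234.5));
        System.out.println(german + " vs " + us);
    }
}
```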
Formatting using the Locale in the Document
On the other hand, Microsoft Word formats each individual field based on the language of the text found
in the field (specifically, the runs of the field code). Sometimes this may be the desired behavior during
field update, for example if you have globalized documents containing content in many different
languages and would like each field to honor the locale of its text. Aspose.Words also
supports this functionality.
The Document class provides a FieldOptions property which contains members which can be used to
control how fields are updated within the document.
Example
Shows how to specify where the locale for date formatting during field update and mail merge is chosen
from.
Java
// Set the culture used during field update to the culture used by the field.
doc.getFieldOptions().setFieldUpdateCultureSource(FieldUpdateCultureSource.FIELD_CODE);
// Use "dd" (day of month); "DD" would mean day of year in SimpleDateFormat.
doc.getMailMerge().execute(new String[] { "Date2" },
        new Object[] { new SimpleDateFormat("yyyy/MM/dd").parse("2011/01/01") });
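The difference between the two culture sources can be illustrated without Aspose.Words: with the field-code source, each field is formatted with the locale of its own text rather than the process-wide default. In the sketch below the language tag of a field is passed in as a string, and the Source enum is a hypothetical mirror of the two options; neither is the Aspose.Words API.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

final class CultureSourceSketch {
    enum Source { CURRENT_THREAD, FIELD_CODE } // hypothetical mirror of the two options

    /** Formats a date with either the JVM default locale or the field's own language tag. */
    static String formatDate(Date date, String javaPattern, String fieldLanguageTag, Source source) {
        Locale locale = source == Source.FIELD_CODE
                ? Locale.forLanguageTag(fieldLanguageTag)
                : Locale.getDefault(Locale.Category.FORMAT);
        return new SimpleDateFormat(javaPattern, locale).format(date);
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(2011, Calendar.JANUARY, 5).getTime();
        // Two "fields" whose code runs are tagged with different languages.
        System.out.println(formatDate(date, "d MMMM yyyy", "de-DE", Source.FIELD_CODE));
        System.out.println(formatDate(date, "d MMMM yyyy", "fr-FR", Source.FIELD_CODE));
    }
}
```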

Inserting Fields into a Document

In Aspose.Words, the DocumentBuilder.InsertField method is used to insert new fields into a document. The
first parameter accepts the full field code of the field to be inserted. The second parameter is optional and
allows the field result to be set manually. If it is not supplied, the field is updated
automatically. You can pass null or an empty string to this parameter to insert a field with an empty field result.

If your field code has a parameter containing a space, the parameter must be enclosed in quotation marks.
Otherwise the field may not work as expected in either Microsoft Word or Aspose.Words, because both
treat the parameter as truncated at the first space.
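The quoting rule can be seen with a small tokenizer that splits a field code at whitespace while keeping quoted text together. FieldCodeTokenizer is a hypothetical sketch, and its treatment of \" and \\ inside quotes as escapes is an assumption; it is not the parser used by Word or Aspose.Words.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldCodeTokenizer {
    /** Splits a field code into arguments at whitespace, keeping "quoted text" together. */
    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false; // distinguishes an empty "" argument from no argument
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (inQuotes) {
                boolean escape = c == '\\' && i + 1 < code.length()
                        && (code.charAt(i + 1) == '"' || code.charAt(i + 1) == '\\');
                if (escape) {
                    current.append(code.charAt(++i)); // keep the escaped character only
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    current.append(c); // spaces inside quotes are part of the argument
                }
            } else if (c == '"') {
                inQuotes = true;
                hasToken = true;
            } else if (Character.isWhitespace(c)) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (inQuotes) throw new IllegalArgumentException("unterminated quoted argument");
        if (hasToken) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("MERGEFIELD \"First Name\" \\* MERGEFORMAT"));
        System.out.println(tokenize("MERGEFIELD First Name \\* MERGEFORMAT"));
    }
}
```

With the unquoted call, the field name comes out as First, with Name left over as a separate argument, which is the truncation described above.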
Example
Inserts a merge field into a document using DocumentBuilder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.insertField("MERGEFIELD MyFieldName \\* MERGEFORMAT");


The same technique is used to insert fields nested within other fields.
Example
Demonstrates how to insert fields nested within another field using DocumentBuilder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert a few page breaks (just for testing).
for (int i = 0; i < 5; i++)
builder.insertBreak(BreakType.PAGE_BREAK);

// Move DocumentBuilder cursor into the primary footer.
builder.moveToHeaderFooter(HeaderFooterType.FOOTER_PRIMARY);

// We want to insert a field like this:
// { IF {PAGE} <> {NUMPAGES} "See Next Page" "Last Page" }
Field field = builder.insertField("IF ");
builder.moveTo(field.getSeparator());
builder.insertField("PAGE");
builder.write(" <> ");
builder.insertField("NUMPAGES");
builder.write(" \"See Next Page\" \"Last Page\" ");

// Finally, update the outer field to recalculate the final value. Doing this automatically updates
// the inner fields at the same time.
field.update();


Finding the Field Code and Field Result

A field that is inserted using DocumentBuilder.InsertField returns a Field object. This is a facade class that
provides useful methods to quickly find properties of a field, such as its field code and field result. Note that if
you are only looking for the names of merge fields in the document, you can instead use the built-in method
MailMerge.GetFieldNames.
Example
Shows how to get names of all merge fields in a document.
Java
String[] fieldNames = doc.getMailMerge().getFieldNames();
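MailMerge.GetFieldNames works on a whole document. To show what reading a merge field's name from its field code involves, the sketch below extracts names from a list of field-code strings with a regular expression. MergeFieldNames is hypothetical and handles only simple forms (a bare name or a quoted name after MERGEFIELD); it does not handle escaped quotes or nested fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MergeFieldNames {
    // MERGEFIELD followed by either a "quoted name" or a bare name up to the next whitespace.
    private static final Pattern MERGEFIELD =
            Pattern.compile("^\\s*MERGEFIELD\\s+(?:\"([^\"]*)\"|(\\S+))", Pattern.CASE_INSENSITIVE);

    /** Collects merge field names from field-code strings, skipping other field types. */
    static List<String> namesFrom(List<String> fieldCodes) {
        List<String> names = new ArrayList<>();
        for (String code : fieldCodes) {
            Matcher m = MERGEFIELD.matcher(code);
            if (m.find()) {
                names.add(m.group(1) != null ? m.group(1) : m.group(2));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesFrom(Arrays.asList(
                "MERGEFIELD FirstName \\* MERGEFORMAT", " PAGE ", "MERGEFIELD \"Last Name\"")));
    }
}
```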

Removing a Field

Sometimes it is necessary to remove a field from the document. This may occur when it is to be replaced with a
different field type or when the field is no longer needed in the document, for example a TOC field when
saving to HTML.
A field inserted into the document using DocumentBuilder.InsertField returns a Field object which
provides a convenience method to easily remove the field from the document.
Example
Removes a field from the document.
Java
Field field = builder.insertField("PAGE");
// Calling this method completely removes the field from the document.
field.remove();

Form Fields Overview
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

A document that contains fill-in blanks (fields) is known as a form. For example, you can create a registration
form in Microsoft Word that uses drop-down lists from which users can select entries. A form field is a location
where a particular type of data, such as a name or address, is stored. Form fields in Microsoft Word include
text input, combo box and check box fields.

You can use form fields in your project to "communicate" with your users. For example, you can create a document
whose content is protected except for the form fields, which remain editable. The users can enter data in the form
fields and submit the document. Your application that uses Aspose.Words can then retrieve the data from the form
fields and process it.
Form Fields in Microsoft Word
Inserting Form Fields in Microsoft Word

Use the Forms toolbar to insert form fields. To display the Forms toolbar, point to Toolbars on the View menu,
and then click Forms.


1. In the document, click where you want to insert the form field.
2. Do any of the following:
o To insert a text input where users can enter text, click Text Form Field. You can specify a
default entry so that users do not have to type an entry unless they want to change the
response.
o To insert a check box that the user can select or clear, click Check Box Form Field.
o To insert a drop-down list box that restricts available choices to those you specify, click
Drop-Down Form Field. If needed, a user can scroll through the list to view additional
choices.



Note: before you can make a form available to users, you must protect it by clicking Protect Form on the
Forms toolbar. Protection allows users to fill in the form but prevents them from changing the form's layout
and its standard elements. When you want to go back to writing or modifying the form, click Protect Form
again to unprotect the form.
Deleting Form Fields in Microsoft Word
Simply select a form field and press DELETE.
Form Fields in Aspose.Words

Placing form fields into the document via code is easy. DocumentBuilder has special methods for inserting
them, one for each form field type.

Each of the methods accepts a string parameter representing the name of the form field. The name can be an
empty string. If, however, you specify a name for the form field, a bookmark is automatically created with
the same name.
Inserting Form Fields
Use DocumentBuilder.InsertTextInput, DocumentBuilder.InsertCheckBox or
DocumentBuilder.InsertComboBox to insert form fields into a document.
Example
Shows how to insert a combobox form field into a document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

String[] items = {"One", "Two", "Three"};
builder.insertComboBox("DropDown", items, 0);

Obtaining Form Fields
A collection of form fields is represented by the FormFieldCollection class that can be retrieved using
the Range.FormFields property. This means that you can obtain form fields contained in any document
node including the document itself.
Example
Shows how to get a collection of form fields.
Java
Document doc = new Document(getMyDir() + "FormFields.doc");
FormFieldCollection formFields = doc.getRange().getFormFields();


You can get a particular form field by its index or name.
Example
Shows how to access form fields.
Java
Document doc = new Document(getMyDir() + "FormFields.doc");
FormFieldCollection documentFormFields = doc.getRange().getFormFields();

FormField formField1 = documentFormFields.get(3);
FormField formField2 = documentFormFields.get("CustomerName");


The FormField properties allow you to work with form field name, type, and result.
Example
Shows how to work with form field name, type, and result.
Java
Document doc = new Document(getMyDir() + "FormFields.doc");

FormField formField = doc.getRange().getFormFields().get(3);

if (formField.getType() == FieldType.FIELD_FORM_TEXT_INPUT)
formField.setResult("My name is " + formField.getName());

DocumentBuilder Overview

This section describes how to use the DocumentBuilder class to easily generate documents or insert rich
content and formatting.

DocumentBuilder is a powerful class that is associated with a Document and allows dynamic document
building from scratch or the addition of new elements to an existing document. It provides methods to insert
text, paragraphs, lists, tables, images and other content, and to specify font, paragraph, and section
formatting. Using DocumentBuilder is somewhat similar in concept to using the StringBuilder
class of the Java platform.
DocumentBuilder complements classes and methods available in the Aspose.Words Document
Object Model by simplifying most common document building tasks, such as inserting text, tables,
fields and hyperlinks.
Everything that is possible with DocumentBuilder is also possible when using the classes of the
Aspose.Words Document Object Model directly, but using Aspose.Words DOM classes directly
usually requires more lines of code than using DocumentBuilder.
DocumentBuilder has an internal cursor that you can navigate to a different location in a document
using various DocumentBuilder.MoveToXXX methods such as
DocumentBuilder.MoveToDocumentStart and DocumentBuilder.MoveToField.
You can insert text, images, bookmarks, form fields, and other document elements at the cursor
position using any of the DocumentBuilder.InsertXXX methods such as DocumentBuilder.InsertField,
DocumentBuilder.InsertHtml and other similar methods.
Aspose.Words API provides several classes responsible for different document elements formatting.
Each of the classes encapsulates a number of formatting properties related to a particular document
element such as text, paragraph, section, and so on. For example, the Font class represents character
formatting properties, the ParagraphFormat class represents paragraph formatting properties etc.
The objects of these classes are returned by the corresponding DocumentBuilder properties (that have
the same names as the classes) so you can access them and set desired formatting during the document
build.
To start, you need to create a DocumentBuilder and associate it with a Document object.
Create a new instance of DocumentBuilder by calling its constructor and pass to it a Document
object for attachment to the builder.
Example
Shows how to create a simple document using a document builder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.write("Hello World!");

Inserting Document Elements
Inserting a String of Text

Simply pass the string of text you need to insert into the document to the DocumentBuilder.Write method.
Text formatting is determined by the Font property. This object contains different font attributes (font name,
font size, color, and so on).

Some important font attributes are also represented by DocumentBuilder properties to allow you to access
them directly. These are the boolean properties DocumentBuilder.Bold and DocumentBuilder.Italic, and the
DocumentBuilder.Underline property.

Note that the character formatting you set will apply to all text inserted from the current position in the
document onwards.
Example
Inserts formatted text using DocumentBuilder.
Java
DocumentBuilder builder = new DocumentBuilder();

// Specify font formatting before adding text.
Font font = builder.getFont();
font.setSize(16);
font.setBold(true);
font.setColor(Color.BLUE);
font.setName("Arial");
font.setUnderline(Underline.DASH);

builder.write("Sample text.");

Inserting a Paragraph
DocumentBuilder.Writeln also inserts a string of text into the document, but in addition it adds a
paragraph break. Current font formatting is specified by the DocumentBuilder.Font property and
current paragraph formatting is determined by the DocumentBuilder.ParagraphFormat property.
Example
Shows how to insert a paragraph into the document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Specify font formatting
Font font = builder.getFont();
font.setSize(16);
font.setBold(true);
font.setColor(Color.BLUE);
font.setName("Arial");
font.setUnderline(Underline.DASH);

// Specify paragraph formatting
ParagraphFormat paragraphFormat = builder.getParagraphFormat();
paragraphFormat.setFirstLineIndent(8);
paragraphFormat.setAlignment(ParagraphAlignment.JUSTIFY);
paragraphFormat.setKeepTogether(true);

builder.writeln("A whole paragraph.");

Inserting a Table
The basic algorithm to create a table using DocumentBuilder is simple:
1. Start the table using DocumentBuilder.StartTable.
2. Insert a cell using DocumentBuilder.InsertCell. This automatically starts a new row. If needed, use the
DocumentBuilder.CellFormat property to specify cell formatting.
3. Insert cell contents using the DocumentBuilder methods.
4. Repeat steps 2 and 3 until the row is complete.
5. Call DocumentBuilder.EndRow to end the current row. If needed, use DocumentBuilder.RowFormat
property to specify row formatting.
6. Repeat steps 2 - 5 until the table is complete.
7. Call DocumentBuilder.EndTable to finish the table building.
The appropriate DocumentBuilder table creation methods are described below.
Starting a Table
Calling DocumentBuilder.StartTable is the first step in building a table. It can also be called inside a
cell, in which case it starts a nested table. The next method to call is DocumentBuilder.InsertCell.
Inserting a Cell
After you call DocumentBuilder.InsertCell, a new cell is created and any content you add using other
methods of the DocumentBuilder class will be added to the current cell. To start a new cell in the same
row, call DocumentBuilder.InsertCell again.
Use the DocumentBuilder.CellFormat property to specify cell formatting. It returns a CellFormat
object that represents all formatting for a table cell.
Ending a Row
Call DocumentBuilder.EndRow to finish the current row. If you call DocumentBuilder.InsertCell
immediately after that, then the table continues on a new row.
Use the DocumentBuilder.RowFormat property to specify row formatting. It returns a RowFormat
object that represents all formatting for a table row.
Ending a Table
Call DocumentBuilder.EndTable to finish the current table. This method should be called only once,
after DocumentBuilder.EndRow has been called. When called, DocumentBuilder.EndTable moves the
cursor out of the current cell to a position just after the table.
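The call order described above can be modeled as a small state machine. The sketch below is a hypothetical stand-in (TableCallSequence is not an Aspose.Words class) that records only cell text and throws if the methods are called out of order, for example if endTable() is called while a row is still open. For brevity, its insertCell method combines inserting a cell and writing its contents, and it does not model nested tables.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the table call order: startTable, (insertCell+, endRow)+, endTable. */
final class TableCallSequence {
    private final List<List<String>> rows = new ArrayList<>();
    private List<String> currentRow; // null when no row is open
    private boolean inTable;

    void startTable() {
        if (inTable) throw new IllegalStateException("nested tables are not modeled here");
        inTable = true;
    }

    /** Combines "insert a cell" and "write its contents"; the first cell opens a new row. */
    void insertCell(String contents) {
        if (!inTable) throw new IllegalStateException("call startTable() first");
        if (currentRow == null) currentRow = new ArrayList<>();
        currentRow.add(contents);
    }

    void endRow() {
        if (currentRow == null) throw new IllegalStateException("endRow() called with no open row");
        rows.add(currentRow);
        currentRow = null;
    }

    List<List<String>> endTable() {
        if (currentRow != null) throw new IllegalStateException("call endRow() before endTable()");
        if (rows.isEmpty()) throw new IllegalStateException("a table needs at least one row");
        inTable = false;
        return rows;
    }

    public static void main(String[] args) {
        TableCallSequence t = new TableCallSequence();
        t.startTable();
        t.insertCell("This is row 1 cell 1");
        t.insertCell("This is row 1 cell 2");
        t.endRow();
        t.insertCell("This is row 2 cell 1");
        t.insertCell("This is row 2 cell 2");
        t.endRow();
        System.out.println(t.endTable());
    }
}
```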
The following example demonstrates how to build a formatted table that contains 2 rows and 2 columns.
Example
Shows how to build a formatted table that contains 2 rows and 2 columns.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Table table = builder.startTable();

// Insert a cell
builder.insertCell();
// Use fixed column widths.
table.autoFit(AutoFitBehavior.FIXED_COLUMN_WIDTHS);

builder.getCellFormat().setVerticalAlignment(CellVerticalAlignment.CENTER);
builder.write("This is row 1 cell 1");

// Insert a cell
builder.insertCell();
builder.write("This is row 1 cell 2");

builder.endRow();

// Insert a cell
builder.insertCell();

// Apply new row formatting
builder.getRowFormat().setHeight(100);
builder.getRowFormat().setHeightRule(HeightRule.EXACTLY);

builder.getCellFormat().setOrientation(TextOrientation.UPWARD);
builder.writeln("This is row 2 cell 1");

// Insert a cell
builder.insertCell();
builder.getCellFormat().setOrientation(TextOrientation.DOWNWARD);
builder.writeln("This is row 2 cell 2");

builder.endRow();

builder.endTable();

Inserting a Break
If you want to explicitly start a new line, paragraph, column, section, or page, call
DocumentBuilder.InsertBreak. Pass to this method the type of the break you need to insert that is
represented by the BreakType enumeration.
Example
Shows how to insert page breaks into a document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.writeln("This is page 1.");
builder.insertBreak(BreakType.PAGE_BREAK);

builder.writeln("This is page 2.");
builder.insertBreak(BreakType.PAGE_BREAK);

builder.writeln("This is page 3.");

Inserting an Image
DocumentBuilder provides several overloads of the DocumentBuilder.InsertImage method that allow
you to insert an inline or floating image. If the image is an EMF or WMF metafile, it will be inserted into
the document in metafile format. All other images will be stored in PNG format.

The DocumentBuilder.InsertImage method can use images from different sources:
From a file or URL by passing a String parameter: DocumentBuilder.InsertImage(String)
From a stream by passing an InputStream parameter: DocumentBuilder.InsertImage(InputStream)
From a BufferedImage object by passing a BufferedImage parameter: DocumentBuilder.InsertImage(BufferedImage)
From a byte array by passing a byte array parameter: DocumentBuilder.InsertImage(byte[])
For each of the DocumentBuilder.InsertImage methods listed above, there are further overloads which
allow you to insert an image with the following options:
Inline or floating at a specific position, e.g.
DocumentBuilder.InsertImage(String, RelativeHorizontalPosition, double, RelativeVerticalPosition,
double, double, double, WrapType)
Percentage scale or custom size, e.g. DocumentBuilder.InsertImage(InputStream, double, double)
Furthermore the DocumentBuilder.InsertImage method returns a Shape object that was just created and
inserted so you can further modify properties of the Shape .
Inserting an Inline Image
Pass a single string representing a file that contains the image to DocumentBuilder.InsertImage to insert
the image into the document as an inline graphic.
Example
Shows how to insert an inline image at the cursor position into a document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertImage(getMyDir() + "Watermark.png");

Inserting a Floating (Absolutely Positioned) Image
This example inserts a floating image from a file or URL at a specified position and size.
Example
Shows how to insert a floating image from a file or URL.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertImage(getMyDir() + "Watermark.png",
RelativeHorizontalPosition.MARGIN,
100,
RelativeVerticalPosition.MARGIN,
100,
200,
100,
WrapType.SQUARE);

Inserting a Bookmark
To insert a bookmark into the document, you should do the following:
1. Call DocumentBuilder.StartBookmark passing it the desired name of the bookmark.
2. Insert the bookmark text using DocumentBuilder methods.
3. Call DocumentBuilder.EndBookmark passing it the same name that you used with
DocumentBuilder.StartBookmark .
Bookmarks can overlap and span any range. To create a valid bookmark you need to call both
DocumentBuilder.StartBookmark and DocumentBuilder.EndBookmark with the same bookmark
name.

Badly formed bookmarks or bookmarks with duplicate names will be ignored when the document is saved.
Example
Shows how to insert a bookmark into a document using a document builder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.startBookmark("FineBookmark");
builder.writeln("This is just a fine bookmark.");
builder.endBookmark("FineBookmark");

Inserting a Field
Fields in Microsoft Word documents consist of a field code and a field result. The field code is like a
formula and the field result is the value that the formula produces. The field code may also contain field
switches that are additional instructions to perform a specific action.

You can switch between displaying field codes and results in your document in Microsoft Word using the
keyboard shortcut Alt+F9. Field codes appear between curly braces ( { } ).
Use DocumentBuilder.InsertField to create fields in the document. You need to specify the field code
and, optionally, the field result. If you are not sure about the syntax of a particular field code, create the
field in Microsoft Word first and switch to see its field code.
Example
Inserts a merge field into a document using DocumentBuilder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.insertField("MERGEFIELD MyFieldName \\* MERGEFORMAT");

Inserting a Form Field
Form fields are a particular case of Word fields that allow "interaction" with the user. Form fields in
Microsoft Word include text box, combo box and check box fields.

DocumentBuilder provides special methods to insert each type of form field into the document:
DocumentBuilder.InsertTextInput, DocumentBuilder.InsertCheckBox, and
DocumentBuilder.InsertComboBox. Note that if you specify a name for the form field, then a
bookmark is automatically created with the same name.
Inserting a Text Input
Call DocumentBuilder.InsertTextInput to insert a textbox into the document.
Example
Shows how to insert a text input form field into a document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertTextInput("TextInput", TextFormFieldType.REGULAR, "", "Hello", 0);

Inserting a Check Box
Call DocumentBuilder.InsertCheckBox to insert a checkbox into the document.
Example
Shows how to insert a checkbox form field into a document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertCheckBox("CheckBox", true, 0);

Inserting a Combo Box
Call DocumentBuilder.InsertComboBox to insert a combobox into the document.
Example
Shows how to insert a combobox form field into a document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

String[] items = {"One", "Two", "Three"};
builder.insertComboBox("DropDown", items, 0);

Inserting HTML
You can easily insert an HTML string that contains an HTML fragment or a whole HTML document into
the Word document. Just pass this string to the DocumentBuilder.InsertHtml method. One useful
application of this method is to store an HTML string in a database and insert it into the document
during mail merge, so that formatted content is added without building it with the various methods of
DocumentBuilder.
Example
Inserts HTML into a document using DocumentBuilder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.insertHtml(
"<P align='right'>Paragraph right</P>" +
"<b>Implicit paragraph left</b>" +
"<div align='center'>Div center</div>" +
"<h1 align='left'>Heading 1 left.</h1>");

doc.save(getMyDir() + "DocumentBuilder.InsertHtml Out.doc");

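Inserting a Hyperlink
Use DocumentBuilder.InsertHyperlink to insert a hyperlink into the document. The method accepts the text to
display, the link destination (a URL or the name of a bookmark inside the document), and a boolean that is
true when the destination is a bookmark name. DocumentBuilder.InsertHyperlink internally calls
DocumentBuilder.InsertField to insert a HYPERLINK field.
The method always adds quotation marks at the beginning and end of the URL.
Note that you need to specify font formatting for the hyperlink display text explicitly using the Font
property.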
Example
Inserts a hyperlink into a document using DocumentBuilder.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.write("Please make sure to visit ");

// Specify font formatting for the hyperlink.
builder.getFont().setColor(Color.BLUE);
builder.getFont().setUnderline(Underline.SINGLE);
// Insert the link.
builder.insertHyperlink("Aspose Website", "http://www.aspose.com", false);

// Revert to default formatting.
builder.getFont().clearFormatting();

builder.write(" for more information.");

doc.save(getMyDir() + "DocumentBuilder.InsertHyperlink Out.doc");
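To picture what the inserted field contains, the following plain-Java sketch builds the field code text of a Word HYPERLINK field. It is illustrative only: `hyperlinkFieldCode` is a hypothetical helper, not Aspose.Words code. It assumes the standard Word field syntax, in which the target is enclosed in quotation marks and a link to a bookmark uses the `\l` switch.

```java
/** Illustrates Word HYPERLINK field code text; a hypothetical helper, not Aspose.Words code. */
final class HyperlinkCodes {
    private HyperlinkCodes() {}

    static String hyperlinkFieldCode(String target, boolean isBookmark) {
        // The target is always wrapped in quotation marks.
        String quoted = "\"" + target + "\"";
        // In Word's HYPERLINK field, the \l switch points to a location (bookmark) in the document.
        return isBookmark ? "HYPERLINK \\l " + quoted : "HYPERLINK " + quoted;
    }

    public static void main(String[] args) {
        System.out.println(hyperlinkFieldCode("http://www.aspose.com", false)); // HYPERLINK "http://www.aspose.com"
        System.out.println(hyperlinkFieldCode("CoolBookmark", true));           // HYPERLINK \l "CoolBookmark"
    }
}
```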

Inserting a Table of Contents
The DocumentBuilder.InsertTableOfContents method only inserts a TOC field into the document.
To build the table of contents and display the corresponding page numbers, the
Document.UpdateFields method must be called after the field is inserted.
Example
Shows how to insert a Table of Contents field into a document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert a table of contents at the beginning of the document.
builder.insertTableOfContents("\\o \"1-3\" \\h \\z \\u");

// The newly inserted table of contents will be initially empty.
// It needs to be populated by updating the fields in the document.
doc.updateFields();
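The switch string passed to insertTableOfContents uses Word's TOC field syntax. `\o "1-3"` builds the table from paragraphs formatted with the built-in heading styles Heading 1 to Heading 3, `\h` makes the entries hyperlinks, `\z` hides tab leaders and page numbers in Web layout view, and `\u` uses the outline level applied to paragraphs. The plain-Java sketch below assembles that string from a few options; `tocSwitches` is a hypothetical helper, not part of Aspose.Words.

```java
/** Assembles TOC field switches; a hypothetical helper, not part of Aspose.Words. */
final class TocSwitches {
    private TocSwitches() {}

    static String tocSwitches(int fromLevel, int toLevel, boolean hyperlinks,
                              boolean hideInWebLayout, boolean useOutlineLevels) {
        if (fromLevel < 1 || toLevel > 9 || fromLevel > toLevel) {
            throw new IllegalArgumentException("Heading levels must satisfy 1 <= from <= to <= 9");
        }
        StringBuilder sb = new StringBuilder();
        // \o "from-to": include headings from this range of levels.
        sb.append("\\o \"").append(fromLevel).append('-').append(toLevel).append('"');
        if (hyperlinks) sb.append(" \\h");        // entries are hyperlinks
        if (hideInWebLayout) sb.append(" \\z");   // hide tab leader and page numbers in Web layout view
        if (useOutlineLevels) sb.append(" \\u");  // use the applied paragraph outline level
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: \o "1-3" \h \z \u
        System.out.println(tocSwitches(1, 3, true, true, true));
    }
}
```

The result equals the literal used in the example, so it could be passed to builder.insertTableOfContents unchanged.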

Specifying Formatting
Added by hammad, last edited by Adam Skelton on Feb 03, 2012
Font Formatting

Current font formatting is represented by a Font object returned by the DocumentBuilder.Font property. The
Font class contains a wide variety of the font properties available in Microsoft Word.

Example
Shows how to set font formatting.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Set font formatting properties
Font font = builder.getFont();
font.setBold(true);
font.setColor(Color.BLUE);
font.setItalic(true);
font.setName("Arial");
font.setSize(24);
font.setSpacing(5);
font.setUnderline(Underline.DOUBLE);

// Output formatted text
builder.writeln("I'm a very nice formatted string.");

Paragraph Formatting
Current paragraph formatting is represented by a ParagraphFormat object that is returned by the
DocumentBuilder.ParagraphFormat property. This object encapsulates various paragraph formatting
properties available in Microsoft Word.

You can reset the paragraph formatting to its defaults (Normal style, left aligned, no indentation, no
spacing, no borders and no shading) by calling ParagraphFormat.ClearFormatting.
Example
Shows how to set paragraph formatting.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Set paragraph formatting properties
ParagraphFormat paragraphFormat = builder.getParagraphFormat();
paragraphFormat.setAlignment(ParagraphAlignment.CENTER);
paragraphFormat.setLeftIndent(50);
paragraphFormat.setRightIndent(50);
paragraphFormat.setSpaceAfter(25);

// Output text
builder.writeln("I'm a very nice formatted paragraph. I'm intended to demonstrate how " +
"the left and right indents affect word wrapping.");
builder.writeln("I'm another nice formatted paragraph. I'm intended to demonstrate what " +
"the space after a paragraph looks like.");

Cell Formatting
Cell formatting is used while building a table. It is represented by a CellFormat object returned by
the DocumentBuilder.CellFormat property. CellFormat encapsulates various table cell properties such as
width and vertical alignment.

Example
Shows how to create a table that contains a single formatted cell.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.startTable();
builder.insertCell();

// Set the cell formatting
CellFormat cellFormat = builder.getCellFormat();
cellFormat.setWidth(250);
cellFormat.setLeftPadding(30);
cellFormat.setRightPadding(30);
cellFormat.setTopPadding(30);
cellFormat.setBottomPadding(30);

builder.writeln("I'm a wonderful formatted cell.");

builder.endRow();
builder.endTable();

Row Formatting
Current row formatting is determined by a RowFormat object that is returned by the
DocumentBuilder.RowFormat property. The object encapsulates information about all table row
formatting.

Example
Shows how to create a table that contains a single cell and apply row formatting.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Table table = builder.startTable();
builder.insertCell();

// Set the row formatting
RowFormat rowFormat = builder.getRowFormat();
rowFormat.setHeight(100);
rowFormat.setHeightRule(HeightRule.EXACTLY);
// These formatting properties are set on the table and are applied to all rows in the table.
table.setLeftPadding(30);
table.setRightPadding(30);
table.setTopPadding(30);
table.setBottomPadding(30);

builder.writeln("I'm a wonderful formatted row.");

builder.endRow();
builder.endTable();

List Formatting
Aspose.Words allows the easy creation of lists by applying list formatting. DocumentBuilder provides
the DocumentBuilder.ListFormat property that returns a ListFormat object. This object has several
methods to start and end a list and to increase/decrease the indent.

There are two general types of lists in Microsoft Word: bulleted and numbered.
To start a bulleted list, call ListFormat.ApplyBulletDefault.
To start a numbered list, call ListFormat.ApplyNumberDefault.
The bullet or number and formatting are added to the current paragraph and all further paragraphs created
using DocumentBuilder until ListFormat.RemoveNumbers is called to stop list formatting.
In Word documents, lists may consist of up to nine levels. List formatting for each level specifies what
bullet or number is used, the left indent, the space between the bullet and text, and so on.
To increase the list level of the current paragraph by one level, call ListFormat.ListIndent.
To decrease the list level of the current paragraph by one level, call ListFormat.ListOutdent.
The methods change the list level and apply the formatting properties of the new level.

You can also use the ListFormat.ListLevelNumber property to get or set the list level for the paragraph. The list
levels are numbered 0 to 8.
Example
Shows how to build a multilevel list.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

builder.getListFormat().applyNumberDefault();

builder.writeln("Item 1");
builder.writeln("Item 2");

builder.getListFormat().listIndent();

builder.writeln("Item 2.1");
builder.writeln("Item 2.2");

builder.getListFormat().listIndent();

builder.writeln("Item 2.2.1");
builder.writeln("Item 2.2.2");

builder.getListFormat().listOutdent();

builder.writeln("Item 2.3");

builder.getListFormat().listOutdent();

builder.writeln("Item 3");

builder.getListFormat().removeNumbers();
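The effect of listIndent and listOutdent in the example above can be traced with a small plain-Java model that only records the zero-based list level (0 to 8) of each paragraph. `ListLevelModel` is hypothetical and does not use Aspose.Words. In particular, clamping at levels 0 and 8 is an assumption of the model, not documented behavior of ListFormat.

```java
import java.util.ArrayList;
import java.util.List;

/** Tracks list levels the way the multilevel example changes them; a hypothetical model. */
final class ListLevelModel {
    static final int MIN_LEVEL = 0;
    static final int MAX_LEVEL = 8; // Word lists have up to nine levels, numbered 0 to 8

    private int level = MIN_LEVEL;
    private final List<String> lines = new ArrayList<>();

    // Clamping at the limits is an assumption of this model.
    void listIndent()  { level = Math.min(level + 1, MAX_LEVEL); }
    void listOutdent() { level = Math.max(level - 1, MIN_LEVEL); }

    // Records the paragraph text prefixed with its current level.
    void writeln(String text) { lines.add(level + ": " + text); }

    List<String> lines() { return lines; }

    public static void main(String[] args) {
        // Replays the calls from the multilevel list example.
        ListLevelModel m = new ListLevelModel();
        m.writeln("Item 1");
        m.writeln("Item 2");
        m.listIndent();
        m.writeln("Item 2.1");
        m.writeln("Item 2.2");
        m.listIndent();
        m.writeln("Item 2.2.1");
        m.writeln("Item 2.2.2");
        m.listOutdent();
        m.writeln("Item 2.3");
        m.listOutdent();
        m.writeln("Item 3");
        m.lines().forEach(System.out::println);
    }
}
```

The printed levels (0 for items 1 to 3, 1 for items 2.x, 2 for items 2.2.x) match the values that ListFormat.ListLevelNumber would report for those paragraphs in the example.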

Page Setup and Section Formatting
Page setup and section properties are encapsulated in the PageSetup object that is returned by the
DocumentBuilder.PageSetup property. The object contains all the page setup attributes of a section (left
margin, bottom margin, paper size, and so on) as properties.


Example
Shows how to set such properties as page size and orientation for the current section.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Set page properties
builder.getPageSetup().setOrientation(Orientation.LANDSCAPE);
builder.getPageSetup().setLeftMargin(50);
builder.getPageSetup().setPaperSize(PaperSize.PAPER_10_X_14);
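Measurements such as margins, indents, padding and font sizes in these examples are expressed in points (1 point = 1/72 inch), so setLeftMargin(50) sets a left margin of roughly 0.69 inch. When working from inches or centimeters, a conversion like the plain-Java sketch below can help; `Points` is a hypothetical helper, not part of Aspose.Words.

```java
/** Converts common units to and from points (1 inch = 72 points); a hypothetical helper. */
final class Points {
    static final double POINTS_PER_INCH = 72.0;
    static final double CM_PER_INCH = 2.54;

    private Points() {}

    static double fromInches(double inches) { return inches * POINTS_PER_INCH; }

    static double fromCentimeters(double cm) { return cm / CM_PER_INCH * POINTS_PER_INCH; }

    static double toInches(double points) { return points / POINTS_PER_INCH; }

    public static void main(String[] args) {
        System.out.println(fromInches(1.0));                        // 72.0
        System.out.println(Math.round(toInches(50) * 100) / 100.0); // 0.69
    }
}
```

For example, a one-inch left margin could be set with builder.getPageSetup().setLeftMargin(Points.fromInches(1.0)).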

Applying a Style
Some formatting objects like Font or ParagraphFormat support styles. A single built-in or user-defined
style is represented by a Style object that contains the corresponding style properties like name, base
style, font and paragraph formatting of the style, and so on.

Furthermore, a Style object provides the Style.StyleIdentifier property that returns a locale-independent
style identifier represented by a StyleIdentifier enumeration value. This matters because the names of
built-in styles in Microsoft Word are localized for different languages. Using a style identifier, you can
find the correct style regardless of the document language. The enumeration values correspond to the
Microsoft Word built-in styles such as Normal, Heading 1, Heading 2, etc. All user-defined styles are
assigned the StyleIdentifier.USER value.


Example
Shows how to apply a paragraph style.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Set paragraph style
builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.TITLE);

builder.write("Hello");

Borders and Shading
Borders are represented by the BorderCollection. This is a collection of Border objects that are accessed
by index or by border type. Border type is represented by the BorderType enumeration. Some values of
the enumeration apply to several document elements, others to only one. For example,
BorderType.BOTTOM applies to a paragraph or table cell, while BorderType.DIAGONAL_DOWN
specifies the diagonal border in a table cell only.

Both the border collection and each separate border have similar attributes like color, line style, line
width, distance from text, and optional shadow. They are represented by properties of the same name.
You can achieve different border types by combining the property values. In addition, both
BorderCollection and Border objects allow you to reset these values to default by calling their
ClearFormatting method. Note that when border properties are reset to default values, the
border is invisible.


The Shading class contains shading attributes for document elements. You can set the desired shading
texture and the colors that are applied to the background and foreground of the element.
The shading texture is set with a TextureIndex enumeration value that allows the application of various
patterns to the Shading object. For example, to set a background color for a document element, use the
TextureIndex.TEXTURE_SOLID value and set the foreground shading color as appropriate.


Example
Shows how to apply borders and shading to a paragraph.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Set paragraph borders
BorderCollection borders = builder.getParagraphFormat().getBorders();
borders.setDistanceFromText(20);
borders.getByBorderType(BorderType.LEFT).setLineStyle(LineStyle.DOUBLE);
borders.getByBorderType(BorderType.RIGHT).setLineStyle(LineStyle.DOUBLE);
borders.getByBorderType(BorderType.TOP).setLineStyle(LineStyle.DOUBLE);
borders.getByBorderType(BorderType.BOTTOM).setLineStyle(LineStyle.DOUBLE);

// Set paragraph shading
Shading shading = builder.getParagraphFormat().getShading();
shading.setTexture(TextureIndex.TEXTURE_DIAGONAL_CROSS);
shading.setBackgroundPatternColor(new Color(240, 128, 128)); // Light Coral
shading.setForegroundPatternColor(new Color(255, 160, 122)); // Light Salmon

builder.write("I'm a formatted paragraph with double border and nice shading.");

Moving the Cursor
Detecting the Current Cursor Position

You can find out where the builder's cursor is positioned at any time. The
DocumentBuilder.CurrentNode property returns the node that is currently selected in this builder. The node is
a direct child of a paragraph. Any insert operations you perform using DocumentBuilder insert content before
DocumentBuilder.CurrentNode. When the current paragraph is empty or the cursor is positioned just before
the end of the paragraph, DocumentBuilder.CurrentNode returns null.

You can also use the DocumentBuilder.CurrentParagraph property, which gets the paragraph that is currently
selected in this DocumentBuilder.
Example
Shows how to access the current node in a document builder.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

Node curNode = builder.getCurrentNode();
Paragraph curParagraph = builder.getCurrentParagraph();

Moving to Any Node (Paragraphs and their Children)
If you have a document object node, which is a paragraph or a direct child of a paragraph, you can point
the builder's cursor to this node. Use the DocumentBuilder.MoveTo method to perform this.
Example
Shows how to move a cursor position to a specified node.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

builder.moveTo(doc.getFirstSection().getBody().getLastParagraph());

Moving to the Document Start/End
If you need to move to the beginning of the document, call DocumentBuilder.MoveToDocumentStart.
If you need to move to the end of the document, call DocumentBuilder.MoveToDocumentEnd.
Example
Shows how to move a cursor position to the beginning or end of a document.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

builder.moveToDocumentEnd();
builder.writeln("This is the end of the document.");

builder.moveToDocumentStart();
builder.writeln("This is the beginning of the document.");

Moving to a Section
If you are working with a document that contains multiple sections, you can move to a desired section
using DocumentBuilder.MoveToSection. This method moves the cursor to the beginning of a specified
section and accepts the index of the required section. When the section index is greater than or equal to 0,
it specifies an index from the beginning of the document with 0 being the first section. When the section
index is less than 0, it specifies an index from the end of the document with -1 being the last section.
Example
Shows how to move a cursor position to the specified section.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

// Parameters are 0-index. Moves to third section.
builder.moveToSection(2);
builder.writeln("This is the 3rd section.");
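The index rule for MoveToSection (the same convention applies to paragraph and cell indices) maps a possibly negative index onto a zero-based position counted from the start. The plain-Java sketch below expresses that rule; `resolveIndex` is a hypothetical helper. Throwing on out-of-range values is an assumption of the sketch, not documented Aspose.Words behavior.

```java
/** Resolves zero-based or negative (from-the-end) indices; a hypothetical helper. */
final class IndexRule {
    private IndexRule() {}

    /**
     * index >= 0 counts from the beginning (0 = first element);
     * index < 0 counts from the end (-1 = last element).
     */
    static int resolveIndex(int index, int count) {
        int resolved = index >= 0 ? index : count + index;
        if (resolved < 0 || resolved >= count) {
            throw new IndexOutOfBoundsException("index " + index + " for " + count + " elements");
        }
        return resolved;
    }

    public static void main(String[] args) {
        int sections = 3;
        System.out.println(resolveIndex(2, sections));  // 2 (third section)
        System.out.println(resolveIndex(-1, sections)); // 2 (last section)
        System.out.println(resolveIndex(-3, sections)); // 0 (first section)
    }
}
```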

Moving to a Header/Footer
When you need to place some data into a header or footer, you should move there first using
DocumentBuilder.MoveToHeaderFooter. The method accepts a HeaderFooterType enumeration
value that identifies the type of header or footer to where the cursor should be moved.

If you want to create headers and footers that are different for the first page, you need to set the
PageSetup.DifferentFirstPageHeaderFooter property to true. If you want to create headers and footers
that are different for even and odd pages, you need to set PageSetup.OddAndEvenPagesHeaderFooter
to true.
If you need to get back to the main story, use DocumentBuilder.MoveToSection to move out of the
header or footer.
Example
Creates headers and footers in a document using DocumentBuilder.
Java
// Create a blank document.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Specify that we want headers and footers different for first, even and odd pages.
builder.getPageSetup().setDifferentFirstPageHeaderFooter(true);
builder.getPageSetup().setOddAndEvenPagesHeaderFooter(true);

// Create the headers.
builder.moveToHeaderFooter(HeaderFooterType.HEADER_FIRST);
builder.write("Header First");
builder.moveToHeaderFooter(HeaderFooterType.HEADER_EVEN);
builder.write("Header Even");
builder.moveToHeaderFooter(HeaderFooterType.HEADER_PRIMARY);
builder.write("Header Odd");

// Create three pages in the document.
builder.moveToSection(0);
builder.writeln("Page1");
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln("Page2");
builder.insertBreak(BreakType.PAGE_BREAK);
builder.writeln("Page3");

doc.save(getMyDir() + "DocumentBuilder.HeadersAndFooters Out.doc");

Moving to a Paragraph
Use DocumentBuilder.MoveToParagraph to move the cursor to a desired paragraph in the current
section. You should pass two parameters to this method: paragraphIndex (the index of the paragraph to
move to) and characterIndex (the index of the character inside the paragraph).

The navigation is performed inside the current story of the current section. That is, if you moved the
cursor to the primary header of the first section, then paragraphIndex specifies the index of the paragraph
inside that header of that section.
When paragraphIndex is greater than or equal to 0, it specifies an index from the beginning of the section
with 0 being the first paragraph. When paragraphIndex is less than 0, it specifies an index from the end of
the section with -1 being the last paragraph.
The character index can currently only be specified as 0 to move to the beginning of the paragraph or -1
to move to the end of the paragraph.
Example
Shows how to move a cursor position to the specified paragraph.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

// Parameters are 0-index. Moves to third paragraph.
builder.moveToParagraph(2, 0);
builder.writeln("This is the 3rd paragraph.");

Moving to a Table Cell
Use DocumentBuilder.MoveToCell if you need to move the cursor to a table cell in the current section.
This method accepts four parameters:
tableIndex - the index of the table to move to.
rowIndex - the index of the row in the table.
columnIndex - the index of the column in the table.
characterIndex - the index of the character inside the cell.
The navigation is performed inside the current story of the current section.
For the index parameters, when index is greater than or equal to 0, it specifies an index from the
beginning with 0 being the first element. When index is less than 0, it specifies an index from the end
with -1 being the last element.
Also, note that characterIndex currently can only specify 0 to move to the beginning of the cell or -1 to
move to the end of the cell.
Example
Shows how to move a cursor position to the specified table cell.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

// All parameters are 0-index. Moves to the 2nd table, 3rd row, 5th cell.
builder.moveToCell(1, 2, 4, 0);
builder.writeln("Hello World!");

Moving to a Bookmark
Bookmarks are used frequently to mark particular places in the document where new elements are to be
inserted. To move to a bookmark, use DocumentBuilder.MoveToBookmark. This method has two
overloads. The simplest one accepts nothing but the name of the bookmark where the cursor is to be
moved.
Example
Shows how to move a cursor position to a bookmark.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

builder.moveToBookmark("CoolBookmark");
builder.writeln("This is a very cool bookmark.");

This overload moves the cursor to a position just after the start of the bookmark with the specified name.

Another overload DocumentBuilder.MoveToBookmark(String, Boolean, Boolean) moves the cursor
to a bookmark with greater precision. It accepts two additional boolean parameters:
isStart determines whether to move the cursor to the beginning or to the end of the bookmark.
isAfter determines whether to move the cursor to be after the bookmark start or end position, or to move
the cursor to be before the bookmark start or end position.
Example
Shows how to move a cursor position to just after the bookmark end.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

builder.moveToBookmark("CoolBookmark", false, true);
builder.writeln("This is a very cool bookmark.");

For both overloads, the bookmark name comparison is not case-sensitive.

Inserting new text in this way does not replace the existing text of the bookmark.

Note that some bookmarks in the document are assigned to form fields. Moving to such a bookmark and
inserting text there inserts the text into the form field code. Although this will not invalidate the form field, the
inserted text will not be visible because it becomes part of the field code.
Moving to a Merge Field
Sometimes you may need to perform "manual" mail merge using DocumentBuilder or fill a merge field
in a special way inside a mail merge event handler. That is when DocumentBuilder.MoveToMergeField
could be useful. The method accepts the name of the merge field. It moves the cursor to a position just
beyond the specified merge field and removes the merge field.
Example
Shows how to move the cursor to a position just beyond the specified merge field.
Java
Document doc = new Document(getMyDir() + "DocumentBuilder.doc");
DocumentBuilder builder = new DocumentBuilder(doc);

builder.moveToMergeField("NiceMergeField");
builder.writeln("This is a very nice merge field.");


Note that moving the cursor to a merge field deletes the merge field from the document.

The merge field name comparison is not case-sensitive.
Find and Replace Overview

Use Range.Replace to find or replace a particular string within the current range. It returns the number of
replacements made, so it can also be used to search for strings rather than only to replace them. An
exception is thrown if a captured or replacement string contains one or more special characters: paragraph
break, cell break, section break, field start, field separator, field end, inline picture, drawing object, footnote.
The Range.Replace method provides several overloads. Here are the possibilities they provide:
You can specify a string to be replaced, a string that will replace all its occurrences, whether the
replacement is case-sensitive, and whether only stand-alone words will be affected. Note that a word is
defined as consisting only of alphanumeric characters. If the replacement is executed with whole-word
matching enabled and the search string contains symbols, no matches will be found.
You can pass a regular expression pattern used to find matches and a string that will replace them. This
overload replaces the whole match captured by the regular expression.
You can pass a regular expression pattern and an object that implements the IReplacingCallback
interface. This supplies a user-defined method that evaluates the replacement at each match. You can also
indicate whether the replacement should be done in a forward or backward direction. If you are removing
nodes during replacement, it is recommended to execute the replacement backwards to avoid issues that
may arise when nodes are removed during the replacement process.

A class implementing the IReplacingCallback interface defines a method IReplacingCallback.Replacing
that accepts a ReplacingArgs object providing data for a custom replace operation. The method should
return a ReplaceAction enumeration value that specifies what happens to the current match during a
replace operation: whether it should be replaced, skipped, or the whole replace operation should be
terminated.
The following examples show how to use the aforementioned overloads. Each example uses a different
Range.Replace overload:
Replace1 simply replaces all occurrences of the word "sad" with "bad".
Replace2 replaces all occurrences of the words "sad" or "mad" with "bad".
Replace3 uses a replace evaluator method to concatenate occurrences of the words "sad" or "mad" with a
counter value that is incremented each time a new occurrence is found.
Example
Shows how to replace all occurrences of the word "sad" with "bad".
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.getRange().replace("sad", "bad", false, true);

Example
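Shows how to replace all occurrences of the words "sad" or "mad" with "bad".
Java
Document doc = new Document(getMyDir() + "Document.doc");
// Inside a character class '|' is a literal character, so [sm] (not [s|m]) matches only 's' or 'm'.
doc.getRange().replace(Pattern.compile("[sm]ad"), "bad");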
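Inside a regular expression character class, `|` is an ordinary character, so the pattern `[s|m]ad` also matches the text `|ad`; `[sm]ad` is enough to match "sad" or "mad". Neither pattern is restricted to whole words, so both also match inside words such as "nomad". The plain-Java sketch below uses only java.util.regex (no Aspose.Words calls) to show the difference. Adding `\b` word boundaries is a suggestion of this sketch, not part of the original sample.

```java
import java.util.regex.Pattern;

/** Compares regex patterns with plain java.util.regex; no Aspose.Words calls. */
final class RegexDemo {
    private RegexDemo() {}

    static String replaceAll(String pattern, String input, String replacement) {
        return Pattern.compile(pattern).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "sad mad nomad |ad";
        // '|' inside [...] is literal, so "|ad" is matched too.
        System.out.println(replaceAll("[s|m]ad", text, "bad"));      // bad bad nobad bad
        // Only 's' or 'm', but still matches inside "nomad".
        System.out.println(replaceAll("[sm]ad", text, "bad"));       // bad bad nobad |ad
        // \b restricts matches to whole words.
        System.out.println(replaceAll("\\b[sm]ad\\b", text, "bad")); // bad bad nomad |ad
    }
}
```

A compiled Pattern such as Pattern.compile("\\b[sm]ad\\b") can be passed to Range.replace in the same way as the pattern in the example.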

Example
Shows how to replace with a custom evaluator.
Java
public void replaceWithEvaluator() throws Exception
{
Document doc = new Document(getMyDir() + "Range.ReplaceWithEvaluator.doc");
// [sm] matches 's' or 'm'; a '|' inside the brackets would be matched literally.
doc.getRange().replace(Pattern.compile("[sm]ad"), new MyReplaceEvaluator(), true);
doc.save(getMyDir() + "Range.ReplaceWithEvaluator Out.doc");
}

private class MyReplaceEvaluator implements IReplacingCallback
{
/**
* This is called during a replace operation each time a match is found.
* This method appends a number to the match string and returns it as a replacement string.
*/
public int replacing(ReplacingArgs e) throws Exception
{
e.setReplacement(e.getMatch().group() + Integer.toString(mMatchNumber));
mMatchNumber++;
return ReplaceAction.REPLACE;
}

private int mMatchNumber;
}
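To see the result the callback produces without running Aspose.Words, the following plain-Java sketch applies the same numbering logic with java.util.regex.Matcher. Each match of "sad" or "mad" is replaced by itself followed by a counter that starts at 0, the default value of the int field mMatchNumber. This is an analogy to IReplacingCallback, not how Range.replace is implemented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Appends a running counter to each match, mirroring MyReplaceEvaluator; plain Java only. */
final class NumberedReplace {
    private NumberedReplace() {}

    static String appendCounter(String input, Pattern pattern) {
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        int matchNumber = 0; // same starting value as the uninitialized int field mMatchNumber
        while (m.find()) {
            // quoteReplacement keeps any '$' or '\' in the match from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group() + matchNumber));
            matchNumber++;
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: sad0 and mad1, then sad2.
        System.out.println(appendCounter("sad and mad, then sad.", Pattern.compile("[sm]ad")));
    }
}
```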

Ranges Overview

In Aspose.Words, a Range is a flat window into an otherwise tree-like model of the document.
If you have worked with Microsoft Word Automation, you probably know that one of the main tools to
examine and modify document content is the Range object. Range is like a "window" into the document
content and formatting.
Aspose.Words also has a Range class, designed to look and act similarly to Range in Microsoft
Word. Although an Aspose.Words Range cannot cover an arbitrary portion of a document and does not have
Start and End properties, you can access the range covered by any document node, including the Document
itself. In other words, each node has its own range.
The Range object allows you to access and modify text, bookmarks and form fields within the range.
Retrieving Plain Text

Use the Range.Text property to retrieve plain, unformatted text of the range.
Example
Shows how to get plain, unformatted text of a range.
Java
Document doc = new Document(getMyDir() + "Document.doc");
String text = doc.getRange().getText();

Deleting Text

Call Range.Delete to delete all characters of the range.
Example
Shows how to delete all characters of a range.
Java
Document doc = new Document(getMyDir() + "Document.doc");
doc.getSections().get(0).getRange().delete();

Appending Documents Overview

This topic discusses how to programmatically join and append documents using Aspose.Words. Appending
documents is a very common task, one which is fully supported. Using Aspose.Words you can easily append
one document to another with a single line of code.

This topic provides details and code examples on how to append documents and how to further control how
the documents are joined. For instance, there are examples which show how to set an appended document to
appear on the next page and how to restart the page numbering in the pages that are joined.
Key Terms and Sample Documents
When appending documents, the destination document is the base document into which content from the
source document is imported. These terms are used frequently in the context of appending
and copying content from document to document.

Each sample below shows how to append documents with different options. In these samples we use
these two main documents, along with a few variants of them, to demonstrate the different
techniques outlined in this article.
The content of the destination document is below. In the code, this is loaded into a Document object
referenced as dstDoc. This document serves as the base document to which the source
document is appended.


The content of the source document is found below. In the code, this is also loaded into a Document
object referenced as srcDoc. This document is what will be appended to the destination document.


Join a Document onto another Document

The simplest way to join documents involves a single call to the Document.AppendDocument method. This
method appends the Document object passed as a parameter to the end of the Document object on which
the method is called. The second parameter accepts an ImportFormatMode value that defines how
conflicting styles are handled when one document is imported into the other.
Example
Shows how to append a document to the end of another document using no additional options.
Java
// Append the source document to the destination document using no extra options.
dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);


After this code is executed the destination document will include all content from the source document, which
is inserted at the end of the destination document.
How the AppendDocument Method Works

It's useful to take a look at the logic behind the Document.AppendDocument method. This provides some
useful background information that helps you to:
Gain a better understanding of how the method works so if a resulting document does not appear as
expected you will have a better idea as to the reason why.
Understand the general underlying process of how to copy nodes between documents. This is useful if
you are planning to implement your own import method. The technique behind how a document is
appended is the same and can be applied to specific node types as well. For example, importing
content at the paragraph level instead of at the section level.
Provide a sample implementation which can be used if you are using an older version of
Aspose.Words before the Document.AppendDocument method was introduced. You can use this code
as a manual implementation to append documents.
The method below provides a manual implementation of the Document.AppendDocument function
which closely follows the same underlying process as used in the built-in method.
Example
Shows how to manually append the content from one document to the end of another document.
Java
/**
* A manual implementation of the Document.AppendDocument function which shows the general
* steps of how a document is appended to another.
*
* @param dstDoc The destination document where to append to.
* @param srcDoc The source document.
* @param mode The import mode to use when importing content from another document.
*/
public void appendDocument(Document dstDoc, Document srcDoc, int mode) throws
Exception
{
// Loop through all sections in the source document.
// Section nodes are immediate children of the Document node so we can just enumerate the Document.
for (Node srcNode : srcDoc)
{
Section srcSection = (Section)srcNode;

// Because we are copying a section from one document to another,
// it is required to import the Section node into the destination document.
// This adjusts any document-specific references to styles, lists, etc.
//
// Importing a node creates a copy of the original node, but the copy
// is ready to be inserted into the destination document.
Node dstSection = dstDoc.importNode(srcSection, true, mode);

// Now the new section node can be appended to the destination document.
dstDoc.appendChild(dstSection);
}
}

Each section is imported into the destination document and appended to the end of the document. Because
content is imported section by section, section-level settings such as page setup and headers and footers
are preserved during the import.

It is also useful to note that, because the two documents are joined at the section level, the joining point
lies between the last section of the destination document and the first section of the source document.
The section and page setup properties of that first source section dictate how the two documents are
joined together. The most common of these settings defines whether the source document appears on the
same page or on a new page.
As suggested above, this approach is not limited to combining documents. It is the general approach to use
whenever you need to copy nodes from one document into another. There are three simple steps to copy
any node from one document to another:
1. Obtain the node in the source document that you want to copy.
2. Import the node into the destination document. Importing creates a new node that is a copy of the
original node, but suitable for insertion into the destination document.
3. Insert the imported node into the destination document.
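The three steps can be illustrated with a toy model that does not depend on Aspose.Words. In the sketch
below, the Doc and Para classes and the importNode helper are hypothetical stand-ins, and the rule that an
existing destination style wins over the source definition is a simplifying assumption, not a description of
how Document.importNode resolves styles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the obtain/import/insert steps; not the Aspose.Words API.
final class CopyNodeSketch {
    /** A paragraph that refers to a style by name. */
    static final class Para {
        final String text;
        final String style;

        Para(String text, String style) {
            this.text = text;
            this.style = style;
        }
    }

    /** A toy document: a style table plus a flat list of paragraphs. */
    static final class Doc {
        final Map<String, String> styles = new HashMap<>(); // style name -> formatting description
        final List<Para> body = new ArrayList<>();
    }

    /** Step 2: create a copy whose style reference is valid in the destination document. */
    static Para importNode(Doc dst, Doc src, Para node) {
        // Assumption: if the destination already defines the style, its definition is kept.
        dst.styles.putIfAbsent(node.style, src.styles.get(node.style));
        return new Para(node.text, node.style);
    }

    /** Copies the paragraph at the given index from src to the end of dst. */
    static Para copyParagraph(Doc src, Doc dst, int index) {
        Para original = src.body.get(index);            // Step 1: obtain the node in the source.
        Para imported = importNode(dst, src, original); // Step 2: import it into the destination.
        dst.body.add(imported);                         // Step 3: insert the imported copy.
        return imported;
    }
}
```

The source document is left untouched; only the imported copy ends up in the destination.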
Differences between ImportFormat Modes
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

This option is required when importing any node from one document to another. It dictates how formatting is
resolved when both documents contain a style with the same name but different formatting. As the names
suggest, ImportFormatMode.KeepSourceFormatting retains the original formatting used in the source
document, while ImportFormatMode.UseDestinationStyles causes any conflicting styles to use the
formatting defined in the destination document.

Microsoft Word also provides this option when copying content from one document to another.


Details of Keep Source Formatting
When the source formatting is retained for imported content any conflicting styles are copied to the
destination document and given a suffix number to distinguish them in the combined document. For
example if both documents contain content styled with the style Normal then when appending the
document the content formatted in destination document with this style will remain formatted with the
Normal style whereas the content from the source document will be formatted with a newly made style
called Normal_0 which is a copy of the original style used in the source document. Only styles which
are actually used in the source document will be copied over to the destination document.
Example
Shows how to append a document to another document while keeping the original formatting.
Java
// Load the documents to join.
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Keep the formatting from the source document when appending it to the destination document.
dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);

// Save the joined document to disk.
dstDoc.save(gDataDir + "TestFile.KeepSourceFormatting Out.docx");

This code results in the output below. The conflicting styles are copied to the document and are renamed.


Styles are handled this way when joining documents because styles that have identical names and identical
properties should still remain distinguishable. For example, two unrelated documents may be combined
that happen to contain styles whose names and properties match exactly by coincidence. The correct
behavior is for these two styles to remain separate in the combined document so that each can be changed
independently, which would not be possible if they were merged into one style.
If you require an option to combine identical styles in order to reduce the number of copied styles, then
feel free to post your request on the subject in our forum.
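The renaming rule described above can be expressed as a short plain-Java model. This is a sketch under
stated assumptions rather than Aspose.Words code: the class and method names are hypothetical, style tables
are modeled as name-to-formatting maps, and choosing the first unused _N suffix starting from 0 is an
assumption extrapolated from the Normal_0 example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of KeepSourceFormatting style import; not the Aspose.Words implementation.
final class KeepSourceStylesSketch {
    /**
     * Copies every style used by the source content into the destination style table.
     * A style whose name already exists in the destination is copied under a suffixed name,
     * even if its formatting is identical, so that both styles stay independent.
     *
     * @param dstStyles    destination style table (name -> formatting); modified in place
     * @param srcStyles    source style table (name -> formatting)
     * @param usedInSource names of the styles actually used by the source content
     * @return mapping from each used source style name to its name in the combined document
     */
    static Map<String, String> importStyles(Map<String, String> dstStyles,
                                           Map<String, String> srcStyles,
                                           Set<String> usedInSource) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (String name : usedInSource) { // unused source styles are never copied
            String target = name;
            if (dstStyles.containsKey(name)) {
                // Assumption: pick the first free suffix, starting at _0.
                int suffix = 0;
                while (dstStyles.containsKey(name + "_" + suffix)) {
                    suffix++;
                }
                target = name + "_" + suffix;
            }
            dstStyles.put(target, srcStyles.get(name));
            renamed.put(name, target);
        }
        return renamed;
    }
}
```

In the check below, Normal is copied as Normal_0 even though its formatting matches the destination's,
which reflects the reasoning given above for keeping identical styles distinguishable.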
Details of Using Destination Styles
Using destination styles means that matching styles in the source document take on the formatting of the
destination document. A block of text in the source document with the style Heading 1 keeps that style
when it is appended, but it takes on the formatting of Heading 1 as defined in the destination document,
even if that formatting is vastly different from what it was in the source document.
Example
Shows how to append a document to another document using the formatting of the destination document.
Java
// Load the documents to join.
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Append the source document using the styles of the destination document.
dstDoc.appendDocument(srcDoc, ImportFormatMode.USE_DESTINATION_STYLES);

// Save the joined document to disk.
dstDoc.save(gDataDir + "TestFile.UseDestinationStyles Out.doc");

Using destination styles has the advantage that it avoids the duplicated styles that occur when using the
ImportFormatMode.KeepSourceFormatting option.

This time the appended content uses the destination styles. No extra styles are created in the complete
document.


Further information about the different import modes can be found in the API description for the
ImportFormatMode enumeration.
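For comparison, the same kind of toy style table can model the destination-styles behavior. The names are
hypothetical, and the assumption that styles missing from the destination are still copied across is made
for this sketch only; consult the ImportFormatMode reference for the exact rules.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of UseDestinationStyles style import; not the Aspose.Words implementation.
final class UseDestinationStylesSketch {
    /**
     * Merges the styles used by the source content into the destination style table.
     * A style that already exists in the destination keeps the destination's formatting;
     * only styles missing from the destination are copied across (an assumption of this sketch).
     *
     * @return the number of styles added to the destination
     */
    static int importStyles(Map<String, String> dstStyles,
                            Map<String, String> srcStyles,
                            Set<String> usedInSource) {
        int added = 0;
        for (String name : usedInSource) {
            if (!dstStyles.containsKey(name)) {
                dstStyles.put(name, srcStyles.get(name));
                added++;
            }
        }
        return added;
    }

    /** Formatting shown by text with the given style in the combined document. */
    static String effectiveFormatting(Map<String, String> dstStyles, String styleName) {
        return dstStyles.get(styleName);
    }
}
```

Source text styled Heading 1 keeps the style name but shows the destination's definition, and no suffixed
copies are created.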
Specifying How a Document is Joined Together
Added by hammad, last edited by Caroline von Schmalensee on Nov 05, 2013
Specifying the Source Document to Flow Continuously or Start from a New Page
Documents are appended at the section level, so the PageSetup.SectionStart property of the Section
object defines how the content of the current section is joined to the previous section.
If the PageSetup.SectionStart property is set to SectionStart.NewPage for the first section in the source
document, the content of this section is forced to start on a new page. Conversely, if the property is set
to SectionStart.Continuous, the content is allowed to flow on the same page directly after the previous
section's content.

Setting the PageSetup.SectionStart property to SectionStart.Continuous for the first section of the source
document causes the source content to appear directly after the destination content on the same page.
Example
Shows how to append a document to another document so the content flows continuously.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Make the document appear straight after the destination document's content.
srcDoc.getFirstSection().getPageSetup().setSectionStart(SectionStart.CONTINUOUS);

// Append the source document using the original styles found in the source document.
dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.JoinContinuous Out.doc");

The generated output is below. The content of the joined document flows continuously.

Specifying the PageSetup.SectionStart property as SectionStart.NewPage instead will cause the
appended content to appear on a new page.
Example
Shows how to append a document to another document so it starts on a new page.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Set the appended document to start on a new page.
srcDoc.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);

// Append the source document using the original styles found in the source document.
dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.JoinNewPage Out.doc");


Please note that sometimes even a section set to be continuous may be forced onto a different page. This
can happen when the sections have different page settings. For example, if one section has a larger
PageSetup.PageWidth or PageSetup.PageHeight setting than the other, the two sections cannot flow on
one page.
By default, a new document in Microsoft Word is created with sections that start on a new page. This option
can be set under Page Setup in Microsoft Word as shown below.

This option is represented by the PageSetup.SectionStart property of the Section object in
Aspose.Words. This property is of no real interest for the first section of a document when it is not being
joined to another document, because the section start type of the first section does not affect how the
document is displayed. For this reason it is rarely changed, so the first section of a document will almost
always be set to start on a new page by default. This property does, however, become important when a
document is appended to another, as it defines how the source document is joined to the destination
document.
As a result, documents that are appended without the PageSetup.SectionStart property explicitly set will
almost always cause the source document to appear on a new page.
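The rules above (the first section's start type is irrelevant, NewPage always breaks, and Continuous shares a
page only when the page sizes allow it) can be captured in a deliberately simplified layout model.
Everything in it is hypothetical: content is measured in whole lines with a fixed number of lines per page,
and any difference in page width or height is assumed to force a new page. Real pagination in Microsoft
Word and Aspose.Words is far more involved.

```java
import java.util.List;

// Hypothetical, heavily simplified layout model; not how Word or Aspose.Words actually paginate.
final class SectionFlowSketch {
    enum Start { CONTINUOUS, NEW_PAGE }

    static final class Sec {
        final Start start;
        final int lines;     // height of the section's content, in lines
        final double width;  // page width
        final double height; // page height

        Sec(Start start, int lines, double width, double height) {
            this.start = start;
            this.lines = lines;
            this.width = width;
            this.height = height;
        }
    }

    /** Returns the 1-based page on which each section's content begins. */
    static int[] startPages(List<Sec> sections, int linesPerPage) {
        int[] result = new int[sections.size()];
        int page = 1;
        int used = 0; // lines already filled on the current page
        for (int i = 0; i < sections.size(); i++) {
            Sec s = sections.get(i);
            if (i > 0) { // the first section's start type never matters
                Sec prev = sections.get(i - 1);
                boolean samePageSize = prev.width == s.width && prev.height == s.height;
                // A continuous section only shares a page when the sizes match and space is left.
                if (s.start == Start.NEW_PAGE || !samePageSize || used == linesPerPage) {
                    page++;
                    used = 0;
                }
            }
            result[i] = page;
            // Lay out the section's lines, spilling onto further pages when a page fills up.
            for (int line = 0; line < s.lines; line++) {
                if (used == linesPerPage) {
                    page++;
                    used = 0;
                }
                used++;
            }
        }
        return result;
    }
}
```

With 40 lines per page and a 50-line destination section, a continuous 10-line source section starts on page
2, while a new-page source section, or a continuous one with a different page size, starts on page 3.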
Appending a Document's Content to Appear Together on the Same Page
A document can be appended so that the content will always appear together on the same page and not
split across two pages.
Example
Shows how to append a document to another document while keeping the content from splitting across
two pages.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Set the source document to appear straight after the destination document's content.
srcDoc.getFirstSection().getPageSetup().setSectionStart(SectionStart.CONTINUOUS);

// Iterate through all paragraphs in the source document.
for(Paragraph para : (Iterable<Paragraph>) srcDoc.getChildNodes(NodeType.PARAGRAPH,
true))
{
para.getParagraphFormat().setKeepWithNext(true);
}

dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.KeepSourceTogether Out.doc");

The code above sets the entire content of the appended document to be kept together on the same page
using the ParagraphFormat.KeepWithNext property of the Paragraph class.

The output produced is below. Since inserting the content at this position would split it across two pages,
the entire content is moved to the next page instead. Note that this is different from setting the source
document to start on a new page: the source document is still set to be appended continuously.
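The keep-together behavior can be approximated with a one-method model. It is a hypothetical
simplification: heights are whole lines, every paragraph in the block has KeepWithNext set, and a block taller
than a page is assumed to be split where it starts, since it cannot be kept together anyway.

```java
// Hypothetical, simplified model of placing a block of paragraphs that all have KeepWithNext set.
final class KeepTogetherSketch {
    /**
     * Returns the page on which a kept-together block starts.
     *
     * @param currentPage  page where the block would begin if placed immediately
     * @param usedLines    lines already occupied on that page
     * @param blockLines   total height of the block in lines
     * @param linesPerPage capacity of one page in lines
     */
    static int blockStartPage(int currentPage, int usedLines, int blockLines, int linesPerPage) {
        boolean fitsOnCurrentPage = usedLines + blockLines <= linesPerPage;
        boolean fitsOnEmptyPage = blockLines <= linesPerPage;
        if (!fitsOnCurrentPage && fitsOnEmptyPage && usedLines > 0) {
            return currentPage + 1; // move the whole block rather than split it
        }
        // Either the block fits here, or it is taller than a page and must be split anyway.
        return currentPage;
    }
}
```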

Controlling How Headers and Footers Appear
Added by hammad, last edited by Adam Skelton on Feb 03, 2012
Continuing Headers and Footers from the Destination Document

The headers and footers of a document provide an option that allows the current section's headers and
footers to continue on from the previous section. This setting can be seen in Microsoft Word below.

In Aspose.Words this setting is controlled by the HeaderFooterCollection.LinkToPrevious method.
Passing a value of true removes all types of headers and footers from this section, if there are any, and
causes the headers and footers from the previous section to be displayed instead.
Example
Shows how to append a document to another document and continue headers and footers from the
destination document.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Set the appended document to appear on a new page.
srcDoc.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);

// Link the headers and footers in the source document to the previous section.
// This will override any headers or footers already found in the source document.
srcDoc.getFirstSection().getHeadersFooters().linkToPrevious(true);

dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.LinkHeadersFooters Out.doc");

If the source document already has multiple sections that all use the same headers and footers, the later
sections are most likely linked to the headers and footers of the previous section. This means that once the
headers and footers of the first section are linked to the previous section, these sections automatically
inherit the headers and footers from the destination document as well.

In some cases, if your source document uses different headers in multiple sections, you may need to call
the HeaderFooterCollection.LinkToPrevious method on each of these sections in order for them to
inherit the headers and footers from the destination document.
The resulting document is displayed below. The source document now takes on the headers and footers of
the destination document.


Stopping Headers and Footers from Continuing from the Destination Document
As described previously, a section may already be set to inherit the headers and footers from the previous
section. Even a document that has no content in its headers and footers can still have a link to the
headers and footers of the previous section. When such a document is appended to another document, the
headers and footers from the destination document carry through to the source document.

To avoid this situation, the headers and footers must be unlinked by calling the
HeaderFooterCollection.LinkToPrevious method on the first section of the source document. Passing
false to this method unlinks all types of headers and footers from the previous section. It is enough to
unlink only the first section, because any further linked sections in the source document will then inherit
from the unlinked first section rather than from the destination document.
Example
Shows how to append a document to another document so headers and footers do not continue from the
destination document.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Even a document with no headers or footers can still have the LinkToPrevious setting set to true.
// Unlink the headers and footers in the source document to stop this from continuing the headers and footers
// from the destination document.
srcDoc.getFirstSection().getHeadersFooters().linkToPrevious(false);

dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.UnlinkHeadersFooters Out.doc");
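The linking rules above can be modeled as a chain in which each linked section shows whatever its
predecessor shows. The Sec class and effectiveHeaders method below are hypothetical; a linked section is
assumed to ignore any header of its own (mirroring how LinkToPrevious(true) removes them), and null stands
for an empty header.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of header/footer linking; null stands for an empty header.
final class HeaderLinkSketch {
    static final class Sec {
        final String ownHeader; // header defined by the section itself, or null
        boolean linkToPrevious; // true: show whatever the previous section shows

        Sec(String ownHeader, boolean linkToPrevious) {
            this.ownHeader = ownHeader;
            this.linkToPrevious = linkToPrevious;
        }
    }

    /** Resolves the header that each section actually displays, in document order. */
    static List<String> effectiveHeaders(List<Sec> sections) {
        List<String> shown = new ArrayList<>();
        for (int i = 0; i < sections.size(); i++) {
            Sec s = sections.get(i);
            if (s.linkToPrevious && i > 0) {
                shown.add(shown.get(i - 1)); // inherit along the chain
            } else {
                shown.add(s.ownHeader); // unlinked (or first) section uses its own header
            }
        }
        return shown;
    }
}
```

Unlinking only the first source section is enough: the second source section stays linked, but it now
inherits the empty header of the unlinked section instead of the destination's header.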

Removing Headers and Footers from the Source Document
Sometimes documents which are being joined are no longer required to display their headers and footers.
Removing them can be easily achieved by calling the Section.ClearHeadersFooters method.
Example
Shows how to remove headers and footers from a document before appending it to another document.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Remove the headers and footers from each of the sections in the source document.
for (Section section : srcDoc.getSections())
{
section.clearHeadersFooters();
}

// Even after the headers and footers are cleared from the source document, the "LinkToPrevious" setting
// for HeadersFooters can still be set. This will cause the headers and footers to continue from the destination
// document. This should be set to false to avoid this behaviour.
srcDoc.getFirstSection().getHeadersFooters().linkToPrevious(false);

dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.RemoveSourceHeadersFooters Out.doc");

As in the previous example, the headers and footers are unlinked from the previous section to avoid the
destination headers and footers being used in place of the removed ones.

The result shows that the joined document retains the headers and footers in the destination portion, but
they are removed from the source portion of the document.


Controlling How Page Numbering is Handled
Added by hammad, last edited by Awais Hafeez on Jun 23, 2014
Restarting Page Numbering

By default, combined documents that contain page numbering fields automatically have the page
numbering continued throughout the joined document. For instance, when the sample documents are joined
together they have continuous page numbering: the page number fields (PAGE) display 1 to 4 across the
pages, and the total page fields (NUMPAGES) display 4.
A section contains the option to restart page numbering. In Microsoft Word this can be specified in the
Page Numbering options.


To restart the page numbering at the start of a section, the PageSetup.RestartPageNumbering property
must be set to true. The number it restarts at is defined by the PageSetup.PageStartingNumber property,
which is set to 1 by default in both Microsoft Word and Aspose.Words. In this example we restart the page
numbering at the start of the source document.
Example
Shows how to append a document to another document with page numbering restarted.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Set the appended document to appear on the next page.
srcDoc.getFirstSection().getPageSetup().setSectionStart(SectionStart.NEW_PAGE);
// Restart the page numbering for the document to be appended.
srcDoc.getFirstSection().getPageSetup().setRestartPageNumbering(true);

dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.RestartPageNumbering Out.doc");

The output shows the page numbering has been restarted where the source document was appended.
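The effect of RestartPageNumbering and PageStartingNumber on the displayed PAGE values can be
reproduced with a small model. It assumes, hypothetically, that every section starts on a new page and fills
whole pages; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of PAGE field values; assumes every section starts on a new page.
final class PageNumberSketch {
    /**
     * @param pagesPerSection number of pages each section occupies
     * @param restart         RestartPageNumbering flag of each section
     * @param startingNumber  PageStartingNumber of each section (used only when restart is true)
     * @return the page number displayed on each physical page, in order
     */
    static List<Integer> pageNumbers(int[] pagesPerSection, boolean[] restart, int[] startingNumber) {
        List<Integer> numbers = new ArrayList<>();
        int next = 1; // numbering starts at 1 unless a section restarts it
        for (int s = 0; s < pagesPerSection.length; s++) {
            if (restart[s]) {
                next = startingNumber[s];
            }
            for (int p = 0; p < pagesPerSection[s]; p++) {
                numbers.add(next++);
            }
        }
        return numbers;
    }
}
```

With two two-page documents, restarting the second gives 1, 2, 1, 2; leaving it continuous gives 1, 2, 3, 4.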

Retaining Multiple Page Numbering Schemes when using the NUMPAGES Field
Often when documents containing NUMPAGES fields are appended, the desired behavior is for that field
type to continue to display the total page count of only the newly appended pages, just as it did in the
original document. However, the actual behavior is the opposite: by design, the NUMPAGES field displays
the total number of pages across the entire document.
In this context we refer to each appended document whose first section has its page numbering
restarted as a subdocument. Since each subdocument has its numbering restarted, and therefore its own
page numbering scheme, its total pages field should reflect this by counting only the pages that belong
to the subdocument.
For example, using the code below, the source document will have its numbering restarted. This results in
the joined document having the page numbers {1, 2, 1, 2}. However, the total pages field (the NUMPAGES
field) next to it does not follow this scheme and displays {4} across all pages.
This issue is demonstrated in the joined document below. The desired total page count for both the
destination and the source content is 2; however, the NUMPAGES field on every page still displays 4.


The only possible solution to this issue is a workaround. This type of behavior cannot be implemented in
Microsoft Word, as there is no direct support for multiple numbering schemes within one document, and
so there is no method to implement it directly using Aspose.Words either.
The closest functionality that emulates the desired behavior is the SECTIONPAGES field, which displays
the total number of pages in the section. However, this can only be used as a partial solution. Either the
destination or the source document could have many sections, all of which share the same page
numbering scheme, and this field only provides the total page count for the current section, which is not
the correct output.
The source document demonstrates this: a section break between its two pages divides their content into
different sections.
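To compare the candidate values side by side, the following hypothetical model computes what NUMPAGES,
SECTIONPAGES and the desired per-subdocument total would show on each physical page. It assumes each
section starts on a new page and that a subdocument runs from a section with restarted numbering (or the
first section) up to the next such section. The layout used in the check, a two-page destination followed by a
two-page source split into two one-page sections, is an illustrative assumption based on the description
above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of what different "total pages" fields would display on each page.
final class TotalPagesSketch {
    enum TotalField { NUMPAGES, SECTIONPAGES, SUBDOCUMENT }

    static List<Integer> totals(int[] pagesPerSection, boolean[] restart, TotalField field) {
        int documentTotal = 0;
        for (int pages : pagesPerSection) {
            documentTotal += pages;
        }
        List<Integer> shown = new ArrayList<>();
        for (int s = 0; s < pagesPerSection.length; s++) {
            int value;
            switch (field) {
                case NUMPAGES:
                    value = documentTotal; // whole document
                    break;
                case SECTIONPAGES:
                    value = pagesPerSection[s]; // current section only
                    break;
                default: {
                    // Subdocument: from a restarting (or the first) section up to the next restart.
                    int first = s;
                    while (first > 0 && !restart[first]) {
                        first--;
                    }
                    int end = s + 1;
                    while (end < pagesPerSection.length && !restart[end]) {
                        end++;
                    }
                    value = 0;
                    for (int i = first; i < end; i++) {
                        value += pagesPerSection[i];
                    }
                }
            }
            for (int p = 0; p < pagesPerSection[s]; p++) {
                shown.add(value);
            }
        }
        return shown;
    }
}
```

For this layout the model gives 4, 4, 4, 4 for NUMPAGES, 2, 2, 1, 1 for SECTIONPAGES and the desired
2, 2, 2, 2 per subdocument, which is why SECTIONPAGES is only a partial solution.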


The solution that provides the correct behavior involves replacing the NUMPAGES field with PAGEREF
fields that refer to a bookmark positioned at the end of the subdocument. This is the optimal choice
because the PAGEREF field displays the desired page number while still retaining the other properties and
behavior of a page field. Furthermore, as long as the bookmarks are not removed, further content and
sections can be added and removed and the page numbering will reflect these changes correctly.
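Why the bookmark approach produces the desired totals can be seen in a small model: if numbering restarts
at 1 for each subdocument, the page on which the subdocument's closing bookmark sits displays a number
equal to the subdocument's page count, and a PAGEREF field to that bookmark shows exactly that number. The
sketch below is hypothetical and assumes sections start on new pages and restarted numbering starts at 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the PAGEREF-to-bookmark workaround; not Word's or Aspose.Words' field engine.
final class PageRefSketch {
    /** Returns the value a PAGEREF to the subdocument's end bookmark displays on each physical page. */
    static List<Integer> pageRefValues(int[] pagesPerSection, boolean[] restart) {
        // 1. Work out the displayed PAGE number and the subdocument index of every physical page.
        List<Integer> pageNumber = new ArrayList<>();
        List<Integer> subdoc = new ArrayList<>();
        int next = 1;
        int subdocIndex = -1;
        for (int s = 0; s < pagesPerSection.length; s++) {
            if (s == 0 || restart[s]) { // a new subdocument begins; numbering starts at 1
                subdocIndex++;
                next = 1;
            }
            for (int p = 0; p < pagesPerSection[s]; p++) {
                pageNumber.add(next++);
                subdoc.add(subdocIndex);
            }
        }
        // 2. Each subdocument's bookmark sits on its last page; PAGEREF shows that page's number.
        int[] bookmarkValue = new int[subdocIndex + 1];
        for (int i = 0; i < pageNumber.size(); i++) {
            bookmarkValue[subdoc.get(i)] = pageNumber.get(i); // later pages overwrite earlier ones
        }
        // 3. Every page displays the value for the subdocument it belongs to.
        List<Integer> shown = new ArrayList<>();
        for (int index : subdoc) {
            shown.add(bookmarkValue[index]);
        }
        return shown;
    }
}
```

When no section restarts numbering, every page shows the total page count, which matches what NUMPAGES
would display.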
The example below provides an implementation that automatically converts all NUMPAGES fields in the
combined document to PAGEREF fields using the technique described above.
Example
Shows how to change the NUMPAGES fields in a document to display the number of pages only within a subdocument.
Java
public static void appendDocument_ConvertNumPageFields() throws Exception
{
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// Restart the page numbering on the start of the source document.
srcDoc.getFirstSection().getPageSetup().setRestartPageNumbering(true);
srcDoc.getFirstSection().getPageSetup().setPageStartingNumber(1);

// Append the source document to the end of the destination document.
dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);

// After joining the documents the NUMPAGES fields will now display the total number of pages which
// is undesired behaviour. Call this method to fix them by replacing them with PAGEREF fields.
convertNumPageFieldsToPageRef(dstDoc);

dstDoc.updateFields();
// This needs to be called in order to update the new fields with page numbers.
dstDoc.updatePageLayout();

dstDoc.save(gDataDir + "TestFile.ConvertNumPageFields Out.doc");
}

/**
* Replaces all NUMPAGES fields in the document with PAGEREF fields. The replacement field displays the total number
* of pages in the sub document instead of the total pages in the document.
*
* @param doc The combined document to process.
*/
public static void convertNumPageFieldsToPageRef(Document doc) throws Exception
{
// This is the prefix for each bookmark which signals where page numbering restarts.
// The underscore "_" at the start inserts this bookmark as hidden in MS Word.
final String BOOKMARK_PREFIX = "_SubDocumentEnd";
// Field name of the NUMPAGES field.
final String NUM_PAGES_FIELD_NAME = "NUMPAGES";
// Field name of the PAGEREF field.
final String PAGE_REF_FIELD_NAME = "PAGEREF";

// Create a new DocumentBuilder which is used to insert the bookmarks and replacement fields.
DocumentBuilder builder = new DocumentBuilder(doc);
// Defines the number of page restarts that have been encountered and therefore the number of "sub" documents
// found within this document.
int subDocumentCount = 0;

// Iterate through all sections in the document.
for (Section section : doc.getSections())
{
// This section has its page numbering restarted so we will treat this as the start of a sub document.
// Any NUMPAGES fields in this inner document must be converted to special PAGEREF fields to correct numbering.
if (section.getPageSetup().getRestartPageNumbering())
{
// Don't do anything if this is the first section in the document. This part of the code inserts the bookmark
// marking the end of the previous sub document, so it is not applicable to the first section in the document.
if (!section.equals(doc.getFirstSection()))
{
// Get the previous section and the last node within the body of that section.
Section prevSection = (Section)section.getPreviousSibling();
Node lastNode = prevSection.getBody().getLastChild();

// Use the DocumentBuilder to move to this node and insert the bookmark there.
// This bookmark represents the end of the sub document.
builder.moveTo(lastNode);
builder.startBookmark(BOOKMARK_PREFIX + subDocumentCount);
builder.endBookmark(BOOKMARK_PREFIX + subDocumentCount);

// Increase the subdocument count to insert the correct bookmarks.
subDocumentCount++;
}
}

// The last section simply needs the ending bookmark to signal that it is the end of the current sub document.
if (section.equals(doc.getLastSection()))
{
// Insert the bookmark at the end of the body of the last section.
// Don't increase the count this time as we are just marking the end of the document.
Node lastNode = doc.getLastSection().getBody().getLastChild();
builder.moveTo(lastNode);
builder.startBookmark(BOOKMARK_PREFIX + subDocumentCount);
builder.endBookmark(BOOKMARK_PREFIX + subDocumentCount);
}

// Iterate through each NUMPAGES field in the section and replace the field with a PAGEREF field referring to the
// bookmark of the current subdocument. This bookmark is positioned at the end of the sub document but does not
// exist yet. It is inserted when a section with restart page numbering or the last section is encountered.
for (Node node : section.getChildNodes(NodeType.FIELD_START, true).toArray())
{
FieldStart fieldStart = (FieldStart)node;

if (fieldStart.getFieldType() == FieldType.FIELD_NUM_PAGES)
{
// Get the field code.
String fieldCode = getFieldCode(fieldStart);
// Since the NUMPAGES field does not take any additional parameters we can assume the remaining part of the field
// code after the field name are the switches. We will use these to help recreate the NUMPAGES field as a PAGEREF field.
String fieldSwitches = fieldCode.replace(NUM_PAGES_FIELD_NAME, "").trim();

// Inserting the new field directly at the FieldStart node of the original field will cause the new field to
// not pick up the formatting of the original field. To counter this, insert the field just before the original field.
Node previousNode = fieldStart.getPreviousSibling();

// If a previous run cannot be found then we are forced to use the FieldStart node.
if (previousNode == null)
previousNode = fieldStart;

// Insert a PAGEREF field at the same position as the field.
builder.moveTo(previousNode);
// This will insert a new field with a code like " PAGEREF _SubDocumentEnd0 \* MERGEFORMAT ".
// Plain concatenation avoids locale-dependent number formatting of the bookmark index.
Field newField = builder.insertField(" " + PAGE_REF_FIELD_NAME + " " + BOOKMARK_PREFIX + subDocumentCount
+ " " + fieldSwitches + " ", null);

// The field will be inserted before the referenced node. Move the node before the field instead.
previousNode.getParentNode().insertBefore(previousNode, newField.getStart());

// Remove the original NUMPAGES field from the document.
removeField(fieldStart);
}
}
}
}


The above method also uses a few functions internally. They are provided below.
Example
Provides helper functions used by the methods above.
Java
/**
* Retrieves the field code from a field.
*
* @param fieldStart The field start of the field which to gather the field code from.
*/
private static String getFieldCode(FieldStart fieldStart) throws Exception
{
StringBuilder builder = new StringBuilder();

for (Node node = fieldStart; node != null && node.getNodeType() !=
NodeType.FIELD_SEPARATOR &&
node.getNodeType() != NodeType.FIELD_END; node =
node.nextPreOrder(node.getDocument()))
{
// Use text only of Run nodes to avoid duplication.
if (node.getNodeType() == NodeType.RUN)
builder.append(node.getText());
}
return builder.toString();
}

/**
* Removes the Field from the document.
*
* @param fieldStart The field start node of the field to remove.
*/
private static void removeField(FieldStart fieldStart) throws Exception
{
Node currentNode = fieldStart;
boolean isRemoving = true;
while (currentNode != null && isRemoving)
{
if (currentNode.getNodeType() == NodeType.FIELD_END)
isRemoving = false;

Node nextNode = currentNode.nextPreOrder(currentNode.getDocument());
currentNode.remove();
currentNode = nextNode;
}
}

The algorithm works by inserting a bookmark at the end of the preceding section each time it finds a
section with restarted page numbering, and another at the end of the last section. The PAGEREF fields that
replace the NUMPAGES fields reference these bookmarks. These fields display the total number of pages
within each subdocument (the content between the document start, the document end and any sections
with restarted page numbering). As a result, the fields show the correct number of pages even if a
subdocument consists of many sections.
Note that this code still works even if a document has no page numbering restarts. In this case it simply
changes every NUMPAGES field to a PAGEREF field that references a bookmark at the end of the
document. This displays the same page numbering as the NUMPAGES field would, but using PAGEREF
instead.
After this code is executed the output of this document now appears correctly. The total page numbering
now appears as desired.


A closer look on the last page shows that the numbering is still correct even when the content is in a
different section.


By pressing ALT+F9 we can toggle field codes to show the field information. The algorithm above has
changed the NUMPAGES field into a PAGEREF field in order to display the desired page numbering.
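The bookkeeping in convertNumPageFieldsToPageRef can be isolated from Aspose.Words to make it easier to
check. The sketch below is a hypothetical, Aspose-free mirror of two parts of that method: which
_SubDocumentEnd bookmark the converted fields in each section refer to, and how a NUMPAGES field code is
rewritten as a PAGEREF field code while its switches are kept.

```java
// Hypothetical, Aspose-free mirror of the bookkeeping in convertNumPageFieldsToPageRef.
final class BookmarkIndexSketch {
    static final String BOOKMARK_PREFIX = "_SubDocumentEnd";

    /** Index of the bookmark that the converted fields in each section refer to. */
    static int[] bookmarkIndexPerSection(boolean[] restartPageNumbering) {
        int[] result = new int[restartPageNumbering.length];
        int subDocumentCount = 0;
        for (int i = 0; i < restartPageNumbering.length; i++) {
            // A restart (other than in the first section) closes the previous subdocument.
            if (restartPageNumbering[i] && i > 0) {
                subDocumentCount++;
            }
            result[i] = subDocumentCount;
        }
        return result;
    }

    /** Rewrites a NUMPAGES field code as a PAGEREF field code, keeping any switches. */
    static String toPageRefCode(String numPagesFieldCode, int subDocumentIndex) {
        String switches = numPagesFieldCode.replace("NUMPAGES", "").trim();
        return " PAGEREF " + BOOKMARK_PREFIX + subDocumentIndex + " " + switches + " ";
    }
}
```

For sections with restart flags false, true, false, true, the converted fields refer to _SubDocumentEnd0,
_SubDocumentEnd1, _SubDocumentEnd1 and _SubDocumentEnd2 respectively.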


Controlling How Lists are Handled
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

For the samples below involving lists we will be using these documents:
Destination document


Source Document


Each document contains a list, and both lists are defined to use a linked style called MyStyle.


When appending documents which contain lists with linked styles, the chosen ImportFormatMode can
make a difference in how the lists behave when the documents are combined.
Example
Shows how to append a document to another document containing lists retaining source formatting.
Java
Document dstDoc = new Document(gDataDir + "TestFile.DestinationList.doc");
Document srcDoc = new Document(gDataDir + "TestFile.SourceList.doc");

// Append the content of the document so it flows continuously.
srcDoc.getFirstSection().getPageSetup().setSectionStart(SectionStart.CONTINUOUS);

dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.ListKeepSourceFormatting Out.doc");

Appending the documents using ImportFormatMode.KeepSourceFormatting, as in the code above, works as
expected, as shown in the output below. The lists retain the correct numbering.


However, when the same two documents are joined using ImportFormatMode.UseDestinationStyles, the lists
in the combined document continue on instead of being restarted as separate lists. This behavior can be
seen in the combined document below: the numbering of the two lists continues instead of restarting where
the source document was appended.


If the list in the document does not use a linked style, or the linked style does not occur in the destination
document, this issue does not occur.
The code below provides a general implementation of how to avoid this issue. Any list in the source
document whose List.ListId is already found in the destination document is copied, and the paragraphs
that use it are changed to use the new copy instead.
Example
Shows how to append a document using destination styles while preventing list numbering from continuing.
Java
Document dstDoc = new Document(gDataDir + "TestFile.DestinationList.doc");
Document srcDoc = new Document(gDataDir + "TestFile.SourceList.doc");

// Set the source document to continue straight after the end of the destination document.
srcDoc.getFirstSection().getPageSetup().setSectionStart(SectionStart.CONTINUOUS);

// Keep track of the lists that are created, keyed by the original list ID.
HashMap<Integer, List> newLists = new HashMap<Integer, List>();

// Iterate through all paragraphs in the document.
for (Paragraph para : (Iterable<Paragraph>) srcDoc.getChildNodes(NodeType.PARAGRAPH,
true))
{
if (para.isListItem())
{
int listId = para.getListFormat().getList().getListId();

// Check if the destination document contains a list with this ID already. If it does then this may
// cause the two lists to run together. Create a copy of the list in the source document instead.
if (dstDoc.getLists().getListByListId(listId) != null)
{
List currentList;
// A newly copied list already exists for this ID, retrieve the stored list and use it on
// the current paragraph.
if (newLists.containsKey(listId))
{
currentList = (List)newLists.get(listId);
}
else
{
// Add a copy of this list to the document and store it for later reference.
currentList =
srcDoc.getLists().addCopy(para.getListFormat().getList());
newLists.put(listId, currentList);
}

// Set the list of this paragraph to the copied list.
para.getListFormat().setList(currentList);
}
}
}

// Append the source document to the end of the destination document.
dstDoc.appendDocument(srcDoc, ImportFormatMode.USE_DESTINATION_STYLES);

// Save the combined document to disk.
dstDoc.save(gDataDir + "TestFile.ListUseDestinationStyles Out.docx");

After executing this code the numbering of the lists appears as expected in the output document.
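Why the list copying above fixes the numbering can be illustrated with a small model that does not depend on Aspose.Words. The sketch assumes, as a simplification, that a list item's number is one more than the count of earlier paragraphs sharing its list ID; the Para and ListRemapModel classes are hypothetical. The remap method gives each conflicting source list ID exactly one new ID and reuses it for every paragraph that carried the old ID, which is the role the newLists map plays in the code above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a paragraph is reduced to the ID of the list it belongs to.
class Para {
    int listId;

    Para(int listId) {
        this.listId = listId;
    }
}

class ListRemapModel {
    // Simplified numbering: each paragraph's number is 1 + the count of earlier
    // paragraphs with the same list ID, so paragraphs sharing an ID keep counting.
    static List<Integer> number(List<Para> paras) {
        Map<Integer, Integer> counters = new HashMap<>();
        List<Integer> numbers = new ArrayList<>();
        for (Para p : paras) {
            numbers.add(counters.merge(p.listId, 1, Integer::sum));
        }
        return numbers;
    }

    // Gives every source list ID that also occurs in the destination a fresh ID.
    // The map guarantees that all paragraphs of one source list get the same new ID,
    // mirroring how newLists caches one copied list per original list ID.
    static void remap(List<Para> dst, List<Para> src) {
        Set<Integer> dstIds = new HashSet<>();
        int maxId = 0;
        for (Para p : dst) {
            dstIds.add(p.listId);
            maxId = Math.max(maxId, p.listId);
        }
        for (Para p : src) {
            maxId = Math.max(maxId, p.listId);
        }
        int nextId = maxId + 1;

        Map<Integer, Integer> newIds = new HashMap<>();
        for (Para p : src) {
            if (dstIds.contains(p.listId)) {
                Integer id = newIds.get(p.listId);
                if (id == null) {
                    id = nextId++;
                    newIds.put(p.listId, id);
                }
                p.listId = id;
            }
        }
    }
}
```

With three destination items and two source items that share one list ID, the combined numbering runs 1 to 5; after remapping, the appended items restart at 1.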

Common Issues When Appending Documents
Added by hammad, last edited by Adam Skelton on Oct 13, 2013

Q: I am using a blank document as a template to which further documents are appended. After executing my
code the first page of the generated document is a blank page.

A: Even a blank document is not completely empty. To be valid, a minimal document must have at least one
section containing a body, which in turn contains at least one paragraph. It is also useful to remember that,
by default, the first section of a document is set to start on a new page.
Because the template document already contains one section, the blank page at the start of the output is
actually caused by the first appended section starting on a new page.
To remedy this, the destination document must be completely empty before anything is appended to it. Call the
Document.RemoveAllChildren method to remove all sections from the document, as demonstrated in the code below.
Example
Shows how to remove all content from a document before using it as a base to append documents to.
Java
// Use a blank document as the destination document.
Document dstDoc = new Document();
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// The destination document is not actually empty, which often causes a blank page to appear before the appended document.
// This is due to the base document having an empty section and the new document being started on the next page.
// Remove all content from the destination document before appending.
dstDoc.removeAllChildren();

dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.BaseDocument Out.doc");


Q: I am joining two documents together and using the SectionStart.Continuous setting so that the joined
documents run together. However, the appended document does not appear on the same page; it still appears on
a separate page.

A: The most likely cause is a difference in the PageSetup settings of the sections where the documents are
joined. Sections set as continuous appear on the same page only if the settings which define the page
structure are identical. For instance, a section with larger PageSetup.PageWidth and PageSetup.PageHeight
values than those of the other document cannot be joined continuously. Even a very small difference in page
size prevents the content from flowing continuously, so the page size and orientation should be set exactly
the same in the documents to be joined.
In the example below, the two documents were meant to be joined continuously, but as you can see, they have
different page sizes, so the source document must start on a new page.


To solve this, make the relevant PageSetup properties of the first section of the source document identical
to those of the last section of the destination document. These are the two sections that will be joined, so
to appear on the same page they must have the same settings. The code below shows how to do this. In some
instances other properties may also need to be changed for the content to flow continuously.
Example
Shows how to continuously append a document that has different page settings to another document.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.SourcePageSetup.doc");

// Set the source document to continue straight after the end of the destination document.
// If some page setup settings are different then this may not work and the source document will appear
// on a new page.
srcDoc.getFirstSection().getPageSetup().setSectionStart(SectionStart.CONTINUOUS);

// To ensure this does not happen when the source document has different page setup settings, make sure the
// settings are identical to those of the last section of the destination document.
// If there are further continuous sections that follow on in the source document then this will need to be
// repeated for those sections as well.
// Set the orientation first: changing the orientation may swap the page width and height.
srcDoc.getFirstSection().getPageSetup().setOrientation(dstDoc.getLastSection().getPageSetup().getOrientation());
srcDoc.getFirstSection().getPageSetup().setPageWidth(dstDoc.getLastSection().getPageSetup().getPageWidth());
srcDoc.getFirstSection().getPageSetup().setPageHeight(dstDoc.getLastSection().getPageSetup().getPageHeight());

dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
dstDoc.save(gDataDir + "TestFile.DifferentPageSetup Out.doc");


This usually only needs to be done for the first section of the source document. However, if the first
section of the source document is followed by further continuous sections, the same changes must be repeated
for those sections as well.
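The repetition described above, matching the page settings of the first source section and of each continuous section that follows it, can be sketched without the library. SectionSetup and PageSetupMatcher are hypothetical stand-ins for the PageSetup properties used in the example (page width, page height and orientation); in real code the same loop would walk the source document's sections and stop at the first one that does not start continuously.

```java
import java.util.List;

// Hypothetical stand-in for a section's page settings (not the Aspose.Words PageSetup class).
class SectionSetup {
    double pageWidth;    // points
    double pageHeight;   // points
    boolean landscape;
    boolean continuous;  // true if the section starts continuously

    SectionSetup(double pageWidth, double pageHeight, boolean landscape, boolean continuous) {
        this.pageWidth = pageWidth;
        this.pageHeight = pageHeight;
        this.landscape = landscape;
        this.continuous = continuous;
    }

    // Exact comparison: even a tiny size difference prevents continuous flow.
    boolean samePageAs(SectionSetup other) {
        return pageWidth == other.pageWidth
                && pageHeight == other.pageHeight
                && landscape == other.landscape;
    }
}

class PageSetupMatcher {
    // Copies the destination's last-section page settings onto the first source section
    // and onto every following source section that also starts continuously.
    static void matchContinuousRun(SectionSetup dstLast, List<SectionSetup> srcSections) {
        for (int i = 0; i < srcSections.size(); i++) {
            SectionSetup s = srcSections.get(i);
            if (i > 0 && !s.continuous) {
                break; // a section that starts on a new page ends the continuous run
            }
            s.pageWidth = dstLast.pageWidth;
            s.pageHeight = dstLast.pageHeight;
            s.landscape = dstLast.landscape;
        }
    }
}
```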

Q: When rendering a document (to PDF, XPS, an image, etc.), the appended document does not appear in the
rendered output.

A: This is most likely because you rendered the document or called Document.UpdatePageLayout before
appending the other document. When either of these happens, the document is laid out into pages and the
layout is stored in memory. Any content inserted after this is added to the Document Object Model, but the
page layout is not rebuilt to include the change until the layout is updated again.
To solve this, either remove those calls before appending the document if they are not needed, or call
Document.UpdatePageLayout again after appending the document. This technique is demonstrated in the code
below.
Example
Shows how to rebuild the document layout after appending further content.
Java
Document dstDoc = new Document(gDataDir + "TestFile.Destination.doc");
Document srcDoc = new Document(gDataDir + "TestFile.Source.doc");

// If the destination document is rendered to PDF, image etc. or UpdatePageLayout is called before the source document
// is appended, then any changes made afterwards will not be reflected in the rendered output.
dstDoc.updatePageLayout();

// Join the documents.
dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);

// For the changes to be reflected in the rendered output, UpdatePageLayout must be called again.
// If it is not called again, the appended document will not appear in the output of the next rendering.
dstDoc.updatePageLayout();

// Save the joined document to PDF.
dstDoc.save(gDataDir + "TestFile.UpdatePageLayout Out.pdf");
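The caching behavior described in this answer can be pictured with a toy model. LayoutCacheModel is hypothetical and only assumes what the answer states: the page layout is built once, kept in memory, and not rebuilt when content is appended until updatePageLayout is called again. It does not reflect how Aspose.Words lays out pages internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a cached page layout; not the Aspose.Words implementation.
class LayoutCacheModel {
    private final List<String> content = new ArrayList<>();
    private List<String> layout = null; // snapshot of the content when it was last laid out

    void append(String block) {
        content.add(block); // the object model changes, the cached layout does not
    }

    void updatePageLayout() {
        layout = new ArrayList<>(content); // rebuild the layout from the current content
    }

    // Rendering builds a layout only if none exists yet; otherwise it reuses the cache.
    List<String> render() {
        if (layout == null) {
            updatePageLayout();
        }
        return layout;
    }
}
```

Appending after the first layout leaves render() returning the stale snapshot until updatePageLayout() runs again.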

How to Apply Different AutoFit Settings to a Table
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the AutoFitTables sample here.
When creating a table in a visual editor such as Microsoft Word, you will often use one of the AutoFit
options to automatically size the table to the desired width. For instance, you can use the AutoFit to Window
option to fit the table to the width of the page, or the AutoFit to Contents option to allow each cell to grow
or shrink to accommodate its contents.

By default, Aspose.Words inserts a new table using AutoFit to Window, so the table is sized to the available
width on the page. To change the sizing behavior of such a table or of an existing table, call the
Table.AutoFit method. This method accepts an AutoFitBehavior value that defines what type of auto fitting is
applied to the table.
As in Microsoft Word, an autofit method is actually a shortcut that applies several properties to the table
at once. It is these properties that give the table its observed behavior, and we discuss them below for each
autofit option.
We will use the following table to demonstrate the different autofit settings:

AutoFitting a Table to Window
Example
Autofits a table to fit the page width.
Java
// Open the document
Document doc = new Document(dataDir + "TestFile.doc");
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Autofit the first table to the page width.
table.autoFit(AutoFitBehavior.AUTO_FIT_TO_WINDOW);

// Save the document to disk.
doc.save(dataDir + "TestFile.AutoFitToWindow Out.doc");

The result of this operation, a table that is widened to fit the width of the page, is shown below.


When autofit to window is applied to a table the following operations are actually being performed behind the
scenes:
1. The Table.AllowAutoFit property is enabled to automatically resize columns to the available content.
2. A Table.PreferredWidth value of 100% is applied.
3. The CellFormat.PreferredWidth is removed from all cells in the table. Note that this differs slightly
from Microsoft Word, which sets the preferred width of each cell to suitable values based on its current
size and content. Aspose.Words does not update the preferred widths, so they are simply cleared.
4. The column widths are recalculated for the current content of the table.
The end result is a table that occupies all available width. The widths of the columns in the table
change automatically as the user edits text in Microsoft Word.
AutoFitting a Table to Contents
Example
Autofits a table in the document to its contents.
Java
// Open the document
Document doc = new Document(dataDir + "TestFile.doc");
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Auto fit the table to the cell contents
table.autoFit(AutoFitBehavior.AUTO_FIT_TO_CONTENTS);

// Save the document to disk.
doc.save(dataDir + "TestFile.AutoFitToContents Out.doc");

The code above causes the table to automatically resize each column to its contents.


When a table is auto fitted to contents the following steps are actually undertaken behind the scenes:
1. The Table.AllowAutoFit property is enabled to automatically resize each cell to accommodate its
contents.
2. The table-wide preferred width under Table.PreferredWidth is removed.
3. The CellFormat.PreferredWidth is removed for every cell in the table.
4. The column widths are recalculated for the current content in the table.
The end result is a table whose column widths and overall width change automatically to best
accommodate the content as the user edits text in Microsoft Word.

Note that this autofit option clears the preferred widths from the cells, just as Microsoft Word does. If you
want to preserve the column sizes and have the columns further grow or shrink to fit content, set only the
Table.AllowAutoFit property to true instead of using the autofit shortcut.
Disabling AutoFitting on a Table and Using Fixed Column Widths
Example
Disables autofitting and enables fixed widths for the specified table.
Java
// Open the document
Document doc = new Document(dataDir + "TestFile.doc");
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true);

// Disable autofitting on this table.
table.autoFit(AutoFitBehavior.FIXED_COLUMN_WIDTHS);

// Save the document to disk.
doc.save(dataDir + "TestFile.FixedWidth Out.doc");

The result of disabling autofit and using fixed widths for the column sizes is shown below.


When autofit is disabled on a table and fixed column widths are used instead, the following steps are taken:
1. The Table.AllowAutoFit property is disabled so columns do not grow or shrink to their contents.
2. The table-wide preferred width is removed from Table.PreferredWidth.
3. The CellFormat.PreferredWidth is removed from all cells in the table.
The end result is a table whose column widths are defined using the CellFormat.Width property and
whose columns do not automatically resize when the user enters text or the page size is modified.

Note that if no width is defined for CellFormat.Width then a default value of one inch (72 points) is used.
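Because each AutoFit option is a shortcut for a set of property changes, the three step lists above can be summarized in one small model. The sketch below is hypothetical and is not the Aspose.Words API: it records only Table.AllowAutoFit, the table-wide preferred width and the per-cell preferred widths, and it performs no column-width recalculation.

```java
import java.util.Arrays;

// Hypothetical model of the properties each AutoFit shortcut sets, following the
// steps listed above. It is not the Aspose.Words API and does no layout.
class AutoFitModel {
    enum Behavior { AUTO_FIT_TO_WINDOW, AUTO_FIT_TO_CONTENTS, FIXED_COLUMN_WIDTHS }

    boolean allowAutoFit;
    Double tablePreferredWidthPercent;  // null means no table-wide preferred width
    final Double[] cellPreferredWidths; // a null entry means no cell preferred width

    AutoFitModel(Double[] cellPreferredWidths) {
        this.cellPreferredWidths = cellPreferredWidths;
    }

    void autoFit(Behavior behavior) {
        switch (behavior) {
            case AUTO_FIT_TO_WINDOW:
                allowAutoFit = true;
                tablePreferredWidthPercent = 100.0;
                break;
            case AUTO_FIT_TO_CONTENTS:
                allowAutoFit = true;
                tablePreferredWidthPercent = null;
                break;
            case FIXED_COLUMN_WIDTHS:
                allowAutoFit = false;
                tablePreferredWidthPercent = null;
                break;
        }
        // Every shortcut clears the per-cell preferred widths.
        Arrays.fill(cellPreferredWidths, null);
    }

    // A fixed-width cell with no CellFormat.Width falls back to one inch (72 points).
    static double fixedCellWidth(Double cellWidthPoints) {
        return cellWidthPoints != null ? cellWidthPoints : 72.0;
    }
}
```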
How to Build a Table from a DataTable
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the ImportTableFromDataTable sample here.
Often your application will pull data from a database and store it in the form of a DataTable. You may
wish to easily insert this data into your document as a new table and quickly apply formatting to the
whole table.

Using Aspose.Words this task is very simple to achieve. The code presented within this article
demonstrates how to do this.

Note that the preferred way of inserting data from a DataTable into a document table is by using Mail Merge
with Regions. The technique presented in this article is only suggested if you are unable to create a suitable
template beforehand to merge data with; in other words, if you require everything to happen
programmatically.
The Solution
To build a table in a document from the data found in a DataTable:
1. Create a new DocumentBuilder object on your Document.
2. Start a new table using DocumentBuilder.
3. If you want to insert the names of the columns from the DataTable as a header row, iterate through each
data column and write its name into a row of the table.
4. Iterate through each DataRow in the DataTable.
1. Iterate through each object in the DataRow.
2. Insert the object into the document using DocumentBuilder. The method used depends on the
type of the object being inserted, e.g. DocumentBuilder.Writeln for text and
DocumentBuilder.InsertImage for a byte array which represents an image.
3. After processing the data row, end the row being created by the DocumentBuilder
by calling DocumentBuilder.EndRow.
5. Once all rows from the DataTable have been processed finish the table by calling
DocumentBuilder.EndTable.
6. Finally we can set the desired table style using one of the appropriate table properties such as
Table.StyleIdentifier to automatically apply formatting to the entire table.
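Step 4.2 chooses how each value is inserted based on its type. A minimal, library-free sketch of that dispatch with instanceof checks is shown below; CellValueFormatter and its placeholder text for images are hypothetical, standing in for calls such as DocumentBuilder.Writeln and DocumentBuilder.InsertImage. Treating a null value as an empty cell is an assumption, since a database column may contain SQL NULL.

```java
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper mirroring the per-type choice made for each table cell:
// byte[] is treated as an image, Timestamp as a date, anything else as text.
class CellValueFormatter {
    static String describe(Object item) {
        if (item == null) {
            return ""; // assumption: SQL NULL becomes an empty cell
        }
        if (item instanceof byte[]) {
            // A real implementation would insert the bytes as an image instead.
            return "[image, " + ((byte[]) item).length + " bytes]";
        }
        if (item instanceof Timestamp) {
            // The locale is fixed so month names do not depend on the machine's default locale.
            return new SimpleDateFormat("MMMM d, yyyy", Locale.US).format((Timestamp) item);
        }
        return item.toString();
    }
}
```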
The following data in our DataTable is used in this example:


The Code
The following code demonstrates how to achieve this in Aspose.Words. The importTableFromDataTable method
accepts a DocumentBuilder object, the DataTable containing the data, and a flag which specifies whether the
column headings from the DataTable are included at the top of the table. This method builds a table from
these parameters using the builder's current position and formatting.
Example
Provides a method to import data from the DataTable and insert it into a new table using the
DocumentBuilder.
Java
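import java.sql.ResultSetMetaData;
import java.sql.Timestamp;
import java.text.SimpleDateFormat;

import com.aspose.words.*;
import com.aspose.words.net.System.Data.DataTable;

/*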
* Imports the content from the specified DataTable into a new Aspose.Words Table
object.
* The table is inserted at the current position of the document builder and using the
current builder's formatting if any is defined.
*/
public static Table importTableFromDataTable(DocumentBuilder builder, DataTable
dataTable, boolean importColumnHeadings) throws Exception
{
Table table = builder.startTable();

ResultSetMetaData metaData = dataTable.getResultSet().getMetaData();
int numColumns = metaData.getColumnCount();

// Check if the names of the columns from the data source are to be included in a header row.
if (importColumnHeadings)
{
// Store the original values of these properties before changing them.
boolean boldValue = builder.getFont().getBold();
int paragraphAlignmentValue = builder.getParagraphFormat().getAlignment();

// Format the heading row with the appropriate properties.
builder.getFont().setBold(true);
builder.getParagraphFormat().setAlignment(ParagraphAlignment.CENTER);

// Create a new row and insert the name of each column into the first row of the table.
for (int i = 1; i < numColumns + 1; i++)
{
builder.insertCell();
builder.writeln(metaData.getColumnName(i));
}

builder.endRow();

// Restore the original formatting.
builder.getFont().setBold(boldValue);
builder.getParagraphFormat().setAlignment(paragraphAlignmentValue);
}

// Iterate through all rows and then columns of the data.
while(dataTable.getResultSet().next())
{
for (int i = 1; i < numColumns + 1; i++)
{
// Insert a new cell for each object.
builder.insertCell();

// Retrieve the value of the current column (JDBC column indexes are 1-based).
Object item = dataTable.getResultSet().getObject(i);
// Insert the value using a method suited to its type.
if (item == null)
{
// Leave the cell empty for SQL NULL values.
}
else if (item instanceof byte[])
{
// Assume a byte array is an image. Other data types can be added here.
builder.insertImage((byte[])item, 50, 50);
}
else if (item instanceof Timestamp)
{
// Define a custom format for dates and times.
builder.write(new SimpleDateFormat("MMMM d, yyyy").format((Timestamp)item));
}
else
{
// By default any other item will be inserted as text.
builder.write(item.toString());
}

}

// After we insert all the data from the current record we can end the table row.
builder.endRow();
}

// We have finished inserting all the data from the DataTable, we can end the table.
builder.endTable();

return table;
}


The method can then be easily called using your DocumentBuilder and data.
Example
Shows how to import the data from a DataTable and insert it into a new table in the document.
Java
// Create a new document.
Document doc = new Document();

// We can position where we want the table to be inserted and also specify any extra formatting to be
// applied onto the table as well.
DocumentBuilder builder = new DocumentBuilder(doc);

// We want to rotate the page landscape as we expect a wide table.
doc.getFirstSection().getPageSetup().setOrientation(Orientation.LANDSCAPE);

// Retrieve the data from our data source which is stored as a DataTable.
DataTable dataTable = getEmployees(databaseDir);

// Build a table in the document from the data contained in the DataTable.
Table table = importTableFromDataTable(builder, dataTable, true);

// We can apply a table style as a very quick way to apply formatting to the entire table.
table.setStyleIdentifier(StyleIdentifier.MEDIUM_LIST_2_ACCENT_1);
table.setStyleOptions(TableStyleOptions.FIRST_ROW | TableStyleOptions.ROW_BANDS |
TableStyleOptions.LAST_COLUMN);

// For our table we want to remove the heading for the image column.
table.getFirstRow().getLastCell().removeAllChildren();

doc.save(dataDir + "Table.FromDataTable Out.docx");

The Result
The following table is produced by running the code above:

How to Create Headers Footers using DocumentBuilder
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

The following sample code demonstrates how to create headers/footers using DocumentBuilder.

Special attention is given to the following issues:
How to specify header/footer type.
How to instruct the document to display different headers/footers for the first page and for odd/even
pages.
How to insert an absolutely positioned image into the header.
How to set font and paragraph properties for the header/footer text.
How to insert page numbers into the header/footer.
How to use a table to make one part of the header/footer text align to the left edge and the other to
the right edge.
How to control whether headers/footers of a subsequent section of the document use
headers/footers defined in the previous section.
How to ensure proper header/footer appearance when using different page orientation and size for
subsequent sections.
Example
A somewhat complicated example, but it demonstrates many things that can be done with headers/footers.
Java
public void primer() throws Exception
{
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

Section currentSection = builder.getCurrentSection();
PageSetup pageSetup = currentSection.getPageSetup();

// Specify if we want headers/footers of the first page to be different from other pages.
// You can also use PageSetup.OddAndEvenPagesHeaderFooter property to specify
// different headers/footers for odd and even pages.
pageSetup.setDifferentFirstPageHeaderFooter(true);

// --- Create header for the first page. ---
pageSetup.setHeaderDistance(20);
builder.moveToHeaderFooter(HeaderFooterType.HEADER_FIRST);
builder.getParagraphFormat().setAlignment(ParagraphAlignment.CENTER);

// Set font properties for header text.
builder.getFont().setName("Arial");
builder.getFont().setBold(true);
builder.getFont().setSize(14);
// Specify header title for the first page.
builder.write("Aspose.Words Header/Footer Creation Primer - Title Page.");

// --- Create header for pages other than first. ---
pageSetup.setHeaderDistance(20);
builder.moveToHeaderFooter(HeaderFooterType.HEADER_PRIMARY);

// Insert absolutely positioned image into the top/left corner of the header.
// Distance from the top/left edges of the page is set to 10 points.
String imageFileName = getMyDir() + "Aspose.Words.gif";
builder.insertImage(imageFileName, RelativeHorizontalPosition.PAGE, 10,
RelativeVerticalPosition.PAGE, 10, 50, 50, WrapType.THROUGH);

builder.getParagraphFormat().setAlignment(ParagraphAlignment.RIGHT);
// Specify another header title for other pages.
builder.write("Aspose.Words Header/Footer Creation Primer.");

// --- Create footer for pages other than first. ---
builder.moveToHeaderFooter(HeaderFooterType.FOOTER_PRIMARY);

// We use a table with two cells to make one part of the text on the line (with page numbering)
// aligned left, and the other part of the text (with copyright) aligned right.
builder.startTable();

// Clear table borders
builder.getCellFormat().clearFormatting();

builder.insertCell();
// Set first cell to 1/3 of the page width (100.0 avoids truncating the percentage with integer division).
builder.getCellFormat().setPreferredWidth(PreferredWidth.fromPercent(100.0 / 3));

// Insert page numbering text here.
// It uses PAGE and NUMPAGES fields to auto calculate the current page number and total number of pages.
builder.write("Page ");
builder.insertField("PAGE", "");
builder.write(" of ");
builder.insertField("NUMPAGES", "");

// Align this text to the left.
builder.getCurrentParagraph().getParagraphFormat().setAlignment(ParagraphAlignment.LEFT);

builder.insertCell();
// Set the second cell to 2/3 of the page width.
builder.getCellFormat().setPreferredWidth(PreferredWidth.fromPercent(100.0 * 2 / 3));

builder.write("(C) 2001 Aspose Pty Ltd. All rights reserved.");

// Align this text to the right.
builder.getCurrentParagraph().getParagraphFormat().setAlignment(ParagraphAlignment.RIGHT);

builder.endRow();
builder.endTable();

builder.moveToDocumentEnd();
// Make a page break to create a second page on which the primary headers/footers will be seen.
builder.insertBreak(BreakType.PAGE_BREAK);

// Make section break to create a third page with different page orientation.
builder.insertBreak(BreakType.SECTION_BREAK_NEW_PAGE);

// Get the new section and its page setup.
currentSection = builder.getCurrentSection();
pageSetup = currentSection.getPageSetup();

// Set page orientation of the new section to landscape.
pageSetup.setOrientation(Orientation.LANDSCAPE);

// This section does not need different first page header/footer.
// We need only one title page in the document and the header/footer for this page
// has already been defined in the previous section
pageSetup.setDifferentFirstPageHeaderFooter(false);

// This section displays headers/footers from the previous section by default.
// Call currentSection.HeadersFooters.LinkToPrevious(false) to cancel this.
// Page width is different for the new section and therefore we need to set
// different cell widths for the footer table.
currentSection.getHeadersFooters().linkToPrevious(false);

// If we want to use the already existing header/footer set for this section
// but with some minor modifications then it may be expedient to copy headers/footers
// from the previous section and apply the necessary modifications where we want them.
copyHeadersFootersFromPreviousSection(currentSection);

// Find the footer that we want to change.
HeaderFooter primaryFooter = currentSection.getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY);

Row row = primaryFooter.getTables().get(0).getFirstRow();

row.getFirstCell().getCellFormat().setPreferredWidth(PreferredWidth.fromPercent(100.0 / 3));
row.getLastCell().getCellFormat().setPreferredWidth(PreferredWidth.fromPercent(100.0 * 2 / 3));

// Save the resulting document.
doc.save(getMyDir() + "HeaderFooter.Primer Out.doc");
}

/**
* Clones and copies headers/footers from the previous section to the specified section.
*/
private static void copyHeadersFootersFromPreviousSection(Section section) throws
Exception
{
Section previousSection = (Section)section.getPreviousSibling();

if (previousSection == null)
return;

section.getHeadersFooters().clear();

for (HeaderFooter headerFooter : previousSection.getHeadersFooters())
section.getHeadersFooters().add(headerFooter.deepClone(true));
}
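The footer cells are sized as one third and two thirds of the width. In Java, an expression written with integer literals such as 100 / 3 is evaluated with integer division and truncated before it reaches PreferredWidth.fromPercent, whereas 100.0 / 3 keeps the fraction. The hypothetical PercentSplit class below simply compares the two forms.

```java
// Compares integer and floating-point arithmetic for a one-third / two-thirds split.
class PercentSplit {
    static int[] integerPercents() {
        return new int[] {100 / 3, 100 * 2 / 3};         // truncated to 33 and 66
    }

    static double[] doublePercents() {
        return new double[] {100.0 / 3, 100.0 * 2 / 3}; // about 33.33 and 66.67
    }
}
```

The integer percentages add up to 99 rather than 100; the floating-point ones add up to 100 within rounding error.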

How to Extract Content Based on Styles
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the ExtractContentBasedOnStyles sample here.
At a simple level, retrieving the content based on styles from a Word document can be useful to identify,
list and count paragraphs and runs of text formatted with a specific style. For example, you may need to
identify particular kinds of content in the document, such as examples, titles, references, keywords, figure
names, and case studies.

To take this a few steps further, this can also be used to leverage the structure of the document, defined by
the styles it uses, to repurpose the document for another output, such as HTML. This is in fact how the
Aspose documentation is built, putting Aspose.Words to the test. A tool built using Aspose.Words takes
the source Word documents and splits them into topics at certain heading levels. An XML file produced
using Aspose.Words is used to build the navigation tree you can see on the left, and Aspose.Words then
converts each topic into HTML.
The solution for retrieving text formatted with specific styles in a Word document is typically economical
and straightforward using Aspose.Words.
The Solution
To illustrate how easily Aspose.Words handles retrieving content based on styles, let's look at an
example. In this example, we're going to retrieve text formatted with a specific paragraph style and a
character style from a sample Word document.

At a high level, this will involve:
1. Opening a Word document using the Aspose.Words.Document class.
2. Getting collections of all paragraphs and all runs in the document.
3. Selecting only the required paragraphs and runs.
Specifically, we'll retrieve text formatted with the Heading 1 paragraph style and the Intense
Emphasis character style from this sample Word document.


In this sample document, the text formatted with the Heading 1 paragraph style is Insert Tab, Quick
Styles and Theme, and the text formatted with the Intense Emphasis character style consists of the several
instances of blue, italicized, bold text, such as galleries and overall look.
The Code
The implementation of a style-based query is quite simple in the Aspose.Words document object model,
as it simply uses tools that are already in place.

Two class methods are implemented for this solution:
1. ParagraphsByStyleName: This method retrieves an array of the paragraphs in the document that have
a specific style name.
2. RunsByStyleName: This method retrieves an array of the runs in the document that have a specific
style name.
Both these methods are very similar, the only differences being the node types and the representation of
the style information within the paragraph and run nodes.
Here is an implementation of ParagraphsByStyleName:
Example
Find all paragraphs formatted with the specified style.
Java
public static ArrayList paragraphsByStyleName(Document doc, String styleName) throws
Exception
{
// Create an array to collect paragraphs of the specified style.
ArrayList paragraphsWithStyle = new ArrayList();
// Get all paragraphs from the document.
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
// Look through all paragraphs to find those with the specified style.
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
if (paragraph.getParagraphFormat().getStyle().getName().equals(styleName))
paragraphsWithStyle.add(paragraph);
}
return paragraphsWithStyle;
}

This implementation uses the Document.GetChildNodes method, which returns a collection of all nodes of
the specified type; in this case, all paragraphs.

Note that the second parameter of the Document.GetChildNodes method is set to true. This forces the
Document.GetChildNodes method to select from all child nodes recursively, rather than selecting the
immediate children only.
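The difference between a deep and a shallow GetChildNodes call can be shown with a plain tree walk. TreeNode is a hypothetical class; it collects matching nodes eagerly into a list, whereas the collection returned by Aspose.Words is populated only as its items are accessed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node; only the traversal idea matches GetChildNodes(type, isDeep).
class TreeNode {
    final String type;
    final List<TreeNode> children;

    TreeNode(String type, TreeNode... children) {
        this.type = type;
        this.children = Arrays.asList(children);
    }

    // isDeep == false: immediate children only; isDeep == true: all descendants in document order.
    List<TreeNode> childNodes(String wantedType, boolean isDeep) {
        List<TreeNode> result = new ArrayList<>();
        collect(this, wantedType, isDeep, result);
        return result;
    }

    private static void collect(TreeNode node, String wantedType, boolean isDeep, List<TreeNode> out) {
        for (TreeNode child : node.children) {
            if (child.type.equals(wantedType)) {
                out.add(child);
            }
            if (isDeep) {
                collect(child, wantedType, true, out);
            }
        }
    }
}
```

For a Document > Section > Body > Paragraph tree, a shallow search of the document for paragraphs finds nothing, while a deep search finds every paragraph.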
It's also worth pointing out that the paragraphs collection does not create an immediate overhead, because
paragraphs are loaded into this collection only when you access items in it.
Then all you need to do is go through the collection using a standard for-each loop and add the
paragraphs that have the specified style to the paragraphsWithStyle list. The paragraph style name can
be found in the Style.Name property of the Paragraph.ParagraphFormat object.
The implementation of RunsByStyleName is almost the same, although we're obviously using
NodeType.RUN to retrieve run nodes. The Font.Style property of a Run object is used to access style
information in the Run nodes.
Example
Find all runs formatted with the specified style.
Java
public static ArrayList runsByStyleName(Document doc, String styleName) throws
Exception
{
// Create an array to collect runs of the specified style.
ArrayList runsWithStyle = new ArrayList();
// Get all runs from the document.
NodeCollection runs = doc.getChildNodes(NodeType.RUN, true);
// Look through all runs to find those with the specified style.
for (Run run : (Iterable<Run>) runs)
{
if (run.getFont().getStyle().getName().equals(styleName))
runsWithStyle.add(run);
}
return runsWithStyle;
}
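paragraphsByStyleName and runsByStyleName differ only in the node type they collect and in how they read the style name, so the loop itself can be written once. The sketch below models that idea with the standard library only. StyledFilter and its Item class are hypothetical stand-ins, not Aspose.Words types, and the style accessor is passed in as a function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class StyledFilter {
    // Hypothetical stand-in for a paragraph or run: its text plus the name of its style.
    static final class Item {
        final String text;
        final String styleName;

        Item(String text, String styleName) {
            this.text = text;
            this.styleName = styleName;
        }
    }

    // Generic form of the two loops above: keep every node whose style name,
    // read through the supplied accessor, equals the requested name.
    // Calling equals on the requested name avoids a NullPointerException
    // if the accessor returns null.
    static <T> List<T> byStyleName(Iterable<T> nodes, Function<T, String> styleOf, String styleName) {
        List<T> matches = new ArrayList<>();
        for (T node : nodes) {
            if (styleName.equals(styleOf.apply(node))) {
                matches.add(node);
            }
        }
        return matches;
    }
}
```

With Aspose.Words, the accessor would be the same expression the loops above already use, such as paragraph.getParagraphFormat().getStyle().getName(). A NodeCollection would presumably still need the (Iterable<...>) cast shown in those loops.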


When both queries are implemented, all you need to do is to pass a document object and specify the style
names of the content you want to retrieve:
Example
Run queries and display results.
Java
// Open the document.
Document doc = new Document(dataDir + "TestFile.doc");

// Define style names as they are specified in the Word document.
final String PARA_STYLE = "Heading 1";
final String RUN_STYLE = "Intense Emphasis";

// Collect paragraphs with defined styles.
// Show the number of collected paragraphs and display the text of these paragraphs.
ArrayList paragraphs = paragraphsByStyleName(doc, PARA_STYLE);
System.out.println(java.text.MessageFormat.format("Paragraphs with \"{0}\" styles ({1}):", PARA_STYLE, paragraphs.size()));
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
System.out.print(paragraph.toString(SaveFormat.TEXT));

// Collect runs with defined styles.
// Show the number of collected runs and display the text of these runs.
ArrayList runs = runsByStyleName(doc, RUN_STYLE);
System.out.println(java.text.MessageFormat.format("\nRuns with \"{0}\" styles ({1}):",
RUN_STYLE, runs.size()));
for (Run run : (Iterable<Run>) runs)
System.out.println(run.getRange().getText());

End Result
When everything is done, running the sample will display the following output:

As you can see, this is a very simple example, showing the number and text of the collected paragraphs
and runs in the sample Word document.
How to Extract Content using DocumentVisitor
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

Aspose.Words can be used not only for creating Microsoft Word documents by building them dynamically or
merging templates with data, but also for parsing documents in order to extract separate document elements
such as headers, footers, paragraphs, tables, images, and others. Another possible task is to find all text of a
specific formatting or style.

Use the DocumentVisitor class to implement this usage scenario. This class corresponds to the well-known
Visitor design pattern. With DocumentVisitor, you can define and execute custom operations that require
enumeration over the document tree.
DocumentVisitor provides a set of VisitXXX methods that are invoked when a particular document
element (node) is encountered. For example, DocumentVisitor.VisitParagraphStart is called when
the beginning of a text paragraph is found and DocumentVisitor.VisitParagraphEnd is called when
the end of a text paragraph is found. Each DocumentVisitor.VisitXXX method accepts the object it
encounters, so you can use it as needed (for example, to retrieve its formatting). Both
DocumentVisitor.VisitParagraphStart and DocumentVisitor.VisitParagraphEnd, for instance, accept a
Paragraph object.
Each DocumentVisitor.VisitXXX method returns a VisitorAction value that controls the
enumeration of nodes. You can request either to continue the enumeration, skip the current node (but
continue the enumeration), or stop the enumeration of nodes.
These are the steps you should follow to programmatically determine and extract various parts of a
document:
1. Create a class derived from DocumentVisitor.
2. Override and provide implementations for some or all of the DocumentVisitor.VisitXXX methods to
perform some custom operations.
3. Call Node.Accept on the node from where you want to start the enumeration. For example, if you
want to enumerate the whole document, use Document.Accept(DocumentVisitor).
DocumentVisitor provides default implementations for all of the DocumentVisitor.VisitXXX
methods. This makes it easier to create new document visitors as only the methods required for the
particular visitor need to be overridden. It is not necessary to override all of the visitor methods.
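The following stdlib-only miniature makes this control flow concrete. It is a hypothetical model, not Aspose.Words code. The Node and Visitor classes and the CONTINUE, SKIP_THIS_NODE and STOP constants only mirror the roles that Node.Accept, DocumentVisitor and VisitorAction play in the library. The exact traversal rules of the real library may differ, for example whether an end callback still runs after a skip.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical miniature of the Visitor pattern. Node, Visitor and the action
// constants are illustrative stand-ins, not the Aspose.Words classes.
final class MiniVisitorDemo {
    static final int CONTINUE = 0;
    static final int SKIP_THIS_NODE = 1;
    static final int STOP = 2;

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            Collections.addAll(children, kids);
        }

        // A composite node reports its start, walks its children, then reports its end.
        // The return value tells the parent whether to keep going or stop everything.
        int accept(Visitor visitor) {
            int action = visitor.visitStart(this);
            if (action == STOP) {
                return STOP;
            }
            if (action == SKIP_THIS_NODE) {
                return CONTINUE; // children are skipped, siblings are still visited
            }
            for (Node child : children) {
                if (child.accept(visitor) == STOP) {
                    return STOP;
                }
            }
            return visitor.visitEnd(this) == STOP ? STOP : CONTINUE;
        }
    }

    // Every callback has a default that keeps walking, so a subclass
    // overrides only the callbacks it cares about.
    static class Visitor {
        int visitStart(Node node) { return CONTINUE; }
        int visitEnd(Node node) { return CONTINUE; }
    }
}
```

In this model, a visitor that returns SKIP_THIS_NODE for a header node never sees the header's children. MyDocToTxtWriter relies on the same effect to leave headers and footers out of its output.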
Example
Shows how to use the Visitor pattern to add new operations to the Aspose.Words object model. In this
case we create a simple document converter into a text format.
Java
public void toText() throws Exception
{
// Open the document we want to convert.
Document doc = new Document(getMyDir() + "Visitor.ToText.doc");

// Create an object that inherits from the DocumentVisitor class.
MyDocToTxtWriter myConverter = new MyDocToTxtWriter();

// This is the well-known Visitor pattern. Get the model to accept a visitor.
// The model will iterate through itself by calling the corresponding methods
// on the visitor object (this is called visiting).
//
// Note that every node in the object model has the accept method, so the visiting
// can be executed not only for the whole document, but for any node in the
// document.
doc.accept(myConverter);

// Once the visiting is complete, we can retrieve the result of the operation,
// that in this example, has accumulated in the visitor.
System.out.println(myConverter.getText());
}

/**
* Simple implementation of saving a document in the plain text format. Implemented as
a Visitor.
*/
public class MyDocToTxtWriter extends DocumentVisitor
{
public MyDocToTxtWriter() throws Exception
{
mIsSkipText = false;
mBuilder = new StringBuilder();
}

/**
* Gets the plain text of the document that was accumulated by the visitor.
*/
public String getText() throws Exception
{
return mBuilder.toString();
}

/**
* Called when a Run node is encountered in the document.
*/
public int visitRun(Run run) throws Exception
{
appendText(run.getText());

// Let the visitor continue visiting other nodes.
return VisitorAction.CONTINUE;
}

/**
* Called when a FieldStart node is encountered in the document.
*/
public int visitFieldStart(FieldStart fieldStart) throws Exception
{
// In Microsoft Word, a field code (such as "MERGEFIELD FieldName") follows
// after a field start character. We want to skip field codes and output field
// result only, therefore we use a flag to suspend the output while inside a
// field code.
//
// Note this is a very simplistic implementation and will not work very well
// if you have nested fields in a document.
mIsSkipText = true;

return VisitorAction.CONTINUE;
}

/**
* Called when a FieldSeparator node is encountered in the document.
*/
public int visitFieldSeparator(FieldSeparator fieldSeparator) throws Exception
{
// Once we reach a field separator node, we enable the output because we are
// now entering the field result nodes.
mIsSkipText = false;

return VisitorAction.CONTINUE;
}

/**
* Called when a FieldEnd node is encountered in the document.
*/
public int visitFieldEnd(FieldEnd fieldEnd) throws Exception
{
// Make sure we enable the output when we reach a field end because some fields
// do not have a field separator and do not have a field result.
mIsSkipText = false;

return VisitorAction.CONTINUE;
}

/**
* Called when visiting of a Paragraph node is ended in the document.
*/
public int visitParagraphEnd(Paragraph paragraph) throws Exception
{
// When outputting to plain text we output Cr+Lf characters.
appendText(ControlChar.CR_LF);

return VisitorAction.CONTINUE;
}

public int visitBodyStart(Body body) throws Exception
{
// We can detect the beginning and end of all composite nodes such as Section, Body,
// Table, Paragraph etc. and provide custom handling for them.
mBuilder.append("*** Body Started ***\r\n");

return VisitorAction.CONTINUE;
}

public int visitBodyEnd(Body body) throws Exception
{
mBuilder.append("*** Body Ended ***\r\n");
return VisitorAction.CONTINUE;
}

/**
* Called when a HeaderFooter node is encountered in the document.
*/
public int visitHeaderFooterStart(HeaderFooter headerFooter) throws Exception
{
// Returning this value from a visitor method causes visiting of this
// node to stop and move on to visiting the next sibling node.
// The net effect in this example is that the text of headers and footers
// is not included in the resulting output.
return VisitorAction.SKIP_THIS_NODE;
}


/**
* Adds text to the current output. Honours the enabled/disabled output flag.
*/
private void appendText(String text) throws Exception
{
if (!mIsSkipText)
mBuilder.append(text);
}

private final StringBuilder mBuilder;
private boolean mIsSkipText;
}
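The comment in visitFieldStart warns that the single mIsSkipText flag breaks down with nested fields. The separator of an inner field switches output back on while the outer field code is still open. One possible fix is to keep one stack entry per open field and skip text while any open field is still in its code part. The sketch below shows this on a hypothetical token list instead of the real Run, FieldStart, FieldSeparator and FieldEnd nodes. Moving the same bookkeeping into the visitFieldStart, visitFieldSeparator and visitFieldEnd overrides of MyDocToTxtWriter is an assumption and is not shown here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class FieldAwareText {
    // Hypothetical token kinds standing in for Run, FieldStart, FieldSeparator and FieldEnd nodes.
    enum Kind { RUN, FIELD_START, FIELD_SEPARATOR, FIELD_END }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        static Token run(String text) { return new Token(Kind.RUN, text); }
        static Token of(Kind kind) { return new Token(kind, ""); }
    }

    // Emits field results only. Each open field has a stack entry that is true
    // while the field is in its code part; text is skipped while any open field
    // is still in its code part.
    static String text(List<Token> tokens) {
        Deque<Boolean> openFields = new ArrayDeque<>();
        int fieldsInCode = 0;
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            switch (t.kind) {
                case FIELD_START:
                    openFields.push(true);
                    fieldsInCode++;
                    break;
                case FIELD_SEPARATOR:
                    // The innermost open field moves from its code to its result.
                    if (!openFields.isEmpty() && openFields.peek()) {
                        openFields.pop();
                        openFields.push(false);
                        fieldsInCode--;
                    }
                    break;
                case FIELD_END:
                    // A field without a separator closes while still in its code part.
                    if (!openFields.isEmpty() && openFields.pop()) {
                        fieldsInCode--;
                    }
                    break;
                case RUN:
                    if (fieldsInCode == 0) {
                        out.append(t.text);
                    }
                    break;
            }
        }
        return out.toString();
    }
}
```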

How to Extract Images from a Document
Added by hammad, last edited by Awais Hafeez on Jul 02, 2014

Images in a document are stored inside Shape nodes; documents in newer formats such as DOCX may also hold
them in DrawingML nodes, which the example below handles as well.
To extract all images, or images of a specific type, from the document, follow these steps:
1. Use the Document.GetChildNodes method to select all Shape nodes.
2. Iterate through the resulting node collection.
3. Check the Shape.HasImage boolean property.
4. Extract the image data using the Shape.ImageData property.
5. Save the image data to a file.
Example
Shows how to extract images from a document and save them as files.
Java
public void extractImagesToFiles() throws Exception
{
Document doc = new Document(getMyDir() + "Image.SampleImages.doc");

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
int imageIndex = 0;
for (Shape shape : (Iterable<Shape>) shapes)
{
if (shape.hasImage())
{
String imageFileName = java.text.MessageFormat.format(
"Image.ExportImages.{0} Out{1}", imageIndex,
FileFormatUtil.imageTypeToExtension(shape.getImageData().getImageType()));
shape.getImageData().save(getMyDir() + imageFileName);
imageIndex++;
}
}

// Newer Microsoft Word documents (such as DOCX) may contain a different type of
// image container called DrawingML.
// Repeat the process to extract these if they are present in the loaded document.
NodeCollection dmlShapes = doc.getChildNodes(NodeType.DRAWING_ML, true);
// Keep using the same imageIndex so DrawingML images do not overwrite the files saved above.
for (DrawingML dml : (Iterable<DrawingML>) dmlShapes)
{
if (dml.hasImage())
{
String imageFileName = java.text.MessageFormat.format(
"Image.ExportImages.{0} Out{1}", imageIndex,
FileFormatUtil.imageTypeToExtension(dml.getImageData().getImageType()));
dml.getImageData().save(getMyDir() + imageFileName);
imageIndex++;
}
}
}
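Both loops above build file names from one running index. If each loop restarted its own counter, a DrawingML image could overwrite a file saved by the Shape loop. The stdlib-only sketch below isolates the naming step. ImageFileNamer is a hypothetical helper, and the extension argument stands in for the value returned by FileFormatUtil.imageTypeToExtension. That value is assumed to include the leading dot, since the original format string appends it directly. The sketch uses String.format with %d and Locale.ROOT, which never inserts grouping separators. MessageFormat, by contrast, formats an int argument with the locale's number format, so an index of 1000 becomes "1,000" in an English locale.

```java
import java.util.Locale;

final class ImageFileNamer {
    private final String prefix;
    private int nextIndex = 0;

    ImageFileNamer(String prefix) {
        this.prefix = prefix;
    }

    // Returns e.g. "Image.ExportImages.0 Out.png" and advances the shared counter,
    // so names stay unique across every loop that uses the same namer.
    String next(String extensionWithDot) {
        return String.format(Locale.ROOT, "%s%d Out%s", prefix, nextIndex++, extensionWithDot);
    }
}
```

In the method above, one namer created before the first loop could replace the single imageIndex counter.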

Extract Content Overview and Code
Added by Adam Skelton, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the ExtractContent sample here.
A common requirement when working with documents is to easily extract specific content from a range
within the document. This content can consist of complex features such as paragraphs, tables, images etc.
Regardless of what content needs to be extracted, the way it is extracted is always determined by the
nodes chosen as the boundaries of the extraction. These could be entire bodies of text or simple runs of
text. There are many possible situations and therefore many different node types to consider when
extracting content. For instance, you may want to extract content between:
Two specific paragraphs in the document.
Specific runs of text.
Different types of fields, for example merge fields.
The start and end ranges of a bookmark or comment.
Different bodies of text contained in separate sections.
In some situations you may even want to combine different marker types, for example, to extract content
between a paragraph and a field, or between a run and a bookmark.
Often the goal of extracting this content is to duplicate or save it separately into a new document. For
example, you may wish to extract content and:
Copy it to a separate document.
Render a specific portion of a document to PDF or an image.
Duplicate the content in the document many times.
Work with this content separately from the rest of the document.
This is easy to achieve using Aspose.Words and the code implementation below. This article provides the
full code implementation along with samples of common scenarios that use this method.
These samples are just a few demonstrations of the many possibilities that this method can be used for.
Some day this functionality will be a part of the public API and the extra code here will not be required.
Feel free to post your requests regarding this functionality on the Aspose.Words forum here.
The Solution
The code in this article addresses all of the possible situations above with one generalized and reusable
method.

The general outline of this technique involves:
1. Gathering the nodes which dictate the area of content that will be extracted from your document.
Retrieving these nodes is handled by the user in their code, based on what they want to be extracted.
2. Passing these nodes to the ExtractContent method which is provided below. You must also pass a
boolean parameter which states if these nodes that act as markers should be included in the extraction or
not.
3. The method will return a list of cloned (copied) nodes containing the content specified to be extracted.
You can now use this in any way applicable, for example, creating a new document containing only the
selected content.
We will work with the document below throughout this article. As you can see, it contains a variety of
content. Also note that the document contains a second section beginning in the middle of the first page.
A bookmark and a comment are also present in the document but are not visible in the screenshot below.


The Code
To extract the content from your document you need to call the ExtractContent method below and pass
the appropriate parameters.

The underlying basis of this method involves finding block level nodes (paragraphs and tables) and
cloning them to create identical copies. If the marker nodes passed are block level then the method is able
to simply copy the content on that level and add it to the array.
However, if the marker nodes are inline (a child of a paragraph), then the situation becomes more complex,
as it is necessary to split the paragraph at the inline node, be it a run, a bookmark, a field, etc.

Content in the cloned parent nodes that does not lie between the markers is removed. This ensures that
the inline nodes still retain the formatting of the parent paragraph.
The method will also run checks on the nodes passed as parameters and throw an exception if either node
is invalid.
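One of these checks is that the end node comes after the start node. One general way to express that is to describe each node by the path of child indexes from the document root, then compare the paths lexicographically. This ordering is the pre-order (document) order, where an ancestor precedes its descendants. The stdlib-only sketch below illustrates the idea. The path representation and the DocumentOrder class are assumptions for illustration and are not part of Aspose.Words.

```java
import java.util.List;

final class DocumentOrder {
    /**
     * Compares two positions given as child-index paths from the document root
     * (hypothetical representation, e.g. [section, block, inline, ...]).
     * Returns a negative number if a comes first in document order, zero if the
     * paths are equal and a positive number if b comes first.
     */
    static int compare(List<Integer> a, List<Integer> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            int diff = Integer.compare(a.get(i), b.get(i));
            if (diff != 0) {
                return diff;
            }
        }
        // One path is a prefix of the other: the shorter one is the ancestor and comes first.
        return Integer.compare(a.size(), b.size());
    }

    // Mirrors the kind of exception the validation helper throws for reversed markers.
    static void requireOrdered(List<Integer> start, List<Integer> end) {
        if (compare(start, end) > 0) {
            throw new IllegalArgumentException("The end node must be after the start node");
        }
    }
}
```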
The parameters to be passed to this method are:
1. StartNode and EndNode: The first two parameters are the nodes which define where the extraction of
the content is to begin and end, respectively. These nodes can be either block level (Paragraph, Table)
or inline level (e.g. Run, FieldStart, BookmarkStart etc.).
1. To pass a field you should pass the corresponding FieldStart object.
2. To pass bookmarks, the BookmarkStart and BookmarkEnd nodes should be passed.
3. To pass comments, the CommentRangeStart and CommentRangeEnd nodes should be used.
2. IsInclusive: Defines if the markers are included in the extraction or not. If this option is set to false
and the same node or consecutive nodes are passed, then an empty list will be returned.
1. If a FieldStart node is passed then this option defines if the whole field is to be included or
excluded.
2. If a BookmarkStart or BookmarkEnd node is passed, this option defines if the bookmark is
included or just the content between the bookmark range.
3. If a CommentRangeStart or CommentRangeEnd node is passed, this option defines if the
comment itself is to be included or just the content in the comment range.
The implementation of the ExtractContent method is found below. This method will be referred to in
the scenarios in this article.
Example
This is a method which extracts blocks of content from a document between specified nodes.
Java
/**
* Extracts a range of nodes from a document found between specified markers and
returns a copy of those nodes. Content can be extracted
* between inline nodes, block level nodes, and also special nodes such as Comment or
* Bookmarks. Any combination of different marker types can be used.
*
* @param startNode The node which defines where to start the extraction from the
document. This node can be block or inline level of a body.
* @param endNode The node which defines where to stop the extraction from the
document. This node can be block or inline level of body.
* @param isInclusive Should the marker nodes be included.
*/
public static ArrayList extractContent(Node startNode, Node endNode, boolean
isInclusive) throws Exception
{
// First check that the nodes passed to this method are valid for use.
verifyParameterNodes(startNode, endNode);

// Create a list to store the extracted nodes.
ArrayList nodes = new ArrayList();

// Keep a record of the original nodes passed to this method so we can split
// marker nodes if needed.
Node originalStartNode = startNode;
Node originalEndNode = endNode;

// Extract content based on block level nodes (paragraphs and tables). Traverse
// through parent nodes to find them.
// We will split the content of the first and last nodes depending on whether the
// marker nodes are inline.
while (startNode.getParentNode().getNodeType() != NodeType.BODY)
startNode = startNode.getParentNode();

while (endNode.getParentNode().getNodeType() != NodeType.BODY)
endNode = endNode.getParentNode();

boolean isExtracting = true;
boolean isStartingNode = true;
boolean isEndingNode;
// The current node we are extracting from the document.
Node currNode = startNode;

// Begin extracting content. Process all block level nodes and specifically split
// the first and last nodes when needed so paragraph formatting is retained.
// This method is a little more complex than a regular extractor, as we need to factor in
// extracting using inline nodes, fields, bookmarks etc. to make it really useful.
while (isExtracting)
{
// Clone the current node and its children to obtain a copy.
CompositeNode cloneNode = (CompositeNode)currNode.deepClone(true);
isEndingNode = currNode.equals(endNode);

if(isStartingNode || isEndingNode)
{
// We need to process each marker separately so pass it off to a separate
// method instead.
if (isStartingNode)
{
processMarker(cloneNode, nodes, originalStartNode, isInclusive,
isStartingNode, isEndingNode);
isStartingNode = false;
}

// This conditional needs to be separate as the block level start and end
// markers may be the same node.
if (isEndingNode)
{
processMarker(cloneNode, nodes, originalEndNode, isInclusive,
isStartingNode, isEndingNode);
isExtracting = false;
}
}
else
// Node is not a start or end marker, simply add the copy to the list.
nodes.add(cloneNode);

// Move to the next node and extract it. If the next node is null, the
// rest of the content is found in a different section.
if (currNode.getNextSibling() == null && isExtracting)
{
// Move to the next section.
Section nextSection =
(Section)currNode.getAncestor(NodeType.SECTION).getNextSibling();
currNode = nextSection.getBody().getFirstChild();
}
else
{
// Move to the next node in the body.
currNode = currNode.getNextSibling();
}
}

// Return the nodes between the node markers.
return nodes;
}


We will also define a custom method to easily generate a document from extracted nodes. This method is used
in many of the scenarios below and simply creates a new document and imports the extracted content into it.
Example
This method takes a list of nodes and inserts them into a new document.
Java
public static Document generateDocument(Document srcDoc, ArrayList nodes) throws
Exception
{
// Create a blank document.
Document dstDoc = new Document();
// Remove the first paragraph from the empty document.
dstDoc.getFirstSection().getBody().removeAllChildren();

// Import each node from the list into the new document. Keep the original
// formatting of the node.
NodeImporter importer = new NodeImporter(srcDoc, dstDoc,
ImportFormatMode.KEEP_SOURCE_FORMATTING);

for (Node node : (Iterable<Node>) nodes)
{
Node importNode = importer.importNode(node, true);
dstDoc.getFirstSection().getBody().appendChild(importNode);
}

// Return the generated document.
return dstDoc;
}


The helper methods below are called internally by the main extraction method. They are required; however,
as they are not called directly by the user, it is not necessary to discuss them further.
Example
The helper methods used by the ExtractContent method.
Java
/**
* Checks the input parameters are correct and can be used. Throws an exception if
there is any problem.
*/
private static void verifyParameterNodes(Node startNode, Node endNode) throws
Exception
{
// The order in which these checks are done is important.
if (startNode == null)
throw new IllegalArgumentException("Start node cannot be null");
if (endNode == null)
throw new IllegalArgumentException("End node cannot be null");

if (!startNode.getDocument().equals(endNode.getDocument()))
throw new IllegalArgumentException("Start node and end node must belong to the
same document");

if (startNode.getAncestor(NodeType.BODY) == null ||
endNode.getAncestor(NodeType.BODY) == null)
throw new IllegalArgumentException("Start node and end node must be a child or
descendant of a body");

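// Check the end node is after the start node in the DOM tree.
// First check if they are in different sections; if they are not, check their
// position in the body of the section they share.
Section startSection = (Section)startNode.getAncestor(NodeType.SECTION);
Section endSection = (Section)endNode.getAncestor(NodeType.SECTION);

int startIndex = startSection.getParentNode().indexOf(startSection);
int endIndex = endSection.getParentNode().indexOf(endSection);

if (startIndex == endIndex)
{
// Body.indexOf only finds direct children of the body, so compare the block level
// nodes (paragraphs and tables) that contain the markers; inline markers would
// otherwise both report -1 and the check would never fail.
Node startBlock = startNode;
while (startBlock.getParentNode().getNodeType() != NodeType.BODY)
startBlock = startBlock.getParentNode();
Node endBlock = endNode;
while (endBlock.getParentNode().getNodeType() != NodeType.BODY)
endBlock = endBlock.getParentNode();

if (startSection.getBody().indexOf(startBlock) > endSection.getBody().indexOf(endBlock))
throw new IllegalArgumentException("The end node must be after the start node in the body");
}
else if (startIndex > endIndex)
throw new IllegalArgumentException("The section of the end node must be after the section of the start node");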
}

/**
* Checks if a node passed is an inline node.
*/
private static boolean isInline(Node node) throws Exception
{
// Test if the node is a descendant of a Paragraph or Table node and is not itself a
// paragraph or a table. (A paragraph inside a comment, which is a descendant of a
// paragraph, is possible.)
return ((node.getAncestor(NodeType.PARAGRAPH) != null ||
node.getAncestor(NodeType.TABLE) != null) && !(node.getNodeType() ==
NodeType.PARAGRAPH || node.getNodeType() == NodeType.TABLE));
}

/**
* Removes the content before or after the marker in the cloned node depending on the
type of marker.
*/
private static void processMarker(CompositeNode cloneNode, ArrayList nodes, Node node,
boolean isInclusive, boolean isStartMarker, boolean isEndMarker) throws Exception
{
// If we are dealing with a block level node just see if it should be included and
// add it to the list.
if(!isInline(node))
{
// Don't add the node twice if the markers are the same node
if(!(isStartMarker && isEndMarker))
{
if (isInclusive)
nodes.add(cloneNode);
}
return;
}

// If a marker is a FieldStart node check if it's to be included or not.
// We assume for simplicity that the FieldStart and FieldEnd appear in the same
// paragraph.
if (node.getNodeType() == NodeType.FIELD_START)
{
// If the marker is a start node and is not to be included then skip to the end
// of the field.
// If the marker is an end node and it is to be included then move to the end
// field so the field will not be removed.
if ((isStartMarker && !isInclusive) || (!isStartMarker && isInclusive))
{
while (node.getNextSibling() != null && node.getNodeType() !=
NodeType.FIELD_END)
node = node.getNextSibling();

}
}

// If either marker is part of a comment then to include the comment itself we
// need to move the pointer forward to the Comment
// node found after the CommentRangeEnd node.
if (node.getNodeType() == NodeType.COMMENT_RANGE_END)
{
while (node.getNextSibling() != null && node.getNodeType() !=
NodeType.COMMENT)
node = node.getNextSibling();

}

// Find the corresponding node in our cloned node by index and return it.
// If the start and end node are the same, some child nodes might already have been
// removed. Subtract the difference to get the right index.
int indexDiff = node.getParentNode().getChildNodes().getCount() -
cloneNode.getChildNodes().getCount();

// Child node count identical.
if (indexDiff == 0)
node = cloneNode.getChildNodes().get(node.getParentNode().indexOf(node));
else
node = cloneNode.getChildNodes().get(node.getParentNode().indexOf(node) -
indexDiff);

// Remove the nodes up to/from the marker.
boolean isSkip;
boolean isProcessing = true;
boolean isRemoving = isStartMarker;
Node nextNode = cloneNode.getFirstChild();

while (isProcessing && nextNode != null)
{
Node currentNode = nextNode;
isSkip = false;

if (currentNode.equals(node))
{
if (isStartMarker)
{
isProcessing = false;
if (isInclusive)
isRemoving = false;
}
else
{
isRemoving = true;
if (isInclusive)
isSkip = true;
}
}

nextNode = nextNode.getNextSibling();
if (isRemoving && !isSkip)
currentNode.remove();
}

// After processing, the composite node may become empty. If it has, don't include
// it.
if (!(isStartMarker && isEndMarker))
{
if (cloneNode.hasChildNodes())
nodes.add(cloneNode);
}

}
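Without the Aspose.Words node types, the marker handling in extractContent and processMarker reduces to three steps. First, clone every block between the two markers. Second, trim the first clone before the start marker and the last clone after the end marker. Third, drop any clones that end up empty. The sketch below models only that idea, with a document as a list of paragraphs and each paragraph as a list of run texts. The class, its parameters and the (paragraph, run) marker representation are hypothetical. Block-level markers, fields, bookmarks and comments are not modeled. Trimming the end before the start is a design choice that keeps the start index valid when both markers share a paragraph, which is the case the indexDiff calculation in processMarker handles.

```java
import java.util.ArrayList;
import java.util.List;

final class InlineRangeExtractor {
    /**
     * Hypothetical model: a document is a list of paragraphs, each a list of run texts.
     * Markers are inline (paragraph index, run index) pairs.
     */
    static List<List<String>> extract(List<List<String>> doc,
                                      int startPara, int startRun,
                                      int endPara, int endRun,
                                      boolean inclusive) {
        if (startPara > endPara || (startPara == endPara && startRun > endRun)) {
            throw new IllegalArgumentException("The end marker must be after the start marker");
        }
        List<List<String>> result = new ArrayList<>();
        for (int p = startPara; p <= endPara; p++) {
            // Clone the block so the source document is never modified.
            List<String> clone = new ArrayList<>(doc.get(p));
            // Trim after the end marker first, so the start index stays valid
            // when both markers sit in the same paragraph.
            if (p == endPara) {
                int keep = inclusive ? endRun + 1 : endRun;
                clone.subList(keep, clone.size()).clear();
            }
            // Trim before (and, if exclusive, including) the start marker.
            if (p == startPara) {
                int drop = inclusive ? startRun : startRun + 1;
                clone.subList(0, Math.min(drop, clone.size())).clear();
            }
            // Blocks that end up empty are not included.
            if (!clone.isEmpty()) {
                result.add(clone);
            }
        }
        return result;
    }
}
```

With isInclusive set to false and consecutive markers, the sketch returns an empty list. This matches the behavior described for the IsInclusive parameter.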

Extract Content Between Paragraphs
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

This demonstrates how to use the method above to extract content between specific paragraphs. In this case,
we want to extract the body of the letter found in the first half of the document.

We can tell that this is between the 7th and 11th paragraphs.
The code below accomplishes this task. The appropriate paragraphs are retrieved using the
CompositeNode.GetChild method on the first section of the document, passing the specified indices. We then
pass these nodes to the ExtractContent method and state that these are to be included in the extraction.
This method will return the copied content between these nodes, which is then inserted into a new
document.
Example
Shows how to extract the content between specific paragraphs using the ExtractContent method above.
Java
// Load in the document
Document doc = new Document(gDataDir + "TestFile.doc");

// Gather the nodes. The GetChild method uses 0-based index
Paragraph startPara = (Paragraph)doc.getFirstSection().getChild(NodeType.PARAGRAPH, 6,
true);
Paragraph endPara = (Paragraph)doc.getFirstSection().getChild(NodeType.PARAGRAPH, 10,
true);
// Extract the content between these nodes in the document. Include these markers in
// the extraction.
ArrayList extractedNodes = extractContent(startPara, endPara, true);

// Insert the content into a new separate document and save it to disk.
Document dstDoc = generateDocument(doc, extractedNodes);
dstDoc.save(gDataDir + "TestFile.Paragraphs Out.doc");

The Result
The output document which contains the two paragraphs that were extracted.

Extract Content Between Different Types of Nodes
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

We can extract content between any combination of block level or inline nodes. In the scenario below we
will extract the content between the third paragraph and the first table of the second section, inclusively.
We get the marker nodes by calling the CompositeNode.GetChild method on the last (second) section of the
document to retrieve the appropriate Paragraph and Table nodes.

For a slight variation, let's instead duplicate the content and insert it below the original.
Example
Shows how to extract the content between a paragraph and table using the ExtractContent method.
Java
// Load in the document
Document doc = new Document(gDataDir + "TestFile.doc");

Paragraph startPara = (Paragraph)doc.getLastSection().getChild(NodeType.PARAGRAPH, 2,
true);
Table endTable = (Table)doc.getLastSection().getChild(NodeType.TABLE, 0, true);

// Extract the content between these nodes in the document. Include these markers in
// the extraction.
ArrayList extractedNodes = extractContent(startPara, endTable, true);

// Let's reverse the list to make inserting the content back into the document easier.
Collections.reverse(extractedNodes);

while (extractedNodes.size() > 0)
{
// Insert the next node from the reversed list directly after the table.
endTable.getParentNode().insertAfter((Node)extractedNodes.get(0), endTable);
// Remove this node from the list after insertion.
extractedNodes.remove(0);
}

// Save the generated document to disk.
doc.save(gDataDir + "TestFile.DuplicatedContent Out.doc");
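Reversing the list works because each copy is inserted immediately after the table, so the copy inserted last ends up first. The stdlib-only sketch below models the body as a java.util.List of names, a hypothetical stand-in for the real nodes. It compares that approach with an equivalent one that keeps the original order and advances the insertion point after every insert.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InsertAfterDemo {
    // The approach used above: reverse the copies, then insert each one directly
    // after the anchor, so the earliest copy ends up closest to the anchor.
    static List<String> reverseThenInsert(List<String> body, String anchor, List<String> copies) {
        List<String> result = new ArrayList<>(body);
        List<String> reversed = new ArrayList<>(copies);
        Collections.reverse(reversed);
        int anchorIndex = result.indexOf(anchor);
        for (String copy : reversed) {
            result.add(anchorIndex + 1, copy);
        }
        return result;
    }

    // Equivalent alternative: keep the original order and move the insertion point forward.
    static List<String> insertMovingAnchor(List<String> body, String anchor, List<String> copies) {
        List<String> result = new ArrayList<>(body);
        int insertAt = result.indexOf(anchor) + 1;
        for (String copy : copies) {
            result.add(insertAt++, copy);
        }
        return result;
    }
}
```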

The Result
The content between the paragraph and table has been duplicated below the original.

Extract Content Between Paragraphs Based on Style
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

You may need to extract the content between paragraphs of the same or different style, such as between
paragraphs marked with heading styles.

The code below shows how to achieve this. It is a simple example which will extract the content between the
first instance of the Heading 1 and Heading 3 styles without extracting the headings as well. To do this we
set the last parameter to false, which specifies that the marker nodes should not be included.
In a proper implementation this should be run in a loop to extract content between all paragraphs of
these styles from the document. This scenario uses the ParagraphsByStyleName method from the
ExtractContentBasedOnStyle sample found here. The extracted content is copied into a new
document.
Example
Shows how to extract content between paragraphs with specific styles using the ExtractContent method.
Java
// Load in the document
Document doc = new Document(gDataDir + "TestFile.doc");

// Gather a list of the paragraphs using the respective heading styles.
ArrayList parasStyleHeading1 = paragraphsByStyleName(doc, "Heading 1");
ArrayList parasStyleHeading3 = paragraphsByStyleName(doc, "Heading 3");

// Use the first instance of the paragraphs with those styles.
Node startPara1 = (Node)parasStyleHeading1.get(0);
Node endPara1 = (Node)parasStyleHeading3.get(0);

// Extract the content between these nodes in the document. Don't include these
// markers in the extraction.
ArrayList extractedNodes = extractContent(startPara1, endPara1, false);

// Insert the content into a new separate document and save it to disk.
Document dstDoc = generateDocument(doc, extractedNodes);
dstDoc.save(gDataDir + "TestFile.Styles Out.doc");

The Result
The output generated is below


Extract Content Between Specific Runs
Added by Adam Skelton, last edited by Adam Skelton on Oct 14, 2012

You can extract content between inline nodes such as a Run as well. Runs from different paragraphs can be
passed as markers.

The code below shows how to extract specific text within a single Paragraph node.
Example
Shows how to extract content between specific runs of the same paragraph using the ExtractContent
method.
Java
// Load in the document
Document doc = new Document(gDataDir + "TestFile.doc");

// Retrieve a paragraph from the first section.
Paragraph para = (Paragraph)doc.getChild(NodeType.PARAGRAPH, 7, true);

// Use some runs for extraction.
Run startRun = para.getRuns().get(1);
Run endRun = para.getRuns().get(4);

// Extract the content between these nodes in the document. Include these markers in
// the extraction.
ArrayList extractedNodes = extractContent(startRun, endRun, true);

// Get the node from the list. There should only be one paragraph returned in the
// list.
Node node = (Node)extractedNodes.get(0);
// Print the text of this node to the console.
System.out.println(node.toString(SaveFormat.TEXT));

The Result
The extracted text displayed on the console.

Extract Content using a Field
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

To use a field as a marker, the FieldStart node should be passed. The last parameter to the ExtractContent
method will define if the entire field is to be included or not.

Let's extract the content between the FullName merge field and a paragraph in the document. We use the
DocumentBuilder.MoveToMergeField(String, Boolean, Boolean) method of the DocumentBuilder class to move the
builder cursor to the merge field with the given name, and then take the FieldStart node from the builder's
current node. In our case, let's set the last parameter passed to the ExtractContent method to false to
exclude the field from the extraction. We will render the extracted content to PDF.
Example
Shows how to extract content between a specific field and paragraph in the document using the
ExtractContent method.
Java
// Load in the document
Document doc = new Document(gDataDir + "TestFile.doc");

// Use a document builder to retrieve the field start of a merge field.
DocumentBuilder builder = new DocumentBuilder(doc);

// Passing false as the first boolean parameter (isAfter) moves the builder to the FieldStart of the field;
// the second false keeps the field in the document.
// We could also get the FieldStart of a field using the getChild method, as in the other examples.
builder.moveToMergeField("Fullname", false, false);

// The builder cursor should be positioned at the start of the field.
FieldStart startField = (FieldStart)builder.getCurrentNode();
Paragraph endPara = (Paragraph)doc.getFirstSection().getChild(NodeType.PARAGRAPH, 5, true);

// Extract the content between these nodes in the document. Don't include these markers in the extraction.
ArrayList extractedNodes = extractContent(startField, endPara, false);

// Insert the content into a new separate document and save it to disk.
Document dstDoc = generateDocument(doc, extractedNodes);
dstDoc.save(gDataDir + "TestFile.Fields Out.pdf");

The Result
The content extracted between the field and the paragraph, without the field and paragraph marker nodes,
rendered to PDF.


Extract Content from a Comment
Added by Adam Skelton, last edited by Adam Skelton on Feb 09, 2012

A comment is made up of the CommentRangeStart, CommentRangeEnd and Comment nodes. All of these
nodes are inline. The first two nodes encapsulate the content in the document which is referenced by the
comment, as seen in the screenshot below.

The Comment node itself is an InlineStory that can contain paragraphs and runs. It represents the message of
the comment, shown as a comment bubble in the review pane. As this node is inline and a descendant of a body,
you can also extract the content from inside this message.
In our document we have one comment. Let's display it by showing markup in the Review tab:


The comment encapsulates the heading, the first paragraph and the table in the second section.
Let's extract this comment into a new document. The isInclusive parameter determines whether the comment itself
is kept or discarded. The code to do this is below.
Example
Shows how to extract content referenced by a comment using the ExtractContent method.
Java
// Load in the document
Document doc = new Document(gDataDir + "TestFile.doc");

// This is a quick way of getting both comment nodes.
// Your code should have a proper method of retrieving each corresponding start and end node.
CommentRangeStart commentStart = (CommentRangeStart)doc.getChild(NodeType.COMMENT_RANGE_START, 0, true);
CommentRangeEnd commentEnd = (CommentRangeEnd)doc.getChild(NodeType.COMMENT_RANGE_END, 0, true);

// Firstly extract the content between these nodes including the comment as well.
ArrayList extractedNodesInclusive = extractContent(commentStart, commentEnd, true);
Document dstDoc = generateDocument(doc, extractedNodesInclusive);
dstDoc.save(gDataDir + "TestFile.CommentInclusive Out.doc");

// Secondly extract the content between these nodes without the comment.
ArrayList extractedNodesExclusive = extractContent(commentStart, commentEnd, false);
dstDoc = generateDocument(doc, extractedNodesExclusive);
dstDoc.save(gDataDir + "TestFile.CommentExclusive Out.doc");

The Result
First, the extracted output with the isInclusive parameter set to true. The copy also contains the comment.

Second, the extracted output with isInclusive set to false. The copy contains the content but not the
comment.



How to Extract Text Only
Added by hammad, last edited by Adam Skelton on Oct 14, 2012

There are several ways to retrieve text from a document:
Use Document.Save with SaveFormat.Text to save as plain text into a file or stream.
Use Node.ToString and pass the SaveFormat.Text parameter. Internally, this saves the node as text into
a memory stream and returns the resulting string.
Use Node.GetText to retrieve text with all Microsoft Word control characters, including field codes.
Implement a custom DocumentVisitor to perform customized extraction.
Using Node.GetText
A Word document can contain control characters that designate special elements such as fields, end of
cell, end of section, etc. The full list of possible Word control characters is defined in the ControlChar
class. The Node.GetText method returns the text together with all of the control characters present in the
node.
Calling ToString returns only the plain text representation of the document, without control characters. For
further information on exporting as plain text, see Using SaveFormat.Text.
Example
Shows the difference between calling the GetText and ToString methods on a node.
Java
Document doc = new Document();

// Enter a dummy field into the document.
DocumentBuilder builder = new DocumentBuilder(doc);
builder.insertField("MERGEFIELD Field");

// GetText will retrieve all field codes and special characters
System.out.println("GetText() Result: " + doc.getText());

// toString exports the node to the specified format. When converted to text, it does not retrieve field codes
// or special characters, but it still contains some natural formatting characters such as paragraph marks.
// This is the same as "viewing" the document as if it was opened in a text editor.
System.out.println("ToString() Result: " + doc.toString(SaveFormat.TEXT));
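To see why GetText and ToString differ for fields, the following plain-Java sketch filters field codes out of a string. The string uses Word's field control characters: field start (char 19), field separator (char 20) and field end (char 21). These are assumed to match the field characters defined in ControlChar. The sketch keeps only field results, which is roughly what plain-text export does with fields. It is a simplified model, not Aspose.Words' implementation, and the class name FieldCodeSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FieldCodeSketch {
    // Word's field control characters (assumed to match ControlChar's field start/separator/end chars).
    static final char FIELD_START = (char) 19;
    static final char FIELD_SEPARATOR = (char) 20;
    static final char FIELD_END = (char) 21;

    /** Returns the text with field codes removed and only field results kept. Handles nested fields. */
    static String stripFieldCodes(String text) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: true while still in the field code, false once in the result.
        Deque<Boolean> inCode = new ArrayDeque<>();
        int openCodes = 0; // number of open fields currently in their code part
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == FIELD_START) {
                inCode.push(Boolean.TRUE);
                openCodes++;
            } else if (c == FIELD_SEPARATOR) {
                if (!inCode.isEmpty() && inCode.peek()) {
                    inCode.pop();
                    inCode.push(Boolean.FALSE); // the field result starts here
                    openCodes--;
                }
            } else if (c == FIELD_END) {
                if (!inCode.isEmpty() && inCode.pop()) {
                    openCodes--; // field without a separator: its code ends here
                }
            } else if (openCodes == 0) {
                out.append(c); // ordinary text or field result outside any field code
            }
        }
        return out.toString();
    }
}
```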

Using SaveFormat.Text
This example saves the document as follows:
Filters out field characters and field codes, shape, footnote, endnote and comment references.
Replaces end of paragraph ControlChar.Cr characters with ControlChar.CrLf combinations.
Uses UTF-8 encoding.
Example
Shows how to save a document in TXT format.
Java
Document doc = new Document(getMyDir() + "Document.doc");

doc.save(getMyDir() + "Document.ConvertToTxt Out.txt");
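The paragraph-mark and encoding rules listed above can be modeled on a plain string. The sketch below is a simplified assumption of what the text export does with paragraph marks (CR becomes CR+LF) and with encoding (UTF-8). It is not Aspose.Words code, and the class name PlainTextSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class PlainTextSketch {
    /** Replaces each lone paragraph mark (CR) with CR+LF, leaving existing CR+LF pairs alone. */
    static String crToCrLf(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(c);
            boolean nextIsLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
            if (c == '\r' && !nextIsLf) {
                out.append('\n');
            }
        }
        return out.toString();
    }

    /** Converts paragraph marks and encodes the result as UTF-8 bytes, e.g. for writing a .txt file. */
    static byte[] toUtf8(String text) {
        return crToCrLf(text).getBytes(StandardCharsets.UTF_8);
    }
}
```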

How to Find and Highlight Text
Added by hammad, last edited by Awais Hafeez on Jun 19, 2014

You can download the complete source code of the FindAndHighlight sample here.
This article describes how to programmatically find and highlight a particular word or phrase in a
document using Aspose.Words.

It might seem easy at first to just find the string of text in a document and change its formatting, but the
main difficulty is that, due to formatting, the match string could be spread over several runs of text.
Consider the following example. The phrase Hello World! consists of three different runs: its beginning
is italic, the middle is bold, and the last part is regular text. In addition to formatting, any bookmarks in the
middle of the text will split it into more runs.
Hello World!
The above example is represented in Aspose.Words using the following objects:
Run (Run.Text = Hello, Font.Italic = true)
Run (Run.Text = World, Font.Bold = true)
Run (Run.Text = !)
This article provides a solution designed to handle this case: when necessary, it collects the word
(or phrase) from several runs, while skipping non-run nodes.
The following document is used in this sample.


The Code
The sample code opens a document and finds every instance of the text "your document". A replace
handler is set up to apply custom logic to each match found. In this case, the runs are split around the
matched text and the resulting runs are highlighted.
Example
Finds and highlights all instances of a particular word or a phrase in a Word document.
Java
package FindAndHighlight;

import java.awt.Color;
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.regex.Pattern;

import com.aspose.words.Document;
import com.aspose.words.IReplacingCallback;
import com.aspose.words.Node;
import com.aspose.words.NodeType;
import com.aspose.words.ReplaceAction;
import com.aspose.words.ReplacingArgs;
import com.aspose.words.Run;

class Program
{
    public static void main(String[] args) throws Exception
    {
        // Sample infrastructure.
        URI exeDir = Program.class.getResource("").toURI();
        String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

        Document doc = new Document(dataDir + "TestFile.doc");

        // We want the "your document" phrase to be highlighted.
        Pattern regex = Pattern.compile("your document", Pattern.CASE_INSENSITIVE);
        // If you modify the document in a custom replacement evaluator, it is generally recommended
        // to use backward replacement by passing false as the third parameter of the replace method.
        doc.getRange().replace(regex, new ReplaceEvaluatorFindAndHighlight(), false);

        // Save the output document.
        doc.save(dataDir + "TestFile Out.doc");
    }
}

class ReplaceEvaluatorFindAndHighlight implements IReplacingCallback
{
    /**
     * This method is called by the Aspose.Words find and replace engine for each match.
     * It highlights the match string, even if it spans multiple runs.
     */
    public int replacing(ReplacingArgs e) throws Exception
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run)currentNode, e.getMatchOffset());

        // This list stores all runs of the match for further highlighting.
        ArrayList<Run> runs = new ArrayList<Run>();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0) &&
               (currentNode != null) &&
               (currentNode.getText().length() <= remainingLength))
        {
            runs.add((Run)currentNode);
            remainingLength -= currentNode.getText().length();

            // Select the next Run node.
            // We have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.getNextSibling();
            }
            while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            splitRun((Run)currentNode, remainingLength);
            runs.add((Run)currentNode);
        }

        // Now highlight all runs in the sequence.
        for (Run run : runs)
            run.getFont().setHighlightColor(Color.YELLOW);

        // Signal to the replace engine to do nothing because we have already done all we wanted.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits the text of the specified run into two runs.
     * Inserts the new run just after the specified run.
     */
    private static Run splitRun(Run run, int position) throws Exception
    {
        Run afterRun = (Run)run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}

The Result
Each resulting match is highlighted yellow, even those matches that have different formatting and span
across multiple runs.
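The split/collect/split logic of ReplaceEvaluatorFindAndHighlight can be tried out without a document. In the sketch below, runs are modeled as a mutable list of strings. A match is described by its starting run, its offset in that run and its length. The class MatchSplitSketch is hypothetical. Unlike the real handler, it does not have to skip non-run nodes such as BookmarkStart.

```java
import java.util.ArrayList;
import java.util.List;

final class MatchSplitSketch {
    /**
     * Splits run texts so that a match of matchLength characters, starting at matchOffset
     * inside runs.get(startRun), occupies whole runs. Returns the indices of those runs.
     */
    static List<Integer> isolateMatch(List<String> runs, int startRun, int matchOffset, int matchLength) {
        int current = startRun;
        // Split off any text that precedes the match in the first run.
        if (matchOffset > 0) {
            split(runs, current, matchOffset);
            current++; // continue with the second half, where the match starts
        }
        List<Integer> matchRuns = new ArrayList<>();
        int remaining = matchLength;
        // Take whole runs while they fit entirely inside the match.
        while (remaining > 0 && current < runs.size() && runs.get(current).length() <= remaining) {
            matchRuns.add(current);
            remaining -= runs.get(current).length();
            current++;
        }
        // Split the last run if the match ends inside it; its first half belongs to the match.
        if (remaining > 0 && current < runs.size()) {
            split(runs, current, remaining);
            matchRuns.add(current);
        }
        return matchRuns;
    }

    /** Replaces runs[index] by two runs: the text before position and the text from position on. */
    private static void split(List<String> runs, int index, int position) {
        String text = runs.get(index);
        runs.set(index, text.substring(0, position));
        runs.add(index + 1, text.substring(position));
    }
}
```

Take the runs "Say Hello ", "World" and "! Bye" and a 12-character match ("Hello World!") starting at offset 4. The list becomes "Say ", "Hello ", "World", "!", " Bye", and indices 1 to 3 are the runs to highlight.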

How to Insert a Document into Another Document
Added by hammad, last edited by tahir manzoor on May 26, 2014

There is often a need to insert one document into another, for example at a bookmark, a merge field or a
custom text marker. At the moment, there is no single method in Aspose.Words that can do this in one line
of code.
However, a document in Aspose.Words is represented by a tree of nodes. The object model is rich, and
combining documents is just a matter of moving nodes between the document trees. This
article shows how to implement a method for inserting one document into another and how to use it in a
variety of scenarios.
Insert a Document at Any Location
To insert the content of one document into another at an arbitrary location, use the following simple
InsertDocument method. The other scenarios described below rely on this technique.
Example
This is a method that inserts contents of one document at a specified location in another document.
Java
/**
 * Inserts content of the external document after the specified node.
 * Section breaks and section formatting of the inserted document are ignored.
 *
 * @param insertAfterNode Node in the destination document after which the content
 * should be inserted. This node should be a block level node (paragraph or table).
 * @param srcDoc The document to insert.
 */
public static void insertDocument(Node insertAfterNode, Document srcDoc) throws Exception
{
    // Make sure that the node is either a paragraph or table.
    if ((insertAfterNode.getNodeType() != NodeType.PARAGRAPH) &&
        (insertAfterNode.getNodeType() != NodeType.TABLE))
        throw new IllegalArgumentException("The destination node should be either a paragraph or table.");

    // We will be inserting into the parent of the destination paragraph.
    CompositeNode dstStory = insertAfterNode.getParentNode();

    // This object will be translating styles and lists during the import.
    NodeImporter importer = new NodeImporter(srcDoc, insertAfterNode.getDocument(),
            ImportFormatMode.KEEP_SOURCE_FORMATTING);

    // Loop through all sections in the source document.
    for (Section srcSection : srcDoc.getSections())
    {
        // Loop through all block level nodes (paragraphs and tables) in the body of the section.
        for (Node srcNode : (Iterable<Node>) srcSection.getBody())
        {
            // Skip the node if it is the last empty paragraph in a section.
            if (srcNode.getNodeType() == NodeType.PARAGRAPH)
            {
                Paragraph para = (Paragraph)srcNode;
                if (para.isEndOfSection() && !para.hasChildNodes())
                    continue;
            }

            // This creates a clone of the node, suitable for insertion into the destination document.
            Node newNode = importer.importNode(srcNode, true);

            // Insert the new node after the reference node.
            dstStory.insertAfter(newNode, insertAfterNode);
            insertAfterNode = newNode;
        }
    }
}
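Inside insertDocument, the reference node is advanced after every insertion (insertAfterNode = newNode). Without that step, each block would be inserted directly after the original marker and the source content would come out in reverse order. The plain-Java sketch below models this with lists of strings, where an empty string stands for the empty last paragraph of a section. The names are hypothetical, and no Aspose.Words API is involved.

```java
import java.util.List;

final class InsertAfterSketch {
    /**
     * Inserts the blocks of every source section into dst right after index anchor, keeping
     * their original order. An empty last block stands for a section's empty last paragraph and is skipped.
     */
    static void insertAfter(List<String> dst, int anchor, List<List<String>> srcSections) {
        int insertAfter = anchor;
        for (List<String> section : srcSections) {
            for (int i = 0; i < section.size(); i++) {
                String block = section.get(i);
                boolean emptyLastParagraph = (i == section.size() - 1) && block.isEmpty();
                if (emptyLastParagraph) {
                    continue;
                }
                dst.add(insertAfter + 1, block);
                insertAfter++; // advance the reference point, as insertAfterNode = newNode does
            }
        }
    }
}
```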

This is a method that inserts contents of one document at a specified location in another document. This
method preserves the section breaks and section formatting of the inserted document.
Java
/**
 * Inserts content of the external document after the specified node.
 * Section breaks and section formatting of the inserted document are preserved.
 *
 * @param insertAfterNode Node in the destination document after which the content
 * should be inserted. This node should be a block level node (paragraph or table).
 * @param srcDoc The document to insert.
 */
public static void insertDocumentWithSectionFormatting(Node insertAfterNode, Document srcDoc) throws Exception
{
    // Make sure that the node is either a paragraph or table.
    if ((insertAfterNode.getNodeType() != NodeType.PARAGRAPH) &&
        (insertAfterNode.getNodeType() != NodeType.TABLE))
        throw new IllegalArgumentException("The destination node should be either a paragraph or table.");

    // Document to insert srcDoc into.
    Document dstDoc = (Document)insertAfterNode.getDocument();

    // To retain section formatting, split the current section into two at the marker node
    // and then import the content from srcDoc as whole sections.
    // This is the section that the insert marker node belongs to.
    Section currentSection = (Section)insertAfterNode.getAncestor(NodeType.SECTION);

    // Don't clone the content inside the section; we only want the section properties retained.
    Section cloneSection = (Section)currentSection.deepClone(false);

    // Make sure the cloned section has a body, but no empty first paragraph.
    cloneSection.ensureMinimum();
    cloneSection.getBody().getFirstParagraph().remove();

    // Insert the cloned section into the document after the original section.
    dstDoc.insertAfter(cloneSection, currentSection);

    // Append all nodes after the marker node to the new section. This splits the content
    // at the section level at the marker, so the sections from the other document can be inserted directly.
    Node currentNode = insertAfterNode.getNextSibling();
    while (currentNode != null)
    {
        Node nextNode = currentNode.getNextSibling();
        cloneSection.getBody().appendChild(currentNode);
        currentNode = nextNode;
    }

    // This object will be translating styles and lists during the import.
    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.USE_DESTINATION_STYLES);

    // Loop through all sections in the source document.
    for (Section srcSection : srcDoc.getSections())
    {
        Node newNode = importer.importNode(srcSection, true);
        // Append each section to the destination document, starting right after the split section.
        dstDoc.insertAfter(newNode, currentSection);
        currentSection = (Section)newNode;
    }
}
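The section-preserving variant works in three steps:

1. Split the section that holds the marker.
2. Move the paragraphs after the marker into the clone.
3. Insert the source sections between the two halves.

The following sketch models a document as a list of sections, each a list of paragraph texts. SectionSplitSketch and its parameters are hypothetical. Section properties, which the real method keeps by cloning the section, are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class SectionSplitSketch {
    /**
     * Moves the paragraphs that follow dst[sectionIndex][paraIndex] into a new section placed
     * right after it, then inserts copies of srcSections between the two halves, in order.
     */
    static void insertSections(List<List<String>> dst, int sectionIndex, int paraIndex,
                               List<List<String>> srcSections) {
        List<String> current = dst.get(sectionIndex);
        // Everything after the marker becomes the second half of the split section.
        List<String> tail = new ArrayList<>(current.subList(paraIndex + 1, current.size()));
        current.subList(paraIndex + 1, current.size()).clear();
        dst.add(sectionIndex + 1, tail);
        // Insert the source sections between the two halves, preserving their order.
        int insertAt = sectionIndex + 1;
        for (List<String> section : srcSections) {
            dst.add(insertAt++, new ArrayList<>(section));
        }
    }
}
```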

Insert a Document at a Bookmark
Use the InsertDocument method shown above to insert documents at bookmarked places in the
main template.
To do this, create a bookmarked paragraph where you want the document to be inserted. This
bookmark should not enclose multiple paragraphs or text that you want to appear in the resulting
document after generation. Just insert an empty paragraph and bookmark it. You can even put a
small description of the inserted content inside this paragraph.
Example
Invokes the InsertDocument method shown above to insert a document at a bookmark.
Java
Document mainDoc = new Document(getMyDir() + "InsertDocument1.doc");
Document subDoc = new Document(getMyDir() + "InsertDocument2.doc");

Bookmark bookmark = mainDoc.getRange().getBookmarks().get("insertionPlace");
insertDocument(bookmark.getBookmarkStart().getParentNode(), subDoc);

mainDoc.save(getMyDir() + "InsertDocumentAtBookmark Out.doc");

Insert a Document During Mail Merge
This example relies on the InsertDocument method shown at the beginning of the article to insert a
document into a merge field during mail merge execution.
Example
Demonstrates how to use the InsertDocument method to insert a document into a merge field during mail merge.
Java
public void insertDocumentAtMailMerge() throws Exception
{
    // Open the main document.
    Document mainDoc = new Document(getMyDir() + "InsertDocument1.doc");

    // Add a handler for the MergeField event.
    mainDoc.getMailMerge().setFieldMergingCallback(new InsertDocumentAtMailMergeHandler());

    // The main document has a merge field in it called "Document_1".
    // The corresponding data for this field contains a fully qualified path to the document
    // that should be inserted into this field.
    mainDoc.getMailMerge().execute(
            new String[] { "Document_1" },
            new String[] { getMyDir() + "InsertDocument2.doc" });

    mainDoc.save(getMyDir() + "InsertDocumentAtMailMerge Out.doc");
}

private class InsertDocumentAtMailMergeHandler implements IFieldMergingCallback
{
    /**
     * This handler makes special processing for the "Document_1" field.
     * The field value contains the path to load the document.
     * We load the document and insert it into the current merge field.
     */
    public void fieldMerging(FieldMergingArgs e) throws Exception
    {
        if ("Document_1".equals(e.getDocumentFieldName()))
        {
            // Use a document builder to navigate to the merge field with the specified name.
            DocumentBuilder builder = new DocumentBuilder(e.getDocument());
            builder.moveToMergeField(e.getDocumentFieldName());

            // The name of the document to load and insert is stored in the field value.
            Document subDoc = new Document((String)e.getFieldValue());

            // Insert the document.
            insertDocument(builder.getCurrentParagraph(), subDoc);

            // The paragraph that contained the merge field might be empty now, and you probably want to delete it.
            if (!builder.getCurrentParagraph().hasChildNodes())
                builder.getCurrentParagraph().remove();

            // Indicate to the mail merge engine that we have inserted what we wanted.
            e.setText(null);
        }
    }

    public void imageFieldMerging(ImageFieldMergingArgs args) throws Exception
    {
        // Do nothing.
    }
}


If the document to be inserted is stored as binary data in a database field (BLOB field), use the following
example.
Example
A slight variation of the above example that loads a document from a BLOB database field instead of a file.
Java
private class InsertDocumentAtMailMergeBlobHandler implements IFieldMergingCallback
{
    /**
     * This handler makes special processing for the "Document_1" field.
     * The field value contains the document itself as a byte array read from a BLOB field.
     * We load the document and insert it into the current merge field.
     */
    public void fieldMerging(FieldMergingArgs e) throws Exception
    {
        if ("Document_1".equals(e.getDocumentFieldName()))
        {
            // Use a document builder to navigate to the merge field with the specified name.
            DocumentBuilder builder = new DocumentBuilder(e.getDocument());
            builder.moveToMergeField(e.getDocumentFieldName());

            // Load the document from the BLOB field (requires import java.io.ByteArrayInputStream).
            ByteArrayInputStream inStream = new ByteArrayInputStream((byte[])e.getFieldValue());
            Document subDoc = new Document(inStream);
            inStream.close();

            // Insert the document.
            insertDocument(builder.getCurrentParagraph(), subDoc);

            // The paragraph that contained the merge field might be empty now, and you probably want to delete it.
            if (!builder.getCurrentParagraph().hasChildNodes())
                builder.getCurrentParagraph().remove();

            // Indicate to the mail merge engine that we have inserted what we wanted.
            e.setText(null);
        }
    }

    public void imageFieldMerging(ImageFieldMergingArgs args) throws Exception
    {
        // Do nothing.
    }
}

Insert a Document During Replace
Sometimes, there is a requirement to insert documents at places marked with some text. For
example, the template can contain paragraphs with the text [INTRODUCTION], [CONCLUSION]
and so forth. In the resulting document, these paragraphs should be replaced with the content
taken from external documents. This can be achieved with the following code, which also uses the
InsertDocument method.
Example
Shows how to insert content of one document into another during a customized find and replace operation.
Java
public void insertDocumentAtReplace() throws Exception
{
    Document mainDoc = new Document(getMyDir() + "InsertDocument1.doc");
    mainDoc.getRange().replace(Pattern.compile("\\[MY_DOCUMENT\\]"), new InsertDocumentAtReplaceHandler(), false);
    mainDoc.save(getMyDir() + "InsertDocumentAtReplace Out.doc");
}

private class InsertDocumentAtReplaceHandler implements IReplacingCallback
{
    public int replacing(ReplacingArgs e) throws Exception
    {
        Document subDoc = new Document(getMyDir() + "InsertDocument2.doc");

        // Insert a document after the paragraph containing the match text.
        Paragraph para = (Paragraph)e.getMatchNode().getParentNode();
        insertDocument(para, subDoc);

        // Remove the paragraph with the match text.
        para.remove();

        return ReplaceAction.SKIP;
    }
}

How to Remove Footers but Leave Headers Intact
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

Each section in a document can have up to three headers and up to three footers (for first, even and odd
pages). To delete all footers in a document, loop through all sections and remove every footer node.
Example
Deletes all footers from all sections, but leaves headers intact.
Java
Document doc = new Document(getMyDir() + "HeaderFooter.RemoveFooters.doc");

for (Section section : doc.getSections())
{
    // Up to three different footers are possible in a section (for first, even and odd pages).
    // We check and delete all of them.
    HeaderFooter footer;

    footer = section.getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_FIRST);
    if (footer != null)
        footer.remove();

    // Primary footer is the footer used for odd pages.
    footer = section.getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY);
    if (footer != null)
        footer.remove();

    footer = section.getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_EVEN);
    if (footer != null)
        footer.remove();
}

doc.save(getMyDir() + "HeaderFooter.RemoveFooters Out.doc");

How to Remove Page and Section Breaks
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the RemoveBreaks sample here.
A document often consists of several sections, for example sections separated by section breaks to provide
different page settings for different parts of the document. Likewise, a document can have explicit page breaks to
separate content on different pages.

In most cases it is convenient to have a structured document, but sometimes multiple sections and user-defined
page breaks are redundant and it may become necessary to remove them. For example, after
appending multiple documents together you may want to remove the separate sections and combine them
into one. You may also want to remove redundant page breaks after a mail merge.
An explicit page break can be caused by several different things in a document:
A page break character, represented in a document by ControlChar.PageBreakChar.
A Section which is set to begin on a new page (by setting the section's PageSetup.SectionStart to
SectionStart.NewPage).
A Paragraph with ParagraphFormat.PageBreakBefore set. This forces a page break before the paragraph.
This sample shows how to remove page and section breaks from a document using Aspose.Words.
Solution
To remove page and section breaks from a document you should follow the steps below:
1. Load a document into the Document class by passing a file path or stream to the appropriate Document
constructor.
2. To remove page breaks:
1. Retrieve the collection of Paragraph nodes of the document.
2. Check whether each Paragraph has the ParagraphFormat.PageBreakBefore property set, and if it does,
set it to false.
3. Scan each run of the paragraph for the ControlChar.PageBreakChar character and remove this
character.
3. Removing section breaks is more involved. To remove section breaks, you should
combine all sections in the document into one section:
1. Iterate over all sections and move their content into the last section.
2. Remove all sections except for the last section in the document.
The following Word document is used in this sample:




It contains one page break and one section break. The section break separates the document into two
different sections. The first section contains content in one column, while the second is formatted in a
two-column layout.
The Code
Removing Page Breaks
First, let's look at the code that removes page breaks. Generally, a single Run contains only a page
break character by itself. However, a run can contain text and a page break character, and in
some cases even multiple page break characters. Therefore, the code is made robust and removes every
instance of the page break character it finds.
Example
Removes all page breaks from the document.
Java
private static void removePageBreaks(Document doc) throws Exception
{
    // Retrieve all paragraphs in the document.
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);

    // Iterate through all paragraphs.
    for (Paragraph para : (Iterable<Paragraph>) paragraphs)
    {
        // If the paragraph has a page break before set, then clear it.
        if (para.getParagraphFormat().getPageBreakBefore())
            para.getParagraphFormat().setPageBreakBefore(false);

        // Check all runs in the paragraph for page breaks and remove them.
        for (Run run : (Iterable<Run>) para.getRuns())
        {
            if (run.getText().contains(ControlChar.PAGE_BREAK))
                run.setText(run.getText().replace(ControlChar.PAGE_BREAK, ""));
        }
    }
}

First, all paragraphs in the document are gathered using the Document.GetChildNodes method. (The
second parameter of the Document.GetChildNodes method is set to true, which instructs the method to
select all child nodes recursively; otherwise only immediate children are selected.)

During the enumeration if a paragraph has the page break before setting enabled then the setting is
removed. Each run of the paragraph is then checked for the presence of a ControlChar.PageBreakChar
character. If a run contains one or more of these characters they are removed by replacing them with an
empty string.
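The run-level cleanup can be sketched with plain strings. The snippet below assumes that ControlChar.PAGE_BREAK is the form-feed character (char 12). It removes every occurrence from each run text, including runs that hold text and several breaks, and reports how many were removed. PageBreakSketch is a hypothetical name.

```java
import java.util.List;

final class PageBreakSketch {
    // Assumption: the same value as ControlChar.PAGE_BREAK (form feed, char 12).
    static final String PAGE_BREAK = "\f";

    /** Removes every page break character from each run text, in place. Returns how many were removed. */
    static int removePageBreaks(List<String> runTexts) {
        int removed = 0;
        for (int i = 0; i < runTexts.size(); i++) {
            String text = runTexts.get(i);
            String cleaned = text.replace(PAGE_BREAK, "");
            removed += text.length() - cleaned.length();
            runTexts.set(i, cleaned);
        }
        return removed;
    }
}
```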
Note that in a Word document the same character is used to represent both a page break and a section break:
ControlChar.PageBreakChar and ControlChar.SectionBreakChar are identical. However, in a document loaded with
Aspose.Words you will only ever encounter this character as a page break, because section breaks are represented
by separate Section nodes. This is explained further in the section below.
Removing Section Breaks
Removing section breaks from a document is more complicated than removing page breaks. In the Aspose.Words
document object model, sections are represented as separate instances of the Section class. The content
of each section is stored in child nodes of the Section object, such as its Body, which in turn contains
Paragraph nodes. To remove section breaks, all content of the sections should be combined into one and the other
sections removed. This achieves the same result as deleting each section break in Microsoft Word.
Depending on how you want to modify your document, you may find that simply making each section
continue directly after the previous one is a better option than combining them all. This
retains the different section formatting. You can achieve this by iterating through all sections in the
document and setting the SectionStart property of each section's PageSetup to SectionStart.Continuous.
In Microsoft Word, when you delete a break between two sections, the newly combined section inherits all
properties from the second section. Thus, if all sections are combined in the same way, the resulting
formatting should be inherited from the last section in the document. To match this behavior
programmatically, the code transfers all content from the preceding sections into the last section of
the document:
Example
Combines all sections in the document into one.
Java
private static void removeSectionBreaks(Document doc) throws Exception
{
    // Loop through all sections starting from the section that precedes the last one
    // and moving to the first section.
    for (int i = doc.getSections().getCount() - 2; i >= 0; i--)
    {
        // Copy the content of the current section to the beginning of the last section.
        doc.getLastSection().prependContent(doc.getSections().get(i));
        // Remove the copied section.
        doc.getSections().get(i).remove();
    }
}

Starting from the section before the last one and moving backwards, the content of each section is copied to the
beginning of the last section using the Section.PrependContent method. Then the Section.Remove method is used to
remove the section that was copied.
End Result
The resulting document is shown below. All page breaks are removed and all sections combined. This
results in the text appearing together instead of split across different pages.
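Walking backwards and prepending keeps the original order of the content. The sketch below models sections as lists of paragraph texts to show this. CombineSectionsSketch is hypothetical. It does not model section properties, which in the real code come from the last section because that is the one that survives.

```java
import java.util.ArrayList;
import java.util.List;

final class CombineSectionsSketch {
    /** Returns the content of all sections merged into the last one, in document order. */
    static List<String> combine(List<List<String>> sections) {
        if (sections.isEmpty()) {
            return new ArrayList<>();
        }
        List<String> last = new ArrayList<>(sections.get(sections.size() - 1));
        // Walk backwards from the second-to-last section, prepending each one's content.
        for (int i = sections.size() - 2; i >= 0; i--) {
            last.addAll(0, sections.get(i));
        }
        return last;
    }
}
```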



How to Rename Merge Fields
Added by hammad, last edited by Adam Skelton on Oct 16, 2012

This example shows how to create your own MergeField class, which represents a single merge field in a
Microsoft Word document and allows you to get or set its name.
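The MergeField class below works on the nodes of the field. Its core text operation can be sketched in plain Java: finding the field name in a field code such as MERGEFIELD FullName \* MERGEFORMAT and swapping it for another. The class name MergeFieldCodeSketch, its regular expression and its handling of quoted names (for names containing spaces) are assumptions. They are not the class shown below or an Aspose.Words API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MergeFieldCodeSketch {
    // Group 1: "MERGEFIELD" plus surrounding spaces, group 2: the name (optionally quoted), group 3: switches.
    private static final Pattern CODE = Pattern.compile(
            "^(\\s*MERGEFIELD\\s+)(\"[^\"]*\"|\\S+)(.*)$", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the merge field name from a field code, without surrounding quotes. */
    static String getName(String fieldCode) {
        String name = match(fieldCode).group(2);
        boolean quoted = name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"");
        return quoted ? name.substring(1, name.length() - 1) : name;
    }

    /** Returns the field code with the name replaced; spacing and switches are kept as they were. */
    static String setName(String fieldCode, String newName) {
        Matcher m = match(fieldCode);
        // Quote the new name if it contains whitespace so that it stays a single token.
        String token = newName.matches(".*\\s.*") ? "\"" + newName + "\"" : newName;
        return m.group(1) + token + m.group(3);
    }

    private static Matcher match(String fieldCode) {
        Matcher m = CODE.matcher(fieldCode);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a MERGEFIELD field code: " + fieldCode);
        }
        return m;
    }
}
```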
Example
Shows how to rename merge fields in a Word document.
Java
package Examples;

import org.testng.annotations.Test;
import com.aspose.words.Document;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.FieldStart;
import com.aspose.words.FieldType;
import com.aspose.words.Run;
import com.aspose.words.Node;

import java.util.regex.Matcher;
import java.util.regex.Pattern;


/**
* Shows how to rename merge fields in a Word document.
*/
public class ExRenameMergeFields extends ExBase
{
/**
* Finds all merge fields in a Word document and changes their names.
*/
public void renameMergeFields() throws Exception
{
// Specify your document name here.
Document doc = new Document(getMyDir() + "RenameMergeFields.doc");

// Select all field start nodes so we can find the merge fields.
NodeCollection fieldStarts = doc.getChildNodes(NodeType.FIELD_START, true);
for (FieldStart fieldStart : (Iterable<FieldStart>) fieldStarts)
{
if (fieldStart.getFieldType() == FieldType.FIELD_MERGE_FIELD)
{
MergeField mergeField = new MergeField(fieldStart);
mergeField.setName(mergeField.getName() + "_Renamed");
}
}

doc.save(getMyDir() + "RenameMergeFields Out.doc");
}
}

/**
* Represents a facade object for a merge field in a Microsoft Word document.
*/
class MergeField
{
MergeField(FieldStart fieldStart) throws Exception
{
// Calling equals() on a null reference would throw, so compare with == instead.
if (fieldStart == null)
throw new IllegalArgumentException("fieldStart");
if (fieldStart.getFieldType() != FieldType.FIELD_MERGE_FIELD)
throw new IllegalArgumentException("Field start type must be FieldMergeField.");

mFieldStart = fieldStart;

// Find the field separator node.
mFieldSeparator = findNextSibling(mFieldStart, NodeType.FIELD_SEPARATOR);
if (mFieldSeparator == null)
throw new IllegalStateException("Cannot find field separator.");

// Find the field end node. Normally the field end is always found, but if the field
// contains a paragraph break, the field end is in the next paragraph and is not a
// sibling of the separator. Handling fields that span several paragraphs correctly is
// much more complicated, so here the field end is simply allowed to be null.
mFieldEnd = findNextSibling(mFieldSeparator, NodeType.FIELD_END);
}

/**
* Gets or sets the name of the merge field.
*/
String getName() throws Exception
{
// The field result displays the name between chevrons, for example «FieldName».
String fieldResult = getTextSameParent(mFieldSeparator.getNextSibling(), mFieldEnd);
int startPos = fieldResult.indexOf('\u00AB');
startPos = (startPos >= 0) ? startPos + 1 : 0;

int endPos = fieldResult.indexOf('\u00BB');
endPos = (endPos >= 0) ? endPos : fieldResult.length();

return fieldResult.substring(startPos, endPos);
}
void setName(String value) throws Exception
{
// The merge field name is displayed in the field result, which is a Run
// node between the field separator and the field end.
Run fieldResult = (Run)mFieldSeparator.getNextSibling();
fieldResult.setText("\u00AB" + value + "\u00BB");

// The field result can consist of more than one run; delete the extra runs.
removeSameParent(fieldResult.getNextSibling(), mFieldEnd);

updateFieldCode(value);
}

private void updateFieldCode(String fieldName) throws Exception
{
// The field code is stored in a Run node between the field start and the field separator.
Run fieldCode = (Run)mFieldStart.getNextSibling();
Matcher matcher = G_REGEX.matcher(fieldCode.getText());
if (!matcher.find())
throw new IllegalStateException("Unexpected merge field code: " + fieldCode.getText());

// Keep the optional "MERGEFIELD " keyword (group 1) and substitute the new name.
// Any switches that followed the old name in this run are not preserved.
String newFieldCode = " " + matcher.group(1) + fieldName + " ";
fieldCode.setText(newFieldCode);

// The field code can consist of more than one run; delete the extra runs.
removeSameParent(fieldCode.getNextSibling(), mFieldSeparator);
}

/**
* Goes through siblings starting from the start node until it finds a node of the
specified type or null.
*/
private static Node findNextSibling(Node startNode, int nodeType) throws Exception
{
for (Node node = startNode; node != null; node = node.getNextSibling())
{
if (node.getNodeType() == nodeType)
return node;
}
return null;
}

/**
* Retrieves text from start up to but not including the end node.
*/
private static String getTextSameParent(Node startNode, Node endNode) throws Exception
{
if ((endNode != null) && (startNode.getParentNode() != endNode.getParentNode()))
throw new IllegalArgumentException("Start and end nodes are expected to have the same parent.");

StringBuilder builder = new StringBuilder();
// Stop at the end node, or after the last sibling if the end node is null.
for (Node child = startNode; (child != null) && (child != endNode); child = child.getNextSibling())
builder.append(child.getText());

return builder.toString();
}

/**
* Removes nodes from start up to but not including the end node.
* Start and end are assumed to have the same parent.
*/
private static void removeSameParent(Node startNode, Node endNode) throws Exception
{
if ((endNode != null) && (startNode.getParentNode() != endNode.getParentNode()))
throw new IllegalArgumentException("Start and end nodes are expected to have the same parent.");

Node curChild = startNode;
while ((curChild != null) && (curChild != endNode))
{
Node nextChild = curChild.getNextSibling();
curChild.remove();
curChild = nextChild;
}
}

private final Node mFieldStart;
private final Node mFieldSeparator;
private final Node mFieldEnd;

private static final Pattern G_REGEX =
Pattern.compile("\\s*(MERGEFIELD\\s|)(\\s|)(\\S+)\\s+");
}
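The renaming logic above comes down to two string operations: reading the name shown between the chevrons of the field result, and rewriting the field code with G_REGEX. The following plain-Java sketch runs the same operations on ordinary strings so you can experiment without a document. The class and method names are hypothetical, and the sample field code " MERGEFIELD FirstName " is an assumed input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MergeFieldNameSketch {
    // Same pattern as G_REGEX in the MergeField class above.
    private static final Pattern G_REGEX =
        Pattern.compile("\\s*(MERGEFIELD\\s|)(\\s|)(\\S+)\\s+");

    /** Extracts the name between the opening and closing chevrons (U+00AB and U+00BB). */
    static String nameFromResult(String fieldResult) {
        int startPos = fieldResult.indexOf('\u00AB');
        startPos = (startPos >= 0) ? startPos + 1 : 0;
        int endPos = fieldResult.indexOf('\u00BB');
        endPos = (endPos >= 0) ? endPos : fieldResult.length();
        return fieldResult.substring(startPos, endPos);
    }

    /** Rebuilds a field code with a new name, in the same way as updateFieldCode. */
    static String renameFieldCode(String fieldCode, String newName) {
        Matcher matcher = G_REGEX.matcher(fieldCode);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Not a merge field code: " + fieldCode);
        }
        // Group 1 is "MERGEFIELD " or empty; anything after the old name is dropped.
        return " " + matcher.group(1) + newName + " ";
    }

    public static void main(String[] args) {
        System.out.println(nameFromResult("\u00ABFirstName\u00BB"));
        System.out.println(renameFieldCode(" MERGEFIELD FirstName ", "FirstName_Renamed"));
    }
}
```

As in the MergeField class, switches such as \* MERGEFORMAT that follow the name are not carried over; preserving them would require appending the text after matcher.end().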

How to Replace Fields with Static Text
Added by hammad, last edited by Awais Hafeez on May 11, 2014

The complete source code of the ReplaceFieldsWithStaticText sample is available for download.
This technique converts dynamic fields, whose displayed text changes when they are updated, into plain
text that stays the same even when the fields in the document are updated.

This is often required when you want to save your document as a static copy, for example when
sending it as an e-mail attachment. Converting fields such as DATE or TIME to static text makes them
keep displaying the date or time at which the document was sent. In some situations you may also need to
remove conditional IF fields from your document and replace them with their most recent text result, so
that the result no longer changes when the fields in the document are updated.
The Solution
Converting a field to static text involves extracting the field result (the most recently updated text
stored in the field) and keeping this value while removing the field nodes around it. What was a dynamic
field then becomes static text.

For example, the diagram below shows how an IF field is stored in a document. The text is enclosed by
the special field nodes FieldStart and FieldEnd. The FieldSeparator node divides the text inside the
field into the field code and the field result. The field code defines the general behavior of the field,
while the field result stores the most recent result produced when the field was updated by either
Microsoft Word or Aspose.Words. The field result is the text that is displayed when the document is
viewed.


The structure can also be seen below in hierarchical form using the DocumentExplorer demo project,
which ships with the Aspose.Words installer.


As described above, to convert a field to static text, all nodes from the FieldStart to the
FieldSeparator inclusive, as well as the FieldEnd node, must be removed.
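The removal rule can be illustrated without Aspose.Words. The sketch below models a field as a hypothetical flat sequence of tokens standing in for the FieldStart, Run, FieldSeparator, and FieldEnd nodes, and keeps only the text that lies in a field result. Unlike FieldsHelper, it converts every field regardless of type, and it assumes the token sequence is well formed; a stack tracks nested fields, in a similar spirit to the mFieldDepth counter in the visitor.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class FieldFlattener {
    // Hypothetical stand-ins for the FieldStart, FieldSeparator, FieldEnd and Run nodes.
    enum Kind { START, SEPARATOR, END, RUN }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        static Token start() { return new Token(Kind.START, ""); }
        static Token sep() { return new Token(Kind.SEPARATOR, ""); }
        static Token end() { return new Token(Kind.END, ""); }
        static Token run(String text) { return new Token(Kind.RUN, text); }
    }

    /** Keeps only text that is visible as a field result; drops field codes and markers. */
    static String toStaticText(List<Token> tokens) {
        // One entry per open field: true while still inside that field's code.
        Deque<Boolean> inCode = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            switch (t.kind) {
                case START:
                    inCode.push(true);
                    break;
                case SEPARATOR:
                    inCode.pop();
                    inCode.push(false); // from here on we are in the field result
                    break;
                case END:
                    inCode.pop();
                    break;
                case RUN:
                    // Text survives only if no enclosing field is still in its code part.
                    if (!inCode.contains(true)) {
                        out.append(t.text);
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Token> doc = Arrays.asList(
            Token.run("Status: "),
            Token.start(), Token.run("IF 1 = 1 \"yes\" \"no\""), Token.sep(),
            Token.run("yes"), Token.end());
        System.out.println(toStaticText(doc)); // Status: yes
    }
}
```

A field nested inside another field's code (for example a MERGEFIELD used in an IF condition) disappears entirely, while a field nested inside a result contributes its own result text.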
Please note that this technique does not work properly on some fields in headers and footers. For
example, converting a PAGE field in a header or footer to static text causes the same value to appear on
every page. Headers and footers are repeated across multiple pages, and while they remain fields they are
handled specially so that they display the correct result on each page. After conversion, however, the
field in the header becomes a static run of text. This run is evaluated as if it were on the last page of
the section, so a converted PAGE field in the header displays the last page number on all pages.
The Code
The implementation that converts fields to static text is shown below. The
ConvertFieldsToStaticText method can be called at any time within your application. After this method is
invoked, all fields of the specified field type that are contained within the given composite node are
transformed into static text.
Example
This class provides a static method to convert fields of a particular type to static text.
Java
private static class FieldsHelper extends DocumentVisitor
{
/**
* Converts any fields of the specified type found in the descendants of the node
into static text.
*
* @param compositeNode The node in which all descendants of the specified
FieldType will be converted to static text.
* @param targetFieldType The FieldType of the field to convert to static text.
*/
public static void convertFieldsToStaticText(CompositeNode compositeNode, int
targetFieldType) throws Exception
{
FieldsHelper helper = new FieldsHelper(targetFieldType);
compositeNode.accept(helper);

}

private FieldsHelper(int targetFieldType)
{
mTargetFieldType = targetFieldType;
}

public int visitFieldStart(FieldStart fieldStart)
{
// Keep track of the starts and ends of fields in case of nested fields.
if (fieldStart.getFieldType() == mTargetFieldType)
{
mFieldDepth++;
fieldStart.remove();
}
else
{
// This removes the field start if it is inside a field that is being converted.
CheckDepthAndRemoveNode(fieldStart);
}

return VisitorAction.CONTINUE;
}

public int visitFieldSeparator(FieldSeparator fieldSeparator)
{
// When visiting a field separator we should decrease the depth level.
if (fieldSeparator.getFieldType() == mTargetFieldType)
{
mFieldDepth--;
fieldSeparator.remove();
}
else
{
// This removes the field separator if it is inside a field that is being converted.
CheckDepthAndRemoveNode(fieldSeparator);
}

return VisitorAction.CONTINUE;
}

public int visitFieldEnd(FieldEnd fieldEnd)
{
if (fieldEnd.getFieldType() == mTargetFieldType)
fieldEnd.remove();
else
CheckDepthAndRemoveNode(fieldEnd); // Removes the field end if it is inside a field that is being converted.

return VisitorAction.CONTINUE;
}

public int visitRun(Run run)
{
// Remove the run if it is between the FieldStart and FieldSeparator of the field being converted.
CheckDepthAndRemoveNode(run);

return VisitorAction.CONTINUE;
}

public int visitParagraphEnd(Paragraph paragraph)
{
if (mFieldDepth > 0)
{
// The field code that is being converted continues onto another paragraph.
// We need to move the remaining content from this paragraph onto the next paragraph.
Node nextParagraph = paragraph.getNextSibling();

// Skip ahead to the next available paragraph.
while (nextParagraph != null && nextParagraph.getNodeType() !=
NodeType.PARAGRAPH)
nextParagraph = nextParagraph.getNextSibling();

// Move all of the nodes over, and keep a list of them so we know not to remove them.
while (paragraph.hasChildNodes())
{
mNodesToSkip.add(paragraph.getLastChild());
((Paragraph)nextParagraph).prependChild(paragraph.getLastChild());
}

paragraph.remove();
}

return VisitorAction.CONTINUE;
}

public int visitTableStart(Table table)
{
CheckDepthAndRemoveNode(table);

return VisitorAction.CONTINUE;
}

/**
* Checks whether the node is inside a field or should be skipped and then removes
it if necessary.
*/
private void CheckDepthAndRemoveNode(Node node)
{
if (mFieldDepth > 0 && !mNodesToSkip.contains(node))
node.remove();
}

private int mFieldDepth = 0;
private ArrayList<Node> mNodesToSkip = new ArrayList<Node>(); // requires import java.util.ArrayList
private int mTargetFieldType;
}

The method accepts two parameters: a CompositeNode and a FieldType value. Because any composite node
can be passed to this method, you can convert fields to static text in specific parts of your document
only.

For example you can pass a Document object and convert the fields of the specified type from the entire
document to static text, or you could pass the Body object of a section and convert only fields found
within that body.
When passing a block-level node such as a Paragraph, be aware that in some cases fields can span
multiple paragraphs. For instance, the field code of a field can contain multiple paragraphs, which causes
the FieldEnd to appear in a different paragraph from the corresponding FieldStart. In this case, a portion
of the field code may remain after the process has finished. If this happens, pass the parent of the
composite node instead.
The FieldType value passed to the method specifies which type of field should be converted to static
text. Fields of any other type encountered in the document are left unchanged, unless they are nested
inside the field code of a field that is being converted.
Example
Shows how to convert all fields of a specified type in a document to static text.
Java
Document doc = new Document(dataDir + "TestFile.doc");

// Pass the appropriate parameters to convert all IF fields encountered in the document
// (including headers and footers) to static text.
FieldsHelper.convertFieldsToStaticText(doc, FieldType.FIELD_IF);

// Save the document with fields transformed to disk.
doc.save(dataDir + "TestFileDocument Out.doc");

Example
Shows how to convert all fields of a specified type in a body of a document to static text.
Java
Document doc = new Document(dataDir + "TestFile.doc");

// Pass the appropriate parameters to convert PAGE fields encountered to static text
// only in the body of the first section.
FieldsHelper.convertFieldsToStaticText(doc.getFirstSection().getBody(),
FieldType.FIELD_PAGE);

// Save the document with fields transformed to disk.
doc.save(dataDir + "TestFileBody Out.doc");

Example
Shows how to convert all fields of a specified type in a paragraph to static text.
Java
Document doc = new Document(dataDir + "TestFile.doc");

// Pass the appropriate parameters to convert all IF fields to static text that are
// encountered only in the last paragraph of the body of the first section.
FieldsHelper.convertFieldsToStaticText(doc.getFirstSection().getBody().getLastParagraph(), FieldType.FIELD_IF);

// Save the document with fields transformed to disk.
doc.save(dataDir + "TestFileParagraph Out.doc");

How to Replace or Modify Hyperlinks
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

This example shows how to find and modify all hyperlinks in a Word document.

To find and modify hyperlinks, it would be convenient to have a Hyperlink object with properties, but the
current version of Aspose.Words has no built-in functionality for dealing with hyperlink fields.
Hyperlinks in Microsoft Word documents are fields. A field consists of a field code and a field result.
In the current version of Aspose.Words, there is no single object that represents a field. Instead,
Aspose.Words represents a field as a set of nodes: a FieldStart, one or more Run nodes of the field code,
a FieldSeparator, one or more Run nodes of the field result, and a FieldEnd.
While Aspose.Words does not have a high-level abstraction to represent fields and hyperlink fields in
particular, all of the necessary low-level document elements and their properties are exposed and with
a bit of coding you can implement quite sophisticated document manipulation features.
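The facade classes in this article rely on two small helpers, findNextSibling and getTextSameParent, that walk from one sibling node to the next. The sketch below mirrors that walk on a hypothetical array of (kind, text) pairs instead of real Aspose.Words nodes, using an array index in place of getNextSibling. The node kinds and the sample field code split across two runs are assumptions for illustration.

```java
public class SiblingWalkSketch {
    enum NodeKind { FIELD_START, RUN, FIELD_SEPARATOR, FIELD_END }

    static final class FakeNode {
        final NodeKind kind;
        final String text;
        FakeNode(NodeKind kind, String text) { this.kind = kind; this.text = text; }
    }

    /** Returns the index of the first node of the given kind at or after start, or -1. */
    static int findNextSibling(FakeNode[] siblings, int start, NodeKind kind) {
        for (int i = start; i < siblings.length; i++) {
            if (siblings[i].kind == kind) {
                return i;
            }
        }
        return -1;
    }

    /** Concatenates text from start up to, but not including, end (-1 means to the last sibling). */
    static String getTextBetween(FakeNode[] siblings, int start, int end) {
        int stop = (end < 0) ? siblings.length : end;
        StringBuilder builder = new StringBuilder();
        for (int i = start; i < stop; i++) {
            builder.append(siblings[i].text);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        FakeNode[] paragraph = {
            new FakeNode(NodeKind.FIELD_START, ""),
            new FakeNode(NodeKind.RUN, "HYPERLINK "),
            new FakeNode(NodeKind.RUN, "\"http://www.aspose.com\""),
            new FakeNode(NodeKind.FIELD_SEPARATOR, ""),
            new FakeNode(NodeKind.RUN, "Aspose"),
            new FakeNode(NodeKind.FIELD_END, "")
        };
        int sep = findNextSibling(paragraph, 0, NodeKind.FIELD_SEPARATOR);
        int end = findNextSibling(paragraph, sep, NodeKind.FIELD_END);
        System.out.println(getTextBetween(paragraph, 1, sep));       // field code, joined from two runs
        System.out.println(getTextBetween(paragraph, sep + 1, end)); // field result
    }
}
```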
This example shows how to create a simple class that represents a hyperlink in the document. Its
constructor accepts a FieldStart object whose field type must be FieldType.FIELD_HYPERLINK. Using the
Hyperlink class, you can get or set its target, name, and local flag. This makes it easy to change the
targets and names of hyperlinks throughout the document. In the example, all hyperlinks that are not
local are changed to point to http://www.aspose.com.
Example
Finds all hyperlinks in a Word document and changes their URL and display name.
Java
package Examples;

import org.testng.annotations.Test;
import com.aspose.words.Document;
import com.aspose.words.NodeList;
import com.aspose.words.FieldStart;
import com.aspose.words.FieldType;
import com.aspose.words.NodeType;
import com.aspose.words.Run;
import com.aspose.words.Node;

import java.util.regex.Matcher;
import java.util.regex.Pattern;


/**
* Shows how to replace hyperlinks in a Word document.
*/
public class ExReplaceHyperlinks extends ExBase
{
/**
* Finds all hyperlinks in a Word document and changes their URL and display name.
*/
public void replaceHyperlinks() throws Exception
{
// Specify your document name here.
Document doc = new Document(getMyDir() + "ReplaceHyperlinks.doc");

// Hyperlinks in a Word document are fields; select all field start nodes so
// we can find the hyperlinks.
NodeList fieldStarts = doc.selectNodes("//FieldStart");
for (FieldStart fieldStart : (Iterable<FieldStart>) fieldStarts)
{
if (fieldStart.getFieldType() == FieldType.FIELD_HYPERLINK)
{
// The field is a hyperlink field; use the "facade" class to help
// deal with the field.
Hyperlink hyperlink = new Hyperlink(fieldStart);

// Some hyperlinks can be local (links to bookmarks inside the
// document); ignore these.
if (hyperlink.isLocal())
continue;

// The Hyperlink class makes it easy to set the target URL and the
// display name of the link through its properties.
hyperlink.setTarget(NEW_URL);
hyperlink.setName(NEW_NAME);
}
}

doc.save(getMyDir() + "ReplaceHyperlinks Out.doc");
}

private static final String NEW_URL = "http://www.aspose.com";
private static final String NEW_NAME = "Aspose - The .NET & Java Component Publisher";
}


/**
* This "facade" class makes it easier to work with a hyperlink field in a Word document.
*
* A hyperlink is represented by a HYPERLINK field in a Word document. A field in Aspose.Words
* consists of several nodes and it might be difficult to work with all those nodes directly.
* Note this is a simple implementation and will work only if the hyperlink code and name
* each consist of one Run only.
*
* [FieldStart][Run - field code][FieldSeparator][Run - field result][FieldEnd]
*
* The field code contains a string in one of these formats:
* HYPERLINK "url"
* HYPERLINK \l "bookmark name"
*
* The field result contains text that is displayed to the user.
*/
class Hyperlink
{
Hyperlink(FieldStart fieldStart) throws Exception
{
if (fieldStart == null)
throw new IllegalArgumentException("fieldStart");
if (fieldStart.getFieldType() != FieldType.FIELD_HYPERLINK)
throw new IllegalArgumentException("Field start type must be FieldHyperlink.");

mFieldStart = fieldStart;

// Find the field separator node.
mFieldSeparator = findNextSibling(mFieldStart, NodeType.FIELD_SEPARATOR);
if (mFieldSeparator == null)
throw new IllegalStateException("Cannot find field separator.");

// Find the field end node. Normally the field end will always be found, but in the example
// document there happens to be a paragraph break included in the hyperlink, which puts the
// field end in the next paragraph. Handling fields that span several paragraphs correctly is
// much more complicated, but in this case allowing the field end to be null is enough.
mFieldEnd = findNextSibling(mFieldSeparator, NodeType.FIELD_END);

// The field code looks something like [ HYPERLINK "http://www.myurl.com" ], but it can
// consist of several runs.
String fieldCode = getTextSameParent(mFieldStart.getNextSibling(),
mFieldSeparator);
Matcher matcher = G_REGEX.matcher(fieldCode.trim());
if (!matcher.find())
throw new IllegalStateException("Unexpected hyperlink field code: " + fieldCode);
// The link is local if \l is present in the field code.
mIsLocal = (matcher.group(1) != null) && (matcher.group(1).length() > 0);
mTarget = matcher.group(2);
}

/**
* Gets or sets the display name of the hyperlink.
*/
String getName() throws Exception
{
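// Start after the separator so that the separator character is not part of the name.
return getTextSameParent(mFieldSeparator.getNextSibling(), mFieldEnd);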
}
void setName(String value) throws Exception
{
// Hyperlink display name is stored in the field result which is a Run
// node between field separator and field end.
Run fieldResult = (Run)mFieldSeparator.getNextSibling();
fieldResult.setText(value);

// The field result can consist of more than one run; delete the extra runs.
removeSameParent(fieldResult.getNextSibling(), mFieldEnd);
}

/**
* Gets or sets the target url or bookmark name of the hyperlink.
*/
String getTarget() throws Exception
{
return mTarget;
}
void setTarget(String value) throws Exception
{
mTarget = value;
updateFieldCode();
}

/**
* True if the hyperlink's target is a bookmark inside the document. False if the
hyperlink is a url.
*/
boolean isLocal() throws Exception
{
return mIsLocal;
}
void isLocal(boolean value) throws Exception
{
mIsLocal = value;
updateFieldCode();
}

private void updateFieldCode() throws Exception
{
// Field code is stored in a Run node between field start and field separator.
Run fieldCode = (Run)mFieldStart.getNextSibling();
fieldCode.setText(java.text.MessageFormat.format("HYPERLINK {0}\"{1}\"",
((mIsLocal) ? "\\l " : ""), mTarget));

// The field code can consist of more than one run; delete the extra runs.
removeSameParent(fieldCode.getNextSibling(), mFieldSeparator);
}

/**
* Goes through siblings starting from the start node until it finds a node of the
specified type or null.
*/
private static Node findNextSibling(Node startNode, int nodeType) throws Exception
{
for (Node node = startNode; node != null; node = node.getNextSibling())
{
if (node.getNodeType() == nodeType)
return node;
}
return null;
}

/**
* Retrieves text from start up to but not including the end node.
*/
private static String getTextSameParent(Node startNode, Node endNode) throws Exception
{
if ((endNode != null) && (startNode.getParentNode() != endNode.getParentNode()))
throw new IllegalArgumentException("Start and end nodes are expected to have the same parent.");

StringBuilder builder = new StringBuilder();
// Stop at the end node, or after the last sibling if the end node is null.
for (Node child = startNode; (child != null) && (child != endNode); child = child.getNextSibling())
builder.append(child.getText());

return builder.toString();
}

/**
* Removes nodes from start up to but not including the end node.
* Start and end are assumed to have the same parent.
*/
private static void removeSameParent(Node startNode, Node endNode) throws Exception
{
if ((endNode != null) && (startNode.getParentNode() != endNode.getParentNode()))
throw new IllegalArgumentException("Start and end nodes are expected to have the same parent.");

Node curChild = startNode;
while ((curChild != null) && (curChild != endNode))
{
Node nextChild = curChild.getNextSibling();
curChild.remove();
curChild = nextChild;
}
}

private final Node mFieldStart;
private final Node mFieldSeparator;
private final Node mFieldEnd;
private boolean mIsLocal;
private String mTarget;

/**
* Matches a HYPERLINK field code and captures the optional \l flag (group 1)
* and the hyperlink target (group 2).
*/
private static final Pattern G_REGEX = Pattern.compile(
"\\S+" + // one or more non-space characters: HYPERLINK or its equivalent in other languages
"\\s+" + // one or more spaces
"(?:\"\"\\s+)?" + // optional non-capturing empty quotes and spaces, found in one customer's file
"(\\\\l\\s+)?" + // optional \l flag followed by one or more spaces
"\"" + // opening quotation mark
"([^\"]+)" + // one or more characters except a quotation mark (the hyperlink target)
"\"" // closing quotation mark
);
}
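To check the parsing rules of the Hyperlink class in isolation, the following plain-Java sketch applies the same G_REGEX to field code strings and rebuilds a field code in the same format as updateFieldCode. The class, the ParsedLink holder, and the sample targets are hypothetical; no Aspose.Words types are involved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyperlinkCodeSketch {
    // Same pattern as G_REGEX in the Hyperlink class above.
    private static final Pattern G_REGEX = Pattern.compile(
        "\\S+" + "\\s+" + "(?:\"\"\\s+)?" + "(\\\\l\\s+)?" + "\"" + "([^\"]+)" + "\"");

    static final class ParsedLink {
        final String target;
        final boolean isLocal;
        ParsedLink(String target, boolean isLocal) { this.target = target; this.isLocal = isLocal; }
    }

    /** Parses the target and the local (\l) flag from a HYPERLINK field code. */
    static ParsedLink parse(String fieldCode) {
        Matcher matcher = G_REGEX.matcher(fieldCode.trim());
        if (!matcher.find()) {
            throw new IllegalArgumentException("Not a HYPERLINK field code: " + fieldCode);
        }
        boolean isLocal = matcher.group(1) != null && !matcher.group(1).isEmpty();
        return new ParsedLink(matcher.group(2), isLocal);
    }

    /** Builds the field code text in the same format as updateFieldCode. */
    static String build(String target, boolean isLocal) {
        return "HYPERLINK " + (isLocal ? "\\l " : "") + "\"" + target + "\"";
    }

    public static void main(String[] args) {
        ParsedLink external = parse(" HYPERLINK \"http://www.example.com\" ");
        ParsedLink local = parse("HYPERLINK \\l \"_Toc100\"");
        System.out.println(external.target + " local=" + external.isLocal);
        System.out.println(local.target + " local=" + local.isLocal);
        System.out.println(build("http://www.aspose.com", false));
    }
}
```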

How to Use Control Characters
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

Microsoft Word documents may contain various characters that have a special meaning. They are normally
used for formatting purposes and are not displayed in normal view. You can make them visible by clicking
the Show/Hide Formatting Marks button on the Standard toolbar.

Sometimes you may need to add special characters to the text or remove them from it. For instance, when
text is obtained programmatically from a document, Aspose.Words preserves most of the control characters,
so if you need to work with this text you will probably want to remove or replace them.
The ControlChar class is a repository for constants that represent control characters often encountered
in documents. It provides both char and string versions of the same constants. For example, the string
ControlChar.LINE_BREAK and the char ControlChar.LINE_BREAK_CHAR have the same value.

Use this class whenever you want to deal with control characters.
Example
Shows how to use control characters.
Java
// Replace "\r" control character with "\r\n"
text = text.replace(ControlChar.CR, ControlChar.CR_LF);
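A slightly fuller sketch normalizes text extracted from a document into CR LF separated lines. It does not import Aspose.Words: the constants are defined locally and are assumed to hold the same values as ControlChar.CR, ControlChar.CR_LF, ControlChar.LINE_BREAK (vertical tab, 0x0B) and ControlChar.CELL (0x07). Existing CR LF pairs are collapsed first, so the final replacement does not turn them into CR CR LF, which the one-line example above would do on text that already contains CR LF.

```java
public class ControlCharSketch {
    // Local stand-ins, assumed to match the corresponding ControlChar constants.
    static final String CR = "\r";
    static final String CR_LF = "\r\n";
    static final String LINE_BREAK = String.valueOf((char) 11); // manual line break (vertical tab)
    static final String CELL = String.valueOf((char) 7);        // end-of-cell mark

    /** Converts paragraph and line breaks to CR LF and table cell marks to tabs. */
    static String toPlainText(String text) {
        String s = text.replace(CR_LF, CR); // avoid doubling existing CR LF pairs
        s = s.replace(LINE_BREAK, CR);      // treat manual line breaks as line ends
        s = s.replace(CELL, "\t");          // separate cell contents with tabs
        return s.replace(CR, CR_LF);
    }

    public static void main(String[] args) {
        String extracted = "Title\rLine one" + LINE_BREAK + "line two\rA" + CELL + "B";
        System.out.println(toPlainText(extracted));
    }
}
```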

How to Convert Between Measurement Units
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

Most object properties in the Aspose.Words API that represent a measurement (width and height, margins,
and various distances) accept values in points (1 inch equals 72 points). Because this is not always
convenient, the ConvertUtil class provides helper functions to convert between measurement units: inches
to points, points to inches, pixels to points, and points to pixels. Conversions between pixels and points
can be performed at the default resolution of 96 dpi (dots per inch) or at a specified dpi resolution.
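The arithmetic behind these conversions is simple: 72 points per inch, and one pixel equal to 1/dpi of an inch, with 96 dpi as the default. The plain-Java sketch below illustrates the formulas; its class and method names are hypothetical and it is not ConvertUtil itself, so check the API reference for the exact ConvertUtil overloads.

```java
public class UnitConversionSketch {
    static final double POINTS_PER_INCH = 72.0;
    static final double DEFAULT_DPI = 96.0;

    static double inchToPoint(double inches) { return inches * POINTS_PER_INCH; }
    static double pointToInch(double points) { return points / POINTS_PER_INCH; }

    // One pixel is 1/dpi of an inch, so points = pixels * 72 / dpi.
    static double pixelToPoint(double pixels, double dpi) { return pixels * POINTS_PER_INCH / dpi; }
    static double pointToPixel(double points, double dpi) { return points * dpi / POINTS_PER_INCH; }

    // Overloads that use the default 96 dpi resolution.
    static double pixelToPoint(double pixels) { return pixelToPoint(pixels, DEFAULT_DPI); }
    static double pointToPixel(double points) { return pointToPixel(points, DEFAULT_DPI); }

    public static void main(String[] args) {
        System.out.println(inchToPoint(1.5));      // 108.0
        System.out.println(pixelToPoint(96));      // 72.0
        System.out.println(pointToPixel(72, 300)); // 300.0
    }
}
```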

ConvertUtil is especially useful when setting page properties, because inches are often a more familiar
measurement unit than points. The following example demonstrates how to set up the page properties in
inches.
Example
Shows how to specify page properties in inches.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

PageSetup pageSetup = builder.getPageSetup();
pageSetup.setTopMargin(ConvertUtil.inchToPoint(1.0));
pageSetup.setBottomMargin(ConvertUtil.inchToPoint(1.0));
pageSetup.setLeftMargin(ConvertUtil.inchToPoint(1.5));
pageSetup.setRightMargin(ConvertUtil.inchToPoint(1.5));
pageSetup.setHeaderDistance(ConvertUtil.inchToPoint(0.2));
pageSetup.setFooterDistance(ConvertUtil.inchToPoint(0.2));

How to Insert and Work with the Table of Contents Field
Added by Adam Skelton, last edited by Adam Skelton on Mar 20, 2014

Often you will work with documents containing a table of contents (TOC). Using Aspose.Words, you can insert
your own table of contents or completely rebuild an existing table of contents in the document using just a
few lines of code.
This article outlines how to work with the table of contents field and demonstrates how to:
Insert a brand new TOC.
Update new or existing TOCs in the document.
Specify switches to control the formatting and overall structure of the TOC.
Modify the styles and appearance of the table of contents.
Remove an entire TOC field along with all entries from the document.
Insert a Table of Contents Programmatically
The DocumentBuilder.InsertTableOfContents method is called to insert a TOC field into the document at
the current position of the DocumentBuilder.
A table of contents in a Word document can be built in a number of ways and formatted using a variety of
options. The field switches that you pass to the method control the way the table is built and displayed in
your document.
The default switches used in a TOC inserted by Microsoft Word are \o "1-3" \h \z \u.
Descriptions of these switches, as well as a list of supported switches, can be found later in this article.
You can either use that guide to obtain the correct switches or, if you already have a document containing
a similar TOC, show the field codes (ALT+F9) and copy the switches directly from the field.
Example
Shows how to insert a Table of Contents field into a document.
Java
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert a table of contents at the beginning of the document.
builder.insertTableOfContents("\\o \"1-3\" \\h \\z \\u");

// The newly inserted table of contents will be initially empty.
// It needs to be populated by updating the fields in the document,
// and then the page layout so that the page numbers are filled in.
doc.updateFields();
doc.updatePageLayout();


Example
Demonstrates how to insert a Table of contents (TOC) into a document using heading styles as entries.
Java
Document doc = new Document();

// Create a document builder to insert content with into document.
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert a table of contents at the beginning of the document.
builder.insertTableOfContents("\\o \"1-3\" \\h \\z \\u");

// Start the actual document content on the second page.
builder.insertBreak(BreakType.PAGE_BREAK);

// Build a document with a complex structure by applying different heading styles,
// thus creating TOC entries.
builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1);

builder.writeln("Heading 1");

builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_2);

builder.writeln("Heading 1.1");
builder.writeln("Heading 1.2");

builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1);

builder.writeln("Heading 2");
builder.writeln("Heading 3");

builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_2);

builder.writeln("Heading 3.1");

builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_3);

builder.writeln("Heading 3.1.1");
builder.writeln("Heading 3.1.2");
builder.writeln("Heading 3.1.3");

builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_2);

builder.writeln("Heading 3.2");
builder.writeln("Heading 3.3");

// Call the methods below to update the TOC: first the fields, then the page layout.
doc.updateFields();
doc.updatePageLayout();

The code inserts a new table of contents into a blank document. The DocumentBuilder class is then used
to insert some sample content formatted with the appropriate heading styles, which mark the content to be
included in the TOC. The final lines populate the TOC by updating the fields and page layout of the
document.

Without these calls, the output document would contain a TOC field with no visible content when opened.
This is because the TOC field has been inserted but is not populated until it is updated in the document.
This is discussed further in the next section.
Updating the Table of Contents
Aspose.Words allows you to completely update a TOC with only a few lines of code. This can be done to
populate a newly inserted TOC or to update an existing TOC after changes to the document have been
made.
The following two methods must be used in order to update the TOC fields in the document:
1. Document.UpdateFields
2. Document.UpdatePageLayout
Please note that these two update methods must be called in that order. If the order is reversed, the table
of contents is populated but no page numbers are displayed. Any number of TOCs can be updated this way:
these methods automatically update all TOCs found in the document.
Example
Shows how to completely rebuild TOC fields in the document by invoking field update.
Java
doc.updateFields();
doc.updatePageLayout();

The first call, to Document.UpdateFields, builds the TOC: all text entries are populated and the TOC
appears almost complete. The only thing missing is the page numbers, which for now are displayed as
"?".
The second call to Document.UpdatePageLayout will build the layout of the document in memory. This
needs to be done to gather the page numbers of the entries. The correct page numbers calculated from this
call are then inserted into the TOC.
Using Switches to Control the Behavior of the Table of Contents
As with any other field, the TOC field can accept switches, defined within the field code, that control how
the table of contents is built. Certain switches control which entries are included and at what level, while
others control the appearance of the TOC. Switches can be combined to produce complex tables of
contents.

By default, the switches \o "1-3" \h \z \u are included when a default TOC is inserted in the document. A
TOC with no switches includes content from the built-in heading styles (as if the \o switch were set).
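To make the effect of the \o switch more concrete, here is a rough plain-Java sketch, not Aspose.Words code, that reads the heading-level range from a switch string like the one used in the examples above and keeps only headings within that range. Treating a missing range as levels 1 to 9 is an assumption based on the note that a TOC without switches behaves as if \o were set. Real TOC building also handles page numbers, hyperlinks (\h) and outline levels (\u), which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TocLevelSketch {
    // Matches a heading-level range such as \o "1-3".
    private static final Pattern HEADING_RANGE = Pattern.compile("\\\\o\\s+\"(\\d)-(\\d)\"");

    /** Returns {min, max}; without a range, all nine heading levels are assumed. */
    static int[] headingRange(String switches) {
        Matcher m = HEADING_RANGE.matcher(switches);
        if (m.find()) {
            return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
        }
        return new int[] { 1, 9 };
    }

    /** Lists the headings whose level falls within the range, indented two spaces per level. */
    static List<String> entries(String switches, String[] headings, int[] levels) {
        int[] range = headingRange(switches);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < headings.length; i++) {
            int level = levels[i];
            if (level >= range[0] && level <= range[1]) {
                StringBuilder indent = new StringBuilder();
                for (int k = range[0]; k < level; k++) {
                    indent.append("  ");
                }
                result.add(indent + headings[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String switches = "\\o \"1-3\" \\h \\z \\u"; // same literal as in the examples above
        String[] headings = { "Heading 1", "Heading 1.1", "Heading 3.1.1", "Heading 4" };
        int[] levels = { 1, 2, 3, 4 };
        for (String entry : entries(switches, headings, levels)) {
            System.out.println(entry);
        }
    }
}
```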
The TOC switches supported by Aspose.Words are listed below, and their uses are described in detail.
They are divided into sections based on their type: the switches in the first section define what content
to include in the TOC, and the switches in the second section control the appearance of the TOC.
If a switch is not listed here, it is currently unsupported. All switches will be supported in future
versions, and further support is being added with every release.
Entry Marking Switches

Heading Styles (\o switch)
This switch specifies that the TOC should be built from the built-in heading styles. In Microsoft Word these are
Heading 1 to Heading 9. In Aspose.Words these styles are represented by the corresponding
StyleIdentifier constants, which are locale-independent identifiers of styles. For example,
StyleIdentifier.HEADING_1 represents the Heading 1 style. Using such an identifier, the corresponding Style
object, with its formatting and properties, can be retrieved from the Document.Styles collection (in Java,
doc.getStyles().getByStyleIdentifier(StyleIdentifier.HEADING_1)).
Any content formatted with these styles is included in the table of contents. The level of the heading
defines the corresponding hierarchical level of the entry in the TOC. For instance, a paragraph with the Heading
1 style is treated as the first level in the TOC, a paragraph with Heading 2 is treated as
the next level in the hierarchy, and so forth.
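The mapping from heading style to TOC level can be sketched in plain Java. This is a hypothetical model and not Aspose.Words code. It recognizes styles by their English names ("Heading 1" and so on) only to stay self-contained. In Aspose.Words you would compare the locale-independent StyleIdentifier instead. The outlineRange argument stands for the value of the \o switch, such as "1-3".

```java
/** Hypothetical model: the TOC level a built-in heading style maps to under a \o "min-max" switch. */
final class HeadingLevels {
    private static final String PREFIX = "Heading ";

    /**
     * Returns the TOC level (1-9) for a style named like "Heading 2", or 0 when the
     * paragraph is not a built-in heading or its level lies outside the \o range.
     */
    static int tocLevel(String styleName, String outlineRange) {
        if (styleName == null || !styleName.startsWith(PREFIX)) {
            return 0;
        }
        int level;
        try {
            level = Integer.parseInt(styleName.substring(PREFIX.length()).trim());
        } catch (NumberFormatException e) {
            return 0; // "Heading " followed by something that is not a number
        }
        if (level < 1 || level > 9) {
            return 0; // only Heading 1 to Heading 9 exist
        }
        // "1-3" -> min 1, max 3; "2-2" -> a single level
        String[] bounds = outlineRange.split("-");
        int min = Integer.parseInt(bounds[0].trim());
        int max = Integer.parseInt(bounds[bounds.length - 1].trim());
        return (level >= min && level <= max) ? level : 0;
    }
}
```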
Outline Levels (\u switch)

Each paragraph can define an outline level under the Paragraph options.

This setting dictates at which level the paragraph is treated in the document hierarchy. This is a common
way to structure the layout of a document, and the hierarchy can be viewed by switching
to Outline View in Microsoft Word. Similar to heading styles, there can be outline levels 1 to 9 in addition to
the Body Text level. Outline levels 1 to 9 appear in the TOC at the corresponding level of the hierarchy.
Any content with an outline level set either in the paragraph style or directly on the paragraph itself is
included in the TOC. In Aspose.Words the outline level is represented by the
ParagraphFormat.OutlineLevel property of the Paragraph node. The outline level of a paragraph style is
represented in the same way, through the Style.ParagraphFormat property.

Note that built-in heading styles such as Heading 1 always have an outline level set in their style
settings.
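A paragraph takes its outline level either from its style or from direct formatting, and that rule can be modeled in a few lines. This is a hypothetical sketch. It assumes Word's usual precedence, where a level set directly on the paragraph overrides the one inherited from its style. It also uses its own numbering (1 to 9 for outline levels, 0 for Body Text), which is not necessarily how Aspose.Words numbers its OutlineLevel values.

```java
/** Hypothetical model (not Aspose.Words API) of which paragraphs a TOC with \u collects. */
final class OutlineLevels {
    /** This sketch's own value for the Body Text level, which never appears in the TOC. */
    static final int BODY_TEXT = 0;

    /** A level set directly on the paragraph (non-null) overrides the level from its style. */
    static int effectiveLevel(Integer directLevel, int styleLevel) {
        return directLevel != null ? directLevel : styleLevel;
    }

    /** Returns the TOC level (1-9), or 0 when the effective level is Body Text. */
    static int tocLevel(Integer directLevel, int styleLevel) {
        int level = effectiveLevel(directLevel, styleLevel);
        return (level >= 1 && level <= 9) ? level : BODY_TEXT;
    }
}
```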

Custom Styles (\t switch)

This switch allows custom styles to be used when collecting entries for the TOC. It is often
used in conjunction with the \o switch to include custom styles along with the built-in heading styles in the
TOC.
The parameters of the switch should be enclosed in quotation marks. Several custom styles can be
included. For each style, specify the style name, then a comma, then the level at which that
style should appear in the TOC. Further styles are separated by commas as well.
For instance,
{ TOC \o "1-3" \t "CustomHeading1,1,CustomHeading2,2" }
will use content styled with CustomHeading1 as level 1 content in the TOC and CustomHeading2
as level 2.
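A \t argument is simply a sequence of alternating style names and levels. The sketch below parses it into an ordered map. CustomStyleSwitch is a hypothetical helper, and it assumes that style names contain no commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser (not Aspose.Words API) for the argument of a TOC \t switch. */
final class CustomStyleSwitch {
    /** Parses "CustomHeading1,1,CustomHeading2,2" into {CustomHeading1=1, CustomHeading2=2}. */
    static Map<String, Integer> parse(String argument) {
        String[] items = argument.split(",");
        if (items.length % 2 != 0) {
            throw new IllegalArgumentException("Expected style,level pairs: " + argument);
        }
        Map<String, Integer> levels = new LinkedHashMap<>(); // keeps the order of the switch
        for (int i = 0; i < items.length; i += 2) {
            String style = items[i].trim();                      // style name
            int level = Integer.parseInt(items[i + 1].trim());   // TOC level for that style
            levels.put(style, level);
        }
        return levels;
    }
}
```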
Use TC Fields (\f and \l switches)

In older versions of Microsoft Word, the only way to build a TOC was to use TC fields. These fields
contain the text that should be displayed in the entry, and the TOC is built from them. This functionality is not
used very often now, but it may still be useful on some occasions to include entries in the TOC which are not
intended to be visible in the document.
When inserted, these fields are hidden even when field codes are displayed. They cannot be seen
without showing hidden content; to see them, Show paragraph formatting must be selected.

These fields can be inserted into a document at any position like any other field and are represented by
the FieldType.FIELD_TOC_ENTRY constant.

The \f switch in a TOC is used to specify that TC fields should be used as entries. The switch on its own,
without any extra identifier, means that any TC field in the document will be included. An extra
parameter, often a single letter, designates that only TC fields which have a matching \f switch will be
included in the TOC. For instance,
{ TOC \f t }
will only include TC fields such as
{ TC "Entry Text" \f t }
The TOC field also has a related \l switch, which specifies that only TC fields with levels
within the specified range are included.

The TC fields themselves can also have several switches set. These are:
\f Explained above.
\l Defines which level in the TOC this TC field will appear in. A TOC which uses this same switch
will only include this TC field if its level is within the specified range.
\n The page number for this TOC entry is not displayed.
Sample code showing how to insert TC fields can be found in the next section.
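The selection rule of the TOC \f switch can be sketched in plain Java. TcFieldFilter is a hypothetical helper that works on field code strings such as TC "Entry Text" \f t. It models the rule described above and is not how Aspose.Words evaluates fields. It also ignores the \l level range.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model (not Aspose.Words API) of how a TOC's \f switch selects TC fields. */
final class TcFieldFilter {
    // A \f switch followed by its identifier, e.g. "\f t".
    private static final Pattern F_SWITCH = Pattern.compile("\\\\f\\s+(\\S+)");

    /** Returns the identifier of a TC field code's \f switch, or null if it has none. */
    static String identifierOf(String tcFieldCode) {
        // Drop the quoted entry text first so its contents cannot be mistaken for a switch.
        String switches = tcFieldCode.replaceAll("\"[^\"]*\"", "");
        Matcher m = F_SWITCH.matcher(switches);
        return m.find() ? m.group(1) : null;
    }

    /**
     * A TOC whose \f switch has no identifier (null or empty) collects every TC field;
     * otherwise only TC fields carrying the same identifier are collected.
     */
    static boolean isIncluded(String tocIdentifier, String tcFieldCode) {
        if (tocIdentifier == null || tocIdentifier.isEmpty()) {
            return true;
        }
        return tocIdentifier.equals(identifierOf(tcFieldCode));
    }
}
```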

Appearance Related Switches

Omit Page Numbers (\n switch)
This switch is used to hide page numbers for certain levels of the TOC. For example, you can define
{ TOC \o "1-4" \n "3-4" }
and the page numbers on the entries of levels 3 and 4 will be hidden, along with the leader dots
(if there are any). To specify only one level, a range should still be used; for example, 1-1
excludes page numbers only for the first level.
Supplying no level range omits page numbers for all levels in the TOC. This is useful
when exporting a document to HTML or a similar format, because HTML-based formats
don't have any concept of pages and thus don't need any page numbering.
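The level-range semantics of \n fit in a few lines of Java. OmitPageNumbers is a hypothetical helper that assumes the \n switch is present. Its range argument is the text after the switch, or null when no range is given.

```java
/** Hypothetical model (not Aspose.Words API) of the TOC \n switch. */
final class OmitPageNumbers {
    /**
     * @param range the \n argument, e.g. "3-4" or "1-1"; null or blank means the switch was
     *              given without a range, which omits page numbers at every level
     * @param level the TOC level of an entry (1-9)
     * @return true if the entry at this level is shown without a page number
     */
    static boolean omitsPageNumber(String range, int level) {
        if (range == null || range.trim().isEmpty()) {
            return true;
        }
        String[] bounds = range.split("-");
        int from = Integer.parseInt(bounds[0].trim());
        int to = Integer.parseInt(bounds[bounds.length - 1].trim());
        return level >= from && level <= to;
    }
}
```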

Insert As Hyperlinks (\h switch)

This switch specifies that TOC entries are inserted as hyperlinks. When viewing a document in Microsoft
Word, these entries still appear as normal text inside the TOC, but they are hyperlinked and can be
used to navigate to the position of the original entry in the document by using Ctrl + Left Click. When
this switch is included, these links are also preserved in other formats. For instance, in
HTML-based formats including EPUB, and in rendered formats such as PDF and XPS, they are exported as
working links.
Without this switch, the TOC in all of these outputs is exported as plain text and does not
demonstrate this behavior. If the document is opened in Microsoft Word, the text of the entries is also not
clickable in this way, but the page numbers can still be used to navigate to the original entry.


Set Separator Character (\p switch)
This switch allows the content separating the title of the entry and the page number to be easily changed
in the TOC. The separator to use should be specified after this switch and enclosed in quotation marks.
Contrary to what is stated in the Office documentation, only one character can be used, instead of up to
five. This applies to both Microsoft Word and Aspose.Words.
Using this switch is not recommended, as it does not allow much control over what is used to separate
entries and page numbers in the TOC. Instead, it is recommended to edit the appropriate TOC style, such as
StyleIdentifier.TOC_1, and from there edit the leader style with access to specific font members etc. Further
details of how to do this can be found later in the article.


Preserve Tab Entries (\w switch)

Using this switch specifies that any entries that have a tab character, for instance a heading which
has a tab at the end of the line, retain it as a proper tab character when the TOC is populated. This
means the tab character keeps its function in the TOC and can be used to format the entry.
For example, certain entries may use tab stops and tab characters to evenly space out the text. As long as
the corresponding TOC level defines the equivalent tab stops, the generated TOC entries will appear
with similar spacing.

In the same situation, if this switch were not defined, the tab characters would be converted to equivalent white
space that no longer functions as tabs, and the output would not appear as expected.


Preserve New Line Entries (\x switch)

Similar to the switch above, this switch specifies that headings spanning multiple lines (using new line
characters, not separate paragraphs) are preserved as they are in the generated TOC. For example, a
heading which is to spread across multiple lines can use the new line character (Shift + Enter or
ControlChar.LINE_BREAK) to separate content across different lines. With this switch specified, the entry in
the TOC preserves these new line characters.

In this situation, if the switch is not defined, the new line characters are converted to a single white
space.
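The effect of the \w and \x switches on an entry's text can be modeled as follows. TocEntryText is hypothetical. It assumes a tab is replaced by a single space when \w is absent, whereas the text above only says tabs become equivalent white space. For a manual line break it uses character code 11, the character behind Aspose.Words' ControlChar.LINE_BREAK.

```java
/** Hypothetical model (not Aspose.Words API) of how \w and \x affect a TOC entry's text. */
final class TocEntryText {
    static final char TAB = '\t';
    // Manual line break (Shift + Enter); Aspose.Words uses character code 11 for ControlChar.LINE_BREAK.
    static final char LINE_BREAK = (char) 11;

    static String entryText(String heading, boolean preserveTabs, boolean preserveLineBreaks) {
        StringBuilder sb = new StringBuilder(heading.length());
        for (char c : heading.toCharArray()) {
            if (c == TAB && !preserveTabs) {
                sb.append(' ');   // without \w: the tab no longer acts as a tab (simplified to one space)
            } else if (c == LINE_BREAK && !preserveLineBreaks) {
                sb.append(' ');   // without \x: a line break becomes a single space
            } else {
                sb.append(c);     // everything else is copied unchanged
            }
        }
        return sb.toString();
    }
}
```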



Insert TC Fields
You can insert a new TC field at the current position of the DocumentBuilder by calling the
DocumentBuilder.InsertField method and specifying the field name as TC along with any switches that
are needed.
Example
Shows how to insert a TC field into the document using DocumentBuilder.
Java
Document doc = new Document();

// Create a document builder to insert content with.
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert a TC field at the current document builder position.
builder.insertField("TC \"Entry Text\" \\f t");

Often a specific line of text is designated for the TOC and is marked with a TC field. The easy way to do
this in MS Word is to highlight the text and press ALT+SHIFT+O. This automatically creates a TC field
using the selected text. The same technique can be accomplished through code. The code below finds
text matching the input and inserts a TC field at the same position with the text. The code is based on the
same technique used in the article.
Example
Shows how to find and insert a TC field at text in a document.
Java
import com.aspose.words.*;
import java.util.regex.Pattern;

public void insertTCFieldsAtText() throws Exception {
    Document doc = new Document();
    // Add some text so that the search below has something to match.
    DocumentBuilder builder = new DocumentBuilder(doc);
    builder.writeln("The Beginning");

    // Insert a TC field which displays "Chapter 1" just before the text "The Beginning" in the document.
    doc.getRange().replace(Pattern.compile("The Beginning"),
            new InsertTCFieldHandler("Chapter 1", "\\l 1"), false);
}

public class InsertTCFieldHandler implements IReplacingCallback {
    // Store the text and switches to be used for the TC fields.
    private String mFieldText;
    private String mFieldSwitches;

    /**
     * The switches to use for each TC field. Can be an empty string or null.
     */
    public InsertTCFieldHandler(String switches) throws Exception {
        this(null, switches);
    }

    /**
     * The display text and the switches to use for each TC field. Display text
     * can be an empty string or null.
     */
    public InsertTCFieldHandler(String text, String switches) throws Exception {
        mFieldText = text;
        mFieldSwitches = switches;
    }

    public int replacing(ReplacingArgs args) throws Exception {
        // Create a builder to insert the field.
        DocumentBuilder builder = new DocumentBuilder((Document) args.getMatchNode().getDocument());
        // Move to the first node of the match.
        builder.moveTo(args.getMatchNode());

        // If the user specified text to be used in the field as display text then use that,
        // otherwise use the match string as the display text.
        String insertText;
        if (!(mFieldText == null || "".equals(mFieldText)))
            insertText = mFieldText;
        else
            insertText = args.getMatch().group();

        // Treat null switches as "no switches" so that "null" is not written into the field code.
        String switches = (mFieldSwitches == null) ? "" : mFieldSwitches;

        // Insert the TC field before this node using the specified string as the display text
        // and the user defined switches.
        builder.insertField(java.text.MessageFormat.format("TC \"{0}\" {1}", insertText, switches));

        // We have done what we want so skip replacement.
        return ReplaceAction.SKIP;
    }
}


Modify a Table of Contents
Change the Formatting of Styles
The formatting of entries in the TOC does not use the original styles of the marked entries; instead, each
level is formatted using an equivalent TOC style. For example, the first level in the TOC is formatted with
the TOC 1 style, the second level with the TOC 2 style, and so on. This means that to change the
look of the TOC, these styles must be modified. In Aspose.Words these styles are represented by the
locale-independent StyleIdentifier.TOC_1 through StyleIdentifier.TOC_9 and can be retrieved from the
Document.Styles collection using these identifiers.
Once the appropriate style of the document has been retrieved the formatting for this style can be
modified. Any changes to these styles will be automatically reflected on the TOCs in the document.
Example
Changes a formatting property used in the first level TOC style.
Java
Document doc = new Document();
// Retrieve the style used for the first level of the TOC and change the formatting of the style.
doc.getStyles().getByStyleIdentifier(StyleIdentifier.TOC_1).getFont().setBold(true);


It is also useful to note that any direct formatting of a paragraph (defined on the paragraph itself and not
in the style) marked to be included in the TOC is copied to the entry in the TOC. For example, suppose
the Heading 1 style is used to mark content for the TOC, this style has bold formatting, and the
paragraph also has italic formatting applied directly to it. The resulting TOC entry will not be bold, as that
is part of the style formatting, but it will be italic, as this is applied directly to the paragraph.
You can also control the formatting of the separators used between each entry and its page number. By
default this is a dotted line which is spread across to the page number using a tab character and a right
tab stop lined up close to the right margin.
Using the Style object retrieved for the particular TOC level you want to modify, you can also change how
these separators appear in the document.
To do this, first call Style.ParagraphFormat to retrieve the paragraph formatting for the style. From this,
the tab stops can be retrieved by calling ParagraphFormat.TabStops, and the appropriate tab stop can be
modified. Using this same technique, the tab itself can be moved or removed altogether.
Example
Shows how to modify the position of the right tab stop in TOC related paragraphs.
Java
Document doc = new Document(getMyDir() + "Field.TableOfContents.doc");

// Iterate through all paragraphs in the document.
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    // Check if this paragraph is formatted using the TOC result based styles.
    // This is any style between TOC1 and TOC9. Skip paragraphs that have no tab stops.
    int styleId = para.getParagraphFormat().getStyle().getStyleIdentifier();
    if (styleId >= StyleIdentifier.TOC_1 && styleId <= StyleIdentifier.TOC_9
            && para.getParagraphFormat().getTabStops().getCount() > 0)
    {
        // Get the first tab used in this paragraph; this should be the tab
        // used to align the page numbers.
        TabStop tab = para.getParagraphFormat().getTabStops().get(0);
        // Remove the old tab from the collection.
        para.getParagraphFormat().getTabStops().removeByPosition(tab.getPosition());
        // Insert a new tab using the same properties but at a modified position.
        // We could also change the separators used (dots) by passing a different leader type.
        para.getParagraphFormat().getTabStops().add(tab.getPosition() - 50,
                tab.getAlignment(), tab.getLeader());
    }
}

doc.save(getMyDir() + "Field.TableOfContentsTabStops Out.doc");


Removing a Table of Contents from the Document
A table of contents can be removed from the document by removing all nodes found between the
FieldStart and FieldEnd node of the TOC field.
The code below demonstrates this. Removing a TOC field is simpler than removing a general field because we do
not need to keep track of nested fields. Instead, we check whether a FieldEnd node is of type FieldType.FIELD_TOC,
which means we have reached the end of the current TOC. This technique can be used here without
worrying about nested fields, because we can assume that a properly formed document will not have a
TOC field nested inside another TOC field.
First, the FieldStart nodes of each TOC are collected and stored. The specified TOC is then enumerated
so that all nodes within the field are visited and stored. The nodes are then removed from the document.
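The collect-then-remove logic can be tried out without Aspose.Words on a flat list of nodes in document order. In this hypothetical sketch, TocRemoval.Node stands in for FieldStart, FieldEnd and other nodes. It mirrors only the traversal described above (stop at the first TOC field end, ignore the ends of nested fields) and not the real node tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, library-free model of removing the n-th TOC from a flat node list. */
final class TocRemoval {
    /** Stand-in for a document node: a field start, a field end, or other content. */
    static final class Node {
        final String kind;      // "START", "END" or "TEXT"
        final String fieldType; // e.g. "TOC" or "PAGEREF"; null for plain content

        Node(String kind, String fieldType) {
            this.kind = kind;
            this.fieldType = fieldType;
        }

        boolean is(String k, String type) {
            return kind.equals(k) && type.equals(fieldType);
        }
    }

    /** Removes the TOC with the given zero-based index, from its field start to its TOC field end. */
    static void removeToc(List<Node> nodes, int index) {
        // 1. Collect the positions of all TOC field starts.
        List<Integer> tocStarts = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            if (nodes.get(i).is("START", "TOC")) {
                tocStarts.add(i);
            }
        }
        if (index < 0 || index >= tocStarts.size()) {
            throw new IndexOutOfBoundsException("TOC index is out of range");
        }

        // 2. Walk forward from the chosen start, storing every node up to and including the
        //    first TOC field end. Ends of nested fields (PAGEREF, HYPERLINK) do not stop the walk.
        List<Node> toRemove = new ArrayList<>();
        for (int i = tocStarts.get(index); i < nodes.size(); i++) {
            Node node = nodes.get(i);
            toRemove.add(node);
            if (node.is("END", "TOC")) {
                break;
            }
        }

        // 3. Only after the walk is finished, remove the stored nodes.
        //    Node does not override equals(), so removeAll matches these exact objects.
        nodes.removeAll(toRemove);
    }
}
```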
Example
Demonstrates how to remove a specified TOC from a document.
Java
import com.aspose.words.*;
import java.util.ArrayList;

public void removeTOCFromDocument() throws Exception {
    // Open a document which contains a TOC.
    Document doc = new Document(getMyDir() + "Field.TableOfContents.doc");

    // Remove the first table of contents from the document.
    removeTableOfContents(doc, 0);

    // Save the output.
    doc.save(getMyDir() + "Field.TableOfContentsRemoveToc Out.doc");
}

/**
 * Removes the specified table of contents field from the document.
 *
 * @param doc The document to remove the field from.
 * @param index The zero-based index of the TOC to remove.
 */
public static void removeTableOfContents(Document doc, int index) throws Exception {
    // Store the FieldStart nodes of TOC fields in the document for quick access.
    ArrayList<FieldStart> fieldStarts = new ArrayList<FieldStart>();
    // This is a list to store the nodes found inside the specified TOC. They will be removed
    // at the end of this method.
    ArrayList<Node> nodeList = new ArrayList<Node>();

    for (FieldStart start : (Iterable<FieldStart>) doc.getChildNodes(NodeType.FIELD_START, true)) {
        if (start.getFieldType() == FieldType.FIELD_TOC) {
            // Add all FieldStarts which are of type FieldTOC.
            fieldStarts.add(start);
        }
    }

    // Ensure the TOC specified by the passed index exists.
    if (index < 0 || index >= fieldStarts.size())
        throw new ArrayIndexOutOfBoundsException("TOC index is out of range");

    boolean isRemoving = true;
    // Get the FieldStart of the specified TOC.
    Node currentNode = fieldStarts.get(index);

    while (isRemoving) {
        // It is safer to store these nodes and delete them all at once later.
        nodeList.add(currentNode);
        currentNode = currentNode.nextPreOrder(doc);

        // Once we encounter a FieldEnd node of type FieldTOC then we know we are at the end
        // of the current TOC and we can stop here.
        if (currentNode.getNodeType() == NodeType.FIELD_END) {
            FieldEnd fieldEnd = (FieldEnd) currentNode;
            if (fieldEnd.getFieldType() == FieldType.FIELD_TOC) {
                // Include the FieldEnd itself so that no orphaned field end is left behind.
                nodeList.add(fieldEnd);
                isRemoving = false;
            }
        }
    }

    // Remove all nodes found in the specified TOC.
    for (Node node : nodeList) {
        node.remove();
    }
}
How to Extract or Remove Comments
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the ProcessComments sample here.
Using Comments in a Word document (in addition to Track Changes) is a common practice when
reviewing documents, particularly when there are multiple reviewers. There can be situations where the
only thing you need from a document is the comments. Say you want to generate a list of review findings,
or perhaps you have collected all the useful information from the document and you simply want to
remove unnecessary comments. You may want to view or remove the comments of a particular reviewer.

In this sample we are going to look at some simple methods both for gathering information from the
comments within a document and for removing comments from a document. Specifically, we'll cover how
to:
Extract all the comments from a document or only the ones made by a particular author.
Remove all the comments from a document or only from a particular author.
Solution
To illustrate how to extract and remove comments from a document, we will go through the following
steps:
1. Open a Word document using the Aspose.Words.Document class.
2. Get all comments from the document into a collection.
3. To extract comments:
1. Go through the collection using a for-each loop.
2. Extract and list the author name, date & time and text of all comments.
3. Extract and list the author name, date & time and text of comments written by a specific author,
in this case the author ks.
4. To remove comments:
1. Go backwards through the collection using a for loop.
2. Remove comments.
5. Save the changes.
We're going to use the following Word document for this exercise:


As you can see, it contains several Comments from two authors with the initials pm and ks.
The Code
The code in this sample is actually quite simple and all methods are based on the same approach. A
comment in a Word document is represented by a Comment object in the Aspose.Words document object
model. To collect all the comments in a document use the Document.GetChildNodes method with the
first parameter set to NodeType.Comment. Make sure that the second parameter of the
Document.GetChildNodes method is set to true: this forces the Document.GetChildNodes to select
from all child nodes recursively, rather than only collecting the immediate children.

The Document.GetChildNodes method is very useful and you can use it every time you need to get a list
of document nodes of any type. The resulting collection does not create an immediate overhead because
the nodes are selected into this collection only when you enumerate or access items in it.
Example
Extracts the author name, date&time and text of all comments in the document.
Java
static ArrayList<String> extractComments(Document doc) throws Exception
{
    ArrayList<String> collectedComments = new ArrayList<String>();
    // Collect all comments in the document.
    NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
    // Look through all comments and gather information about them.
    for (Comment comment : (Iterable<Comment>) comments)
    {
        collectedComments.add(comment.getAuthor() + " " + comment.getDateTime() + " "
                + comment.toString(SaveFormat.TEXT));
    }
    return collectedComments;
}

After you have selected the Comment nodes into a collection, all you have to do is extract the information
you need. In this sample, the author name, date, time and plain text of the comment are combined into one
string; you could choose to store them in some other way instead.

The overloaded method that extracts the comments from a particular author is almost the same; it just
checks the author's name before adding the information to the list.
Example
Extracts the author name, date&time and text of the comments by the specified author.
Java
static ArrayList<String> extractComments(Document doc, String authorName) throws Exception
{
    ArrayList<String> collectedComments = new ArrayList<String>();
    // Collect all comments in the document.
    NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
    // Look through all comments and gather information about those written by the authorName author.
    for (Comment comment : (Iterable<Comment>) comments)
    {
        // Compare strings with equals(); == only compares object references.
        if (authorName.equals(comment.getAuthor()))
            collectedComments.add(comment.getAuthor() + " " + comment.getDateTime() +
                    " " + comment.toString(SaveFormat.TEXT));
    }
    return collectedComments;
}


If you are removing all comments, there is no need to move through the collection deleting comments one by
one; you can remove them by calling NodeCollection.Clear on the comments collection.
Example
Removes all comments in the document.
Java
static void removeComments(Document doc) throws Exception
{
// Collect all comments in the document
NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
// Remove all comments.
comments.clear();
}


When you need to selectively remove comments, the process becomes more similar to the code we used for
comment extraction.
Example
Removes comments by the specified author.
Java
static void removeComments(Document doc, String authorName) throws Exception
{
// Collect all comments in the document
NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
// Look through all comments and remove those written by the authorName author.
for (int i = comments.getCount() - 1; i >= 0; i--)
{
Comment comment = (Comment)comments.get(i);
// Compare strings with equals(); == only compares object references.
if (authorName.equals(comment.getAuthor()))
comment.remove();
}
}


The main point to highlight here is the use of the for loop. Unlike the simple extraction, here you want to
delete comments. A suitable trick is to iterate over the collection backwards, from the last Comment to the first
one. The reason is that if you start from the end and move backwards, removing an item does not change the
indexes of the items before it, so you can work your way back to the first item in the collection.
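The same backwards pattern applies to any index-based list. This library-free sketch (BackwardRemoval is a hypothetical name) removes the entries for one author from a java.util.List. It compares strings with equals() rather than ==.

```java
import java.util.List;

/** Library-free illustration of removing items by index while iterating backwards. */
final class BackwardRemoval {
    /** Removes every entry equal to authorName, walking from the last index to the first. */
    static void removeByAuthor(List<String> authors, String authorName) {
        for (int i = authors.size() - 1; i >= 0; i--) {
            // Removing index i only shifts the elements after i, which have already been visited.
            if (authors.get(i).equals(authorName)) {
                authors.remove(i); // remove(int) removes by index
            }
        }
    }
}
```

Iterating forwards with the same loop body would skip the element that slides into position i after each removal, so two adjacent matches would leave one behind.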
Example
The demo-code that illustrates the methods for the comments extraction and removal.
Java
// Extract the information about the comments of all the authors.
for (String comment : (Iterable<String>) extractComments(doc))
    System.out.println(comment);

// Remove comments by the "pm" author.
removeComments(doc, "pm");
System.out.println("Comments from \"pm\" are removed!");

// Extract the information about the comments of the "ks" author.
for (String comment : (Iterable<String>) extractComments(doc, "ks"))
    System.out.println(comment);

// Remove all comments.
removeComments(doc);
System.out.println("All comments are removed!");

// Save the document.
doc.save(dataDir + "Test File Out.doc");

End Result
When launched, the sample displays the following results. First it lists all comments by all authors, then it
lists comments by the selected author only. Finally, it removes all comments.

The output Word document now has the comments removed from it:


How to Add a Comment

Added by Awais Hafeez, last edited by Awais Hafeez on Mar 30, 2014

Comments of the document are represented by the Comment class. Use CommentRangeStart and
CommentRangeEnd classes to specify a region of text that is to be commented.
Example
Shows how to add a comment to a paragraph in the document.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.write("Some text is added.");

Comment comment = new Comment(doc, "Awais Hafeez", "AH", new Date());
builder.getCurrentParagraph().appendChild(comment);
comment.getParagraphs().add(new Paragraph(doc));
comment.getFirstParagraph().getRuns().add(new Run(doc, "Comment text."));

doc.save(MyDir + "out.docx");
Example
Shows how to anchor a comment to a region of text.
Document doc = new Document();

Paragraph para1 = new Paragraph(doc);
Run run1 = new Run(doc, "Some ");
Run run2 = new Run(doc, "text ");
para1.appendChild(run1);
para1.appendChild(run2);
doc.getFirstSection().getBody().appendChild(para1);

Paragraph para2 = new Paragraph(doc);
Run run3 = new Run(doc, "is ");
Run run4 = new Run(doc, "added ");
para2.appendChild(run3);
para2.appendChild(run4);
doc.getFirstSection().getBody().appendChild(para2);

Comment comment = new Comment(doc, "Awais Hafeez", "AH", new Date());
comment.getParagraphs().add(new Paragraph(doc));
comment.getFirstParagraph().getRuns().add(new Run(doc, "Comment text."));

CommentRangeStart commentRangeStart = new CommentRangeStart(doc, comment.getId());
CommentRangeEnd commentRangeEnd = new CommentRangeEnd(doc, comment.getId());

run1.getParentNode().insertAfter(commentRangeStart, run1);
run3.getParentNode().insertAfter(commentRangeEnd, run3);
commentRangeEnd.getParentNode().insertAfter(comment, commentRangeEnd);

doc.save(MyDir + "out.docx");
About Mail Merge in Aspose.Words
Added by hammad, last edited by Caroline von Schmalensee on Nov 12, 2013

Aspose.Words can generate documents from templates with mail merge fields. The data from an external
source like a database or file is placed into these fields and formatted, and the resulting document is saved in
the folder you specify.
Mail merge is a feature of Microsoft Word for quickly and easily creating documents like letters,
labels and envelopes. Aspose.Words takes the standard mail merge and advances it many steps ahead,
turning it into a full-fledged reporting solution that allows you to generate even more complex
documents such as reports, catalogs, inventories, and invoices.
The advantages of the Aspose.Words reporting solution are:
Design reports in Microsoft Word, use standard mail merge fields.
Define regions in the document that grow, such as detail rows of an order.
Insert images during mail merge.
Execute any custom logic, control formatting, or insert complex content using mail merge event
handlers.
Populate documents with data from any type of data source.
Working with Mail Merges
The steps to perform a mail merge are simple. First, you use Microsoft Word to create and design a Word
document, normally called a template. Note that the document does not have to be a Microsoft Word
Template (.dot); it can be a normal .doc document. You insert special fields called merge fields into
the template in the places where you want data from your data source to be inserted later.
Then you open the document in Aspose.Words and execute a mail merge operation. The mail merge
operation takes data from your data source and merges it into the document. You can then save the
document in Word binary format (.doc) or any other format supported by Aspose.Words, either to a file
or streamed directly to the client browser.
You can also designate repeatable merge regions in the document or insert special merge fields that allow
you to insert other content such as images.
Depending on how you set up mail merge fields and repeatable regions inside the document, the
document will grow to accommodate multiple records in your data source.
If you do not use mail merge regions, then the mail merge will be similar to Microsoft Word mail merge
and the whole document content will be repeated for each record in the data source.
Using repeatable mail merge regions, you can designate portions inside a document that will be repeated
for each record in the data source. For example, if you mark a table row as a repeatable region then this
table row will be repeated, causing the table to dynamically grow to accommodate all of your data.
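A repeatable region can be thought of as a template that is emitted once per record. The plain-Java model below only illustrates that idea and is not how Aspose.Words implements regions. RegionMerge is hypothetical, each record is a map of field name to value, and merge fields are written as «FieldName» placeholders, the way Microsoft Word displays them.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, library-free model of a repeatable mail merge region. */
final class RegionMerge {
    // A «FieldName» placeholder (written with Unicode escapes to keep the source ASCII-only).
    private static final Pattern FIELD = Pattern.compile("\u00ab(\\w+)\u00bb");

    /** Emits rowTemplate once per record, filling each placeholder (empty if the field is missing). */
    static String mergeRegion(String rowTemplate, List<Map<String, String>> records) {
        StringBuilder out = new StringBuilder();
        for (Map<String, String> record : records) {
            Matcher m = FIELD.matcher(rowTemplate);
            StringBuffer row = new StringBuffer();
            while (m.find()) {
                String value = record.getOrDefault(m.group(1), "");
                m.appendReplacement(row, Matcher.quoteReplacement(value));
            }
            m.appendTail(row);
            out.append(row).append('\n'); // one output row per record
        }
        return out.toString();
    }
}
```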
Data Sources
Data can come from a source in a variety of formats supported by Java. It can be a ResultSet, an array of
values or any custom data source. You can also set up a custom data source for mail merging by
implementing the IMailMergeDataSource interface. This allows data to be merged from any data source,
including a list of business objects.
Data which comes from a ResultSet (the result of a database query) is wrapped into an Aspose.Words
class called DataTable. Many DataTable objects can be added to a DataSet. These classes mimic the
basic functionality of DataTable and DataSet in .NET and make it easier to merge complex hierarchical
data into documents.
The DataTable, DataSet and DataRelation classes are required because the ResultSet interface (a core
part of the Java framework) has certain limitations which, on their own, make mail merge with
regions and nested mail merge impossible:
1. The ResultSet interface does not include any members which can serve as a table name. This is
required during mail merge to link the data to the corresponding region in the document.
2. There is no direct way to define relationships between other related ResultSet objects. This is a key
requirement to make nested mail merge work.
When executing a nested mail merge (merging data straight from a hierarchical data source), DataTable
sources can be added to a DataSet class, and DataRelation instances can be added to define relationships
between the data.
Relationships between DataTable objects are defined by creating a new DataRelation and adding it
to the DataSet. The parameters required to be passed to the DataRelation are:
1. The name of the relation as a string. This can be any name, but it should describe the relationship it
represents by mentioning the names of the tables it links.
2. The parent table name as a string.
3. The child table name as a string.
4. The primary key column or columns in the parent table as an array of strings. Each value from these
columns is linked to the child columns.
5. The key column or columns in the child table as an array of strings. Like above, these are linked to
the columns of the parent table.
The names of the key columns in the parent and child tables are used to create the relation
between the two tables.
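During a nested mail merge, a relation pairs each parent row with its child rows through the key columns. The sketch below models that pairing with plain Java collections. RelationSketch and the map-per-row representation are hypothetical, it uses a single key column for brevity, and it is not the Aspose.Words DataRelation API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, library-free model of a parent/child relation used in nested mail merge. */
final class RelationSketch {
    /**
     * Groups child rows under parent rows by matching parentKey in each parent row
     * against childKey in each child row.
     */
    static Map<Object, List<Map<String, Object>>> childrenByParent(
            List<Map<String, Object>> parents, String parentKey,
            List<Map<String, Object>> children, String childKey) {
        Map<Object, List<Map<String, Object>>> result = new LinkedHashMap<>();
        for (Map<String, Object> parent : parents) {
            result.put(parent.get(parentKey), new ArrayList<>()); // every parent gets a (possibly empty) list
        }
        for (Map<String, Object> child : children) {
            List<Map<String, Object>> bucket = result.get(child.get(childKey));
            if (bucket != null) { // children without a matching parent are ignored
                bucket.add(child);
            }
        }
        return result;
    }
}
```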
Considerations when Working with the DataTable and DataSet Classes
Using the DataTable and DataSet classes is analogous to working with disconnected data. All data
required for the full nested mail merge must be loaded in memory and must be scrollable. This
means that each ResultSet must be left open for the duration of the mail merge and must be of type
ResultSet.TYPE_SCROLL_INSENSITIVE.
The code below shows how to create a statement whose result sets are scrollable.
Java
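/**
 * Utility function that creates a statement whose result sets are scrollable and read-only.
 * mConnection is an open java.sql.Connection field (see createConnection later in this guide).
 */
public static Statement createStatement() throws Exception
{
    return mConnection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
            ResultSet.CONCUR_READ_ONLY);
}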


Alternatively, if the database connection must be closed before the mail merge runs, load the ResultSet into an
instance of the CachedRowSet class. This stores the data from the ResultSet and allows you to close the
connection.
This code example shows how to store the results from a database query in a CachedRowSet so the
database connection can be closed.
Java
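import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

ResultSet resultSet = createStatement().executeQuery("SELECT * FROM Orders");

// RowSetProvider creates the standard CachedRowSet implementation. The internal
// com.sun.rowset.CachedRowSetImpl class is not exported on Java 9 and later.
CachedRowSet cached = RowSetProvider.newFactory().createCachedRowSet();

// This loads the data into the CachedRowSet. The connection can be closed after this line.
cached.populate(resultSet);

// Load the cached data into a new DataTable.
DataTable orders = new DataTable(cached, "Orders");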


If loading all data used for a nested mail merge into memory is not viable, use a custom mail merge data source
by creating a class implementing the IMailMergeDataSource interface.
Prepare a Document
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

Before you execute a mail merge, you need to prepare the document template. You should insert merge fields
that will be replaced with values from your data source.

Inserting Merge Fields into a Document:
1. Open your document in Microsoft Word.
2. In the document, click where you want to place a merge field.
3. Open the Insert menu and select Field... to open the Field dialog.


4. From the Field names list, select MergeField.
5. In the Field name text box, enter a name for the merge field and press OK.


Now you have a new merge field placed in your document. Microsoft Word shows it like this:


Of course, since a merge field is a regular Microsoft Word field, you can switch between displaying
field codes and results in your document in Microsoft Word using the keyboard shortcut Alt+F9. Field
codes appear between curly braces:


Merge Field Formatting
Added by hammad, last edited by Caroline von Schmalensee on Nov 12, 2013

If you want to format merged data, you need to format merge fields in the document as appropriate. Do not
format the data in the data source because its formatting is not retained when you merge the data into the
document.
This topic provides basic information about merge field formatting. For details, please
refer to the Microsoft Word documentation.
Change Text Formatting
1. In the main document, select the field that contains the information you want to format, including the
surrounding merge field characters (« »).
2. On the Format menu, click an option, such as Font or Paragraph, and select the desired options.
Change Capitalization
Merge fields in Microsoft Word support several options that affect how the data in the merge field is
capitalized. Aspose.Words honors those options. You can set capitalization options in the Field dialog
box in Microsoft Word.

Using Field Codes to Specify Formatting
Microsoft Word supports switches that control how numbers and dates are formatted and Aspose.Words
honors those switches.

In Microsoft Word, press Alt+F9 to display field codes in the main document, and then add switches to
the merge fields.
For example:
To display the number "34987.89" as "$34,987.89", add a numeric picture switch (\# $#,###.00).
To display the number "0945" as "9:45 PM", add the date/time picture switch (\@ "h:mm am/pm").
To ensure that the merged information has the same font and point size you apply to the merge field, add
the \* MERGEFORMAT switch.
Please see the Microsoft Word documentation on field codes for more details about field
switches.
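Aspose.Words evaluates these switches itself, so no code is needed to apply them. If you want to preview in plain Java roughly what the numeric picture \# $#,###.00 produces, java.text.DecimalFormat accepts a similar pattern. This is an approximation for illustration only: Word picture switches and DecimalFormat patterns are different syntaxes, and details such as rounding may not match.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Approximates the Word numeric picture "\# $#,###.00" with java.text.DecimalFormat.
// This is not how Aspose.Words formats fields; it only previews the expected shape.
class PicturePreview {

    static String formatLikeWordPicture(double value) {
        // '$' is not a special DecimalFormat character, so it is emitted as a literal prefix.
        // US symbols: ',' groups thousands and '.' separates the two decimal places.
        DecimalFormat format = new DecimalFormat("$#,###.00", DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(value);
    }
}
```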
Simple Mail Merge Explained
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

To prepare your template for a simple mail merge (without regions, similar to the classic mail
merge available in Microsoft Word), you only need to insert one or more merge fields where you want
data from the data source to appear.

Let us take a look at the Dinner Invitation demo. It creates a letter for a list of clients defined in the database.
The template contains a number of merge fields that are populated from two data sources; in other words,
two mail merges are performed one after the other. First, data from the first data source is merged into the
template. This data source contains only one row because this is information about the inviter, so the whole
document content is not repeated and only the appropriate fields are filled with data.
Then the second mail merge operation is executed. The data source it uses contains information about
the clients and consists of multiple rows (the demo selects the top 5 clients living in the USA).
Therefore, the whole template is repeated for each data row, and every repeated copy is populated with
the corresponding client's data.


As a result, we have a document that consists of five filled-in, complete, and personalized invitation
letters (a fragment of the very first one is shown below).


As you can see, it is possible, and sometimes useful, to perform more than one merge operation with
the same template to add data in stages.
You can insert NEXT fields in the Word document to cause the mail merge engine to select the next
record from the data source and continue merging. When the engine encounters a NEXT field, it just
selects the next record in the data source and continues merging without copying any content. This can
be used when creating documents such as mailing labels.
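The record-advancing behavior described above can be modeled in a few lines. This stdlib-only sketch is not the Aspose.Words engine; the template is a hypothetical list of tokens in which {Name} stands for a merge field and {NEXT} for a NEXT field. The whole template is emitted once per unused record, and each {NEXT} moves to the following record without copying content, so a template holding two labels consumes two records per copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of simple mail merge; the token syntax is invented for illustration.
class SimpleMergeSketch {

    // "{Field}" tokens take the current record's value, "{NEXT}" advances to the next record,
    // and any other token is copied as literal text.
    static List<String> merge(List<String> template, List<Map<String, String>> records) {
        List<String> output = new ArrayList<>();
        int index = 0;
        while (index < records.size()) {
            StringBuilder copy = new StringBuilder();
            for (String token : template) {
                if (token.equals("{NEXT}")) {
                    index++; // select the next record; nothing is copied for this token
                } else if (token.startsWith("{") && token.endsWith("}")) {
                    String field = token.substring(1, token.length() - 1);
                    // Once the records run out, remaining fields are left empty in this sketch.
                    String value = index < records.size() ? records.get(index).getOrDefault(field, "") : "";
                    copy.append(value);
                } else {
                    copy.append(token);
                }
            }
            output.add(copy.toString());
            index++; // the next copy of the template starts at the next unused record
        }
        return output;
    }
}
```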
Mail Merge with Regions Explained
Added by hammad, last edited by Adam Skelton on Jun 06, 2013

If you want portions of the document to grow dynamically, use mail merge with regions. To specify a mail
merge region in the document, insert two mail merge fields that mark the beginning and end of the
mail merge region. All document content inside a mail merge region is automatically
repeated for every record in the data source (in most cases this is a table).

To mark the beginning of a mail merge region, insert a MERGEFIELD with the name TableStart:MyTable, where
MyTable corresponds to the name of the table. To mark the end of the mail merge region, insert another
MERGEFIELD with the name TableEnd:MyTable. Between these marking fields, place merge fields that
correspond to the fields of your data source (table columns). These merge fields are populated with data
from the first row of the data source, then the whole region is repeated and the new fields are
populated with data from the second row, and so on.
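As a rough model of what happens to a region, the sketch below repeats everything between a TableStart: token and the matching TableEnd: token once per data row, filling {Field} tokens from that row. The token syntax and the RegionSketch.expandRegion helper are hypothetical, dropping the two marker tokens is a simplification of this sketch, and nested or duplicate regions are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a single, non-nested mail merge region.
class RegionSketch {

    // Repeats the tokens between TableStart:<name> and TableEnd:<name> once per row;
    // tokens outside the region are kept unchanged.
    static List<String> expandRegion(List<String> tokens, String tableName, List<Map<String, String>> rows) {
        int start = tokens.indexOf("TableStart:" + tableName);
        int end = tokens.indexOf("TableEnd:" + tableName);
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("Region " + tableName + " is missing or not well formed");
        }
        List<String> body = tokens.subList(start + 1, end);
        List<String> result = new ArrayList<>(tokens.subList(0, start));
        for (Map<String, String> row : rows) {
            for (String token : body) {
                if (token.startsWith("{") && token.endsWith("}")) {
                    // Fill the field from the current row; unknown fields become empty.
                    result.add(row.getOrDefault(token.substring(1, token.length() - 1), ""));
                } else {
                    result.add(token);
                }
            }
        }
        result.addAll(tokens.subList(end + 1, tokens.size()));
        return result;
    }
}
```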
Follow these simple rules when marking a region:
TableStart and TableEnd fields must be inside the same section in the document.
If used inside a table, TableStart and TableEnd must be inside the same row in the table.
Mail merge regions can be nested inside each other.
Mail merge regions should be well formed (there is always a pair of matching TableStart and TableEnd
with the same table name).
Duplicate regions with the same name are allowed. To merge duplicate regions, set the
MailMerge.MergeDuplicateRegions property to true. For backward compatibility, if you are
merging using an IMailMergeDataSourceRoot or a DataSet data source, duplicate regions are
merged automatically regardless of the value of the MailMerge.MergeDuplicateRegions option.
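The pairing and nesting rules above can be checked mechanically with a stack. This is a minimal sketch, assuming you already have the merge field names in document order; RegionNestingCheck is a hypothetical helper, and comparing region names case-insensitively is an assumption. It does not check the same-section and same-row rules, which depend on the document layout.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: verifies TableStart/TableEnd pairing and nesting in a list of field names.
class RegionNestingCheck {

    static boolean isWellFormed(List<String> fieldNamesInOrder) {
        Deque<String> open = new ArrayDeque<>();
        for (String name : fieldNamesInOrder) {
            if (name.startsWith("TableStart:")) {
                open.push(name.substring("TableStart:".length()));
            } else if (name.startsWith("TableEnd:")) {
                String closing = name.substring("TableEnd:".length());
                // The end field must close the innermost open region (case-insensitive by assumption).
                if (open.isEmpty() || !open.pop().equalsIgnoreCase(closing)) {
                    return false;
                }
            }
        }
        return open.isEmpty(); // every opened region must also be closed
    }
}
```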
As an example, have a look at the Product Catalog demo. Here is a fragment of a region prepared for
mail merge:


You can see a mail merge region defined for populating with data from the Products table. Note that
both the marking fields TableStart:Products and TableEnd:Products are placed inside the same row of
the Word table.
After executing the mail merge, here is the result:


How to Execute Simple Mail Merge
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

After you have prepared the template, you are ready to run the mail merge. Use the methods of the MailMerge
object to execute it. The MailMerge object is returned by the Document.MailMerge property (Document.getMailMerge()
in Java).

Call MailMerge.Execute, passing it a data source object, to perform a simple mail merge. Here is a list of the
data objects accepted by the MailMerge.Execute overloads:
DataTable. Fills mail merge fields in the document with values from a DataTable.
IMailMergeDataSource. You can pass any object that implements the
IMailMergeDataSource interface. This allows you to merge data from custom data sources such as
business objects, hashtables, or lists.
A pair of arrays, one of which represents a set of field names (an array of strings) and another that
represents a set of the corresponding field values (an array of objects). Note that both arrays must
have the same number of elements.
Note that a simple mail merge done using MailMerge.Execute ignores fields that are inside mail
merge regions. Only merge fields that are not inside any mail merge region are populated.
Field names are not case sensitive. If a field name is not found in the document but is encountered in
the data source, it is ignored.
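The rules for the pair-of-arrays overload can be modeled with a case-insensitive map. In this stdlib-only sketch (ArrayMergeSketch.fillFields is hypothetical, not the Aspose.Words implementation), both arrays must have the same length, document field names are matched without regard to case, and data-source names with no field in the document are never read, which is what ignoring them amounts to. Document fields with no matching data are simply left out of the result.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of merging a names array and a values array into document fields.
class ArrayMergeSketch {

    // Maps each document field name to the value supplied for it, matching names case-insensitively.
    static Map<String, Object> fillFields(List<String> documentFields, String[] names, Object[] values) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("names and values must have the same number of elements");
        }
        Map<String, Object> data = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 0; i < names.length; i++) {
            data.put(names[i], values[i]);
        }
        Map<String, Object> filled = new LinkedHashMap<>();
        for (String field : documentFields) {
            // Data entries with no matching document field are never looked up, so they are ignored.
            if (data.containsKey(field)) {
                filled.put(field, data.get(field));
            }
        }
        return filled;
    }
}
```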
Let us take an example. Imagine that you need to create a personalized letter filled with data
entered by the user in your application. You prepare the template accordingly by inserting merge
fields named Company, Address, Address2, and so on. Then you create two arrays and pass them to
MailMerge.Execute.
Example
Performs a simple insertion of data into merge fields.
Java
// Open an existing document.
Document doc = new Document(getMyDir() + "MailMerge.ExecuteArray.doc");

// Fill the fields in the document with user data.
doc.getMailMerge().execute(
new String[] {"FullName", "Company", "Address", "Address2", "City"},
new Object[] {"James Bond", "MI5 Headquarters", "Milbank", "", "London"});

doc.save(getMyDir() + "MailMerge.ExecuteArray Out.doc");

How to Execute Mail Merge with Regions
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

Performing mail merge with regions is as easy as performing it without regions. Just pass a data source object
containing data rows to the MailMerge.ExecuteWithRegions method. You can even use a DataSet object to execute a
mail merge for several regions, filling each of them with data from a separate table. Here is the list of
acceptable objects:
ResultSet. A ResultSet must be wrapped in a DataTable object in order to define the table name.
DataSet. A set of DataTable objects can be wrapped into a single DataSet, with relations defined
between the tables. This allows tables with a hierarchical structure to be mail merged using nested mail
merge.
IMailMergeDataSource. You can pass any object that implements the
IMailMergeDataSource interface. This allows you to merge data into mail merge regions from custom
data sources such as business objects, hashtables, or lists.
Merging Data from a ResultSet
In Aspose.Words a ResultSet object produced from a database query is passed to the mail merge engine
by first wrapping it in a new instance of the DataTable class. The DataTable constructor accepts a
ResultSet object and also the name of the table as a string.

Each ResultSet must be wrapped in a new instance of a DataTable object in order to be merged. The
DataTable class provides the bare minimum functionality required for nested mail merge when this type
of data source is used.
Using the DataTable class, mail merge can be executed directly by passing it to the
MailMerge.ExecuteWithRegions(DataTable) method, or the DataTable can be added to a DataSet
along with other DataTable objects, which is how nested mail merge is achieved. More details about the
DataSet class are covered in the following sections of the documentation.
Since the DataTable is a wrapper for the ResultSet, it also provides members that allow you to access
the contained data.
The DataTable.ResultSet property is used to retrieve the contained ResultSet of the DataTable .
The DataTable.DataSet property is used to retrieve the parent DataSet which contains this DataTable.
This will return null for a DataTable which does not belong to a DataSet .
The DataTable.TableName property is used to retrieve the table name.
Example
Executes a mail merge with repeatable regions.
Java
public void executeWithRegionsDataTable() throws Exception
{
    Document doc = new Document(getMyDir() + "MailMerge.ExecuteWithRegions.doc");

    int orderId = 10444;

    // Perform several mail merge operations, populating only part of the document each time.

    // Use DataTable as a data source.
    // The table name must match the name of the region defined in the document.
    com.aspose.words.DataTable orderTable = getTestOrder(orderId);
    doc.getMailMerge().executeWithRegions(orderTable);

    com.aspose.words.DataTable orderDetailsTable = getTestOrderDetails(orderId, "ExtendedPrice DESC");
    doc.getMailMerge().executeWithRegions(orderDetailsTable);

    doc.save(getMyDir() + "MailMerge.ExecuteWithRegionsDataTable Out.doc");
}

private static com.aspose.words.DataTable getTestOrder(int orderId) throws Exception
{
    java.sql.ResultSet resultSet = executeDataTable(java.text.MessageFormat.format(
            "SELECT * FROM AsposeWordOrders WHERE OrderId = {0}",
            Integer.toString(orderId)));

    return new com.aspose.words.DataTable(resultSet, "Orders");
}

private static com.aspose.words.DataTable getTestOrderDetails(int orderId, String orderBy) throws Exception
{
    StringBuilder builder = new StringBuilder();

    builder.append(java.text.MessageFormat.format(
            "SELECT * FROM AsposeWordOrderDetails WHERE OrderId = {0}",
            Integer.toString(orderId)));

    if ((orderBy != null) && (orderBy.length() > 0))
    {
        builder.append(" ORDER BY ");
        builder.append(orderBy);
    }

    java.sql.ResultSet resultSet = executeDataTable(builder.toString());
    return new com.aspose.words.DataTable(resultSet, "OrderDetails");
}

/**
 * Utility function that creates a connection and a statement,
 * executes the command and returns the result as a ResultSet.
 */
private static java.sql.ResultSet executeDataTable(String commandText) throws Exception
{
    // Loads the JDBC-ODBC bridge driver. Note: this driver was removed in Java 8,
    // so newer JDKs need a different JDBC driver for the database.
    Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

    // Open the database connection.
    String connString = "jdbc:odbc:DRIVER={Microsoft Access Driver (*.mdb)};" +
            "DBQ=" + getDatabaseDir() + "Northwind.mdb" + ";UID=Admin";

    // From Wikipedia: The Sun driver has a known issue with character encoding and Microsoft Access
    // databases. Microsoft Access may use an encoding that is not correctly translated by the driver,
    // leading to the replacement in strings of, for example, accented characters by question marks.
    //
    // In this case, CP1252 must be set for European characters to come through in the data values.
    java.util.Properties props = new java.util.Properties();
    props.put("charSet", "Cp1252");

    // DSN-less DB connection.
    java.sql.Connection conn = java.sql.DriverManager.getConnection(connString, props);

    // Create and execute a command.
    java.sql.Statement statement = conn.createStatement();
    return statement.executeQuery(commandText);
}

How to Use Nested Mail Merge Regions
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the NestedMailMerge sample here.
Most data in relational databases or XML files is hierarchical (e.g. with parent-child relationships). The
most common example is an invoice or an order containing multiple items.

Aspose.Words allows nesting mail merge regions inside each other in a document to reflect the way the
data is nested and this allows you to easily populate a document with hierarchical data.
This article details the steps to set up a working nested mail merge application that generates a
collection of invoices, each containing multiple items. An example project with complete source code
and files can be downloaded. The process and code are explained step by step, and common issues are
addressed at the end of the article.
What are Nested Mail Merge Regions and When Would I use Them?



Nested mail merge regions are at least two regions in which one is defined entirely inside the other, so
they are nested in one another. In a document it looks like this:

Just as in standard mail merge, each region contains the data from one table. What's different in nested
mail merge is that the Order region has the Item region nested inside it. This makes the Order region the
parent and the Item region the child. When the data is merged from the data source, the regions act
just like a parent-child relationship, where data coming from the Order table is linked to
the Item table.
The example below shows the data being passed to the nested merge regions and the output that is
generated by the merge.


As you can see, each order from the Order table is inserted, followed by each item from the Item table
that is related to that order. Then the next order is inserted along with its items, until all the orders
and items are listed.
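The output order described above, each order followed only by its own items, can be sketched without Aspose.Words as an outer loop over parent rows and an inner loop over the child rows that share the parent's OrderID. The Order and Item classes and the render helper are hypothetical and only mirror the example tables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the parent/child output produced by nested regions.
class NestedOutputSketch {

    // Stand-in for a row of the Order (parent) table.
    static final class Order {
        final int orderId;
        final String customer;

        Order(int orderId, String customer) {
            this.orderId = orderId;
            this.customer = customer;
        }
    }

    // Stand-in for a row of the Item (child) table.
    static final class Item {
        final int orderId;
        final String name;

        Item(int orderId, String name) {
            this.orderId = orderId;
            this.name = name;
        }
    }

    // Emits each order, then only the items whose OrderID matches that order.
    static List<String> render(List<Order> orders, List<Item> items) {
        List<String> lines = new ArrayList<>();
        for (Order order : orders) {
            lines.add("Order " + order.orderId + " for " + order.customer);
            for (Item item : items) {
                if (item.orderId == order.orderId) {
                    lines.add("  " + item.name);
                }
            }
        }
        return lines;
    }
}
```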
Step 1 Create the Template
This is the same process as creating a standard mail merge document with regions. Remember that with
mail merge regions we can have the same field name in different regions so there is no need to change
any column names. Here is what our Word template looks like:



There are a few things you need to consider when preparing nested mail merge regions, and merge regions
in general:
The opening and closing tags of a mail merge region (e.g. TableStart:Order, TableEnd:Order) both need to
appear in the same row or cell. For example, if you start a merge region in a cell of a table, you must end
the merge region in the same row as the first cell.
The names of the columns in the DataTable must match the merge field names. Unless you have specified
mapped fields, the merge will not succeed for fields whose names differ.
The opening and closing table tags need to be well formed. This means that the TableStart and TableEnd
tags must match. An incorrectly formed region will cause all nested mail merge regions to stop
displaying anything at all.
If one of these rules is broken, the program may produce unexpected results or an exception may be
thrown.
Step 2 Create the Data Source
The data to be merged into the template can come from a variety of sources, mainly relational databases
or XML documents. In our example, we use a Microsoft Access database to store our data and
load each table from the database into a DataTable object. These DataTable objects are added to a DataSet,
and nested mail merge is executed using that DataSet.

The orders data is contained within the database as shown below.


These files should be included in our project folder:

Step 3 Ensure Correct Table Names and Relationships Exist Between Tables
For Aspose.Words to perform nested mail merge correctly, the following requirements must be met:
1. The names of the mail merge regions in the document must match the names of the DataTables
populated from the data source.
2. The nesting order of mail merge regions in the document must match the data relationships between the
tables in the data source.
Since our data comes from a database, it is expected to be represented in ResultSet objects.
Aspose.Words includes special classes for mail merging data stored in ResultSet objects.
Each ResultSet is wrapped in its own DataTable object. These DataTable objects are added to a
DataSet, and relations are defined between the DataTable objects. This is the basis of how nested mail merge
works.
For further information about setting up data relations, see the article How to Set up Relations for use in
Nested Mail Merge with Regions.
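Requirement 2 can be checked before running the merge. In the sketch below (illustrative only; the helper names and the "Parent>Child" string encoding are hypothetical), parent/child pairs are derived from the order of the TableStart: and TableEnd: field names, and any pair without a declared relation is reported. It assumes the regions are already well formed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical helper comparing region nesting with declared parent/child relations.
class NestingVsRelations {

    // Returns "Parent>Child" for every region opened directly inside another region.
    static List<String> nestingPairs(List<String> fieldNamesInOrder) {
        Deque<String> open = new ArrayDeque<>();
        List<String> pairs = new ArrayList<>();
        for (String name : fieldNamesInOrder) {
            if (name.startsWith("TableStart:")) {
                String region = name.substring("TableStart:".length());
                if (!open.isEmpty()) {
                    pairs.add(open.peek() + ">" + region); // innermost open region is the parent
                }
                open.push(region);
            } else if (name.startsWith("TableEnd:") && !open.isEmpty()) {
                open.pop(); // assumes the regions are well formed
            }
        }
        return pairs;
    }

    // Lists nesting pairs for which no "Parent>Child" relation was declared.
    static List<String> missingRelations(List<String> fieldNamesInOrder, Set<String> declaredRelations) {
        List<String> missing = new ArrayList<>();
        for (String pair : nestingPairs(fieldNamesInOrder)) {
            if (!declaredRelations.contains(pair)) {
                missing.add(pair);
            }
        }
        return missing;
    }
}
```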
Step 4 Prepare the Code
The code for setting up nested mail merge is simple to implement with Aspose.Words. When
setting up your project, remember to:
Include a reference to Aspose.Words.
Load each table from the data source into a DataTable.
Add the DataTable objects to a DataSet object.
We create a Document object, which loads our invoice template. Then Aspose.Words merges the data
from our DataSet and fills the document with it. Finally, we save the result to a document in the
desired format. Here is the complete code for our project:
Example
Shows how to generate an invoice using nested mail merge regions.
Java
import com.aspose.words.DataRelation;
import com.aspose.words.DataSet;
import com.aspose.words.DataTable;
import com.aspose.words.Document;

import java.io.File;
import java.net.URI;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Program
{
    public static void main(String[] args) throws Exception
    {
        // Sample infrastructure.
        URI exeDir = Program.class.getResource("").toURI();
        String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

        // Create the dataset which will hold each DataTable used for mail merge.
        DataSet pizzaDs = new DataSet();

        // Create a connection to the database.
        createConnection(dataDir);

        // Populate each DataTable from the database. Each query returns a ResultSet containing
        // the data from the table. The ResultSet is wrapped in an Aspose.Words DataTable and
        // added to the DataSet.
        DataTable orders = new DataTable(executeQuery("SELECT * from Orders"), "Orders");
        pizzaDs.getTables().add(orders);

        DataTable itemDetails = new DataTable(executeQuery("SELECT * from Items"), "Items");
        pizzaDs.getTables().add(itemDetails);

        // For nested mail merge to work, the mail merge engine must know the relation
        // between parent and child tables. Add a DataRelation to specify it.
        pizzaDs.getRelations().add(new DataRelation(
                "OrderToItemDetails",
                "Orders",
                "Items",
                new String[]{"OrderID"},
                new String[]{"OrderID"}));

        // Open the template document.
        Document doc = new Document(dataDir + "Invoice Template.doc");

        // Execute nested mail merge with regions.
        doc.getMailMerge().executeWithRegions(pizzaDs);

        // Save the output to disk.
        doc.save(dataDir + "Invoice Out.doc");
    }

    /**
     * Executes a query to the demo database using a new statement and returns the result
     * in a ResultSet.
     */
    protected static ResultSet executeQuery(String query) throws Exception
    {
        return createStatement().executeQuery(query);
    }

    /**
     * Utility function that creates a statement whose result sets are scrollable,
     * as nested mail merge requires.
     */
    public static Statement createStatement() throws Exception
    {
        return mConnection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);
    }

    /**
     * Utility function that creates a connection to the database.
     */
    public static void createConnection(String dataDir) throws Exception
    {
        // Load a DB driver that is used by the demos. Note: the JDBC-ODBC bridge was
        // removed in Java 8, so newer JDKs need a different JDBC driver.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

        // The path to the database on the disk.
        File dataBase = new File(dataDir, "InvoiceDB.mdb");

        // Compose connection string.
        String connectionString = "jdbc:odbc:DRIVER={Microsoft Access Driver (*.mdb)};" +
                "DBQ=" + dataBase + ";UID=Admin";

        // Create a connection to the database.
        mConnection = DriverManager.getConnection(connectionString);
    }

    private static Connection mConnection;
}

The End Result
Here is the resulting Microsoft Word document produced after running the code:



Scrolling through the generated document, you can see that the nested mail merge was successful. Each
order's details are generated, including the corresponding items purchased.
Common Issues When Developing Using Nested Mail Merge

Q: When using nested mail merge, the generated output has no fields that are merged; instead the original
name of the merge field just stays the same?

A: Check that the data is being loaded properly into tables. The tables should have their TableName property set, a
primary key, and a relationship defined.
Additionally, check that the merge fields are named properly. To do this, press Alt+F9 in Microsoft
Word and make sure the names in the merge fields match the columns in the tables. Try using the
following code in your project to ensure the merge fields are being loaded correctly. It returns an array
of strings that contains the names of the merge fields in the document.
Example
Shows how to get names of all merge fields in a document.
Java
String[] fieldNames = doc.getMailMerge().getFieldNames();
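If the names look right in Word, a quick way to compare them against your data columns is a small check like the one below. It is a plain-Java sketch, not part of Aspose.Words: it assumes the array returned by getFieldNames can contain region markers such as TableStart:Order as well as repeated names, and it compares names case-insensitively as an assumption; adjust both rules to your templates. The MergeFieldCheck class and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class MergeFieldCheck {
    // Returns the merge field names that have no matching data column.
    // Assumptions: region markers such as "TableStart:Order" / "TableEnd:Order" are
    // not data fields and are skipped, and names are compared case-insensitively.
    static List<String> findUnmatchedFields(String[] fieldNames, Set<String> columnNames) {
        Set<String> columns = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        columns.addAll(columnNames);
        List<String> unmatched = new ArrayList<>();
        for (String name : fieldNames) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.startsWith("tablestart:") || lower.startsWith("tableend:")) {
                continue;
            }
            // The same field can appear several times in a template; report it once.
            if (!columns.contains(name) && !unmatched.contains(name)) {
                unmatched.add(name);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        // In a real project this array would come from doc.getMailMerge().getFieldNames().
        String[] fieldNames = {"TableStart:Order", "Order_Id", "Customer", "Customer", "TableEnd:Order"};
        Set<String> columns = Set.of("ORDER_ID", "CustomerName");
        System.out.println(findUnmatchedFields(fieldNames, columns)); // prints [Customer]
    }
}
```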


Q: Why does the output of nested merging display no data from the child table for the first entry in the parent table, but display all items for the last entry, even ones that are not actually linked to it?

A: This happens when the merge regions in the template document are not correctly formed. Check step 1 of this documentation for information on the structure of nested merge regions.

Q: Why does each entry from the parent table display every item in the child table, even ones that are not actually linked to it?

A: This happens when the relationship between the parent and child tables is set up incorrectly or not at all. Check step 2 for information on how to set up a DataRelation.
How to Set up Relations for use in Nested Mail Merge with Regions
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

When executing mail merge with nested regions, there must be relationships between the parent and child data for the process to work correctly. Skipping this important step is one of the most common causes of nested mail merge failures.

Even though setting up relations between related data tables is a requirement when using mail merge with nested regions, in some cases the relations can be set up automatically for you, depending on the data source being used.
This article explains how to perform nested mail merge by providing detailed instructions on how to set up relations for each particular type of data source.


Explicitly Creating Relations between DataTables in a DataSet
Each DataTable object wraps a ResultSet containing the data from your data source. A related collection of these DataTable objects can be added to a DataSet, and a DataRelation created to describe the relations between the tables.

The code snippet below demonstrates how to create a relationship between two data tables, the order table and the item table, within a DataSet called dataSet. The tables are related through their Order_Id columns.
Example
Shows how to create a simple DataRelation for use in nested mail merge.
Java
dataSet.getRelations().add(new DataRelation("OrderToItem", orderTable.getTableName(),
itemTable.getTableName(), new String[] {"Order_Id"}, new String[] {"Order_Id"}));
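Conceptually, a relation such as OrderToItem tells the engine which child rows belong to each parent row by matching the Order_Id columns. The following plain-Java sketch imitates that lookup with in-memory maps so the effect is easy to see; it is not how Aspose.Words implements DataRelation, and the RelationSketch class and the sample rows are hypothetical. Without the key filter, every order would list every item, which is the symptom described in the troubleshooting questions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RelationSketch {
    // Returns the child rows whose key column matches the parent row's key column.
    // This is the filtering that a parent-to-child relation on those columns expresses.
    static List<Map<String, String>> childRowsFor(Map<String, String> parent,
                                                  List<Map<String, String>> children,
                                                  String parentKey, String childKey) {
        List<Map<String, String>> related = new ArrayList<>();
        String key = parent.get(parentKey);
        for (Map<String, String> child : children) {
            if (key != null && key.equals(child.get(childKey))) {
                related.add(child);
            }
        }
        return related;
    }

    public static void main(String[] args) {
        List<Map<String, String>> orders = List.of(
                Map.of("Order_Id", "1", "Customer", "Ann"),
                Map.of("Order_Id", "2", "Customer", "Bob"));
        List<Map<String, String>> items = List.of(
                Map.of("Order_Id", "1", "Item", "Margherita"),
                Map.of("Order_Id", "2", "Item", "Hawaiian"),
                Map.of("Order_Id", "1", "Item", "Garlic bread"));
        for (Map<String, String> order : orders) {
            List<Map<String, String>> related = childRowsFor(order, items, "Order_Id", "Order_Id");
            // Without the key filter every order would list all three items.
            System.out.println(order.get("Customer") + ": " + related.size() + " item(s)");
        }
    }
}
```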

Creating Relations between IMailMergeDataSource Objects
Implementing the IMailMergeDataSource interface provides an easy way to use your business objects as a data source for mail merge, and you can use nested mail merge with these objects as well. This is achieved by implementing the IMailMergeDataSource.GetChildDataSource method.
Details of how to fully implement this interface in your project can be found in the API documentation for the IMailMergeDataSource interface. Given the focus of this article, we concentrate only on implementing relationships between these types of objects and not on the rest of the implementation.
The code example below shows how to use IMailMergeDataSource.GetChildDataSource to provide relationships between parent and child data. This method is called by the mail merge engine whenever a nested region is encountered in the current parent region. It is invoked on the parent IMailMergeDataSource that is handling the region, so that it can return the appropriate child data for the current parent record.
Example
Shows how to get a child collection of objects by using the GetChildDataSource method in the parent
class.
Java
public IMailMergeDataSource getChildDataSource(String tableName)
{
// Get the child collection to merge with the region specified by the tableName variable.
if(tableName.equals("Order"))
return new OrderMailMergeDataSource(mCustomers.get(mRecordIndex).getOrders());
else
return null;
}

The IMailMergeDataSource.GetChildDataSource method returns the child data related to the current parent record. You are expected to return the correct child data from your custom objects based on the given table name and the current position in the parent data. In this case, this is achieved by retrieving the order collection of the current customer. If this class is used in mail merge without nested regions, or in simple mail merge, this method should simply return null.

Also note that the parent region might have one or more related child sets. These are differentiated using the supplied table name parameter. Each child data source should also implement the IMailMergeDataSource interface.
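To see the parent/child contract in isolation, here is a minimal sketch that runs without Aspose.Words. SketchDataSource is a hypothetical stand-in that only mirrors the shape of the IMailMergeDataSource methods discussed here (getTableName, getValue, moveNext, getChildDataSource); it is not the real interface. Records are modelled as maps, with nested records stored under the child table's name, and walk() is a simplified imitation of how an engine might drive the sources, not the actual Aspose.Words engine.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors the shape of IMailMergeDataSource; not the Aspose.Words type.
interface SketchDataSource {
    String getTableName();
    boolean getValue(String fieldName, Object[] fieldValue);
    boolean moveNext();
    SketchDataSource getChildDataSource(String tableName);
}

// Each record is a map of field values; nested records are stored under the child table's name.
class MapListDataSource implements SketchDataSource {
    private final String tableName;
    private final List<Map<String, Object>> records;
    private int index = -1; // positioned before the first record until moveNext() is called

    MapListDataSource(String tableName, List<Map<String, Object>> records) {
        this.tableName = tableName;
        this.records = records;
    }

    public String getTableName() { return tableName; }

    public boolean getValue(String fieldName, Object[] fieldValue) {
        Object value = records.get(index).get(fieldName);
        if (value == null || value instanceof List) {
            return false; // unknown field (or a nested table): leave it unmerged
        }
        fieldValue[0] = value;
        return true;
    }

    public boolean moveNext() {
        index++;
        return index < records.size();
    }

    // Returns the child records of the CURRENT parent record, selected by the nested
    // region's table name; null when this record has no child set with that name.
    @SuppressWarnings("unchecked")
    public SketchDataSource getChildDataSource(String childTableName) {
        Object children = records.get(index).get(childTableName);
        if (children instanceof List) {
            return new MapListDataSource(childTableName, (List<Map<String, Object>>) children);
        }
        return null;
    }
}

class ChildSourceDemo {
    // Simplified imitation of an engine walking a Customer region with a nested Order region.
    static String walk(SketchDataSource customers) {
        StringBuilder out = new StringBuilder();
        Object[] value = new Object[1];
        while (customers.moveNext()) {
            customers.getValue("Name", value);
            out.append(value[0]).append(':');
            SketchDataSource orders = customers.getChildDataSource("Order");
            while (orders != null && orders.moveNext()) {
                orders.getValue("Number", value);
                out.append(' ').append(value[0]);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Map<String, Object>> customers = List.of(
                Map.<String, Object>of("Name", "Ann", "Order",
                        List.of(Map.<String, Object>of("Number", "A-1"), Map.<String, Object>of("Number", "A-2"))),
                Map.<String, Object>of("Name", "Bob", "Order",
                        List.of(Map.<String, Object>of("Number", "B-1"))));
        System.out.print(walk(new MapListDataSource("Customer", customers)));
    }
}
```

The child source is chosen both by table name and by the parent's current position, and unknown names yield null, which is the behavior the paragraphs above ask for.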
How to Mail Merge from XML using IMailMergeDataSource
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the XmlMailMerge sample here.
Given the widespread use and support of the XML markup language, the ability to run a mail merge from
an XML file to a Word template document has become a common requirement.

This article provides a simple example of how you can use Aspose.Words to execute mail merge from XML using a custom data source that implements the IMailMergeDataSource interface.
Solution
To achieve this, we implement our own custom data source, which reads the parsed XML stored in memory. When mail merge is executed, our class is asked to return a value for each of the fields in the document. The values are read from the XML and passed to the mail merge engine to be merged into the document.

We'll use this simple XML file, which contains the customer information we want to use in the mail merge.
XML
<?xml version="1.0" encoding="utf-8"?>
<customers>
  <customer Name="John Ben Jan" ID="1" Domain="History" City="Boston"/>
  <customer Name="Lisa Lane" ID="2" Domain="Chemistry" City="LA"/>
  <customer Name="Dagomir Zits" ID="3" Domain="Heraldry" City="Milwaukee"/>
  <customer Name="Sara Careira Santy" ID="4" Domain="IT" City="Miami"/>
</customers>

Note that the structure of the XML document can vary and the data will still be read correctly, which allows different types of XML documents to be merged easily. For example, the XML can be changed so that each record of a table is represented as an element named after the table, with each field being a child element whose text node holds the field value.
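The JDK-only sketch below shows why both layouts can work. It looks a field up first as a child element of the record and then as an attribute, which is the same order used by the getValue method of the XmlMailMergeDataTable class in this sample. The XmlFieldLookup class and the sample strings are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class XmlFieldLookup {
    // Parses the XML and returns the first element named like the table (the first record).
    static Element firstRecord(String xml, String tableName) {
        try {
            return (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName(tableName).item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // A child element with the field's name wins; otherwise fall back to an attribute.
    // Returns null when the record has neither, i.e. the field would stay unmerged.
    static String lookup(Element record, String fieldName) {
        for (Node child = record.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(fieldName)) {
                return child.getTextContent();
            }
        }
        return record.hasAttribute(fieldName) ? record.getAttribute(fieldName) : null;
    }

    public static void main(String[] args) {
        String asAttributes = "<customers><customer Name=\"Lisa Lane\" City=\"LA\"/></customers>";
        String asElements = "<customers><customer><Name>Lisa Lane</Name><City>LA</City></customer></customers>";
        System.out.println(lookup(firstRecord(asAttributes, "customer"), "City")); // LA
        System.out.println(lookup(firstRecord(asElements, "customer"), "City"));   // LA
    }
}
```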
Here's our sample Word template document. The Name, ID, Domain and City fields have been set up as merge fields and correspond to the nodes in the XML file.

To execute mail merge with data from an XML data source, we will:
1. Load the XML into memory.
2. Pass the data to a new instance of the XmlMailMergeDataTable class, which is included with this sample.
3. Run the Aspose.Words MailMerge.Execute method.
It's really pretty simple. Using Aspose.Words, the mail merge operation replaces the merge fields in the document with the values from the XML file.
The Code
Make sure that you have set up merge fields in the Word template wherever you want the data inserted.

First, we load the XML file from disk into memory by parsing it and storing it in an org.w3c.dom.Document object.
This object, which represents the XML, is passed to the XmlMailMergeDataTable class. This class is the middle-man between the data source and the mail merge engine, allowing data from the XML represented in memory to be passed to the mail merge engine and merged into the document.
Then we open the template document and run the mail merge on the XmlMailMergeDataTable using the Aspose.Words MailMerge object.
Example
Shows how to execute mail merge using an XML data source by implementing IMailMergeDataSource.
Java
package XMLMailMerge;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.net.URI;

import com.aspose.words.Document;

/**
* This sample demonstrates how to execute mail merge with data from an XML data source. The XML file is read into memory,
* stored in a DOM and passed to a custom data source implementing IMailMergeDataSource. This returns each value from XML when
* called by the mail merge engine.
*/
class Program
{
public static void main(String[] args) throws Exception
{
// Sample infrastructure.
URI exeDir = Program.class.getResource("").toURI();
String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

// Use DocumentBuilder from the javax.xml.parsers package and Document class from the org.w3c.dom package to read
// the XML data file and store it in memory.
DocumentBuilder db =
DocumentBuilderFactory.newInstance().newDocumentBuilder();
// Parse the XML data.
org.w3c.dom.Document xmlData = db.parse(dataDir + "Customers.xml");

// Open a template document.
Document doc = new Document(dataDir + "TestFile.doc");

// Note that this class also works with a single repeatable region (and any nested regions).
// To merge multiple regions at the same time from a single XML data source, use the XmlMailMergeDataSet class.
// e.g. doc.getMailMerge().executeWithRegions(new XmlMailMergeDataSet(xmlData));
doc.getMailMerge().execute(new XmlMailMergeDataTable(xmlData, "customer"));

// Save the output document.
doc.save(dataDir + "TestFile Out.doc");
}
}

The XmlMailMergeDataTable class is a custom data source implementing IMailMergeDataSource. The code for this class is provided below. The IMailMergeDataSource interface allows you to define manually where the data used for mail merge comes from. In this case the data is read from the XML file loaded into memory. The details of how classes implementing this interface work are not explained in full here, but can be found in the API documentation for the IMailMergeDataSource interface.

The general process that the XmlMailMergeDataTable class uses when providing data to the mail merge engine involves iterating over the nodes in the DOM and extracting the appropriate values for each record to be merged. The DOM represents XML tags as nodes and elements, and when the mail merge engine requests the value of a field, the data is extracted from the current node and the value returned.
When the engine has finished with the current record, it calls moveNext, and the current node is moved forward to the next sibling element with the same name as the table.
If mail merge with regions is used along with nested regions, then the IMailMergeDataSource.GetChildDataSource method is called. A new instance of XmlMailMergeDataTable is created, rooted at the current node, and its first record is the first child element whose name matches the requested table name.
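The record-advancing step can be tried on its own with the standard DOM API. This sketch (the SiblingRecordWalk class is hypothetical) visits the children of the root element and keeps only elements named like the table, skipping whitespace text nodes and elements that belong to other tables, in the same spirit as the do/while loop in moveNext.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SiblingRecordWalk {
    // Returns the given attribute of every direct child of the root element named tableName.
    static List<String> recordValues(String xml, String tableName, String attribute) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<String> values = new ArrayList<>();
            for (Node node = root.getFirstChild(); node != null; node = node.getNextSibling()) {
                // Skip text nodes (e.g. indentation) and elements of other tables.
                if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equals(tableName)) {
                    values.add(((Element) node).getAttribute(attribute));
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customers>\n  <customer Name=\"John Ben Jan\"/>\n  <note>skip me</note>\n"
                + "  <customer Name=\"Lisa Lane\"/>\n</customers>";
        System.out.println(recordValues(xml, "customer", "Name")); // [John Ben Jan, Lisa Lane]
    }
}
```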
Example
Shows how to create a class implementing IMailMergeDataSource which allows data to be mail merged
from an XML document.
Java
package XMLMailMerge;

import com.aspose.words.IMailMergeDataSource;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import java.util.HashMap;

/**
* A custom mail merge data source that allows you to merge data from an XML document into Word templates.
* This class demonstrates how data can be read from a custom data source (XML parsed and loaded into a DOM) and merged
* into a document using the IMailMergeDataSource interface.
*
* An instance of this class represents a single table in the data source and in the template.
* Note: We are using the Document and Node class from the org.w3c.dom package here and not from Aspose.Words.
*/
public class XmlMailMergeDataTable implements IMailMergeDataSource
{
/**
* Creates a new XmlMailMergeDataTable for the specified XML document and table name.
*
* @param xmlDoc The DOM object which contains the parsed XML data.
* @param tableName The name of the element in the data source where the data of the region is extracted from.
*/
public XmlMailMergeDataTable(org.w3c.dom.Document xmlDoc, String tableName) throws
Exception
{
this(xmlDoc.getDocumentElement(), tableName);
}

/**
* Private constructor that is also called by GetChildDataSource.
*/
private XmlMailMergeDataTable(Node rootNode, String tableName) throws Exception
{
mTableName = tableName;

// Get the first element on this level matching the table name.
mCurrentNode = (Node)retrieveExpression("./" + tableName).evaluate(rootNode,
XPathConstants.NODE);
}

/**
* The name of the data source. Used by Aspose.Words only when executing mail merge with repeatable regions.
*/
public String getTableName()
{
return mTableName;
}

/**
* Aspose.Words calls this method to get a value for every data field.
*/
public boolean getValue(String fieldName, Object[] fieldValue) throws Exception
{
// Attempt to retrieve the child node matching the field name by using XPath.
Node value = (Node)retrieveExpression(fieldName).evaluate(mCurrentNode,
XPathConstants.NODE);
// We also look for the field name in attributes of the element node.
Element nodeAsElement = (Element)mCurrentNode;

if (value != null)
{
// Field exists in the data source as a child node, pass the value and return true.
// This merges the data into the document.
fieldValue[0] = value.getTextContent();
return true;
}
else if (nodeAsElement.hasAttribute(fieldName))
{
// Field exists in the data source as an attribute of the current node, pass the value and return true.
// This merges the data into the document.
fieldValue[0] = nodeAsElement.getAttribute(fieldName);
return true;
}
else
{
// Field does not exist in the data source, return false.
// No value will be merged for this field and it is left over in the document.
return false;
}
}

/**
* Moves to the next record in a collection. This method is a little different from the regular implementation, as
* we are walking over an XML document stored in a DOM.
*/
public boolean moveNext()
{
if (!isEof())
{
// Don't move to the next node if this is the first record to be merged.
if (!mIsFirstRecord)
{
// Find the next node which is an element and matches the table name represented by this class.
// This skips any text nodes and any elements which belong to a different table.
do
{
mCurrentNode = mCurrentNode.getNextSibling();
}
while ((mCurrentNode != null) &&
!(mCurrentNode.getNodeName().equals(mTableName) && (mCurrentNode.getNodeType() ==
Node.ELEMENT_NODE)));
}
else
{
mIsFirstRecord = false;
}
}

return (!isEof());
}

/**
* If the data source contains nested data, this method is called to retrieve the data for
* the child table. In the XML data source, nested data should look like this:
*
* <Tables>
* <ParentTable>
* <Name>ParentName</Name>
* <ChildTable>
* <Text>Content</Text>
* </ChildTable>
* </ParentTable>
* </Tables>
*/
public IMailMergeDataSource getChildDataSource(String tableName) throws Exception
{
return new XmlMailMergeDataTable(mCurrentNode, tableName);
}

private boolean isEof()
{
return (mCurrentNode == null);
}

/**
* Returns a cached version of a compiled XPathExpression if available, otherwise creates a new expression.
*/
private XPathExpression retrieveExpression(String path) throws Exception
{
XPathExpression expression;

if(mExpressionSet.containsKey(path))
{
expression = (XPathExpression)mExpressionSet.get(path);
}
else
{
expression = mXPath.compile(path);
mExpressionSet.put(path, expression);
}
return expression;
}

/**
* Instance variables.
*/
private Node mCurrentNode;
private boolean mIsFirstRecord = true;
private final String mTableName;
private final HashMap mExpressionSet = new HashMap();
private final XPath mXPath = XPathFactory.newInstance().newXPath();
}

End Result
And here's the result: page one of four pages in the output file, one page for each of the four customers in the XML file. The merge fields in the template have been replaced by the customer details from the XML file.

How to Apply Custom Formatting during Mail Merge
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

The MailMerge class provides a callback mechanism that can be very useful for extending mail merge capabilities. The MailMerge.FieldMergingCallback property accepts a class which implements the methods IFieldMergingCallback.FieldMerging and IFieldMergingCallback.ImageFieldMerging. These can be used to implement custom control over the mail merge process.

The IFieldMergingCallback.FieldMerging method is called during mail merge whenever a simple mail merge field is encountered in the document. This gives further control over the mail merge, and you can perform any action when it is called. The method is implemented in a class that implements the IFieldMergingCallback interface and receives a FieldMergingArgs object that provides data for the field being merged.
Example
Demonstrates how to implement custom logic in the FieldMerging callback to apply cell formatting.
Java
public void mailMergeAlternatingRows() throws Exception {
Document doc = new Document(getMyDir() + "MailMerge.AlternatingRows.doc");

// Add a handler for the MergeField event.
doc.getMailMerge().setFieldMergingCallback(new HandleMergeFieldAlternatingRows());

// Execute mail merge with regions.
com.aspose.words.DataTable dataTable = getSuppliersDataTable();
doc.getMailMerge().executeWithRegions(dataTable);

doc.save(getMyDir() + "MailMerge.AlternatingRows Out.doc");
}

private class HandleMergeFieldAlternatingRows implements IFieldMergingCallback {
/**
* Called for every merge field encountered in the document.
* We can either return some data to the mail merge engine or do something
* else with the document. In this case we modify cell formatting.
*/
public void fieldMerging(FieldMergingArgs e) throws Exception {
if (mBuilder == null)
mBuilder = new DocumentBuilder(e.getDocument());

// This way we catch the beginning of a new row.
if (e.getFieldName().equals("CompanyName")) {
// Select the color depending on whether the row number is even or odd.
Color rowColor;
if (isOdd(mRowIdx))
rowColor = new Color(213, 227, 235);
else
rowColor = new Color(242, 242, 242);

// There is no way to set cell properties for the whole row at the moment,
// so we have to iterate over all cells in the row.
for (int colIdx = 0; colIdx < 4; colIdx++) {
mBuilder.moveToCell(0, mRowIdx, colIdx, 0);

mBuilder.getCellFormat().getShading().setBackgroundPatternColor(rowColor);
}

mRowIdx++;
}
}

public void imageFieldMerging(ImageFieldMergingArgs args) throws Exception {
// Do nothing.
}

private DocumentBuilder mBuilder;
private int mRowIdx;
}

/*
* Returns true if the value is odd; false if the value is even.
*/

private static boolean isOdd(int value) throws Exception {
return (value % 2 != 0);
}

/**
* Create DataTable and fill it with data.
* In real life this DataTable should be filled from a database.
*/
private static com.aspose.words.DataTable getSuppliersDataTable() throws Exception {
java.sql.ResultSet resultSet = createCachedRowSet(new String[]{"CompanyName",
"ContactName"});

for (int i = 0; i < 10; i++)
addRow(resultSet, new String[]{"Company " + Integer.toString(i), "Contact " +
Integer.toString(i)});

return new com.aspose.words.DataTable(resultSet, "Suppliers");
}

How to Insert Check Boxes during Mail Merge
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the MailMergeFormFields sample here.
One of the important Aspose.Words features is the reporting (mail merge) engine. The mail merge engine takes a document as input, looks for MERGEFIELD fields in it and replaces them with data obtained from the data source. Normally, simple text is inserted, but a customer asked whether it is possible to generate a document where boolean data values are output as check box form fields.

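The answer is yes: it is possible and very easy, thanks to the ability to extend the mail merge process using callbacks. The MailMerge object accepts an IFieldMergingCallback implementation through its FieldMergingCallback property, and its fieldMerging and imageFieldMerging methods are called for merge fields and image merge fields respectively.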
Other interesting examples of extending standard mail merge using event handlers are:
Insert HTML into merge fields (sample code in the documentation for the MergeField event).
Insert images from any custom storage (files, BLOB fields etc).
Insert text with formatting (font, size, style etc).
This screenshot of Microsoft Word shows a template document with the merge fields:


This screenshot of Microsoft Word shows the generated document. Note some fields were replaced with
simple text, some fields were replaced with check box form fields and the Subject field was replaced with
a text input form field.


Example
Complete source code of a program that inserts checkboxes and text input form fields into a document
during mail merge.
Java
package MailMergeFormFields;

import java.io.File;
import java.net.URI;

import com.aspose.words.Document;
import com.aspose.words.IFieldMergingCallback;
import com.aspose.words.FieldMergingArgs;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.TextFormFieldType;
import com.aspose.words.ImageFieldMergingArgs;


/**
* This sample shows how to insert check boxes and text input form fields into a document during mail merge.
*/
class Program
{
/**
* The main entry point for the application.
*/
public static void main(String[] args) throws Exception
{
Program program = new Program();
program.execute();
}

private void execute() throws Exception
{
URI exeDir = Program.class.getResource("").toURI();
String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

// Load the template document.
Document doc = new Document(dataDir + "Template.doc");

// Setup mail merge event handler to do the custom work.
doc.getMailMerge().setFieldMergingCallback(new HandleMergeField());

// This is the data for mail merge.
String[] fieldNames = new String[] {"RecipientName", "SenderName",
"FaxNumber", "PhoneNumber",
"Subject", "Body", "Urgent", "ForReview", "PleaseComment"};
Object[] fieldValues = new Object[] {"Josh", "Jenny", "123456789", "",
"Hello",
"Test message 1", true, false, true};

// Execute the mail merge.
doc.getMailMerge().execute(fieldNames, fieldValues);

// Save the finished document.
doc.save(dataDir + "Template Out.doc");
}

private class HandleMergeField implements IFieldMergingCallback
{
/**
* This handler is called for every mail merge field found in the document,
* for every record found in the data source.
*/
public void fieldMerging(FieldMergingArgs e) throws Exception
{
if (mBuilder == null)
mBuilder = new DocumentBuilder(e.getDocument());

// We decided that we want all boolean values to be output as check box form fields.
if (e.getFieldValue() instanceof Boolean)
{
// Move the "cursor" to the current merge field.
mBuilder.moveToMergeField(e.getFieldName());

// It is nice to give names to check boxes. Let's generate a name such as MyField21 or so.
// Plain concatenation is used because MessageFormat would format the index with locale-specific digit grouping (e.g. "1,234").
String checkBoxName = e.getFieldName() + e.getRecordIndex();

// Insert a check box.
mBuilder.insertCheckBox(checkBoxName, (Boolean)e.getFieldValue(), 0);

// Nothing else to do for this field.
return;
}

// Another example: we want the Subject field to come out as a text input form field.
if ("Subject".equals(e.getFieldName()))
{
mBuilder.moveToMergeField(e.getFieldName());
// Plain concatenation keeps the index free of locale-specific digit grouping.
String textInputName = e.getFieldName() + e.getRecordIndex();
mBuilder.insertTextInput(textInputName, TextFormFieldType.REGULAR, "",
(String)e.getFieldValue(), 0);
}
}

public void imageFieldMerging(ImageFieldMergingArgs args) throws Exception
{
// Do nothing.
}

private DocumentBuilder mBuilder;
}
}
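A note on naming: if you build form field names from the record index with java.text.MessageFormat, numeric arguments are formatted with the locale's number format, so an index of 1234 can turn into "1,234" under an English locale. The short sketch below (the FormFieldNames class is hypothetical) compares that with plain string concatenation.

```java
import java.text.MessageFormat;
import java.util.Locale;

class FormFieldNames {
    // Builds a form field name such as "Urgent2" from a merge field name and a record index.
    static String byConcatenation(String fieldName, int recordIndex) {
        return fieldName + recordIndex;
    }

    // Same idea via MessageFormat: the numeric argument gets locale-specific grouping.
    static String byMessageFormat(String fieldName, int recordIndex, Locale locale) {
        return new MessageFormat("{0}{1}", locale).format(new Object[] {fieldName, recordIndex});
    }

    public static void main(String[] args) {
        System.out.println(byConcatenation("Urgent", 1234));            // Urgent1234
        System.out.println(byMessageFormat("Urgent", 1234, Locale.US)); // Urgent1,234
    }
}
```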

How to Insert Images from a Database
Added by hammad, last edited by Adam Skelton on Feb 03, 2012

The IFieldMergingCallback.ImageFieldMerging method is called during mail merge when an image mail merge field is encountered in the document. An image mail merge field is a merge field named like Image:MyFieldName. You can respond to this call by returning a file name, stream, or an Image object to the mail merge engine so that it is inserted into the document.

The MailMerge.FieldMergingCallback property accepts a class implementing the IFieldMergingCallback interface. This class defines the method that is called to handle merging for the image field. The method receives an argument of type ImageFieldMergingArgs. Three properties, ImageFieldMergingArgs.ImageFileName, ImageFieldMergingArgs.ImageStream and ImageFieldMergingArgs.Image, specify where the image must be taken from. Set only one of these properties.
Example
Shows how to insert images stored in a database BLOB field into a report.
Java
public void mailMergeImageFromBlob() throws Exception {
Document doc = new Document(getMyDir() + "MailMerge.MergeImage.doc");

// Set up the event handler for image fields.
doc.getMailMerge().setFieldMergingCallback(new HandleMergeImageFieldFromBlob());

Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); // Loads the JDBC-ODBC bridge driver (removed in Java 8; use your database's JDBC driver on newer JREs)

// Open the database connection.
String connString = "jdbc:odbc:DRIVER={Microsoft Access Driver (*.mdb)};" +
"DBQ=" + getDatabaseDir() + "Northwind.mdb" + ";UID=Admin";

// DSN-less DB connection.
java.sql.Connection conn = java.sql.DriverManager.getConnection(connString);

// Create and execute a command.
java.sql.Statement statement = conn.createStatement();
java.sql.ResultSet resultSet = statement.executeQuery("SELECT * FROM Employees");

com.aspose.words.DataTable table = new com.aspose.words.DataTable(resultSet,
"Employees");

// Perform mail merge.
doc.getMailMerge().executeWithRegions(table);

// Close the database.
conn.close();

doc.save(getMyDir() + "MailMerge.MergeImage Out.doc");
}

private class HandleMergeImageFieldFromBlob implements IFieldMergingCallback {
public void fieldMerging(FieldMergingArgs args) throws Exception {
// Do nothing.
}

/**
* This is called when the mail merge engine encounters an Image:XXX merge field in the document.
* You have a chance to return an Image object, file name or a stream that contains the image.
*/
public void imageFieldMerging(ImageFieldMergingArgs e) throws Exception {
// The field value is a byte array, just cast it and create a stream on it.
java.io.ByteArrayInputStream imageStream = new java.io.ByteArrayInputStream((byte[]) e.getFieldValue());
// Now the mail merge engine will retrieve the image from the stream.
e.setImageStream(imageStream);
}
}
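If one image field can be fed from different kinds of values, the choice of what to hand to the engine can be isolated in a helper. The sketch below is hypothetical and uses only the JDK: it wraps a BLOB value (byte[]) in a stream and, as an assumption, treats a String value as a path to an image file. In a callback like the one above, a non-null result would be passed to e.setImageStream, the only ImageFieldMergingArgs property this sketch targets.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ImageValueStreams {
    // Hypothetical helper: BLOB columns usually arrive as byte[]; a String value is treated
    // as a path to an image file. Returns null when there is nothing usable, in which case
    // no image property should be set.
    static InputStream openImage(Object fieldValue) {
        if (fieldValue instanceof byte[]) {
            return new ByteArrayInputStream((byte[]) fieldValue);
        }
        if (fieldValue instanceof String) {
            try {
                return new FileInputStream((String) fieldValue);
            } catch (FileNotFoundException e) {
                throw new UncheckedIOException(e);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = openImage(new byte[] {(byte) 0x89, 'P', 'N', 'G'})) {
            System.out.println(in.readAllBytes().length); // 4
        }
        System.out.println(openImage(42)); // null
    }
}
```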

How to Control New Pages during Mail Merge
Added by hammad, last edited by Adam Skelton on Feb 03, 2012
Question

Is it possible to create a new page in the document for each record in the data source when executing mail
merge?
Conversely, is it possible to make sure all merged records appear continuously without page breaks?
Answers

Yes. There are different techniques depending upon whether you are using simple mail merge or mail merge
with regions.
Controlling New Pages when Using Simple Mail Merge
In Microsoft Word, go to File / Page Setup / Layout. Select Section / Start from new page. Since the
mail merge engine duplicates document content and the result is multiple document sections (one section
per merged record), choosing this option will force Word to start every section from a new page.



Controlling New Pages when Using Mail Merge with Regions
If you use mail merge with regions, the mail merge region is duplicated for each record. A mail merge region can include block-level elements such as paragraphs, tables and table rows, all inside a single section. You can control page breaks for each merged record in a number of ways:
Format the first paragraph in the region to have a page break before it using Format / Paragraph / Line
and Page Breaks .
Insert a page break using Insert / Break in Microsoft Word at the end of the mail merge region.

When the MailMerge.RemoveEmptyParagraphs property is set to true, any paragraphs that are empty or only contain TableStart or TableEnd merge fields are removed automatically during mail merge. In this situation, applying the techniques above to such a paragraph causes incorrect behavior, because the paragraph containing the page break is removed during mail merge.
To remedy this, consider moving the content from the next paragraph onto the previous paragraph so that it is not removed during mail merge.
How to Remove Unmerged Fields and Empty Paragraphs during Mail Merge
Added by hammad, last edited by Adam Skelton on Nov 29, 2012

When merging data into a document, you often need control over how unmerged merge fields are removed from the document. For instance, you may want any leftover merge fields removed, along with any surrounding fields. You may also wish to remove any paragraphs that become empty during mail merge.

The MailMerge.CleanupOptions property is used along with the MailMergeCleanupOptions enumeration to specify how the mail merge engine deals with such leftover merge fields.
The members of the MailMergeCleanupOptions enumeration are flags, so a combination of different cleanup options can be used simultaneously.
The following diagram gives a general demonstration of how the different cleanup options remove different field constructions during mail merge.


Given that the Amount field was not merged with any data:
If the MailMergeCleanupOptions.RemoveUnusedFields flag is enabled by itself, then the Amount merge field is removed. The outer IF field and the paragraph still remain, and the IF field may be updated with an error code because the left-hand side of the expression is missing.
If the MailMergeCleanupOptions.RemoveContainingFields flag is enabled, then not only is the merge field removed, but the outer IF field is removed as well, leaving just the plain text result of the IF field.
If the MailMergeCleanupOptions.RemoveEmptyParagraphs flag is enabled, then in addition, if the paragraph content became empty (for example, if the result of the IF field happened to be empty text or spaces), the entire paragraph is removed.
It's useful to note that if you are merging data using separate data sources, these options should be enabled only on the last execute call, so that no fields or regions are prematurely removed from the document.
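Because the cleanup options are flags, they are combined with the bitwise OR operator. In Java the MailMergeCleanupOptions members are integer constants, so a call such as doc.getMailMerge().setCleanupOptions(MailMergeCleanupOptions.REMOVE_UNUSED_FIELDS | MailMergeCleanupOptions.REMOVE_EMPTY_PARAGRAPHS) enables two options at once. The self-contained sketch below demonstrates the same bit arithmetic with made-up constant values in a hypothetical CleanupFlagsSketch class; the real values are defined by Aspose.Words and should not be hard-coded.

```java
class CleanupFlagsSketch {
    // Made-up values for illustration only; use the MailMergeCleanupOptions constants in real code.
    static final int REMOVE_EMPTY_PARAGRAPHS = 1;
    static final int REMOVE_UNUSED_REGIONS = 2;
    static final int REMOVE_UNUSED_FIELDS = 4;
    static final int REMOVE_CONTAINING_FIELDS = 8;

    // True if the given flag is switched on in the combined options value.
    static boolean isSet(int options, int flag) {
        return (options & flag) != 0;
    }

    public static void main(String[] args) {
        int options = REMOVE_UNUSED_FIELDS | REMOVE_CONTAINING_FIELDS;
        System.out.println(isSet(options, REMOVE_UNUSED_FIELDS));    // true
        System.out.println(isSet(options, REMOVE_EMPTY_PARAGRAPHS)); // false

        // Adding a flag later keeps the ones already set.
        options |= REMOVE_EMPTY_PARAGRAPHS;
        System.out.println(isSet(options, REMOVE_EMPTY_PARAGRAPHS)); // true
    }
}
```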
Removing Merge Fields
You can request the mail merge engine to remove any unused mail merge fields automatically during mail
merge by applying the MailMergeCleanupOptions.RemoveUnusedFields flag to
MailMerge.CleanupOptions.
Example
Shows how to automatically remove unmerged merge fields during mail merge.
Java
doc.getMailMerge().setCleanupOptions(MailMergeCleanupOptions.REMOVE_UNUSED_FIELDS);

Removing Containing Fields
Commonly a merge field can be contained within another field such as an IF field or a formula field.
Aspose.Words provides an option to remove this outer field when the merge field is merged or removed
from the document. To remove such containing fields you can enable the
MailMergeCleanupOptions.RemoveContainingFields flag with MailMerge.CleanupOptions.

This option is used to match the behavior of Microsoft Word during mail merge which always
automatically removes outer fields from a field which is merged and leaves only the plain text result.

Note that this option will only remove a containing field if the field was actually merged with data or if the
merge field was removed by using the MailMergeCleanupOptions.RemoveUnusedFields option.
Example
Shows how to instruct the mail merge engine to remove any containing fields from around a merge field
during mail merge.
Java
doc.getMailMerge().setCleanupOptions(MailMergeCleanupOptions.REMOVE_CONTAINING_FIELDS);
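The rule in the note above can be written as a small predicate. This is our own plain-Java paraphrase of the documented behavior for a merge field inside an outer field such as IF, not Aspose.Words code. It assumes each merge field ends up either merged with data or unused.

```java
/** Paraphrase of when RemoveContainingFields removes the outer field (not library code). */
final class ContainingFieldRule {
    /**
     * @param removeContainingFields whether the RemoveContainingFields flag is enabled
     * @param removeUnusedFields     whether the RemoveUnusedFields flag is enabled
     * @param mergedWithData         whether the inner merge field received a value
     * @return true if the outer (e.g. IF) field would be removed
     */
    static boolean outerFieldRemoved(boolean removeContainingFields,
                                     boolean removeUnusedFields,
                                     boolean mergedWithData) {
        if (!removeContainingFields) {
            return false;
        }
        // The inner field must have been merged, or removed as unused via RemoveUnusedFields.
        return mergedWithData || removeUnusedFields;
    }
}
```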

Removing Empty Paragraphs
Sometimes you may need to completely remove paragraphs that contained mail merge fields which
became empty during mail merge. For example, the mail merge field could be merged with empty data or
the merge field removed because it was unused.

In either of those two situations the MailMergeCleanupOptions.RemoveEmptyParagraphs flag will
automatically remove such empty paragraphs from the document during mail merge.
Additionally, this option will also remove any TableStart and TableEnd merge fields if the rest of the
paragraph is empty. This can be used to combine the tables inside a region into one automatically during
mail merge.
Example
Shows how to make sure empty paragraphs that result from merging fields with no data are removed from
the document.
Java
doc.getMailMerge().setCleanupOptions(MailMergeCleanupOptions.REMOVE_EMPTY_PARAGRAPHS);

How to Remove Unmerged Regions from a Document
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the RemoveEmptyRegions sample here.
In previous versions of Aspose.Words, mail merge regions which had been merged but contained no data
were removed from the document automatically. After upgrading, you may find that these regions now
remain after mail merge is executed. You can now control how unused regions are handled during mail merge
through the MailMerge.CleanupOptions property, which provides flags that control how the document is
cleaned up during mail merge.

By default, unused regions are not removed; if unmerged regions are no longer removed automatically after
upgrading, this is the most likely reason. If you enable the appropriate flag before mail merge is executed,
unused regions will be removed automatically.
The Solution
The MailMergeCleanupOptions.RemoveUnusedRegions flag can be used to control how unused
regions are handled during mail merge.

To demonstrate how this flag is used, we will merge a document with an empty data source containing no
data tables, which clearly leaves unused regions in the document. The result after enabling the
MailMergeCleanupOptions.RemoveUnusedRegions flag in MailMerge.CleanupOptions demonstrates that the
unused regions are removed automatically by the mail merge engine.
The following steps are used to demonstrate this:
1. Create an empty DataSet containing no DataTables.
2. Enable the MailMergeCleanupOptions.RemoveUnusedRegions flag in MailMerge.CleanupOptions.
3. Merge the data with the document using the MailMerge.ExecuteWithRegions method.
The following document is used in our sample. It contains multiple regions which serve as a good
example. Notice how the first region in the document includes a nested region.


The Code
The MailMergeCleanupOptions.RemoveUnusedRegions flag is enabled before mail merge is executed.
This instructs the mail merge engine to remove any regions which are not merged with any data.
Example
Shows how to remove unmerged mail merge regions from the document.
Java
// Open the document.
Document doc = new Document(dataDir + "TestFile.doc");

// Create a dummy data source containing no data.
DataSet data = new DataSet();

// Set the appropriate mail merge cleanup options to remove any unused regions from the document.
doc.getMailMerge().setCleanupOptions(MailMergeCleanupOptions.REMOVE_UNUSED_REGIONS);

// Execute mail merge, which will have no effect as there is no data. However, the regions found
// in the document will be removed automatically as they are unused.
doc.getMailMerge().executeWithRegions(data);

// Save the output document to disk.
doc.save(dataDir + "TestFile.RemoveEmptyRegions Out.doc");


The MailMergeCleanupOptions.RemoveUnusedRegions flag will remove any region in the document which is
not found in the current data source.
If you are merging data from many data sources by using separate calls to
MailMerge.ExecuteWithRegions then you need to make sure that this flag is only enabled with the
very last merge. Otherwise all unused regions will be removed from the document before they can be
merged.
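A minimal sketch of that advice, written against a hypothetical Merger interface so it compiles without Aspose.Words: the cleanup flags are applied only before the final executeWithRegions call, and earlier calls use 0, which this sketch assumes means no cleanup flags.

```java
import java.util.List;

/** Hypothetical abstraction over MailMerge so the sketch compiles without Aspose.Words. */
interface Merger<S> {
    void setCleanupOptions(int options);

    void executeWithRegions(S dataSource);
}

final class LastCallCleanup {
    /**
     * Merges each data source in turn and enables the cleanup flags only before the final
     * call, so regions meant for a later data source are not removed too early.
     */
    static <S> void mergeAll(Merger<S> merger, List<S> dataSources, int cleanupFlags) {
        for (int i = 0; i < dataSources.size(); i++) {
            boolean isLast = i == dataSources.size() - 1;
            // 0 stands for "no cleanup flags" in this sketch.
            merger.setCleanupOptions(isLast ? cleanupFlags : 0);
            merger.executeWithRegions(dataSources.get(i));
        }
    }
}
```

With Aspose.Words, an adapter would forward the two calls to doc.getMailMerge().setCleanupOptions(...) and doc.getMailMerge().executeWithRegions(...).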
End Result
The output below shows the result after merging the document with an empty data source with the
MailMergeCleanupOptions.RemoveUnusedRegions flag enabled. All of the regions which contained no
data were successfully removed from the document.


You may also notice that some of the content surrounding a region is not removed along with the
unused region. To control manually how a region is removed or replaced, you can use the technique
provided in the Apply Custom Logic to Empty Regions article here.
How to Apply Custom Logic to Unmerged Regions
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the ApplyCustomLogicToEmptyRegions sample here.
There are some situations where completely removing unmerged regions from the document during mail
merge is not desired or results in the document looking incomplete. This can occur when the absence of
input data should be displayed to the user in the form of a message instead of the region being completely
removed.

There are also times when removing the unused region on its own is not enough, for instance if the
region is preceded by a title or is contained within a table. If such a region is unused, the title
and table still remain after the region is removed, which looks out of place in the document.
This article provides a solution to manually define how unused regions in the document are handled. The
base code for this functionality is supplied and can easily be reused in other projects.
The logic to be applied to each region is defined inside a class that implements the
IFieldMergingCallback interface. In the same way that a mail merge handler can be set up to control how
each field is merged, this handler can be set up to perform actions on each field in an unused region or on
the region as a whole. Within this handler you can add code to change the text of a region, remove nodes
or empty rows and cells, and so on.
In this sample we will be using the document displayed below. It contains nested regions and a region
contained within a table.


As a quick demonstration, we can merge the sample document with a sample data source while the
MailMergeCleanupOptions.RemoveUnusedRegions flag is enabled. This flag automatically removes
unmerged regions from the document during mail merge.
The data source includes two records for the StoreDetails region but purposely does not have any data for the
child ContactDetails region of one of the records. Furthermore, the Suppliers region does not have any
data rows either. This leaves some regions in the document unused. The result after merging the
document with this data source is below.


As noted in the image, the ContactDetails region for the second record and the Suppliers
region have been automatically removed by the mail merge engine as they have no data. However, there
are a few issues that make this output document look incomplete:
The ContactDetails region still leaves a paragraph with the text Contact Details.
In the same case, there is no indication that there are no phone numbers, only a blank space which could
lead to confusion.
The table and title related to the Suppliers region also remain after the region inside the table is
removed.
The technique provided in this article demonstrates how to apply custom logic to each unmerged region
to avoid these issues.
The Solution
To manually apply logic to each unused region in the document we take advantage of features already
available in Aspose.Words.

The mail merge engine provides a property to remove unused regions through the
MailMergeCleanupOptions.RemoveUnusedRegions flag. This can be disabled so that such regions are
left untouched during mail merge. This allows us to leave the unmerged regions in the document and
handle them manually ourselves instead.
We can then take advantage of the MailMerge.FieldMergingCallback property as a means to apply our
own custom logic to these unmerged regions during mail merge through the use of a handler class
implementing the IFieldMergingCallback interface.
The handler class is the only code you need to modify in order to control the logic applied to unmerged
regions. The other code in this sample can be reused without modification in any project.
This sample project demonstrates this technique. It involves the following steps:
1. Execute mail merge on the document using your data source. The
MailMergeCleanupOptions.RemoveUnusedRegions flag is disabled for now because we want the regions to
remain so we can handle them manually. Any regions without data will be left unmerged in the
document.
2. Call the ExecuteCustomLogicOnEmptyRegions method. This method is provided in this sample. It
performs the actions which allow the specified handler to be called for each unmerged region. This method
is reusable and can be copied unaltered to any project which requires it (along with any dependent
methods).

This method executes the following steps:
1. Sets the handler specified by the user to the MailMerge.FieldMergingCallback property.
2. Calls the CreateDataSourceFromDocumentRegions method, which accepts the user's Document
and an ArrayList containing region names. This method creates a dummy data source containing
tables for each unmerged region in the document.
3. Executes mail merge on the document using the dummy data source. When mail merge is
executed with this data source, the user-specified handler is called for each unmerged
region and the custom logic is applied.
The Code
The implementation for the ExecuteCustomLogicOnEmptyRegions method is found below. This
method accepts several parameters:
1. The Document object containing unmerged regions which are to be handled by the passed handler.
2. The handler class which defines the logic to apply to unmerged regions. This handler must implement the
IFieldMergingCallback interface.
3. Through the use of the appropriate overload, the method can also accept a third parameter: a list of
region names as strings. If this is specified, then only the remaining regions in the document whose names
appear in the list are handled manually. Other regions which are encountered are not passed to the handler
and are removed automatically.

When the overload with only two parameters is used, every remaining region in the document is
handled manually.
Example
Shows how to execute custom logic on unused regions using the specified handler.
Java
/**
 * Applies logic defined in the passed handler class to all unused regions in the document.
 * This allows you to manually control how unused regions are handled in the document.
 *
 * @param doc The document containing unused regions.
 * @param handler The handler which implements the IFieldMergingCallback interface and defines
 *                the logic to be applied to each unmerged region.
 */
public static void executeCustomLogicOnEmptyRegions(Document doc, IFieldMergingCallback handler) throws Exception
{
    // Pass null to handle all regions found in the document.
    executeCustomLogicOnEmptyRegions(doc, handler, null);
}

/**
 * Applies logic defined in the passed handler class to specific unused regions in the document
 * as defined in regionsList. This allows you to manually control how unused regions are handled
 * in the document.
 *
 * @param doc The document containing unused regions.
 * @param handler The handler which implements the IFieldMergingCallback interface and defines
 *                the logic to be applied to each unmerged region.
 * @param regionsList A list of strings corresponding to the region names that are to be handled
 *                    by the supplied handler class. Other regions encountered will not be handled
 *                    and are removed automatically.
 */
public static void executeCustomLogicOnEmptyRegions(Document doc, IFieldMergingCallback handler,
                                                    ArrayList regionsList) throws Exception
{
    // Certain regions can be excluded from the custom logic by not adding a data row for them
    // inside the createDataSourceFromDocumentRegions method.
    // Enable this cleanup option so any regions which are not handled by the user's logic are
    // removed automatically.
    doc.getMailMerge().setCleanupOptions(MailMergeCleanupOptions.REMOVE_UNUSED_REGIONS);

    // Set the user's handler which is called for each unmerged region.
    doc.getMailMerge().setFieldMergingCallback(handler);

    // Execute mail merge using the dummy data set. The dummy data source contains the table names
    // of each unmerged region in the document (excluding ones that the user may have specified to
    // be skipped). This allows the handler to be called for each field in the unmerged regions.
    doc.getMailMerge().executeWithRegions(createDataSourceFromDocumentRegions(doc, regionsList));
}

// Uses java.sql.ResultSet and javax.sql.rowset.CachedRowSet, RowSetMetaDataImpl and RowSetProvider.

/**
 * A helper method that creates an empty Java disconnected ResultSet with the specified columns.
 */
private static ResultSet createCachedRowSet(String[] columnNames) throws Exception
{
    RowSetMetaDataImpl metaData = new RowSetMetaDataImpl();
    metaData.setColumnCount(columnNames.length);
    for (int i = 0; i < columnNames.length; i++)
    {
        metaData.setColumnName(i + 1, columnNames[i]);
        metaData.setColumnType(i + 1, java.sql.Types.VARCHAR);
    }

    // RowSetProvider is the public way to obtain a CachedRowSet; the com.sun.rowset.CachedRowSetImpl
    // class is an internal JDK implementation class.
    CachedRowSet rowSet = RowSetProvider.newFactory().createCachedRowSet();
    rowSet.setMetaData(metaData);

    return rowSet;
}

/**
 * A helper method that adds a new row with the specified values to a disconnected ResultSet.
 */
private static void addRow(ResultSet resultSet, String[] values) throws Exception
{
    resultSet.moveToInsertRow();

    for (int i = 0; i < values.length; i++)
        resultSet.updateString(i + 1, values[i]);

    resultSet.insertRow();

    // This "dance" is needed to add rows to the end of the result set properly.
    // Otherwise rows are either added at the front, or the result set throws an
    // exception about a deleted row during mail merge.
    resultSet.moveToCurrentRow();
    resultSet.last();
}


If you are considering running the ExecuteCustomLogicOnEmptyRegions method consecutively with different
handlers (e.g. each handler applies logic to certain fields), then you will need to disable the removal of
unused regions, which this method enables, so that such regions are not removed between these calls.
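Since executeCustomLogicOnEmptyRegions always enables RemoveUnusedRegions, running several handlers in sequence means changing that call, for instance by passing a flag that is true only for the last handler (a hypothetical change, not part of the sample). The helper below shows the bit arithmetic for setting or clearing one flag while leaving the other cleanup flags untouched. The constant value is a stand-in for MailMergeCleanupOptions.REMOVE_UNUSED_REGIONS.

```java
/** Sets or clears one cleanup flag while leaving the other bits of the mask untouched. */
final class RegionRemovalToggle {
    // Hypothetical stand-in for MailMergeCleanupOptions.REMOVE_UNUSED_REGIONS.
    static final int REMOVE_UNUSED_REGIONS = 2;

    /**
     * @param currentOptions the current cleanup mask, e.g. from getCleanupOptions()
     * @param enable         true for the last handler, false for earlier ones
     */
    static int withRegionRemoval(int currentOptions, boolean enable) {
        return enable
                ? currentOptions | REMOVE_UNUSED_REGIONS    // OR sets the bit
                : currentOptions & ~REMOVE_UNUSED_REGIONS;  // AND NOT clears the bit
    }
}
```

In the modified method, with the stand-in constant replaced by the real one, the unconditional call could become doc.getMailMerge().setCleanupOptions(RegionRemovalToggle.withRegionRemoval(doc.getMailMerge().getCleanupOptions(), isLastHandler)).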
Example
Defines the method that builds the dummy data source from the unmerged regions in the document.
Java
/**
 * Returns a DataSet object containing a DataTable for the unmerged regions in the specified document.
 * If regionsList is null, all regions found within the document receive a data row. If an ArrayList
 * instance is present, then only the regions specified in the list that are found in the document
 * receive a data row; the other regions get an empty DataTable and are removed as unused.
 */
private static DataSet createDataSourceFromDocumentRegions(Document doc, ArrayList regionsList) throws Exception
{
    final String TABLE_START_MARKER = "TableStart:";
    DataSet dataSet = new DataSet();
    String tableName = null;

    for (String fieldName : doc.getMailMerge().getFieldNames())
    {
        if (fieldName.startsWith(TABLE_START_MARKER))
        {
            tableName = fieldName.substring(TABLE_START_MARKER.length());
        }
        else if (tableName != null)
        {
            // Only add the table as a new DataTable if it doesn't already exist in the DataSet.
            if (dataSet.getTables().get(tableName) == null)
            {
                ResultSet resultSet = createCachedRowSet(new String[] {fieldName});

                // We only need to add the first field for the handler to be called for the
                // fields in the region.
                if (regionsList == null || regionsList.contains(tableName))
                {
                    addRow(resultSet, new String[] {"FirstField"});
                }

                dataSet.getTables().add(new DataTable(resultSet, tableName));
            }
            tableName = null;
        }
    }

    return dataSet;
}

This method involves finding all unmerged regions in the document. This is accomplished using the
MailMerge.GetFieldNames method. This method returns all merge fields in the document, including the
region start and end markers (represented by merge fields with the prefix TableStart or TableEnd).

When a TableStart merge field is encountered this is added as a new DataTable to the DataSet . Since a
region may appear more than once (for example because it is a nested region where the parent region has
been merged with multiple records), the table is only created and added if it does not already exist in the
DataSet .
When an appropriate region start has been found and added to the data source, the next field (which
corresponds to the first field in the region) is added to the DataTable. Only the first field needs to be
added for every field in the region to be merged and passed to the handler.
We also set the field value of the first field to FirstField to make it easier to apply logic to the first or
other fields in the region. This means it is not necessary to hardcode the name of the first field or
implement extra code in the handler to check whether the current field is the first.
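The scan described above can be tried without Aspose.Words. The plain-Java model below walks a field-name array in document order and pairs each region with the field that follows its TableStart marker, keeping only the first occurrence of a region. It is a simplification of createDataSourceFromDocumentRegions: instead of building DataTables it returns a map from region name to first field name, and it omits regions excluded by regionsList, whereas the sample still adds an empty DataTable for them so that RemoveUnusedRegions deletes them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of pairing each region with its first field (simplified from the sample). */
final class RegionScan {
    private static final String TABLE_START_MARKER = "TableStart:";

    /**
     * @param fieldNames  field names in document order, as returned by getFieldNames()
     * @param regionsList names of regions to hand to the handler, or null for all regions
     * @return region name mapped to its first field, for regions that get a dummy row
     */
    static Map<String, String> firstFieldPerRegion(String[] fieldNames, List<String> regionsList) {
        Map<String, String> result = new LinkedHashMap<>();
        String tableName = null;
        for (String fieldName : fieldNames) {
            if (fieldName.startsWith(TABLE_START_MARKER)) {
                tableName = fieldName.substring(TABLE_START_MARKER.length());
            } else if (tableName != null) {
                // A region can appear several times (e.g. nested in a parent merged with
                // several records); only its first occurrence is recorded.
                boolean wanted = regionsList == null || regionsList.contains(tableName);
                if (wanted && !result.containsKey(tableName)) {
                    result.put(tableName, fieldName);
                }
                tableName = null;
            }
        }
        return result;
    }
}
```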
The code below demonstrates how this system works. The document shown at the start of this article is
re-merged with the same data source, but this time the unused regions are handled by custom code.
Example
Shows how to handle unmerged regions after mail merge with user defined code.
Java
// Open the document.
Document doc = new Document(dataDir + "TestFile.doc");

// Create a data source which has some data missing (getDataSource is a helper from the
// downloadable sample, not shown here). This will result in some regions that are merged
// and some that remain after executing mail merge.
DataSet data = getDataSource();

// Make sure that we have not set the removal of any unused regions, as we will handle them manually.
// We achieve this by removing the RemoveUnusedRegions flag from the cleanup options by using the
// AND and NOT bitwise operators.
doc.getMailMerge().setCleanupOptions(doc.getMailMerge().getCleanupOptions() &
        ~MailMergeCleanupOptions.REMOVE_UNUSED_REGIONS);

// Execute mail merge. Some regions will be merged with data, others left unmerged.
doc.getMailMerge().executeWithRegions(data);

// The regions which contained data will now have been merged. Any regions which had no data and
// were not merged will still remain in the document.
// Apply logic to each unused region left in the document using the logic set out in the handler.
// The handler class must implement the IFieldMergingCallback interface.
executeCustomLogicOnEmptyRegions(doc, new EmptyRegionsHandler());

// Save the output document to disk.
doc.save(dataDir + "TestFile.CustomLogicEmptyRegions1 Out.doc");

The code performs different operations based on the name of the region retrieved using the
FieldMergingArgs.TableName property. Note that depending upon your document and regions you can
code the handler to run logic dependent on each region or code which applies to every unmerged region in
the document or a combination of both.

The logic for the ContactDetails region involves changing the text of each field in the ContactDetails
region with an appropriate message stating that there is no data. The names of each field are matched
within the handler using the FieldMergingArgs.FieldName property.
A similar process is applied to the Suppliers region, with the addition of extra code to handle the table
which contains the region. The code checks whether the region is inside a table and whether that table is
still part of the document (it may already have been removed while handling an earlier field of the region).
If so, it removes the entire table from the document, as well as the paragraph which precedes it, provided
that paragraph is formatted with a heading style, e.g. Heading 1.
Example
Shows how to define custom logic in a handler implementing IFieldMergingCallback that is executed for
unmerged regions in the document.
Java
public static class EmptyRegionsHandler implements IFieldMergingCallback
{
    /**
     * Called for each field belonging to an unmerged region in the document.
     */
    public void fieldMerging(FieldMergingArgs args) throws Exception
    {
        // Change the text of each field of the ContactDetails region individually.
        if ("ContactDetails".equals(args.getTableName()))
        {
            // Set the text of the field based on the field name.
            if ("Name".equals(args.getFieldName()))
                args.setText("(No details found)");
            else if ("Number".equals(args.getFieldName()))
                args.setText("(N/A)");
        }

        // Remove the entire table of the Suppliers region. Also check if the paragraph before
        // the table is a heading paragraph and if so remove that too.
        if ("Suppliers".equals(args.getTableName()))
        {
            Table table = (Table)args.getField().getStart().getAncestor(NodeType.TABLE);

            // Skip regions that are not inside a table, and tables that have already been
            // removed from the document while handling an earlier field of this region.
            if (table != null && table.getParentNode() != null)
            {
                // Find the paragraph which precedes the table before the table is removed.
                if (table.getPreviousSibling() != null &&
                        table.getPreviousSibling().getNodeType() == NodeType.PARAGRAPH)
                {
                    Paragraph previousPara = (Paragraph)table.getPreviousSibling();
                    if (isHeadingParagraph(previousPara))
                        previousPara.remove();
                }

                table.remove();
            }
        }
    }

    /**
     * Returns true if the paragraph uses any Heading style, e.g. Heading 1 to Heading 9.
     */
    private boolean isHeadingParagraph(Paragraph para) throws Exception
    {
        int styleId = para.getParagraphFormat().getStyleIdentifier();
        return styleId >= StyleIdentifier.HEADING_1 && styleId <= StyleIdentifier.HEADING_9;
    }

    public void imageFieldMerging(ImageFieldMergingArgs args) throws Exception
    {
        // Do nothing.
    }
}

The result of the above code is shown below. The unmerged fields within the first region are replaced
with informative text and the removal of the table and heading allows the document to look complete.

The code which removes the parent table could also be made to run on every unused region instead of just
a specific region by removing the check for the table name. In this case if any region inside a table was
not merged with any data, both the region and the container table will be automatically removed as well.
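Removing the table-name check applies one piece of logic to every unused region. When a handler instead accumulates many region-specific branches, one possible design (our suggestion, not part of the sample) is to look up the logic by FieldMergingArgs.TableName in a map, with shared logic as the fallback. In the sketch, RegionField is a hypothetical stand-in for the two FieldMergingArgs values the handler reads, so it runs without Aspose.Words.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the two FieldMergingArgs values the handler reads. */
final class RegionField {
    final String tableName;
    final String fieldName;

    RegionField(String tableName, String fieldName) {
        this.tableName = tableName;
        this.fieldName = fieldName;
    }
}

/** Looks up region-specific logic by table name, falling back to shared logic otherwise. */
final class RegionDispatch {
    private final Map<String, Function<RegionField, String>> byRegion = new HashMap<>();
    private final Function<RegionField, String> fallback;

    RegionDispatch(Function<RegionField, String> fallback) {
        this.fallback = fallback;
    }

    /** Registers logic for one region and returns this, so calls can be chained. */
    RegionDispatch on(String tableName, Function<RegionField, String> logic) {
        byRegion.put(tableName, logic);
        return this;
    }

    /** Returns the text a handler would pass to setText for this field. */
    String textFor(RegionField field) {
        return byRegion.getOrDefault(field.tableName, fallback).apply(field);
    }
}
```

Inside fieldMerging, the handler could then call args.setText(dispatch.textFor(new RegionField(args.getTableName(), args.getFieldName()))) for regions that only need replacement text; structural changes such as removing a table still need access to the document nodes.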
We can insert different code in the handler to control how unmerged regions are handled. Using the code
below in the handler instead will change the text in the first paragraph of the region to a helpful message
while any subsequent paragraphs in the region are removed. These other paragraphs are removed as they
would remain in the region after merging our message.
The replacement text is merged into the first field by setting the specified text into the
FieldMergingArgs.Text property. The text from this property is merged into the field by the mail merge
engine.
The code applies this for only the first field in the region by checking the FieldMergingArgs.FieldValue
property. The field value of the first field in the region is marked with FirstField . This makes this type
of logic easier to implement over many regions as no extra code is required.
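Because only the first field of each unmerged region carries the value FirstField in the dummy data source, and the remaining fields have a null FieldValue, the handler can decide per field without knowing any field names. The sketch below models that decision in plain Java; the Action values are hypothetical labels for what the handler does. Note that "FirstField".equals(value) is null-safe, so no cast of getFieldValue() is needed.

```java
/** Plain-Java model of the FirstField convention used by the handler. */
final class FirstFieldLogic {
    /** Hypothetical labels for the two things the handler does with a field. */
    enum Action { SHOW_MESSAGE, REMOVE_PARAGRAPH }

    /**
     * Only the field whose value is "FirstField" shows the message; the paragraphs of all
     * other fields in the region (whose value is null) are removed.
     */
    static Action actionFor(Object fieldValue) {
        return "FirstField".equals(fieldValue) ? Action.SHOW_MESSAGE : Action.REMOVE_PARAGRAPH;
    }
}
```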
Example
Shows how to replace an unused region with a message and remove extra paragraphs.
Java
// Store the parent paragraph of the current field for easy access.
Paragraph parentParagraph = args.getField().getStart().getParentParagraph();

// Define the logic to be used when the ContactDetails region is encountered.
// The region is removed and replaced with a single line of text stating that there are no records.
if ("ContactDetails".equals(args.getTableName()))
{
    // Called for the first field encountered in a region. This can be used to execute logic on the
    // first field in the region without needing to hard code the field name. Often the base logic is
    // applied to the first field and different logic to the other fields. The rest of the fields in
    // the region will have a null FieldValue.
    if ("FirstField".equals(args.getFieldValue()))
    {
        // Remove the "Name:" tag from the start of the paragraph.
        parentParagraph.getRange().replace("Name:", "", false, false);
        // Set the text of the first field to display a message stating that there are no records.
        args.setText("No records to display");
    }
    else
    {
        // We have already inserted our message in the paragraph belonging to the first field.
        // The other paragraphs in the region will still remain, so we want to remove them. A check
        // ensures that the paragraph has not already been removed, which may happen if more than
        // one field is included in a paragraph.
        if (parentParagraph.getParentNode() != null)
            parentParagraph.remove();
    }
}

The resulting document after the code above has been executed is shown below. The unused region is
replaced with a message stating that there are no records to display.



As another example, we can insert the code below in place of the code originally handling the
Suppliers region. This displays a message within the table and merges the cells instead of removing
the table from the document. Since the region resides within a table with multiple cells, it looks nicer to
have the cells of the table merged together and the message centered.
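The merge pattern used for this is a common one: in each row, the first cell starts a horizontal merge and every following cell continues it. The plain-Java sketch models that assignment with a hypothetical Merge enum mirroring CellMerge.FIRST and CellMerge.PREVIOUS. It assumes every cell in the row contains a field of the region, since the handler is only called for cells that hold a merge field.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the per-row horizontal merge settings. */
final class RowMergeSketch {
    /** Hypothetical mirror of the CellMerge values used by the sample. */
    enum Merge { FIRST, PREVIOUS }

    /** Returns the merge setting for each cell in a row with the given number of cells. */
    static List<Merge> horizontalMergeFor(int cellCount) {
        List<Merge> result = new ArrayList<>(cellCount);
        for (int i = 0; i < cellCount; i++) {
            // The first cell starts the merge; every later cell joins the previous one.
            result.add(i == 0 ? Merge.FIRST : Merge.PREVIOUS);
        }
        return result;
    }
}
```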
Example
Shows how to merge all the parent cells of an unused region and display a message within the table.
Java
// Replace the unused region in the table with a "no records" message and merge all cells into one.
if ("Suppliers".equals(args.getTableName()))
{
    if ("FirstField".equals(args.getFieldValue()))
    {
        // We will use the first paragraph to display our message. Make it centered within the
        // table. The other fields in other cells within the table will be merged and won't be
        // displayed, so we don't need to do anything else with them.
        parentParagraph.getParagraphFormat().setAlignment(ParagraphAlignment.CENTER);
        args.setText("No records to display");
    }

    // Merge the cells of the table together.
    Cell cell = (Cell)parentParagraph.getAncestor(NodeType.CELL);
    if (cell != null)
    {
        if (cell.isFirstCell())
        {
            // If this cell is the first cell in the row, the merge is started using CellMerge.FIRST.
            cell.getCellFormat().setHorizontalMerge(CellMerge.FIRST);
        }
        else
        {
            // Otherwise the merge is continued using CellMerge.PREVIOUS.
            cell.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS);
        }
    }
}

The resulting document after the code above has been executed is shown below.

Finally, we can call the ExecuteCustomLogicOnEmptyRegions method and specify the names of the regions that
should be handled within our handler method, while all other regions are removed automatically.
Example
Shows how to specify only the ContactDetails region to be handled through the handler class.
Java
// Only handle the ContactDetails region in our handler.
ArrayList regions = new ArrayList();
regions.add("ContactDetails");
executeCustomLogicOnEmptyRegions(doc, new EmptyRegionsHandler(), regions);

Calling this overload with the specified ArrayList will create the data source which only contains data
rows for the specified regions. Regions other than the ContactDetails region will not be handled and will
be removed automatically by the mail merge engine instead.

The result of the above call using the code in our original handler is shown below.


How to Produce Multiple Documents during Mail Merge
Added by hammad, last edited by Awais Hafeez on May 11, 2014

You can download the complete source code of the MultipleDocsInMailMerge sample here.
A typical mail merge operation with Aspose.Words fills just one document with data from your data
source (e.g. creates an invoice or a letter).

To produce multiple documents you need to mail merge multiple times. If you need to produce a separate
document for each record in your data source, you need to do the following:
Loop through all rows in the data table.
Load (or clone) the original document before mail merge.
Mail merge with each row and save the document.
You can load the template document from a file or stream before each mail merge, but usually, it is faster
to load the document only once and then clone it in memory before each mail merge.
Please remember that to perform mail merge you should have a proper template document. This template
can be either a Microsoft Word Template or a normal Microsoft Word document, but it needs to contain
MERGEFIELD fields in the places where you want the data to be inserted. The name of each field must
match the name of the corresponding field in your data source.
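Because each MERGEFIELD name has to match a field in the data source, it can be worth checking a template before merging. The plain-Java sketch below, assuming the field names come from doc.getMailMerge().getFieldNames() and the column names from ResultSetMetaData.getColumnName, reports fields with no matching column. It skips TableStart:/TableEnd: region markers and compares names ignoring case; the case-insensitive comparison is our assumption, so make it exact if your data requires it.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Lists template merge fields that have no matching data source column (sketch). */
final class TemplateCheck {
    static Set<String> unmatchedFields(String[] fieldNames, String[] columnNames) {
        Set<String> columns = new HashSet<>();
        for (String column : columnNames) {
            columns.add(column.toLowerCase(Locale.ROOT));
        }
        // LinkedHashSet keeps document order and reports each missing name once.
        Set<String> missing = new LinkedHashSet<>();
        for (String field : fieldNames) {
            if (field.startsWith("TableStart:") || field.startsWith("TableEnd:")) {
                continue; // Region markers are not data fields.
            }
            if (!columns.contains(field.toLowerCase(Locale.ROOT))) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```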
Example
Produce multiple documents during mail merge.
Java
package MultipleDocsInMailMerge;

import java.io.File;
import java.net.URI;
import java.sql.*;
import java.util.LinkedHashMap;
import java.util.Map;

import com.aspose.words.Document;


class Program
{
    public static void main(String[] args) throws Exception
    {
        // Sample infrastructure.
        URI exeDir = Program.class.getResource("").toURI();
        String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

        produceMultipleDocuments(dataDir, "TestFile.doc");
    }

    public static void produceMultipleDocuments(String dataDir, String srcDoc) throws Exception
    {
        // Open the database connection.
        ResultSet rs = getData(dataDir, "SELECT * FROM Customers");

        // Open the template document.
        Document doc = new Document(dataDir + srcDoc);

        // A record of how many documents have been generated so far.
        int counter = 1;

        // Loop through all records in the data source.
        while (rs.next())
        {
            // Clone the template instead of loading it from disk (for speed).
            Document dstDoc = (Document)doc.deepClone(true);

            // Extract the data from the current row of the ResultSet into a map.
            Map<String, Object> dataMap = getRowData(rs);

            // Execute mail merge. keySet() and values() of the same map iterate in the same
            // order, so each field name lines up with its value.
            dstDoc.getMailMerge().execute(keySetToArray(dataMap), dataMap.values().toArray());

            // Save the document. Plain concatenation is used rather than MessageFormat, which
            // treats quotes and braces in dataDir specially and may format 1000 as "1,000".
            dstDoc.save(dataDir + "TestFile Out " + counter++ + ".doc");
        }
    }

    /**
     * Creates a map from the name and value of each column in the current row of the ResultSet.
     * LinkedHashMap keeps the column order and, unlike Hashtable, accepts the null values that
     * getObject returns for SQL NULL columns.
     */
    public static Map<String, Object> getRowData(ResultSet rs) throws Exception
    {
        ResultSetMetaData metaData = rs.getMetaData();
        Map<String, Object> values = new LinkedHashMap<String, Object>();

        for (int i = 1; i <= metaData.getColumnCount(); i++)
        {
            values.put(metaData.getColumnName(i), rs.getObject(i));
        }

        return values;
    }

    /**
     * Utility function that returns the keys of a map as an array of Strings.
     */
    public static String[] keySetToArray(Map<String, Object> table)
    {
        return table.keySet().toArray(new String[table.size()]);
    }

    /**
     * Utility function that creates a connection to the database.
     */
    public static ResultSet getData(String dataDir, String query) throws Exception
    {
        // Load the DB driver used by the demo. Note that the JDBC-ODBC bridge was removed in
        // Java 8; on newer JDKs, use a JDBC driver for your database instead.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        // Compose connection string.
        String connectionString = "jdbc:odbc:DRIVER={Microsoft Access Driver (*.mdb)};" +
                "DBQ=" + new File(dataDir, "Customers.mdb") + ";UID=Admin";
        // DSN-less DB connection.
        Connection connection = DriverManager.getConnection(connectionString);

        Statement statement = connection.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);

        return statement.executeQuery(query);
    }
}

How to Use Advanced Mail Merge Features
Added by hammad, last edited by Adam Skelton on Feb 28, 2012

The MailMerge class provides some additional properties and methods that allow further customization of the
mail merge process.
Using Mapped Fields
The MailMerge class allows you to automatically map between names of fields in your data source and
names of mail merge fields in the document. To perform this, use the MailMerge.MappedDataFields
property that returns a MappedDataFieldCollection object. MappedDataFieldCollection is a collection
of string keys and string values. The keys are the names of mail merge fields in the document and the
values are the names of fields in your data source. The class provides all properties and methods typical
for a regular collection such as MappedDataFieldCollection.Add, MappedDataFieldCollection.Clear,
MappedDataFieldCollection.Remove, etc.
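Conceptually, MappedDataFieldCollection is a lookup from a document field name to a data source field name. The sketch below models that lookup with a plain Map so it runs on its own. Falling back to the document field's own name when no mapping exists is our assumption about the unmapped case, not documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of mapping document merge field names to data source field names. */
final class FieldNameMapping {
    private final Map<String, String> docToSource = new HashMap<>();

    /** Mirrors the (document name, data source name) order used by add(...). */
    void add(String documentFieldName, String dataSourceFieldName) {
        docToSource.put(documentFieldName, dataSourceFieldName);
    }

    /** Returns the data source name to read for a document merge field. */
    String resolve(String documentFieldName) {
        return docToSource.getOrDefault(documentFieldName, documentFieldName);
    }
}
```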
Example
Shows how to add a mapping when a merge field in a document and a data field in a data source have
different names.
Java
doc.getMailMerge().getMappedDataFields().add("MyFieldName_InDocument",
"MyFieldName_InDataSource");
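To illustrate the direction of the mapping (merge field name in the document, then field name in the data source), here is a hypothetical, JDK-only model of how a value could be resolved. It is not Aspose.Words code: the class FieldMappingSketch and the method resolve are invented for illustration, and the fallback to the unmapped name is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldMappingSketch {
    // Hypothetical model: find the value for a merge field in the document.
    // If the field name has a mapping, look up the mapped data source name;
    // otherwise (assumption of this sketch) use the field name unchanged.
    public static Object resolve(String documentField,
                                 Map<String, String> mappedDataFields,
                                 Map<String, Object> dataSourceRow) {
        String sourceName = mappedDataFields.getOrDefault(documentField, documentField);
        return dataSourceRow.get(sourceName);
    }

    public static void main(String[] args) {
        // Key = merge field name in the document, value = field name in the data source.
        Map<String, String> mapped = new HashMap<>();
        mapped.put("MyFieldName_InDocument", "MyFieldName_InDataSource");

        Map<String, Object> row = new HashMap<>();
        row.put("MyFieldName_InDataSource", "Hello");

        System.out.println(resolve("MyFieldName_InDocument", mapped, row)); // prints Hello
    }
}
```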

Obtaining Merge Field Names
You can get the names of the merge fields available in a document by calling MailMerge.GetFieldNames,
which returns an array of strings that contains the names. The method supports extended syntax in field
names. A new string array is created on every call, and duplicate field names are not eliminated.
Example
Shows how to get names of all merge fields in a document.
Java
String[] fieldNames = doc.getMailMerge().getFieldNames();
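Because the returned array can contain the same name more than once, you may want to de-duplicate it before use. The following is a minimal JDK-only sketch; the sample array stands in for the result of doc.getMailMerge().getFieldNames(), and the class FieldNameDedupSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class FieldNameDedupSketch {
    // Removes duplicate names while keeping the order of first appearance.
    public static String[] distinct(String[] fieldNames) {
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(fieldNames));
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the array returned by doc.getMailMerge().getFieldNames().
        String[] fieldNames = {"FullName", "City", "FullName"};
        System.out.println(Arrays.toString(distinct(fieldNames))); // prints [FullName, City]
    }
}
```

A LinkedHashSet is used instead of a HashSet so that the names come back in the order they first appear in the array.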

Deleting Merge Fields
Sometimes you may want to remove all mail merge fields from a document without executing mail merge, for
instance if you are processing a document and wish to remove any merge fields before conversion. To
achieve this, use MailMerge.DeleteFields to remove all remaining mail merge fields. This method removes
MERGEFIELD and NEXT fields from the document.

This method is not influenced by any flags set under MailMerge.CleanupOptions, and it removes only the
merge fields themselves, not any containing fields or empty paragraphs. If such cleanup is desired, use
the MailMerge.CleanupOptions property instead.
Example
Shows how to delete all merge fields from a document without executing mail merge.
Java
doc.getMailMerge().deleteFields();

Mail Merge using 'Mustache' Template Syntax

Added by tahir manzoor, last edited by tahir manzoor on May 19, 2014

This new syntax allows you to create mail merge templates that use plain text markers instead of
merge fields. These markers look like this:
{{ FieldName }}
To merge into these plain text markers, enable the MailMerge.UseNonMergeFields property. You can
freely mix the markers in your template with Microsoft Word fields such as IF or Formula fields.
Example
Performs a simple insertion of data into merge fields and saves the document.
Java
// Open an existing document.
Document doc = new Document(getMyDir() + "MailMerge.ExecuteArray.doc");

// Enable merging into plain text {{ }} markers as well as regular merge fields.
doc.getMailMerge().setUseNonMergeFields(true);

// Fill the fields in the document with user data.
doc.getMailMerge().execute(
new String[] {"FullName", "Company", "Address", "Address2", "City"},
new Object[] {"James Bond", "MI5 Headquarters", "Milbank", "", "London"});

doc.save(getMyDir() + "MailMerge.ExecuteArray Out.doc");

Object.Attribute Syntax
You can merge attributes of objects using the following syntax:
{{ Address.Street }}
This merges data from XML data that looks like this:
<Order> <!-- Current context is here. -->
<Number>23</Number>
<Address>
<Street>Nelson Street</Street>
<Suburb>Howick</Suburb>
<City>Auckland</City>
</Address>
<PhoneNumber>543 1234</PhoneNumber>
</Order>
Foreach Blocks
You can merge data from multiple records using the foreach tag. This is similar to mail merge regions
with conventional merge fields. You can nest such blocks.
{{ #foreach Order }}
{{ Number }}
{{ Address.Street }}
{{ #foreach Item }}
{{ Description }} {{ Cost }} {{ Total }}
{{ /foreach Item }}
{{ /foreach Order }}
You can also mix these fields and place them inside other Microsoft Word fields such as IF or Formula
fields.
Example
The following code example shows how to use mail merge with Mustache syntax (foreach blocks). You can
download the complete source code of the XmlMailMergeDataSet sample here.
Java
// Imports used by this snippet. XmlMailMergeDataSet is not imported here:
// it is the class provided by the downloadable sample.
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import com.aspose.words.Document;

// Use DocumentBuilder from the javax.xml.parsers package and the Document class from the
// org.w3c.dom package to read the XML data file and store it in memory.
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();

// Parse the XML data.
org.w3c.dom.Document xmlData = db.parse(MyDir + "Orders.xml");

// Open a template document.
Document doc = new Document(MyDir + "ExecuteTemplate.doc");

// Enable merging into plain text {{ }} markers.
doc.getMailMerge().setUseNonMergeFields(true);

// XmlMailMergeDataSet merges multiple regions at the same time from a single XML data source;
// it also works with a single repeatable region (and any nested regions).
doc.getMailMerge().executeWithRegions(new XmlMailMergeDataSet(xmlData));

// Save the output document.
doc.save(MyDir + "Out.docx");

Known Issue
There is one known exception when using the Object.Attribute syntax in a template. Suppose you have:
1) a master table named master,
2) a field in it named details,
3) a related detail table named details,
4) and a reference to master.details in the template.

In this case the item of the topmost detail table is used instead of the details field, which may be
undesired. So if such conflicts exist in your data sources, avoid using dots in field names (or
alternatively avoid such conflicts). This applies to data sources and templates of all types.
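To make the Mustache {{ FieldName }} and {{ Object.Attribute }} markers more concrete, here is a hypothetical, much-simplified model of how such markers can be substituted from nested data. It is not how Aspose.Words implements the feature: the class MustacheSketch and its lookup and render methods are invented for illustration, nested maps stand in for the XML data shown earlier, and foreach blocks and Word fields are not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MustacheSketch {
    // Matches {{ FieldName }} or {{ Object.Attribute }}, tolerating spaces inside the braces.
    private static final Pattern MARKER = Pattern.compile("\\{\\{\\s*([\\w.]+)\\s*\\}\\}");

    // Follows a dotted path such as "Address.Street" through nested maps.
    // Returns null if any step of the path is missing.
    @SuppressWarnings("unchecked")
    public static Object lookup(Map<String, Object> context, String path) {
        Object current = context;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<String, Object>) current).get(part);
        }
        return current;
    }

    // Replaces every marker with its value; unresolved markers become empty strings.
    public static String render(String template, Map<String, Object> context) {
        Matcher m = MARKER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = lookup(context, m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> address = new HashMap<>();
        address.put("Street", "Nelson Street");
        address.put("City", "Auckland");

        // The current context is the Order record, as in the XML example.
        Map<String, Object> order = new HashMap<>();
        order.put("Number", "23");
        order.put("Address", address);

        System.out.println(render("Order {{ Number }}: {{ Address.Street }}, {{Address.City}}", order));
        // prints: Order 23: Nelson Street, Auckland
    }
}
```

Because this model always splits a path at the dots, it cannot address a field whose own name contains a dot; that is a limitation of the sketch, not a statement about Aspose.Words.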
