
Configuring and Administrating TREX using the TREX Admin Tool

Bettina Knauss
NetWeaver RIG EMEA

SAP AG

Walldorf 07.03.2007
TREX Introduction

TREX Administration Tool

Landscape Configuration
RFC Connection
Administrating, Monitoring
Traces
TREX Architecture

SAP AG 2006, Title of Presentation / Speaker Name / 3


TREX Anatomy

TREX provides several client options:


Java client for communication via HTTP/XML in SAP EP
ABAP client for communication via RFC or ICM in an SAP landscape
C++ and Python clients for internal calls and development

Inside TREX there are four main services:


Name server: manages the TREX landscape, allocates TREX services
Index server: indexing and retrieval
– Text-mining engine for classification and similarity search
– Text search engine for searching and indexing unstructured text
– Attribute/BIA engine for searching and indexing structured data
Queue server: manages asynchronous indexing
Preprocessor: document retrieval, filtering, linguistic processing



Name Server

TREX Name Server


Monitors the landscape (for high availability)
Maintains a list of all services and their status
Is called whenever one service seeks another
Distributes load

Example
When a service sends the name server the request
GetServer (IndexServer, SearchMode, MyIndex)
the name server answers with the address
<host>:<port>
of the index server to which to send the request
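The lookup in this example can be pictured as a registry that filters the service list by service type, availability, and index, and then hands out one address. The Python sketch below is a toy model under stated assumptions, not TREX's implementation: the class names, the get_server method, the host names, and the port are hypothetical, the selection rule (fewest calls so far) is borrowed from Search Example 2 later in this deck, and the SearchMode argument is accepted but ignored.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ServiceEntry:
    """One entry in the name server's list of services (hypothetical model)."""
    service_type: str                  # e.g. "IndexServer", "Preprocessor"
    host: str
    port: int
    indexes: Set[str] = field(default_factory=set)  # indexes served; empty means any
    available: bool = True             # status as monitored by the name server
    calls: int = 0                     # requests routed to this service so far


class ToyNameServer:
    def __init__(self) -> None:
        self.services: list = []

    def register(self, entry: ServiceEntry) -> None:
        self.services.append(entry)

    def set_status(self, host: str, port: int, available: bool) -> None:
        for s in self.services:
            if (s.host, s.port) == (host, port):
                s.available = available

    def get_server(self, service_type: str, mode: str, index: str) -> str:
        # 'mode' (e.g. "SearchMode") is ignored in this simplified sketch
        candidates = [
            s for s in self.services
            if s.service_type == service_type and s.available
            and (not s.indexes or index in s.indexes)
        ]
        if not candidates:
            raise LookupError(f"no available {service_type} for index {index!r}")
        chosen = min(candidates, key=lambda s: s.calls)  # fewest calls so far
        chosen.calls += 1
        return f"{chosen.host}:{chosen.port}"


ns = ToyNameServer()
ns.register(ServiceEntry("IndexServer", "trexhost1", 30003, {"MyIndex"}))
ns.register(ServiceEntry("IndexServer", "trexhost2", 30003, {"MyIndex"}))
print(ns.get_server("IndexServer", "SearchMode", "MyIndex"))  # trexhost1:30003
print(ns.get_server("IndexServer", "SearchMode", "MyIndex"))  # trexhost2:30003
```

Marking a service as unavailable removes it from the candidates, which mirrors the name server's role of monitoring the landscape for high availability.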



Name Server: Initialization Files

The most important .ini files are:

topology.ini
Read by all name servers
Contains all index-relevant information
– To edit the file, use the TREX standalone admin tool

sapprofile.ini
Read by all TREX services and clients
Specifies:
– Port number of local name server
– Host and port numbers of all master name servers
– Amount of shared memory used by topology.ini data
– System ID
– Path information to where each service saves its data



Queue Server

TREX Queue Server


Collects indexing requests
– Sends them to the index server
Enables asynchronous indexing
– Scheduled
– Event triggered
Includes a scheduler for replication
– Replication runs on index server
Stores snapshots for replication



Preprocessor 1

TREX Preprocessor
Delivers documents that the engines can use directly
Supports almost any data type
Gets documents via HTTP from the source
Converts documents to HTML
Keeps the document structure
Extracts attributes
– Metadata from DOC, PDF, ...
– Names from a lexicon
– Application-specific attributes
Performs linguistic processing
– Tokenization
– Stemming
– Tagging
(using third-party products)
Figure: documents in various formats (.zip, .ppt, .pdf, .doc, ...) converted to HTML (<html><head>…</head><body>…</body></html>)



Preprocessor 2

TREX Preprocessor
Reduces workload on the other engines
Works independently of the indexes
Is stateless
Figure: the preprocessor shown with the Java client, ABAP client, Python client, index server, and name server; preprocessor components: HTTP client, HTML filter, lexicon, highlighting, extensions



Text Search Engine

Search
Exact search
– SAP
Phrase search
– “SAP AG”
Boolean search
– SAP AND ORACLE
Masked or wildcard search
– Web*
Fuzzy or error-tolerant search
– Kagerman → Kagermann
Linguistic search
– Houses → House
Attribute search
– Author = Stevens

Indexing
Many documents at once
– Up to tens of millions
Many formats *
– PDF, doc, ppt, zip, …
With or without queueing
– Synchronous or asynchronous
Automatic language identification *
– 31 languages so far …
Attribute extraction *
– DC and other metadata
Linguistic processing *
– Tokenizing, tagging, stemming, …
Ranking
– TF*IDF and P-norm

* Via Preprocessor
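Two of the search types above, error-tolerant and wildcard search, can be illustrated on a plain list of index terms. The sketch below rests on assumptions and is not TREX's algorithm: it measures error tolerance with Levenshtein edit distance (a standard technique the slide does not name) and matches masks with Python's fnmatch module, which also treats ? and [...] as wildcards. The function names are hypothetical, and a real engine would not scan the whole vocabulary for every query.

```python
import fnmatch


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_lookup(query: str, vocabulary: list, max_distance: int = 1) -> list:
    """Index terms within max_distance edits of the query (case-insensitive)."""
    q = query.lower()
    return sorted(t for t in vocabulary if levenshtein(q, t.lower()) <= max_distance)


def wildcard_lookup(pattern: str, vocabulary: list) -> list:
    """Index terms matching a mask such as "Web*" (case-insensitive)."""
    p = pattern.lower()
    return sorted(t for t in vocabulary if fnmatch.fnmatchcase(t.lower(), p))


terms = ["Kagermann", "Kaiser", "Web", "Webinar", "Website", "SAP"]
print(fuzzy_lookup("Kagerman", terms))   # ['Kagermann']
print(wildcard_lookup("Web*", terms))    # ['Web', 'Webinar', 'Website']
```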



Text Mining Engine

Text Mining Search
See also
– Get more documents like this
Refine your query
– More or less general similar terms
Guided navigation
– See result set sizes in advance
Find similar documents
– Based on document features
Find similar terms
– Based on document statistics

Classification
Taxonomy generation
– Based on QBC and/or EBC
Document classification
– Assign documents to categories
Document feature extraction
– Find characteristic terms
Document clustering
– Discover sets of related documents
Term clustering
– Discover sets of related terms
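The slide does not say how TREX computes “Find similar documents”; one common way to compare documents by their features is the cosine similarity of term-frequency vectors. The sketch below uses that technique as a stand-in, with whitespace tokenization and raw term counts as simplifying assumptions; the function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Document features: raw term counts after naive whitespace tokenization."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def more_like_this(doc_id: str, docs: dict, k: int = 3) -> list:
    """Rank the other documents by similarity to docs[doc_id]; return the top k."""
    target = term_vector(docs[doc_id])
    scored = [(other, cosine(target, term_vector(text)))
              for other, text in docs.items() if other != doc_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


books = {
    "a": "invoice verification process",
    "b": "invoice verification audit",
    "c": "garden plants",
}
print(more_like_this("a", books, k=2))  # b first (about 0.67), then c (0.0)
```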



Attribute Engine

Attribute Indexing
Attribute engine has its own index
– Separate from other indexes
Attributes are used for text mining
– Classification
– Similar document search
– Taxonomy building
– Feature extraction

Attribute Search
Search over document metadata
– Title
– Creator
– …

Dublin Core Metadata Model
Resource has-attributes: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights, …
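Searching over metadata such as “Author = Stevens” (the Dublin Core Creator element) can be sketched as a lookup table, kept apart from any full-text index, that maps (attribute, value) pairs to document IDs, with several conditions combined by AND. The class below is a hypothetical illustration, not the attribute engine's actual data structure; exact, case-insensitive value matching is an assumption.

```python
from collections import defaultdict


class ToyAttributeIndex:
    """Maps (attribute, value) pairs to the documents that carry them."""

    def __init__(self) -> None:
        self._docs_by_pair = defaultdict(set)

    def add(self, doc_id: str, attributes: dict) -> None:
        for name, value in attributes.items():
            self._docs_by_pair[(name.lower(), str(value).lower())].add(doc_id)

    def search(self, **conditions) -> set:
        """Documents matching every attribute = value condition (AND)."""
        result = None
        for name, value in conditions.items():
            ids = self._docs_by_pair.get((name.lower(), str(value).lower()), set())
            result = set(ids) if result is None else result & ids
        return result if result is not None else set()


idx = ToyAttributeIndex()
idx.add("b1", {"Title": "Internal Auditing", "Creator": "Stevens", "Language": "en"})
idx.add("b2", {"Title": "Ledgers", "Creator": "Stevens", "Language": "de"})
print(idx.search(Creator="Stevens", Language="en"))  # {'b1'}
```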



How Search Works: An Example

BooksOnline, an online bookstore, offers a range of books with the special feature that a customer can search the full text of the books online before purchase

Auditor Jane wants to buy a book about invoice verification and decides to evaluate the suggestions offered by the BooksOnline search service

The following slides describe how the SAP NetWeaver search service used by BooksOnline answers her search request



Search Example 1

Jane enters invoice verification in the BooksOnline search field in the Web browser on her office desktop PC
The business application forwards her search request, together with information about the kind of search and which index to use, as an HTTP/XML packet via the Java client to the Web server

Figure: TREX landscape diagram (Java client, Web server, name server, queue server, preprocessor, index server with text mining, text search, and attribute engines, each with its own index). Callout: “Do a phrase search for invoice verification in the BooksOnline index”



Search Example 2

The Web server converts the HTTP message into the format used
inside TREX and sends a request to the name server for the name
and address of a service to handle the request
The name server checks its list of available servers and tells the
Web server the address of an index server that has received the
fewest calls so far and can handle the request

Figure: TREX landscape diagram (Java client, Web server, and TREX services). Callouts: “Where can I send this request?” / “Send it to Index Server 1”



Search Example 3

The Web server passes the search request to the index server as
a TCP/IP packet
The index server sees that the request is for a phrase search and
therefore forwards the phrase to the preprocessor for language
identification, tokenization, tagging, and stemming

Figure: TREX landscape diagram (Java client, Web server, and TREX services). Callouts: “Do a phrase search for invoice verification in the BooksOnline index” / “A phrase search – this means work for the preprocessor!” Note: The language of the search may be specified in advance



Search Example 4

The preprocessor performs linguistic processing. It parses the phrase into two words, invoice and verification, tags them as nouns, reduces the words to their stem forms (in this case the words themselves), and sends the result back to the index server
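The three preprocessing steps applied to the phrase can be imitated in a few lines. The sketch is only illustrative: TREX performs tokenization, tagging, and stemming with third-party linguistic products, whereas here a regular expression tokenizes, a crude plural-stripping rule stands in for a stemmer, and a tiny hypothetical lexicon (TOY_LEXICON) stands in for a tagger.

```python
import re

# Hypothetical mini-lexicon standing in for a real part-of-speech tagger
TOY_LEXICON = {"invoice": "NOUN", "verification": "NOUN", "house": "NOUN"}


def tokenize(text: str) -> list:
    """Split text into lower-cased word tokens (runs of word characters)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def toy_stem(token: str) -> str:
    """Crude plural stripping: "houses" -> "house", but "class" stays "class"."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(phrase: str) -> list:
    """Return (token, stem, tag) triples for a search phrase."""
    result = []
    for token in tokenize(phrase):
        stem = toy_stem(token)
        result.append((token, stem, TOY_LEXICON.get(stem, "UNKNOWN")))
    return result


print(preprocess("invoice verification"))
# [('invoice', 'invoice', 'NOUN'), ('verification', 'verification', 'NOUN')]
```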

Figure: TREX landscape diagram (Java client, Web server, and TREX services). Callouts: “Please preprocess the phrase invoice verification” / “Done – two English nouns in stem form”



Search Example 5

The index server sends the preprocessed request to the search engine for optimization and result retrieval
The query optimizer in the search engine analyzes the query, builds the query tree (in this case three nodes: one for each word and one for AND), and optimizes it based on index statistics so that the less frequent term is evaluated first
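The optimization step can be sketched as building an AND node with one leaf per word and reordering the leaves by document frequency, so the term with the shortest index listing is evaluated first. The node classes and the frequency figures below are hypothetical; the slide describes only the idea, not TREX's data structures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TermNode:
    term: str


@dataclass
class AndNode:
    children: List[TermNode]


def build_query_tree(terms: List[str]) -> AndNode:
    """Three nodes for a two-word phrase: one AND node plus one leaf per word."""
    return AndNode([TermNode(t) for t in terms])


def optimize(tree: AndNode, doc_freq: Dict[str, int]) -> AndNode:
    """Put the rarest term first: starting from its short listing means the
    longer listings are checked for fewer candidate documents."""
    ordered = sorted(tree.children, key=lambda node: doc_freq.get(node.term, 0))
    return AndNode(ordered)


# Hypothetical index statistics: "invoice" occurs in more books than "verification"
stats = {"invoice": 12000, "verification": 3000}
plan = optimize(build_query_tree(["invoice", "verification"]), stats)
print([node.term for node in plan.children])  # ['verification', 'invoice']
```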

Figure: TREX landscape diagram (Java client, Web server, and TREX services). Callouts: “This is a simple query – just a 2-word phrase” / “The index listing for invoice is longer than the index listing for verification, so select verification first”



Search Example 6

The search engine finds the row for the term verification in the
BooksOnline index and selects the set of books containing the
term, then it checks this set of books against the row for the term
invoice and selects just the books that contain both terms
Next, it reads the addresses of the terms in each book, calculates
rank values, sorts the results, and takes the top ten (or more)
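The steps above (candidate set from the rarer term, subset with the other term, position check, ranking) can be sketched over a small in-memory positional index. Assumptions: the index layout (term → {document: positions}) is hypothetical, and the score is the textbook TF*IDF sum, tf × log(N / df), whereas the slides say only that TREX ranks with TF*IDF and P-norm without giving its exact formula.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical positional index: term -> {doc_id: [positions of the term]}
Index = Dict[str, Dict[str, List[int]]]


def phrase_search(index: Index, terms: List[str], n_docs: int,
                  k: int = 10) -> List[Tuple[str, float]]:
    # 1. Start from the term found in the fewest documents
    order = sorted(terms, key=lambda t: len(index.get(t, {})))
    candidates = set(index.get(order[0], {}))
    # 2. Keep only documents that also contain every other term
    for t in order[1:]:
        candidates &= set(index.get(t, {}))
    hits = {}
    for doc in candidates:
        # 3. Use term positions: the phrase matches if term i sits at start + i
        starts = set(index[terms[0]][doc])
        for offset, t in enumerate(terms[1:], start=1):
            starts &= {p - offset for p in index[t][doc]}
        if starts:
            # 4. Textbook TF*IDF: sum of tf(term, doc) * log(N / df(term))
            hits[doc] = sum(len(index[t][doc]) * math.log(n_docs / len(index[t]))
                            for t in terms)
    # 5. Sort by rank and keep the top k
    return heapq.nlargest(k, hits.items(), key=lambda item: item[1])


toy_index = {
    "invoice": {"b1": [0, 5], "b2": [3], "b3": [1]},
    "verification": {"b1": [1], "b2": [7]},
}
# Assumes a collection of 4 books; only b1 has the two words side by side
print(phrase_search(toy_index, ["invoice", "verification"], n_docs=4))
```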

Figure: TREX landscape diagram (Java client, Web server, and TREX services). Callouts: 1. Find set of books with verification; 2. Find subset with invoice; 3. Find addresses of both terms; calculate ranks and sort. Note: The rank of a document for a term is defined by TF*IDF ranking



Search Example 7

The search engine reads all the requested attributes for the selected books, including titles, authors, and keys to the documents
The engine uses the keys to load the document contents and scans the texts for the first occurrences of the search phrase (or linguistic variants of the phrase) to create a brief summary text
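Building the brief summary can be sketched as splitting a document into sentences and keeping the first ones that contain the search phrase, with the phrase marked for highlighting. This simplified version matches only the exact phrase (ignoring case), not the linguistic variants mentioned above; the function name, the sentence-splitting rule, and the <b> markup are assumptions.

```python
import re


def make_summary(text: str, phrase: str, max_sentences: int = 2) -> str:
    """Join the first sentences that contain the phrase, marking each hit."""
    # Naive sentence split: after '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = [s for s in sentences if pattern.search(s)][:max_sentences]
    marked = [pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", s) for s in hits]
    return " ... ".join(marked)


book_text = ("Chapter 1 covers ledgers. Invoice verification is the next step. "
             "Later, the invoice verification in the system is explained. "
             "Nothing else here.")
print(make_summary(book_text, "invoice verification"))
```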

Figure: TREX landscape diagram (Java client, Web server, and TREX services). Callouts: “Scans through the texts to find the first few sentences containing the phrase invoice verification” / “The preprocessor extracted attributes during indexing”



Search Example 8

The search engine passes the result set back via the index server
for merging with results from any other engines (here none)
The index server passes the result set back via the Web server
and the Java client to the graphical user interface
Jane sees a ranked list of books about invoice verification less
than a second after she launched the search
Figure: TREX landscape diagram (Java client, Web server, and TREX services). Callout: “73 books found in 0.14 seconds”



Search: Results

A sample document from the result set
Exact format depends on application settings

Internal Auditing
by First Author, Second Author
Economic Publishers, New York
Invoice verification is the next step ... The invoice verification in the ...
375 pages, First edition, ISBN 0-3XX-XXXXX-X
Browse full text

Callouts: document attributes; link to document; sample phrases with search terms highlighted

Results ranked by frequency of search terms
How many results are returned depends on application settings



How Indexing Works: An Example

BooksOnline worked hard to give Jane such a rewarding search experience

Before Jane could see a ranked list of books about invoice verification and browse the books, BooksOnline had to index the full texts of all the books

The following slides describe how the SAP NetWeaver search service used by BooksOnline indexes the full texts of the books shown on its website



Indexing Example 1

The BooksOnline indexing administrator opens the SAP queue and index administration tool and sends a request to TREX to create an index called BooksOnline
The ABAP client forwards the index request as a Remote Function Call via the SAP Gateway to the RFC server

Figure: TREX landscape diagram (ABAP client, SAP Gateway, RFC server, name server, queue server, preprocessor, index server with text mining, text search, and attribute engines, each with its own index). Callout: “Create an index called BooksOnline”. Note: Indexing can be done just as well via the Java client



Indexing Example 2

The name server tells the RFC server the address of an index
server that can create the index
In a one-box implementation of TREX, this step is straightforward
unless the index server is down for some reason
The name server uses a round-robin procedure to select an index server
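Round-robin selection can be sketched as cycling through the registered index servers and skipping any that are down; with a single registered server this reduces to the one-box case. The class name, the is_up callback, and the host:port strings are hypothetical, and the sketch is not TREX's actual name server logic.

```python
from typing import Callable, List


class RoundRobinSelector:
    """Hands out index servers in turn, skipping servers that are down."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = servers
        self.next_pos = 0

    def select(self, is_up: Callable[[str], bool]) -> str:
        # Try each server at most once, continuing where the last call stopped
        for _ in range(len(self.servers)):
            server = self.servers[self.next_pos]
            self.next_pos = (self.next_pos + 1) % len(self.servers)
            if is_up(server):
                return server
        raise RuntimeError("no index server available")


selector = RoundRobinSelector(["trexhost1:30003", "trexhost2:30003"])
print([selector.select(lambda s: True) for _ in range(3)])
# ['trexhost1:30003', 'trexhost2:30003', 'trexhost1:30003']
```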
Figure: TREX landscape diagram (ABAP client, SAP Gateway, RFC server, and TREX services). Callouts: “I want to create a new index!” / “So go to <host>:<port>”



Indexing Example 3

The RFC server sends the request to the index server
The index server creates a new index called BooksOnline
The new index is still empty, but any documents to be indexed can now be assigned to it

Figure: TREX landscape diagram (ABAP client, SAP Gateway, RFC server, and TREX services). Callouts: “I want to create a new index called BooksOnline” / “New index created successfully!”



Indexing Example 4

The administrator sends a request to index the new books in a specified folder and write the results to the BooksOnline index
The digital files for the books are in a variety of formats, but TREX can handle the standard formats, such as Microsoft Word (.doc), Adobe Portable Document Format (.pdf), and plain text (.txt)
The name server directs the request to an available queue server
Figure: TREX landscape diagram (ABAP client, SAP Gateway, RFC server, and TREX services). Callouts: “Please index all the books in folder <path_to_folder>” / “Please put this indexing request in your queue and have the documents indexed as soon as TREX finds the time to do it”. Note: Queueing is an option; indexing can also be done immediately



Indexing Example 5

The queue server receives the list of URLs for the documents from the specified folder and persists them in the queue for the index until a preprocessor is available
Indexing a large collection of documents can be a long job, so the administrator can hold or flush the queue manually at any time
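The queue behavior described above can be sketched as a per-index FIFO of document URLs that is passed on only while the queue is not on hold and a preprocessor is free. This is a hypothetical model: the class and method names and the example URLs are illustrative, “flush” is modeled here as handing the queued URLs on for processing, and persistence to disk is left out.

```python
from collections import deque
from typing import Callable, Iterable


class ToyIndexQueue:
    """FIFO of document URLs waiting to be indexed into one index."""

    def __init__(self, index_name: str) -> None:
        self.index_name = index_name
        self.pending = deque()   # document URLs, oldest first
        self.on_hold = False

    def add(self, urls: Iterable[str]) -> None:
        self.pending.extend(urls)

    def hold(self) -> None:
        self.on_hold = True

    def release(self) -> None:
        self.on_hold = False

    def flush(self, preprocessor_free: Callable[[], bool],
              send: Callable[[str], None]) -> int:
        """Pass queued URLs to send() while not on hold and a preprocessor is free."""
        sent = 0
        while self.pending and not self.on_hold and preprocessor_free():
            send(self.pending.popleft())
            sent += 1
        return sent


queue = ToyIndexQueue("BooksOnline")
queue.add(["http://books.example/b1.pdf", "http://books.example/b2.doc"])
queue.hold()
print(queue.flush(lambda: True, print))   # 0: the queue is on hold
queue.release()
print(queue.flush(lambda: True, print))   # prints both URLs, then 2
```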

Figure: document icons (.htm, .pdf, .ppt, .xls, .doc, .txt) and the TREX landscape diagram (ABAP client, SAP Gateway, RFC server, and TREX services). Callout: “Queue server receives document URLs and adds them to the BooksOnline queue for indexing”. Note: BooksOnline has all its books available in digital form (either as author files or scanned and OCR'd), ready for indexing and browsing



Indexing Example 6

The queue server sends the documents to a free preprocessor
The preprocessor fetches documents via their URLs, filters them from their original format to HTML, identifies their language, tokenizes them into sequences of terms, tags the terms with their parts of speech (noun, verb, and so on), and stems the terms as appropriate
The preprocessed documents are then sent to the index server
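Of these steps, language identification is easy to sketch in a deliberately naive form: count how many words of the text appear in small per-language stopword lists and pick the language with the most hits. The word lists and the function name below are hypothetical; the slides do not describe how TREX identifies languages, and production identifiers use far richer statistical models.

```python
import re

# Hypothetical, tiny stopword lists; real identifiers use much richer models
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "de": {"der", "die", "und", "ist", "das", "nicht"},
    "fr": {"le", "la", "et", "est", "les", "des"},
}


def identify_language(text: str) -> str:
    """Pick the language whose stopwords occur most often in the text."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(token in words for token in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(identify_language("The invoice verification is the next step"))    # en
print(identify_language("Die Rechnungsprüfung ist der nächste Schritt"))  # de
```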
Figure: TREX landscape diagram (ABAP client, SAP Gateway, RFC server, and TREX services). Callout: “A lot of work for the preprocessor”; the preprocessor passes HTML on to the index server



Indexing Example 7

The index server forwards the documents to the search engine
For each document, the search engine writes a list of all its terms and, for each term, a list of the positions in the document where the term appears
The engine merges the term list for each document into the existing term-document matrix that forms the BooksOnline index
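The per-document term list with positions, and its merge into the index, can be sketched with plain dictionaries, storing the term-document matrix sparsely as term → {document: positions}. This is a simplified model under assumptions: the function names are hypothetical, tokens are lower-cased words rather than the preprocessor's stems, and each document is assumed to be indexed only once.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Sparse term-document matrix: term -> {doc_id: [positions]}
Index = Dict[str, Dict[str, List[int]]]


def term_positions(text: str) -> Dict[str, List[int]]:
    """List every term of one document with the positions where it appears."""
    positions = defaultdict(list)
    for pos, term in enumerate(re.findall(r"\w+", text.lower())):
        positions[term].append(pos)
    return dict(positions)


def merge_into_index(index: Index, doc_id: str,
                     doc_terms: Dict[str, List[int]]) -> None:
    """Add one document's term list to the existing index in place."""
    for term, pos_list in doc_terms.items():
        index.setdefault(term, {})[doc_id] = pos_list


books_index: Index = {}
merge_into_index(books_index, "b1", term_positions("Invoice verification: the invoice is checked"))
merge_into_index(books_index, "b2", term_positions("Verification comes first"))
print(books_index["invoice"])        # {'b1': [0, 3]}
print(books_index["verification"])   # {'b1': [1], 'b2': [0]}
```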
Figure: TREX landscape diagram (ABAP client, SAP Gateway, RFC server, and TREX services). Callout: “Indexing data merged into existing matrix”



Indexing Example 8

The BooksOnline indexing administrator can use the TREX queue and index administration tool to display the status of the indexing process at any time

Figure: screenshot of the queue and index administration tool (ABAP client, SAP Gateway, TREX). Callout: The tool lets you follow the progress of queued documents from left to right



TREX Administration Tools

The TREX administration tool is the place to:


Set up and configure a distributed landscape
Monitor and administer services, indexes, queues, replication, ...
Show trace files, configuration files, version info, ...

There are three flavors:


Standalone
– Richest feature set
– Requires full access to TREX host
ABAP
– Restricted feature set
– Easy access on customer systems
Java
– Highly restricted feature set
– Browser access via Portal



TREX Administration Tool

Start Tool

DEMO



Landscape Example



Distributed Scenario – Simple Example

One master, multiple slaves

IS Index server
M Master
MI Master index
NS Name server
PP Preprocessor
Q Queue
QS Queue server
RFC RFC server
SN Snapshots
S Slave
SI Slave index
WS Web server

Figure: master host mytrexmaster (RFC, WS, M NS, PP, M QS, M IS; data: Q, MI, SN) and slave hosts mytrexslave1 ... 2 (RFC, WS, S NS, PP, S IS; data: SI)

http://trex.wdf.sap.corp:1080/ Documentation Distributed Search and Classification (TREX) 7.0 SP2 Systems



Distributed Scenarios – Shared Backup Server

One backup, multiple masters, multiple slaves, one filer

Figure: master hosts mytrexmaster1 and mytrexmaster2 (RFC, WS, M NS, PP, M QS, M IS), slave hosts mytrexslave1/2 and mytrexslave3/4 (RFC, WS, S NS, PP, S IS), and one backup host mytrexbackup (RFC, WS, M NS, PP, backup queue server B QS, backup index server B IS), all sharing one file server that holds the queues (Q), master indexes (MI), snapshots (SN), and slave indexes (SI)

http://trex.wdf.sap.corp:1080/ Documentation Distributed Search and Classification (TREX) 7.0 SP2 Systems



Distributed Scenarios – Dedicated Backup Servers

Multiple backups, multiple masters, multiple slaves, one filer


Figure: backup hosts mytrexbackup1 (RFC, WS, M NS, PP, B QS, B IS) and mytrexbackup2 (RFC, WS, S NS, PP, B QS, B IS), master hosts mytrexmaster1 and mytrexmaster2 (RFC, WS, M NS, PP, M QS, M IS), and slave hosts mytrexslave1/2 and mytrexslave3/4 (RFC, WS, S NS, PP, S IS), all sharing one file server that holds the queues (Q), master indexes (MI), snapshots (SN), and slave indexes (SI); B = backup

http://trex.wdf.sap.corp:1080/ Documentation Distributed Search and Classification (TREX) 7.0 SP2 Systems



Landscape Configuration

DEMO



Creating RFC Connection



RFC Connection

DEMO



Reorg I



Reorg II



Reorg III



Reorg IV



Alert area



Alert Server Configuration



Checks that are executed and required actions I



Checks that are executed and required actions II



Checks that are executed and required actions III



Checks that are executed and required actions IV



TREX Traces

DEMO



Copyright 2006 SAP AG. All Rights Reserved

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be
changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
Microsoft, Windows, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390, OS/400, iSeries, pSeries, xSeries, zSeries, System i, System i5, System p,
System p5, System x, System z, System z9, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity, Tivoli, Informix, i5/OS, POWER, POWER5, POWER5+, OpenPower and PowerPC are
trademarks or registered trademarks of IBM Corporation.
Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries.
Oracle is a registered trademark of Oracle Corporation.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.
HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C ®, World Wide Web Consortium, Massachusetts Institute of Technology.
Java is a registered trademark of Sun Microsystems, Inc.
JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by Netscape.
MaxDB is a trademark of MySQL AB, Sweden.
SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered
trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are the trademarks of their respective companies.
Data contained in this document serves informational purposes only. National product specifications may vary.

The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior
written permission of SAP AG.
This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This document contains only intended strategies, developments,
and functionalities of the SAP® product and is not intended to be binding upon SAP to any particular course of business, product strategy, and/or development. Please note that this
document is subject to change and may be changed by SAP at any time without notice.
SAP assumes no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, or other items
contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability,
fitness for a particular purpose, or non-infringement.
SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This
limitation shall not apply in cases of intent or gross negligence.
The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links contained in
these materials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages.

