Malarvizhi Kandasamy (k.malarvizhi@in.ibm.com), Staff Software Engineer, JCR Development, IBM
Ramgopal Kanasani (rakanasa@in.ibm.com), Software Engineer, JCR Development, IBM
March 2011
Copyright International Business Machines Corporation 2011. All rights reserved.
Summary: This white paper explains how text searches are performed on the content stored in the Java™ Content Repository in IBM WebSphere Portal 7. Specifically, we focus on the search component of the IBM Lotus Web Content Management authoring portlet in WebSphere Portal 7, which uses the WebSphere Portal Search Engine and therefore differs from previous WebSphere Portal versions.
Table of Contents
1 Introduction
2 Overview of JCR TextSearch
2.1 Index maintenance
2.2 Search
3 Configuring JCR Content Model TextSearch
3.1 Standalone WebSphere Portal environment
3.2 Manually configuring JCRCollection1 in standalone environment
3.3 Clustered WebSphere Portal environment
3.4 Preparing the remote search service
3.5 Configuring the remote search service
4 Setting up JMS in a clustered environment
4.1 Adding a cluster as a bus member
5 Searching a seedlist document in WCM
6 Troubleshooting JCR TextSearch
7 Conclusion
8 Resources
About the authors
1 Introduction
In WebSphere Portal 7, the Java Content Repository (JCR) uses the WebSphere Portal Search Engine (PSE) for its text search functions. In WebSphere Portal versions 6 and earlier, JCR used the Juru text engine, a Java library developed by the Haifa Research Lab (HRL) that maintains the text index and performs the searches over it. To align with the industry-standard approach and to support multiple search engines and repositories, JCR in WebSphere Portal 7 adopted HRL's PSE, which is based on the Apache Lucene search engine. This paper discusses the indexing and search part of Lotus Web Content Management (hereafter called WCM) and the different configurations introduced in version 7. We also describe the configuration techniques for standalone, clustered, and farm environments, and provide some best practices and troubleshooting tips.
Full crawl
This is the activity that builds the text index directory from scratch, which usually occurs when WebSphere Portal is installed and the server is started for the first time. In the first scheduled run, JCR only does internal processing to prepare for building the index. The crawler starts collecting documents in the next scheduled run. The full crawler indexes documents in asynchronous mode only. This means that, even if the WebSphere Portal server goes down while the index is being built, the process can resume from the point of failure when the server comes up again.
Incremental crawl
Once the full crawl has successfully built the index directory from scratch, the WebSphere Portal scheduler keeps checking at frequent, specific intervals to determine whether there are any modifications in the repository. If so, it updates the index directory with all these changes.
2.2 Search
In WebSphere Portal 7, JCR uses XPath as the query language. When you want to search for information in the content repository, you specify an XPath query as the input to JCR. The XPath query has two built-in functions, text-contains() and text-score(), which provide text search functions on the JCR node using the search pattern. JCR supports different types of searches, including fielded search, scoped search, fuzzy search, and stemmed search, as well as linguistic features and ranking/score in search. For more information on search and the convertors used in TextSearch, refer to the developerWorks article, Java content repository TextSearch in IBM WebSphere Portal and IBM Lotus Web Content Management: Overview and troubleshooting.
Enable the text search (jcr.textsearch.enabled=true). Set this value to false to disable the text search at runtime. NOTE: When the search is disabled, documents are not collected during the crawler schedule, and the Authoring portlet search does not work. By default, this value is set to false. Once WebSphere Portal is installed successfully, enable TextSearch by setting this property to true.
Set the Convertor to extract the binary content (jcr.textsearch.convertor = com.ibm.icm.ts.convertor.WpsConvertor). As mentioned previously, there are three convertor options available for this property. The recommended option is WpsConvertor, which calls the Document Conversion Service internally in JCR.
Create the index directory in the location specified (jcr.textsearch.indexdirectory = c:/IBM/WebSphere/wp_profile/PortalServer/jcr/search).
Set the PSE type as localhost (jcr.textsearch.PSE.type=localhost). This is the value to be set in a standalone system. The other options are Simple Object Access Protocol (SOAP) and Enterprise JavaBeans (EJBs), which are used to configure the remote search service for a clustered environment. We will see the different options in detail later in this document.
Set the Incremental Topic used during Incremental crawl (jcr.textsearch.incrementalcrawl.topic = jms/JCRSeedTopic1100).
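As a rough illustration of the standalone settings listed above, the following Python sketch parses the text of an icm.properties file and reports any TextSearch property that differs from the values described in this section. The function names, the sample text, and the simplified key=value parsing (no line continuations) are assumptions made for this example; they are not part of WebSphere Portal.

```python
# Values described in this section for a standalone system.
EXPECTED = {
    "jcr.textsearch.enabled": "true",
    "jcr.textsearch.convertor": "com.ibm.icm.ts.convertor.WpsConvertor",
    "jcr.textsearch.PSE.type": "localhost",
}
# Properties that must be present; their values depend on the installation.
REQUIRED = (
    "jcr.textsearch.indexdirectory",
    "jcr.textsearch.incrementalcrawl.topic",
)


def parse_properties(text):
    """Parse simple key=value lines (assumption: no line continuations)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_textsearch(props):
    """Return a list of readable problems; an empty list means all checks pass."""
    problems = []
    for key, expected in EXPECTED.items():
        actual = props.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")
    for key in REQUIRED:
        if not props.get(key):
            problems.append(f"{key}: missing or empty")
    return problems


# Hypothetical file content, built from the property values shown above.
SAMPLE = """
jcr.textsearch.enabled=true
jcr.textsearch.convertor = com.ibm.icm.ts.convertor.WpsConvertor
jcr.textsearch.indexdirectory = c:/IBM/WebSphere/wp_profile/PortalServer/jcr/search
jcr.textsearch.PSE.type=localhost
jcr.textsearch.incrementalcrawl.topic = jms/JCRSeedTopic1100
"""

if __name__ == "__main__":
    for problem in check_textsearch(parse_properties(SAMPLE)):
        print(problem)
```

For a clustered setup, the expected value of jcr.textsearch.PSE.type in this sketch would have to be changed to SOAP or EJB.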
2. For a standalone environment, select Default Portal Search Service in the Search service field (see figure 2).
3. Keep the Name of the Collection as JCRCollection1, and specify the location of the Search Collection, using the same location that is set in icm.properties. The default language for the Collection is English, which will be used as the indexing language during crawling. The index language enhances the quality of search results that are returned. All other fields are optional.
4. In this example, we specify the location of the Collection as C:\JCR, which creates the search index directory in the location C:\JCR as shown in figure 3.
5. Once the Collection is created, it displays in the list of Search Collections. Click the JCRCollection1 collection to create a new content source, as shown in figure 4.
6. The content source handles indexing the documents and is where you specify the crawler parameters. When specifying the content source parameters, choose Seedlist Provider as the content source type and provide a name for the new content source, in this case, JCRContentSource (see figure 5). Also in figure 5, specify the value for the URL as follows:
http://server name:portnumber/seedlist/server?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=5&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=1@
In this URL, the Range parameter specifies the number of documents in a single page of a crawler session. The Retriever sends the response to the crawler in XML ATOM format, as a series of pages, each of which contains up to Range documents.
7. For example, if there is a list of 100 updates to be indexed, and the range is set to 10, then the retriever sends 10 pages to the crawler, each of which contains 10 documents. By default, the range value is set to 100. The administrator can change the range parameter in the URL based on the portal requirement. The crawler has a timeout of 10 minutes set internally; however, if the portal is too slow and the Retriever is unable to retrieve 100 documents in 10 minutes, then the administrator can reduce the range value.
Figure 5. Configure JCR Content Source for JCRCollection1
8. Still in figure 5, under the General Parameters section, configure the parameters for the Content Source:
Levels of links to follow. Use the drop-down menu to select how many levels of pages the crawler will follow from the Seedlist. It is unlimited.
Number of documents to collect. Use this to configure the maximum number of documents to collect.
Force complete crawl. Indicates whether the crawler needs to fetch only updates from the Seedlist feed or the full list of content. When enabled (checked), the crawler will request the full list of content items. When unchecked, the crawler will request only the list of updates.
Stop collecting after. Indicates the maximum duration, in minutes, that the crawler should operate.
Stop fetching a document. Indicates how much time, in seconds, the crawler will spend attempting to fetch a document.
If the Content Source is created successfully, a message will display at the top of the page, as shown in figure 6.
Figure 6. Content Source JCRSource in collection JCRCollection1 is OK message
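The paging behavior of the Range parameter described above can be sketched in Python as follows. This is only an illustration: the helper names, the host name portal.example.com, and port 10039 are hypothetical, and the SeedlistId value is copied exactly as it is printed in the URL above.

```python
import math
from urllib.parse import urlencode

# Retriever factory class name, taken from the content source URL above.
JCR_SOURCE = "com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory"


def page_count(total_updates, range_size):
    """Number of pages the retriever sends for a given Range value."""
    if range_size <= 0:
        raise ValueError("range_size must be positive")
    return math.ceil(total_updates / range_size)


def seedlist_url(host, port, range_size, start=0, seedlist_id="1@"):
    """Build a seedlist URL with the same parameters as the example above."""
    params = [
        ("Action", "GetDocuments"),
        ("Format", "ATOM"),
        ("Locale", "en_US"),
        ("Range", range_size),
        ("Source", JCR_SOURCE),
        ("Start", start),
        ("SeedlistId", seedlist_id),
    ]
    # safe="@" keeps the "@" in SeedlistId unescaped, as in the paper's URL.
    return f"http://{host}:{port}/seedlist/server?" + urlencode(params, safe="@")


if __name__ == "__main__":
    print(page_count(100, 10))  # the example from the text: 100 updates, Range=10
    print(seedlist_url("portal.example.com", 10039, 10))
```

Reducing the Range value increases the number of pages but reduces the work the Retriever must finish for each page within the crawler timeout.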
NOTE: The instructions for creating the JCRCollection1 collection are in the WebSphere Product Documentation topic, Setting up JCR search collections. Using this collection and content source, you are able to search for items within the WCM authoring portlet.
If JCRCollection1 is created manually, then the scheduling interval must be configured for the Content Source so that the crawler runs automatically at the configured interval. To do this:
1. Use the Schedulers tab to set the frequency with which the crawler should run to update the search content (see figure 7).
2. Choose the date, time, and update interval when the crawler should start running. Click Create.
For JCRCollection1, which is created automatically by the application, the index maintenance is scheduled to run every 60 minutes. If you want to change this frequency, you can configure it in the scheduler. Delete the existing scheduled Updates, and choose the day, time, and interval; click Create. Figure 8 shows that the scheduler is configured to run on Jan 13, 2011 at 2:00 PM; thereafter, it continues to run every 30 minutes.
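To make the scheduling arithmetic concrete, the short Python sketch below lists the first few run times for the schedule shown in figure 8. The function name is hypothetical, and the sketch assumes a simple fixed-interval schedule; it does not call the WebSphere Portal scheduler.

```python
from datetime import datetime, timedelta


def scheduled_runs(first_run, interval_minutes, count):
    """Return the first `count` run times of a fixed-interval schedule."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    step = timedelta(minutes=interval_minutes)
    return [first_run + i * step for i in range(count)]


# Schedule from figure 8: first run on Jan 13, 2011 at 2:00 PM, then every 30 minutes.
runs = scheduled_runs(datetime(2011, 1, 13, 14, 0), 30, 4)

if __name__ == "__main__":
    for run in runs:
        print(run.strftime("%b %d, %Y %I:%M %p"))
```

Because the first scheduled run of a full crawl only prepares the index, the second entry in such a list is the earliest run at which documents can be expected in the collection.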
Consider a case in which we have a vertical clustered setup with two nodes configured (members WebSphere_Portal and WebSphere_Portal_jcrislgrp1). The WebSphere_Portal node is running on port 10079, and the WebSphere_Portal_jcrislgrp1 node is running on port 10133 (see figure 10).
In this case, you can configure the remote search service either by using EJBs or as a Web service via SOAP. Before doing so, however, you must have the required .zip files and install the EJB or SOAP application for the remote search service. Follow the instructions in the Product Documentation topic, Preparing for remote search service, to prepare the system for the remote search service. In this paper, we show the remote search service using SOAP.
2. Depending on the requirements of your environment, install one of the two applications WebScannerEJbEar.ear (for EJB Service) or WebScannerSoap.ear (for SOAP Service) on server1 (see figure 12).
Figure 12. WAS Admin Console showing the SOAP application running
3. Extract the PSE libraries and add them to the classpath on server1, as follows:
a) Create a directory with the name Extract under the directory installableApps.
b) Locate the file PseLibs.zip in the directory installableApps and extract its contents into the Extract directory that you created in the previous step.
c) Open the administrative console, select Environment > Shared Libraries, and create or modify the new shared library named PSE (see figure 13).
d) Add the extract/lib library to the classpath by adding a new line to the Classpath field, giving the full path, was_profile_root/installableApps/extract/lib.
e) Save your changes to the master configuration.
4. Define a new Classloader for server1, as follows (see figure 14):
a) In the WAS administrative console, select Servers > Server types > WebSphere application servers > server1.
b) Under Server Infrastructure, select Java and Process Management, and click Classloaders.
c) Click New and then click Apply.
d) Under Additional Properties, click Libraries, then click Add. Select the Library Name PSE from the drop-down list, and click OK. Save your changes to the master configuration.
5. Now we must determine the required values for configuring the portlet parameters; specifically, the value for the port number for the SOAP URL parameter. The appropriate port number for the SOAP URL parameter is the port on which the application server runs; in other words, the HTTP transport on which server1 is configured to run. To determine the correct port number, on the administrative console, select Application servers > server1 > Ports > WC_defaulthost (see figure 15).
6. Make sure that the port number set in the following file matches this port:
was_profile_root/installedApps/cell/WebScannerEar.ear/WebScannerSoap.war/wsdl/com/ibm/hrl/portlets/WsPSE/WebScannerLiteServerSOAPService.wsdl
where cell is the cell name of your remote search machine.
7. Edit the file, looking for the port given in the value for the SOAP address location, for example (see figure 16):
<soap:address location="http://localhost:your_port_no/WebScannerSOAP/servlet/rpcrouter"/>
8. In the administrative console, select Resources > Asynchronous beans > Work managers, and create a new Work manager named PSEWorkManager with the following attributes (see figure 17):
Name: PSEWorkManager
JNDI Name: wps/searchIndexWM
Minimum Number of Threads: 20
Maximum Number of Threads: 60
Growable: True (ensure that the Growable check box is selected)
Service Names: Application Profiling Service, WorkArea, Security, Internationalization
9.
10. Finally, open the WAS administrative console, select Applications > Application Types > WebSphere enterprise applications, and scroll to WebScannerEar. You can use the filter feature to search for these names. Click the check box and click Start. A message confirms that the application started successfully.
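The port comparison in steps 5 to 7 above can also be checked with a few lines of Python. The sketch below reads the port from the soap:address element of a WSDL document. It assumes that the WSDL uses the standard SOAP 1.1 binding namespace; the sample WSDL fragment, including the service and port names, is a hypothetical stand-in for the real WebScannerLiteServerSOAPService.wsdl file, and port 10054 is used only as an example value.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Standard WSDL 1.1 SOAP binding namespace (assumed to be the one used by the file).
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"


def soap_address_port(wsdl_text):
    """Return the port number found in the first soap:address location."""
    root = ET.fromstring(wsdl_text.strip())
    address = root.find(f".//{{{SOAP_NS}}}address")
    if address is None:
        raise ValueError("no soap:address element found")
    return urlparse(address.get("location")).port


# Hypothetical, heavily shortened WSDL used only to exercise the function.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
  <service name="WebScannerLiteServerSOAPService">
    <port name="rpcrouter">
      <soap:address location="http://localhost:10054/WebScannerSOAP/servlet/rpcrouter"/>
    </port>
  </service>
</definitions>
"""

if __name__ == "__main__":
    wc_defaulthost_port = 10054  # value read from the administrative console
    found = soap_address_port(SAMPLE_WSDL)
    print("match" if found == wc_defaulthost_port else f"mismatch: WSDL has {found}")
```

To check a real installation, the content of the WSDL file named in step 6 would be passed to soap_address_port() instead of the sample text.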
4. For a clustered setup, configure the DefaultCollectionDirectory as a local directory in the remote search server that is accessible by all the nodes in the setup. Once the service is configured, it appears in the Search Services page (see figure 20).
5. Select the configured remote service Remote SOAP and manually create the JCRCollection1 collection with the collection location in the remote search server, as shown in figures 21 and 22.
6. After the Collection and Content Source are created, update the remote search service properties in the icm.properties JCR configuration file. Here are the configuration properties essential for JCR TextSearch:
Set the Portal Search Engine type as SOAP or EJB. In our case, we set it to SOAP:
jcr.textsearch.PSE.type=SOAP
Set the Portal Search Engine SOAP URL, for example:
jcr.textsearch.SOAP.url=http://9.124.160.188:10054/WebScannerSOAP/servlet/rpcrouter
You should set its value to:
http://your_soap_search_server.your.example_domain.com:port/WebScannerSOAP/servlet/rpcrouter
where your_soap_search_server.your.example_domain.com is the name of the remote search server, and port is the port number that you obtained from the file
was_profile_root/installedApps/cell/WebScannerEar.ear/WebScannerSoap.war/wsdl/com/ibm/hrl/portlets/WsPSE/WebScannerLiteServerSOAPService.wsdl
7. Edit the file, looking for the port given in the value for the SOAP address location, for example:
<soap:address location="http://localhost:your_port_no/WebScannerSOAP/servlet/rpcrouter"/>
When you enter the URL in a Web browser, you should see something like that shown in figure 24.
Figure 24. SOAP RPC Router in Remote Search Server
Since the remote search server is configured in one of the nodes of the clustered environment, the deployment manager is updated with these settings, and the same data appears in the Administration portlet of the other clustered nodes. Hence, you should be able to see the Remote SOAP search service and JCRCollection1 on the secondary node, WebSphere_Portal_jcrislgrp1, which is running on port 10133 (see figure 25).
2. Enter a name for the bus; here, it's JCRBus. The name should be the same as specified in the JCR icm.properties file, property name jcr.textsearch.busName=JCRBus (figure 27).
3. Uncheck the Bus security option to disable bus security; click Next. A summary of the Bus creation, with administrative security settings disabled for the bus, is displayed (figure 28).
4. Review the summary, and click Finish. The JCRBus displays as shown in figure 29. Save your changes to the master configuration.
3. Select the Cluster scope in WAS environments that support server clusters (see figure 31).
4. Select the cluster and click Next. Select the Enable messaging engine policy assistance? check box (see figure 32).
Figure 32. Messaging engine policy assistance for the selected Bus member
5. Select Data store as the type of the message store and click Next (see figure 33).
6.
7. Select Use existing data source, and enter the JNDI name, the name of the schema, and the authentication alias to be used (see figure 35). The JNDI name is jdbc/wpdbDS_<jcr target db name>; for example, jdbc/wpdbDS_jcr. (Refer to the icm.properties file for the JNDI name.)
8. Click Next. Optional: You can view the current settings of the initial and maximum Java Virtual Machine (JVM) heap sizes. If you want to tune performance by changing the current settings, select the Change heap sizes check box and enter the changes in the proposed heap sizes fields.
9. Click Next. A summary of the added bus member in the PortalCluster scope displays (see figure 36).
10. Click Finish to confirm the creation of the bus member, and save your changes to the master configuration.
11. Restart the WebSphere Portal Server and Deployment Manager. After the restart, if you check the status of the messaging bus, you can confirm that it is started (see figure 37).
The JMS resources such as Topic Connection factories and Topics are created during WebSphere Portal installation, so you do not need to create them afterward for either a standalone or a cluster setup. For the WCM Authoring Search to work successfully, you should find the Topic Connection factories (figure 38) and Topics (figure 39) in the Deployment Manager console under JMS resources.
Farm environment
In this scenario, WebSphere Portal search should be configured as if it is part of a clustered environment. The search server should be set up as a separate portal instance outside the farm and configured to search the farm. Use the Remote Search Service to configure the search server.
Once the crawler completes its processing, you can see that the document count is 296, and the Status is Idle (see figure 41).
If you want the crawler to collect the documents automatically, then wait for two scheduled indexing runs to occur. For example, if you've configured the interval as 1 hour, then check after 2 hours; you'll see that the document count is a value greater than zero.
1. Now, edit any WCM content and save it (figure 42).
2. Wait until the next index maintenance interval, or run the crawler manually. Figure 43 shows that the crawler has collected one document that was modified.
3. To check whether the document was indexed successfully, we can use the Search and Browse the Collection (spectacles) icon in the Manage Search Collections from All Services window (see figure 44). You can use this to search for content and information directly against the search collection, which differs from searching in the WCM Authoring portlet.
4. Type the string on which you want to search in the "Search for" entry field, and click Search. Search and Browse displays the search results in a table (see figure 45).
NOTE: The WCM Authoring portlet uses the JCR XPath query to fetch search results from the repository. For more information about searching with an XPath query, refer to the developerWorks article, Java content repository TextSearch in IBM WebSphere Portal and IBM Lotus Web Content Management: Overview and troubleshooting.
5. Now, if you search for the edited document in the WCM Authoring portlet (see figure 46), the results will display as shown in figure 47.
The XPath query for the search text Malars test document is [text-contains(.,'Malar* [SUBTREE_UUID]:[d10b57d5-c1f0-4f0b-8188-5554a5d5ff79]')] order by text-score(.,'Malar* [SUBTREE_UUID]:[d10b57d5-c1f0-4f0b-8188-5554a5d5ff79]') descending.
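The structure of the query above can be reproduced with a small Python helper. This is a sketch modeled only on the single example shown here: the function name is hypothetical, and because the paper does not describe how quotation marks inside a pattern are escaped, the helper simply rejects them.

```python
def text_search_clause(pattern, subtree_uuid):
    """Build the text-contains()/text-score() clause in the form shown above."""
    if "'" in pattern or "'" in subtree_uuid:
        # Quoting rules are not covered by the paper, so refuse instead of guessing.
        raise ValueError("single quotes are not supported in this sketch")
    term = f"{pattern} [SUBTREE_UUID]:[{subtree_uuid}]"
    return (
        f"[text-contains(.,'{term}')] "
        f"order by text-score(.,'{term}') descending"
    )


if __name__ == "__main__":
    print(text_search_clause("Malar*", "d10b57d5-c1f0-4f0b-8188-5554a5d5ff79"))
```

The same search term appears twice in the clause: text-contains() selects the matching nodes, and text-score() orders them by relevance.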
As indicated by the error, TextSearch is not enabled. You need to enable TextSearch by setting the property jcr.textsearch.enabled to true.
(2) Incremental crawl in WebSphere Portal 7 uses the JMS messaging engine. If the JMS Topic Connection factory does not exist in WAS, the crawl fails with the following message in the SystemErr logs:
[7/30/10 13:29:28:187 EDT] 0000002e SystemErr R Caused by: javax.naming.NameNotFoundException: Context: rtp33/nodes/rtp33/servers/WebSphere_Portal, name: jms/JCRSeedTCF: First component in name jms/JCRSeedTCF not found. [Root exception is org.omg.CosNaming.NamingContextPackage.NotFound: IDL:omg.org/CosNaming/NamingContext/NotFound:1.0]
You need to ensure that the Topic Connection factory (figure 48) and Incremental Crawl Topic (figure 49) named in the following properties are created in WAS:
jcr.textsearch.crawl.tcf=jms/JCRSeedTCF
jcr.textsearch.incrementalcrawl.topic=jms/JCRSeedTopic1100
Also, ensure the JCRBus as mentioned in property jcr.textsearch.busName=JCRBus is running (see figure 50).
(3) If the database is transferred from Derby to another database such as DB2, ensure that the following ConfigEngine task has been run to create the JMS resources in the new database:
./ConfigEngine.sh create-jcr-jms-resources-post-dbxfer -DWasPassword=password
Failure to run this task may cause the crawler to fail to collect documents in JCRCollection1.
(4) When Document Conversion Services is unable to extract content from binary documents, JCR throws a com.ibm.icm.ts.ConverterException exception:
Caused by: com.ibm.wps.odc.convert.ConvertorException: com.ibm.wps.odc.convert.ConvertorException: Stellent Conversion Error: file is corrupt
Caused by: com.ibm.wps.odc.convert.PasswordProtectedException: com.ibm.wps.odc.convert.PasswordProtectedException: File is password protected or encrypted
For all types of ConvertorException, such as a corrupt file, a password-protected file, or an unsupported MIME type, JCR indexes all the attributes of the document except the file content. Hence, the document is still searchable.
(5) You are using Microsoft SQL Server with JDBC driver 3.0, and you notice the following SQLExceptions when the WebSphere Portal server starts:
[11/15/10 17:17:01:200 CST] 00000055 SibMessage E [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS0002E: The messaging engine encountered an exception while starting. Exception: com.ibm.ws.sib.msgstore.PersistenceException: CWSIS1501E: The data source has produced an unexpected exception: com.microsoft.sqlserver.jdbc.SQLServerException: Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "SQL_Latin1_General_CP1_CS_AS" in the INTERSECT operation.
[11/15/10 17:17:01:246 CST] 00000055 SibMessage E [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSID0027E: Messaging engine PortalCluster.000-JCRSeedBus cannot be restarted because a serious error has been reported.
[11/15/10 17:17:01:246 CST] 00000055 SibMessage I [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSID0016I: Messaging engine PortalCluster.000-JCRSeedBus is in state Stopped.
This exception prevents JCR Search from collecting documents in JCRCollection1. The solution is to use the SQL Server JDBC Driver version 2.0, because WAS V7 is tested with JDBC Driver version 2.0 and the Microsoft SQL Server 2008 database. WAS V7 currently does not support JDBC driver 3.0. Similarly, WAS V7 is tested with Microsoft SQL Server JDBC Driver version 1.2 and the Microsoft SQL Server 2005 database.
(6) When the WebSphere Portal server starts, you notice the following errors in the logs:
[12/10/10 9:49:27:739 EST] 0000000b SibMessage E [JCRSeedBus:wpslesblade05.WebSphere_Portal-JCRSeedBus] CWSIS0002E: The messaging engine encountered an exception while starting. Exception: com.ibm.ws.sib.msgstore.PersistenceException: CWSIS1501E: The data source has produced an unexpected exception: java.lang.IllegalStateException: CWSIS1530E: The data type, -9, was found instead of the expected type, 12, for column, URI, in table, jcr.SIBCLASSMAP.
[12/10/10 9:49:27:781 EST] 0000000b WSRdbManagedC W DSRA1300E: Feature is not implemented: javax.sql.PooledConnection.removeStatementEventListener
[12/10/10 9:49:27:784 EST] 0000000b SibMessage E [JCRSeedBus:wpslesblade05.WebSphere_Portal-JCRSeedBus] CWSID0035E: Messaging engine wpslesblade05.WebSphere_Portal-JCRSeedBus cannot be started; detected error reported during com.ibm.ws.sib.msgstore.impl.MessageStoreImpl start()
[12/10/10 9:49:27:786 EST] 0000000b SibMessage E [JCRSeedBus:wpslesblade05.WebSphere_Portal-JCRSeedBus] CWSID0027E: Messaging engine wpslesblade05.WebSphere_Portal-JCRSeedBus cannot be restarted because a serious error has been reported.
[12/10/10 9:49:27:787 EST] 0000000b SibMessage I [JCRSeedBus:wpslesblade05.WebSphere_Portal-JCRSeedBus] CWSID0016I: Messaging engine wpslesblade05.WebSphere_Portal-JCRSeedBus is in state Stopped.
This issue occurs on servers whose WAS version is 7.0.0.11 or earlier. The exception prevents JCR Search from running. The solution is to implement the workaround described in the IBM Support document, PK11027: Messaging engine startup fails for some Oracle driver/server levels, whereby you update the sib.properties file to add the line
sib.msgstore.jdbcPerformColumnChecks=false
Turning off column checking resolves the issue. The problem is corrected by APAR PM13911 in WAS Fix Pack 7.0.0.13 and later, per the Support document titled, PM13911: ILLEGALSTATEEXCEPTION ATTEMPTING TO START A MESSAGING ENGINE AGAINST AN SQL SERVER 2008 DATABASE WITH A JDBC 2.0 DRIVER.
(7) In a clustered environment, you notice log errors during WebSphere Portal startup, and the JCR Search does not work. For example, you may find the following errors and exceptions in the WebSphere Portal server's SystemOut.log file:
[12/7/10 20:53:19:975 EST] 00000020 SibMessage I [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS1538I: The messaging engine, ME_UUID=4A76DD636E4D7F8D, INC_UUID=31828889C3AEC661, is attempting to obtain an exclusive lock on the data store.
[12/7/10 20:53:20:334 EST] 00000021 SibMessage I [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS1545I: A single previous owner was found in the messaging engine's data store, ME_UUID=5A4BB5D8F066DB65, INC_UUID=66E9D52EA9D8BA08
[12/7/10 20:53:20:339 EST] 00000021 SibMessage E [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS1535E: The messaging engine's unique id does not match that found in the data store. ME_UUID=4A76DD636E4D7F8D, ME_UUID(DB)=5A4BB5D8F066DB65
[12/7/10 20:53:20:346 EST] 00000020 SibMessage I [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS1593I: The messaging engine, ME_UUID=4A76DD636E4D7F8D, INC_UUID=31828889C3AEC661, has failed to gain an initial lock on the data store.
[12/7/10 20:53:20:361 EST] 00000020 SibMessage E [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS1519E: Messaging engine PortalCluster.000-JCRSeedBus cannot obtain the lock on its data store, which ensures it has exclusive access to the data.
The reason for the errors is that the JCR Message Engine fails to start. To resolve the problem:
1. Stop the NodeAgent and WebSphere Portal servers, and connect to your database.
2. Drop all tables that are related to the message engine's data store. There are nine tables, all with names beginning with the letters SIB:
SIB000
SIB001
SIB002
SIBCLASSMAP
SIBKEYS
SIBLISTING
SIBOWNER
SIBOWNERO
SIBXACTS
3. Restart the Deployment Manager, NodeAgent, and WebSphere Portal servers, to recreate the tables automatically with the correct unique ID for the message engine's data store.
This is documented in the IBM Support Technote, The messaging engine's unique ID does not match that found in the data store.
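As a convenience for step 2 above, the following Python sketch prints one DROP TABLE statement for each of the nine tables. It only generates text and does not connect to any database. The function name is hypothetical, and the schema name jcr is an assumption (it matches the jcr.SIBCLASSMAP table name that appears in one of the log excerpts in this section); use the schema that holds the message engine's data store in your environment.

```python
# The nine message engine data store tables listed in this section.
SIB_TABLES = [
    "SIB000", "SIB001", "SIB002", "SIBCLASSMAP", "SIBKEYS",
    "SIBLISTING", "SIBOWNER", "SIBOWNERO", "SIBXACTS",
]


def drop_statements(schema):
    """Return one DROP TABLE statement per message engine table."""
    if not schema.isidentifier():
        # Guard against pasting anything other than a plain schema name.
        raise ValueError(f"unexpected schema name: {schema!r}")
    return [f"DROP TABLE {schema}.{table};" for table in SIB_TABLES]


if __name__ == "__main__":
    # "jcr" is an assumed schema name for illustration only.
    for statement in drop_statements("jcr"):
        print(statement)
```

The statements should be run only while the NodeAgent and WebSphere Portal servers are stopped, as described in step 1.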
(8) After installation, you set jcr.textsearch.enabled=true and restart WebSphere Portal. However, when you repeatedly run the crawler manually from the Administration portlet, you notice the exception "Full Crawl Topics are not free. Please retry later." appearing in the log:
[7/23/10 11:49:52:340 EST] 0000007a JCRCFLLoggerI E com.ibm.icm.ts.tss.JCRCFLLoggerImpl com.ibm.icm.ts.tss.ls.LibraryServerImpl.retrieveTopicFromPool [WebContainer : 8]: Full Crawl Topics are not free. Please retry later.
com.ibm.icm.ts.tss.TextEngineException: Full Crawl Topics are not free. Please retry later.
at com.ibm.icm.ts.tss.ls.LibraryServerImpl.retrieveTopicFromPool(LibraryServerImpl.java:583)
at com.ibm.icm.ts.tss.app.SubscriptionIndexMaintainer.getTopicFromPool(SubscriptionIndexMaintainer.java:1529)
The error occurs because, when the crawler is repeatedly started manually, the full-crawl process is initiated, starting the publishing of messages for the entire repository repeatedly and exhausting all the full-crawl Topics. Hence, the JCRRetriever throws the above exception.
During the next crawler index schedule, the expired full-crawl topics will be claimed, and JCRRetriever will index the messages in the next interval. We recommend letting the crawler collect documents automatically during the scheduled interval and avoiding repeated manual runs of the crawler.
(9) When your search is not returning a document in the WCM Authoring portlet, you may want to know whether the document is indexed in the JCRCollection1 directory. To do this, search for the keyword directly, using the Search and Browse the Collection (spectacles) icon in the Manage Search Collections from All Services window. If the search returns results, the document is indexed in the collection directory successfully. There could be other reasons for the keyword search not returning results in the Authoring portlet, in which case you need to contact JCR Support. If Search and Browse the Collection does not return results, then wait until the next scheduled indexing interval and search again.
For other issues in TextSearch, collect the Index Maintenance and Search logs. To do this:
1. Go to WebSphere Portal Administration > Enable Tracing.
2. Set the trace to [com.ibm.icm.ts.*=finest] for text search, and remove other JCR traces like [com.ibm.icm.*=finest] in the Portal admin console. This helps to reduce the log file content and debug the issue faster.
3. Edit the document for which search is not working and save it.
4. Manually run the crawler in Administration > Search Administration > Manage Search > Search Collections > JCRCollection1 > JCR Content Source. Wait for, say, 5 minutes to allow the crawler to completely build the index directory.
5. Collect the logs as "index-logs".
6. Search for content, and collect the logs as "search-logs".
Contact JCR Support with the collected logs (SystemOut.log, SystemErr.log, trace.log and all history trace logs) for index maintenance and search.
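When reviewing the collected logs, it can help to list the message codes of the error-level entries first. The Python sketch below does this for log lines in the layout shown in the excerpts in this section (timestamp in brackets, thread ID, component, one-letter level, message). The regular expressions and function name are assumptions based only on those excerpts, and the sample lines are abridged copies of them.

```python
import re

# Layout assumed from the excerpts: [timestamp] thread component level message
LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<thread>[0-9a-fA-F]{8})\s+"
    r"(?P<component>\S+)\s+(?P<level>[A-Z])\s+(?P<message>.*)$"
)
# Message codes such as CWSIS1535E or DSRA1300E, followed by a colon.
CODE = re.compile(r"\b([A-Z]{4,5}\d{4}[A-Z]):")


def error_codes(lines):
    """Return the first message code of every error-level (E) log entry."""
    codes = []
    for line in lines:
        entry = LINE.match(line.strip())
        if entry is None or entry.group("level") != "E":
            continue  # skip continuation lines and non-error entries
        code = CODE.search(entry.group("message"))
        if code:
            codes.append(code.group(1))
    return codes


# Abridged lines from the SystemOut.log excerpt in item (7).
SAMPLE = [
    "[12/7/10 20:53:20:334 EST] 00000021 SibMessage I [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS1545I: A single previous owner was found in the messaging engine's data store",
    "[12/7/10 20:53:20:339 EST] 00000021 SibMessage E [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS1535E: The messaging engine's unique id does not match that found in the data store.",
    "[12/7/10 20:53:20:361 EST] 00000020 SibMessage E [JCRSeedBus:PortalCluster.000-JCRSeedBus] CWSIS1519E: Messaging engine PortalCluster.000-JCRSeedBus cannot obtain the lock on its data store",
]

if __name__ == "__main__":
    for code in error_codes(SAMPLE):
        print(code)
```

The resulting codes can then be matched against the numbered cases in this section, for example CWSIS1535E for case (7).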
7 Conclusion
This paper has presented the new features of JCR TextSearch Indexing in WebSphere Portal 7. Our aim is that WebSphere Portal developers, administrators, and customers who use JCR TextSearch can use this document to help them understand configuration techniques in different environments, and to administer and troubleshoot common issues in WebSphere Portal 7.
8 Resources
developerWorks white paper, Making content searchable anywhere using IBM WebSphere Portal's publishing Seedlist Framework.
developerWorks article, Introducing the Java Content Repository API.
developerWorks article, Using Apache Lucene to search text.
WebSphere Portal wiki article, Java content repository TextSearch Support tools for IBM WebSphere Portal: Overview and usage.
WebSphere Portal Server 7 Product Documentation.
developerWorks WebSphere Portal product page.
Trademarks
DB2, developerWorks, IBM, Lotus, and WebSphere are trademarks or registered trademarks of IBM Corporation in the United States, other countries, or both. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States, other countries, or both. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.