A Study of I/O and Virtualization Performance With A Search Engine Based On An XML Database and Lucene

Ed Bueche, EMC, edward.bueche@emc.com, May 25, 2011
Agenda
My Background
Documentum xPlore Context and History
Overview of Documentum xPlore
Tips and Observations on IO and Host Virtualization
My Background
Ed Bueche
Information Intelligence Group within EMC
EMC Distinguished Engineer & xPlore Architect
Areas of expertise:
  Content Management (especially performance & scalability)
  Database (SQL and XML) and full text search
Previous experience: Sybase and Bell Labs
Part of the EMC Documentum xPlore development team

Pleasanton (CA), Grenoble (France), Shanghai, and Rotterdam (Netherlands)
Documentum search 101

Documentum Content Server provides an object/relational data model and query language
  Object metadata is held in attributes (sample: title, subject, author)
  Sub-types can be created with customer-defined attributes
  Documentum Query Language (DQL) example:
    SELECT object_name FROM foo WHERE subject = 'bar' AND customer_id = 'ID1234'
DQL also supports full text extensions
  Example:
    SELECT object_name FROM foo SEARCH DOCUMENT CONTAINS 'hello world' WHERE subject = 'bar' AND customer_id = 'ID1234'
Introducing Documentum xPlore

Provides integrated search for Documentum,
  but is built as a standalone search engine to replace FAST InStream
Built over EMC xDB, Lucene, and leading content extraction and linguistic analysis software
Documentum Search History-at-a-glance

Almost 15 years of integrated structured/unstructured search

Verity integration (1996-2005)
  Basic full text search through DQL
  Basic attribute search
  1 day to 1 hour latency
  Embedded implementation
FAST integration (2005-2011)
  Combined structured/unstructured search
  2-5 min latency
  Score-ordered results
xPlore integration (2010-???)
  Replaces FAST in DCTM
  Integrated security
  Deep facet computation
  HA/DR improvements
  Latency: typically seconds
  Improved administration
  Virtualization support
Enhancing Documentum Deployments with Search

[Diagram: DCTM client -> Content Server (DQL) -> RDBMS (SQL) search path]
Without full text in a Documentum deployment, a DQL query is directed to the RDBMS
DQL is translated into SQL
However, relational querying has many limitations
Enhancing Documentum Deployments with Search

[Diagram: Documentum client -> Content Server (DQL) -> either RDBMS (SQL) or xPlore search (XQuery, metadata + content)]
DQL for search can be directed to the full text engine instead of the RDBMS (FTDQL)
  This allows the query to be serviced by xPlore
  In this case DQL is translated into XQuery (the query language of xPlore / xDB)
Some Basic Design Concepts behind Documentum xPlore

Inverted indexes are not optimized for all use cases
  B+-tree indexes can be far more efficient for simple, low-latency, highly dynamic scenarios
De-normalization can't efficiently solve all problems
  The update propagation problem can be deadly
  Joins are a necessary part of most applications
Applications need fine control over not only search criteria, but also result sets
Design concepts (cont.)

Applications need fluid, changing metadata schemas that can be efficiently queried
  Adding metadata through joins with side-tables can be inefficient to query
Users want the power of Information Retrieval on their structured queries
Data management, HA, and DR shouldn't be an afterthought
When possible, operate within standards
Lucene is not a database. Most Lucene applications deploy with databases.
Lessons Learned
Indexes, DB, and IR: fit to use-case
[Figure, built up over four slides: technologies matched to the use-cases they fit]
Structured query use-cases (relational DB technology):
  Metadata query
  JOINs
  Transactions
  Advanced data management (partitions)
Unstructured query use-cases (full text index technology):
  Full text searches
  Scoring, relevance, entities
Also shown in the figure:
  Hierarchical data representations (XML)
  Constantly changing schemas
Documentum xPlore
Brings a best-of-breed XML database together with the powerful Apache Lucene full text engine
Provides structured and unstructured search leveraging XML and XQuery standards
Designed for enterprise readiness, scalability, and ingestion
Advanced data management functionality necessary for large-scale systems
Industry-leading linguistic technology and comprehensive format filters
Metrics and analytics
[Architecture diagram, top to bottom:]
  xPlore API
  Indexing Services, Content Processing Services, Analytics, Search Services, Node & Data Management Services, Admin Services
  xDB API
  xDB Query Processing & Optimization
  xDB Transaction, Index & Page Management
EMC xDB: Native XML database

Formerly XHive database
100% Java
XML stored in persistent DOM format
  Each XML node can be located through a 64-bit identifier
  Structure mapped to pages
  Easy to operate on GB-sized XML files
Full transactional database
Query language: XQuery with full text extensions
Indexing & Optimization

Palette of index options the optimizer can pick from
At its simplest: indexLookup(key) -> node id
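A minimal sketch of the indexLookup(key) -> node id idea, assuming a sorted in-memory list as a stand-in for a B-tree value index. The class and method names (SortedValueIndex, index_lookup, range_lookup) are hypothetical and are not part of the xDB API, and the node ids are made-up values.

```python
import bisect


class SortedValueIndex:
    """Toy stand-in for a B-tree value index: sorted keys mapped to node ids."""

    def __init__(self):
        self._keys = []      # kept sorted at all times
        self._node_ids = []  # node id stored at the same position as its key

    def add(self, key, node_id):
        # Insert after any equal keys so duplicates keep insertion order.
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._node_ids.insert(pos, node_id)

    def index_lookup(self, key):
        """Return the node ids of every entry whose key equals `key`."""
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._node_ids[lo:hi]

    def range_lookup(self, low, high):
        """Return node ids for keys in the inclusive range [low, high]."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._node_ids[lo:hi]


# Hypothetical customer_id values and node ids, for illustration only.
index = SortedValueIndex()
index.add("ID1234", 101)
index.add("ID0042", 102)
index.add("ID1234", 103)
index.add("ID9000", 104)

exact = index.index_lookup("ID1234")
ranged = index.range_lookup("ID0000", "ID2000")
```

An optimizer choosing from a palette of indexes would, in this picture, pick whichever index object can answer the predicate with the fewest lookups; that selection logic is not shown here.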
Libraries / Collections & Indexes
[Diagram: nested xDB libraries labeled A, B, and C. Legend: xDB library / xPlore collection; xDB index; xDB XML file (dftxml, tracking xml, status, metrics, audit); xDB segment]
The scope of an index covers all XML files in all sub-libraries
Lucene Integration
Transactional
  Non-committed index updates are kept in separate (typically in-memory) Lucene indexes
  Recently committed (but dirty) indexes are backed by the xDB log
  A query to the index leverages the Lucene multi-searcher with a filter to apply update/delete blacklisting
Lucene indexes are managed to fit into xDB's ARIES-based recovery mechanism
No changes to Lucene
Goal: no obstacles to being as current as possible
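The multi-searcher with blacklisting can be pictured with a small conceptual sketch. It assumes each index is just a dictionary from term to doc ids; it does not use the Lucene API and is not the xPlore implementation. The function name and the sample data are hypothetical.

```python
def search(term, committed_index, pending_index, blacklist):
    """Search both indexes and hide committed entries that are now stale.

    committed_index / pending_index: dict mapping a term to a list of doc ids.
    blacklist: set of doc ids updated or deleted since the last merge.
    """
    hits = set()
    for doc_id in committed_index.get(term, []):
        # A blacklisted doc was updated or deleted, so its old entry is stale.
        if doc_id not in blacklist:
            hits.add(doc_id)
    # The pending (typically in-memory) index holds the current versions.
    hits.update(pending_index.get(term, []))
    return sorted(hits)


# Committed state: docs 1, 2, 3 contain "hello"; doc 2 contains "world".
committed = {"hello": [1, 2, 3], "world": [2]}

# Since then: doc 2 was rewritten (no longer has "hello"), doc 3 was
# deleted, and doc 4 was added.
pending = {"world": [2], "lucene": [2], "hello": [4]}
blacklist = {2, 3}

hello_hits = search("hello", committed, pending, blacklist)
world_hits = search("world", committed, pending, blacklist)
```

The point of the design is that the large committed index is never modified in place; staleness is handled at query time by the filter.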
Lucene Integration (cont.)

Both value and full text queries supported
  XML elements mapped to Lucene fields
  Tokenized and value-based fields available
Composite key queries supported
  Lucene is much more flexible than traditional B-tree composite indexes
ACL and facet information stored in a Lucene field array
  Documentum's ACL security model is highly complex and potentially dynamic
  Enables secure facet computation
xPlore has Lucene search engine capabilities, plus:

XQuery provides a powerful query & data manipulation language
  A typical search engine can't even express a join
  Creation of arbitrary structure for the result set
  Ability to call language-based functions or Java-based methods
Ability to use B-tree based indexes when needed
  The xDB optimizer decides this
Transactional update and recovery of data/index
Hierarchical data modeling capability
Tips and Observations on IO and Host Virtualization

Virtualization offers huge savings for companies through consolidation and automation
Both disk and host virtualization are available
However, there are pitfalls to avoid:
  One-size-fits-all
  Consolidation contention
  Availability of resources
Tip #1: Don't assume that one size fits all

Most IT shops will create VM or SAN templates that have a fixed resource consumption
  Reduces admin costs
  Example: two-CPU VM with 2 GB of memory
  Deviations from this must be made through a special request
Recommendations:
  Size correctly; don't accept insufficient resources
  Test pre-production environments
Same concept applies for disk virtualization

The capacity of a disk is typically expressed in terms of two metrics: space and I/O capacity
  Space is defined in terms of GBytes
  I/O capacity is defined in terms of I/Os per sec
[Diagram: two disks, one with 50 GB and 100 I/Os per sec capacity, the other with 50 GB and 200 I/Os per sec capacity]
NAS and SAN are forms of disk virtualization

The space associated with a SAN volume (for example) could be striped over multiple disks
  The more disks allocated, the higher the I/O capacity
[Diagram: striped volume with 50 GB and 400 I/Os per sec capacity]
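The relationship between disk count and I/O capacity can be sketched as simple arithmetic. This assumes ideal, evenly spread striping where capacity scales linearly with the number of disks; real arrays add cache and RAID overheads that are not modeled. The function names and the 100 I/Os per sec per-disk figure are illustrative assumptions.

```python
import math


def striped_io_capacity(per_disk_iops, disk_count):
    """Aggregate I/Os per sec of a volume striped evenly over disk_count disks."""
    return per_disk_iops * disk_count


def disks_needed(target_iops, per_disk_iops):
    """Smallest number of disks whose combined capacity meets target_iops."""
    return math.ceil(target_iops / per_disk_iops)


# The same 50 GB of space striped over 1, 2, and 4 disks of 100 I/Os per sec each.
capacities = [striped_io_capacity(100, n) for n in (1, 2, 4)]

# Disks required to sustain a workload of 735 random I/Os per sec.
required = disks_needed(735, 100)
```

The space requirement (50 GB) stays the same in every case; only the I/O capacity changes, which is why sizing by GBytes alone can under-provision a search index.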
Linear mappings and LUNs

[Diagram: four LUNs combined into one logical volume with linear mapping; part of the volume is allocated for the index and the rest is free space]
When mapped directly to physical disks, a linear mapping could concentrate I/O onto fewer drives than desired.
High-end SANs like Symmetrix can handle this situation with virtual LUNs.
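A small sketch of why a linear mapping can concentrate I/O, compared with a striped mapping. The layout (four 50 GB LUNs, a 40 GB index at the start of the volume, a 1 MB stripe unit) and all function names are assumptions chosen for illustration.

```python
from collections import Counter


def lun_for_offset_linear(offset_mb, lun_size_mb):
    """Linear (concatenated) mapping: fill LUN 0 first, then LUN 1, and so on."""
    return offset_mb // lun_size_mb


def lun_for_offset_striped(offset_mb, stripe_mb, lun_count):
    """Striped mapping: rotate across the LUNs one stripe unit at a time."""
    return (offset_mb // stripe_mb) % lun_count


def io_distribution(offsets_mb, mapper):
    """Count how many I/Os land on each LUN under the given mapping."""
    return Counter(mapper(off) for off in offsets_mb)


LUN_COUNT = 4
LUN_SIZE_MB = 50 * 1024
INDEX_SIZE_MB = 40 * 1024
STRIPE_MB = 1

# One I/O per MB of the index, as a stand-in for reads spread over the index.
offsets = range(0, INDEX_SIZE_MB)

linear = io_distribution(
    offsets, lambda off: lun_for_offset_linear(off, LUN_SIZE_MB))
striped = io_distribution(
    offsets, lambda off: lun_for_offset_striped(off, STRIPE_MB, LUN_COUNT))
```

With the linear mapping every I/O lands on LUN 0 because the index fits inside the first LUN, while the striped mapping spreads the same I/Os evenly over all four.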
EMC Symmetrix: Nondisruptive Mobility

Virtual LUN VP Mobility
  Fast, efficient mobility
  Maintains replication and quality of service during relocations
  Supports up to thousands of concurrent VP LUN migrations
Recommendation: work with storage technicians to ensure backend storage has sufficient I/O
[Diagram: a virtual LUN (VLUN) moving between virtual pools. Labels: Flash, 400 GB, RAID 5; Fibre Channel, 600 GB 15K, RAID 1; Tier 2; SATA, 2 TB, RAID 6]
Tip #2: Consolidation Contention

Virtualization provides benefit from consolidation
Consolidation gives resources to whichever workloads are active
  Your resources can be consumed by other VMs and other apps
  Physical resources can be over-stretched
Recommendations:
  Track actual capacity vs. planned
    VMware: track the number of times your VM is denied CPU
    SANs: track % I/O utilization vs. number of I/Os
  For VMware, leverage guaranteed minimum resource allocations and/or allocate to non-overloaded HW
Some VMware statistics

Ready metric
  Generated by vCenter and represents the number of cycles (across all CPUs) in which the VM was denied CPU
  Generated in milliseconds; a real-time sample happens at best every 20 secs
  For interactive apps: as a percentage of offered capacity, > 10% is considered worrisome
Pages-in, pages-out
  Can indicate over-subscription of memory
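A hedged sketch of turning raw Ready samples into a percentage of offered capacity. It assumes the offered capacity of one sample is the interval length times the number of virtual CPUs, since the slide describes the counter as summed across all CPUs; check how your vCenter version reports the value before relying on this. The function names and sample values are hypothetical.

```python
def ready_percent(ready_ms, interval_secs=20, vcpus=1):
    """Ready time as a percentage of the CPU time offered in one interval."""
    offered_ms = interval_secs * 1000 * vcpus
    return 100.0 * ready_ms / offered_ms


def is_worrisome(percent, threshold=10.0):
    """Apply the rule of thumb for interactive apps: above 10% is worrisome."""
    return percent > threshold


# Hypothetical Ready samples in milliseconds, one per 20 sec interval.
samples_ms = [500, 1000, 2500]

percents = [ready_percent(ms) for ms in samples_ms]
flags = [is_worrisome(p) for p in percents]
```

For a single-vCPU VM, a sample of 2,000 ms in a 20 sec interval sits exactly at the 10% line, so only samples above that are flagged.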
Sample %Ready for a production VM with xPlore deployment for an entire week
[Chart: %Ready over one week; y-axis from 0% to 16%. An annotated region marks the official area that indicates pain.]
In this case avg resp time doubled and max resp time grew by 5x
Actual Ready samples during several hour period

[Chart: Ready samples (# of millisecs the VM was denied CPU in 20 sec intervals); y-axis from 0 to 2500]
Some Subtleties with Interactive CPU denial

The Ready metric represents denial upon demand
  Interactive workloads can be bursty
  If there is no demand, then the Ready counter will be low
Poor user response encourages less usage
  Like walking on a broken leg
  Causing fewer Ready samples
[Diagram: a denial spike within a 20 sec interval]
Sharing I/O capacity
If multiple VMs (or servers) are sharing the same underlying physical volumes and the capacity is not managed properly,
  then the available I/O capacity of the volume could be less than the theoretical capacity
This can be seen if the OS tools show that the disk is very busy (high utilization) while the number of I/Os is lower than expected
[Diagram: a volume for another application and a volume for the Lucene application, both spread over the same set of drives and effectively sharing the I/O capacity]
Recommendations on diagnosing disk I/O related issues

On Linux/UNIX
  Have the IT group install SAR and IOSTAT
  Also install a disk I/O testing tool (like Bonnie)
  Compare Bonnie output with SAR & IOSTAT data
    High disk utilization at much lower achieved rates could indicate contention from other applications
    Also, high SAR I/O wait time might be an indication of slow disks
On Windows
  Leverage the Windows Performance Monitor
  Objects: Processor, Physical Disk, Memory
Sample output from the Bonnie tool

    bonnie -s 1024 -y -u -o_direct -v 10 -p 10

This will increase the size of the file to 2 GB. Examine the output. Focus on the random I/O area:

                  ---Sequential Output (sync)----- ---Sequential Input-- --Rnd Seek--
                  -CharUnlk- -DIOBlock- -DRewrite- -CharUnlk- -DIOBlock- --04k (10)--
    Machine    MB K/sec %CPU  K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU   /sec %CPU
    Mach2 10*2024 73928   97 104142  5.3 26246  2.9  8872 22.5 43794  1.9  735.7 15.2

  -s 1024 means that 2 GB files will be created
  -o_direct means that direct I/O (by-passing the buffer cache) will be done
  -v 10 means that 10 different 2 GB files will be created
  -p 10 means that 10 different threads will query those files

Bonnie is an open source disk I/O driver tool for Linux that can be useful for pretesting Linux disk environments prior to an xPlore/Lucene install.
This output means that the random read test saw 735 random I/Os per sec at 15% CPU busy.
Linux indicators compared to bonnie output

iostat output:

    Device:  tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
    sde      206.10  2402.40    0.80       24024    8

SAR -d output:

    09:29:17  DEV      tps     rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
    09:29:27  dev8-65  209.24  4877.97   1.62      23.32     1.62      7.75   3.80   79.59

Notice that at 200+ I/Os per sec the underlying volume is 80% busy. Although there could be multiple causes, one could be that some other VM is consuming the remaining I/O capacity (735 - 209 = 500+).

SAR -u output (note the high I/O wait):

    09:29:17 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
    09:29:27 PM  all  41.37  0.00   5.56     29.86    0.00    23.21
    09:29:27 PM  0    62.44  0.00   10.56    25.38    0.00    1.62
    09:29:27 PM  1    30.90  0.00   4.26     35.56    0.00    29.28
    09:29:27 PM  2    36.35  0.00   3.96     30.76    0.00    28.93
    09:29:27 PM  3    35.77  0.00   3.46     27.64    0.00    33.13
See https://community.emc.com/docs/DOC-9179 for additional example
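One rough way to read the SAR -d sample against the Bonnie result is to extrapolate what the volume would deliver at 100% busy. This is only a sketch: it assumes utilization scales linearly with the I/O rate, which real devices only approximate, and the function names are hypothetical.

```python
def implied_capacity_iops(tps, util_percent):
    """Extrapolate the I/O rate at 100% busy from one (tps, %util) sample."""
    if util_percent <= 0:
        raise ValueError("util_percent must be positive")
    return tps * 100.0 / util_percent


def missing_capacity_iops(benchmark_iops, tps, util_percent):
    """Benchmarked capacity that this host does not appear to be getting."""
    return benchmark_iops - implied_capacity_iops(tps, util_percent)


# Values from the SAR -d sample (tps, %util) and the Bonnie random seek rate.
estimate = implied_capacity_iops(209.24, 79.59)
shortfall = missing_capacity_iops(735.7, 209.24, 79.59)
```

Under that assumption the volume would top out at roughly 263 I/Os per sec, well below the 735 measured by Bonnie, which is consistent with something else consuming the rest of the capacity.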
Tip #3: Try to ensure availability of resources

Similar to the previous issue, but
  the resource displacement is not caused by overload
  Inactivity can cause Lucene resources to be displaced
  Not different from running on a large shared native OS host
Recommendation:
  Periodic warmup
    Non-intrusive
    See next example
IO / caching test use-case

Unselective term search
  100 sample queries
  Avg(hits per term) = 4,300+, max ~60,000
  Searching over 100s of DCTM object attributes + content
Medium result window
  Avg(results returned per query) = 350 (max: 800)
Stored fields utilized
  Some security & facet info
Goal:
  Pre-cache portions of the index to improve response time in scenarios such as reboot, buffer cache contention, & VM memory contention
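A minimal sketch of a periodic warmup, assuming the goal is simply to pull selected index files into the operating system file buffer cache by reading them sequentially. It assumes the index is stored as separate per-structure files selectable by suffix; if the index uses a single compound (CFS) file, selecting parts by file name will not work. The function names, the default suffixes, and the example path are hypothetical, and this is not an xPlore utility.

```python
import os


def warm_file(path, block_size=1024 * 1024):
    """Read a file sequentially in large blocks; returns the bytes read."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def warm_index(index_dir, suffixes=(".fdt", ".fdx")):
    """Warm every file under index_dir whose name ends with one of suffixes.

    The default suffixes target the stored fields files only.
    """
    warmed = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            if name.endswith(suffixes):
                warmed += warm_file(os.path.join(root, name))
    return warmed


# Example call (hypothetical path), e.g. from a scheduled job:
#   warm_index("/data/xplore/lucene-index")
```

Reading with large sequential blocks keeps the warmup cheap relative to the random I/O it is meant to replace, and restricting it to a small part of the index limits how much other cached data it displaces.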
Some xPlore Structures for Search

Dictionary of terms
Posting list (doc-ids for term)
Stored fields (facets and node-ids), from 1st doc to N-th doc
xDB XML store (contains text for summary)
Facet decompression map
Security indexes (b-tree based)
(Frequency and position structures ignored for simplicity)
IO model for search in xPlore

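[Diagram: the I/O path of a search. Search terms (term1, term2) are looked up in the dictionary, then in the posting lists (doc-ids for term), then in the stored fields (xDB node-id plus facet / security info). A security lookup (b-tree based) and the facet decompression map are applied, and the xDB XML store (contains text for summary) supplies the text for the result set.]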
Separation of covering values in stored fields and summary

[Diagram labels:]
  Potentially thousands of hits -> security lookup -> potentially thousands of results -> facet calc
  Stored fields (random access): small structure
  Final facet calc values over thousands of results
  Small number for result window: xDB docs with text for summary (Res-1 summary, Res-2 summary, Res-3 summary, ... Res-350 summary)
xPlore Memory Pool areas at-a-glance

[Diagram: memory areas on the host]
xPlore instance (fixed size) memory
  xPlore caches
  Lucene caches & working memory
  xDB buffer cache
  Other VM working memory
Native code content extraction & linguistic processing memory
Operating system file buffer cache (dynamically sized)
Lucene data resides primarily in OS buffer cache

[Diagram: the memory areas from the previous slide, with the Lucene structures (dictionary of terms, posting lists, stored fields from 1st doc to N-th doc) and the xDB XML store (contains text for summary) shown in the operating system file buffer cache]
Potential for many things to sweep Lucene from that cache
Test Env
32 GB memory
Direct attached storage (no SAN)
1.4 million documents
Lucene index size = 10 GB
Size of internal parts of the Lucene CFS file
  Stored fields (fdt, fdx): 230 MB (2% of index)
  Term dictionary (tis, tii): 537 MB (5% of index)
  Positions (prx): 8.78 GB (80% of index)
  Frequencies (frq): 1.4 GB (13% of index)
Text in xDB stored compressed separately
Some results of the query suite

    Test                   Avg resp to consume   MB pre-cached   I/O per result   Total MB loaded into
                           all results (sec)                                      memory (cached + test)
    Nothing cached         1.89                  0               0.89             77
    Stored fields cached   0.95                  241             0.38             272
    Term dict cached       1.73                  537             0.79             604
    Positions cached       1.58                  8,789           0.74             8,800
    Frequencies cached     1.65                  1,406           0.63             1,436
    Entire index cached    0.59                  10,970          < 0.05           10,970

Linux buffer cache cleared completely before each run
Resp as seen by final user in Documentum
Facets not computed in this example; just a result set returned. With facets, the response time difference is more pronounced.
Mileage will vary depending on a series of factors that include query complexity, composition of the index, and number of results consumed
Other Notes
Caching 2% of the index yields a response time that is only 60% greater than if the entire index were cached.
  Caching cost only 9 secs on a mirrored drive pair
  Caching cost 6,800 large sequential I/Os vs. potentially 58,000 random I/Os
Mileage will vary; factors include
  Phrase search
  Wildcard search
  Multi-term search
SANs can grow I/O capacity as search complexity increases
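The 60% figure can be checked directly from the query suite results, along with a rough "benefit per MB cached" comparison. The numbers are the ones reported in the results table; the function names and the benefit-per-MB measure are illustrative assumptions, not part of the original study.

```python
# Test name -> (avg response in secs, MB pre-cached), from the results table.
RESULTS = {
    "Nothing cached": (1.89, 0),
    "Stored fields cached": (0.95, 241),
    "Term dict cached": (1.73, 537),
    "Positions cached": (1.58, 8789),
    "Frequencies cached": (1.65, 1406),
    "Entire index cached": (0.59, 10970),
}


def slowdown_vs_full_cache(test):
    """Fractional response-time increase relative to the fully cached run."""
    full_secs, _ = RESULTS["Entire index cached"]
    secs, _ = RESULTS[test]
    return secs / full_secs - 1.0


def secs_saved_per_mb(test):
    """Response time saved vs. the uncached run, per MB that was pre-cached."""
    base_secs, _ = RESULTS["Nothing cached"]
    secs, mb = RESULTS[test]
    return (base_secs - secs) / mb


stored_fields_slowdown = slowdown_vs_full_cache("Stored fields cached")

# Rank only the runs that actually pre-cached something.
best = max(
    (name for name, (_, mb) in RESULTS.items() if mb > 0),
    key=secs_saved_per_mb,
)
```

With these numbers the stored-fields run is about 61% slower than the fully cached run, and it gives the largest saving per MB cached among the runs listed.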
Contact
Ed Bueche
edward.bueche@emc.com
http://community.emc.com/people/Ed_Bueche/blog
http://community.emc.com/docs/DOC-8945
