Scaling MySQL: A Case Study of Hyperic HQ

Scott Feldstein, Senior Software Engineer
S297257
What is Hyperic HQ?
Hyperic HQ provides a single remote console that allows operations teams to track performance and event data, create complex alerts and escalations, run diagnostics, and issue control actions.
2008 CommunityOne Conference | developers.sun.com/events/communityone | 2

HQ Database Support
• MySQL
• Oracle
• PostgreSQL (current embedded database solution)

Performance Bottleneck
The Database
• Dependent on hardware performance (CPU, memory, etc.)
• I/O
  ‣ Disk latency
  ‣ Network latency (remote database)
• Slow queries

How Much Data
Medium Size Deployment Scenario
• 300 platforms (300 remote agents collecting data)
• 2,100 servers
• 21,000 services
• 468,000 metrics enabled (20 metrics per resource)
• 20,000 metric data points per minute (average)
• 28,800,000 metric data rows per day

MEASUREMENT_DATA table
  ‣ MEASUREMENT_ID
  ‣ TIMESTAMP
  ‣ VALUE
  ‣ PRIMARY KEY (TIMESTAMP, MEASUREMENT_ID)
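
The deployment figures above are internally consistent. A quick Python check (the variable names are illustrative, not from HQ) derives the metric count and the daily row count from the per-resource and per-minute numbers:

```python
# Sanity check of the medium-size deployment figures (illustrative names only).
platforms, servers, services = 300, 2_100, 21_000
metrics_per_resource = 20
points_per_minute = 20_000  # average insert rate from the slide

resources = platforms + servers + services          # 23,400 resources
enabled_metrics = resources * metrics_per_resource  # 468,000 metrics
rows_per_day = points_per_minute * 60 * 24          # 28,800,000 rows

print(f"{enabled_metrics:,} metrics, {rows_per_day:,} rows/day")
```

Running it prints `468,000 metrics, 28,800,000 rows/day`, matching the slide.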

Metric Data Flow
• The agent collects data and sends reports containing multiple data points to the server
• The server batch-inserts the metric data points
• If the network connection fails, the agent continues to collect, but the server “backfills” the gap with unavailable data points
• When the agent reconnects, its spooled data overwrites the backfilled data points
[Diagram: Agent → Server]
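
A minimal in-memory sketch of why backfilling leads to duplicate keys. It models MEASUREMENT_DATA as a dict keyed by the primary key (TIMESTAMP, MEASUREMENT_ID); the UNAVAILABLE marker value and both function names are hypothetical, chosen only for illustration:

```python
from typing import Dict, Iterable, Tuple

# Sketch only, not HQ code.
Key = Tuple[int, int]   # (timestamp_ms, measurement_id), same order as the primary key
UNAVAILABLE = 0.0       # hypothetical value written while the agent is unreachable


def backfill(table: Dict[Key, float], measurement_id: int,
             start_ms: int, end_ms: int, interval_ms: int) -> None:
    """Server side: fill the gap [start_ms, end_ms) with 'unavailable' points."""
    for ts in range(start_ms, end_ms, interval_ms):
        table.setdefault((ts, measurement_id), UNAVAILABLE)


def apply_spooled(table: Dict[Key, float],
                  points: Iterable[Tuple[int, int, float]]) -> None:
    """Agent reconnects: spooled (timestamp, measurement_id, value) points
    overwrite any backfilled point that has the same primary key."""
    for ts, measurement_id, value in points:
        table[(ts, measurement_id)] = value
```

Every spooled point that lands on a backfilled key is a duplicate-primary-key row from the database's point of view, which is the case the INSERT ... ON DUPLICATE KEY UPDATE retry deals with.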

MySQL Batch Insert Statement
• Syntax: INSERT INTO table (a, b, c) VALUES (0,0,0), (1,1,1), (2,2,2), (3,3,3), ...
• Extremely fast, since a whole batch of inserts needs only one round trip to the database
• The only limit on statement size is the server configuration variable max_allowed_packet
• Other options for increasing insert speed:
  ‣ SET unique_checks=0, insert, SET unique_checks=1
  ‣ SET foreign_key_checks=0, insert, SET foreign_key_checks=1
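
A sketch of how an application might pack metric rows into multi-row INSERT statements while staying under a size limit such as max_allowed_packet (in practice you would leave some headroom below the server setting). The table and column names follow the MEASUREMENT_DATA slide; the function is hypothetical, only builds SQL text, and assumes purely numeric values (real code should use the driver's parameter binding):

```python
from typing import List, Sequence, Tuple

# Sketch only, not HQ code.
Row = Tuple[int, int, float]  # (measurement_id, timestamp_ms, value)


def build_batch_inserts(table: str, rows: Sequence[Row], max_bytes: int) -> List[str]:
    """Pack rows into as few multi-row INSERT statements as possible, keeping
    each statement at or under max_bytes (ASCII, so 1 char = 1 byte).
    A single row longer than the limit still gets its own statement."""
    prefix = f"INSERT INTO {table} (measurement_id, timestamp, value) VALUES "
    statements: List[str] = []
    tuples: List[str] = []
    size = len(prefix)
    for mid, ts, value in rows:
        tup = f"({mid},{ts},{value!r})"
        extra = len(tup) + (1 if tuples else 0)  # +1 for the separating comma
        if tuples and size + extra > max_bytes:
            # Flush the current statement and start a new one with this row.
            statements.append(prefix + ",".join(tuples))
            tuples, size, extra = [], len(prefix), len(tup)
        tuples.append(tup)
        size += extra
    if tuples:
        statements.append(prefix + ",".join(tuples))
    return statements
```

A caller could run the resulting statements between SET unique_checks=0 and SET unique_checks=1 on the same connection, as listed above.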

INSERT ... ON DUPLICATE KEY UPDATE
• The application is time-sensitive: in some circumstances (for example, spooled data arriving for backfilled data points) a batch contains rows whose primary key already exists, and those rows' values have to be updated
• When a batch insert fails, retry the batch with the INSERT ... ON DUPLICATE KEY UPDATE syntax
• On the other databases, by contrast, HQ iteratively updates the failed rows and attempts a batch insert on the rest, repeating the process until the batch has completed
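
A hypothetical helper that builds the retry statement for a failed batch. It uses the standard MySQL form `ON DUPLICATE KEY UPDATE value = VALUES(value)`, which overwrites VALUE when the (TIMESTAMP, MEASUREMENT_ID) key already exists; note that MySQL 8.0.20 and later deprecate VALUES() in this clause in favor of a row alias, although it still works. As before, the sketch assumes numeric values and builds SQL text only:

```python
from typing import Sequence, Tuple

# Sketch only, not HQ code.
Row = Tuple[int, int, float]  # (measurement_id, timestamp_ms, value)


def build_upsert(table: str, rows: Sequence[Row]) -> str:
    """Multi-row INSERT that updates VALUE for rows whose
    (TIMESTAMP, MEASUREMENT_ID) primary key already exists."""
    if not rows:
        raise ValueError("rows must not be empty")
    values = ", ".join(f"({mid},{ts},{value!r})" for mid, ts, value in rows)
    return (
        f"INSERT INTO {table} (measurement_id, timestamp, value) "
        f"VALUES {values} "
        "ON DUPLICATE KEY UPDATE value = VALUES(value)"
    )
```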

Batch Aggregate Inserter
• Queue metric data from separate agent reports
• Minimize the number of insert statements, connections, and CPU load
• Maximize workload efficiency
• Optimal configuration for 700 agents:
  ‣ Workers: 3
  ‣ BatchSize: 2000
  ‣ QueueSize: 4000000
• Peak of 2.2 million metric data inserts per minute
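
A rough Python sketch of the queue-and-workers idea. The worker count, batch size, and queue size default to the values above; the class, the insert_batch callback, and the timeout-based shutdown are assumptions made for illustration, not HQ's actual implementation:

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

# Sketch only, not HQ code.
Row = Tuple[int, int, float]  # (measurement_id, timestamp_ms, value)


class BatchAggregateInserter:
    """Rows from many agent reports share one queue; a few workers drain it
    in batches of up to batch_size rows, one insert call per batch."""

    def __init__(self, insert_batch: Callable[[List[Row]], None], workers: int = 3,
                 batch_size: int = 2000, queue_size: int = 4_000_000) -> None:
        self._queue: "queue.Queue[Row]" = queue.Queue(maxsize=queue_size)
        self._insert_batch = insert_batch  # hypothetical: e.g. runs one multi-row INSERT
        self._batch_size = batch_size
        self._stopping = threading.Event()
        self._threads = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(workers)]
        for t in self._threads:
            t.start()

    def enqueue_report(self, rows: Iterable[Row]) -> None:
        for row in rows:
            self._queue.put(row)  # blocks while the queue is full (back-pressure)

    def _work(self) -> None:
        # Keep running until shutdown was requested and the queue is drained.
        while not (self._stopping.is_set() and self._queue.empty()):
            try:
                batch = [self._queue.get(timeout=0.1)]
            except queue.Empty:
                continue
            # Aggregate whatever else is already queued, up to batch_size rows.
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            self._insert_batch(batch)

    def shutdown(self) -> None:
        """Call after the last enqueue_report; waits until queued rows are inserted."""
        self._stopping.set()
        for t in self._threads:
            t.join()
```

Because rows from many small agent reports are merged into one batch, the number of INSERT statements depends on the batch size rather than on the number of agents.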

Data Consolidation
• Inspired by RRDtool, an open-source round-robin database for storing and displaying time-series data
• Lower-resolution tables track min, avg, and max
• The table storing all collected data points (the most active table) is capped at 2 days' worth of data
• Data compression runs hourly
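
The hourly roll-up can be sketched in Python as grouping raw points into fixed windows per metric and keeping avg, min, and max, the three values the lower-resolution tables store. This is an illustrative in-memory version, not HQ's SQL; the function and type names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Sketch only, not HQ code.
HOUR_MS = 3_600_000

Point = Tuple[int, int, float]        # (measurement_id, timestamp_ms, value)
Summary = Tuple[float, float, float]  # (avg, min, max)


def roll_up(points: Iterable[Point],
            bucket_ms: int = HOUR_MS) -> Dict[Tuple[int, int], Summary]:
    """Group raw points into fixed windows (hourly by default) per metric
    and keep avg, min and max for each (measurement_id, window_start)."""
    groups: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for measurement_id, ts, value in points:
        window_start = ts - ts % bucket_ms
        groups[(measurement_id, window_start)].append(value)
    return {key: (sum(vals) / len(vals), min(vals), max(vals))
            for key, vals in groups.items()}
```

Rolling hourly rows further into 6-hour or daily rows would take the min of the mins and the max of the maxes; averaging hourly averages is exact only when every hour holds the same number of points.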

Limit Table Growth
MEASUREMENT_DATA (size limit 2 days)
  ‣ MEASUREMENT_ID, TIMESTAMP, VALUE
→ MEASUREMENT_DATA_1H (size limit 14 days)
  ‣ MEASUREMENT_ID, TIMESTAMP, VALUE, MIN, MAX
→ MEASUREMENT_DATA_6H (size limit 31 days)
  ‣ MEASUREMENT_ID, TIMESTAMP, VALUE, MIN, MAX
→ MEASUREMENT_DATA_1D (limit N years)

Software Partitioning
• MEASUREMENT_DATA is split into 18 tables representing 9 days (2 per day)
• The application calculates which table to insert into or select from
• Tables are truncated after roll-up rather than having their rows deleted
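
The slides do not show how the application picks a table, so the following is only an assumed mapping: day slot = days since the epoch modulo 9, half slot = which 12-hour half of the (UTC) day. The table names imitate those in the query slides (HQ_METRIC_DATA_2D_1S and so on), but HQ's real formula may differ and will not necessarily reproduce the table names used in those examples:

```python
# Hypothetical table-selection rule, not HQ's actual formula.
DAY_MS = 86_400_000
HALF_DAY_MS = DAY_MS // 2
NUM_DAYS = 9  # 18 tables = 9 days x 2 half-day slices


def metric_table_for(timestamp_ms: int) -> str:
    """Map an epoch-millisecond timestamp to one of the 18 raw-data tables."""
    day_slot = (timestamp_ms // DAY_MS) % NUM_DAYS      # 0..8, cycles every 9 days
    half_slot = (timestamp_ms % DAY_MS) // HALF_DAY_MS  # 0 = first 12 h (UTC), 1 = second
    return f"HQ_METRIC_DATA_{day_slot}D_{half_slot}S"
```

Under a rule like this each table is reused every 9 days, which is presumably why a table can simply be truncated once its rows have been rolled up.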

Truncation vs. Deletion
• Deletion causes contention on rows in the table, impacting any concurrent SQL operation
• Truncation reallocates space for the table object instead of leaving it fragmented
• Truncation drops and recreates the table, a faster (DDL) operation

Indexes
• Every InnoDB table has a special index, called the clustered index (based on the primary key), where the physical data for the rows is stored
• Advantages:
  ✓ Faster selects: the row data is on the same page the index search leads to
  ✓ Inserts in (timestamp) order avoid page splits and fragmentation
  ✓ Fewer indexes: less space and less maintenance overhead

Non-Clustered Index
• Pages scattered throughout the disk
• Selects go across non-contiguous pages

Clustered Index
• InnoDB by default creates the clustered index based on the primary key
• Physically ordered pages on disk
• Selects have the advantage of fewer I/O operations
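
A toy Python model of the two clustered-index advantages above, using the MEASUREMENT_DATA primary key order (TIMESTAMP, MEASUREMENT_ID). Rows are kept in a sorted list as a stand-in for InnoDB's ordered pages; the class is purely illustrative and ignores real page and B-tree mechanics:

```python
import bisect
from typing import List, Tuple

# Toy model only; not how InnoDB is implemented internally.
Key = Tuple[int, int]  # (timestamp_ms, measurement_id): the slide's primary key order


class ClusteredTable:
    """Rows are kept sorted by primary key, mimicking a clustered index."""

    def __init__(self) -> None:
        self.keys: List[Key] = []
        self.values: List[float] = []
        self.appends = 0  # inserts that landed at the end of the ordering

    def insert(self, ts: int, measurement_id: int, value: float) -> None:
        key = (ts, measurement_id)
        pos = bisect.bisect_left(self.keys, key)
        if pos == len(self.keys):
            self.appends += 1  # in this model, no existing row had to shift
        self.keys.insert(pos, key)
        self.values.insert(pos, value)

    def range_scan(self, start_ts: int, end_ts: int) -> List[Tuple[Key, float]]:
        """All rows with start_ts <= timestamp <= end_ts: one contiguous slice."""
        lo = bisect.bisect_left(self.keys, (start_ts, float("-inf")))
        hi = bisect.bisect_right(self.keys, (end_ts, float("inf")))
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))
```

Inserting in timestamp order always appends at the end, and a time-range query reads one contiguous slice, which is the intuition behind fewer page splits and fewer I/O operations.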

Anatomy of a Data Query
SELECT begin AS timestamp,
       AVG(value) AS value, MAX(value) AS peak, MIN(value) AS low
FROM EAM_MEASUREMENT_DATA,
     (SELECT 1207631340000 + (2880000 * i) AS begin
      FROM EAM_NUMBERS WHERE i < 60) n
WHERE timestamp BETWEEN begin AND begin + 2879999
  AND measurement_id = 600332
GROUP BY begin ORDER BY begin;
• Query to select aggregate data from the metric tables for a specific metric
• EAM_MEASUREMENT_DATA is a UNION view of all metric tables
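
The bucket arithmetic in the query: 2,880,000 ms is 48 minutes, so 60 buckets (i < 60 from EAM_NUMBERS, presumably a table of consecutive integers) span 172,800,000 ms, or 48 hours, and each bucket covers [begin, begin + 2879999]. The Python function below is an illustrative equivalent of that bucketing, handy for checking the arithmetic; it is not part of HQ:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Illustrative equivalent of the SQL bucketing, not HQ code.
Point = Tuple[int, int, float]  # (measurement_id, timestamp_ms, value)


def bucketed_aggregates(points: Iterable[Point], measurement_id: int, start_ms: int,
                        bucket_ms: int, num_buckets: int) -> List[Tuple[int, float, float, float]]:
    """Bucket i starts at start_ms + bucket_ms * i and covers
    [begin, begin + bucket_ms - 1]. Returns (timestamp, value, peak, low)
    rows ordered by bucket start; empty buckets produce no row, as in the join."""
    end_ms = start_ms + bucket_ms * num_buckets
    buckets: Dict[int, List[float]] = defaultdict(list)
    for mid, ts, value in points:
        if mid != measurement_id or not start_ms <= ts < end_ms:
            continue
        begin = start_ms + (ts - start_ms) // bucket_ms * bucket_ms
        buckets[begin].append(value)
    return [(begin, sum(v) / len(v), max(v), min(v))
            for begin, v in sorted(buckets.items())]
```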

MySQL View Shortcomings
• The query optimizer does not apply the WHERE condition to the inner select, so entire tables are selected serially before the query condition is applied
  ‣ Sequential table scan
• Temp table space is unreasonably large
  ‣ Performance suffers

Fewer Tables
SELECT begin AS timestamp, AVG(value) AS value, MAX(value) AS peak, MIN(value) AS low
FROM
  (SELECT * FROM HQ_METRIC_DATA_2D_1S UNION ALL
   SELECT * FROM HQ_METRIC_DATA_2D_0S UNION ALL
   SELECT * FROM HQ_METRIC_DATA_0D_1S) EAM_MEASUREMENT_DATA,
  (SELECT 1207631340000 + (2880000 * i) AS begin FROM EAM_NUMBERS WHERE i < 60) n
WHERE timestamp BETWEEN begin AND begin + 2879999 AND measurement_id = 600332
GROUP BY begin ORDER BY begin;
• Explicitly select only from the tables that cover the time range
• The WHERE clause is still not applied to the individual selects in the UNION, but less data is selected

Best Performance
SELECT begin AS timestamp, AVG(value) AS value, MAX(value) AS peak, MIN(value) AS low
FROM
  (SELECT 1207631340000 + (2880000 * i) AS begin FROM EAM_NUMBERS WHERE i < 60) n,
  (SELECT * FROM HQ_METRIC_DATA_2D_1S
     WHERE timestamp BETWEEN 1207767600000 AND 1207804140000
       AND measurement_id = 600332
   UNION ALL
   SELECT * FROM HQ_METRIC_DATA_2D_0S
     WHERE measurement_id = 600332) EAM_MEASUREMENT_DATA
WHERE timestamp BETWEEN begin AND begin + 2879999 AND measurement_id = 600332
GROUP BY begin ORDER BY begin;
• The filters are repeated inside each select of the UNION, so only matching rows are selected
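
A hypothetical query builder that produces the shape of the best-performing query: the time-range and metric filters are repeated inside every UNION ALL branch instead of relying on the optimizer to push them down. For simplicity it applies one time range to all branches, and the caller supplies the table list (for example from a table-selection rule like the one sketched for software partitioning). It assumes integer inputs and builds SQL text only:

```python
from typing import Sequence

# Sketch only, not HQ code.


def build_pushdown_query(tables: Sequence[str], measurement_id: int, range_start_ms: int,
                         range_end_ms: int, bucket_start_ms: int, bucket_ms: int,
                         num_buckets: int) -> str:
    """Repeat the range and metric filters inside every branch of the UNION ALL."""
    if not tables:
        raise ValueError("at least one table is required")
    branch_filter = (f"WHERE timestamp BETWEEN {range_start_ms} AND {range_end_ms} "
                     f"AND measurement_id = {measurement_id}")
    union = " UNION ALL ".join(f"SELECT * FROM {t} {branch_filter}" for t in tables)
    return (
        "SELECT begin AS timestamp, AVG(value) AS value, MAX(value) AS peak, MIN(value) AS low "
        f"FROM (SELECT {bucket_start_ms} + ({bucket_ms} * i) AS begin "
        f"FROM EAM_NUMBERS WHERE i < {num_buckets}) n, "
        f"({union}) EAM_MEASUREMENT_DATA "
        f"WHERE timestamp BETWEEN begin AND begin + {bucket_ms - 1} "
        f"AND measurement_id = {measurement_id} "
        "GROUP BY begin ORDER BY begin"
    )
```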

ID Generator Requirements
• Rows need to be populated with hard-coded IDs during schema initialization
• Sequential IDs start at 10001 to reserve space for the hard-coded IDs
• MySQL's auto-increment does not allow either

Sequences Table and Function
CREATE TABLE `hq_sequence` (
  `seq_name` char(50) NOT NULL PRIMARY KEY,
  `seq_val` int(11) DEFAULT NULL
);

DELIMITER $$
CREATE FUNCTION nextseqval (iname CHAR(50))
  RETURNS INT
  DETERMINISTIC
BEGIN
  SET @new_seq_val = 0;
  UPDATE hq_sequence SET seq_val = @new_seq_val := seq_val + 1
   WHERE seq_name = iname;
  RETURN @new_seq_val;
END$$
DELIMITER ;

Using Sequences in MySQL
• Original solution: an InnoDB sequence table, which resulted in lock timeouts and deadlocks from contention
• Buffering with an in-memory (HEAP) table: locking issues

MyISAM Sequence Table
• Changed HQ_SEQUENCE to MyISAM rather than InnoDB
  ‣ MyISAM is a non-transactional storage engine
• Inconsistent state resulting from server crashes

Hibernate Hi-Lo
• Hibernate Hi-Lo sequence generator
• Back to using HQ_SEQUENCE with InnoDB
• Hibernate buffers a block of 100 IDs (low values) in memory and fetches the next high value when the block is used up
• Uses a separate connection that does not participate in transactions
• Big performance benefit
• PostgreSQL and Oracle use native sequence generators, so they make more round trips to the database
• HQ startup time cut by up to 30% (1 min)
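
A minimal sketch of the hi/lo idea: one database round trip reserves a block of 100 IDs, which are then handed out from memory. Here next_hi is a hypothetical callable standing in for incrementing HQ_SEQUENCE on a separate, non-transactional connection, and the formula hi * block_size + lo is the textbook scheme; Hibernate's own generator differs in details, so the exact ID values will not match it:

```python
import threading
from typing import Callable

# Sketch of the hi/lo technique, not Hibernate's implementation.


class HiLoGenerator:
    """Each next_hi() call (one round trip) reserves block_size IDs."""

    def __init__(self, next_hi: Callable[[], int], block_size: int = 100) -> None:
        self._next_hi = next_hi      # hypothetical: bumps HQ_SEQUENCE, returns new value
        self._block_size = block_size
        self._hi = 0
        self._lo = block_size        # forces a fetch on the first call
        self._lock = threading.Lock()
        self.round_trips = 0

    def next_id(self) -> int:
        with self._lock:
            if self._lo >= self._block_size:
                # Current block used up: reserve the next one.
                self._hi = self._next_hi()
                self._lo = 0
                self.round_trips += 1
            ident = self._hi * self._block_size + self._lo
            self._lo += 1
            return ident
```

For example, handing out 250 IDs costs only 3 round trips instead of 250, which is where the performance benefit comes from.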

Performance Statistics
• HQ hardware
  ‣ 2 quad-core 2 GHz CPUs, 16 GB RAM, 4 GB JVM heap
• MySQL hardware
  ‣ 2 quad-core 1.6 GHz CPUs, 8 GB RAM, 4.5 GB InnoDB buffer pool
• Both on CentOS 5.x
• Sustained load
  ‣ Between 200,000 and 300,000 metrics/min, peaking at 2.2 million metrics/min
• Load average
  ‣ HQ ~2, peaked at 8
  ‣ MySQL ~1.5, peaked at 2.5
• CPU usage of HQ and MySQL: 10 to 20%

Recommended Server Options
• innodb_buffer_pool_size
• innodb_flush_log_at_trx_commit
• tmp_table_size, max_heap_table_size, and max_tmp_tables
• innodb_flush_method
• query_cache_size
More information at http://support.hyperic.com
