MySQL Data Warehousing Survival Guide
Marius Moscovici (marius@metricinsights.com)
Steffan Mejia (steffan@lindenlab.com)
Topics
• 43 Servers
o 36 active
o 7 standby spares
• 16 TB of data in MySQL
• 12 TB archived (pre-S3 staging)
• 4 TB archived (S3)
• 3.5B rows in main warehouse
• Largest table ~ 500M rows (MySQL)
Warehouse Evolution - First came slaving
Problems:
• Easy to lock replication with temp table creation (see sketch below)
• Slaving becomes fragile
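A minimal sketch of the failure pattern (table names are illustrative, not from the talk): under statement-based replication, a long-running temporary-table build on the master is replayed verbatim by the slave's single SQL thread, and everything queued behind it stalls.

-- Illustrative reporting job; tmp_user_activity is a hypothetical name.
-- The statement replicates to the slave and blocks its single-threaded apply.
CREATE TEMPORARY TABLE tmp_user_activity AS
SELECT user_id, COUNT(*) AS events
FROM event_log
WHERE event_time >= CURDATE() - INTERVAL 30 DAY
GROUP BY user_id;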
Warehouse Evolution - A Warehouse is Born
Problems:
• Warehouse workload limited by what can be performed by a single server
Warehouse Evolution - Workload Distributed
Problems:
• No real-time application integration support
Warehouse Evolution - Integrate Real Time Data
Lessons Learned - Warehouse Design
• Workload exceeds available memory
• Keep joins < available memory
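One way to check this rule before writing a join (a sketch, not from the talk; the schema name 'warehouse' is an assumption) is to compare each table's data-plus-index footprint against the server's available memory:

-- Hypothetical schema name; reports on-disk size per table, in GB.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS size_gb
FROM information_schema.tables
WHERE table_schema = 'warehouse'
ORDER BY size_gb DESC;

If the tables on both sides of a join together exceed what the server can cache, the join spills to disk and performance collapses.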
ETL Pseudo code - Step 2
2) Aggregate new events:
SELECT
    DATE(event_time),
    user_id,
    event_type,
    COUNT(*)
FROM event_log
WHERE event_time > CONCAT(@last_loaded_date, ' 23:59:59')
GROUP BY 1, 2, 3;
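The @last_loaded_date variable is presumed to be set by an earlier step; a minimal sketch, assuming the summary table's date column is named event_date (both the step and the column name are assumptions, since the source elides them):

-- event_date is a hypothetical column name for DATE(event_time) above.
SELECT MAX(event_date)
INTO @last_loaded_date
FROM user_event_log_summary;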
ETL Pseudo code - Step 3
3) Set denormalized user columns:
UPDATE user_event_log_summary_staging log_summary,
       user
SET    log_summary.type   = user.type,
       log_summary.status = user.status
WHERE  user.user_id = log_summary.user_id;
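The comma-separated form above is MySQL's multi-table UPDATE syntax; the same statement can be written with an explicit JOIN, which some readers find clearer:

UPDATE user_event_log_summary_staging log_summary
JOIN user ON user.user_id = log_summary.user_id
SET log_summary.type   = user.type,
    log_summary.status = user.status;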
ETL Pseudo code - Step 4
4) Insert into Target Table:
INSERT INTO user_event_log_summary
(...)
SELECT ...
FROM user_event_log_summary_staging;
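The column list is elided in the source; a hypothetical fleshed-out version, with names assumed from Steps 2 and 3:

-- All column names here are assumptions inferred from the earlier steps.
INSERT INTO user_event_log_summary
    (event_date, user_id, event_type, event_count, type, status)
SELECT event_date, user_id, event_type, event_count, type, status
FROM user_event_log_summary_staging;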
Functional Partitioning
• Benefits depend on
• Replication latency (see sketch below)
o Warehouse slave unable to keep up
o Disk utilization > 95%
o Required frequent re-sync
• Options evaluated
o Higher speed conventional disks
o RAM increase
o Solid-state disks
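Lag of the kind described above is conventionally watched through the slave's status output; a minimal sketch using MySQL's classic command:

SHOW SLAVE STATUS\G
-- Seconds_Behind_Master grows when the SQL thread cannot keep up;
-- sustained growth is what forced the frequent re-syncs noted above.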
Optimization
• SSD Solution
o 12 & 16 disk configurations
o RAID6 vs. RAID10
o 2.0 TB or 1.6 TB formatted capacity
o SATA2, hardware RAID6 with battery-backed cache (BBU)
o ~8 TB of data on SSD
Results
Two-tiered solution
• Move data into archive tables in a separate DB
• Use SELECT to dump data - efficient and fast (sketched below)
• Archive server handles migration
o Dump data
o GPG-encrypt the dump
o Push to S3
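A minimal sketch of the dump step, assuming SELECT ... INTO OUTFILE is what "use SELECT to dump data" refers to (the file path and archive table name are illustrative); GPG encryption and the S3 push then happen outside MySQL on the archive server:

-- Illustrative path and table name; writes a CSV file server-side.
SELECT *
INTO OUTFILE '/archive/user_event_log_2010.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
FROM user_event_log_archive;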
Survival Tips