You are on page 1of 3

apteco.

com

http://www.apteco.com/blog/eat-your-data-faster-aptecodatabuild

Eat Your Data Faster with Apteco.DataBuild

The team at Apteco has long acknowledged that the development of multi-threaded algorithms are the
means of driving performance around data compression in modern high performance software
development and as a result have seen significant improvements in data load times with each new
version of FastStats Designer.
It has been just over 10 years since the publication of Herb Sutters article The Free Lunch Is Over. In his
article Herb described the changing performance characteristics of modern processors and the consequences
this has for modern high performance software development.
In the Apteco High Performance Team we have long recognised that improvements in query and data load
performance will only come from implementing multi-threaded algorithms instead of continual micro optimisations
to sequential code.
With FastStats Designer we have worked over the last few years on developing multi-threaded versions of our
sequential data load algorithms. We have worked a component at a time replacing the sequential processes with
new multi-threaded equivalents and in doing so we achieved a 25-30% speed improvement over the course of
2014.
For the 2015 Q2 version of FastStats Designer we are releasing for beta use our new multi-threaded data load
and compression component Apteco.DataBuild.dll.
This routine still produces our standard FastStats optimised, compressed binary data file formats but does so
with entirely new code that has been written from scratch using the latest parallelisation technology. The
technology we decided to base our component on is a Microsoft library called TPL Dataflow that extends the
multi-threaded features of the .Net Framework to support data processing pipelines. We re-implemented our
compression routines as components of this pipeline to make full use of the multi-threaded capabilities of modern
processors.
Previously with the Nov 14 sequential load process we were able to build a TPC-H reference test system with
1.3Bn records (1Bn Line Item records) in 4 hours. The sequential data load portion of this build took nearly 2
hours. The new Apteco.DataBuild component performs the same processing in under 50 minutes dropping the
total load time (including auto discovery, sort and compression) to 2 hours 50 minutes.

Starting with the 2015 Q2 release users of FastStats Designer will have the option to build selected tables with
this new component and the improvements will benefit any FastStats Designer data builds that execute on multiprocessor machines.

Apteco.DataBuild giving a powerful 32 core AWS EC2 instance something to think about.
The ability to access and analyse big data is key to driving business forward and creating target marketing
campaigns that give you and your organisation the edge.
Sutter, H. 2005. "The free lunch is over: A fundamental turn toward concurrency in software," Dr. Dobb's Journal.
http://www.gotw.ca/publications/concurrency-ddj.htm

Get behind the numbers, find out what FastStats users are really doing and use
the data to drive your marketing forward : Download Trend Report 2014 / 2015
Data driven marketing

Tim Heron
Senior Developer
Tim started at Apteco as a developer in 2002 and has worked on a wide range of FastStats
components. Tim is currently a member of the Apteco High Performance Team where his
primary focus is on improving the speed and power of the FastStats technology.

You might also like