http://gigaom.com/cloud/beyond-hadoop-next-generation-big-data-archi...
GigaOM
People with cutting-edge performance and scalability requirements today have already moved on from the Hadoop
model. Some have gone back to SQL, but more have moved to a raft of radically new post-Hadoop architectures. Welcome to the NoHadoop era,
as companies realize big data requires Not Only Hadoop.
8/11/2012 12:04 AM
SQL. Having been around for 25 years, it's a bit weird to call SQL next-gen, but it is! There's currently a tremendous
amount of innovation going on around SQL from companies like VoltDB, Clustrix and others. If you need to handle complex
joins, or have ACID requirements, SQL is still the way to go. Applications: Complex business queries, online transaction
processing.
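As a minimal illustration of the ACID guarantees the SQL camp offers (a sketch using SQLite for brevity; the table and amounts are made up, not from any vendor mentioned above), a funds transfer either commits both updates or neither:

```python
import sqlite3

# Illustrative sketch of ACID atomicity (table and values are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts atomically; roll back on any error."""
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))

transfer(conn, 1, 2, 30)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
# balances == {1: 70, 2: 80}
```

If either UPDATE failed, the `with conn:` block would roll the whole transaction back, so no partial transfer is ever visible — the property NoSQL stores typically give up.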
Cloudscale. [McColl is the CEO of Cloudscale. See his bio below.] For realtime analytics on big data, it's essential to break
free from the constraints of batch processing. For example, if you're looking to continuously analyze a stream of events at a
rate of one million events per second per server, and deliver results with a maximum latency of five seconds between data in
and analytics out, then you need a real-time data flow architecture. The Cloudscale architecture provides this kind of
realtime big data analytics, with latency that is up to 10,000X faster than batch processing systems such as Hadoop.
Applications: Algorithmic trading, fraud detection, mobile advertising, location services, marketing intelligence.
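A toy sketch of the data-flow idea (not Cloudscale's actual architecture or API): events are counted in fixed windows as they arrive, so a result is available the moment each window closes rather than after a batch job over the whole data set:

```python
# Illustrative sketch (not Cloudscale's actual architecture or API): a
# tumbling-window counter that emits a result the moment each 5-second
# window closes, instead of waiting for a batch job over the whole data set.
class TumblingWindowCounter:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.current_start = None   # start time of the open window
        self.count = 0              # events seen in the open window
        self.results = []           # (window_start, count) per closed window

    def on_event(self, timestamp):
        start = (timestamp // self.window) * self.window  # window boundary
        if self.current_start is None:
            self.current_start = start
        while start > self.current_start:                 # close finished windows
            self.results.append((self.current_start, self.count))
            self.current_start += self.window
            self.count = 0
        self.count += 1

agg = TumblingWindowCounter(window_seconds=5)
for t in [0.5, 1.2, 4.9, 5.1, 6.0, 11.3]:
    agg.on_event(t)
# agg.results == [(0.0, 3), (5.0, 2)]; the window starting at 10.0 is still open
```

The point of the sketch is the latency model: each result depends only on its own window, so "data in" to "analytics out" is bounded by the window length, not by the size of the accumulated data.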
MPI and BSP. Many supercomputing applications require complex algorithms on big data, in which processors
communicate directly at very high speed in order to deliver performance at scale. Parallel programming tools such as MPI
and BSP are necessary for this kind of high performance supercomputing. Applications: Modelling and simulation, fluid
dynamics.
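The BSP model above can be sketched in a few lines: a superstep is local computation, then a communication phase, then a barrier that delivers all messages before the next superstep begins. This toy simulation (real codes would use MPI or a BSP library) computes a global sum across p "processors":

```python
# Toy simulation of the BSP (Bulk Synchronous Parallel) model: local compute,
# then communication, then a barrier before the next superstep.
def bsp_global_sum(local_values):
    p = len(local_values)
    inboxes = [[] for _ in range(p)]
    # Superstep 1, communication phase: each processor sends its local
    # value to every processor (including itself).
    for value in local_values:
        for j in range(p):
            inboxes[j].append(value)
    # --- barrier: all messages are now delivered ---
    # Superstep 2, compute phase: each processor reduces its inbox locally.
    return [sum(inbox) for inbox in inboxes]

# bsp_global_sum([3, 1, 4, 1]) → [9, 9, 9, 9]: every processor holds the sum
```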
Pregel. Need to analyse a complex social graph? Need to analyse the web? It's not just big data, it's big graphs! We're
rapidly moving to a world where the ability to analyse very-large-scale dynamic graphs (billions of nodes, trillions of edges)
is becoming critical for some important applications. Google's Pregel architecture uses a BSP model to enable highly
efficient graph computing at enormous scale. Applications: Web algorithms, social graph algorithms, location graphs,
learning and discovery, network optimisation, internet of things.
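A miniature sketch of the vertex-centric style Pregel popularized (illustrative only, not Google's API): each superstep, an active vertex takes the max of its incoming messages; if that improves its value it sends the new value along its out-edges, otherwise it votes to halt. The run ends when no vertex is active:

```python
# Miniature Pregel-style vertex program (illustrative, not Google's API):
# propagates the maximum value through the graph until no vertex changes.
def pregel_max(graph, init_values):
    """graph: {vertex: [out-neighbours]}; returns converged values."""
    values = dict(init_values)
    outbox = {v: values[v] for v in graph}       # superstep 0: all active
    while outbox:
        inbox = {}
        for src, val in outbox.items():          # message delivery (the barrier)
            for dst in graph[src]:
                inbox.setdefault(dst, []).append(val)
        outbox = {}
        for v, msgs in inbox.items():            # next superstep's compute phase
            best = max(msgs)
            if best > values[v]:
                values[v] = best
                outbox[v] = best                 # value changed: stay active
    return values                                # vertices without mail halt

result = pregel_max({1: [2], 2: [1, 3], 3: [2]}, {1: 5, 2: 1, 3: 9})
# result == {1: 9, 2: 9, 3: 9}
```

The BSP structure is visible in the loop: compute, exchange messages, barrier, repeat, which is why graph algorithms written this way scale across many machines.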
Dremel. Need to interact with web-scale data sets? Google's Dremel architecture is designed to support interactive, ad hoc
queries over trillion-row tables in seconds! It executes queries natively without translating them into MapReduce jobs.
Dremel has been in production since 2006 and has thousands of users within Google. Applications: Data exploration,
customer support, data center monitoring.
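One reason interactive scans over trillion-row tables are feasible is Dremel's columnar storage. A tiny sketch of the idea (made-up rows, not Dremel's actual format): an aggregation touches only the columns it names, so most of the table is never read:

```python
# Illustrative sketch (made-up rows) of a columnar layout: an aggregation
# query reads one contiguous column instead of deserializing whole rows.
rows = [
    {"country": "US", "latency_ms": 120},
    {"country": "DE", "latency_ms": 80},
    {"country": "US", "latency_ms": 95},
]
# Transpose the row store into one list per field.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# "SELECT AVG(latency_ms)" reads a single column; the country strings
# are never touched.
avg_latency = sum(columns["latency_ms"]) / len(columns["latency_ms"])
```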
Percolator (Caffeine). If you need to incrementally update the analytics on a massive data set continuously, as Google now
has to do on its index of the web, then an architecture like Percolator (Caffeine) beats Hadoop easily; Google Instant just
wouldn't be possible without it. Because the index can be updated incrementally, the median document moves through
Caffeine over 100 times faster than it moved through the company's old MapReduce setup. Applications: Real time search.
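The incremental idea can be sketched as follows (names and structure are made up for illustration, not Percolator's actual design, which is built on BigTable with distributed transactions and observers): when one document changes, only its delta is applied to the index, instead of rebuilding the whole index in a batch job:

```python
# Illustrative sketch of incremental processing in the Percolator style
# (names are made up): an "observer" fires per changed document and patches
# the inverted index, rather than recomputing it from scratch.
class IncrementalIndex:
    def __init__(self):
        self.docs = {}    # doc_id -> set of terms currently indexed
        self.index = {}   # term -> set of doc_ids containing it

    def update_document(self, doc_id, text):
        """Observer triggered by a single document change."""
        new_terms = set(text.lower().split())
        old_terms = self.docs.get(doc_id, set())
        for term in old_terms - new_terms:    # terms removed from the doc
            self.index[term].discard(doc_id)
        for term in new_terms - old_terms:    # terms added to the doc
            self.index.setdefault(term, set()).add(doc_id)
        self.docs[doc_id] = new_terms

idx = IncrementalIndex()
idx.update_document("d1", "big data analytics")
idx.update_document("d1", "big graph analytics")  # only the delta is applied
# idx.index["graph"] == {"d1"}; "data" no longer points at d1
```

The second call touches only two terms ("data" out, "graph" in), which is the property that lets the median document move through the pipeline so much faster than a full MapReduce rebuild.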
The fact that Hadoop is freely available to everyone means it will remain an important entry point to the world of big data
for many people. However, as the performance demands for big data apps continue to increase, we will find that these new,
more powerful forms of big data architecture will be required in many cases.
Bill McColl is the founder and CEO of Cloudscale Inc. and a former professor of Computer Science, Head of the Parallel
Computing Research Center, and Chairman of the Computer Science Faculty at Oxford University.
Related GigaOM Pro Research (sub req'd):
Big Data Marketplaces Put a Price on Finding Patterns
How Big Data Tools Are Shaping Sustainability Software
Will Hadoop Vendors Profit from Banks Big Data Woes?
18 Comments
1.
Eddie Saturday, October 23 2010
Good article, although its verbosity could have been map reduced ;-) to simply "it's a matter of horses for courses" or
"Hadoop is not the be-all end-all" (but then again, who said that Hadoop was the be-all end-all? I don't recall anyone
in the Hadoop community stating so).
2.
Andrew Purtell Saturday, October 23 2010
Percolator is built on top of BigTable. In the Hadoop ecosystem, we have HBase as an open source implementation of
BigTable, and it seems feasible to build an open equivalent to Percolator on top of the HBase coprocessor framework.
Hadoop is not just MapReduce. This article is written as if Hadoop has stood still since 2006.
3.
Steve Loughran Sunday, October 24 2010
1. Google are using something nobody else can see; it's hard to say they've moved on from Hadoop, merely evolved
their own MR engine.
2. Nobody in Hadoop-land is going to say you should use Hadoop and friends if you want transactions, ACID, etc.
What we do say is you don't need to index all the stuff you want to search through later, and if you keep some
stuff in a distributed filesystem, you make storing PB affordable.
3. What Hadoop does have is testing at double-digit petabyte storage capacity, thousands of servers, each with 6+
HDDs.
4. MPI. MPI doesn't handle failure well, which is why most HPC facilities don't like MPI jobs that take more than
48h to complete: too much risk of an outage. I think MPI is great for some problems, but it's not the silver bullet
either.
What Hadoop does bring to the table is community and scale. Nobody in the group thinks it's perfect, but we know
what the MapReduce problems are (latency due to the saving of intermediate results to HDD, and a wait for all maps
to complete before the reduces), and those of the filesystem (the namenode is an SPOF; better checksumming and
security, the latter trickling out). It's also designed for a static set of machines; when hosted on on-demand
infrastructure you need to integrate the infrastructure operations into your workflow. We know them; people in
different companies and some universities are working on them. It's going to be hard to compete with the community,
even if you have better solutions.
People used to dismiss Linux compared to "real" Unix, remember?
5.
Yyzfan@gmail.com Monday, October 25 2010
Memory-based architectures are the future. Spinning disks are the root problem. There are a few in the field: VMware
just added GemFire, and Oracle and IBM have competing tech.
It's the future, in about 5-7 years.
Nice try
If someone uses Hadoop for a realtime or ACID environment, he or she is in the wrong job. Hadoop is about offline
analytics and the cost-to-scale equation.
7.
Henry Robinson Monday, October 25 2010
Pregel effectively *is* BSP (synchronous checkpointed steps consisting of local processing then message passing).
If you squint hard enough, MapReduce fits into this model as well.
10.
Razi Sharir Tuesday, October 26 2010
You get what you pay for; as simple as that. If NoSQL serves you well and there's no need for relational and/or
transaction modeling, then go for it.
Practical experience implies this is usually not the case, and indeed we see many folks go back to RDBMS. For those who
never left, or for those who are already there, MySQL backend applications were there to support: a SQL
cloud DB that is elastically scalable and highly available. Don't take my word for it; check this out on our beta
@xeround.com
Displaying 16 of 18 comments.