[SATLUG] Hadoop, MapReduce And Public/Private Cloud Computing

Robert Pearson e2eiod at gmail.com
Wed Feb 10 06:56:36 CST 2010


Frank Huddleston has mentioned Hadoop in several posts about Home
Cloud Computing. This post is mainly an FYI for those too busy to
read much.
Anyone interested in Hadoop can search the SATLUG email archives for
the previous posts on the subject.

The current Hadoop post of interest (to me) is:

"A petascale parallel database" by Robin Harris on Monday, 8 February, 2010
<http://storagemojo.com/2010/02/08/a-petascale-parallel-database/>
[Article excerpt]
"MapReduce and its Open Source version, Hadoop, are parallel data
analysis tools. A few lines of code can drive massive data reductions
across thousands of nodes.
Cool.
Powerful though it is, Hadoop isn’t a database. Classic structured
data analysis of the model/load/process type isn’t what it was
designed for"
[End Excerpt]
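
For readers who have not seen the programming model behind the "few
lines of code" claim, here is a minimal single-machine sketch of the
map, shuffle and reduce steps in plain Python. It is not Hadoop code
and uses no Hadoop API; the function names (map_phase, shuffle,
reduce_phase, word_count) and the sample input are made up for
illustration. A real cluster would run the map and reduce calls on
many nodes in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    On a cluster the framework does this across nodes; here it is a dict.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["the cloud at home", "the private cloud"]
    print(word_count(sample))
```

The user writes only the two small functions (map and reduce); the
grouping and distribution in between is the framework's job, and it
is the part this sketch fakes with a dictionary.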

One of the comments is very interesting; it is included here for your convenience:
[StorageMojo Comment by nate Tuesday, 9 February, 2010 at 11:43 am]
[Begin Comment]
I was talking to a developer working on a project that will be running
on hadoop soon and was interested to hear his comments on hadoop
itself, it’s extremely poorly written, apparently Yahoo built it
mostly by outsourcing the development overseas to some low quality
coders, and the result is some pretty poor code. It can work it’s just
not that good.
I find it pretty interesting how much stuff google does internally
such as their own file system, mapreduce, server builds, their own
switches and routers, their own http server, their own java servlet
server.
Meanwhile others struggle to keep up trying to use as much off the
shelf stuff as possible because they don’t have the engineering
resources internally to even begin to approach doing it themselves,
even a Microsoft insider admitted as much recently in an interview
http://www.theregister.co.uk/2010/02/03/microsoft_bing_number_two_wannabe/
I suppose the message here is hope & pray you aren’t in a market that
google is or might become interested in at some point if your relying
on hadoop. Because whatever you can do, they can do 1000x faster with
their ~billion servers, and their ~million PhDs.
[End Comment]

[rdpcomment]
"file system, mapreduce, server builds, their own switches and
routers, their own http server, their own java servlet server" are all
key components of Enterprise Computing and its BIG brother, Cloud
Computing.
So for a Home Cloud you would need:
"your heavily modified file system, Hadoop (mapreduce), your custom
server builds, your hand picked switches and routers, your own http
server, your own java servlet server".
You could use COTS (Commercial Off The Shelf) components for the Home
Cloud, since bandwidth and throughput will not decide whether you
make a profit and survive. This also means that a new market for
Private Cloud components is developing, to supply equivalents of some
of the components Google developed in-house.

"Why private clouds are part of the future" by Robin Harris on Friday,
5 February, 2010
<http://storagemojo.com/2010/02/05/why-private-clouds-are-part-of-the-future/>
[Article excerpt]
"I’ve grappled with the question of private clouds for the last couple
of years. The advantages of web scale systems became more obvious, but
the human desire for reliable data access and control has not receded.
Public and private will not displace each other: they will coexist
just as public and private power sources coexist today. No doubt
public clouds will claim the majority of the market whether measured
in dollars or exabytes, but private clouds will remain significant
contributors to our data infrastructure for decades, if not centuries,
to come."
[End Excerpt]

[rdpcomment - IMHO, YMMV: the rise of Private Clouds is a major shift
in the computing paradigm]

