Data Data Data: Spark stories

I feel as though I have never seen anything quite like Spark before. It seems more than worthy of substantial media coverage, but it is also cause for pause.

I only came to the big data beat in 2013, so I didn't go through the whole run-up of Hadoop hype. I arrived when it was in full swing; it seemed natural, and some enthusiasm was warranted. But as Hadoop 2 was rolling out and Spark was striding into view, I said this town was not big enough for the two of them, that Hadoop had taken all the air out of the hyperbolic chamber. Is Spark, or is it not, just the new shiny thing in that room?

Now I wonder. Yes, over time the Hadoop people dutifully explained what was wrong with Hadoop (mostly MapReduce). But, as with technology marketing trends generally, that begged for an answer. Now Spark seems like the answer; Hadoop greased the skids for it. I guess one reason is that MapReduce was limited. But isn't Spark limited too, if you look at it from many miles' remove?

I boil down Spark's plusses to:

1-It opens the door to more developers, because it supports Python and Scala as well as Java, and it runs on the Java Runtime.

2-It runs faster than MapReduce.

Now, if you slowly walk away from that car, you could say its patrons face big obstacles in catching up with the Hadoop commercial train, and that it has only a few users at this point. We could go further and say it has more general utility than MapReduce and seems more apt for analytics. Tony Baer has written that Spark use cases seem to be more about complex analytics on varied data than about big data per se.

That is a delicate distinction, but probably worth noting. So many technologies thrive or are rent asunder based on a few delicate distinctions that tend not to be readily apparent.

I duly note your sense, in a tweet, that the brunt of use cases described at the sum... I think Spark and Hadoop both are about developer-centric solutions for cheap parallelism, WITH a focus on data processing.

There was a time when all this was called Data Processing; then there was Information Technology. Now, Data Processing is back. - Jack Vaughan

Users view Databricks' Spark
It is in limited beta, but a lot of people have gotten their hands on it.

Hadoop and Spark are coming of age together
The Talking Data podcast features Hadoop and Spark, open source data technologies that gained attention at this year's Strata+Hadoop East event.

Apache Spark: This year's MapReduce killer
Since the release of Apache Spark, big data vendors have touted it as a faster, more flexible alternative to MapReduce.

Apache Spark meets the PDP-11
Apache Spark seems ready to upstage Hadoop. But it's best seen in the light of computing history, where it looks like yet another step on the long road of data.

Apache Spark goes 1.0, looks to improve on MapReduce ...
The Apache Software Foundation has released Version 1.0 of Apache Spark, an open source data processing framework designed to outperform MapReduce ...

Spark framework gets big push from big data vendors
The Spark framework and processing engine is attracting the attention of vendors, who are touting it for use in iterative machine learning and other big data chores.

Thursday, May 7, 2015