Data Data Data

Tuesday, May 12, 2015

5-minute history of the disintegration of the application server

Cloud Computing - HorizonWatch 2015 Trend Report by @horizonwatching http://t.co/wjzbU2x1vD via @SlideShare
— Jack Vaughan at TT (@JackVaughanatTT) May 12, 2015

http://t.co/6yhhn9J1RT Fowler on friend look at Microservices
— Jack Vaughan at TT (@JackVaughanatTT) May 12, 2015

Dockercon State of the Art in Microservices by @adrianco #cloudcomputing #continuousdelivery http://t.co/Fi794TkzFY via @SlideShare
— Jack Vaughan at TT (@JackVaughanatTT) May 12, 2015

Introduction to Node.js by @vikasing #javascript #nodejs http://t.co/nAHVPHuFQo via @SlideShare
— Jack Vaughan at TT (@JackVaughanatTT) May 12, 2015

Sunday, May 10, 2015

Telling winds from the cybernetic past

http://itsthedatatalking.blogspot.com/2015/01/the-best-laid-plans-of-mice-and-man.html http://mitpress.mit.edu/books/cybernetic-revolutionaries http://www.newyorker.com/magazine/2014/10/13/planning-machine https://www.jacobinmag.com/2015/04/allende-chile-beer-medina-cybersyn/

Does a 1970s Utopian technology effort offer useful guides for those trying to assess the progress of new technology today? In one case, at least, yes. It is the story of Salvador Allende's attempt to build a working Socialist government in Chile with computer cybernetics.

The tale is told especially well in the able hands of author and researcher Eden Medina. Medina rolls up the takeaways in a recent article in Jacobin magazine. It is a summary of some important lessons garnered during work on her 2011 book, Cybernetic Revolutionaries.

You see, before CIA influencers sponsored Augusto Pinochet and company's junta, Allende's democratically elected government was trying to bring a new form of socialism that was data driven. In those days, what might pass for the big data enterprise today would be called cybernetics. This school of technology, founded by Norbert Wiener, studied feedback in systems, be they animal or machine. The automatic pilot was perhaps cybernetics' crowning achievement. In the Chile case, technologist Stafford Beer was enlisted to bring the magic of realtime feedback to state planning. It was way ahead of its time, and burdened by lethal sniping.

A chief lesson in all that conflag is that the state and its priorities shape how a technology is designed and used. In Allende's work to create a better state planning system based on the infant cybernetic architectures, Beer was given a lot of rein to try to involve workers, ahead of engineers and government bureaucrats, in the planning of production. Uber advocates might say that is going on with its upsurge today, though, we'd say, that is arguable.

"Computer innovation wasn’t born with Silicon Valley startups, and it can thrive by taking on design considerations that fall outside the scope of the market," writes Medina. Yet, the basic lesson is tremendously true: technologies get no more freedom to range than the political system gives them. That lesson may be taught at MIT, but it is largely buried in the footnotes or drowned out by the gush of venture capital, and its dreams.

Read more on this.

Thursday, May 7, 2015

Spark stories

I feel as though I have never seen anything quite like Spark before. It seems more than worthy of substantial media coverage, but it is also cause for pause.

I only came to the big data beat in 2013. So I didn’t go through all the run up of hype on Hadoop. I came in when it was in full swing – it seemed natural, and some enthusiasm was warranted. But, as Hadoop 2 was rolling out, and Spark was striding into view, I said, this town is not big enough for the two of them – that Hadoop had taken all the air out of the hyperbolic chamber. Is it or is it not just the new shiny thing in that room?

Now I wonder. Yes, the Hadoop people dutifully over time explained what was wrong with Hadoop (Mostly MapReduce). But as with technology marketing trends generally, it begged the answer. Now Spark seems like the answer. Hadoop greased the skids for it. I guess one reason is that MapReduce was limited. But isn’t Spark limited too, if you look at it from many miles remove?

I boil down Spark's plusses to:

1-It includes more developers, because it offers support for Python and Scala as well as Java, and runs on the Java Virtual Machine.

2-It runs faster.

Now if you slowly walk away from that car you could say: its patrons face big obstacles in catching up with the Hadoop commercial train. And, it only has a few users at this point. We could go further and say it has more general utility than MapReduce, and seems more apt for analytics. Tony Baer has written that Spark use cases seem to be more about complex analytics on varied data than about big data per se.

That is a delicate distinction, but probably worth noting. So many technologies thrive or are rent asunder based on a few delicate distinctions that tend not to be readily apparent.

I duly note your sense in a tweet that the brunt of use cases described at the sum. I think Spark and Hadoop both are about developer-centric solutions for cheap parallelism WITH focus on data processing.

There was a time when this all was called Data Processing, then there was Information Technology. Now, Data processing is back. - Jack Vaughan

Users view Databricks' Spark
It is in limited beta. But a lot of people have gotten their hands on it.

Hadoop and Spark are coming of age together
The Talking Data podcast features Hadoop and Spark, open source data technologies that gained attention at this year's Strata+Hadoop East event.

Apache Spark : This year's MapReduce killer
Since the release of Apache Spark, big data vendors have touted it as a faster, more flexible alternative to MapReduce.

Apache Spark meets the PDP-11
Apache Spark seems ready to upstage Hadoop. But it's best seen in the light of computing history, where it looks like yet another step on the long road of data.

Apache Spark goes 1.0, looks to improve on MapReduce ...
The Apache Software Foundation has released Version 1.0 of Apache Spark, an open source data processing framework designed to outperform MapReduce ...

Spark framework gets big push from big data vendors
The Spark framework and processing engine is attracting the attention of vendors, who are touting it for use in iterative machine learning and other big data chores.

Sunday, April 26, 2015

Rah rah, Data, Go, go, go

SwarmDroneDessertBoing

Data as an enthusiasm or even hobby is in the air, as noted in an Economist article (Briefing: Clever Cities: The Multiplexed Metropolis, Sept 7 2013, p.21). But does close inspection of the results to date tell us the enthusiasm is warranted? Is this truly like the introduction of electricity to the city?

One young Dutchman developed a mobile app that tapped into open data to predict the best and easiest areas of the city to rob. This was done, thankfully, in this instance to kindle debate. The Smarter City has a Darker Side, and not just in SciFi.

Anyway, who benefited most from the introduction of electricity, and if data is as powerful a game changer, who will benefit most on this go-round? "The importance of political culture will remain" the writer opines. And it is true. The political culture likely remains more important than any transient technology advance - in terms of how the pie gets cut up.

Human behavior is good and bad. If there is a bad side, there is an app for that.

Saturday, April 11, 2015

Behind the music – Spark and the PDP-11

The DataDataData (Itsthedatatalking) blog is meant to focus on data today – not to rehash my history of computing. But sometimes it veers that way, and I will just be holding on to nothing but the wheel. But I digress.

Apple scruff at The Smithsonian

Spark is the latest new shiny object in data processing. That said, I don’t mean to belittle its potential. The folks that fashioned it in the vaunted AMPLab at UC Berkeley are supersmart, and very aware of what the advent of multicore microprocessors meant to computing: that new means to big clusters of parallelism were available, if only the complexity could be abstracted downwards in clever libraries and runtimes.
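
For the code-minded, here is a rough sketch of what "abstracting the complexity downwards" can look like. It is a toy word count in plain Python, written map-shuffle-reduce style. It is not Spark's or Hadoop's API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a local thread pool stands in for a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted counts by word, as a framework would between phases."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for word, count in chunk:
            groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines, workers=4):
    # The pool is a stand-in for a cluster (an assumption of this sketch):
    # the caller hands over data and functions, and never manages workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, lines))
    return reduce_phase(shuffle(mapped))
```

The point of the sketch is the shape, not the speed: whoever calls word_count only supplies lines of text, and the scheduling of the work is somebody else's problem. Real engines push that same bargain out across many machines.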

People selling Spark come in your door selling Hadoop. Which has had plenty of publicity and is borderline ready for primetime. Now once in there, they may mention you can toss Hadoop, but only if they think you may cotton to that. After writing about Hadoop for about two years I took some care in approaching Spark. Finally some words from way back came back. Please, let me digress some more.

Long ago and far away I sat with my boss discussing the news. The news on that day in 1992 was the ouster of Digital Equipment Corp.'s co-founder Ken Olsen. His departure was an inflection point along a trail that saw DEC go from being a gutsy Maynard, Mass. mill town startup to being a serious threat to IBM's industry leadership to being a forlorn merger candidate.

Like those in other editorial offices, my boss and I wondered what went wrong. What went wrong was the company got confused about what business it was really in. Seems absurd, but it can happen.

DEC's Olsen did not like the PC or Unix, two very innovative industry trends that his subordinates learned to basically eschew. Missing on the move to small personal computers was especially ironic, as DEC itself rose in the 1960s on the back of minicomputers that downsized capabilities of the larger, then-dominant mainframe computer. Anyway, on this particular day I was especially interested to see my editor's take on this. That was because his experience went beyond running a magazine called EDN.

You see, as a graduate student, Jon Titus had been in the vanguard of what came to be known as microcomputers, or PCs. A July 1974 Radio-Electronics issue that featured Titus's 8008-based "Mark-8 Personal Minicomputer" kit predated Popular Electronics' Altair 8800 cover story by six months.

In Cambridge, Mass., Paul Allen picked up a copy of the latter magazine, brought it to Harvard student Bill Gates's dorm to share with him, and a new era of computing was off and running. Note that Titus and the Radio-Electronics editors called the Mark-8 a personal minicomputer. So, Titus had a unique perspective on Ken Olsen's quandary.

"DEC came to think they were selling minicomputers," Titus said. "But what they were selling was computing."

Anyway, I link below to the full story on this, which ran on SearchDataManagement.com. I'd like to add here what a great boss Jon Titus was for me. He stood by me, more than once, which I never will forget. My spousal unit and I got to Washington last week. We went to the Smithsonian museum (actually, just two days after this story went live) and were told that the computer exhibit was closed for repairs (a lot of people can relate to that, ay?!) so we did not see the Mark-8 on display. Instead there was the computer that has, and maybe rightfully so, gained the brunt of the fame.

A cruel old engineer.

That is the Apple II of Steve Wozniak and Steve Jobs. A woman came by and asked the air: "Is that the first computer?" No, said I, trying to be courteous, "the first computers were as big as rooms - that is what many people consider to be the first personal computer." Sorry, that's it for now - I got to go digress. – Jack Vaughan

Read Apache Spark meets the PDP-11 -- in the end, it's all about the processing – SearchDataManagement.com, Mar. 31, 2015 http://bit.ly/1Im9n1l

Wednesday, April 8, 2015

Give me Algorithmic Accountability Or

Give me Algorithmic Accountability or give me… ah, what is the alternative again?

I thought Steve Lohr's article in yesterday's New York Times was worth pointing out, as it boils up a larger issue from the flotsam and jetsam of the big data analytics parade. Online ads, the killer app (to date) for big data and machine learning, are but a Petri dish, he says. After all, if the wrong ad is served up, the penalty is mild. But, he writes, the stakes are rising. Companies and governments will churn big data to prevent crime, diagnose illness, and more. Why, just the other day JP Morgan said it could spot a rogue trader before he-she went rogue.

The algorithms that make the decisions may need more human oversight, the writer and others tend to suggest. Civil rights organizations are among those suggesting. Another is Rajeev Date, formerly of the Consumer Financial Protection Bureau. The story focuses on the notion of Algorithmic Accountability (meeting tonight in the church basement, no smoking please) as an antidote to brewing mayhem.

IBM Watson appears in the story. It is hard to get a handle on Watson, but one thing is crystalline; that is, that the mountain of documents is growing beyond managers’ capacity to understand, and that Google is paling under the weight. Watson is meant to do the first cut on finding a gem in, for example, the medical literature – reading ‘many thousands of documents per second.’ Along the way, a few researchers may lose their jobs, but the remaining managers will need coffee and servers are wanted.

Haven't heard for a while of Danny Hillis. He founded Thinking Machines back in the day. The original cognitive computer? Or was that the old Ratiocinator (but I digress). Hillis says data storytelling is key: to, like old man Chaucer, find narrative in the confused data stream. If the story teller had a moral compass that would be an additional positive factor, if you take Louis Beryl’s word for it. He is cofounder of Earnest, a company that has staff to keep an eye on the predictor engine output.

Less opacity would be good, Lohr concludes, as Gary King, director of Harvard’s Institute for Quantitative Social Science, joins the narrative. The Learning Machines should learn to err on the side of the individual in the data pool. If that would happen, you would get that bank loan that might be a little iffy, rather than have a fairly innocuous money request rejected. George Bailey would be the patron saint of the Moralistic Data Story Telling Engineer.

I am trying to think of a case where the owners of the machines programmed them that way .. but parted-lipped Jennifer Lawrence is in a Dior ad contiguous with Lohr’s Maintaining a Human Touch As the Algorithms Get to Work (NYT, Apr 7, 2015, p. A3) and my train of thought has left the station.

Data science should not happen in the dark. We have in fact aborning a classic humanization-computerization dilemma. Academia and associations, mobilize! – Jack Vaughan, Futurist

[Imagine Betty Crocker working a conveyor belt where algorithms are conveyed. I do.]

Friday, March 20, 2015

Spark was nowhere in 2013 - in 2014 it blew by Storm on Google Trends

Thursday, February 5, 2015

Some dirt on dirt data

Sometimes I think back to last Sept when I got a chance to see data in a different role. That is, as a central player in solving civilization-scale challenges. So much has been done, yet there is so much more to do. Just as one example there is soil data. More and more data is being gathered on soil moisture, weather and crop conditions, but new storage techniques, analytical methods and search algorithms are required, as a U of Wis. (Go, Bucky Badger!) researcher said at the conference. - Jack Vaughan

Related
Data-wrangling-a-key-to-meeting-civilization-scale-challenges - SearchDataManagement.com, Sept 2014

Sunday, February 1, 2015

Momentous Tweets in the week of our Lord Jan 25 2015

HBR - We've written a lot of great stuff on digital business models http://s.hbr.org/1tGWJrk

The Onion @TheOnion Top Story In Sports: Marshawn Lynch Delivers Eloquent 45-Minute Address On Privacy In The Modern Age http://onion.com/1zH2lVj

Mark Madsen @markmadsen RT @kdnuggets: Can Microsoft make R easy? http://flip.it/gIOeg #rstats http://flip.it/A1eV1

#Hortonworks #Data #Governance Initiative targets #Hadoop data management, better data policies http://bit.ly/1CVCFDF

Francois Petitjean @LeDataMiner

.@kdnuggets #Code for learning the #Structure of #GraphicalModels for #BigData released - http://goo.gl/0BhnHh  

Biggest misnaming of place in history? Greenland! NASA | Greenland's Ice Layers Mapped in 3D: http://youtu.be/u0VbPE0TOtQ

Curt Monash shows you where the #bigdata innovation is. http://www.dbms2.com/2015/01/19/where-the-innovation-

The Internet of Things just got a watchdog: FTC issues official report. #Thingnado #IoT http://bit.ly/15PtUOY

"There be data here"... lots of #BigData #OpenData ... http://bit.ly/1sWtO26 #abdsc via @DataScienceCtrl

Fast Clustering of #BigData Sets: http://bit.ly/1BBm8ml #abdsc #DataScience via @DataScienceCtrl

Accenture's Vince Dell'Anno discusses the data supply chain http://bit.ly/1CinPYJ