Sunday, April 26, 2015

Rah rah, Data, Go, go, go

Data as an enthusiasm, or even a hobby, is in the air, as noted in an Economist article (Briefing: "Clever Cities: The Multiplexed Metropolis," Sept. 7, 2013, p. 21). But does close inspection of the results to date tell us the enthusiasm is warranted? Is this truly like the introduction of electricity to the city?

One young Dutchman developed a mobile app that tapped into open data to predict the best and easiest areas of the city to rob. Thankfully, in this instance it was done to kindle debate. The Smarter City has a Darker Side, and not just in sci-fi.

Anyway, who benefited most from the introduction of electricity, and if data is as powerful a game changer, who will benefit most on this go-round? "The importance of political culture will remain," the writer opines. And it is true. In terms of how the pie gets cut up, the political culture likely remains more important than any transient technology advance.

Human behavior is good and bad. If there is a bad side, there is an app for that.

Saturday, April 11, 2015

Behind the music – Spark and the PDP-11

The DataDataData (Itsthedatatalking) blog is meant to focus on data today – not to rehash my history of computing. But sometimes it veers that way, and I will just be holding on to nothing but the wheel. But I digress.

Apple scruff at The Smithsonian
Spark is the latest shiny object in data processing. That said, I don’t mean to belittle its potential. The folks who fashioned it in the vaunted AMPLab at UC Berkeley are supersmart, and very aware of what the advent of multicore microprocessors meant to computing: that new paths to massive parallelism on big clusters were available, if only the complexity could be abstracted downward into clever libraries and runtimes.

People selling Spark come in your door selling Hadoop, which has had plenty of publicity and is borderline ready for prime time. Once inside, they may mention that you can toss Hadoop, but only if they think you may cotton to that. After writing about Hadoop for about two years, I took some care in approaching Spark. Finally, some words from way back came back to me. Please, let me digress some more.

Long ago and far away I sat with my boss discussing the news. The news on that day in 1992 was the ouster of Digital Equipment Corp.'s co-founder Ken Olsen. His departure was an inflection point along a trail that saw DEC go from being a gutsy Maynard, Mass. mill town startup to being a serious threat to IBM's industry leadership to being a forlorn merger candidate.

Like those in other editorial offices, my boss and I wondered what went wrong. What went wrong was that the company got confused about what business it was really in. Seems absurd, but it can happen.

DEC's Olsen did not like the PC or Unix, two very innovative industry trends that his subordinates learned, basically, to eschew. Missing the move to small personal computers was especially ironic, as DEC itself rose in the 1960s on the back of minicomputers that downsized the capabilities of the larger, then-dominant mainframe computer. Anyway, on this particular day I was especially interested to see my editor's take on this. That was because his experience went beyond running a magazine called EDN.

You see, as a graduate student, Jon Titus had been in the vanguard of what came to be known as microcomputers, or PCs. A July 1974 Radio Electronics issue that featured Titus's 8008-based "Mark-8 Personal Minicomputer" kit predated Popular Electronics' Altair 8800 cover story by six months.

In Cambridge, Mass., Paul Allen picked up a copy of the latter magazine and brought it to Harvard student Bill Gates's dorm to share, and a new era of computing was off and running. Note that Titus and the Radio Electronics editors called the Mark-8 a personal minicomputer. So, Titus had a unique perspective on Ken Olsen's quandary.

"DEC came to think they were selling minicomputers," Titus said. "But what they were selling was computing."

Anyway, I link below to the full story on this, which ran on SearchDataManagement.com. I'd like to add here what a great boss Jon Titus was for me. He stood by me, more than once, which I never will forget. My spousal unit and I got to Washington last week. We went to the Smithsonian museum (actually, just two days after this story went live) and were told that the computer exhibit was closed for repairs (a lot of people can relate to that, ay?!), so we did not see the Mark-8 on display. Instead there was the computer that has, maybe rightfully so, gained the lion's share of the fame.
A cruel old engineer.

That is the Apple II of Steve Wozniak and Steve Jobs. A woman came by and asked the air: "Is that the first computer?" "No," said I, trying to be courteous. "The first computers were as big as rooms; that is what many people consider to be the first personal computer." Sorry, that's it for now; I've got to go digress. – Jack Vaughan

Read Apache Spark meets the PDP-11 -- in the end, it's all about the processing – SearchDataManagement.com, Mar. 31, 2015 http://bit.ly/1Im9n1l

Wednesday, April 8, 2015

Give me Algorithmic Accountability Or

Give me Algorithmic Accountability or give me… ah, what is the alternative again?

I thought Steve Lohr's article in yesterday's New York Times was worth pointing out, as it boils up a larger issue from the flotsam and jetsam of the big data analytics parade. Online ads, the killer app (to date) for big data and machine learning, are but a Petri dish, he says. After all, if the wrong ad is served up, the penalty is mild. But, he writes, the stakes are rising. Companies and governments will churn big data to prevent crime, diagnose illness, and more. Why, just the other day JPMorgan said it could spot a rogue trader before he or she went rogue.

The algorithms that make the decisions may need more human oversight, the writer and others suggest. Civil rights organizations are among those making that case. Another is Rajeev Date, formerly of the Consumer Financial Protection Bureau. The story focuses on the notion of Algorithmic Accountability (meeting tonight in the church basement, no smoking please) as an antidote to brewing mayhem.

IBM Watson appears in the story. It is hard to get a handle on Watson, but one thing is crystalline: the mountain of documents is growing beyond managers’ capacity to understand, and Google is paling under the weight. Watson is meant to make the first cut at finding a gem in, for example, the medical literature, reading ‘many thousands of documents per second.’ Along the way, a few researchers may lose their jobs, but the remaining managers will need coffee, and servers are wanted.

Haven't heard of Danny Hillis for a while; he founded Thinking Machines back in the day. The original cognitive computer? Or was that the old Ratiocinator (but I digress)? Hillis says data storytelling is key: to find, like old man Chaucer, narrative in the confused data stream. If the storyteller had a moral compass, that would be an additional positive factor, if you take Louis Beryl’s word for it. He is cofounder of Earnest, a company that has staff to keep an eye on the predictor engine output.

Transparency would be good, Lohr concludes, as Gary King, director of Harvard’s Institute for Quantitative Social Science, joins the narrative. The learning machines should learn to err on the side of the individual in the data pool. If that happened, you would get that bank loan that might be a little iffy, rather than have a fairly innocuous money request rejected. George Bailey would be the patron saint of the Moralistic Data Story Telling Engineer.

I am trying to think of a case where the owners of the machines programmed them that way... but parted-lipped Jennifer Lawrence is in a Dior ad contiguous with Lohr’s “Maintaining a Human Touch as the Algorithms Get to Work” (NYT, Apr. 7, 2015, p. A3), and my train of thought has left the station.

Data science should not happen in the dark. We have in fact aborning a classic humanization-computerization dilemma. Academia and associations, mobilize! – Jack Vaughan, Futurist


[Imagine Betty Crocker working a conveyor belt where algorithms are conveyed. I do.]

Thursday, February 5, 2015

Some dirt on dirt data

Sometimes I think back to last September, when I got a chance to see data in a different role: as a central player in solving civilization-scale challenges. So much has been done, yet there is so much more to do. Soil data is just one example. More and more data is being gathered on soil moisture, weather and crop conditions, but new storage techniques, analytical methods and search algorithms are required, as a University of Wisconsin (Go, Bucky Badger!) researcher said at the conference. - Jack Vaughan




Related
Data wrangling a key to meeting civilization-scale challenges
- SearchDataManagement.com, Sept 2014 

Sunday, February 1, 2015

Momentous Tweets in the week of our Lord Jan 25 2015


HBR - We've written a lot of great stuff on digital business models http://s.hbr.org/1tGWJrk

The Onion @TheOnion: Top Story In Sports: Marshawn Lynch Delivers Eloquent 45-Minute Address On Privacy In The Modern Age http://onion.com/1zH2lVj

Mark Madsen @markmadsen: RT @kdnuggets: Can Microsoft make R easy? http://flip.it/gIOeg #rstats http://flip.it/A1eV1

Initiative targets data management, better data policies http://bit.ly/1CVCFDF


Biggest misnaming of place in history? Greenland! NASA | Greenland's Ice Layers Mapped in 3D: http://youtu.be/u0VbPE0TOtQ

Curt Monash shows you where the innovation is.

The Internet of Things just got a watchdog: FTC issues official report.


"There be data here"... lots of ... via

Fast Clustering of Sets: via

Accenture's Vince Dell'Anno discusses the data supply chain

Wednesday, January 28, 2015

Big Data from Space

Big Data from Space - "The question of how to ensure space-based knowledge is used for the common good has become pressing with the dawning of a new space age, in which satellites have become affordable for private interests," writes NOAA head Kathy Sullivan on Davos Blog. Food for thought. - Jack Vaughan



https://agenda.weforum.org/2015/01/how-big-data-from-space-helps-life-on-earth/
https://www.google.com/webhp?hl=en&tab=ww#hl=en&tbs=qdr:y&q=kathy+sullivan+NOAA+space+data+opendata+%22climate+corporation%22

Monday, January 26, 2015

Between the buttons

Some time ago I read a review of a book, "The Information: A History, a Theory, a Flood" by James Gleick. The review was by Sam Anderson and appeared in the NYT Sunday Magazine on June 26, 2011. That whole year was a blur, I am lately discovering, and I find it hard to believe I totally missed this thing, 'cause it seems like a semi-mystical tome about technology, which is one of my suits.

Gleick, it seems, discussed the fact that every era thinks it is one of information overload. I get it. Since time immemorial, folks have felt that something essential is being overthrown during the natural course of communications progress. Certainly the telegraph upset the cozy world of the semaphore, and the telephone unwound the world of the telegraph. And so on.

Gleick, as expounded upon by Anderson, picks up on some points that bear some noodling. Let's start with what they said George Boole said, which I did not even know about: "The symbols zero and one in the system of logic are nothing and the universe."

Anderson views the Web as an almanac - as a vast interlocking set of databases that seeks to comprise ALL PREVIOUS TEXT. But in the dance of All or Nothing at All you need two to tango. All text runs against communications. Thus Anderson saddles up upon a basic idea of communications to describe the problem incumbent with today's brand of "too much information."

He interestingly describes the leap from harnessing electricity to the telegraph as a leap based on interruptions of circuit flow: breaks in continuity, coded to hold meaning. And as he describes it, you can see Morse's experience possibly affecting Boole's thinking. (We could add Wiener and Shannon.)

- "We need to remember the value of nothing."

- "We need to organize our internal absences to create meaning." 

- "It’s like breathing: you can’t inhale all day. We need to learn to make peace with the information we don’t know, to embrace the zeroes, to relearn the pleasures of hunger, need, interruption, restraint." 

Sam Anderson finds you have to leave the space; you can't fill up the glass; there are not just ones, there are zeros. You could put Monk, Lacy, Ellington, Basie, Morton, a lot of jazz, into this discussion.

- "We need to remember the value of nothing. We need to organize our internal absences to create meaning."

Related

Randomly
-Fortunately for Western Union, the telegram became the money transfer.

-When I look at SI's web site, I see the Web less as a new threat than as a resurgence of the original threat to newspapers and magazines: TV.

-Derived from this story: Here's a good quote from Benedict Anderson, scholar: "reading a newspaper is like reading a novel whose author has abandoned any thought of a coherent plot." The story we are looking at [ http://www.nytimes.com/2011/06/26/magazine/an-accidental-experimental-masterpiece.html ] juxtaposes thoughts on Gleick's book with what an old-style almanac was, and how the Internet is now the almanac, and the title of the story describes this: "An Accidental Experimental Masterpiece." And I guess that's what I think news is like.

Sunday, January 25, 2015

Data tweets I have known during the Week of January 18, 2015

Top news this week... Microsoft to acquire Revolution Analytics... the company is the chief independent purveyor of R, which fits in nicely with Excel and some other skills in the world of data science. Very interesting 'early acquisition' in the Satya Nadella reign http://t.co/TarHUV15DY ... U. of Texas students to use Watson to deliver #IBMWatson @JackVaughanatTT reports: http://bit.ly/1CxHlgr ... HDP 2.1: Apache Falcon for Data Governance in Hadoop http://t.co/WwUpeVXvRF ... Facebook opening up Deep Learning software, looking to vie with Google. http://bit.ly/1um9oLc ... Google's data supremacy: should we be worried? http://wp.me/p2WnJJ-ff ... New Report Evaluates Technological Alternatives to Bulk Data Collection http://t.co/EgliCMTqgE