Sunday, April 27, 2014

Why big data is a big deal at a big school

Cosmic background noise for placement only
The big data activities of three Harvard School of Public Health professors are discussed in the recent issue of Harvard Magazine (March-April 2014). 'Why Big Data is a Big Deal' looks at their work, and it emerges that there is a pretty obvious connection between what is now posited as 'big data' and a couple of trends long unfolding. In particular, computer analysis and data gathering in the social sciences have grown over many years, to the point that their tenets seem evident, natural, and broadly applicable beyond their initial use cases. The Harvard profs exemplify the emerging style. What is new? It is most evident in the case of Gary King, Weatherhead University Professor and head of Harvard's Institute for Quantitative Social Science. He has used data with special imagination, yes. He has also found ways to use social media information and cell phone data as part of the analysis, even in places far afield. Like others, he attaches a bit of mysticism, or 'capacity to drive good', to the concept of 'data'. The idea of data as the linchpin of a movement is growing. And the Harvard crew is emblematic. 'Improved statistical & computational methods - not in growth of storage or computational capacity' - Jack Vaughan

Saturday, April 26, 2014

Data Gumbo for the Week of Noxious Fumes

Predicting legislation, or Follow the money http://blog.fiscalnote.com/2014/04/22/legislating-todays-science-fiction-tomorrow/

Don't touch that dial! Catch Jack Vaughan speaking with Nicole Laskowski about #GartnerBI on Talking Data podcast. bit.ly/1k4ef0y

A majority of financial enterprises (67%) present a "repeatable" level of #bigdata #analytics maturity. -per IDC http://www.idc.com/getdoc.jsp?containerId=prUS24808014

'Improved statistical & computational methods-not in growth of storage or computational capacity' http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal

Read Wayne E's big data/data warehouse clash http://bit.ly/Rwc5NY  Insightful: Reminds of other techno shifts w incumbent painted all bad!

Big data: big mistake? - http://on.ft.com/P0PVBF via @FT '"Big data" has arrived-big insights have not.'

Cringely: Big Data is the new Artificial Intelligence http://betane.ws/s0Am  via @BetaNews

Does #bigdata improve contextualization of science? http://philsci-archive.pitt.edu/9944/1/pietsch-bigdata_complexity.pdf

The Parable of Google Flu: Traps in Big Data Analysis

Toward a Vision: Official Statistics and Big Data http://magazine.amstat.org/blog/2013/08/01/official-statistics/

Wednesday, April 23, 2014

What is going on with data management technology?


One thing is pretty true: Information of all types is engulfing the corporation. There’s more…

Web apps and distributed cloud computing have grown.
And a slew of new technologies has arrived to help companies cope with the data influx and distributed data processing …
… But sorting through those technologies is tough.
And you can’t start fresh unless you are a startup.
If you are established, you worry about startups taking your business …
But the original ‘queriable’ relational database approach is still valid.
Newbie software has to adapt too … adding old-style capabilities, and vice versa.

Monday, April 21, 2014

Data privacy stories on SearchBusinessAnalytics

Snowden speaks to Euro Union Biggies by Satellite April 2014
Snowden by link up talks with Euros

My SearchBusinessAnalytics.com colleague Ed Burns has been at work on a fine series on emerging data privacy issues. In "Data collection practices spark debate on big data ethics, privacy" and "Laws leave gray area between big data and privacy" he paints a picture of the current landscape (and more is on the way). One thing that emerges, and this is something that Ed speaks with me about on an upcoming Talking Data podcast: there is a sort of entropy going on here - it skews toward the status quo, which is, in broad brush, that there really is no such thing as privacy and in too many cases your data is my cash cow. We've heard plenty of talk about the big data gold rush, and data as the new oil - what that translates into is some lip service to the notion of a data self. Burns and I are enthusiasts for the new possibilities of data, but both of us, I think, suspect that data and analytics professionals have to be sure to treat what they do as a profession, and consider the ethics of data mining, as they would any other kind of mining. I think his series on privacy is a good step in laying the groundwork for a discussion around this. Does the industry need another Snowden event to wake up to the need for ethical standards? - Jack Vaughan

Thursday, April 17, 2014

The Unknown Known – On Rumsfeld's ridiculously sublime rumination

 


Nate Silver's well regarded The Signal and the Noise (2012) included a chapter in which the author intrepidly  gumshoes it to Donald Rumsfeld's office, mostly to discuss the former secretary of war's long-running enthusiasm for a little known 1962 book, Pearl Harbor: Warning and Decision by Roberta Wohlstetter.  Actually the greatest enthusiasm may be that which Rumsfeld held for the book's introduction, one penned by economist Thomas Schelling who wrote: "There is a tendency in our planning to confuse the unfamiliar with the improbable."

Before Pearl Harbor, the U.S. expected sabotage from Japan, but not a six-carrier air attack from the north. This formed what Silver might describe as a 'signal and noise' moment when a massive trove of information was not effectively sifted – and Pearl Harbor was not predicted.  There would seem to be a lesson in analytics there somewhere.

Rumsfeld somewhat famously circulated the book in Washington months before the Sept. 11, 2001 terrorist attacks, and he has a Xeroxed copy of the foreword at hand when he meets Silver. After the fact, the Wohlstetter book's theme seemed applicable, in Rumsfeld's - and, perhaps, Silver's - estimation, to Sept. 11. And it may have formed a backdrop for Rumsfeld's ridiculously sublime rumination on known knowns, known unknowns and unknown unknowns, another variation of which (the unknown known) forms the title of Errol Morris' new film, which is what I came here to tell you about.

I call Rumsfeld's 'unknown unknown' wordsmithing ridiculously sublime because, upon viewing Morris' film, I conclude that Rumsfeld's 'understanding' of the Pearl Harbor lesson was more misunderstanding - was more a willful, spiteful and devilish confabulation of analytics. He took a bit of truth and with some technical exactitude misapplied it to the case of Iraq and its purported troves of weapons of mass destruction, for his larger purpose (political bias) of, well, say, shaking up the Middle East. He took the idea that the Pearl Harbor debacle was caused by a failure of imagination, and imagined a fabled debacle all his own. Prediction requires some very special care, evoking a rework of a Bob Dylan line: "To live outside of time you must be honest."

"The Unknown Known" is not quite on par with Morris' portrait of Viet Nam era Defense Secretary Robert McNamara as a film and a story, the protagonist elicits less empathy in this viewer, but it is worthwhile in its probing pursuit of logical understanding – in its analysis. Also, like the earlier 'Fog of War', it has some nifty animation. - Jack Vaughan

Sunday, March 30, 2014

Encryption and differential privacy discussed on way out of NSA sinkhole

The U.S. government found itself in a very defensive position vis-à-vis data privacy in the wake of Edward Snowden's NSA disclosures. In January, Pres. Obama promised to appoint a group to look more deeply at U.S. intelligence programs, which acted as if by fiat from 9-11 on. A recent MIT event took a look at encryption and differential privacy technology as part of the review effort.

The latest on that is an Administration proposal to turn over the storage of phone records to phone companies, and to tighten the requirements for subpoenas thereof.  One doesn’t necessarily get a warm feeling on that… but some long time NSA watchers see it as a step forward.

When Obama charged John Podesta, long-time Democratic operative and now White House Counselor, to head the study group, he also said to look at big data commerce and its potential to threaten civil liberties.
The White House enlisted academics, including MIT's Computer Science and Artificial Intelligence Lab Big Data Initiative group, as part of that effort. In March I covered a related workshop on “Big Data and Privacy: Advancing the State of the Art in Technology and Practice” and, together with colleague Ed Burns, reported on it in a SearchDataManagement.com Talking Data podcast.

Both Burns and I felt the MIT conference was a bit high on the technology side (encryption and differential privacy being prominent) and a bit low on the privacy side. The notion that data is like the "new gold" or the "new oil" seems overblown, until you see a room full of policy and commerce people discussing how much data is going to change the world as we know it. Whether they are right or wrong is less important than the palpable sense that something akin to gold or oil 'fever' is in the air.

Podesta had planned to attend the event, but was hampered by snow in Washington (although one might guess that, this being the weekend of the Russian Crimean Peninsula incursion, staying close to the White House was wise). He spoke with the assembled by teleconference. Below are some riffs from his published remarks. – Jack Vaughan

"…one purpose of this study is to get a more holistic view of the state of the technology and the benefits and challenges that it brings.  This Administration remains committed to an open, interoperable, secure and reliable internet – the fundamentals that have enabled innovation to flourish, drive markets and improve lives.  

"There is a lot of buzz these days about “Big Data” – a lot of marketing-speak and pitch materials for VC funding. 
"(But) the value that can be generated by the use of big data is not hypothetical.  The availability of large data sets, and the computing power to derive value from them, is creating new business models,
 "With the exponential advance of these capabilities, we must make sure that our modes of protecting privacy – whether technological, regulatory or social – also keep pace.

Related
http://cdn.ttgtmedia.com/rms/editorial/sDM-TalkingDataPodcast-March31-BigDataPrivacyWorkshop.mp3


Saturday, March 22, 2014

Through the scanner darkly, darkly; and the future of information

scanner eye by jvaughan
The digitization of everything is an elixir for some people. It spawns visions. If we could only open up all the data … how about taking the college facebook and putting it online … why not street-level and satellite-level photos of every home in the U.S. of A.? OK! Build and sell a picture database of all the license plates on all the cars and trucks on the road? Gee, I don't know. The Department of Fatherland Security recently moved to create a national license-plate recognition database to garner data from commercial and law enforcement tag readers. Then, with NSA skulduggery still a little too current, they canceled it a' sudden. Note that commercial tag reader systems remain out there. DRN, or Digital Recognition Network, provides "data that puts your company in the driver's seat," helping you repo your assets (e.g., cars) and reduce asset charge-offs. Together with Vigilant Solutions of Livermore, Calif., the company is fighting a Utah law that banned the private, commercial use of the license plate scanning technology. DRN was the only speaker at a hearing on the topic at the Mass. State House earlier this month. They see it as their First Amendment right to make money taking pictures of stuff. When you think of all the big data uses of license plates beyond immigration and repossession, well, it's boggling. There are probably more big data apps to come that we can't even think of, but why not collect the data for that big day in the future? The undercurrent is, if I don't want the NSA or DFS to do it, why would I want some Starbucks-guzzling nerdster to? Re-jiggering of the status quo is what massive levels of data can do. Google has met a few people who don't want pictures of their houses in Google's database and, apparently, will remove them if you ask. I don't think First Amendment rights to take pictures are a foundation for massively scaled reproduction, and I would not want my license plate in some software company's data services offering.
In "Who Owns the Future?" Jaron Lanier lays down some framework for a more credible understanding of where we want to go with data and privacy. By asking questions of the future he takes a sharper picture of the present: "... as technology advances in this century, our present intuition about the nature of information will be remembered as narrow and shortsighted." - Jack Vaughan

Related 
http://www.foxnews.com/politics/2014/02/19/dhs-plan-for-national-license-plate-tracking-system-raises-privacy-concerns/
http://www.googletutor.com/asking-google-to-remove-your-home-from-maps-street-view/
http://betaboston.com/news/2014/03/05/a-vast-hidden-surveillance-network-runs-across-america-powered-by-the-repo-industry/
http://consumerist.com/2014/03/05/the-repo-man-might-be-scanning-your-cars-license-plate-and-location-selling-the-data/
http://www.drndata.com/Content/Docs/DRN%20Vigilant%20Utah%20Press%20Release.pdf
http://www.drndata.com/
http://vigilantsolutions.com/




Saturday, March 15, 2014

I have heard all about Grantland



I'm reading an interesting book called Talk Nerdy to Me. This is by the ultra-hip Grantland (as in Grantland Rice) crew, whose totally cool-cat sportswriting on the web is packaged here under the blistering full title of "Talk Nerdy to Me: Grantland's Guide to the Advanced Analytics Revolution" (sold out, says the site today), and it is a more interesting take on big data analytics than many other tomes that you may have anted up for. Let's start with "Belichick's Fourth and Reckless" by contributor Bill Simmons. The story centers, at times maniacally, on Patriots coach Bill Belichick's famously strange call on fourth-and-2 on November 15, 2009 against Peyton Manning and the Indianapolis Colts. It trumps many other coaching failures in Boston sports' fabled history of failures, he writes. It was such a strange call - the Pats were on their own 28, with less than 2 minutes to play and a lead - that people began to look at the statistics, trying to see what was in the coach's (the great coach's, mind you) mind. Simmons goes over some of the stats and pretty well proves how at times statistics can lie, or at least outsmart the lazy intellectual (you know, the type that works for media!). Belichick's crazy gambit had backing in stats. "Belichick did play the percentages if you took those percentages at face value." But Simmons points out, for example, that statistics (that going for it on fourth down had an 80.5% chance of succeeding) don't account for the obvious confused funk that had descended on the Pats in that final quarter. There is a big difference between fourth-and-2 on a Sunday in September against a lazed Falcons outfit and fourth-and-2 in November against Peyton Bloody Manning and the Colts. Stop and grok on this:

"I know it's fun to think stats can settle everything, but they can't and they don’t."

If you are playing the statistics card, which one do you choose, writes Simmons. There are all sorts of statistics to count, but which are the ones to count on? Pulling out all the stops here, I am going to recall Mark Twain, or maybe Vin Scully; plenty argue over who said it:

Statistics are used much like a drunk uses a lamppost: for support, not illumination.

Beware, you would-be masters of the big data universe! I said that. - Jack Vaughan

Saturday, March 1, 2014

Duck duck goose

Today's clamor around big data will one day subside. Like the love affair in Cole Porter's Just One of Those Things, it is "too hot not to cool down". It is a sort of process: vendors and media build things up and then break things down again. Take as an example a recent New York Times story entitled "Big (Bad) Data." The item revolves around the case of A&E's Duck Dynasty star Phil Robertson. His antigay comments in a magazine article went viral on Twitter, and A&E execs, as if in the thrall of big data analytics, suspended him from the show. Then the Twitter sentiments rebounded, big data was recalibrated, and Robertson was back in. The Times' story suggests the first response was wrong, the second right. But time may prove otherwise. This episode in review is hardly an indictment, although that is how the writer or his editors would have it. The advent of big data does not obviate the need for execs to have full liberal educations with philosophy, ethology, ethics and economics studies under their belts. The execs of A&E give vent to the old saw: If you don't know where you're going, any road will take you there.