Wednesday, December 24, 2014

Talking Data: Protecting online data privacy was the big 2014 trend


The last Talking Data podcast of 2014 is a bit of a walk down Twitter Lane with the data privacy issues of 2014. Ed Burns and I discuss a Pew poll that looks at Americans' attitudes toward online privacy. They aren't comfortable with their data sharing, but they sure do like those social infrastructure services offered gratis. Also discussed: Uber and its loping missteps on the way to killing the hackney cab, and what that means for data professionals. There is more, including some discussion of the Hortonworks IPO. On Christmas Eve my version was truncated. Might wait until the Christmas smoke clears to sort through that. - Jack Vaughan

Tuesday, December 23, 2014

That was the year that was: big data a la Hadoop, NoSQL


The year 2014 saw progress in big data architecture development and deployment, as users gained more experience with NoSQL alternatives to relational databases, and Hadoop 2 gained traction for operational analytics uses beyond the distributed processing framework's original batch processing role. Those trends were detailed in a variety of SearchDataManagement pieces. Big data in review: 2014



The roots of machine learning

Neural networks and artificial intelligence have been on my mind over many years, while I spent most days studying middleware, something different. At the heart of neural networks were backward propagation and feedback loops – all somewhat related to cybernetics, which flowered from the 1940s into the early 1970s. One of the first implementations of cybernetics was the thermostat.

In early 2013 I started working at SearchDataManagement, writing about big data. At the end of this year I have devoted some time to book learning about machine learning. A lot happened while Rip Van Vaughan was catching some z's. So something told me to go back to one of my old blogs and see where I left off with feedback. If you pick through it you will find Honeywell and the thermostat and the automatic pilot, etc. My research told me the first autopilot arose from a combo of the thermostat (Honeywell) and advanced gyroscopes (Sperry).

I spent hours looking at the thermostat, its mercury, its coil. It had an alchemical effect. I remember wondering if the thermostat could be a surveillance bug. Now we have Nest, which uses the thermostat as a starting point for collecting data for a machine learning process. - Jack Vaughan

[It is funny how the old arguments about the autopilot appeared as memes in this year of machine learning. This link, which includes Tom Wolfe's mirthful take on the autopilot in The Right Stuff, is here rather as a place-marker for background: The Secret Museum of Cybernetics - JackVaughan's Radio Weblog, March 2004 (also reposted on Moon Traveller with a slew of feedback errata). Probably valuable to cite Nicholas Carr's The Glass Cage, published this year, which takes as its premise society's growing inabilities, many brought on by automation. Several serious airplane crashes in which pilots' skills seemed overly lulled by automation form a showcase in The Glass Cage.]

From Wolfe’s The Right Stuff:

“Engineers were ... devising systems for guiding rockets into space, through the use of computers built into the engines and connected to accelerometers for monitoring the temperature, pressure, oxygen supply, and other vital conditions of the Mercury capsule and for triggering safety procedures automatically -- meaning they were creating  with computers, systems in which machines could communicate with one another, make decisions, take action, all with tremendous speed and accuracy .. Oh, genius engineers! “

Wednesday, December 17, 2014

AI re-emergence: Study to Examine Effects of Artificial Intelligence


Able New York Times technology writer John Markoff (he has been far and away the star of my RJ-11 blog) had two of the three (count 'em, three) AI articles in the Dec. 16 Times. One discusses Paul Allen's AI2 institute work; the other discusses a study being launched at Stanford with the goal of looking at how technology reshapes the roles of humans. Dr. Eric Horvitz of Microsoft Research will lead a committee with Russ Altman, a Stanford professor of bioengineering and computer science. The committee will include Barbara J. Grosz, a Harvard University computer scientist; Yoav Shoham, a professor of computer science at Stanford; Tom Mitchell, the chairman of the machine learning department at Carnegie Mellon University; Alan Mackworth, a professor of computer science at the University of British Columbia; and Deirdre K. Mulligan, a lawyer and a professor in the School of Information at the University of California, Berkeley. The last, Mulligan, is the only one who, after some cursory Googling, appears ready to accept that there are some potential downsides to the AI re-emergence. It looks like Horvitz has an initial thesis formed ahead of the committee work: that is, based on a TED presentation ("Making friends with AI"), while he understands some people's issues with AI, the methods of AI will come to support people's decisions in a nurturing way. The theme would be borne out further if we look at the conclusion of an earlier Horvitz-organized study on AI's ramifications (that advances were largely positive and progress relatively graceful). Let's hope the filters the group implements tone down the rose-colored learning machine that enforces academics' best hopes. – Jack Vaughan



Sunday, November 9, 2014

Facebook imbroglio

Recently I wrote a story for SearchDataManagement that largely centered on one of this year's big data imbroglios: the Facebook-Cornell Emotional Contagion study. This was the topic at a Friday night symposium (Oct 8) capping the first day of the Conference on Digital Experimentation at MIT. You could sum up the germ of the story as: it is okay to personalize web pages with financial purpose, and to fine-tune your methods thereof, but could that overstep an ethical boundary?

On the MIT panel but not covered in my brief story was Jonathan Zittrain of Harvard University Law. For me, he put the contagion study into context – contrasting and comparing it to the Stanford Prison Experiment and the Tuskegee syphilis study, which came under scrutiny and helped lead the way to some standards of decorum for psychological and medical experiments. "There ought to be a baseline protection," he said. There is a fiduciary responsibility, a need for a custodial and trusting relationship with subjects, that is at least an objective in scientific studies of humans.

Now, this responsibility, forwarded by Zittrain and others, remains unstated and vague. Clearly, the Internet powers that be are ready to move on to other topics and let the Facebook experiment fade into the recesses, as even Snowden's NSA revelations have. I think a professional organization is needed – that sounds old school, I know, and I don't care. As with civil engineering, there is no need for a new generation to figure out what is overstepping – no need to wait until the bridge collapses. – Jack Vaughan

Related

http://cyber.law.harvard.edu/people/jzittrain
http://searchdatamanagement.techtarget.com/opinion/Facebook-experiment-points-to-data-ethics-hurdles-in-digital-research
http://codecon.net/

Saturday, October 11, 2014

Calling Dr Data-Dr Null-Dr Data for Evidence-Based Medicine.

"Dr. Data" – likely for SEO reasons it has yet another name in its online version – asks if statistical analysis of databases of historical medical data can be more useful than clinical trial data for diagnosing patients. It recounts the story of Dr. Jennifer Frankovich, a Stanford Children's Hospital rheumatologist, who encountered a young girl with symptoms of kidney failure and possible lupus. Frankovich suspected blood clotting issues, but had to research the matter in order to convince colleagues. The scientific literature comprising clinical trial data did not offer clues. Instead, Frankovich found evidence of a connection between clotting and lupus, given certain circumstances, by searching a database of lupus patients that had been to the hospital over the previous five years.

The story by Veronique Greenwood tells us Frankovich wrote of her experience in a letter to the New England Journal of Medicine, and was subsequently warned by her bosses not to do that kind of query again. Presumably HIPAA privacy concerns are involved.

It stands to reason that data on all the medical cases in any given hospital could have some good use. It leads me to wonder: shouldn't the anonymization or masking of individuals' identities within such databanks be a priority? Is the hegemony of the clinical trial era due to ebb, especially when taking into account the momentum of the World Wide Web?
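As a back-of-the-envelope illustration of what such masking might look like – this is a toy sketch, not HIPAA-grade de-identification, and the key, record fields and patient ID are all hypothetical – records can carry a keyed pseudonym rather than a raw identifier, so records remain linkable across queries without exposing who the patient is:

```python
import hashlib
import hmac

# Hypothetical secret, held by the hospital and kept off the research side.
SECRET_KEY = b"hospital-held secret"

def pseudonymize(patient_id: str) -> str:
    """Replace an identifier with a stable keyed HMAC token."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

# An illustrative record; the field names are made up for this sketch.
record = {"patient_id": "MRN-0012345", "dx": "lupus", "clotting_event": True}
safe = {**record, "patient_id": pseudonymize(record["patient_id"])}
print(safe["patient_id"])   # a stable token, not the raw MRN
```

The same patient always maps to the same token, so a researcher could still count how many lupus patients had clotting events, yet reversing the token requires the hospital's key. Real de-identification, of course, also has to worry about re-identification from the remaining fields.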

Frankovich's work could come under the aegis of Evidence-Based Medicine. The expanded Web-borne approach a'brewing here is sometimes called Digital Experimentation. –Jack Vaughan

Saturday, October 4, 2014

Bayesian Blues

Bayesian methods continue to gain attention as a better means to solve problems and predict outcomes. It was Google that used such algorithms to tune its brilliant search engine, and much more. Nate Silver carried the Bayesian chorus along with his depiction of the method in "The Signal and the Noise." Today, in fact, Bayesian thinking is very broadly adopted - enough so for the New York Times to devote a feature, "The Odds, Continually Updated," to it in this week's Science Times section.

Thomas Bayes, writer F.D. Flam [I am not making this up] says, set out to calculate the probability of God's existence. This was back in the 18th century in jolly old England. The math was difficult and really beyond the ken of calculation - until the recent profusion of clustered computer power. Almost a MacGuffin in the narrative is the overboard Long Island fisherman John Aldridge, whom the Coast Guard found in the Atlantic Ocean thanks to that service's use of Bayesian methods.

"The Odds, Continually Updated" places more import on the possibility that Bayesian statistics have narrowed down the possible correct answers for the age of the universe (from existing estimates of 8 billion to 15 billion years old, to the conjecture that it is 13.8 billion years old). - Jack Vaughan
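For readers new to the method, here is a minimal sketch of the Bayesian update at the heart of all this – revising a prior belief as evidence accumulates. The prior and likelihood numbers are illustrative, not anything from the Times story:

```python
def bayes_update(prior, likelihood, false_alarm):
    """Return P(hypothesis | evidence) via Bayes' theorem."""
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

# Start with a 1% prior. Suppose a test fires 90% of the time when the
# hypothesis is true and 5% of the time when it is false; observe three
# independent positive results and update after each one.
p = 0.01
for _ in range(3):
    p = bayes_update(p, 0.9, 0.05)

print(round(p, 3))   # -> 0.983
```

The point Silver and others hammer on is visible here: a skeptical prior is overwhelmed, gradually and transparently, as evidence piles up.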


An extended version* of this piece would consider:
Who was Bayes and what is Bayesian math?
How does it compare to frequentist statistics (the main point of the story)? If frequentist methods were aptly applied, would they work in most cases?
How does this all relate to the looming question of (smaller) how much we can trust data science and (bigger) how much we can trust science?

*for now this site like most blogs comprises free ruminations - and sometimes you get what you pay for.

Saturday, September 27, 2014

Blue takes on bioinformatics' data errors

Data error correction is something of a guessing game. Cyclic redundancy checks and other advanced algorithms seemingly tamed the erroneous tiger at the chip and disk level long ago. Does one have to rethink, revisit and refresh these and other approaches if incremental success is to be found in applications such as gene research, which pile up the data but can be prone to garbage in?
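For the curious, that chip-and-disk-level taming can be sketched in a few lines. This toy example (the data is mine, not from any gene pipeline) shows a CRC flagging a single flipped bit – though detection, note, is not correction:

```python
import zlib

# Checksum a block of data, corrupt one bit, and watch the CRC notice.
data = b"GATTACA" * 1000
checksum = zlib.crc32(data)

corrupted = bytearray(data)
corrupted[42] ^= 0x01           # flip a single bit
assert zlib.crc32(bytes(corrupted)) != checksum
print("single-bit corruption detected")
```

Schemes that actually repair errors (Hamming codes, Reed-Solomon and kin) add enough redundancy to locate the bad bits, which is the harder trick gene-sequencing tools are after.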

As the researchers say, gene sequencing costs have gone down drastically, but accuracy of data has only improved slowly. This looks like a job for better error correction.
Some Microsoft-backed researchers have come up with a somewhat oddly named software library ("Blue") for such a purpose. It has "proved to be generally more accurate than other published algorithms, resulting in more accurately aligned reads and the assembly of longer contigs containing fewer errors." They write:

One significant feature of Blue is that its k-mer consensus table does not have to be derived from the set of reads being corrected. This decoupling makes it possible to correct one dataset, such as small set of 454 mate-pair reads, with the consensus derived from another dataset, such as Illumina reads derived from the same DNA sample. Such cross-correction can greatly improve the quality of small (and expensive) sets of long reads, leading to even better assemblies and higher quality finished genomes.

In a paper published in Bioinformatics, Paul Greenfield, a research group leader in CSIRO's Computational Informatics division, said test results show that Blue is significantly faster than other available tools—especially on Windows—and is also "more accurate as it recursively evaluates possible alternative corrections in the context of the read being corrected."
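The k-mer consensus idea can be sketched roughly as follows. This is a toy illustration of consensus-based correction – not Blue's actual algorithm – but it shows the two pieces the researchers describe: a frequency table of k-mers (which, as they note, could even come from a different dataset than the reads being corrected), and a repair pass that swaps rare k-mers for well-supported single-base alternatives:

```python
from collections import Counter

def kmer_table(reads, k):
    """Tally every k-mer seen across a set of reads."""
    table = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            table[r[i:i + k]] += 1
    return table

def correct(read, table, k, min_support=2):
    """Repair positions whose k-mer is rare but has a well-supported
    single-base substitution in the consensus table."""
    bases = list(read)
    for i in range(len(bases) - k + 1):
        kmer = "".join(bases[i:i + k])
        if table[kmer] >= min_support:
            continue                    # k-mer is already well supported
        best, best_count = kmer, table[kmer]
        for j in range(k):              # try every single-base substitution
            for b in "ACGT":
                cand = kmer[:j] + b + kmer[j + 1:]
                if table[cand] > best_count:
                    best, best_count = cand, table[cand]
        bases[i:i + k] = list(best)
    return "".join(bases)

# Five clean reads and one with a single-base error (C where G belongs).
reads = ["ACGTACGT"] * 5 + ["ACGTACCT"]
table = kmer_table(reads, k=4)
print(correct("ACGTACCT", table, k=4))   # -> ACGTACGT
```

Real correctors add the recursion Greenfield mentions – evaluating chains of alternative corrections in context – plus handling for insertions, deletions and quality scores.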

The mass of data related to the genome (not just the genome, by any means) continues to call for cutting edge thinking. That thinking may come into play far beyond the old gene pool. - Jack Vaughan

Related
http://blogs.msdn.com/b/msr_er/archive/2014/09/02/a-new-tool-to-correct-dna-sequencing-errors-using-consensus-and-context.aspx

http://www.csiro.au/Outcomes/ICT-and-Services/Software/Blue.aspx

http://bioinformatics.oxfordjournals.org/content/early/2014/06/11/bioinformatics.btu368

Sunday, July 27, 2014

Two takes on Mongo

Take 1 - MongoDB arose out of the general 2000s movement that focused on agility in Web application development (with never-ending-changes-to-schemas replacing etched-in-stone do-not-touch-schemas). The proliferation of data formats calls for something like MongoDB, more than a few adventurous developers decided. A document database, it also rides the success of JSON (over XML), and deals with the bare necessities of state management. It also scales quite easily to humongous scale – hence the fanciful name MongoDB. It is a style of data management that is behind the big data tide.
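That schema flexibility can be sketched without a running MongoDB instance. The documents and the little query helper below are illustrative stand-ins (not MongoDB's actual API), but they show the document-model point: records in one collection need not share a schema, and queries match by example:

```python
import json

# Two documents in the same logical collection; new fields simply appear,
# with no ALTER TABLE migration required.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Flash", "email": "flash@example.com",
     "nicknames": ["Savior of the Universe"], "planet": "Mongo"},
]

def find(collection, query):
    """Query by example: return documents containing every key/value pair."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

print(json.dumps(find(users, {"planet": "Mongo"}), indent=2))
```

In a real document store the same trade-off applies at humongous scale: the application, not the database, carries the burden of knowing which fields a given document might have.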


Take 2 - For my part, I like to think the name MongoDB hails from the good old days of Flash Gordon fighting Ming the Merciless, emperor of the Planet Mongo. My imagination for technology was honed on the Flash serials way back in the Sputnik days, when I'd watch Community Space Theatre Sunday mornings. These spectrally illumined moments are lost to the ages, like all the other signifying TV signals now in the far beyond. But I found a public domain radio days Flash Gordon transmission, and had some fun mixing it in to a podcast report on my visit in June to MongoDB World in New York for SearchDataManagement.com

Left click to play or right click to download Mongo podcast

Sunday, June 15, 2014

Data stalking, data talking, data tracking

Microsoft Research Visiting Professor Kate Crawford tells the data gatherers that 'everything is personal.' What do industry models of privacy assume? she asks. That individuals act like businesses, trading info in a fictional, frictionless market. This is a convenient extension. But what works for the powers that be on one level works differently for consumers. The convenient extension, thus, is an inconvenient untruth: the fix is in in the big data revolution. As Crawford writes: "Those who wield tools of data tracking and analytics have far more power than those who do not." In a way, the gap between the side arrayed to capitalize on big data and the side that is the data source could not be wider, or more disjointed.

Sunday, June 1, 2014

Will insurance get Googlized?

There is something ineffable about advertising, burdened as it is with psychology. Could that be true for insurance too? The latter has become an industry built on an edifice that is a lattice of perceptions of risk, but you could say it is built on psychology just as easily (actually, a little easier to say that). The industry's untapped opportunity is also a potential threat. It is standing there, waiting to be Googlized.

Tuesday, May 13, 2014

Pew data points on NSA surveillance and public

Just caught up with a Pew Research Center/USA TODAY poll conducted in January that estimated overall approval of NSA surveillance had declined since last summer, when stories first broke based on Edward Snowden’s leaked information.... Democrats remain more supportive of the NSA surveillance program than Republicans, though support is down across party lines....While most of the public wants the government to pursue a criminal case against Snowden, young people offer the least support for his prosecution....

http://www.people-press.org/2014/01/20/obamas-nsa-speech-has-little-impact-on-skeptical-public/

Monday, May 12, 2014

Mystic crystal data pondered

Composer Philip Sheppard sees links
between data and music.
He scored a film on chess
master-cum-crazy-man Bobby Fischer.
The near-mystic quality some people attribute to data should be some cause for concern. Something doesn't usually go from familiar and forgettable to world-changing and magical overnight. What portion of today's data paeans ("I love data") ("Data is the new punk") will flower dandelion-like, then drift off on the wind? It is hard to say.

But a feeling can hold that some of this is good, worthy. I've had a chance to see a few conference keynotes that dabbled more in the art – less in the science – of data. And some ring true. At the recent Enterprise Data World event, there was a session by a data strategist from Marvel Comics that made the case for applying graph database architecture to the 'need' to rationalize different incarnations of different superheroes as data elements. Entertaining, yes – but not an effective poster for data as a new way of being. But another session, one led by composer Philip Sheppard, had more such merit. The Marvel guy seemed to admit that – by noting that he wouldn't want to follow Sheppard's presentation.

At EDW14, Philip Sheppard discussed information as symphony. Sheppard does film scores - says #Music is #data. @PhilipSheppard #EDW14 He makes a case that data is poetic. It is true, as he points out: you can look a little and see lyrical whiffs in graphical renderings of data on public bicycle use, or the slipstream of air pressure readouts for an F1 racer.

But really, the place where his insights into the special nature of data bear the most fruit is the place he is closest to – music. "What is music?" he asks. "It's loads of things," he answers. It's transformative. It is a form of solace. You can wallow in it and you don't really know why. Memory is so connected to music. People learn whole caches of text when there is music attached. Once it probably was the major way of encoding history. Much can be learned from the way musicians cope with huge amounts of data under duress. The basic music message can change over time, depending, e.g., on players' emphasis. Sheppard's words to the data folks assembled: "When you are dealing with things you have to look at them as fluid. I think we are starting to look at #data that way."

He has a point. When I was at the symphony once it strikingly dawned on me that this was a message from a human in time. Ludwig. I wrote about it on my art blog (MoonTravellerHerald) under the persona of Shroud Jr.  
Was in the symphony one day – many rainy years ago -- and Beethoven’s message was just crystalline to me. Me, Shroud Jr. Like a telegraph message through the foam of time – Beethoven heard the birds, the guns, he was losing his hearing. He was writing it down. Sending it out. Shroud Jr. was pickin up on it.
Truth be told, I was taken away with Sheppard's music, and can't do his argument justice here! I don't find much of his on the web that helps directly either. But some links follow. Communication, music – an interesting path always. Now, revise as: communication, music, data. – Jack Vaughan

Related
http://philipsheppard.bandcamp.com/album/bobby-fischer-against-the-world
http://philipsheppard.com/philip-sheppard-biography/
http://edw2014.dataversity.net/sessionPop.cfm?confid=79&proposalid=6305
http://philipsheppard.com/

Big White House Data Report 1

The White House has released a new report on big data and privacy. It has not yet released a report on the suspect and widely reported activities of intelligence agencies. In the NYT's estimation, the report does a fine job of laying out some of the benefits and problems associated with extensive data collection and its use in business. Among the benefits is such data's value in medical research. But, the NYT asks in an editorial page article today, can't that same data also be used to discriminate in sales or services? The editorial ("A Long Way to Privacy Safeguards") commends the report for its recommendation that law enforcement agencies seek court approval to access digital content like email in the same way they do for physical letters – this as the Supreme Court is poised to consider warrantless searches of cell phones. The story makes the point that consumers lose control over their information from the moment it is collected, and the point of collection is the point of infection in data privacy.

Sunday, April 27, 2014

Why big data is a big deal at a big school

The big data activities of three Harvard School of Public Health professors are discussed in the recent issue of Harvard Magazine (Mar.-Apr. 2014). 'Why Big Data Is a Big Deal' looks at their work, and it emerges that, basically, there is a pretty obvious connection between what is posited now as 'big data' and a couple of trends long unfolding. Especially, computer analysis and data gathering in the social sciences has grown over many years – grown to the point that its tenets seem evident, natural, and broadly applicable beyond their initial use cases. The Harvard profs exemplify the emerging style. What is new? It is most evident in the case of Gary King, Weatherhead University Professor and head of Harvard's Institute for Quantitative Social Science. He has used data with special imagination, yes. He has also found ways to use social media information and cell phone data as part of the analysis, even in places far afield. Like others, he attaches a bit of mysticism – or 'capacity to drive good' – to the concept of 'data'. Data as a linchpin for a movement is growing. And the Harvard crew is emblematic. The news, as the magazine puts it, is in 'improved statistical & computational methods – not in growth of storage or computational capacity.' – Jack Vaughan

Saturday, April 26, 2014

Data Gumbo for the Week of Noxious Fumes

Predicting legislation, or Follow the money http://blog.fiscalnote.com/2014/04/22/legislating-todays-science-fiction-tomorrow/

Don't touch that dial! Catch Jack Vaughan speaking with Nicole Laskowski about #GartnerBI on Talking Data podcast. bit.ly/1k4ef0y

A majority of financial enterprises (67%) present a "repeatable" level of #bigdata #analytics maturity. -per IDC http://www.idc.com/getdoc.jsp?containerId=prUS24808014

'Improved statistical & computational methods-not in growth of storage or computational capacity' http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal

Read Wayne E's big data/data warehouse clash http://bit.ly/Rwc5NY  Insightful: Reminds of other techno shifts w incumbent painted all bad!

Big data: big mistake? - http://on.ft.com/P0PVBF via @FT 'Big data' has arrived – big insights have not.

Cringely: Big Data is the new Artificial Intelligence http://betane.ws/s0Am  via @BetaNews

Does #bigdata improve contextualization of science? http://philsci-archive.pitt.edu/9944/1/pietsch-bigdata_complexity.pdf

The Parable of Google Flu: Traps in Big Data Analysis

Toward a Vision: Official Statistics and Big Data http://magazine.amstat.org/blog/2013/08/01/official-statistics/

Wednesday, April 23, 2014

What is going on with data management technology?

One thing is pretty true: Information of all types is engulfing the corporation. There’s more…

Web apps and distributed cloud computing have grown.
And a slew of new technologies has arrived to help companies cope with the data influx and distributed data processing …
… But sorting through those technologies is tough.
And you can’t start fresh unless you are a startup.
If you are established, you worry about startups taking your business …
But the original 'queryable' relational database approach is still valid.
Newbie software has to adapt too – add old-style capabilities – and vice versa.

Monday, April 21, 2014

Data privacy stories on SearchBusinessAnalytics

Snowden speaks to European Union biggies by satellite, April 2014
Snowden, by link-up, talks with Euros

My SearchBusinessAnalytics.com colleague Ed Burns has been at work on a fine series on emerging data privacy issues. In "Data collection practices spark debate on big data ethics, privacy" and "Laws leave gray area between big data and privacy" he paints a picture of the current landscape (and more is on the way). One thing that emerges, and this is something Ed speaks with me about on an upcoming Talking Data podcast: there is a sort of entropy going on here – it skews toward the status quo, which is, in broad brush: there really is no such thing as privacy, and in too many cases your data is my cash cow. We've heard plenty of talk about the big data gold rush, and data as the new oil – what that translates into is some lip service to the notion of a data self. Burns and I are enthusiasts for the new possibilities of data, but both of us, I think, suspect that data and analytics professionals have to be sure to treat what they do as a profession, and consider the ethics of data mining as they would any other kind of mining. I think his series on privacy is a good step in laying the groundwork for a discussion around this. Does the industry need another Snowden event to wake up to the need for ethical standards? -Jack Vaughan

Thursday, April 17, 2014

The Unknown Known – On Rumsfeld's ridiculously sublime rumination

 


Nate Silver's well-regarded The Signal and the Noise (2012) included a chapter in which the author intrepidly gumshoes it to Donald Rumsfeld's office, mostly to discuss the former secretary of war's long-running enthusiasm for a little-known 1962 book, Pearl Harbor: Warning and Decision, by Roberta Wohlstetter. Actually, the greatest enthusiasm may be that which Rumsfeld held for the book's introduction, penned by economist Thomas Schelling, who wrote: "There is a tendency in our planning to confuse the unfamiliar with the improbable."

Before Pearl Harbor, the U.S. expected sabotage from Japan, but not a six-carrier air attack from the north. This formed what Silver might describe as a 'signal and noise' moment when a massive trove of information was not effectively sifted – and Pearl Harbor was not predicted.  There would seem to be a lesson in analytics there somewhere.

Rumsfeld somewhat famously circulated the book in Washington months before the Sept. 11, 2001 terrorist attacks, and he has a Xeroxed copy of the foreword at hand when he meets Silver. After the fact, the Wohlstetter book's theme seemed applicable, in Rumsfeld's – and, perhaps, Silver's – estimation, to Sept. 11. And it may have formed a backdrop for Rumsfeld's ridiculously sublime rumination on known knowns, known unknowns and unknown unknowns, another variation of which (the unknown known) forms the title of Errol Morris' new film, which is what I came here to tell you about.

I call Rumsfeld's 'unknown unknown' wordsmithing ridiculously sublime because, upon viewing Morris' film, I conclude that Rumsfeld's 'understanding' of the Pearl Harbor lesson was more misunderstanding – more a willful, spiteful and devilish confabulation of analytics. He took a bit of truth and, with some technical exactitude, misapplied it to the case of Iraq and its purported troves of weapons of mass destruction, for his larger purpose (political bias) of, well, say, shaking up the Middle East. He took the idea that the Pearl Harbor debacle was caused by a failure of imagination, and imagined a fabled debacle all his own. Prediction requires some very special care, evoking a rework of a Bob Dylan line: "To live outside of time you must be honest."

"The Unknown Known" is not quite on par with Morris' portrait of Vietnam-era Defense Secretary Robert McNamara as a film and a story – the protagonist elicits less empathy in this viewer – but it is worthwhile in its probing pursuit of logical understanding, in its analysis. Also, like the earlier 'Fog of War', it has some nifty animation. - Jack Vaughan

Sunday, March 30, 2014

Encryption and differential privacy discussed on way out of NSA sinkhole

The U.S. government found itself in a very defensive position vis-à-vis data privacy in the wake of Edward Snowden's NSA disclosures. In January, Pres. Obama promised to appoint a group to look more deeply at U.S. intelligence programs, which acted as if by fiat from 9-11 on. A recent MIT event took a look at encryption and differential privacy technology as part of the review effort.

The latest on that is an Administration proposal to turn over the storage of phone records to phone companies, and to tighten the requirements for subpoenas thereof.  One doesn’t necessarily get a warm feeling on that… but some long time NSA watchers see it as a step forward.

When Obama charged John Podesta, long-time Democratic operative and now White House Counselor, to head the study group, he also said to look at big data commerce and its potential to threaten civil liberties.
The White House enlisted academics, including MIT's Computer Science and Artificial Intelligence Lab Big Data Initiative group, as part of that effort.  In March I covered a related workshop on “Big Data and Privacy: Advancing the State of the Art in Technology and Practice” and, together with colleague Ed Burns, reported this on a SearchDataManagement.com Talking Data podcast.

Both Burns and I felt the MIT conference was a bit high on the technology side (encryption and differential privacy being prominent) and a bit low on the privacy side. The notion that data is like the "new gold" or the "new oil" seems overblown, until you see a room full of policy and commerce people discussing how much data is going to change the world as we know it. Whether they are right or wrong is less important than the palpable sense something akin to gold or oil ''fever'' is in the air.
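For the record, the differential privacy idea discussed at the workshop can be sketched with the classic Laplace mechanism: answer a counting query with just enough calibrated noise that no single individual's presence is revealed, while aggregates stay useful. The query, epsilon and counts below are illustrative, not anything from the MIT event:

```python
import math
import random

def private_count(true_count, epsilon):
    """Release a count with Laplace(0, 1/epsilon) noise added;
    a counting query has sensitivity 1, so scale = 1/epsilon."""
    u = random.random() - 0.5                      # uniform on [-0.5, 0.5)
    noise = -(1 / epsilon) * math.copysign(1, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

random.seed(0)
# Each released answer is perturbed, but many answers average out
# near the true count of 1000.
answers = [private_count(1000, epsilon=0.5) for _ in range(200)]
avg = sum(answers) / len(answers)
print(round(avg))
```

Smaller epsilon means more noise and stronger privacy; the policy fight, roughly, is over who gets to pick epsilon.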

Podesta had planned to attend the event, but was hampered by snow in Washington (although one might guess that, this being the weekend of the Russian Crimean Peninsula incursion, staying close to the White House was wise). He spoke with the assembled by teleconference. Below are some riffs from his published remarks. – Jack Vaughan

"…one purpose of this study is to get a more holistic view of the state of the technology and the benefits and challenges that it brings.  This Administration remains committed to an open, interoperable, secure and reliable internet – the fundamentals that have enabled innovation to flourish, drive markets and improve lives.  

"There is a lot of buzz these days about “Big Data” – a lot of marketing-speak and pitch materials for VC funding. 
"(But) the value that can be generated by the use of big data is not hypothetical.  The availability of large data sets, and the computing power to derive value from them, is creating new business models,
 "With the exponential advance of these capabilities, we must make sure that our modes of protecting privacy – whether technological, regulatory or social – also keep pace.

Related
http://cdn.ttgtmedia.com/rms/editorial/sDM-TalkingDataPodcast-March31-BigDataPrivacyWorkshop.mp3


Saturday, March 22, 2014

Through the scanner darkly, darkly; and the future of information

scanner eye by jvaughan
The digitization of everything is an elixir for some people. It spawns visions. If we could only open up all the data… how about taking the college facebook and putting it online… why not street-level and satellite-level photos of every home in the U.S. of A.? Ok! Build and sell a picture database of all the license plates on all the cars and trucks on the road? Gee, I don't know.

The Department of Fatherland Security recently moved to create a national license-plate recognition database to garner data from commercial and law enforcement tag readers. Then, with NSA skulduggery still a little too current, they canceled it of a sudden. Note that commercial tag reader systems remain out there. DRN, or Digital Recognition Network, provides "data that puts your company in the driver's seat," helping you repo your assets (e.g., cars) and reduce asset charge-offs. Together with Vigilant Solutions of Livermore, Calif., the company is fighting a Utah law that banned the private, commercial use of license plate scanning technology. DRN was the only speaker at a hearing on the topic at the Mass. State House earlier this month. They see it as their First Amendment right to make money taking pictures of stuff.

When you think of all the big data uses of license plates beyond immigration and repossession, well, it's boggling. Probably there are more big data apps to come that we can't even think of – but why not collect the data for that big day in the future? The undercurrent is: if I don't want the NSA or DFS to do it, why would I want some Starbucks-guzzling nerdster to? Re-jiggering of the status quo is what massive levels of data can do. Google has met a few people who don't want pictures of their houses in Google's database and, apparently, will remove them if you ask. I don't think First Amendment rights to take pictures are a foundation for massively scaled reproduction, and I would not want my license plate in some software company's data services offering.
In "Who Owns the Future," Jaron Lanier lays down a framework for a more credible understanding of where we want to go with data and privacy. By asking questions of the future, he takes a sharper picture of the present: "... as technology advances in this century, our present intuition about the nature of information will be remembered as narrow and shortsighted." - Jack Vaughan

Related 
http://www.foxnews.com/politics/2014/02/19/dhs-plan-for-national-license-plate-tracking-system-raises-privacy-concerns/
http://www.googletutor.com/asking-google-to-remove-your-home-from-maps-street-view/
http://betaboston.com/news/2014/03/05/a-vast-hidden-surveillance-network-runs-across-america-powered-by-the-repo-industry/
http://consumerist.com/2014/03/05/the-repo-man-might-be-scanning-your-cars-license-plate-and-location-selling-the-data/
http://www.drndata.com/Content/Docs/DRN%20Vigilant%20Utah%20Press%20Release.pdf
http://www.drndata.com/
http://vigilantsolutions.com/




Saturday, March 15, 2014

I have heard all about Grantland



I'm reading an interesting book called Talk Nerdy to Me. It's by the ultra-hip Grantland (as in Grantland Rice) crew, whose totally cool-cat sportswriting on the web is packaged here under the blistering subtitle "Talk Nerdy to Me: Grantland's Guide to the Advanced Analytics Revolution" (sold out, says the site today), and it is a more interesting take on big data analytics than many other tomes that you may have anted up for.

Let's start with "Belichick's Fourth and Reckless" by contributor Bill Simmons. The story centers, at times maniacally, on Patriots coach Bill Belichick's famously strange call on fourth-and-2 on November 15, 2009, against Peyton Manning and the Indianapolis Colts. It trumps many other coaching failures in Boston sports' fabled history of failures, he writes. It was such a strange call – the Pats were on their own 28, with less than 2 minutes to play and a lead – that people began to look at the statistics, trying to see what was in the coach's – the great coach's, mind you – mind.

Simmons goes over some of the stats and pretty well proves how at times statistics can lie, or at least outsmart the lazy intellectual (you know, the type that works for media!). Belichick's crazy gambit had backing in stats: "Belichick did play the percentages if you took those percentages at face value." But Simmons points out, for example, that the statistics (that going for it on fourth down had an 80.5% chance of succeeding) don't account for the obvious confused funk that had descended on the Pats in that final quarter, and that there is a big difference between fourth-and-2 on a Sunday in September against a listless Falcons outfit and fourth-and-2 in November against Peyton Bloody Manning and the Colts. Stop and grok on this:

"I know it's fun to think stats can settle everything, but they can't and they don’t."

If you are playing the statistics card, which one do you choose? writes Simmons. There are all sorts of statistics to count, but which are the ones to count on? Pulling out all the stops here, I am going to recall a line attributed to Mark Twain, or maybe Vin Scully – plenty argue over who said it:

Statistics are used much like a drunk uses a lamppost: for support, not illumination.

Beware, you would-be masters of the big data universe! I said that. - Jack Vaughan
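Simmons's point about taking percentages at face value can be made concrete with a toy calculation. Here is a minimal sketch: only the 80.5% conversion rate comes from the post above; the stop probabilities are invented assumptions, which is exactly the problem.

```python
# Illustrative expected-win-probability comparison for fourth-and-2
# at your own 28 with a lead and about two minutes left.
# Only p_convert is from the cited stat; everything else is assumed.

p_convert = 0.805          # the oft-quoted fourth-and-2 conversion rate
p_win_if_convert = 1.0     # converting essentially ends the game
p_stop_short_field = 0.55  # ASSUMED: chance of stopping Manning from your 28
p_stop_after_punt = 0.70   # ASSUMED: chance of stopping him after a punt

p_win_go = p_convert * p_win_if_convert + (1 - p_convert) * p_stop_short_field
p_win_punt = p_stop_after_punt

print(f"Go for it: {p_win_go:.3f}")   # ~0.912 with these assumptions
print(f"Punt:      {p_win_punt:.3f}") # 0.700 with these assumptions
# Nudge the assumed stop probabilities and the decision flips, which is
# the whole point: the math "settles" nothing unless you trust every input.
```

The lesson of the sketch is Simmons's lesson: the arithmetic is trivial, and every interesting question is hiding in the assumed inputs.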

Saturday, March 1, 2014

Duck duck goose

Today's clamor around big data will one day subside. Like the love affair in Cole Porter's "Just One of Those Things," it is "too hot not to cool down." It is a sort of process: vendors and media build things up and then break them down again. Take as an example a recent New York Times story entitled "Big (Bad) Data." The item revolves around the case of A&E's Duck Dynasty star Phil Robertson. His antigay comments in a magazine article went viral on Twitter, and A&E execs, as if in the thrall of big data analytics, suspended him from the show. Then the Twitter sentiment rebounded, big data was recalibrated, and Robertson was back in. The Times story suggests the first response was wrong, the second right. But time may prove otherwise. This episode in review is hardly an indictment, although that is how the writer or his editors would have it. The advent of big data does not obviate the need for execs to have full liberal educations, with philosophy, ethology, ethics and economics studies under their belts. The execs of A&E give vent to the old saw: If you don't know where you're going, any road will take you there.

Saturday, February 22, 2014

Going down to Stasiland

In the era of the Stasi there was an overarching NSA-style apparatus. Projections were mainly of 'grays and dour greens.' Citizen informants were networked nodes. They were the browser cookies of the time, but embedded not in a browser but in the physical world. This was when humans were computers, or computers were humans, have it as you will. It may have been an apex of sorts, though the jury is still out. The platform, as you might say, was East Germany, the GDR (1949-1990).

Then and there was created an internet of spies, informing on their neighbors – on each other – with fault tolerance, high availability and cyclic redundancy checking. Zersetzung, or psychological decomposition, was a common application type. By some estimates there was one Stasi officer or informer for every 6.5 citizens. Today the files of the concern are available on the world wide web. The Stasi Records Agency (BStU) is responsible for making the records of the State Security Service of the former GDR accessible to the public. Every individual has the right to request to view his own personal file. There are files on Michael Jackson.

In 1954, Angela Merkel was born in the West, but soon her family moved into Stasiland. Her pater was a Lutheran minister, which made young Angela an outsider in an outside land. Her family's ability to visit the West made her father's connections suspect. Her grade in compulsory Marxist-Leninist education was 'sufficient,' or passing (making her a student much like me, oh!). One day Merkel became chancellor of reunified Germany. When she discovered the U.S. NSA was tapping her cell phone, a slew of 'grays and dour greens' danced in her momentarily fevered and dizzy field of vision.

Sunday, February 16, 2014

Today's Data Drippings

Data as an enthusiasm, or even a hobby, is in the air, as noted in an Economist article (Briefing: Clever Cities: The Multiplexed Metropolis, Sept 7, 2013, p. 21). But does close inspection of the results to date tell us the enthusiasm is warranted? Is this truly like the introduction of electricity to the city? Who benefited from the introduction of electricity, and if data is as powerful a game changer, who will benefit most on this go-round? "The importance of political culture will remain," according to the anonymous Economist writer (Ludwig Siegele).

Saturday, January 25, 2014

Nist data symposium

NIST is looking into big data and the measurement thereof. An upcoming symposium in March will cover these topics:

Understanding the Data Science Technical Landscape
- Primary challenges in and technical approaches to complex workflow components of Big Data systems, including ETL, lifecycle management, analytics, visualization and human-system interaction.
- Major forms of analytics employed in data science.

Improving Analytic System Performance via Measurement Science
- Generation of ground truth for large datasets and performance measurement with limited or no ground truth.
- Methods to measure the performance of data analytic workflows where there are multiple subcomponents, decision points, and human interactions.
- Methods to measure the flow of uncertainty across complex data analytic systems.
- Approaches to formally characterizing end-to-end analytic workflows.

Datasets to Enable Rigorous Data Science Research
- Useful properties for data science reference datasets.
- Leveraging simulated data in data science research.
- Efficient approaches to sharing research data.

http://www.nist.gov/itl/iad/data-science-symposium-2014.cfm
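One of those symposium topics, measuring how uncertainty flows across a multi-stage analytic workflow, can be sketched with a simple Monte Carlo propagation. Everything here (the two-stage pipeline, the noise level, the transform) is invented for illustration; it is not from the NIST program.

```python
import random

# Sketch: propagate input uncertainty through a two-stage analytic
# pipeline by Monte Carlo sampling. Stages and noise levels are invented.

def etl_stage(x, rng):
    # Assume the ETL step introduces small Gaussian measurement noise.
    return x + rng.gauss(0, 0.05)

def analytic_stage(x):
    # A nonlinear transform: uncertainty does not pass through linearly.
    return x * x

def propagate(x0, n_samples=100_000, seed=42):
    rng = random.Random(seed)
    outputs = [analytic_stage(etl_stage(x0, rng)) for _ in range(n_samples)]
    mean = sum(outputs) / n_samples
    var = sum((y - mean) ** 2 for y in outputs) / n_samples
    return mean, var ** 0.5

mean, std = propagate(2.0)
print(f"output mean ~= {mean:.3f}, output std ~= {std:.3f}")
# For y = x^2 with x ~ N(2, 0.05), first-order theory predicts
# std(y) ~= |dy/dx| * 0.05 = 0.2, and the simulation agrees.
```

The point of the exercise, and presumably of the NIST topic, is that once stages are nonlinear or interacting, you cannot just add error bars at the end; you have to measure how uncertainty compounds through the whole workflow.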

Saturday, January 11, 2014

IBM shows its plan to move Watson forward

The IBM Watson supercomputer has garnered a lot of attention in recent years, but it's entering a particularly critical passage now. What happens next could influence the future paths of data analytics generally, and IBM specifically -- for better or for worse. This week, IBM showed its plan to move Watson forward. Virginia Rometty, the company's chairman, president and CEO, said IBM would invest more than $1 billion in a new business group dedicated to commercializing Watson. That figure includes $100 million for venture investments to create an ecosystem of application developers and other business partners. The challenge, though, will be to take highly technical machine learning software from the lab -- and the game show milieu -- to the business mainstream.

 http://searchdatamanagement.techtarget.com/opinion/For-IBM-Watson-no-easy-answers-on-commercial-cognitive-computing