Monday, December 26, 2016

Big Data Psycho

What's going on behind that Facebook quiz? Cambridge Analytica gets a look at personality scores and, thanks to Facebook, gains access to the quiz takers' profiles and real names. The firm sells analytics, data, profiles. It's what they call big data psychographics.

The big data world that I work in has a sort of hangover going on right now – some dizzy blur after a long period of heady growth. Something happened in the way of a Godsmack, on the road to Damascus, in the shape of Brexit and Donald Trump’s surprising rise. For many a wonk, it is no surprise.

Anyone who looks objectively at the data analytics of this or any day knows there is plenty of room for mistakes. As with any hot technology, there's also a lot of space for hyperbole. The journalist's job is to keep an eye on the chance of failure even as he or she reports the assertions of people making waves with that hot technology.

It is therefore a good time for us to consider the recent article penned by Sue Halpern for the New York Review of Books, which starts with a vignette describing the number of data points - 98 - that Facebook collects on each of a gazillion members. There is some hilarity, as the writer uncovers the false persona Facebook might construct about her – or you, or me.

Halpern learns by digging into Facebook that the uber site mistakenly views her as a guy, probably a gay guy, because she tends to evince gay-guy characteristics. That is what the algorithm hath writ, because Halpern reads The New York Times (and the New York Review of Books).

She writes that the big data proponents want us to believe that data analysis will deliver to us a truth that is free of messiness or idiosyncrasy. Truth is full of such messiness, but humans are prepared to gloss over it.

Data science today tends toward the reductive – it puts people in compartments. Studies prove this! And underlying the whole big data wave is advertising, which has always had an aspect of whimsy and subterfuge. In the days of old, we sent our children to school to learn this, to protect them. Too often now the kids are sent to the better schools to figure out how to exploit the subterfuge. According to Halpern, we need to recognize that the fallibility of human beings is written into the algorithms they write. - Jack Vaughan

They Have, Right Now, Another You - NYRB


Thursday, December 1, 2016

Hedger with time on hands bets he can improve boffin computing

Retired billionaire hedge fund manager James H. Simons will fund a research institute to apply advanced computing techniques to scientific problems.

A New York Times story by Kenneth Chang says Simons feels he has identified a weakness in academia, where science students so often turn to computer programming only because it is necessary to their research.

As they move up or out of their profession, their software tool creations go too. No V2s.

The software that derives from the “Flatiron Institute’s” efforts will be made available for all scientists, it is said. Up first: Computational biology. Big data analytics seems to be a special focus. 

I am not sure about the premise. So many great programmers started as students in the sciences! So much in high performance computing was driven by academic scientists too.

Many of the recent advances in big data have happened beyond the ken of science and academia, it’s true. But Spark? Machine learning? Well, much of that work came out of the academy. 

From a press release:

The FI is the first multidisciplinary institute focused entirely on computation. It is also the first center of its kind to be wholly supported by private philanthropy, providing a permanent home for up to 250 scientists and collaborating expert programmers all working together to create, deploy and support new state-of-the-art computational methods. Few existing institutions support the combination of scientists and programmers, instead leaving programming to relatively impermanent graduate students and postdoctoral fellows, and none have done so at the scale of the Flatiron Institute or with such a broad scope, at a single location...The institute will hold conferences and meetings and serve as a focal point for computational science around the world.


Would it be good to have a new effort that served as a new hub for advances in scientific computation? Yes. This will be an interesting development to watch. – Jack Vaughan

Tuesday, October 25, 2016

Crunch time, Capt.

Monday, September 12, 2016

Sunday, August 14, 2016

Moonshot calculations

Catching up with some reading (They promised us jet packs, New York Times Sunday, July 24, 2016). It discusses Google's (Alphabet's) shifting strategy regarding its moonshot research endeavors. Scattered about in accompanying pictures are erector sets, oscilloscopes, physical things.

Where fail fast once was the mantra it now is fail faster yet.

Head Xman Astro Teller says:  “If you actually want to make the world better – then do what actually makes the world better – and the technology will take care of itself.”

Mr. Teller speaks at TED: https://m.youtube.com/watch?v=2t13Rq4oc7A

The key to technology assessment is to segment according to technology employed, vendor and end-use application. Most important may be end-use application. But it is not wholly logical. For Google, the end-use application of the Killer Kilwauskee variety is still advertising, which is so very based on psychology, or voodoo economics.

Sunday, July 24, 2016

Notes for a future article on Mahout

Mahout is changing. It's changed over the years from a recommender to a core engine for the math part. What users do is put a surface on it, and tweak the algorithms. Contrast that to using a product like Datameer. It may be relatively a black box in terms of how it does what it does, but there is some assurance that it is a tested path and you should have less of an adventure implementing it. Just as you may enlist vendor field engineers to do the implementation, you may get to a place in your Mahout build and opt to bring in a consultant.
Let's look at build vs. buy basics: http://www.slideshare.net/chrishalton/build-vs-buy-strategy

Algorithms, audits and CDOs

Monday, July 4, 2016

Simplexity kilt the cat


Simplexity by Jeffrey Kluger (subtitled Why Simple Things Become Complex and How Complex Things Can Be Made Simple). The book by the then (2008) Time reporter describes a "newly emerging science" (maybe si, maybe no) meant to provide a cross-disciplinary view on systems ranging from ant colonies to the stock market. The Santa Fe Institute (SFI) is the sort of anchor source (led by Prof. Gell-Mann). Along the way (p. 31) he speaks with Brandeis economist Blake LeBaron, who has been studying simulations of stock market trading. He has seen repeating patterns: 1 - traders wobble about; 2 - one finds a useful stratagem; 3 - the others mimic it; 4 - diminishing returns set in. The repeating patterns, I think, may be relevant to the general flow of the tech bubble that many of us live within. Whether it is ASPs or Deep Learning or Hadoop - a scatter occurs in the discovery stage, then a coalescence around a mean ensues, until a new scatter occurs in a new discovery stage. Think Whac-A-Mole. (A toy simulation in code follows the link below.)

https://www.amazon.com/Simplexity-Simple-Things-Become-Complex/dp/B002YNS18E
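A minimal sketch of the four-step LeBaron pattern described above. The payoff rule and all the numbers are invented for illustration; this is not drawn from LeBaron's models or Kluger's book.

```python
import random

# Toy model of the four-step pattern: traders wobble, one discovers a stratagem,
# the others mimic it, and the edge diminishes as the crowd piles in.
# All numbers and the payoff rule are invented for illustration.

random.seed(1)
NUM_TRADERS = 50

def payoff(strategy, adopters):
    if strategy == "wobble":
        return random.gauss(0.0, 1.0)                  # pure noise, no edge
    return 2.0 / adopters + random.gauss(0.0, 0.2)     # edge diluted by adoption

strategies = ["wobble"] * NUM_TRADERS
for rnd in range(1, 16):
    if rnd == 3:                                       # step 2: one trader finds the stratagem
        strategies[0] = "stratagem"
    adopters = strategies.count("stratagem")
    results = [payoff(s, max(adopters, 1)) for s in strategies]
    edge = (sum(r for s, r in zip(strategies, results) if s == "stratagem") / adopters
            if adopters else 0.0)
    # step 3: wobblers mimic, with probability tied to the stratagem's visible edge
    adopt_prob = max(0.0, min(edge, 0.5))
    strategies = ["stratagem" if s == "stratagem" or random.random() < adopt_prob else "wobble"
                  for s in strategies]
    print(f"round {rnd:2d}: adopters={adopters:2d}, stratagem edge={edge:5.2f}")
```

Run it and you can watch step 4 arrive on schedule: the edge is fat while one trader holds the stratagem, then shrinks toward noise as the other 49 pile in.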


Thursday, June 30, 2016

The nature of field data gathering has changed

The nature of field data gathering has changed, as mobile devices and notepad computers find wider circulation. Surveys that once went through arduous roll-up processes are now gathered and digitized quickly. Now, a new stage of innovation is underway, as back-end systems enable users to employ field data for near-real-time decision making. An example in the geographic information system (GIS) space is ESRI's Survey123 for ArcGIS, which was formally introduced at ESRI's annual user conference, held this week in San Diego.


See also Be there when the GIS plays Hadoop

Tuesday, May 10, 2016

Less Moore's Law in Store

Quantum computers wait in the wings as Moore's Law slows to a crawl. Source: IBM
Fair to say our sister blog turned into "The Saturday Evening Review of John Markoff" a long time ago. Well, at Amazing Techno Futures, the news feeds are good - and we could do worse than to track John Markoff, who has been covering high tech at the NYTimes for lo these many years. And I will not turn into a pumpkin if I hijack my own hijack of John.

For your consideration: His May 5 article on Moore's Law. He rightly points out that this at inception was more an observation than a law, but Intel co-founder Gordon Moore's 1965 eureka - that the number of components that could be etched onto the surface of a silicon wafer was doubling at regular intervals - stood the test of what today passes for time.
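As a back-of-the-envelope illustration of what "doubling at regular intervals" amounts to: the starting count below roughly matches the Intel 4004, but the two-year cadence and the projection are just arithmetic, not Markoff's or Moore's actual figures.

```python
# Toy illustration of Moore's Law as regular doubling.
# Base count ~2,300 transistors (Intel 4004, 1971); the fixed two-year
# doubling period is a simplification for illustration only.

def transistor_count(year, base_year=1971, base_count=2300, doubling_years=2):
    """Project a component count assuming one doubling every `doubling_years`."""
    return base_count * 2 ** ((year - base_year) / doubling_years)

for year in (1971, 1981, 1991, 2001, 2011):
    print(year, f"{transistor_count(year):,.0f}")
```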

The news hook is a decision by the Semiconductor Industry Assn. to discontinue its Technology Roadmap for Semiconductors, based I take it on the closing of the Moore's Law era. IEEE will take up where this leaves off, with a forecasting roadmap [system] that tracks a wider swath of technology. Markoff suggests that Intel hasn't entirely accepted the end of this line.

Possible parts of that swath, according to Markoff, are quantum computing and graphene. The heat of the chips has been the major culprit blocking Moore's Law's further run. Cost may be the next bugaboo. So far, parallelism has been the answer.

Suffice it to say, for some people at least, Moore's Law has chugged on like a beautiful slow train of time. With the Law in effect people at Apple, Sun, Oracle, etc. could count on things being better tomorrow than they were today in terms of features and functionality. So the new future, being less predictable, is a bit more foreboding.

I had my aha moment on something like this in about 1983, when I was working on my master's thesis on local area networks. This may not completely be a story about Moore's Law, but I think it has a point.

Intel was working at the time to place the better part of the Ethernet protocol onto an Ethernet controller (in total maybe it was a 5-chip set). This would replace at least a couple of PC boards worth of circuitry that were the only way at the time to make an Ethernet node.


I was fortunate enough to get a Mostek product engineer on the phone to talk about the effect the chip would have on the market - in those days it was pretty much required that there were alternative sources for important chips, in this case Mostek. The fella described to me the volume that was anticipated over five or so years, and the pricing of the chip over that time. I transcribed his data points onto graph paper, and, as the volume went up, the price went down. Very magical moment. - Jack Vaughan

Sunday, April 3, 2016

Are dark pools coming to big data?

In February, Barclays and Credit Suisse settled with the SEC, which had uncovered their nefarious high-frequency manipulations in their dark trading pools. What’s with that? To go figure, a good place to start is Flash Boys. But it is not an easy read.

Flash Boys delves into the netherworld of Wall St trading in the 2000s – where the devil is in the latency, and professional ethics is shit out of luck. Writer Michael Lewis paints a picture of an obsessively complex world of finance that attracts the underside of human aspiration. That echoes The Big Short, his earlier piece and a quite successful film in 2015.

But here the technological complexity that serves the finance engine rather gets the better of the story - ultimately Flash Boys pales a bit in comparison to The Big Short as a result. We have a worthy hero at the center of the tale in Brad Katsuyama of Royal Bank of Canada, but the story can be stumbly as it tries to convey his efforts to uncover the culprits in the dark pools of high-frequency trading – that would be the people who have the wherewithal to eavesdrop on the market, spoof your intention, and buy stock in mass quantities at prices slightly lower than what you will pay to buy it from them. Brad could be the Cisco Kid that heads them off at the pass – if the road to get there weren’t so durn rocky.

I’d suggest that many of the wonders of big data today resemble the wonders of the stock market technology that front-runs it. Publish-and-subscribe middleware and fantastically tuned algorithms are common to both phenomena. Network latency can be the boogeyman in both cases. Yes, while nearly no one was looking, online big data made a high-frequency trading market out of advertising. The complexity is such that few can truthfully claim to understand it. And that lack of understanding is an opening for a con, as it was in The Big Short and Flash Boys. When you believe in things that you don't understand, then you suffer. - Jack Vaughan

Sunday, March 20, 2016

On the eve of the White House Water Summit

From On the Waterfront
References to the Manhattan Project (for example, "We need a new Manhattan Project" to address fill-in-the-blank) are overdone. But we need something on that order to deal with water. California knows what it is like to live with this lifeblood threatened – Israel too. It is good to cast attention on it – and that might happen to some extent this week as the White House Water Summit takes place.

One of the issues that must be addressed is data about water. It is not as good as data on oil, or stocks, but it should be. In a New York Times op-ed column, Charles Fishman writes about water and data, and how weak efforts are to categorize, detail and report on water use.

Imagine if NOAA only reported on weather every fifth day. That is analogous to the water reports of the U.S. government, according to Fishman, who says, where water is concerned, we spend five years rolling up a report on a single year. The biggest problem, says Fishman, is water's invisibility, here and globally.

He focused on the fact that the water census is done only every five years - that gives us only a 20% view of the total water experience. He points to Flint, Toledo and the Colorado basin as recent water crises, and notes that adequately monitoring the water doesn't assure results, but that inadequately monitoring it is criminal, what with so much monitoring of Wall Street, Twitter tweets or auto traffic. Any call for more monitoring, of course, is up against today's version of the 1800s Know-Nothing movement.

Fishman tells us that good information does three things: 1- it creates demand for more information; 2- it changes people's behavior; and, 3- it ignites innovation.

But what is next? My little look-see into this area uncovered an overabundance of formats for representing water data. It seems a first step for water data improvements might come with the application of modern big data technology to the problem of multiple formats.
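As a purely hypothetical sketch of that first step - the field names, formats and values below are invented, not drawn from any real water census feed - a minimal normalizer might look like this:

```python
import csv, json, io

# Hypothetical example: fold water-use records arriving in different formats
# (CSV from one agency, JSON from another) into one common schema.
# All field names and values are invented for illustration.

COMMON_FIELDS = ("site_id", "date", "withdrawal_mgd")  # million gallons per day

def from_csv(text):
    for row in csv.DictReader(io.StringIO(text)):
        yield {"site_id": row["station"], "date": row["obs_date"],
               "withdrawal_mgd": float(row["mgd"])}

def from_json(text):
    for rec in json.loads(text):
        yield {"site_id": rec["siteCode"], "date": rec["when"],
               "withdrawal_mgd": float(rec["withdrawalMGD"])}

csv_feed = "station,obs_date,mgd\nTX-001,2016-06-01,4.2\n"
json_feed = '[{"siteCode": "CA-042", "when": "2016-06-01", "withdrawalMGD": 7.9}]'

records = list(from_csv(csv_feed)) + list(from_json(json_feed))
for r in records:
    print({k: r[k] for k in COMMON_FIELDS})
```

The real work, of course, is agreeing on the common schema; the code is the easy part.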


Sunday, March 13, 2016

Four tales of data


Scenario - Four young students on Spring Break go to Honduras in search of the roots of big data. Each comes back with a story. Together they tell of a struggle entwined. Look at data anew through the eyes of four groovy undergrads. Not yet rated.


Oceans of data - Hundreds of meters below the surface of the ocean, Laura Robinson probes the steep slopes of massive undersea mountains. She's on the hunt for thousand-year-old corals that she can test in a nuclear reactor to discover how the ocean changes over time. Big data is her co-pilot

https://www.ted.com/talks/laura_robinson_the_secrets_i_find_on_the_mysterious_ocean_floor

Lord, bless your data - Thomas Bayes, writer F.T. Flam [I am not making this up] says, set out to calculate the probability of God's existence. This was back in the 18th century in jolly old England. The math was difficult and really beyond the ken of calculation of the time - until the profusion of clustered computer power came around the corner in the early 2000s. (A toy run of his rule in code follows the link below.)

https://en.wikipedia.org/wiki/Thomas_Bayes
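Whatever Bayes hoped to compute, the mechanics of his rule are easy to show with made-up numbers - this is a generic worked example, not anyone's theology:

```python
# Bayes' rule on an invented example: update a prior belief with evidence.
# P(H | E) = P(E | H) * P(H) / P(E)

prior = 0.01                 # P(H): prior probability of the hypothesis
p_evidence_given_h = 0.90    # P(E | H): how likely the evidence is if H holds
p_evidence_given_not_h = 0.05

p_evidence = (p_evidence_given_h * prior
              + p_evidence_given_not_h * (1 - prior))
posterior = p_evidence_given_h * prior / p_evidence

print(f"posterior P(H | E) = {posterior:.3f}")   # about 0.154
```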

Autopilot and the roots of cybernation - Elmer Sperry's son, Lawrence Burst Sperry, nicknamed Gyro, was best known for inventing the autopilot, utilizing the concepts developed by his father for the gyroscope. Alas, he was lost over the Channel when only 31.

https://www.pinterest.com/pin/568649890431580951/

I was just trying to help people - The story by Veronique Greenwood tells us she wrote of her experience in a letter to the New England Journal of Medicine, and was subsequently warned by her bosses not to do that kind of query again. Presumably HIPAA privacy concerns are involved – so get out the anonymizer and gun it, right?!

http://itsthedatatalking.blogspot.com/2014/10/calling-dr-data-dr-null-dr-data-for.html

-----

Data does baseball


Brought to you by MBA@Syracuse: Tools of Baseball Analytics

Data, Humans on the Side




Good things - The recent PBS show The Human Side of Data has much to recommend it. As someone who labors in the big data vineyard as a reporter and commentator, I appreciate its succinct high level view on one of the defining memes of now. I had the chance to speak with my colleague Ed Burns on the topic for the Talking Data Podcast, and thought I’d add to the discussion here.

There were some beautiful animated pictures of data crossing the world – be it air traffic on a normal day or one tweet spreading. A theme was that humans had to create narratives around the data (per Jack Dorsey), and to follow the trail from the data point to the actual real-world event (Jer Thorp). What makes a culture collectively change its view of data? - one participant asks. What is the context of the data? – several query.




Cause for pause things - And that takes us to an underlying issue with the show... which is that there is this unspoken progressive notion that we are getting better – that an Ivy League youngster who studied statistics and grew up with the Web for pablum, soda and bread can do better than the dimwits who went before. It could be true. But correlation is not causation. To phrase a coin. - Jack Vaughan

Wednesday, March 9, 2016

On the machine learning curve

There was an article in the New York Times today I thought I might mention. "Taking Baby Steps Toward Software That Reasons Like Humans" (below) by John Markoff is an articulate look at what I call machine learning.

The story considers what is going on these days as a revitalization of artificial intelligence, which bloomed in the 1980s and then faded from headlines, and I agree with that framing. I think the story conveys, in a way, that there are some similarities between then and now.

The story doesn’t use the term 'machine learning' – though it does mention pattern recognition (somewhat synonymous), deep learning and deep neural nets, which are fairly similar. What I think that emphasizes is that today's 'machine learning' is basically a new take on neural networks.
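To make that point concrete, here is a bare-bones sketch of the old idea itself - a two-layer network trained on XOR with plain gradient descent. It is a pedagogical toy, not anything from Markoff's story and not a production deep learning stack.

```python
import numpy as np

# Minimal two-layer neural network trained on XOR with gradient descent.
# Today's "deep learning" stacks many such layers with better optimizers,
# but the core mechanism is this old one.

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    h = sigmoid(X @ W1 + b1)          # hidden layer
    out = sigmoid(h @ W2 + b2)        # output layer
    d_out = out - y                   # gradient for sigmoid + cross-entropy loss
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(3).ravel())           # should approach [0, 1, 1, 0]
```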

And as such, machine learning faces hurdles - because flaws that stymied AI still remain to be addressed. As Markoff writes, "generalized systems that approach human levels of understanding and reasoning have not been developed."

What he doesn’t say is that the people that sell these things today tend to gloss over that, same as their counterparts 'back in the day.' That is not to criticize this particular work, which necessarily has a limited objective.

The story doesn’t use the term 'cognitive computing' either. But it talks about things – Q&A systems, natural speech processing - that combine with 'deep learning' to create cognitive computing.


Taking Baby Steps Toward Software That Reasons Like Humans

By JOHN MARKOFF MARCH 6, 2016 

Richard Socher appeared nervous as he waited for his artificial intelligence program to answer a simple question: “Is the tennis player wearing a cap?” The word “processing” lingered on his laptop’s display for what felt like an eternity. Then the program offered the answer a human might have given instantly: “Yes.” ....

Saturday, February 6, 2016

First Thought - When data goes mystic or, Data Myth

Thinking a bit about the past. And how we got here. Ruminating on the passing of Paul Kantner, and contemplating his dogged clutch on a futuristic, transcendent science fiction vision. And where a lot of my impressions of technology initially emanate from: Shannon the telegraph messenger, Wiener the cyberneticist, the alchemists, neural nets.

Have taken time too to track back and visit Lewis Mumford, whom I probably have only read before once or twice removed. His Myth of the Machine – rich but scatter-plotted – sets a backdrop for the present moment of machine learning and the rebirth of AI – much as Kantner’s Wooden Ships does.

Maybe by riffing on Mumford I can characterize my moody interpretation of technology better. I have put a mystical spin on technology for a very long time. And therefore I go back to another point in time to start over again via Lewis Mumford.

His book is purply impenetrable – as much about science as Benedictine monks fermenting cheese or Mickey Mouse’s sorcerer’s apprentice’s broom.

Mumford can see the days of yore that now escape us. He sees the envy of the birds – the desire to conquer the air – in the myth of Icarus, the flying carpet in the Arabian Nights, or the Peruvian flying figure of Ayar Katsi. [The index to The Myth of the Machine is like the debris of a cruise ship in the Sargasso Sea.]

Mumford notes that literate monks like Bacon and Magnus (the ones on the cusp of alchemy and modern science, when clockwork elements began to show the path of automation), like da Vinci, did visualize elements that are still fodder for the Astonishing Tales tokens of our day - incredible flying machines, instantaneous communication, transmuting of the elements. He notes too how magically influential the dynamo and the talking machine still were as he wrote (late 1960s).

Mumford mentions Thomas More and Utopia, Bacon and The New Atlantis, in depicting the machine itself as an alternative way of reaching heaven. Language for him is a disease whose symptoms we see as dream symbols that become imposing metaphors that rule like myth. You can only filter what you see using the commanding metaphors of your age, he suggests. And the machine is that which bugs Mum.

What is the myth or master metaphor of today? The belly I labor in, bacteria-like, is made of the myth of data. Data bears resemblance to penury as described [p. 274] by the Mummer man [who, by the way, had not-too-kind comments for contemporary Marshall McLuhan]. Ask the people who sued Netflix for using their data in an A/B machine learning contest. What do they think of mystic data? Or the myth of the machine? – J.V.

Sunday, January 17, 2016

R. Crumb's Sweet Shellac - American Black String Bands Of The 20's & 30's



One day, some strange transmissions from the 1930s wandered into the Data Data Data antennae.

L-R. Robert Johnson, Robert Johnson, Robert Johnson and Robert Johnson.

Dixieland Jug Blowers, Banjoreno. Chicago 1926



One day, some strange transmissions from the 1930s wandered into the Data Data Data antennae. Here, the longtime theme of late Ray Smith's Jazz Decades Radio Show.

Macon Ed & Tampa Joe- Warm Wipe Stomp



One day, some strange transmissions from the 1930s wandered into the Data Data Data antennae. I misheard it as Warm White Stomp, but I know better now.

Saturday, January 9, 2016

Bats in the machine


Scene from the immortal serial "The Batmen of Africa."
A lot of the chatter about machine learning is ‘bright shiny thing’ chatter. Like Tim the Tool Man with a SnapOn Tools catalog, the proponents rush ahead of the reality, painting pictures of distributed server farms iterating blissfully. Aside from the low hanging fruit of the recommendation engine, what applications are there?

An October 2015 issue of the IEEE Spectrum (always a favorite publication) takes the time to look at machine learning from an applications point of view. And a practical application at that. No, it's not about raking in big profits. It is about doing science, and trying to improve on governments' response to Ebola, which in its last outbreak took more than 11,000 lives.

Researcher Barbara Han tells about her use of machine learning algorithms to try and predict which reservoir species - bats or others - could come to harbor a disease like Ebola. Let’s not sit around and wait for the next epidemic, she suggests; let’s instead use computer power to try and foresee the path ahead.

Han said she and her Cary Institute of Ecosystem Studies colleagues used machine learning to go through vast mounds of unstructured data about wildlife, trying to identify the traits that might alert us to possible Ebola-type disease sources. Along the way she provides a pretty succinct description of the machine learning process. In her telling, the steps are (a toy code sketch follows the list):

1. Obtain a training data set.
2. Create an initial classification tree.
3. Split the data set into two groups using randomly selected features - in this case, for example, body size, which for the little varmints she is parsing could be 'under or over' 1 kg.
4. Use the algorithm to build a second tree that prioritizes misclassified species, in an attempt to sort them correctly.
5. Repeat, iteratively. Generate thousands of trees. See the classification accuracy improve.
6. When the model performs well on the training data, then you
7. Make predictions, using the rest of the data set.
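Here is a minimal sketch of that loop using scikit-learn's gradient-boosted trees on made-up trait data. The feature names (body mass, litter size, range size) and the labels are invented stand-ins, and scikit-learn is my choice of tool, not necessarily the Cary team's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Invented stand-in for species trait data: each row is a species,
# columns are traits, label is 1 if it is a known disease reservoir.
rng = np.random.default_rng(42)
n_species = 500
traits = np.column_stack([
    rng.lognormal(mean=0.0, sigma=1.0, size=n_species),   # body mass, kg
    rng.integers(1, 12, size=n_species),                  # litter size
    rng.uniform(0, 1, size=n_species),                    # relative range size
])
# Fake ground truth: small-bodied, fast-breeding, wide-ranging species more likely.
is_reservoir = ((traits[:, 0] < 1.0) & (traits[:, 1] > 5) &
                (traits[:, 2] > 0.5)).astype(int)

# Steps 1-6: train on labeled species, building many small trees where each
# new tree leans on the species the previous trees misclassified.
X_train, X_test, y_train, y_test = train_test_split(
    traits, is_reservoir, test_size=0.3, random_state=0)
model = GradientBoostingClassifier(n_estimators=500, max_depth=3)
model.fit(X_train, y_train)

# Step 7: predict on the species held out of training.
print("held-out accuracy:", model.score(X_test, y_test))
print("predicted reservoir probability, first 5 held-out species:",
      model.predict_proba(X_test)[:5, 1].round(2))
```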


The interesting thing is that you're not waiting for the outbreak to react. Your machine does not get bored with the rote tasks. With ecological systems that have many, many variables, this is especially useful. I wonder, of course, if we aren't ultimately driven by catastrophe (batastrophe?). We have a lot of data on those African bats - the researchers have gathered 30,000 individuals from hundreds of thousands of samples. Of course, the Cary group's work of prediction, which guesses at likely newcomer carriers, would at least give a leg up. Han says they can suggest areas 'where to look for trouble' - a model identified 58 rodent species that could become culprits, they say. And a possible trouble spot could be in the Great Plains of the U.S., spanning from Nebraska south to Kansas. - Jack Vaughan

Related
http://spectrum.ieee.org/biomedical/diagnostics/the-algorithm-thats-hunting-ebola

Wednesday, January 6, 2016

Escape from the glass house - thoughts on Gates and VB

Sometimes, this blog will venture into the deep past... In 2006, the news that Bill Gates was about to retire made me think. His move to give away money to good causes and to gradually remove the heavy yoke of incredibly unbelievable wealth had given pause to some of us. Gates was championed by many, and criticized by many too. My long-time colleague, Rich Seeley, drolly summed it up: "This may be like when Ali left boxing; software may never be as fun without Gates to kick around."

At the time, I wrote: I'd been in the hardware trade press for 10 years when my boss assigned me to cover a Microsoft product rollout in Atlanta. Call it a simple twist of fate. It was 1991. Of course I'd heard of Bill Gates, but he was in the software business, and of just about no interest to us. If he'd been doing assembler, of course, that would have been of a whole lot of interest, but he was doing Basic, which "real men" didn't do, back in those hardware circles. But the company was on the rise, and the boss sent me.

The product rolling out, in fact, was Visual Basic, which has just lately turned 15. There was tension in the air at the launch as we waited for the keynote speaker. Then Bill Gates came out and just about everybody stood up and cheered clamorously. In those days hardware trade journalists didn't applaud (politely or otherwise) at the end of an industry executive's speech, much less stand up when he just appeared on stage. So I covered the story of the birth of Visual Basic, and had one eye on the rapt audience as I did so.

Later on I caught on to the fact that Bill Gates had become the richest man in the world and people were fascinated by that mere fact. Of course, there was real excitement about Visual Basic and Microsoft because the software was enabling for people who grew up during the batch processing era, when gatekeepers in smocks stood between you and the problem you wanted to crunch on.

Read the rest of the story - Halcyon days of the VB scripters