Tuesday, December 29, 2015

Looking back at 2015


  • Watson in 2015


    In 2015, APIs for IBM's Watson system were front and center as a means to bring cognitive computing applications to a broader corporate audience. "People don't necessarily want to buy a million-dollar system to run Watson," IDC's David Schubmehl said. "But PaaS is well-suited for cognitive platforms. People can use Bluemix services and start working with one or two APIs, rather than use the whole system." He added that IBM's March 2015 purchase of AlchemyAPI -- a deep-learning AI technology startup -- was also notable, as it brought to Big Blue a popular set of developer APIs that can help Watson in areas beyond machine learning applications. READ IBM Watson APIs hold key to broader cognitive computing use.
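For flavor, here is what "start working with one or two APIs" can look like from Python. A minimal sketch in the AlchemyAPI style: the endpoint, parameters and credential below are illustrative assumptions, not IBM's documented interface.

```python
# A minimal sketch of calling a Watson-style REST API from Python.
# The endpoint, parameters and response shape here are assumptions for
# illustration -- consult the Bluemix service docs for the real contract.
import requests

API_KEY = "your-api-key-here"  # hypothetical credential
ENDPOINT = "https://gateway.example.com/watson/sentiment"  # hypothetical URL

def get_sentiment(text):
    """Send a snippet of text and return the service's JSON response."""
    resp = requests.post(ENDPOINT, data={
        "apikey": API_KEY,
        "text": text,
        "outputMode": "json",
    })
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(get_sentiment("Cognitive computing is winning over the skeptics."))
```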


    Author Medina describes cybernetics experiment in 1970s Chile



Does a 1970s Utopian technology effort offer useful guides for those trying to assess the progress of new technology today? In one case, at least, yes. It is the story of Salvador Allende's attempt to build a working Socialist government in Chile with computer cybernetics.

    The tale is told especially well by the able hands of author and researcher Eden Medina. Medina rolls up the takeaways in a recent article in Jacobin magazine. It is a summary of some important lessons garnered during work on her 2011 book, Cybernetic Revolutionaries. Read more.


    Saturday, December 26, 2015

    TensorFlow makes news in 2015

On the face of it, it would appear that TensorFlow received an inordinate amount of attention in 2015 for just a machine learning engine. But publicity works that way. Google is a $66 billion-per-year company, and buzz automatically goes with that. Still, a series of pieces by Cade Metz in Wired was fairly illuminating.

    You push the little valve down, and the music goes around, and it comes out.


It seems that Google open sourcing TensorFlow, to some extent, gave it some lift. My take would be that Google would like others to write the Java and JavaScript notebooks, and create long lists of libraries, to give it the panache of Apache Spark, which is provably hot, due to its metered Apache klingonage. Is it an attempt to take the wind out of Spark's sails? Some people say that Spark is general-purpose, and thus not as good for machine learning as TensorFlow, which has no other use in life but to do machine learning.

Yet another Wired piece by Metz discusses the upswing in the use of GPUs for machine learning. That is one banner TensorFlow seems to carry forward, and it looks like a growing meme.
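For the uninitiated, here is a minimal sketch of the TensorFlow programming model as it shipped in late 2015: you describe a graph of tensor operations, then run it in a session. The numbers and names are mine, not Google's; the tf.device() line just shows where GPU pinning enters the picture.

```python
# A tiny taste of the 2015-era TensorFlow model: build a graph of tensor
# operations, then execute it in a session.
import tensorflow as tf

with tf.device("/cpu:0"):  # swap in "/gpu:0" on a CUDA-equipped box
    x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
    w = tf.constant([[1.0], [2.0], [3.0]], name="w")
    y = tf.matmul(x, w)  # a stand-in for the heavy math ML needs

with tf.Session() as sess:
    # Feed one 3-element row through the graph; prints [[ 8.]]
    print(sess.run(y, feed_dict={x: [[1.0, 0.5, 2.0]]}))
```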

One wag opines that TensorFlow smells like something that might work on autonomous car data. In the back of my mind I can hear the words of someone who told me the difference between IBM with Watson and Google with TensorFlow is that the former is about software to enable enterprises to make consumer products, while the latter is about making consumer products. [Just last week, Google said it would launch such a vehicle with Ford. Which could play to the idea that it doesn't want to make cars, it wants to get data on drivers.]

As described in Metz's article, deep learning is effectively the same thing as machine learning. Metz points out that the new goop in the secret sauce is the increase in both available processing and available data – suggesting that the algorithms are not dramatically different from those of the past. [Although the author goes on to say the algorithms are evolving, and that gifted individuals are behind that evolution.]

Author Metz and source Lukas Biewald of CrowdFlower note that Google open sourced the machine learning software, but not the data. Others criticize the fact that, while you can run TensorFlow on your own machine (one that could include a GPU board), Google is keeping the distributed version to itself.

    Links
    https://www.youtube.com/watch?v=ENZoY4mLgDE
    http://www.wired.com/2015/11/google-open-sources-its-artificial-intelligence-engine/
    http://www.kdnuggets.com/2015/11/google-tensorflow-deep-learning-disappoints.html
    http://www.wired.com/2015/11/google-open-sourcing-tensorflow-shows-ais-future-is-data-not-code/


    http://www.wired.com/2015/11/googles-open-source-ai-tensorflow-signals-fast-changing-hardware-world/
    https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-5K0GDYQsrw_oRhC_2hQ0lOp5xC0n1GRxtnpViaFNmriieL3FWkHQRf9QJS_5W7ZMPilxA-MxibJO4A9dygaRSqj0x15klMEstjZ7JDbVtg282JnG8IRd4wTTBEfS828paM2UEQi-Vk4/s1600/cifar10_2.gif
    http://www.tensorflow.org/
    http://googleresearch.blogspot.com/2015/11/tensorflow-googles-latest-machine_9.html

    http://googleresearch.blogspot.com/2015/11/computer-respond-to-this-email.html
    https://www.youtube.com/watch?v=46Jzu-xWIBk
    https://www.youtube.com/watch?v=gY9DewL6Dqk
    http://blogs.nvidia.com/blog/2015/03/18/google-gpu/

    Wednesday, December 2, 2015

    Data Decisions

    Let's not paint big data too darkly. It brings hope -- not just of making money, but of improving social institutions, helping to cure disease and more. However, hope can be accompanied by fear. While privacy is the chief public concern in the new world of voluminous information, there are others, as well. The chance that bad decisions might be made based on misreading big data is one of them. - Jack Vaughan

    read more at http://searchdatamanagement.techtarget.com/news/2240185076/Business-decision-making-must-progress-in-the-age-of-big-data

    Wednesday, November 25, 2015

    Talking Data Podcast ponders analytical algorithms to help save whales

A right whale and her calf. Source: NOAA
In a recent Talking Data podcast, guest Kristen Khan of NOAA said she and her colleagues have watched the growth of machine learning projects that identify images. They have wondered: If such analytical algorithms could trim time from what is now a very labor-intensive whale identification task, could that free up staff for more proactive efforts to save the whales? A contest sponsored by MathWorks is trying to find the answer. Listen here.

    Tuesday, November 17, 2015

    Architectures from different views


RDF ... it's all about the triples. You nurse it and rehearse it: the idea of a sentence.

    Take a bunch of different types of data from different sources and put them all together in triples.

    On another level, you layer it on a physical data infrastructure architecture. A sketch of the idea follows below.
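To make the notion concrete, here is a minimal sketch using the Python rdflib library: each fact is a subject-predicate-object "sentence," and facts from different sources pile into one logical graph. The URIs and facts are invented for illustration.

```python
# Triples as sentences, via rdflib. The names and numbers are made up.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()

# One "sentence" per triple, regardless of where the data came from.
g.add((EX.walmart, EX.isA, EX.retailer))             # from one source
g.add((EX.walmart, EX.annualRevenue, Literal(475)))  # from another ($B)
g.add((EX.amazon, EX.competesWith, EX.walmart))      # from a third

# Layered on top: query the logical graph with SPARQL; the physical
# storage underneath could be memory, a file, or a triple store.
q = "SELECT ?s ?o WHERE { ?s <http://example.org/competesWith> ?o }"
for row in g.query(q):
    print(row)
```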





    Monday, November 9, 2015

    Tensor

Let's look inside a Learning Machine - Google TensorFlow
    https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-5K0GDYQsrw_oRhC_2hQ0lOp5xC0n1GRxtnpViaFNmriieL3FWkHQRf9QJS_5W7ZMPilxA-MxibJO4A9dygaRSqj0x15klMEstjZ7JDbVtg282JnG8IRd4wTTBEfS828paM2UEQi-Vk4/s1600/cifar10_2.gif
    http://www.tensorflow.org/
    http://googleresearch.blogspot.com/2015/11/tensorflow-googles-latest-machine_9.html
    http://googleresearch.blogspot.com/2015/11/computer-respond-to-this-email.html
    https://www.youtube.com/watch?v=46Jzu-xWIBk
    https://www.youtube.com/watch?v=gY9DewL6Dqk

    Friday, November 6, 2015

Machine learning for better medical outcomes on TAP

Michael Draugelis is chief data scientist at Penn Medicine, the health system of the University of Pennsylvania. He came to this gig in a roundabout way. You see, his wife went into shock while giving birth to their child, Chubsy Ubsy. Mother and child are doing well, but the experience made Draugelis wonder. That's because of his background at the US Missile Defense Agency, where they did a lot of work on forecasting clues to impending events. He appeared at Strata + Hadoop East to discuss all this.

When he got to Penn Med he set out to focus on sepsis, sadly a leading cause of death for people who go to the hospital to fix something else. His data science efforts revolve around something called Penn Signals. "A nerd wonderland," says he. The point is to use evidence-based computing to see who is really, really at risk. Right now!

    Now, of course, in reality,  he is doing what all the machine learning people are doing:

    We have some early proof of concepts that have created new signals..that we are feeding back into our algorithms.

    What is next?

    I hope to move on from prediction to auto reasoning.

    That is where his work with Intel comes in.

The company, which would like to fight disease just as much as it would like to sell chips that go into servers on Web farms, has concluded that the familiar style of big data development – where eggheaded data scientists create models in Python and the like, which must be thrown over the wall and rejiggered in Java by 'real' developers, then sent back for fixes, then back over the wall again, and so on – was just going to take too much time. Why not more of a cloud development paradigm? Based on open source? Tested by Intel and the likes of Penn? It's called TAP, for Trusted Analytics Platform.

    Says our man Draugelis:

    My data scientists need an environment that they can build quickly, select their analytic tools at scale, and a platform that can support it.  We have been excited to work with Intel to explore this new open source project called TAP. 

As a colleague said, or more precisely asked: "What was the last big software announcement Intel made? And what came of it?" And yes, the answer is the Intel Hadoop Distribution, and what came of it was a ceding of the work to Cloudera. So, to coin a phrase: Time will tell. - Jack Vaughan

    For more 

    Go to github to find out what it would be like to be a cloud developer these days.

    Listen to a podcast where we talk with Intel's Vin Sharma about TAP

You see that embedded video above? This here is the link to it, like, actually.


    Saturday, October 24, 2015

    AmaMart meets Walzon


Here in the Digital Age, business models continue to be buffeted, often in surprising ways. The big lever you push to sell your product – it seems – is data, analytics and cloud (you figure which is rod, pivot and force). It plays out most vividly in the dialectic of Walmart and Amazon.
Walmart faces challenges. It gets harder to grow an operation when its revenues mount to almost $475 billion a year. Some folks would say Amazon.com, with Web services and large-scale cloud clusters, has created an e-commerce killer app aimed straight at the company atop the Fortune 500. It will be interesting to see if Walmart's coupling of Hadoop and data democracy will help it deflect such challenges. From Big data applications to drive Walmart reboot on SearchDataManagement.com


An article by Jim Stewart in the NYT talks about the models shaking as Walmart flattens and Amazon edges upward. Walmart has been working for 15 years to curb the e-commerce incursions of Amazon, but to narrow effect. Prof. Greenwald (see below) says that as an investor he would rather see Walmart team with an existing company doing analytics and cloud than continue to try to build from within. (It actually has made purchases, but not too notable.)


Excerpts from the Stewart story follow:

    When Walmart announced last week that it was significantly increasing its investment in e-commerce, it tacitly acknowledged that it had fallen far behind Amazon in the race for online customers.

    "The shift in retail to the Internet is a huge change, and it's not just affecting Walmart," said Simeon Gutman, a retailing analyst for Morgan Stanley. "Every retail company is trying to manage the transition. It's not well defined or understood and there's no road map. Walmart is just the biggest. It's a behemoth that was built on superstores with volume and distribution efficiencies. That whole model is being unwound."

    Mr. Gutman said: "Walmart pricing is decent, depending on the basket of goods, but they used to be dominant on price. They've lost that. Pricing is very complex and no one is really executing this all that well. On the Internet, price is transparent, and your price advantage competes away the minute you gain it. Consumers are demanding and getting the lowest possible prices."

    Walmart's superefficient distribution system – a function of its enormous volume and geographic reach – was long the secret to Walmart's immense profitability, as Prof. Bruce C. Greenwald of the Columbia Business School noted in his book "Competition Demystified."

    "In theory, they should be able to use their immense volume and distribution network to compete with Amazon," he said. "But they're not a technology company, and I don't know what makes them think they are." As an investor, he said, he would rather see Walmart team up with an existing technology company that already has the analytics and cloud computing capacity, rather than try to build its own. "That doesn't inspire confidence," he said.

    "Distribution is very hard to get right in retail," he said. "Walmart's world is the giant superstore, where you have big pallets of merchandise moving to trucks moving it into the stores. You start talking about delivering individual items to consumers, and they're out of their comfort zone."


To my thinking, the pricing perhaps tells the deeper tale. Walmart sells John Grisham's Rogue Lawyer for $24.61; Amazon sells it for $17.38. Under Armour gear at Walmart is $33.78; at Amazon it is $29.99. Wall Street jumped favorably (6.23%) this week when Amazon made one-third of 1 percent profit for the quarter. It has lost money on (almost) everything it has ever sold – and with that scheme played out, it has turned to monetizing its cloud infrastructure. The cloud, said the Times, is raining money.

The big box was always a shell game where 'the tremendous volume kept things going.' Staples today is an example. A scorched-earth policy toward mom-and-pop stationery stores – as the populace moved out of the towns. Now what? After pressing manufacturing to move to China, there is no middle class left to buy a lifetime supply of you-name-it. - Jack Vaughan


    http://www.nytimes.com/2015/10/23/technology/amazon-q3-earnings.html

    http://searchdatamanagement.techtarget.com/opinion/Big-data-applications-to-drive-Walmart-reboot

A note on the comp: On one level, this and the previous piece (on machine learning) kick around the notion of the triple store, the tuple, the factoid graph – which I hope to continue to learn about. When I read a story in the NYT and the factoids* are popping, it looks like this (below). Next stop, however, is Las Vegas, and IBM Insight 2015.



    *What did Ed Sanders call them, glyph clusters?






    Sunday, October 11, 2015

    Machine learning and science

Machine learning is finding its way into medical and scientific research in a variety of ways. Two such paths are covered in a recent Talking Machines Webcast.

    Quaid Morris of the University of Toronto speaks about using machine learning to find better ways to treat cancers.

    Also, Patrick Atwater discusses machine learning as a means to address issues of the California drought.

    Both may have participated in the NIPS conference.

    Wednesday, September 23, 2015

    AirBnB Where are you?

    Fig. 1 - Professionally bound neighborhoods
Web e-commerce has long been seen as a threat to traditional middlemen, but the threat it now poses to businesses like cabs and hotels seems to raise the odds. Uber takes the phone out of Louie the dispatcher's hands, and AirBnB puts Web automation into the door-to-door process once known as 'can I crash at your pad?' Representing a new style of broker, the upstarts pose yet another threat level to those whose hegemony would be disrupted.

The role of the broker can rub at least two ways; maybe that is why it has been seen as a favorable position. History has seen brokers tend to favor one side over another in a transaction or two. Uber, for now, sets itself up as arbiter of the cost of the ride – tho they might point to the black box if you ask them who decides. Uber could change its modus – it's still early.

How things could change may be seen in an AirBnB algorithm that has received some recent coverage. To hear tell, AirBnB saw how it could benefit if its customers on the housing-provider side of the equation could come to a better estimate of the probable value of a night in their abode. While there are precedents in price advice systems from eBay and elsewhere, AirBnB claims somewhat convincingly that its approach is unique. Still, they would like some help with the estimator, so they have made it open source.

As depicted in an August 2015 IEEE Spectrum story, AirBnB set its secret sauce to percolating when the data science team began calculating. Author Dan Hills writes that it converted the questions people ask when looking for a place to stay into machine learning algorithms. eBay's problem is different. With eBay, location is not really a factor. The timing is now, not three nights in October. And there's not a whole lot of difference between one good copy of Big Brother & the Holding Co.'s first Columbia LP and another.

    AirBnB looked to create a tool that was dynamic,  that considered the unusual characteristics of a listing, and left room for human intuition when necessary.  I will classify that last bit as Surprise 1. Surprise 2 is that they use focus groups (no blind machine learning patriots, they) – and Surprise 3 is that they hired a professional  cartographer to hand draw accurate neighborhood boundaries for important world cities for travellers. [See Fig. 1]
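To be clear about what kind of animal this is, below is a minimal sketch of the general approach as I read it: listing traits become features, known bookings become training data, and a regression model suggests a nightly price. The features and numbers are invented – this is not AirBnB's actual code or model.

```python
# A generic listing-price sketch with scikit-learn; data is invented.
from sklearn.ensemble import RandomForestRegressor

# columns: [neighborhood id, bedrooms, sleeps, weekend?]
listings = [
    [3, 1, 2, 0],
    [3, 2, 4, 1],
    [7, 1, 2, 1],
    [7, 3, 6, 0],
]
nightly_prices = [80.0, 140.0, 95.0, 210.0]  # known outcomes to learn from

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(listings, nightly_prices)

# Suggest a price for a new one-bedroom in neighborhood 7 on a weekend;
# a human host can still override the estimate, per the article.
print(model.predict([[7, 1, 2, 1]]))
```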

I don't know if this is a surprise ... but it is a bit amusing that author Hills' previous gig was with a company called "Crash Padder" (itself bought by AirBnB). Amusing to this writer, as he experienced the crash pad scene first hand, coming late in the cycle, when a sufficient number of people had been burned by thieving guests to put a serious lid on peace, love and why not?

In Boston, and some other towns, Uber has already come to unwanted prominence for its capacity to bring on the wrong help. They and AirBnB will probably keep a team of data wonks busy for some time creating filters that bar the occasional felon flotsam from gumming up their march to ever higher valuations. - Jack Vaughan, Boston

    Related
    http://spectrum.ieee.org/computing/software/the-secret-of-airbnbs-pricing-algorithm


     



    Sunday, September 13, 2015

    Doting on Data on Sunday Sept 13

    NoSQL meets SLAs

    As NoSQL technology continues its march into the enterprise, the story has a different tenor. Now we are talking models, and SLAs.

    As Web-scale NoSQL technology looks to find a deeper footing in the enterprise, there may be as many stumbles as steps forward. That was an underlying theme at the NoSQL Now 2015 conference, where issues with service-level agreements (SLAs), data analytics challenges and a lack of skills were often part of the discussion.
    ...
    Looking forward, what we may see is a branching, where raw, original-style NoSQL serves the needs of pedal-to-the-metal developers, and something else evolves to meet the stricter needs of enterprise shops.


    Yes, some folks, like the bandits in The Treasure of the Sierra Madre, 'don’t need no stinkin' badges,' but, to the extent that Big Data is all about mining that Web trove, we will find people trying to bring SQL to NoSQL just as we do in Hadoop. - Jack Vaughan

    Read the story on SearchDataManagement.com. 

    Thursday, September 10, 2015

    Hand-picked data ferment

The debate of clashing memes - The problem of hand-picked data is not new. Truth battles its way to surface or oblivion every day. People smarter than their predecessors, or not, make decisions. Is something new afoot? No; you find it in a quote attributed to Mark Twain, and others: "People commonly use statistics like a drunk uses a lamp post: for support rather than illumination." George Johnson's Raw Data column suggests this phenomenon is exacerbated by the Internet. Perhaps, perhaps. - Jack Vaughan

    Friday, August 7, 2015

I'm going down to Stasiland behind a cloud

In the era of the Stasi there was an overarching NSA-style apparatus. Projections were mainly of 'grays and dour greens.' Citizen informants were networked nodes. These were the browser cookies of the time, but they were embedded not in a browser but in a physical world. This was when humans were computers, or computers were humans, have it as you will. It may have been an apex of sorts, though "the jury is still out." The platform, as you might say, was East Germany, or the GDR (1949-1990). Read more.

    Thursday, July 23, 2015

    Build, Ignite, Azure

In the Spring, Microsoft CEO Satya Nadella talked about Microsoft's enduring mission to make technology available to the masses. It is an assertion that has some grounding, but it is hard to say whether Microsoft can find that kind of magic again. This podcast is a recap of Ignite 2015 and Build 2015 SQL Server and Azure announcements that look to move the traditional mission forward. Can Microsoft steal a march on Amazon Web Services? Well, the matter is open for discussion. Some background: this former Webmaster used FrontPage to bring an organization kicking and screaming into the Web era, and the Microsoft tool ($99) played a big part. Click to download podcast.

    Wednesday, July 22, 2015

    Dremel drill doodling

Think of SQL tools that let the large numbers of "SQL-able" people ask questions of data. If the developers can build the tools, they can enable other people to do the work, and get on with the job of building more tools. And we are back on the usual track of technology history.

    Big Web search company Google created Dremel as a complement to, not a replacement for, MapReduce, to enable at-scale interactive analysis of crawled web documents, tracking of install data for applications on the Android Market site, crash reporting for Google products, spam analysis and much more. It brought some SQLability to MapReduce.

    For a Wisconsin Badger like me it does not go without saying that the Google project takes its name from the Dremel Moto-Tool from the Dremel Co. of Racine, Wisconsin. That company, beginning in the 1930s, was among the region's pioneers in small electric motors – not data engines, but engines of progress nonetheless. - Jack Vaughan

    Sunday, July 19, 2015

    Holes in Mass. Halo: Sitting on Public Records.

When I was a young reporter it was very challenging to get information out of the State of Massachusetts or the City of Boston. It was like a scene from Citizen Kane. The archives were dark and closed. You had to go through conniptions, do leg work. Later, when chance led me to teach Computer Assisted Reporting at N.U., research uncovered some pretty good availability for different types of records on the Web. It seemed like something of a flowering. But apparently it was a false bloom.

    Massachusetts has gained renown as a liberal bastion – the state spearheaded abolition, voted for McGovern, legalized gay marriage. But for some reason or other it has become a less than liberal fortress when it comes to public records. The statehouse, judiciary and governor's office all claim immunity from records retrieval, as depicted in today's Boston Globe p. 1 story: "Mass. Public Records Often a Closed Book." It has a great lead-in where a lawyer doing research on breathalyzers explains that there are states that share such databases for free, states (Wisconsin) that charge $75, and Massachusetts, where the State Police came up with a $2.7 million tag to share the data. [When pressed, they admitted that they had incorrectly estimated the cost – it should have been $1.2 million.]

    A cast comes through to criticize the situation: Thomas Fiedler of BU's College of Communication; Matthew Segal of the ACLU; Robert Ambrogi, attorney and executive director of the Mass. Newspaper Publishers Assn.; Katie Townsend of the Reporters Committee for Freedom of the Press; and others. An interesting decrier of bills ("costly new unfunded mandates") aimed at fixing the situation is the Massachusetts Municipal Assn. Just as interesting is the cameo article appearance of Sec. of State William "What Me Worry?" Galvin, whose office is charged with helping oversee public records, and who has no more to say than that "a type of bureaucratic fiefdom" has built up over the years. Maybe the lottery has diverted that dept.'s attention, and the responsibility for fleecing the poor should be handed over to the Attorney General, another less than bold piece of furniture. With the three previous House Speakers having been convicted of felonies, everyone has been pretty busy keeping a lid on data. - Jack Vaughan

    Saturday, June 27, 2015

    Machine and learning, trial and error

    His Master's Voice (HMV)
In the winter at Strata West I had a chance to see Oscar Celma of Pandora discuss machine learning from the perspective of the most established streaming music company. Because at root Pandora has a lot of human intelligence about music, its machine learning applications for song suggestion are all the more interesting.

     The thing I picked up from Celma's presentation was that you can only get so far with your basic breed of suggestion engine. In the radio days a big voice intoned 'don’t touch that dial'. Now something else is in order.

You see, if you play the straight and narrow and give them what you know they want, they get bored, and tune out. The element of surprise has been intrinsic to good showmanship since time immemorial. The machines can get better and better, but at a slower and slower rate. People eventually want to come across a crazed Jack Black pushing the 13th Floor Elevators button in High Fidelity.

The machines have trouble contemplating the likelihood that a viewer may be prone to enjoy Napoleon Dynamite, as was précised in the 2008 NYTimes Magazine story about the Netflix algorithm contest that fate (my brother cleaning the upstairs) cast upon my stoop. One way to bake surprise into a recommender is sketched below.
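A minimal sketch of one way a suggestion engine can hedge against boredom: an epsilon-greedy mix that usually plays the predicted favorite but occasionally reaches for a wild card. This is a generic technique, not Pandora's (or Netflix's) actual method, and the playlists are mine.

```python
# Epsilon-greedy song suggestion: mostly exploit known tastes,
# explore a surprise a small fraction of the time.
import random

def next_song(safe_bets, wild_cards, epsilon=0.1):
    """Usually play a safe bet; epsilon of the time, spring a surprise."""
    if random.random() < epsilon:
        return random.choice(wild_cards)   # the 13th Floor Elevators moment
    return random.choice(safe_bets)        # what we know they like

safe = ["Big Brother & the Holding Co.", "Janis Joplin"]
wild = ["13th Floor Elevators", "The Fugs"]
print([next_song(safe, wild) for _ in range(5)])
```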

TO BE CONTINUED

    Sunday, June 21, 2015

    Momentous tweets for a week in June 2015




    Sunday, June 14, 2015

    Advance and quandry: Big Data and veteran's health

The era of big data continues to present big quandaries. A Times story, Database May Help Identify Veterans on the Edge, covers the latest brain teaser.

The story points to new research published in the American Journal of Public Health, in which researchers at the Department of Veterans Affairs and the National Institutes of Health described a database they have created to identify veterans with a high likelihood of suicide – as the Times story (Fri., Jun 12, 2015, p. A17) points out, "in much the same way consumer data is used to predict shopping habits."

The researchers set aside half of a database comprising variables somewhat associated with suicide cases between 2008 and 2011. They ran what I assume to be a machine learning algorithm on that half. They then tried to predict what would happen with the remaining half of the database population, and concluded that predictive modeling can identify high-risk patients not identified on clinical grounds. The general shape of that design is sketched below.
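The paper's exact method is not spelled out here, so treat this as a generic sketch of the split-and-predict design as I understand it: fit a model on half the records, then see how it flags high-risk cases in the held-out half. The data below is a synthetic stand-in, nothing like the VA's.

```python
# Split-and-predict, the generic way: learn on one half, test on the other.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(1000, 5)  # stand-ins for variables like prior attempts
y = (X[:, 0] + X[:, 1] > 1.2).astype(int)  # synthetic "high risk" label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)   # half to learn, half to predict

model = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```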

But predicting suicide is not like predicting the likelihood one might buy a Metallica song, is it? How does the doctor sell the prognosis? "A machine told us you are likely to commit suicide"? Certainly some more delicate alternatives will evolve. A lot of the variables – prior suicide attempts, drug abuse – seem patent. Maybe doctors have just been more likely to guess on the side of life. If the Chinese government hacks the database, and sells the data, will the chance of suicide follow you like an albatross, and fulfill itself?

    Like so much in the big data game, the advance carries a quandary on its shoulder.

    Related
    http://www.nytimes.com/2015/06/12/us/database-may-help-identify-veterans-likely-to-commit-suicide.html
    http://ajph.aphapublications.org/doi/pdf/10.2105/AJPH.2015.302737
    http://ajph.aphapublications.org/doi/abs/10.2105/AJPH.2015.302737

    Saturday, June 13, 2015

    New and notable Week of Jun 8







    Tuesday, May 26, 2015

    Molecular sugar simulations on Gene/Q

    Researchers working with an IBM supercomputer have been able to model the structure and dynamics of cellulose at the molecular level. It is seen as a step toward better understanding of cellulose biosynthesis and how plant cell walls assemble and function. Cellulose represents one of the most abundant organic compounds on earth with an estimated 180 billion tonnes produced by plants each year, according to an IBM statement.

    Using the IBM Blue Gene/Q supercomputer at VLSCI known as Avoca, scientists were able to perform the quadrillions of calculations required to model the motions of cellulose atoms.

The research shows that there are between 18 and 24 chains present within the cellulose structure of an elementary microfibril, far fewer than the 36 chains that had previously been assumed.

    To download the research paper visit: http://www.plantphysiol.org/

    To find out more about the Australian Research Council Centre of Excellence in Plant Cell Walls visit: http://www.plantcellwalls.org.au/


    http://www-03.ibm.com/press/us/en/pressrelease/46965.wss

    Monday, May 25, 2015

Data Journalism Hackathon

Took part in the NE Sci Writers Assn Data Journalism Hackathon at MIT's Media Lab in April. iRobot Inc HQ! We tried to visualize a data story on the California water crisis.

The tool was the iPython notebook. [I got 1+1= to work!] [How cool is this?!] Came up short, but learned a lot about manipulating data along the way. My colleagues were par excellence. The greatest fun I know is to be part of a team that is firing on all cylinders. Gee, it looked nice outside, tho. Playing hooky on part 2! - Jack Vaughan
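For flavor, here is a minimal sketch of the kind of notebook cell we fumbled toward: load a table of California reservoir levels and plot it. The file name and column names are hypothetical; any tidy water dataset would do.

```python
# A hypothetical notebook cell: read, aggregate, plot.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("ca_reservoir_levels.csv")           # hypothetical file
monthly = df.groupby("month")["percent_full"].mean()  # hypothetical columns
monthly.plot(kind="bar", title="CA reservoirs: average percent full")
plt.show()

print(1 + 1)  # and yes, this part worked on the first try
```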





    code-workshop.neswonline.com || CartoDB || geojson.org
    ipython.org/notebook.html || more to come


    Tuesday, May 12, 2015

5 minute history of the disintegration of the application server








    Sunday, May 10, 2015

    Telling winds from the cybernetic past

Related
    http://itsthedatatalking.blogspot.com/2015/01/the-best-laid-plans-of-mice-and-man.html
    http://mitpress.mit.edu/books/cybernetic-revolutionaries
    http://www.newyorker.com/magazine/2014/10/13/planning-machine
    https://www.jacobinmag.com/2015/04/allende-chile-beer-medina-cybersyn/

Does a 1970s Utopian technology effort offer useful guides for those trying to assess the progress of new technology today? In one case, at least, yes. It is the story of Salvador Allende's attempt to build a working Socialist government in Chile with computer cybernetics.

    The tale is told especially well by the able hands of author and researcher Eden Medina. Medina rolls up the takeaways in a recent article in Jacobin magazine. It is a summary of some important lessons garnered during work on her 2011 book, Cybernetic Revolutionaries.

You see, before CIA influencers sponsored Augusto Pinochet and company's junta, Allende's democratically elected government was trying to bring about a new form of socialism that was data driven. In those days, what might pass for the big data enterprise today would be called cybernetics. This school of technology, founded by Norbert Wiener, studied feedback in systems, be they animal or machine. The automatic pilot was perhaps cybernetics' crowning achievement. In the Chile case, technologist Stafford Beer was enlisted to bring the magic of realtime feedback to state planning. It was way ahead of its time, and burdened by lethal sniping.

A chief lesson in all that conflagration is that the state and its priorities shape how a technology is designed and used. In Allende's work to create a better state planning system based on the infant cybernetic architectures, Beer was given a lot of rein to try to involve workers, ahead of engineers and government bureaucrats, in the planning of production. Uber advocates might say that is going on with its upsurge today, though, we'd say, that is arguable.

    "Computer innovation wasn’t born with Silicon Valley startups, and it can thrive by taking on design considerations that fall outside the scope of the market," writes Medina. Yet, the basic lesson is tremendously true: technologies get no more freedom to range than the political system gives them. That lesson may be taught at MIT, but it is largely buried in the footnotes or drowned out by the gush of venture capital, and its dreams.

    Read more on this.

    Thursday, May 7, 2015

Spark stories

    I feel as though I have never seen anything quite like Spark before. It seems more than worthy of substantial media coverage, but it is also cause for pause.

I only came to the big data beat in 2013, so I didn't go through all the run-up of hype on Hadoop. I came in when it was in full swing – it seemed natural, and some enthusiasm was warranted. But, as Hadoop 2 was rolling out, and Spark was striding into view, I said: this town is not big enough for the two of them – Hadoop had taken all the air out of the hyperbolic chamber. Is it or is it not just the new shiny thing in that room?

Now I wonder. Yes, the Hadoop people dutifully over time explained what was wrong with Hadoop (mostly MapReduce). But as with technology marketing trends generally, it begged the answer. Now Spark seems like the answer. Hadoop greased the skids for it. I guess one reason is that MapReduce was limited. But isn't Spark limited too, if you look at it from many miles' remove?

    I boil down Spark's plusses to:

1-It includes more developers, because it offers support for Python and Scala as well as Java, and runs on the Java runtime. (A minimal PySpark sketch follows the list below.)

    2-It runs faster.
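On point 1, here is a minimal PySpark sketch – the classic word count, driven from Python against a local context. Assume Spark 1.x; on a real cluster, parallelize() would give way to a distributed data source.

```python
# Word count in PySpark: same engine, Python at the wheel.
from pyspark import SparkContext

sc = SparkContext("local", "wordcount-sketch")
lines = sc.parallelize(["spark makes news", "hadoop makes room"])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.collect())  # e.g. [('makes', 2), ('spark', 1), ...]
sc.stop()
```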

Now, if you slowly walk away from that car, you could say: its patrons face big obstacles in catching up with the Hadoop commercial train, and it only has a few users at this point. We could go further and say it has more general utility than MapReduce, and seems more apt for analytics. Tony Baer has written that Spark use cases seem to be more about complex analytics on varied data than about big data per se.

That is a delicate distinction, but probably worth noting. So many technologies thrive or are rent asunder based on a few delicate distinctions that tend not to be readily apparent.

I duly note the sense, voiced in a tweet, about the brunt of use cases described at the summit. I think Spark and Hadoop both are about developer-centric solutions for cheap parallelism, with a focus on data processing.

    There was a time when this all was called Data Processing, then there was Information Technology. Now, Data processing is back. - Jack Vaughan



Users view Databricks' Spark
    It is in limited beta. But a lot of people have gotten their hands on it.

    Hadoop and Spark are coming of age together
    The Talking Data podcast features Hadoop and Spark, open source data technologies that gained attention at this year's Strata+Hadoop East event.

Apache Spark: This year's MapReduce killer
    Since the release of Apache Spark, big data vendors have touted it as a faster, more flexible alternative to MapReduce.

    Apache Spark meets the PDP-11
    Apache Spark seems ready to upstage Hadoop. But it's best seen in the light of computing history, where it looks like yet another step on the long road of data.

    Apache Spark goes 1.0, looks to improve on MapReduce ...
The Apache Software Foundation has released Version 1.0 of Apache Spark, an open source data processing framework designed to outperform MapReduce ...

    Spark framework gets big push from big data vendors
    The Spark framework and processing engine is attracting the attention of vendors, who are touting it for use in iterative machine learning and other big data chores.

    Sunday, April 26, 2015

    Rah rah, Data, Go, go, go

    SwarmDroneDessertBoing
Data as an enthusiasm or even a hobby is in the air, as noted in an Economist article (Briefing: Clever Cities: The Multiplexed Metropolis, Sept 7, 2013, p. 21). But does close inspection of the results to date tell us the enthusiasm is warranted? Is this truly like the introduction of electricity to the city?

One young Dutchman developed a mobile app that tapped into open data to predict the best and easiest areas of the city to rob. This was done, thankfully in this instance, to kindle debate. The Smarter City has a darker side, and not just in SciFi.

    Anyway, who benefited most from the introduction of electricity, and if data is as powerful a game changer, who will benefit most on this go-round? "The importance of political culture will remain" the writer opines. And it is true. The political culture likely remains more important than any transient technology advance - in terms of how the pie gets cut up.

    Human behavior is good and bad. If there is a bad side, there is an app for that.

    Saturday, April 11, 2015

    Behind the music – Spark and the PDP-11

    The DataDataData (Itsthedatatalking) blog is meant to focus on data today – not to rehash my history of computing. But sometimes it veers that way, and I will just be holding on to nothing but the wheel. But I digress.

    Apple scruff at The Smithsonian
Spark is the latest new shiny object in data processing. That said, I don't mean to belittle its potential. The folks that fashioned it in the vaunted AMPLab at UC Berkeley are supersmart, and very aware of what the advent of multicore microprocessors meant to computing: that new means to big clusters of parallelism were available, if only the complexity could be abstracted downward in clever libraries and runtimes.

    People selling Spark come in your door selling Hadoop. Which has had plenty of publicity and is borderline ready for primetime. Now once in there, they may mention  you can toss Hadoop, but only if they think you may cotton to that.  After writing about Hadoop for about two years I took some care in approaching Spark.  Finally some words from way back came back. Please, let me digress some more.

    Long ago and far away I sat with my boss discussing the news. The news on that day in 1992 was the ouster of Digital Equipment Corp.'s co-founder Ken Olsen. His departure was an inflection point along a trail that saw DEC go from being a gutsy Maynard, Mass. mill town startup to being a serious threat to IBM's industry leadership to being a forlorn merger candidate.

    Like those in other editorial offices, my boss and I wondered what went wrong. What went wrong was the company got confused about what business it was really in. Seems absurd, but it can happen.
DEC's Olsen did not like the PC or Unix, two very innovative industry trends that his subordinates learned to basically eschew. Missing out on the move to small personal computers was especially ironic, as DEC itself rose in the 1960s on the back of minicomputers that downsized capabilities of the larger, then-dominant mainframe computer. Anyway, on this particular day I was especially interested to see my editor's take. That was because his experience went beyond running a magazine called EDN.

You see, as a graduate student, Jon Titus had been in the vanguard of what came to be known as microcomputers, or PCs. A July 1974 Radio-Electronics issue that featured Titus's 8008-based "Mark-8 Personal Minicomputer" kit predated Popular Electronics' Altair 8800 cover story by six months.

In Cambridge, Mass., Paul Allen picked up a copy of the latter magazine, brought it back to Harvard student Bill Gates's dorm, and a new era of computing was off and running. Note that Titus and the Radio-Electronics editors called the Mark-8 a personal minicomputer. So, Titus had a unique perspective on Ken Olsen's quandary.

    "DEC came to think they were selling minicomputers," Titus said. "But what they were selling was computing."

Anyway – I link below to the full story on this, which ran on SearchDataManagement.com. I'd like to add here what a great boss Jon Titus was for me. He stood by me, more than once, which I never will forget. My spousal unit and I got to Washington last week. We went to the Smithsonian museum (actually, just two days after this story went live) and were told that the computer exhibit was closed for repairs (a lot of people can relate to that, ay?!), so we did not see the Mark-8 on display. Instead there was the computer that has, maybe rightfully so, gained the brunt of the fame.
    A cruel old engineer.

    That is the Apple II of Steve Wozniak and Steve Jobs.  A woman came by and asked the air: "Is that the first computer?" No, said I, trying to be courteous, "the first computers were as big as rooms - that is what many people consider to be the first personal computer." Sorry, that's it for now - I got to go digress. – Jack Vaughan

    Read Apache Spark meets the PDP-11 -- in the end, it's all about the processing – SearchDataManagement.com, Mar. 31, 2015 http://bit.ly/1Im9n1l

    Wednesday, April 8, 2015

    Give me Algorithmic Accountability Or

    Give me Algorithmic Accountability or give me… ah, what is the alternative again?

I thought Steve Lohr's article in yesterday's New York Times was worth pointing out, as it boils up a larger issue from the flotsam and jetsam of the big data analytics parade. Online ads, the killer app (to date) for big data and machine learning, are but a Petri dish, he says. After all, if the wrong ad is served up, the penalty is mild. But, he writes, the stakes are rising. Companies and governments will churn big data to prevent crime, diagnose illness, and more. Why, just the other day, JPMorgan said it could spot a rogue trader before he or she went rogue.

The algorithms that make the decisions may need more human oversight, the writer and others tend to suggest. Civil rights organizations are among those suggesting. Another is Rajeev Date, formerly of the Consumer Financial Protection Bureau. The story focuses on the notion of Algorithmic Accountability (meeting tonight in the church basement, no smoking please) as an antidote to brewing mayhem.

IBM Watson appears in the story. It is hard to get a handle on Watson, but one thing is crystalline: the mountain of documents is growing beyond managers' capacity to understand, and Google is paling under the weight. Watson is meant to do the first cut on finding a gem in, for example, the medical literature – reading 'many thousands of documents per second.' Along the way, a few researchers may lose their jobs, but the remaining managers will need coffee, and servers are wanted.

Haven't heard for a while of Danny Hillis – he was behind the Thinking Machine back in the day. The original cognitive computer? Or was that the old Ratiocinator (but I digress). Hillis says data storytelling is key: to, like old man Chaucer, find narrative in the confused data stream. If the storyteller had a moral compass, that would be an additional positive factor, if you take Louis Beryl's word for it. He is cofounder of Earnest, a company that has staff to keep an eye on the predictor engine output.

Transparency would be good, Lohr concludes, as Gary King, director of Harvard's Institute for Quantitative Social Science, joins the narrative. The learning machines should learn to err on the side of the individual in the data pool – if that happened, you would get that bank loan that might be a little iffy, rather than have a fairly innocuous money request rejected. George Bailey would be the patron saint of the Moralistic Data Story Telling Engineer. One crude way to err on the individual's side is sketched below.
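A crude sketch of what "erring on the side of the individual" could mean in code: keep the model, but loosen the approval cutoff on its predicted probability. A generic illustration with synthetic data – not any bank's practice, and not anything from Lohr's story.

```python
# Err on the side of the individual: lower the approval threshold
# instead of using the default 0.5 probability cutoff.
from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.RandomState(1)
X = rng.rand(500, 3)  # stand-in applicant features
y = (X[:, 0] + 0.2 * rng.rand(500) > 0.55).astype(int)  # 1 = repaid

model = LogisticRegression().fit(X, y)
applicant = X[:1]
p_repay = model.predict_proba(applicant)[0, 1]

LENIENT_THRESHOLD = 0.35   # vs. the usual 0.5 -- George Bailey mode
print("approve" if p_repay >= LENIENT_THRESHOLD else "reject", p_repay)
```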

    I am trying to think of a case where the owners of the machines programmed them that way .. but parted-lipped Jennifer Lawrence is in a Dior ad contiguous with Lohr’s Maintaining a Human Touch As the Algorithms Get to Work (NYT, Apr 7, 2015, p. A3) and my train of thought has left the station.

    Data science should not happen in the dark. We have in fact aborning a classic humanization-computerization dilemma. Academia and associations, mobilize! – Jack Vaughan, Futurist


    [Imagine Betty Crocker working a conveyor belt where algorithms are conveyed. I do.]

    Thursday, February 5, 2015

    Some dirt on dirt data

Sometimes I think back to last Sept., when I got a chance to see data in a different role – that is, as a central player in solving civilization-scale challenges. So much has been done, yet there is so much more to do. Just as one example, there is soil data. More and more data is being gathered on soil moisture, weather and crop conditions, but new storage techniques, analytical methods and search algorithms are required, as a U of Wis. (Go, Bucky Badger!) researcher said at the conference. - Jack Vaughan




    Related
    Data-wrangling-a-key-to-meeting-civilization-scale-challenges
    - SearchDataManagement.com, Sept 2014