Sunday, August 26, 2018

Demolition Derby World of Data


Weapons of Math Destruction by Cathy O’Neil cuts to the chase when it comes to big data and its very dark side, which she saw firsthand working as a quant in the run-up to 2008, and thereafter as a data scientist in e-commerce.

What she saw was that the housing crisis, the collapse of major financial institutions ... all had been aided and abetted by mathematicians wielding magic formulas.

That was 2008. But there was no let-up thereafter.
   
New mathematical techniques were used to churn through petabytes of information, much of it created on social media or e-commerce websites. Mathematicians studied desires, movements and spending power; they were predicting trustworthiness and calculating potential.

But, as O’Neil, author of the Mathbabe blog, ably documents: the models encoded human prejudice.

She enumerates the differences between a study of a small classroom and the big data worked on at Google. This is something I see regularly, as I cover big data as it pertains to business enterprises. People see what Google does and mistakenly extrapolate the company’s proven success to their own potential outcome. They feel good, because they think they are doing something akin to what the great disruptor of advertising did.

Systems like that can be improved via feedback. But systems like the one she discusses in the Washington school system, which she says is similar to the other weapons of math destruction she considers in her book, generally lag in terms of feedback. They also build their syllogisms on false premises that are treated as beyond question.

The author writes:
You cannot appeal to a WMD. That is part of their fearsome power. They do not listen. Nor do they bend. They are deaf not only to charm, threats and cajoling but also to logic.
...they define their own reality and use it to justify their results. This type of model is self-perpetuating, highly destructive, and very common. [p.10]
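To make that "self-perpetuating" idea concrete, here is a toy sketch in Python (my own illustration, not code from the book; the groups, scores and thresholds are invented): a scoring model with no real feedback, where a group that starts with a lower score gets fewer chances, and the model reads that as confirmation.

    import random

    # Toy sketch of a self-perpetuating scoring loop (illustrative only;
    # the groups, scores and thresholds are invented, not O'Neil's data).
    random.seed(42)

    def hiring_round(score_by_group, threshold=0.5, n=1000):
        """Screen n applicants per group; return how many clear the threshold."""
        hired = {}
        for group, base in score_by_group.items():
            hired[group] = sum(
                1 for _ in range(n) if base + random.gauss(0, 0.1) >= threshold
            )
        return hired

    # Group B starts with a lower score for reasons unrelated to ability.
    scores = {"A": 0.60, "B": 0.45}
    for round_no in range(3):
        hired = hiring_round(scores)
        print(f"round {round_no}: hired {hired}")
        # No feedback: rejected applicants' outcomes are never observed,
        # and being screened out (joblessness, dinged credit) lowers the
        # very inputs the model will score next time around.
        scores["B"] -= 0.05 * (hired["A"] - hired["B"]) / 1000

Nothing corrects the loop, because the people the model rejects never generate the data that could contradict it.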

A great example of unfairness is the use of credit scores to decide who gets a job. That has a way of enforcing failures in a manner that would give Horatio Alger a troubled sleep. For me it rather recalls the great moloch of Search Engine Optimization, a dark cottage industry that sells “Google know-how” but which remains an amazing, indisputable black box of Oz.

But the story really begins with the economic crisis of 2008, and the creation of math models that packaged assorted mortgages (as buckets of risk called securities) in ways that proved lethal, complex and resistant to unraveling. An underlying assumption was as familiar as any disaster that had come before:

The risk models were assuming that the future would be no different than the past. [p.41]
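A minimal sketch of what that assumption looks like in practice (illustrative only; the default rates and pool size below are made up, not from the book): estimate a mortgage default rate from a placid historical window, then apply it to a very different future.

    # Illustrative only: a 'historical' default rate fit on calm years,
    # silently assumed to hold in a stressed year.
    historical_defaults = [0.010, 0.012, 0.011, 0.009, 0.013]  # calm-years sample
    assumed_rate = sum(historical_defaults) / len(historical_defaults)

    pool_value = 1_000_000_000          # face value of a hypothetical mortgage pool
    expected_loss = assumed_rate * pool_value

    actual_rate_2008 = 0.09             # hypothetical stressed-year default rate
    actual_loss = actual_rate_2008 * pool_value

    print(f"modeled loss:  ${expected_loss:,.0f}")
    print(f"realized loss: ${actual_loss:,.0f}")
    # The arithmetic is fine; the premise -- that the future resembles
    # the sample -- is what fails.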
Subsequently, O'Neil becomes a data scientist for Intent, working on algorithms to predict the better prospects among websites' visitors. The leap from math models for futures to math models for websites put her firmly in the realm of big data, which is where Weapons of Math Destruction really begins.
- Vaughan


https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815


Thursday, August 23, 2018

Gaming platforms



With Facebook we see algorithms have replaced editorial boards... a lot of people welcome that... but they may not have entirely thought through the implications. The Facebook and Twitter platforms have been gamed/amplified by clever/nefarious state-backed programmers. A lot of the positive work done to engineer the Internet has, like the snake eating its tail, begun to devour itself. Her work is not "light reading," but Renee DiResta is someone who I find has really thought through this stuff, and is thinking several steps ahead of the bad guys. - Vaughan

Related

Tuesday, August 21, 2018

Facebook fights broadcasts of confusion

Facebook continues to be used as a vehicle for disinformation. This is done by publishing provocative news (not always fake, but certainly presented with nefarious gusto) under false pretenses to inflame divisions in large populations. Facebook said on Tuesday that it had identified several new Iranian and Russian influence campaigns on its platform designed to mislead people in different countries and regions. The able Renee DiResta of the New Knowledge Research Group said "malicious narratives are spreading to mislead people around the world". The news comes on the same day as reports that Microsoft has found Russian government-affiliated websites that masquerade as the sites of prominent American conservative think tanks. The saw of confusion cuts in all directions. Know your links, or you may be sharing falsehoods of suspicious origin and disruptive intent. - Vaughan

Monday, August 20, 2018

How well can neurals generalize across hospitals?



Which features in an image influence a convolutional neural network’s (CNN’s) decision? Finding the answer in radiology takes work, writes researcher John Zech on Medium. The matter gains increased importance as researchers look to ‘go big’ with their data, and to create models based on X-rays obtained from different hospitals.

Before tools are used to crunch big data for actual diagnosis, "we must verify their ability to generalize across a variety of hospital systems," writes Zech.

Among findings:

that pneumonia screening CNNs trained with data from a single hospital system did generalize to other hospitals, though in 2 / 4 cases their performance was significantly worse than their performance on new data from the hospital where they were trained.

He goes further:

CNNs appear to exploit information beyond specific disease-related imaging findings on x-rays to calibrate their disease predictions. They look at parts of the image that shouldn’t matter (outside the heart for cardiomegaly, outside the lungs for pneumonia). Initial data exploration suggests they appear to rely on these more for certain diagnoses (pneumonia) than others (cardiomegaly), likely because the disease-specific imaging findings are harder for them to identify.
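For readers who want to picture the check Zech describes, here is a rough sketch (my own, not his code) of evaluating a pneumonia classifier trained on one hospital system against held-out data from another. It assumes PyTorch, torchvision and scikit-learn; the dataset paths and the trained checkpoint file are placeholders.

    # Rough sketch of a cross-hospital generalization check (paths and
    # checkpoint are placeholders; not the authors' actual pipeline).
    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms
    from sklearn.metrics import roc_auc_score

    def evaluate(model, loader, device):
        """Return AUC of a binary pneumonia classifier on one dataset."""
        model.eval()
        scores, labels = [], []
        with torch.no_grad():
            for images, targets in loader:
                logits = model(images.to(device))
                scores.extend(torch.sigmoid(logits[:, 0]).cpu().tolist())
                labels.extend(targets.tolist())
        return roc_auc_score(labels, scores)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

    # A model trained on hospital A, tested on held-out A and on hospital B.
    model = models.densenet121(num_classes=1).to(device)
    model.load_state_dict(torch.load("pneumonia_hospital_a.pt", map_location=device))

    for name, path in [("hospital A (internal)", "data/hospital_a/test"),
                       ("hospital B (external)", "data/hospital_b/test")]:
        loader = DataLoader(datasets.ImageFolder(path, tfm), batch_size=32)
        print(name, "AUC =", evaluate(model, loader, device))
    # A large internal-vs-external gap suggests the network is leaning on
    # site-specific cues rather than disease findings.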

These findings come against a backdrop: an early target for IBM’s Watson cognitive software has been radiology diagnostics, and recent reports question its efficacy there. Zech and collaborators’ work adds another wrinkle to the issue, and shows the complexity that may test estimates of early success for deep learning in this domain. - Vaughan

Related
https://arxiv.org/abs/1807.00431
https://medium.com/@jrzech/what-are-radiological-deep-learning-models-actually-learning-f97a546c5b98
https://en.wikipedia.org/wiki/Convolutional_neural_network
https://www.clinical-innovation.com/topics/artificial-intelligence/new-report-questions-watsons-cancer-treatment-recommendations

Sunday, August 19, 2018

DeepMind AI eyes ophthalmological test breakthrough

Eyeball to eyeball with DeepMind.

DeepMind, the brainy bunch of British boffins whom Google picked up to carry forward the AI torch, has reported in a scientific journal that it succeeded in employing a common ophthalmological test to screen for many eye disorders.

So reports Bloomberg.

DeepMind’s software used two separate neural networks, a kind of machine learning loosely based on how the human brain works. One neural network labels features in OCT images associated with eye diseases, while the other diagnoses eye conditions based on these features.

Splitting the task means that -- unlike an individual network that makes diagnoses directly from medical imagery -- DeepMind’s AI isn’t a black box whose decision-making rationale is completely opaque to human doctors, [a principal said].
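Bloomberg's description maps roughly onto a two-stage pipeline. Here is a schematic sketch (my guess at the shape, not DeepMind's code; layer sizes, class counts and names are invented) in which one network produces a tissue-label map from an OCT scan and a second network diagnoses from that map rather than from raw pixels.

    # Schematic two-stage pipeline (my interpretation of the description;
    # sizes and class counts are invented, not DeepMind's).
    import torch
    import torch.nn as nn

    class SegmentationNet(nn.Module):
        """Stage 1: label tissue/pathology features at each OCT pixel."""
        def __init__(self, n_tissue_classes=15):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, n_tissue_classes, 1),
            )

        def forward(self, scan):
            return self.net(scan)          # per-pixel tissue-class logits

    class DiagnosisNet(nn.Module):
        """Stage 2: map the segmentation map to condition predictions."""
        def __init__(self, n_tissue_classes=15, n_conditions=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(n_tissue_classes, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, n_conditions),
            )

        def forward(self, seg_map):
            return self.net(seg_map)

    scan = torch.randn(1, 1, 128, 128)     # stand-in for one OCT slice
    seg = SegmentationNet()(scan)
    diagnosis = DiagnosisNet()(seg)
    print(seg.shape, diagnosis.shape)
    # The intermediate 'seg' map is human-inspectable, which is the
    # claimed advantage over a single end-to-end black box.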

The group, which encountered controversy over its use of patient data in the past, said it has cleared important hurdles and hopes to move to clinical tests in 2019.

Related
https://www.bloomberg.com/news/articles/2018-08-13/google-s-deepmind-to-create-product-to-spot-sight-threatening-disease


Monday, June 4, 2018

these deep neural nets just sort of keep getting deeper and bigger


hard to open up those many layered neurals.

to wrap your head around a hundred million weights.

that's harder to understand compared to linear regression.

these deep neural nets just sort of keep getting deeper and bigger.
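a quick back-of-envelope on that "hundred million weights" point (the layer widths below are arbitrary, chosen only to show the scale gap against a linear regression on the same input):

    # Arbitrary layer widths, just to show the scale gap vs. linear regression.
    def dense_params(layer_sizes):
        """Weights + biases for a fully connected net with these layer widths."""
        return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

    deep_net = [4096] + [8192] * 4 + [1000]   # a made-up 'big' network
    linreg = [4096, 1]                        # plain linear regression, same input

    print(f"deep net parameters:  {dense_params(deep_net):,}")   # ~243 million
    print(f"linear regression:    {dense_params(linreg):,}")     # 4,097
    # each regression coefficient can be read directly; the deep net's
    # millions of weights have no such one-to-one interpretation.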

cc: ummings

Sunday, May 20, 2018

ODSC placekeeper

Sorry I missed the Open Data Science Conference & Expo in Boston earlier this month. I could even have taken that there bus on the left. It was one of those things. This year has a scattered plot. I would have liked to accelerate my data science knowledge and training, and do some networking. ODSC East 2018 is one of the largest applied data science conferences in the world. But let's think of this as a bookmark, a placekeeper, a mnemonic jigger to pick up where we left off. Find out more: https://odsc.com/boston

Tuesday, April 3, 2018

Recalling good old Obama days



The NYTimes had an editorial about Facebook data privacy yesterday. In it they recall Obama’s efforts in this regard, which we saw firsthand at an MIT event back in 2014. I got to cover it as part of my job.

I remember thinking at the time that Obama’s Data Privacy Fact Finding committee was likely to be sidetracked (and co-opted by advertising giants Facebook and Google, telecoms like Verizon and Comcast, and their soldiers among the MIT high-tech intelligentsia).

That feeling emerged as the conference events ensued, which revolved around encryption and differential privacy and the other hemming and hawing that characterizes the corridors of technology power.

A colleague and I agreed the theme that emerged most prominently was that data was the "new gold" or the "new oil" -- it seems overblown (why not the "new tulips"?), until you see a room full of policy and commerce people discussing how much data is going to change the world as we know it. Ad nauseam.

Whether they were right or wrong, we more or less settled, was less important than the palpable sense that something akin to gold or oil "fever" was in the air. Which brings us back to Facebook, seen in a new light, given the way its data (your data) ended up in the hands of Cambridge Analytica.

The Times's recent editorial avows there is no reason to start from scratch when it comes to data privacy today, and that Obama's privacy proposals of 2012 and thereafter form a basis for data rights. I am not so sure there was much in the way of real change at work there. I don't want to sound relativistic like the Trump cracker contingent, but there wasn't much difference between the left and right when push came to shove on privacy back in 2014. - Jack Ignatius Vaughan

Related
https://www.nytimes.com/2018/04/01/opinion/facebook-lax-privacy-rules.html
https://itsthedatatalking.blogspot.com/2014/03/encryption-and-differential-privacy.html 



Orwell's Bad Dream Lives

Provided uninterrupted