Saturday, January 9, 2016

Bats in the machine


Scene from the immortal serial "The Batmen of Africa."
A lot of the chatter about machine learning is ‘bright shiny thing’ chatter. Like Tim the Tool Man with a Snap-on Tools catalog, the proponents rush ahead of the reality, painting pictures of distributed server farms iterating blissfully. Aside from the low-hanging fruit of the recommendation engine, what applications are there?

An October 2015 issue of IEEE Spectrum (always a favorite publication) takes the time to look at machine learning from an applications point of view. And a practical application at that. No, it's not about raking in big profits. It is about doing science, and trying to improve on governments' response to Ebola, which in its last outbreak took more than 11,000 lives.

Researcher Barbara Han tells about her use of machine learning algorithms to try to predict which reservoir species, bats or others, could come to harbor a disease like Ebola. Let’s not sit around and wait for the next epidemic, she suggests; let’s instead use computer power to try to foresee the path ahead.

Han said she and her colleagues at the Cary Institute of Ecosystem Studies used machine learning to go through vast mounds of unstructured data about wildlife, trying to identify the traits that might alert us to possible sources of Ebola-type diseases. Along the way she provides a pretty succinct description of the machine learning process. In machine learning the steps are:

1. Obtain a training data set.
2. Create an initial classification tree.
3. Split the data set into two groups using randomly selected features. In this case, for example, body size, which for the little varmints she is parsing could be ‘under or over’ 1 kg.
4. Use the algorithm to build a second tree that prioritizes misclassified species in the attempt to sort them correctly.
5. Repeat. Iteratively. Generate thousands of trees. See the classification accuracy improve.
6. Get the model performing well on the training data.
7. Then make predictions, using the rest of the data set.
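
For the curious, here is a rough sketch of what those steps can look like in code. The recipe reads a lot like boosting, so this uses a bare-bones AdaBoost with one-split trees ("stumps") as a stand-in; it is not the Cary group's actual method or data. Everything in it is an assumption for illustration: the two features (body mass and litter size), the made-up labeling rule, and the function names. One simplification to note: each round picks the best split rather than a randomly selected feature.

```python
import math
import random

def stump_predict(stump, row):
    """A stump is (feature, threshold, sign): vote `sign` above the threshold."""
    feature, threshold, sign = stump
    return sign if row[feature] > threshold else -sign

def fit_stump(X, y, w):
    """Exhaustively pick the single split with the lowest weighted error."""
    best_err, best_stump = float("inf"), None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def boost(X, y, rounds=100):
    """Steps 2-5: build trees one at a time, re-weighting the misses."""
    n = len(X)
    w = [1.0 / n] * n                      # every species starts out equal
    model = []
    for _ in range(rounds):
        err, stump = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # this tree's say in the vote
        model.append((alpha, stump))
        # Misclassified rows (yi * prediction == -1) gain weight for next round.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    """Weighted vote of all the stumps: +1 = flagged, -1 = not flagged."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

def accuracy(model, X, y):
    return sum(predict(model, row) == yi for row, yi in zip(X, y)) / len(X)

# Step 1: a synthetic "species" table, NOT real data.
# Columns: body mass in kg, litter size. The labeling rule is invented.
rng = random.Random(0)
X = [(rng.uniform(0.02, 3.0), rng.randint(1, 8)) for _ in range(150)]
y = [1 if mass < 1.0 and litter > 4 else -1 for mass, litter in X]
X_train, y_train = X[:100], y[:100]
X_rest, y_rest = X[100:], y[100:]

model = boost(X_train, y_train)                  # steps 2-6
train_acc = accuracy(model, X_train, y_train)
rest_acc = accuracy(model, X_rest, y_rest)       # step 7
print(f"training accuracy: {train_acc:.2f}")
print(f"accuracy on the rest: {rest_acc:.2f}")
```

A real project would more likely reach for a library implementation of boosted trees and real trait data; the point here is only to show the re-weighting loop that makes each new tree concentrate on the species the earlier ones got wrong.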


The interesting thing is that you're not waiting for the outbreak to react. Your machine does not get bored with the rote tasks. With ecological systems that have many, many variables, this is especially useful. I wonder, of course, if we aren't ultimately driven by catastrophe (batastrophe?). We have a lot of data on those African bats: the researchers have gathered 30,000 individuals from hundreds of thousands of samples. Still, the Cary group's work of prediction, which guesses at likely newcomer carriers, would at least give a leg up. Han says they can suggest 'where to look for trouble': a model that looked at rodents pointed to 58 species that could become culprits, they say. And a possible trouble spot could be in the Great Plains of the U.S., spanning from Nebraska south to Kansas. - Jack Vaughan

Related
http://spectrum.ieee.org/biomedical/diagnostics/the-algorithm-thats-hunting-ebola
