Tuesday, July 30, 2019
Latent ODEs for Irregularly-Sampled Time Series
Time series with non-uniform intervals occur in many applications, and are difficult to model using standard recurrent neural networks (RNNs). The authors generalize RNNs to have continuous-time hidden dynamics defined by ordinary differential equations (ODEs), a model they call ODE-RNNs. Paper by Rubanova et al.: arxiv.org/abs/1907.03907
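For readers who want the gist in code, here is a rough sketch of the ODE-RNN idea as the paper describes it, not the authors' implementation. The hidden state drifts according to an ODE between observations, then gets an ordinary RNN-style update whenever an observation arrives. Everything named here is an assumption made for illustration: the functions `evolve`, `rnn_update` and `ode_rnn`, the toy dynamics dh/dt = -h, the fixed weights, and the plain Euler integration. In the paper, the dynamics and the update are learned neural networks and a proper ODE solver does the integration.

```python
import math


def evolve(h, dt, steps=100):
    """Euler-integrate the toy dynamics dh/dt = -h over an interval of length dt.

    Assumption: a learned network would stand in for the hard-coded -h.
    """
    step = dt / steps
    for _ in range(steps):
        h = h + step * (-h)
    return h


def rnn_update(h, x, w_h=0.5, w_x=1.0):
    """Toy scalar RNN cell: fold observation x into hidden state h.

    Assumption: the fixed weights w_h and w_x stand in for learned parameters.
    """
    return math.tanh(w_h * h + w_x * x)


def ode_rnn(times, values, h0=0.0):
    """Return the hidden state after each (possibly irregularly spaced) observation."""
    h, t_prev, states = h0, times[0], []
    for t, x in zip(times, values):
        h = evolve(h, t - t_prev)  # continuous-time drift since the last observation
        h = rnn_update(h, x)       # discrete update at the observation itself
        states.append(h)
        t_prev = t
    return states
```

Calling `ode_rnn([0.0, 0.1, 2.5], [1.0, 0.5, -0.3])` returns one hidden state per observation. The gaps between observations (0.1 versus 2.4 time units) change how far the state decays before the next update, which is the part a standard RNN does not model directly.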
Monday, July 22, 2019
Data Brevia July 2019
Soon in theatres near you: "Cambridge Analypse" - the story of a mad professor who cruelly crashed two elections, and the plucky data scientist whistle-blower who disclosed the dirt. Starring Ethan Hawke, Sandra Dee and Paul Ruebens and Nick Nolte (as Steve Bannon). pic.twitter.com/D8SwOiLaMR— Jack Vaughan (@JackIVaughan) July 18, 2019
Lyft open-sourced their autonomous driving dataset from its Level 5 self-driving fleet.
- 55k human-labeled 3D frames
- 7 cameras, 3 lidars
- HD spatial semantic map: lanes, crosswalks, etc
- Drivable surface map
https://t.co/KDvvKRWX2w pic.twitter.com/uvQ8jmw2UG
— Reza Zadeh (@Reza_Zadeh) July 23, 2019
FaunaDB 2.7 with improvements for and access data control. https://t.co/84Tq62bst8— Jack Vaughan (@JackIVaughan) July 24, 2019
Each time my comment about big data failures is repeated, the thing failing changes. This time it's data science. Last time it was AI projects. https://t.co/GqQHisRPeY
— Nick Heudecker (@nheudecker) July 24, 2019
Talend Data Fabric release supports pay-as-you-go option with Pipeline Designer, intelligent data integration with MagicFill machine-learning powered suggestions and reversible Format Preserving Encryption. https://t.co/2Uei2NrF9S— Jack Vaughan (@JackIVaughan) July 18, 2019
Redis adds TimeSeries Module - https://t.co/d1tyfZnHWQ via @redislabs— Jack Vaughan (@JackIVaughan) July 22, 2019
Dagster open-source library targets ETL processes and ML pipelines. https://t.co/7tVDNt9P61— Jack Vaughan (@JackIVaughan) July 22, 2019
Some open source DBs are zigging - but Yugabyte is zagging. It goes for Apache license - which makes sense as it is looking to lure Postgres advocaters. https://t.co/TmkxBLjnHM— Jack Vaughan (@JackIVaughan) July 20, 2019
Friday, July 5, 2019
Saturday, May 25, 2019
Trend No. 9 Renaissance of Silicon
Trend No. 9 Renaissance of Silicon – Navin Chaddha, Mayfield Fund, at Churchill Club Annual Top 10 Tech Trends Dinner May 19 - We hear software is eating the world. It actually did. That’s finished. Now you need to innovate. You are reaching the limitations of what CPUs can do. Every big hyperscaler is burning chips. My advice to people is if they understand anything about physics, if they understand anything about technology, go back to the basics. We have taken the easy route of taking shortcuts, but it’s time to go back to the basics and solve innovative problems.
Tuesday, April 30, 2019
Google improves its cloud database lineup
In the early days of cloud, data was second only to security among the reasons not to migrate. Today, data as a migration barrier may be in ascendance – but cloud vendors have determinedly worked to fix that.
Having a single database for a business is an idea whose time came and went. Of course, you can argue that there never was a time when a single database type would suffice. But, today, fielding a selection of databases seems to be key to most plans for cloud computing.
While Amazon and to a slightly lesser extent Microsoft furnished their clouds with a potpourri of databases, Google stood outside the fray.
It’s not hard to imagine that a tech house like Google, busy inventing away, might fall into a classic syndrome where it dismisses databases that it hadn’t itself invented. Its engineers are rightly proud of homebrewed DBs such as BigQuery, Spanner and Bigtable. But having watched Microsoft and Amazon gain in the cloud, the company seems more resolute now to embrace diverse databases.
The trend was manifest earlier this month at Google Cloud Next 2019. This was the first Google Cloud confab under the leadership of Thomas Kurian, formerly of Oracle.
Kurian appears to be leading Google to a more open view on a new generation of databases that are fit for special purpose. This is seen in deals with DataStax, InfluxDB, MongoDB, Neo4j, Redis Labs and others. It also is seen in deeper support for familiar general purpose engines like PostgreSQL and Microsoft SQL Server, taking the form of Cloud SQL for PostgreSQL and Cloud SQL for Microsoft SQL Server, respectively.
In a call from the Google Cloud Next show floor, Kartick Sekar told us openness to a variety of databases is a key factor in cloud decisions that enterprises are now making. Sekar, who is Google Cloud solutions architect with consultancy and managed services provider Pythian, said built-in security and management features distinguish cloud vendors' latest offerings.
When databases like PostgreSQL, MySQL and SQL Server become managed services on the cloud, he said, users don’t have to change their basic existing database technology.
This is not to say migrations occur without some changes. “There will always be a need for some level of due diligence to see if everything can be moved to the cloud,” Sekar said.
The view here is that plentiful options are becoming par for cloud. Google seems determined that no database will be left behind. Its updated SQL Server support, particularly, bears watching, as that database's ubiquity is beyond dispute. – Jack Vaughan.
Read Google takes a run at enterprise cloud data management - SearchDataManagement.com
Saturday, April 20, 2019
DataOps, where is thy sting?
I had reason to look at the topic of DataOps the other day. It is like DevOps, with jimmies on top. When we talk techy, several things are going on, it occurred to me. That is: DataOps is DevOps, as DevOps is last year's Agile programming. Terms have a limited lifespan (witness the replacement of BPM with RPA). And you may be saying "DataOps" today because "dataflow automation" did not elicit an iota of resonance last year. So I may write a story about "dataflow automation" and not realize I am writing about "DataOps," or vice versa. At left are technologies or use cases related to DataOps. At right are stories I or colleagues wrote on the related topics.
Dataflow automation, Workflow management
|
Jan 15, 2019 - Another planned integration will link CDH to Hortonworks DataFlow, a real-time data streaming and analytics platform that can pull in data from a variety of ...
Sep 7, 2018 - At the same time as it advanced the Kafka-related capabilities in SMM, Hortonworks released Hortonworks DataFlow 3.2, with improved performance for ...
Aug 2, 2018 - ... or on the Hadoop platform. Data is supplied to the ODS using data integration and data ingestion tools, such as Attunity Replicate or Hortonworks DataFlow.
5 days ago - Focusing on data flow, event processing is a big change in both computer and data architecture. It is often enabled by Kafka, the messaging system created and ...
|
Containers and orchestration
|
Mar 29, 2019 - Docker containers in Kubernetes clusters give IT teams a new framework for deploying big data systems, making it easier to spin up additional infrastructure for ...
Sep 13, 2018 - Hortonworks is joining with Red Hat and IBM to work together on a hybrid big data architecture format that will run using containers both in the cloud and on ...
Jan 15, 2019 - Containers and the Kubernetes open source container orchestration system will also play a significant role in Cloudera's development strategy, Reilly said.
|
Performance and application monitoring
|
Apr 4, 2019 - As application performance management vendors introduce new capabilities for users moving big data cloud applications to the cloud, their focus often is on ...
5 days ago - Data integration performance has increased significantly by utilizing memory, ... These tools eliminate the need for a separate application server dedicated to ..
Big data applications to drive Walmart reboot
We may have outlived the era of killer apps in some part defined by Walmart, but Hadoop big data applications may help the giant's quest for more growth.
|
Ingest new data sources more rapidly
|
May 11, 2018 - The GPUs can ingest a lot of data -- they can swallow it and process it whole. People can leverage these GPUs with certain queries. For example, they do ...
5 days ago - Streaming and near-real-time data ingestion should also be a standard feature of integration software, along with time-based and event-based data acquisition; ...
Feb 11, 2019 - Landers said StoryFit has built machine learning models that understand story elements. StoryFit ingests and maps whole texts of books and scripts, and finds ...
|
Saturday, March 30, 2019
What Capt. Kirk's Internet is saying about Big Data
A data scientist is not a cog in the machine. And there is more to the profession than pushing buttons. Science is part art, and asking the right questions is not a talent that comes easily. Kirk Borne's Twitter feed is a continual font of data science and related know-how. No wonder he consistently gets accolades as top blogger/tweeter etc. Here are some recent excerpts.
A #DataScientist is a multi-discipline integrator who uses the scientific method to extract knowledge from data; interprets it by asking the right questions to the right people (SMEs); then explains the new knowledge to the decision-makers in understandable terms.#DataScience pic.twitter.com/VGpcLQRHjG— Kirk Borne (@KirkDBorne) March 29, 2019
My friend George Lawton has been thinking about road traffic and AI and human cognition, even human empathy. Having watched or heard about at least a half dozen instances of road rage this week, I think he is on to something. What would TensorFlow do? WWTFD?
Checking out "How TensorFlow is helping in maintaining Road Safety" on Data Science Central: https://t.co/qSCcX79fL6— Jack Vaughan (@JackIVaughan) March 30, 2019
The cocktail approach has gained maturity in various fields. It's coming to data science.
Checking out "Ensemble Methods in One Picture" on Data Science Central: https://t.co/l2UJ4dZPtg— Jack Vaughan (@JackIVaughan) March 30, 2019
Thursday, March 28, 2019
Julia language
Haven’t been to an MIT open lecture for a while. Recently took in one that concerned Julia, an open source programming language with interesting characteristics.
The session was led by MIT math prof Alan Edelman. He said the key to the language was its support of composable abstractions.
An MIT News report has it that: “Julia allows researchers to write high-level code in an intuitive syntax and produce code with the speed of production programming languages,” according to a statement from the selection committee. “Julia has been widely adopted by the scientific computing community for application areas that include astronomy, economics, deep learning, energy optimization, and medicine. In particular, the Federal Aviation Administration has chosen Julia as the language for the next-generation airborne collision avoidance system.”
The language is built to work easily with other programming languages, so you can sew things together. I take it that Julia owes debts to Jupyter, Python and R, and like them finds use in science. Prof Edelman contrasted Julia's speed with that of Python.
In deep neural networks, as people work through gradients, it's like linear algebra as a scalar neural net problem these days, Edelman said. Julia can do this quickly (it's good at 'backprop'), he indicated. He also saw it as useful in addressing the niggling problem of reproducibility in scientific experiments that use computing.
Here are some bullet points on the language from Wikipedia:
*Multiple dispatch: providing ability to define function behavior across many combinations of argument types
*Dynamic type system: types for documentation, optimization, and dispatch
*Good performance, approaching that of statically-typed languages like C
*A built-in package manager
*Lisp-like macros and other metaprogramming facilities
*Call Python functions: use the PyCall package
*Call C functions directly: no wrappers or special APIs
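To make the multiple-dispatch bullet a bit more concrete, here is a small sketch in Python (not Julia, so treat it as an analogy only). The standard library's `functools.singledispatch` picks an implementation from the type of the first argument alone. The `register` and `collide` helpers below are hypothetical, written just for this post: they look up an implementation by the types of both arguments, which is roughly the flavor of what Julia does natively.

```python
from functools import singledispatch


# Single dispatch: only the type of the FIRST argument selects the implementation.
@singledispatch
def describe(a, b):
    return "generic"


@describe.register(int)
def _describe_int(a, b):
    return "int first"


# Hand-rolled two-argument dispatch (hypothetical helper, not a library API):
# implementations are stored under the exact types of BOTH arguments.
_registry = {}


def register(type_a, type_b):
    def decorator(fn):
        _registry[(type_a, type_b)] = fn
        return fn
    return decorator


def collide(a, b):
    fn = _registry.get((type(a), type(b)))
    if fn is None:
        raise TypeError(f"no method for ({type(a).__name__}, {type(b).__name__})")
    return fn(a, b)


@register(int, str)
def _collide_int_str(a, b):
    return "int meets str"


@register(str, int)
def _collide_str_int(a, b):
    return "str meets int"
```

Here `describe(1, "x")` and `describe(1, 2.5)` land on the same implementation because only the first argument counts, while `collide(1, "x")` and `collide("x", 1)` land on different ones. The toy registry matches exact types only; Julia's dispatch also walks the type hierarchy to pick the most specific applicable method.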
Also from Wikipedia: Julia has attracted some high-profile clients, from investment manager BlackRock, which uses it for time-series analytics, to the British insurer Aviva, which uses it for risk calculations. In 2015, the Federal Reserve Bank of New York used Julia to make models of the US economy, noting that the language made model estimation "about 10 times faster" than its previous MATLAB implementation.
Edelman more or less touts superior values for Julia versus NumPy. Google has worked with it on TPUs and machine learning (see "Automatic Full Compilation of Julia Programs and ML Models to Cloud TPUs").
Its magic, he says, is multiple dispatch. Python does single dispatch on the first argument. That's one of the biggies. (Someone in the audience sees a predecessor in Forth. There is nothing new in computer science, Edelman conjectures. Early people didn't see its applications to use cases like we see here, he suggests.) Also important is pipe stability. What are composable abstractions? I don’t know. – J. Vaughan
Related
http://calendar.mit.edu/event/julia_programming_-_humans_compose_when_software_does#.XJ1julVKiM9
http://news.mit.edu/2018/julia-language-co-creators-win-james-wilkinson-prize-numerical-software-1226
https://en.wikipedia.org/wiki/Julia_(programming_language)
https://www.nature.com/articles/s41562-016-0021