Magical Thinking: Data Agnostic Algorithms

Today a colleague and I had a good chuckle about why the Data Science team couldn’t develop their machine learning models against a Dev database holding only a limited sample of data in the Prod schema: the amount and complexity of the data you have directly drive the predictive ability of your models.

It was a funny conversation because, two weeks earlier, a project manager at a different organization had mentioned prototyping fuzzy matching with a “Data Agnostic Algorithm,” which amounted to nearly the same faulty suggestion.

In BI reporting, a small sample of data in the same structure your production data will arrive in is often enough to begin visualization development, build out the UI, and make a lot of progress before the data is finalized. Machine learning doesn’t work that way, because the model itself is derived from the data.

Lots of folks are wowed or scared by machine learning because they don’t realize that beneath the surface it’s built on understandable statistical concepts, many of which you probably learned in high school. We imagine that computers have cognition and free will, when what they actually have is pattern recognition and the ability to do arithmetic very quickly and accurately.
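To make the “high school statistics” point concrete, here is a minimal sketch, using made-up toy numbers and plain Python with no ML library, of fitting a straight line by ordinary least squares. The “learning” is just means, sums, and one division. Many real models are far more elaborate, but the basic idea of fitting parameters to data is the same; the function name `fit_line` is purely illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (toy illustration)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of co-deviations of x and y / sum of squared deviations of x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data that happens to follow y = 2x + 1 exactly.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)        # 2.0 1.0
print(slope * 10 + intercept)  # 21.0, the "prediction" for x = 10
```

Nothing here is cognition: the computer is doing the same arithmetic you could do by hand, just faster.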

My mental model of machine learning is that it’s more like the Iron Man suit Tony Stark uses to magnify his human abilities than the Terminator, which is not human at all and famously goes rogue.

If you think of machine learning the way the Maven Analytics graphic below does, as the natural progression of increasingly complex analysis and data, it’s not so mystifying. If you accept that machine learning (the Iron Man suit) is driven by an analytical person (Tony Stark), you can acknowledge both the power and the fallibility of the toolset.

You can remember that it’s not auto-magical, that the data you use to train your models directly drives their accuracy and predictive capabilities, and that YOU are responsible for understanding and anticipating the deficits of your data, methods, and applications.
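Here is a toy sketch of why a Dev sample in the Prod schema isn’t enough to train on. It uses made-up data and a deliberately simple model, a one-feature threshold classifier (a “decision stump”), chosen only for illustration; it is not the Data Science team’s actual method. The hypothetical “Dev” rows have the same structure as the hypothetical “Prod” rows but cover only a narrow slice of their values, so the threshold learned from Dev can’t recover the real pattern.

```python
def accuracy(t, xs, ys):
    """Fraction of rows where the rule 'predict 1 if x >= t' matches the 0/1 label."""
    return sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def fit_stump(xs, ys):
    """Learn the threshold t that maximizes training accuracy (toy decision stump)."""
    # Try every observed value, plus one above the max (which predicts 0 for every row seen).
    candidates = sorted(set(xs)) + [max(xs) + 1]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = accuracy(t, xs, ys)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Hypothetical "Prod" data: the true label is 1 whenever x >= 50.
prod_x = list(range(100))
prod_y = [1 if x >= 50 else 0 for x in prod_x]

# Hypothetical "Dev" sample: same schema, but only the first five rows (all labeled 0).
dev_x, dev_y = prod_x[:5], prod_y[:5]

dev_t = fit_stump(dev_x, dev_y)     # t = 5: perfect on Dev, unrelated to the real boundary
prod_t = fit_stump(prod_x, prod_y)  # t = 50: the real boundary

print(dev_t, accuracy(dev_t, prod_x, prod_y))    # 5 0.55
print(prod_t, accuracy(prod_t, prod_x, prod_y))  # 50 1.0
```

The Dev-trained model isn’t buggy; it learned exactly what its data showed. A sample that’s perfectly fine for wiring up a dashboard can still be far too narrow to train a model on.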
