Prep work IS the work

Why do we act like prep work is unique to data science?

Across the street from daycare this morning, a guy was sanding the side of a garage, presumably prior to repainting it, and construction crews were tearing up the street to replace water mains.

My friend Mark likes to joke that they shouldn’t call it “painting”; they should call it “prepping,” because rolling on some paint is so much less work than spackling, sanding, taping, and doing all the other prep work beforehand.

With all the Auto ML packages available, it’s not surprising that data scientists spend relatively little time modeling compared to staging the data for their analysis, talking to stakeholders, and doing all the prep work to make sure they are solving the right problems with the right data.

Auto ML is like a paint sprayer: yes, it has minimized one bottleneck in the process, but it hasn’t reduced all the time and effort that data science takes as an end-to-end process.

Imagine a painter who wouldn’t tape or sand. That seems to me like a data scientist who won’t manage data prep or pipelines.
