Dev / Test / Prod

I’m starting to understand the ETL – ELT revolution.

Test is never an exact copy of Prod, even when it’s structurally the same. Sometimes it has less memory, almost always it has less data, and it’s probably not visible to the same users.

Like a parent who buys extra backup sets of their child’s lovie, you know you can’t fully replicate Prod in lower environments no matter how hard you try.

If all you are building is a basic relational database with some new tables or fields, then it usually works pretty well to develop on your own machine with very limited sample data, push your changes to Git, and have someone test them in the shared instance.

If you are developing a mathematical model (e.g. trying to do time series forecasting or recommend target prices), the volume and complexity of your data are essential to the dev and unit testing process. You may be tempted to develop right in Production.

Instead, a whole host of (ELT) tools now let you use the full Prod dataset and schema in a safe, siloed place to train and test your models without literally copying it all over into Test.
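To make that idea concrete for myself, here is a minimal sketch of the pattern, not how any particular ELT tool does it. It assumes SQLite as a stand-in for the warehouse, and the `sales` and `model_output` tables, the file paths, and the "model" (just an average price) are all hypothetical. The point is only that Prod is opened read-only while everything the model writes lands in a separate sandbox database.

```python
import sqlite3


def build_demo_prod(path):
    """Create a tiny stand-in 'Prod' database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE sales (day INTEGER, price REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(day, 10.0 + day) for day in range(1, 6)],
    )
    conn.commit()
    conn.close()


def open_read_only(path):
    """Open a database so that any write on this connection is rejected."""
    # mode=ro is SQLite's URI flag for a read-only connection.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def train_in_sandbox(prod_path, sandbox_path):
    """Read the full Prod table, write model output only to the sandbox."""
    prod = open_read_only(prod_path)
    sandbox = sqlite3.connect(sandbox_path)
    try:
        rows = prod.execute(
            "SELECT day, price FROM sales ORDER BY day"
        ).fetchall()
        # Toy "model": the mean price. A real forecast would go here.
        mean_price = sum(price for _, price in rows) / len(rows)
        sandbox.execute(
            "CREATE TABLE IF NOT EXISTS model_output (name TEXT, value REAL)"
        )
        sandbox.execute(
            "INSERT INTO model_output VALUES (?, ?)",
            ("mean_price", mean_price),
        )
        sandbox.commit()
        return mean_price
    finally:
        prod.close()
        sandbox.close()
```

Calling `build_demo_prod("prod.db")` and then `train_in_sandbox("prod.db", "sandbox.db")` reads every Prod row but leaves Prod untouched. An accidental `INSERT` on the read-only connection raises `sqlite3.OperationalError` instead of changing anything.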

I’m not sure if all the tools and systems required to make this process “easy” and seamless already exist and we just haven’t adopted them, or if collaborating with other developers and sufficiently testing your code is inherently complicated and possibly intractable.

I do think, though, that the traditional clear delineation between Dev / Test / Prod is rapidly becoming outdated, and that the newer development practices and tools show a lot of promise for meeting ML use cases.

Also, don’t worry, we found “Soft” Duckie in a flower box after a night of disappointment and searching.
