Normcore Tech

May 24, 2019

The commoditization of data science

The Factory at Asnières, Van Gogh, 1887

One of my favorite subreddits is (sorry in advance) r/programmingcirclejerk, because it offers a place to call out some of the more ridiculous, lofty statements people make about various programming languages. Sometimes it can be mean-spirited, but often it's right on the mark. One recent post was a quote from a blog post that said:

I often think Python is too easy. Can you really call it "programming" if you can generate classification predictions with only 6 lines of code? Especially if 3 of those lines are a dependency and your training data, I would argue that someone else did the real programming.

The discussion of the quote centered on the fact that it's ridiculous to rebuild programming APIs from scratch when a full set of them has already been built by a community that specializes in that particular problem.
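And the quote's arithmetic does check out. Here is a minimal sketch of what such a six-line prediction script could look like; the quoted post doesn't name a library or model, so the choice of scikit-learn's KNeighborsClassifier and the toy training data here are my assumptions, not the original author's code:

from sklearn.neighbors import KNeighborsClassifier  # the dependency
X = [[0, 0], [1, 1], [2, 2], [3, 3]]  # toy training data
y = [0, 0, 1, 1]  # labels
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)
print(clf.predict([[0.2, 0.2], [2.6, 2.6]]))  # prints [0 1]

Half of those lines are the import and the training data; the "real programming" lives inside the library, which was exactly the commenters' point.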

This cut to the heart of a trend I’ve been thinking about recently: how the process of data science itself is becoming a commodity.

To be clear, not analysis. Data analysis will never be fully automated, because it involves too much business logic, trial and error, and human involvement. But the data science models and the underlying algorithms, the pieces of code that go something like this:

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset and use only one feature
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
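This is the opening of scikit-learn's canonical linear regression walkthrough, and the standard example continues in the same spirit, roughly like this (a sketch of that walkthrough, not code from this newsletter):

# Fit an ordinary least squares model on the training split
regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)

# Predict on the held-out set and score the predictions
diabetes_y_pred = regr.predict(diabetes_X_test)
print("Mean squared error: %.2f" % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print("r2 score: %.2f" % r2_score(diabetes_y_test, diabetes_y_pred))

Training, prediction, and evaluation each take one library call, which is the commoditization in a nutshell.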
