"Rube Goldberg Machine Learning" comes to web performance analytics

6/30/2016

Note: The title of this article is a play on the term "Rube Goldberg machine". According to Wikipedia, a Rube Goldberg machine is a contraption, invention, device, or apparatus that is deliberately over-engineered to perform a simple task in a complicated fashion, generally including a chain reaction. Keep that in mind.

Everyone (who cares about ML) knows about supervised / unsupervised / semi-supervised learning pipelines. I have now come across an entirely new class of ML pipelines that I shall call "Rube Goldberg Machine Learning" pipelines. Before I go on and explain what I mean, let me provide some context.

Last week, I attended the Velocity conference in Santa Clara, CA. For those of you who are unfamiliar, Velocity is a popular enterprise-oriented (non-academic) conference focused on web performance and DevOps topics. I was very excited to see a machine learning talk in the program: "Using machine learning to determine drivers of bounce and conversion". Apparently it was the first ML talk ever at this venue (for a field that produces an insane amount of data, I don't know why ML doesn't show up more often at Velocity). So, yay! Given what I know about the prior work of Pat Meenan and Tammy Everts, I had high hopes for the talk and the potential findings. Neither of them is an ML practitioner, but both have done stellar work in the web performance community before. Unfortunately, my excitement quickly dissipated within the first few slides (click the link to the slides if you are curious). Several web performance folks told me informally that the conclusions of the talk didn't seem right, because they appeared to go against the conventional wisdom in the web performance field. I have no problem with going against conventional wisdom; sometimes it is nice to correct long-held misconceptions if there's good evidence. My disappointment mainly stems from the misuse of ML models in this talk. Instead of simply venting, let me break down the good/bad/ugly aspects of the talk:

The Good:
  • It's the first ML talk at Velocity. Yay!
  • Tammy Everts and her team had put together a nice dataset of session-by-session web perf metrics for commercial websites, along with business-critical measures like conversion and bounce rates.
  • Pat Meenan correctly emphasized to the audience that the barrier to entry for playing with ML algorithms is low these days, given the immense amount of work that has gone into various open-source ML libraries.
  • The models used in the talk are posted on GitHub, which is a helpful gesture.

The Bad:
  • No data was shared. Requests for anonymized data sharing weren't welcomed. As a machine learning person (who likes open data sharing), I found myself confused by this behavior.
  • The shared code on GitHub is a few lines of Python that basically call some off-the-shelf ML algorithms, so really it is just code that calls some other code (roughly the kind of thing sketched after this list). Most ML people can write this code, so what's the value in open-sourcing it? There's no algorithmic addition here, and there's no real software being open-sourced either.
  • The talk was rated very highly by the non-ML audience, and people were only debating the web performance aspects of the conclusions. Barely anyone spoke up about the glaring modeling problems in the talk. Maybe it is not common to have ML folks in the audience.
  • This was a clear "vendor pitch" from Soasta that was very cleverly masked in the colorful clothes of an ML word cloud. How did this get into Velocity, which supposedly hates vendor pitches?
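
To be clear about what I mean by "code that calls some other code", here is a hedged sketch of roughly what such a script looks like. This is not the code from the talk's GitHub repo; the file name and the "converted" column are made up for illustration. It takes about ten lines with scikit-learn, which is exactly why open-sourcing it adds little value by itself.

```python
# A hypothetical example of a few lines that simply call an off-the-shelf ML library.
# Not the talk's code; "sessions.csv" and the "converted" column are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("sessions.csv")
X, y = df.drop(columns=["converted"]), df["converted"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(model.score(X_te, y_te))      # off-the-shelf accuracy, and that's about it
```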

The Ugly:
  • What happened to good old multivariate regression? Looking at the description of the dataset, I bet $0.02 that straight-up linear regression with multiple variables would have given them 80% accuracy with something to interpret (see the first sketch after this list).
  • Both of the attempted models are complex and not easy to interpret. They are also too complicated for what they are trying to do (hence the title of this blog post). Interpreting DNN variable importance is still an open question anyway (this talk didn't solve it). If you look at the title of the talk and then at the two models they presented, you honestly wonder why they chose black-box models for their experiments.
  • The models didn't take into account correlation within the input variables, or potential group structures that have a strong influence on variable selection and variable importance calculations (a basic correlation check, sketched below, would have been a start). This means that the conclusions they drew, and the variables they think are important - none of it is believable. Forget about what is conventional wisdom in web performance; the model selection methodology itself renders the conclusions of the talk moot.
  • There were no ROC curves or accuracy numbers. Really, there was ZERO information in the talk that a machine learning enthusiast could use to convince themselves that these results can be trusted.
  • The talk title and some of the presentation tried to make causal associations, but the methods used in this talk absolutely cannot provide such insights.
  • It would at least have been interesting if they had tried a Random Forest model on the top-K features found through the Gini index (which I am guessing they used for variable selection). If they wanted to say "here are the top six features," they should have shown how the restricted six-variable model behaves (see the last sketch below). It's a very basic and fixable mistake. I have already communicated these thoughts to the speakers, so hopefully there will be progress on this front.
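
To make the regression point concrete, here is a minimal sketch of the kind of baseline I have in mind. The dataset, feature names, and outcome column are assumptions for illustration (the real data wasn't shared); the point is that every coefficient and p-value in the output is directly interpretable.

```python
# A hedged sketch of a multivariate regression baseline, NOT the talk's actual analysis.
# "sessions.csv", the feature names, and the "bounced" outcome are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sessions.csv")
features = ["time_to_first_byte", "dom_ready", "page_load_time"]  # assumed perf metrics
X = sm.add_constant(df[features])   # add an intercept term
y = df["bounced"]                   # assumed binary outcome (0/1)

# Straight-up linear regression with multiple variables; for a binary outcome you could
# just as well swap in sm.Logit for a logistic variant.
model = sm.OLS(y, X).fit()
print(model.summary())              # coefficients, p-values, confidence intervals, R^2
```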
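
Similarly, before trusting any variable-importance ranking, it takes only a few lines to look at how strongly the inputs move together. Again, the file and column names here are made up:

```python
# A hedged sketch of a correlation sanity check; highly correlated inputs split
# importance scores arbitrarily among themselves, which distorts any "top features" list.
import numpy as np
import pandas as pd

df = pd.read_csv("sessions.csv")                 # hypothetical per-session metrics
X = df.drop(columns=["bounced"])                 # assumed outcome column
corr = X.corr().abs()

# Keep only the upper triangle so each feature pair is listed once, then
# surface the most strongly correlated pairs.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().sort_values(ascending=False)
print(pairs.head(10))
```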
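
And the missing evaluation is just as cheap to produce. Here is a hedged sketch of what I would have wanted to see: report ROC AUC for the full Random Forest, then keep only the top six features by Gini importance and show how the restricted model behaves (names are again hypothetical):

```python
# A hedged sketch of the missing evaluation: full-model AUC vs. a restricted top-6 model.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("sessions.csv")                       # hypothetical dataset
X, y = df.drop(columns=["bounced"]), df["bounced"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("full-model AUC:", roc_auc_score(y_te, full.predict_proba(X_te)[:, 1]))

# scikit-learn's feature_importances_ are Gini-based by default.
top6 = pd.Series(full.feature_importances_, index=X.columns).nlargest(6).index
small = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[top6], y_tr)
print("top-6-feature AUC:", roc_auc_score(y_te, small.predict_proba(X_te[top6])[:, 1]))
```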

I could go on about other nitty-gritty details, but let me stop here and summarize. I saw the first-ever ML talk at Velocity, and it didn't really teach me anything about web performance despite analyzing a million sessions' worth of data. The talk generated a lot of buzz on the basis of disrupting conventional wisdom, while the models behind it are highly questionable. I think what allowed the talk to fly through is the use of over-engineered and overly complicated ML pipelines that aren't interpretable. This is the class of ML that I am going to call "Rube Goldberg Machine Learning" from now on.

As a recent ICML talk title says: "Friends Don’t Let Friends Deploy Models They Don’t Understand".