Underspecified pipelines: Why good models underperform in production
Machine learning models often perform poorly when applied to real-world problems. Surprisingly, this happens even when rigorous validation procedures have been followed and the generalization measured under laboratory conditions looks acceptable. Recent research identifies underspecified pipelines as the main culprit: a pipeline is underspecified when it can return many predictors that score equally well on hold-out sets but behave very differently once operationalized.
In this advanced talk, we will review the research results, explain the theoretical foundations of the underspecification effect, discuss several case studies using ensemble and deep learning models, and offer suggestions for training models with credible inductive biases. The talk also includes a Python-based demo with toy examples that shows how underspecification manifests itself in practice.
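To give a flavor of the effect, here is a minimal sketch of a toy example. It is not the demo from the talk; the data-generating process, the function names (`make_data`, `mse`) and the three candidate weight vectors are all assumptions chosen for illustration. The target depends only on `x1`, but in the training and hold-out distribution a second feature `x2` is an exact copy of `x1`, so the hold-out set cannot tell the candidate predictors apart.

```python
import random


def make_data(n, coupled, rng):
    """Toy data: y depends only on x1.

    When `coupled` is True, x2 is an exact copy of x1 (the lab setting).
    When False, x2 is drawn independently (the shifted, deployed setting).
    """
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = x1 if coupled else rng.gauss(0.0, 1.0)
        rows.append((x1, x2, x1))  # (x1, x2, y) with y = x1
    return rows


def mse(weights, rows):
    """Mean squared error of the linear predictor w1*x1 + w2*x2."""
    w1, w2 = weights
    return sum((w1 * x1 + w2 * x2 - y) ** 2 for x1, x2, y in rows) / len(rows)


rng = random.Random(0)
holdout = make_data(1000, coupled=True, rng=rng)   # same distribution as training
shifted = make_data(1000, coupled=False, rng=rng)  # x2 decoupled from x1

# Hypothetical candidates: any weights with w1 + w2 = 1 fit the coupled data.
candidates = {
    "uses_x1": (1.0, 0.0),
    "uses_both": (0.5, 0.5),
    "uses_x2": (0.0, 1.0),
}

results = {
    name: (mse(w, holdout), mse(w, shifted)) for name, w in candidates.items()
}
for name, (holdout_err, shifted_err) in results.items():
    print(f"{name}: hold-out MSE={holdout_err:.3f}, shifted MSE={shifted_err:.3f}")
```

All three candidates are indistinguishable on the hold-out set, yet only the one relying on `x1` survives the shift. Which candidate a real training pipeline returns would depend on choices such as initialization or regularization, which is the sense in which the pipeline is underspecified.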