Evaluating machine learning solutions for subjective problems
Machine learning research relies on massive curated datasets and mathematical metrics to evaluate methods, allowing models to be compared down to fractions of a percentage point. While these metrics are not always perfect, they usually provide a good indication of how well a model performs. When deploying machine learning systems in the real world, however, we often encounter problems where the quality of the results is much harder to evaluate.
This is especially true for solutions that can only be evaluated subjectively, where the model's decision influences the outcome, or where the space of possible results is so large that building a golden evaluation set is impossible. Search engines and recommender systems are prime examples, combining all three of these characteristics, but a wide range of machine learning solutions are hard to evaluate due to at least one of them.
The only way to truly evaluate these systems is to run them in production and analyze the results, which can be risky, costly, and too late. In this talk, we'll examine alternative and intermediate evaluation methods that can help you build and evaluate these systems with more confidence and lower risk.