Brains vs. Brawn – Is endless scaling really the way forward?
The current race toward ever-larger models by the Googles and OpenAIs of this world can be quite discouraging to everyday ML practitioners. Are we doomed to simply rent giant models we will never have the compute and data to build ourselves? Is careful research and model architecture design useless in an age where the common wisdom is to crush every problem under billions of parameters?

Regrettably, model size and SOTA scores seem to be the main evaluation criteria for papers, research, and models. Meanwhile, many other smart, useful, or simply intriguing ideas fall by the wayside while everybody talks about how GPT-3 will replace programmers next year (it won't). So let's look at ways to improve a model's quality other than drowning it in data and compute.