Nova Community

Towards an AI just for the elite?

Nova Member | Machine Learning Researcher at Morgan Stanley

At the beginning of the century, we witnessed rapid development in AI in general and in Machine Learning in particular. This growth was possible thanks to three factors:

1. Availability of enough data to train Machine Learning models.

2. Breakthroughs within Machine Learning subfields such as Computer Vision or NLP.

3. Graphics Processing Units (GPUs), which brought cheap compute power and massive parallelization.

These elements enabled the development of the field and fueled increasing interest from both industry and academia. As a consequence, Machine Learning and Data Science are now two of the hottest topics in technology, attracting talent and investment like never before.

However, the race to beat the latest metrics (in contests such as ImageNet) and the eagerness of tech companies to position themselves as leading innovators are reversing the path walked during the last decades. In certain fields, algorithms are becoming harder to run and ridiculously expensive to train, to the point where only an elite set of companies can afford them. While the research community worked during the last few decades towards making AI more accessible, recent advances seem to make the field less democratic. In my opinion, there are multiple reasons behind this trend, but I will only mention one and elaborate on a second.

Recent advances seem to make AI less accessible to the broader public. Why is that?

There is an increasing number of papers whose results cannot be reproduced because the datasets are not public. Having the algorithm is no longer enough to use it, and companies know it. Joelle Pineau is one of the scientists trying to change this situation, along with many others [1].

The second and more worrisome factor is the exponential growth in the number of parameters, which makes it impossible for most companies to reproduce the results or use them in production. There is a trend to increase the size of models in order to obtain algorithms with better generalization capabilities, language models being a clear example. The latest version of GPT, developed by OpenAI, has a total of 175 billion parameters [2]. That number probably does not tell you much, but there is a universal language all of us understand: money. Training GPT-3 is estimated to cost more than $12 million [3], and most likely many attempts were needed to come up with the right configuration, which can bring the cost up to hundreds of millions… and just for one model! Not many companies can spend that much money on research, let alone on training one single algorithm. We can find the following quote in the GPT-3 paper [4]:

Unfortunately, a bug in the filtering caused us to ignore some overlaps, and due to the cost of training, it was not feasible to retrain the model.
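To put the 175 billion parameters in perspective, here is a minimal back-of-the-envelope sketch of how much memory is needed just to store the weights of a model that size. The helper name `weight_memory_gb` is hypothetical, and the calculation assumes every parameter is stored as a plain 32-bit or 16-bit float; it ignores activations, gradients and optimizer state, so real training needs considerably more.

```python
# Rough memory needed just to hold a model's weights.
# Assumption: each parameter is stored as one float of the given width;
# activations, gradients and optimizer state are NOT counted.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params: int, dtype: str = "float32") -> float:
    """Return the gigabytes (10**9 bytes) needed to store num_params weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


GPT3_PARAMS = 175_000_000_000  # 175 billion parameters, as cited in [2]

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(GPT3_PARAMS, dtype):.0f} GB")
# prints:
# float32: 700 GB
# float16: 350 GB
```

Under these assumptions the weights alone take hundreds of gigabytes, well beyond the memory of a single GPU, so a model of this size has to be split across many devices before a single training step can even run.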

The following graph, adapted from DistilBERT [5], shows the growth in the number of parameters in some of the latest Language Models. There is a clear exponential trend, which raises the question of whether the interesting results of these models are due to their size or their architecture.

References
[1] https://www.wired.com/story/artificial-intelligenc…
[2] https://www.technologyreview.com/2020/07/20/100545…
[3] https://lambdalabs.com/blog/demystifying-gpt-3/
[4] https://arxiv.org/pdf/2005.14165.pdf
[5] https://arxiv.org/pdf/1910.01108.pdf

‍
