On the machine learning front: the transcription capability of Zoom (which uses some form of machine learning) amazed me this morning. I was watching a recording of a Zoom session, a mock interview to help someone learn a new interview question. In this situation, there were two people sitting next to each other in an echo-y conference room, recorded by the microphone of a Zoom tablet thing. They were talking to each other; both were French but speaking English, so they had moderately heavy accents; both had masks on; and they were discussing a coding problem. The accuracy of the transcription blew my mind. Really, the only time it stumbled was on very technical terms.
For those in the field, the progress of model performance feels just as rapid. Language models today are incredibly powerful compared to language models in the early 2010s.
I think the advent of retrieval models (retrieval transformers) will continue this compute trend in a more efficient manner. They allow compute to be focused on indexing the knowledge, rather than memorizing it in the model's weights.
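To make the mechanics concrete, here's a toy sketch in Python. The hashed bag-of-words encoder and the passages are made up for illustration; a real retrieval transformer (e.g. RETRO) uses a trained neural encoder and a much larger index:

    import numpy as np

    DIM = 256

    def embed(text):
        # Toy hashed bag-of-words vector, L2-normalized so that a dot
        # product acts as cosine similarity.
        v = np.zeros(DIM)
        for tok in text.lower().split():
            v[hash(tok) % DIM] += 1.0
        n = np.linalg.norm(v)
        return v / n if n else v

    passages = ["water boils at 100 C", "the sky is blue", "paris is in france"]
    index = np.stack([embed(p) for p in passages])  # built once, offline

    def retrieve(query, k=2):
        scores = index @ embed(query)  # similarity lookup at query time
        return [passages[i] for i in np.argsort(scores)[::-1][:k]]

    # The retrieved passages are handed to the model as extra context, so
    # the weights don't have to memorize the knowledge themselves.
    print(retrieve("at what temperature does water boil"))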
While this is true, there is also a lot of interesting research going on that makes model training and adaptation to new tasks much more efficient. For example, in meta-learning methods like Model-Agnostic Meta-Learning (MAML)[1], you learn an initialization of a model's weights such that the model converges very quickly on a new task after just a few gradient steps.
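A minimal sketch of the idea, assuming PyTorch and a toy sine-regression task (the network size, learning rates, and task here are arbitrary illustration, not the paper's setup):

    import torch

    def forward(params, x):
        # Tiny functional MLP so we can run it with either the meta-weights
        # or the task-adapted "fast" weights.
        h = torch.relu(x @ params["w1"] + params["b1"])
        return h @ params["w2"] + params["b2"]

    params = {
        "w1": (torch.randn(1, 64) * 0.1).requires_grad_(),
        "b1": torch.zeros(64, requires_grad=True),
        "w2": (torch.randn(64, 1) * 0.1).requires_grad_(),
        "b2": torch.zeros(1, requires_grad=True),
    }
    inner_lr, meta_lr = 0.01, 0.001

    def sample_task():
        # Each task: regress a sine wave with random amplitude and phase.
        amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.1416
        def batch(n=10):
            x = torch.rand(n, 1) * 10 - 5
            return x, amp * torch.sin(x + phase)
        return batch

    for step in range(1000):
        batch = sample_task()
        x_tr, y_tr = batch()
        x_te, y_te = batch()
        # Inner loop: one SGD step on the task, keeping the graph
        # (create_graph=True) so the outer update can differentiate
        # through the adaptation itself.
        inner_loss = ((forward(params, x_tr) - y_tr) ** 2).mean()
        grads = torch.autograd.grad(inner_loss, list(params.values()),
                                    create_graph=True)
        fast = {k: p - inner_lr * g
                for (k, p), g in zip(params.items(), grads)}
        # Outer loop: evaluate the adapted weights on held-out data from
        # the same task, then update the initialization itself.
        meta_loss = ((forward(fast, x_te) - y_te) ** 2).mean()
        meta_grads = torch.autograd.grad(meta_loss, list(params.values()))
        with torch.no_grad():
            for p, g in zip(params.values(), meta_grads):
                p -= meta_lr * g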
For benchmarking adaptation to new tasks, there are Meta-Dataset (MD)[2] for meta-learning and the Visual Task Adaptation Benchmark (VTAB)[3] for representation learning. Recently, VTAB+MD[4] was created to compare the two approaches. Of course, there are also model quantization[5], mixed precision, sparse networks[6], and brain floats[7].
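As a taste of the first of those, post-training int8 quantization is conceptually just rounding plus a scale factor; a toy numpy sketch (symmetric per-tensor quantization, random weights for illustration):

    import numpy as np

    w = np.random.randn(4, 4).astype(np.float32)      # original fp32 weights
    scale = np.abs(w).max() / 127.0                   # symmetric per-tensor scale
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    w_deq = w_int8.astype(np.float32) * scale         # dequantized approximation

    print("max abs error:", np.abs(w - w_deq).max())  # small relative to scale

You store a quarter of the bytes and pay only a small accuracy cost, which is why it shows up so often in efficiency work.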
Well, I've dropped a lot there, but there is a lot more outside the mainstream. I've been building something on a tight budget for a few years, so this has been important for success. We are very focused on being data- and compute-efficient.
It looks like a useful article showing the trend OpenAI described a few years ago has continued.
One thing I'd object to is coining (or maybe just using?) the term "Large-Scale Era". This seems like polluting the discussion with a meaningless phrase.
The basic message is essentially: deep learning continues to grow in scale exponentially, and many people consider the dividends still worth the costs. No need to "sexy things up" with what is no more than a marketing term.
The result about recent compute trends differs from the trend OpenAI described. In particular, OpenAI found a 3.5-month doubling time over the deep learning era, whereas this paper finds a 6-month doubling time.
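The gap compounds quickly. A rough comparison over five years of growth:

    months = 60
    print(2 ** (months / 3.5))  # 3.5-month doubling: ~140,000x more compute
    print(2 ** (months / 6.0))  # 6-month doubling:   ~1,000x more compute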
I think the Large-Scale Era does point to a new phenomenon that emerged pretty discontinuously, which is that there are now 'two lanes' in ML scaling. Prior to 2015, academia and industry would train models of roughly similar compute intensity. Since then, a small number of industry players frequently train models with 10-100x more compute than what the typical researcher uses.
The thing is that the advent of deep learning was a very big change, in the sense that a general-purpose method appeared that you could use to throw computing power at many/most problems (and tune a bit, but still) and get results that previously you couldn't get (and when you did get results, you required domain experts). No doubt there are changes within the trajectory of these escalating brute-force solutions. But relative changes within this paradigm seem fundamentally different from the initial advent of the paradigm.
A few years ago at an AI conference (might've been NeurIPS), there was a paper on treating environmental impact as an important facet of AI safety/responsibility. I've been thinking about it for a long time, and this only makes me think about it more. At what point do we consider the carbon emissions of compute a valid concern?
Compute going carbon neutral is largely the same as the electric grid going carbon neutral. So IMO, it’s not worth focusing on specifically in the long term.
Something about environmental problems causes people to focus on very narrow types of harm rather than the major ones like habitat destruction. Domestic cats kill ~5 orders of magnitude more birds than windmills, yet somehow bird strikes on windmills are what's in the public consciousness. I think it comes down to the inability to really grasp the difference between large numbers, combined with specific harms used as distractions.
+1 to this. I only recently learned that concrete/cement production is a major source of CO2 emissions, significantly bigger than meat production, for example. It sounds like changing that production method would be a more effective way to fight global warming than me becoming a vegetarian.
The impact of meat on global warming is mostly through methane, not CO2, so comparing its CO2 emissions to the concrete industry's CO2 emissions is not an appropriate comparison. You should compare the CO2-equivalent of all meat-related greenhouse gases to the CO2 emitted by the concrete industry.
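For reference, "CO2-equivalent" just means weighting each gas by its global warming potential (GWP). A tiny example; the ~28x factor for methane's 100-year GWP is an assumption here (IPCC reports give values in the high 20s to low 30s depending on edition):

    methane_tonnes = 1.0
    GWP100_CH4 = 28                      # assumed 100-year GWP of methane
    print(methane_tonnes * GWP100_CH4)   # ~28 tonnes CO2-equivalent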
GPT-3 has 175 billion parameters and is not quite human-level. Cerebras is talking about building out clusters that can handle 100 trillion parameters, roughly 600 times bigger than GPT-3.
This hypothetical GPT-4 would be big enough that it shouldn't have the context-window or BPE-token problems of GPT-3. It would be superhumanly good at predicting token sequences, i.e. generating text. What that would look like, exactly, is not entirely clear. (What does it mean to be twice as good as a human at writing an essay?) If it had the same architecture as previous GPTs, it wouldn't be "conscious" or goal-directed, but it would be able to correlate more information and draw conclusions that we couldn't. (Whether we would understand those conclusions is another open question.)
It would also be quite expensive to run, at least at first, in a capital-amortized sense: maybe hundreds of dollars per word. The commercial use of such a thing would be limited. Large tech companies seem to be building AI because of how self-evidently useful it will be... eventually.
The economic case for high-cost NNs is the opposite of most automation, which started from the bottom up. If a net is expensive to train and run, you have to pursue the Tesla strategy: start from the top down. So you need to target knowledge work that is high-priced in dollars per hour but tolerant of some variance, since the results are still probabilistic. In the near future you won't see NNs designing jet engines, producing complete software systems from requirements documents, or even doing standard grunt-level software engineering work like wiring two systems together, since you can never be quite sure GPT-programmer is producing the right work.
Similarly, GPT-lawyer or GPT-CEO would be tough to do. GPT-hollywood would be interesting, if you could get it to crank out a complete Marvel movie with superhumanly good CGI. More terrifyingly, a GPT-advertiser that could produce super-appealing ads would be able to pay for its own runtime, at the cost of a machine hijacking human minds to plug money into gacha games or their 2030s equivalent.
Growth by a factor of 1e+7 per decade: from 1e+14 in 2011 to 1e+21 in 2021, so presumably 1e+28 by 2030. At 1e+28 in 2030, Doom, its host OS, and a player bot could probably all be generated and played by AI.
2011 1e+14 = 100 000 000 000 000
2021 1e+21 = 1 000 000 000 000 000 000 000 <- you are here
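That trend implies a doubling time you can back out directly:

    import math
    factor, months = 1e7, 120
    print(months / math.log2(factor))  # ~5.2 months per doubling

So about five months per doubling, in the same ballpark as the six-month figure discussed above.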
What's a good next step after learning "classical" neural nets, i.e. backpropagation ANNs?
I've been working on MNIST in C and CUDA (with dynamic parallelism) for a couple of months, and it's been extremely enlightening, but I'm kind of ready to move on.
What would you recommend for a complete beginner? I found these lectures that have helped me start sketching out the field in my mind, but it's still difficult: https://www.davidsilver.uk/teaching/
David Silver's lectures are on reinforcement learning which is different from deep learning; I definitely wouldn't start there. One good starter book is Hands-On ML by Aurelien Geron - https://learning.oreilly.com/library/view/hands-on-machine-l... but there's tonnes of others (and the recommendations will kind of vary based on your background).
CNNs are a classic class of ANNs, still widely used in computer vision applications. After that, though, you'll want to look into the transformer architecture, as transformers are all the rage these days, especially for NLP tasks.
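If it helps as a bridge from hand-rolled C/CUDA to a framework, here's a minimal MNIST-shaped CNN sketch, assuming PyTorch is installed (fake data is used so it runs standalone; swap in torchvision's MNIST loader for real training):

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),  # 28x28 -> 28x28
        nn.MaxPool2d(2),                                        # -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                                        # -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),                              # 10 digit classes
    )

    x = torch.randn(8, 1, 28, 28)   # a batch of 8 fake MNIST-sized images
    logits = model(x)               # shape: (8, 10)
    loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (8,)))
    loss.backward()                 # same backprop you've been hand-rolling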
The vast majority of AI research (over all time) has not been seriously intended or expected to arrive at a general intelligence.
Sure, there are some true believers in some sort of handwavy construction argument ("and then a miracle occurs"), and there have been serious research attempts, of limited scope and resources, to understand intelligence. But to be clear, whether in this wave or previous ones, the money washes in and the army of PhDs appears when there is a sniff of practical, usable algorithmic results from machine learning. It's not the same thing.
Parallel application of even relatively stupid algorithms targeted at the right tasks will rapidly outproduce any reasonable number of organic intelligences of the type you describe :)
Because a newborn needs another 18 years or so of nurturing and education to become useful. This costs so much that quite a few countries have no qualms about saddling these new humans with the debt of their education.
Nature's $7-an-hour general compute nodes are still too expensive not to burn money trying to make an equivalent that runs at the cost of electricity plus the initial purchase.
> There's nothing particularly special about human level of intelligence.
Big claim, since nothing in the known universe has human-level intelligence besides humans. It's like holding the original Declaration of Independence and saying there is nothing particularly special about this piece of paper.
What do you propose is special about human-level intelligence? If anything, it is in constant struggle with biological and emotional needs. Yes, of course, human brains are amazing, the declaration of independence is great, but the entire human project is the push for progress.
I can't say what is special about human-level intelligence, most importantly because the scientific community (and by extension me) can't define what human-level intelligence means. It's just too elusive at this point. That something is undefined by humans doesn't mean you can simply say it's not special.
IMHO, in the community there aren't any significant objections to the Legg & Hutter definition proposed in that document: "Intelligence measures an agent's ability to achieve goals in a wide range of environments." Perhaps you can tweak the wording some more, but that's the direction implied by people talking about building general intelligence and (at some future point) human-comparable general intelligence. In essence it's about 'wide-coverage' intelligence: being efficient at different purposes and figuring out what is required to be successful for those purposes, as opposed to narrow, single-task effectiveness. And it's a metric on which one can quite reasonably imagine something being equivalent to or better than the average (or x-th percentile) human.
With that definition you can argue humans aren't intelligent at all. The environments we operate in are very limited compared to for example extremophiles. I doubt anyone would argue extremophiles are of genius level intellect compared to humans. That definition can be falsified in dozens of ways. That doesn't mean it can't be useful of course.
IDK, humans can achieve all kinds of goals not only in temperate climates but also underwater, in the Antarctic and even in outer space; and in the environments where extremophiles operate we can do all kinds of interesting things that they can't.
It's not about the capabilities of the unassisted body - I can influence stuff in a volcano without sticking my bare hands into it.