On the machine learning front: the transcription capability of Zoom (which uses some form of machine learning) amazed me this morning. I was watching a recording of a Zoom session, a mock interview to help someone learn a new interview question. In this situation, there were two people sitting next to each other in an echo-y conference room, recorded by the microphone of a Zoom tablet thing. They were talking to each other; both were French but speaking English, so they had moderately heavy accents; both had masks on; and they were discussing a coding problem. The accuracy of the transcription blew my mind. Really, the only time it stumbled was on very technical terms.
For those in the field, the progress of model performance feels just as rapid. Language models today are incredibly powerful compared to language models in the early 2010s.
I think the advent of retrieval models (retrieval transformers) will continue this compute trend in a more efficient manner. They allow compute to be focused on indexing the knowledge, rather than memorizing it in the model's weights.
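To make the mechanics concrete, here's a toy sketch in Python. The hashed bag-of-words encoder and the passages are made up for illustration; a real retrieval transformer (e.g. RETRO) uses a trained neural encoder and a much larger index:

    import numpy as np

    DIM = 256

    def embed(text):
        # Toy hashed bag-of-words vector, L2-normalized so that a dot
        # product acts as cosine similarity.
        v = np.zeros(DIM)
        for tok in text.lower().split():
            v[hash(tok) % DIM] += 1.0
        n = np.linalg.norm(v)
        return v / n if n else v

    passages = ["water boils at 100 C", "the sky is blue", "paris is in france"]
    index = np.stack([embed(p) for p in passages])  # built once, offline

    def retrieve(query, k=2):
        scores = index @ embed(query)  # similarity lookup at query time
        return [passages[i] for i in np.argsort(scores)[::-1][:k]]

    # The retrieved passages are handed to the model as extra context, so
    # the weights don't have to memorize the knowledge themselves.
    print(retrieve("at what temperature does water boil"))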
While this is true, there is also a lot of interesting research going on that makes model training and adaptation to new tasks much more efficient. For example, in meta-learning methods like Model-Agnostic Meta-Learning (MAML)[1], you learn an initialization of a model's weights such that the model converges very quickly on a new task after just a few gradient steps.
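A minimal sketch of the idea, assuming PyTorch and a toy sine-regression task (the network size, learning rates, and task here are arbitrary illustration, not the paper's setup):

    import torch

    def forward(params, x):
        # Tiny functional MLP so we can run it with either the meta-weights
        # or the task-adapted "fast" weights.
        h = torch.relu(x @ params["w1"] + params["b1"])
        return h @ params["w2"] + params["b2"]

    params = {
        "w1": (torch.randn(1, 64) * 0.1).requires_grad_(),
        "b1": torch.zeros(64, requires_grad=True),
        "w2": (torch.randn(64, 1) * 0.1).requires_grad_(),
        "b2": torch.zeros(1, requires_grad=True),
    }
    inner_lr, meta_lr = 0.01, 0.001

    def sample_task():
        # Each task: regress a sine wave with random amplitude and phase.
        amp, phase = torch.rand(1) * 4.9 + 0.1, torch.rand(1) * 3.1416
        def batch(n=10):
            x = torch.rand(n, 1) * 10 - 5
            return x, amp * torch.sin(x + phase)
        return batch

    for step in range(1000):
        batch = sample_task()
        x_tr, y_tr = batch()
        x_te, y_te = batch()
        # Inner loop: one SGD step on the task, keeping the graph
        # (create_graph=True) so the outer update can differentiate
        # through the adaptation itself.
        inner_loss = ((forward(params, x_tr) - y_tr) ** 2).mean()
        grads = torch.autograd.grad(inner_loss, list(params.values()),
                                    create_graph=True)
        fast = {k: p - inner_lr * g
                for (k, p), g in zip(params.items(), grads)}
        # Outer loop: evaluate the adapted weights on held-out data from
        # the same task, then update the initialization itself.
        meta_loss = ((forward(fast, x_te) - y_te) ** 2).mean()
        meta_grads = torch.autograd.grad(meta_loss, list(params.values()))
        with torch.no_grad():
            for p, g in zip(params.values(), meta_grads):
                p -= meta_lr * g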
For benchmarking adaptation to new tasks, there are Meta-Dataset (MD)[2] for meta-learning and the Visual Task Adaptation Benchmark (VTAB)[3] for representation learning. Recently, VTAB+MD[4] was created to compare the two approaches. Of course, there are also model quantization[5], mixed precision, sparse networks[6], and brain floats[7].
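As a taste of the first of those, post-training int8 quantization is conceptually just rounding plus a scale factor; a toy numpy sketch (symmetric per-tensor quantization, random weights for illustration):

    import numpy as np

    w = np.random.randn(4, 4).astype(np.float32)      # original fp32 weights
    scale = np.abs(w).max() / 127.0                   # symmetric per-tensor scale
    w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    w_deq = w_int8.astype(np.float32) * scale         # dequantized approximation

    print("max abs error:", np.abs(w - w_deq).max())  # small relative to scale

You store a quarter of the bytes and pay only a small accuracy cost, which is why it shows up so often in efficiency work.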
Well, I've dropped a lot there, but there is a lot more outside the mainstream. I've been building something on a tight budget for a few years, so this has been important for success. We are very focused on being data- and compute-efficient.
It looks like a useful article showing the trend OpenAI described a few years ago has continued.
One thing I'd object to is coining (or maybe just using?) the term "Large-Scale Era". This seems like polluting the discussion with a meaningless phrase.
The basic message is essentially: deep learning continues to grow in scale exponentially, and many people consider the dividends still worth the costs. No need to "sexy things up" with what is no more than a marketing term.
The result about recent compute trends differs from the trend OpenAI described. In particular, OpenAI found a 3.5-month doubling time over the deep learning era, whereas this paper finds a 6-month doubling time.
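The gap compounds quickly. A rough comparison over five years of growth:

    months = 60
    print(2 ** (months / 3.5))  # 3.5-month doubling: ~140,000x more compute
    print(2 ** (months / 6.0))  # 6-month doubling:   ~1,000x more compute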
I think the Large-Scale Era does point to a new phenomenon that emerged pretty discontinuously, which is that there are now 'two lanes' in ML scaling. Prior to 2015, academia and industry would train models of roughly similar compute intensity. Since then, a small number of industry players frequently train models with 10-100x more compute than what the typical researcher uses.
The thing is that the advent of deep learning was a very big change, in the sense that a general-purpose method appeared that you could use to throw computing power at many/most problems (and tune a bit, but still) and get results that previously you couldn't get (and when you did get results, you required domain experts). No doubt there are changes within the trajectory of these escalating brute-force solutions. But relative changes within this paradigm seem fundamentally different from the initial advent of the paradigm.
A few years ago at an AI conference (might've been NeurIPS), there was a paper on treating environmental impact as an important facet of AI safety/responsibility. I've been thinking about it for a long time, and this only makes me think about it more. At what point do we consider the carbon emissions of compute a valid concern?
Compute going carbon neutral is largely the same as the electric grid going carbon neutral. So IMO, it’s not worth focusing on specifically in the long term.
Something about environmental problems causes people to focus on very narrow types of harm rather than the major ones like habitat destruction. Domestic cats kill ~5 orders of magnitude more birds than windmills, yet somehow bird strikes on windmills are what's in the public consciousness. I think it comes down to the inability to really grasp the difference between large numbers, combined with specific harms used as distractions.
+1 to this. I only recently learned that concrete/cement production is a major source of CO2 emissions, significantly bigger than meat production, for example. It sounds like changing that production method would be a more effective way to fight global warming than me becoming a vegetarian.
The impact of meat on global warming is mostly through methane, not CO2, so comparing its CO2 emissions to the concrete industry's CO2 emissions is not an appropriate comparison. You should compare the CO2-equivalent of all meat-related greenhouse gases to the CO2 emitted by the concrete industry.
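For reference, "CO2-equivalent" just means weighting each gas by its global warming potential (GWP). A tiny example; the ~28x factor for methane's 100-year GWP is an assumption here (IPCC reports give values in the high 20s to low 30s depending on edition):

    methane_tonnes = 1.0
    GWP100_CH4 = 28                      # assumed 100-year GWP of methane
    print(methane_tonnes * GWP100_CH4)   # ~28 tonnes CO2-equivalent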
GPT-3 has 175 billion parameters and is not quite human-level. Cerebras is talking about building out clusters that can handle 100 trillion parameters, roughly 600 times bigger than GPT-3.
This hypothetical GPT-4 would be big enough that it shouldn't have the context-window or BPE-token problems of GPT-3. It would be superhumanly good at predicting token sequences, i.e. generating text. What that would look like, exactly, is not entirely clear. (What does it mean to be twice as good as a human at writing an essay?) If it had the same architecture as previous GPTs, it wouldn't be "conscious" or goal-directed, but it would be able to correlate more information and draw conclusions that we couldn't. (Whether we would understand those conclusions is another open question.)
It would also be quite expensive to run, at least at first, in a capital-amortized sense: maybe hundreds of dollars per word. The commercial use of such a thing would be limited. Large tech companies seem to be building AI because of how self-evidently useful it will be... eventually.
The economic case for high-cost NNs is the opposite of most automation, which started from the bottom up. If a net is expensive to train and run, you have to pursue the Tesla strategy: start from the top down. So you need to target knowledge work that is high-priced in dollars per hour but tolerant of some variance, since the results are still probabilistic. In the near future you won't see NNs designing jet engines, producing complete software systems from requirements documents, or even doing standard grunt-level software engineering work like wiring two systems together, since you can never be quite sure GPT-programmer is producing the right work.
Similarly, GPT-lawyer or GPT-CEO would be tough to do. GPT-hollywood would be interesting, if you could get it to crank out a complete Marvel movie with superhumanly good CGI. More terrifyingly, a GPT-advertiser that could produce super-appealing ads would be able to pay for its own runtime, at the cost of a machine hijacking human minds to plug money into gacha games or their 2030s equivalent.
Growth by a factor of 1e+7 per decade: from 1e+14 in 2011 to 1e+21 in 2021, so presumably 1e+28 by 2030. At 1e+28 in 2030, Doom, its host OS, and a player bot could probably all be generated and played by AI.
2011 1e+14 = 100 000 000 000 000
2021 1e+21 = 1 000 000 000 000 000 000 000 <- you are here
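That trend implies a doubling time you can back out directly:

    import math
    factor, months = 1e7, 120
    print(months / math.log2(factor))  # ~5.2 months per doubling

So about five months per doubling, in the same ballpark as the six-month figure discussed above.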
What's a good next step after learning "classical" neural nets, i.e. backpropagation ANNs?
I've been working on MNIST in C and CUDA (with dynamic parallelism) for a couple of months, and it's been extremely enlightening, but I'm kind of ready to move on.
What would you recommend for a complete beginner? I found these lectures that have helped me start sketching out the field in my mind, but it's still difficult: https://www.davidsilver.uk/teaching/
David Silver's lectures are on reinforcement learning which is different from deep learning; I definitely wouldn't start there. One good starter book is Hands-On ML by Aurelien Geron - https://learning.oreilly.com/library/view/hands-on-machine-l... but there's tonnes of others (and the recommendations will kind of vary based on your background).
CNNs are a classic class of ANNs, still widely used in computer vision applications. After that, though, you'll want to look into the transformer architecture, as transformers are all the rage these days, especially for NLP tasks.
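If it helps as a bridge from hand-rolled C/CUDA to a framework, here's a minimal MNIST-shaped CNN sketch, assuming PyTorch is installed (fake data is used so it runs standalone; swap in torchvision's MNIST loader for real training):

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),  # 28x28 -> 28x28
        nn.MaxPool2d(2),                                        # -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                                        # -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),                              # 10 digit classes
    )

    x = torch.randn(8, 1, 28, 28)   # a batch of 8 fake MNIST-sized images
    logits = model(x)               # shape: (8, 10)
    loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (8,)))
    loss.backward()                 # same backprop you've been hand-rolling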
The vast majority of AI research (over all time) has not been seriously intended or expected to arrive at a general intelligence.
Sure, there are some true believers in some sort of handwavy construction argument ("and then a miracle occurs"), and there have been serious research attempts, of limited scope and resources, to understand intelligence. But to be clear, whether in this wave or previous ones, the money washes in and the army of PhDs appears when there is a sniff of practical, usable algorithmic results from machine learning. It's not the same thing.
Parallel application of even relatively stupid algorithms targeted at the right tasks will rapidly outproduce any reasonable number of organic intelligences of the type you describe :)
Because a newborn needs another 18 years or so of nurturing and education to become useful. This costs so much that quite a few countries have no qualms about saddling these new humans with the debt of their education.
Nature's $7-an-hour general compute nodes are still too expensive not to burn money trying to make an equivalent that runs at the cost of electricity plus the initial purchase.
> There's nothing particularly special about human level of intelligence.
Big claim, since nothing in the known universe has human-level intelligence besides humans. It's like holding the original Declaration of Independence and saying there is nothing particularly special about this piece of paper.
What do you propose is special about human-level intelligence? If anything, it is in constant struggle with biological and emotional needs. Yes, of course, human brains are amazing, the declaration of independence is great, but the entire human project is the push for progress.
I can't say what is special about human-level intelligence, most importantly because the scientific community (and by extension me) can't define what human-level intelligence means. It's just too elusive at this point. That something is undefined by humans doesn't mean you can simply say it's not special.
IMHO, in the community there aren't any significant objections to the Legg & Hutter definition proposed in that document: "Intelligence measures an agent's ability to achieve goals in a wide range of environments." Perhaps you can tweak the wording some more, but that's the direction implied by people talking about building general intelligence and (at some future point) human-comparable general intelligence. In essence it's about 'wide-coverage' intelligence: being efficient at different purposes and figuring out what is required to be successful for those purposes, as opposed to narrow, single-task effectiveness. And it's a metric on which one can quite reasonably imagine something being equivalent to or better than the average (or x-th percentile) human.
With that definition you can argue humans aren't intelligent at all. The environments we operate in are very limited compared to for example extremophiles. I doubt anyone would argue extremophiles are of genius level intellect compared to humans. That definition can be falsified in dozens of ways. That doesn't mean it can't be useful of course.
IDK, humans can achieve all kinds of goals not only in temperate climates but also underwater, in the Antarctic and even in outer space; and in the environments where extremophiles operate we can do all kinds of interesting things that they can't.
It's not about the capabilities of the unassisted body - I can influence stuff in a volcano without sticking my bare hands into it.