
It probably is from the perspective of an information theorist. Did you read any interesting articles on the connections between deep learning and information theory to come to this conclusion? I'm highly interested in this space, but the influence of information theory on deep learning developments appears to be negligible.


Aha! Yes, yes, yes! Good question. A senior scientist handed me a book form of this paper (plus the introduction by Weaver, which made things much easier for little, younger me!). From there, it's just been the process of getting to know the field better and making the connections over time. It's been about 7-8 years for me, so I've been deep into neural networks for a little while.

Not everyone uses information theory to inform research into neural networks, which is a darn shame, but many of the recent advances I feel can be trivially explained with info theory, or at least derived from it.

I can give a few basic examples of where it interacts with deep learning. One obvious one is the cross-entropy function. This of course is trying to perform ERM using the MLE over the dataset, where the neural network is the mapping function f(x) -> y. For a loss function, the cross-entropy is ideal in this case, as we want to choose a coding scheme that minimizes our regret on new data based on the statistics of some data we have (thus the 'E' in Empirical Risk Minimization, and the 'R', and the, oh gosh darn it, you get what I mean here). This is analogous to a communications process where the neural network f is emitting a target token with a likelihood determined by f(x). In this case, the softmax serves as that emission probability, and instead of optimizing a discrete system, we are optimizing a continuous relaxation of the probabilities of that discrete system in the limit.
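A minimal numpy sketch of that reading, with made-up toy logits: the cross-entropy loss on a single example is just the negative log-likelihood of the target under the softmax emission probabilities, which is (up to a factor of ln 2) the ideal code length in bits the model assigns to the observed symbol.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: the network's "emission probabilities".
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits f(x) for a 4-class toy problem, and the true class.
logits = np.array([2.0, 0.5, -1.0, 0.1])
target = 0

p = softmax(logits)

# Cross-entropy loss on this example = negative log-likelihood of the target.
cross_entropy = -np.log(p[target])

# Information-theoretic reading: the ideal code length (in bits) that this
# coding scheme assigns to the observed symbol.
code_length_bits = -np.log2(p[target])

# Same quantity, just nats vs. bits.
assert np.isclose(cross_entropy, code_length_bits * np.log(2))
```

Minimizing the average of this quantity over the dataset is exactly the empirical-risk version of picking the coding scheme with the shortest expected description length for the data you actually saw.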

You can go a lot deeper into the rabbit hole and rephrase basically everything in ML in that light. Doing so has rather significantly helped me push the bounds of performance in my area of research, at least. There is a lot to learn, however.

So I'm sure there are decent articles out there, but second-hand stuff will likely only be useful for getting a general gist of a problem, at the cost of also inheriting the common public model for a way of thinking about things. If you go off of the beaten path and try to translate the whole training process (don't forget the temporal aspect!) into this framework, I think you'll find it very challenging and interesting. Very fun! <3 :) Hope that answers the question, if not, then feel free to let me know! I think there are some learning materials that talk about the basics of info theory w.r.t. ML, but the rabbit hole goes....very deep indeedy.



