My guess is that filtering is downstream of prediction, because filtering amounts to predicting which input will be useful and then acting on that prediction.
I think the Predictive Processing model would describe it as each layer only signaling “surprisal” to the layer above when the input diverges from the prediction. You could call this filtering if you like, or just an efficient delta representation.
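To make the "delta representation" reading concrete, here is a toy sketch of a single layer that stays silent unless its input diverges from what it expected. Everything in it is my own assumption for illustration: the `Layer` class, the `threshold` and `rate` values, and the simple running-average update are not taken from any actual Predictive Processing model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    """One level of a toy predictive hierarchy (hypothetical, for illustration)."""
    prediction: float = 0.0  # what this layer currently expects to see
    threshold: float = 0.5   # assumed tolerance before an error counts as surprising
    rate: float = 0.5        # assumed learning rate for updating the prediction

    def step(self, observed: float) -> Optional[float]:
        """Return the prediction error if it is surprising, otherwise None."""
        error = observed - self.prediction
        # Nudge the prediction toward the input whether or not we signal upward.
        self.prediction += self.rate * error
        # Only the delta is passed up, and only when it exceeds the tolerance.
        return error if abs(error) > self.threshold else None


def run(inputs: List[float]) -> List[Optional[float]]:
    """Feed a sequence through one fresh layer and collect what it sends upward."""
    layer = Layer()
    return [layer.step(x) for x in inputs]


if __name__ == "__main__":
    print(run([0.0, 0.1, 0.2, 5.0]))
```

With these settings the first three inputs are close enough to the running prediction that nothing is sent upward, and only the jump to 5.0 produces a signal. Whether you call the silent steps "filtering" or just "nothing new to report" is the same labeling question as above.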