> it appears to have solved the inverse folding problem for many proteins
While this is true, DeepFold’s algorithm is only applicable to extant proteins having enough evolutionary information, which is arguably a small fraction of the theoretical sequence space. Fortunately, machine-learning approaches for de novo protein design are being actively developed as we speak.
> While this is true, DeepFold’s algorithm is only applicable to extant proteins having enough evolutionary information,
Didn't the challenge include predicting previously unseen/unsolved proteins? Based on that, I would wager that what DeepFold learned is an evolutionary "language" that maps out a large useful subset of the entire possibility space. Natural evolution tends to build upon previous successes, so it seems probable evolution has mapped out a fairly useful "language" of patterns for useful protein shapes. Especially given that only a portion of the shape is critical to function for many proteins.
But agreed, de novo protein design based on DeepFold's successes probably will outperform a naive approach by orders of magnitude. But why wait if even a naive approach is already orders of magnitude better than current methods?
Either way, I'm excited to see who comes up with the first custom protein(s) to catalyze industrial processes! Get some yeast/bacteria to mass-produce a protein-based alternative to platinum catalysts for fuel cells, using an active site with organically available (and cheap) metals. Design half of it to stick to a polymer so it coats nicely. Bam, no more trying to get some weird polymer/perovskite with the right properties. Not sure mRNA is needed at that point versus CRISPR, but maybe it's more effective.
> Didn't the challenge include predicting previously unseen/unsolved proteins?
All protein sequences in the competition lacked a published solved structure, but they had enough effective (remotely) homologous sequences to predict coevolution-derived interresidue distances and contacts. All those sequences were already present in databases and therefore within the known protein universe.
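To make "coevolution-derived contacts" a bit more concrete, here is a minimal sketch of the underlying idea. It is not DeepFold's method: real pipelines use much stronger statistics (direct-coupling analysis, or learned models over the alignment), and mutual information is swapped in here only as the simplest stand-in. The function names and the toy alignment are made up for illustration. The point is that columns of a multiple sequence alignment that mutate together hint at residues that touch in the folded structure, which is why enough homologous sequences are needed in the first place.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    i_counts = Counter(seq[i] for seq in msa)
    j_counts = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(msa):
    """Return all column pairs sorted by mutual information, highest first."""
    length = len(msa[0])
    scores = {
        (i, j): column_mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 1 and 3 always change together
# (K pairs with E, R pairs with D), as a salt bridge might.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKVE",
    "GRVD",
]

ranked = rank_column_pairs(toy_msa)
top_pair, top_score = ranked[0]
print(top_pair, round(top_score, 3))
```

With only a handful of sequences these statistics are mostly noise, which is the practical reason the approach does not extend to sequences without a family of homologs.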
> Based on that, I would wager that what DeepFold learned is an evolutionary "language" that maps out a large useful subset of the entire possibility space.
This might be true, depending on how “foldable” the unexplored space is. A reverse DeepFold would give some clues on that.
> Natural evolution tends to build upon previous successes so it seems probable evolution has mapped out a fairly useful "language" of patterns for useful protein shapes
This is already known. The structural and functional diversity we see in existing proteins originated from a relatively limited repertoire of conserved protein domain folds, or even subdomain-sized fragments in some cases.
> Either way, I'm excited to see who comes up with the first custom protein(s) to catalyze industrial processes!
Same here! Especially since the de novo design of enzymes has been progressing slowly but steadily.
> but they had enough effective (remotely) homologous sequences to predict coevolution-derived interresidue distances and contacts.
Ah, I see what you're saying a bit better. That makes more sense of how the possible "design" space for novel proteins could be limited. Designing novel active sites could be especially tricky, more than it'd seem at first glance. A folded structure that'd effectively transfer electrons from one target species to another in a catalytic protein could well be outside the explored "vocabulary" of extant proteins, as it'd require specialized pathways and precise positioning. Chlorophyll is pretty unchanged in evolution as I understand it. Thanks, interesting background! I'll keep an eye out for the de novo design algorithms.