
Structural biology in the age of AI
- How accurate is the prediction of protein structure by AlphaFold? Terwilliger et al. address this question with a rigorous assessment of the accuracy of AlphaFold-predicted structures by comparing them with experimentally determined X-ray crystallographic data.
The advent of publicly available databases housing millions of protein structural models predicted by AlphaFold1,2 and related advancements3–5 is an important milestone in structural biology. However, questions about AlphaFold’s prediction accuracy and the claim that structural biology is ‘solved’6 remain the subject of ongoing debate. In an article published in this issue of Nature Methods, Terwilliger et al. caution that AlphaFold predictions should be seen as ‘valuable hypotheses’ and not as replacements for experimental structure determination7. A calibrated answer to the question of prediction accuracy is of interest not just to structural biologists but to all biologists who use artificial intelligence (AI)-based predictions of protein structure in their work in applications ranging from mechanistic cell biology to rational drug design.
Terwilliger and colleagues assessed AlphaFold’s prediction accuracy on 102 chosen X-ray crystallographic datasets. They compared AlphaFold predictions to structures computed from unbiased experimental X-ray diffraction data, noticing a twofold increase in relative prediction error in Cα distances between globular domains. Additionally, 7–20% of AlphaFold-predicted side chains were found to be incompatible with available crystallographic data. Their main finding is that a substantial percentage of high-confidence AlphaFold predictions deviate from experimental maps, both globally in domain orientations and locally in backbone and side-chain conformation (Fig. 1).
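As a rough illustration of what a relative error in Cα distances means, the sketch below compares Cα–Cα distances in a predicted model with the same distances in an experimental model. It is not the procedure used by Terwilliger et al.; the function name, the choice of residue pairs and the toy coordinates are all assumptions made for illustration.

```python
import math

def mean_relative_distance_error(predicted, experimental, pairs):
    """Mean of |d_pred - d_exp| / d_exp over the given residue pairs.

    predicted, experimental: lists of Calpha (x, y, z) coordinates in angstroms,
    indexed by the same residue order. pairs: (i, j) index tuples.
    Hypothetical helper for illustration only.
    """
    errors = []
    for i, j in pairs:
        d_pred = math.dist(predicted[i], predicted[j])
        d_exp = math.dist(experimental[i], experimental[j])
        errors.append(abs(d_pred - d_exp) / d_exp)
    return sum(errors) / len(errors)

# Toy coordinates (made up): three Calpha atoms in each model.
experimental = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 20.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (11.0, 0.0, 0.0), (0.0, 19.0, 0.0)]

# Pairs (0, 1) and (0, 2) are off by 1 A over 10 A and 1 A over 20 A.
error = mean_relative_distance_error(predicted, experimental, [(0, 1), (0, 2)])
print(f"mean relative distance error: {error:.3f}")
```

Because the measure uses only distances within each model, it does not require the two structures to be superimposed first, which makes it convenient for comparing the relative placement of domains.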
The results presented by Terwilliger et al. provide a cautionary note and a valuable reference for anyone who uses AlphaFold-predicted models. But there can be limitations to viewing accuracy solely through the lens of the quality of fit of the model to the experimental electron density of a crystallized protein. It is possible that neither the prediction nor the crystal structure reflects the correct conformation of the protein under near-physiological conditions determined using methods such as cryo-electron microscopy (cryo-EM). However, some predicted models may be good enough to help to interpret medium-resolution cryo-EM maps8 and generate testable hypotheses, even if they turn out subsequently not to hold up to a careful comparison with an experimental crystallographic map that captures one of many functionally relevant protein conformations. On the other hand, there are applications for which average root-mean-square deviations as small as ~1 Å between the predicted and actual structures can negatively affect success, as discussed below.
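For readers less familiar with the metric, the root-mean-square deviation (RMSD) between two sets of matched atoms can be sketched as follows. This is a minimal example that assumes the two structures have already been superimposed and that atoms are listed in the same order; the function name and coordinates are hypothetical, and a real comparison would first perform an optimal superposition.

```python
import math

def rmsd(predicted, experimental):
    """RMSD in angstroms between matched (x, y, z) coordinates.

    Assumes the two coordinate lists are already superimposed and that
    entry k in one list corresponds to entry k in the other.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(predicted, experimental))
    return math.sqrt(squared / len(predicted))

# Toy coordinates (made up): each predicted atom is displaced by exactly 1 A.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, -1.0, 0.0)]

print(f"RMSD: {rmsd(predicted, experimental):.2f} A")
```

An average deviation of this size can mask larger local errors, because the squared displacements are pooled over all atoms before the square root is taken.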
Structure-guided drug design demands precise optimization of chemical matter, necessitating atomic-level accuracy in protein structure. Karelina et al.9 have recently reported that small-molecule docking using AlphaFold2-derived models is less accurate than docking using experimentally derived protein structures for G-protein-coupled receptors. Wong et al.10 have reported that the naive combination of computational docking with AlphaFold2 models is ineffective for the discrimination of active and inactive antibacterial compounds across a subset of the Escherichia coli proteome. High-throughput screening using predicted structures often performs worse than screening using experimentally determined structures across various docking and scoring methodologies11. It would appear that for drug discovery applications, structures determined experimentally by either cryo-EM or X-ray crystallography will remain the gold standard for some time to come. It is worth noting, though, that rapid progress is being made in this field. A recent preprint describing the development of RoseTTAFold All-Atom12 hints at a future in which prediction of the structure of a protein in the context of bound small molecules, co-factors and nucleic acids could be done in a single forward pass of a neural network.
Multi-protein complexes represent another area in which direct comparison with experimental crystallographic maps will not always be possible. Motivated by progress in the prediction of monomeric protein structures, AlphaFold and related methodologies have been adapted to predict the structures of protein complexes4, a more challenging problem. Although the accuracy of these models is generally worse than that of their monomeric counterparts, progress is being made with ‘enhanced sampling’ methods such as AFsample13 and improved curation of multiple sequence alignments via DeepMSA214. Only a quarter of structures in the Protein Data Bank include protein–protein interfaces, resulting in a smaller dataset for supervised learning methods. Experimental structure determination thus remains vital for accurate protein–protein interface information, including the vast universe of transient protein complexes, but this is an important growth opportunity for prediction methods to close the gap.
The AI-driven revolution in structural biology goes beyond protein structure prediction. Generative AI tools for designing proteins, peptides, and biologics are already starting to transform the field. For example, RFdiffusion, a generative diffusion model based on the RoseTTAFold structure prediction network coupled with ProteinMPNN, allows de novo backbone and sequence generation for both constrained and unconstrained protein design tasks15. These and many other ongoing developments, such as the sequence-first model EvoDiff16 from Microsoft, suggest a bright future for generative AI-driven protein design.
The rapid progress in computing power, machine learning models and data generation suggests that the accuracy of machine learning methods in structural biology will only improve over time. The performance gains of large language models are unprecedented, and as the size of the training data and the number of model parameters increase, so does performance. One needs to look no further than tools such as ChatGPT to get a sense of how rapidly such tools can be integrated into daily use. Will these advances make AI-driven structure prediction the new normal and eliminate the need for experimental structure determination? Not yet. Now that the ability of machine learning methods to leverage curated databases (such as the PDB) and output reasonable predictions has been established, high-throughput experimental approaches are needed to generate the next generation of training sets, which will need to be many orders of magnitude larger to keep up with advances in machine learning model architectures. The work presented by Terwilliger and colleagues is a valuable example of how experimental data can be used to measure the accuracy of structural predictions. AI won’t replace experimental structural biology, but integrating AI with high-throughput experimental studies will shape the future of structural biology.
Sriram Subramaniam
Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia, Canada. Gandeeva Therapeutics Inc., Burnaby, British Columbia, Canada. e-mail: [email protected]
Published online: xx xx xxxx
References
1. Jumper, J. et al. Nature 596, 583–589 (2021).
2. Varadi, M. et al. Nucleic Acids Res. 50, D439–D444 (2022).
3. Baek, M. et al. Science 373, 871–876 (2021).
4. Evans, R. et al. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).
5. Lin, Z. et al. Science 379, 1123–1130 (2023).
6. Ourmazd, A., Moffat, K. & Lattman, E. E. Nat. Methods 19, 24–26 (2022).
7. Terwilliger, T. C. et al. Nat. Methods https://doi.org/10.1038/s41592-023-02087-4 (2023).
8. Fontana, P. et al. Science 376, eabm9326 (2022).
9. Karelina, M., Noh, J. J. & Dror, R. O. eLife 12, RP89386 (2023).
10. Wong, F. et al. Mol. Syst. Biol. 18, e11081 (2022).
11. Scardino, V., Di Filippo, J. I. & Cavasotto, C. N. iScience 26, 105920 (2022).
12. Krishna, R. et al. Preprint at bioRxiv https://doi.org/10.1101/2023.10.09.561603 (2023).
13. Wallner, B. Proteins 91, 1734–1746 (2023).
14. Zheng, W., Wuyun, Q., Freddolino, P. L. & Zhang, Y. Proteins 91, 1684–1703 (2023).
15. Watson, J. L. et al. Nature 620, 1089–1100 (2023).
16. Alamdari, S. et al. Preprint at bioRxiv https://doi.org/10.1101/2023.09.11.556673 (2023).
Acknowledgements
I thank A. Fraser and A. Mukhopadhyay for insightful discussions. S.S. is the Gobind Khorana Chair for Cancer Drug Design at the University of British Columbia, supported by a Canada Excellence Research Chair award and by grants from the VGH Foundation, the Tai Hung Fai Charitable Foundation, and the Alzheimer Society of Canada.
Competing interests
S.S. is the founder and CEO of Gandeeva Therapeutics, a drug discovery company based in Vancouver, Canada.
