As people around the world marveled in July at the James Webb Space Telescope's most detailed images of the cosmos, biologists got their first glimpses of a different set of images.
Those images are the predicted 3-D shapes of more than 200 million proteins, generated by an artificial intelligence program called AlphaFold, according to Demis Hassabis, cofounder and CEO of DeepMind, the London-based company that developed the system. AlphaFold learned to predict protein shapes from structures that scientists had worked out over decades of research using electron microscopes and other methods.
The AI first made a splash in 2021, when DeepMind partnered with the European Bioinformatics Institute of the European Molecular Biology Laboratory to make its predicted structures available in a public database.
Hassabis said the sweeping new release in July expanded the library to "every organism on the planet that has had its genome sequenced." "You can look up a three-dimensional structure of a protein almost as easily as doing a key word Google search," he said.
Researchers have used some of the 2021 predictions to develop potential new malaria vaccines, improve understanding of Parkinson's disease, work out how to protect honeybee health, and more. The predictions are also being applied to neglected tropical diseases such as Chagas disease and leishmaniasis, which can be devastating or fatal if left untreated.
Decades of slow-going experiments have revealed the structures of more than 194,000 proteins, all housed in the Protein Data Bank. In 2021, the AlphaFold project released predicted structures for about 1 million proteins, including almost all known human proteins. This year, the AlphaFold database exploded with predicted structures for more than 200 million proteins.
Many scientists were ecstatic at the release of the vast dataset. Others worry that researchers may take the predicted structures as proteins' true shapes. There are still things that AlphaFold cannot do, and was never designed to do, that need to be addressed before the protein cosmos comes fully into focus.
Julie Forman-Kay, a protein biophysicist at the Hospital for Sick Children and the University of Toronto, describes the new catalog as a "significant benefit." In many cases, AlphaFold and RoseTTAFold, another AI system, predict shapes that match well with protein profiles from experiments. However, she cautions, "it's not that way across the board."
Predictions are more accurate for some proteins than for others, so erroneous predictions may leave some scientists thinking they know how a protein functions when, in fact, they don't. Painstaking experiments remain crucial, Forman-Kay says: there is now a sense that people no longer need to determine structures experimentally, and that is not true.
Thale cress (Arabidopsis thaliana)
This plant protein is a kinase, which transfers phosphates onto other proteins, potentially affecting their functions.
Proteins start out as long chains of amino acids and fold into a variety of 3-D shapes. Some of the curlicues resemble the tight corkscrew ringlets of a 1980s perm or the pleats of an accordion. Others could be mistaken for a child's spiraling scribbles.
The structure of a protein is more than just aesthetics; it can determine how the protein functions. For example, enzymes need a pocket where they can capture small molecules and carry out chemical reactions. And proteins that work in a protein complex, two or more proteins interacting like components of a machine, need the right shapes to fit together with their partners.
Knowing a protein's shape may help scientists work out how a mutation alters that shape to cause disease. That knowledge could also help researchers develop better vaccines and medicines.
For years, scientists have bombarded protein crystals with X-rays, flash-frozen cells and examined them under high-powered electron microscopes to uncover proteins' shapes. These methods take a great deal of time, effort and money, so progress has been slow, says Tamir Gonen, a membrane biophysicist and Howard Hughes Medical Institute investigator at UCLA's David Geffen School of Medicine.
Pseudomonas bacteria (Pseudomonas syringae)
This protein helps the bacteria cause frost damage on plants by triggering ice crystals to form at relatively high temperatures. It might be useful for cloud seeding and food preservation.
Such meticulous and expensive experimental work has revealed the 3-D structures of more than 194,000 proteins, stored in the Protein Data Bank, which is supported by a consortium of research organizations. But the speed at which geneticists are deciphering the DNA instructions for making proteins has far outstripped structural biologists' ability to keep up, says Harvard Medical School systems biologist Nazim Bouatta.
Many scientists dreamed of computer programs that could examine a gene's DNA and predict how the protein it encodes would fold into a 3-D structure.
Scientists have made considerable advances in this area over the years. However, “until two years ago, we were really a long way from anything like a good solution,” says John Moult, a computational biologist at the University of Maryland’s Rockville campus.
Moult is one of the organizers of a competition called the Critical Assessment of Protein Structure Prediction, or CASP. Organizers give competitors a set of proteins for their algorithms to fold, then compare the predictions against experimentally determined structures. Most AIs failed to come close to the actual shapes of the proteins.
Then in 2020, AlphaFold arrived in a big way, predicting the structures of 90 percent of test proteins with high accuracy, including two-thirds with accuracy rivaling that of experimental methods.
Since its inception in 1994, the CASP competition had largely centered on deciphering the structures of single proteins. With AlphaFold's performance, "suddenly, that was basically done," Moult says.
Hassabis said in a news briefing that more than half a million scientists have accessed the database since AlphaFold's 2021 release. Some researchers, for example, have used its predictions to tackle a massive biological puzzle: the nuclear pore complex. Nuclear pores are critical pathways that allow molecules to enter and exit cell nuclei. Without them, cells would not function properly. Each pore is enormous, relatively speaking, composed of about 1,000 proteins. Researchers had previously managed to place roughly 30 percent of the pieces in the puzzle.
By combining AlphaFold's predictions with experimental data to work out how the components fit together, researchers brought the puzzle considerably closer to completion, they reported in the June 10 Science.
Now that AlphaFold has nearly solved how single proteins fold, CASP organizers are inviting teams to work on the next challenges: predicting the structures of RNA molecules and modeling how proteins interact with other molecules.
So far, for those sorts of tasks, deep-learning AI techniques "appear promising but have not yet delivered the goods," Moult says.
Being able to accurately model protein interactions would be a huge advantage, since most proteins do not operate in isolation. They work with other proteins or other molecules in cells. But AlphaFold's accuracy at predicting how two proteins behave when they interact is "nowhere near" its accuracy for single proteins, says Forman-Kay, the University of Toronto protein biophysicist.
That is because the AI was trained to fold proteins by studying the contours of known structures, and far fewer multiprotein complexes than single proteins have been solved experimentally.
Malaria parasite (Plasmodium falciparum)
This protein, from the parasite's male and female gametes, is being investigated as a potential vaccine target.
Forman-Kay studies proteins that refuse to be confined to a single shape. These intrinsically disordered proteins are typically as floppy as wet noodles (SN: 2/9/13, p. 26). Some form defined shapes when they interact with other proteins or molecules to perform various functions.
AlphaFold's predicted shapes reach a high confidence level for about 60 percent of the wiggly proteins that Forman-Kay and colleagues examined, according to a preliminary analysis posted in February at bioRxiv.org. The program often depicts these shapeshifters as long corkscrews called alpha helices.
The team compared AlphaFold's predictions for three disordered proteins with experimental data. The structure the AI assigned to a protein called alpha-synuclein resembled the shape that protein takes when it interacts with lipids, the researchers found.
For another protein, called eukaryotic translation initiation factor 4E-binding protein 2, AlphaFold predicted a mishmash of the protein's two forms when working with two different partners. The researchers say this Frankenstein structure, which does not exist in actual organisms, could mislead scientists about how the protein works.
Human (Homo sapiens)
Although AlphaFold has high confidence (blue) in its predictions of the lower coiled area and the ribbon structure just below it, the two would never appear at the same time.
AlphaFold may also be too rigid in its predictions. A static structure doesn't tell you everything about how a protein works, says Jane Dyson, a structural biologist at the Scripps Research Institute in La Jolla, Calif. Even single proteins with generally well-defined structures aren't frozen in space. Enzymes, for example, undergo small shape changes when shepherding chemical reactions.
If you ask AlphaFold to predict the structure of an enzyme, it will display a fixed image that may closely resemble what scientists have determined by X-ray crystallography, Dyson says. “But [it] will not show you any of the subtleties that are changing as the different partners” interact with the enzyme.
"The dynamics are what Mr. AlphaFold cannot give you," Dyson adds.
The computer renderings do give biologists a head start on figuring out how a drug might interact with a protein. But scientists should keep one thing in mind: these are models, not experimental structures, says Gonen, at UCLA.
He uses AlphaFold's protein predictions to help make sense of experimental data, but he fears that researchers will accept the AI's predictions as gospel. If that happens, "the risk is that it will become harder and harder to justify why you need to solve an experimental structure," he says.
Honeybee (Apis mellifera)
This protein helps protect the bees against bacterial illness.
Bouatta of Harvard Medical School is more optimistic. He believes that researchers probably don't need to invest experimental money in the kinds of proteins that AlphaFold predicts well, which should help structural biologists prioritize their efforts.
AlphaFold still struggles with some proteins, and Bouatta believes that researchers should invest their money in those difficult cases. Experimental data on them could be used to train another AI system that makes even better predictions, he says.
He and his colleagues have already reverse-engineered AlphaFold to create an open-source version called OpenFold, which researchers can use to tackle other problems, such as those thorny but essential protein complexes.
The huge amount of DNA data generated by the Human Genome Project has enabled a wide range of biological discoveries and opened up new areas of investigation (SN: 2/12/22, p. 22). Bouatta believes that having structural information on 200 million proteins might be equally revolutionary.
Thanks to AlphaFold and its AI kin, “we don’t even know what kinds of questions we might be asking in the future,” he says.