Biological structures predicted by AI with high accuracy

Researchers at Stanford University have developed machine-learning methods which they say accurately predict the 3D shapes of drug targets and other important biological molecules, even when only limited data is available.

Determining the 3D shapes of biological molecules is one of the hardest challenges in modern biology and medical discovery. Companies and research institutions often spend millions of dollars to determine a single molecular structure, and even such massive efforts are frequently unsuccessful.

Using novel machine-learning techniques, Stanford PhD students Stephan Eismann and Raphael Townshend, under the guidance of associate professor of computer science Ron Dror, have developed an approach that overcomes this problem by predicting accurate structures computationally.

Most notably, the team said, their approach succeeds even when learning from only a few known structures, making it applicable to the molecules whose structures are most difficult to determine experimentally.

“Structural biology, which is the study of the shapes of molecules, has this mantra that structure determines function,” said Townshend.

The algorithm designed by the researchers predicts accurate molecular structures and, in doing so, allows scientists to explain how different molecules work, with applications ranging from fundamental biological research to drug design.

“Proteins are molecular machines that perform many functions. To execute their functions, proteins often bind to other proteins,” said Eismann. “If you know that a pair of proteins is implicated in a disease and you know how they interact in 3D, you can try to target this interaction specifically with a drug.”

A new AI algorithm can pick out an RNA molecule’s 3D shape from incorrect shapes. Computational prediction of the structures into which RNAs fold is important – and particularly difficult – because so few structures are known.

Instead of specifying what makes a structural prediction more or less accurate, the researchers let the algorithm discover these molecular features for itself. They did this because they found that the conventional technique of providing such knowledge can sway an algorithm in favour of certain features, preventing it from finding other informative features.

“The problem with these hand-crafted features in an algorithm is that the algorithm becomes biased towards what the person who picks these features thinks is important and you might miss some information that you would need to do better,” said Eismann.

“The network learned to find fundamental concepts that are key to molecular structure formation, but without explicitly being told to,” Townshend added. “The exciting aspect is that the algorithm has clearly recovered things we knew were important, but it has also recovered characteristics we didn’t know about before.”

Having shown success with proteins, the researchers next applied their algorithm to another class of important biological molecules, RNAs. They tested it on a series of ‘RNA Puzzles’ from a long-standing competition in their field. In every case, the tool outperformed all the other puzzle participants, and it did so without being designed specifically for RNA structures.

The researchers said they are “excited” to see where else their approach can be applied, having already had success with protein complexes and RNA molecules.

“Most of the dramatic recent advances in machine learning have required a tremendous amount of data for training,” Dror explained. “The fact that this method succeeds given very little training data suggests related methods could address unsolved problems in many fields where data is scarce.”

Specifically for structural biology, the team stated that they’re “only just scratching the surface” in terms of scientific progress to be made.

“Once you have this fundamental technology, then you’re increasing your level of understanding another step and can start asking the next set of questions,” said Townshend. “For example, you can start designing new molecules and medicines with this kind of information, which is an area that people are very excited about.”