This article is based on research findings that are yet to be peer-reviewed. Results are therefore regarded as preliminary and should be interpreted as such. For further information, please contact the cited source.
Machine learning models used to design new drug candidates face a major challenge: they often suggest new molecular structures that are difficult or impossible to produce in a laboratory. If a chemist cannot actually make a molecule, its disease-fighting properties can't be tested.
A new approach from MIT researchers constrains a machine learning model so that it only suggests molecular structures that can be synthesized. The method ensures that molecules are composed of materials that can be purchased, and that the chemical reactions between those materials follow the rules of chemistry.
Compared to other methods, their model proposed molecular structures that scored as high as, and sometimes higher than, those from other methods on popular evaluations. Their system also takes less than one second to propose a synthetic pathway, while other approaches that separately propose molecules and then evaluate their synthesizability may take several minutes. In a search space that can include billions of potential molecules, those time savings add up.
This process reformulates how these models are asked to generate new molecular structures. Many such models build new molecular structures atom by atom or bond by bond; the researchers instead build new molecules building block by building block and reaction by reaction, according to Connor Coley, an author of the paper.
The paper's authors also include Wenhao Gao, the first author, and Rocio Mercado, a postdoc. The research is being presented at the International Conference on Learning Representations.
To create a molecular structure, the model simulates the process of synthesizing the molecule, which ensures that it can actually be produced.
The model is given a set of viable building blocks, which are chemicals that can be purchased, and a list of valid chemical reactions to work with. These chemical reaction templates are hand-made by experts. Controlling these inputs by only permitting certain chemicals or specific reactions enables the researchers to limit how large the search space can be for a new molecule.
The model uses these inputs to build a tree by selecting building blocks and linking them through chemical reactions, one at a time. At each step, the molecule becomes more complex as additional chemicals and reactions are added.
The model outputs the final molecular structure together with the tree of chemicals and reactions that would synthesize it.
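The step-by-step construction described above can be illustrated with a deliberately simplified sketch. It is not the researchers' model: the building block names, the functional-group tags and the two reaction templates below are all hypothetical, and molecules are reduced to labels rather than real chemical structures. The sketch only shows the idea that a molecule is defined by a sequence of actions (pick a building block, apply a template), and that an action is rejected when the template does not match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Molecule:
    name: str
    groups: frozenset  # toy functional-group tags, not real chemistry


@dataclass(frozen=True)
class Template:
    name: str
    left: str   # group consumed on the current intermediate
    right: str  # group consumed on the newly added building block

    def apply(self, a: Molecule, b: Molecule) -> Optional[Molecule]:
        """Return the product, or None if the template does not match."""
        if self.left not in a.groups or self.right not in b.groups:
            return None
        remaining = (a.groups - {self.left}) | (b.groups - {self.right})
        return Molecule(f"({a.name}+{b.name})", frozenset(remaining))


# Hypothetical "purchasable" building blocks.
BLOCKS: Dict[str, Molecule] = {
    m.name: m
    for m in [
        Molecule("aryl_halide_acid", frozenset({"halide", "acid"})),
        Molecule("amine_X", frozenset({"amine"})),
        Molecule("boronate_Y", frozenset({"boronate"})),
    ]
}

# Hypothetical reaction templates (hand-written, as in the article).
TEMPLATES: Dict[str, Template] = {
    t.name: t
    for t in [
        Template("amide_coupling", left="acid", right="amine"),
        Template("suzuki", left="halide", right="boronate"),
    ]
}


def build(start: str, steps: List[Tuple[str, str]]):
    """Grow a molecule one reaction at a time and record the route."""
    current = BLOCKS[start]
    route = [("start", start)]
    for template_name, block_name in steps:
        product = TEMPLATES[template_name].apply(current, BLOCKS[block_name])
        if product is None:
            raise ValueError(f"{template_name} cannot join {current.name} and {block_name}")
        route.append((template_name, block_name))
        current = product
    return current, route


product, route = build(
    "aryl_halide_acid",
    [("amide_coupling", "amine_X"), ("suzuki", "boronate_Y")],
)
print(product.name)
print(route)
```

Because every product is reached only through matching templates applied to listed building blocks, each molecule the sketch produces comes with a recipe for making it, which is the property the article describes.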
Instead of directly designing the product molecule itself, the researchers design an action sequence to obtain that molecule. This allows them to guarantee the quality of the structure, according to Gao.
To train the model, the researchers input a complete molecular structure, along with a set of building blocks and chemical reactions, and the model learns to create a tree that synthesizes the molecule. After seeing hundreds of thousands of examples, the model learns to come up with these synthetic pathways on its own.
The trained model can be used for optimization. A researcher specifies certain properties they want to achieve in a final molecule, given certain building blocks and chemical reaction templates, and the model proposes a synthesizable molecular structure.
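As a rough illustration of what optimizing over synthesizable molecules means, the toy sketch below enumerates every route of up to two reactions and keeps the best-scoring one. This is a stand-in, not the researchers' method: exhaustive enumeration replaces the trained model, and the building blocks, templates and the `score` function are hypothetical placeholders for a real property predictor.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

# Hypothetical building blocks: name -> toy functional-group tags.
BLOCKS: Dict[str, FrozenSet[str]] = {
    "aryl_halide_acid": frozenset({"halide", "acid"}),
    "amine_X": frozenset({"amine"}),
    "amino_alcohol": frozenset({"amine", "alcohol"}),
    "boronate_Y": frozenset({"boronate"}),
}

# Hypothetical templates: name -> (group on intermediate, group on new block).
TEMPLATES: Dict[str, Tuple[str, str]] = {
    "amide_coupling": ("acid", "amine"),
    "suzuki": ("halide", "boronate"),
}

# A state is (building blocks used so far, reactive groups still present).
State = Tuple[Tuple[str, ...], FrozenSet[str]]


def expand(state: State) -> Iterator[State]:
    """Yield every state reachable with one more valid reaction."""
    used, groups = state
    for left, right in TEMPLATES.values():
        for block_name, block_groups in BLOCKS.items():
            if left in groups and right in block_groups:
                new_groups = (groups - {left}) | (block_groups - {right})
                yield used + (block_name,), new_groups


def enumerate_routes(max_steps: int) -> List[State]:
    """Collect all states reachable in at most max_steps reactions."""
    frontier: List[State] = [((name,), groups) for name, groups in BLOCKS.items()]
    seen = list(frontier)
    for _ in range(max_steps):
        frontier = [nxt for state in frontier for nxt in expand(state)]
        seen.extend(frontier)
    return seen


def score(state: State) -> float:
    """Hypothetical property: reward a free alcohol and larger products,
    penalize other leftover reactive groups."""
    used, groups = state
    bonus = 2.0 if "alcohol" in groups else 0.0
    return bonus + len(used) - 0.5 * len(groups - {"alcohol"})


best = max(enumerate_routes(max_steps=2), key=score)
print(best, score(best))
```

Enumeration is only feasible here because the toy search space is tiny; with the billions of candidates mentioned in the article, a learned model that proposes routes directly is what makes the search tractable.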
What was surprising, according to Mercado, was how large a fraction of molecules can actually be reproduced with a very modest template set. The model does not need a large set of inputs to cover a large amount of chemical space.
They tested the model by assessing how well it could reconstruct synthesizable molecules. It was able to reproduce 51 percent of these molecules and took less than a second to recreate each one.
The technique is fast because the model isn't searching through all of the options at each step in the tree; it has a defined set of chemicals and reactions to work with, according to Gao.
When they used their model to propose molecules with specific properties, their method suggested higher quality molecular structures with stronger binding affinities than those proposed by other methods. This means the molecules would be better able to attach to a protein and block a certain activity, like stopping a virus from replicating.
For instance, when asked to propose a molecule that could dock with SARS-CoV-2, the model suggested several molecular structures that may be better able to bind with viral proteins than existing inhibitors. The authors caution, however, that these are only computational predictions.
According to Gao, there are so many diseases to tackle. He hopes the approach will accelerate this process, so that researchers won't have to screen billions of molecules every time for a disease target. Instead, they can simply specify the properties they want, which would speed up the process of finding a drug candidate.
Mercado believes that their approach could also improve existing drug discovery pipelines. If a pharmaceutical company has identified a particular molecule that has desired properties but cannot be produced, it could use this model to propose synthesizable molecules that closely resemble it.
The team plans to continue improving the chemical reaction templates to further enhance the model's performance. With additional templates, they can run more tests on certain disease targets and, eventually, apply the model to the drug discovery process.
The Office of Naval Research and the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium have both supported this research in part.