Machine learning techniques are providing new insights into the structure of small molecules. Beyond medicine, the method could in the future be applied to, for example, drug testing and doping control. It is openly available to researchers worldwide.
In the human body, there are thousands of different molecules called metabolites that carry energy and information through the cells. Metabolites are studied, for example, in blood samples, but they are difficult to tell apart because of their small size. For example, the diameter of glucose is about one nanometer, while the diameter of a human hair is about 100,000 nanometers.
“At best, current methods can detect about 40% of the metabolites in a sample,” says Juho Rousu, professor of information technology at Aalto University. Rousu’s research group has been developing world-class computational detection methods and machine learning techniques for small molecules for many years.
The team has developed a new machine learning method that identifies metabolites with high accuracy. The research has been published in the prestigious journal Nature Machine Intelligence.
Accurate identification of metabolites can help researchers and doctors understand, for example, the effects of diet, exercise and alcohol consumption on health, as well as metabolic diseases. The model developed by Rousu’s team helps to understand the intracellular processes that affect the onset of disease and to identify prohibited substances in, for example, drug or doping samples.
“Our research provides researchers in the field with one of the best methods in the world for detecting small molecules. The open method can help identify the metabolic disorders behind many diseases, such as heart disease and adult-onset diabetes,” says Rousu.
Getting around the obstacle
Metabolites are characterized with sensitive measurement instruments. The most common identification approach is based on analyzing the mass and retention time of metabolites.
Differences, both small and large, between laboratory measurement methods have hindered the use of large measurement datasets in machine learning models. Eric Bach, a doctoral researcher in Rousu’s research group, has found a way around that.
“The retention time of small molecules varies from one laboratory to another, but the order in which they elute is the same in every laboratory. We have shown that this characteristic can be used in training machine learning models,” Bach explains.
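The idea Bach describes, that absolute times differ between laboratories while the order in which molecules come off the column stays the same, can be illustrated with a small sketch. This is not the team's published method. The molecule names, the times, and the `order_pairs` helper are hypothetical and serve only to show why order information can be pooled across laboratories.

```python
from itertools import combinations


def order_pairs(times):
    """Map each molecule pair (a, b) to True if a comes off the column before b."""
    return {
        (a, b): times[a] < times[b]
        for a, b in combinations(sorted(times), 2)
    }


# Hypothetical times in minutes for the same three molecules in two labs.
lab_a = {"mol_1": 2.1, "mol_2": 5.4, "mol_3": 9.8}
lab_b = {"mol_1": 3.0, "mol_2": 7.9, "mol_3": 12.5}

pairs_a = order_pairs(lab_a)
pairs_b = order_pairs(lab_b)

# The absolute times differ, but the pairwise order labels agree, so the
# two labs' measurements can be pooled as order labels for training.
same_order = pairs_a == pairs_b
print(same_order)
```

In this toy setup, a model trained on the pairwise labels never sees the laboratory-specific times, only which of two molecules came first.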
For the first time, the research team was thus able to combine measurement data from many laboratories, which made it possible to use an unprecedented amount of data.
“We trained our machine learning model with all available metabolite data. The end result is an open-source machine learning system that is one of the most accurate in the world for identifying metabolites,” said Bach.
The model was so accurate that Rousu’s team was able to distinguish the stereochemical (3D) structure of the metabolites, which was impossible before.
“The advance in stereochemical separation is a game-changer for scientists who have been working with 2D data for years. It moves the whole field forward,” says Emma Schymanski, assistant professor at the University of Luxembourg.
“The application area of the system is not limited to medicine. It can also be used to detect small harmful substances in nature, or it can be used to find new cells and plant cells for the production of drugs,” says Schymanski.