Machine learning is widely used in biometrics and health care systems, but its application to predicting molecular properties and drug targets is less developed. The overall success rate of drug discovery and preclinical studies is currently around 0.05% to 0.1%. In some emergencies, such as a pandemic, a medicine has to be discovered as fast as possible. There are also diseases for which no drug has been found for years, such as Parkinson’s disease and Alzheimer’s disease. There is therefore a need to discover drugs rapidly and accurately.
There are three main reasons why machine learning can be used for prediction of molecular properties and drug targets:
- Powerful data sources exist that can be used for learning. UniProt, for example, is supported by many institutions and is the most informative and comprehensive protein database (Consortium, 2015).
- There are powerful toolkits and web servers that help solve problems in drug–target interaction prediction. One of them is OpenChem, a PyTorch-based deep learning toolkit for computational chemistry and drug design that provides the models Feature2Label, Smiles2Label, Graph2Label, SiameseModel, GenerativeRNN, and MolecularRNN. Users can train predictive models for classification, regression, and multi-task problems, and develop generative models that produce novel molecules with optimised properties.
- The current state of the field demands it. Much remains to be discovered, and a lot of it cannot be done without the help of computers. For example, the human genome contains more than 20,000 genes, and approximately 80% of them can encode one or more proteins. Yet only a small number of proteins have been identified as pharmacologically active and are targets for currently approved drugs.
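To put the figures quoted so far into perspective, here is a small back-of-the-envelope calculation. It only restates the numbers from the text (a 0.05% to 0.1% success rate, and a lower bound of 20,000 genes of which about 80% encode proteins); the variable names are purely illustrative.

```python
from fractions import Fraction

# Figures quoted in the text (the gene count is given there as a lower bound).
GENES = 20_000
CODING_PERCENT = 80
SUCCESS_RATES_PERCENT = ("0.05", "0.1")

# Integer arithmetic avoids floating-point rounding noise.
protein_coding_genes = GENES * CODING_PERCENT // 100

# A success rate of r% corresponds to about one success per 100 / r candidates.
candidates_per_success = [int(100 / Fraction(r)) for r in SUCCESS_RATES_PERCENT]

print(f"Protein-coding genes (approx.): {protein_coding_genes}")
for rate, n in zip(SUCCESS_RATES_PERCENT, candidates_per_success):
    print(f"Success rate {rate}%: about 1 success per {n} candidates")
```

In other words, the quoted rates mean roughly one success per one to two thousand candidates, while the pool of potential protein targets is in the tens of thousands.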
A recent study proposes ImageMol, a pre-trained model for predicting the molecular targets of candidate compounds. The ImageMol framework performs well in evaluating molecular properties (that is, the drug’s metabolism, brain penetration and toxicity) and molecular target profiles (that is, beta-secretase enzyme and kinases) across 51 benchmark datasets.
ImageMol also shows high accuracy in identifying anti-SARS-CoV-2 molecules across 13 high-throughput experimental datasets from the National Center for Advancing Translational Sciences. It was used to identify candidate clinical 3C-like protease inhibitors for the potential treatment of COVID-19.
The ImageMol model combines an image processing framework with comprehensive molecular chemistry knowledge to extract fine pixel-level molecular features in a visual computing way. It offers several major improvements over other applications:
- It uses molecular images as the feature representation of compounds, with high accuracy and low computing cost.
- It is trained on a large image dataset: a molecular encoder is designed to extract latent features from ~10 million molecular images.
- Five pretraining strategies are utilised to optimise the latent representation of the molecular encoder by considering the chemical knowledge and structural information from molecular images.
- A pretrained molecular encoder is fine-tuned on downstream tasks to further improve model performance.
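To make the idea of "a molecule as an image" from the first point more concrete, here is a toy sketch that rasterises a tiny molecular graph into a pixel grid. It is not how ImageMol renders its inputs: the atom coordinates are made up, the function names `draw_line` and `rasterise` are hypothetical, and a real pipeline would most likely render images with a cheminformatics toolkit. The only point is that bonds and atom positions end up as pixel patterns that an image encoder can then learn from.

```python
# Toy illustration only: rasterise a hand-written 2D molecular graph into a
# binary pixel grid. The coordinates below are invented for the example.

def draw_line(grid, x0, y0, x1, y1):
    """Mark the pixels between two points using Bresenham's line algorithm."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        grid[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def rasterise(atoms, bonds, size):
    """Return a size x size grid with every bond drawn as a line of 1s."""
    grid = [[0] * size for _ in range(size)]
    for a, b in bonds:
        (x0, y0), (x1, y1) = atoms[a], atoms[b]
        draw_line(grid, x0, y0, x1, y1)
    return grid


# Hypothetical pixel coordinates for the heavy atoms of ethanol (SMILES: CCO).
ETHANOL_ATOMS = [(1, 6), (4, 2), (7, 6)]
ETHANOL_BONDS = [(0, 1), (1, 2)]

image = rasterise(ETHANOL_ATOMS, ETHANOL_BONDS, size=9)
for row in image:
    print("".join("#" if px else "." for px in row))
```

A grid like this, at a much higher resolution and with atom labels and bond orders drawn in, is the kind of input an image-based molecular encoder would consume in place of a fingerprint or graph representation.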