Results 1-5 of 5
1.
Bioinformatics ; 38(Suppl 1): i84-i91, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758812

ABSTRACT

MOTIVATION: Molecular carcinogenicity is a preventable cause of cancer, but systematically identifying carcinogenic compounds, which involves performing experiments on animal models, is expensive, time-consuming and low throughput. As a result, carcinogenicity information is limited, and building data-driven models with good prediction accuracy remains a major challenge. RESULTS: In this work, we propose CONCERTO, a deep learning model that uses a graph transformer in conjunction with a molecular fingerprint representation to predict carcinogenicity from molecular structure. Several strategies address the limited data size, including multi-round pre-training on related but lower-quality mutagenicity data and transfer learning from a large self-supervised model. Extensive experiments demonstrate that our model performs well and can generalize to external validation sets. CONCERTO could help guide future carcinogenicity experiments and provide insight into the molecular basis of carcinogenicity. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are available on GitHub at https://github.com/bowang-lab/CONCERTO.


Subjects
Carcinogens , Neural Networks (Computer) , Animals , Carcinogens/toxicity , Forecasting , Mutagens
2.
Anal Chem ; 93(33): 11415-11423, 2021 08 24.
Article in English | MEDLINE | ID: mdl-34375078

ABSTRACT

Targeted, untargeted, and data-independent acquisition (DIA) metabolomics workflows are often hampered by ambiguous identification based on either MS1 information alone or relatively few MS2 fragment ions. While DIA methods have been popularized in proteomics, it is less clear whether they are suitable for metabolomics workflows because of their large precursor isolation windows and complex coisolation patterns. Here, we quantitatively investigate the conditions necessary for unique metabolite detection in complex backgrounds using precursor and fragment ion mass-to-charge (m/z) separation, comparing three benchmarked mass spectrometry (MS) methods [MS1, MRM (multiple reaction monitoring), and DIA]. Our simulations show that DIA outperformed MS1-only and MRM-based methods in specificity by ∼2.8-fold and ∼1.8-fold, respectively. Additionally, we show that these results do not depend on the number of transitions used or on the complexity of the background matrix. Finally, we show that collision energy is an important factor in unambiguous detection and that a single collision energy setting per compound cannot achieve optimal pairwise differentiation of compounds. Our analysis demonstrates the power of using both high-resolution precursor and high-resolution fragment ion m/z for unambiguous compound detection. This work also establishes DIA as an emerging MS acquisition method with high selectivity for metabolomics, outperforming both data-dependent acquisition (DDA) and MRM with regard to unique compound identification potential.
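
The specificity comparison can be illustrated with a toy model. The sketch below is not the authors' simulation: it assumes a compound is uniquely identified by a method when no other compound in the library matches its precursor m/z and, if the method uses fragments, every one of its fragment m/z values, within that method's tolerance. The Compound and Method types, the tolerance values, and the three-compound library are hypothetical. DIA is modeled simply as high-resolution precursor plus high-resolution fragment matching; co-isolation inside wide DIA windows is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Compound:
    name: str
    precursor_mz: float
    fragment_mz: Tuple[float, ...]


@dataclass(frozen=True)
class Method:
    name: str
    precursor_tol: float            # assumed half-width of the precursor match, in m/z
    fragment_tol: Optional[float]   # None means fragments are not used (MS1-only)


def fragments_match(target: Compound, other: Compound, tol: float) -> bool:
    # Every fragment of the target must be explained by some fragment of `other`.
    return all(any(abs(f - g) <= tol for g in other.fragment_mz) for f in target.fragment_mz)


def interferes(target: Compound, other: Compound, method: Method) -> bool:
    """True if `other` cannot be told apart from `target` by this method."""
    if abs(target.precursor_mz - other.precursor_mz) > method.precursor_tol:
        return False
    if method.fragment_tol is None:
        return True
    return fragments_match(target, other, method.fragment_tol)


def unique_fraction(library: Sequence[Compound], method: Method) -> float:
    """Fraction of library compounds with no interfering compound."""
    unique = sum(
        not any(interferes(t, o, method) for j, o in enumerate(library) if j != i)
        for i, t in enumerate(library)
    )
    return unique / len(library)


# Hypothetical m/z values chosen only to illustrate the three outcomes.
LIBRARY = [
    Compound("A", 180.0634, (163.0601, 145.0495)),
    Compound("B", 180.0650, (163.0700, 127.0390)),
    Compound("C", 180.2000, (163.2000, 145.2000)),
]
MS1 = Method("MS1", precursor_tol=0.005, fragment_tol=None)   # high-res precursor only
MRM = Method("MRM", precursor_tol=0.35, fragment_tol=0.35)    # unit-resolution Q1/Q3
DIA = Method("DIA", precursor_tol=0.005, fragment_tol=0.005)  # high-res MS1 + MS2
```

With these assumed settings, unique_fraction(LIBRARY, DIA) returns 1.0, while MS1 and MRM each identify only one of the three compounds uniquely: A and B share a precursor within the high-resolution tolerance, and C's fragments fall inside A's unit-resolution MRM windows.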


Subjects
Metabolomics , Proteomics , Ions , Mass Spectrometry , Workflow
3.
Proteomics ; 20(21-22): e1900352, 2020 11.
Article in English | MEDLINE | ID: mdl-32061181

ABSTRACT

Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS)-based methods are currently the top choice for high-throughput, quantitative measurements of the proteome. While traditional proteomics LC-MS/MS methods can suffer from low reproducibility and poor quantitative accuracy due to their stochastic nature, recent improvements in acquisition protocols have resulted in methods that can overcome these challenges. Data-independent acquisition (DIA) is a novel mass spectrometric method that does so by using a deterministic acquisition strategy. These new approaches will allow researchers to apply MS to more complex samples; however, existing heuristic and expert-knowledge-based methods will struggle to keep pace with the increasing complexity of the resulting data. Deep learning (DL)-based methods have been shown to be more adept than traditional methods at handling large amounts of complex data in many other fields, such as computer vision and natural language processing. Proteomics is also entering a phase where the size and complexity of the data will require us to look towards scalable and data-driven DL pipelines.


Subjects
Proteomics , Tandem Mass Spectrometry , Liquid Chromatography , Machine Learning , Proteome , Reproducibility of Results
4.
Patterns (N Y) ; 3(10): 100588, 2022 Oct 14.
Article in English | MEDLINE | ID: mdl-36277819

ABSTRACT

Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. For each of these tasks, the machine needs to read and write fluently in a chemical language. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencIng Embedded Strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
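
To make the robustness problem concrete, the sketch below draws random strings over a tiny, assumed subset of the SMILES alphabet and checks them against a handful of necessary syntax rules: start with an atom, keep branches balanced and non-empty, attach ring-closure digits to an atom and use them in pairs, and leave no dangling bond symbol. It is only a partial syntactic filter, not a SMILES parser; valences, aromaticity and bracket atoms are ignored, so its pass rate is only an upper bound on the fraction of valid strings. The alphabet and function names are hypothetical.

```python
import random

# Assumed toy subset of SMILES: three aliphatic atoms, a double bond,
# branch parentheses and two ring-closure digits.
ATOMS = {"C", "N", "O"}
DIGITS = {"1", "2"}
ALPHABET = sorted(ATOMS | DIGITS | {"=", "(", ")"})


def passes_basic_syntax(s: str) -> bool:
    """Return False if `s` breaks one of a few necessary SMILES rules.

    Passing does NOT make `s` valid SMILES; many rules are not checked.
    """
    if not s or s[0] not in ATOMS:
        return False                      # a chain must start with an atom
    depth = 0
    digit_counts = {d: 0 for d in DIGITS}
    prev = s[0]
    for ch in s[1:]:
        if ch == "(":
            if prev in {"(", "="}:
                return False              # a branch opens after an atom, digit or ')'
            depth += 1
        elif ch == ")":
            if prev in {"(", "="} or depth == 0:
                return False              # no empty branch, no unmatched ')'
            depth -= 1
        elif ch == "=":
            if prev == "=":
                return False              # no doubled bond symbols
        elif ch in DIGITS:
            if prev in {"(", ")"}:
                return False              # ring closures follow an atom (or a bond/digit)
            digit_counts[ch] += 1
        prev = ch
    if depth != 0 or prev == "=":
        return False                      # unclosed branch or trailing bond
    # Each ring-closure digit must open and close a ring, so it appears in pairs.
    return all(n % 2 == 0 for n in digit_counts.values())


def random_string_pass_rate(length: int, trials: int, seed: int = 0) -> float:
    """Fraction of uniformly random strings that pass the partial check."""
    rng = random.Random(seed)
    passed = sum(
        passes_basic_syntax("".join(rng.choice(ALPHABET) for _ in range(length)))
        for _ in range(trials)
    )
    return passed / trials
```

For example, passes_basic_syntax("C(=O)O") and passes_basic_syntax("C1CCCCC1") hold, while "C1CC" (unpaired ring closure) and "C(=)C" (empty branch after a bond) are rejected. SELFIES avoids this failure mode by construction, since every string of SELFIES symbols corresponds to a valid molecule.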

5.
Pac Symp Biocomput ; 25: 274-285, 2020.
Article in English | MEDLINE | ID: mdl-31797603

ABSTRACT

The activity of mutational processes differs across the genome and is influenced by chromatin state and spatial genome organization. At the scale of one megabase pair (Mb), regional mutation density correlates strongly with chromatin features, and mutation density at this scale can be used to accurately identify cancer type. Here, we explore the relationship between genomic region and mutation rate by developing an information-theory-driven dynamic programming algorithm that divides the genome into regions with differing relative mutation rates between cancer types. Our algorithm increases mutual information compared with the naive approach, effectively reducing the average number of mutations required to identify cancer type. Our approach provides an efficient method for associating regional mutation density with mutation labels and has future applications in exploring the role of somatic mutations in a number of diseases.
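
The abstract does not spell out the objective, so the following is a minimal sketch assuming the quantity maximized is the mutual information I(S; C) = Σ_s Σ_c p(s, c) log[p(s, c) / (p(s) p(c))] between the region S a mutation falls in and its cancer type C, estimated from counts. Because this estimate is a sum of per-region terms, contiguous 1 Mb bins can be merged into k regions optimally with a standard dynamic program in O(k·n²·C) time for n bins and C cancer types. The function names, the bins-by-types count matrix, and the fixed number of regions k are illustrative assumptions, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def segment_score(seg_counts: Sequence[int], type_totals: Sequence[int], total: int) -> float:
    """Contribution of one region to I(S; C), in nats."""
    n_s = sum(seg_counts)
    score = 0.0
    for n_sc, n_c in zip(seg_counts, type_totals):
        if n_sc > 0:  # 0 * log(0) is taken as 0
            score += (n_sc / total) * math.log(n_sc * total / (n_s * n_c))
    return score


def optimal_segmentation(counts: List[List[int]], k: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Split bins 0..n-1 into k contiguous regions maximizing I(S; C).

    counts[i][c] is the number of mutations of cancer type c in bin i.
    Returns (mutual information, [(start, end), ...]) with half-open bin ranges.
    """
    n, n_types = len(counts), len(counts[0])
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of bins")
    # prefix[i][c] = mutations of type c in bins [0, i)
    prefix = [[0] * n_types]
    for row in counts:
        prefix.append([p + x for p, x in zip(prefix[-1], row)])
    type_totals = prefix[n]
    total = sum(type_totals)

    def score(i: int, j: int) -> float:  # region covering bins [i, j)
        seg = [b - a for a, b in zip(prefix[i], prefix[j])]
        return segment_score(seg, type_totals, total)

    neg = float("-inf")
    best = [[neg] * (n + 1) for _ in range(k + 1)]  # best[m][j]: m regions over bins [0, j)
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # last region is [i, j)
                if best[m - 1][i] == neg:
                    continue
                cand = best[m - 1][i] + score(i, j)
                if cand > best[m][j]:
                    best[m][j], back[m][j] = cand, i

    # Walk the back-pointers to recover the region boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return best[k][n], bounds[::-1]
```

For two equally common cancer types whose mutations fall in disjoint halves of a four-bin genome, optimal_segmentation([[10, 0], [10, 0], [0, 10], [0, 10]], 2) places the boundary after the second bin and reaches I(S; C) = ln 2, the maximum possible for two equally likely types; with k = 1 the mutual information is 0.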


Subjects
Mutation Rate , Neoplasms , Computational Biology , Genomics , Humans , Mutation , Neoplasms/genetics