1.
J Chem Inf Model ; 64(3): 697-711, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38300258

ABSTRACT

This study presents a rigorous framework for investigating molecular out-of-distribution (MOOD) generalization in drug discovery. The concept of MOOD is first clarified through a problem specification that demonstrates how the covariate shifts encountered during real-world deployment can be characterized by the distribution of sample distances to the training set. We find that these shifts can cause performance to drop by up to 60% and uncertainty calibration to degrade by up to 40%. This leads us to propose a splitting protocol that aims to close the gap between deployment and testing. Then, using this protocol, a thorough investigation is conducted to assess the impact of model design, model selection, and data set characteristics on MOOD performance and uncertainty calibration. We find that appropriate representations and algorithms with built-in uncertainty estimation are crucial to improving performance and uncertainty calibration. This study sets itself apart by its exhaustiveness and opens an exciting avenue to benchmark meaningful algorithmic progress in molecular scoring.
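The abstract characterizes covariate shift by the distribution of distances from each sample to the training set. A minimal sketch of that measurement, assuming binary fingerprints stored as sets of on-bit indices and Tanimoto (Jaccard) distance to the nearest training molecule; the study's actual featurization and distance metric may differ, and the function names are hypothetical.

```python
from typing import Iterable, List, Set

def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |a & b| / |a | b| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union

def distances_to_training_set(
    queries: Iterable[Set[int]], train: List[Set[int]]
) -> List[float]:
    """For each query fingerprint, the distance to its nearest training neighbour."""
    return [min(tanimoto_distance(q, t) for t in train) for q in queries]

# Hypothetical toy fingerprints.
train = [{1, 2, 3}, {4, 5}]
queries = [{1, 2, 3}, {1, 2}, {7, 8}]
print(distances_to_training_set(queries, train))
```

Comparing a histogram of these values for a deployment set with one for a candidate test split gives a rough check of whether the split reproduces the shift expected at deployment.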

2.
Algorithms Mol Biol ; 15: 12, 2020.
Article in English | MEDLINE | ID: mdl-32508979

ABSTRACT

The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree to a set of trees, accounting for segmental duplications and losses. The existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the former exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
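As background for the extension described above, the classical single-gene-tree Duplication-Loss reconciliation can be computed with the standard LCA mapping: each internal gene node maps to the lowest common ancestor of its children's species, and it is a duplication when it maps to the same species node as one of its children. The sketch below assumes binary trees, with the species tree given as a child-to-parent dictionary and the gene tree as nested tuples whose leaves are species names; it is not the paper's Super-Reconciliation algorithm, which handles several trees and segmental events, and all names are hypothetical.

```python
from typing import Dict, Optional, Tuple, Union

# A gene tree is either a leaf (the name of the species carrying the gene)
# or a pair of subtrees.
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]
Parents = Dict[str, Optional[str]]  # species tree: child -> parent (root -> None)

def depth(node: str, parent: Parents) -> int:
    """Number of edges from the species-tree root to node."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d

def lca(a: str, b: str, parent: Parents) -> str:
    """Lowest common ancestor of two species-tree nodes."""
    ancestors = set()
    node: Optional[str] = a
    while node is not None:
        ancestors.add(node)
        node = parent[node]
    while b not in ancestors:
        b = parent[b]
    return b

def reconcile(g: GeneTree, parent: Parents) -> Tuple[str, int, int]:
    """Return (LCA mapping of g, number of duplications, number of losses)."""
    if isinstance(g, str):
        return g, 0, 0
    m1, d1, l1 = reconcile(g[0], parent)
    m2, d2, l2 = reconcile(g[1], parent)
    m = lca(m1, m2, parent)
    is_dup = m in (m1, m2)
    # Each species node skipped on a child edge is one loss. Below a speciation
    # the first step down is expected, so it is not counted; below a
    # duplication the child is expected to map to m itself.
    offset = 0 if is_dup else 1
    losses = sum(depth(c, parent) - depth(m, parent) - offset for c in (m1, m2))
    return m, d1 + d2 + int(is_dup), l1 + l2 + losses

species = {"root": None, "AB": "root", "C": "root", "A": "AB", "B": "AB"}
print(reconcile((("A", "B"), ("A", "C")), species))  # → ('root', 1, 2)
```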

3.
ACS Omega ; 5(51): 32984-32994, 2020 Dec 29.
Article in English | MEDLINE | ID: mdl-33403260

ABSTRACT

The fundamental goal of generative drug design is to propose optimized molecules that meet predefined activity, selectivity, and pharmacokinetic criteria. Despite recent progress, we argue that existing generative methods are limited in their ability to favorably shift the distributions of molecular properties during optimization. We instead propose a novel Reinforcement Learning framework for molecular design in which an agent learns to directly optimize through a space of synthetically accessible drug-like molecules. This becomes possible by defining transitions in our Markov decision process as chemical reactions and allows us to leverage synthetic routes as an inductive bias. We validate our method by demonstrating that it outperforms existing state-of-the-art approaches in the optimization of pharmacologically relevant objectives, while results on multi-objective optimization tasks suggest increased scalability to realistic pharmaceutical design problems.
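To make the idea of reactions as MDP transitions concrete, here is a toy sketch in which states are molecules, actions are reactions, and the reward is a property score of the molecule where the episode stops. All molecules, reaction names, and scores are hypothetical placeholders, and the tiny MDP is solved exactly by dynamic programming rather than by the learned agent the paper describes.

```python
from functools import lru_cache
from typing import Dict, Tuple

# Hypothetical toy data: REACTIONS[state] maps each applicable reaction
# to the product molecule it yields (deterministic transitions).
REACTIONS: Dict[str, Dict[str, str]] = {
    "start": {"amide_coupling": "M1", "suzuki": "M2"},
    "M1": {"reductive_amination": "M3"},
    "M2": {"esterification": "M4", "amide_coupling": "M5"},
    "M3": {}, "M4": {}, "M5": {},
}
# Hypothetical property score for each molecule (the terminal reward).
SCORE: Dict[str, float] = {"start": 0.0, "M1": 0.2, "M2": 0.1,
                           "M3": 0.5, "M4": 0.4, "M5": 0.9}

@lru_cache(maxsize=None)
def best(state: str, steps_left: int) -> Tuple[float, Tuple[str, ...]]:
    """Best reachable score within steps_left reactions, and the reactions used."""
    result: Tuple[float, Tuple[str, ...]] = (SCORE[state], ())  # option: stop here
    if steps_left == 0:
        return result
    for reaction, product in REACTIONS[state].items():
        value, route = best(product, steps_left - 1)
        if value > result[0]:
            result = (value, (reaction,) + route)
    return result

print(best("start", 2))  # → (0.9, ('suzuki', 'amide_coupling'))
```

In this toy setting every action sequence is also a synthetic route, which illustrates how defining transitions as reactions keeps the search inside synthetically accessible chemistry.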

4.
Mol Biol Evol ; 36(4): 766-783, 2019 Apr 01.
Article in English | MEDLINE | ID: mdl-30698742

ABSTRACT

Genetic code deviations involving stop codons have been previously reported in mitochondrial genomes of several green plants (Viridiplantae), most notably chlorophyte algae (Chlorophyta). However, as changes in codon recognition from one amino acid to another are more difficult to infer, such changes might have gone unnoticed in particular lineages with high evolutionary rates that are otherwise prone to codon reassignments. To gain further insight into the evolution of the mitochondrial genetic code in green plants, we have conducted an in-depth study across mtDNAs from 51 green plants (32 chlorophytes and 19 streptophytes). Besides confirming known stop-to-sense reassignments, our study documents the first cases of sense-to-sense codon reassignments in Chlorophyta mtDNAs. In several Sphaeropleales, we report the decoding of AGG codons (normally arginine) as alanine, by tRNA(CCU) of various origins that carry the recognition signature for alanine tRNA synthetase. In Chromochloris, we identify tRNA variants decoding AGG as methionine and the synonymous codon CGG as leucine. Finally, we find strong evidence supporting the decoding of AUA codons (normally isoleucine) as methionine in Pycnococcus. Our results rely on a recently developed conceptual framework (CoreTracker) that predicts codon reassignments based on the disparity between DNA sequence (codons) and the derived protein sequence. These predictions are then validated by an evaluation of tRNA phylogeny, to identify the evolution of new tRNAs via gene duplication and loss, and structural modifications that lead to the assignment of new tRNA identities and a change in the genetic code.


Subjects
Chlorophyta/genetics, Molecular Evolution, Genetic Code, Mitochondrial Genome, Phylogeny, Transfer RNA/genetics
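The reassignments reported above can be illustrated by translating a sequence under the standard code and under a variant in which a single codon is reassigned. A minimal sketch, using the DNA alphabet (so AUA is written ATA) and a hypothetical test sequence; the override dictionaries are illustrative stand-ins, not complete lineage-specific genetic codes.

```python
from itertools import product
from typing import Dict, Optional

BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

def translate(dna: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Translate in frame 0, applying codon reassignments on top of the standard code."""
    code = {**STANDARD, **(overrides or {})}
    return "".join(code[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

seq = "ATGAGGGCTATA"  # hypothetical sequence: ATG AGG GCT ATA
print(translate(seq))                 # → MRAI
print(translate(seq, {"AGG": "A"}))   # → MAAI  (AGG read as alanine)
print(translate(seq, {"ATA": "M"}))   # → MRAM  (AUA read as methionine)
```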
5.
BMC Genomics ; 19(Suppl 2): 102, 2018 May 09.
Article in English | MEDLINE | ID: mdl-29764363

ABSTRACT

BACKGROUND: Several methods have been developed for the accurate reconstruction of gene trees. Some of them use reconciliation with a species tree to correct, a posteriori, errors in gene trees inferred from multiple sequence alignments. Unfortunately, the best fit to sequence information can be lost during this process. RESULTS: We describe GATC, a new algorithm for reconstructing a binary gene tree with branch lengths. GATC returns optimal solutions according to a measure combining both tree likelihood (according to sequence evolution) and a reconciliation score under the Duplication-Transfer-Loss (DTL) model. It can be used either to construct a gene tree from scratch or to correct trees inferred by existing reconstruction methods, making it highly flexible with respect to input data types. The method is based on a genetic algorithm acting on a population of trees at each step. It substantially increases the efficiency of phylogeny space exploration, reducing the risk of falling into local minima, within a reasonable computation time. We have applied GATC to a dataset of simulated cyanobacterial phylogenies, as well as to an empirical dataset of three reference gene families, and showed that it improves gene tree reconstructions compared with current state-of-the-art algorithms. CONCLUSION: The proposed algorithm is able to accurately reconstruct gene trees and is highly suitable for the construction of reference trees. Our results also highlight the efficiency of multi-objective optimization algorithms for the gene tree reconstruction problem. GATC is available on GitHub at: https://github.com/UdeM-LBIT/GATC .


Subjects
Cyanobacteria/genetics, Bacterial Genes, Genomics/methods, Algorithms, Molecular Evolution, Gene Duplication, Internet, Genetic Models, Multigene Family, Phylogeny
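GATC scores trees on two criteria, sequence likelihood and DTL reconciliation cost. One common way to keep both criteria in a multi-objective genetic algorithm is Pareto (non-dominance) selection, sketched below with hypothetical tree labels and scores; whether GATC uses exactly this selection scheme is an assumption here, not something the abstract states.

```python
from typing import List, Tuple

# (tree label, negative log-likelihood, DTL reconciliation cost); lower is better for both.
Candidate = Tuple[str, float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

def pareto_front(pop: List[Candidate]) -> List[Candidate]:
    """Candidates not dominated by any other member of the population."""
    return [c for c in pop if not any(dominates(o, c) for o in pop)]

# Hypothetical population of candidate gene trees.
population = [("t1", 1000.0, 12.0), ("t2", 1005.0, 8.0),
              ("t3", 1010.0, 9.0), ("t4", 998.0, 15.0)]
print([c[0] for c in pareto_front(population)])  # → ['t1', 't2', 't4']
```

Here t3 is discarded because t2 is better on both criteria, while t1, t2, and t4 each represent a different trade-off between likelihood and reconciliation cost.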
6.
Bioinformatics ; 33(21): 3331-3339, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-28655158

ABSTRACT

MOTIVATION: Codon reassignments have been reported across all domains of life. With the increasing number of sequenced genomes, the development of systematic approaches for genetic code detection is essential for accurate downstream analyses. Three automated prediction tools exist so far: FACIL, GenDecoder and Bagheera, the last two being restricted, respectively, to metazoan mitochondrial genomes and to CUG reassignments in yeast nuclear genomes. These tools can only analyze a single genome at a time and are often not followed by a validation procedure, resulting in a high rate of false positives. RESULTS: We present CoreTracker, a new algorithm for the inference of sense-to-sense codon reassignments. CoreTracker identifies potential codon reassignments in a set of related genomes, then uses statistical evaluations and a random forest classifier to predict those that are the most likely to be correct. Predicted reassignments are then validated through a phylogeny-aware step that evaluates the impact of the new genetic code on the protein alignment. Handling a set of genomes simultaneously in a phylogenetic framework allows tracing back the evolution of each reassignment, which provides information on its underlying mechanism. Applied to metazoan and yeast genomes, CoreTracker significantly outperforms existing methods in both precision and sensitivity. AVAILABILITY AND IMPLEMENTATION: CoreTracker is written in Python and available at https://github.com/UdeM-LBIT/CoreTracker. CONTACT: mabrouk@iro.umontreal.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Cell Nucleus/genetics, Codon, Mitochondrial Genome, Genomics/methods, DNA Sequence Analysis/methods, Software, Animals, Genetic Code, Fungal Genome, Phylogeny, Yeasts/genetics
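CoreTracker's first stage looks for codons whose aligned amino acids in related genomes disagree with the standard code. A simplified sketch of that tally, assuming input pairs of (codon in the genome under study, consensus residue at the aligned position in related genomes) and hypothetical thresholds `min_count` and `min_fraction`; the statistical tests, random forest classifier, and phylogeny-aware validation described in the abstract are not reproduced.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, List, Tuple

# Standard genetic code (NCBI table 1), DNA alphabet, codons in TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AAS)}

def candidate_reassignments(
    pairs: List[Tuple[str, str]], min_count: int = 3, min_fraction: float = 0.7
) -> Dict[str, Tuple[str, str]]:
    """Flag codons whose aligned residues mostly disagree with the standard code.

    Returns {codon: (standard amino acid, most frequently aligned amino acid)}.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for codon, aa in pairs:
        counts[codon][aa] += 1
    flagged = {}
    for codon, tally in counts.items():
        aa, n = tally.most_common(1)[0]
        if (aa != STANDARD[codon] and n >= min_count
                and n / sum(tally.values()) >= min_fraction):
            flagged[codon] = (STANDARD[codon], aa)
    return flagged

# Hypothetical observations: AGG (arginine) mostly aligns to alanine.
pairs = [("AGG", "A")] * 4 + [("AGG", "R")] + [("GCT", "A")] * 5
print(candidate_reassignments(pairs))  # → {'AGG': ('R', 'A')}
```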
7.
Nucleic Acids Res ; 45(6): 3017-3030, 2017 Apr 07.
Article in English | MEDLINE | ID: mdl-27932455

ABSTRACT

Enhancers are intergenic DNA elements that regulate the transcription of target genes in response to signaling pathways by interacting with promoters over large genomic distances. Recent studies have revealed that enhancers are bi-directionally transcribed into enhancer RNAs (eRNAs). Using single-molecule fluorescence in situ hybridization (smFISH), we investigated the eRNA-mediated regulation of transcription during estrogen induction in MCF-7 cells. We demonstrate that eRNAs are localized exclusively in the nucleus and are induced with kinetics similar to those of target mRNAs. However, eRNAs are mostly nascent at enhancers and their steady-state levels remain lower than those of their cognate mRNAs. Surprisingly, at the single-allele level, eRNAs are rarely co-expressed with their target loci, demonstrating that active gene transcription does not require the continuous transcription of eRNAs or their accumulation at enhancers. When co-expressed, sub-diffraction distance measurements between nascent mRNA and eRNA signals reveal that co-transcription of eRNAs and mRNAs rarely occurs within closed enhancer-promoter loops. Lastly, basal eRNA transcription at enhancers, but not E2-induced transcription, is maintained upon depletion of MLL1 and ERα, suggesting some degree of chromatin accessibility prior to signal-dependent activation of transcription. Together, our findings suggest that eRNA accumulation at enhancer-promoter loops is not required to sustain target gene transcription.


Subjects
Genetic Enhancer Elements, Genetic Promoter Regions, Untranslated RNA/biosynthesis, Genetic Transcription, Estradiol/pharmacology, Estrogen Receptor alpha/physiology, Forkhead Transcription Factors/biosynthesis, Forkhead Transcription Factors/genetics, Histone-Lysine N-Methyltransferase/physiology, Humans, MCF-7 Cells, Molecular Models, Myeloid-Lymphoid Leukemia Protein/physiology, Messenger RNA/biosynthesis, Untranslated RNA/physiology, Purinergic P2Y2 Receptors/biosynthesis, Purinergic P2Y2 Receptors/genetics, Single-Cell Analysis
8.
Sci Rep ; 6: 33782, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27653669

ABSTRACT

Alkaloid accumulation in plants is activated in response to stress and is limited in distribution, and specific alkaloid repertoires vary across taxa. Rauvolfioideae (Apocynaceae, Gentianales) represents a major center of structural expansion in the monoterpenoid indole alkaloids (MIAs), yielding thousands of unique molecules, including highly valuable chemotherapeutics. The paucity of genome-level data for Apocynaceae precludes a deeper understanding of MIA pathway evolution, hindering the elucidation of remaining pathway enzymes and the improvement of MIA availability in planta or in vitro. We sequenced the nuclear genome of Rhazya stricta (Apocynaceae, Rauvolfioideae) and present this high-quality assembly in comparison with that of coffee (Rubiaceae, Coffea canephora, Gentianales) and others to investigate the evolution of genome-scale features. The annotated Rhazya genome was used to develop the community resource RhaCyc, a metabolic pathway database. Gene family trees were constructed to identify homologs of MIA pathway genes and to examine their evolutionary history. We found that, unlike Coffea, the Rhazya lineage has experienced many structural rearrangements. Gene tree analyses suggest recent, lineage-specific expansion and diversification among homologs encoding MIA pathway genes in Gentianales and provide candidate sequences with the potential to close gaps in characterized pathways and support prospecting for new MIA production avenues.

9.
PLoS One ; 11(8): e0159559, 2016.
Article in English | MEDLINE | ID: mdl-27513924

ABSTRACT

MOTIVATIONS: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search, which precludes their application to large genomic databases. RESULTS: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and to predict ancestral gene content and order for all ancestors along the phylogeny. AVAILABILITY: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ, as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families, is also made available.


Subjects
Algorithms, Computational Biology/methods, Molecular Evolution, Genes/genetics, Genome/genetics, Phylogeny, Animals, Humans, DNA Sequence Analysis
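ProfileNJ corrects the weakly supported parts of a gene tree. One simple way to isolate those parts, assumed here for illustration, is to contract every internal branch whose support falls below a threshold, turning it into a polytomy; resolving the resulting polytomies with the species tree and distance matrix is the method's core step and is not shown. The `Node` class and the threshold value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: Optional[str] = None   # leaf label; None for internal nodes
    support: float = 1.0         # support of the branch leading to this node
    children: List["Node"] = field(default_factory=list)

def contract_weak(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with weak internal branches collapsed into polytomies."""
    kept: List[Node] = []
    for child in node.children:
        child = contract_weak(child, threshold)
        if child.children and child.support < threshold:
            kept.extend(child.children)  # splice the weak node's children upward
        else:
            kept.append(child)
    return Node(node.name, node.support, kept)

def newick(node: Node) -> str:
    """Topology-only Newick-style string (no supports or branch lengths)."""
    if not node.children:
        return node.name or ""
    return "(" + ",".join(newick(c) for c in node.children) + ")"

tree = Node(children=[
    Node(support=0.95, children=[Node("a"), Node("b")]),
    Node(support=0.4, children=[Node("c"), Node("d")]),
])
print(newick(contract_weak(tree, 0.7)))  # → ((a,b),c,d)
```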
10.
BMC Struct Biol ; 15: 20, 2015 Oct 09.
Article in English | MEDLINE | ID: mdl-26449279

ABSTRACT

BACKGROUND: RNA ligases 2 are scarce and scattered across the tree of life. Two members of this family are well studied: the mitochondrial RNA editing ligase from the parasitic trypanosomes (Kinetoplastea), a promising drug target, and bacteriophage T4 RNA ligase 2, a workhorse in molecular biology. Here we report the identification of a divergent RNA ligase 2 (DpRNL) from Diplonema papillatum (Diplonemea), a member of the kinetoplastids' sister group. METHODS: We identified DpRNL with sensitive methods based on hidden Markov models. Then, using homology modeling and molecular dynamics simulations, we established a three-dimensional structural model of DpRNL complexed with ATP and Mg2+. RESULTS: The 3D model of DpRNL was compared with available crystal structures from Trypanosoma brucei, bacteriophage T4, and two archaea. Interaction of DpRNL with ATP is predicted to involve double π-stacking, which has not been reported before in RNA ligases. This particular contact would shift the orientation of ATP and have considerable consequences for the interaction network of amino acids in the catalytic pocket. We postulate that certain canonical amino acids assume different functional roles in DpRNL compared with structurally homologous residues in other RNA ligases 2, a reassignment indicative of constructive neutral evolution. Finally, both structure comparison and phylogenetic analysis show that DpRNL is not specifically related to RNA ligases from trypanosomes, suggesting a unique adaptation of the latter for RNA editing after the split of diplonemids and kinetoplastids. CONCLUSION: Homology modeling and molecular dynamics simulations strongly suggest that DpRNL is an RNA ligase 2. The predicted innovative reshaping of DpRNL's catalytic pocket is worth testing experimentally.


Subjects
Euglenozoa/genetics, Protozoan Proteins/chemistry, Protozoan Proteins/metabolism, RNA Ligase (ATP)/chemistry, RNA Ligase (ATP)/metabolism, Adenosine Triphosphate/metabolism, Catalytic Domain, Euglenozoa/chemistry, Euglenozoa/enzymology, Magnesium/metabolism, Markov Chains, Molecular Models, Molecular Docking Simulation, Molecular Dynamics Simulation, Phylogeny, Protozoan Proteins/genetics, RNA Ligase (ATP)/genetics, Protein Structural Homology