1.
bioRxiv ; 2023 Sep 17.
Article in English | MEDLINE | ID: mdl-37745387

ABSTRACT

Recent advances in Protein Language Models (pLMs) have enabled high-throughput analysis of proteins from primary sequence alone. At the same time, recent evidence shows that codon usage bias is remarkably predictive and can even change the final structure of a protein. Here, we explore these findings by extending the traditional vocabulary of pLMs from amino acids to codons to capture more of the information inside CoDing Sequences (CDS). We build upon traditional transfer learning techniques with a novel pipeline of token embedding matrix seeding, masked language modeling, and student-teacher knowledge distillation, called MELD. This pipeline transformed the pretrained ProtBERT into cdsBERT, a pLM with a codon vocabulary trained on a massive corpus of CDS. Interestingly, cdsBERT variants produced a highly biochemically relevant latent space, outperforming their amino acid-based counterparts on enzyme commission number prediction. Further analysis revealed that synonymous codon token embeddings moved distinctly in the embedding space, showing that these traditionally "silent" mutations add unique information across broad phylogeny. This embedding movement correlated significantly with average usage bias across phylogeny. Future fine-tuned, organism-specific codon pLMs may show a larger increase in codon usage fidelity. This work shows the potential of the codon vocabulary to improve current state-of-the-art structure and function prediction, which calls for the creation of a codon pLM foundation model alongside the addition of high-quality CDS to large-scale protein databases.
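
To illustrate the change of vocabulary described in this abstract, here is a minimal sketch of splitting a coding sequence into codon tokens and comparing them with the amino acid tokens they translate to. It is not the cdsBERT tokenizer: the function names `codon_tokens` and `amino_acid_tokens` are hypothetical, and the only biological input is the standard genetic code (NCBI translation table 1).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# by first, second, then third base over T, C, A, G. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_tokens(cds: str) -> list[str]:
    """Split an in-frame coding sequence into 3-nucleotide tokens (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def amino_acid_tokens(cds: str) -> list[str]:
    """Translate each codon token to its one-letter amino acid (hypothetical helper)."""
    return [CODON_TABLE[codon] for codon in codon_tokens(cds)]


if __name__ == "__main__":
    cds = "ATGGCTGCCTAA"
    print(codon_tokens(cds))       # codon vocabulary: GCT and GCC stay distinct
    print(amino_acid_tokens(cds))  # amino acid vocabulary: both become "A"
```

The synonymous codons GCT and GCC collapse to the same token under an amino acid vocabulary but remain separate under a codon vocabulary, which is the extra information the abstract refers to.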

2.
Sci Rep ; 13(1): 2088, 2023 02 06.
Article in English | MEDLINE | ID: mdl-36747072

ABSTRACT

In this study, we investigate how an organism's codon usage bias can serve as a predictor and classifier of various genomic and evolutionary traits across the domains of life. We perform secondary analysis of existing genetic datasets to build several AI/machine learning models. When trained on codon usage patterns of nearly 13,000 organisms, our models accurately predict the organelle of origin and taxonomic identity of nucleotide samples. We extend our analysis to identify the most influential codons for phylogenetic prediction with a custom feature ranking ensemble. Our results suggest that the genetic code can be utilized to train accurate classifiers of taxonomic and phylogenetic features. We then apply this classification framework to open reading frame (ORF) detection. Our statistical model assesses all possible ORFs in a nucleotide sample and rejects or deems them plausible based on the codon usage distribution. Our dataset and analyses are made publicly available on GitHub and the UCI ML Repository to facilitate open-source reproducibility and community engagement.


Subject(s)
Genomics , Machine Learning , Phylogeny , Reproducibility of Results , Codon/genetics , Nucleotides
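
The two ingredients named in the abstract of record 2, codon usage patterns and enumeration of candidate ORFs, can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: `codon_usage` and `find_orfs` are hypothetical names, only the three forward reading frames are scanned, an ORF is taken to run from ATG to the first in-frame stop codon, and the statistical plausibility test described in the abstract is not implemented.

```python
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def codon_usage(seq: str) -> dict[str, float]:
    """Relative frequency of each of the 64 codons in an in-frame sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Triplets with ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[c] for c in CODONS)
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) slices of ATG...stop ORFs in the three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index includes the stop codon
                start = None
    return orfs


if __name__ == "__main__":
    sample = "CCATGGCTTAAGG"
    for start, end in find_orfs(sample):
        orf = sample[start:end]
        used = {c: f for c, f in codon_usage(orf).items() if f > 0}
        print(start, end, orf, used)
```

A classifier of the kind the abstract describes would take the 64-value usage vector as its feature input; scoring each candidate ORF against an organism's expected codon usage distribution is left out here because the abstract does not specify the test.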
3.
J Pers Med ; 11(12), 2021 Dec 04.
Article in English | MEDLINE | ID: mdl-34945766

ABSTRACT

Heart diseases are some of the most common and pressing threats to human health worldwide. The American Heart Association and the National Institutes of Health jointly update data on cardiac diseases every year. In 2018, 126.9 million Americans were reported as having some form of cardiac disorder, with an estimated direct and indirect total cost of USD 363.4 billion. This necessitates developing therapeutic interventions for heart diseases to improve life expectancy and provide economic relief. In this review, we look into gamma-secretase as a potential therapeutic target for cardiac diseases. Gamma-secretase, an aspartyl protease enzyme, is responsible for the cleavage and activation of a number of substrates that are relevant to normal cardiac development and function, as found in mutation studies. Some of these substrates are involved in downstream signaling processes and crosstalk with pathways relevant to heart diseases. Most of the substrates and signaling events we explored were found to be potentially beneficial for maintaining cardiac function in diseased conditions. This review presents an updated overview of the current knowledge on gamma-secretase processing of cardiac-relevant substrates and seeks to understand whether the modulation of gamma-secretase activity would be beneficial in combating cardiac diseases.

4.
J Pers Med ; 11(12), 2021 Dec 16.
Article in English | MEDLINE | ID: mdl-34945845

ABSTRACT

Heat shock protein 90 (Hsp90) is a molecular chaperone that interacts with up to 10% of the proteome. Its extensive involvement in protein folding and in the regulation of protein stability within cells makes Hsp90 an attractive therapeutic target to correct multiple dysfunctions. Many of the clients of Hsp90 are found in pathways known to be pathogenic in the heart, ranging from transforming growth factor β (TGF-β) and mitogen-activated protein kinase (MAPK) signaling to tumor necrosis factor α (TNFα), Gs and Gq G-protein-coupled receptor (GPCR) and calcium (Ca2+) signaling. These pathways can therefore be targeted through modulation of Hsp90 activity. The activity of Hsp90 can be targeted through small-molecule inhibition. However, small-molecule inhibitors of Hsp90 have been found to be cardiotoxic in some cases. In this regard, specific targeting of Hsp90 by modulation of post-translational modifications (PTMs) emerges as an attractive strategy. In this review, we aim to address how Hsp90 functions, where Hsp90 interacts within pathological pathways, and current knowledge of small molecules and PTMs known to modulate Hsp90 activity and their potential as therapeutics in cardiac diseases.

...