Results 1 - 20 of 5,067
1.
Mol Cell ; 82(20): 3840-3855.e8, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36270248

ABSTRACT

The use of alternative promoters, splicing, and cleavage and polyadenylation (APA) generates mRNA isoforms that expand the diversity and complexity of the transcriptome. Here, we uncovered thousands of previously undescribed 5' uncapped and polyadenylated transcripts (5' UPTs). We show that these transcripts resist exonucleases due to a highly structured RNA and N6-methyladenosine modification at their 5' termini. 5' UPTs appear downstream of APA sites within their host genes and are induced upon APA activation. Strong enrichment in polysomal RNA fractions indicates 5' UPT translational potential. Indeed, APA promotes downstream translation initiation, non-canonical protein output, and consistent changes to peptide presentation at the cell surface. Lastly, we demonstrate the biological importance of 5' UPTs using Bcl2, a prominent anti-apoptotic gene whose entire coding sequence is a 5' UPT generated from 5' UTR-embedded APA sites. Thus, APA is not only accountable for terminating transcripts, but also for generating downstream uncapped RNAs with translation potential and biological impact.


Subjects
Polyadenylation , RNA Isoforms , RNA Isoforms/genetics , 5' Untranslated Regions , 3' Untranslated Regions/genetics , Proto-Oncogene Proteins c-bcl-2/genetics , Exonucleases/genetics
2.
Am J Hum Genet ; 111(5): 966-978, 2024 05 02.
Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.


Subjects
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Single Nucleotide Polymorphism , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Ulcerative Colitis/genetics , Reproducibility of Results , Phenotype , Genotype
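A rough sketch of the idea in entry 2, a four-state HMM over two sequences of p values, is given below. It is not ReAD itself: the Beta(ALPHA, 1) density for associated SNPs, the uniform density for null SNPs, the STAY probability, and all function names are assumptions made for illustration, and no false discovery rate procedure is included. The four hidden states are the pairs (s1, s2) of per-study association indicators, and a forward-backward recursion returns the posterior probability that each SNP is in the replicable state (1, 1).

```python
from itertools import product

# Hidden states: (associated in study 1, associated in study 2).
STATES = list(product((0, 1), repeat=2))
ALPHA = 0.2   # assumed Beta(ALPHA, 1) shape for p values of associated SNPs
STAY = 0.9    # assumed probability of keeping the same state at the next SNP


def density(p, associated):
    """Assumed p-value density: Beta(ALPHA, 1) if associated, else uniform."""
    return ALPHA * p ** (ALPHA - 1) if associated else 1.0


def emission(state, p1, p2):
    """Joint density of the two p values, taken as independent given the state."""
    return density(p1, state[0]) * density(p2, state[1])


def transition(prev, cur):
    """Assumed transition matrix: stay with probability STAY, else move uniformly."""
    return STAY if prev == cur else (1 - STAY) / (len(STATES) - 1)


def normalize(row):
    total = sum(row.values())
    return {s: v / total for s, v in row.items()}


def replicable_posterior(pvals1, pvals2):
    """Posterior probability of state (1, 1) at each SNP via forward-backward."""
    n = len(pvals1)
    fwd = []
    for t in range(n):
        row = {}
        for s in STATES:
            if t == 0:
                prior = 1 / len(STATES)
            else:
                prior = sum(fwd[-1][r] * transition(r, s) for r in STATES)
            row[s] = prior * emission(s, pvals1[t], pvals2[t])
        fwd.append(normalize(row))  # rescale each step to avoid underflow
    bwd = [None] * n
    bwd[-1] = {s: 1.0 for s in STATES}
    for t in range(n - 2, -1, -1):
        bwd[t] = normalize({
            s: sum(transition(s, r)
                   * emission(r, pvals1[t + 1], pvals2[t + 1])
                   * bwd[t + 1][r] for r in STATES)
            for s in STATES
        })
    posterior = []
    for t in range(n):
        weights = normalize({s: fwd[t][s] * bwd[t][s] for s in STATES})
        posterior.append(weights[(1, 1)])
    return posterior


# Toy p values for seven adjacent SNPs in two studies (made up for illustration).
p1 = [0.6, 0.3, 1e-6, 1e-5, 1e-7, 0.8, 0.4]
p2 = [0.7, 1e-5, 1e-4, 1e-6, 1e-5, 0.5, 1e-6]
posterior = replicable_posterior(p1, p2)
```

SNPs with small p values in both studies receive a posterior close to 1, and their neighbours borrow strength from them through the transition matrix, which is how an HMM can change the ranking relative to treating SNPs as independent.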
3.
Proc Natl Acad Sci U S A ; 121(17): e2318333121, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38625949

ABSTRACT

Many nonequilibrium, active processes are observed at a coarse-grained level, where different microscopic configurations are projected onto the same observable state. Such "lumped" observables display memory, and in many cases, the irreversible character of the underlying microscopic dynamics becomes blurred, e.g., when the projection hides dissipative cycles. As a result, the observations appear less irreversible, and it is very challenging to infer the degree of broken time-reversal symmetry. Here we show, contrary to intuition, that by ignoring parts of the already coarse-grained state space we may, via a process called milestoning, improve entropy-production estimates. We present diverse examples where milestoning systematically renders observations "closer to underlying microscopic dynamics" and thereby improves thermodynamic inference from lumped data assuming a given range of memory, and we hypothesize that this effect is quite general. Moreover, whereas the correct general physical definition of time reversal in the presence of memory remains unknown, we here show by means of physically relevant examples that at least for semi-Markov processes of first and second order, waiting-time contributions arising from adopting a naive Markovian definition of time reversal generally must be discarded.
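To make concrete how lumping can hide a dissipative cycle, the sketch below evaluates the standard steady-state entropy production of a discrete-time Markov chain, sigma = sum over i, j of pi_i P_ij ln(pi_i P_ij / (pi_j P_ji)), for a driven three-state ring, and then for the two-state chain obtained by merging two of the states and naively treating the result as Markov. It does not implement milestoning; the transition matrix, the choice of lumping, and the function names are illustrative assumptions.

```python
import math


def stationary(P, iters=1000):
    """Stationary distribution by power iteration (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [x / total for x in pi]


def entropy_production(P):
    """Steady-state entropy production per step, in units of k_B."""
    pi = stationary(P)
    n = len(P)
    sigma = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and P[i][j] > 0 and P[j][i] > 0:
                forward = pi[i] * P[i][j]
                backward = pi[j] * P[j][i]
                sigma += forward * math.log(forward / backward)
    return sigma


def lump(P, groups):
    """Naive Markov approximation of the chain observed through lumped states."""
    pi = stationary(P)
    lumped = []
    for ga in groups:
        weight = sum(pi[i] for i in ga)
        lumped.append([
            sum(pi[i] * P[i][j] for i in ga for j in gb) / weight
            for gb in groups
        ])
    return lumped


# Driven ring: the cycle 0 -> 1 -> 2 -> 0 is favoured over its reverse.
P3 = [[0.0, 0.8, 0.2],
      [0.2, 0.0, 0.8],
      [0.8, 0.2, 0.0]]

sigma_full = entropy_production(P3)
sigma_lumped = entropy_production(lump(P3, [[0], [1, 2]]))
```

Here `sigma_full` equals 0.6 ln 4, while `sigma_lumped` vanishes because any two-state Markov chain satisfies detailed balance: the cycle that carried the dissipation is no longer visible in the lumped description.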

4.
Proc Natl Acad Sci U S A ; 121(7): e2318731121, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38315841

ABSTRACT

Capturing rare yet pivotal events poses a significant challenge for molecular simulations. Path sampling provides a unique approach to tackle this issue without altering the potential energy landscape or dynamics, enabling recovery of both thermodynamic and kinetic information. However, despite its exponential acceleration compared to standard molecular dynamics, generating numerous trajectories can still require a long time. By harnessing our recent algorithmic innovations, particularly subtrajectory moves with high acceptance coupled with asynchronous replica exchange featuring infinite swaps, we establish a highly parallelizable and rapidly converging path sampling protocol, compatible with diverse high-performance computing architectures. We demonstrate our approach on the liquid-vapor phase transition in superheated water, the unfolding of the chignolin protein, and water dissociation. The latter, performed at the ab initio level, achieves comparable statistical accuracy within days, in contrast to a previous study requiring over a year.

5.
Proc Natl Acad Sci U S A ; 121(31): e2401162121, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39042671

ABSTRACT

Nonequilibrium states in soft condensed matter require a systematic approach to characterize and model materials, enhancing predictability and applications. Among the tools, X-ray photon correlation spectroscopy (XPCS) provides exceptional temporal and spatial resolution to extract dynamic insight into the properties of the material. However, existing models might overlook intricate details. We introduce an approach for extracting the transport coefficient, denoted as [Formula: see text], from the XPCS studies. This coefficient is a fundamental parameter in nonequilibrium statistical mechanics and is crucial for characterizing transport processes within a system. Our method unifies the Green-Kubo formulas associated with various transport coefficients, including gradient flows, particle-particle interactions, friction matrices, and continuous noise. We achieve this by integrating the collective influence of random and systematic forces acting on the particles within the framework of a Markov chain. We initially validated this method using molecular dynamics simulations of a system subjected to changes in temperatures over time. Subsequently, we conducted further verification using experimental systems reported in the literature and known for their complex nonequilibrium characteristics. The results, including the derived [Formula: see text] and other relevant physical parameters, align with the previous observations and reveal detailed dynamical information in nonequilibrium states. This approach represents an advancement in XPCS analysis, addressing the growing demand to extract intricate nonequilibrium dynamics. Further, the methods presented are agnostic to the nature of the material system and can be potentially expanded to hard condensed matter systems.

6.
Proc Natl Acad Sci U S A ; 121(6): e2313360121, 2024 02 06.
Article in English | MEDLINE | ID: mdl-38294935

ABSTRACT

A central challenge in the study of intrinsically disordered proteins is the characterization of the mechanisms by which they bind their physiological interaction partners. Here, we utilize a deep learning-based Markov state modeling approach to characterize the folding-upon-binding pathways observed in a long timescale molecular dynamics simulation of a disordered region of the measles virus nucleoprotein NTAIL reversibly binding the X domain of the measles virus phosphoprotein complex. We find that folding-upon-binding predominantly occurs via two distinct encounter complexes that are differentiated by the binding orientation, helical content, and conformational heterogeneity of NTAIL. We observe that folding-upon-binding predominantly proceeds through a multi-step induced fit mechanism with several intermediates and do not find evidence for the existence of canonical conformational selection pathways. We observe four kinetically separated native-like bound states that interconvert on timescales of eighty to five hundred nanoseconds. These bound states share a core set of native intermolecular contacts and stable NTAIL helices and are differentiated by a sequential formation of native and non-native contacts and additional helical turns. Our analyses provide an atomic resolution structural description of intermediate states in a folding-upon-binding pathway and elucidate the nature of the kinetic barriers between metastable states in a dynamic and heterogeneous, or "fuzzy", protein complex.


Subjects
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Protein Folding , Protein Binding , Molecular Dynamics Simulation
7.
Proc Natl Acad Sci U S A ; 121(3): e2318989121, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38215186

ABSTRACT

The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.


Subjects
COVID-19 , SARS-CoV-2 , Humans , Algorithms , COVID-19/epidemiology , Markov Chains
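The trade-off described in entry 7 can be explored numerically on a small example. The abstract does not state the formula of the naive approximation, so the sketch below assumes the form t * exp(tQ) * E for the derivative of exp(tQ) in a direction E, and compares it with a central finite-difference reference. The 3 x 3 rate matrix, the direction E, and all helper names are illustrative assumptions, and the pure-Python matrix exponential is only suitable for tiny matrices.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def scale(A, c):
    return [[c * a for a in row] for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def expm(A, terms=20, squarings=6):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    S = scale(A, 1.0 / 2 ** squarings)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = scale(matmul(term, S), 1.0 / k)
        result = add(result, term)
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def reference_derivative(Q, E, t, h=1e-5):
    """Central finite difference of exp(t (Q + eps E)) with respect to eps at 0."""
    plus = expm(scale(add(Q, scale(E, h)), t))
    minus = expm(scale(add(Q, scale(E, -h)), t))
    return scale(add(plus, scale(minus, -1.0)), 1.0 / (2 * h))


def first_order(Q, E, t):
    """Assumed naive first-order approximation: t * exp(tQ) * E."""
    return scale(matmul(expm(scale(Q, t)), E), t)


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# Toy rate matrix (rows sum to zero) and a direction that raises the 0 -> 1 rate.
Q = [[-1.0, 0.7, 0.3],
     [0.4, -0.9, 0.5],
     [0.2, 0.6, -0.8]]
E = [[-1.0, 1.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

err_short = max_abs_diff(reference_derivative(Q, E, 0.01), first_order(Q, E, 0.01))
err_long = max_abs_diff(reference_derivative(Q, E, 1.0), first_order(Q, E, 1.0))
err_commuting = max_abs_diff(reference_derivative(Q, Q, 1.0), first_order(Q, Q, 1.0))
```

The approximation is exact when E commutes with Q (`err_commuting` is at the level of finite-difference noise), and for a non-commuting direction its error grows with the elapsed time t. The cost advantage is that it needs one matrix exponential and one product, rather than a separate derivative computation per rate parameter.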
8.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39003531

ABSTRACT

Profile hidden Markov models (pHMMs) are able to achieve high sensitivity in remote homology search, making them popular choices for detecting novel or highly diverged viruses in metagenomic data. However, many existing pHMM databases have different design focuses, making it difficult for users to decide which one to use. In this review, we provide a thorough evaluation and comparison of multiple commonly used pHMM databases for viral sequence discovery in metagenomic data. We characterized the databases by comparing their sizes, their taxonomic coverage, and the properties of their models using quantitative metrics. Subsequently, we assessed their performance in virus identification across multiple application scenarios, utilizing both simulated and real metagenomic data. We aim to offer researchers a thorough and critical assessment of the strengths and limitations of different databases. Furthermore, based on the experimental results obtained from the simulated and real metagenomic data, we provided practical suggestions for users to optimize their use of pHMM databases, thus enhancing the quality and reliability of their findings in the field of viral metagenomics.


Subjects
Markov Chains , Metagenomics , Viruses , Metagenomics/methods , Viruses/genetics , Viruses/classification , Genetic Databases , Humans , Computational Biology/methods , Algorithms
9.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38340093

ABSTRACT

Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous CNV detection tools based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. Therefore, we conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools from the past decade. Our findings revealed that the majority of mainstream tools employ a similar detection rationale: they calculate the so-called read depth signal from aligned sequencing reads and then segment the signal using either circular binary segmentation (CBS) or a hidden Markov model (HMM). Hence, we compared the performance of these two core segmentation algorithms in CNV detection, considering varying sequencing depths, segment lengths and complex types of CNVs. To ensure a fair comparison, we designed a parametric model using mainstream statistical distributions, which allows for pre-excluding bias correction such as guanine-cytosine (GC) content during the preprocessing step. The results indicate the following key points: (1) Under ideal conditions, CBS demonstrates high precision, while HMM exhibits a high recall rate. (2) Under practical conditions, HMM is advantageous at lower sequencing depths, while CBS is more competitive than HMM in detecting small variant segments. (3) In cases involving complex CNVs resembling real sequencing, HMM is more robust than CBS. (4) When facing large-scale sequencing data, HMM takes less time than CBS, while their memory usage is approximately equal. These results can provide important guidance and reference for researchers developing new tools for CNV detection.


Subjects
Algorithms , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing/methods
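One of the two segmentation strategies compared in entry 9, an HMM over the read-depth signal, can be sketched with a small Viterbi decoder. This is a toy version under stated assumptions, not any of the reviewed tools: three copy-number states, Gaussian emissions on a normalized depth ratio, and the values of MEANS, SIGMA, and STAY are all made up for illustration.

```python
import math

STATES = ("loss", "neutral", "gain")
MEANS = {"loss": 0.5, "neutral": 1.0, "gain": 1.5}  # assumed normalized depth ratios
SIGMA = 0.15   # assumed emission standard deviation
STAY = 0.98    # assumed probability of staying in the same state


def log_emission(state, x):
    """Log density of a Gaussian centred on the state's expected depth ratio."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(depths):
    """Most likely copy-number state for each bin of the read-depth signal."""
    score = {s: math.log(1 / len(STATES)) + log_emission(s, depths[0])
             for s in STATES}
    back = []
    for x in depths[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev] + log_transition(best_prev, cur)
                              + log_emission(cur, x))
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy depth ratios per bin: neutral, a deletion, neutral, a duplication, neutral.
depths = [1.02, 0.97, 1.05, 0.98, 0.52, 0.47, 0.55, 0.49,
          1.01, 0.96, 1.03, 1.48, 1.55, 1.51, 1.03, 0.99]
path = viterbi(depths)
```

Consecutive bins with the same decoded state form one segment. The STAY parameter controls how reluctant the model is to open a new segment, which is one knob behind the sensitivity to short segments discussed in the abstract.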
10.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39288230

ABSTRACT

Compared with analyzing omics data from a single platform, an integrative analysis of multi-omics data provides a more comprehensive understanding of the regulatory relationships among biological features associated with complex diseases. However, most existing frameworks for integrative analysis overlook two crucial aspects of multi-omics data. Firstly, they neglect the known dependencies among biological features that exist in highly credible biological databases. Secondly, most existing integrative frameworks simply remove the subjects without full omics data to handle block missingness, which decreases statistical power. To overcome these issues, we propose a network-based integrative Bayesian framework for biomarker selection and disease outcome prediction based on multi-omics data. Our framework utilizes a Dirac spike-and-slab variable selection prior to identify a small subset of biomarkers. The incorporation of gene pathway information improves the interpretability of feature selection. Furthermore, following the strategy of the FBM (which stands for "full Bayesian model with missingness") model, where missing omics data are augmented via a mechanistic model, our framework handles block missingness in multi-omics data via a data augmentation approach. The real application illustrates that our approach, which incorporates existing gene pathway information and includes subjects without DNA methylation data, yields more interpretable feature selection and more accurate predictions.


Subjects
Bayes Theorem , Biomarkers , Humans , Biomarkers/metabolism , Computational Biology/methods , Genomics/methods , Gene Regulatory Networks , Algorithms , Multiomics