ABSTRACT
BACKGROUND: Studying the co-occurrence network structure of microbial samples is one of the critical approaches to understanding the complex and delicate relationships among microbes, hosts, and diseases. A tool that combines co-occurrence network investigation with differential abundance analysis is needed to reveal disease-related taxa-taxa relationships. It is also necessary to condense the co-occurrence network into smaller modules to improve the functional annotation and interpretability of these relationships, and to retain the phylogenetic relationships among taxa when identifying differential abundance patterns, which can help resolve contradictory functions reported by different studies. RESULTS: In this article, we present Correlation and Consensus-based Cross-taxonomy Network Analysis (C3NA), a user-friendly R package for investigating compositional microbial sequencing data to identify and compare co-occurrence patterns across different taxonomic levels. C3NA contains two interactive graphical user interfaces (Shiny applications), one of which is dedicated to the comparison between two diagnoses, e.g., disease versus control. We used C3NA to analyze two well-studied diseases, colorectal cancer and Crohn's disease. We discovered clusters of study- and disease-dependent taxa that overlap with known functional taxa reported by other discovery studies and differential abundance analyses. CONCLUSION: C3NA offers a new microbial data analysis pipeline for refined and enriched taxa-taxa co-occurrence network analyses, and its usability is further expanded by the built-in Shiny applications for interactive investigation.
Subjects
Phylogeny , Consensus
ABSTRACT
The COVID-19 pandemic brought forth an urgent need for widespread genomic surveillance for rapid detection and monitoring of emerging SARS-CoV-2 variants. It necessitated design, development, and deployment of a nationwide infrastructure designed for sequestration, consolidation, and characterization of patient samples that disseminates de-identified information to public authorities in tight turnaround times. Here, we describe our development of such an infrastructure, which sequenced 594,832 high coverage SARS-CoV-2 genomes from isolates we collected in the United States (U.S.) from March 13th 2020 to July 3rd 2023. Our sequencing protocol ('Virseq') utilizes wet and dry lab procedures to generate mutation-resistant sequencing of the entire SARS-CoV-2 genome, capturing all major lineages. We also characterize 379 clinically relevant SARS-CoV-2 multi-strain co-infections and ensure robust detection of emerging lineages via simulation. The modular infrastructure, sequencing, and analysis capabilities we describe support the U.S. Centers for Disease Control and Prevention national surveillance program and serve as a model for rapid response to emerging pandemics at a national scale.
Subjects
COVID-19 , Viral Genome , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/virology , United States/epidemiology , Mutation
ABSTRACT
The microbiota has proved to be one of the critical factors for many diseases, and researchers have been using microbiome data for disease prediction. However, models trained on one independent microbiome study may not be easily applicable to other independent studies due to the high level of variability in microbiome data. In this study, we developed a method for improving the generalizability and interpretability of machine learning models for predicting three different diseases (colorectal cancer, Crohn's disease, and immunotherapy response) using nine independent microbiome datasets. Our method involves combining a smaller dataset with a larger dataset, and we found that using at least 25% of the target samples in the source data resulted in improved model performance. We determined random forest as our top model and employed feature selection to identify common and important taxa for disease prediction across the different studies. Our results suggest that this leveraging scheme is a promising approach for improving the accuracy and interpretability of machine learning models for predicting diseases based on microbiome data.
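The leveraging scheme described above can be illustrated with a minimal sketch. It assumes one reading of the abstract: a fraction (25% by default) of the target study's samples is moved into the source study's training pool, and the remaining target samples are held out for evaluation. The function name `build_leveraged_training_set` and its parameters are hypothetical and are not taken from the study; model fitting (e.g. a random forest) would happen on the returned training set and is not shown.

```python
import math
import random


def build_leveraged_training_set(source, target, target_fraction=0.25, seed=0):
    """Combine all source samples with a random fraction of target samples.

    Hypothetical helper: returns (training_samples, held_out_target_samples).
    The borrowed count is rounded up so that at least `target_fraction`
    of the target samples end up in the training pool.
    """
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(target)
    rng.shuffle(shuffled)
    n_borrowed = math.ceil(len(shuffled) * target_fraction)
    borrowed = shuffled[:n_borrowed]   # target samples added to training
    held_out = shuffled[n_borrowed:]   # target samples kept for evaluation
    return list(source) + borrowed, held_out


if __name__ == "__main__":
    source_samples = [f"source_{i}" for i in range(10)]
    target_samples = [f"target_{i}" for i in range(8)]
    train, test = build_leveraged_training_set(source_samples, target_samples)
    print(len(train), "training samples;", len(test), "held-out target samples")
```

Keeping the held-out target samples disjoint from the training pool is what lets the evaluation reflect performance on the target study rather than on samples the model has already seen.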
ABSTRACT
Proteins are rapidly and dynamically post-transcriptionally modified as cells respond to changes in their environment. For example, protein phosphorylation is mediated by kinases while dephosphorylation is mediated by phosphatases. Quantifying and predicting interactions between kinases, phosphatases, and target proteins over time will aid the study of signaling cascades under a variety of environmental conditions. Here, we describe methods to statistically analyze label-free phosphoproteomic data and infer posttranscriptional regulatory networks over time. We provide an R-based method that can be used to normalize and analyze label-free phosphoproteomic data using variance stabilizing normalization and a linear mixed model across multiple time points and conditions. We also provide a method to infer regulator-target interactions over time using a discretization scheme followed by dynamic Bayesian modeling computations to validate our conclusions. Overall, this pipeline is designed to perform functional analyses and predictions of phosphoproteomic signaling cascades.
Subjects
Phosphoproteins , Proteomics , Bayes Theorem , Phosphoproteins/metabolism , Proteomics/methods , Signal Transduction , Phosphorylation , Phosphotransferases/metabolism , Phosphoric Monoester Hydrolases/metabolism
ABSTRACT
The 2022 global Mpox outbreak swiftly introduced unforeseen diversity in the monkeypox virus (MPXV) population, resulting in numerous Clade IIb sublineages. This propagation of new MPXV mutations warrants a thorough re-investigation of previously recommended or validated primers designed to target MPXV genomes. In this study, we explored 18 PCR primer sets and examined their binding specificity against 5210 MPXV genomes, representing all the established MPXV lineages. Our results indicated that only five primer sets produced almost exclusively perfect matches against the targeted MPXV lineages, while the remaining primer sets all contained 1-2 mismatches against almost all the MPXV lineages. We further investigated the mismatched primer-genome pairs and discovered that some of the primers overlapped with poorly sequenced and assembled regions of the MPXV genomes, which are consistent across multiple lineages. However, we identified 173 regions with 99% genome-wide conservation across all 5210 MPXV genomes, representing 30 lineages/clades with at least 80% lineage-specific consensus, for future primer development and primer binding evaluation. This exercise is crucial to ensure that current detection schemes are robust, and it can serve as a framework for primer evaluation in clinical testing development for other infectious diseases.
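As an illustration of what counting primer-genome mismatches involves, the sketch below slides a primer along a genome sequence and reports the ungapped position with the fewest mismatches. It is a simplified stand-in, not the study's pipeline: it scans only the forward strand, ignores IUPAC ambiguity codes and gaps, and the names `hamming` and `best_primer_match` are hypothetical.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_primer_match(primer, genome):
    """Return (position, mismatches) of the best ungapped forward-strand hit.

    Ties are resolved in favour of the leftmost position.
    """
    primer, genome = primer.upper(), genome.upper()
    if not primer or len(primer) > len(genome):
        raise ValueError("primer must be non-empty and no longer than genome")
    best_pos, best_mm = -1, len(primer) + 1
    for start in range(len(genome) - len(primer) + 1):
        window = genome[start:start + len(primer)]
        mismatches = hamming(primer, window)
        if mismatches < best_mm:
            best_pos, best_mm = start, mismatches
            if mismatches == 0:      # a perfect match cannot be improved on
                break
    return best_pos, best_mm


if __name__ == "__main__":
    toy_genome = "ACGTTGCAAGGCTA"    # toy sequence, not real MPXV data
    print(best_primer_match("TGCAAG", toy_genome))
    print(best_primer_match("TGCTAG", toy_genome))
```

A real evaluation would also check the reverse complement for reverse primers and would typically use an aligner rather than an exhaustive scan on genomes of this size.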
Subjects
Biological Assay , Monkeypox virus , Humans , Consensus , Disease Outbreaks , Monkeypox virus/genetics , Polymerase Chain Reaction
ABSTRACT
Molecular biology aims to understand cellular responses and regulatory dynamics in complex biological systems. However, these studies remain challenging in non-model species due to poor functional annotation of regulatory proteins. To overcome this limitation, we develop a multi-layer neural network that determines protein functionality directly from the protein sequence. We annotate kinases and phosphatases in Glycine max. We use the functional annotations from our neural network, Bayesian inference principles, and high-resolution phosphoproteomics to infer phosphorylation signaling cascades in soybean exposed to cold, and identify Glyma.10G173000 (TOI5) and Glyma.19G007300 (TOT3) as key temperature regulators. Importantly, the signaling cascade inference does not rely upon known kinase motifs or interaction data, enabling de novo identification of kinase-substrate interactions. Our neural network shows generalization and scalability; as such, we extend our predictions to Oryza sativa, Zea mays, Sorghum bicolor, and Triticum aestivum. Taken together, we develop a signaling inference approach for non-model species that leverages our predicted kinases and phosphatases.
Subjects
Signal Transduction , Transcription Factors , Bayes Theorem , Transcription Factors/metabolism , Phosphorylation
ABSTRACT
Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Taxonomic Unit (OTU) count matrices. However, alternate approaches such as Amplicon Sequence Variants (ASV) have been used, as well as the direct use of k-mer sequence counts. The overall effect of these different types of predictors when used in concert with various machine learning methods has been difficult to assess, due to varied combinations described in the literature. Here we provide an in-depth investigation of more than 1,000 combinations of these three clustering/counting methods, in combination with varied choices for normalization and filtering, grouping at various taxonomic levels, and the use of more than ten commonly used machine learning methods for phenotype prediction. The use of short k-mers, which have computational advantages and conceptual simplicity, is shown to be effective as a source for microbiome-based prediction. Among machine-learning approaches, tree-based methods show consistent, though modest, advantages in prediction accuracy. We describe the various advantages and disadvantages of combinations in analysis approaches, and provide general observations to serve as a useful guide for future trait-prediction explorations using microbiome data.
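To make the k-mer predictor type concrete, the sketch below counts overlapping k-mers in sequencing reads and pools them into a per-sample profile. It is an illustrative assumption about how such counts can be built, not the study's implementation: the names `kmer_counts` and `sample_kmer_profile` are hypothetical, and windows containing characters other than A, C, G, or T are simply skipped.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(sequence, k):
    """Count overlapping k-mers in one sequence, skipping ambiguous windows."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    counts = Counter()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= VALID_BASES:   # ignore windows with N or other codes
            counts[kmer] += 1
    return counts


def sample_kmer_profile(reads, k):
    """Pool k-mer counts over all reads belonging to one sample."""
    profile = Counter()
    for read in reads:
        profile.update(kmer_counts(read, k))
    return profile


if __name__ == "__main__":
    toy_reads = ["ACGT", "CGTA"]       # toy reads, not real 16S data
    print(sample_kmer_profile(toy_reads, 3))
```

Because there are at most 4^k distinct k-mers, short k values give a fixed-width feature vector per sample without any clustering or taxonomic assignment step, which is the conceptual simplicity the abstract refers to.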
ABSTRACT
BACKGROUND: While the COVID-19 pandemic presents a global challenge, the U.S. response places substantial responsibility for both decision-making and communication on local health authorities, necessitating tools to support decision-making at the community level. OBJECTIVES: We created a Pandemic Vulnerability Index (PVI) to support counties and municipalities by integrating baseline data on relevant community vulnerabilities with dynamic data on local infection rates and interventions. The PVI visually synthesizes county-level vulnerability indicators, enabling their comparison in regional, state, and nationwide contexts. METHODS: We describe the data streams used and how these are combined to calculate the PVI, detail the supporting epidemiological modeling and machine-learning forecasts, and outline the deployment of an interactive web Dashboard. Finally, we describe the practical application of the PVI for real-world decision-making. RESULTS: Considering an outlook horizon from 1 to 28 days, the overall PVI scores are significantly associated with key vulnerability-related outcome metrics of cumulative deaths, population adjusted cumulative deaths, and the proportion of deaths from cases. The modeling results indicate the most significant predictors of case counts are population size, proportion of black residents, and mean PM2.5. The machine learning forecast results were strongly predictive of observed cases and deaths up to 14 days ahead. The modeling reinforces an integrated concept of vulnerability that accounts for both dynamic and static data streams and highlights the drivers of inequities in COVID-19 cases and deaths. These results also indicate that local areas with a highly ranked PVI should take near-term action to mitigate vulnerability. 
DISCUSSION: The COVID-19 PVI Dashboard monitors multiple data streams to communicate county-level trends and vulnerabilities and facilitates decision-making and communication among government officials, scientists, community leaders, and the public to enable effective and coordinated action to combat the pandemic.
ABSTRACT
Autophagy, a form of lysosomal degradation capable of eliminating dysfunctional proteins and organelles, is a cellular process associated with homeostasis. Autophagy functions in cell survival by breaking down proteins and organelles and recycling them to meet metabolic demands. However, aberrant upregulation of autophagy can function as an alternative to apoptosis. The duality of autophagy, and its regulation of cell survival and death, intimately links it with human disease. Non-coding RNAs regulate mRNA levels and elicit diverse effects on mammalian protein expression. The most studied non-coding RNAs to date are microRNAs (miRNAs). MicroRNAs function in post-transcriptional regulation, causing profound changes in protein levels, and affect many biological processes and diseases. The role and regulation of autophagy, and whether it is beneficial or harmful, remain controversial in cardiovascular disease. A number of recent studies have identified miRNAs that target autophagy-related proteins and influence the development, progression, or treatment of cardiovascular disease. Understanding the mechanisms by which these miRNAs work can provide insight and support progress towards the development of therapeutic treatments for cardiovascular disease.
Subjects
Autophagy/genetics , Cardiovascular Diseases/genetics , Cardiovascular Diseases/pathology , MicroRNAs/genetics , Animals , Autophagy/physiology , Cardiovascular Diseases/physiopathology , Diabetic Cardiomyopathies/genetics , Diabetic Cardiomyopathies/pathology , Diabetic Cardiomyopathies/physiopathology , Humans , MicroRNAs/metabolism , Cardiovascular Models , Myocardial Ischemia/genetics , Myocardial Ischemia/pathology , Myocardial Ischemia/physiopathology , Ventricular Remodeling/genetics , Ventricular Remodeling/physiology
ABSTRACT
Accumulated evidence suggests that DEPTOR, a naturally occurring inhibitor of mTOR, plays a pivotal role in suppressing the progression of human malignancies. However, the function of DEPTOR in the development of esophageal squamous cell carcinoma (ESCC) is still unclear. Here we report that the expression of DEPTOR is significantly reduced in tumor tissues derived from human patients with ESCC, and that the downregulation of DEPTOR predicts a poor prognosis for ESCC patients. In addition, we found that the expression of DEPTOR negatively regulates the tumorigenic activities of ESCC cell lines (KYSE150, KYSE510 and KYSE190). Furthermore, ectopic DEPTOR expression caused a significant suppression of the cellular proliferation, migration and invasion of KYSE150 cells, which have the lowest expression level of DEPTOR among the three cell lines. Meanwhile, CRISPR/Cas9-mediated knockout of DEPTOR in KYSE510 cells significantly promoted cellular proliferation, migration and invasion. In addition, in vivo assays further revealed that tumor growth was significantly inhibited in xenografts with ectopic DEPTOR expression as compared to untreated KYSE150 cells, and was markedly enhanced in DEPTOR knockout KYSE510 cells. Biochemical studies revealed that overexpression of DEPTOR led to the suppression of the AKT/mTOR pathway, as evidenced by reduced phosphorylation of AKT, mTOR and downstream SGK1, indicating that DEPTOR might control the progression of ESCC through the AKT/mTOR signaling pathway. Thus, these findings demonstrate, for the first time, that DEPTOR inhibits the tumorigenesis of ESCC cells and might serve as a potential therapeutic target or prognostic marker for human patients with ESCC.