ABSTRACT
The COVID-19 pandemic brought forth an urgent need for widespread genomic surveillance for rapid detection and monitoring of emerging SARS-CoV-2 variants. It necessitated design, development, and deployment of a nationwide infrastructure designed for sequestration, consolidation, and characterization of patient samples that disseminates de-identified information to public authorities in tight turnaround times. Here, we describe our development of such an infrastructure, which sequenced 594,832 high coverage SARS-CoV-2 genomes from isolates we collected in the United States (U.S.) from March 13th 2020 to July 3rd 2023. Our sequencing protocol ('Virseq') utilizes wet and dry lab procedures to generate mutation-resistant sequencing of the entire SARS-CoV-2 genome, capturing all major lineages. We also characterize 379 clinically relevant SARS-CoV-2 multi-strain co-infections and ensure robust detection of emerging lineages via simulation. The modular infrastructure, sequencing, and analysis capabilities we describe support the U.S. Centers for Disease Control and Prevention national surveillance program and serve as a model for rapid response to emerging pandemics at a national scale.
Subjects
COVID-19, Viral Genome, SARS-CoV-2, Humans, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, COVID-19/epidemiology, COVID-19/virology, United States/epidemiology, Mutation
ABSTRACT
BACKGROUND: University spring break carries a two-pronged SARS-CoV-2 variant transmission risk. Circulating variants from universities can spread to spring break destinations, and variants from spring break destinations can spread to universities and surrounding communities. Therefore, it is critical to implement SARS-CoV-2 variant surveillance and testing strategies to limit community spread before and after spring break to mitigate virus transmission and facilitate universities safely returning to in-person teaching. METHODS: We examined the SARS-CoV-2 positivity rate and changes in variant lineages before and after the university spring break for two consecutive years. 155 samples were sequenced across four time periods: pre- and post-spring break 2021 and pre- and post-spring break 2022; following whole genome sequencing, samples were assigned clades. The clades were then paired with positivity and testing data from over 50,000 samples. RESULTS: In 2021, the number of variants in the observed population increased from four to nine over spring break, with variants of concern being responsible for most of the cases; Alpha percent composition increased from 22.2% to 56.4%. In 2022, the number of clades in the population increased only from two to three, all of which were Omicron or a sub-lineage of Omicron. However, phylogenetic analysis showed the emergence of distantly related sub-lineages. 2022 saw a greater increase in positivity than 2021, which coincided with a milder mitigation strategy. Analysis of social media data provided insight into student travel destinations and how those travel events may have impacted spread. CONCLUSIONS: We show the role that repetitive testing can play in transmission mitigation, reducing community spread, and maintaining in-person education. We identified that distantly related lineages were brought to the area after spring break travel regardless of the presence of a dominant variant of concern.
Subjects
COVID-19, SARS-CoV-2, Travel, Humans, COVID-19/transmission, COVID-19/prevention & control, COVID-19/epidemiology, COVID-19/virology, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, Universities, Whole Genome Sequencing, Phylogeny, Seasons
ABSTRACT
BACKGROUND: Multiple tools including Accreditation Council for Graduate Medical Education (ACGME) standardized milestones can be utilized to assess trainee and residency program performance. However, little is known regarding the objective validation of these tools in predicting written board passage. METHODS: In this retrospective study, data were gathered on n = 45 Wayne State University Obstetrics and Gynecology program graduates over the five-year period ending July 2018. United States Medical Licensing Examination (USMLE) scores, Council on Resident Education in Obstetrics and Gynecology (CREOG) in-training scores and ACGME milestones were used to predict American Board of Obstetrics and Gynecology (ABOG) board passage success on first attempt. Significance was set at p < 0.05. RESULTS: Written board passage was associated with average CREOGs (p = 0.01) and milestones (p = 0.008) while USMLE1 was not significantly associated (p = 0.055). USMLE1 <217 (positive predictive value (PPV) = 96%), CREOGs <197 (PPV = 100%), and milestones <3.25 (PPV = 100%), particularly the practice-based learning and systems-based practice milestones, were most strongly correlated with board failure. Using a combination of these two milestones, it is possible to correctly predict board passage using our model (PPV = 86%). DISCUSSION: This study is the first to validate the utility of milestones in a surgical specialty by demonstrating their ability to predict board passage. Residents with CREOGs or milestones below thresholds are at risk for board failure and may warrant early intervention.
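The thresholds in this abstract are reported with positive predictive values. As a reminder of how such a figure is computed, here is a minimal sketch. The function name and the counts are hypothetical and are not taken from the study, and which outcome counts as "positive" is an assumption, since the abstract does not report its 2×2 tables.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the share of flagged cases that truly have the outcome."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no cases are flagged")
    return true_positives / flagged


# Hypothetical counts, for illustration only (not data from the study).
example_ppv = positive_predictive_value(true_positives=24, false_positives=1)
print(f"PPV = {example_ppv:.0%}")
```

Because PPV depends on how common the outcome is in the cohort, a value obtained in one residency program would not necessarily carry over to another.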
ABSTRACT
AIM: The study aims to determine resident applicant metrics most predictive of academic and clinical performance as measured by the Council on Resident Education in Obstetrics and Gynecology (CREOG) examination scores and Accreditation Council for Graduate Medical Education (ACGME) clinical performance (Milestones) in the aftermath of United States Medical Licensing Examination (USMLE) Step 1 becoming a pass/fail examination. METHODS: In this retrospective study, electronic and paper documents for Wayne State University Obstetrics and Gynecology residents matriculated over a 5-year period ending July 2018 were collected. USMLE scores, clerkship grade, and wording on the letters of recommendation as well as Medical Student Performance Evaluation (MSPE) were extracted from the Electronic Residency Application Service (ERAS) and scored numerically. Semiannual Milestone evaluations and yearly CREOG scores were used as markers of resident performance. Statistical analysis on residents (n = 75) was performed using R and SPSS and significance was set at P < .05. RESULTS: Mean USMLE score correlated with CREOG performance and, of all 3 Steps, Step 1 had the tightest association. MSPE and class percentile also correlated with CREOGs. Clerkship grade and recommendation letters had no correlation with resident performance. Of all metrics provided by ERAS, none, taken alone, was as useful as Step 1 scores at predicting performance in residency. Regression modeling demonstrated that the combination of Step 2 scores with MSPE wording restored the predictive ability lost by Step 1. CONCLUSIONS: The change of USMLE Step 1 to pass/fail may alter resident selection strategies. Other objective markers are needed in order to evaluate an applicant's future performance in residency.
ABSTRACT
Long-acting reversible contraception (LARC) has the potential to decrease unintended pregnancies but only if women can easily access a requested method. Retrospective electronic chart review identified women desiring LARC placement over a one-year period ending 31 December 2016. Most of the 311 insertions were for family planning, with 220 new insertions and 60 replacements. Delays occurred in 38% (n = 118) of patients, averaged 5 ± 5 weeks, and 47% received interval contraception. Reasons included absence of qualified provider (n = 44, 37%), pending cultures (n = 31, 26%), and Mirena availability. Teenage LARC use favored Nexplanon whereas older women preferred Mirena (p < 0.01). Of the 11% choosing early LARC removal, a significant number were African Americans (p = 0.040) or teenagers (p = 0.048). Retention time varied by device type; most patients switched to other contraceptives. No patients experienced IUD expulsion. Understanding barriers, attempting to remedy them, and addressing the side effects associated with LARC use are important in this inner-city patient population in the United States.
ABSTRACT
Single-cell RNA-seq (scRNA-seq) has become a powerful technique for measuring the transcriptome of individual cells. Unlike bulk measurements that average gene expression over the individual cells, gene measurements at individual cells can be used to study several different tissues and organs at different stages. Identifying the cell types present in the sample from the single-cell transcriptome data is a common goal in many single-cell experiments. Several methods have been developed to do this. However, correctly identifying the true cell types remains a challenge. We present a framework that addresses this problem. Our hypothesis is that the meaningful characteristics of the data will remain despite small perturbations of the data. We validate the performance of the proposed method on eight publicly available scRNA-seq datasets with known cell types as well as five simulation datasets with different degrees of cluster separability. We compare the proposed method with five other existing methods: RaceID, SNN-Cliq, SINCERA, SEURAT, and SC3. The results show that the proposed method performs better than the existing methods.
Subjects
Algorithms, High-Throughput Nucleotide Sequencing, RNA Sequence Analysis, Single-Cell Analysis, Transcriptome, Cluster Analysis, Computer Simulation
ABSTRACT
In spite of the efforts in developing and maintaining accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curation of the biomedical literature in an effort to extract this information is a challenging task due to: (i) the complexity of natural language processing, (ii) inconsistent use of standard recommendations for variant description, and (iii) the lack of clarity and consistency in describing the variant-genotype-phenotype associations in the biomedical literature. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts the variant-gene-disease associations from the full-length biomedical literature and designs an evidence-based variant-driven gene panel for a given condition. We validate the identified genes by showing their diagnostic abilities to predict the patients' clinical outcome on several independent validation cohorts. As representative examples, we present our results for acute myeloid leukemia (AML), breast cancer and prostate cancer. We compare these panels with other variant-driven gene panels obtained from Clinvar, Mastermind and others from literature, as well as with a panel identified with a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework yield better results than the other gene panels currently available in the literature.
Subjects
Breast Neoplasms/genetics, Data Mining, Genetic Databases, Acute Myeloid Leukemia/genetics, Natural Language Processing, Prostatic Neoplasms/genetics, Female, Genetic Association Studies, Humans, Male
ABSTRACT
With the explosion of high-throughput data, effective integrative analyses are needed to decipher the knowledge accumulated in biological databases. Existing meta-analysis approaches in systems biology often focus on hypothesis testing and neglect real expression changes, i.e. effect sizes, across independent studies. In addition, most integrative tools completely ignore the topological order of gene regulatory networks that hold key characteristics in understanding biological processes. Here we introduce a novel meta-analysis framework, Network-Based Integrative Analysis (NBIA), that transforms the challenging meta-analysis problem into a set of standard pathway analysis problems that have been solved efficiently. NBIA utilizes techniques from classical and modern meta-analysis, as well as a network-based analysis, in order to identify patterns of genes and networks that are consistently impacted across multiple studies. We assess the performance of NBIA by comparing it with nine meta-analysis approaches: Impact Analysis, GSA, and GSEA combined with classical meta-analysis methods (Fisher's and the additive method), plus the three MetaPath approaches that employ multiple datasets. The 10 approaches have been tested on 1,737 samples from 27 expression datasets related to Alzheimer's disease, acute myeloid leukemia (AML), and influenza. For all three diseases, NBIA consistently identifies biological pathways relevant to the underlying diseases while the other 9 methods fail to capture the key phenomena. The identified AML signature is also validated on a completely independent cohort of 167 AML patients. In this independent cohort, the proposed signature identifies two groups of patients that have significantly different survival profiles (Cox p-value 2 × 10⁻⁶). The NBIA framework will be included in the next release of the BLMA Bioconductor package (http://bioconductor.org/packages/release/bioc/html/BLMA.html).
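Fisher's method, one of the classical approaches named in this abstract, combines only the per-study p-values, which illustrates the point about hypothesis-testing approaches neglecting effect sizes. The following is a minimal sketch of the textbook method, not of NBIA or of the BLMA package; the function name and the input p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, tail_sum = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # (x/2)^i / i!, built incrementally
        tail_sum += term
    return math.exp(-half_stat) * tail_sum


# Hypothetical per-study p-values for one pathway (illustrative only).
print(fisher_combined_p([0.01, 0.02, 0.03]))
```

The closed-form tail avoids a dependency on a statistics library. It assumes the studies are independent, which is the standard assumption behind Fisher's method.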
ABSTRACT
MOTIVATION: Recent advances in biomedical research have made massive amounts of transcriptomic data available in public repositories from different sources. Due to the heterogeneity present in the individual experiments, identifying reproducible biomarkers for a given disease from multiple independent studies has become a major challenge. The widely used meta-analysis approaches, such as Fisher's method, Stouffer's method, minP and maxP, have at least two major limitations: (i) they are sensitive to outliers, and (ii) they perform only one statistical test for each individual study, and hence do not fully utilize the potential sample size to gain statistical power. RESULTS: Here, we propose a gene-level meta-analysis framework that overcomes these limitations and identifies a gene signature that is reliable and reproducible across multiple independent studies of a given disease. The approach provides a comprehensive global signature that can be used to understand the underlying biological phenomena, and a smaller test signature that can be used to classify future samples of a given disease. We demonstrate the utility of the framework by constructing disease signatures for influenza and Alzheimer's disease using nine datasets including 1108 individuals. These signatures are then validated on 12 independent datasets including 912 individuals. The results indicate that the proposed approach performs better than the majority of the existing meta-analysis approaches in terms of both sensitivity and specificity. The proposed signatures could be further used in diagnosis, prognosis and identification of therapeutic targets. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Gene Expression Profiling, Transcriptome, Biomarkers, Humans, Sample Size, Sensitivity and Specificity
ABSTRACT
Following publication of the original paper [1], the authors reported the following update to the competing interests declaration.
ABSTRACT
BACKGROUND: Many high-throughput experiments compare two phenotypes such as disease vs. healthy, with the goal of understanding the underlying biological phenomena characterizing the given phenotype. Because of the importance of this type of analysis, more than 70 pathway analysis methods have been proposed so far. These can be categorized into two main categories: non-topology-based (non-TB) and topology-based (TB). Although some review papers discuss this topic from different aspects, there is no systematic, large-scale assessment of such methods. Furthermore, the majority of the pathway analysis approaches rely on the assumption of uniformity of p values under the null hypothesis, which is often not true. RESULTS: This article presents the most comprehensive comparative study on pathway analysis methods available to date. We compare the actual performance of 13 widely used pathway analysis methods in over 1085 analyses. These comparisons were performed using 2601 samples from 75 human disease data sets and 121 samples from 11 knockout mouse data sets. In addition, we investigate the extent to which each method is biased under the null hypothesis. Together, these data and results constitute a reliable benchmark against which future pathway analysis methods could and should be tested. CONCLUSION: Overall, the result shows that no method is perfect. In general, TB methods appear to perform better than non-TB methods. This is somewhat expected since the TB methods take into consideration the structure of the pathway which is meant to describe the underlying phenomena. We also discover that most, if not all, listed approaches are biased and can produce skewed results under the null.
Subjects
Genomics/methods, Animals, Humans, Statistics as Topic
ABSTRACT
Although massive amounts of condition-specific molecular profiles are being accumulated in public repositories every day, meaningful interpretation of these data remains a major challenge. In an effort to identify the biomarkers that describe the key biological phenomena for a given condition, several approaches have been developed over the past few years. However, the majority of these approaches either (i) do not consider the known intermolecular interactions, or (ii) do not integrate molecular data of multiple types (e.g., genomics, transcriptomics, proteomics, epigenomics, etc.), and thus potentially fail to capture the true biological changes responsible for complex diseases (e.g., cancer). In addition, these approaches often ignore the heterogeneity and study bias present in independent molecular cohorts. In this manuscript, we propose a novel multi-cohort and multi-omics meta-analysis framework that overcomes all three limitations mentioned above in order to identify robust molecular subnetworks that capture the key dynamic nature of a given biological condition. Our framework integrates multiple independent gene expression studies, unmatched DNA methylation studies, and protein-protein interactions to identify methylation-driven subnetworks. We demonstrate the proposed framework by constructing subnetworks related to two complex diseases: glioblastoma and low-grade gliomas. We validate the identified subnetworks by showing their ability to predict patients' clinical outcome on multiple independent validation cohorts.
ABSTRACT
A recent focus of computational biology has been to integrate the complementary information available in molecular profiles as well as in multiple network databases in order to identify connected regions that show significant changes under different conditions. This allows for capturing dynamic and condition-specific mechanisms of the underlying phenomena and disease stages. Here we review 22 such integrative approaches for active module identification published over the last decade. This article only focuses on tools that are currently available for use and are well-maintained. We compare these methods focusing on their primary features, integrative abilities, network structures, mathematical models, and implementations. We also provide real-world scenarios in which these methods have been successfully applied, as well as highlight outstanding challenges in the field that remain to be addressed. The main objective of this review is to help potential users and researchers to choose the best method that is suitable for their data and analysis purpose.
ABSTRACT
Motivation: Identification of novel therapeutic effects for existing US Food and Drug Administration (FDA)-approved drugs, drug repurposing, is an approach aimed to dramatically shorten the drug discovery process, which is costly, slow and risky. Several computational approaches use transcriptional data to find potential repurposing candidates. The main hypothesis of such approaches is that if gene expression signature of a particular drug is opposite to the gene expression signature of a disease, that drug may have a potential therapeutic effect on the disease. However, this may not be optimal since it fails to consider the different roles of genes and their dependencies at the system level. Results: We propose a systems biology approach to discover novel therapeutic roles for established drugs that addresses some of the issues in the current approaches. To do so, we use publicly available drug and disease data to build a drug-disease network by considering all interactions between drug targets and disease-related genes in the context of all known signaling pathways. This network is integrated with gene-expression measurements to identify drugs with new desired therapeutic effects based on a system-level analysis method. We compare the proposed approach with the drug repurposing approach proposed by Sirota et al. on four human diseases: idiopathic pulmonary fibrosis, non-small cell lung cancer, prostate cancer and breast cancer. We evaluate the proposed approach based on its ability to re-discover drugs that are already FDA-approved for a given disease. Availability and implementation: The R package DrugDiseaseNet is under review for publication in Bioconductor and is available at https://github.com/azampvd/DrugDiseaseNet. Supplementary information: Supplementary data are available at Bioinformatics online.
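The "opposite signature" hypothesis described in this abstract can be illustrated with a simple correlation over shared genes. This is a deliberately simplified sketch: it is neither the scoring used by Sirota et al. nor the network-based method the abstract proposes, and the function names, gene labels, and fold-change values are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signature")
    return cov / math.sqrt(var_x * var_y)


def reversal_score(disease_signature, drug_signature):
    """Correlate log fold-changes over the genes both signatures share.

    A strongly negative value means the drug tends to push expression in
    the direction opposite to the disease.
    """
    shared = sorted(set(disease_signature) & set(drug_signature))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    return pearson(
        [disease_signature[g] for g in shared],
        [drug_signature[g] for g in shared],
    )


# Hypothetical log fold-changes (illustrative only, not real measurements).
disease = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 0.8}
drug = {"GENE_A": -1.8, "GENE_B": 1.2, "GENE_C": -0.5, "GENE_D": 0.3}
print(reversal_score(disease, drug))
```

A score like this treats every gene equally and independently, which is the limitation the abstract raises: it ignores the different roles of genes and their dependencies at the system level.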
Subjects
Drug Repositioning, Neoplasms/drug therapy, Systems Biology, Drug Discovery/methods, Drug Repositioning/methods, Humans, Transcriptome
ABSTRACT
DNA methylation is an important epigenetic mechanism that plays a crucial role in cellular regulatory systems. Recent advancements in sequencing technologies now enable us to generate high-throughput methylation data and to measure methylation up to single-base resolution. This wealth of data does not come without challenges, and one of the key challenges in DNA methylation studies is to identify the significant differences in the methylation levels of the base pairs across distinct biological conditions. Several computational methods have been developed to identify differential methylation using bisulfite sequencing data; however, there is no clear consensus among existing approaches. A comprehensive survey of these approaches would be of great benefit to potential users and researchers to get a complete picture of the available resources. In this article, we present a detailed survey of 22 such approaches focusing on their underlying statistical models, primary features, key advantages and major limitations. Importantly, the intrinsic drawbacks of the approaches pointed out in this survey could potentially be addressed by future research.