Results 1 - 20 of 25
1.
BMC Bioinformatics ; 25(1): 66, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38347515

ABSTRACT

BACKGROUND: DNA methylation is one of the most stable and well-characterized epigenetic alterations in humans. Accordingly, it has already found clinical utility as a molecular biomarker in a variety of disease contexts. Existing methods for clinical diagnosis of methylation-related disorders focus on outlier detection at a small number of CpG sites, using standardized cutoffs that differentiate healthy from abnormal methylation levels. These standardized cutoff values do not take into account methylation patterns that are known to differ between the sexes and with age. RESULTS: Here we profile genome-wide DNA methylation in blood samples from a cohort composed of healthy controls of different ages and sexes alongside patients with Prader-Willi syndrome (PWS), Beckwith-Wiedemann syndrome, Fragile-X syndrome, Angelman syndrome, and Silver-Russell syndrome. We propose a Generalized Additive Model to perform age- and sex-adjusted outlier analysis of around 700,000 CpG sites throughout the human genome. Using z-scores computed within the cohort for each site, we deployed an ensemble-based machine learning pipeline and achieved a combined prediction accuracy of 0.96 (binomial 95% confidence interval 0.868-0.995). CONCLUSION: We demonstrate a method for age- and sex-adjusted outlier detection of differentially methylated loci based on a large cohort of healthy individuals. We present a custom machine learning pipeline that uses this outlier analysis to classify samples for potential methylation-associated congenital disorders. When combined with machine learning classifiers, these methods achieve high accuracy in classifying abnormal methylation patterns.


Subjects
Beckwith-Wiedemann Syndrome , Silver-Russell Syndrome , Humans , Genomic Imprinting , DNA Methylation , Beckwith-Wiedemann Syndrome/diagnosis , Beckwith-Wiedemann Syndrome/genetics , Silver-Russell Syndrome/diagnosis , Silver-Russell Syndrome/genetics , Supervised Machine Learning
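The age- and sex-adjusted outlier idea in record 1 can be sketched for a single CpG site as follows. This is a simplified stand-in, not the authors' pipeline: it replaces their Generalized Additive Model with a per-sex ordinary least squares line in age, and the function names, the toy beta values, and the |z| >= 3 cutoff are all hypothetical.

```python
import math


def fit_age_trend(ages, betas):
    """OLS fit of beta = a + b * age; returns (a, b, residual SD with n - 2 df)."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_beta = sum(betas) / n
    sxx = sum((x - mean_age) ** 2 for x in ages)
    sxy = sum((x - mean_age) * (y - mean_beta) for x, y in zip(ages, betas))
    b = sxy / sxx
    a = mean_beta - b * mean_age
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, betas))
    return a, b, math.sqrt(rss / (n - 2))


def fit_reference(controls):
    """controls: iterable of (sex, age, beta) for ONE CpG site; one trend per sex."""
    by_sex = {}
    for sex, age, beta in controls:
        ages, betas = by_sex.setdefault(sex, ([], []))
        ages.append(age)
        betas.append(beta)
    return {sex: fit_age_trend(ages, betas) for sex, (ages, betas) in by_sex.items()}


def adjusted_z(reference, sex, age, beta):
    """z-score of a sample's beta value relative to its age/sex-matched expectation."""
    a, b, sd = reference[sex]
    return (beta - (a + b * age)) / sd


def is_outlier(z, cutoff=3.0):
    # Hypothetical cutoff, not taken from the paper.
    return abs(z) >= cutoff


controls = [("F", 10, 0.52), ("F", 20, 0.51), ("F", 30, 0.54), ("F", 40, 0.53),
            ("M", 10, 0.32), ("M", 20, 0.31), ("M", 30, 0.34), ("M", 40, 0.33)]
reference = fit_reference(controls)
print(adjusted_z(reference, "F", 25, 0.575), adjusted_z(reference, "M", 25, 0.575))
```

Fitting the reference separately for each sex is what lets the same beta value be unremarkable in one group and an outlier in the other; a GAM would additionally allow a non-linear age trend.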
2.
BMC Genomics ; 25(1): 371, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627676

ABSTRACT

BACKGROUND: X-chromosome inactivation (XCI) is an epigenetic process that occurs during early development in mammalian females by randomly silencing one of the two copies of the X chromosome in each cell. Preferential inactivation of either the maternal or the paternal X chromosome in a majority of cells results in a skewed, or non-random, pattern of X inactivation, which is observed in over 25% of adult females. Identifying skewed X inactivation is clinically significant in patients with suspected rare genetic diseases because disease-causing genes present on the active X chromosome may show biased expression. The current clinical test for the detection of skewed XCI relies on the methylation status of the binding site of the methylation-sensitive restriction enzyme HpaII, located near polymorphic short tandem repeats in the androgen receptor (AR) gene. Because this approach uses a single locus, 10-20% of tests yield uninformative or inconclusive data. Further, recent studies have shown inconsistency between methylation of the AR locus and the inactivation state of the X chromosome. Herein, we develop a method for estimating X inactivation status using exome and transcriptome sequencing data derived from blood in 227 female samples. We built a reference model for evaluation of XCI in 135 females from the GTEx consortium. We tested and validated the model on 11 female individuals with different types of undiagnosed rare genetic disorders who were clinically tested for X-skew using the AR gene assay, and compared the results with our outlier-based analysis technique. RESULTS: Compared with the AR clinical test for identification of X inactivation, our method was concordant in 9 samples, discordant in 1, and provided a measure of X inactivation in 1 sample with uninformative clinical results. We applied this method to an additional 81 females presenting to the clinic with phenotypes consistent with different hereditary disorders but without a known genetic diagnosis. CONCLUSIONS: This study presents the use of transcriptome and exome sequencing data to provide an accurate and complete estimation of X inactivation and skew status in a cohort of female patients with different types of suspected rare genetic disease.


Subjects
Exome , X Chromosome Inactivation , Adult , Humans , Female , Transcriptome , Exome Sequencing , Chromosomes, Human, X/genetics
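Record 2 estimates XCI skew from exome and transcriptome data. One common way to do this, sketched below under stated assumptions, is to take X-linked SNPs called heterozygous in the exome and measure how unevenly their two alleles are expressed in RNA. This is not the authors' GTEx-referenced outlier model; the read-depth filter, the use of the median, and the 0.8 (80:20) skew threshold are illustrative assumptions, and a real analysis would also need to exclude genes known to escape X inactivation.

```python
from statistics import median


def folded_allele_fraction(ref_reads, alt_reads):
    """Fraction of RNA reads carrying the more highly expressed allele (0.5 to 1.0)."""
    return max(ref_reads, alt_reads) / (ref_reads + alt_reads)


def estimate_xci_skew(het_sites, min_depth=20):
    """het_sites: (ref_reads, alt_reads) RNA counts at X-linked SNPs that the
    exome calls heterozygous. min_depth is a hypothetical coverage filter."""
    fractions = [folded_allele_fraction(r, a) for r, a in het_sites if r + a >= min_depth]
    if not fractions:
        return None  # no informative sites, analogous to an uninformative AR assay
    return median(fractions)


def classify_xci(skew, threshold=0.8):
    # The 0.8 threshold mirrors the common 80:20 convention; it is an assumption here.
    if skew is None:
        return "uninformative"
    return "skewed" if skew >= threshold else "random"


print(classify_xci(estimate_xci_skew([(45, 5), (40, 10), (3, 2)])))
```

Folding each site to the major-allele fraction sidesteps the fact that, without phasing, we do not know which allele at each SNP lies on the same X chromosome.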
3.
Clin Chem ; 69(7): 711-717, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37086467

ABSTRACT

BACKGROUND: Large β-globin gene cluster deletions (hereditary persistence of fetal hemoglobin [Hb] or β-, δβ-, γδβ-, and εγδβ-thalassemia) are associated with widely disparate phenotypes, including variable degrees of microcytic anemia and variable Hb F levels. When present, increased Hb A2 is used as a surrogate marker for β-thalassemia. Notably, εγδβ-thalassemias lack the essential regulatory locus control region (LCR) and cause severe transient perinatal anemia but normal newborn screen (NBS) results and Hb A2 levels. Herein, we report a novel deletion of the ε, Aγ, Gγ, and ψβ loci with intact LCR, δ-, and β-regions in 2 women and newborn twins. METHODS: Capillary electrophoresis (CE), high-performance liquid chromatography (HPLC), DNA sequencing, multiplex ligation-dependent probe amplification (MLPA), gap-polymerase chain reaction (gap-PCR), and long-read sequencing (LRS) were performed. RESULTS: NBS showed an Hb A > Hb F pattern in both twins. At 20 months, Hb A2 was increased to a similar degree as in the mother and an unrelated woman. Unexplained microcytosis was absent, and the twins lacked severe neonatal anemia. MLPA, LRS, and gap-PCR confirmed a 32,599 base pair deletion spanning the ε (HBE1) through ψβ (HBBP1) loci. CONCLUSIONS: This deletion represents a previously undescribed hemoglobinopathy category with a distinct phenotype, an εγ-thalassemia. Both the NBS Hb A > F pattern and the subsequent increased Hb A2 without microcytosis are unusual. A similar deletion should be considered when this pattern is encountered, and appropriate test methods should be selected for its detection. Knowledge of the clinical impact of this new category will improve genetic counselling by distinguishing it from the severe transient anemia associated with εγδβ-thalassemia.


Subjects
Hemoglobinopathies , Thalassemia , beta-Thalassemia , Humans , Female , Thalassemia/genetics , beta-Thalassemia/diagnosis , beta-Thalassemia/genetics , Fetal Hemoglobin/genetics , Multiplex Polymerase Chain Reaction
4.
Bioinformatics ; 36(17): 4609-4615, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32315392

ABSTRACT

MOTIVATION: Next-generation sequencing is rapidly improving diagnostic rates in rare Mendelian diseases, but even with whole genome or whole exome sequencing, the majority of cases remain unsolved. Increasingly, RNA sequencing is being used to solve many cases that evade diagnosis through DNA sequencing alone. Specifically, the detection of aberrant splicing in many rare disease patients suggests that identifying RNA splicing outliers is particularly useful for determining causal Mendelian disease genes. However, there is as yet a paucity of statistical methodologies to detect splicing outliers. RESULTS: We developed LeafCutterMD, a new statistical framework that significantly improves on the previously published LeafCutter in the context of detecting outlier splicing events. Through simulations and analysis of real patient data, we demonstrate that LeafCutterMD has better power than the state-of-the-art methodology while controlling false-positive rates. When applied to a cohort of disease-affected probands from the Mayo Clinic Center for Individualized Medicine, LeafCutterMD recovered all aberrantly spliced genes that had previously been identified by manual curation efforts. AVAILABILITY AND IMPLEMENTATION: The source code for this method is available under the open-source Apache 2.0 license in the latest release of the LeafCutter software package, available online at http://davidaknowles.github.io/leafcutter. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome , Rare Diseases , Algorithms , High-Throughput Nucleotide Sequencing , Humans , RNA Splicing , Rare Diseases/diagnosis , Rare Diseases/genetics , Sequence Analysis, RNA , Software
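Record 4 concerns splicing outliers. LeafCutter groups split reads into clusters of introns that share splice sites, so each intron's share of its cluster's reads is a natural quantity to compare between a patient and controls. The sketch below is a deliberately simplified z-score illustration of that idea, not the LeafCutterMD statistical model (which uses a more elaborate count-based likelihood); the function names and the `min_sd` floor are hypothetical.

```python
from statistics import mean, stdev


def intron_usage(cluster_counts):
    """Convert split-read counts for the introns of one cluster into usage proportions."""
    total = sum(cluster_counts)
    return [count / total for count in cluster_counts]


def splicing_outlier_scores(patient_counts, control_counts, min_sd=0.01):
    """Per-intron z-scores of the patient's usage proportion relative to controls.

    min_sd is a hypothetical floor so that an intron never used in controls
    does not cause a division by zero.
    """
    patient = intron_usage(patient_counts)
    controls = [intron_usage(counts) for counts in control_counts]
    scores = []
    for j, usage in enumerate(patient):
        control_values = [c[j] for c in controls]
        sd = max(stdev(control_values), min_sd)
        scores.append((usage - mean(control_values)) / sd)
    return scores


controls = [[50, 50, 0], [48, 52, 0], [52, 48, 0]]
print(splicing_outlier_scores([30, 30, 40], controls))
```

In this toy cluster, the third intron is never used in controls but carries 40% of the patient's reads, so it stands out with a large positive score while the canonical introns score negative.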
5.
J Med Internet Res ; 23(9): e30157, 2021 09 28.
Article in English | MEDLINE | ID: mdl-34449401

ABSTRACT

BACKGROUND: COVID-19 is caused by the SARS-CoV-2 virus and has strikingly heterogeneous clinical manifestations, with most individuals contracting mild disease but a substantial minority experiencing fulminant cardiopulmonary symptoms or death. The clinical covariates and the laboratory tests performed on a patient provide robust statistics to guide clinical treatment. Deep learning approaches applied to a data set of this nature enable patient stratification and provide methods to guide clinical treatment. OBJECTIVE: Here, we report on the development and prospective validation of a state-of-the-art machine learning model to provide mortality prediction shortly after confirmation of SARS-CoV-2 infection in the Mayo Clinic patient population. METHODS: We retrospectively constructed one of the largest and most geographically diverse COVID-19 data sets combining laboratory information system and electronic health record data in the published literature; it included 11,807 patients residing in 41 states of the United States of America and treated at medical sites across 5 states in 3 time zones. Traditional machine learning models were evaluated independently as well as in a stacked learner approach using AutoGluon, and various recurrent neural network architectures were considered. The traditional machine learning models were implemented using the AutoGluon-Tabular framework, whereas the recurrent neural networks used the TensorFlow Keras framework. We trained these models to operate solely on routine laboratory measurements and clinical covariates available within 72 hours of a patient's first positive COVID-19 nucleic acid test result. RESULTS: The GRU-D recurrent neural network achieved peak cross-validation performance, with an area under the receiver operating characteristic curve (AUROC) of 0.938 (SE 0.004). This model retained strong performance when the follow-up time was reduced to 12 hours (0.916 [SE 0.005] AUROC), and the leave-one-out feature importance analysis indicated that the most independently valuable features were age, Charlson comorbidity index, minimum oxygen saturation, fibrinogen level, and serum iron level. In the prospective testing cohort, this model provided an AUROC of 0.901 and a statistically significant difference in survival (P<.001, hazard ratio for those predicted to survive, 95% CI 0.043-0.106). CONCLUSIONS: Our deep learning approach using GRU-D provides an alert system to flag mortality risk in COVID-19-positive patients using clinical covariates and laboratory values within a 72-hour window after the first positive nucleic acid test result.


Subjects
COVID-19 , Clinical Laboratory Information Systems , Deep Learning , Algorithms , Electronic Health Records , Humans , Retrospective Studies , SARS-CoV-2
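Record 5 (like record 11) reports performance as the area under the ROC curve. As a small illustration of what that number means, AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The quadratic-time sketch below computes it directly from that definition; it does not reproduce the GRU-D model, and the example scores are made up.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney formulation: P(score_pos > score_neg), ties = 1/2.

    scores: model outputs (higher = more likely positive); labels: 1 or 0.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical mortality-risk scores for two deceased (1) and two surviving (0) patients.
print(auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

For large cohorts a rank-based computation is preferable, but the pairwise form makes the probabilistic meaning of, say, an AUROC of 0.901 explicit.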
6.
BMC Bioinformatics ; 20(1): 175, 2019 Apr 08.
Article in English | MEDLINE | ID: mdl-30961526

ABSTRACT

BACKGROUND: Establishment and maintenance of DNA methylation throughout the genome is an important epigenetic mechanism that regulates gene expression, and its disruption has been implicated in human diseases such as cancer. It is therefore crucial to know which genes, or other genomic features of interest, exhibit significant discordance in DNA methylation between two phenotypes. We have previously proposed an approach for ranking genes based on methylation discordance within their promoter regions, determined by centering a window of fixed size at their transcription start sites. However, that method cannot identify statistically significant genomic features, nor can it handle features of variable length or with missing data. RESULTS: We present a new approach for computing the statistical significance of methylation discordance within genomic features of interest in single and multiple test/reference studies. We base the proposed method on a well-articulated hypothesis testing problem that produces p- and q-values for each genomic feature, which we then use to identify and rank features based on the statistical significance of their epigenetic dysregulation. We employ the information-theoretic concept of mutual information to derive a novel test statistic, which we can evaluate by computing Jensen-Shannon distances between the probability distributions of methylation in a test and a reference sample. We design the proposed methodology to simultaneously handle biological, statistical, and technical variability in the data, as well as variable feature lengths and missing data, thus enabling its widespread use on any list of genomic features. This is accomplished by estimating, from reference data, the null distribution of the test statistic as a function of feature length using generalized additive regression models. Differential assessment, using normal/cancer data from healthy fetal tissue and pediatric high-grade glioma patients, illustrates the potential of our approach to greatly facilitate the exploratory phases of clinically and biologically relevant methylation studies. CONCLUSIONS: The proposed approach provides the first computational tool for statistically testing and ranking genomic features of interest based on observed DNA methylation discordance in comparative studies that rigorously accounts for biological, statistical, and technical variability in methylation data, as well as for variability in feature length and for missing data.


Subjects
Epigenesis, Genetic , Epigenomics , Genomics , DNA Methylation , Genome, Human , Humans , Neoplasms/diagnosis , Neoplasms/genetics , Probability
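Record 6 builds its test statistic from Jensen-Shannon distances between methylation probability distributions in a test and a reference sample. The sketch below computes the base-2 Jensen-Shannon distance between two discrete distributions, for example hypothetical distributions over the methylation level of one genomic feature. It covers only the distance itself; the paper's GAM-based null distribution and p/q-value calculation are not reproduced.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture is > 0 wherever p or q is
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


# Hypothetical distributions over methylation levels 0, 1/4, 1/2, 3/4, 1.
reference = [0.05, 0.10, 0.15, 0.30, 0.40]
test = [0.40, 0.30, 0.15, 0.10, 0.05]
print(jensen_shannon_distance(reference, test))
```

Unlike the raw Kullback-Leibler divergence, this distance is symmetric and bounded, which makes values comparable across features.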
7.
BMC Bioinformatics ; 19(1): 87, 2018 03 07.
Article in English | MEDLINE | ID: mdl-29514626

ABSTRACT

BACKGROUND: DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation, producing high-resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. RESULTS: We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by using a joint probability model that encapsulates all information available in WGBS methylation reads, and it produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it uses the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. CONCLUSIONS: This contribution demonstrates the clear benefits, and the necessity, of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.


Subjects
Information Theory , Models, Theoretical , Statistics as Topic , Sulfites/chemistry , Whole Genome Sequencing/methods , Base Sequence , Computer Simulation , CpG Islands/genetics , DNA Methylation/genetics , Entropy , Epigenesis, Genetic , Gene Ontology , Genome, Human , Humans , Lung Neoplasms/genetics , Probability , Web Browser
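Record 7 models WGBS data with the 1D Ising model and quantifies stochasticity with Shannon entropy. A toy version for a handful of CpG sites can be computed by brute-force enumeration of all 2^N methylation states, as below. The energy form (site potentials plus nearest-neighbour couplings over states in {-1, +1}) is the textbook 1D Ising form; the parameter values are arbitrary, the published model's estimation of parameters from reads is not reproduced, and enumeration is only feasible for small N.

```python
import itertools
import math


def ising_level_distribution(a, c):
    """Distribution of the number of methylated CpGs under a 1D Ising model.

    a: N site potentials; c: N - 1 nearest-neighbour couplings.
    States x_n are -1 (unmethylated) or +1 (methylated), with
    P(x) proportional to exp(sum_n a_n x_n + sum_n c_n x_n x_{n+1}).
    """
    n = len(a)
    weights = [0.0] * (n + 1)  # index k = number of methylated sites
    for x in itertools.product((-1, 1), repeat=n):
        energy = sum(ai * xi for ai, xi in zip(a, x))
        energy += sum(ci * x[i] * x[i + 1] for i, ci in enumerate(c))
        weights[x.count(1)] += math.exp(energy)
    total = sum(weights)
    return [w / total for w in weights]


def normalized_entropy(p):
    """Shannon entropy in bits divided by log2(number of levels); lies in [0, 1]."""
    h = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return h / math.log2(len(p))


independent = ising_level_distribution([0.0, 0.0], [0.0])
coupled = ising_level_distribution([0.0, 0.0], [5.0])
print(independent, normalized_entropy(independent))
print(coupled, normalized_entropy(coupled))
```

A strong positive coupling concentrates probability on the fully unmethylated and fully methylated levels, which lowers the entropy; a model of independent (marginal) sites cannot produce that effect.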
8.
PLoS Comput Biol ; 10(1): e1003411, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24415927

ABSTRACT

The role that intrinsic statistical fluctuations play in creating avalanches (patterns of complex bursting activity with scale-free properties) is examined in leaky Markovian networks. Using this broad class of models, we develop a probabilistic approach that employs a potential energy landscape perspective coupled with a macroscopic description based on statistical thermodynamics. We identify six important thermodynamic quantities essential for characterizing system behavior as a function of network size: the internal potential energy, entropy, free potential energy, internal pressure, pressure, and bulk modulus. In agreement with classical phase transitions, these quantities evolve smoothly as a function of network size until a critical value is reached. At that value, a discontinuity in pressure is observed that leads to a spike in the bulk modulus, demarcating loss of thermodynamic robustness. We attribute this novel result to a reallocation of the ground states (global minima) of the system's stationary potential energy landscape caused by a noise-induced deformation of its topographic surface. Further analysis demonstrates that appreciable levels of intrinsic noise can cause avalanching, a complex mode of operation that dominates system dynamics at near-critical or subcritical network sizes. Illustrative examples are provided using an epidemiological model of bacterial infection, where avalanching has not been characterized before, and a previously studied model of computational neuroscience, where avalanching was erroneously attributed to specific neural architectures. The general methods developed here can be used to study the emergence of avalanching (and other complex phenomena) in many biological, physical, and man-made interaction networks.


Subjects
Bacterial Infections/epidemiology , Computational Biology/methods , Markov Chains , Algorithms , Computer Simulation , Entropy , Humans , Infectious Disease Medicine/methods , Models, Theoretical , Neurosciences , Normal Distribution , Stress, Mechanical , Thermodynamics
9.
J Chem Phys ; 138(20): 204108, 2013 May 28.
Article in English | MEDLINE | ID: mdl-23742455

ABSTRACT

The master equation is used extensively to model chemical reaction systems with stochastic dynamics. However, despite its phenomenological simplicity, it is not in general possible to compute the solution of this equation. Drawing exact samples from the master equation is possible but can be computationally demanding, especially when estimating high-order statistical summaries or joint probability distributions. As a consequence, one often relies on analytical approximations to the solution of the master equation or on computational techniques that draw approximate samples from this equation. Unfortunately, it is not in general possible to check whether a particular approximation scheme is valid. The main objective of this paper is to develop an effective methodology, based on statistical hypothesis testing, to address this problem. By drawing a moderate number of samples from the master equation, the proposed techniques use the well-known Kolmogorov-Smirnov statistic to reject the validity of a given approximation method or accept it with a certain level of confidence. Our approach is general enough to deal with any master equation and can be used to test the validity of any analytical approximation method or any approximate sampling technique of interest. A number of examples, based on the Schlögl model of chemistry and the SIR model of epidemiology, clearly illustrate the effectiveness and potential of the proposed statistical framework.


Subjects
Quantum Theory , Thermodynamics
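Record 9 tests whether an approximation to the master equation is valid by comparing samples with the Kolmogorov-Smirnov statistic. A minimal two-sample version is sketched below, assuming one scalar summary (for example, the copy number of one species at a fixed time) has been sampled both exactly and with the approximation. The critical value uses the standard large-sample constant c(0.05) ≈ 1.358; for discrete data such as molecule counts this test is conservative, and the paper's full procedure may differ.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|.

    The empirical CDFs are step functions, so the supremum is attained at one
    of the pooled sample points; evaluating there is sufficient.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def reject_approximation(exact, approx, c_alpha=1.358):
    """Large-sample two-sample KS test; c_alpha = 1.358 corresponds to alpha = 0.05."""
    n, m = len(exact), len(approx)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(exact, approx) > critical


exact_counts = list(range(100))          # stand-in for exact samples
shifted_counts = list(range(50, 150))    # stand-in for a biased approximation
print(reject_approximation(exact_counts, shifted_counts))
```

Failing to reject does not prove the approximation is exact; it only means the discrepancy is not detectable at this sample size, which is why the number of samples drawn matters.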
10.
Diagn Microbiol Infect Dis ; 105(3): 115880, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36669396

ABSTRACT

On February 29, 2020, the U.S. Food and Drug Administration (FDA) issued the first Emergency Use Authorization (EUA) for a SARS-CoV-2 assay developed outside the U.S. Centers for Disease Control and Prevention. As of May 3, 2021, 289 EUAs in total had been granted. As with influenza, there is no standard for defining the limit of detection (LoD); instead, guidance specifies that analytical sensitivity (LoD) be established as the lowest level that gives a 95% detection rate in at least 20 replicates. Here we compare the performance characteristics of SARS-CoV-2 tests that received EUA by standardizing their sensitivity to a common unit of measure, and we assess the variability in LoD between tests. Additionally, we examine factors that may affect sensitivity owing to the lack of standardization in the test development process, and we compare results obtained with the standardized reference panel offered by the FDA across a subset of EUA tests.


Subjects
COVID-19 , SARS-CoV-2 , Humans , COVID-19/diagnosis , COVID-19 Testing , Limit of Detection , Clinical Laboratory Techniques/methods , Sensitivity and Specificity
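Record 10 cites guidance that LoD be set at the level giving a 95% detection rate across at least 20 replicates. Applied to a dilution series, that rule can be sketched as below; the concentrations, their units, and the replicate calls are hypothetical, and the function name is illustrative.

```python
def limit_of_detection(results, min_replicates=20, target_rate=0.95):
    """results: dict mapping concentration -> list of bools (True = detected).

    Returns the lowest concentration with at least min_replicates replicates
    and a detection rate >= target_rate, or None if no level qualifies.
    """
    qualifying = [
        conc
        for conc, calls in results.items()
        if len(calls) >= min_replicates and sum(calls) / len(calls) >= target_rate
    ]
    return min(qualifying) if qualifying else None


# Hypothetical dilution series (e.g., copies/mL) with replicate detection calls.
series = {
    1000: [True] * 20,
    500: [True] * 19 + [False],      # 19/20 = 95%: qualifies
    250: [True] * 17 + [False] * 3,  # 85%: fails the rate
    100: [True] * 10,                # 100% but only 10 replicates: fails
}
print(limit_of_detection(series))
```

Because LoD values reported in different units cannot be compared directly, this rule alone does not make tests comparable, which is the motivation for standardizing to a common unit.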
11.
J Mol Diagn ; 25(8): 602-610, 2023 08.
Article in English | MEDLINE | ID: mdl-37236547

ABSTRACT

Innovation in sequencing instrumentation is increasing per-batch data volumes and decreasing per-base costs. Multiplexed chemistry protocols that pool samples after the addition of index tags have further contributed to efficient and cost-effective sequencer utilization. With these pooled processing strategies, however, comes an increased risk of sample contamination. Sample contamination poses a risk of missing critical variants in a patient sample or wrongly reporting variants derived from the contaminant; these issues are particularly relevant in oncology specimen testing, in which low variant allele frequencies have clinical relevance. Small custom-targeted next-generation sequencing (NGS) panels yield limited variants and pose challenges in delineating true somatic variants from contamination calls. Several popular contamination identification tools perform well on whole-genome/exome sequencing data; however, smaller gene panels offer fewer variant candidates, which limits the tools' accuracy. To prevent clinical reporting of potentially contaminated samples in small NGS panels, we have developed MICon (Microhaplotype Contamination detection), a novel contamination detection model that uses variant allele frequencies at microhaplotype sites. In a heterogeneous hold-out test cohort of 210 samples, the model displayed state-of-the-art performance, with an area under the receiver operating characteristic curve of 0.995.


Subjects
High-Throughput Nucleotide Sequencing , Laboratories , Humans , Workflow , High-Throughput Nucleotide Sequencing/methods , Supervised Machine Learning
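Record 11 detects contamination from variant allele frequencies (VAFs) at microhaplotype sites. MICon itself is a trained model; the heuristic below is not MICon, only a hypothetical illustration of why such VAFs carry a contamination signal: in a single-source germline sample they cluster near 0, 0.5, and 1, while DNA from a second individual pulls some sites off those clusters. The tolerance value and function names are assumptions, and tumour-specific effects such as copy-number changes would also move VAFs off the clusters.

```python
def genotype_deviation(vaf):
    """Distance of an observed VAF from the nearest germline expectation (0, 0.5, 1)."""
    return min(abs(vaf - 0.0), abs(vaf - 0.5), abs(vaf - 1.0))


def contamination_signal(vafs, tolerance=0.05):
    """Fraction of sites whose VAF sits off the germline clusters.

    Mixing in a second individual's DNA shifts homozygous sites away from 0 or 1
    and heterozygous sites away from 0.5, so this fraction tends to rise with
    contamination. tolerance is a hypothetical noise allowance.
    """
    off_cluster = [v for v in vafs if genotype_deviation(v) > tolerance]
    return len(off_cluster) / len(vafs)


clean = [0.0, 0.49, 1.0, 0.51, 0.98]
mixed = [0.1, 0.47, 0.9, 0.6, 0.0]
print(contamination_signal(clean), contamination_signal(mixed))
```

With only a few sites per panel, a single summary like this is noisy, which is one reason a trained model over many VAF features, as the record describes, is attractive.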
12.
PLoS One ; 18(2): e0279956, 2023.
Article in English | MEDLINE | ID: mdl-36735683

ABSTRACT

BACKGROUND: The real-world performance of COVID-19 diagnostic tests under Emergency Use Authorization (EUA) must be assessed. We describe overall trends in the performance of serology tests in the context of real-world implementation. METHODS: Six health systems estimated the odds of seropositivity and the positive percent agreement (PPA) of serology tests among people with SARS-CoV-2 infection confirmed by molecular test. For each dataset, we present the odds ratio and PPA, overall and by key clinical, demographic, and practice parameters. RESULTS: A total of 15,615 people had at least one serology test 14-90 days after a positive molecular test for SARS-CoV-2. We observed higher PPA in Hispanic (PPA range: 79-96%) compared with non-Hispanic (60-89%) patients; in those presenting with at least one COVID-19-related symptom (69-93%) compared with no such symptoms (63-91%); and in inpatient (70-97%) and emergency department (93-99%) compared with outpatient (63-92%) settings across datasets. PPA was highest in those with diabetes (75-94%) and kidney disease (83-95%), and lowest in those with autoimmune conditions or who were immunocompromised (56-93%). The odds ratios (OR) for seropositivity were higher in Hispanics compared with non-Hispanics (OR range: 2.59-3.86), in patients with diabetes (1.49-1.56), and in patients with obesity (1.63-2.23), and lower in those with immunocompromised or autoimmune conditions (0.25-0.70), as compared with those without these comorbidities. In a subset of three datasets with robust information on serology test name, seven tests were used, two of which were used in multiple settings and met the EUA requirement of PPA ≥87%. Tests performed similarly across datasets. CONCLUSION: Although the EUA requirement was not consistently met, more investigation is needed to understand how serology and molecular tests are used, including indication and protocol fidelity. Improved interoperability of test and clinical/demographic data is needed to enable rapid assessment of the real-world performance of in vitro diagnostic tests.


Subjects
COVID-19 , SARS-CoV-2 , Humans , United States/epidemiology , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19 Testing , Clinical Laboratory Techniques/methods , Serologic Tests
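Record 12 reports positive percent agreement (PPA) against molecular testing and odds ratios for seropositivity. The underlying arithmetic is sketched below with hypothetical counts; the odds ratio here is a crude 2x2 value, and the study's own estimates may come from adjusted models.

```python
def positive_percent_agreement(true_positive, false_negative):
    """PPA: share of molecular-test-positive people whom the serology test calls positive."""
    return true_positive / (true_positive + false_negative)


def odds_ratio(a, b, c, d):
    """Crude odds ratio from a 2x2 table.

    a, b: seropositive and seronegative counts in the group of interest;
    c, d: seropositive and seronegative counts in the comparison group.
    """
    return (a * d) / (b * c)


def meets_eua_ppa(ppa, threshold=0.87):
    # The 87% threshold is the EUA requirement quoted in the abstract.
    return ppa >= threshold


ppa = positive_percent_agreement(87, 13)  # hypothetical counts
print(ppa, meets_eua_ppa(ppa), odds_ratio(30, 10, 20, 20))
```

PPA is used rather than "sensitivity" because the molecular comparator is itself an imperfect reference, not a gold standard.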
13.
Lancet Digit Health ; 4(9): e632-e645, 2022 09.
Article in English | MEDLINE | ID: mdl-35835712

ABSTRACT

BACKGROUND: COVID-19 is a multi-system disorder with high variability in clinical outcomes among patients who are admitted to hospital. Although some cytokines such as interleukin (IL)-6 are believed to be associated with severity, there are no early biomarkers that can reliably predict which patients are more likely to have adverse outcomes. Thus, it is crucial to discover predictive markers of serious complications. METHODS: In this retrospective cohort study, we analysed samples from 455 participants with COVID-19 who had had a positive SARS-CoV-2 RT-PCR result between April 14, 2020, and Dec 1, 2020, and who had visited one of three Mayo Clinic sites in the USA (Minnesota, Arizona, or Florida) in the same period. These participants were assigned to three subgroups depending on disease severity as defined by the WHO ordinal scale of clinical improvement (outpatient, severe, or critical). Our control cohort comprised 182 anonymised age-matched and sex-matched plasma samples that were available from the Mayo Clinic Biorepository and had been banked before the COVID-19 pandemic. We performed deep profiling of circulatory cytokines and other proteins, lipids, and metabolites in both cohorts. Most patient samples were collected before, or around the time of, hospital admission, representing ideal samples for predictive biomarker discovery. We used proximity extension assays to quantify cytokines and circulatory proteins, and tandem mass spectrometry to measure lipids and metabolites. Biomarker discovery was done by applying an AutoGluon-tabular classifier to a multiomics dataset, producing a stacked ensemble of cutting-edge machine learning algorithms. Global proteomics and glycoproteomics were also performed on a subset of patient samples with matched pre-COVID-19 plasma samples. FINDINGS: We quantified 1463 cytokines and circulatory proteins, along with 902 lipids and 1018 metabolites. By developing a machine-learning-based prediction model, we discovered a set of 102 biomarkers that predicted severe and clinical COVID-19 outcomes better than the traditional set of cytokines. These predictive biomarkers included several novel cytokines and other proteins, lipids, and metabolites. For example, altered amounts of C-type lectin domain family 6 member A (CLEC6A), ether phosphatidylethanolamine (P-18:1/18:1), and 2-hydroxydecanoate, as reported here, have not previously been associated with severity in COVID-19. Patient samples with matched pre-COVID-19 plasma samples showed similar trends in multi-omics signatures, along with differences in glycoproteomics profile. INTERPRETATION: A multiomic molecular signature in the plasma of patients with COVID-19 before hospital admission can be exploited to predict a more severe course of disease. Machine learning approaches can be applied to highly complex and multidimensional profiling data to reveal novel signatures of clinical use. The absence of validation in an independent cohort remains a major limitation of the study. FUNDING: Eric and Wendy Schmidt.


Subjects
COVID-19 , Biomarkers , COVID-19/diagnosis , Cohort Studies , Cytokines , Humans , Lipidomics/methods , Lipids , Metabolomics/methods , Pandemics , Prognosis , Proteomics/methods , Retrospective Studies , SARS-CoV-2
14.
Nat Biomed Eng ; 5(4): 360-376, 2021 04.
Article in English | MEDLINE | ID: mdl-33859388

ABSTRACT

In cancer, linking epigenetic alterations to drivers of transformation has been difficult, in part because DNA methylation analyses must capture epigenetic variability, which is central to tumour heterogeneity and tumour plasticity. Here, by conducting a comprehensive analysis, based on information theory, of differences in methylation stochasticity in samples from patients with paediatric acute lymphoblastic leukaemia (ALL), we show that ALL epigenomes are stochastic and marked by increased methylation entropy at specific regulatory regions and genes. By integrating DNA methylation and single-cell gene-expression data, we arrived at a relationship between methylation entropy and gene-expression variability, and found that epigenetic changes in ALL converge on a shared set of genes that overlap with genetic drivers involved in chromosomal translocations across the disease spectrum. Our findings suggest that an epigenetically driven gene-regulation network, with UHRF1 (ubiquitin-like with PHD and RING finger domains 1) as a central node, links genetic drivers and epigenetic mediators in ALL.


Subjects
Epigenesis, Genetic , Models, Theoretical , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , CCAAT-Enhancer-Binding Proteins/genetics , Child , Core Binding Factor Alpha 2 Subunit/genetics , Cytogenetic Analysis , DNA Methylation , Entropy , Gene Editing , Gene Expression Regulation, Neoplastic , Humans , Oncogene Proteins, Fusion/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/pathology , RNA-Seq , Single-Cell Analysis , Stochastic Processes , Ubiquitin-Protein Ligases/genetics
15.
Arch Pathol Lab Med ; 145(7): 785-796, 2021 07 01.
Article in English | MEDLINE | ID: mdl-33720333

ABSTRACT

CONTEXT: Small case series have evaluated severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) detection in formalin-fixed, paraffin-embedded tissue using reverse transcription-polymerase chain reaction, immunohistochemistry (IHC), and/or RNA in situ hybridization (RNAish). OBJECTIVE: To compare droplet digital polymerase chain reaction, IHC, and RNAish for detecting SARS-CoV-2 in formalin-fixed, paraffin-embedded tissue in a large series of lung specimens from coronavirus disease 2019 (COVID-19) patients. DESIGN: Droplet digital polymerase chain reaction and RNAish used commercially available probes; IHC used clone 1A9. Twenty-six autopsies of COVID-19 patients with formalin-fixed, paraffin-embedded tissue blocks of 62 lung specimens, 22 heart specimens, 2 brain specimens, 1 liver, and 1 umbilical cord were included. Control cases included 9 autopsy lungs from patients with other infections/inflammation and virus-infected tissue or cell lines. RESULTS: Droplet digital polymerase chain reaction had the highest sensitivity for SARS-CoV-2 (96%) compared with IHC (31%) and RNAish (36%). All 3 tests had a specificity of 100%. Agreement between droplet digital polymerase chain reaction and IHC or RNAish was fair (κ = 0.23 and κ = 0.35, respectively). Agreement between IHC and RNAish was substantial (κ = 0.75). Interobserver reliability was almost perfect for IHC (κ = 0.91) and fair to moderate for RNAish (κ = 0.38-0.59). Lung tissues from patients who died sooner after symptom onset revealed higher copy numbers by droplet digital polymerase chain reaction (P = .03, Pearson correlation = -0.65) and were more likely to be positive by RNAish (P = .02) than lungs from patients who died later. We identified SARS-CoV-2 in hyaline membranes, in pneumocytes, and rarely in respiratory epithelium. Droplet digital polymerase chain reaction showed low copy numbers in 7 autopsy hearts from ProteoGenex Inc. All other extrapulmonary tissues were negative. CONCLUSIONS: Droplet digital polymerase chain reaction was the most sensitive and a highly specific test for identifying SARS-CoV-2 in lung specimens from COVID-19 patients.


Subjects
COVID-19 Testing/methods , COVID-19/diagnosis , Immunohistochemistry , In Situ Hybridization/methods , Lung/virology , Reverse Transcriptase Polymerase Chain Reaction/methods , SARS-CoV-2/isolation & purification , Adult , Aged , Aged, 80 and over , COVID-19/virology , Female , Humans , Male , Middle Aged , Observer Variation , Prospective Studies , RNA, Viral/isolation & purification , Reproducibility of Results , SARS-CoV-2/genetics , Sensitivity and Specificity
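The agreement figures in the preceding abstract are Cohen's kappa values, and the test comparison rests on sensitivity and specificity. As an illustration only, the sketch below computes these statistics from paired binary calls. The function names and the six example specimens are hypothetical and are not the study's data or analysis code.

```python
from collections import Counter
from typing import Sequence, Tuple


def sensitivity_specificity(calls: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of binary test calls (1 = positive) against a reference."""
    pairs = list(zip(calls, truth))
    tp = sum(1 for c, t in pairs if c and t)
    fn = sum(1 for c, t in pairs if not c and t)
    tn = sum(1 for c, t in pairs if not c and not t)
    fp = sum(1 for c, t in pairs if c and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa: agreement between two sets of labels, corrected for chance."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance if the two methods labelled independently.
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical calls on six specimens (1 = SARS-CoV-2 detected); not study data.
ddpcr = [1, 1, 1, 1, 0, 0]
ihc = [1, 0, 0, 1, 0, 0]
print(cohens_kappa(ddpcr, ihc))  # ≈ 0.4
```

Kappa subtracts the agreement two methods would reach by chance given their positive rates. This is why two tests can agree on most specimens yet still show only "fair" kappa.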
16.
Front Genet ; 12: 739054, 2021.
Article in English | MEDLINE | ID: mdl-34745213

ABSTRACT

Detecting gene fusions involving driver oncogenes is pivotal in the clinical diagnosis and treatment of cancer patients. Recent developments in next-generation sequencing (NGS) technologies have enabled improved assays for bioinformatics-based gene fusion detection. In clinical applications, where only a small number of fusions are clinically actionable, targeted polymerase chain reaction (PCR)-based NGS chemistries, such as the QIAseq RNAscan assay, aim to improve accuracy compared with standard RNA sequencing. Existing informatics methods for gene fusion detection in NGS-based RNA sequencing assays traditionally use either a transcriptome-based spliced alignment approach or a de novo assembly approach. Transcriptome-based spliced alignment methods struggle to map short reads, yielding low-quality alignments. De novo assembly-based methods produce longer contigs from short reads and can be more sensitive in detecting genomic rearrangements, but they face performance and scalability challenges. Consequently, a method is needed to detect fusions efficiently and accurately in targeted PCR-based NGS chemistries. We describe SeekFusion, a highly accurate and computationally efficient pipeline for identifying gene fusions from PCR-based NGS chemistries. Using biological samples processed with the QIAseq RNAscan assay and in silico simulated data, we demonstrate that SeekFusion's gene fusion detection accuracy exceeds that of popular existing methods such as STAR-Fusion, TOPHAT-Fusion, and JAFFA-hybrid. We also present results from 4,484 patient samples tested for neurological tumors and sarcoma, including details on some novel fusions identified.

17.
BMC Bioinformatics ; 11: 547, 2010 Nov 05.
Article in English | MEDLINE | ID: mdl-21054868

ABSTRACT

BACKGROUND: Estimating the rate constants of a biochemical reaction system with known stoichiometry from noisy time series measurements of molecular concentrations is an important step for building predictive models of cellular function. Inference techniques currently available in the literature may produce rate constant values that defy necessary constraints imposed by the fundamental laws of thermodynamics. As a result, these techniques may lead to biochemical reaction systems whose concentration dynamics could not possibly occur in nature. Therefore, development of a thermodynamically consistent approach for estimating the rate constants of a biochemical reaction system is highly desirable. RESULTS: We introduce a Bayesian analysis approach for computing thermodynamically consistent estimates of the rate constants of a closed biochemical reaction system with known stoichiometry given experimental data. Our method employs an appropriately designed prior probability density function that effectively integrates fundamental biophysical and thermodynamic knowledge into the inference problem. Moreover, it takes into account experimental strategies for collecting informative observations of molecular concentrations through perturbations. The proposed method employs a maximization-expectation-maximization algorithm that provides thermodynamically feasible estimates of the rate constant values and computes appropriate measures of estimation accuracy. We demonstrate various aspects of the proposed method on synthetic data obtained by simulating a subset of a well-known model of the EGF/ERK signaling pathway, and examine its robustness under conditions that violate key assumptions. Software, coded in MATLAB®, which implements all Bayesian analysis techniques discussed in this paper, is available free of charge at http://www.cis.jhu.edu/~goutsias/CSS%20lab/software.html. CONCLUSIONS: Our approach provides an attractive statistical methodology for estimating thermodynamically feasible values for the rate constants of a biochemical reaction system from noisy time series observations of molecular concentrations obtained through perturbations. The proposed technique is theoretically sound and computationally feasible, but restricted to quantitative data obtained from closed biochemical reaction systems. This necessitates development of similar techniques for estimating the rate constants of open biochemical reaction systems, which are more realistic models of cellular function.


Subjects
Bayes Theorem , Biochemical Phenomena , Thermodynamics , Algorithms , Computational Biology/methods , Kinetics , Signal Transduction
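The preceding abstract concerns rate constants that must respect thermodynamic constraints. One well-known constraint for mass-action kinetics is the Wegscheider (detailed-balance) condition. Around a closed cycle of reversible first-order reactions, the product of the forward rate constants must equal the product of the reverse rate constants. The sketch below only checks that condition for a single hypothetical cycle. It is not the authors' Bayesian estimation method or their MATLAB software, and the function names are illustrative.

```python
import math
from typing import Sequence


def cycle_log_imbalance(forward: Sequence[float], reverse: Sequence[float]) -> float:
    """ln(prod k_forward / prod k_reverse) around one cycle of reversible first-order reactions.

    Zero means the cycle satisfies the Wegscheider (detailed-balance) condition.
    Summing logarithms avoids overflow or underflow for long cycles.
    """
    if len(forward) != len(reverse) or not forward:
        raise ValueError("need matching, non-empty forward and reverse rate constants")
    if any(k <= 0 for k in forward) or any(k <= 0 for k in reverse):
        raise ValueError("rate constants must be positive")
    return sum(math.log(kf) - math.log(kr) for kf, kr in zip(forward, reverse))


def satisfies_wegscheider(forward: Sequence[float], reverse: Sequence[float], tol: float = 1e-9) -> bool:
    """True if the cycle's forward and reverse products agree within a log tolerance."""
    return abs(cycle_log_imbalance(forward, reverse)) <= tol


# Hypothetical cycle A <-> B <-> C <-> A.
print(satisfies_wegscheider([2.0, 3.0, 0.5], [1.0, 1.5, 2.0]))  # True: 3.0 == 3.0
print(satisfies_wegscheider([2.0, 3.0, 1.0], [1.0, 1.5, 2.0]))  # False: 6.0 != 3.0
```

A rate-constant estimate that fails this check would imply a net circulating flux at equilibrium in a closed system, which is the kind of physically impossible dynamics the abstract warns about.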
18.
Front Genet ; 11: 173, 2020.
Article in English | MEDLINE | ID: mdl-32180803

ABSTRACT

Several recent studies have demonstrated the utility of RNA-Seq in the diagnosis of rare inherited disease. Diagnostic rates 35% higher than those previously achievable with DNA-Seq alone have been attained. These studies have primarily profiled gene expression and splicing defects; however, some have also shown that fusion transcripts are diagnostic or phenotypically relevant in patients with constitutional disorders. Fusion transcripts have traditionally been studied as oncogenic phenomena relevant only to cancer testing. Consequently, fusion detection algorithms have been biased toward detecting well-known oncogenic fusions, hindering their application to studies of rare Mendelian genetic disease. A recent methodology published by the authors successfully tailored a traditional algorithm to the detection of pathogenic fusion events in inherited disease. A key mechanism for reducing false-positive or biologically benign events was comparison against a database of events detected in normal tissues. This approach is akin to population frequency-based filtering of genetic variants and is predicated on the idea that pathogenic fusion transcripts are absent from normal tissue. We report an analysis of RNA-Seq data from the Genotype-Tissue Expression (GTEx) project in which known pathogenic fusions are computationally detected at low levels in normal tissues unassociated with the disease phenotype. Examples include archetypal cancer fusion transcripts as well as fusions responsible for rare inherited disease. We consider potential explanations for the detectability of such transcripts and discuss the implications of these results for future profiling of genetic disease patients for pathogenic gene fusions.
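The filtering step described in this abstract, discarding candidate fusions that also occur in a database of normal-tissue calls, resembles population-frequency filtering of variants. A minimal sketch follows. It assumes fusions are keyed by an ordered (5' gene, 3' gene) pair and that a normal-tissue count table is available. The gene names, counts, and the 1% threshold are hypothetical and do not come from the authors' method.

```python
from typing import Dict, List, Tuple

# A fusion is keyed here by its ordered partner genes: (5' gene, 3' gene).
Fusion = Tuple[str, str]


def filter_by_normal_frequency(
    candidates: List[Fusion],
    normal_counts: Dict[Fusion, int],
    n_normal_samples: int,
    max_freq: float = 0.01,
) -> List[Fusion]:
    """Keep candidate fusions seen in at most max_freq of normal-tissue samples."""
    if n_normal_samples <= 0:
        raise ValueError("n_normal_samples must be positive")
    kept = []
    for fusion in candidates:
        # Fraction of normal samples in which this exact fusion was called.
        freq = normal_counts.get(fusion, 0) / n_normal_samples
        if freq <= max_freq:
            kept.append(fusion)
    return kept


# Hypothetical gene names and counts, for illustration only.
normal = {("GENE1", "GENE2"): 3, ("GENE3", "GENE4"): 40}
calls = [("GENE1", "GENE2"), ("GENE3", "GENE4"), ("GENE5", "GENE6")]
print(filter_by_normal_frequency(calls, normal, n_normal_samples=1000))
print(filter_by_normal_frequency(calls, normal, n_normal_samples=1000, max_freq=0.0))
```

With a strict absence rule (`max_freq=0.0`), any fusion seen even once in normal tissue is discarded. That is exactly how a pathogenic fusion detected at low levels in normal tissue, as reported here, could be filtered out.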

19.
Epigenetics ; 15(8): 841-858, 2020 08.
Article in English | MEDLINE | ID: mdl-32114880

ABSTRACT

Translocations of the KMT2A (MLL) gene define a biologically distinct and clinically aggressive subtype of acute myeloid leukaemia (AML), marked by a characteristic gene expression profile and few cooperating mutations. Although dysregulation of the epigenetic landscape in this leukaemia is particularly interesting given the low mutation frequency, its comprehensive analysis using whole genome bisulphite sequencing (WGBS) has not been previously performed. Here we investigated epigenetic dysregulation in nine MLL-rearranged (MLL-r) AML samples by comparing them to six normal myeloid controls, using a computational method that encapsulates mean DNA methylation measurements along with analyses of methylation stochasticity. We discovered a dramatically altered epigenetic profile in MLL-r AML, associated with genome-wide hypomethylation and a markedly increased DNA methylation entropy reflecting an increasingly disordered epigenome. Methylation discordance mapped to key genes and regulatory elements that included bivalent promoters and active enhancers. Genes associated with significant changes in methylation stochasticity recapitulated known MLL-r AML expression signatures, suggesting a role for the altered epigenetic landscape in the transcriptional programme initiated by MLL translocations. Accordingly, we established statistically significant associations between discordances in methylation stochasticity and gene expression in MLL-r AML, thus providing a link between the altered epigenetic landscape and the phenotype.


Subjects
DNA Methylation , Gene Expression Regulation, Neoplastic , Leukemia, Biphenotypic, Acute/genetics , Leukemia, Myeloid, Acute/genetics , Epigenesis, Genetic , Histone-Lysine N-Methyltransferase/genetics , Humans , Leukemia, Biphenotypic, Acute/metabolism , Leukemia, Myeloid, Acute/metabolism , Myeloid-Lymphoid Leukemia Protein/genetics , Transcriptome , Translocation, Genetic
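The preceding abstract contrasts mean DNA methylation with methylation stochasticity (entropy). As a simplified stand-in, the sketch below computes a read-level normalized Shannon entropy over the methylation levels observed in a region. This is not the computational method used in the study. It assumes each read is a 0/1 vector over the same CpG sites, and the example reads are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def normalized_methylation_entropy(reads: List[Sequence[int]]) -> float:
    """Shannon entropy of per-read methylation levels, scaled to [0, 1].

    Each read is a 0/1 vector over the same N CpG sites (1 = methylated).
    A read's methylation level takes one of N + 1 values (0/N ... N/N),
    so the entropy is divided by log2(N + 1), its maximum possible value.
    """
    if not reads:
        raise ValueError("need at least one read")
    n_cpg = len(reads[0])
    if n_cpg == 0 or any(len(r) != n_cpg for r in reads):
        raise ValueError("all reads must cover the same, non-zero number of CpGs")
    level_counts = Counter(sum(r) for r in reads)  # methylated CpGs per read
    total = len(reads)
    entropy = sum(-(c / total) * math.log2(c / total) for c in level_counts.values())
    return entropy / math.log2(n_cpg + 1)


def mean_methylation(reads: List[Sequence[int]]) -> float:
    """Average fraction of methylated CpGs across all reads."""
    return sum(sum(r) for r in reads) / sum(len(r) for r in reads)


# Hypothetical reads over 3 CpGs: both regions have mean methylation 0.5.
bimodal = [[1, 1, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]]
disordered = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(mean_methylation(bimodal), normalized_methylation_entropy(bimodal))
print(mean_methylation(disordered), normalized_methylation_entropy(disordered))
```

The two hypothetical regions share the same mean methylation but differ in entropy. This is why an analysis of stochasticity can reveal a "disordered" epigenome that mean levels alone would miss.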
20.
Article in English | MEDLINE | ID: mdl-31662300

ABSTRACT

Trichorhinophalangeal syndrome type I (TRPSI) is a rare disorder, inherited in an autosomal dominant pattern, that causes distinctive ectodermal, facial, and skeletal features affecting the hair (tricho-), nose (rhino-), and fingers and toes (phalangeal). TRPSI is caused by loss-of-function variants in TRPS1, a gene involved in regulating chondrocyte and perichondrium development. Pathogenic variants in TRPS1 include missense mutations and deletions with variable breakpoints, with only a single instance of an intragenic duplication reported to date. Here we report an affected individual with a classic TRPSI phenotype who is heterozygous for a de novo intragenic ∼36.3-kbp duplication affecting exons 2-4 of TRPS1. Molecular analysis revealed the duplication to be in direct tandem orientation, affecting the splicing of TRPS1. The aberrant transcripts are predicted to produce a truncated TRPS1 protein lacking the nuclear localization signal and the GATA and IKAROS-like zinc-finger domains, resulting in functional TRPS1 haploinsufficiency. Our study identifies a novel intragenic tandem duplication of TRPS1 and highlights the importance of molecular characterization of intragenic duplications.


Subjects
Fingers/abnormalities , Hair Diseases/genetics , Langer-Giedion Syndrome/genetics , Nose/abnormalities , Repressor Proteins/genetics , Aged , Child , DNA-Binding Proteins/genetics , Exons/genetics , Family , Female , Gene Duplication/genetics , Hair Diseases/etiology , Humans , Langer-Giedion Syndrome/etiology , Male , Middle Aged , Mutation , Mutation, Missense/genetics , Pedigree , Phenotype , RNA Splicing/genetics , Repressor Proteins/metabolism , Sequence Deletion/genetics , Transcription Factors/genetics , Zinc Fingers/genetics