ABSTRACT
The analysis of cell-free DNA (cfDNA) from plasma offers great promise for the earlier detection of cancer. At present, changes in DNA sequence, methylation, or copy number are the most sensitive ways to detect the presence of cancer. To further increase the sensitivity of such assays with limited amounts of sample, it would be useful to be able to evaluate the same template molecules for all these changes. Here, we report an approach, called MethylSaferSeqS, that achieves this goal, and can be applied to any standard library preparation method suitable for massively parallel sequencing. The innovative step was to copy both strands of each DNA-barcoded molecule with a primer that allows the subsequent separation of the original strands (retaining their 5-methylcytosine residues) from the copied strands (in which the 5-methylcytosine residues are replaced with unmodified cytosine residues). The epigenetic and genetic alterations present in the DNA molecules can then be obtained from the original and copied strands, respectively. We applied this approach to plasma from 265 individuals, including 198 with cancers of the pancreas, ovary, lung, and colon, and found the expected patterns of mutations, copy number alterations, and methylation. Furthermore, we could determine which original template DNA molecules were methylated and/or mutated. MethylSaferSeqS should be useful for addressing a variety of questions relating genetics and epigenetics.
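The key step is that each barcoded molecule produces two readouts. The original strand carries the methylation state, and the copied strand carries the sequence. The sketch below illustrates how the two could be joined per molecule. It rests on several assumptions that the abstract does not state:

- the original strands are read after bisulfite conversion, so unmethylated C reads as T while 5-methylcytosine stays C;
- each unique identifier (UID) yields one original-strand read and one copied-strand read;
- both reads are already aligned to the plus strand of the reference without gaps.

The function names and data layout are hypothetical and do not reproduce the published MethylSaferSeqS pipeline.

```python
from collections import defaultdict

def call_cpg_methylation(ref, bisulfite_read):
    """Return {position: True/False} for each CpG cytosine in the reference.

    Assumes an ungapped, plus-strand, bisulfite-converted read of the original strand:
    C at a CpG means the cytosine was protected (methylated); T means it was converted.
    """
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":
            base = bisulfite_read[i]
            if base == "C":
                calls[i] = True
            elif base == "T":
                calls[i] = False
    return calls

def call_mutations(ref, copy_read):
    """Return (position, ref_base, read_base) for every mismatch in the unconverted copy."""
    return [(i, r, b) for i, (r, b) in enumerate(zip(ref, copy_read)) if r != b]

def combine_by_uid(ref, original_reads, copy_reads):
    """Join methylation (original strand) and sequence changes (copied strand) per UID."""
    per_molecule = defaultdict(dict)
    for uid, read in original_reads.items():
        per_molecule[uid]["methylation"] = call_cpg_methylation(ref, read)
    for uid, read in copy_reads.items():
        per_molecule[uid]["mutations"] = call_mutations(ref, read)
    return dict(per_molecule)
```

Consider the call `combine_by_uid("ACGTTCGA", {"UID1": "ATGTTCGA"}, {"UID1": "ACGTACGA"})`. It reports the CpG at position 1 as unmethylated and the CpG at position 5 as methylated. It also reports a T>A change at position 4, so both kinds of information are attributed to the same template molecule.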
Subject(s)
DNA Copy Number Variations , Neoplasms , Female , Humans , Methylation , 5-Methylcytosine , DNA/genetics , Mutation , Neoplasms/genetics , DNA Methylation
ABSTRACT
BACKGROUND: Postoperative nausea and vomiting (PONV) is a key driver of unplanned admission and patient satisfaction following surgery. Because traditional risk factors do not completely explain variability in risk, we hypothesize that genetics may contribute to the overall risk for this complication. The objective of this research is to perform a genome-wide association study of PONV, derive a polygenic risk score for PONV, assess associations between the risk score and PONV in a validation cohort, and compare any genetic contributions to known clinical risks for PONV. METHODS: Surgeries with integrated genetic and perioperative data performed under general anesthesia at Michigan Medicine and Vanderbilt University Medical Center were studied. PONV was defined as nausea or emesis occurring and documented in the PACU. In the Discovery Phase, genome-wide association studies were performed on each genetic cohort and the results were meta-analyzed. Next, in the Polygenic Phase, we assessed whether a polygenic score, derived from a genome-wide association study in a derivation cohort from Vanderbilt University Medical Center, improved prediction within a validation cohort from Michigan Medicine, as quantified by discrimination (C-statistic) and net reclassification index. RESULTS: Of 64,523 total patients, 5,703 developed PONV (8.8%). We identified 46 genetic variants exceeding the P < 1 × 10⁻⁵ threshold, occurring with minor allele frequency > 1%, and demonstrating concordant effects in both cohorts. The standardized polygenic score was associated with PONV in a basic model controlling for age and sex (aOR 1.027 per standard deviation increase in overall genetic risk, 95% CI 1.001-1.053, P=0.044), a model based on known clinical risks (aOR 1.029, 95% CI 1.003-1.055, P=0.030), and a full clinical regression controlling for 21 demographic, surgical, and anesthetic factors (aOR 1.029, 95% CI 1.002-1.056, P=0.033).
The addition of the polygenic score improved overall discrimination in models based on known clinical risk factors (c-statistic: 0.616 compared to 0.613, P=0.028) and improved net reclassification of 4.6% of cases. CONCLUSION: Standardized polygenic risk was associated with PONV in all three of our models, but the genetic influence was smaller than that exerted by clinical risk factors. Specifically, a patient with a polygenic risk score > 1 standard deviation above the mean has 2-3% greater odds of developing PONV compared to the baseline population. This is at least an order of magnitude smaller than the increase associated with having prior PONV/motion sickness (55%), having a history of migraines (17%), or being female (83%), and is not clinically significant. Furthermore, the use of a polygenic risk score does not meaningfully improve discrimination compared to clinical risk factors and is not clinically useful.
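The "2-3% greater odds" figure follows directly from the per-standard-deviation adjusted odds ratios (1.027 to 1.029). The sketch below shows the usual construction of a standardized polygenic score and how a per-SD odds ratio is turned into a percentage change in odds. Two things are assumptions: that the score is a weighted sum of effect-allele dosages with GWAS-derived weights, and the function names. The abstract does not give the exact scoring formula.

```python
import statistics

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) across variants (assumed form)."""
    return sum(w * d for w, d in zip(weights, dosages))

def standardize(scores):
    """Express raw scores in standard-deviation units relative to the cohort mean."""
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mu) / sd for s in scores]

def odds_change(aor_per_sd, sd_units):
    """Fractional change in odds for a score `sd_units` SDs above the mean."""
    return aor_per_sd ** sd_units - 1
```

For example, `odds_change(1.029, 1)` is about 0.029, a 2.9% increase in odds. `odds_change(1.027, 1)` is about 0.027. Both are far below the 17% to 83% increases the abstract attributes to clinical risk factors.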
ABSTRACT
We report a sensitive PCR-based assay called Repetitive Element AneupLoidy Sequencing System (RealSeqS) that can detect aneuploidy in samples containing as little as 3 pg of DNA. Using a single primer pair, we amplified ~350,000 amplicons distributed throughout the genome. Aneuploidy was detected in 49% of liquid biopsies from a total of 883 nonmetastatic, clinically detected cancers of the colorectum, esophagus, liver, lung, ovary, pancreas, breast, or stomach. Combining aneuploidy with somatic mutation detection and eight standard protein biomarkers yielded a median sensitivity of 80% in these eight cancer types, while only 1% of 812 healthy controls scored positive.
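The abstract does not describe how amplicon counts are turned into aneuploidy calls. The sketch below uses a deliberately simple substitute and is not the published RealSeqS statistical model. Read counts are grouped by chromosome arm and normalized to each sample's total. Each arm's fraction is then compared with a panel of normal samples as a z-score. The arm labels, the panel and the |z| > 3 threshold are all hypothetical.

```python
import statistics

def arm_fractions(counts):
    """Normalize per-arm read counts to fractions of the sample total."""
    total = sum(counts.values())
    return {arm: c / total for arm, c in counts.items()}

def arm_z_scores(sample_counts, normal_counts_list):
    """z-score of each arm's fraction against a panel of normal samples."""
    sample = arm_fractions(sample_counts)
    normals = [arm_fractions(c) for c in normal_counts_list]
    z = {}
    for arm, frac in sample.items():
        reference = [n[arm] for n in normals]
        z[arm] = (frac - statistics.mean(reference)) / statistics.stdev(reference)
    return z

def call_arms(z_scores, threshold=3.0):
    """Label each arm as gain, loss, or neutral using a symmetric z threshold."""
    return {
        arm: "gain" if z > threshold else "loss" if z < -threshold else "neutral"
        for arm, z in z_scores.items()
    }
```

With only two arms and three normals the example is a toy. In practice, many arms, many normals and a trained classifier would be needed to reach the low false-positive rate the abstract reports.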
Subject(s)
Aneuploidy , Neoplasms , Repetitive Sequences, Nucleic Acid , Biomarkers, Tumor , Circulating Tumor DNA , DNA/genetics , Esophagus , Humans , Liquid Biopsy , Mutation , Neoplasms/diagnosis , Neoplasms/genetics , Repetitive Sequences, Nucleic Acid/genetics , Whole Genome Sequencing
ABSTRACT
BACKGROUND & AIMS: Aneuploidy has been proposed as a tool to assess progression in patients with Barrett's esophagus (BE), but has heretofore required multiple biopsies. We assessed whether a single esophageal brushing that widely sampled the esophagus could be combined with massively parallel sequencing to characterize aneuploidy and identify patients with disease progression to dysplasia or cancer. METHODS: Esophageal brushings were obtained from patients without BE, with non-dysplastic BE (NDBE), low-grade dysplasia (LGD), high-grade dysplasia (HGD), or adenocarcinoma (EAC). To assess aneuploidy, we used RealSeqS, a technique that uses a single primer pair to interrogate ~350,000 genome-spanning regions and identify specific chromosome arm alterations. A classifier to distinguish NDBE from EAC was trained on results from 79 patients. An independent validation cohort of 268 subjects was used to test the classifier at distinguishing patients at successive phases of BE progression. RESULTS: Aneuploidy progression was associated with gains of 1q, 12p, and 20q and losses on 9p and 17p. The entire chromosome 8q was often gained in NDBE, whereas focal gain of 8q24 was identified only when there was dysplasia. Among validation subjects, a classifier incorporating these features with a global measure of aneuploidy scored positive in 96% of EAC, 68% of HGD, but only 7% of NDBE. CONCLUSIONS: RealSeqS analysis of esophageal brushings provides a practical and sensitive method to determine aneuploidy in BE patients. It identifies specific chromosome changes that occur early in NDBE and others that occur late and mark progression to dysplasia. The clinical implications of this approach can now be tested in prospective trials.
Subject(s)
Adenocarcinoma/pathology , Aneuploidy , Barrett Esophagus/genetics , Barrett Esophagus/pathology , Esophageal Neoplasms/pathology , Adenocarcinoma/genetics , Barrett Esophagus/classification , Cross-Sectional Studies , Cytological Techniques , Disease Progression , Esophageal Neoplasms/genetics , Esophagus/pathology , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
Replicability, the ability to replicate scientific findings, is a prerequisite for scientific discovery and clinical utility. Troublingly, we are in the midst of a replicability crisis. A key to replicability is that multiple measurements of the same item (e.g., experimental sample or clinical participant) under fixed experimental constraints are relatively similar to one another. Thus, statistics that quantify the relative contributions of accidental deviations (such as measurement error) as compared to systematic deviations (such as individual differences) are critical. We demonstrate that existing replicability statistics, such as the intra-class correlation coefficient and fingerprinting, fail to adequately differentiate between accidental and systematic deviations in very simple settings. We therefore propose a novel statistic, discriminability, which quantifies the degree to which an individual's samples are relatively similar to one another, without restricting the data to be univariate, Gaussian, or even Euclidean. Using this statistic, we introduce the possibility of optimizing experimental design via increasing discriminability and prove that optimizing discriminability improves performance bounds in subsequent inference tasks. In extensive simulated and real datasets (focusing on brain imaging and demonstrating on genomics), only optimizing data discriminability improves performance on all subsequent inference tasks for each dataset. We therefore suggest that designing experiments and analyses to optimize discriminability may be a crucial step in solving the replicability crisis and, more generally, in mitigating accidental measurement error.
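Discriminability asks how often an individual's repeated measurements are closer to each other than to other individuals' measurements. The sketch below is a sample estimate of that idea. For every ordered pair of distinct measurements from the same subject, it takes the fraction of other subjects' measurements that lie strictly farther from the first measurement, then averages those fractions.

Three choices here are assumptions rather than the authors' reference implementation:

- Euclidean distance via `math.dist` is the default, but any distance function can be passed in, matching the abstract's point that the data need not be Euclidean.
- Ties are counted as failures.
- The function name is hypothetical.

```python
import math

def discriminability(samples, labels, dist=math.dist):
    """Mean fraction of between-subject distances exceeding within-subject distances.

    samples: list of measurements (numeric tuples when using math.dist)
    labels:  subject ID for each measurement
    """
    fractions = []
    n = len(samples)
    for i in range(n):
        # Distances from measurement i to every measurement of a different subject.
        others = [dist(samples[i], samples[j]) for j in range(n) if labels[j] != labels[i]]
        if not others:
            continue
        for k in range(n):
            if k == i or labels[k] != labels[i]:
                continue
            within = dist(samples[i], samples[k])
            # Ties (d == within) count as failures in this sketch.
            fractions.append(sum(d > within for d in others) / len(others))
    if not fractions:
        raise ValueError("need at least two subjects with repeated measurements")
    return sum(fractions) / len(fractions)
```

Two well-separated subjects, each measured twice, give a value of 1.0. Values near 0.5 or below mean that repeated measurements of the same subject are no closer to each other than to other subjects' measurements.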
Subject(s)
Connectome , Genome , Artifacts , Brain Mapping/methods , Datasets as Topic , Humans , Reproducibility of Results
ABSTRACT
BACKGROUND: Patients with coronavirus disease 2019 (COVID-19) requiring mechanical ventilation have high mortality and resource utilisation. The ability to predict which patients may require mechanical ventilation allows increased acuity of care and targeted interventions to potentially mitigate deterioration. METHODS: We included hospitalised patients with COVID-19 in this single-centre retrospective observational study. Our primary outcome was mechanical ventilation or death within 24 h. As clinical decompensation is more recognisable, but less modifiable, as the prediction window shrinks, we also assessed 4, 8, and 48 h prediction windows. Model features included demographic information, laboratory results, comorbidities, medication administration, and vital signs. We created a Random Forest model, and assessed performance using 10-fold cross-validation. The model was compared with models derived from generalised estimating equations using discrimination. RESULTS: Ninety-three (23%) of 398 patients required mechanical ventilation or died within 14 days of admission. The Random Forest model predicted pending mechanical ventilation with good discrimination (C-statistic=0.858; 95% confidence interval, 0.841-0.874), which is comparable with the discrimination of the generalised estimating equation regression. Vital sign data including SpO2/FiO2 ratio (Random Forest Feature Importance Z-score=8.56), ventilatory frequency (5.97), and heart rate (5.87) had the highest predictive utility. In our highest-risk cohort, the number of patients needed to identify a single new case was 3.2, and for our second quintile it was 5.0. CONCLUSION: Machine learning techniques can be leveraged to improve the ability to predict which patients with COVID-19 are likely to require mechanical ventilation, identifying unrecognised bellwethers and providing insight into the constellation of accompanying signs of respiratory failure in COVID-19.
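"The number of patients needed to identify a single new case" is the number of flagged patients per true case, which is the reciprocal of the positive predictive value. A value of 3.2 corresponds to a PPV of roughly 31%.

The sketch below ranks patients by predicted risk, splits them into quintiles (highest risk first) and reports that ratio for each quintile. Two details are assumptions: the quintile construction, including how it handles group sizes not divisible by five, and the function name.

```python
import math

def nne_by_quintile(risk, outcome):
    """Patients per true case (1 / PPV) in each predicted-risk quintile, highest first.

    risk:    predicted probabilities or scores, one per patient
    outcome: 1 if the patient was ventilated or died in the window, else 0
    """
    order = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)
    n = len(order)
    result = []
    for q in range(5):
        members = order[q * n // 5:(q + 1) * n // 5]
        cases = sum(outcome[i] for i in members)
        # A quintile with no observed cases has an undefined (infinite) ratio.
        result.append(len(members) / cases if cases else math.inf)
    return result
```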
Subject(s)
COVID-19/diagnosis , COVID-19/therapy , Clinical Decision-Making/methods , Machine Learning/trends , Respiration, Artificial/trends , Aged , COVID-19/epidemiology , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Retrospective Studies
ABSTRACT
Aneuploidy is a feature of most cancer cells, and a myriad of approaches have been developed to detect it in clinical samples. We previously described primers that could be used to amplify ~38,000 unique long interspersed nucleotide elements (LINEs) from throughout the genome. Here we have developed an approach to evaluate the sequencing data obtained from these amplicons. This approach, called Within-Sample AneupLoidy DetectiOn (WALDO), employs supervised machine learning to detect the small changes in multiple chromosome arms that are often present in cancers. We used WALDO to search for chromosome arm gains and losses in 1,677 tumors and in 1,522 liquid biopsies of blood from cancer patients or normal individuals. Aneuploidy was detected in 95% of cancer biopsies and in 22% of liquid biopsies. Using single-nucleotide polymorphisms within the amplified LINEs, WALDO concomitantly assesses allelic imbalances, microsatellite instability, and sample identification. WALDO can be used on samples containing only a few nanograms of DNA and as little as 1% neoplastic content and has a variety of applications in cancer diagnostics and forensic science.
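WALDO uses SNPs inside the amplified LINEs for sample identification, but the abstract does not give the procedure. The sketch below assumes per-SNP reference and alternate read counts. It calls a genotype from the alternate-allele fraction and compares two samples by the fraction of jointly called SNPs whose genotypes agree. The depth and fraction thresholds, the SNP identifiers and the function names are all hypothetical.

```python
def call_genotype(ref_reads, alt_reads, min_depth=10, het_low=0.2, het_high=0.8):
    """Call an unphased genotype from read counts, or None if coverage is too low."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return "0/0"
    if alt_fraction > het_high:
        return "1/1"
    return "0/1"

def genotype_profile(counts):
    """Map SNP ID -> genotype call, given SNP ID -> (ref_reads, alt_reads)."""
    return {snp: call_genotype(r, a) for snp, (r, a) in counts.items()}

def concordance(profile_a, profile_b):
    """Fraction of SNPs called in both profiles whose genotypes match (None if none shared)."""
    shared = [s for s in profile_a
              if profile_a[s] is not None and profile_b.get(s) is not None]
    if not shared:
        return None
    return sum(profile_a[s] == profile_b[s] for s in shared) / len(shared)
```

Samples from the same person should show concordance near 1.0 across many SNPs. Tumor samples with allelic imbalance may shift some heterozygous calls, so a real pipeline would need to tolerate such shifts.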
Subject(s)
Aneuploidy , Long Interspersed Nucleotide Elements/genetics , Neoplasms/genetics , Chromosome Aberrations , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , Nucleic Acid Amplification Techniques/methods
ABSTRACT
BACKGROUND: Existing genetic information can be leveraged to identify patients with susceptibilities to conditions that might impact their perioperative care, but clinicians generally have limited exposure and are not trained to contextualise this information. We identified patients with genetic susceptibilities to anaesthetic complications using a perioperative biorepository and characterised the concordance with existing diagnoses. METHODS: Adult patients undergoing surgery within Michigan Medicine from 2012 to 2017 were consented for genotyping. Genotypes were integrated with the electronic health record (EHR). We retrospectively characterised frequencies of variants associated with butyrylcholinesterase deficiency, factor V Leiden, and malignant hyperthermia, three pharmacogenetic factors with perioperative implications. For each condition, we calculated the percentage of homozygous and heterozygous individuals who had been diagnosed previously and searched the EHR for findings consistent with a predisposition. RESULTS: Analysis of genetic data revealed that 25 out of 40 769 (0.1%) patients were homozygous and 1918 (4.7%) were heterozygous for mutations associated with butyrylcholinesterase deficiency. Of the homozygous individuals, 14 (56%) carried a pre-existing diagnosis. For factor V Leiden, 29 (0.1%) were homozygous and 2153 (5.3%) heterozygous. Of the homozygous individuals, three (10%) were diagnosed by EHR-derived phenotype and six (21%) by clinician review. Malignant hyperthermia was assessed in a subset of patients. We detected two patients with associated mutations. Neither carried a clinical diagnosis. CONCLUSIONS: We identified patients with genetic susceptibility to perioperative complications using an open source script designed for clinician use. We validated this application in a retrospective analysis for three conditions with well-characterised inheritance, and showed that not all genetic susceptibilities were documented in the EHR.
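The frequencies reported above are simple zygosity tallies over the genotyped cohort. The sketch below shows that calculation under two assumptions. First, each patient's call at the relevant variant is stored as an unphased VCF-style string ("0/0", "0/1" or "1/1"). Second, the function name and input layout are hypothetical. The abstract's open source script is not reproduced here.

```python
from collections import Counter

def zygosity_summary(genotypes):
    """Count and percentage of homozygous-alternate and heterozygous patients.

    genotypes: dict mapping patient ID -> "0/0", "0/1" or "1/1"
    """
    counts = Counter(genotypes.values())
    n = len(genotypes)
    return {
        "homozygous": (counts["1/1"], 100 * counts["1/1"] / n),
        "heterozygous": (counts["0/1"], 100 * counts["0/1"] / n),
    }
```

Applied to the butyrylcholinesterase counts, 100 × 25 / 40,769 ≈ 0.06%, which the abstract reports rounded to 0.1%. Similarly, 100 × 1918 / 40,769 ≈ 4.7%.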
Subject(s)
Malignant Hyperthermia , Adult , Electronic Health Records , Genomics , Genotype , Humans , Mutation , Phenotype , Retrospective Studies
ABSTRACT
The advent of next-generation sequencing has dramatically decreased the cost of whole-genome sequencing and increased the feasibility of its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features.
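ROC AUC, one of the assessment measures listed above, equals the probability that a randomly chosen individual with the trait receives a higher predicted score than a randomly chosen individual without it. This is the Mann-Whitney formulation, and ties count as one half. The sketch below is a minimal pairwise implementation of that definition. It is quadratic in sample size, which is fine for cohort-sized inputs. It is not the CAGI assessors' code, and the function name is illustrative.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive vs negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Perfect ranking gives 1.0, random ranking tends toward 0.5, and fully inverted ranking gives 0.0.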
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Area Under Curve , Genetic Predisposition to Disease , Human Genome Project , Humans , Phenotype , Quantitative Trait Loci
ABSTRACT
The role of rare missense variants in disease causation remains difficult to interpret. We explore whether the clustering pattern of rare missense variants (MAF < 0.01) in a protein is associated with mode of inheritance. Mutations in genes associated with autosomal dominant (AD) conditions are known to result in either loss or gain of function, whereas mutations in genes associated with autosomal recessive (AR) conditions invariably result in loss of function. Loss-of-function mutations tend to be distributed uniformly along the protein sequence, whereas gain-of-function mutations tend to localize to key regions. It has not previously been ascertained whether these patterns hold in general for rare missense mutations. We consider the extent to which rare missense variants are located within annotated protein domains and whether they form clusters, using a new unbiased method called CLUstering by Mutation Position. These approaches quantified a significant difference in clustering between AD and AR diseases. Proteins linked to AD diseases exhibited more clustering of rare missense mutations than those linked to AR diseases (Wilcoxon P = 5.7 × 10⁻⁴, permutation P = 8.4 × 10⁻⁴). Rare missense mutations in proteins linked to either AD or AR diseases were more clustered than those in controls (1000G) (Wilcoxon P = 2.8 × 10⁻¹⁵ for AD and P = 4.5 × 10⁻⁴ for AR; permutation P = 3.1 × 10⁻¹² for AD and P = 0.03 for AR). The differences in clustering patterns persisted even after removal of the most prominent genes. Testing for such non-random patterns may reveal novel aspects of disease etiology in large sample studies.
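The idea of testing whether mutation positions are more clustered than chance can be illustrated with a simple permutation test. The sketch below is explicitly a substitute and not the CLUMP method described in the abstract.

Its statistic is the mean pairwise distance between mutated residues, where a smaller value means more clustered. The null distribution draws the same number of positions uniformly along the protein, and the p-value is the fraction of null draws at least as clustered as the observed set (with the usual +1 correction). The uniform null, the statistic and the function names are assumptions.

```python
import random
from itertools import combinations

def mean_pairwise_distance(positions):
    """Average absolute distance (in residues) between all pairs of mutated positions."""
    pairs = list(combinations(positions, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def clustering_p_value(positions, protein_length, n_perm=999, seed=0):
    """One-sided permutation p-value for positions being more clustered than uniform."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(positions)
    k = len(positions)
    hits = 0
    for _ in range(n_perm):
        null = [rng.randint(1, protein_length) for _ in range(k)]
        if mean_pairwise_distance(null) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Five mutations within a five-residue stretch of a 1,000-residue protein yield a very small p-value. Evenly spread positions do not.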
Subject(s)
Genes, Dominant , Genes, Recessive , Genetic Diseases, Inborn/genetics , Mutation, Missense , Proteins/genetics , Computational Biology , Databases, Genetic , Genome, Human , Humans , Molecular Sequence Annotation , Multigene Family
ABSTRACT
Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features (DNA and protein sequence conservation, indel length, and occurrence in repeat regions) are useful for inferring protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificity, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature that estimates a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained by combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method.
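The meta-predictor search is described only as combining classifiers "using Boolean conjunctions and disjunctions". The sketch below is a simplified, assumed version of that idea. For every non-empty subset of binary classifier outputs, it scores the pure AND and the pure OR combination and keeps the best by accuracy.

The abstract does not state the selection metric, so accuracy is an assumption. Mixed formulas such as (A AND B) OR C are not enumerated, and the classifier names are placeholders.

```python
from itertools import combinations

def accuracy(pred, truth):
    """Fraction of variants where the predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def best_meta_predictor(predictions, truth):
    """Best (accuracy, label) over AND / OR combinations of every classifier subset.

    predictions: dict of classifier name -> list of booleans (True = pathogenic)
    truth:       list of booleans with the reference classification
    """
    names = sorted(predictions)
    candidates = []
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            columns = [predictions[name] for name in subset]
            if r == 1:
                candidates.append((subset[0], columns[0]))
                continue
            candidates.append((" AND ".join(subset), [all(v) for v in zip(*columns)]))
            candidates.append((" OR ".join(subset), [any(v) for v in zip(*columns)]))
    scored = [(accuracy(pred, truth), label) for label, pred in candidates]
    return max(scored, key=lambda item: item[0])
```

A conjunction ("all classifiers call it pathogenic") trades sensitivity for specificity. That is the direction the abstract emphasizes for indel classification.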
Subject(s)
Computational Biology/methods , INDEL Mutation , Software , Algorithms , Datasets as Topic , Humans , Models, Genetic , Mutation, Missense , Reproducibility of Results , Web Browser
ABSTRACT
BACKGROUND & AIMS: The management of pancreatic cysts poses challenges to both patients and their physicians. We investigated whether a combination of molecular markers and clinical information could improve the classification of pancreatic cysts and management of patients. METHODS: We performed a multi-center, retrospective study of 130 patients with resected pancreatic cystic neoplasms (12 serous cystadenomas, 10 solid pseudopapillary neoplasms, 12 mucinous cystic neoplasms, and 96 intraductal papillary mucinous neoplasms). Cyst fluid was analyzed to identify subtle mutations in genes known to be mutated in pancreatic cysts (BRAF, CDKN2A, CTNNB1, GNAS, KRAS, NRAS, PIK3CA, RNF43, SMAD4, TP53, and VHL); to identify loss of heterozygosity at CDKN2A, RNF43, SMAD4, TP53, and VHL tumor suppressor loci; and to identify aneuploidy. The analyses were performed using specialized technologies for implementing and interpreting massively parallel sequencing data acquisition. An algorithm was used to select markers that could classify cyst type and grade. The accuracy of the molecular markers was compared with that of clinical markers and a combination of molecular and clinical markers. RESULTS: We identified molecular markers and clinical features that classified cyst type with 90%-100% sensitivity and 92%-98% specificity. The molecular marker panel correctly identified 67 of the 74 patients who did not require surgery and could, therefore, reduce the number of unnecessary operations by 91%. CONCLUSIONS: We identified a panel of molecular markers and clinical features that show promise for the accurate classification of cystic neoplasms of the pancreas and identification of cysts that require surgery.
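Every patient in this retrospective cohort underwent resection. The 91% figure is therefore the fraction of the 74 patients who did not require surgery that the molecular panel identified correctly:

```latex
\text{reduction in unnecessary operations} = \frac{67}{74} \approx 0.905 \approx 91\%
```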
Subject(s)
Algorithms , Biomarkers, Tumor/genetics , Pancreas/pathology , Pancreatic Cyst/classification , Pancreatic Cyst/pathology , Adult , Female , Genetic Predisposition to Disease , Genetic Testing/methods , Humans , Male , Middle Aged , Mutation , Pancreatic Cyst/genetics , Pancreatic Cyst/surgery , Phenotype , Predictive Value of Tests , Prognosis , Retrospective Studies
ABSTRACT
For TP53-mutated head and neck squamous cell carcinomas (HNSCCs), the codon and specific amino acid sequence change resulting from a patient's mutation can be prognostic. Thus, developing a framework to predict patient survival for specific mutations in TP53 would be valuable. There are many bioinformatics and functional methods for predicting the phenotypic impact of genetic variation, but their overall clinical value remains unclear. Here, we assess the ability of 15 different methods to predict HNSCC patient survival from TP53 mutation, using TP53 mutation and clinical data from patients enrolled in E4393 by the Eastern Cooperative Oncology Group (ECOG), which investigated whether TP53 mutations in surgical margins were predictive of disease recurrence. These methods include server-based computational tools SIFT, PolyPhen-2, and Align-GVGD; our in-house POSE and VEST algorithms; the rules devised in Poeta et al. with and without considerations for splice-site mutations; location of mutation in the DNA-bound TP53 protein structure; and a functional assay measuring WAF1 transactivation in TP53-mutated yeast. We assessed method performance using overall survival (OS) and progression-free survival (PFS) from 420 HNSCC patients, of whom 224 had TP53 mutations. Each mutation was categorized as "disruptive" or "non-disruptive". For each method, we compared outcomes between the disruptive and non-disruptive groups. The rules devised by Poeta et al., with or without our splice-site modification, were observed to be superior to the others. While the differences in OS (disruptive vs. non-disruptive) appear to be marginally significant (Poeta rules + splice rules, P = 0.089; Poeta rules, P = 0.053), both algorithms identified the disruptive group as having significantly worse PFS outcome (Poeta rules + splice rules, P = 0.011; Poeta rules, P = 0.027). In general, prognostic performance was low among assessed methods.
Further studies are required to develop and validate methods that can predict functional and clinical significance of TP53 mutations in HNSCC patients.
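The abstract does not name the test used to compare OS and PFS between the disruptive and non-disruptive groups. A two-group log-rank test is a common choice and is assumed in the sketch below.

At each distinct event time, the test accumulates the observed-minus-expected events in the first group and the hypergeometric variance. The resulting chi-square statistic (1 degree of freedom) is converted to a p-value using P(χ²₁ > x) = erfc(√(x/2)). Group labels and the function name are hypothetical.

```python
import math

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, two-sided p-value).

    times_*:  follow-up times; events_*: 1 if the event (death/progression) occurred, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for ti, _, _ in data if ti >= t)
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        frac_a = at_risk_a / at_risk
        o_minus_e += deaths_a - deaths * frac_a
        if at_risk > 1:
            variance += deaths * frac_a * (1 - frac_a) * (at_risk - deaths) / (at_risk - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Identical groups give a statistic of 0 and p = 1. A group whose events all occur before the other group's censoring times gives a small p-value.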
Subject(s)
Algorithms , Carcinoma, Squamous Cell/genetics , Computational Biology/methods , Genetics, Medical/methods , Head and Neck Neoplasms/genetics , Mutation/genetics , Tumor Suppressor Protein p53/genetics , Carcinoma, Squamous Cell/physiopathology , Disease Progression , Head and Neck Neoplasms/physiopathology , Humans , Prognosis , Survival Analysis
ABSTRACT
Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with area-under-the-ROC curve (AUC)>0.7, and 23 (15.8%) of these were statistically significant, based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with growth of publicly available genomics data and model refinement by domain experts.
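The abstract does not specify the structure of the Bayesian model. The simplest way to turn a phenotype prior and several pieces of genetic evidence into a probability is to multiply prior odds by one likelihood ratio per variant, and the sketch below does exactly that. This assumes the variants contribute independent evidence, which is a naive simplification. The likelihood ratios are inputs the caller must supply, and the function name is hypothetical.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with independent likelihood ratios (one per variant)."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)
```

For example, a 1% prior combined with a single variant carrying a likelihood ratio of 10 gives a posterior of about 9.2%. Ranking individuals by such posteriors is what an AUC-based evaluation like the one described above would then assess.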
Subject(s)
Genetic Predisposition to Disease/genetics , Genome/genetics , Genomics/methods , Models, Statistical , Sequence Analysis, DNA/methods , Bayes Theorem , Genome-Wide Association Study , Human Genome Project , Humans , Phenotype
ABSTRACT
Assessment of the functional consequences of variants near splice sites is a major challenge in the diagnostic laboratory. To address this issue, we created expression minigenes (EMGs) to determine the RNA and protein products generated by splice site variants (n = 10) implicated in cystic fibrosis (CF). Experimental results were compared with the splicing predictions of eight in silico tools. EMGs containing the full-length Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) coding sequence and flanking intron sequences generated wild-type transcript and fully processed protein in Human Embryonic Kidney (HEK293) and CF bronchial epithelial (CFBE41o-) cells. Quantification of variant-induced aberrant mRNA isoforms was concordant using fragment analysis and pyrosequencing. The splicing patterns of c.1585-1G>A and c.2657+5G>A were comparable to those reported in primary cells from individuals bearing these variants. Bioinformatics predictions were consistent with experimental results for 9/10 variants (MES), 8/10 variants (NNSplice), and 7/10 variants (SSAT and Sroogle). Programs that estimate the consequences of mis-splicing predicted 11/16 (HSF and ASSEDA) and 10/16 (Fsplice and SplicePort) experimentally observed mRNA isoforms. EMGs provide a robust experimental approach for clinical interpretation of splice site variants and refinement of in silico tools.
Subject(s)
Computer Simulation , Genetic Techniques , RNA Isoforms/genetics , RNA Splicing , Cell Line , Cystic Fibrosis/genetics , Cystic Fibrosis/metabolism , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Humans , Mutation , RNA Isoforms/analysis , RNA Splice Sites/genetics
ABSTRACT
SUMMARY: Advances in sequencing technology have greatly reduced the costs incurred in collecting raw sequencing data. Academic laboratories and researchers therefore now have access to very large datasets of genomic alterations but limited time and computational resources to analyse their potential biological importance. Here, we provide a web-based application, Cancer-Related Analysis of Variants Toolkit, designed with an easy-to-use interface to facilitate the high-throughput assessment and prioritization of genes and missense alterations important for cancer tumorigenesis. Cancer-Related Analysis of Variants Toolkit provides predictive scores for germline variants, somatic mutations and relative gene importance, as well as annotations from published literature and databases. Results are emailed to users as MS Excel spreadsheets and/or tab-separated text files. AVAILABILITY: http://www.cravat.us/
Subject(s)
Mutation , Neoplasms/genetics , Software , Genomics/methods , Humans , Internet
ABSTRACT
PURPOSE: Serous tubal intraepithelial carcinoma (STIC) is now recognized as the main precursor of ovarian high-grade serous carcinoma (HGSC). Other potential tubal lesions include p53 signatures and tubal intraepithelial lesions. We aimed to investigate the extent and pattern of aneuploidy in these epithelial lesions and HGSC to define the features that characterize stages of tumor initiation and progression. EXPERIMENTAL DESIGN: We applied RealSeqS to compare genome-wide aneuploidy patterns among the precursors, HGSC (cases, n = 85), and histologically unremarkable fallopian tube epithelium (HU-FTE; control, n = 65). On the basis of a discovery set (n = 67), we developed an aneuploidy-based algorithm, REAL-FAST (Repetitive Element AneupLoidy Sequencing Fallopian Tube Aneuploidy in STIC), to correlate the molecular data with pathology diagnoses. We validated the result in an independent validation set (n = 83) to determine its performance. We correlated the molecularly defined precursor subgroups with proliferative activity and histology. RESULTS: We found that nearly all p53 signatures lost the entire Chr17, offering a "two-hit" mechanism involving both TP53 and BRCA1 in BRCA1 germline mutation carriers. Proliferatively active STICs harbor gains of 19q12 (CCNE1), 19q13.2, 8q24 (MYC), or 8q arm, whereas proliferatively dormant STICs show 22q loss. REAL-FAST classified HU-FTE and STICs into 5 clusters and identified a STIC subgroup harboring unique aneuploidy that is associated with increased proliferation and discohesive growth. On the basis of a validation set, REAL-FAST showed 95.8% sensitivity and 97.1% specificity in detecting STIC/HGSC. CONCLUSIONS: Morphologically similar STICs are molecularly distinct. The REAL-FAST assay identifies a potentially "aggressive" STIC subgroup harboring unique DNA aneuploidy that is associated with increased cellular proliferation and discohesive growth. 
REAL-FAST offers a highly reproducible adjunct technique to assist the diagnosis of STIC lesions.
Subject(s)
Carcinoma in Situ , Cystadenocarcinoma, Serous , Fallopian Tube Neoplasms , Ovarian Neoplasms , Humans , Female , Tumor Suppressor Protein p53/genetics , Ovarian Neoplasms/pathology , Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/pathology , Fallopian Tubes/pathology , Fallopian Tube Neoplasms/genetics , Carcinoma in Situ/pathology
ABSTRACT
Serous tubal intraepithelial carcinoma (STIC) is the fallopian tube precursor lesion for most cases of pelvic high-grade serous carcinoma (HGSC). To date, the morphologic, molecular, and clinical heterogeneity of STIC and of a less atypical putative precursor lesion, termed serous tubal intraepithelial lesion, has not been well characterized. A better understanding of precursor heterogeneity could inform the clinical management of women with incidental STICs (without concurrent carcinoma) identified during prophylactic or opportunistic salpingectomy. This study analyzed the morphologic and molecular features of 171 STICs and 21 serous tubal intraepithelial lesions. We assessed their histologic features, Ki-67 and p53 staining patterns, and genome-wide DNA copy number alterations. We classified all precursor lesions into 2 morphologic subtypes: one with a flat surface (Flat) and the other characterized by budding, loosely adherent, or detached (BLAD) morphology. On pathology review by a panel of 8 gynecologic pathologists, we found 87 BLAD, 96 Flat, and 9 indeterminate lesions. Compared with Flat lesions, BLAD lesions were more frequently diagnostic of STIC (P<0.0001) and more frequently found with concurrent HGSC (P<0.0001). BLAD morphology was also characterized by a higher Ki-67 proliferation index (P<0.0001), the presence of epithelial stratification (P<0.0001), and increased lymphocyte density (P<0.0001). BLAD lesions also exhibited more frequent DNA copy number gain/amplification at the CCNE1 or CMYC loci canonical to HGSCs (P<0.0001). Both BLAD morphology and a STIC diagnosis are independent risk factors for an elevated Ki-67 proliferation index. BLAD and Flat lesions did not differ with respect to patient age, presence of a germline BRCA1/2 mutation, or p53 staining pattern. 
These findings suggest that tubal precursor lesions are morphologically and molecularly heterogeneous, laying the foundation for further studies on the pathogenesis of HGSC initiation and identifying histologic features predictive of poor patient outcomes.
Subject(s)
Adenocarcinoma in Situ , Carcinoma in Situ , Carcinoma , Cystadenocarcinoma, Serous , Fallopian Tube Neoplasms , Ovarian Neoplasms , Female , Humans , BRCA1 Protein , Carcinoma in Situ/genetics , Carcinoma in Situ/pathology , Ovarian Neoplasms/pathology , Ki-67 Antigen , Tumor Suppressor Protein p53/genetics , BRCA2 Protein , Fallopian Tube Neoplasms/genetics , Fallopian Tube Neoplasms/pathology , Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/pathology , DNA
ABSTRACT
We previously described an approach called RealSeqS to evaluate aneuploidy in plasma cell-free DNA through the amplification of ~350,000 repeated elements with a single primer. We hypothesized that an unbiased evaluation of the large amount of sequencing data obtained with RealSeqS might reveal other differences between plasma samples from patients with and without cancer. We tested this hypothesis by developing a machine learning approach called Alu Profile Learning Using Sequencing (A-PLUS) and applying it to 7615 samples from 5178 individuals, 2073 with solid cancer and the remainder without cancer. Samples from patients with cancer and from controls were prespecified into four cohorts, used respectively for model training; analyte integration and threshold determination; validation; and reproducibility. In the validation cohort, A-PLUS alone provided a sensitivity of 40.5% across 11 different cancer types at a specificity of 98.5%. Combining A-PLUS with aneuploidy and eight common protein biomarkers detected 51% of the cancers at 98.9% specificity. We found that part of the power of A-PLUS could be ascribed to a single feature: the global reduction of AluS subfamily elements in the circulating DNA of patients with solid cancer. We confirmed this reduction through the analysis of another independent dataset obtained with a different approach (whole-genome sequencing). The evaluation of Alu elements may therefore have the potential to enhance the performance of several methods designed for the earlier detection of cancer.
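The abstract reports sensitivity at a fixed specificity, with the threshold set in one cohort and then applied unchanged to a separate validation cohort. The sketch below illustrates that general procedure under stated assumptions: each sample receives a single score, a higher score is taken to be more suggestive of cancer, and a sample is called positive when its score exceeds the threshold. The function names and the toy scores are hypothetical; this is not the A-PLUS implementation.

```python
import math
from typing import Sequence, Tuple


def threshold_for_specificity(control_scores: Sequence[float], target_spec: float) -> float:
    """Return the lowest control score t such that at least target_spec of the
    controls score <= t. Samples scoring above t are called positive."""
    ranked = sorted(control_scores)
    # Number of controls that must stay at or below the threshold; the small
    # epsilon guards against floating-point error in target_spec * n.
    k = math.ceil(target_spec * len(ranked) - 1e-9)
    k = max(1, min(k, len(ranked)))
    return ranked[k - 1]


def sensitivity_specificity(
    case_scores: Sequence[float], control_scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when score > threshold means 'cancer detected'."""
    true_pos = sum(s > threshold for s in case_scores)
    true_neg = sum(s <= threshold for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


if __name__ == "__main__":
    # Hypothetical scores: the threshold is fixed on one cohort's controls...
    t = threshold_for_specificity([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], 0.9)
    # ...and then applied unchanged to a separate validation cohort.
    sens, spec = sensitivity_specificity([0.95, 0.5, 1.2, 0.85], [0.3, 0.92, 0.1, 0.6], t)
    print(t, sens, spec)  # prints: 0.9 0.5 0.75
```

Fixing the threshold before looking at the validation cohort is what makes the validation specificity an honest estimate; in the toy example it drops from 90% on the threshold-setting controls to 75% on the new controls.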
Subject(s)
Neoplasms , Humans , Reproducibility of Results , Neoplasms/diagnosis , Neoplasms/genetics , Short Interspersed Nucleotide Elements , Machine Learning , Aneuploidy
ABSTRACT
BACKGROUND: Whole exome sequencing studies identify hundreds to thousands of rare protein-coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. RESULTS: We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants likely to be involved in human disease. The VEST classifier training set comprised ~45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high-frequency (allele frequency >1%), putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes from whole-exome sequencing of a small number of disease cases, using whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) second of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) second of 2313 genes in three cases of Freeman-Sheldon syndrome. CONCLUSIONS: Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies. 
VEST is available as a stand-alone software package at http://wiki.chasmsoftware.org and is hosted by the CRAVAT web server at http://www.cravat.us.
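The abstract describes two computational steps: converting each variant's VEST score into a p-value against the scores of held-out neutral variants, and aggregating those p-values per gene to rank candidate disease genes. The sketch below illustrates both steps under stated assumptions: higher scores are taken to mean more likely pathogenic, the empirical p-value uses a +1 pseudocount so it is never zero, and Fisher's method is used as the combination rule. The abstract does not state which aggregation rule VEST uses, and the function names, gene labels, and scores here are hypothetical, not part of the VEST package or the CRAVAT server.

```python
import bisect
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def empirical_p(score: float, null_sorted: Sequence[float]) -> float:
    """Fraction of null (neutral-variant) scores >= score, with a +1 pseudocount."""
    n_ge = len(null_sorted) - bisect.bisect_left(null_sorted, score)
    return (1 + n_ge) / (1 + len(null_sorted))


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k degrees of
    freedom under the null. For even degrees of freedom the upper tail is
    exp(-X/2) * sum_{j<k} (X/2)^j / j!, so no statistics library is needed."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, tail = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j  # (X/2)^j / j!
        tail += term
    return math.exp(-half_x) * tail


def rank_genes(
    variants: Iterable[Tuple[str, float]], null_scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Group variant p-values by gene, combine them, and sort genes by combined p."""
    null_sorted = sorted(null_scores)
    per_gene: Dict[str, List[float]] = defaultdict(list)
    for gene, score in variants:
        per_gene[gene].append(empirical_p(score, null_sorted))
    combined = {gene: fisher_combine(ps) for gene, ps in per_gene.items()}
    return sorted(combined.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical neutral-variant scores and (gene, score) pairs pooled across cases.
    null = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    variants = [("GENE_A", 0.95), ("GENE_A", 0.85), ("GENE_B", 0.3), ("GENE_B", 0.5)]
    for gene, p in rank_genes(variants, null):
        print(f"{gene}\t{p:.3g}")  # GENE_A ranks first
```

Fisher's method assumes the combined p-values are independent, which variants shared across related cases may violate, so in this sketch the combined value is best read as a ranking score rather than a calibrated probability.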