ABSTRACT
Emerging evidence from the large numbers of cancer genomes analyzed in recent years indicates that chromosomal instability (CI), a well-established hallmark of cancer cells, is detectable in precancerous lesions. In this opinion, we discuss the association of this instability with tumor progression and cancer risk. We highlight the opportunity that early genomic instability presents for the diagnosis of esophageal adenocarcinoma (EAC) and its precancerous lesion, Barrett's esophagus (BE). With a growing body of evidence suggesting that only a small pool of cancer-related genes is involved in early tumor development, we argue that general genomic instability may hold greater diagnostic potential for early cancer detection than the identification of individual mutational biomarkers.
Subjects
Adenocarcinoma , Barrett Esophagus , Esophageal Neoplasms , Adenocarcinoma/diagnosis , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Barrett Esophagus/diagnosis , Barrett Esophagus/genetics , Barrett Esophagus/pathology , Disease Progression , Esophageal Neoplasms/diagnosis , Esophageal Neoplasms/genetics , Esophageal Neoplasms/pathology , Genomic Instability/genetics , Humans
ABSTRACT
BACKGROUND: Endoscopic surveillance is recommended for patients with Barrett's oesophagus because, although the progression risk is low, endoscopic intervention is highly effective for high-grade dysplasia and cancer. However, repeated endoscopy has associated harms and access has been limited during the COVID-19 pandemic. We aimed to evaluate the role of a non-endoscopic device (Cytosponge) coupled with laboratory biomarkers and clinical factors to prioritise endoscopy for Barrett's oesophagus. METHODS: We first conducted a retrospective, multicentre, cross-sectional study in patients older than 18 years who were having endoscopic surveillance for Barrett's oesophagus (with intestinal metaplasia confirmed by TFF3 and a minimum Barrett's segment length of 1 cm [circumferential or tongues by the Prague C and M criteria]). All patients had received the Cytosponge and confirmatory endoscopy during the BEST2 (ISRCTN12730505) and BEST3 (ISRCTN68382401) clinical trials, from July 7, 2011, to April 1, 2019 (UK Clinical Research Network Study Portfolio 9461). Participants were divided into training (n=557) and validation (n=334) cohorts to identify optimal risk groups. The biomarkers evaluated were overexpression of p53, cellular atypia, and 17 clinical demographic variables. Endoscopic biopsy diagnosis of high-grade dysplasia or cancer was the primary endpoint. Clinical feasibility of a decision tree for Cytosponge triage was evaluated in a real-world prospective cohort from Aug 27, 2020 (DELTA; ISRCTN91655550; n=223), in response to COVID-19 and the need to provide an alternative to endoscopic surveillance. FINDINGS: The prevalence of high-grade dysplasia or cancer determined by the current gold standard of endoscopic biopsy was 17% (92 of 557 patients) in the training cohort and 10% (35 of 344) in the validation cohort. 
From the new biomarker analysis, three risk groups were identified: high risk, defined as atypia or p53 overexpression or both on Cytosponge; moderate risk, defined by the presence of a clinical risk factor (age, sex, and segment length); and low risk, defined as Cytosponge-negative and no clinical risk factors. The risk of high-grade dysplasia or intramucosal cancer in the high-risk group was 52% (68 of 132 patients) in the training cohort and 41% (31 of 75) in the validation cohort, compared with 2% (five of 210) and 1% (two of 185) in the low-risk group, respectively. In the real-world setting, Cytosponge results prospectively identified 39 (17%) of 223 patients as high risk (atypia or p53 overexpression, or both) requiring endoscopy, among whom the positive predictive value was 31% (12 of 39 patients) for high-grade dysplasia or intramucosal cancer and 44% (17 of 39) for any grade of dysplasia. INTERPRETATION: Cytosponge atypia, p53 overexpression, and clinical risk factors (age, sex, and segment length) could be used to prioritise patients for endoscopy. Further investigation could validate their use in clinical practice and lead to a substantial reduction in endoscopy procedures compared with current surveillance pathways. FUNDING: Medical Research Council, Cancer Research UK, Innovate UK.
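The three-tier triage rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's software: `CytospongeResult`, `risk_group`, and `ppv` are hypothetical names, and because the abstract does not give the age, sex, or segment-length thresholds, the clinical criteria are collapsed into a single assumed boolean flag.

```python
from dataclasses import dataclass


@dataclass
class CytospongeResult:
    atypia: bool                # cellular atypia on the Cytosponge sample
    p53_overexpression: bool    # p53 overexpression on the Cytosponge sample
    clinical_risk_factor: bool  # assumed flag: any age/sex/segment-length criterion met


def risk_group(result: CytospongeResult) -> str:
    """Assign one of the three risk groups named in the abstract."""
    if result.atypia or result.p53_overexpression:
        return "high"      # biomarker-positive, prioritised for endoscopy
    if result.clinical_risk_factor:
        return "moderate"  # biomarker-negative but clinical risk factor present
    return "low"           # Cytosponge-negative and no clinical risk factors


def ppv(true_positives: int, test_positives: int) -> float:
    """Positive predictive value: confirmed cases among those flagged."""
    return true_positives / test_positives
```

With the counts reported for the prospective cohort, `ppv(12, 39)` and `ppv(17, 39)` reproduce the quoted 31% and 44% after rounding.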
Subjects
Adenocarcinoma/pathology , Barrett Esophagus/pathology , COVID-19 , Esophageal Neoplasms/pathology , Patient Selection , Watchful Waiting/methods , Adenocarcinoma/diagnostic imaging , Adenocarcinoma/metabolism , Aged , Barrett Esophagus/diagnostic imaging , Barrett Esophagus/metabolism , Barrett Esophagus/therapy , Biomarkers/metabolism , COVID-19/prevention & control , Clinical Decision-Making , Clinical Trials as Topic , Cross-Sectional Studies , Decision Trees , Disease Progression , Esophageal Neoplasms/diagnostic imaging , Esophageal Neoplasms/metabolism , Esophagoscopy , Feasibility Studies , Female , Humans , Male , Middle Aged , Pilot Projects , Prospective Studies , Retrospective Studies , Risk Assessment , Risk Factors , SARS-CoV-2 , Trefoil Factor-3/metabolism , Tumor Suppressor Protein p53/metabolism
ABSTRACT
BACKGROUND & AIMS: Despite extensive Barrett's esophagus (BE) screening efforts, most patients with esophageal adenocarcinoma (EAC) present de novo. It is unclear how much of this problem is the result of insensitivity or poor application of current screening guidelines. We aimed to evaluate the sensitivity of guidelines by determining the proportion of prevalent EAC cases that meet the American College of Gastroenterology (ACG) or the British Society of Gastroenterology (BSG) guidelines for BE screening and to determine whether changes to criteria would enhance detection. METHODS: A retrospective single-center cohort from the United States (n = 663) and a prospective multicenter cohort from the United Kingdom (n = 645) were collected and analyzed independently. Screening eligibility was defined as chronic reflux plus at least 2 risk factors as defined by the guidelines. We calculated the proportion of screening-eligible patients and then compared BE/EAC risk factors between screening-eligible and screening-ineligible patients using the chi-squared or Student t test as appropriate. RESULTS: In the Mayo Clinic cohort, 54.9% of EAC cases, and in the UK cohort, 38.9% of EAC cases, were not identified by ACG or BSG screening criteria, respectively. Among patients who did not meet the screening criteria, lack of heartburn was observed in 86.5% in the Mayo Clinic cohort and in 61.4% in the UK cohort. Other risk factors that were lacking included obesity (defined as a body mass index of ≥30 kg/m²) and family history of EAC. Eliminating chronic reflux from the ACG/BSG criteria improved eligibility for screening from 45.1% to 81.3% (P < .001) in the Mayo Clinic cohort and from 61.1% (n = 394) to 81.5% (n = 526; P < .001) in the UK cohort. 
However, reflux may be difficult to ascertain from the history, and by including proton pump inhibitor use status in addition to the BSG criteria, screening eligibility improved by 10.0% in the UK cohort (n = 459; P < .001). CONCLUSIONS: ACG/BSG BE screening guidelines have limited our ability to detect prevalent EAC. An optimized approach to identifying the individuals most suitable for EAC screening needs to be implemented, particularly one that does not rely on chronic reflux symptoms.
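The eligibility rule stated in the methods (chronic reflux plus at least 2 risk factors) and the effect of dropping the reflux requirement can be sketched as follows. This is an assumption-laden illustration: `Patient`, `is_eligible`, and `eligible_fraction` are hypothetical names, and the risk factors are reduced to a simple count rather than the individual guideline items.

```python
from typing import Iterable, NamedTuple


class Patient(NamedTuple):
    chronic_reflux: bool  # history of chronic reflux symptoms
    risk_factors: int     # assumed field: number of guideline-defined risk factors


def is_eligible(patient: Patient, require_reflux: bool = True,
                min_risk_factors: int = 2) -> bool:
    """Apply the screening rule, optionally without the reflux requirement."""
    if require_reflux and not patient.chronic_reflux:
        return False
    return patient.risk_factors >= min_risk_factors


def eligible_fraction(cohort: Iterable[Patient], require_reflux: bool = True) -> float:
    """Proportion of a cohort that would have qualified for screening."""
    patients = list(cohort)
    return sum(is_eligible(p, require_reflux) for p in patients) / len(patients)
```

Calling `eligible_fraction(cohort, require_reflux=False)` on the same cohort gives the counterfactual proportion that the abstract compares against the guideline-based proportion.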
Subjects
Adenocarcinoma , Barrett Esophagus , Esophageal Neoplasms , Gastroesophageal Reflux , Adenocarcinoma/diagnosis , Adenocarcinoma/epidemiology , Adenocarcinoma/etiology , Barrett Esophagus/complications , Esophageal Neoplasms/diagnosis , Esophageal Neoplasms/epidemiology , Esophageal Neoplasms/etiology , Gastroesophageal Reflux/complications , Gastroesophageal Reflux/diagnosis , Heartburn/complications , Heartburn/diagnosis , Humans , Prospective Studies , Retrospective Studies , Risk Factors , United States
ABSTRACT
Barrett's oesophagus has been known for many years to display early changes to the genome consistent with the risk for oesophageal adenocarcinoma. Recently we have shown that this information can be used without knowledge of individual gene mutations to accurately predict a patient's future risk of malignant progression.
Subjects
Barrett Esophagus/genetics , Early Detection of Cancer/methods , Esophageal Neoplasms/genetics , Genomic Instability , Precancerous Conditions/genetics , Adenocarcinoma/diagnosis , Adenocarcinoma/genetics , Esophageal Neoplasms/diagnosis , Humans
ABSTRACT
BACKGROUND & AIMS: Esophageal adenocarcinomas (EACs) are heterogeneous and often preceded by Barrett's esophagus (BE). Many genomic changes have been associated with development of BE and EAC, but little is known about epigenetic alterations. We performed epigenetic analyses of BE and EAC tissues and combined these data with transcriptome and genomic data to identify mechanisms that control gene expression and genome integrity. METHODS: In a retrospective cohort study, we collected tissue samples and clinical data from 150 BE and 285 EAC cases from the Oesophageal Cancer Classification and Molecular Stratification consortium in the United Kingdom. We analyzed methylation profiles of all BE and EAC tissues and assigned them to subgroups using non-negative matrix factorization with k-means clustering. Data from whole-genome sequencing and transcriptome studies were then incorporated; we performed integrative methylation and RNA-sequencing analyses to identify genes that were suppressed with increased methylation in promoter regions. Levels of different immune cell types were computed using single-sample gene set enrichment methods. We derived 8 organoids from 8 EAC tissues and tested their sensitivity to different drugs. RESULTS: BE and EAC samples shared genome-wide methylation features, compared with normal tissues (esophageal, gastric, and duodenal; controls) from the same patients, and grouped into 4 subtypes. Subtype 1 was characterized by DNA hypermethylation with a high mutation burden and multiple mutations in genes in cell cycle and receptor tyrosine kinase signaling pathways. Subtype 2 was characterized by a gene expression pattern associated with metabolic processes (ATP synthesis and fatty acid oxidation) and a lack of methylation at specific binding sites for transcription factors; 83% of samples of this subtype were BE and 17% were EAC. 
The third subtype did not have changes in methylation pattern, compared with control tissue, but had a gene expression pattern that indicated immune cell infiltration; this tumor type was associated with the shortest time of patient survival. The fourth subtype was characterized by DNA hypomethylation associated with structural rearrangements and copy number alterations, with preferential amplification of CCNE1 (cells with this gene amplification have been reported to be sensitive to CDK2 inhibitors). Organoids with reduced levels of MGMT and CHFR expression were sensitive to temozolomide and taxane drugs. CONCLUSIONS: In a comprehensive integrated analysis of methylation, transcriptome, and genome profiles of more than 400 BE and EAC tissues, along with clinical data, we identified 4 subtypes that were associated with patient outcomes and potential responses to therapy.
Subjects
Adenocarcinoma/genetics , Barrett Esophagus/genetics , DNA Methylation/genetics , Epigenesis, Genetic/genetics , Esophageal Mucosa/pathology , Esophageal Neoplasms/genetics , Adenocarcinoma/pathology , Aged , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Barrett Esophagus/drug therapy , Barrett Esophagus/pathology , Cyclin E/genetics , DNA Methylation/drug effects , Disease Progression , Epigenesis, Genetic/drug effects , Esophageal Neoplasms/pathology , Female , Gene Amplification , Gene Expression Regulation, Neoplastic/drug effects , Humans , Male , Middle Aged , Oncogene Proteins/genetics , Promoter Regions, Genetic/genetics , RNA-Seq , Retrospective Studies , Temozolomide/pharmacology , Temozolomide/therapeutic use , Whole Genome Sequencing
ABSTRACT
BACKGROUND & AIMS: The incidence of esophageal adenocarcinoma (EAC) has increased over the past decades. It is unclear if this increase is the result of a new cancer phenotype or an increase in risk factors for EAC. We aimed to compare risk factors, the proportions of intestinal and nonintestinal phenotypes of EAC, and survival times of patients during the 2009 to 2012 time period vs the 1996 to 1997 time period. METHODS: We performed a retrospective single-center cohort study of 829 patients with EAC from the time periods of 1996 to 1997 and 2009 to 2012. Baseline characteristics were compared using χ² analysis for categorical variables and the Student t test for continuous variables. The Cox proportional hazards model was used to compare 5-year survival. RESULTS: We included 149 patients from the 1996 to 1997 time period and 680 patients from the 2009 to 2012 time period. There was no significant difference between the cohorts in terms of age at cancer presentation, sex, or history of smoking (P > .05). Gastroesophageal reflux symptoms were absent in almost half of the patients from each time period (P = .46). Intestinal metaplasia was identified in esophageal tumor tissues from 48.3% of patients with EAC in the 1996 to 1997 time period and in 49.9% of patients in the 2009 to 2012 time period (P = .45). Patients from each time period presented with similar-stage cancer (P = .25), most at stage III (43% in the 1996-1997 period and 37.8% in the 2009-2012 period). Having EAC during the period of 1996 to 1997 was associated with an increased risk of death compared with the 2009 to 2012 time period in a univariate model (hazard ratio, 1.6; 95% CI, 1.3-2.0; P = .001), and the association persisted after we controlled for sex, age at diagnosis, tumor stage, and presence of intestinal metaplasia (adjusted hazard ratio, 1.7; 95% CI, 1.4-2.1; P < .001). 
CONCLUSIONS: In a comparison of patients with EAC from the time periods of 1996 to 1997 vs 2009 to 2012, we found similar and persistent proportions of tumor phenotypes, characterized by a lack of intestinal metaplasia or heartburn symptoms. The lack of symptoms could contribute to our continued inability to identify incident cancers and/or improve patient survival.
Subjects
Adenocarcinoma , Barrett Esophagus , Esophageal Neoplasms , Adenocarcinoma/epidemiology , Barrett Esophagus/epidemiology , Cohort Studies , Esophageal Neoplasms/epidemiology , Humans , Phenotype , Retrospective Studies , Risk Factors
ABSTRACT
BACKGROUND & AIMS: Most patients with esophageal adenocarcinoma (EAC) present with de novo tumors. Although this could be due to inadequate screening strategies, the precise reason for this observation is not clear. We compared survival of patients with prevalent EAC with and without synchronous Barrett esophagus (BE) with intestinal metaplasia (IM) at the time of EAC diagnosis. METHODS: Clinical data were studied using Cox proportional hazards regression to evaluate the effect of synchronous BE-IM on EAC survival independent of age, sex, TNM stage, and tumor location. We analyzed data from a cohort of patients with EAC from the Mayo Clinic (n=411; 203 with BE and IM) and a multicenter cohort from the United Kingdom (n=1417; 638 with BE and IM). RESULTS: In the Mayo cohort, patients with BE and IM had a reduced risk of death compared with patients without BE and IM (hazard ratio [HR], 0.44; 95% CI, 0.34-0.57; P<.001). In a multivariable analysis, BE with IM was associated with longer survival independent of patient age or sex, tumor stage or location, and BE length (adjusted HR, 0.66; 95% CI, 0.5-0.88; P=.005). In the United Kingdom cohort, patients with BE and IM had a reduced risk of death compared with those without (HR, 0.59; 95% CI, 0.5-0.69; P<.001), with continued significance in multivariable analysis that included patient age and sex and tumor stage and tumor location (adjusted HR, 0.77; 95% CI, 0.64-0.93; P=.006). CONCLUSION: Two types of EAC can be characterized based on the presence or absence of BE. These findings could increase our understanding of the etiology of EAC and be used in the management and prognosis of patients.
Subjects
Adenocarcinoma/genetics , Barrett Esophagus/genetics , Esophageal Neoplasms/genetics , Intestines/pathology , Phenotype , Adenocarcinoma/etiology , Adenocarcinoma/pathology , Aged , Barrett Esophagus/complications , Esophageal Neoplasms/etiology , Esophageal Neoplasms/pathology , Esophagus/pathology , Humans , Male , Metaplasia/complications , Metaplasia/genetics , Middle Aged , Neoplasm Staging , Prognosis , Proportional Hazards Models , Regression Analysis , United Kingdom , United States
ABSTRACT
Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed.
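The idea of turning breakpoint information into a model sequence that unmapped reads can be tested against might look like the sketch below. It is only an illustration under stated assumptions: `junction_model` and `spans_junction` are hypothetical names, breakpoints are assumed to be 0-based positions on forward-strand sequences, and exact substring matching stands in for a real read aligner.

```python
from __future__ import annotations


def junction_model(ref: dict[str, str], left: tuple[str, int],
                   right: tuple[str, int], flank: int = 20) -> tuple[str, int]:
    """Build a model sequence joining two breakpoints.

    Returns the model and the index within it where the junction falls.
    """
    left_chrom, left_pos = left
    right_chrom, right_pos = right
    # Sequence leading up to the left breakpoint.
    left_flank = ref[left_chrom][max(0, left_pos - flank):left_pos]
    # Sequence continuing from the right breakpoint.
    right_flank = ref[right_chrom][right_pos:right_pos + flank]
    return left_flank + right_flank, len(left_flank)


def spans_junction(read: str, model: str, junction: int) -> bool:
    """True if the read matches the model exactly and crosses the junction."""
    start = model.find(read)
    return start != -1 and start < junction < start + len(read)
```

A read that fails to map to the standard reference but satisfies `spans_junction` against a model is evidence for the modelled rearrangement; in practice the exact-match step would be replaced by a tolerant alignment.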
Subjects
Genetic Variation , Genome, Human , Genomics/methods , Models, Biological , Neoplasms/genetics , Algorithms , Cell Line, Tumor , Chromosome Aberrations , Chromosome Mapping , Computer Simulation , Humans
ABSTRACT
BACKGROUND: High-throughput sequencing has become one of the primary tools for investigation of the molecular basis of disease. The increasing use of sequencing in investigations that aim to understand both individuals and populations is challenging our ability to develop analysis tools that scale with the data. This issue is of particular concern in studies that exhibit a wide degree of heterogeneity or deviation from the standard reference genome. The advent of population scale sequencing studies requires analysis tools that are developed and tested against matching quantities of heterogeneous data. RESULTS: We developed a large-scale whole genome simulation tool, FIGG, which generates large numbers of whole genomes with known sequence characteristics based on direct sampling of experimentally known or theorized variations. For normal variations we used publicly available data to determine the frequency of different mutation classes across the genome. FIGG then uses this information as a background to generate new sequences from a parent sequence with matching frequencies, but different actual mutations. The background can be normal variations, known disease variations, or a theoretical frequency distribution of variations. CONCLUSION: In order to enable the creation of large numbers of genomes, FIGG generates simulated sequences from known genomic variation and iteratively mutates each genome separately. The result is multiple whole genome sequences with unique variations that can be used primarily to provide different reference genomes, model heterogeneous populations, and offer a standard test environment for new analysis algorithms or bioinformatics tools.
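Frequency-driven simulation of this kind can be illustrated with a toy sketch. This is not FIGG's code: `mutate` and `simulate_population` are hypothetical names, only two mutation classes (single-base substitution and single-base deletion) are modelled, and the per-base rates are assumed inputs rather than frequencies derived from public data.

```python
import random


def mutate(parent: str, snv_rate: float, deletion_rate: float,
           rng: random.Random) -> str:
    """Return a copy of `parent` with mutations drawn at the given per-base rates."""
    out = []
    for base in parent:
        u = rng.random()
        if u < deletion_rate:
            continue  # delete this base
        if u < deletion_rate + snv_rate:
            # substitute with one of the three other bases
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_population(parent: str, n: int, snv_rate: float,
                        deletion_rate: float, seed: int = 0) -> list:
    """Mutate the parent independently n times, giving n distinct genomes."""
    rng = random.Random(seed)
    return [mutate(parent, snv_rate, deletion_rate, rng) for _ in range(n)]
```

Each simulated genome shares the same background frequencies but carries its own set of actual mutations, which is the property the abstract emphasises.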
Subjects
Genomics/methods , Software , Computer Simulation , Genetic Variation , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Mutation , Sequence Alignment
ABSTRACT
Access to public data sets is important to the scientific community as a resource to develop new experiments or validate new data. Projects such as the PeptideAtlas, Ensembl and The Cancer Genome Atlas (TCGA) offer both access to public data and a repository to share their own data. Access to these data sets is often provided through a web page form and a web service API. Access technologies based on web protocols (e.g. HTTP) have been in use for over a decade and are widely adopted across the industry for a variety of functions (e.g. search, commercial transactions, and social media). Each architecture adapts these technologies to provide users with tools to access and share data. Both commonly used web service technologies (e.g. REST and SOAP) and custom-built solutions over HTTP are utilized in providing access to research data. Providing multiple access points ensures that the community can access the data in the simplest and most effective manner for their particular needs. This article examines three common access mechanisms for web accessible data: BioMart, caBIG, and Google Data Sources. These are illustrated by implementing each over the PeptideAtlas repository and reviewed for their suitability based on specific usages common to research. BioMart, Google Data Sources, and caBIG are each suitable for certain uses. The tradeoffs made in the development of the technology are dependent on the uses each was designed for (e.g. security versus speed). This means that an understanding of specific requirements and tradeoffs is necessary before selecting the access technology.
Subjects
Genome , Peptides/chemistry , Software , Databases, Genetic , Genomics , Information Storage and Retrieval/methods , Internet
ABSTRACT
Timely detection of Barrett's esophagus, the pre-malignant condition of esophageal adenocarcinoma, can improve patient survival rates. The Cytosponge-TFF3 test, a non-endoscopic minimally invasive procedure, has been used for diagnosing intestinal metaplasia in Barrett's. However, it depends on a pathologist's assessment of two slides stained with H&E and the immunohistochemical biomarker TFF3. This resource-intensive clinical workflow limits large-scale screening in the at-risk population. To improve screening capacity, we propose a deep learning approach for detecting Barrett's from routinely stained H&E slides. The approach relies solely on diagnostic labels, eliminating the need for expensive localized expert annotations. We train and independently validate our approach on two clinical trial datasets, totaling 1866 patients. We achieve 91.4% and 87.3% AUROCs on discovery and external test datasets for the H&E model, comparable to the TFF3 model. Our proposed semi-automated clinical workflow can reduce pathologists' workload to 48% without sacrificing diagnostic performance, enabling pathologists to prioritize high-risk cases.
Subjects
Adenocarcinoma , Barrett Esophagus , Deep Learning , Esophageal Neoplasms , Humans , Barrett Esophagus/diagnosis , Barrett Esophagus/pathology , Esophageal Neoplasms/diagnosis , Esophageal Neoplasms/pathology , Adenocarcinoma/diagnosis , Adenocarcinoma/pathology , Metaplasia
ABSTRACT
A variety of mutational processes drive cancer development, but their dynamics across the entire disease spectrum from pre-cancerous to advanced neoplasia are poorly understood. We explore the mutagenic processes shaping oesophageal adenocarcinoma tumorigenesis in 997 instances comprising distinct stages of this malignancy, from Barrett's oesophagus to primary tumours and advanced metastatic disease. The mutational landscape is dominated by the C[T > C/G]T substitution enriched signatures SBS17a/b, which are linked with TP53 mutations, increased proliferation, genomic instability and disease progression. The APOBEC mutagenesis signature is a weak but persistent signal amplified in primary tumours. We also identify prevalent alterations in DNA damage repair pathways, with homologous recombination, base and nucleotide excision repair and translesion synthesis mutated in up to 50% of the cohort, and surprisingly uncoupled from transcriptional activity. Among these, the presence of base excision repair deficiencies is associated with remarkably poor prognosis in the cohort. In this work, we provide insights into the mutational aetiology and changes enabling the transition from pre-neoplastic to advanced oesophageal adenocarcinoma.
Subjects
Adenocarcinoma , Esophageal Neoplasms , Humans , Mutation , Mutagenesis , Esophageal Neoplasms/genetics , Adenocarcinoma/genetics
ABSTRACT
BACKGROUND: As the volume, complexity and diversity of the information that scientists work with on a daily basis continues to rise, so too does the requirement for new analytic software. The analytic software must resolve the tension between the need to allow for a high level of scientific reasoning and the requirement for an intuitive, easy-to-use tool that does not require specialist, and often arduous, training. Information visualization provides a solution to this problem, as it allows for direct manipulation and interaction with diverse and complex data. The challenge facing bioinformatics researchers is how to apply this knowledge to data sets that are continually growing in a field that is rapidly changing. RESULTS: This paper discusses an approach to the development of visual mining tools capable of supporting the mining of massive data collections used in systems biology research, and also discusses lessons that have been learned providing tools for both local researchers and the wider community. Example tools were developed which are designed to enable the exploration and analyses of both proteomics and genomics based atlases. These atlases represent large repositories of raw and processed experiment data generated to support the identification of biomarkers through mass spectrometry (the PeptideAtlas) and the genomic characterization of cancer (The Cancer Genome Atlas). Specifically the tools are designed to allow for: the visual mining of thousands of mass spectrometry experiments, to assist in designing informed targeted protein assays; and the interactive analysis of hundreds of genomes, to explore the variations across different cancer genomes and cancer types. CONCLUSIONS: The mining of massive repositories of biological data requires the development of new tools and techniques. 
Visual exploration of the large-scale atlas data sets allows researchers to mine data to find new meaning and make sense at scales from single samples to entire populations. Providing linked task-specific views that allow a user to start from points of interest (from diseases to single genes) enables targeted exploration of thousands of spectra and genomes. As the composition of the atlases changes, and our understanding of the biology increases, new tasks will continually arise. It is therefore important to provide the means to make the data available in a suitable manner in as short a time as possible. We have done this through the use of common visualization workflows, into which we rapidly deploy visual tools. These visualizations follow common metaphors where possible to assist users in understanding the displayed data. Rapid development of tools and task-specific views allows researchers to mine large-scale data almost as quickly as it is produced. Ultimately these visual tools enable new inferences, new analyses and further refinement of the large-scale data being provided in atlases such as PeptideAtlas and The Cancer Genome Atlas.
Subjects
Data Mining , Genomics/methods , Neoplasms/genetics , Proteomics/methods , Software , Colonic Neoplasms/genetics , Female , Glioblastoma/genetics , Humans , Mass Spectrometry , Ovarian Neoplasms/genetics
ABSTRACT
BACKGROUND: For shotgun mass spectrometry-based proteomics, the most computationally expensive step is matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. RESULTS: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. CONCLUSION: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
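The map and reduce roles in such a search can be illustrated in-process, without Hadoop. The sketch below is an assumption throughout: `map_spectrum`, `reduce_best`, and `run_search` are hypothetical names, peptide masses are supplied precomputed, and the score is simply the negative precursor-mass difference, a stand-in that is not the K-score algorithm.

```python
from collections import defaultdict


def map_spectrum(spectrum, peptides, tol=0.5):
    """Map step: emit (spectrum_id, (peptide, score)) for each candidate in tolerance."""
    spec_id, precursor_mass = spectrum
    for peptide, mass in peptides:
        delta = abs(mass - precursor_mass)
        if delta <= tol:
            # Placeholder score: a smaller mass difference scores higher.
            yield spec_id, (peptide, -delta)


def reduce_best(spec_id, candidates):
    """Reduce step: keep the best-scoring candidate for one spectrum."""
    return spec_id, max(candidates, key=lambda c: c[1])


def run_search(spectra, peptides, tol=0.5):
    """Simulate the shuffle between map and reduce with an in-memory dict."""
    grouped = defaultdict(list)
    for spectrum in spectra:
        for key, value in map_spectrum(spectrum, peptides, tol):
            grouped[key].append(value)
    return dict(reduce_best(key, values) for key, values in grouped.items())
```

Because each spectrum is mapped independently, the map calls can be spread across workers, which is the property that lets throughput grow with cluster size.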
Subjects
Algorithms , Proteomics/methods , Search Engine , Sequence Analysis, Protein/methods , Software , Databases, Factual , Mass Spectrometry/methods , Peptides/chemistry , Protein Processing, Post-Translational
ABSTRACT
Oesophageal adenocarcinoma (OAC) provides an ideal case study to characterize large-scale rearrangements. Using whole genome short-read sequencing of 383 cases, of which 214 had matched whole transcriptomes, we observed structural variations (SV) with a predominance of deletions, tandem duplications and inter-chromosome junctions that could be identified as LINE-1 mobile element (ME) insertions. Complex clusters of rearrangements resembling breakage-fusion-bridge cycles or extrachromosomal circular DNA accounted for 22% of complex SVs affecting known oncogenes. Counting SV events affecting known driver genes substantially increased the recurrence rates of these drivers. After excluding fragile sites, we identified 51 candidate new drivers in genomic regions disrupted by SVs, including ETV5, KAT6B and CLTC. RUNX1 was the most recurrently altered gene (24%), with many deletions inactivating the RUNT domain but preserving the reading frame, suggesting an altered protein product. These findings underscore the importance of identification of SV events in OAC with implications for targeted therapies.
Subjects
Adenocarcinoma , Esophageal Neoplasms , Adenocarcinoma/genetics , Esophageal Neoplasms/genetics , Genome, Human , Histone Acetyltransferases/genetics , Humans , Whole Genome Sequencing
ABSTRACT
BACKGROUND: The advances in high-throughput sequencing technologies and growth in data sizes have highlighted the need for scalable tools to perform quality assurance testing. These tests are necessary to ensure that data is of a minimum necessary standard for use in downstream analysis. In this paper we present the SAMQA tool to rapidly and robustly identify errors in population-scale sequence data. RESULTS: SAMQA has been used on samples from three separate sets of cancer genome data from The Cancer Genome Atlas (TCGA) project. Using technical standards provided by the SAM specification and biological standards defined by researchers, we have classified errors in these sequence data sets relative to individual reads within a sample. Due to an observed linearithmic speedup through the use of a high-performance computing (HPC) framework for the majority of tasks, poor quality data was identified prior to secondary analysis in significantly less time on the HPC framework than the same data run using alternative parallelization strategies on a single server. CONCLUSIONS: The SAMQA toolset validates a minimum set of data quality standards across whole-genome and exome sequences. It is tuned to run on a high-performance computational framework, enabling QA across hundreds of gigabytes of samples regardless of coverage or sample type.
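A per-read technical check against the SAM specification could be sketched as below. This is not SAMQA's implementation: `check_sam_line` is a hypothetical function, and it covers only a handful of rules from the SAM format (11 mandatory tab-separated fields, FLAG and MAPQ ranges, CIGAR syntax, and agreement between CIGAR, SEQ, and QUAL lengths).

```python
import re

UINT_RE = re.compile(r"[0-9]+")
CIGAR_RE = re.compile(r"\*|([0-9]+[MIDNSHPX=])+")
CIGAR_OP_RE = re.compile(r"([0-9]+)([MIDNSHPX=])")


def check_sam_line(line: str) -> list:
    """Return a list of error messages for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than 11 mandatory fields"]
    _qname, flag, _rname, pos, mapq, cigar, _rnext, _pnext, _tlen, seq, qual = fields[:11]
    errors = []
    if not UINT_RE.fullmatch(flag) or int(flag) > 65535:
        errors.append("FLAG out of range")
    if not UINT_RE.fullmatch(pos):
        errors.append("POS is not a non-negative integer")
    if not UINT_RE.fullmatch(mapq) or int(mapq) > 255:
        errors.append("MAPQ out of range")
    if not CIGAR_RE.fullmatch(cigar):
        errors.append("malformed CIGAR")
    elif cigar != "*" and seq != "*":
        # Only M, I, S, = and X operations consume bases of the read.
        query_len = sum(int(n) for n, op in CIGAR_OP_RE.findall(cigar) if op in "MIS=X")
        if query_len != len(seq):
            errors.append("CIGAR length does not match SEQ length")
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        errors.append("SEQ and QUAL lengths differ")
    return errors
```

Because each line is checked independently of every other line, a check of this shape can be distributed across many workers, which fits the population-scale use the abstract describes.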
Subjects
High-Throughput Nucleotide Sequencing/methods , Research Design , Genomics , High-Throughput Nucleotide Sequencing/standards , Humans , Quality Control
ABSTRACT
Cancer cells are shaped through an evolutionary process of DNA mutation, cell selection and population expansion. Early steps in this process are driven by a set of mutated driver genes and structural alterations to the genome through copy number gains or losses. Oesophageal adenocarcinoma (EAC) and the pre-invasive tissue, Barrett's oesophagus (BE), provide an ideal example in which to observe and study this evolution. BE displays early genomic instability, specifically in copy number changes that may later be observed in EAC. Furthermore, these early changes result in patterns of progression (that is, 'born bad', gradual or catastrophic) that may help to describe the evolution of EAC. As only a small proportion of patients with BE will go on to develop cancer, a better understanding of these patterns and the resulting genomic changes should improve early detection in EAC and may provide clues for the evolution of cancer more broadly.
Subjects
Barrett Esophagus/pathology , Esophageal Neoplasms/pathology , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Barrett Esophagus/genetics , Esophageal Neoplasms/genetics , Humans , Mutation
ABSTRACT
BACKGROUND: High-throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. RESULTS: Existing software solutions provide static and well-established algorithms in a restrictive package. However, as high-throughput sequencing is a rapidly evolving field, such static approaches cannot readily adopt the latest advances and techniques that researchers often require. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. CONCLUSION: The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.
Subjects
Computational Biology/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Base Sequence , Database Management Systems , Internet , Sequence Analysis, DNA/instrumentation
ABSTRACT
BACKGROUND: The cancer risk in Barrett's oesophagus (BO) is difficult to estimate. Histologic dysplasia has strong predictive power, but can be missed by random biopsies. Other clinical parameters have limited utility for risk stratification. We aimed to assess whether a molecular biomarker panel on targeted biopsies can predict neoplastic progression of BO. METHODS: 203 patients with BO were tested at index endoscopy for 9 biomarkers (p53 and cyclin A expression; aneuploidy and tetraploidy; CDKN2A (p16), RUNX3 and HPP1 hypermethylation; 9p and 17p loss of heterozygosity) on autofluorescence-targeted biopsies and followed up prospectively. Data comparing progressors to non-progressors were evaluated by univariate and multivariate analyses using survival curves, Cox proportional hazards and logistic regression models. FINDINGS: 127 patients without high-grade dysplasia (HGD) or oesophageal adenocarcinoma (OAC) at index endoscopy were included, of whom 42 had evidence of any histologic progression over time. Aneuploidy was the only predictor of progression from non-dysplastic BO (NDBO) to any grade of neoplasia (p = 0.013) and HGD/OAC (p = 0.002). Aberrant p53 expression correlated with risk of short-term progression within 12 months, with an odds ratio of 6.0 (95% CI: 3.1-11.2). A panel comprising aneuploidy and p53 had an area under the receiver operating characteristic curve of 0.68 (95% CI: 0.59-0.77) for prediction of any progression. INTERPRETATION: Aneuploidy is the only biomarker that predicts neoplastic progression of NDBO. Aberrant p53 expression suggests prevalent dysplasia, which might have been missed by random biopsies, and warrants early follow-up.
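For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the sketch below shows one common textbook approach for a 2x2 table (the log-odds, or Woolf, method). The abstract above reports estimates from its own models, so this is not the authors' calculation; the function name `odds_ratio_ci` and the cell counts in the usage lines are invented purely for illustration and are not data from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: marker-positive progressors      b: marker-positive non-progressors
    c: marker-negative progressors      d: marker-negative non-progressors
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    estimate, low, high = odds_ratio_ci(10, 20, 5, 40)
    print(f"OR = {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

The interval is symmetric on the log scale, which is why published intervals for odds ratios are asymmetric around the point estimate.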
Subjects
Adenocarcinoma/genetics , Aneuploidy , Barrett Esophagus/pathology , Esophageal Neoplasms/genetics , Genetic Markers , Aged , Aged, 80 and over , Barrett Esophagus/genetics , Core Binding Factor Alpha 3 Subunit/genetics , Cyclin A/genetics , Cyclin-Dependent Kinase Inhibitor p16/genetics , Disease Progression , Endoscopy , Female , Humans , Logistic Models , Male , Membrane Proteins/genetics , Middle Aged , Neoplasm Proteins/genetics , Prospective Studies , Tumor Suppressor Protein p53/genetics
ABSTRACT
Recent studies show that aneuploidy and driver gene mutations precede cancer diagnosis by many years [1-4]. We assess whether these genomic signals can be used for early detection and pre-emptive cancer treatment using the neoplastic precursor lesion Barrett's esophagus as an exemplar [5]. Shallow whole-genome sequencing of 777 biopsies, sampled from 88 patients in Barrett's esophagus surveillance over a period of up to 15 years, shows that genomic signals can distinguish progressive from stable disease even 10 years before histopathological transformation. These findings are validated on two independent cohorts of 76 and 248 patients. These methods are low-cost and applicable to standard clinical biopsy samples. Compared with current management guidelines based on histopathology and clinical presentation, genomic classification enables earlier treatment for high-risk patients as well as reduction of unnecessary treatment and monitoring for patients who are unlikely to develop cancer.