2.
Cancer Epidemiol Biomarkers Prev ; 31(1): 210-220, 2022 01.
Article in English | MEDLINE | ID: mdl-34737207

ABSTRACT

BACKGROUND: Fusobacterium nucleatum (F. nucleatum) activates oncogenic signaling pathways and induces inflammation to promote colorectal carcinogenesis. METHODS: We characterized F. nucleatum and its subspecies in colorectal tumors and examined associations with tumor characteristics and colorectal cancer-specific survival. We conducted deep sequencing of nusA, nusG, and bacterial 16S rRNA genes in tumors from 1,994 patients with colorectal cancer and assessed associations between F. nucleatum presence and clinical characteristics, colorectal cancer-specific mortality, and somatic mutations. RESULTS: F. nucleatum, which was present in 10.3% of tumors, was detected in a higher proportion of right-sided and advanced-stage tumors, particularly subspecies animalis. Presence of F. nucleatum was associated with higher colorectal cancer-specific mortality (HR, 1.97; P = 0.0004). This association was restricted to nonhypermutated, microsatellite-stable tumors (HR, 2.13; P = 0.0002) and to patients who received chemotherapy [HR, 1.92; confidence interval (CI), 1.07-3.45; P = 0.029]. Only F. nucleatum subspecies animalis, the main subspecies detected (65.8%), was associated with colorectal cancer-specific mortality (HR, 2.16; P = 0.0016); subspecies vincentii and nucleatum were not (HR, 1.07; P = 0.86). Additional adjustment for tumor stage suggests that the effect of F. nucleatum on mortality is partly driven by a stage shift. Presence of F. nucleatum was associated with microsatellite instable tumors, tumors with POLE exonuclease domain mutations, and ERBB3 mutations, and suggestively associated with TP53 mutations. CONCLUSIONS: F. nucleatum, and particularly subspecies animalis, was associated with a higher colorectal cancer-specific mortality and specific somatic mutated genes. IMPACT: Our findings identify the F. nucleatum subspecies animalis as negatively impacting colorectal cancer mortality, which may occur through a stage shift and its effect on chemoresistance.


Subject(s)
Colorectal Neoplasms; Fusobacterium nucleatum; Carcinogenesis; Colorectal Neoplasms/genetics; Humans; RNA, Ribosomal, 16S
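
The one estimate in the abstract above that reports both a confidence interval and a P value (chemotherapy subgroup: HR, 1.92; CI, 1.07-3.45; P = 0.029) can be sanity-checked with a short calculation. The sketch below assumes the interval is a 95% CI and that the P value comes from a two-sided Wald test on the log hazard ratio; the abstract states neither, so both are assumptions, and the function name is purely illustrative.

```python
import math


def p_from_hr_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate a two-sided Wald p-value from a hazard ratio and its CI.

    Assumes the CI is symmetric on the log scale (95% by default).
    """
    # On the log scale the CI is log(HR) +/- z_crit * SE, so its width gives SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values reported for the chemotherapy subgroup in the abstract.
se, z, p = p_from_hr_ci(1.92, 1.07, 3.45)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumptions the recovered P value is close to the reported 0.029, which suggests the interval and P value are mutually consistent.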
5.
Nat Commun ; 11(1): 3400, 2020 07 07.
Article in English | MEDLINE | ID: mdl-32636365

ABSTRACT

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project generated a vast amount of whole-genome cancer sequencing resource data. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we provide a user's guide to the five publicly available online data exploration and visualization tools introduced in the PCAWG marker paper. These tools are ICGC Data Portal, UCSC Xena, Chromothripsis Explorer, Expression Atlas, and PCAWG-Scout. We detail use cases and analyses for each tool, show how they incorporate outside resources from the larger genomics ecosystem, and demonstrate how the tools can be used together to understand the biology of cancers more deeply. Together, the tools enable researchers to query the complex genomic PCAWG data dynamically and integrate external information, enabling and enhancing interpretation.


Subject(s)
Computational Biology/methods; Genome, Human; Neoplasms/genetics; Chromothripsis; Data Analysis; Databases, Genetic; Genomics; Humans; Internet; Mutation; Software; User-Computer Interface; Whole Genome Sequencing
6.
Nat Genet ; 52(3): 320-330, 2020 03.
Article in English | MEDLINE | ID: mdl-32025001

ABSTRACT

Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, for which whole-genome and, for a subset, whole-transcriptome sequencing data from 2,658 cancers across 38 tumor types was aggregated, we systematically investigated potential viral pathogens using a consensus approach that integrated three independent pipelines. Viruses were detected in 382 genome and 68 transcriptome datasets. We found a high prevalence of known tumor-associated viruses such as Epstein-Barr virus (EBV), hepatitis B virus (HBV) and human papilloma virus (HPV; for example, HPV16 or HPV18). The study revealed significant exclusivity of HPV and driver mutations in head-and-neck cancer and the association of HPV with APOBEC mutational signatures, which suggests that impaired antiviral defense is a driving force in cervical, bladder and head-and-neck carcinoma. For HBV, HPV16, HPV18 and adeno-associated virus-2 (AAV2), viral integration was associated with local variations in genomic copy numbers. Integrations at the TERT promoter were associated with high telomerase expression evidently activating this tumor-driving process. High levels of endogenous retrovirus (ERV1) expression were linked to a worse survival outcome in patients with kidney cancer.


Subject(s)
DNA Tumor Viruses/genetics; Genome, Human/genetics; Neoplasms/virology; Transcriptome; Tumor Virus Infections/virology; Virus Integration; DNA Copy Number Variations; Hepatitis B virus/genetics; Herpesvirus 4, Human/genetics; Humans; Mutation; Neoplasms/genetics; Papillomavirus Infections/genetics; Promoter Regions, Genetic/genetics; Telomerase/genetics
8.
PLoS One ; 13(7): e0200926, 2018.
Article in English | MEDLINE | ID: mdl-30040866

ABSTRACT

BACKGROUND: The lack of accessible and structured documentation creates major barriers for investigators interested in understanding, properly interpreting and analyzing cohort data and biological samples. Providing the scientific community with open information is essential to optimize usage of these resources. A cataloguing toolkit is proposed by Maelstrom Research to answer these needs and support the creation of comprehensive and user-friendly study- and network-specific web-based metadata catalogues. METHODS: Development of the Maelstrom Research cataloguing toolkit was initiated in 2004. It was supported by the exploration of existing catalogues and standards, and guided by input from partner initiatives having used or pilot tested incremental versions of the toolkit. RESULTS: The cataloguing toolkit is built upon two main components: a metadata model and a suite of open-source software applications. The model sets out specific fields to describe study profiles; characteristics of the subpopulations of participants; timing and design of data collection events; and datasets/variables collected at each data collection event. It also includes the possibility to annotate variables with different classification schemes. When combined, the model and software support implementation of study and variable catalogues and provide a powerful search engine to facilitate data discovery. CONCLUSIONS: The Maelstrom Research cataloguing toolkit already serves several national and international initiatives and the suite of software is available to new initiatives through the Maelstrom Research website. With the support of new and existing partners, we hope to ensure regular improvements of the toolkit.


Subject(s)
Cohort Studies; Data Analysis; Databases, Factual; Epidemiologic Studies; Humans; Models, Statistical; Software; User-Computer Interface
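
The metadata model described in the abstract above (study profiles, subpopulations, data collection events, and annotated variables) can be pictured as a small nested data structure. The following is only an illustrative sketch: every class name, field name, and the `find_variables` helper are hypothetical and are not taken from the Maelstrom Research toolkit or its software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Variable:
    name: str
    label: str
    # Classification scheme -> term, e.g. {"area": "lifestyle"} (hypothetical).
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataCollectionEvent:
    name: str
    start_year: int
    end_year: int
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Subpopulation:
    name: str
    description: str
    events: List[DataCollectionEvent] = field(default_factory=list)


@dataclass
class Study:
    acronym: str
    design: str
    subpopulations: List[Subpopulation] = field(default_factory=list)


def find_variables(studies: List[Study], scheme: str, term: str) -> List[Tuple[str, str, str, str]]:
    """Return (study, subpopulation, event, variable) names whose annotation matches."""
    hits = []
    for study in studies:
        for subpop in study.subpopulations:
            for event in subpop.events:
                for var in event.variables:
                    if var.annotations.get(scheme) == term:
                        hits.append((study.acronym, subpop.name, event.name, var.name))
    return hits


# A made-up study used only to exercise the structure.
demo = Study("DEMO", "cohort", [
    Subpopulation("adults", "Participants aged 18+", [
        DataCollectionEvent("baseline", 2010, 2012, [
            Variable("bmi", "Body mass index", {"area": "anthropometry"}),
            Variable("smk", "Smoking status", {"area": "lifestyle"}),
        ]),
    ]),
])
hits = find_variables([demo], "area", "lifestyle")
print(hits)
```

Annotating variables with a classification scheme is what makes a cross-study search of this kind possible; a real catalogue would index these fields in a search engine rather than loop over them.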
10.
J Virol ; 92(2)2018 01 15.
Article in English | MEDLINE | ID: mdl-29093097

ABSTRACT

Epstein-Barr virus (EBV) is a causative agent of a variety of lymphomas, nasopharyngeal carcinoma (NPC), and ∼9% of gastric carcinomas (GCs). An important question is whether particular EBV variants are more oncogenic than others, but conclusions are currently hampered by the lack of sequenced EBV genomes. Here, we contribute to this question by mining whole-genome sequences of 201 GCs to identify 13 EBV-positive GCs and by assembling 13 new EBV genome sequences, almost doubling the number of available GC-derived EBV genome sequences and providing the first non-Asian EBV genome sequences from GC. Whole-genome sequence comparisons of all EBV isolates sequenced to date (85 from tumors and 57 from healthy individuals) showed that most GC and NPC EBV isolates were closely related although American Caucasian GC samples were more distant, suggesting a geographical component. However, EBV GC isolates were found to contain some consistent changes in protein sequences regardless of geographical origin. In addition, transcriptome data available for eight of the EBV-positive GCs were analyzed to determine which EBV genes are expressed in GC. In addition to the expected latency proteins (EBNA1, LMP1, and LMP2A), specific subsets of lytic genes were consistently expressed that did not reflect a typical lytic or abortive lytic infection, suggesting a novel mechanism of EBV gene regulation in the context of GC. These results are consistent with a model in which a combination of specific latent and lytic EBV proteins promotes tumorigenesis. IMPORTANCE: Epstein-Barr virus (EBV) is a widespread virus that causes cancer, including gastric carcinoma (GC), in a small subset of individuals. An important question is whether particular EBV variants are more cancer associated than others, but more EBV sequences are required to address this question. Here, we have generated 13 new EBV genome sequences from GC, almost doubling the number of EBV sequences from GC isolates and providing the first EBV sequences from non-Asian GC. We further identify sequence changes in some EBV proteins common to GC isolates. In addition, gene expression analysis of eight of the EBV-positive GCs showed consistent expression of both the expected latency proteins and a subset of lytic proteins that was not consistent with typical lytic or abortive lytic expression. These results suggest that novel mechanisms activate expression of some EBV lytic proteins and that their expression may contribute to oncogenesis.


Subject(s)
Adenocarcinoma/etiology; Epstein-Barr Virus Infections/complications; Epstein-Barr Virus Infections/virology; Gene Expression Regulation, Viral; Genome, Viral; Herpesvirus 4, Human/physiology; Stomach Neoplasms/etiology; Adenocarcinoma/pathology; Amino Acid Substitution; Computational Biology/methods; Epitopes, T-Lymphocyte; Epstein-Barr Virus Infections/immunology; Humans; Mutation; Phylogeny; Stomach Neoplasms/pathology; Whole Genome Sequencing
11.
Cancer Res ; 77(21): e15-e18, 2017 11 01.
Article in English | MEDLINE | ID: mdl-29092930

ABSTRACT

The NCI Genomic Data Commons (GDC) was launched in 2016 and makes available over 4 petabytes (PB) of cancer genomic and associated clinical data to the research community. This dataset continues to grow and currently includes over 14,500 patients. The GDC is an example of a biomedical data commons, which collocates biomedical data with storage and computing infrastructure and commonly used web services, software applications, and tools to create a secure, interoperable, and extensible resource for researchers. The GDC is (i) a data repository for downloading data that have been submitted to it, and also a system that (ii) applies a common set of bioinformatics pipelines to submitted data; (iii) reanalyzes existing data when new pipelines are developed; and (iv) allows users to build their own applications and systems that interoperate with the GDC using the GDC Application Programming Interface (API). We describe the GDC API and how it has been used both by the GDC itself and by third parties. Cancer Res; 77(21); e15-18. ©2017 AACR.


Subject(s)
Computational Biology/trends; Genome, Human; Genomics; Neoplasms/genetics; Datasets as Topic; Humans; Internet; Software; User-Computer Interface
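
Point (iv) of the abstract above, building applications on the GDC API, might look like the following in practice. This is a minimal sketch, assuming the public endpoint `https://api.gdc.cancer.gov/cases` and its `filters`, `fields`, `format`, and `size` query parameters; the project ID and field names are example values, the helper names are hypothetical, and `fetch_cases` is defined but deliberately not called so the snippet runs offline.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

GDC_API = "https://api.gdc.cancer.gov"  # assumed public base URL


def build_cases_url(project_id, fields, size=5):
    """Build a /cases query URL restricted to one project."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),  # the filter is sent as URL-encoded JSON
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API}/cases?{urlencode(params)}"


def decode_filters(url):
    """Recover the JSON filter from a query URL (handy when debugging)."""
    query = parse_qs(urlparse(url).query)
    return json.loads(query["filters"][0])


def fetch_cases(url, timeout=30):
    """Perform the request and return the list of hits (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["data"]["hits"]


url = build_cases_url("TCGA-BRCA", ["submitter_id", "primary_site"])
print(url)
print(decode_filters(url))
```

Separating URL construction from the network call keeps the query logic testable without contacting the service; calling `fetch_cases(url)` would then return the matching case records.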
12.
Int J Epidemiol ; 46(5): 1372-1378, 2017 10 01.
Article in English | MEDLINE | ID: mdl-29025122

ABSTRACT

Motivation: Improving the dissemination of information on existing epidemiological studies and facilitating the interoperability of study databases are essential to maximizing the use of resources and accelerating improvements in health. To address this, Maelstrom Research proposes Opal and Mica, two inter-operable open-source software packages providing out-of-the-box solutions for epidemiological data management, harmonization and dissemination. Implementation: Opal and Mica are two standalone but inter-operable web applications written in Java, JavaScript and PHP. They provide web services and modern user interfaces to access them. General features: Opal allows users to import, manage, annotate and harmonize study data. Mica is used to build searchable web portals disseminating study and variable metadata. When used conjointly, Mica users can securely query and retrieve summary statistics on geographically dispersed Opal servers in real-time. Integration with the DataSHIELD approach allows conducting more complex federated analyses involving statistical models. Availability: Opal and Mica are open-source and freely available at [www.obiba.org] under a General Public License (GPL) version 3, and the metadata models and taxonomies that accompany them are available under a Creative Commons licence.


Subject(s)
Database Management Systems; Information Dissemination/methods; Software; Canada; Epidemiologic Studies; Humans; Internet
13.
Blood ; 130(4): 453-459, 2017 07 27.
Article in English | MEDLINE | ID: mdl-28600341

ABSTRACT

The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.


Subject(s)
Databases, Genetic; Neoplasms/genetics; Precision Medicine; Software; Humans; National Cancer Institute (U.S.); United States
14.
F1000Res ; 6: 52, 2017.
Article in English | MEDLINE | ID: mdl-28344774

ABSTRACT

As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore (https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).

15.
Int J Epidemiol ; 46(1): 103-105, 2017 02 01.
Article in English | MEDLINE | ID: mdl-27272186

ABSTRACT

Background: It is widely accepted and acknowledged that data harmonization is crucial: in its absence, the co-analysis of major tranches of high quality extant data is liable to inefficiency or error. However, despite its widespread practice, no formalized/systematic guidelines exist to ensure high quality retrospective data harmonization. Methods: To better understand real-world harmonization practices and facilitate development of formal guidelines, three interrelated initiatives were undertaken between 2006 and 2015. They included a phone survey with 34 major international research initiatives, a series of workshops with experts, and case studies applying the proposed guidelines. Results: A wide range of projects use retrospective harmonization to support their research activities but even when appropriate approaches are used, the terminologies, procedures, technologies and methods adopted vary markedly. The generic guidelines outlined in this article delineate the essentials required and describe an interdependent step-by-step approach to harmonization: 0) define the research question, objectives and protocol; 1) assemble pre-existing knowledge and select studies; 2) define targeted variables and evaluate harmonization potential; 3) process data; 4) estimate quality of the harmonized dataset(s) generated; and 5) disseminate and preserve final harmonization products. Conclusions: This manuscript provides guidelines aiming to encourage rigorous and effective approaches to harmonization which are comprehensively and transparently documented and straightforward to interpret and implement. This can be seen as a key step towards implementing guiding principles analogous to those that are well recognised as being essential in securing the foundational underpinning of systematic reviews and the meta-analysis of clinical trials.


Subject(s)
Biomedical Research/statistics & numerical data; Data Collection/methods; Research Design/standards; Retrospective Studies; Data Interpretation, Statistical; Guidelines as Topic; Humans
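
Steps 2 to 4 of the guidelines in the abstract above (define a target variable, process study data, estimate the quality of the result) can be illustrated with a toy recoding exercise. The sketch below is purely hypothetical: the study names, source variable names, codes, and the `smoking_status` target are invented for illustration and do not come from the article.

```python
# Hypothetical target variable with categories "never", "former", "current".
TARGET = "smoking_status"

# Hypothetical per-study processing rules: which source variable to read
# and how its study-specific codes map onto the target categories.
RULES = {
    "study_a": {"source": "smoke", "map": {0: "never", 1: "former", 2: "current"}},
    "study_b": {"source": "SMK_EVER_NOW", "map": {"N": "never", "E": "former", "Y": "current"}},
}


def harmonize(study, record):
    """Apply one study's rule to one participant record."""
    rule = RULES[study]
    raw = record.get(rule["source"])
    # Codes with no mapping are returned as None (not harmonizable).
    return {"study": study, TARGET: rule["map"].get(raw)}


def completeness(rows):
    """Share of records for which the target variable could be generated."""
    if not rows:
        return 0.0
    done = sum(1 for row in rows if row[TARGET] is not None)
    return done / len(rows)


rows = [
    harmonize("study_a", {"smoke": 2}),
    harmonize("study_b", {"SMK_EVER_NOW": "N"}),
    harmonize("study_b", {"SMK_EVER_NOW": "?"}),  # unmapped code
]
print(rows)
print(f"completeness = {completeness(rows):.2f}")
```

Keeping the rules as data rather than code makes each processing decision easy to document and review, which is in the spirit of the transparency the guidelines call for.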
16.
JAMA Oncol ; 3(6): 774-783, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-27768182

ABSTRACT

IMPORTANCE: Outcomes for patients with pancreatic ductal adenocarcinoma (PDAC) remain poor. Advances in next-generation sequencing provide a route to therapeutic approaches, and integrating DNA and RNA analysis with clinicopathologic data may be a crucial step toward personalized treatment strategies for this disease. OBJECTIVE: To classify PDAC according to distinct mutational processes, and explore their clinical significance. DESIGN, SETTING, AND PARTICIPANTS: We performed a retrospective cohort study of resected PDAC, using cases collected between 2008 and 2015 as part of the International Cancer Genome Consortium. The discovery cohort comprised 160 PDAC cases from 154 patients (148 primary; 12 metastases) that underwent tumor enrichment prior to whole-genome and RNA sequencing. The replication cohort comprised 95 primary PDAC cases that underwent whole-genome sequencing and expression microarray on bulk biospecimens. MAIN OUTCOMES AND MEASURES: Somatic mutations accumulate from sequence-specific processes creating signatures detectable by DNA sequencing. Using nonnegative matrix factorization, we measured the contribution of each signature to carcinogenesis, and used hierarchical clustering to subtype each cohort. We examined expression of antitumor immunity genes across subtypes to uncover biomarkers predictive of response to systemic therapies. RESULTS: The discovery cohort was 53% male (n = 79) and had a median age of 67 (interquartile range, 58-74) years. The replication cohort was 50% male (n = 48) and had a median age of 68 (interquartile range, 60-75) years. Five predominant mutational subtypes were identified that clustered PDAC into 4 major subtypes: age related, double-strand break repair, mismatch repair, and 1 with unknown etiology (signature 8). These were replicated and validated. Signatures were faithfully propagated from primaries to matched metastases, implying their stability during carcinogenesis. 
Twelve of 27 (45%) double-strand break repair cases lacked germline or somatic events in canonical homologous recombination genes (BRCA1, BRCA2, or PALB2). Double-strand break repair and mismatch repair subtypes were associated with increased expression of antitumor immunity, including activation of CD8-positive T lymphocytes (GZMA and PRF1) and overexpression of regulatory molecules (cytotoxic T-lymphocyte antigen 4, programmed cell death 1, and indolamine 2,3-dioxygenase 1), corresponding to higher frequency of somatic mutations and tumor-specific neoantigens. CONCLUSIONS AND RELEVANCE: Signature-based subtyping may guide personalized therapy of PDAC in the context of biomarker-driven prospective trials.


Subject(s)
Carcinoma, Pancreatic Ductal/genetics; Mutation; Pancreatic Neoplasms/genetics; Aged; CD8-Positive T-Lymphocytes/immunology; CTLA-4 Antigen/metabolism; Carcinoma, Pancreatic Ductal/immunology; DNA Breaks, Double-Stranded/drug effects; DNA Mismatch Repair/genetics; Fanconi Anemia Complementation Group N Protein; Female; Genes, BRCA1/physiology; Genes, BRCA2/physiology; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study; Humans; Male; Middle Aged; Nuclear Proteins/genetics; Pancreatic Neoplasms/immunology; Prognosis; Programmed Cell Death 1 Receptor/metabolism; Retrospective Studies; Tumor Suppressor Proteins/genetics; Pancreatic Neoplasms
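
The abstract above relies on nonnegative matrix factorization (NMF) to split a mutation-count matrix into signatures and per-tumor contributions. A minimal sketch of the idea follows, using the standard multiplicative-update rules for the squared-error objective in plain Python. The toy matrix, the choice of two signatures, and all function names are assumptions made for illustration; the study's actual pipeline is not described in the abstract, and the "dominant signature" labeling used here is a simpler stand-in for the hierarchical clustering the authors report.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor V (channels x tumors) into W (channels x k) and H (k x tumors)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def dominant_signature(w, h):
    """Label each tumor by the signature contributing the most mutations."""
    scale = [sum(row[a] for row in w) for a in range(len(h))]
    return [max(range(len(h)), key=lambda a: h[a][j] * scale[a])
            for j in range(len(h[0]))]


# Toy data: 4 mutation channels x 6 tumors, generated from 2 made-up signatures.
true_sigs = [[0.7, 0.1], [0.2, 0.1], [0.05, 0.3], [0.05, 0.5]]
exposures = [[100, 80, 90, 5, 10, 0], [5, 10, 0, 120, 90, 100]]
v = matmul(true_sigs, exposures)

w0, h0 = nmf(v, 2, iters=0)    # random starting point (same seed)
w, h = nmf(v, 2, iters=200)    # fitted factors
err_before = frob_error(v, w0, h0)
err_after = frob_error(v, w, h)
print(f"error before: {err_before:.1f}, after: {err_after:.4f}")
print("dominant signature per tumor:", dominant_signature(w, h))
```

Real analyses work with many more mutation channels (typically 96 trinucleotide contexts) and choose the number of signatures by model selection; a library implementation would normally replace this hand-written loop.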
18.
Bioinformatics ; 32(3): 453-5, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26454281

ABSTRACT

SUMMARY: Sequence comparison of genetic material between known and unknown organisms plays a crucial role in genomics, metagenomics and phylogenetic analysis. The emerging long-read sequencing technologies can now produce reads of tens of kilobases in length that promise a more accurate assessment of their origin. To facilitate the classification of long and short DNA sequences, we have developed a Python package that implements a new sequence classification model that we have demonstrated to improve the classification accuracy when compared with other state-of-the-art classification methods. For the purpose of validation, and to demonstrate its usefulness, we test the combined sequence similarity score classifier (CSSSCL) using three different datasets, including a metagenomic dataset composed of short reads. AVAILABILITY AND IMPLEMENTATION: The package's source code and test datasets are available under the GPLv3 license at https://github.com/oicr-ibc/cssscl. CONTACT: ivan.borozan@oicr.on.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Bacteria/classification; Metagenomics/methods; Models, Theoretical; Sequence Alignment; Software; Viruses/classification; Bacteria/genetics; Phylogeny; Sequence Analysis, DNA; Viruses/genetics
19.
Bioinformatics ; 31(9): 1396-404, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25573913

ABSTRACT

MOTIVATION: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized. RESULTS: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. AVAILABILITY AND IMPLEMENTATION: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. CONTACT: ivan.borozan@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Sequence Alignment; Sequence Analysis, DNA/methods; Sequence Analysis, Protein/methods; Algorithms; Classification/methods; DNA, Viral; Metagenomics; Models, Theoretical; Viruses/classification
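
The core idea in the abstract above, combining several similarity measures with weights chosen separately for each query sequence, can be sketched in a few lines. The weighting rule below (each measure is weighted by the margin between its best and second-best class after rescaling) is a hypothetical stand-in chosen for illustration; the abstract does not give the published weighting formula, and the measure names, class names, and scores are invented.

```python
def normalize(scores):
    """Rescale one measure's class scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {c: 0.0 for c in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}


def margin(scores):
    """Gap between the best and second-best class: a crude measure of
    how decisively this similarity measure separates the classes."""
    top = sorted(scores.values(), reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def classify(per_measure_scores):
    """Combine several similarity measures for ONE query sequence.

    per_measure_scores maps measure name -> {class name: similarity}.
    Returns (predicted class, combined scores, weights used).
    """
    normed = {m: normalize(s) for m, s in per_measure_scores.items()}
    raw = {m: margin(s) for m, s in normed.items()}
    total = sum(raw.values())
    # Fall back to equal weights if no measure discriminates at all.
    weights = {m: (r / total if total else 1.0 / len(raw)) for m, r in raw.items()}
    classes = list(next(iter(normed.values())))
    combined = {c: sum(weights[m] * normed[m][c] for m in normed) for c in classes}
    return max(combined, key=combined.get), combined, weights


# Invented scores for one query: the alignment-based measure barely separates
# the top two classes, while the alignment-free (k-mer) measure is decisive.
query_scores = {
    "alignment": {"Herpesviridae": 0.52, "Poxviridae": 0.50, "Adenoviridae": 0.10},
    "kmer": {"Herpesviridae": 0.30, "Poxviridae": 0.90, "Adenoviridae": 0.20},
}
label, combined, weights = classify(query_scores)
print(label, weights)
```

Because the weights are recomputed for every query, a measure that is uninformative for one sequence can still dominate the decision for another, which is the behavior the abstract describes.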
20.
Public Health Genomics ; 18(2): 87-96, 2015.
Article in English | MEDLINE | ID: mdl-25532061

ABSTRACT

BACKGROUND: DataSHIELD (Data Aggregation Through Anonymous Summary-statistics from Harmonised Individual levEL Databases) has been proposed to facilitate the co-analysis of individual-level data from multiple studies without physically sharing the data. In a previous paper, we investigated whether DataSHIELD could protect participant confidentiality in accordance with UK law. In this follow-up paper, we investigate whether DataSHIELD addresses a broader range of ethics-related data-sharing concerns. METHODS: Ethics-related data-sharing concerns of Institutional Review Boards, ethics experts, international research consortia and research participants were identified through a literature search and systematically examined at a multidisciplinary workshop to determine whether DataSHIELD proposes mechanisms which can address these concerns. RESULTS: DataSHIELD addresses several ethics-related data-sharing concerns related to privacy, confidentiality, and the protection of the research participant's rights while sharing data and after the data have been shared. The data remain entirely under the direct management of the study that collected them. Data processing commands are strictly supervised, and the data are queried in a protected environment. Issues related to the return of individual research results when data are shared are eliminated; the responsibility for return remains at the study of origin. CONCLUSION: DataSHIELD can provide an innovative and robust solution for addressing commonly encountered ethics-related data-sharing concerns.


Subject(s)
Biomedical Research/ethics; Data Collection; Databases, Factual/ethics; Information Dissemination; Access to Information; Biomedical Research/methods; Confidentiality/ethics; Data Collection/ethics; Data Collection/methods; Ethics Committees, Research; Humans; Information Dissemination/ethics; Information Dissemination/methods
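
The principle behind the approach in the abstract above, co-analyzing studies by exchanging only summary statistics, can be shown with a pooled mean and variance. The sketch below is not the DataSHIELD software or its API (DataSHIELD is built on R); the function names, the minimum-count threshold, and the data are hypothetical and serve only to show that the pooled result matches what sharing the individual-level data would have given.

```python
import statistics


def site_summary(values, min_n=5):
    """Runs at each study site: release only non-disclosive aggregates.

    The min_n threshold is a hypothetical disclosure control.
    """
    n = len(values)
    if n < min_n:
        raise ValueError("too few records to release a summary")
    return {"n": n, "sum": sum(values), "sumsq": sum(x * x for x in values)}


def pooled_mean_var(summaries):
    """Runs at the analysis site: combine summaries into mean and sample variance."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    sumsq = sum(s["sumsq"] for s in summaries)
    mean = total / n
    var = (sumsq - n * mean * mean) / (n - 1)
    return mean, var


# Made-up measurements held separately by two studies.
site_a = [5.1, 6.0, 5.5, 6.2, 5.8]
site_b = [4.9, 5.2, 6.1, 5.7, 5.3, 6.0]

mean, var = pooled_mean_var([site_summary(site_a), site_summary(site_b)])

# Reference values computed on the pooled individual-level data, which a real
# federated analysis would never have access to.
ref_mean = statistics.mean(site_a + site_b)
ref_var = statistics.variance(site_a + site_b)
print(f"pooled mean = {mean:.4f} (reference {ref_mean:.4f})")
print(f"pooled variance = {var:.4f} (reference {ref_var:.4f})")
```

Only three numbers per site cross the network, and the individual records stay under the control of the study that collected them.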