Results 1 - 20 of 38
1.
Cancer Res ; 84(13): 2060-2072, 2024 07 02.
Article in English | MEDLINE | ID: mdl-39082680

ABSTRACT

Patient-derived xenografts (PDX) model human intra- and intertumoral heterogeneity in the context of the intact tissue of immunocompromised mice. Histologic imaging via hematoxylin and eosin (H&E) staining is routinely performed on PDX samples, which could be harnessed for computational analysis. Prior studies of large clinical H&E image repositories have shown that deep learning analysis can identify intercellular and morphologic signals correlated with disease phenotype and therapeutic response. In this study, we developed an extensive, pan-cancer repository of >1,000 PDX and paired parental tumor H&E images. These images, curated from the PDX Development and Trial Centers Research Network Consortium, had a range of associated genomic and transcriptomic data, clinical metadata, pathologic assessments of cell composition, and, in several cases, detailed pathologic annotations of neoplastic, stromal, and necrotic regions. The amenability of these images to deep learning was highlighted through three applications: (i) development of a classifier for neoplastic, stromal, and necrotic regions; (ii) development of a predictor of xenograft-transplant lymphoproliferative disorder; and (iii) application of a published predictor of microsatellite instability. Together, this PDX Development and Trial Centers Research Network image repository provides a valuable resource for controlled digital pathology analysis, both for the evaluation of technical issues and for the development of computational image-based methods that make clinical predictions based on PDX treatment studies. Significance: A pan-cancer repository of >1,000 patient-derived xenograft hematoxylin and eosin-stained images will facilitate cancer biology investigations through histopathologic analysis and contributes important model system data that expand existing human histology repositories.
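The neoplastic/stromal/necrotic classifier described above operates on image tiles; a slide-level estimate of cell composition then comes from aggregating tile-level predictions. A minimal sketch of that aggregation step (class names are from the abstract; the prediction scores below are made up for illustration):

```python
from collections import Counter

CLASSES = ("neoplastic", "stromal", "necrotic")

def tile_class(scores):
    """Return the argmax class for one tile's class scores."""
    return max(zip(CLASSES, scores), key=lambda kv: kv[1])[0]

def slide_composition(tile_scores):
    """Fraction of tiles assigned to each region class for a whole slide."""
    counts = Counter(tile_class(s) for s in tile_scores)
    total = sum(counts.values())
    return {c: counts.get(c, 0) / total for c in CLASSES}

# Example: three tiles, each with (neoplastic, stromal, necrotic) scores
tiles = [(0.7, 0.2, 0.1), (0.6, 0.3, 0.1), (0.1, 0.8, 0.1)]
comp = slide_composition(tiles)
```

The real classifier in the paper works on H&E image pixels; this sketch only shows how per-tile outputs roll up into the slide-level composition that pathologic assessments are compared against.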


Subject(s)
Deep Learning , Neoplasms , Humans , Animals , Mice , Neoplasms/genetics , Neoplasms/pathology , Neoplasms/diagnostic imaging , Genomics/methods , Heterografts , Xenograft Model Antitumor Assays , Lymphoproliferative Disorders/genetics , Lymphoproliferative Disorders/pathology , Image Processing, Computer-Assisted/methods
2.
bioRxiv ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38559260

ABSTRACT

Accurate identification of germline de novo variants (DNVs) remains a challenging problem despite rapid advances in sequencing technologies as well as methods for the analysis of the data they generate, with putative solutions often involving ad hoc filters and visual inspection of identified variants. Here, we present a purely informatic method for the identification of DNVs by analyzing short-read genome sequencing data from proband-parent trios. Our method evaluates variant calls generated by three genome sequence analysis pipelines utilizing different algorithms (GATK HaplotypeCaller, DeepTrio, and Velsera GRAF), exploring the assumption that a requirement of consensus can serve as an effective filter for high-quality DNVs. We assessed the efficacy of our method by testing DNVs identified using a previously established, highly accurate classification procedure that partially relied on manual inspection and used Sanger sequencing to validate a DNV subset comprising less confident calls. The results show that our method is highly precise and that applying a force-calling procedure to putative variants further removes false-positive calls, increasing precision of the workflow to 99.6%. Our method also identified novel DNVs, 87% of which were validated, indicating it offers a higher recall rate without compromising accuracy. We have implemented this method as an automated bioinformatics workflow suitable for large-scale analyses without need for manual intervention.
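The core consensus idea can be sketched as a set intersection across the three callers' child variant calls, followed by removal of anything observed in either parent. This is only the skeleton: the published workflow involves genotype-aware comparison and a force-calling step that this sketch omits, and the variant tuples below are illustrative.

```python
def consensus_dnvs(child_calls_by_pipeline, father_calls, mother_calls):
    """
    Candidate de novo variants: called in the child by every pipeline
    and absent from both parents. Variants are (chrom, pos, ref, alt).
    """
    consensus = set.intersection(*map(set, child_calls_by_pipeline))
    return consensus - set(father_calls) - set(mother_calls)

# Hypothetical call sets from the three pipelines for one trio's child
gatk     = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
deeptrio = {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")}
graf     = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}

dnvs = consensus_dnvs(
    [gatk, deeptrio, graf],
    father_calls={("chr2", 200, "C", "T")},  # inherited, so excluded
    mother_calls=set(),
)
```

Only chr1:100 A>G survives: it is the sole call made by all three pipelines, and it is absent from both parents.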

3.
Cancer Res ; 84(9): 1396-1403, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488504

ABSTRACT

The NCI's Cloud Resources (CR) are the analytical components of the Cancer Research Data Commons (CRDC) ecosystem. This review describes how the three CRs (Broad Institute FireCloud, Institute for Systems Biology Cancer Gateway in the Cloud, and Seven Bridges Cancer Genomics Cloud) provide access and availability to large, cloud-hosted, multimodal cancer datasets, as well as offer tools and workspaces for performing data analysis where the data resides, without download or storage. In addition, users can upload their own data and tools into their workspaces, allowing researchers to create custom analysis workflows and integrate CRDC-hosted data with their own. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Kim et al., p. 1404.


Subject(s)
Cloud Computing , National Cancer Institute (U.S.) , Neoplasms , Humans , Neoplasms/genetics , United States , Biomedical Research , Genomics/methods , Computational Biology/methods
4.
Cancer Res ; 84(9): 1404-1409, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38488510

ABSTRACT

More than ever, scientific progress in cancer research hinges on our ability to combine datasets and extract meaningful interpretations to better understand diseases and ultimately inform the development of better treatments and diagnostic tools. To enable the successful sharing and use of big data, the NCI developed the Cancer Research Data Commons (CRDC), providing access to a large, comprehensive, and expanding collection of cancer data. The CRDC is a cloud-based data science infrastructure that eliminates the need for researchers to download and store large-scale datasets by allowing them to perform analysis where data reside. Over the past 10 years, the CRDC has made significant progress in providing access to data and tools along with training and outreach to support the cancer research community. In this review, we provide an overview of the history and the impact of the CRDC to date, lessons learned, and future plans to further promote data sharing, accessibility, interoperability, and reuse. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Pot et al., p. 1396.


Subject(s)
Information Dissemination , National Cancer Institute (U.S.) , Neoplasms , Humans , United States , Neoplasms/therapy , Information Dissemination/methods , Biomedical Research/trends , Databases, Factual , Big Data
6.
J Am Med Inform Assoc ; 30(7): 1293-1300, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37192819

ABSTRACT

Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData CatalystⓇ (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and new innovative features to address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, leveraging separately developed and managed platforms to maximize flexibility based on researcher needs, expertise, and backgrounds. Through the NHLBI BioData Catalyst Fellows Program, BDC facilitates scientific discoveries and technological advances. BDC also facilitated accelerated research on the coronavirus disease-2019 (COVID-19) pandemic.


Subject(s)
COVID-19 , Cloud Computing , Humans , Ecosystem , Reproducibility of Results , Lung , Software
7.
Nat Commun ; 13(1): 4384, 2022 08 04.
Article in English | MEDLINE | ID: mdl-35927245

ABSTRACT

Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need of computationally intensive post-processing steps.
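One simple way to formalize "sample selection based on population diversity" is as a maximum-coverage problem: greedily pick the samples that add the most variants not yet represented in the graph. The paper's actual selection criteria are richer; this is a minimal greedy sketch with hypothetical sample and variant names.

```python
def select_diverse_samples(sample_variants, k):
    """
    Greedily pick k samples that together cover the most distinct
    variants, a simple proxy for population diversity when choosing
    samples to build a genome graph from.
    sample_variants: dict mapping sample name -> set of variant IDs.
    """
    covered, chosen = set(), []
    for _ in range(k):
        best = max(
            (s for s in sample_variants if s not in chosen),
            key=lambda s: len(sample_variants[s] - covered),
            default=None,
        )
        if best is None:
            break
        chosen.append(best)
        covered |= sample_variants[best]
    return chosen

samples = {
    "s1": {"v1", "v2", "v3"},
    "s2": {"v1", "v2"},        # adds nothing once s1 is chosen
    "s3": {"v4", "v5"},
}
picked = select_diverse_samples(samples, k=2)
```

With k=2, the heuristic picks s1 first (three new variants) and then s3 (two new variants), skipping s2, whose variants are already covered.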


Subject(s)
Data Analysis , High-Throughput Nucleotide Sequencing , Genome, Human/genetics , Genomics/methods , Humans , Sequence Analysis, DNA/methods , Software
8.
NAR Cancer ; 4(2): zcac014, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35475145

ABSTRACT

We created the PDX Network (PDXNet) portal (https://portal.pdxnetwork.org/) to centralize access to the National Cancer Institute-funded PDXNet consortium resources, to facilitate collaboration among researchers and to make these data easily available for research. The portal includes sections for resources, analysis results, metrics for PDXNet activities, data processing protocols and training materials for processing PDX data. Currently, the portal contains PDXNet model information and data resources from 334 new models across 33 cancer types. Tissue samples of these models were deposited in the NCI's Patient-Derived Model Repository (PDMR) for public access. These models have 2134 associated sequencing files from 873 samples across 308 patients, which are hosted on the Cancer Genomics Cloud powered by Seven Bridges and the NCI Cancer Data Service for long-term storage and access with dbGaP permissions. The portal includes results from freely available, robust, validated and standardized analysis workflows on PDXNet sequencing files and PDMR data (3857 samples from 629 patients across 85 disease types). The PDXNet portal is continuously updated with new data and is of significant utility to the cancer research community as it provides a centralized location for PDXNet resources, which support multi-agent treatment studies, determination of sensitivity and resistance mechanisms, and preclinical trials.

10.
Nat Commun ; 12(1): 5086, 2021 08 24.
Article in English | MEDLINE | ID: mdl-34429404

ABSTRACT

Development of candidate cancer treatments is a resource-intensive process, with the research community continuing to investigate options beyond static genomic characterization. Toward this goal, we have established the genomic landscapes of 536 patient-derived xenograft (PDX) models across 25 cancer types, together with mutation, copy number, fusion, and transcriptomic profiles and NCI-MATCH arms. Compared with human tumors, PDXs typically have higher tumor purity and are well suited for investigating dynamic driver events and molecular properties via multiple time points from the same-case PDXs. Here, we report on dynamic genomic landscapes and pharmacogenomic associations, including associations between activating oncogenic events and drugs, correlations between whole-genome duplications and subclone events, and potential PDX models for NCI-MATCH trials. Lastly, we provide a web portal with comprehensive pan-cancer PDX genomic profiles and source code to facilitate identification of more druggable events and further insights into PDXs' recapitulation of human tumors.


Subject(s)
Heterografts , Neoplasms/genetics , Neoplasms/metabolism , Xenograft Model Antitumor Assays , Animals , Disease Models, Animal , Female , Gene Expression Regulation, Neoplastic , Genome , Genomics , Humans , Male , Mice , Models, Biological , Mutation , Transcriptome
12.
Nat Genet ; 53(1): 86-99, 2021 01.
Article in English | MEDLINE | ID: mdl-33414553

ABSTRACT

Patient-derived xenografts (PDXs) are resected human tumors engrafted into mice for preclinical studies and therapeutic testing. It has been proposed that the mouse host affects tumor evolution during PDX engraftment and propagation, affecting the accuracy of PDX modeling of human cancer. Here, we exhaustively analyze copy number alterations (CNAs) in 1,451 PDX and matched patient tumor (PT) samples from 509 PDX models. CNA inferences based on DNA sequencing and microarray data displayed substantially higher resolution and dynamic range than gene expression-based inferences, and they also showed strong CNA conservation from PTs through late-passage PDXs. CNA recurrence analysis of 130 colorectal and breast PT/PDX-early/PDX-late trios confirmed high-resolution CNA retention. We observed no significant enrichment of cancer-related genes in PDX-specific CNAs across models. Moreover, CNA differences between patient and PDX tumors were comparable to variations in multiregion samples within patients. Our study demonstrates the lack of systematic copy number evolution driven by the PDX mouse host.
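CNA conservation between a patient tumor (PT) and its PDX passages is commonly summarized as a correlation between matched copy-number profiles over the same genomic bins. The study's inference pipeline is far more involved; the sketch below just shows the comparison step, with made-up log2 copy-ratio profiles.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length copy-number profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical log2 copy-ratio profiles over the same genomic bins
pt  = [0.0, 0.8, -0.5, 0.0, 1.0]
pdx = [0.1, 0.7, -0.6, 0.0, 0.9]
r = pearson(pt, pdx)
```

A correlation near 1 across many PT/PDX pairs is what "strong CNA conservation" cashes out to quantitatively; the paper additionally compares PDX-specific differences against multiregion variation within patients.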


Subject(s)
DNA Copy Number Variations/genetics , Xenograft Model Antitumor Assays , Animals , Databases, Genetic , Gene Expression Regulation, Neoplastic , Humans , Mice , Neoplasm Metastasis , Polymorphism, Single Nucleotide/genetics , Exome Sequencing
13.
Nat Neurosci ; 22(2): 167-179, 2019 02.
Article in English | MEDLINE | ID: mdl-30643292

ABSTRACT

The findings that amyotrophic lateral sclerosis (ALS) patients almost universally display pathological mislocalization of the RNA-binding protein TDP-43 and that mutations in its gene cause familial ALS have nominated altered RNA metabolism as a disease mechanism. However, the RNAs regulated by TDP-43 in motor neurons and their connection to neuropathy remain to be identified. Here we report transcripts whose abundances in human motor neurons are sensitive to TDP-43 depletion. Notably, expression of STMN2, which encodes a microtubule regulator, declined after TDP-43 knockdown and TDP-43 mislocalization as well as in patient-specific motor neurons and postmortem patient spinal cord. STMN2 loss upon reduced TDP-43 function was due to altered splicing, which is functionally important, as we show STMN2 is necessary for normal axonal outgrowth and regeneration. Notably, post-translational stabilization of STMN2 rescued neurite outgrowth and axon regeneration deficits induced by TDP-43 depletion. We propose that restoring STMN2 expression warrants examination as a therapeutic strategy for ALS.


Subject(s)
Amyotrophic Lateral Sclerosis/metabolism , DNA-Binding Proteins/metabolism , Membrane Proteins/metabolism , Motor Neurons/metabolism , Axons/metabolism , Cell Line , Down-Regulation , Female , Humans , Induced Pluripotent Stem Cells , Male , Spinal Cord/metabolism , Stathmin
14.
Methods Mol Biol ; 1878: 39-64, 2019.
Article in English | MEDLINE | ID: mdl-30378068

ABSTRACT

The Seven Bridges Cancer Genomics Cloud (CGC) is part of the National Cancer Institute Cloud Resource project, which was created to explore the paradigm of co-locating massive datasets with the computational resources to analyze them. The CGC was designed to allow researchers to easily find the data they need and analyze it with robust applications in a scalable and reproducible fashion. To enable this, individual tools are packaged within Docker containers and described by the Common Workflow Language (CWL), an emerging standard for enabling reproducible data analysis. On the CGC, researchers can deploy individual tools and customize massive workflows by chaining together tools. Here, we discuss a case study in which RNA sequencing data is analyzed with different methods and compared on the Seven Bridges CGC. We highlight best practices for designing command line tools, Docker containers, and CWL descriptions to enable massively parallelized and reproducible biomedical computation with cloud resources.
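To make the packaging concrete, here is a minimal, illustrative CWL tool description of the kind the chapter discusses: a command-line tool pinned to a Docker image, with typed inputs and outputs so a workflow engine can chain it to other tools. The tool, image, and file names are hypothetical, not taken from the chapter.

```yaml
cwlVersion: v1.2
class: CommandLineTool
label: Count lines in a FASTQ file (illustrative)
baseCommand: [wc, -l]
requirements:
  DockerRequirement:
    dockerPull: ubuntu:22.04   # any image providing `wc`
inputs:
  fastq:
    type: File
    inputBinding:
      position: 1
outputs:
  line_count:
    type: stdout
stdout: line_count.txt
```

Because inputs and outputs are declared with types, an engine can verify that one tool's output is a valid input to the next before anything runs, which is what enables the massive chained workflows described above.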


Subject(s)
Neoplasms/genetics , RNA/genetics , Cell Line, Tumor , Computational Biology/methods , Genomics/methods , Humans , Sequence Analysis, RNA/methods , Software , Workflow
15.
Cancer Inform ; 17: 1176935118774787, 2018.
Article in English | MEDLINE | ID: mdl-30283230

ABSTRACT

Increased efforts in cancer genomics research and bioinformatics are producing tremendous amounts of data. These data are diverse in origin, format, and content. As the amount of available sequencing data increases, technologies that make them discoverable and usable are critically needed. In response, we have developed a Semantic Web-based Data Browser, a tool allowing users to visually build and execute ontology-driven queries. This approach simplifies access to available data and improves the process of using them in analyses on the Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org). The Data Browser makes large data sets easily explorable and simplifies the retrieval of specific data of interest. Although initially implemented on top of The Cancer Genome Atlas (TCGA) data set, the Data Browser's architecture allows for seamless integration of other data sets. By deploying it on the CGC, we have enabled remote researchers to access data and perform collaborative investigations.

16.
Development ; 145(22)2018 11 21.
Article in English | MEDLINE | ID: mdl-30337375

ABSTRACT

Advances in stem cell science allow the production of different cell types in vitro either through the recapitulation of developmental processes, often termed 'directed differentiation', or the forced expression of lineage-specific transcription factors. Although cells produced by both approaches are increasingly used in translational applications, their quantitative similarity to their primary counterparts remains largely unresolved. To investigate the similarity between in vitro-derived and primary cell types, we harvested and purified mouse spinal motor neurons and compared them with motor neurons produced by transcription factor-mediated lineage conversion of fibroblasts or directed differentiation of pluripotent stem cells. To enable unbiased analysis of these motor neuron types and their cells of origin, we then subjected them to whole transcriptome and DNA methylome analysis by RNA sequencing (RNA-seq) and reduced representation bisulfite sequencing (RRBS). Despite major differences in methodology, lineage conversion and directed differentiation both produce cells that closely approximate the primary motor neuron state. However, we identify differences in Fas signaling, the Hox code and synaptic gene expression between lineage-converted and directed differentiation motor neurons that affect their utility in translational studies.


Subject(s)
Cell Lineage/genetics , Embryo, Mammalian/cytology , Genomics , Motor Neurons/cytology , Pluripotent Stem Cells/cytology , Animals , Epigenesis, Genetic , Mice, Inbred C57BL , Motor Neurons/metabolism , Pluripotent Stem Cells/metabolism , Transcription, Genetic
17.
Curr Protoc Bioinformatics ; 60: 11.16.1-11.16.32, 2017 12 08.
Article in English | MEDLINE | ID: mdl-29220078

ABSTRACT

Next-generation sequencing has produced petabytes of data, but accessing and analyzing these data remain challenging. Traditionally, researchers investigating public datasets like The Cancer Genome Atlas (TCGA) would download the data to a high-performance cluster, which could take several weeks even with a highly optimized network connection. The National Cancer Institute (NCI) initiated the Cancer Genomics Cloud Pilots program to provide researchers with the resources to process data with cloud computational resources. We present protocols using one of these Cloud Pilots, the Seven Bridges Cancer Genomics Cloud (CGC), to find and query public datasets, bring your own data to the CGC, analyze data using standard or custom workflows, and benchmark tools for accuracy with interactive analysis features. These protocols demonstrate that the CGC is a data-analysis ecosystem that fully empowers researchers with a variety of areas of expertise and interests to collaborate in the analysis of petabytes of data. © 2017 by John Wiley & Sons, Inc.
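One of the protocols above is benchmarking tools for accuracy, which typically reduces to comparing a tool's output against a truth set. A minimal sketch of that comparison (the variant strings are illustrative; real benchmarking on the CGC also handles normalization, confident regions, and genotype matching):

```python
def precision_recall(calls, truth):
    """Precision and recall of a call set against a truth set."""
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

truth = {"chr1:100A>G", "chr1:200C>T", "chr2:50G>A"}
calls = {"chr1:100A>G", "chr1:200C>T", "chr3:70T>C"}
p, r = precision_recall(calls, truth)
```

Here two of three calls are true positives, so both precision and recall are 2/3; plotting these numbers across parameter settings is the usual way such benchmarks are reported.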


Subject(s)
Databases, Genetic/statistics & numerical data , Neoplasms/genetics , Cloud Computing , Computational Biology , Data Interpretation, Statistical , Genomics , High-Throughput Nucleotide Sequencing , Humans , Metadata , Pilot Projects
18.
Cancer Res ; 77(21): e3-e6, 2017 11 01.
Article in English | MEDLINE | ID: mdl-29092927

ABSTRACT

The Seven Bridges Cancer Genomics Cloud (CGC; www.cancergenomicscloud.org) enables researchers to rapidly access and collaborate on massive public cancer genomic datasets, including The Cancer Genome Atlas. It provides secure on-demand access to data, analysis tools, and computing resources. Researchers from diverse backgrounds can easily visualize, query, and explore cancer genomic datasets visually or programmatically. Data of interest can be immediately analyzed in the cloud using more than 200 preinstalled, curated bioinformatics tools and workflows. Researchers can also extend the functionality of the platform by adding their own data and tools via an intuitive software development kit. By colocalizing these resources in the cloud, the CGC enables scalable, reproducible analyses. Researchers worldwide can use the CGC to investigate key questions in cancer genomics. Cancer Res; 77(21); e3-6. ©2017 AACR.


Subject(s)
Computational Biology , Genomics , Neoplasms/genetics , Genome, Human , Humans , Internet , Research , Software
19.
Pac Symp Biocomput ; 22: 154-165, 2017.
Article in English | MEDLINE | ID: mdl-27896971

ABSTRACT

As biomedical data has become increasingly easy to generate in large quantities, the methods used to analyze it have proliferated rapidly. Reproducible and reusable methods are required to learn from large volumes of data reliably. To address this issue, numerous groups have developed workflow specifications or execution engines, which provide a framework with which to perform a sequence of analyses. One such specification is the Common Workflow Language, an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. In addition, reproducibility can be furthered by executors or workflow engines that interpret the specification and enable additional features, such as error logging, file organization, optimizations to computation and job scheduling, and easy computing on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine for the purposes of improving reproducibility through reusability and interoperability of workflow descriptions.
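The core job of any such executor — running tools in an order consistent with their dependencies — can be sketched with the Python standard library. The step names below are hypothetical, and real engines like the Rabix Executor layer scheduling, logging, and file management on top of this skeleton.

```python
from graphlib import TopologicalSorter

def run_workflow(steps, deps):
    """
    Execute steps (name -> zero-arg callable) in an order consistent
    with deps (name -> iterable of prerequisite step names).
    Returns the execution order.
    """
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        steps[name]()
    return order

log = []
steps = {
    "align": lambda: log.append("align"),
    "sort":  lambda: log.append("sort"),
    "call":  lambda: log.append("call"),
}
deps = {"sort": {"align"}, "call": {"sort"}}  # align -> sort -> call
order = run_workflow(steps, deps)
```

A workflow description in CWL is, at bottom, a declarative encoding of exactly this dependency graph, which is what makes it portable across executors.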


Subject(s)
Software , Workflow , Computational Biology , Humans , Models, Statistical , Reproducibility of Results
20.
Methods Mol Biol ; 1381: 223-37, 2016.
Article in English | MEDLINE | ID: mdl-26667464

ABSTRACT

Chromosomal rearrangements resulting in the creation of novel gene products, termed fusion genes, have been identified as driving events in the development of multiple types of cancer. As these gene products typically do not exist in normal cells, they represent valuable prognostic and therapeutic targets. Advances in next-generation sequencing and computational approaches have greatly improved our ability to detect and identify fusion genes. Nevertheless, these approaches require significant computational resources. Here we describe an approach which leverages cloud computing technologies to perform fusion gene detection from RNA sequencing data at any scale. We additionally highlight methods to enhance reproducibility of bioinformatics analyses which may be applied to any next-generation sequencing experiment.
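A common first signal for fusion detection from RNA sequencing is read pairs whose mates map to two different genes; gene pairs with enough such supporting pairs become fusion candidates. The sketch below shows only that counting step (the loci and the support threshold are illustrative; real callers also use split reads, filter artifacts, and resolve breakpoints).

```python
from collections import Counter

def candidate_fusions(read_pairs, gene_of, min_support=2):
    """
    read_pairs: iterable of (mate1_locus, mate2_locus).
    gene_of: dict mapping locus -> gene name.
    Gene pairs whose two mates map to different genes in at least
    min_support read pairs are reported as fusion candidates.
    """
    support = Counter()
    for a, b in read_pairs:
        ga, gb = gene_of.get(a), gene_of.get(b)
        if ga and gb and ga != gb:
            support[tuple(sorted((ga, gb)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}

# Illustrative loci loosely modeled on the BCR-ABL1 fusion
gene_of = {
    "chr22:23632600": "BCR",
    "chr9:133729450": "ABL1",
    "chr1:1000": "GENE1",
}
pairs = [
    ("chr22:23632600", "chr9:133729450"),  # discordant: BCR/ABL1
    ("chr22:23632600", "chr9:133729450"),  # second supporting pair
    ("chr1:1000", "chr1:1000"),            # concordant: ignored
]
fusions = candidate_fusions(pairs, gene_of)
```

Because this counting must run over billions of read pairs, it parallelizes naturally across cloud workers, which is the scaling argument the chapter makes.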


Subject(s)
Cloud Computing , Gene Fusion , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Humans , Neoplasms/genetics , RNA/genetics , Reproducibility of Results