ABSTRACT
Summary: IntegratedMRF is an open-source R implementation for integrating drug response predictions from various genomic characterizations using univariate or multivariate random forests, with several options for error estimation. The integrated framework was developed following the superior performance of random forest based methods in the NCI-DREAM drug sensitivity prediction challenge. The computational framework can be applied to estimate the mean and confidence interval of drug response prediction errors based on ensemble approaches with various combinations of genetic and epigenetic characterizations as inputs. The multivariate random forest implementation included in the package incorporates the correlations between output responses in the modeling and has been shown to perform better than existing approaches when the drug responses are correlated. Detailed analysis of the provided features is included in the Supplementary Material. Availability and Implementation: The framework has been implemented as the package IntegratedMRF, which can be downloaded from https://cran.r-project.org/web/packages/IntegratedMRF/index.html, where further explanation of the package is available. Contact: ranadip.pal@ttu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Pharmacological Biomarkers, Genomics/methods, Genetic Models, Neoplasms/genetics, Software, Statistics as Topic/methods, Antineoplastic Agents/therapeutic use, DNA Methylation, Gene Expression Regulation, Humans, Neoplasms/drug therapy, Neoplasms/metabolism, Single Nucleotide Polymorphism, Precision Medicine/methods, Transcriptome
ABSTRACT
The NCI Cancer Research Data Commons (CRDC) is a collection of data commons, analysis platforms, and tools that make existing cancer data more findable and accessible by the cancer research community. In practice, the two biggest hurdles to finding and using data for discovery are the wide variety of models and ontologies used to describe data, and the dispersed storage of that data. Here, we outline core CRDC services to aggregate descriptive information from multiple studies for findability via a single interface and to provide a single access method that spans multiple data commons. See related articles by Wang et al., p. 1388, Pot et al., p. 1396, and Kim et al., p. 1404.
Subject(s)
National Cancer Institute (U.S.), Neoplasms, Humans, United States, Neoplasms/therapy, Biomedical Research/standards, Factual Databases
ABSTRACT
The NCI's Cloud Resources (CR) are the analytical components of the Cancer Research Data Commons (CRDC) ecosystem. This review describes how the three CRs (Broad Institute FireCloud, Institute for Systems Biology Cancer Gateway in the Cloud, and Seven Bridges Cancer Genomics Cloud) provide access and availability to large, cloud-hosted, multimodal cancer datasets, as well as offer tools and workspaces for performing data analysis where the data resides, without download or storage. In addition, users can upload their own data and tools into their workspaces, allowing researchers to create custom analysis workflows and integrate CRDC-hosted data with their own. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Kim et al., p. 1404.
Subject(s)
Cloud Computing, National Cancer Institute (U.S.), Neoplasms, Humans, Neoplasms/genetics, United States, Biomedical Research, Genomics/methods, Computational Biology/methods
ABSTRACT
More than ever, scientific progress in cancer research hinges on our ability to combine datasets and extract meaningful interpretations to better understand diseases and ultimately inform the development of better treatments and diagnostic tools. To enable the successful sharing and use of big data, the NCI developed the Cancer Research Data Commons (CRDC), providing access to a large, comprehensive, and expanding collection of cancer data. The CRDC is a cloud-based data science infrastructure that eliminates the need for researchers to download and store large-scale datasets by allowing them to perform analysis where data reside. Over the past 10 years, the CRDC has made significant progress in providing access to data and tools along with training and outreach to support the cancer research community. In this review, we provide an overview of the history and the impact of the CRDC to date, lessons learned, and future plans to further promote data sharing, accessibility, interoperability, and reuse. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Pot et al., p. 1396.
Subject(s)
Information Dissemination, National Cancer Institute (U.S.), Neoplasms, Humans, United States, Neoplasms/therapy, Information Dissemination/methods, Biomedical Research/trends, Factual Databases, Big Data
ABSTRACT
Since 2014, the NCI has launched a series of data commons as part of the Cancer Research Data Commons (CRDC) ecosystem housing genomic, proteomic, imaging, and clinical data to support cancer research and promote data sharing of NCI-funded studies. This review describes each data commons (Genomic Data Commons, Proteomic Data Commons, Integrated Canine Data Commons, Cancer Data Service, Imaging Data Commons, and Clinical and Translational Data Commons), including their unique and shared features, accomplishments, and challenges. Also discussed is how the CRDC data commons implement Findable, Accessible, Interoperable, Reusable (FAIR) principles and promote data sharing in support of the new NIH Data Management and Sharing Policy. See related articles by Brady et al., p. 1384, Pot et al., p. 1396, and Kim et al., p. 1404.
Subject(s)
Information Dissemination, National Cancer Institute (U.S.), Neoplasms, Humans, United States, Neoplasms/metabolism, Information Dissemination/methods, Biomedical Research, Genomics/methods, Animals, Proteomics/methods
ABSTRACT
Proteomics has emerged as a powerful tool for studying cancer biology and for developing diagnostics and therapies. With the continuous improvement and widespread availability of high-throughput proteomic technologies, the generation of large-scale proteomic data has become more common in cancer research, and there is a growing need for resources that support the sharing and integration of multi-omics datasets. Such datasets require extensive metadata, including clinical, biospecimen, experimental, and workflow annotations, that are crucial for data interpretation and reanalysis. The need to integrate, analyze, and share these data has led to the development of NCI's Proteomic Data Commons (PDC), accessible at https://pdc.cancer.gov. As a specialized repository within the NCI Cancer Research Data Commons (CRDC), PDC enables researchers to locate and analyze proteomic data from various cancer types and connect with genomic and imaging data available for the same samples in other CRDC nodes. Presently, PDC houses annotated data from more than 160 datasets across 19 cancer types, generated by several large-scale cancer research programs with cohort sizes exceeding 100 samples (tumor and associated normal when available). In this article, we review the current state of PDC in cancer research, discuss the opportunities and challenges associated with data sharing in proteomics, and propose future directions for the resource. SIGNIFICANCE: The Proteomic Data Commons (PDC) plays a crucial role in advancing cancer research by providing a centralized repository of high-quality cancer proteomic data, enriched with extensive clinical annotations. By integrating and cross-referencing with complementary genomic and imaging data, the PDC facilitates multi-omics analyses, driving comprehensive insights and accelerating discoveries across various cancer types.
Subject(s)
Cloud Computing, Genomics, National Cancer Institute (U.S.), Neoplasms, Proteomics, Humans, Proteomics/methods, Neoplasms/genetics, Neoplasms/metabolism, Neoplasms/diagnosis, Genomics/methods, United States
ABSTRACT
African individuals harbor molecular RH variants, which permit alloantibody formation to high-prevalence Rh antigens after transfusions. Genotyping identifies such RH variants, which are often missed by serologic blood group typing. Comprehensive molecular blood group analysis using 3 genotyping platforms, nucleotide sequencing, and serologic evaluation was performed on a 7-year-old African male with sickle cell disease who developed an "e-like" antibody shortly after initiating monthly red blood cell (RBC) transfusions for silent stroke. Genotyping of the RH variant predicted a severe shortage of compatible RBCs for long-term transfusion support, which contributed to the decision for hematopoietic stem cell transplantation. RH genotyping confirmed the RH variant in the human leukocyte antigen-matched sibling donor. The patient's (C)ce(s) type 1 haplotype occurs in up to 11% of African American sickle cell disease patients; however, haplotype-matched RBCs were serologically incompatible. This case documents that blood unit selection should be based on genotype rather than one matching haplotype.