Results 1 - 19 of 19
1.
Cell ; 173(2): 305-320.e10, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625049

ABSTRACT

The Cancer Genome Atlas (TCGA) has catalyzed systematic characterization of diverse genomic alterations underlying human cancers. At this historic junction marking the completion of genomic characterization of over 11,000 tumors from 33 cancer types, we present our current understanding of the molecular processes governing oncogenesis. We illustrate our insights into cancer through synthesis of the findings of the TCGA PanCancer Atlas project on three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the microenvironment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.


Subject(s)
Carcinogenesis/genetics , Genomics , Neoplasms/pathology , DNA Repair/genetics , Genetic Databases , Neoplasm Genes , Humans , Metabolic Networks and Pathways/genetics , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Transcriptome , Tumor Microenvironment/genetics
2.
Cell ; 173(2): 371-385.e18, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625053

ABSTRACT

Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.


Subject(s)
Neoplasms/pathology , Algorithms , B7-H1 Antigen/genetics , Computational Biology , Genetic Databases , Entropy , Humans , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Principal Component Analysis , Programmed Cell Death 1 Receptor/genetics
4.
Nucleic Acids Res ; 51(D1): D1242-D1248, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36259664

ABSTRACT

Extensive in vitro cancer drug screening datasets have enabled scientists to identify biomarkers and develop machine learning models for predicting drug sensitivity. While most advancements have focused on omics profiles, cancer drug sensitivity scores precalculated by the original sources are often used as-is, without consideration for variability between studies. It is well known that significant inconsistencies exist between the drug sensitivity scores across datasets due to differences in experimental setups and preprocessing methods used to obtain the sensitivity scores. As a result, many studies opt to focus only on a single dataset, leading to underutilization of available data and a limited interpretation of cancer pharmacogenomics analysis. To overcome these caveats, we have developed CREAMMIST (https://creammist.mtms.dev), an integrative database that enables users to obtain an integrative dose-response curve, to capture uncertainty (or high certainty when multiple datasets align well) across five widely used cancer cell-line drug-response datasets. We utilized a Bayesian framework to systematically integrate all available dose-response values across datasets (>14 million dose-response data points). CREAMMIST provides easy-to-use statistics derived from the integrative dose-response curves for various downstream analyses such as identifying biomarkers, selecting drug concentrations for experiments, and training robust machine learning models.


Subject(s)
Antineoplastic Agents , Factual Databases , Neoplasms , Humans , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Bayes Theorem , Biomarkers , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics
5.
BMC Bioinformatics ; 25(Suppl 1): 153, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627615

ABSTRACT

BACKGROUND: With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database. RESULTS: We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against other machine learning approaches for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). By utilizing nanopore sequencing data, MetageNN exhibits improved sensitivity in situations where the reference database is incomplete. It surpasses the alignment-based MetaMaps and MEGAN-LR, as well as the k-mer-based Kraken2 tools, with improvements of 100%, 36%, and 23% respectively at the read-level analysis. Notably, at the community level, MetageNN consistently demonstrated higher sensitivities than the previously mentioned tools. Furthermore, MetageNN requires < 1/4th of the database storage used by Kraken2, MEGAN-LR and MMseqs2 and is > 7× faster than MetaMaps and GeNet and > 2× faster than MEGAN-LR and MMseqs2. CONCLUSION: This proof of concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.


Subject(s)
Metagenomics , Viverridae , Animals , Metagenomics/methods , Computer Neural Networks , Metagenome , Machine Learning , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods
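
To make the notion of a short k-mer profile in the abstract above concrete, the following is a minimal Python sketch. It assumes a fixed k and plain frequency normalization over the ACGT alphabet; the function name `kmer_profile` and the default k = 3 are illustrative choices, not details taken from MetageNN.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence: str, k: int = 3) -> list[float]:
    """Return normalized k-mer frequencies over the ACGT alphabet.

    Hypothetical helper for illustration only; not MetageNN's code.
    K-mers containing other characters (for example N) are ignored.
    """
    sequence = sequence.upper()
    # Fixed, lexicographically ordered vocabulary of all 4**k k-mers.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]
```

A vector like this has a fixed length (4^k) regardless of read length, which is one plausible reason such profiles suit a neural network input layer.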
6.
Bioinformatics ; 37(Supplement_1): i76-i83, 2021 Aug 04.
Article in English | MEDLINE | ID: mdl-34000002

ABSTRACT

MOTIVATION: Large-scale cancer omics studies have highlighted the diversity of patient molecular profiles and the importance of leveraging this information to deliver the right drug to the right patient at the right time. Key challenges in learning predictive models for this include the high dimensionality of omics data and heterogeneity in biological and clinical factors affecting patient response. The use of multi-task learning techniques has been widely explored to address dataset limitations for in vitro drug response models, while domain adaptation (DA) has been employed to extend them to predict in vivo response. In both of these transfer learning settings, noisy data for some tasks (or domains) can substantially reduce the performance for others compared to single-task (domain) learners, i.e. lead to negative transfer (NT). RESULTS: We describe a novel multi-task unsupervised DA method (TUGDA) that addresses these limitations in a unified framework by quantifying uncertainty in predictors and weighting their influence on shared feature representations. TUGDA's ability to rely more on predictors with low uncertainty allowed it to notably reduce cases of NT for in vitro models (94% overall) compared to state-of-the-art methods. For DA to in vivo settings, TUGDA improved over previous methods for patient-derived xenografts (9 out of 14 drugs) as well as patient datasets (significant associations in 9 out of 22 drugs). TUGDA's ability to avoid NT thus provides a key capability as we try to integrate diverse drug-response datasets to build consistent predictive models with in vivo utility. AVAILABILITY AND IMPLEMENTATION: https://github.com/CSB5/TUGDA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
Bioinformatics ; 34(22): 3907-3914, 2018 11 15.
Article in English | MEDLINE | ID: mdl-29868820

ABSTRACT

Motivation: As we move toward an era of precision medicine, the ability to predict patient-specific drug responses in cancer based on molecular information such as gene expression data represents both an opportunity and a challenge. In particular, methods are needed that can accommodate the high-dimensionality of data to learn interpretable models capturing drug response mechanisms, as well as providing robust predictions across datasets. Results: We propose a method based on ideas from 'recommender systems' (CaDRReS) that predicts cancer drug responses for unseen cell-lines/patients based on learning projections for drugs and cell-lines into a latent 'pharmacogenomic' space. Comparisons with other proposed approaches for this problem based on large public datasets (CCLE and GDSC) show that CaDRReS provides consistently good models and robust predictions even across unseen patient-derived cell-line datasets. Analysis of the pharmacogenomic spaces inferred by CaDRReS also suggests that they can be used to understand drug mechanisms, identify cellular subtypes and further characterize drug-pathway associations. Availability and implementation: Source code and datasets are available at https://github.com/CSB5/CaDRReS. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Antineoplastic Agents/therapeutic use , Neoplasms , Humans , Neoplasms/drug therapy , Pharmacogenetics , Precision Medicine , Software
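
As a rough illustration of the recommender-system idea described in the abstract above, the sketch below scores a cell-line/drug pair as a bias-adjusted dot product of their vectors in a shared latent space. This is the generic matrix-factorization form and is an assumption here; the function `predict_response` and its parameters are hypothetical and may not match the parameterization used by CaDRReS.

```python
def predict_response(
    cell_vec: list[float],
    drug_vec: list[float],
    cell_bias: float = 0.0,
    drug_bias: float = 0.0,
    global_mean: float = 0.0,
) -> float:
    """Predict a drug-response score from latent vectors.

    Generic matrix-factorization scoring, used here only as an
    illustration; the names and bias terms are assumptions.
    """
    if len(cell_vec) != len(drug_vec):
        raise ValueError("latent vectors must have the same dimension")
    # Interaction term: dot product in the shared latent space.
    interaction = sum(c * d for c, d in zip(cell_vec, drug_vec))
    return global_mean + cell_bias + drug_bias + interaction
```

In a model of this kind, an unseen cell line would first be projected into the latent space from its molecular features, after which the same scoring function applies.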
8.
Bioinformatics ; 31(12): i250-7, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072489

ABSTRACT

In this article, we describe a new database framework to perform integrative "gene-set, network, and pathway analysis" (GNPA). In this framework, we integrated heterogeneous data on pathways, annotated lists, and gene-sets (PAGs) into a PAG electronic repository (PAGER). PAGs in the PAGER database are organized into P-type, A-type and G-type PAGs with a three-letter-code standard naming convention. The PAGER database currently compiles 44 313 genes from 5 species including human, 38 663 PAGs, 324 830 gene-gene relationships and 3 174 323 PAG-PAG regulatory relationships of two types: co-membership based and regulatory relationship based. To help users assess each PAG's biological relevance, we developed a cohesion measure called Cohesion Coefficient (CoCo), which is capable of distinguishing between biologically significant PAGs and random PAGs with an area-under-curve performance of 0.98. The PAGER database was set up to help users search and retrieve PAGs from its online web interface. PAGER enables advanced users to build PAG-PAG regulatory networks that provide complementary biological insights not found in gene set analysis or individual gene network analysis. We provide a case study using cancer functional genomics data sets to demonstrate how integrative GNPA helps improve network biology data coverage and therefore biological interpretability. The PAGER database is openly accessible at http://discovery.informatics.iupui.edu/PAGER/.


Subject(s)
Genetic Databases , Gene Regulatory Networks , Humans , Software
9.
BMC Genomics ; 16 Suppl 11: S4, 2015.
Article in English | MEDLINE | ID: mdl-26576648

ABSTRACT

BACKGROUND: Traditional approaches to studying molecular networks are based on linking genes or proteins. Higher-level networks linking gene sets or pathways have been proposed recently. Several types of gene set networks have been used to study complex molecular networks such as co-membership gene set networks (M-GSNs) and co-enrichment gene set networks (E-GSNs). Gene set networks are useful for studying biological mechanism of diseases and drug perturbations. RESULTS: In this study, we proposed a new approach for constructing directed, regulatory gene set networks (R-GSNs) to reveal novel relationships among gene sets or pathways. We collected several gene set collections and high-quality gene regulation data in order to construct R-GSNs in a comparative study with co-membership gene set networks (M-GSNs). We described a method for constructing both global and disease-specific R-GSNs and determining their significance. To demonstrate the potential applications to disease biology studies, we constructed and analysed an R-GSN specifically built for Alzheimer's disease. CONCLUSIONS: R-GSNs can provide new biological insights complementary to those derived at the protein regulatory network level or M-GSNs. When integrated properly to functional genomics data, R-GSNs can help enable future research on systems biology and translational bioinformatics.


Subject(s)
Gene Regulatory Networks , Systems Biology/methods , Alzheimer Disease/genetics , Genetic Databases , Humans
10.
Lancet Microbe ; 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39008997

ABSTRACT

BACKGROUND: The emerging fungal pathogen Candida auris poses a serious threat to global public health due to its worldwide distribution, multidrug resistance, high transmissibility, propensity to cause outbreaks, and high mortality. We aimed to characterise three unusual C auris isolates detected in Singapore, and to determine whether they constitute a novel clade distinct from all previously known C auris clades (I-V). METHODS: In this genotypic and phenotypic study, we characterised three C auris clinical isolates, which were cultured from epidemiologically unlinked inpatients at a large tertiary hospital in Singapore. The index isolate was detected in April, 2023. We performed whole-genome sequencing (WGS) and obtained hybrid assemblies of these C auris isolates. The complete genomes were compared with representative genomes of all known C auris clades. To provide a global context, 3651 international WGS data from the National Center for Biotechnology Information (NCBI) database were included in a high-resolution single nucleotide polymorphism (SNP) analysis. Antifungal susceptibility testing was done and antifungal resistance genes, mating-type locus, and chromosomal rearrangements were characterised from the WGS data of the three investigated isolates. We further implemented Bayesian logistic regression models to classify isolates into known clades and simulate the automatic detection of isolates belonging to novel clades as their WGS data became available. FINDINGS: The three investigated isolates were separated by at least 37 000 SNPs (range 37 000-236 900) from all existing C auris clades. These isolates had opposite mating-type allele and different chromosomal rearrangements when compared with their closest clade IV relatives. The isolates were susceptible to all tested antifungals. Therefore, we propose that these isolates represent a new clade of C auris, clade VI. 
Furthermore, an independent WGS dataset from Bangladesh, accessed via the NCBI Sequence Read Archive, was found to belong to this new clade. As a proof-of-concept, our Bayesian logistic regression model was able to flag these outlier genomes as a potential new clade. INTERPRETATION: The discovery of a new C auris clade in Singapore and Bangladesh in the Indomalayan zone, showing a close relationship to clade IV members most commonly found in South America, highlights the unknown genetic diversity and origin of C auris, particularly in under-resourced regions. Active surveillance in clinical settings, along with effective sequencing strategies and downstream analysis, will be essential in the identification of novel strains, tracking of transmission, and containment of adverse clinical effects of C auris infections. FUNDING: Duke-NUS Academic Medical Center Nurturing Clinician Researcher Scheme, and the Genedant-GIS Innovation Program.

11.
Am J Infect Control ; 51(4): 413-419, 2023 04.
Article in English | MEDLINE | ID: mdl-37010998

ABSTRACT

BACKGROUND: Temporary isolation wards have been introduced to meet demands for airborne-infection-isolation-rooms (AIIRs) during the COVID-19 pandemic. Environmental sampling and outbreak investigation was conducted in temporary isolation wards converted from general wards and/or prefabricated containers, in order to evaluate the ability of such temporary isolation wards to safely manage COVID-19 cases over a period of sustained use. METHODS: Environmental sampling for SARS-CoV-2 RNA was conducted in temporary isolation ward rooms constructed from pre-fabricated containers (N = 20) or converted from normal-pressure general wards (N = 47). Whole genome sequencing (WGS) was utilized to ascertain health care-associated transmission when clusters were reported amongst HCWs working in isolation areas from July 2020 to December 2021. RESULTS: A total of 355 environmental swabs were collected; 22.4% (15/67) of patients had at least one positive environmental sample. Patients housed in temporary isolation ward rooms constructed from pre-fabricated containers (adjusted-odds-ratio, aOR = 10.46, 95% CI = 3.89-58.91, P = .008) had greater odds of detectable environmental contamination, with positive environmental samples obtained from the toilet area (60.0%, 12/20) and patient equipment, including electronic devices used for patient communication (8/20, 40.0%). A single HCW cluster was reported amongst staff working in the temporary isolation ward constructed from pre-fabricated containers; however, health care-associated transmission was deemed unlikely based on WGS and/or epidemiological investigations. CONCLUSION: Environmental contamination with SARS-CoV-2 RNA was observed in temporary isolation wards, particularly from the toilet area and smartphones used for patient communication. 
However, despite intensive surveillance, no healthcare-associated transmission was detected in temporary isolation wards over 18 months of prolonged usage, demonstrating their capacity for sustained use during succeeding pandemic waves.


Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Pandemics , Viral RNA , Hospitals
12.
Infect Control Hosp Epidemiol ; 44(6): 1014-1018, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35473629

ABSTRACT

Sporadic clusters of healthcare-associated coronavirus disease 2019 (COVID-19) occurred despite intense rostered routine surveillance and a highly vaccinated healthcare worker (HCW) population, during a community surge of the severe acute respiratory coronavirus virus 2 (SARS-CoV-2) B.1.617.2 δ (delta) variant. Genomic analysis facilitated timely cluster detection and uncovered additional linkages via HCWs moving between clinical areas and among HCWs sharing a common lunch area, enabling early intervention.


Subject(s)
COVID-19 , Virus Diseases , Humans , SARS-CoV-2/genetics , Hospitals
13.
Microbiol Spectr ; 10(3): e0079122, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35543562

ABSTRACT

Immunocompromised hosts with prolonged severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections have been implicated in the emergence of highly mutated SARS-CoV-2 variants. Spike mutations are of particular concern because the spike protein is a key target for vaccines and therapeutics for SARS-CoV-2. Here, we report the emergence of spike mutations in two immunocompromised patients with persistent SARS-CoV-2 reverse transcription (RT)-PCR positivity (>90 days). Whole-genome sequence analysis of samples obtained before and after coronavirus disease 2019 (COVID-19) treatment demonstrated the development of partial therapeutic escape mutations and increased intrahost SARS-CoV-2 genome diversity over time. This case series thus adds to the accumulating evidence that immunocompromised hosts with persistent infections are important sources of SARS-CoV-2 genome diversity and, in particular, clinically important spike protein diversity. IMPORTANCE The emergence of clinically important mutations described in this report highlights the need for sustained vigilance and containment measures when managing immunocompromised patients with persistent COVID-19. Even as jurisdictions across the globe start lifting pandemic control measures, immunocompromised patients with persistent COVID-19 constitute a unique group that requires close genomic monitoring and enhanced infection control measures, to ensure early detection and containment of mutations and variants of therapeutic and public health importance.


Subject(s)
COVID-19 , SARS-CoV-2 , Coronavirus Spike Glycoprotein , COVID-19/virology , Humans , Immunocompromised Host , Mutation , SARS-CoV-2/genetics , Coronavirus Spike Glycoprotein/genetics
14.
Microbiol Spectr ; 10(1): e0222321, 2022 02 23.
Article in English | MEDLINE | ID: mdl-35019683

ABSTRACT

Rapid onsite whole-genome sequencing of two suspected severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) N gene diagnostic escape samples revealed a previously unreported N gene point mutation at genome position 29195. Because the G29195T mutation occurs within a region probed by a commonly referenced U.S. CDC N gene reverse transcription (RT)-PCR assay, we hypothesize that the G29195T mutation rendered the N gene target of a proprietary commercial assay undetectable. The putative diagnostic escape G29195T mutation demonstrates the need for nearly real-time surveillance, as emergence of a novel SARS-CoV-2 variant with the potential to escape diagnostic tests continues to be a threat. IMPORTANCE Accurate diagnostic detection of SARS-CoV-2 currently depends on the large-scale deployment of RT-PCR assays. SARS-CoV-2 RT-PCR assays target predetermined regions in the viral genomes by complementary binding of primers and probes to nucleic acid sequences in the clinical samples. Potential diagnostic escapes, such as those of clinical samples harboring the G29195T mutation, may result in false-negative SARS-CoV-2 RT-PCR results. The rapid detection and sharing of potential diagnostic escapes are essential for diagnostic laboratories and manufacturers around the world, to optimize their assays as SARS-CoV-2 continues to evolve.


Subject(s)
COVID-19/diagnosis , Point Mutation , SARS-CoV-2/genetics , Reverse Transcriptase Polymerase Chain Reaction
15.
Front Med (Lausanne) ; 9: 964640, 2022.
Article in English | MEDLINE | ID: mdl-35979220

ABSTRACT

Shigella flexneri is a major diarrhoeal pathogen, and the emergence of multidrug-resistant S. flexneri is of public health concern. We report the detection of a clonal cluster of multidrug-resistant serotype 1c (7a) S. flexneri in Singapore in April 2022. Long-read whole-genome sequence analysis found five S. flexneri isolates to be clonal and harboring the extended-spectrum β-lactamases blaCTX-M-15 and blaTEM-1. The isolates were phenotypically resistant to ceftriaxone and had intermediate susceptibility to ciprofloxacin. The S. flexneri clonal cluster was first detected in a tertiary hospital diagnostic laboratory (sentinel site), to which the S. flexneri isolates were sent from other hospitals for routine serogrouping. Long-read whole-genome sequence analysis was performed at the sentinel site in near real time in view of the unusually high number of S. flexneri isolates received within a short time frame. This study demonstrates that near real-time sentinel-site sequence-based surveillance of convenience samples can detect possible clonal outbreak clusters and may provide alerts useful for public health mitigations at the earliest possible opportunity.

16.
Front Med (Lausanne) ; 8: 790662, 2021.
Article in English | MEDLINE | ID: mdl-34970567

ABSTRACT

Background: The ongoing COVID-19 pandemic is a global health crisis caused by the spread of SARS-CoV-2. Establishing links between known cases is crucial for the containment of COVID-19. In the healthcare setting, the ability to rapidly identify potential healthcare-associated COVID-19 clusters is critical for healthcare worker and patient safety. Increasing sequencing technology accessibility has allowed routine clinical diagnostic laboratories to sequence SARS-CoV-2 in clinical samples. However, these laboratories often lack specialized informatics skills required for sequence analysis. Therefore, an on-site, intuitive sequence analysis tool that enables clinical laboratory users to analyze multiple genomes and derive clinically relevant information within an actionable timeframe is needed. Results: We propose CalmBelt, an integrated framework for on-site whole genome characterization and outbreak tracking. Nanopore sequencing technology enables on-site sequencing and construction of draft genomes for multiple SARS-CoV-2 samples within 12 h. CalmBelt's interactive interface allows users to analyse multiple SARS-CoV-2 genomes by utilizing whole genome information, collection date, and additional information such as predefined potential clusters from epidemiological investigations. CalmBelt also integrates established SARS-CoV-2 nomenclature assignments, GISAID clades and PANGO lineages, allowing users to visualize relatedness between samples together with the nomenclatures. We demonstrated multiple use cases including investigation of potential hospital transmission, mining transmission patterns in a large outbreak, and monitoring possible diagnostic-escape. Conclusions: This paper presents an on-site rapid framework for SARS-CoV-2 whole genome characterization. 
CalmBelt's interactive web application allows non-technical users, such as routine clinical laboratory users in hospitals, to determine SARS-CoV-2 variants of concern, as well as investigate the presence of potential transmission clusters. The framework is designed to be compatible with routine usage in clinical laboratories as it only requires readily available sample data, and generates information that impacts immediate infection control mitigations.

17.
Genome Med ; 13(1): 189, 2021 12 16.
Article in English | MEDLINE | ID: mdl-34915921

ABSTRACT

While understanding molecular heterogeneity across patients underpins precision oncology, there is increasing appreciation for taking intra-tumor heterogeneity into account. Based on large-scale analysis of cancer omics datasets, we highlight the importance of intra-tumor transcriptomic heterogeneity (ITTH) for predicting clinical outcomes. Leveraging single-cell RNA-seq (scRNA-seq) with a recommender system (CaDRReS-Sc), we show that heterogeneous gene-expression signatures can predict drug response with high accuracy (80%). Using patient-proximal cell lines, we established the validity of CaDRReS-Sc's monotherapy (Pearson r>0.6) and combinatorial predictions targeting clone-specific vulnerabilities (>10% improvement). Applying CaDRReS-Sc to rapidly expanding scRNA-seq compendiums can serve as in silico screen to accelerate drug-repurposing studies. Availability: https://github.com/CSB5/CaDRReS-Sc .


Subject(s)
Neoplasms , Transcriptome , Clone Cells , Gene Expression Profiling , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Precision Medicine , RNA Sequence Analysis , Single-Cell Analysis , Software
18.
Cancer Res ; 78(1): 290-301, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29259006

ABSTRACT

Existing cancer driver prediction methods are based on very different assumptions and each of them can detect only a particular subset of driver genes. Here we perform a comprehensive assessment of 18 driver prediction methods on more than 3,400 tumor samples from 15 cancer types, all to determine their suitability in guiding precision medicine efforts. We categorized these methods into five groups: functional impact on proteins in general (FI) or specific to cancer (FIC), cohort-based analysis for recurrent mutations (CBA), mutations with expression correlation (MEC), and methods that use gene interaction network-based analysis (INA). The performance of driver prediction methods varied considerably, with concordance with a gold standard varying from 9% to 68%. FI methods showed relatively poor performance (concordance <22%), while CBA methods provided conservative results but required large sample sizes for high sensitivity. INA methods, through the integration of genomic and transcriptomic data, and FIC methods, by training cancer-specific models, provided the best trade-off between sensitivity and specificity. As the methods were found to predict different subsets of driver genes, we propose a novel consensus-based approach, ConsensusDriver, which significantly improves the quality of predictions (20% increase in sensitivity) in patient subgroups or even individual patients. Consensus-based methods like ConsensusDriver promise to harness the strengths of different driver prediction paradigms. Significance: These findings assess state-of-the-art cancer driver prediction methods and develop a new and improved consensus-based approach for use in precision oncology. Cancer Res; 78(1); 290-301. ©2017 AACR.


Subject(s)
Algorithms , Neoplasms/genetics , Precision Medicine/methods , Computational Biology/methods , Genes , Humans , Mutation , Transcriptome
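
The consensus idea in the abstract above can be sketched with a simple rank aggregation. The example below uses a Borda-style count over the ranked gene lists returned by several methods; this specific scheme is an assumption chosen for illustration and is not claimed to be how ConsensusDriver combines predictions. The function name `borda_consensus` and the example gene lists are hypothetical.

```python
from collections import defaultdict


def borda_consensus(rankings: list[list[str]]) -> list[str]:
    """Merge several ranked gene lists into one consensus ranking.

    Borda-style scoring: in a list of length n, the top gene earns n
    points, the next n - 1, and so on. Illustrative assumption only.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] += n - position
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Made-up predictions from three hypothetical methods.
example = borda_consensus([
    ["TP53", "KRAS", "PIK3CA"],
    ["KRAS", "TP53"],
    ["TP53", "BRAF"],
])
```

Genes supported by several methods rise to the top even when no single method ranks them first, which is the intuition behind combining methods that each detect a different subset of drivers.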