Search | BVS Colombia Search Portal

1.

Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics.

Ding, Li; Bailey, Matthew H; Porta-Pardo, Eduard; Thorsson, Vesteinn; Colaprico, Antonio; Bertrand, Denis; Gibbs, David L; Weerasinghe, Amila; Huang, Kuan-Lin; Tokheim, Collin; Cortés-Ciriano, Isidro; Jayasinghe, Reyka; Chen, Feng; Yu, Lihua; Sun, Sam; Olsen, Catharina; Kim, Jaegil; Taylor, Alison M; Cherniack, Andrew D; Akbani, Rehan; Suphavilai, Chayaporn; Nagarajan, Niranjan; Stuart, Joshua M; Mills, Gordon B; Wyczalkowski, Matthew A; Vincent, Benjamin G; Hutter, Carolyn M; Zenklusen, Jean Claude; Hoadley, Katherine A; Wendl, Michael C; Shmulevich, Ilya; Lazar, Alexander J; Wheeler, David A; Getz, Gad.

Cell ; 173(2): 305-320.e10, 2018 04 05.

Article in English | MEDLINE | ID: mdl-29625049

ABSTRACT

The Cancer Genome Atlas (TCGA) has catalyzed systematic characterization of diverse genomic alterations underlying human cancers. At this historic junction marking the completion of genomic characterization of over 11,000 tumors from 33 cancer types, we present our current understanding of the molecular processes governing oncogenesis. We illustrate our insights into cancer through synthesis of the findings of the TCGA PanCancer Atlas project on three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the microenvironment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.

Subject(s)

Carcinogenesis/genetics , Genomics , Neoplasms/pathology , DNA Repair/genetics , Databases, Genetic , Genes, Neoplasm , Humans , Metabolic Networks and Pathways/genetics , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Transcriptome , Tumor Microenvironment/genetics

2.

Comprehensive Characterization of Cancer Driver Genes and Mutations.

Bailey, Matthew H; Tokheim, Collin; Porta-Pardo, Eduard; Sengupta, Sohini; Bertrand, Denis; Weerasinghe, Amila; Colaprico, Antonio; Wendl, Michael C; Kim, Jaegil; Reardon, Brendan; Ng, Patrick Kwok-Shing; Jeong, Kang Jin; Cao, Song; Wang, Zixing; Gao, Jianjiong; Gao, Qingsong; Wang, Fang; Liu, Eric Minwei; Mularoni, Loris; Rubio-Perez, Carlota; Nagarajan, Niranjan; Cortés-Ciriano, Isidro; Zhou, Daniel Cui; Liang, Wen-Wei; Hess, Julian M; Yellapantula, Venkata D; Tamborero, David; Gonzalez-Perez, Abel; Suphavilai, Chayaporn; Ko, Jia Yu; Khurana, Ekta; Park, Peter J; Van Allen, Eliezer M; Liang, Han; Lawrence, Michael S; Godzik, Adam; Lopez-Bigas, Nuria; Stuart, Josh; Wheeler, David; Getz, Gad; Chen, Ken; Lazar, Alexander J; Mills, Gordon B; Karchin, Rachel; Ding, Li.

Cell ; 173(2): 371-385.e18, 2018 04 05.

Article in English | MEDLINE | ID: mdl-29625053

ABSTRACT

Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.

Subject(s)

Neoplasms/pathology , Algorithms , B7-H1 Antigen/genetics , Computational Biology , Databases, Genetic , Entropy , Humans , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Principal Component Analysis , Programmed Cell Death 1 Receptor/genetics

3.

Specialized replication mechanisms maintain genome stability at human centromeres.

Scelfo, Andrea; Angrisani, Annapaola; Grillo, Marco; Barnes, Bethany M; Muyas, Francesc; Sauer, Carolin M; Leung, Chin Wei Brian; Dumont, Marie; Grison, Marine; Mazaud, David; Garnier, Mickaël; Guintini, Laetitia; Nelson, Louisa; Esashi, Fumiko; Cortés-Ciriano, Isidro; Taylor, Stephen S; Déjardin, Jérôme; Wilhelm, Therese; Fachinetti, Daniele.

Mol Cell ; 84(6): 1003-1020.e10, 2024 Mar 21.

Article in English | MEDLINE | ID: mdl-38359824

ABSTRACT

The high incidence of whole-arm chromosome aneuploidy and translocations in tumors suggests instability of centromeres, unique loci built on repetitive sequences and essential for chromosome separation. The causes behind this fragility and the mechanisms preserving centromere integrity remain elusive. We show that replication stress, a hallmark of pre-cancerous lesions, promotes centromeric breakage in mitosis, due to spindle forces and endonuclease activities. Mechanistically, we unveil unique dynamics of the centromeric replisome distinct from the rest of the genome. Locus-specific proteomics identifies specialized DNA replication and repair proteins at centromeres, highlighting them as difficult-to-replicate regions. The translesion synthesis pathway, along with other factors, acts to sustain centromere replication and integrity. Prolonged stress causes centromeric alterations like ruptures and translocations, as observed in ovarian cancer models experiencing replication stress. This study provides unprecedented insights into centromere replication and integrity, proposing mechanistic insights into the origins of centromere alterations leading to abnormal cancerous karyotypes.

Subject(s)

Centromere , Repetitive Sequences, Nucleic Acid , Humans , Centromere/genetics , Mitosis/genetics , Genomic Instability

4.

Mitotic clustering of pulverized chromosomes from micronuclei.

Lin, Yu-Fen; Hu, Qing; Mazzagatti, Alice; Valle-Inclán, Jose Espejo; Maurais, Elizabeth G; Dahiya, Rashmi; Guyer, Alison; Sanders, Jacob T; Engel, Justin L; Nguyen, Giaochau; Bronder, Daniel; Bakhoum, Samuel F; Cortés-Ciriano, Isidro; Ly, Peter.

Nature ; 618(7967): 1041-1048, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37165191

ABSTRACT

Complex genome rearrangements can be generated by the catastrophic pulverization of missegregated chromosomes trapped within micronuclei through a process known as chromothripsis [1-5]. As each chromosome contains a single centromere, it remains unclear how acentric fragments derived from shattered chromosomes are inherited between daughter cells during mitosis [6]. Here we tracked micronucleated chromosomes with live-cell imaging and show that acentric fragments cluster in close spatial proximity throughout mitosis for asymmetric inheritance by a single daughter cell. Mechanistically, the CIP2A-TOPBP1 complex prematurely associates with DNA lesions within ruptured micronuclei during interphase, which poises pulverized chromosomes for clustering upon mitotic entry. Inactivation of CIP2A-TOPBP1 caused acentric fragments to disperse throughout the mitotic cytoplasm, stochastically partition into the nucleus of both daughter cells and aberrantly misaccumulate as cytoplasmic DNA. Mitotic clustering facilitates the reassembly of acentric fragments into rearranged chromosomes lacking the extensive DNA copy-number losses that are characteristic of canonical chromothripsis. Comprehensive analysis of pan-cancer genomes revealed clusters of DNA copy-number-neutral rearrangements, termed balanced chromothripsis, across diverse tumour types resulting in the acquisition of known cancer driver events. Thus, distinct patterns of chromothripsis can be explained by the spatial clustering of pulverized chromosomes from micronuclei.

Subject(s)

Chromosomes, Human , Chromothripsis , Micronuclei, Chromosome-Defective , Mitosis , Humans , Centromere , Chromosomes, Human/genetics , DNA/genetics , DNA/metabolism , DNA Copy Number Variations , Interphase , Mitosis/genetics , Neoplasms/genetics

5.

ERα-associated translocations underlie oncogene amplifications in breast cancer.

Lee, Jake June-Koo; Jung, Youngsook Lucy; Cheong, Taek-Chin; Espejo Valle-Inclan, Jose; Chu, Chong; Gulhan, Doga C; Ljungström, Viktor; Jin, Hu; Viswanadham, Vinayak V; Watson, Emma V; Cortés-Ciriano, Isidro; Elledge, Stephen J; Chiarle, Roberto; Pellman, David; Park, Peter J.

Nature ; 618(7967): 1024-1032, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37198482

ABSTRACT

Focal copy-number amplification is an oncogenic event. Although recent studies have revealed the complex structure [1-3] and the evolutionary trajectories [4] of oncogene amplicons, their origin remains poorly understood. Here we show that focal amplifications in breast cancer frequently derive from a mechanism, which we term translocation-bridge amplification, involving inter-chromosomal translocations that lead to dicentric chromosome bridge formation and breakage. In 780 breast cancer genomes, we observe that focal amplifications are frequently connected to each other by inter-chromosomal translocations at their boundaries. Subsequent analysis indicates the following model: the oncogene neighbourhood is translocated in G1, creating a dicentric chromosome; the dicentric chromosome is replicated; and as dicentric sister chromosomes segregate during mitosis, a chromosome bridge is formed and then broken, with fragments often being circularized in extrachromosomal DNAs. This model explains the amplifications of key oncogenes, including ERBB2 and CCND1. Recurrent amplification boundaries and rearrangement hotspots correlate with oestrogen receptor binding in breast cancer cells. Experimentally, oestrogen treatment induces DNA double-strand breaks in the oestrogen receptor target regions that are repaired by translocations, suggesting a role of oestrogen in generating the initial translocations. A pan-cancer analysis reveals tissue-specific biases in mechanisms initiating focal amplifications, with the breakage-fusion-bridge cycle prevalent in some and the translocation-bridge amplification in others, probably owing to the different timing of DNA break repair. Our results identify a common mode of oncogene amplification and propose oestrogen as its mechanistic origin in breast cancer.

Subject(s)

Breast Neoplasms , Estrogen Receptor alpha , Gene Amplification , Oncogenes , Translocation, Genetic , Female , Humans , Breast Neoplasms/genetics , Estrogen Receptor alpha/metabolism , Estrogens/metabolism , Oncogenes/genetics , Translocation, Genetic/genetics , Genome, Human/genetics , DNA Breaks, Double-Stranded , Organ Specificity

6.

Computational analysis of cancer genome sequencing data.

Cortés-Ciriano, Isidro; Gulhan, Doga C; Lee, Jake June-Koo; Melloni, Giorgio E M; Park, Peter J.

Nat Rev Genet ; 23(5): 298-314, 2022 05.

Article in English | MEDLINE | ID: mdl-34880424

ABSTRACT

Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
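Among the analysis steps the review covers is the study of mutational signatures. As a minimal illustration, assuming the widely used 96-channel single-base-substitution (SBS96) convention rather than any code from the review itself, the sketch below classifies a substitution by its trinucleotide context, reverse-complementing purine reference bases so that every mutation is reported on the pyrimidine (C or T) strand. The function names are hypothetical.

```python
# Illustrative sketch (not from the review): classify single-base substitutions
# into the 96 pyrimidine-centred trinucleotide channels used for SBS signatures.
from typing import Dict, Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_context(trinucleotide: str, alt: str) -> str:
    """Label a substitution at the middle base of `trinucleotide`, e.g. 'A[C>T]G'."""
    trinucleotide, alt = trinucleotide.upper(), alt.upper()
    if len(trinucleotide) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-mer and a single alternate base")
    if any(base not in COMPLEMENT for base in trinucleotide + alt):
        raise ValueError("bases must be A, C, G or T")
    ref = trinucleotide[1]
    if alt == ref:
        raise ValueError("alternate base must differ from the reference base")
    # Purine reference (A/G): switch to the opposite strand so ref is C or T
    if ref in ("A", "G"):
        trinucleotide = reverse_complement(trinucleotide)
        alt = COMPLEMENT[alt]
        ref = trinucleotide[1]
    return f"{trinucleotide[0]}[{ref}>{alt}]{trinucleotide[2]}"


def sbs96_counts(mutations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count (trinucleotide, alt) pairs into SBS96 channels."""
    counts: Dict[str, int] = {}
    for tri, alt in mutations:
        key = sbs96_context(tri, alt)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

For example, a G>A change in the context CGT is reported as `A[C>T]G`, the same channel as a C>T change in ACG. Per-tumour counts over these 96 channels form the mutation catalogue that signature-extraction tools typically decompose.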

Subject(s)

Neoplasms , Chromosome Mapping , Computational Biology , DNA Copy Number Variations , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Mutation , Neoplasms/genetics

7.

Comprehensive Characterization of Cancer Driver Genes and Mutations.

Bailey, Matthew H; Tokheim, Collin; Porta-Pardo, Eduard; Sengupta, Sohini; Bertrand, Denis; Weerasinghe, Amila; Colaprico, Antonio; Wendl, Michael C; Kim, Jaegil; Reardon, Brendan; Kwok-Shing Ng, Patrick; Jeong, Kang Jin; Cao, Song; Wang, Zixing; Gao, Jianjiong; Gao, Qingsong; Wang, Fang; Liu, Eric Minwei; Mularoni, Loris; Rubio-Perez, Carlota; Nagarajan, Niranjan; Cortés-Ciriano, Isidro; Zhou, Daniel Cui; Liang, Wen-Wei; Hess, Julian M; Yellapantula, Venkata D; Tamborero, David; Gonzalez-Perez, Abel; Suphavilai, Chayaporn; Ko, Jia Yu; Khurana, Ekta; Park, Peter J; Van Allen, Eliezer M; Liang, Han; Lawrence, Michael S; Godzik, Adam; Lopez-Bigas, Nuria; Stuart, Josh; Wheeler, David; Getz, Gad; Chen, Ken; Lazar, Alexander J; Mills, Gordon B; Karchin, Rachel; Ding, Li.

Cell ; 174(4): 1034-1035, 2018 08 09.

Article in English | MEDLINE | ID: mdl-30096302

8.

Noncoding mutations drive persistence of a founder preleukemic clone which initiates late relapse in T-ALL.

O'Connor, David; Valle-Inclán, Jose Espejo; Conde, Lucia; Bloye, Gianna; Rahman, Sunniyat; Costa, Joana R; Bartram, Jack; Adams, Stuart; Wright, Gary; Elrick, Hillary; Wall, Kerry; Dyer, Sara; Howell, Christopher; Jigoulina, Galina; Herrero, Javier; Cortes-Ciriano, Isidro; Moorman, Anthony V; Mansour, Marc R.

Blood ; 143(10): 933-937, 2024 Mar 07.

Article in English | MEDLINE | ID: mdl-38194681

ABSTRACT

ABSTRACT: T-ALL relapse usually occurs early but can occur much later, and such late relapses have been suggested to represent de novo leukemia. However, we conclusively demonstrate that late relapse can evolve from a pre-leukemic subclone harbouring a non-coding mutation that evades initial chemotherapy.

Subject(s)

Leukemia-Lymphoma, Adult T-Cell , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma , Humans , Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics , Mutation , Recurrence , Chronic Disease , Clone Cells

9.

Mechanisms and therapeutic implications of hypermutation in gliomas.

Touat, Mehdi; Li, Yvonne Y; Boynton, Adam N; Spurr, Liam F; Iorgulescu, J Bryan; Bohrson, Craig L; Cortes-Ciriano, Isidro; Birzu, Cristina; Geduldig, Jack E; Pelton, Kristine; Lim-Fat, Mary Jane; Pal, Sangita; Ferrer-Luna, Ruben; Ramkissoon, Shakti H; Dubois, Frank; Bellamy, Charlotte; Currimjee, Naomi; Bonardi, Juliana; Qian, Kenin; Ho, Patricia; Malinowski, Seth; Taquet, Leon; Jones, Robert E; Shetty, Aniket; Chow, Kin-Hoe; Sharaf, Radwa; Pavlick, Dean; Albacker, Lee A; Younan, Nadia; Baldini, Capucine; Verreault, Maïté; Giry, Marine; Guillerm, Erell; Ammari, Samy; Beuvon, Frédéric; Mokhtari, Karima; Alentorn, Agusti; Dehais, Caroline; Houillier, Caroline; Laigle-Donadey, Florence; Psimaras, Dimitri; Lee, Eudocia Q; Nayak, Lakshmi; McFaline-Figueroa, J Ricardo; Carpentier, Alexandre; Cornu, Philippe; Capelle, Laurent; Mathon, Bertrand; Barnholtz-Sloan, Jill S; Chakravarti, Arnab.

Nature ; 580(7804): 517-523, 2020 04.

Article in English | MEDLINE | ID: mdl-32322066

ABSTRACT

A high tumour mutational burden (hypermutation) is observed in some gliomas [1-5]; however, the mechanisms by which hypermutation develops and whether it predicts the response to immunotherapy are poorly understood. Here we comprehensively analyse the molecular determinants of mutational burden and signatures in 10,294 gliomas. We delineate two main pathways to hypermutation: a de novo pathway associated with constitutional defects in DNA polymerase and mismatch repair (MMR) genes, and a more common post-treatment pathway, associated with acquired resistance driven by MMR defects in chemotherapy-sensitive gliomas that recur after treatment with the chemotherapy drug temozolomide. Experimentally, the mutational signature of post-treatment hypermutated gliomas was recapitulated by temozolomide-induced damage in cells with MMR deficiency. MMR-deficient gliomas were characterized by a lack of prominent T cell infiltrates, extensive intratumoral heterogeneity, poor patient survival and a low rate of response to PD-1 blockade. Moreover, although bulk analyses did not detect microsatellite instability in MMR-deficient gliomas, single-cell whole-genome sequencing analysis of post-treatment hypermutated glioma cells identified microsatellite mutations. These results show that chemotherapy can drive the acquisition of hypermutated populations without promoting a response to PD-1 blockade and support the diagnostic use of mutational burden and signatures in cancer.

Subject(s)

Brain Neoplasms/genetics , Brain Neoplasms/therapy , Glioma/genetics , Glioma/therapy , Mutation , Animals , Antineoplastic Agents, Alkylating/pharmacology , Antineoplastic Agents, Alkylating/therapeutic use , Brain Neoplasms/immunology , DNA Mismatch Repair/genetics , Gene Frequency , Genome, Human/drug effects , Genome, Human/genetics , Glioma/immunology , Humans , Male , Mice , Microsatellite Repeats/drug effects , Microsatellite Repeats/genetics , Mutagenesis/drug effects , Mutation/drug effects , Phenotype , Prognosis , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Sequence Analysis, DNA , Temozolomide/pharmacology , Temozolomide/therapeutic use , Xenograft Model Antitumor Assays

10.

The landscape of human SVA retrotransposons.

Chu, Chong; Lin, Eric W; Tran, Antuan; Jin, Hu; Ho, Natalie I; Veit, Alexander; Cortes-Ciriano, Isidro; Burns, Kathleen H; Ting, David T; Park, Peter J.

Nucleic Acids Res ; 51(21): 11453-11465, 2023 Nov 27.

Article in English | MEDLINE | ID: mdl-37823611

ABSTRACT

SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied compared to other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.

Subject(s)

Genome, Human , Retroelements , Humans , Alu Elements , Genome, Human/genetics , Minisatellite Repeats/genetics , Retroelements/genetics , Short Interspersed Nucleotide Elements

11.

ReConPlot: an R package for the visualization and interpretation of genomic rearrangements.

Espejo Valle-Inclán, Jose; Cortés-Ciriano, Isidro.

Bioinformatics ; 39(12), 2023 12 01.

Article in English | MEDLINE | ID: mdl-38058190

ABSTRACT

MOTIVATION: Whole-genome sequencing studies of human tumours have revealed that complex forms of structural variation, collectively known as complex genome rearrangements (CGRs), are pervasive across diverse cancer types. Detection, classification, and mechanistic interpretation of CGRs require the visualization of complex patterns of somatic copy number aberrations (SCNAs) and structural variants (SVs). However, there is a lack of tools specifically designed to facilitate the visualization and study of CGRs. RESULTS: We present ReConPlot (REarrangement and COpy Number PLOT), an R package that provides functionalities for the joint visualization of SCNAs and SVs across one or multiple chromosomes. ReConPlot is based on the popular ggplot2 package, thus allowing customization of plots and the generation of publication-quality figures with minimal effort. Overall, ReConPlot facilitates the exploration, interpretation, and reporting of CGR patterns. AVAILABILITY AND IMPLEMENTATION: The R package ReConPlot is available at https://github.com/cortes-ciriano-lab/ReConPlot. Detailed documentation and a tutorial with examples are provided with the package.

Subject(s)

Genome, Human , Neoplasms , Humans , Genomics , Whole Genome Sequencing , Neoplasms/genetics , Software

12.

A semi-supervised learning framework for quantitative structure-activity regression modelling.

Watson, Oliver; Cortes-Ciriano, Isidro; Watson, James A.

Bioinformatics ; 37(3): 342-350, 2021 04 20.

Article in English | MEDLINE | ID: mdl-32777821

ABSTRACT

MOTIVATION: Quantitative structure-activity relationship (QSAR) methods are increasingly used in assisting the process of preclinical, small molecule drug discovery. Regression models are trained on data consisting of a finite-dimensional representation of molecular structures and their corresponding target-specific activities. These supervised learning models can then be used to predict the activity of previously unmeasured novel compounds. RESULTS: This work provides methods that solve three problems in QSAR modelling: (i) a method for comparing the information content between finite-dimensional representations of molecular structures (fingerprints) with respect to the target of interest, (ii) a method that quantifies how the accuracy of the model prediction degrades as a function of the distance between the testing and training data and (iii) a method to adjust for screening-dependent selection bias inherent in many training datasets. For example, in the most extreme cases, only compounds which pass an activity-dependent screening threshold are reported. A semi-supervised learning framework combines (ii) and (iii) and can make predictions that take into account the similarity of the testing compounds to those in the training data and adjust for the reporting selection bias. We illustrate the three methods using publicly available structure-activity data for a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set, TCAMS) to inhibit asexual in vitro Plasmodium falciparum growth. AVAILABILITY AND IMPLEMENTATION: https://github.com/owatson/PenalizedPrediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
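Method (ii) relates prediction accuracy to the distance between test and training compounds. The sketch below shows one common way to set up that analysis; it assumes Tanimoto distance on binary fingerprints (stored as sets of "on" bit indices) and binned mean absolute error, which are illustrative choices rather than the paper's own formulation (the authors' code is in the PenalizedPrediction repository). All function names are hypothetical.

```python
# Illustrative sketch: relate prediction error to the distance between a test
# compound and its nearest training compound. Tanimoto distance and binned MAE
# are assumptions, not necessarily the method used in the paper.
from typing import Iterable, List, Optional, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def distance_to_training(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Nearest-neighbour distance: 1 - max Tanimoto similarity to the training set."""
    sims = [tanimoto(query, fp) for fp in training]
    if not sims:
        raise ValueError("training set is empty")
    return 1.0 - max(sims)


def mean_error_by_distance(
    distances: Sequence[float], abs_errors: Sequence[float], edges: Sequence[float]
) -> List[Optional[float]]:
    """Mean absolute error in each half-open bin [edges[i], edges[i+1]); None if empty."""
    result: List[Optional[float]] = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        errs = [e for d, e in zip(distances, abs_errors) if lo <= d < hi]
        result.append(sum(errs) / len(errs) if errs else None)
    return result
```

A rising error profile across the bins is the kind of degradation such a method would quantify; a model could then widen or penalize predictions for compounds far from the training data.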

Subject(s)

Antimalarials , Plasmodium falciparum , Antimalarials/therapeutic use , Drug Discovery , Quantitative Structure-Activity Relationship , Supervised Machine Learning

13.

A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery.

Watson, Oliver P; Cortes-Ciriano, Isidro; Taylor, Aimee R; Watson, James A.

Bioinformatics ; 35(22): 4656-4663, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31070704

ABSTRACT

MOTIVATION: Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs. RESULTS: The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and 'memorize' the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error highlighting the necessity to define model optimality with respect to the decision task at hand. AVAILABILITY AND IMPLEMENTATION: All software and data are available as Jupyter notebooks found at https://github.com/owatson/QuantileBootstrap. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
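The two ideas in this abstract, splitting on quantiles of the activity distribution and scoring only the predicted ranks of high-activity molecules, can be sketched as follows. This is a minimal illustration under stated assumptions: the split puts compounds below the q-quantile of activity into training and the rest into testing, and `top_k_rank_loss` is a hypothetical loss of our own design, not one of the two loss functions defined in the paper (those are in the QuantileBootstrap notebooks). The bootstrap resampling step is not shown.

```python
# Illustrative sketch: a quantile-of-activity train/test split and a
# hypothetical rank-based loss that only looks at the truly most active
# test molecules. Not the paper's exact definitions.
from typing import List, Sequence, Tuple


def quantile_split(activities: Sequence[float], q: float) -> Tuple[List[int], List[int]]:
    """Indices below the q-quantile of activity go to training, the rest to testing."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    cut = int(q * len(order))
    return order[:cut], order[cut:]


def top_k_rank_loss(y_true: Sequence[float], y_pred: Sequence[float], k: int) -> float:
    """Mean predicted rank (0 = best) of the k truly most active molecules,
    minus the ideal value (k - 1) / 2; zero when they are ranked first."""
    n = len(y_true)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of molecules")
    pred_order = sorted(range(n), key=lambda i: y_pred[i], reverse=True)
    pred_rank = {idx: rank for rank, idx in enumerate(pred_order)}
    true_top = sorted(range(n), key=lambda i: y_true[i], reverse=True)[:k]
    mean_rank = sum(pred_rank[i] for i in true_top) / k
    return mean_rank - (k - 1) / 2
```

Because the test set contains only the most active compounds, a model must extrapolate beyond the activity range it was trained on, which is the setting in which the abstract reports that random partitioning rewards overfitting.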

Subject(s)

Drug Discovery , Machine Learning , Software , Support Vector Machine

14.

Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks.

Cortés-Ciriano, Isidro; Bender, Andreas.

J Chem Inf Model ; 59(3): 1269-1281, 2019 03 25.

Article in English | MEDLINE | ID: mdl-30336009

ABSTRACT

Deep learning architectures have proved versatile in a number of drug discovery applications, including the modeling of in vitro compound activity. While controlling for prediction confidence is essential to increase the trust, interpretability, and usefulness of virtual screening models in drug discovery, techniques to estimate the reliability of the predictions generated with deep learning networks remain largely underexplored. Here, we present Deep Confidence, a framework to compute valid and efficient confidence intervals for individual predictions using the deep learning technique Snapshot Ensembling and conformal prediction. Specifically, Deep Confidence generates an ensemble of deep neural networks by recording the network parameters throughout the local minima visited during the optimization phase of a single neural network. This approach serves to derive a set of base learners (i.e., snapshots) with comparable predictive power on average that will however generate slightly different predictions for a given instance. The variability across base learners and the validation residuals are in turn harnessed to compute confidence intervals using the conformal prediction framework. Using a set of 24 diverse IC50 data sets from ChEMBL 23, we show that Snapshot Ensembles perform on par with Random Forest (RF) and ensembles of independently trained deep neural networks. In addition, we find that the confidence regions predicted using the Deep Confidence framework span a narrower set of values. Overall, Deep Confidence represents a highly versatile error prediction framework that can be applied to any deep learning-based application at no extra computational cost.
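The final step described here, turning ensemble variability and validation residuals into confidence intervals with conformal prediction, can be sketched as follows. The sketch assumes the per-instance ensemble predictions (for example, from snapshots) are already available; training the snapshot ensemble is not shown. The normalized nonconformity score |y − mean| / std is a common choice in inductive conformal prediction and is an assumption here, not necessarily the exact score used in Deep Confidence. Function names are hypothetical.

```python
# Illustrative sketch: normalized inductive conformal intervals computed from
# ensemble predictions. The nonconformity score |y - mean| / std is an assumption.
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple

EPS = 1e-8  # guards against division by a zero ensemble spread


def ensemble_stats(member_preds: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one instance's ensemble predictions."""
    return mean(member_preds), stdev(member_preds)


def conformal_intervals(
    val_preds: Sequence[Sequence[float]],
    val_y: Sequence[float],
    test_preds: Sequence[Sequence[float]],
    confidence: float = 0.8,
) -> List[Tuple[float, float]]:
    """Intervals mean +/- alpha * std, with alpha calibrated on the validation set."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Nonconformity score of each validation instance
    scores = []
    for preds, y in zip(val_preds, val_y):
        mu, sd = ensemble_stats(preds)
        scores.append(abs(y - mu) / (sd + EPS))
    scores.sort()
    n = len(scores)
    # Finite-sample corrected (1-based) rank; too few calibration points -> unbounded
    rank = math.ceil((n + 1) * confidence)
    alpha = scores[rank - 1] if rank <= n else math.inf
    intervals = []
    for preds in test_preds:
        mu, sd = ensemble_stats(preds)
        half_width = alpha * (sd + EPS)
        intervals.append((mu - half_width, mu + half_width))
    return intervals
```

Scaling by the ensemble spread makes intervals wider for instances on which the base learners disagree, which is how the variability across snapshots enters the conformal step.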

Subject(s)

Deep Learning , Drug Discovery/methods , Research Design

15.

Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout.

Cortés-Ciriano, Isidro; Bender, Andreas.

J Chem Inf Model ; 59(7): 3330-3339, 2019 07 22.

Article in English | MEDLINE | ID: mdl-31241929

ABSTRACT

While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g., precision medicine. Here, we present a framework to compute reliable errors in prediction for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network using dropout, and then applying it N times to both the validation and test sets, also employing dropout in this step. Therefore, for each instance in the validation and test sets, an ensemble of predictions is generated. The residuals and absolute errors in prediction for the validation set are then used to compute prediction errors for the test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that Dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower set of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, which is generally applicable to generate reliable prediction errors for Deep Neural Networks in drug discovery and beyond.
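The ensemble-generation step described above, applying one dropout-trained network N times with dropout still active, can be sketched in plain Python. The sketch assumes a tiny one-hidden-layer regressor with fixed, hypothetical weights (training is not shown) and inverted dropout on the hidden units; the resulting per-instance mean and spread would then feed a conformal calibration on the validation set, which is omitted here.

```python
# Illustrative sketch: test-time (Monte Carlo) dropout for a tiny one-hidden-layer
# regressor with fixed, hypothetical weights. Each call samples a fresh dropout mask.
import random
from statistics import mean, stdev
from typing import Sequence, Tuple


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def stochastic_forward(
    x: Sequence[float],
    w_hidden: Sequence[Sequence[float]],
    w_out: Sequence[float],
    p_drop: float,
    rng: random.Random,
) -> float:
    """One forward pass with dropout kept active on the hidden units."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    keep = 1.0 - p_drop
    # Inverted dropout: drop each unit with probability p_drop, rescale survivors
    masked = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    return sum(w * h for w, h in zip(w_out, masked))


def mc_dropout_predict(
    x: Sequence[float],
    w_hidden: Sequence[Sequence[float]],
    w_out: Sequence[float],
    n_passes: int = 100,
    p_drop: float = 0.2,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and standard deviation over n_passes stochastic forward passes."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if n_passes < 2:
        raise ValueError("need at least two passes to estimate a spread")
    rng = random.Random(seed)
    preds = [stochastic_forward(x, w_hidden, w_out, p_drop, rng) for _ in range(n_passes)]
    return mean(preds), stdev(preds)
```

With `p_drop=0.0` every pass is identical and the spread is zero; with dropout active the passes disagree, and that disagreement is what the conformal step converts into instance-specific intervals. The cost is N extra forward passes, as the abstract notes.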

Subject(s)

Drug Discovery/methods , Machine Learning , Neural Networks, Computer

16.

The Impact of Environmental and Endogenous Damage on Somatic Mutation Load in Human Skin Fibroblasts.

Saini, Natalie; Roberts, Steven A; Klimczak, Leszek J; Chan, Kin; Grimm, Sara A; Dai, Shuangshuang; Fargo, David C; Boyer, Jayne C; Kaufmann, William K; Taylor, Jack A; Lee, Eunjung; Cortes-Ciriano, Isidro; Park, Peter J; Schurman, Shepherd H; Malc, Ewa P; Mieczkowski, Piotr A; Gordenin, Dmitry A.

PLoS Genet ; 12(10): e1006385, 2016 10.

Article in English | MEDLINE | ID: mdl-27788131

ABSTRACT

Accumulation of somatic changes, due to environmental and endogenous lesions, in the human genome is associated with aging and cancer. Understanding the impacts of these processes on mutagenesis is fundamental to understanding the etiology, and improving the prognosis and prevention of cancers and other genetic diseases. Previous methods relying on either the generation of induced pluripotent stem cells or sequencing of single-cell genomes were inherently error-prone and did not allow independent validation of the mutations. In the current study we eliminated these potential sources of error by high coverage genome sequencing of single-cell derived clonal fibroblast lineages, obtained after minimal propagation in culture, prepared from skin biopsies of two healthy adult humans. We report here accurate measurement of genome-wide magnitude and spectra of mutations accrued in skin fibroblasts of healthy adult humans. We found that every cell contains at least one chromosomal rearrangement and 600-13,000 base substitutions. The spectra and correlation of base substitutions with epigenomic features resemble many cancers. Moreover, because biopsies were taken from body parts differing by sun exposure, we can delineate the precise contributions of environmental and endogenous factors to the accrual of genetic changes within the same individual. We show here that UV-induced and endogenous DNA damage can have a comparable impact on the somatic mutation loads in skin fibroblasts. Trial Registration: ClinicalTrials.gov NCT01087307.

Subject(s)

DNA Damage/genetics , Genome, Human/genetics , Mutation/radiation effects , Neoplasms/genetics , Skin/radiation effects , Biopsy , Clone Cells/radiation effects , DNA Damage/radiation effects , Fibroblasts/pathology , Fibroblasts/radiation effects , Genome, Human/radiation effects , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Mutagenesis/genetics , Mutation/genetics , Mutation Rate , Neoplasms/etiology , Neoplasms/pathology , Single-Cell Analysis , Skin/pathology , Sunlight/adverse effects
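The abstract compares mutation spectra between lineages. By convention, each single-base substitution is folded onto the pyrimidine of the mutated base pair, which gives six classes. The following is a minimal sketch of that bookkeeping only, not the authors' pipeline (the abstract does not describe it). The (reference, alternate) input format, the function names and the example calls are hypothetical.

```python
from collections import Counter

# Watson-Crick complement, used to report purine-reference changes on the pyrimidine strand
COMPLEMENT = str.maketrans("ACGT", "TGCA")
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
BASES = {"A", "C", "G", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Map a single-base substitution to one of the six pyrimidine-centred classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in {"A", "G"}:  # purine reference: flip both bases to the complementary strand
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def spectrum(snvs):
    """Return the fraction of substitutions that fall into each of the six classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in CLASSES}
    return {c: counts[c] / total for c in CLASSES}


# Hypothetical calls from one clonal lineage, as (reference, alternate) pairs
calls = [("C", "T"), ("G", "A"), ("C", "T"), ("T", "C"), ("A", "C")]
print(spectrum(calls))
```

UV damage is known to produce mostly C>T changes at dipyrimidine sites. Comparing this six-class table between sun-exposed and sun-protected biopsies is therefore one simple way to see the kind of contrast the abstract describes. Published analyses usually also take the flanking bases into account.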

17.

Discovering Highly Potent Molecules from an Initial Set of Inactives Using Iterative Screening.

Cortés-Ciriano, Isidro; Firth, Nicholas C; Bender, Andreas; Watson, Oliver.

J Chem Inf Model ; 58(9): 2000-2014, 2018 09 24.

Article in English | MEDLINE | ID: mdl-30130102

ABSTRACT

The versatility of similarity searching and quantitative structure-activity relationships to model the activity of compound sets within given bioactivity ranges (i.e., interpolation) is well established. However, their relative performance in the common scenario in early stage drug discovery where lots of inactive data but no active data points are available (i.e., extrapolation from the low-activity to the high-activity range) has not been thoroughly examined yet. To this aim, we have designed an iterative virtual screening strategy which was evaluated on 25 diverse bioactivity data sets from ChEMBL. We benchmark the efficiency of random forest (RF), multiple linear regression, ridge regression, similarity searching, and random selection of compounds to identify a highly active molecule in the test set among a large number of low-potency compounds. We use the number of iterations required to find this active molecule to evaluate the performance of each experimental setup. We show that linear and ridge regression often outperform RF and similarity searching, reducing the number of iterations to find an active compound by a factor of 2 or more. Even simple regression methods seem better able to extrapolate to high-bioactivity ranges than RF, which only provides output values in the range covered by the training set. In addition, examination of the scaffold diversity in the data sets used shows that in some cases similarity searching and RF require two times as many iterations as random selection depending on the chemical space covered in the initial training data. Lastly, we show using bioactivity data for COX-1 and COX-2 that our framework can be extended to multitarget drug discovery, where compounds are selected by concomitantly considering their activity against multiple targets. 
Overall, this study provides an approach for iterative screening, where only inactive data are present in early stages of drug discovery, to discover highly potent compounds, and identifies the best experimental setup in which to do so.

Subject(s)

Drug Discovery/methods , Drug Evaluation, Preclinical/methods , Machine Learning , Algorithms , Quantitative Structure-Activity Relationship
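The iterative strategy in this abstract can be pictured as a greedy loop. Fit a model on the compounds assayed so far, pick the untested compound with the highest predicted activity, reveal its measured value, and repeat until a highly active compound is picked. The sketch below is a minimal illustration under several assumptions: closed-form ridge regression as the learner, top-1 greedy selection per iteration, and synthetic linear data. It is not the authors' code, and `ridge_fit` and `iterative_screen` are hypothetical names.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is handled by centring X and y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def iterative_screen(X_train, y_train, X_pool, y_pool, active_threshold, alpha=1.0):
    """Greedy loop: fit, pick the top-predicted pool compound, 'assay' it, repeat.

    Returns the iteration at which a compound with activity >= active_threshold
    is selected, or None if the pool is exhausted without finding one.
    """
    X_train, y_train = X_train.copy(), y_train.copy()
    remaining = list(range(len(y_pool)))
    for iteration in range(1, len(remaining) + 1):
        w, b = ridge_fit(X_train, y_train, alpha)
        preds = X_pool[remaining] @ w + b
        pick = remaining[int(np.argmax(preds))]
        if y_pool[pick] >= active_threshold:
            return iteration
        # Reveal the measured activity and add the compound to the training set
        X_train = np.vstack([X_train, X_pool[pick]])
        y_train = np.append(y_train, y_pool[pick])
        remaining.remove(pick)
    return None


# Synthetic demo (hypothetical data): activity is a linear function of 5 descriptors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, -0.5, 0.2, 0.0])
order = np.argsort(y)
train, pool = order[:50], order[50:]  # start from the 50 least active compounds
print(iterative_screen(X[train], y[train], X[pool], y[pool], active_threshold=y.max()))
```

A random forest could be substituted for `ridge_fit` to compare learners. Its predictions are averages of training targets, so they cannot exceed the training maximum. That is the extrapolation limitation the abstract points to.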

18.

Conformal Regression for Quantitative Structure-Activity Relationship Modeling-Quantifying Prediction Uncertainty.

Svensson, Fredrik; Aniceto, Natalia; Norinder, Ulf; Cortes-Ciriano, Isidro; Spjuth, Ola; Carlsson, Lars; Bender, Andreas.

J Chem Inf Model ; 58(5): 1132-1140, 2018 05 29.

Article in English | MEDLINE | ID: mdl-29701973

ABSTRACT

Making predictions with an associated confidence is highly desirable as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways to scale the resultant prediction intervals to create as efficient (i.e., narrow) regressors as possible. Different algorithms to estimate the prediction uncertainty were used to normalize the prediction ranges, and the different approaches were evaluated on 29 publicly available data sets. Our results show that the most efficient conformal regressors are obtained when using the natural exponential of the ensemble standard deviation from the underlying random forest to scale the prediction intervals, but other approaches were almost as efficient. This approach afforded an average prediction range of 1.65 pIC50 units at the 80% confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range with a difference of close to one log unit in bioactivity between the tightest and widest prediction range. Overall, conformal regression is a robust approach to generate bioactivity predictions with associated confidence.

Subject(s)

Informatics/methods , Machine Learning , Quantitative Structure-Activity Relationship , Uncertainty , Decision Making
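The best-performing normalization reported in this abstract scales each prediction interval by the natural exponential of the ensemble standard deviation. A minimal split (inductive) conformal sketch of that idea follows. The calibration nonconformity score is |y − ŷ| / exp(σ). The interval for a new compound is ŷ ± q · exp(σ), where q is the ⌈(n + 1) · confidence⌉-th smallest calibration score. This is an illustration under stated assumptions, not the study's code. The bootstrap ensemble of linear least-squares models stands in for the random forest, and the data and function names are hypothetical.

```python
import math

import numpy as np


def normalized_scores(y, pred, sigma):
    """Nonconformity: absolute residual scaled by exp(ensemble standard deviation)."""
    return np.abs(y - pred) / np.exp(sigma)


def conformal_quantile(scores, confidence):
    """The ceil((n + 1) * confidence)-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return np.inf  # too few calibration examples for this confidence level
    return np.sort(scores)[k - 1]


def predict_intervals(cal_y, cal_pred, cal_sigma, test_pred, test_sigma, confidence=0.8):
    """Split-conformal intervals whose half-width grows with exp(sigma)."""
    q = conformal_quantile(normalized_scores(cal_y, cal_pred, cal_sigma), confidence)
    half_width = q * np.exp(test_sigma)
    return test_pred - half_width, test_pred + half_width


# Synthetic demo (hypothetical data). A bootstrap ensemble of linear least-squares
# models stands in for the random forest; its spread across members gives sigma.
rng = np.random.default_rng(1)
X = rng.normal(size=(1500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=1500)
train, cal, test = np.split(np.arange(1500), [500, 1000])
members = []
for _ in range(25):
    boot = rng.choice(train, size=len(train), replace=True)
    members.append(np.linalg.lstsq(X[boot], y[boot], rcond=None)[0])
ensemble = np.stack([X @ coef for coef in members])  # shape (25, 1500)
pred, sigma = ensemble.mean(axis=0), ensemble.std(axis=0)
lo, hi = predict_intervals(y[cal], pred[cal], sigma[cal], pred[test], sigma[test], 0.8)
print("empirical coverage:", np.mean((y[test] >= lo) & (y[test] <= hi)))
print("mean interval width:", np.mean(hi - lo))
```

The coverage guarantee comes from the calibration set alone, assuming exchangeable data. The choice of scaling changes only how the interval width is distributed across compounds. The mean width is therefore the quantity to compare when judging how efficient a conformal regressor is, in the abstract's sense of narrow intervals.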

19.

Improved large-scale prediction of growth inhibition patterns using the NCI60 cancer cell line panel.

Cortés-Ciriano, Isidro; van Westen, Gerard J P; Bouvier, Guillaume; Nilges, Michael; Overington, John P; Bender, Andreas; Malliavin, Thérèse E.

Bioinformatics ; 32(1): 85-95, 2016 Jan 01.

Article in English | MEDLINE | ID: mdl-26351271

ABSTRACT

MOTIVATION: Recent large-scale omics initiatives have catalogued the somatic alterations of cancer cell line panels along with their pharmacological response to hundreds of compounds. In this study, we have explored these data to advance computational approaches that enable more effective and targeted use of current and future anticancer therapeutics. RESULTS: We modelled the 50% growth inhibition bioassay end-point (GI50) of 17,142 compounds screened against 59 cancer cell lines from the NCI60 panel (941,831 data-points, matrix 93.08% complete) by integrating the chemical and biological (cell line) information. We determine that the protein, gene transcript and miRNA abundance provide the highest predictive signal when modelling the GI50 endpoint, which significantly outperformed the DNA copy-number variation or exome sequencing data (Tukey's Honestly Significant Difference, P <0.05). We demonstrate that, within the limits of the data, our approach exhibits the ability to both interpolate and extrapolate compound bioactivities to new cell lines and tissues and, although to a lesser extent, to dissimilar compounds. Moreover, our approach outperforms previous models generated on the GDSC dataset. Finally, we determine that in the cases investigated in more detail, the predicted drug-pathway associations and growth inhibition patterns are mostly consistent with the experimental data, which also suggests the possibility of identifying genomic markers of drug sensitivity for novel compounds on novel cell lines. CONTACT: terez@pasteur.fr; ab454@ac.cam.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology/methods , Neoplasms/pathology , Biological Assay , Cell Line, Tumor , Cell Proliferation , Databases, Protein , Humans , Models, Biological , Pharmacogenetics , Support Vector Machine
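This abstract describes a single model over (compound, cell line) pairs, built by integrating chemical and cell-line information. One common way to do this is to concatenate the two descriptor vectors for each measured pair. Holding out every pair of one cell line then tests extrapolation to a new cell line. The sketch below assumes that scheme. Ridge regression stands in for the study's learners, and all data and function names are hypothetical. The simulated response is exactly linear in the descriptors, which is why extrapolation works so cleanly here. Real GI50 data will not behave this well.

```python
import numpy as np


def pair_features(compound_desc, cell_desc, pairs):
    """One row per (compound, cell line) pair: compound descriptors, then cell-line profile."""
    return np.hstack([compound_desc[pairs[:, 0]], cell_desc[pairs[:, 1]]])


def ridge_fit(X, y, alpha=1e-3):
    """Closed-form ridge regression; the intercept is handled by centring X and y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


# Hypothetical data: 30 compounds x 4 descriptors, 10 cell lines x 3 omics features
rng = np.random.default_rng(0)
comp, cell = rng.normal(size=(30, 4)), rng.normal(size=(10, 3))
pairs = np.array([(i, j) for i in range(30) for j in range(10)])
X = pair_features(comp, cell, pairs)
# Simulated pGI50 that depends on both the compound and the cell line
y = X @ np.array([0.8, -0.3, 0.5, 0.1, 0.6, -0.4, 0.2]) + 5.0

# Leave one cell line out: train on the other nine, predict the unseen one
held_out = 0
train, test = pairs[:, 1] != held_out, pairs[:, 1] == held_out
w, b = ridge_fit(X[train], y[train])
pred = X[test] @ w + b
print("max abs error on unseen cell line:", np.max(np.abs(pred - y[test])))
```

Holding out a block of compounds instead of a cell line would test the other direction, extrapolation to dissimilar compounds. The abstract reports that direction as the harder one.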

20.

Benchmarking the Predictive Power of Ligand Efficiency Indices in QSAR.

Cortes-Ciriano, Isidro.

J Chem Inf Model ; 56(8): 1576-87, 2016 08 22.

Article in English | MEDLINE | ID: mdl-27399907

ABSTRACT

Compound physicochemical properties favoring in vitro potency are not always correlated with desirable pharmacokinetic profiles. Therefore, using potency (i.e., IC50) as the main criterion to prioritize candidate drugs in early-stage drug discovery campaigns has been questioned. Yet the vast majority of virtual screening models reported in the medicinal chemistry literature predict the biological activity of compounds by regressing in vitro potency on topological or physicochemical descriptors. Two studies published in this journal showed that higher predictive power on external molecules can be achieved by using ligand efficiency indices as the dependent variable instead of a metric of potency (IC50) or binding affinity (Ki). The present study addresses the lack of a thorough assessment of the predictive power of ligand efficiency indices in QSAR. To this end, the predictive power of 11 ligand efficiency indices was benchmarked across four algorithms (Gradient Boosting Machines, Partial Least Squares, Random Forest, and Support Vector Machines), two descriptor types (Morgan fingerprints and physicochemical descriptors), and 29 data sets collected from the literature and the ChEMBL database. Ligand efficiency metrics led to the highest predictive power on external molecules irrespective of the descriptor type or algorithm used. The difference in test-set R² was ~0.3 units (~0.4 units when modeling small data sets), and the normalized RMSE decreased by >0.1 units in some cases. Polarity indices, such as SEI and NSEI, led to higher predictive power than metrics based on molecular size, i.e., BEI, NBEI, and LE. LELP, which comprises a polarity factor (cLogP) and a size parameter (LE), consistently led to the most predictive models, suggesting that these two properties convey complementary predictive signals.
Overall, this study suggests that using ligand efficiency indices as the dependent variable might be an efficient strategy to model compound activity.

Subject(s)

Drug Discovery/methods , Quantitative Structure-Activity Relationship , Benchmarking , Databases, Pharmaceutical , Humans , Ligands , Machine Learning
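The indices named in this abstract normalize potency by molecular size or polarity. The sketch below uses definitions commonly cited in the literature, which are assumed here rather than taken from the paper.

- LE ≈ 1.37 · pIC50 / heavy atoms
- BEI = pIC50 / MW in kDa
- SEI = pIC50 / (PSA / 100 Å²)
- NSEI = pIC50 / number of N + O atoms
- LELP = cLogP / LE

NBEI is omitted because its exact form is not given in the abstract. Using an index as the dependent variable means a QSAR model is trained on, for example, LE instead of pIC50. A predicted LE can be mapped back to potency when ranking is needed. The `Compound` fields, function names and example values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Compound:
    pic50: float       # -log10(IC50 in mol/L)
    heavy_atoms: int   # non-hydrogen atom count
    mol_weight: float  # molecular weight, Da
    psa: float         # polar surface area, square angstroms
    n_polar: int       # number of N + O atoms
    clogp: float       # calculated logP


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 (IC50 in mol/L = nM * 1e-9)."""
    return 9.0 - math.log10(ic50_nm)


def efficiency_indices(c: Compound) -> dict:
    """Commonly cited ligand efficiency indices (assumed definitions)."""
    le = 1.37 * c.pic50 / c.heavy_atoms  # 1.37 ~ 2.303*R*T in kcal/mol near 298 K
    return {
        "LE": le,
        "BEI": c.pic50 / (c.mol_weight / 1000.0),  # per kDa
        "SEI": c.pic50 / (c.psa / 100.0),          # per 100 square angstroms of PSA
        "NSEI": c.pic50 / c.n_polar,
        "LELP": c.clogp / le,
    }


def pic50_from_le(le: float, heavy_atoms: int) -> float:
    """Map a predicted LE back to pIC50, so an LE model can still rank by potency."""
    return le * heavy_atoms / 1.37


# Hypothetical compound: IC50 = 100 nM, 25 heavy atoms, MW 350 Da, PSA 70, 5 N/O atoms, cLogP 3
cpd = Compound(pic50_from_nm(100.0), 25, 350.0, 70.0, 5, 3.0)
print(efficiency_indices(cpd))
```

Because LE, BEI and SEI are potency divided by a descriptor-derived quantity, part of the signal a model learns for them comes from the denominator itself. This is one possible reason, not one stated in the abstract, why such targets can look more predictable than raw potency.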
