Results 1 - 20 of 71
1.
Cell; 187(21): 6055-6070.e22, 2024 Oct 17.
Article in English | MEDLINE | ID: mdl-39181133

ABSTRACT

Chromothripsis describes the catastrophic shattering of mis-segregated chromosomes trapped within micronuclei. Although micronuclei accumulate DNA double-strand breaks and replication defects throughout interphase, how chromosomes undergo shattering remains unresolved. Using CRISPR-Cas9 screens, we identify a non-canonical role of the Fanconi anemia (FA) pathway as a driver of chromothripsis. Inactivation of the FA pathway suppresses chromosome shattering during mitosis without impacting interphase-associated defects within micronuclei. Mono-ubiquitination of FANCI-FANCD2 by the FA core complex promotes its mitotic engagement with under-replicated micronuclear chromosomes. The structure-selective SLX4-XPF-ERCC1 endonuclease subsequently induces large-scale nucleolytic cleavage of persistent DNA replication intermediates, which stimulates POLD3-dependent mitotic DNA synthesis to prime shattered fragments for reassembly in the ensuing cell cycle. Notably, FA-pathway-induced chromothripsis generates complex genomic rearrangements and extrachromosomal DNA that confer acquired resistance to anti-cancer therapies. Our findings demonstrate how pathological activation of a central DNA repair mechanism paradoxically triggers cancer genome evolution through chromothripsis.


Subjects
Chromothripsis, Antineoplastic Drug Resistance, Fanconi Anemia, Humans, Antineoplastic Drug Resistance/genetics, Fanconi Anemia/metabolism, Fanconi Anemia/genetics, Fanconi Anemia Complementation Group Proteins/metabolism, Fanconi Anemia Complementation Group Proteins/genetics, Mitosis, Fanconi Anemia Complementation Group D2 Protein/metabolism, Fanconi Anemia Complementation Group D2 Protein/genetics, CRISPR-Cas Systems/genetics, DNA Replication, Recombinases/metabolism, DNA Repair, Tumor Cell Line, Endonucleases/metabolism, Endonucleases/genetics, Double-Stranded DNA Breaks, Animals, Mice, Neoplasms/genetics, Neoplasms/drug therapy, Neoplasms/metabolism, Neoplasms/pathology, Ubiquitination
2.
Cell; 173(2): 305-320.e10, 2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29625049

ABSTRACT

The Cancer Genome Atlas (TCGA) has catalyzed systematic characterization of diverse genomic alterations underlying human cancers. At this historic junction marking the completion of genomic characterization of over 11,000 tumors from 33 cancer types, we present our current understanding of the molecular processes governing oncogenesis. We illustrate our insights into cancer through synthesis of the findings of the TCGA PanCancer Atlas project on three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the microenvironment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.


Subjects
Carcinogenesis/genetics, Genomics, Neoplasms/pathology, DNA Repair/genetics, Genetic Databases, Neoplasm Genes, Humans, Metabolic Networks and Pathways/genetics, Microsatellite Instability, Mutation, Neoplasms/genetics, Neoplasms/immunology, Transcriptome, Tumor Microenvironment/genetics
3.
Cell; 173(2): 371-385.e18, 2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29625053

ABSTRACT

Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.


Subjects
Neoplasms/pathology, Algorithms, B7-H1 Antigen/genetics, Computational Biology, Genetic Databases, Entropy, Humans, Microsatellite Instability, Mutation, Neoplasms/genetics, Neoplasms/immunology, Principal Component Analysis, Programmed Cell Death 1 Receptor/genetics
4.
Mol Cell; 84(6): 1003-1020.e10, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38359824

ABSTRACT

The high incidence of whole-arm chromosome aneuploidy and translocations in tumors suggests instability of centromeres, unique loci built on repetitive sequences and essential for chromosome separation. The causes behind this fragility and the mechanisms preserving centromere integrity remain elusive. We show that replication stress, a hallmark of pre-cancerous lesions, promotes centromeric breakage in mitosis, due to spindle forces and endonuclease activities. Mechanistically, we unveil unique dynamics of the centromeric replisome distinct from the rest of the genome. Locus-specific proteomics identifies specialized DNA replication and repair proteins at centromeres, highlighting them as difficult-to-replicate regions. The translesion synthesis pathway, along with other factors, acts to sustain centromere replication and integrity. Prolonged stress causes centromeric alterations like ruptures and translocations, as observed in ovarian cancer models experiencing replication stress. This study provides unprecedented insights into centromere replication and integrity, proposing mechanistic insights into the origins of centromere alterations leading to abnormal cancerous karyotypes.


Subjects
Centromere, Repetitive Nucleic Acid Sequences, Humans, Centromere/genetics, Mitosis/genetics, Genomic Instability
5.
Nature; 618(7967): 1024-1032, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37198482

ABSTRACT

Focal copy-number amplification is an oncogenic event. Although recent studies have revealed the complex structure and the evolutionary trajectories of oncogene amplicons, their origin remains poorly understood. Here we show that focal amplifications in breast cancer frequently derive from a mechanism, which we term translocation-bridge amplification, involving inter-chromosomal translocations that lead to dicentric chromosome bridge formation and breakage. In 780 breast cancer genomes, we observe that focal amplifications are frequently connected to each other by inter-chromosomal translocations at their boundaries. Subsequent analysis indicates the following model: the oncogene neighbourhood is translocated in G1 creating a dicentric chromosome, the dicentric chromosome is replicated, and as dicentric sister chromosomes segregate during mitosis, a chromosome bridge is formed and then broken, with fragments often being circularized in extrachromosomal DNAs. This model explains the amplifications of key oncogenes, including ERBB2 and CCND1. Recurrent amplification boundaries and rearrangement hotspots correlate with oestrogen receptor binding in breast cancer cells. Experimentally, oestrogen treatment induces DNA double-strand breaks in the oestrogen receptor target regions that are repaired by translocations, suggesting a role of oestrogen in generating the initial translocations. A pan-cancer analysis reveals tissue-specific biases in mechanisms initiating focal amplifications, with the breakage-fusion-bridge cycle prevalent in some and the translocation-bridge amplification in others, probably owing to the different timing of DNA break repair. Our results identify a common mode of oncogene amplification and propose oestrogen as its mechanistic origin in breast cancer.


Subjects
Breast Neoplasms, Estrogen Receptor alpha, Gene Amplification, Oncogenes, Genetic Translocation, Female, Humans, Breast Neoplasms/genetics, Estrogen Receptor alpha/metabolism, Estrogens/metabolism, Oncogenes/genetics, Genetic Translocation/genetics, Human Genome/genetics, Double-Stranded DNA Breaks, Organ Specificity
6.
Nature; 618(7967): 1041-1048, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37165191

ABSTRACT

Complex genome rearrangements can be generated by the catastrophic pulverization of missegregated chromosomes trapped within micronuclei through a process known as chromothripsis. As each chromosome contains a single centromere, it remains unclear how acentric fragments derived from shattered chromosomes are inherited between daughter cells during mitosis. Here we tracked micronucleated chromosomes with live-cell imaging and show that acentric fragments cluster in close spatial proximity throughout mitosis for asymmetric inheritance by a single daughter cell. Mechanistically, the CIP2A-TOPBP1 complex prematurely associates with DNA lesions within ruptured micronuclei during interphase, which poises pulverized chromosomes for clustering upon mitotic entry. Inactivation of CIP2A-TOPBP1 caused acentric fragments to disperse throughout the mitotic cytoplasm, stochastically partition into the nucleus of both daughter cells and aberrantly misaccumulate as cytoplasmic DNA. Mitotic clustering facilitates the reassembly of acentric fragments into rearranged chromosomes lacking the extensive DNA copy-number losses that are characteristic of canonical chromothripsis. Comprehensive analysis of pan-cancer genomes revealed clusters of DNA copy-number-neutral rearrangements, termed balanced chromothripsis, across diverse tumour types resulting in the acquisition of known cancer driver events. Thus, distinct patterns of chromothripsis can be explained by the spatial clustering of pulverized chromosomes from micronuclei.


Subjects
Human Chromosomes, Chromothripsis, Chromosome-Defective Micronuclei, Mitosis, Humans, Centromere, Human Chromosomes/genetics, DNA/genetics, DNA/metabolism, DNA Copy Number Variations, Interphase, Mitosis/genetics, Neoplasms/genetics
7.
Nat Rev Genet; 23(5): 298-314, 2022 May.
Article in English | MEDLINE | ID: mdl-34880424

ABSTRACT

Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.


Subjects
Neoplasms, Chromosome Mapping, Computational Biology, DNA Copy Number Variations, Human Genome, High-Throughput Nucleotide Sequencing, Humans, Mutation, Neoplasms/genetics
9.
Blood; 143(10): 933-937, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38194681

ABSTRACT

T-ALL relapse usually occurs early but can occur much later, which has been suggested to represent a de novo leukemia. However, we conclusively demonstrate late relapse can evolve from a pre-leukemic subclone harbouring a non-coding mutation that evades initial chemotherapy.


Subjects
Adult T-Cell Leukemia-Lymphoma, Precursor T-Cell Lymphoblastic Leukemia-Lymphoma, Humans, Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics, Mutation, Recurrence, Chronic Disease, Clone Cells
10.
Nature; 580(7804): 517-523, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32322066

ABSTRACT

A high tumour mutational burden (hypermutation) is observed in some gliomas; however, the mechanisms by which hypermutation develops and whether it predicts the response to immunotherapy are poorly understood. Here we comprehensively analyse the molecular determinants of mutational burden and signatures in 10,294 gliomas. We delineate two main pathways to hypermutation: a de novo pathway associated with constitutional defects in DNA polymerase and mismatch repair (MMR) genes, and a more common post-treatment pathway, associated with acquired resistance driven by MMR defects in chemotherapy-sensitive gliomas that recur after treatment with the chemotherapy drug temozolomide. Experimentally, the mutational signature of post-treatment hypermutated gliomas was recapitulated by temozolomide-induced damage in cells with MMR deficiency. MMR-deficient gliomas were characterized by a lack of prominent T cell infiltrates, extensive intratumoral heterogeneity, poor patient survival and a low rate of response to PD-1 blockade. Moreover, although bulk analyses did not detect microsatellite instability in MMR-deficient gliomas, single-cell whole-genome sequencing analysis of post-treatment hypermutated glioma cells identified microsatellite mutations. These results show that chemotherapy can drive the acquisition of hypermutated populations without promoting a response to PD-1 blockade and support the diagnostic use of mutational burden and signatures in cancer.


Subjects
Brain Neoplasms/genetics, Brain Neoplasms/therapy, Glioma/genetics, Glioma/therapy, Mutation, Animals, Alkylating Antineoplastic Agents/pharmacology, Alkylating Antineoplastic Agents/therapeutic use, Brain Neoplasms/immunology, DNA Mismatch Repair/genetics, Gene Frequency, Human Genome/drug effects, Human Genome/genetics, Glioma/immunology, Humans, Male, Mice, Microsatellite Repeats/drug effects, Microsatellite Repeats/genetics, Mutagenesis/drug effects, Mutation/drug effects, Phenotype, Prognosis, Programmed Cell Death 1 Receptor/antagonists & inhibitors, DNA Sequence Analysis, Temozolomide/pharmacology, Temozolomide/therapeutic use, Xenograft Model Antitumor Assays
11.
Nucleic Acids Res; 51(21): 11453-11465, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37823611

ABSTRACT

SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied compared to other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.


Subjects
Human Genome, Retroelements, Humans, Alu Elements, Human Genome/genetics, Minisatellite Repeats/genetics, Retroelements/genetics, Short Interspersed Nucleotide Elements
12.
Bioinformatics; 39(12), 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38058190

ABSTRACT

MOTIVATION: Whole-genome sequencing studies of human tumours have revealed that complex forms of structural variation, collectively known as complex genome rearrangements (CGRs), are pervasive across diverse cancer types. Detection, classification, and mechanistic interpretation of CGRs require the visualization of complex patterns of somatic copy number aberrations (SCNAs) and structural variants (SVs). However, there is a lack of tools specifically designed to facilitate the visualization and study of CGRs. RESULTS: We present ReConPlot (REarrangement and COpy Number PLOT), an R package that provides functionalities for the joint visualization of SCNAs and SVs across one or multiple chromosomes. ReConPlot is based on the popular ggplot2 package, thus allowing customization of plots and the generation of publication-quality figures with minimal effort. Overall, ReConPlot facilitates the exploration, interpretation, and reporting of CGR patterns. AVAILABILITY AND IMPLEMENTATION: The R package ReConPlot is available at https://github.com/cortes-ciriano-lab/ReConPlot. Detailed documentation and a tutorial with examples are provided with the package.


Subjects
Human Genome, Neoplasms, Humans, Genomics, Whole Genome Sequencing, Neoplasms/genetics, Software
13.
Bioinformatics; 37(3): 342-350, 2021 Apr 20.
Article in English | MEDLINE | ID: mdl-32777821

ABSTRACT

MOTIVATION: Quantitative structure-activity relationship (QSAR) methods are increasingly used in assisting the process of preclinical, small molecule drug discovery. Regression models are trained on data consisting of a finite-dimensional representation of molecular structures and their corresponding target-specific activities. These supervised learning models can then be used to predict the activity of previously unmeasured novel compounds. RESULTS: This work provides methods that solve three problems in QSAR modelling: (i) a method for comparing the information content between finite-dimensional representations of molecular structures (fingerprints) with respect to the target of interest, (ii) a method that quantifies how the accuracy of the model prediction degrades as a function of the distance between the testing and training data and (iii) a method to adjust for screening-dependent selection bias inherent in many training datasets. For example, in the most extreme cases, only compounds which pass an activity-dependent screening threshold are reported. A semi-supervised learning framework combines (ii) and (iii) and can make predictions, which take into account the similarity of the testing compounds to those in the training data and adjust for the reporting selection bias. We illustrate the three methods using publicly available structure-activity data for a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set, TCAMS) to inhibit asexual in vitro Plasmodium falciparum growth. AVAILABILITY AND IMPLEMENTATION: https://github.com/owatson/PenalizedPrediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Antimalarials, Plasmodium falciparum, Antimalarials/therapeutic use, Drug Discovery, Quantitative Structure-Activity Relationship, Supervised Machine Learning
14.
Bioinformatics; 35(22): 4656-4663, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31070704

ABSTRACT

MOTIVATION: Artificial intelligence, trained via machine learning (e.g. neural nets, random forests) or computational statistical algorithms (e.g. support vector machines, ridge regression), holds much promise for the improvement of small-molecule drug discovery. However, small-molecule structure-activity data are high dimensional with low signal-to-noise ratios and proper validation of predictive methods is difficult. It is poorly understood which, if any, of the currently available machine learning algorithms will best predict new candidate drugs. RESULTS: The quantile-activity bootstrap is proposed as a new model validation framework using quantile splits on the activity distribution function to construct training and testing sets. In addition, we propose two novel rank-based loss functions which penalize only the out-of-sample predicted ranks of high-activity molecules. The combination of these methods was used to assess the performance of neural nets, random forests, support vector machines (regression) and ridge regression applied to 25 diverse high-quality structure-activity datasets publicly available on ChEMBL. Model validation based on random partitioning of available data favours models that overfit and 'memorize' the training set, namely random forests and deep neural nets. Partitioning based on quantiles of the activity distribution correctly penalizes extrapolation of models onto structurally different molecules outside of the training data. Simpler, traditional statistical methods such as ridge regression can outperform state-of-the-art machine learning methods in this setting. In addition, our new rank-based loss functions give considerably different results from mean squared error highlighting the necessity to define model optimality with respect to the decision task at hand. AVAILABILITY AND IMPLEMENTATION: All software and data are available as Jupyter notebooks found at https://github.com/owatson/QuantileBootstrap. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Drug Discovery, Machine Learning, Software, Support Vector Machine
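
The quantile split described in the abstract of this record can be sketched as follows. This is only an illustration under stated assumptions: the function name `quantile_activity_split`, the threshold `q`, and the toy activity values are hypothetical, and the resampling (bootstrap) step of the published framework is not reproduced because the abstract does not specify it.

```python
def quantile_activity_split(activities, q=0.8):
    """Split compound indices by activity quantile (hypothetical helper).

    Compounds in the lowest q fraction of activities form the training set;
    the remaining, most active compounds form the test set, so the model is
    scored on extrapolation towards high activity.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    # Rank compound indices from least to most active.
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    n_train = int(q * len(order))
    return sorted(order[:n_train]), sorted(order[n_train:])


# Toy pIC50-like values, invented for illustration only.
activities = [5.1, 7.9, 6.0, 4.2, 8.5, 6.7, 5.5, 7.1, 4.9, 9.0]
train_idx, test_idx = quantile_activity_split(activities, q=0.8)
print(train_idx, test_idx)
```

Unlike a random partition, every test compound here is at least as active as every training compound, which is the property the abstract credits with penalizing models that memorize the training set.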
15.
J Chem Inf Model; 59(3): 1269-1281, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30336009

ABSTRACT

Deep learning architectures have proved versatile in a number of drug discovery applications, including the modeling of in vitro compound activity. While controlling for prediction confidence is essential to increase the trust, interpretability, and usefulness of virtual screening models in drug discovery, techniques to estimate the reliability of the predictions generated with deep learning networks remain largely underexplored. Here, we present Deep Confidence, a framework to compute valid and efficient confidence intervals for individual predictions using the deep learning technique Snapshot Ensembling and conformal prediction. Specifically, Deep Confidence generates an ensemble of deep neural networks by recording the network parameters throughout the local minima visited during the optimization phase of a single neural network. This approach serves to derive a set of base learners (i.e., snapshots) with comparable predictive power on average that will however generate slightly different predictions for a given instance. The variability across base learners and the validation residuals are in turn harnessed to compute confidence intervals using the conformal prediction framework. Using a set of 24 diverse IC50 data sets from ChEMBL 23, we show that Snapshot Ensembles perform on par with Random Forest (RF) and ensembles of independently trained deep neural networks. In addition, we find that the confidence regions predicted using the Deep Confidence framework span a narrower set of values. Overall, Deep Confidence represents a highly versatile error prediction framework that can be applied to any deep learning-based application at no extra computational cost.


Subjects
Deep Learning, Drug Discovery/methods, Research Design
16.
J Chem Inf Model; 59(7): 3330-3339, 2019 Jul 22.
Article in English | MEDLINE | ID: mdl-31241929

ABSTRACT

While the use of deep learning in drug discovery is gaining increasing attention, the lack of methods to compute reliable errors in prediction for Neural Networks prevents their application to guide decision making in domains where identifying unreliable predictions is essential, e.g., precision medicine. Here, we present a framework to compute reliable errors in prediction for Neural Networks using Test-Time Dropout and Conformal Prediction. Specifically, the algorithm consists of training a single Neural Network using dropout, and then applying it N times to both the validation and test sets, also employing dropout in this step. Therefore, for each instance in the validation and test sets an ensemble of predictions is generated. The residuals and absolute errors in prediction for the validation set are then used to compute prediction errors for the test set instances using Conformal Prediction. We show using 24 bioactivity data sets from ChEMBL 23 that Dropout Conformal Predictors are valid (i.e., the fraction of instances whose true value lies within the predicted interval strongly correlates with the confidence level) and efficient, as the predicted confidence intervals span a narrower set of values than those computed with Conformal Predictors generated using Random Forest (RF) models. Lastly, we show in retrospective virtual screening experiments that dropout and RF-based Conformal Predictors lead to comparable retrieval rates of active compounds. Overall, we propose a computationally efficient framework (as only N extra forward passes are required in addition to training a single network) to harness Test-Time Dropout and the Conformal Prediction framework, which is generally applicable to generate reliable prediction errors for Deep Neural Networks in drug discovery and beyond.


Subjects
Drug Discovery/methods, Machine Learning, Computer Neural Networks
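
The two quality criteria named in the abstract of this record, validity and efficiency, can be checked with a few lines of code. The sketch below is an assumption-laden illustration, not the authors' evaluation code: the function name `coverage_and_width` and the toy intervals are hypothetical.

```python
def coverage_and_width(y_true, lower, upper):
    """Return (empirical coverage, mean interval width) for prediction intervals.

    Validity: coverage should be close to the chosen confidence level.
    Efficiency: among valid predictors, narrower mean width is better.
    """
    if not (len(y_true) == len(lower) == len(upper)) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Count instances whose true value falls inside the predicted interval.
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y_true)
    return hits / len(y_true), mean_width


# Toy values, invented for illustration only.
y_true = [6.2, 5.0, 7.4, 4.8, 6.9]
lower = [5.5, 4.0, 7.5, 4.0, 6.0]
upper = [7.0, 5.5, 8.5, 5.0, 7.5]
coverage, width = coverage_and_width(y_true, lower, upper)
print(coverage, width)
```

In this toy case four of the five true values fall inside their intervals, so a predictor run at the 80% confidence level would count as valid on this sample.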
17.
PLoS Genet; 12(10): e1006385, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27788131

ABSTRACT

Accumulation of somatic changes, due to environmental and endogenous lesions, in the human genome is associated with aging and cancer. Understanding the impacts of these processes on mutagenesis is fundamental to understanding the etiology, and improving the prognosis and prevention of cancers and other genetic diseases. Previous methods relying on either the generation of induced pluripotent stem cells, or sequencing of single-cell genomes were inherently error-prone and did not allow independent validation of the mutations. In the current study we eliminated these potential sources of error by high coverage genome sequencing of single-cell derived clonal fibroblast lineages, obtained after minimal propagation in culture, prepared from skin biopsies of two healthy adult humans. We report here accurate measurement of genome-wide magnitude and spectra of mutations accrued in skin fibroblasts of healthy adult humans. We found that every cell contains at least one chromosomal rearrangement and 600-13,000 base substitutions. The spectra and correlation of base substitutions with epigenomic features resemble many cancers. Moreover, because biopsies were taken from body parts differing by sun exposure, we can delineate the precise contributions of environmental and endogenous factors to the accrual of genetic changes within the same individual. We show here that UV-induced and endogenous DNA damage can have a comparable impact on the somatic mutation loads in skin fibroblasts. Trial Registration: ClinicalTrials.gov NCT01087307.


Subjects
DNA Damage/genetics, Human Genome/genetics, Mutation/radiation effects, Neoplasms/genetics, Skin/radiation effects, Biopsy, Clone Cells/radiation effects, DNA Damage/radiation effects, Fibroblasts/pathology, Fibroblasts/radiation effects, Human Genome/radiation effects, High-Throughput Nucleotide Sequencing, Humans, Male, Middle Aged, Mutagenesis/genetics, Mutation/genetics, Mutation Rate, Neoplasms/etiology, Neoplasms/pathology, Single-Cell Analysis, Skin/pathology, Sunlight/adverse effects
18.
J Chem Inf Model; 58(9): 2000-2014, 2018 Sep 24.
Article in English | MEDLINE | ID: mdl-30130102

ABSTRACT

The versatility of similarity searching and quantitative structure-activity relationships to model the activity of compound sets within given bioactivity ranges (i.e., interpolation) is well established. However, their relative performance in the common scenario in early stage drug discovery where lots of inactive data but no active data points are available (i.e., extrapolation from the low-activity to the high-activity range) has not been thoroughly examined yet. To this aim, we have designed an iterative virtual screening strategy which was evaluated on 25 diverse bioactivity data sets from ChEMBL. We benchmark the efficiency of random forest (RF), multiple linear regression, ridge regression, similarity searching, and random selection of compounds to identify a highly active molecule in the test set among a large number of low-potency compounds. We use the number of iterations required to find this active molecule to evaluate the performance of each experimental setup. We show that linear and ridge regression often outperform RF and similarity searching, reducing the number of iterations to find an active compound by a factor of 2 or more. Even simple regression methods seem better able to extrapolate to high-bioactivity ranges than RF, which only provides output values in the range covered by the training set. In addition, examination of the scaffold diversity in the data sets used shows that in some cases similarity searching and RF require two times as many iterations as random selection depending on the chemical space covered in the initial training data. Lastly, we show using bioactivity data for COX-1 and COX-2 that our framework can be extended to multitarget drug discovery, where compounds are selected by concomitantly considering their activity against multiple targets. 
Overall, this study provides an approach for iterative screening where only inactive data are present in early stages of drug discovery in order to discover highly potent compounds and the best experimental setup in which to do so.


Subjects
Drug Discovery/methods, Preclinical Drug Evaluation/methods, Machine Learning, Algorithms, Quantitative Structure-Activity Relationship
19.
J Chem Inf Model; 58(5): 1132-1140, 2018 May 29.
Article in English | MEDLINE | ID: mdl-29701973

ABSTRACT

Making predictions with an associated confidence is highly desirable as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways to scale the resultant prediction intervals to create as efficient (i.e., narrow) regressors as possible. Different algorithms to estimate the prediction uncertainty were used to normalize the prediction ranges, and the different approaches were evaluated on 29 publicly available data sets. Our results show that the most efficient conformal regressors are obtained when using the natural exponential of the ensemble standard deviation from the underlying random forest to scale the prediction intervals, but other approaches were almost as efficient. This approach afforded an average prediction range of 1.65 pIC50 units at the 80% confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range with a difference of close to one log unit in bioactivity between the tightest and widest prediction range. Overall, conformal regression is a robust approach to generate bioactivity predictions with associated confidence.


Subjects
Informatics/methods, Machine Learning, Quantitative Structure-Activity Relationship, Uncertainty, Decision Making
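
The interval scaling described in the abstract of this record, where the exponential of the ensemble standard deviation stretches or shrinks each prediction interval, might look like the sketch below. It assumes a standard inductive conformal setup with nonconformity score |residual| / exp(sigma); the function names, the rank rule, and the toy calibration data are assumptions for illustration and are not taken from the paper.

```python
import math


def calibrate_alpha(residuals, sigmas, confidence=0.8):
    """Return the nonconformity score at the requested confidence level.

    residuals: calibration-set errors (observed minus predicted).
    sigmas: ensemble standard deviations for the same calibration instances.
    """
    if len(residuals) != len(sigmas) or not residuals:
        raise ValueError("need equally sized, non-empty calibration lists")
    # Normalized nonconformity: absolute error divided by exp(sigma).
    scores = sorted(abs(r) / math.exp(s) for r, s in zip(residuals, sigmas))
    # Rank ceil(confidence * (n + 1)), clipped to the available scores.
    rank = min(max(math.ceil(confidence * (len(scores) + 1)), 1), len(scores))
    return scores[rank - 1]


def prediction_interval(y_hat, sigma, alpha):
    """Scale the calibrated score by exp(sigma) to get a per-instance interval."""
    half_width = alpha * math.exp(sigma)
    return y_hat - half_width, y_hat + half_width


# Toy calibration data, invented for illustration only.
residuals = [0.1, -0.4, 0.3, -0.2, 0.9, 0.5, -0.7, 0.6, -0.8]
sigmas = [0.0] * len(residuals)
alpha = calibrate_alpha(residuals, sigmas, confidence=0.8)
print(prediction_interval(6.0, 0.0, alpha))
```

Instances on which the ensemble disagrees more (larger sigma) receive wider intervals, while the calibrated score keeps the overall error rate tied to the chosen confidence level.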
20.
Bioinformatics; 32(1): 85-95, 2016 Jan 01.
Article in English | MEDLINE | ID: mdl-26351271

ABSTRACT

MOTIVATION: Recent large-scale omics initiatives have catalogued the somatic alterations of cancer cell line panels along with their pharmacological response to hundreds of compounds. In this study, we have explored these data to advance computational approaches that enable more effective and targeted use of current and future anticancer therapeutics. RESULTS: We modelled the 50% growth inhibition bioassay end-point (GI50) of 17,142 compounds screened against 59 cancer cell lines from the NCI60 panel (941,831 data-points, matrix 93.08% complete) by integrating the chemical and biological (cell line) information. We determine that the protein, gene transcript and miRNA abundance provide the highest predictive signal when modelling the GI50 endpoint, which significantly outperformed the DNA copy-number variation or exome sequencing data (Tukey's Honestly Significant Difference, P <0.05). We demonstrate that, within the limits of the data, our approach exhibits the ability to both interpolate and extrapolate compound bioactivities to new cell lines and tissues and, although to a lesser extent, to dissimilar compounds. Moreover, our approach outperforms previous models generated on the GDSC dataset. Finally, we determine that in the cases investigated in more detail, the predicted drug-pathway associations and growth inhibition patterns are mostly consistent with the experimental data, which also suggests the possibility of identifying genomic markers of drug sensitivity for novel compounds on novel cell lines. CONTACT: terez@pasteur.fr; ab454@ac.cam.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods, Neoplasms/pathology, Biological Assay, Tumor Cell Line, Cell Proliferation, Protein Databases, Humans, Biological Models, Pharmacogenetics, Support Vector Machine