1.
Am J Hum Genet ; 100(6): 865-884, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28552196

ABSTRACT

Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader allelic architecture of 12 anthropometric traits associated with height, body mass, and fat distribution in up to 267,616 individuals. We report 106 genome-wide significant signals that have not been previously identified, including 9 low-frequency variants pointing to functional candidates. Of the 106 signals, 6 are in genomic regions that have not been implicated with related traits before, 28 are independent signals at previously reported regions, and 72 represent previously reported signals for a different anthropometric trait. 71% of signals reside within genes and fine mapping resolves 23 signals to one or two likely causal variants. We confirm genetic overlap between human monogenic and polygenic anthropometric traits and find signal enrichment in cis expression QTLs in relevant tissues. Our results highlight the potential of WGS strategies to enhance biologically relevant discoveries across the frequency spectrum.


Subject(s)
Anthropometry , Genome, Human , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Sequence Analysis, DNA/methods , Body Height/genetics , Cohort Studies , DNA Methylation/genetics , Databases, Genetic , Female , Genetic Variation , Humans , Lipodystrophy/genetics , Male , Meta-Analysis as Topic , Obesity/genetics , Physical Chromosome Mapping , Sex Characteristics , Syndrome , United Kingdom
2.
Bioinformatics ; 34(3): 511-513, 2018 02 01.
Article in English | MEDLINE | ID: mdl-28968714

ABSTRACT

Summary: We present FATHMM-XF, a method for predicting pathogenic point mutations in the human genome. Drawing on an extensive feature set, FATHMM-XF outperforms competitors on benchmark tests, particularly in non-coding regions where the majority of pathogenic mutations are likely to be found. Availability and implementation: The FATHMM-XF web server is available at http://fathmm.biocompute.org.uk/fathmm-xf/, and as tracks on the Genome Tolerance Browser: http://gtb.biocompute.org.uk. Predictions are provided for human genome version GRCh37/hg19. The data used for this project can be downloaded from: http://fathmm.biocompute.org.uk/fathmm-xf/. Contact: mark.rogers@bristol.ac.uk or c.campbell@bristol.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics/methods , Point Mutation , Sequence Analysis, DNA/methods , Software , Genome, Human , Humans
3.
Hum Mol Genet ; 25(19): 4339-4349, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27559110

ABSTRACT

BACKGROUND: Single variant approaches have been successful in identifying DNA methylation quantitative trait loci (mQTL), although as with complex traits they lack the statistical power to identify the effects from rare genetic variants. We have undertaken extensive analyses to identify regions of low frequency and rare variants that are associated with DNA methylation levels. METHODS: We used repeated measurements of DNA methylation from five different life stages in human blood, taken from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. Variants were collapsed across CpG islands and their flanking regions to identify variants collectively associated with methylation, where no single variant was individually responsible for the observed signal. All analyses were undertaken using the sequence kernel association test. RESULTS: For loci where no individual variant mQTL was observed based on a single variant analysis, we identified 95 unique regions where the combined effect of low frequency variants (MAF ≤ 5%) provided strong evidence of association with methylation. For loci where there was previous evidence of an individual variant mQTL, a further 3 regions provided evidence of association between multiple low frequency variants and methylation levels. Effects were observed consistently across 5 different time points in the lifecourse and evidence of replication in the TwinsUK and Exeter cohorts was also identified. CONCLUSION: We have demonstrated the potential of this novel approach to mQTL analysis by analysing the combined effect of multiple low frequency or rare variants. Future studies should benefit from applying this approach as a complementary follow up to single variant analyses.


Subject(s)
DNA Methylation/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Adolescent , Adult , Child , Child, Preschool , CpG Islands/genetics , Female , Gene Expression Regulation/genetics , Gene Frequency , Genotype , Humans , Infant , Infant, Newborn , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics
4.
Bioinformatics ; 33(12): 1751-1757, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28137713

ABSTRACT

MOTIVATION: A major cause of autosomal dominant disease is haploinsufficiency, whereby a single copy of a gene is not sufficient to maintain the normal function of the gene. A large proportion of existing methods for predicting haploinsufficiency incorporate biological networks, e.g. protein-protein interaction networks that have recently been shown to introduce study bias. As a result, these methods tend to perform best on well-studied genes, but underperform on less studied genes. The advent of large genome sequencing consortia, such as the 1000 genomes project, NHLBI Exome Sequencing Project and the Exome Aggregation Consortium creates an urgent need for unbiased haploinsufficiency prediction methods. RESULTS: Here, we describe a machine learning approach, called HIPred, that integrates genomic and evolutionary information from ENSEMBL, with functional annotations from the Encyclopaedia of DNA Elements consortium and the NIH Roadmap Epigenomics Project to predict haploinsufficiency, without the study bias described earlier. We benchmark HIPred using several datasets and show that our unbiased method performs as well as, and in most cases, outperforms existing biased algorithms. AVAILABILITY AND IMPLEMENTATION: HIPred scores for all gene identifiers are available at: https://github.com/HAShihab/HIPred . CONTACT: h.shihab@bristol.ac.uk or tom.gaunt@bristol.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome, Human , Genomics/methods , Haploinsufficiency , Machine Learning , Chromatin/metabolism , Epigenesis, Genetic , Histones/metabolism , Humans , Protein Interaction Maps/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
5.
BMC Bioinformatics ; 18(1): 20, 2017 Jan 06.
Article in English | MEDLINE | ID: mdl-28061747

ABSTRACT

BACKGROUND: Accurate methods capable of predicting the impact of single nucleotide variants (SNVs) are assuming ever increasing importance. There exists a plethora of in silico algorithms designed to help identify and prioritize SNVs across the human genome for further investigation. However, no tool exists to visualize the predicted tolerance of the genome to mutation, or the similarities between these methods. RESULTS: We present the Genome Tolerance Browser (GTB, http://gtb.biocompute.org.uk ): an online genome browser for visualizing the predicted tolerance of the genome to mutation. The server summarizes several in silico prediction algorithms and conservation scores: including 13 genome-wide prediction algorithms and conservation scores, 12 non-synonymous prediction algorithms and four cancer-specific algorithms. CONCLUSION: The GTB enables users to visualize the similarities and differences between several prediction algorithms and to upload their own data as additional tracks; thereby facilitating the rapid identification of potential regions of interest.


Subject(s)
Genome, Human , Internet , Web Browser , Algorithms , Databases, Genetic , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Humans , Models, Theoretical , Neoplasms/diagnosis , Neoplasms/genetics , Receptors, LDL/genetics , Receptors, LDL/metabolism
6.
BMC Bioinformatics ; 18(1): 442, 2017 Oct 06.
Article in English | MEDLINE | ID: mdl-28985712

ABSTRACT

BACKGROUND: Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. RESULTS: We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. CONCLUSIONS: FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.


Subject(s)
Computational Biology/methods , DNA, Intergenic/genetics , Genome, Human , INDEL Mutation/genetics , Genetics, Population , Humans , Phenotype , ROC Curve , Reproducibility of Results , Software
7.
Hum Mol Genet ; 24(10): 2733-45, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25634561

ABSTRACT

Delineating the genetic causes of developmental disorders is an area of active investigation. Mosaic structural abnormalities, defined as copy number or loss of heterozygosity events that are large and present in only a subset of cells, have been detected in 0.2-1.0% of children ascertained for clinical genetic testing. However, the frequency among healthy children in the community is not well characterized, which, if known, could inform better interpretation of the pathogenic burden of this mutational category in children with developmental disorders. In a case-control analysis, we compared the rate of large-scale mosaicism between 1303 children with developmental disorders and 5094 children lacking developmental disorders, using an analytical pipeline we developed, and identified a substantial enrichment in cases (odds ratio = 39.4, P-value 1.073e-6). A meta-analysis that included frequency estimates among an additional 7000 children with congenital diseases yielded an even stronger statistical enrichment (P-value 1.784e-11). In addition, to maximize the detection of low-clonality events in probands, we applied a trio-based mosaic detection algorithm, which detected two additional events in probands, including an individual with genome-wide suspected chimerism. In total, we detected 12 structural mosaic abnormalities among 1303 children (0.9%). Given the burden of mosaicism detected in cases, we suspected that many of the events detected in probands were pathogenic. Scrutiny of the genotypic-phenotypic relationship of each detected variant assessed that the majority of events are very likely pathogenic. This work quantifies the burden of structural mosaicism as a cause of developmental disorders.


Subject(s)
Developmental Disabilities/genetics , Genomic Structural Variation , Loss of Heterozygosity , Mosaicism , Adolescent , Adult , Aged , Aged, 80 and over , Case-Control Studies , Child , Child, Preschool , Female , Genetic Testing , Humans , Infant , Infant, Newborn , Male , Middle Aged , Young Adult
8.
Nucleic Acids Res ; 43(5): e33, 2015 Mar 11.
Article in English | MEDLINE | ID: mdl-25550428

ABSTRACT

Methods to interpret personal genome sequences are increasingly required. Here, we report a novel framework (EvoTol) to identify disease-causing genes using patient sequence data from within protein coding-regions. EvoTol quantifies a gene's intolerance to mutation using evolutionary conservation of protein sequences and can incorporate tissue-specific gene expression data. We apply this framework to the analysis of whole-exome sequence data in epilepsy and congenital heart disease, and demonstrate EvoTol's ability to identify known disease-causing genes is unmatched by competing methods. Application of EvoTol to the human interactome revealed networks enriched for genes intolerant to protein sequence variation, informing novel polygenic contributions to human disease.


Subject(s)
Computational Biology/methods , Evolution, Molecular , Genetic Predisposition to Disease/genetics , Proteins/genetics , Amino Acid Sequence/genetics , Exome/genetics , Heart Defects, Congenital/genetics , Humans , Mutation , Phylogeny , Polymorphism, Single Nucleotide , Protein Interaction Maps/genetics , Proteins/classification , Proteins/metabolism , Reproducibility of Results , Sequence Analysis, DNA/methods
9.
Ann Hum Genet ; 80(3): 187-96, 2016 May.
Article in English | MEDLINE | ID: mdl-27000383

ABSTRACT

Consanguineous offspring have elevated levels of homozygosity. Autozygous stretches within their genome are likely to harbour loss of function (LoF) mutations which will lead to complete inactivation or dysfunction of genes. Studying consanguineous offspring with clinical phenotypes has been very useful for identifying disease causal mutations. However, at present, most of the genes in the human genome have no disorder associated with them or have unknown function. This is presumably mostly due to the fact that homozygous LoF variants are not observed in outbred populations which are the main focus of large sequencing projects. However, another reason may be that many genes in the genome, even when completely "knocked out," do not cause a distinct or defined phenotype. Here, we discuss the benefits and implications of studying consanguineous populations, as opposed to the traditional approach of analysing a subset of consanguineous families or individuals with disease. We suggest that studying consanguineous populations "as a whole" can speed up the characterisation of novel gene functions as well as indicating nonessential genes and/or regions in the human genome. We also suggest designing a single nucleotide variant (SNV) array to make the process more efficient.


Subject(s)
Consanguinity , Genetics, Population , Genome, Human , Chromosome Mapping , Gene Silencing , Heterozygote , Homozygote , Humans , Phenotype
10.
Bioinformatics ; 31(10): 1536-43, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25583119

ABSTRACT

MOTIVATION: Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes various genomic annotations, which have recently become available, and learns to weight the significance of each component annotation source. RESULTS: We show that our method outperforms current state-of-the-art algorithms, CADD and GWAVA, when predicting the functional consequences of non-coding variants. In addition, FATHMM-MKL is comparable to the best of these algorithms when predicting the impact of coding variants. The method includes a confidence measure to rank order predictions.


Subject(s)
Algorithms , Genetic Variation/genetics , Genome, Human , Molecular Sequence Annotation , Open Reading Frames/genetics , Untranslated Regions/genetics , Genome-Wide Association Study , Genomics/methods , Humans , Phenotype
11.
Hum Genomics ; 8: 11, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24980617

ABSTRACT

As the number of non-synonymous single nucleotide polymorphisms (nsSNPs) identified through whole-exome/whole-genome sequencing programs increases, researchers and clinicians are becoming increasingly reliant upon computational prediction algorithms designed to prioritize potential functional variants for further study. A large proportion of existing prediction algorithms are 'disease agnostic' but are nevertheless quite capable of predicting when a mutation is likely to be deleterious. However, most clinical and research applications of these algorithms relate to specific diseases and would therefore benefit from an approach that discriminates between functional variants specifically related to that disease from those which are not. In a whole-exome/whole-genome sequencing context, such an approach could substantially reduce the number of false positive candidate mutations. Here, we test this postulate by incorporating a disease-specific weighting scheme into the Functional Analysis through Hidden Markov Models (FATHMM) algorithm. When compared to traditional prediction algorithms, we observed an overall reduction in the number of false positives identified using a disease-specific approach to functional prediction across 17 distinct disease concepts/categories. Our results illustrate the potential benefits of making disease-specific predictions when prioritizing candidate variants in relation to specific diseases. A web-based implementation of our algorithm is available at http://fathmm.biocompute.org.uk.


Subject(s)
Amino Acid Substitution/genetics , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Software , Computational Biology , Humans , Internet , Markov Chains , Phenotype
12.
JAMA ; 313(20): 2044-54, 2015 May 26.
Article in English | MEDLINE | ID: mdl-26010633

ABSTRACT

IMPORTANCE: The association of copy number variations (CNVs), differing numbers of copies of genetic sequence at locations in the genome, with phenotypes such as intellectual disability has been almost exclusively evaluated using clinically ascertained cohorts. The contribution of these genetic variants to cognitive phenotypes in the general population remains unclear. OBJECTIVE: To investigate the clinical features conferred by CNVs associated with known syndromes in adult carriers without clinical preselection and to assess the genome-wide consequences of rare CNVs (frequency ≤0.05%; size ≥250 kilobase pairs [kb]) on carriers' educational attainment and intellectual disability prevalence in the general population. DESIGN, SETTING, AND PARTICIPANTS: The population biobank of Estonia contains 52,000 participants enrolled from 2002 through 2010. General practitioners examined participants and filled out a questionnaire of health- and lifestyle-related questions, as well as reported diagnoses. Copy number variant analysis was conducted on a random sample of 7877 individuals and genotype-phenotype associations with education and disease traits were evaluated. Our results were replicated on a high-functioning group of 993 Estonians and 3 geographically distinct populations in the United Kingdom, the United States, and Italy. MAIN OUTCOMES AND MEASURES: Phenotypes of genomic disorders in the general population, prevalence of autosomal CNVs, and association of these variants with educational attainment (from less than primary school through scientific degree) and prevalence of intellectual disability. RESULTS: Of the 7877 in the Estonian cohort, we identified 56 carriers of CNVs associated with known syndromes. Their phenotypes, including cognitive and psychiatric problems, epilepsy, neuropathies, obesity, and congenital malformations are similar to those described for carriers of identical rearrangements ascertained in clinical cohorts. 
A genome-wide evaluation of rare autosomal CNVs (frequency, ≤0.05%; ≥250 kb) identified 831 carriers (10.5%) of the screened general population. Eleven of 216 (5.1%) carriers of a deletion of at least 250 kb (odds ratio [OR], 3.16; 95% CI, 1.51-5.98; P = 1.5e-03) and 6 of 102 (5.9%) carriers of a duplication of at least 1 Mb (OR, 3.67; 95% CI, 1.29-8.54; P = .008) had an intellectual disability compared with 114 of 6819 (1.7%) in the Estonian cohort. The mean education attainment was 3.81 (P = 1.06e-04) among 248 (≥250 kb) deletion carriers and 3.69 (P = 5.024e-05) among 115 duplication carriers (≥1 Mb). Of the deletion carriers, 33.5% did not graduate from high school (OR, 1.48; 95% CI, 1.12-1.95; P = .005) and 39.1% of duplication carriers did not graduate high school (OR, 1.89; 95% CI, 1.27-2.8; P = 1.6e-03). Evidence for an association between rare CNVs and lower educational attainment was supported by analyses of cohorts of adults from Italy and the United States and adolescents from the United Kingdom. CONCLUSIONS AND RELEVANCE: Known pathogenic CNVs in unselected, but assumed to be healthy, adult populations may be associated with unrecognized clinical sequelae. Additionally, individually rare but collectively common intermediate-size CNVs may be negatively associated with educational attainment. Replication of these findings in additional population groups is warranted given the potential implications of this observation for genomics research, clinical care, and public health.
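
As an arithmetic check on the intellectual-disability figures above, the crude (unadjusted) odds ratio can be recomputed from the reported counts. This is a minimal sketch: the function name `crude_odds_ratio` is illustrative, and the abstract does not say which estimator produced its odds ratios, so small differences from the published values are to be expected.

```python
def crude_odds_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Unadjusted odds ratio from a 2x2 table given cases and group sizes."""
    a = cases_exposed                  # carriers with the outcome
    b = n_exposed - cases_exposed      # carriers without the outcome
    c = cases_unexposed                # comparison group with the outcome
    d = n_unexposed - cases_unexposed  # comparison group without the outcome
    return (a * d) / (b * c)


# Counts taken from the abstract: 11 of 216 deletion carriers and
# 6 of 102 duplication carriers, each compared with 114 of 6819.
or_deletion = crude_odds_ratio(11, 216, 114, 6819)
or_duplication = crude_odds_ratio(6, 102, 114, 6819)

print(f"deletion carriers:    OR = {or_deletion:.2f}")
print(f"duplication carriers: OR = {or_duplication:.2f}")
```

The crude ratios come out at roughly 3.16 and 3.68, close to the reported 3.16 and 3.67.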


Subject(s)
DNA Copy Number Variations , Heterozygote , Intellectual Disability/genetics , Mental Disorders/genetics , Adolescent , Adult , Cognition , Educational Status , Epilepsy/genetics , Estonia , Female , Genome-Wide Association Study , Humans , Italy , Male , Obesity/genetics , Phenotype , United Kingdom , United States
13.
Bioinformatics ; 29(12): 1504-10, 2013 Jun 15.
Article in English | MEDLINE | ID: mdl-23620363

ABSTRACT

MOTIVATION: The number of missense mutations being identified in cancer genomes has greatly increased as a consequence of technological advances and the reduced cost of whole-genome/whole-exome sequencing methods. However, a high proportion of the amino acid substitutions detected in cancer genomes have little or no effect on tumour progression (passenger mutations). Therefore, accurate automated methods capable of discriminating between driver (cancer-promoting) and passenger mutations are becoming increasingly important. In our previous work, we developed the Functional Analysis through Hidden Markov Models (FATHMM) software and, using a model weighted for inherited disease mutations, observed improved performances over alternative computational prediction algorithms. Here, we describe an adaptation of our original algorithm that incorporates a cancer-specific model to potentiate the functional analysis of driver mutations. RESULTS: The performance of our algorithm was evaluated using two separate benchmarks. In our analysis, we observed improved performances when distinguishing between driver mutations and other germ line variants (both disease-causing and putatively neutral mutations). In addition, when discriminating between somatic driver and passenger mutations, we observed performances comparable with the leading computational prediction algorithms: SPF-Cancer and TransFIC. AVAILABILITY AND IMPLEMENTATION: A web-based implementation of our cancer-specific model, including a downloadable stand-alone package, is available at http://fathmm.biocompute.org.uk.


Subject(s)
Amino Acid Substitution , DNA Mutational Analysis/methods , Neoplasms/genetics , Algorithms , Genomics , Humans , Mutation, Missense , Software
14.
Hum Mutat ; 34(1): 57-65, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23033316

ABSTRACT

The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole-genome/whole-exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever-increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state-of-the-art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web-based implementation of FATHMM, including a high-throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.


Subject(s)
Algorithms , Amino Acid Substitution , Computational Biology/methods , Mutation , Proteins/genetics , Genetic Association Studies/methods , Genotype , Humans , Internet , Phenotype , Polymorphism, Single Nucleotide , Proteins/metabolism , Reproducibility of Results , Software , Triticum/genetics
15.
Elife ; 7, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29846171

ABSTRACT

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.
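
To make the idea of 2-sample Mendelian randomization from summary statistics concrete, the sketch below computes per-variant Wald ratios and a fixed-effect inverse-variance weighted (IVW) estimate. It is an illustration under stated assumptions, not MR-Base's implementation: the function names and the three-variant input are hypothetical, and the weights use only the outcome standard errors (a first-order approximation).

```python
import math


def wald_ratios(beta_exposure, beta_outcome):
    """Per-variant causal estimates: SNP-outcome effect / SNP-exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error from summary statistics."""
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three independent variants.
beta_x = [0.1, 0.2, 0.4]    # SNP-exposure effects (from GWAS sample 1)
beta_y = [0.05, 0.1, 0.2]   # SNP-outcome effects (from GWAS sample 2)
se_y = [0.01, 0.02, 0.02]   # standard errors of the SNP-outcome effects

ratios = wald_ratios(beta_x, beta_y)
estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
print(ratios, estimate, std_error)
```

Because every hypothetical variant implies the same ratio, the IVW estimate coincides with the individual Wald ratios; with real data the ratios differ and their spread is one informal signal of pleiotropy.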


Subject(s)
Mendelian Randomization Analysis , Cholesterol, LDL/metabolism , Coronary Disease/etiology , Databases, Genetic , Genetic Pleiotropy , Genome-Wide Association Study , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics
16.
Sci Rep ; 7(1): 11597, 2017 09 14.
Article in English | MEDLINE | ID: mdl-28912487

ABSTRACT

For somatic point mutations in coding and non-coding regions of the genome, we propose CScape, an integrative classifier for predicting the likelihood that mutations are cancer drivers. Tested on somatic mutations, CScape tends to outperform alternative methods, reaching 91% balanced accuracy in coding regions and 70% in non-coding regions, while even higher accuracy may be achieved using thresholds to isolate high-confidence predictions. Positive predictions tend to cluster in genomic regions, so we apply a statistical approach to isolate coding and non-coding regions of the cancer genome that appear enriched for high-confidence predicted disease-drivers. Predictions and software are available at http://CScape.biocompute.org.uk/ .
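
Balanced accuracy, the metric quoted above, is the mean of sensitivity and specificity, which keeps a large neutral class from dominating the score. The following minimal sketch uses hypothetical confusion-matrix counts chosen only for illustration; they are not taken from the CScape paper.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def plain_accuracy(tp, fn, tn, fp):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical counts: 100 driver mutations, 1000 neutral mutations.
counts = dict(tp=60, fn=40, tn=980, fp=20)

print(f"plain accuracy:    {plain_accuracy(**counts):.3f}")
print(f"balanced accuracy: {balanced_accuracy(**counts):.3f}")
```

With these imbalanced counts the plain accuracy is high because most neutral mutations are classified correctly, while the balanced accuracy is lower because 40% of the drivers are missed.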


Subject(s)
Genome, Human , Genomics/methods , Neoplasms/genetics , Point Mutation , Software , Computational Biology/methods , Databases, Genetic , Humans , Molecular Sequence Annotation , ROC Curve , Web Browser
17.
Stud Health Technol Inform ; 235: 91-95, 2017.
Article in English | MEDLINE | ID: mdl-28423762

ABSTRACT

Sequencing data will become widely available in clinical practice within the near future. Uptake of sequence data is currently being stimulated within the UK through the government-funded 100,000 genomes project (Genomics England), with many similar initiatives being planned and supported internationally. The analysis of the large volumes of data derived from sequencing programmes poses a major challenge for data analysis. In this paper we outline progress we have made in the development of predictors for estimating the pathogenic impact of single nucleotide variants, indels and haploinsufficiency in the human genome. The accuracy of these methods is enhanced through the development of disease-specific predictors, trained on appropriate data, and used within a specific disease context. We outline current research on the development of disease-specific predictors, specifically in the context of cancer research.


Subject(s)
Genome, Human , Sequence Analysis, DNA , England , Genomics , Humans , INDEL Mutation , Neoplasms/genetics , Polymorphism, Single Nucleotide
18.
Diabetes ; 66(6): 1713-1722, 2017 06.
Article in English | MEDLINE | ID: mdl-28246294

ABSTRACT

Several studies have investigated the relationship between genetic variation and DNA methylation with respect to type 2 diabetes, but it is unknown if DNA methylation is a mediator in the disease pathway or if it is altered in response to disease state. This study uses genotypic information as a causal anchor to help decipher the likely role of DNA methylation measured in peripheral blood in the etiology of type 2 diabetes. Illumina HumanMethylation450 BeadChip data were generated on 1,018 young individuals from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. In stage 1, 118 unique associations between published type 2 diabetes single nucleotide polymorphisms (SNPs) and genome-wide methylation (methylation quantitative trait loci [mQTLs]) were identified. In stage 2, a further 226 mQTLs were identified between 202 additional independent non-type 2 diabetes SNPs and CpGs identified in stage 1. Where possible, associations were replicated in independent cohorts of similar age. We discovered that around half of known type 2 diabetes SNPs are associated with variation in DNA methylation and postulated that methylation could either be on a causal pathway to future disease or could be a noncausal biomarker. For one locus (KCNQ1), we were able to provide further evidence that methylation is likely to be on the causal pathway to disease in later life.


Subject(s)
DNA Methylation/genetics , Diabetes Mellitus, Type 2/genetics , KCNQ1 Potassium Channel/genetics , Adolescent , Cohort Studies , CpG Islands , Female , Genotype , Humans , Male , Polymorphism, Single Nucleotide , Prospective Studies , Quantitative Trait Loci
19.
Ann Clin Biochem ; 54(4): 472-480, 2017 Jul.
Article in English | MEDLINE | ID: mdl-27555663

ABSTRACT

BACKGROUND: One of the kallikrein genes (KLK3) encodes prostate-specific antigen, a key biomarker for prostate cancer. A number of factors, both genetic and non-genetic, determine variation of serum prostate-specific antigen concentrations in the population. We have recently found three KLK3 deletions in individuals with very low prostate-specific antigen concentrations, suggesting a link between abnormally reduced KLK3 expression and deletions of KLK3. Here, we aim to determine the frequency of kallikrein gene 3 deletions in the general population. METHODS: The frequency of KLK3 deletions in the general population was estimated from the 1958 Birth Cohort sample (n = 3815) using amplification ratiometry control system. In silico analyses using PennCNV were carried out in the same cohort and in NBS-WTCCC2 in order to provide an independent estimation of the frequency of KLK3 deletions in the general population. RESULTS: Amplification ratiometry control system results from the 1958 cohort indicated a frequency of KLK3 deletions of 0.81% (3.98% following a less stringent calling criterion). From in silico analyses, we found that potential deletions harbouring the KLK3 gene occurred at rates of 2.13% (1958 Cohort, n = 2867) and 0.99% (NBS-WTCCC2, n = 2737), respectively. These results are in good agreement with our in vitro experiments. All deletions found were in heterozygosis. CONCLUSIONS: We conclude that a number of individuals from the general population present KLK3 deletions in heterozygosis. Further studies are required in order to know if interpretation of low serum prostate-specific antigen concentrations in individuals with KLK3 deletions may offer false-negative assurances with consequences for prostate cancer screening, diagnosis and monitoring.


Subject(s)
Biomarkers, Tumor/genetics , Kallikreins/genetics , Mutation Rate , Prostate-Specific Antigen/genetics , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Cohort Studies , False Negative Reactions , Gene Deletion , Gene Expression , Heterozygote , Humans , Kallikreins/deficiency , Male , Middle Aged , Monitoring, Physiologic , Prognosis , Prostate-Specific Antigen/deficiency , Prostatic Neoplasms/pathology
20.
PLoS One ; 11(4): e0153803, 2016.
Article in English | MEDLINE | ID: mdl-27128313

ABSTRACT

BACKGROUND: It has become common practice to analyse large-scale sequencing data with statistical approaches based on the aggregation of rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded as a more discrete definition of a biologically functional unit. METHODS: Using Pfam definitions, we collapsed rare variants (minor allele frequency ≤ 1%) together in three different ways: 1) variants within single genomic regions that map to individual protein domains; 2) variants within two individual protein domain regions that are predicted to be responsible for a protein-protein interaction; 3) all variants within combined regions from multiple genes that code for the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT). RESULTS: We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to those of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed. CONCLUSION: We have collapsed rare variants together using protein domain and family coordinates as an alternative to collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.


Subject(s)
Genetic Variation , Protein Domains/genetics , Cohort Studies , Computer Simulation , DNA/genetics , Data Interpretation, Statistical , Female , Gene-Environment Interaction , Genetic Association Studies , Humans , Male , Polymorphism, Single Nucleotide , Protein Interaction Domains and Motifs/genetics , Registries , Twins/genetics