1.
Am J Hum Genet ; 100(6): 865-884, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28552196

ABSTRACT

Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader allelic architecture of 12 anthropometric traits associated with height, body mass, and fat distribution in up to 267,616 individuals. We report 106 genome-wide significant signals that have not been previously identified, including 9 low-frequency variants pointing to functional candidates. Of the 106 signals, 6 are in genomic regions that have not been implicated with related traits before, 28 are independent signals at previously reported regions, and 72 represent previously reported signals for a different anthropometric trait. 71% of signals reside within genes and fine mapping resolves 23 signals to one or two likely causal variants. We confirm genetic overlap between human monogenic and polygenic anthropometric traits and find signal enrichment in cis expression QTLs in relevant tissues. Our results highlight the potential of WGS strategies to enhance biologically relevant discoveries across the frequency spectrum.


Subjects
Anthropometry , Genome, Human , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Sequence Analysis, DNA/methods , Body Height/genetics , Cohort Studies , DNA Methylation/genetics , Databases, Genetic , Female , Genetic Variation , Humans , Lipodystrophy/genetics , Male , Meta-Analysis as Topic , Obesity/genetics , Physical Chromosome Mapping , Sex Characteristics , Syndrome , United Kingdom
2.
Bioinformatics ; 34(3): 511-513, 2018 02 01.
Article in English | MEDLINE | ID: mdl-28968714

ABSTRACT

Summary: We present FATHMM-XF, a method for predicting pathogenic point mutations in the human genome. Drawing on an extensive feature set, FATHMM-XF outperforms competitors on benchmark tests, particularly in non-coding regions where the majority of pathogenic mutations are likely to be found. Availability and implementation: The FATHMM-XF web server is available at http://fathmm.biocompute.org.uk/fathmm-xf/, and as tracks on the Genome Tolerance Browser: http://gtb.biocompute.org.uk. Predictions are provided for human genome version GRCh37/hg19. The data used for this project can be downloaded from: http://fathmm.biocompute.org.uk/fathmm-xf/. Contact: mark.rogers@bristol.ac.uk or c.campbell@bristol.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genomics/methods , Point Mutation , Sequence Analysis, DNA/methods , Software , Genome, Human , Humans
3.
Hum Mol Genet ; 25(19): 4339-4349, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27559110

ABSTRACT

BACKGROUND: Single variant approaches have been successful in identifying DNA methylation quantitative trait loci (mQTL), although as with complex traits they lack the statistical power to identify the effects from rare genetic variants. We have undertaken extensive analyses to identify regions of low frequency and rare variants that are associated with DNA methylation levels. METHODS: We used repeated measurements of DNA methylation from five different life stages in human blood, taken from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. Variants were collapsed across CpG islands and their flanking regions to identify variants collectively associated with methylation, where no single variant was individually responsible for the observed signal. All analyses were undertaken using the sequence kernel association test. RESULTS: For loci where no individual variant mQTL was observed based on a single variant analysis, we identified 95 unique regions where the combined effect of low frequency variants (MAF ≤ 5%) provided strong evidence of association with methylation. For loci where there was previous evidence of an individual variant mQTL, a further 3 regions provided evidence of association between multiple low frequency variants and methylation levels. Effects were observed consistently across 5 different time points in the lifecourse and evidence of replication in the TwinsUK and Exeter cohorts was also identified. CONCLUSION: We have demonstrated the potential of this novel approach to mQTL analysis by analysing the combined effect of multiple low frequency or rare variants. Future studies should benefit from applying this approach as a complementary follow up to single variant analyses.


Subjects
DNA Methylation/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Adolescent , Adult , Child , Child, Preschool , CpG Islands/genetics , Female , Gene Expression Regulation/genetics , Gene Frequency , Genotype , Humans , Infant , Infant, Newborn , Male , Middle Aged , Polymorphism, Single Nucleotide/genetics
4.
Bioinformatics ; 33(12): 1751-1757, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28137713

ABSTRACT

MOTIVATION: A major cause of autosomal dominant disease is haploinsufficiency, whereby a single copy of a gene is not sufficient to maintain the normal function of the gene. A large proportion of existing methods for predicting haploinsufficiency incorporate biological networks, e.g. protein-protein interaction networks that have recently been shown to introduce study bias. As a result, these methods tend to perform best on well-studied genes, but underperform on less studied genes. The advent of large genome sequencing consortia, such as the 1000 genomes project, NHLBI Exome Sequencing Project and the Exome Aggregation Consortium creates an urgent need for unbiased haploinsufficiency prediction methods. RESULTS: Here, we describe a machine learning approach, called HIPred, that integrates genomic and evolutionary information from ENSEMBL, with functional annotations from the Encyclopaedia of DNA Elements consortium and the NIH Roadmap Epigenomics Project to predict haploinsufficiency, without the study bias described earlier. We benchmark HIPred using several datasets and show that our unbiased method performs as well as, and in most cases, outperforms existing biased algorithms. AVAILABILITY AND IMPLEMENTATION: HIPred scores for all gene identifiers are available at: https://github.com/HAShihab/HIPred . CONTACT: h.shihab@bristol.ac.uk or tom.gaunt@bristol.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Human , Genomics/methods , Haploinsufficiency , Machine Learning , Chromatin/metabolism , Epigenesis, Genetic , Histones/metabolism , Humans , Protein Interaction Maps/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
5.
BMC Bioinformatics ; 18(1): 20, 2017 Jan 06.
Article in English | MEDLINE | ID: mdl-28061747

ABSTRACT

BACKGROUND: Accurate methods capable of predicting the impact of single nucleotide variants (SNVs) are assuming ever increasing importance. There exists a plethora of in silico algorithms designed to help identify and prioritize SNVs across the human genome for further investigation. However, no tool exists to visualize the predicted tolerance of the genome to mutation, or the similarities between these methods. RESULTS: We present the Genome Tolerance Browser (GTB, http://gtb.biocompute.org.uk ): an online genome browser for visualizing the predicted tolerance of the genome to mutation. The server summarizes several in silico prediction algorithms and conservation scores: including 13 genome-wide prediction algorithms and conservation scores, 12 non-synonymous prediction algorithms and four cancer-specific algorithms. CONCLUSION: The GTB enables users to visualize the similarities and differences between several prediction algorithms and to upload their own data as additional tracks; thereby facilitating the rapid identification of potential regions of interest.


Subjects
Genome, Human , Internet , Web Browser , Algorithms , Databases, Genetic , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Humans , Models, Theoretical , Neoplasms/diagnosis , Neoplasms/genetics , Receptors, LDL/genetics , Receptors, LDL/metabolism
6.
BMC Bioinformatics ; 18(1): 442, 2017 Oct 06.
Article in English | MEDLINE | ID: mdl-28985712

ABSTRACT

BACKGROUND: Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. RESULTS: We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. CONCLUSIONS: FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.


Subjects
Computational Biology/methods , DNA, Intergenic/genetics , Genome, Human , INDEL Mutation/genetics , Genetics, Population , Humans , Phenotype , ROC Curve , Reproducibility of Results , Software
7.
Hum Mol Genet ; 24(10): 2733-45, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25634561

ABSTRACT

Delineating the genetic causes of developmental disorders is an area of active investigation. Mosaic structural abnormalities, defined as copy number or loss of heterozygosity events that are large and present in only a subset of cells, have been detected in 0.2-1.0% of children ascertained for clinical genetic testing. However, the frequency among healthy children in the community is not well characterized, which, if known, could inform better interpretation of the pathogenic burden of this mutational category in children with developmental disorders. In a case-control analysis, we compared the rate of large-scale mosaicism between 1303 children with developmental disorders and 5094 children lacking developmental disorders, using an analytical pipeline we developed, and identified a substantial enrichment in cases (odds ratio = 39.4, P-value 1.073e-6). A meta-analysis that included frequency estimates among an additional 7000 children with congenital diseases yielded an even stronger statistical enrichment (P-value 1.784e-11). In addition, to maximize the detection of low-clonality events in probands, we applied a trio-based mosaic detection algorithm, which detected two additional events in probands, including an individual with genome-wide suspected chimerism. In total, we detected 12 structural mosaic abnormalities among 1303 children (0.9%). Given the burden of mosaicism detected in cases, we suspected that many of the events detected in probands were pathogenic. Scrutiny of the genotypic-phenotypic relationship of each detected variant assessed that the majority of events are very likely pathogenic. This work quantifies the burden of structural mosaicism as a cause of developmental disorders.


Subjects
Developmental Disabilities/genetics , Genomic Structural Variation , Loss of Heterozygosity , Mosaicism , Adolescent , Adult , Aged , Aged, 80 and over , Case-Control Studies , Child , Child, Preschool , Female , Genetic Testing , Humans , Infant , Infant, Newborn , Male , Middle Aged , Young Adult
8.
Nucleic Acids Res ; 43(5): e33, 2015 Mar 11.
Article in English | MEDLINE | ID: mdl-25550428

ABSTRACT

Methods to interpret personal genome sequences are increasingly required. Here, we report a novel framework (EvoTol) to identify disease-causing genes using patient sequence data from within protein coding-regions. EvoTol quantifies a gene's intolerance to mutation using evolutionary conservation of protein sequences and can incorporate tissue-specific gene expression data. We apply this framework to the analysis of whole-exome sequence data in epilepsy and congenital heart disease, and demonstrate EvoTol's ability to identify known disease-causing genes is unmatched by competing methods. Application of EvoTol to the human interactome revealed networks enriched for genes intolerant to protein sequence variation, informing novel polygenic contributions to human disease.


Subjects
Computational Biology/methods , Evolution, Molecular , Genetic Predisposition to Disease/genetics , Proteins/genetics , Amino Acid Sequence/genetics , Exome/genetics , Heart Defects, Congenital/genetics , Humans , Mutation , Phylogeny , Polymorphism, Single Nucleotide , Protein Interaction Maps/genetics , Proteins/classification , Proteins/metabolism , Reproducibility of Results , Sequence Analysis, DNA/methods
9.
Ann Hum Genet ; 80(3): 187-96, 2016 May.
Article in English | MEDLINE | ID: mdl-27000383

ABSTRACT

Consanguineous offspring have elevated levels of homozygosity. Autozygous stretches within their genome are likely to harbour loss of function (LoF) mutations which will lead to complete inactivation or dysfunction of genes. Studying consanguineous offspring with clinical phenotypes has been very useful for identifying disease causal mutations. However, at present, most of the genes in the human genome have no disorder associated with them or have unknown function. This is presumably mostly due to the fact that homozygous LoF variants are not observed in outbred populations which are the main focus of large sequencing projects. However, another reason may be that many genes in the genome, even when completely "knocked out," do not cause a distinct or defined phenotype. Here, we discuss the benefits and implications of studying consanguineous populations, as opposed to the traditional approach of analysing a subset of consanguineous families or individuals with disease. We suggest that studying consanguineous populations "as a whole" can speed up the characterisation of novel gene functions as well as indicating nonessential genes and/or regions in the human genome. We also suggest designing a single nucleotide variant (SNV) array to make the process more efficient.


Subjects
Consanguinity , Genetics, Population , Genome, Human , Chromosome Mapping , Gene Silencing , Heterozygote , Homozygote , Humans , Phenotype
10.
Bioinformatics ; 31(10): 1536-43, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25583119

ABSTRACT

MOTIVATION: Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes various genomic annotations, which have recently become available, and learns to weight the significance of each component annotation source. RESULTS: We show that our method outperforms current state-of-the-art algorithms, CADD and GWAVA, when predicting the functional consequences of non-coding variants. In addition, FATHMM-MKL is comparable to the best of these algorithms when predicting the impact of coding variants. The method includes a confidence measure to rank order predictions.


Subjects
Algorithms , Genetic Variation/genetics , Genome, Human , Molecular Sequence Annotation , Open Reading Frames/genetics , Untranslated Regions/genetics , Genome-Wide Association Study , Genomics/methods , Humans , Phenotype
11.
Hum Genomics ; 8: 11, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24980617

ABSTRACT

As the number of non-synonymous single nucleotide polymorphisms (nsSNPs) identified through whole-exome/whole-genome sequencing programs increases, researchers and clinicians are becoming increasingly reliant upon computational prediction algorithms designed to prioritize potential functional variants for further study. A large proportion of existing prediction algorithms are 'disease agnostic' but are nevertheless quite capable of predicting when a mutation is likely to be deleterious. However, most clinical and research applications of these algorithms relate to specific diseases and would therefore benefit from an approach that discriminates between functional variants specifically related to that disease from those which are not. In a whole-exome/whole-genome sequencing context, such an approach could substantially reduce the number of false positive candidate mutations. Here, we test this postulate by incorporating a disease-specific weighting scheme into the Functional Analysis through Hidden Markov Models (FATHMM) algorithm. When compared to traditional prediction algorithms, we observed an overall reduction in the number of false positives identified using a disease-specific approach to functional prediction across 17 distinct disease concepts/categories. Our results illustrate the potential benefits of making disease-specific predictions when prioritizing candidate variants in relation to specific diseases. A web-based implementation of our algorithm is available at http://fathmm.biocompute.org.uk.


Subjects
Amino Acid Substitution/genetics , Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Software , Computational Biology , Humans , Internet , Markov Chains , Phenotype
12.
JAMA ; 313(20): 2044-54, 2015 May 26.
Article in English | MEDLINE | ID: mdl-26010633

ABSTRACT

IMPORTANCE: The association of copy number variations (CNVs), differing numbers of copies of genetic sequence at locations in the genome, with phenotypes such as intellectual disability has been almost exclusively evaluated using clinically ascertained cohorts. The contribution of these genetic variants to cognitive phenotypes in the general population remains unclear. OBJECTIVE: To investigate the clinical features conferred by CNVs associated with known syndromes in adult carriers without clinical preselection and to assess the genome-wide consequences of rare CNVs (frequency ≤0.05%; size ≥250 kilobase pairs [kb]) on carriers' educational attainment and intellectual disability prevalence in the general population. DESIGN, SETTING, AND PARTICIPANTS: The population biobank of Estonia contains 52,000 participants enrolled from 2002 through 2010. General practitioners examined participants and filled out a questionnaire of health- and lifestyle-related questions, as well as reported diagnoses. Copy number variant analysis was conducted on a random sample of 7877 individuals and genotype-phenotype associations with education and disease traits were evaluated. Our results were replicated on a high-functioning group of 993 Estonians and 3 geographically distinct populations in the United Kingdom, the United States, and Italy. MAIN OUTCOMES AND MEASURES: Phenotypes of genomic disorders in the general population, prevalence of autosomal CNVs, and association of these variants with educational attainment (from less than primary school through scientific degree) and prevalence of intellectual disability. RESULTS: Of the 7877 in the Estonian cohort, we identified 56 carriers of CNVs associated with known syndromes. Their phenotypes, including cognitive and psychiatric problems, epilepsy, neuropathies, obesity, and congenital malformations are similar to those described for carriers of identical rearrangements ascertained in clinical cohorts. 
A genome-wide evaluation of rare autosomal CNVs (frequency, ≤0.05%; ≥250 kb) identified 831 carriers (10.5%) of the screened general population. Eleven of 216 (5.1%) carriers of a deletion of at least 250 kb (odds ratio [OR], 3.16; 95% CI, 1.51-5.98; P = 1.5e-03) and 6 of 102 (5.9%) carriers of a duplication of at least 1 Mb (OR, 3.67; 95% CI, 1.29-8.54; P = .008) had an intellectual disability compared with 114 of 6819 (1.7%) in the Estonian cohort. The mean education attainment was 3.81 (P = 1.06e-04) among 248 (≥250 kb) deletion carriers and 3.69 (P = 5.024e-05) among 115 duplication carriers (≥1 Mb). Of the deletion carriers, 33.5% did not graduate from high school (OR, 1.48; 95% CI, 1.12-1.95; P = .005) and 39.1% of duplication carriers did not graduate high school (OR, 1.89; 95% CI, 1.27-2.8; P = 1.6e-03). Evidence for an association between rare CNVs and lower educational attainment was supported by analyses of cohorts of adults from Italy and the United States and adolescents from the United Kingdom. CONCLUSIONS AND RELEVANCE: Known pathogenic CNVs in unselected, but assumed to be healthy, adult populations may be associated with unrecognized clinical sequelae. Additionally, individually rare but collectively common intermediate-size CNVs may be negatively associated with educational attainment. Replication of these findings in additional population groups is warranted given the potential implications of this observation for genomics research, clinical care, and public health.


Subjects
DNA Copy Number Variations , Heterozygote , Intellectual Disability/genetics , Mental Disorders/genetics , Adolescent , Adult , Cognition , Educational Status , Epilepsy/genetics , Estonia , Female , Genome-Wide Association Study , Humans , Italy , Male , Obesity/genetics , Phenotype , United Kingdom , United States
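
As a rough check on the figures in the abstract of entry 12, the odds ratio for intellectual disability among deletion carriers can be recomputed from the stated counts (11 of 216 carriers versus 114 of 6819 in the cohort). The sketch below uses the plain sample odds ratio; the helper name `odds_ratio` is hypothetical, and the published estimate and confidence interval may come from a different estimator, which the abstract does not specify.

```python
def odds_ratio(cases_exposed, total_exposed, cases_reference, total_reference):
    """Sample odds ratio computed from case counts and group sizes."""
    odds_exposed = cases_exposed / (total_exposed - cases_exposed)
    odds_reference = cases_reference / (total_reference - cases_reference)
    return odds_exposed / odds_reference


# Counts taken from the abstract: deletion carriers (>=250 kb) vs. the Estonian cohort.
deletion_or = odds_ratio(11, 216, 114, 6819)
print(round(deletion_or, 2))  # 3.16, matching the reported OR
```

Applying the same helper to the duplication carriers (6 of 102) gives roughly 3.68 against the reported 3.67, a small gap that would be consistent with the authors using a slightly different estimator.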
13.
Bioinformatics ; 29(12): 1504-10, 2013 Jun 15.
Article in English | MEDLINE | ID: mdl-23620363

ABSTRACT

MOTIVATION: The number of missense mutations being identified in cancer genomes has greatly increased as a consequence of technological advances and the reduced cost of whole-genome/whole-exome sequencing methods. However, a high proportion of the amino acid substitutions detected in cancer genomes have little or no effect on tumour progression (passenger mutations). Therefore, accurate automated methods capable of discriminating between driver (cancer-promoting) and passenger mutations are becoming increasingly important. In our previous work, we developed the Functional Analysis through Hidden Markov Models (FATHMM) software and, using a model weighted for inherited disease mutations, observed improved performances over alternative computational prediction algorithms. Here, we describe an adaptation of our original algorithm that incorporates a cancer-specific model to potentiate the functional analysis of driver mutations. RESULTS: The performance of our algorithm was evaluated using two separate benchmarks. In our analysis, we observed improved performances when distinguishing between driver mutations and other germ line variants (both disease-causing and putatively neutral mutations). In addition, when discriminating between somatic driver and passenger mutations, we observed performances comparable with the leading computational prediction algorithms: SPF-Cancer and TransFIC. AVAILABILITY AND IMPLEMENTATION: A web-based implementation of our cancer-specific model, including a downloadable stand-alone package, is available at http://fathmm.biocompute.org.uk.


Subjects
Amino Acid Substitution , DNA Mutational Analysis/methods , Neoplasms/genetics , Algorithms , Genomics , Humans , Mutation, Missense , Software
14.
Hum Mutat ; 34(1): 57-65, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23033316

ABSTRACT

The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole-genome/whole-exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever-increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state-of-the-art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web-based implementation of FATHMM, including a high-throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.


Subjects
Algorithms , Amino Acid Substitution , Computational Biology/methods , Mutation , Proteins/genetics , Genetic Association Studies/methods , Genotype , Humans , Internet , Phenotype , Polymorphism, Single Nucleotide , Proteins/metabolism , Reproducibility of Results , Software , Triticum/genetics
15.
Elife ; 7, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29846171

ABSTRACT

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.


Subjects
Mendelian Randomization Analysis , Cholesterol, LDL/metabolism , Coronary Disease/etiology , Databases, Genetic , Genetic Pleiotropy , Genome-Wide Association Study , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics
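
To make the 2-sample Mendelian randomization idea in entry 15 concrete, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate from SNP-exposure and SNP-outcome summary statistics. IVW is a standard estimator in this literature, but this is not MR-Base code or its API: the function name `ivw_estimate` and all input values are hypothetical and chosen only for illustration.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and standard error from summary statistics."""
    numerator = sum(
        bx * by / se**2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, (1.0 / denominator) ** 0.5


# Made-up summary statistics for three independent SNPs (not real GWAS results).
snp_exposure = [0.10, 0.20, 0.15]   # SNP effects on the exposure
snp_outcome = [0.05, 0.10, 0.075]   # SNP effects on the outcome
outcome_se = [0.01, 0.02, 0.015]    # standard errors of the outcome effects

estimate, standard_error = ivw_estimate(snp_exposure, snp_outcome, outcome_se)
print(estimate, standard_error)
```

Each SNP here has an outcome-to-exposure ratio of 0.5, so the pooled estimate is also 0.5. Real analyses would add the sensitivity checks for horizontal pleiotropy that the abstract mentions.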
16.
Sci Rep ; 7(1): 11597, 2017 09 14.
Article in English | MEDLINE | ID: mdl-28912487

ABSTRACT

For somatic point mutations in coding and non-coding regions of the genome, we propose CScape, an integrative classifier for predicting the likelihood that mutations are cancer drivers. Tested on somatic mutations, CScape tends to outperform alternative methods, reaching 91% balanced accuracy in coding regions and 70% in non-coding regions, while even higher accuracy may be achieved using thresholds to isolate high-confidence predictions. Positive predictions tend to cluster in genomic regions, so we apply a statistical approach to isolate coding and non-coding regions of the cancer genome that appear enriched for high-confidence predicted disease-drivers. Predictions and software are available at http://CScape.biocompute.org.uk/ .


Subjects
Genome, Human , Genomics/methods , Neoplasms/genetics , Point Mutation , Software , Computational Biology/methods , Databases, Genetic , Humans , Molecular Sequence Annotation , ROC Curve , Web Browser
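
The abstract of entry 16 reports performance as balanced accuracy, the mean of sensitivity and specificity. The sketch below shows that calculation. The confusion-matrix counts are hypothetical, picked only so the result lands at 91%; they are not CScape's benchmark data.

```python
def balanced_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Mean of sensitivity and specificity for a binary classifier."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return (sensitivity + specificity) / 2


# Hypothetical counts: 100 driver mutations and 1000 passenger mutations.
score = balanced_accuracy(true_pos=90, false_neg=10, true_neg=920, false_pos=80)
print(round(score, 2))  # 0.91
```

Because each class contributes equally, the metric is not inflated when one class (here, passengers) greatly outnumbers the other.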
17.
Stud Health Technol Inform ; 235: 91-95, 2017.
Article in English | MEDLINE | ID: mdl-28423762

ABSTRACT

Sequencing data will become widely available in clinical practice within the near future. Uptake of sequence data is currently being stimulated within the UK through the government-funded 100,000 genomes project (Genomics England), with many similar initiatives being planned and supported internationally. The analysis of the large volumes of data derived from sequencing programmes poses a major challenge for data analysis. In this paper we outline progress we have made in the development of predictors for estimating the pathogenic impact of single nucleotide variants, indels and haploinsufficiency in the human genome. The accuracy of these methods is enhanced through the development of disease-specific predictors, trained on appropriate data, and used within a specific disease context. We outline current research on the development of disease-specific predictors, specifically in the context of cancer research.


Subjects
Genome, Human , Sequence Analysis, DNA , England , Genomics , Humans , INDEL Mutation , Neoplasms/genetics , Polymorphism, Single Nucleotide
18.
Diabetes ; 66(6): 1713-1722, 2017 06.
Article in English | MEDLINE | ID: mdl-28246294

ABSTRACT

Several studies have investigated the relationship between genetic variation and DNA methylation with respect to type 2 diabetes, but it is unknown if DNA methylation is a mediator in the disease pathway or if it is altered in response to disease state. This study uses genotypic information as a causal anchor to help decipher the likely role of DNA methylation measured in peripheral blood in the etiology of type 2 diabetes. Illumina HumanMethylation450 BeadChip data were generated on 1,018 young individuals from the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. In stage 1, 118 unique associations between published type 2 diabetes single nucleotide polymorphisms (SNPs) and genome-wide methylation (methylation quantitative trait loci [mQTLs]) were identified. In stage 2, a further 226 mQTLs were identified between 202 additional independent non-type 2 diabetes SNPs and CpGs identified in stage 1. Where possible, associations were replicated in independent cohorts of similar age. We discovered that around half of known type 2 diabetes SNPs are associated with variation in DNA methylation and postulated that methylation could either be on a causal pathway to future disease or could be a noncausal biomarker. For one locus (KCNQ1), we were able to provide further evidence that methylation is likely to be on the causal pathway to disease in later life.


Subjects
DNA Methylation/genetics , Diabetes Mellitus, Type 2/genetics , KCNQ1 Potassium Channel/genetics , Adolescent , Cohort Studies , CpG Islands , Female , Genotype , Humans , Male , Polymorphism, Single Nucleotide , Prospective Studies , Quantitative Trait Loci
19.
Ann Clin Biochem ; 54(4): 472-480, 2017 Jul.
Article in English | MEDLINE | ID: mdl-27555663

ABSTRACT

BACKGROUND: One of the kallikrein genes (KLK3) encodes prostate-specific antigen, a key biomarker for prostate cancer. A number of factors, both genetic and non-genetic, determine variation of serum prostate-specific antigen concentrations in the population. We have recently found three KLK3 deletions in individuals with very low prostate-specific antigen concentrations, suggesting a link between abnormally reduced KLK3 expression and deletions of KLK3. Here, we aim to determine the frequency of kallikrein gene 3 deletions in the general population. METHODS: The frequency of KLK3 deletions in the general population was estimated from the 1958 Birth Cohort sample (n = 3815) using amplification ratiometry control system. In silico analyses using PennCNV were carried out in the same cohort and in NBS-WTCCC2 in order to provide an independent estimation of the frequency of KLK3 deletions in the general population. RESULTS: Amplification ratiometry control system results from the 1958 cohort indicated a frequency of KLK3 deletions of 0.81% (3.98% following a less stringent calling criterion). From in silico analyses, we found that potential deletions harbouring the KLK3 gene occurred at rates of 2.13% (1958 Cohort, n = 2867) and 0.99% (NBS-WTCCC2, n = 2737), respectively. These results are in good agreement with our in vitro experiments. All deletions found were in heterozygosis. CONCLUSIONS: We conclude that a number of individuals from the general population present KLK3 deletions in heterozygosis. Further studies are required in order to know if interpretation of low serum prostate-specific antigen concentrations in individuals with KLK3 deletions may offer false-negative assurances with consequences for prostate cancer screening, diagnosis and monitoring.


Subjects
Biomarkers, Tumor/genetics , Kallikreins/genetics , Mutation Rate , Prostate-Specific Antigen/genetics , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/genetics , Cohort Studies , False Negative Reactions , Gene Deletion , Gene Expression , Heterozygote , Humans , Kallikreins/deficiency , Male , Middle Aged , Monitoring, Physiologic , Prognosis , Prostate-Specific Antigen/deficiency , Prostatic Neoplasms/pathology
20.
PLoS One ; 11(4): e0153803, 2016.
Article in English | MEDLINE | ID: mdl-27128313

ABSTRACT

BACKGROUND: It has become common practice to analyse large-scale sequencing data with statistical approaches based on the aggregation of rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded as a more discrete definition of a biologically functional unit. METHODS: Using Pfam definitions, we collapsed rare variants (Minor Allele Frequency ≤ 1%) together in three different ways: (1) variants within single genomic regions which map to individual protein domains; (2) variants within two individual protein domain regions which are predicted to be responsible for a protein-protein interaction; (3) all variants within combined regions from multiple genes responsible for coding the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT). RESULTS: We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-Quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to those of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed. CONCLUSION: We have collapsed rare variants together using protein domain and family coordinates to present an alternative to collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.
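The collapsing step described in METHODS can be illustrated with a minimal sketch: rare variants are grouped by the region that contains them and summed into a per-sample burden. This is a simple burden-style collapse on toy data, not the SKAT test used in the study. The region names, coordinates, and genotypes are hypothetical, and the alternate-allele frequency stands in for the minor allele frequency on the assumption that the alternate allele is the minor one.

```python
MAF_THRESHOLD = 0.01  # rare-variant cutoff stated in the abstract (MAF <= 1%)

def alt_allele_frequency(genotypes):
    """Alternate-allele frequency from 0/1/2 allele counts in diploid samples."""
    return sum(genotypes) / (2 * len(genotypes))

def collapse_by_region(variants, regions):
    """Sum rare-variant genotypes per sample within each region.

    variants: list of (position, genotypes) pairs
    regions:  dict mapping region name -> (start, end), both inclusive
    Returns a dict mapping region name -> per-sample rare-allele count.
    """
    burden = {}
    for name, (start, end) in regions.items():
        rare = [g for pos, g in variants
                if start <= pos <= end
                and alt_allele_frequency(g) <= MAF_THRESHOLD]
        if rare:
            burden[name] = [sum(per_sample) for per_sample in zip(*rare)]
    return burden

def carriers(n_samples, carrier_indices):
    """Toy genotype vector: heterozygous (1) at the given sample indices."""
    g = [0] * n_samples
    for i in carrier_indices:
        g[i] = 1
    return g

# Hypothetical toy data: 100 samples, four variants.
N = 100
variants = [
    (105, carriers(N, [3])),        # rare, inside domain_A
    (150, carriers(N, [40])),       # rare, inside domain_A
    (180, carriers(N, range(30))),  # common (15%), filtered out
    (320, carriers(N, [7])),        # rare, inside domain_B
]
regions = {
    "domain_A": (100, 200),  # hypothetical protein-domain region
    "domain_B": (300, 400),  # hypothetical protein-domain region
    "gene_X": (100, 400),    # conventional gene-based region for comparison
}

burden = collapse_by_region(variants, regions)
for name, scores in burden.items():
    print(name, "total rare alleles:", sum(scores))
```

The gene-level region pools the rare variants from both domains, which is the contrast the abstract draws between domain-based and gene-based collapsing. In a real analysis the per-region genotypes would be passed to an association test such as SKAT rather than simply summed.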


Subjects
Genetic Variation , Protein Domains/genetics , Cohort Studies , Computer Simulation , DNA/genetics , Data Interpretation, Statistical , Female , Gene-Environment Interaction , Genetic Association Studies , Humans , Male , Polymorphism, Single Nucleotide , Protein Interaction Domains and Motifs/genetics , Registries , Twins/genetics