Results 1 - 20 of 24
1.
Hum Genet ; 141(9): 1515-1528, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34862561

ABSTRACT

Genetic data have become increasingly complex within the past decade, leading researchers to pursue increasingly complex questions, such as those involving epistatic interactions and protein prediction. Traditional methods are ill-suited to answer these questions, but machine learning (ML) techniques offer an alternative solution. ML algorithms are commonly used in genetics to predict or classify subjects, but some methods also evaluate which features (variables) are responsible for creating a good prediction; this is called feature importance. This is critical in genetics, as researchers are often interested in which features (e.g., SNP genotype or environmental exposure) are responsible for a good prediction. This allows for deeper analysis beyond simple prediction, including the determination of risk factors associated with a given phenotype. Feature importance further permits the researcher to peer inside the black box of many ML algorithms to see how they work and which features are critical in informing a good prediction. This review focuses on ML methods that provide feature importance metrics for the analysis of genetic data. Five major categories of ML algorithms are described: k nearest neighbors, artificial neural networks, deep learning, support vector machines, and random forests. The review ends with a discussion of how to choose the best machine for a data set. This review will be particularly useful for genetic researchers looking to use ML methods to answer questions beyond basic prediction and classification.
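To make the idea of feature importance concrete, here is a minimal sketch of permutation importance, one model-agnostic way to score features. It is not taken from the review: the toy data, the `predict` rule standing in for a fitted model, and all function names are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: feature 0 is a SNP genotype (0/1/2 minor alleles),
# feature 1 is an unrelated noise variable.
data_rng = random.Random(1)
rows = [[data_rng.choice([0, 1, 2]), data_rng.choice([0, 1])] for _ in range(200)]
labels = [1 if genotype >= 1 else 0 for genotype, _ in rows]

def predict(row):
    # Stand-in for a fitted model: minor-allele carriers are predicted affected.
    return 1 if row[0] >= 1 else 0

importances = permutation_importance(predict, rows, labels)
print(importances)
```

Shuffling the SNP column degrades the predictions, while shuffling the noise column changes nothing, so only the first feature receives a positive importance.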


Subjects
Machine Learning; Support Vector Machine; Algorithms; Humans; Neural Networks, Computer
2.
Nat Rev Genet ; 16(2): 85-97, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25582081

ABSTRACT

Recent technological advances have expanded the breadth of available omic data, from whole-genome sequencing data, to extensive transcriptomic, methylomic and metabolomic data. A key goal of analyses of these data is the identification of effective models that predict phenotypic traits and outcomes, elucidating important biomarkers and generating important insights into the genetic underpinnings of the heritability of complex traits. There is still a need for powerful and advanced analysis strategies to fully harness the utility of these comprehensive high-throughput data, identifying true associations and reducing the number of false associations. In this Review, we explore the emerging approaches for data integration - including meta-dimensional and multi-staged analyses - which aim to deepen our understanding of the role of genetics and genomics in complex outcomes. With the use and further development of these approaches, an improved understanding of the relationship between genomic variation and human phenotypes may be revealed.


Subjects
Data Interpretation, Statistical; Genetic Variation; Genotype; Inheritance Patterns/physiology; Models, Biological; Phenotype; Systems Biology/methods; Humans; Meta-Analysis as Topic
3.
Am J Hum Genet ; 99(4): 877-885, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27666373

ABSTRACT

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10^-12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.
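The AUC comparison reported above can be illustrated with a small sketch. It computes AUC as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen neutral one. The scores are invented, and the simple average used as the "ensemble" is only a stand-in for illustration; it is not how REVEL combines its constituent tools.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical pathogenicity scores for 4 pathogenic (1) and 4 neutral (0) variants.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
tool_a = [0.9, 0.8, 0.2, 0.6, 0.1, 0.3, 0.7, 0.4]
tool_b = [0.3, 0.7, 0.9, 0.8, 0.2, 0.6, 0.1, 0.5]

# Naive ensemble (an assumption for this example): average the two tools.
ensemble = [(a + b) / 2 for a, b in zip(tool_a, tool_b)]

for name, scores in [("tool_a", tool_a), ("tool_b", tool_b), ("ensemble", ensemble)]:
    print(name, roc_auc(scores, labels))
```

In this toy data the two tools make different mistakes, so combining them ranks every pathogenic variant above every neutral one. That is the general motivation for ensemble predictors.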


Subjects
Disease/genetics; Mutation, Missense/genetics; Software; Area Under Curve; DNA Mutational Analysis; Exome/genetics; Gene Frequency; Humans; ROC Curve
4.
BMC Med Genet ; 20(1): 27, 2019 01 31.
Article in English | MEDLINE | ID: mdl-30704416

ABSTRACT

BACKGROUND: Myopia is one of the most common eye diseases in the world and affects 1 in 4 Americans. It is a complex disease caused by both environmental and genetic effects; the genetic effects are still not well understood. In this study, we performed genetic linkage analyses on Ashkenazi Jewish families with a strong familial history of myopia to elucidate any potential causal genes. METHODS: Sixty-four extended Ashkenazi Jewish families were previously collected from New Jersey. Genotypes from the Illumina ExomePlus array were merged with prior microsatellite linkage data from these families. Additional custom markers were added for candidate regions reported in the literature for myopia or refractive error. Myopia was defined as a mean spherical equivalent (MSE) of -1D or worse. Parametric two-point linkage analyses (using TwoPointLods) and multi-point linkage analyses (using SimWalk2) were performed, as well as collapsed haplotype pattern (CHP) analysis in SEQLinkage and association analyses with FBAT and rv-TDT. RESULTS: The strongest evidence of linkage was on 1p36 (two-point LOD = 4.47), a region previously linked to refractive error (MYP14) but not myopia. Another genome-wide significant locus was found on 8q24.22 with a maximum two-point LOD score of 3.75. CHP analysis also detected the signal on 1p36, localized to the LINC00339 gene with a maximum HLOD of 3.47, as well as genome-wide significant signals on 7q36.1 and 11p15, which overlaps with the MYP7 locus. CONCLUSIONS: We identified 2 novel linkage peaks for myopia on chromosomes 7 and 8 in these Ashkenazi Jewish families and replicated 2 more loci on chromosomes 1 and 11, one previously reported in refractive error but not myopia in these families and the other locus previously reported in the literature. Strong candidate genes have been identified within these linkage peaks in our families. Targeted sequencing in these regions will be necessary to definitively identify causal variants under these linkage peaks.
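For readers unfamiliar with LOD scores, the following sketch shows the textbook two-point calculation in its simplest setting. It assumes phase-known, fully informative meioses that can each be scored as recombinant or non-recombinant, which is far simpler than the pedigree likelihoods that programs such as those named in the abstract evaluate. The counts are hypothetical.

```python
import math

def lod_score(recombinants, meioses, theta):
    """log10 likelihood ratio of linkage at recombination fraction theta vs. theta = 0.5."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_lik_linked = (recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1.0 - theta))
    log_lik_unlinked = meioses * math.log10(0.5)
    return log_lik_linked - log_lik_unlinked

def max_lod(recombinants, meioses):
    """Grid search over theta in steps of 0.001 for the maximum LOD score."""
    grid = [k / 1000 for k in range(1, 501)]
    best_theta = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)

# Hypothetical example: 2 recombinants observed among 20 informative meioses.
best_theta, best_lod = max_lod(recombinants=2, meioses=20)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.3f}")
```

A LOD score of 3 corresponds to odds of 1000:1 in favor of linkage over no linkage, which helps put values such as the 4.47 reported on 1p36 in context.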


Subjects
Chromosomes, Human/genetics; Genotyping Techniques/methods; Jews/genetics; Myopia/genetics; Chromosomes, Human, Pair 1/genetics; Chromosomes, Human, Pair 11/genetics; Chromosomes, Human, Pair 7/genetics; Chromosomes, Human, Pair 8/genetics; Exome; Female; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Lod Score; Male; Myopia/ethnology; Pedigree; RNA, Long Noncoding/genetics
5.
Hum Genet ; 136(2): 165-178, 2017 02.
Article in English | MEDLINE | ID: mdl-27848076

ABSTRACT

Genetic loci explain only 25-30% of the heritability observed in plasma lipid traits. Epistasis, or gene-gene interaction, may contribute to a portion of this missing heritability. Using the genetic data from five NHLBI cohorts of 24,837 individuals, we combined the use of the quantitative multifactor dimensionality reduction (QMDR) algorithm with two SNP-filtering methods to exhaustively search for SNP-SNP interactions that are associated with HDL cholesterol (HDL-C), LDL cholesterol (LDL-C), total cholesterol (TC) and triglycerides (TG). SNPs were filtered either on the strength of their independent effects (main effect filter) or the prior knowledge supporting a given interaction (Biofilter). After the main effect filter, QMDR identified 20 SNP-SNP models associated with HDL-C, 6 associated with LDL-C, 3 associated with TC, and 10 associated with TG (permutation P value <0.05). With the use of Biofilter, we identified 2 SNP-SNP models associated with HDL-C, 3 associated with LDL-C, 1 associated with TC and 8 associated with TG (permutation P value <0.05). In an independent dataset of 7502 individuals from the eMERGE network, we replicated 14 of the interactions identified after main effect filtering: 11 for HDL-C, 1 for LDL-C and 2 for TG. We also replicated 23 of the interactions found to be associated with TG after applying Biofilter. Prior knowledge supports the possible role of these interactions in the genetic etiology of lipid traits. This study also presents a computationally efficient pipeline for analyzing data from large genotyping arrays and detecting SNP-SNP interactions that are not primarily driven by strong main effects.


Subjects
Cardiovascular Diseases/genetics; Cholesterol, HDL/blood; Cholesterol, LDL/blood; Epistasis, Genetic; Phenotype; Triglycerides/blood; Body Mass Index; Cardiovascular Diseases/blood; Cohort Studies; Female; Genetic Loci; Genetic Markers; Genome, Human; Genotyping Techniques; Humans; Linear Models; Linkage Disequilibrium; Male; Multifactor Dimensionality Reduction; Polymorphism, Single Nucleotide
6.
BMC Genet ; 17 Suppl 2: 1, 2016 Feb 03.
Article in English | MEDLINE | ID: mdl-26866367

ABSTRACT

In the analysis of current genomic data, application of machine learning and data mining techniques has become more attractive given the rising complexity of the projects. As part of the Genetic Analysis Workshop 19, approaches from this domain were explored, mostly motivated by two starting points. First, assuming an underlying structure in the genomic data, data mining might identify this and thus improve downstream association analyses. Second, computational methods for machine learning need to be developed further to efficiently deal with the current wealth of data. In the course of discussing results and experiences from the machine learning and data mining approaches, six common messages were extracted. These depict the current state of these approaches in the application to complex genomic data. Although some challenges remain for future studies, important forward steps were taken in the integration of different data types and the evaluation of the evidence. Mining the data for underlying genetic or phenotypic structure and using this information in subsequent analyses proved to be extremely helpful and is likely to become of even greater use with more complex data sets.


Subjects
Data Mining/methods; Genomics/methods; Computational Biology/methods; Genetic Testing; Humans; Machine Learning
7.
Bioinformatics ; 30(5): 698-705, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24149050

ABSTRACT

MOTIVATION: Advancements in high-throughput technology have allowed researchers to examine the genetic etiology of complex human traits in a robust fashion. Although genome-wide association studies have identified many novel variants associated with hundreds of traits, a large proportion of the estimated trait heritability remains unexplained. One hypothesis is that the commonly used statistical techniques and study designs are not robust to the complex etiology that may underlie these human traits. This etiology could include non-linear gene × gene or gene × environment interactions. Additionally, other levels of biological regulation may play a large role in trait variability. RESULTS: To address the need for computational tools that can explore enormous datasets to detect complex susceptibility models, we have developed a software package called the Analysis Tool for Heritable and Environmental Network Associations (ATHENA). ATHENA combines various variable filtering methods with machine learning techniques to analyze high-throughput categorical (e.g. single nucleotide polymorphisms) and quantitative (e.g. gene expression levels) predictor variables to generate multivariable models that predict either a categorical (e.g. disease status) or quantitative (e.g. cholesterol levels) outcome. The goal of this article is to demonstrate the utility of ATHENA using simulated and biological datasets that consist of both single nucleotide polymorphisms and gene expression variables to identify complex prediction models. Importantly, this method is flexible and can be expanded to include other types of high-throughput data (e.g. RNA-seq data and biomarker measurements). AVAILABILITY: ATHENA is freely available for download. The software, user manual and tutorial can be downloaded from http://ritchielab.psu.edu/ritchielab/software.


Subjects
Gene-Environment Interaction; Genome-Wide Association Study; Software; Humans; Phenotype; Polymorphism, Single Nucleotide
8.
Annu Rev Biomed Data Sci ; 7(1): 59-81, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38608311

ABSTRACT

Open Targets, a consortium among academic and industry partners, focuses on using human genetics and genomics to provide insights to key questions that build therapeutic hypotheses. Large-scale experiments generate foundational data, and open-source informatic platforms systematically integrate evidence for target-disease relationships and provide dynamic tooling for target prioritization. A locus-to-gene machine learning model uses evidence from genome-wide association studies (GWAS Catalog, UK BioBank, and FinnGen), functional genomic studies, epigenetic studies, and variant effect prediction to predict potential drug targets for complex diseases. These predictions are combined with genetic evidence from gene burden analyses, rare disease genetics, somatic mutations, perturbation assays, pathway analyses, scientific literature, differential expression, and mouse models to systematically build target-disease associations (https://platform.opentargets.org). Scored target attributes such as clinical precedence, tractability, and safety guide target prioritization. Here we provide our perspective on the value and impact of human genetics and genomics for generating therapeutic hypotheses.


Subjects
Genomics; Humans; Genomics/methods; Genome-Wide Association Study; Human Genetics; Animals; Machine Learning; Molecular Targeted Therapy
9.
Mol Genet Genomic Med ; 11(8): e2179, 2023 08.
Article in English | MEDLINE | ID: mdl-37070724

ABSTRACT

BACKGROUND: Oral clefts and ectrodactyly are common, heterogeneous birth defects. We performed whole-exome sequencing (WES) analysis in a Syrian family. The proband presented with both orofacial clefting and ectrodactyly but not ectodermal dysplasia as typically seen in ectrodactyly, ectodermal dysplasia, and cleft lip/palate syndrome-3. A paternal uncle with only an oral cleft was deceased and unavailable for analysis. METHODS: Variant annotation, Mendelian inconsistencies, and novel variants in known cleft genes were examined. Candidate variants were validated using Sanger sequencing, and pathogenicity was assessed by knocking out the tp63 gene in zebrafish to evaluate its role during zebrafish development. RESULTS: Twenty-eight candidate de novo events were identified, one of which is in a known oral cleft and ectrodactyly gene, TP63 (c.956G > T, p.Arg319Leu), and was confirmed by Sanger sequencing. CONCLUSION: TP63 mutations are associated with multiple autosomal dominant orofacial clefting and limb malformation disorders. The p.Arg319Leu mutation seen in this patient is both de novo and novel. Two known mutations in the same codon (c.956G > A, p.Arg319His; rs121908839, c.955C > T, p.Arg319Cys) cause ectrodactyly, providing evidence that mutating this codon is deleterious. While this TP63 mutation is the best candidate for the patient's clinical presentation, whether it is responsible for the entire phenotype is unclear. Generation and characterization of tp63 knockout zebrafish showed necrosis and rupture of the head at 3 days post-fertilization (dpf). The embryonic phenotype could not be rescued by injection of zebrafish or human messenger RNA (mRNA). Further functional analysis is needed to determine what proportion of the phenotype is due to this mutation.


Subjects
Cleft Lip; Cleft Palate; Humans; Animals; Cleft Lip/genetics; Cleft Palate/genetics; Zebrafish/genetics; Exome Sequencing; Syria; Mutation; Transcription Factors/genetics; Tumor Suppressor Proteins/genetics
10.
Pharmacogenet Genomics ; 22(12): 858-67, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23080225

ABSTRACT

OBJECTIVES: Prior candidate gene studies have associated CYP2B6 516G→T [rs3745274] and 983T→C [rs28399499] with increased plasma efavirenz exposure. We sought to identify novel variants associated with efavirenz pharmacokinetics. MATERIALS AND METHODS: Antiretroviral therapy-naive AIDS Clinical Trials Group studies A5202, A5095, and ACTG 384 included plasma sampling for efavirenz pharmacokinetics. Log-transformed trough efavirenz concentrations (Cmin) were previously estimated by population pharmacokinetic modeling. Stored DNA was genotyped with Illumina HumanHap 650Y or 1MDuo platforms, complemented by additional targeted genotyping of CYP2B6 and CYP2A6 with MassARRAY iPLEX Gold. Associations were identified by linear regression, which included principal component vectors to adjust for genetic ancestry. RESULTS: Among 856 individuals, CYP2B6 516G→T was associated with efavirenz estimated Cmin (P=8.5×10). After adjusting for CYP2B6 516G→T, CYP2B6 983T→C was associated (P=9.9×10). After adjusting for both CYP2B6 516G→T and 983T→C, a CYP2B6 variant (rs4803419) in intron 3 was associated (P=4.4×10). After adjusting for all the three variants, non-CYP2B6 polymorphisms were associated at P-value less than 5×10. In a separate cohort of 240 individuals, only the three CYP2B6 polymorphisms replicated. These three polymorphisms explained 34% of interindividual variability in efavirenz estimated Cmin. The extensive metabolizer phenotype was best defined by the absence of all three polymorphisms. CONCLUSION: Three CYP2B6 polymorphisms were independently associated with efavirenz estimated Cmin at genome-wide significance, and explained one-third of interindividual variability. These data will inform continued efforts to translate pharmacogenomic knowledge into optimal efavirenz utilization.


Subjects
Anti-HIV Agents/pharmacokinetics; Aryl Hydrocarbon Hydroxylases/genetics; Benzoxazines/pharmacokinetics; Genetic Variation; Oxidoreductases, N-Demethylating/genetics; Alkynes; Anti-HIV Agents/therapeutic use; Benzoxazines/blood; Clinical Protocols; Cyclopropanes; Cytochrome P-450 CYP2B6; Genetic Association Studies; Genome, Human; Genome-Wide Association Study; HIV Infections/drug therapy; HIV Infections/genetics; Humans; Polymorphism, Genetic
11.
J Neurovirol ; 18(6): 511-20, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23073667

ABSTRACT

HIV-associated sensory neuropathy remains an important complication of combination antiretroviral therapy and HIV infection. Mitochondrial DNA haplogroups and single nucleotide polymorphisms (SNPs) have previously been associated with symptomatic neuropathy in clinical trial participants. We examined associations between mitochondrial DNA variation and HIV-associated sensory neuropathy in CNS HIV Antiretroviral Therapy Effects Research (CHARTER). CHARTER is a USA-based longitudinal observational study of HIV-infected adults who underwent a structured interview and standardized examination. HIV-associated sensory neuropathy was determined by trained examiners as ≥1 sign (diminished vibratory and sharp-dull discrimination or ankle reflexes) bilaterally. Mitochondrial DNA sequencing was performed and haplogroups were assigned by published algorithms. Multivariable logistic regression was used to test associations between mitochondrial DNA SNPs, haplogroups, and HIV-associated sensory neuropathy. In analyses of associations of each mitochondrial DNA SNP with HIV-associated sensory neuropathy, the two most significant SNPs were at positions A12810G [odds ratio (95% confidence interval) = 0.27 (0.11-0.65); p = 0.004] and T489C [odds ratio (95% confidence interval) = 0.41 (0.21-0.80); p = 0.009]. These synonymous changes are known to define African haplogroup L1c and European haplogroup J, respectively. Both haplogroups were associated with decreased prevalence of HIV-associated sensory neuropathy compared with all other haplogroups [odds ratio (95% confidence interval) = 0.29 (0.12-0.71); p = 0.007 and odds ratio (95% confidence interval) = 0.42 (0.18-1.0); p = 0.05, respectively]. In conclusion, in this cohort of mostly combination antiretroviral therapy-treated subjects, two common mitochondrial DNA SNPs and their corresponding haplogroups were associated with a markedly decreased prevalence of HIV-associated sensory neuropathy.
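As a reading aid for the odds ratios and confidence intervals quoted above, this sketch computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. It is a simplification: the study used multivariable logistic regression, which adjusts for covariates, and the counts below are invented rather than taken from CHARTER.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: carriers with neuropathy      b: carriers without neuropathy
    c: non-carriers with neuropathy  d: non-carriers without neuropathy
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for a protective variant.
odds_ratio, lower, upper = odds_ratio_ci(a=6, b=20, c=150, d=140)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that lies entirely below 1 indicates a statistically significant protective association at the 5% level, while an interval whose upper bound reaches 1.0, as for haplogroup J above, is borderline.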


Subjects
DNA, Mitochondrial/genetics; HIV Infections/genetics; HIV-1; Mitochondria/genetics; Polymorphism, Single Nucleotide; Polyneuropathies/genetics; Adult; Black People; Female; HIV Infections/complications; HIV Infections/drug therapy; HIV Infections/pathology; Haplotypes; Humans; Male; Middle Aged; Polyneuropathies/drug therapy; Polyneuropathies/etiology; Polyneuropathies/pathology; Prospective Studies; White People
12.
mSphere ; 5(3)2020 05 06.
Article in English | MEDLINE | ID: mdl-32376702

ABSTRACT

Bezlotoxumab is a human monoclonal antibody against Clostridium difficile toxin B, indicated to prevent recurrence of C. difficile infection (rCDI) in high-risk adults receiving antibacterial treatment for CDI. An exploratory genome-wide association study investigated whether human genetic variation influences bezlotoxumab response. DNA from 704 participants who achieved initial clinical cure in the phase 3 MODIFY I/II trials was genotyped. Single nucleotide polymorphisms (SNPs) and human leukocyte antigen (HLA) imputation were performed using IMPUTE2 and HIBAG, respectively. A joint test of genotype and genotype-by-treatment interaction in a logistic regression model was used to screen genetic variants associated with response to bezlotoxumab. The SNP rs2516513 and the HLA alleles HLA-DRB1*07:01 and HLA-DQA1*02:01, located in the extended major histocompatibility complex on chromosome 6, were associated with the reduction of rCDI in bezlotoxumab-treated participants. Carriage of a minor allele (homozygous or heterozygous) at any of the identified loci was related to a larger difference in the proportion of participants experiencing rCDI versus placebo; the effect was most prominent in the subgroup at high baseline risk for rCDI. Genotypes associated with an improved bezlotoxumab response showed no association with rCDI in the placebo cohort. These data suggest that a host-driven, immunological mechanism may impact bezlotoxumab response. Trial registration numbers are as follows: NCT01241552 (MODIFY I) and NCT01513239 (MODIFY II). IMPORTANCE: Clostridium difficile infection is associated with significant clinical morbidity and mortality; antibacterial treatments are effective, but recurrence of C. difficile infection is common. In this genome-wide association study, we explored whether host genetic variability affected treatment responses to bezlotoxumab, a human monoclonal antibody that binds C. difficile toxin B and is indicated for the prevention of recurrent C. difficile infection. Using data from the MODIFY I/II phase 3 clinical trials, we identified three genetic variants associated with reduced rates of C. difficile infection recurrence in bezlotoxumab-treated participants. The effects were most pronounced in participants at high risk of C. difficile infection recurrence. All three variants are located in the extended major histocompatibility complex on chromosome 6, suggesting the involvement of a host-driven immunological mechanism in the prevention of C. difficile infection recurrence.


Subjects
Antibodies, Monoclonal/therapeutic use; Broadly Neutralizing Antibodies/therapeutic use; Clostridioides difficile/drug effects; Clostridium Infections/drug therapy; Clostridium Infections/genetics; Adolescent; Adult; Aged; Aged, 80 and over; Alleles; Antibodies, Neutralizing/blood; Female; Genome-Wide Association Study; Genotype; HLA-D Antigens/genetics; Humans; Male; Middle Aged; Polymorphism, Single Nucleotide; Recurrence; Young Adult
13.
Mol Genet Genomic Med ; 5(5): 570-579, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28944239

ABSTRACT

BACKGROUND: Nonsyndromic oral clefts are craniofacial malformations, which include cleft lip with or without cleft palate. The etiology of oral clefts is complex, with both genetic and environmental factors contributing to risk. Previous genome-wide association studies (GWAS) have identified multiple loci with small effects; however, many causal variants remain elusive. METHODS: In this study, we address this by specifically looking for rare, potentially damaging variants in family-based data. We analyzed both whole exome sequence (WES) data and whole genome sequence (WGS) data in multiplex cleft families to identify variants shared by affected individuals. RESULTS: Here we present the results from these analyses. Our most interesting finding was from a single Syrian family, which showed enrichment of nonsynonymous and potentially damaging rare variants in two genes: CASP9 and FAT4. CONCLUSION: Neither of these candidate genes has previously been associated with oral clefts and, if confirmed as contributing to disease risk, they may indicate novel biological pathways in the genetic etiology of oral clefts.

14.
BioData Min ; 10: 25, 2017.
Article in English | MEDLINE | ID: mdl-28770004

ABSTRACT

BACKGROUND: The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG). RESULTS: Our analysis consisted of a discovery phase using a merged dataset of five different cohorts (n = 12,853 to n = 16,849 depending on lipid phenotype) and a replication phase with ten independent cohorts totaling up to 36,938 additional samples. Filters are often applied before interaction testing to correct for the burden of testing all pairwise interactions. We used two different filters: 1. A filter that tested only single nucleotide polymorphisms (SNPs) with a main effect of p < 0.001 in a previous association study. 2. A filter that only tested interactions identified by Biofilter 2.0. Pairwise models that reached an interaction significance level of p < 0.001 in the discovery dataset were tested for replication. We identified thirteen SNP-SNP models that were significant in more than one replication cohort after accounting for multiple testing. CONCLUSIONS: These results may reveal novel insights into the genetic etiology of lipid levels. Furthermore, we developed a pipeline to perform a computationally efficient interaction analysis with multi-cohort replication.
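The reason filters are applied before interaction testing can be shown with a short sketch of the first filter described above. It counts all possible pairwise models, keeps only SNPs whose main-effect p-value is below 0.001, and enumerates the remaining pairs. The SNP names and p-values are hypothetical, and the second filter (Biofilter 2.0) is not modeled here.

```python
import math
from itertools import combinations

def n_pairwise_models(n_snps):
    """Number of distinct SNP-SNP pairs among n_snps variants."""
    return math.comb(n_snps, 2)

def main_effect_filter(main_effect_p, threshold=0.001):
    """Keep SNPs whose single-SNP association p-value is below the threshold."""
    return sorted(snp for snp, p in main_effect_p.items() if p < threshold)

# Hypothetical main-effect p-values from a previous association study.
main_effect_p = {
    "snp_a": 0.0002,
    "snp_b": 0.03,
    "snp_c": 0.0009,
    "snp_d": 0.5,
    "snp_e": 0.00001,
}

kept = main_effect_filter(main_effect_p)
pairs_to_test = list(combinations(kept, 2))

print("all pairs:", n_pairwise_models(len(main_effect_p)))
print("pairs after filter:", len(pairs_to_test), pairs_to_test)
print("pairs among 500,000 SNPs:", n_pairwise_models(500_000))
```

Because the number of pairs grows quadratically with the number of SNPs, even a modest filter removes most of the multiple-testing burden. The trade-off of a main-effect filter is that interactions between SNPs without marginal effects are never tested.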

15.
BioData Min ; 9: 7, 2016.
Article in English | MEDLINE | ID: mdl-26839594

ABSTRACT

BACKGROUND: Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. RESULTS: We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM). Importance values are calculated relative to an observed minimal importance score for several runs of RF and only SNPs with large relative VIMs in all of the runs are selected as important. Evaluations on simulated GWAS data show that the new method controls the number of false-positives under the null hypothesis. Under a simple alternative hypothesis with several independent main effects it is only slightly less powerful than logistic regression. In an experimental GWAS data set, the same strong signal is identified while the approach selects none of the SNPs in an underpowered GWAS. CONCLUSIONS: The novel variable selection method r2VIM is a promising extension to standard RF for objectively selecting relevant SNPs in GWAS while controlling the number of false-positive results.
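The selection rule summarized in the abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: it takes precomputed importance values from several random forest runs, divides each by the absolute value of the smallest importance observed in that run, and keeps only SNPs whose relative importance reaches a chosen factor in every run. The importance values, the SNP names, the `factor` parameter, and the requirement that the minimal importance be negative are all assumptions made for this example.

```python
def r2vim_select(importance_runs, factor=3.0):
    """Select SNPs whose relative importance is >= factor in every RF run.

    importance_runs: list of dicts mapping SNP name -> importance, one per run.
    """
    selected = None
    for run in importance_runs:
        min_importance = min(run.values())
        if min_importance >= 0:
            # Assumption of this sketch: noise variables yield a negative minimum.
            raise ValueError("expected a negative minimal importance in each run")
        scale = abs(min_importance)
        passing = {snp for snp, value in run.items() if value / scale >= factor}
        selected = passing if selected is None else selected & passing
    return sorted(selected or [])

# Hypothetical importance values from two random forest runs.
runs = [
    {"snp1": 0.050, "snp2": 0.004, "snp3": -0.002, "snp4": 0.012},
    {"snp1": 0.046, "snp2": -0.003, "snp3": 0.001, "snp4": 0.002},
]

selected = r2vim_select(runs, factor=3.0)
print(selected)
```

Requiring a SNP to pass in all runs is what removes variables such as `snp4`, which look important in one forest purely by chance.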

16.
BMC Proc ; 10(Suppl 7): 147-152, 2016.
Article in English | MEDLINE | ID: mdl-27980627

ABSTRACT

Current findings from genetic studies of complex human traits often do not explain a large proportion of the estimated variation of these traits due to genetic factors. This could be, in part, due to overly stringent significance thresholds in traditional statistical methods, such as linear and logistic regression. Machine learning methods, such as Random Forests (RF), are an alternative approach to identify potentially interesting variants. One major issue with these methods is that there is no clear way to distinguish between probable true hits and noise variables based on the importance metric calculated. To this end, we are developing a method called the Relative Recurrency Variable Importance Metric (r2VIM), a RF-based variable selection method. Here, we apply r2VIM to the unrelated Genetic Analysis Workshop 19 data with simulated systolic blood pressure as the phenotype. We compare the number of "true" functional variants identified by r2VIM with those identified by linear regression analyses that use a Bonferroni correction to calculate a significance threshold. Our results show that r2VIM performed comparably to linear regression. Our findings are proof-of-concept for r2VIM, as it identifies a similar number of functional and nonfunctional variants as a more commonly used technique when the optimal importance score threshold is used.

17.
Pac Symp Biocomput ; : 195-206, 2015.
Article in English | MEDLINE | ID: mdl-25592581

ABSTRACT

Standard analysis methods for genome wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power. To this end, we have developed a novel variable selection method called relative recurrency variable importance metric (r2VIM). This method incorporates recurrency and variance estimation to assist in optimal threshold selection. For this study, we specifically address how this method performs in data with almost completely epistatic effects (i.e. no marginal effects). Our results show that with appropriate parameter settings, r2VIM can identify interaction effects when the marginal effects are virtually nonexistent. It also outperforms logistic regression, which has essentially no power under this type of model when the number of potential features (genetic variants) is large. (All Supplementary Data can be found here: http://research.nhgri.nih.gov/manuscripts/Bailey-Wilson/r2VIM_epi/).
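What "no marginal effects" means can be seen in a deliberately extreme toy model. In the sketch below, disease occurs only when exactly one of two loci carries the risk genotype (an XOR pattern), and the four genotype combinations are assumed to be equally frequent. The model is hypothetical and is not one of the simulation settings used in the study.

```python
from itertools import product

def affected(a, b):
    """Hypothetical fully penetrant XOR model: affected if exactly one locus is a carrier."""
    return int(a != b)

# Four equally frequent (locus A, locus B) carrier combinations.
population = list(product([0, 1], repeat=2))

def risk(rows):
    """Proportion of affected individuals among the given genotype combinations."""
    return sum(affected(a, b) for a, b in rows) / len(rows)

# Marginal risk by locus A genotype alone: identical for carriers and non-carriers.
marginal_a = {g: risk([row for row in population if row[0] == g]) for g in (0, 1)}

# Joint risk by both loci: fully determined by the genotype combination.
joint = {row: risk([row]) for row in population}

print("marginal risk by locus A:", marginal_a)
print("joint risk by (A, B):", joint)
```

A single-SNP test sees the same risk in both genotype groups and therefore has no signal to detect, whereas a method that considers both loci jointly can separate cases from controls.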


Subjects
Epistasis, Genetic; Models, Genetic; Algorithms; Computational Biology; Computer Simulation; Databases, Genetic; Genome-Wide Association Study/statistics & numerical data; Humans; Linkage Disequilibrium; Logistic Models; Machine Learning; Polymorphism, Single Nucleotide; Signal-To-Noise Ratio
18.
Pac Symp Biocomput ; : 495-505, 2015.
Article in English | MEDLINE | ID: mdl-25741542

ABSTRACT

Investigating the association between biobank derived genomic data and the information of linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, cataract cases and controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 527,953 and 527,936 single nucleotide polymorphisms (SNPs) for gene-gene and gene-environment analyses, respectively, with minor allele frequency > 1%, in order to explore higher level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 13 statistically significant SNP-SNP models with an interaction with p-value < 1 × 10^-4, as well as an overall model with p-value < 0.01 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts. We found a total of 782 gene-environment models that exhibit an interaction with a p-value < 1 × 10^-4 associated with cataract status. Our results show these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex disease/outcome such as cataracts.
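The multiple testing burden that motivates the Biofilter step can be made concrete with a short calculation. The SNP and model counts come from the abstract; the 0.05 level and the Bonferroni-style thresholds are assumptions used only to show the scale, and they are not the cutoffs the study applied (it reports p < 1 × 10^-4).

```python
import math

n_snps = 527_953              # SNPs available for gene-gene analysis (from the abstract)
n_biofilter_models = 57_376   # prior-knowledge directed SNP-SNP models (from the abstract)
alpha = 0.05                  # assumed family-wise error rate, for illustration only

# Every unordered SNP pair would be one candidate interaction model.
n_all_pairs = math.comb(n_snps, 2)

# Illustrative Bonferroni thresholds with and without knowledge-based filtering.
threshold_all_pairs = alpha / n_all_pairs
threshold_biofilter = alpha / n_biofilter_models

# How many times fewer models are tested after filtering.
reduction = n_all_pairs / n_biofilter_models

print(f"all pairs: {n_all_pairs:,} models, threshold {threshold_all_pairs:.2e}")
print(f"Biofilter: {n_biofilter_models:,} models, threshold {threshold_biofilter:.2e}")
print(f"reduction factor: {reduction:,.0f}")
```

An exhaustive pairwise scan would involve on the order of 10^11 models, so restricting the search to knowledge-supported pairs reduces the number of tests by several orders of magnitude.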


Subjects
Cataract/genetics; Algorithms; Biological Specimen Banks; Case-Control Studies; Computational Biology; Databases, Genetic; Electronic Health Records; Epistasis, Genetic; Gene-Environment Interaction; Genome-Wide Association Study; Humans; Phenotype; Polymorphism, Single Nucleotide; Software
19.
BioData Min ; 8: 41, 2015.
Article in English | MEDLINE | ID: mdl-26674805

ABSTRACT

BACKGROUND: Despite heritability estimates of 40-70% for obesity, less than 2% of its variation is explained by the Body Mass Index (BMI) associated loci that have been identified so far. Epistasis, or gene-gene interaction, is a plausible explanation for part of the missing heritability of BMI. METHODS: Using genotypic data from 18,686 individuals across five study cohorts (ARIC, CARDIA, FHS, CHS, MESA), we filtered SNPs (Single Nucleotide Polymorphisms) using two parallel approaches. SNPs were filtered either on the strength of their main effects of association with BMI, or on the number of knowledge sources supporting a specific SNP-SNP interaction in the context of BMI. Filtered SNPs were specifically analyzed for interactions that are highly associated with BMI using QMDR (Quantitative Multifactor Dimensionality Reduction). QMDR is a nonparametric, genetic model-free method that detects non-linear interactions associated with a quantitative trait. RESULTS: We identified seven novel epistatic models with a Bonferroni corrected p-value of association < 0.1. Prior experimental evidence helps explain the plausible biological interactions highlighted within our results and their relationship with obesity. We identified interactions between genes involved in mitochondrial dysfunction (POLG2), cholesterol metabolism (SOAT2), lipid metabolism (CYP11B2), cell adhesion (EZR), cell proliferation (MAP2K5), and insulin resistance (IGF1R). Moreover, we found an 8.8% increase in the variance in BMI explained by these seven SNP-SNP interactions, beyond what is explained by the main effects of an index FTO SNP and the SNPs within these interactions. We also replicated one of these interactions and 58 proxy SNP-SNP models representing it in an independent dataset from the eMERGE study. CONCLUSION: This study highlights a novel approach for discovering gene-gene interactions by combining methods such as QMDR with traditional statistics.
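The abstract describes QMDR only in general terms. As a rough sketch of the underlying idea, and not the authors' implementation, one can pool two-SNP genotype combinations into "high" and "low" groups by comparing each combination's mean trait value with the overall mean, then score the pair with a two-sample t statistic. The Welch form of the statistic, the function name, and the toy data are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def qmdr_style_t(geno_a: list[int], geno_b: list[int], trait: list[float]) -> float:
    """Score one SNP pair: pool genotype cells into high/low groups, then compare them."""
    overall = mean(trait)

    # Group trait values by two-locus genotype combination (one "cell" per combination).
    cells = defaultdict(list)
    for a, b, y in zip(geno_a, geno_b, trait):
        cells[(a, b)].append(y)

    # A cell is labeled "high" when its mean trait exceeds the overall mean.
    high_cells = {cell for cell, ys in cells.items() if mean(ys) > overall}
    high = [y for a, b, y in zip(geno_a, geno_b, trait) if (a, b) in high_cells]
    low = [y for a, b, y in zip(geno_a, geno_b, trait) if (a, b) not in high_cells]
    if len(high) < 2 or len(low) < 2:
        raise ValueError("each pooled group needs at least two observations")

    # Welch two-sample t statistic between the pooled groups (an assumed choice).
    se = math.sqrt(variance(high) / len(high) + variance(low) / len(low))
    if se == 0:
        raise ValueError("no within-group variation; statistic is undefined")
    return (mean(high) - mean(low)) / se


# Toy data with an XOR-like pattern: neither SNP has a marginal effect on its own.
geno_a = [0, 0, 1, 1, 0, 0, 1, 1]
geno_b = [0, 1, 0, 1, 0, 1, 0, 1]
trait = [20.0, 30.0, 31.0, 21.0, 22.0, 29.0, 33.0, 19.0]
t_stat = qmdr_style_t(geno_a, geno_b, trait)
print(round(t_stat, 2))
```

Because the grouping is chosen from the same data it is tested on, a statistic like this is optimistically biased; in practice significance would be assessed with cross-validation or permutation rather than read directly from a t distribution.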

20.
PLoS One ; 9(8): e103123, 2014.
Article in English | MEDLINE | ID: mdl-25144566

ABSTRACT

HIV sensory neuropathy and distal neuropathic pain (DNP) are common, disabling complications associated with combination antiretroviral therapy (cART). We previously associated iron-regulatory genetic polymorphisms with a reduced risk of HIV sensory neuropathy during more neurotoxic types of cART. Here we evaluated the impact of polymorphisms in 19 iron-regulatory genes on DNP in 560 HIV-infected subjects from a prospective, observational study, who underwent neurological examinations to ascertain peripheral neuropathy and structured interviews to ascertain DNP. Genotype-DNP associations were explored by logistic regression and permutation-based analytical methods. Among 559 evaluable subjects, 331 (59%) developed HIV-SN, and 168 (30%) reported DNP. Fifteen polymorphisms in 8 genes (p<0.05) and 5 variants in 4 genes (p<0.01) were nominally associated with DNP: polymorphisms in TF, TFRC, BMP6, ACO1, SLC11A2, and FXN conferred reduced risk (adjusted odds ratios [ORs] ranging from 0.2 to 0.7, all p<0.05); other variants in TF, CP, ACO1, BMP6, and B2M conferred increased risk (ORs ranging from 1.3 to 3.1, all p<0.05). Risks associated with some variants were statistically significant in either the black or the white subgroup but were consistent in direction. ACO1 rs2026739 remained significantly associated with DNP in whites (permutation p<0.0001) after correction for multiple tests. Several of the same iron-regulatory-gene polymorphisms, including ACO1 rs2026739, were also associated with severity of DNP (all p<0.05). Common polymorphisms in iron-management genes are associated with DNP and with DNP severity in HIV-infected persons receiving cART. Consistent risk estimates across population subgroups and persistence of the ACO1 rs2026739 association after adjustment for multiple testing suggest that genetic variation in iron-regulation and transport modulates susceptibility to DNP.


Subjects
Genetic Variation/genetics; HIV Infections/genetics; HIV Infections/physiopathology; Iron/metabolism; Neuralgia/physiopathology; Adult; Aged; Anti-Retroviral Agents/therapeutic use; Female; Genotype; HIV Infections/drug therapy; HIV Infections/metabolism; Humans; Iron Regulatory Protein 1/genetics; Linkage Disequilibrium/genetics; Male; Middle Aged; Multivariate Analysis; Neuralgia/genetics; Neuralgia/metabolism; Young Adult