1.
Hum Genet ; 141(9): 1515-1528, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34862561

ABSTRACT

Genetic data have become increasingly complex within the past decade, leading researchers to pursue increasingly complex questions, such as those involving epistatic interactions and protein prediction. Traditional methods are ill-suited to answer these questions, but machine learning (ML) techniques offer an alternative solution. ML algorithms are commonly used in genetics to predict or classify subjects, and some methods also evaluate which features (variables) are responsible for a good prediction; this is called feature importance. Feature importance is critical in genetics, as researchers are often interested in which features (e.g., SNP genotype or environmental exposure) drive a good prediction. This allows deeper analysis beyond simple prediction, including the identification of risk factors associated with a given phenotype. Feature importance also lets researchers peer inside the black box of many ML algorithms to see how they work and which features are critical in informing a good prediction. This review focuses on ML methods that provide feature importance metrics for the analysis of genetic data. Five major categories of ML algorithms are described: k-nearest neighbors, artificial neural networks, deep learning, support vector machines, and random forests. The review ends with a discussion of how to choose the best ML method for a data set. This review will be particularly useful for genetic researchers looking to use ML methods to answer questions beyond basic prediction and classification.
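One generic, model-agnostic way to obtain feature importance is permutation importance: shuffle one feature column, re-score the model, and record how much the metric drops. The Python sketch below is illustrative only and is not taken from the review; `permutation_importance`, `toy_model` and the data are hypothetical, and accuracy is assumed as the metric.

```python
import random
from typing import Callable, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of subjects whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[list], list],
    X: list,
    y: list,
    n_repeats: int = 10,
    seed: int = 0,
) -> list:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(X):
    # Hypothetical classifier: "case" (1) when feature 0, an SNP genotype coded 0/1/2, is >= 1.
    return [1 if row[0] >= 1 else 0 for row in X]


# Feature 0: genotype; feature 1: a binary exposure that the model ignores.
X = [[g, e] for g in (0, 1, 2) for e in (0, 1)] * 5
y = toy_model(X)
scores = permutation_importance(toy_model, X, y)
print(scores)
```

Feature 1 never influences `toy_model`, so shuffling it leaves accuracy unchanged and its importance is exactly zero; random forest implementations commonly report a similar permutation-based importance.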


Subject(s)
Machine Learning , Support Vector Machine , Algorithms , Humans , Computer Neural Networks
2.
Nat Rev Genet ; 16(2): 85-97, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25582081

ABSTRACT

Recent technological advances have expanded the breadth of available omic data, from whole-genome sequencing data, to extensive transcriptomic, methylomic and metabolomic data. A key goal of analyses of these data is the identification of effective models that predict phenotypic traits and outcomes, elucidating important biomarkers and generating important insights into the genetic underpinnings of the heritability of complex traits. There is still a need for powerful and advanced analysis strategies to fully harness the utility of these comprehensive high-throughput data, identifying true associations and reducing the number of false associations. In this Review, we explore the emerging approaches for data integration, including meta-dimensional and multi-staged analyses, which aim to deepen our understanding of the role of genetics and genomics in complex outcomes. With the use and further development of these approaches, an improved understanding of the relationship between genomic variation and human phenotypes may be revealed.


Subject(s)
Statistical Data Interpretation , Genetic Variation , Genotype , Inheritance Patterns/physiology , Biological Models , Phenotype , Systems Biology/methods , Humans , Meta-Analysis as Topic
3.
Am J Hum Genet ; 99(4): 877-885, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27666373

ABSTRACT

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10^-12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale.
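The AUC used to compare REVEL with other tools equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen neutral one (the Mann-Whitney formulation). A minimal Python sketch, assuming hypothetical score lists and counting ties as one half; it is not REVEL's own evaluation code.

```python
from itertools import product


def auc(pathogenic: list[float], neutral: list[float]) -> float:
    """AUC as P(score of a pathogenic variant > score of a neutral variant), ties counted as 1/2."""
    wins = 0.0
    for p, n in product(pathogenic, neutral):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pathogenic) * len(neutral))


# Hypothetical ensemble scores for two small labeled variant sets.
print(auc([0.9, 0.3], [0.5, 0.1]))
```

This pairwise loop is O(n·m); rank-based formulas give the same value faster for large test sets such as the 123,935 neutral variants mentioned above.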


Subject(s)
Disease/genetics , Missense Mutation/genetics , Software , Area Under Curve , DNA Mutational Analysis , Exome/genetics , Gene Frequency , Humans , ROC Curve
4.
BMC Med Genet ; 20(1): 27, 2019 01 31.
Article in English | MEDLINE | ID: mdl-30704416

ABSTRACT

BACKGROUND: Myopia is one of the most common eye diseases in the world and affects 1 in 4 Americans. It is a complex disease caused by both environmental and genetic effects; the genetic effects are still not well understood. In this study, we performed genetic linkage analyses on Ashkenazi Jewish families with a strong familial history of myopia to elucidate potential causal genes. METHODS: Sixty-four extended Ashkenazi Jewish families were previously collected from New Jersey. Genotypes from the Illumina ExomePlus array were merged with prior microsatellite linkage data from these families. Additional custom markers were added for candidate regions reported in the literature for myopia or refractive error. Myopia was defined as a mean spherical equivalent (MSE) of -1 D or worse. Parametric two-point linkage analyses (using TwoPointLods) and multipoint linkage analyses (using SimWalk2) were performed, as well as collapsed haplotype pattern (CHP) analysis in SEQLinkage and association analyses with FBAT and rv-TDT. RESULTS: The strongest evidence of linkage was on 1p36 (two-point LOD = 4.47), a region previously linked to refractive error (MYP14) but not myopia. Another genome-wide significant locus was found on 8q24.22, with a maximum two-point LOD score of 3.75. CHP analysis also detected the signal on 1p36, localized to the LINC00339 gene with a maximum HLOD of 3.47, as well as genome-wide significant signals on 7q36.1 and on 11p15, which overlaps the MYP7 locus. CONCLUSIONS: We identified 2 novel linkage peaks for myopia on chromosomes 7 and 8 in these Ashkenazi Jewish families and replicated 2 more loci on chromosomes 1 and 11: one previously reported for refractive error but not myopia in these families, and the other previously reported in the literature. Strong candidate genes have been identified within these linkage peaks in our families. Targeted sequencing in these regions will be necessary to definitively identify causal variants under these linkage peaks.
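For phase-known meioses, a two-point LOD score compares the likelihood of linkage at recombination fraction θ with the likelihood of no linkage (θ = 0.5): LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)], where R and NR count recombinant and non-recombinant meioses. The Python sketch below evaluates this simplest textbook case on a grid of θ values. The counts are hypothetical, and programs such as TwoPointLods additionally handle unknown phase, penetrance models and full pedigrees, none of which this sketch attempts.

```python
import math


def _count_log10(count: int, prob: float) -> float:
    # count * log10(prob), using the convention 0 * log10(0) = 0.
    return 0.0 if count == 0 else count * math.log10(prob)


def lod(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """Two-point LOD for phase-known meioses: log10[L(theta) / L(0.5)]."""
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # any recombinant rules out theta = 0
    n = recombinants + nonrecombinants
    return (_count_log10(recombinants, theta)
            + _count_log10(nonrecombinants, 1.0 - theta)
            - n * math.log10(0.5))


def max_lod(recombinants: int, nonrecombinants: int, steps: int = 500):
    """Return (maximum LOD, theta at the maximum) over a grid of theta in [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod(t, recombinants, nonrecombinants), t) for t in grid)


# Hypothetical counts: 10 informative meioses with no recombinants.
print(max_lod(0, 10))
```

With no recombinants in 10 meioses the maximum is 10·log10(2), just above the classic threshold of 3.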


Subject(s)
Human Chromosomes/genetics , Genotyping Techniques/methods , Jews/genetics , Myopia/genetics , Human Chromosome Pair 1/genetics , Human Chromosome Pair 11/genetics , Human Chromosome Pair 7/genetics , Human Chromosome Pair 8/genetics , Exome , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Lod Score , Male , Myopia/ethnology , Pedigree , Long Noncoding RNA/genetics
5.
Hum Genet ; 136(2): 165-178, 2017 02.
Article in English | MEDLINE | ID: mdl-27848076

ABSTRACT

Genetic loci explain only 25-30% of the heritability observed in plasma lipid traits. Epistasis, or gene-gene interaction, may contribute to a portion of this missing heritability. Using the genetic data from five NHLBI cohorts of 24,837 individuals, we combined the use of the quantitative multifactor dimensionality reduction (QMDR) algorithm with two SNP-filtering methods to exhaustively search for SNP-SNP interactions that are associated with HDL cholesterol (HDL-C), LDL cholesterol (LDL-C), total cholesterol (TC) and triglycerides (TG). SNPs were filtered either on the strength of their independent effects (main effect filter) or on the prior knowledge supporting a given interaction (Biofilter). After the main effect filter, QMDR identified 20 SNP-SNP models associated with HDL-C, 6 associated with LDL-C, 3 associated with TC, and 10 associated with TG (permutation P value <0.05). With the use of Biofilter, we identified 2 SNP-SNP models associated with HDL-C, 3 associated with LDL-C, 1 associated with TC and 8 associated with TG (permutation P value <0.05). In an independent dataset of 7502 individuals from the eMERGE network, we replicated 14 of the interactions identified after main effect filtering: 11 for HDL-C, 1 for LDL-C and 2 for TG. We also replicated 23 of the interactions found to be associated with TG after applying Biofilter. Prior knowledge supports the possible role of these interactions in the genetic etiology of lipid traits. This study also presents a computationally efficient pipeline for analyzing data from large genotyping arrays and detecting SNP-SNP interactions that are not primarily driven by strong main effects.
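QMDR reduces a SNP pair to a single high/low attribute: each two-locus genotype cell is labeled high when its trait mean is at least the overall mean, and the pooled high and low groups are then compared with a t-statistic. The Python sketch below follows that outline under stated simplifications: a Welch t-statistic is assumed, `qmdr_t` and the data are hypothetical, and the permutation step the study used to obtain P values is not shown.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def qmdr_t(snp_a, snp_b, trait):
    """QMDR-style score for one SNP pair (simplified sketch)."""
    overall = mean(trait)
    # Group trait values by two-locus genotype cell.
    cells = defaultdict(list)
    for a, b, y in zip(snp_a, snp_b, trait):
        cells[(a, b)].append(y)
    # Label a cell "high" if its mean is at least the overall mean, otherwise "low".
    high_cells = {cell for cell, ys in cells.items() if mean(ys) >= overall}
    high = [y for a, b, y in zip(snp_a, snp_b, trait) if (a, b) in high_cells]
    low = [y for a, b, y in zip(snp_a, snp_b, trait) if (a, b) not in high_cells]
    if len(high) < 2 or len(low) < 2:
        return 0.0  # too few observations in one group to estimate a variance
    # Welch two-sample t statistic comparing the pooled high and low groups.
    se = math.sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se if se > 0 else math.inf


# Hypothetical data: the trait is raised only when exactly one SNP carries the minor allele.
snp_a = [0, 0, 0, 0, 1, 1, 1, 1]
snp_b = [0, 0, 1, 1, 0, 0, 1, 1]
trait = [1.0, 1.2, 3.0, 3.2, 3.1, 2.9, 0.9, 1.1]
print(qmdr_t(snp_a, snp_b, trait))
```

Because the cell labels are chosen from the same data, the statistic is positive almost by construction, which is why a permutation distribution (re-running the procedure on shuffled trait values) is needed before calling a model significant.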


Subject(s)
Cardiovascular Diseases/genetics , HDL Cholesterol/blood , LDL Cholesterol/blood , Genetic Epistasis , Phenotype , Triglycerides/blood , Body Mass Index , Cardiovascular Diseases/blood , Cohort Studies , Female , Genetic Loci , Genetic Markers , Human Genome , Genotyping Techniques , Humans , Linear Models , Linkage Disequilibrium , Male , Multifactor Dimensionality Reduction , Single Nucleotide Polymorphism
6.
BMC Genet ; 17 Suppl 2: 1, 2016 Feb 03.
Article in English | MEDLINE | ID: mdl-26866367

ABSTRACT

In the analysis of current genomic data, application of machine learning and data mining techniques has become more attractive given the rising complexity of the projects. As part of the Genetic Analysis Workshop 19, approaches from this domain were explored, motivated mostly by two starting points. First, if there is an underlying structure in the genomic data, data mining might identify it and thus improve downstream association analyses. Second, computational methods for machine learning need to be developed further to deal efficiently with the current wealth of data. In the course of discussing results and experiences from the machine learning and data mining approaches, six common messages were extracted. These depict the current state of these approaches in the application to complex genomic data. Although some challenges remain for future studies, important forward steps were taken in the integration of different data types and the evaluation of the evidence. Mining the data for underlying genetic or phenotypic structure and using this information in subsequent analyses proved to be extremely helpful and is likely to become of even greater use with more complex data sets.


Subject(s)
Data Mining/methods , Genomics/methods , Computational Biology/methods , Genetic Testing , Humans , Machine Learning
7.
Bioinformatics ; 30(5): 698-705, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24149050

ABSTRACT

MOTIVATION: Advancements in high-throughput technology have allowed researchers to examine the genetic etiology of complex human traits in a robust fashion. Although genome-wide association studies have identified many novel variants associated with hundreds of traits, a large proportion of the estimated trait heritability remains unexplained. One hypothesis is that the commonly used statistical techniques and study designs are not robust to the complex etiology that may underlie these human traits. This etiology could include non-linear gene × gene or gene × environment interactions. Additionally, other levels of biological regulation may play a large role in trait variability. RESULTS: To address the need for computational tools that can explore enormous datasets to detect complex susceptibility models, we have developed a software package called the Analysis Tool for Heritable and Environmental Network Associations (ATHENA). ATHENA combines various variable filtering methods with machine learning techniques to analyze high-throughput categorical (e.g., single nucleotide polymorphisms) and quantitative (e.g., gene expression levels) predictor variables to generate multivariable models that predict either categorical (e.g., disease status) or quantitative (e.g., cholesterol levels) outcomes. The goal of this article is to demonstrate the utility of ATHENA using simulated and biological datasets that consist of both single nucleotide polymorphisms and gene expression variables to identify complex prediction models. Importantly, this method is flexible and can be expanded to include other types of high-throughput data (e.g., RNA-seq data and biomarker measurements). AVAILABILITY: ATHENA is freely available for download. The software, user manual and tutorial can be downloaded from http://ritchielab.psu.edu/ritchielab/software.


Subject(s)
Gene-Environment Interaction , Genome-Wide Association Study , Software , Humans , Phenotype , Single Nucleotide Polymorphism
8.
Article in English | MEDLINE | ID: mdl-38608311

ABSTRACT

Open Targets, a consortium among academic and industry partners, focuses on using human genetics and genomics to provide insights to key questions that build therapeutic hypotheses. Large-scale experiments generate foundational data, and open-source informatic platforms systematically integrate evidence for target-disease relationships and provide dynamic tooling for target prioritization. A locus-to-gene machine learning model uses evidence from genome-wide association studies (GWAS Catalog, UK BioBank, and FinnGen), functional genomic studies, epigenetic studies, and variant effect prediction to predict potential drug targets for complex diseases. These predictions are combined with genetic evidence from gene burden analyses, rare disease genetics, somatic mutations, perturbation assays, pathway analyses, scientific literature, differential expression, and mouse models to systematically build target-disease associations (https://platform.opentargets.org). Scored target attributes such as clinical precedence, tractability, and safety guide target prioritization. Here we provide our perspective on the value and impact of human genetics and genomics for generating therapeutic hypotheses.

9.
Mol Genet Genomic Med ; 11(8): e2179, 2023 08.
Article in English | MEDLINE | ID: mdl-37070724

ABSTRACT

BACKGROUND: Oral clefts and ectrodactyly are common, heterogeneous birth defects. We performed whole-exome sequencing (WES) analysis in a Syrian family. The proband presented with both orofacial clefting and ectrodactyly but not ectodermal dysplasia, as typically seen in ectrodactyly, ectodermal dysplasia, and cleft lip/palate syndrome-3. A paternal uncle with only an oral cleft was deceased and unavailable for analysis. METHODS: Variant annotation, Mendelian inconsistencies, and novel variants in known cleft genes were examined. Candidate variants were validated using Sanger sequencing, and pathogenicity was assessed by knocking out the tp63 gene in zebrafish to evaluate its role during zebrafish development. RESULTS: Twenty-eight candidate de novo events were identified, one of which is in a known oral cleft and ectrodactyly gene, TP63 (c.956G > T, p.Arg319Leu), and was confirmed by Sanger sequencing. CONCLUSION: TP63 mutations are associated with multiple autosomal dominant orofacial clefting and limb malformation disorders. The p.Arg319Leu mutation seen in this patient is both de novo and novel. Two known mutations in the same codon (c.956G > A, p.(Arg319His); and rs121908839, c.955C > T, p.Arg319Cys) cause ectrodactyly, providing evidence that mutating this codon is deleterious. While this TP63 mutation is the best candidate for the patient's clinical presentation, whether it is responsible for the entire phenotype is unclear. Generation and characterization of tp63 knockout zebrafish showed necrosis and rupture of the head at 3 days post-fertilization (dpf). The embryonic phenotype could not be rescued by injection of zebrafish or human messenger RNA (mRNA). Further functional analysis is needed to determine what proportion of the phenotype is due to this mutation.
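In HGVS coding-DNA numbering, c.1 is the A of the ATG start codon, so position c.n falls in codon ⌈n/3⌉. The Python sketch below checks that c.955 and c.956 both fall in codon 319 and shows which amino acids the three reported substitutions produce. The reference codon is an inference, not stated in the abstract: an Arg codon whose first base is C (c.955C) and whose second-position G>A change yields His must be CGT or CGC, so both are tried. The helper names are hypothetical.

```python
def codon_position(c_pos: int) -> tuple[int, int]:
    """Map an HGVS coding-DNA position (c.1 = A of ATG) to (codon number, base 1-3 within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, pos: int, base: str) -> str:
    """Replace the base at 1-based position pos within a codon."""
    return codon[:pos - 1] + base + codon[pos:]


# Only the standard-genetic-code entries needed here.
CODON_TABLE = {"CGT": "Arg", "CGC": "Arg", "CTT": "Leu", "CTC": "Leu",
               "CAT": "His", "CAC": "His", "TGT": "Cys", "TGC": "Cys"}

print(codon_position(956), codon_position(955))
for ref in ("CGT", "CGC"):  # assumed reference codons (inferred, not reported)
    print(ref,
          CODON_TABLE[substitute(ref, 2, "T")],  # c.956G>T
          CODON_TABLE[substitute(ref, 2, "A")],  # c.956G>A
          CODON_TABLE[substitute(ref, 1, "T")])  # c.955C>T
```

Either candidate codon yields Leu, His and Cys for the three changes, matching p.Arg319Leu, p.Arg319His and p.Arg319Cys.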


Subject(s)
Cleft Lip , Cleft Palate , Humans , Animals , Cleft Lip/genetics , Cleft Palate/genetics , Zebrafish/genetics , Exome Sequencing , Syria , Mutation , Transcription Factors/genetics , Tumor Suppressor Proteins/genetics
10.
Pharmacogenet Genomics ; 22(12): 858-67, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23080225

ABSTRACT

OBJECTIVES: Prior candidate gene studies have associated CYP2B6 516G→T [rs3745274] and 983T→C [rs28399499] with increased plasma efavirenz exposure. We sought to identify novel variants associated with efavirenz pharmacokinetics. MATERIALS AND METHODS: Antiretroviral therapy-naive AIDS Clinical Trials Group studies A5202, A5095, and ACTG 384 included plasma sampling for efavirenz pharmacokinetics. Log-transformed trough efavirenz concentrations (Cmin) were previously estimated by population pharmacokinetic modeling. Stored DNA was genotyped with Illumina HumanHap 650Y or 1MDuo platforms, complemented by additional targeted genotyping of CYP2B6 and CYP2A6 with MassARRAY iPLEX Gold. Associations were identified by linear regression, which included principal component vectors to adjust for genetic ancestry. RESULTS: Among 856 individuals, CYP2B6 516G→T was associated with efavirenz estimated Cmin (P=8.5×10). After adjusting for CYP2B6 516G→T, CYP2B6 983T→C was associated (P=9.9×10). After adjusting for both CYP2B6 516G→T and 983T→C, a CYP2B6 variant (rs4803419) in intron 3 was associated (P=4.4×10). After adjusting for all the three variants, non-CYP2B6 polymorphisms were associated at P-value less than 5×10. In a separate cohort of 240 individuals, only the three CYP2B6 polymorphisms replicated. These three polymorphisms explained 34% of interindividual variability in efavirenz estimated Cmin. The extensive metabolizer phenotype was best defined by the absence of all three polymorphisms. CONCLUSION: Three CYP2B6 polymorphisms were independently associated with efavirenz estimated Cmin at genome-wide significance, and explained one-third of interindividual variability. These data will inform continued efforts to translate pharmacogenomic knowledge into optimal efavirenz utilization.


Subject(s)
Anti-HIV Agents/pharmacokinetics , Aryl Hydrocarbon Hydroxylases/genetics , Benzoxazines/pharmacokinetics , Genetic Variation , N-Demethylating Oxidoreductases/genetics , Alkynes , Anti-HIV Agents/therapeutic use , Benzoxazines/blood , Clinical Protocols , Cyclopropanes , Cytochrome P-450 CYP2B6 , Genetic Association Studies , Human Genome , Genome-Wide Association Study , HIV Infections/drug therapy , HIV Infections/genetics , Humans , Genetic Polymorphism
11.
J Neurovirol ; 18(6): 511-20, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23073667

ABSTRACT

HIV-associated sensory neuropathy remains an important complication of combination antiretroviral therapy and HIV infection. Mitochondrial DNA haplogroups and single nucleotide polymorphisms (SNPs) have previously been associated with symptomatic neuropathy in clinical trial participants. We examined associations between mitochondrial DNA variation and HIV-associated sensory neuropathy in CNS HIV Antiretroviral Therapy Effects Research (CHARTER). CHARTER is a USA-based longitudinal observational study of HIV-infected adults who underwent a structured interview and standardized examination. HIV-associated sensory neuropathy was determined by trained examiners as ≥1 sign (diminished vibratory and sharp-dull discrimination or ankle reflexes) bilaterally. Mitochondrial DNA sequencing was performed and haplogroups were assigned by published algorithms. Associations between mitochondrial DNA SNPs, haplogroups, and HIV-associated sensory neuropathy were assessed by multivariable logistic regression. In analyses of associations of each mitochondrial DNA SNP with HIV-associated sensory neuropathy, the two most significant SNPs were at positions A12810G [odds ratio (95 % confidence interval) = 0.27 (0.11-0.65); p = 0.004] and T489C [odds ratio (95 % confidence interval) = 0.41 (0.21-0.80); p = 0.009]. These synonymous changes are known to define African haplogroup L1c and European haplogroup J, respectively. Both haplogroups were associated with decreased prevalence of HIV-associated sensory neuropathy compared with all other haplogroups [odds ratio (95 % confidence interval) = 0.29 (0.12-0.71); p = 0.007 and odds ratio (95 % confidence interval) = 0.42 (0.18-1.0); p = 0.05, respectively]. In conclusion, in this cohort of mostly combination antiretroviral therapy-treated subjects, two common mitochondrial DNA SNPs and their corresponding haplogroups were associated with a markedly decreased prevalence of HIV-associated sensory neuropathy.
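The odds ratios and 95% confidence intervals quoted above come from multivariable logistic regression. As a simpler illustration of what such numbers mean, the Python sketch below computes a crude odds ratio and a Woolf (log-scale) confidence interval from a 2×2 table; the counts are hypothetical, `odds_ratio_ci` is an illustrative helper, and no covariate adjustment is performed, unlike the study's models.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a = carriers with neuropathy,     b = carriers without neuropathy,
    c = non-carriers with neuropathy, d = non-carriers without neuropathy.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Woolf interval needs all four cells > 0")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log)


# Hypothetical counts, not data from the study.
or_, lo, hi = odds_ratio_ci(a=10, b=20, c=20, d=10)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An OR below 1 with an upper confidence limit below 1, as for A12810G above, indicates a protective association; an upper limit of 1.0, as for haplogroup J, sits on the boundary of significance.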


Subject(s)
Mitochondrial DNA/genetics , HIV Infections/genetics , HIV-1 , Mitochondria/genetics , Single Nucleotide Polymorphism , Polyneuropathies/genetics , Adult , Black People , Female , HIV Infections/complications , HIV Infections/drug therapy , HIV Infections/pathology , Haplotypes , Humans , Male , Middle Aged , Polyneuropathies/drug therapy , Polyneuropathies/etiology , Polyneuropathies/pathology , Prospective Studies , White People
12.
mSphere ; 5(3)2020 05 06.
Article in English | MEDLINE | ID: mdl-32376702

ABSTRACT

Bezlotoxumab is a human monoclonal antibody against Clostridium difficile toxin B, indicated to prevent recurrence of C. difficile infection (rCDI) in high-risk adults receiving antibacterial treatment for CDI. An exploratory genome-wide association study investigated whether human genetic variation influences bezlotoxumab response. DNA from 704 participants who achieved initial clinical cure in the phase 3 MODIFY I/II trials was genotyped. Imputation of single nucleotide polymorphisms (SNPs) and human leukocyte antigen (HLA) alleles was performed using IMPUTE2 and HIBAG, respectively. A joint test of genotype and genotype-by-treatment interaction in a logistic regression model was used to screen genetic variants associated with response to bezlotoxumab. The SNP rs2516513 and the HLA alleles HLA-DRB1*07:01 and HLA-DQA1*02:01, located in the extended major histocompatibility complex on chromosome 6, were associated with the reduction of rCDI in bezlotoxumab-treated participants. Carriage of a minor allele (homozygous or heterozygous) at any of the identified loci was related to a larger difference in the proportion of participants experiencing rCDI versus placebo; the effect was most prominent in the subgroup at high baseline risk for rCDI. Genotypes associated with an improved bezlotoxumab response showed no association with rCDI in the placebo cohort. These data suggest that a host-driven, immunological mechanism may impact bezlotoxumab response. Trial registration numbers are as follows: NCT01241552 (MODIFY I) and NCT01513239 (MODIFY II).
IMPORTANCE: Clostridium difficile infection is associated with significant clinical morbidity and mortality; antibacterial treatments are effective, but recurrence of C. difficile infection is common. In this genome-wide association study, we explored whether host genetic variability affected treatment responses to bezlotoxumab, a human monoclonal antibody that binds C. difficile toxin B and is indicated for the prevention of recurrent C. difficile infection. Using data from the MODIFY I/II phase 3 clinical trials, we identified three genetic variants associated with reduced rates of C. difficile infection recurrence in bezlotoxumab-treated participants. The effects were most pronounced in participants at high risk of C. difficile infection recurrence. All three variants are located in the extended major histocompatibility complex on chromosome 6, suggesting the involvement of a host-driven immunological mechanism in the prevention of C. difficile infection recurrence.
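A joint test of genotype and genotype-by-treatment interaction asks whether adding both terms to a logistic model improves its fit. Assuming a likelihood-ratio version (the abstract does not say whether a likelihood-ratio or Wald test was used), the statistic has 2 degrees of freedom, and its P value has the closed form exp(−x/2). In the Python sketch below, the model formulas, the helper `joint_test_pvalue` and the log-likelihood values are hypothetical; fitting the two logistic regressions is left out, and any covariates are assumed to appear in both models.

```python
import math


def joint_test_pvalue(loglik_reduced: float, loglik_full: float) -> float:
    """2-df likelihood-ratio test of H0: beta_G = beta_GxT = 0.

    Full model:    logit P(rCDI) = b0 + bT*T + bG*G + bGxT*G*T
    Reduced model: logit P(rCDI) = b0 + bT*T
    With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("full model must fit at least as well as the reduced model")
    return math.exp(-stat / 2.0)


# Hypothetical log-likelihoods from two already-fitted logistic regressions.
p = joint_test_pvalue(loglik_reduced=-250.0, loglik_full=-250.0 + math.log(20))
print(p)
```

A statistic of 2·ln(20) ≈ 5.99, the familiar 95th percentile of the χ² distribution with 2 degrees of freedom, gives P = 0.05.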


Subject(s)
Monoclonal Antibodies/therapeutic use , Broadly Neutralizing Antibodies/therapeutic use , Clostridioides difficile/drug effects , Clostridium Infections/drug therapy , Clostridium Infections/genetics , Adolescent , Adult , Aged , Aged 80 and over , Alleles , Neutralizing Antibodies/blood , Female , Genome-Wide Association Study , Genotype , HLA-D Antigens/genetics , Humans , Male , Middle Aged , Single Nucleotide Polymorphism , Recurrence , Young Adult
13.
Mol Genet Genomic Med ; 5(5): 570-579, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28944239

ABSTRACT

BACKGROUND: Nonsyndromic oral clefts are craniofacial malformations, which include cleft lip with or without cleft palate. The etiology of oral clefts is complex, with both genetic and environmental factors contributing to risk. Previous genome-wide association studies (GWAS) have identified multiple loci with small effects; however, many causal variants remain elusive. METHODS: In this study, we address this by specifically looking for rare, potentially damaging variants in family-based data. We analyzed both whole exome sequence (WES) data and whole genome sequence (WGS) data in multiplex cleft families to identify variants shared by affected individuals. RESULTS: Here we present the results from these analyses. Our most interesting finding was from a single Syrian family, which showed enrichment of nonsynonymous and potentially damaging rare variants in two genes: CASP9 and FAT4. CONCLUSION: Neither of these candidate genes has previously been associated with oral clefts and, if confirmed as contributing to disease risk, they may indicate novel biological pathways in the genetic etiology of oral clefts.
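Looking for variants shared by affected individuals is, at its core, a set intersection over each affected relative's filtered variant calls. A minimal Python sketch, assuming the rare and potentially damaging filters have already been applied; `shared_variants`, the family members and the placeholder variant IDs are hypothetical.

```python
def shared_variants(carried: dict[str, set[str]], affected: list[str]) -> set[str]:
    """Return the variants carried by every affected family member."""
    if not affected:
        return set()
    return set.intersection(*(carried[person] for person in affected))


# Hypothetical family; variant IDs are placeholders for filtered rare variant calls.
carried = {
    "affected_1": {"var1", "var2", "var5"},
    "affected_2": {"var1", "var2", "var3"},
    "affected_3": {"var2", "var1", "var4"},
    "unaffected_1": {"var2"},
}
print(sorted(shared_variants(carried, ["affected_1", "affected_2", "affected_3"])))
```

Unaffected relatives are ignored here; a stricter design could also drop variants they carry, at the cost of assuming full penetrance.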

14.
BioData Min ; 10: 25, 2017.
Article in English | MEDLINE | ID: mdl-28770004

ABSTRACT

BACKGROUND: The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study for four different lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG). RESULTS: Our analysis consisted of a discovery phase using a merged dataset of five different cohorts (n = 12,853 to n = 16,849 depending on lipid phenotype) and a replication phase with ten independent cohorts totaling up to 36,938 additional samples. Filters are often applied before interaction testing to reduce the burden of testing all pairwise interactions. We used two different filters: 1. A filter that tested only single nucleotide polymorphisms (SNPs) with a main effect of p < 0.001 in a previous association study. 2. A filter that only tested interactions identified by Biofilter 2.0. Pairwise models that reached an interaction significance level of p < 0.001 in the discovery dataset were tested for replication. We identified thirteen SNP-SNP models that were significant in more than one replication cohort after accounting for multiple testing. CONCLUSIONS: These results may reveal novel insights into the genetic etiology of lipid levels. Furthermore, we developed a pipeline to perform a computationally efficient interaction analysis with multi-cohort replication.
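The main-effect filter described above keeps only SNPs with p < 0.001 in a prior study and then tests every pair among them. The Python sketch below shows the filter and pair enumeration; `main_effect_filter_pairs` and the p values are hypothetical, and the interaction regression itself is not shown.

```python
from itertools import combinations
from math import comb


def main_effect_filter_pairs(main_effect_p: dict[str, float], alpha: float = 1e-3):
    """Keep SNPs with a prior main-effect p below alpha, then enumerate all pairs among them."""
    kept = sorted(snp for snp, p in main_effect_p.items() if p < alpha)
    return kept, list(combinations(kept, 2))


# Hypothetical main-effect p values from a previous association study.
p_values = {"snp1": 2e-5, "snp2": 0.04, "snp3": 8e-4, "snp4": 0.5, "snp5": 1e-6}
kept, pairs = main_effect_filter_pairs(p_values)
print(kept, len(pairs), "pairs instead of", comb(len(p_values), 2))
```

Because the number of pairs grows as k(k − 1)/2, removing even a modest fraction of SNPs sharply reduces the number of interaction tests and the multiple-testing penalty.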

15.
BMC Proc ; 10(Suppl 7): 147-152, 2016.
Article in English | MEDLINE | ID: mdl-27980627

ABSTRACT

Current findings from genetic studies of complex human traits often do not explain a large proportion of the estimated variation of these traits due to genetic factors. This could be, in part, due to overly stringent significance thresholds in traditional statistical methods, such as linear and logistic regression. Machine learning methods, such as Random Forests (RF), are an alternative approach to identify potentially interesting variants. One major issue with these methods is that there is no clear way to distinguish between probable true hits and noise variables based on the importance metric calculated. To this end, we are developing a method called the Relative Recurrency Variable Importance Metric (r2VIM), a RF-based variable selection method. Here, we apply r2VIM to the unrelated Genetic Analysis Workshop 19 data with simulated systolic blood pressure as the phenotype. We compare the number of "true" functional variants identified by r2VIM with those identified by linear regression analyses that use a Bonferroni correction to calculate a significance threshold. Our results show that r2VIM performed comparably to linear regression. Our findings are proof-of-concept for r2VIM, as it identifies a similar number of functional and nonfunctional variants as a more commonly used technique when the optimal importance score threshold is used.

16.
BioData Min ; 9: 7, 2016.
Article in English | MEDLINE | ID: mdl-26839594

ABSTRACT

BACKGROUND: Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. RESULTS: We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM). Importance values are calculated relative to an observed minimal importance score for several runs of RF and only SNPs with large relative VIMs in all of the runs are selected as important. Evaluations on simulated GWAS data show that the new method controls the number of false-positives under the null hypothesis. Under a simple alternative hypothesis with several independent main effects it is only slightly less powerful than logistic regression. In an experimental GWAS data set, the same strong signal is identified while the approach selects none of the SNPs in an underpowered GWAS. CONCLUSIONS: The novel variable selection method r2VIM is a promising extension to standard RF for objectively selecting relevant SNPs in GWAS while controlling the number of false-positive results.
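The r2VIM selection rule can be sketched in Python once permutation importances from several random forest runs are available. Assumptions: each run's importances are divided by the absolute value of that run's minimal importance (typically a negative value produced by noise SNPs), a threshold of 1 is used for illustration, and `r2vim_select` and the importance values are hypothetical; fitting the forests is not shown.

```python
def r2vim_select(importance_runs: list[dict[str, float]], threshold: float = 1.0) -> set[str]:
    """Select SNPs whose relative importance reaches `threshold` in every RF run."""
    selected = None
    for run in importance_runs:
        # The minimal observed importance (usually negative, from noise SNPs)
        # sets the scale for "no signal" in this run.
        scale = abs(min(run.values()))
        if scale == 0:
            raise ValueError("minimal importance is zero; relative importance undefined")
        passing = {snp for snp, vim in run.items() if vim / scale >= threshold}
        # Recurrency: keep only SNPs that pass in every run so far.
        selected = passing if selected is None else selected & passing
    return selected if selected is not None else set()


# Hypothetical permutation importances from two RF runs with different seeds.
runs = [
    {"snp1": 0.050, "snp2": 0.004, "snp3": -0.002, "snp4": 0.001},
    {"snp1": 0.045, "snp2": 0.001, "snp3": 0.0005, "snp4": -0.003},
]
print(r2vim_select(runs))
```

Here snp2 passes in the first run but not the second, so requiring selection in all runs drops it; this recurrency requirement is what the method relies on to control false positives.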

17.
Pac Symp Biocomput ; : 195-206, 2015.
Article in English | MEDLINE | ID: mdl-25592581

ABSTRACT

Standard analysis methods for genome-wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced while retaining adequate power. To this end, we have developed a novel variable selection method called relative recurrency variable importance metric (r2VIM). This method incorporates recurrency and variance estimation to assist in optimal threshold selection. For this study, we specifically address how this method performs in data with almost completely epistatic effects (i.e. no marginal effects). Our results show that with appropriate parameter settings, r2VIM can identify interaction effects when the marginal effects are virtually nonexistent. It also outperforms logistic regression, which has essentially no power under this type of model when the number of potential features (genetic variants) is large. (All Supplementary Data can be found here: http://research.nhgri.nih.gov/manuscripts/Bailey-Wilson/r2VIM_epi/).
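"Almost completely epistatic" means that disease risk depends on the joint genotype while each locus alone shows no effect. A minimal Python sketch with a hypothetical XOR-style penetrance table shows how this can happen. Carrier/non-carrier coding, a carrier frequency of 0.5 at both loci, independent loci and the penetrance values are all assumptions, not the study's simulation settings. The marginal penetrances come out identical even though the joint-genotype risks differ threefold.

```python
from fractions import Fraction as F

# Hypothetical XOR-style penetrance table: P(disease | genotype at locus A, locus B),
# with each locus coded 0 = non-carrier, 1 = carrier.
PENETRANCE = {(0, 0): F(1, 10), (0, 1): F(3, 10), (1, 0): F(3, 10), (1, 1): F(1, 10)}
FREQ = {0: F(1, 2), 1: F(1, 2)}  # assumed carrier frequency 0.5, loci independent


def marginal_a(a: int) -> F:
    """P(disease | locus A genotype), averaging over locus B."""
    return sum(FREQ[b] * PENETRANCE[(a, b)] for b in FREQ)


def marginal_b(b: int) -> F:
    """P(disease | locus B genotype), averaging over locus A."""
    return sum(FREQ[a] * PENETRANCE[(a, b)] for a in FREQ)


for g in (0, 1):
    print(g, marginal_a(g), marginal_b(g))
```

Any single-SNP test, such as the logistic regression compared in the abstract, therefore has no expected signal at either locus under this model, while a method that examines the two loci jointly can still detect the effect.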


Subject(s)
Epistasis, Genetic , Models, Genetic , Algorithms , Computational Biology , Computer Simulation , Databases, Genetic , Genome-Wide Association Study/statistics & numerical data , Humans , Linkage Disequilibrium , Logistic Models , Machine Learning , Polymorphism, Single Nucleotide , Signal-To-Noise Ratio
18.
Pac Symp Biocomput ; : 495-505, 2015.
Article in English | MEDLINE | ID: mdl-25741542

ABSTRACT

Investigating the association between biobank-derived genomic data and the information in linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, in which cases and controls are defined by electronic phenotyping algorithms deployed in large EHR systems. For our study, cataract cases and controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to search these data for potential gene-gene and gene-environment interactions, using 527,953 and 527,936 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1% for the gene-gene and gene-environment analyses, respectively, in order to find higher-level associations with cataract risk beyond single SNP-phenotype associations. To build our SNP-SNP interaction models, we used a prior-knowledge-driven filtering method called Biofilter to reduce the multiple testing burden of exploring the vast number of interaction models possible with so many SNPs. Using Biofilter, we developed 57,376 prior-knowledge-directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 13 statistically significant SNP-SNP models associated with cataract status, with an interaction p-value < 1 × 10^-4 and an overall model p-value < 0.01. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have previously been associated with the formation of cataracts. We found a total of 782 gene-environment models associated with cataract status that exhibit an interaction p-value < 1 × 10^-4. Our results show that these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR-based approach provides an additional source of data for seeking advanced explanatory models of the etiology of complex diseases and outcomes such as cataracts.


Subject(s)
Cataract/genetics , Algorithms , Biological Specimen Banks , Case-Control Studies , Computational Biology , Databases, Genetic , Electronic Health Records , Epistasis, Genetic , Gene-Environment Interaction , Genome-Wide Association Study , Humans , Phenotype , Polymorphism, Single Nucleotide , Software
19.
BioData Min ; 8: 41, 2015.
Article in English | MEDLINE | ID: mdl-26674805

ABSTRACT

BACKGROUND: Despite heritability estimates of 40-70% for obesity, less than 2% of its variation is explained by the Body Mass Index (BMI)-associated loci identified so far. Epistasis, or gene-gene interaction, is a plausible source of part of the missing heritability of BMI. METHODS: Using genotypic data from 18,686 individuals across five study cohorts (ARIC, CARDIA, FHS, CHS, and MESA), we filtered single nucleotide polymorphisms (SNPs) using two parallel approaches: SNPs were filtered either on the strength of their main effects of association with BMI, or on the number of knowledge sources supporting a specific SNP-SNP interaction in the context of BMI. The filtered SNPs were then analyzed for interactions strongly associated with BMI using QMDR (Quantitative Multifactor Dimensionality Reduction), a nonparametric, genetic model-free method that detects non-linear interactions associated with a quantitative trait. RESULTS: We identified seven novel epistatic models with a Bonferroni-corrected p-value of association < 0.1. Prior experimental evidence helps explain the plausible biological interactions highlighted within our results and their relationship with obesity. We identified interactions between genes involved in mitochondrial dysfunction (POLG2), cholesterol metabolism (SOAT2), lipid metabolism (CYP11B2), cell adhesion (EZR), cell proliferation (MAP2K5), and insulin resistance (IGF1R). Moreover, we found an 8.8% increase in the variance in BMI explained by these seven SNP-SNP interactions, beyond what is explained by the main effects of an index FTO SNP and the SNPs within these interactions. We also replicated one of these interactions, and 58 proxy SNP-SNP models representing it, in an independent dataset from the eMERGE study. CONCLUSION: This study highlights a novel approach for discovering gene-gene interactions by combining methods such as QMDR with traditional statistics.
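The core QMDR step for a single SNP pair can be sketched as follows, as we understand the method: each two-locus genotype cell is labelled "high" if its mean trait value is at least the overall mean and "low" otherwise, and the pooled high and low groups are compared with a two-sample t-statistic. This minimal sketch omits the search across SNP pairs and any cross-validation or significance testing; the Welch form of the t-statistic, the function name `qmdr_score`, and the toy data are assumptions.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def qmdr_score(geno_a, geno_b, trait):
    """Toy QMDR-style score for one SNP pair: returns (t_statistic, high_cells)."""
    overall = mean(trait)
    # Group trait values by two-locus genotype cell, e.g. (0, 2).
    cells = defaultdict(list)
    for a, b, y in zip(geno_a, geno_b, trait):
        cells[(a, b)].append(y)
    # A cell is "high" if its mean trait value is at least the overall mean.
    high_cells = {cell for cell, ys in cells.items() if mean(ys) >= overall}
    high = [y for cell, ys in cells.items() if cell in high_cells for y in ys]
    low = [y for cell, ys in cells.items() if cell not in high_cells for y in ys]
    # Welch two-sample t-statistic (needs at least two observations per group).
    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se, high_cells


# Hypothetical data: a BMI-like trait raised only when exactly one SNP carries the allele.
geno_a = [0, 0, 1, 1, 0, 0, 1, 1]
geno_b = [0, 0, 0, 0, 1, 1, 1, 1]
trait = [20, 21, 30, 31, 30, 29, 20, 19]
t, high_cells = qmdr_score(geno_a, geno_b, trait)
print(round(t, 2), sorted(high_cells))  # 17.32 [(0, 1), (1, 0)]
```

Collapsing the genotype cells into two groups is what lets this kind of method score a non-linear interaction with a single statistic, without assuming an additive or dominant genetic model.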

20.
PLoS One ; 9(8): e103123, 2014.
Article in English | MEDLINE | ID: mdl-25144566

ABSTRACT

HIV sensory neuropathy (HIV-SN) and distal neuropathic pain (DNP) are common, disabling complications associated with combination antiretroviral therapy (cART). We previously associated iron-regulatory genetic polymorphisms with a reduced risk of HIV-SN during more neurotoxic types of cART. Here we evaluated the impact of polymorphisms in 19 iron-regulatory genes on DNP in 560 HIV-infected subjects from a prospective observational study, who underwent neurological examinations to ascertain peripheral neuropathy and structured interviews to ascertain DNP. Genotype-DNP associations were explored by logistic regression and permutation-based analytical methods. Among 559 evaluable subjects, 331 (59%) developed HIV-SN and 168 (30%) reported DNP. Fifteen polymorphisms in 8 genes (p<0.05) and 5 variants in 4 genes (p<0.01) were nominally associated with DNP: polymorphisms in TF, TFRC, BMP6, ACO1, SLC11A2, and FXN conferred reduced risk (adjusted odds ratios [ORs] ranging from 0.2 to 0.7, all p<0.05); other variants in TF, CP, ACO1, BMP6, and B2M conferred increased risk (ORs ranging from 1.3 to 3.1, all p<0.05). Risks associated with some variants reached statistical significance in only the black or only the white subgroup, but were consistent in direction across subgroups. ACO1 rs2026739 remained significantly associated with DNP in whites after correction for multiple tests (permutation p<0.0001). Several of the same iron-regulatory-gene polymorphisms, including ACO1 rs2026739, were also associated with severity of DNP (all p<0.05). Common polymorphisms in iron-management genes are associated with DNP and with DNP severity in HIV-infected persons receiving cART. Consistent risk estimates across population subgroups and persistence of the ACO1 rs2026739 association after adjustment for multiple testing suggest that genetic variation in iron regulation and transport modulates susceptibility to DNP.
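The abstract does not spell out its permutation procedure, so the sketch below shows one common way permutation can correct for multiple tests, the single-step max-statistic approach, rather than the authors' exact method. For simplicity it replaces the logistic-regression test with a toy statistic, the absolute difference in mean genotype between cases and controls; the function names, the 999 default permutations, and the data are hypothetical.

```python
import random


def genotype_diff(genotypes, labels):
    """Toy association statistic: |mean genotype in cases - mean genotype in controls|."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def maxstat_adjusted_pvalues(snps, labels, n_perm=999, seed=0):
    """Multiple-testing-adjusted p-values from the single-step max-statistic permutation method."""
    rng = random.Random(seed)
    observed = [genotype_diff(g, labels) for g in snps]
    perm_labels = list(labels)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(perm_labels)  # breaks any genotype-phenotype link, keeps LD between SNPs
        max_null.append(max(genotype_diff(g, perm_labels) for g in snps))
    # Adjusted p: how often the largest permuted statistic reaches each observed one
    # (the small tolerance guards against floating-point ties).
    return [(1 + sum(m >= obs - 1e-12 for m in max_null)) / (n_perm + 1) for obs in observed]


labels = [1] * 50 + [0] * 50            # 50 hypothetical cases, 50 controls
snp_assoc = [1] * 50 + [0] * 50         # carried by every case and no control
snp_null = [i % 2 for i in range(100)]  # equally common in cases and controls
print(maxstat_adjusted_pvalues([snp_assoc, snp_null], labels, n_perm=199, seed=1))  # [0.005, 1.0]
```

Under this estimator the smallest attainable adjusted p-value is 1/(n_perm + 1), so resolving values below 0.0001 takes on the order of 10,000 or more permutations.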


Subject(s)
Genetic Variation/genetics , HIV Infections/genetics , HIV Infections/physiopathology , Iron/metabolism , Neuralgia/physiopathology , Adult , Aged , Anti-Retroviral Agents/therapeutic use , Female , Genotype , HIV Infections/drug therapy , HIV Infections/metabolism , Humans , Iron Regulatory Protein 1/genetics , Linkage Disequilibrium/genetics , Male , Middle Aged , Multivariate Analysis , Neuralgia/genetics , Neuralgia/metabolism , Young Adult