Results 1-20 of 37
1.
Am J Hum Genet ; 110(4): 575-591, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37028392

ABSTRACT

Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the customary use of LD pruning in standard GWASs precludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Linkage Disequilibrium/genetics , Genotype , Biological Specimen Banks , United Kingdom , Polymorphism, Single Nucleotide/genetics
2.
Pharmacogenomics J ; 19(2): 178-190, 2019 04.
Article in English | MEDLINE | ID: mdl-29795408

ABSTRACT

Identifying genetic variants associated with chemotherapy-induced toxicity is an important step towards personalized treatment of cancer patients. However, annotating and interpreting the associated genetic variants remains challenging because each associated variant is a surrogate for many other variants in the same region. The issue is further complicated when investigating patterns of associated variants with multiple drugs. In this study, we used biological knowledge to annotate and compare genetic variants associated with cellular sensitivity to mechanistically distinct chemotherapeutic drugs, including platinating agents (cisplatin, carboplatin), capecitabine, cytarabine, and paclitaxel. The most significantly associated SNPs from genome-wide association studies of cellular sensitivity to each drug in lymphoblastoid cell lines derived from populations of European (CEU) and African (YRI) descent were analyzed for their enrichment in biological pathways and processes. We annotated genetic variants using higher-level biological annotations in an effort to group variants into more interpretable biological modules. Using the higher-level annotations, we observed distinct biological modules associated with cell line populations as well as with classes of chemotherapeutic drugs. We also integrated genetic variants and gene expression variables to build predictive models for chemotherapeutic drug cytotoxicity and prioritized the network models based on the enrichment of DNA regulatory data. Several biological annotations, often encompassing different SNPs, were replicated in independent datasets. By using biological knowledge and DNA regulatory information, we propose a novel approach for jointly analyzing genetic variants associated with multiple chemotherapeutic drugs.


Subject(s)
Genetic Variation/genetics , Genome-Wide Association Study/methods , Neoplasms/drug therapy , Pharmacogenetics/methods , Black People/genetics , Capecitabine/adverse effects , Capecitabine/therapeutic use , Carboplatin/adverse effects , Carboplatin/therapeutic use , Cell Line , Cisplatin/adverse effects , Cisplatin/therapeutic use , Gene Expression Regulation, Neoplastic/drug effects , Genome, Human/genetics , Humans , Molecular Sequence Annotation , Neoplasms/genetics , Paclitaxel/adverse effects , Paclitaxel/therapeutic use , Polymorphism, Single Nucleotide/genetics , White People/genetics
3.
Bioinformatics ; 30(5): 698-705, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24149050

ABSTRACT

MOTIVATION: Advancements in high-throughput technology have allowed researchers to examine the genetic etiology of complex human traits in a robust fashion. Although genome-wide association studies have identified many novel variants associated with hundreds of traits, a large proportion of the estimated trait heritability remains unexplained. One hypothesis is that the commonly used statistical techniques and study designs are not robust to the complex etiology that may underlie these human traits. This etiology could include non-linear gene × gene or gene × environment interactions. Additionally, other levels of biological regulation may play a large role in trait variability. RESULTS: To address the need for computational tools that can explore enormous datasets to detect complex susceptibility models, we have developed a software package called the Analysis Tool for Heritable and Environmental Network Associations (ATHENA). ATHENA combines various variable filtering methods with machine learning techniques to analyze high-throughput categorical (e.g., single nucleotide polymorphisms) and quantitative (e.g., gene expression levels) predictor variables to generate multivariable models that predict either a categorical (e.g., disease status) or a quantitative (e.g., cholesterol levels) outcome. The goal of this article is to demonstrate the utility of ATHENA using simulated and biological datasets that consist of both single nucleotide polymorphisms and gene expression variables to identify complex prediction models. Importantly, this method is flexible and can be expanded to include other types of high-throughput data (e.g., RNA-seq data and biomarker measurements). AVAILABILITY: ATHENA is freely available for download. The software, user manual and tutorial can be downloaded from http://ritchielab.psu.edu/ritchielab/software.


Subject(s)
Gene-Environment Interaction , Genome-Wide Association Study , Software , Humans , Phenotype , Polymorphism, Single Nucleotide
4.
J Biomed Inform ; 56: 220-8, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26048077

ABSTRACT

Evaluation of survival models to predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty directly predicting survival because of censored observations and because its predictive power depends on the threshold used to define the two classes. The traditional Cox regression approach, in turn, does not allow for the identification of interactions between genomic features, which could play key roles in cancer prognosis. In addition, data integration is regarded as an important means of improving the predictive power of survival models, since cancer can arise from multiple alterations across meta-dimensional genomic data, including the genome, epigenome, transcriptome, and proteome. Here we propose a new integrative framework designed to perform three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; and (3) identifying interactions within and between meta-dimensional genomic features associated with survival. To predict censored survival time, martingale residuals were calculated as a new continuous outcome, and a new fitness function for the grammatical evolution neural network (GENN), based on the mean absolute difference of martingale residuals, was implemented. To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data, including copy number, gene expression, DNA methylation, and protein expression data, in breast cancer retrieved from The Cancer Genome Atlas (TCGA). In the breast cancer dataset, we identified interactions associated with survival not only within a single dimension of genomic data but also between meta-dimensional omics data.
Notably, the predictive power of our best meta-dimensional model was 73%, which outperformed all other models built on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease, and the high levels of genomic diversity within and between breast tumors could affect therapeutic response and disease progression. Thus, identifying interactions within and between meta-dimensional omics data associated with survival in breast cancer is expected to guide the development of improved meta-dimensional prognostic biomarkers and therapeutic targets.


Subject(s)
Breast Neoplasms/mortality , Data Collection , Medical Informatics/methods , Survival Analysis , Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Computational Biology/methods , Computer Simulation , DNA Methylation , Disease Progression , Epigenomics , Female , Gene Expression Profiling , Genome, Human , Genomics , Humans , Models, Statistical , Neural Networks, Computer , Prognosis , Proportional Hazards Models , Proteome , Software , Transcriptome , Treatment Outcome
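For illustration, the step in record 4 that turns censored survival data into a continuous outcome can be sketched as follows. This is a minimal sketch, not the authors' GENN implementation: it assumes a null (covariate-free) Cox model, so each martingale residual is the event indicator minus the Nelson-Aalen cumulative hazard at that subject's observed time, and it takes the fitness to be the mean absolute difference between observed and predicted residuals. All function names are hypothetical.

```python
from typing import List, Sequence


def nelson_aalen_cumhaz(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Nelson-Aalen cumulative hazard H(t_i), evaluated at each subject's own time t_i."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    cumhaz = 0.0
    steps = []  # (event time, cumulative hazard just after that time)
    for t in event_times:
        at_risk = sum(1 for s in times if s >= t)  # still under observation at t
        deaths = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        cumhaz += deaths / at_risk
        steps.append((t, cumhaz))
    result = []
    for t in times:
        # H is a step function: keep the value at the last event time <= t
        h = 0.0
        for u, value in steps:
            if u <= t:
                h = value
            else:
                break
        result.append(h)
    return result


def martingale_residuals(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Null-model martingale residuals r_i = delta_i - H(t_i); they sum to zero."""
    cumhaz = nelson_aalen_cumhaz(times, events)
    return [e - h for e, h in zip(events, cumhaz)]


def mean_absolute_difference(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Hypothetical fitness: mean |observed - predicted| residual (lower is better)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Five subjects: event indicator 1 = death observed, 0 = censored
residuals = martingale_residuals([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([round(r, 2) for r in residuals])
```

Positive residuals mark subjects who died earlier than the null model expects and negative ones those who survived longer, which is what makes the residual usable as an ordinary continuous target for a model search.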
5.
J Pers Med ; 12(12)2022 Nov 29.
Article in English | MEDLINE | ID: mdl-36556195

ABSTRACT

The Penn Medicine BioBank (PMBB) is an electronic health record (EHR)-linked biobank at the University of Pennsylvania (Penn Medicine). A large variety of health-related information, ranging from diagnosis codes to laboratory measurements, imaging data and lifestyle information, is integrated with genomic and biomarker data in the PMBB to facilitate discoveries and translational science. To date, 174,712 participants have been enrolled in the PMBB, including approximately 30% of participants of non-European ancestry, making it one of the most diverse medical biobanks. A median of seven years of longitudinal EHR data is available for participants, who have also consented to be recontacted. Herein, we describe the operations and infrastructure of the PMBB, summarize the phenotypic architecture of the enrolled participants, and use body mass index (BMI) as a proof-of-concept quantitative phenotype for PheWAS, LabWAS, and GWAS. The large representation of African-American participants in the PMBB addresses the essential need to expand the diversity in genetic and translational research. There is a critical need for a "medical biobank consortium" to facilitate replication, increase power for rare phenotypes and variants, and promote harmonized collaboration to optimize the potential for biological discovery and precision medicine.

6.
Ann Hum Genet ; 75(1): 78-89, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21158747

ABSTRACT

Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and computational perspectives, even when only a relatively small number of genetic and nongenetic exposures is considered. Several data-mining methods have been proposed for interaction analysis; among them, the Multifactor Dimensionality Reduction (MDR) method has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that unifies the best of the nonparametric and parametric worlds, was developed to address some of the remaining concerns that accompany an MDR analysis. These include the restriction to univariate, dichotomous traits; the absence of flexible ways to adjust for lower-order effects and important confounders; and the difficulty in highlighting epistatic effects when too many multilocus genotype cells are pooled into two new genotype groups. We investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies.


Subject(s)
Disease/genetics , Epistasis, Genetic , Models, Genetic , Multifactor Dimensionality Reduction , Case-Control Studies , Computer Simulation
7.
N Engl J Med ; 358(10): 999-1008, 2008 Mar 06.
Article in English | MEDLINE | ID: mdl-18322281

ABSTRACT

BACKGROUND: Genetic variants of the enzyme that metabolizes warfarin, cytochrome P-450 2C9 (CYP2C9), and of a key pharmacologic target of warfarin, vitamin K epoxide reductase (VKORC1), contribute to differences in patients' responses to various warfarin doses, but the role of these variants during initial anticoagulation is not clear. METHODS: In 297 patients starting warfarin therapy, we assessed CYP2C9 genotypes (CYP2C9 *1, *2, and *3), VKORC1 haplotypes (designated A and non-A), clinical characteristics, response to therapy (as determined by the international normalized ratio [INR]), and bleeding events. The study outcomes were the time to the first INR within the therapeutic range, the time to the first INR of more than 4, the time above the therapeutic INR range, the INR response over time, and the warfarin dose requirement. RESULTS: As compared with patients with the non-A/non-A haplotype, patients with the A/A haplotype of VKORC1 had a decreased time to the first INR within the therapeutic range (P=0.02) and to the first INR of more than 4 (P=0.003). In contrast, the CYP2C9 genotype was not a significant predictor of the time to the first INR within the therapeutic range (P=0.57) but was a significant predictor of the time to the first INR of more than 4 (P=0.03). Both the CYP2C9 genotype and VKORC1 haplotype had a significant influence on the required warfarin dose after the first 2 weeks of therapy. CONCLUSIONS: Initial variability in the INR response to warfarin was more strongly associated with genetic variability in the pharmacologic target of warfarin, VKORC1, than with CYP2C9.


Subject(s)
Anticoagulants/therapeutic use , Cytochrome P-450 Enzyme System/genetics , International Normalized Ratio , Mixed Function Oxygenases/genetics , Warfarin/therapeutic use , Adult , Aged , Cohort Studies , Female , Genotype , Haplotypes , Humans , Linkage Disequilibrium , Male , Middle Aged , Polymorphism, Genetic , Vitamin K Epoxide Reductases
8.
Bioinformatics ; 26(4): 578-9, 2010 Feb 15.
Article in English | MEDLINE | ID: mdl-20130027

ABSTRACT

SUMMARY: Often in human genetic analysis, multiple tables of single nucleotide polymorphism (SNP) statistics are shown alongside a Haploview-style correlation plot. Readers are then asked to make inferences that incorporate knowledge across these multiple sets of results. To better facilitate a collective understanding of all available data, we developed a Ruby-based web application, LD-Plus, to generate figures that simultaneously display the physical location of SNPs, binary SNP attributes (such as coding/non-coding or presence on genotyping platforms), common haplotypes and their frequencies, and continuously scaled values (such as FST, minor allele frequency, genotyping efficiency or P-values), all in the context of the D' and r² linkage disequilibrium structures. Combining these results into one comprehensive figure reduces cross-referencing between figures and tables, and can provide unique insights into genetic features that are not clearly seen when results are partitioned across multiple figures and tables.


Subject(s)
Linkage Disequilibrium , Polymorphism, Single Nucleotide , Software , Algorithms , Databases, Genetic , Genotype , Haplotypes
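The two LD measures that LD-Plus displays (record 8), D' and r², follow from two-locus haplotype frequencies. The sketch below is a minimal illustration, not part of LD-Plus: it assumes biallelic SNPs with alleles A/a and B/b and an already known frequency p_AB of the A-B haplotype (estimating haplotype frequencies from unphased genotypes, e.g. by EM, is out of scope). The function name is hypothetical.

```python
from typing import Tuple


def ld_measures(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for a biallelic SNP pair.

    p_ab: frequency of the A-B haplotype; p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b  # deviation from independence
    # D_max depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # signed; LD plots usually show |D'|
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(ld_measures(0.3, 0.6, 0.4))
```

For example, `ld_measures(0.3, 0.6, 0.4)` gives D = 0.06, D' = 0.375 and r² = 0.0625 (up to floating-point rounding). The two measures answer different questions: D' reflects how close a pair is to having no recombination between the loci, whereas r² also depends on how well the allele frequencies match, which is why a plot showing both can reveal structure that either alone hides.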
9.
Neuron ; 37(2): 249-61, 2003 Jan 23.
Article in English | MEDLINE | ID: mdl-12546820

ABSTRACT

The Drosophila circadian oscillator consists of interlocked period (per)/timeless (tim) and Clock (Clk) transcriptional/translational feedback loops. Within these feedback loops, CLK and CYCLE (CYC) activate per and tim transcription at the same time as they repress Clk transcription, thus controlling the opposite cycling phases of these transcripts. CLK-CYC directly bind E box elements to activate transcription, but the mechanism of CLK-CYC-dependent repression is not known. Here we show that a CLK-CYC-activated gene, vrille (vri), encodes a repressor of Clk transcription, thereby identifying vri as a key negative component of the Clk feedback loop in Drosophila's circadian oscillator. The cryptochrome (cry) gene, which encodes a blue-light photoreceptor, is also a target for VRI repression, suggesting a broader role for VRI in the rhythmic repression of output genes that cycle in phase with Clk.


Subject(s)
Circadian Rhythm/genetics , Drosophila Proteins , Drosophila/physiology , Trans-Activators/genetics , Transcription Factors/genetics , Transcription Factors/physiology , Animals , Animals, Genetically Modified , Binding Sites , Blotting, Western , CLOCK Proteins , DNA-Binding Proteins/biosynthesis , DNA-Binding Proteins/metabolism , Electrophoretic Mobility Shift Assay , Feedback/physiology , G-Box Binding Factors , Hot Temperature , Immunohistochemistry , Molecular Sequence Data , Nuclease Protection Assays , Photoreceptor Cells, Invertebrate/physiology , RNA, Messenger/biosynthesis , Transcription Factors/biosynthesis , Transcription Factors/metabolism
10.
BMC Bioinformatics ; 9: 238, 2008 May 16.
Article in English | MEDLINE | ID: mdl-18485205

ABSTRACT

BACKGROUND: Multifactor Dimensionality Reduction (MDR) has been introduced previously as a non-parametric statistical method for detecting gene-gene interactions. MDR performs a dimensional reduction by assigning multi-locus genotypes to either high- or low-risk groups and measuring the percentage of cases and controls incorrectly labelled by this classification, the classification error. The combination of variables that produces the lowest classification error is selected as the best or most fit model. The correctly and incorrectly labelled cases and controls can be expressed as a two-way contingency table. We sought to improve the ability of MDR to detect gene-gene interactions by replacing classification error with a different measure to score model quality. RESULTS: In this study, we compare the detection and power of MDR using a variety of measures for two-way contingency table analysis. We simulated 40 genetic models, varying the number of disease loci in the model (2-5), the allele frequencies of the disease loci (0.2/0.8 or 0.4/0.6), and the broad-sense heritability of the model (0.05-0.3). Overall, detection using normalized mutual information (NMI) was 65.36% across all models, with a specific detection of 59.4%, compared with 62% detection and 52.2% specific detection using classification error. CONCLUSION: Of the 10 measures evaluated, the likelihood ratio and NMI consistently improve the detection and power of MDR in simulated data over classification error. These measures also reduce the inclusion of spurious variables in a multi-locus model. Thus, MDR, which has already been demonstrated as a powerful tool for detecting gene-gene interactions, can be improved with the use of alternative fitness functions.


Subject(s)
Diagnostic Errors/statistics & numerical data , Information Storage and Retrieval/methods , Models, Genetic , Bias , Diagnostic Errors/classification , Gene Frequency , Gene Regulatory Networks , Genetic Markers , Genotype , Odds Ratio , Risk Assessment/methods , Sensitivity and Specificity , Statistics, Nonparametric , Weights and Measures
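The MDR scoring step described in record 10 can be sketched for a two-locus model. This minimal version is not the MDR software; it assumes genotypes coded 0/1/2 and a binary case/control status, labels a multilocus genotype cell high-risk when its case:control ratio exceeds the overall case:control ratio in the data, builds the 2×2 table of predicted risk group against status, and scores that table both by classification error and by normalized mutual information. Normalizing the mutual information by the entropy of case/control status is one common convention and an assumption here; the abstract does not state which normalization was used. Function names are hypothetical.

```python
import math
from collections import Counter


def mdr_two_by_two(geno1, geno2, status):
    """Build Counter{(predicted_high_risk, is_case): count} for a 2-locus model.

    geno1, geno2: genotypes coded 0/1/2; status: 1 = case, 0 = control.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio (1.0 when balanced)
    cases, controls = Counter(), Counter()
    for g1, g2, s in zip(geno1, geno2, status):
        if s == 1:
            cases[(g1, g2)] += 1
        else:
            controls[(g1, g2)] += 1
    # High risk if cases/controls > threshold, written as a product so that cells
    # with zero controls are handled. Ties go to the low-risk group (an assumption).
    high_risk = {cell for cell in set(cases) | set(controls)
                 if cases[cell] > threshold * controls[cell]}
    table = Counter()
    for g1, g2, s in zip(geno1, geno2, status):
        table[((g1, g2) in high_risk, s == 1)] += 1
    return table


def classification_error(table):
    """Fraction of subjects whose predicted risk group disagrees with their status."""
    total = sum(table.values())
    return (table[(True, False)] + table[(False, True)]) / total


def normalized_mutual_information(table):
    """I(prediction; status) / H(status) from the 2x2 table, using natural logs."""
    total = sum(table.values())
    p_pred = {x: sum(v for (pr, _), v in table.items() if pr == x) / total for x in (True, False)}
    p_stat = {y: sum(v for (_, st), v in table.items() if st == y) / total for y in (True, False)}
    mi = 0.0
    for (pr, st), count in table.items():
        if count:
            p_xy = count / total
            mi += p_xy * math.log(p_xy / (p_pred[pr] * p_stat[st]))
    h_status = -sum(p * math.log(p) for p in p_stat.values() if p > 0)
    return mi / h_status if h_status > 0 else 0.0


table = mdr_two_by_two([0, 0, 1, 1], [0, 1, 0, 1], [1, 0, 0, 1])
print(classification_error(table), normalized_mutual_information(table))
```

In this toy example, cases occur only in genotype cells (0,0) and (1,1), so the XOR-like pattern is separated perfectly: classification error is 0 and NMI is 1. Because both scores are functions of the same 2×2 table, swapping one for the other, as the abstract describes, changes only how candidate locus combinations are ranked, not how cells are labelled.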
11.
Bioinformatics ; 22(17): 2173-4, 2006 Sep 01.
Article in English | MEDLINE | ID: mdl-16809395

ABSTRACT

UNLABELLED: Parallel multifactor dimensionality reduction is a tool for large-scale analysis of gene-gene and gene-environment interactions. The MDR algorithm was redesigned to allow an unlimited number of study subjects, total variables and variable states, and to remove restrictions on the order of interactions being analyzed. In addition, the algorithm is markedly more efficient, with an approximately 150-fold decrease in runtime for equivalent analyses. To facilitate the processing of large datasets, the algorithm was parallelized. AVAILABILITY: Parallel MDR is freely available for non-commercial research institutions. For full details see http://chgr.mc.vanderbilt.edu/ritchielab/pMDR. An open-source version of MDR software is available at http://www.epistasis.org.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Protein Interaction Mapping/methods , Proteome/metabolism , Signal Transduction/physiology , Software , Computing Methodologies
12.
J Biol Rhythms ; 21(2): 93-103, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16603674

ABSTRACT

CLOCK (CLK) is a core component of the transcriptional feedback loops that comprise the circadian timekeeping mechanism in Drosophila. As a heterodimer with CYCLE (CYC), CLK binds E-boxes to activate the transcription of rhythmically expressed genes within and downstream of the circadian clock, but this activation unexpectedly occurs at times when CLK is at its lowest levels on Western blots. Recent studies demonstrate that CLK also regulates nonrhythmic gene expression and behaviors. Despite the critical roles CLK plays within and outside the circadian clock, its spatial expression pattern has not been characterized. Using a newly developed CLK antibody, the authors show that CLK is coexpressed with PERIOD (PER) in canonical oscillator cells throughout the head and body. In contrast to PER, however, the levels of CLK immunoreactivity do not cycle in intensity, CLK is detected primarily in the nucleus throughout the circadian cycle, and CLK is expressed in non-oscillator cells within the lateral and dorsal brain, including Kenyon cells, which mediate various forms of learning and memory. These results indicate that constitutive levels of nuclear CLK regulate rhythmic transcription in circadian oscillator cells and suggest that CLK contributes to other behavioral processes by regulating gene expression in non-oscillator cells.


Subject(s)
Drosophila Proteins/physiology , Gene Expression Regulation , Transcription Factors/physiology , Animals , Blotting, Western , Brain/metabolism , CLOCK Proteins , Cell Nucleus/metabolism , Circadian Rhythm , Drosophila , Drosophila Proteins/metabolism , Immunoblotting , Immunohistochemistry , Microscopy, Confocal , Microscopy, Fluorescence , Models, Biological , Nuclear Proteins/metabolism , Oscillometry , Period Circadian Proteins , Protein Binding , Time Factors , Transcription Factors/metabolism , Transcription, Genetic
13.
J Am Med Inform Assoc ; 24(3): 577-587, 2017 May 01.
Article in English | MEDLINE | ID: mdl-28040685

ABSTRACT

Because of tumor heterogeneity, cancer patients commonly have different molecular signatures even when they share clinical features such as histology. To overcome this variability, we previously developed a new approach incorporating prior biological knowledge that identifies knowledge-driven genomic interactions associated with outcomes of interest. However, no systematic approach has been proposed to identify interaction models between pathways based on multi-omics data. Here we propose such a framework, called metadimensional knowledge-driven genomic interactions (MKGIs). To test its utility, we applied it to an ovarian cancer dataset including multi-omics profiles from The Cancer Genome Atlas to predict grade, stage, and survival outcome. We found that each knowledge-driven genomic interaction model, based on different genomic datasets, contains different sets of pathway features, which suggests that each genomic data type may contribute to outcomes in ovarian cancer via a different pathway. In addition, MKGI models significantly outperformed the single knowledge-driven genomic interaction model. From the MKGI models, many interactions between pathways associated with outcomes were found, including the mitogen-activated protein kinase (MAPK) signaling pathway and the gonadotropin-releasing hormone (GnRH) signaling pathway, which are known to play important roles in cancer pathogenesis. A key advantage of incorporating biological knowledge into models based on multi-omics data is the ability to improve diagnosis and prognosis while providing better interpretability. Thus, determining variability in molecular signatures based on these interactions between pathways may lead to better diagnostic and treatment strategies for precision medicine.


Subject(s)
Genomics/methods , Models, Genetic , Ovarian Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Datasets as Topic , Female , Gene Expression , Humans , Middle Aged , Ovarian Neoplasms/diagnosis , Prognosis
14.
Nat Commun ; 8(1): 1167, 2017 10 27.
Article in English | MEDLINE | ID: mdl-29079728

ABSTRACT

Genome-wide, imputed, sequence, and structural data are now available for exceedingly large sample sizes. The needs for data management, handling population structure and related samples, and performing associations have largely been met. However, the infrastructure to support analyses involving complexity beyond genome-wide association studies is not standardized or centralized. We provide the PLatform for the Analysis, Translation, and Organization of large-scale data (PLATO), a software tool equipped to handle multi-omic data for hundreds of thousands of samples to explore complexity using genetic interactions, environment-wide association studies and gene-environment interactions, phenome-wide association studies, as well as copy number and rare variant analyses. Using the data from the Marshfield Personalized Medicine Research Project, a site in the electronic Medical Records and Genomics Network, we apply each feature of PLATO to type 2 diabetes and demonstrate how PLATO can be used to uncover the complex etiology of common traits.


Subject(s)
Computational Biology , Genome, Human , Genome-Wide Association Study , Alcohol Drinking , Alleles , Databases, Genetic , Diabetes Mellitus, Type 2/genetics , Diet , Epistasis, Genetic , Gene Deletion , Gene Dosage , Gene-Environment Interaction , Genomics , Genotype , Glutamate Decarboxylase/genetics , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Programming Languages , Recurrence , Sequence Analysis, DNA , Software , Surveys and Questionnaires
15.
BioData Min ; 9: 18, 2016.
Article in English | MEDLINE | ID: mdl-27168765

ABSTRACT

BACKGROUND: The future of medicine is moving towards the phase of precision medicine, with the goal of preventing and treating diseases by taking inter-individual variability into account. A large part of this variability lies in our genetic makeup. With the fast-paced improvement of high-throughput methods for genome sequencing, a tremendous amount of genetic data has already been generated. The next hurdle for precision medicine is to have sufficient computational tools for analyzing large data sets. Genome-Wide Association Studies (GWAS) have been the primary method to assess the relationship between single nucleotide polymorphisms (SNPs) and disease traits. While GWAS is effective at finding individual SNPs with strong main effects, it does not capture potential interactions among multiple SNPs. In many traits, a large proportion of variation remains unexplained by main effects alone, leaving the door open for exploring the role of genetic interactions. However, identifying genetic interactions in large-scale genomics data poses a challenge even for modern computing. RESULTS: For this study, we present a new algorithm, Grammatical Evolution Bayesian Network (GEBN), that uses Bayesian networks to identify interactions in the data and, at the same time, uses an evolutionary algorithm to reduce the computational cost associated with network optimization. GEBN excelled in simulation studies where the data contained main effects and interaction effects. We also applied GEBN to a Type 2 diabetes (T2D) dataset obtained from the Marshfield Personalized Medicine Research Project (PMRP). We were able to identify genetic interactions for T2D cases and controls and use information from those interactions to classify T2D samples. We obtained an average testing area under the curve (AUC) of 86.8%. We also identified several interacting genes, such as INADL and LPP, that are known to be associated with T2D.
CONCLUSIONS: Developing the computational tools to explore genetic associations beyond main effects remains a critically important challenge in human genetics. Methods, such as GEBN, demonstrate the utility of considering genetic interactions, as they likely explain some of the missing heritability.
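The testing AUC reported in record 15 can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. A minimal sketch of that rank-based (Mann-Whitney) estimate follows; the score lists are hypothetical inputs, and this is not part of GEBN.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of AUC: P(case score > control score), ties count one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

Here 5 of the 6 case-control pairs are ranked correctly, so the AUC is 5/6 (about 0.83); 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is O(n·m) and is meant only to make the definition explicit; a sort-based rank computation scales better on large test sets.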

16.
Neurobiol Aging ; 38: 141-150, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26827652

ABSTRACT

Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23, which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis.


Subject(s)
Alzheimer Disease/genetics , Datasets as Topic , Epistasis, Genetic/genetics , Genetic Association Studies , ATP Binding Cassette Transporter, Subfamily B/genetics , Cadherin Related Proteins , Cadherins/genetics , Calcium Channels, L-Type/genetics , Disease Progression , Female , Humans , Male , Models, Genetic , Phosphatidylethanolamine Binding Protein/genetics , Polymorphism, Single Nucleotide , Receptors, Adrenergic, alpha-1/genetics , Receptors, N-Methyl-D-Aspartate/genetics , Risk , Ryanodine Receptor Calcium Release Channel/genetics , Saposins/genetics , Sirtuin 1/genetics
17.
Pac Symp Biocomput ; : 96-107, 2015.
Article in English | MEDLINE | ID: mdl-25592572

ABSTRACT

Large-scale whole-exome and whole-genome sequencing of hundreds to thousands of patients has mapped the landscape of somatic genomic alterations in many cancer types, helping to distinguish driver mutations from passenger mutations. Driver mutations show strong associations with cancer clinical outcomes such as survival. However, because tumors are heterogeneous, somatic mutation profiles are exceptionally sparse, whereas other types of genomic data, such as miRNA or gene expression, are far more complete, with a quantitative value measured for every genomic feature in each patient. To overcome the extreme sparseness of somatic mutation profiles and allow the discovery of combinations of somatic mutations that may predict cancer clinical outcomes, we propose a new approach for binning somatic mutations based on existing biological knowledge. Analyzing a renal cell carcinoma dataset from The Cancer Genome Atlas (TCGA), we identified combinations of somatic mutation burden, binned by pathways, protein families, evolutionarily conserved regions, and regulatory regions, that are associated with survival. Given the heterogeneity of cancer, a binning strategy for somatic mutation profiles based on biological knowledge will be valuable for improving prognostic biomarkers and potentially for tailoring therapeutic strategies by identifying combinations of driver mutations.


Subject(s)
Carcinoma, Renal Cell/genetics, Kidney Neoplasms/genetics, Mutation, Biomarkers, Tumor/genetics, Carcinoma, Renal Cell/mortality, Computational Biology, Databases, Genetic, Humans, Kidney Neoplasms/mortality, Models, Genetic, Neural Networks, Computer, Prognosis, Survival Analysis
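The knowledge-based binning idea in the abstract above can be illustrated with a minimal sketch. It assumes mutations arrive as (patient, gene) records and that a hypothetical `gene_to_bins` mapping stands in for pathway or protein-family annotations; region-based bins (conserved or regulatory regions) would need position-based lookup instead, and the survival modelling step is not shown. This is an illustration of the general technique, not the authors' implementation.

```python
from collections import defaultdict


def bin_mutation_burden(mutations, gene_to_bins):
    """Aggregate per-patient somatic mutations into knowledge-based bins.

    mutations: iterable of (patient_id, gene) pairs, one per observed mutation.
    gene_to_bins: hypothetical mapping from a gene symbol to the bins
        (e.g. pathways or protein families) that contain it.
    Returns {patient_id: {bin_name: mutation_count}}. Mutations in genes
    absent from the mapping are ignored.
    """
    burden = defaultdict(lambda: defaultdict(int))
    for patient, gene in mutations:
        # A gene can belong to several bins; each bin receives the mutation.
        for bin_name in gene_to_bins.get(gene, ()):
            burden[patient][bin_name] += 1
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {patient: dict(bins) for patient, bins in burden.items()}
```

The resulting dense bin-level burdens, rather than the sparse per-gene mutation matrix, would then be the features tested against survival.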
18.
Pac Symp Biocomput ; : 495-505, 2015.
Article in English | MEDLINE | ID: mdl-25741542

ABSTRACT

Investigating the association between biobank-derived genomic data and the information in linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls are defined by electronic phenotyping algorithms deployed in large EHR systems. For our study, cataract cases and controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions in these data, using 527,953 and 527,936 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1% for the gene-gene and gene-environment analyses, respectively, in order to find higher-level associations with cataract risk beyond single SNP-phenotype associations. To build our SNP-SNP interaction models, we used a prior-knowledge-driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast number of interaction models possible with our large set of SNPs. Using Biofilter, we developed 57,376 prior-knowledge-directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 13 statistically significant SNP-SNP models with an interaction p-value < 1 × 10⁻⁴, as well as an overall model p-value < 0.01, associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit (smoking, UV exposure, and alcohol use) that have previously been associated with the formation of cataracts. We found a total of 782 gene-environment models that exhibit an interaction with a p-value < 1 × 10⁻⁴ associated with cataract status. Our results show that these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR-based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex diseases and outcomes such as cataracts.


Subject(s)
Cataract/genetics, Algorithms, Biological Specimen Banks, Case-Control Studies, Computational Biology, Databases, Genetic, Electronic Health Records, Epistasis, Genetic, Gene-Environment Interaction, Genome-Wide Association Study, Humans, Phenotype, Polymorphism, Single Nucleotide, Software
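The general idea behind knowledge-driven SNP-SNP model generation in the abstract above can be sketched as follows. This is not Biofilter or its interface. It is a hypothetical illustration, assuming each knowledge source is given as groups of genes (e.g. pathways), that a gene pair's support is the number of distinct sources placing both genes in a shared group, and that a threshold such as the abstract's 6 sources corresponds to `min_sources`. The names `sources`, `gene_to_snps`, and both functions are made up for illustration.

```python
from itertools import combinations


def knowledge_supported_gene_pairs(sources, min_sources):
    """Count how many knowledge sources link each gene pair.

    sources: hypothetical mapping {source_name: [set_of_genes, ...]}, where each
        set is one group (e.g. a pathway) from that source.
    A pair is counted once per source, however many groups in it share the pair.
    Returns {(gene_a, gene_b): n_sources} for pairs with n_sources >= min_sources.
    """
    support = {}
    for groups in sources.values():
        pairs_in_source = set()
        for group in groups:
            # Sorting makes each pair canonical, so (A, B) and (B, A) coincide.
            pairs_in_source.update(combinations(sorted(group), 2))
        for pair in pairs_in_source:
            support[pair] = support.get(pair, 0) + 1
    return {pair: n for pair, n in support.items() if n >= min_sources}


def snp_pair_models(gene_pairs, gene_to_snps):
    """Expand supported gene pairs into the SNP-SNP models to be tested."""
    models = []
    for gene_a, gene_b in sorted(gene_pairs):
        for snp_a in gene_to_snps.get(gene_a, ()):
            for snp_b in gene_to_snps.get(gene_b, ()):
                models.append((snp_a, snp_b))
    return models
```

Only the surviving SNP pairs would then be tested for interaction with case status, which is how this kind of prior-knowledge filtering shrinks the multiple testing burden compared with testing all possible pairs.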
19.
BioData Min ; 7: 20, 2014.
Article in English | MEDLINE | ID: mdl-25214892

ABSTRACT

BACKGROUND: Effective prediction of cancer clinical outcomes, as a means of understanding the mechanisms of various types of cancer, has been pursued using molecular data such as gene expression profiles, an approach that promises better diagnostics and support for further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, single-gene expression outcome prediction is limited for cancer evaluation, since genes do not act in isolation but rather interact with other genes in complex signaling or regulatory networks. In addition, since pathways are likely to cooperate, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner. METHODS: We therefore propose a novel approach for identifying knowledge-driven genomic interactions and apply it to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). To demonstrate the utility of the proposed approach, an ovarian cancer dataset from The Cancer Genome Atlas (TCGA) was used to predict clinical stage as a pilot project. RESULTS: We identified knowledge-driven genomic interactions associated with cancer stage not only within single knowledge bases, such as sources of pathway-pathway interactions, but also across different knowledge bases, such as pathway-protein family interactions, by integrating different types of information. Notably, an integration model drawing on different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models built on gene expression or single knowledge-based data types alone. Furthermore, the resulting models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge.
CONCLUSIONS: The success of the pilot study presented here will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding tumorigenesis and progression in ovarian cancer through a global view of interactions within and between different biological knowledge sources has the potential to provide more effective screening strategies and therapeutic targets for many types of cancer.

20.
Pac Symp Biocomput ; : 200-11, 2014.
Article in English | MEDLINE | ID: mdl-24297547

ABSTRACT

Environment-wide association studies (EWAS) provide a way to uncover the environmental mechanisms involved in complex traits in a high-throughput manner. Genome-wide association studies have led to the discovery of genetic variants associated with many common diseases but do not take into account the environmental component of complex phenotypes. This EWAS comprehensively assesses the associations between environmental variables and type 2 diabetes (T2D) in the Marshfield Personalized Medicine Research Project Biobank (Marshfield PMRP). We sought replication in two National Health and Nutrition Examination Surveys (NHANES). The Marshfield PMRP currently uses four tools for measuring environmental exposures and outcome traits: 1) the PhenX Toolkit, which includes standardized exposure and phenotypic measures across several domains, 2) the Diet History Questionnaire (DHQ), a food frequency questionnaire, 3) the Measurement of a Person's Habitual Physical Activity, which scores the level of an individual's physical activity, and 4) electronic health records (EHR), to which validated algorithms are applied to establish T2D case-control status. Using PLATO software, 314 environmental variables were tested for association with T2D using logistic regression, adjusting for sex, age, and BMI, in over 2,200 European Americans. When available, similar variables were tested with the same methods and adjustments in samples from NHANES III and NHANES 1999-2002. Twelve and 31 associations were identified in the Marshfield samples at p<0.01 and p<0.05, respectively. Seven and 13 measures replicated in at least one of the NHANES at p<0.01 and p<0.05, respectively, with the same direction of effect. The most significant environmental exposures associated with T2D status included decreased alcohol use as well as increased smoking exposure in childhood and adulthood.
The results demonstrate the utility of the EWAS method and survey tools for identifying environmental components of complex diseases like type 2 diabetes. These high-throughput and comprehensive investigation methods can easily be applied to investigate the relation between environmental exposures and multiple phenotypes in future analyses.


Subject(s)
Diabetes Mellitus, Type 2/etiology, Environment, Biological Specimen Banks, Computational Biology, Diet Records, Environmental Exposure, Female, Gene-Environment Interaction, Humans, Male, Motor Activity, Nutrition Surveys, Phenotype, Precision Medicine, Software, Wisconsin
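The per-variable test described in the EWAS abstract above, a logistic regression of T2D status on one environmental variable adjusted for sex, age, and BMI, can be sketched in pure Python. This is not PLATO or its interface; it is a hypothetical, minimal illustration that fits the model by Newton-Raphson and reports a two-sided Wald p-value for the exposure. `simulate_cohort` produces synthetic data only, with made-up effect sizes, and is not derived from the study.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; an intercept column is prepended.

    Returns (coefficients, information matrix X'WX from the last iteration)."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(x, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, info


def exposure_association(exposure, covariates, y):
    """Wald test for one exposure in a covariate-adjusted logistic model.

    Column order: intercept, exposure, then covariates (e.g. sex, age, BMI).
    Returns (log-odds coefficient of the exposure, two-sided p-value)."""
    rows = [[e] + list(c) for e, c in zip(exposure, covariates)]
    beta, info = fit_logistic(rows, y)
    unit = [0.0] * len(beta)
    unit[1] = 1.0
    variance = solve(info, unit)[1]  # diagonal element of (X'WX)^-1
    z = beta[1] / math.sqrt(variance)
    return beta[1], math.erfc(abs(z) / math.sqrt(2.0))


def simulate_cohort(n=400, effect=1.0, seed=1):
    """Synthetic cohort: the exposure raises the outcome's log-odds by `effect`."""
    rng = random.Random(seed)
    exposure, covariates, outcome = [], [], []
    for _ in range(n):
        sex, age, bmi = rng.randint(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        e = rng.gauss(0, 1)
        logit = -0.5 + effect * e + 0.3 * age + 0.2 * bmi
        exposure.append(e)
        covariates.append((sex, age, bmi))
        outcome.append(1 if rng.random() < sigmoid(logit) else 0)
    return exposure, covariates, outcome
```

In an EWAS, `exposure_association` would be called once per environmental variable with the same covariates, and the resulting p-values compared against thresholds such as the p<0.01 and p<0.05 levels reported above.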