Results 1 - 20 of 37
1.
Am J Hum Genet ; 110(4): 575-591, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37028392

ABSTRACT

Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWASs excludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.


Subjects
Epistasis, Genetic; Genome-Wide Association Study; Linkage Disequilibrium/genetics; Genotype; Biological Specimen Banks; United Kingdom; Polymorphism, Single Nucleotide/genetics
2.
Pharmacogenomics J ; 19(2): 178-190, 2019 04.
Article in English | MEDLINE | ID: mdl-29795408

ABSTRACT

Identifying genetic variants associated with chemotherapy-induced toxicity is an important step towards personalized treatment of cancer patients. However, annotating and interpreting the associated genetic variants remains challenging because each associated variant is a surrogate for many other variants in the same region. The issue is further complicated when investigating patterns of associated variants with multiple drugs. In this study, we used biological knowledge to annotate and compare genetic variants associated with cellular sensitivity to mechanistically distinct chemotherapeutic drugs, including platinating agents (cisplatin, carboplatin), capecitabine, cytarabine, and paclitaxel. The most significantly associated SNPs from genome-wide association studies of cellular sensitivity to each drug in lymphoblastoid cell lines derived from populations of European (CEU) and African (YRI) descent were analyzed for their enrichment in biological pathways and processes. We annotated genetic variants using higher-level biological annotations in efforts to group variants into more interpretable biological modules. Using the higher-level annotations, we observed distinct biological modules associated with cell line populations as well as classes of chemotherapeutic drugs. We also integrated genetic variants and gene expression variables to build predictive models for chemotherapeutic drug cytotoxicity and prioritized the network models based on the enrichment of DNA regulatory data. Several biological annotations, often encompassing different SNPs, were replicated in independent datasets. By using biological knowledge and DNA regulatory information, we propose a novel approach for jointly analyzing genetic variants associated with multiple chemotherapeutic drugs.


Subjects
Genetic Variation/genetics; Genome-Wide Association Study/methods; Neoplasms/drug therapy; Pharmacogenetics/methods; Black People/genetics; Capecitabine/adverse effects; Capecitabine/therapeutic use; Carboplatin/adverse effects; Carboplatin/therapeutic use; Cell Line; Cisplatin/adverse effects; Cisplatin/therapeutic use; Gene Expression Regulation, Neoplastic/drug effects; Genome, Human/genetics; Humans; Molecular Sequence Annotation; Neoplasms/genetics; Paclitaxel/adverse effects; Paclitaxel/therapeutic use; Polymorphism, Single Nucleotide/genetics; White People/genetics
3.
Bioinformatics ; 30(5): 698-705, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24149050

ABSTRACT

MOTIVATION: Advancements in high-throughput technology have allowed researchers to examine the genetic etiology of complex human traits in a robust fashion. Although genome-wide association studies have identified many novel variants associated with hundreds of traits, a large proportion of the estimated trait heritability remains unexplained. One hypothesis is that the commonly used statistical techniques and study designs are not robust to the complex etiology that may underlie these human traits. This etiology could include non-linear gene × gene or gene × environment interactions. Additionally, other levels of biological regulation may play a large role in trait variability. RESULTS: To address the need for computational tools that can explore enormous datasets to detect complex susceptibility models, we have developed a software package called the Analysis Tool for Heritable and Environmental Network Associations (ATHENA). ATHENA combines various variable filtering methods with machine learning techniques to analyze high-throughput categorical (e.g. single nucleotide polymorphisms) and quantitative (e.g. gene expression levels) predictor variables to generate multivariable models that predict either a categorical (e.g. disease status) or a quantitative (e.g. cholesterol level) outcome. The goal of this article is to demonstrate the utility of ATHENA using simulated and biological datasets that consist of both single nucleotide polymorphisms and gene expression variables to identify complex prediction models. Importantly, this method is flexible and can be expanded to include other types of high-throughput data (e.g. RNA-seq data and biomarker measurements). AVAILABILITY: ATHENA is freely available for download. The software, user manual and tutorial can be downloaded from http://ritchielab.psu.edu/ritchielab/software.


Subjects
Gene-Environment Interaction; Genome-Wide Association Study; Software; Humans; Phenotype; Polymorphism, Single Nucleotide
4.
J Biomed Inform ; 56: 220-8, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26048077

ABSTRACT

Evaluation of survival models to predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty directly predicting survival due to the characteristics of censored observations and the fact that the predictive power depends on the threshold used to set two classes. In contrast, the traditional Cox regression approach has some drawbacks in the sense that it does not allow for the identification of interactions between genomic features, which could have key roles associated with cancer prognosis. In addition, data integration is regarded as one of the important issues in improving the predictive power of survival models since cancer could be caused by multiple alterations through meta-dimensional genomic data including genome, epigenome, transcriptome, and proteome. Here we have proposed a new integrative framework designed to perform these three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; (3) identifying interactions within/between meta-dimensional genomic features associated with survival. In order to predict censored survival time, martingale residuals were calculated as a new continuous outcome and a new fitness function used by the grammatical evolution neural network (GENN) based on mean absolute difference of martingale residuals was implemented. To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data including copy number, gene expression, DNA methylation, and protein expression data in breast cancer retrieved from The Cancer Genome Atlas (TCGA). On the basis of the results from breast cancer dataset, we were able to identify interactions not only within a single dimension of genomic data but also between meta-dimensional omics data that are associated with survival. 
Notably, the predictive power of our best meta-dimensional model was 73%, which outperformed all of the other models built on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease and the high levels of genomic diversity within/between breast tumors could affect the risk of therapeutic responses and disease progression. Thus, identifying interactions within/between meta-dimensional omics data associated with survival in breast cancer is expected to deliver direction for improved meta-dimensional prognostic biomarkers and therapeutic targets.
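
The abstract above turns censored survival times into a continuous outcome through martingale residuals and scores models by a mean absolute difference. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the residuals come from a covariate-free (null) model, so that the cumulative hazard is the Nelson-Aalen estimate; the function names are illustrative.

```python
def nelson_aalen(times, events):
    """Cumulative hazard H(t) at every distinct observed time.

    times:  follow-up time per subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    hazard, cumulative = {}, 0.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        cumulative += deaths / at_risk
        hazard[t] = cumulative
    return hazard


def martingale_residuals(times, events):
    """Residual per subject: observed event indicator minus expected events."""
    hazard = nelson_aalen(times, events)
    return [e - hazard[t] for t, e in zip(times, events)]


def mean_absolute_difference(observed, predicted):
    """Assumed fitness score: lower means predictions track the residuals better."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    residuals = martingale_residuals([1, 2, 3], [1, 0, 1])
    print(residuals)
    print(mean_absolute_difference(residuals, [0.5, -0.5, 0.0]))
```

Under this null-model assumption the residuals sum to zero, which gives a quick sanity check on the cumulative hazard calculation.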


Subjects
Breast Neoplasms/mortality; Data Collection; Medical Informatics/methods; Survival Analysis; Algorithms; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Computational Biology/methods; Computer Simulation; DNA Methylation; Disease Progression; Epigenomics; Female; Gene Expression Profiling; Genome, Human; Genomics; Humans; Models, Statistical; Neural Networks, Computer; Prognosis; Proportional Hazards Models; Proteome; Software; Transcriptome; Treatment Outcome
5.
J Pers Med ; 12(12)2022 Nov 29.
Article in English | MEDLINE | ID: mdl-36556195

ABSTRACT

The Penn Medicine BioBank (PMBB) is an electronic health record (EHR)-linked biobank at the University of Pennsylvania (Penn Medicine). A large variety of health-related information, ranging from diagnosis codes to laboratory measurements, imaging data and lifestyle information, is integrated with genomic and biomarker data in the PMBB to facilitate discoveries and translational science. To date, 174,712 participants have been enrolled into the PMBB, including approximately 30% of participants of non-European ancestry, making it one of the most diverse medical biobanks. There is a median of seven years of longitudinal data in the EHR available on participants, who also consent to permission to recontact. Herein, we describe the operations and infrastructure of the PMBB, summarize the phenotypic architecture of the enrolled participants, and use body mass index (BMI) as a proof-of-concept quantitative phenotype for PheWAS, LabWAS, and GWAS. The major representation of African-American participants in the PMBB addresses the essential need to expand the diversity in genetic and translational research. There is a critical need for a "medical biobank consortium" to facilitate replication, increase power for rare phenotypes and variants, and promote harmonized collaboration to optimize the potential for biological discovery and precision medicine.

6.
Ann Hum Genet ; 75(1): 78-89, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21158747

ABSTRACT

Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and computational perspective, even using a relatively small number of genetic and nongenetic exposures. Several data-mining methods have been proposed for interaction analysis; among them, the Multifactor Dimensionality Reduction (MDR) method has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that is able to unify the best of both nonparametric and parametric worlds, was developed to address some of the remaining concerns that go along with an MDR analysis. These include the restriction to univariate, dichotomous traits, the absence of flexible ways to adjust for lower order effects and important confounders, and the difficulty in highlighting epistatic effects when too many multilocus genotype cells are pooled into two new genotype groups. We investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies.


Subjects
Disease/genetics; Epistasis, Genetic; Models, Genetic; Multifactor Dimensionality Reduction; Case-Control Studies; Computer Simulation
7.
N Engl J Med ; 358(10): 999-1008, 2008 Mar 06.
Article in English | MEDLINE | ID: mdl-18322281

ABSTRACT

BACKGROUND: Genetic variants of the enzyme that metabolizes warfarin, cytochrome P-450 2C9 (CYP2C9), and of a key pharmacologic target of warfarin, vitamin K epoxide reductase (VKORC1), contribute to differences in patients' responses to various warfarin doses, but the role of these variants during initial anticoagulation is not clear. METHODS: In 297 patients starting warfarin therapy, we assessed CYP2C9 genotypes (CYP2C9 *1, *2, and *3), VKORC1 haplotypes (designated A and non-A), clinical characteristics, response to therapy (as determined by the international normalized ratio [INR]), and bleeding events. The study outcomes were the time to the first INR within the therapeutic range, the time to the first INR of more than 4, the time above the therapeutic INR range, the INR response over time, and the warfarin dose requirement. RESULTS: As compared with patients with the non-A/non-A haplotype, patients with the A/A haplotype of VKORC1 had a decreased time to the first INR within the therapeutic range (P=0.02) and to the first INR of more than 4 (P=0.003). In contrast, the CYP2C9 genotype was not a significant predictor of the time to the first INR within the therapeutic range (P=0.57) but was a significant predictor of the time to the first INR of more than 4 (P=0.03). Both the CYP2C9 genotype and VKORC1 haplotype had a significant influence on the required warfarin dose after the first 2 weeks of therapy. CONCLUSIONS: Initial variability in the INR response to warfarin was more strongly associated with genetic variability in the pharmacologic target of warfarin, VKORC1, than with CYP2C9.


Subjects
Anticoagulants/therapeutic use; Cytochrome P-450 Enzyme System/genetics; International Normalized Ratio; Mixed Function Oxygenases/genetics; Warfarin/therapeutic use; Adult; Aged; Cohort Studies; Female; Genotype; Haplotypes; Humans; Linkage Disequilibrium; Male; Middle Aged; Polymorphism, Genetic; Vitamin K Epoxide Reductases
8.
Bioinformatics ; 26(4): 578-9, 2010 Feb 15.
Article in English | MEDLINE | ID: mdl-20130027

ABSTRACT

SUMMARY: Often in human genetic analysis, multiple tables of single nucleotide polymorphism (SNP) statistics are shown alongside a Haploview-style correlation plot. Readers are then asked to make inferences that incorporate knowledge across these multiple sets of results. To better facilitate a collective understanding of all available data, we developed a Ruby-based web application, LD-Plus, to generate figures that simultaneously display physical location of SNPs, binary SNP attributes (such as coding/non-coding or presence on genotyping platforms), common haplotypes and their frequencies and continuously scaled values (such as F_ST, minor allele frequency, genotyping efficiency or P-values), all in the context of the D' and r^2 linkage disequilibrium structures. Combining these results into one comprehensive figure reduces dereferencing between figures and tables, and can provide unique insights into genetic features that are not clearly seen when results are partitioned across multiple figures and tables.
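
For readers unfamiliar with the two linkage disequilibrium measures named in this abstract, the sketch below computes them from the standard textbook definitions for a pair of biallelic loci. It is written in Python for illustration and is not taken from LD-Plus, which the abstract describes as a Ruby-based web application; the function name and argument names are assumptions.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A
    p_b:  frequency of allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could take at these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    print(ld_stats(0.5, 0.5, 0.5))   # alleles always travel together
    print(ld_stats(0.25, 0.5, 0.5))  # alleles are independent
```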


Subjects
Linkage Disequilibrium; Polymorphism, Single Nucleotide; Software; Algorithms; Databases, Genetic; Genotype; Haplotypes
9.
Neuron ; 37(2): 249-61, 2003 Jan 23.
Article in English | MEDLINE | ID: mdl-12546820

ABSTRACT

The Drosophila circadian oscillator consists of interlocked period (per)/timeless (tim) and Clock (Clk) transcriptional/translational feedback loops. Within these feedback loops, CLK and CYCLE (CYC) activate per and tim transcription at the same time as they repress Clk transcription, thus controlling the opposite cycling phases of these transcripts. CLK-CYC directly bind E box elements to activate transcription, but the mechanism of CLK-CYC-dependent repression is not known. Here we show that a CLK-CYC-activated gene, vrille (vri), encodes a repressor of Clk transcription, thereby identifying vri as a key negative component of the Clk feedback loop in Drosophila's circadian oscillator. The blue light photoreceptor encoding cryptochrome (cry) gene is also a target for VRI repression, suggesting a broader role for VRI in the rhythmic repression of output genes that cycle in phase with Clk.


Subjects
Circadian Rhythm/genetics; Drosophila Proteins; Drosophila/physiology; Trans-Activators/genetics; Transcription Factors/genetics; Transcription Factors/physiology; Animals; Animals, Genetically Modified; Binding Sites; Blotting, Western; CLOCK Proteins; DNA-Binding Proteins/biosynthesis; DNA-Binding Proteins/metabolism; Electrophoretic Mobility Shift Assay; Feedback/physiology; G-Box Binding Factors; Hot Temperature; Immunohistochemistry; Molecular Sequence Data; Nuclease Protection Assays; Photoreceptor Cells, Invertebrate/physiology; RNA, Messenger/biosynthesis; Transcription Factors/biosynthesis; Transcription Factors/metabolism
10.
BMC Bioinformatics ; 9: 238, 2008 May 16.
Article in English | MEDLINE | ID: mdl-18485205

ABSTRACT

BACKGROUND: Multifactor Dimensionality Reduction (MDR) has been introduced previously as a non-parametric statistical method for detecting gene-gene interactions. MDR performs a dimensional reduction by assigning multi-locus genotypes to either high- or low-risk groups and measuring the percentage of cases and controls incorrectly labelled by this classification - the classification error. The combination of variables that produces the lowest classification error is selected as the best or most fit model. The correctly and incorrectly labelled cases and controls can be expressed as a two-way contingency table. We sought to improve the ability of MDR to detect gene-gene interactions by replacing classification error with a different measure to score model quality. RESULTS: In this study, we compare the detection and power of MDR using a variety of measures for two-way contingency table analysis. We simulated 40 genetic models, varying the number of disease loci in the model (2-5), allele frequencies of the disease loci (0.2/0.8 or 0.4/0.6) and the broad-sense heritability of the model (0.05-0.3). Overall, detection using NMI was 65.36% across all models and specific detection was 59.4%, versus 62% detection and 52.2% specific detection using classification error. CONCLUSION: Of the 10 measures evaluated, the likelihood ratio and normalized mutual information (NMI) are measures that consistently improve the detection and power of MDR in simulated data over using classification error. These measures also reduce the inclusion of spurious variables in a multi-locus model. Thus, MDR, which has already been demonstrated as a powerful tool for detecting gene-gene interactions, can be improved with the use of alternative fitness functions.
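
The labelling step and the two scoring measures described in this abstract can be sketched as follows. This is an illustrative reading, not the published software: the case-to-control threshold of 1.0, the use of raw (unbalanced) classification error, and the normalization of mutual information by the entropy of true status are all assumptions, since the abstract does not give those details.

```python
import math
from collections import Counter


def mdr_table(genotypes, status, threshold=1.0):
    """Label each multi-locus genotype high or low risk, then cross-tabulate.

    genotypes: one tuple of genotype codes per subject
    status:    1 for a case, 0 for a control
    Returns (tp, fp, fn, tn), treating "high risk" as a predicted case.
    """
    cases, controls = Counter(), Counter()
    for genotype, s in zip(genotypes, status):
        (cases if s == 1 else controls)[genotype] += 1
    tp = fp = fn = tn = 0
    for genotype in set(cases) | set(controls):
        if cases[genotype] > threshold * controls[genotype]:
            tp += cases[genotype]
            fp += controls[genotype]
        else:
            fn += cases[genotype]
            tn += controls[genotype]
    return tp, fp, fn, tn


def classification_error(tp, fp, fn, tn):
    """Fraction of subjects whose risk label disagrees with their status."""
    return (fp + fn) / (tp + fp + fn + tn)


def _entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def normalized_mutual_information(tp, fp, fn, tn):
    """Mutual information between label and status, divided by H(status)."""
    h_true = _entropy([tp + fn, fp + tn])
    h_pred = _entropy([tp + fp, fn + tn])
    h_joint = _entropy([tp, fp, fn, tn])
    return (h_true + h_pred - h_joint) / h_true if h_true > 0 else 0.0


if __name__ == "__main__":
    genotypes = [(0, 0)] * 3 + [(1, 1)] * 3
    status = [1, 1, 0, 0, 0, 1]
    table = mdr_table(genotypes, status)
    print(table, classification_error(*table), normalized_mutual_information(*table))
```

Either score can then be used to rank candidate variable combinations: lower is better for classification error, higher is better for NMI.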


Subjects
Diagnostic Errors/statistics & numerical data; Information Storage and Retrieval/methods; Models, Genetic; Bias; Diagnostic Errors/classification; Gene Frequency; Gene Regulatory Networks; Genetic Markers; Genotype; Odds Ratio; Risk Assessment/methods; Sensitivity and Specificity; Statistics, Nonparametric; Weights and Measures
11.
Bioinformatics ; 22(17): 2173-4, 2006 Sep 01.
Article in English | MEDLINE | ID: mdl-16809395

ABSTRACT

Parallel multifactor dimensionality reduction is a tool for large-scale analysis of gene-gene and gene-environment interactions. The MDR algorithm was redesigned to allow an unlimited number of study subjects, total variables and variable states, and to remove restrictions on the order of interactions being analyzed. In addition, the algorithm is markedly more efficient, with approximately 150-fold decrease in runtime for equivalent analyses. To facilitate the processing of large datasets, the algorithm was made parallel. AVAILABILITY: Parallel MDR is freely available for non-commercial research institutions. For full details see http://chgr.mc.vanderbilt.edu/ritchielab/pMDR. An open-source version of MDR software is available at http://www.epistasis.org.


Subjects
Algorithms; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Protein Interaction Mapping/methods; Proteome/metabolism; Signal Transduction/physiology; Software; Computing Methodologies
12.
J Biol Rhythms ; 21(2): 93-103, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16603674

ABSTRACT

CLOCK (CLK) is a core component of the transcriptional feedback loops that comprise the circadian timekeeping mechanism in Drosophila. As a heterodimer with CYCLE (CYC), CLK binds E-boxes to activate the transcription of rhythmically expressed genes within and downstream of the circadian clock, but this activation unexpectedly occurs at times when CLK is at its lowest levels on Western blots. Recent studies demonstrate that CLK also regulates nonrhythmic gene expression and behaviors. Despite the critical roles CLK plays within and outside the circadian clock, its spatial expression pattern has not been characterized. Using a newly developed CLK antibody, the authors show that CLK is coexpressed with PERIOD (PER) in canonical oscillator cells throughout the head and body. In contrast to PER, however, the levels of CLK immunoreactivity do not cycle in intensity, CLK is detected primarily in the nucleus throughout the circadian cycle, and CLK is expressed in non-oscillator cells within the lateral and dorsal brain, including Kenyon cells, which mediate various forms of learning and memory. These results indicate that constitutive levels of nuclear CLK regulate rhythmic transcription in circadian oscillator cells and suggest that CLK contributes to other behavioral processes by regulating gene expression in non-oscillator cells.


Subjects
Drosophila Proteins/physiology; Gene Expression Regulation; Transcription Factors/physiology; Animals; Blotting, Western; Brain/metabolism; CLOCK Proteins; Cell Nucleus/metabolism; Circadian Rhythm; Drosophila; Drosophila Proteins/metabolism; Immunoblotting; Immunohistochemistry; Microscopy, Confocal; Microscopy, Fluorescence; Models, Biological; Nuclear Proteins/metabolism; Oscillometry; Period Circadian Proteins; Protein Binding; Time Factors; Transcription Factors/metabolism; Transcription, Genetic
13.
J Am Med Inform Assoc ; 24(3): 577-587, 2017 May 01.
Article in English | MEDLINE | ID: mdl-28040685

ABSTRACT

It is common that cancer patients have different molecular signatures even though they have similar clinical features, such as histology, due to the heterogeneity of tumors. To overcome this variability, we previously developed a new approach incorporating prior biological knowledge that identifies knowledge-driven genomic interactions associated with outcomes of interest. However, no systematic approach has been proposed to identify interaction models between pathways based on multi-omics data. Here we have proposed such a novel methodological framework, called metadimensional knowledge-driven genomic interactions (MKGIs). To test the utility of the proposed framework, we applied it to an ovarian cancer dataset including multi-omics profiles from The Cancer Genome Atlas to predict grade, stage, and survival outcome. We found that each knowledge-driven genomic interaction model, based on different genomic datasets, contains different sets of pathway features, which suggests that each genomic data type may contribute to outcomes in ovarian cancer via a different pathway. In addition, MKGI models significantly outperformed the single knowledge-driven genomic interaction model. From the MKGI models, many interactions between pathways associated with outcomes were found, including the mitogen-activated protein kinase (MAPK) signaling pathway and the gonadotropin-releasing hormone (GnRH) signaling pathway, which are known to play important roles in cancer pathogenesis. The beauty of incorporating biological knowledge into the model based on multi-omics data is the ability to improve diagnosis and prognosis and provide better interpretability. Thus, determining variability in molecular signatures based on these interactions between pathways may lead to better diagnostic/treatment strategies for better precision medicine.


Subjects
Genomics/methods; Models, Genetic; Ovarian Neoplasms/genetics; Adult; Aged; Aged, 80 and over; Datasets as Topic; Female; Gene Expression; Humans; Middle Aged; Ovarian Neoplasms/diagnosis; Prognosis
14.
Nat Commun ; 8(1): 1167, 2017 10 27.
Article in English | MEDLINE | ID: mdl-29079728

ABSTRACT

Genome-wide, imputed, sequence, and structural data are now available for exceedingly large sample sizes. The needs for data management, handling population structure and related samples, and performing associations have largely been met. However, the infrastructure to support analyses involving complexity beyond genome-wide association studies is not standardized or centralized. We provide the PLatform for the Analysis, Translation, and Organization of large-scale data (PLATO), a software tool equipped to handle multi-omic data for hundreds of thousands of samples to explore complexity using genetic interactions, environment-wide association studies and gene-environment interactions, phenome-wide association studies, as well as copy number and rare variant analyses. Using the data from the Marshfield Personalized Medicine Research Project, a site in the electronic Medical Records and Genomics Network, we apply each feature of PLATO to type 2 diabetes and demonstrate how PLATO can be used to uncover the complex etiology of common traits.


Subjects
Computational Biology; Genome, Human; Genome-Wide Association Study; Alcohol Drinking; Alleles; Databases, Genetic; Diabetes Mellitus, Type 2/genetics; Diet; Epistasis, Genetic; Gene Deletion; Gene Dosage; Gene-Environment Interaction; Genomics; Genotype; Glutamate Decarboxylase/genetics; Humans; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide; Programming Languages; Recurrence; Sequence Analysis, DNA; Software; Surveys and Questionnaires
15.
BioData Min ; 9: 18, 2016.
Article in English | MEDLINE | ID: mdl-27168765

ABSTRACT

BACKGROUND: The future of medicine is moving towards the phase of precision medicine, with the goal to prevent and treat diseases by taking inter-individual variability into account. A large part of the variability lies in our genetic makeup. With the fast-paced improvement of high-throughput methods for genome sequencing, a tremendous amount of genetic data has already been generated. The next hurdle for precision medicine is to have sufficient computational tools for analyzing large sets of data. Genome-Wide Association Studies (GWAS) have been the primary method to assess the relationship between single nucleotide polymorphisms (SNPs) and disease traits. While GWAS is sufficient in finding individual SNPs with strong main effects, it does not capture potential interactions among multiple SNPs. In many traits, a large proportion of variation remains unexplained by using main effects alone, leaving the door open for exploring the role of genetic interactions. However, identifying genetic interactions in large-scale genomics data poses a challenge even for modern computing. RESULTS: For this study, we present a new algorithm, Grammatical Evolution Bayesian Network (GEBN), which uses Bayesian Networks to identify interactions in the data and, at the same time, uses an evolutionary algorithm to reduce the computational cost associated with network optimization. GEBN excelled in simulation studies where the data contained main effects and interaction effects. We also applied GEBN to a Type 2 diabetes (T2D) dataset obtained from the Marshfield Personalized Medicine Research Project (PMRP). We were able to identify genetic interactions for T2D cases and controls and use information from those interactions to classify T2D samples. We obtained an average testing area under the curve (AUC) of 86.8%. We also identified several interacting genes such as INADL and LPP that are known to be associated with T2D.
CONCLUSIONS: Developing the computational tools to explore genetic associations beyond main effects remains a critically important challenge in human genetics. Methods, such as GEBN, demonstrate the utility of considering genetic interactions, as they likely explain some of the missing heritability.

16.
Neurobiol Aging ; 38: 141-150, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26827652

ABSTRACT

Late-onset Alzheimer disease (AD) has a complex genetic etiology, involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified interaction from an endophenotype analysis between RYR3 × CACNA1C. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and implicate CDH23 which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions in this article highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and identify novel molecular mechanisms of AD pathogenesis.


Subjects
Alzheimer Disease/genetics; Datasets as Topic; Epistasis, Genetic/genetics; Genetic Association Studies; ATP Binding Cassette Transporter, Subfamily B/genetics; Cadherin Related Proteins; Cadherins/genetics; Calcium Channels, L-Type/genetics; Disease Progression; Female; Humans; Male; Models, Genetic; Phosphatidylethanolamine Binding Protein/genetics; Polymorphism, Single Nucleotide; Receptors, Adrenergic, alpha-1/genetics; Receptors, N-Methyl-D-Aspartate/genetics; Risk; Ryanodine Receptor Calcium Release Channel/genetics; Saposins/genetics; Sirtuin 1/genetics
17.
Pac Symp Biocomput ; : 96-107, 2015.
Article in English | MEDLINE | ID: mdl-25592572

ABSTRACT

Enormous efforts in whole exome and genome sequencing of hundreds to thousands of patients have provided the landscape of somatic genomic alterations in many cancer types, making it possible to distinguish between driver mutations and passenger mutations. Driver mutations show strong associations with cancer clinical outcomes such as survival. However, due to the heterogeneity of tumors, somatic mutation profiles are exceptionally sparse, whereas other types of genomic data such as miRNA or gene expression contain much more complete data, with quantitative values measured for all genomic features in each patient. To overcome the extreme sparseness of somatic mutation profiles and allow for the discovery of combinations of somatic mutations that may predict cancer clinical outcomes, here we propose a new approach for binning somatic mutations based on existing biological knowledge. Through the analysis of a renal cell carcinoma dataset from The Cancer Genome Atlas (TCGA), we identified combinations of somatic mutation burden, based on pathways, protein families, evolutionarily conserved regions, and regulatory regions, that were associated with survival. Due to the heterogeneous nature of cancer, a binning strategy for somatic mutation profiles based on biological knowledge will be valuable for improving prognostic biomarkers and potentially for tailoring therapeutic strategies by identifying combinations of driver mutations.


Subjects
Carcinoma, Renal Cell/genetics , Kidney Neoplasms/genetics , Mutation , Biomarkers, Tumor/genetics , Carcinoma, Renal Cell/mortality , Computational Biology , Databases, Genetic , Humans , Kidney Neoplasms/mortality , Models, Genetic , Neural Networks, Computer , Prognosis , Survival Analysis
18.
Pac Symp Biocomput ; : 495-505, 2015.
Article in English | MEDLINE | ID: mdl-25741542

ABSTRACT

Investigating the association between biobank-derived genomic data and the information in linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, cataract cases and controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 527,953 and 527,936 single nucleotide polymorphisms (SNPs) for gene-gene and gene-environment analyses, respectively, with minor allele frequency > 1%, in order to explore higher-level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 13 statistically significant SNP-SNP models with an interaction p-value < 1 × 10⁻⁴, as well as an overall model p-value < 0.01, associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have previously been associated with the formation of cataracts. We found a total of 782 gene-environment models that exhibit an interaction with a p-value < 1 × 10⁻⁴ associated with cataract status. Our results show that these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR-based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex diseases and outcomes such as cataracts.


Subjects
Cataract/genetics , Algorithms , Biological Specimen Banks , Case-Control Studies , Computational Biology , Databases, Genetic , Electronic Health Records , Epistasis, Genetic , Gene-Environment Interaction , Genome-Wide Association Study , Humans , Phenotype , Polymorphism, Single Nucleotide , Software
19.
BioData Min ; 7: 20, 2014.
Article in English | MEDLINE | ID: mdl-25214892

ABSTRACT

BACKGROUND: Effective prediction of cancer clinical outcomes, aimed at understanding the mechanisms of various types of cancer, has been pursued using molecular data such as gene expression profiles, an approach that shows promise for providing better diagnostics and supporting further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, outcome prediction from single-gene expression is limited for cancer evaluation, since genes do not act in isolation but rather interact with other genes in complex signaling or regulatory networks. In addition, since pathways are more likely to cooperate, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner. METHODS: We propose a novel approach for identifying knowledge-driven genomic interactions and applying them to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). To demonstrate the utility of the proposed approach, an ovarian cancer dataset from The Cancer Genome Atlas (TCGA) was used for predicting clinical stage as a pilot project. RESULTS: We identified knowledge-driven genomic interactions associated with cancer stage not only from single knowledge bases, such as sources of pathway-pathway interaction, but also across different sets of knowledge bases, such as pathway-protein family interactions, by integrating different types of information. Notably, an integration model from different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models built with gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge. CONCLUSIONS: The success of the pilot study presented herein will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding the underlying tumorigenesis and progression in ovarian cancer through a global view of interactions within and between different biological knowledge sources has the potential to provide more effective screening strategies and therapeutic targets for many types of cancer.

20.
Pac Symp Biocomput ; : 200-11, 2014.
Article in English | MEDLINE | ID: mdl-24297547

ABSTRACT

Environment-wide association studies (EWAS) provide a way to uncover the environmental mechanisms involved in complex traits in a high-throughput manner. Genome-wide association studies have led to the discovery of genetic variants associated with many common diseases but do not take into account the environmental component of complex phenotypes. This EWAS assesses the comprehensive association between environmental variables and the outcome of type 2 diabetes (T2D) in the Marshfield Personalized Medicine Research Project Biobank (Marshfield PMRP). We sought replication in two National Health and Nutrition Examination Surveys (NHANES). The Marshfield PMRP currently uses four tools for measuring environmental exposures and outcome traits: 1) the PhenX Toolkit, which includes standardized exposure and phenotypic measures across several domains; 2) the Diet History Questionnaire (DHQ), a food frequency questionnaire; 3) the Measurement of a Person's Habitual Physical Activity, which scores the level of an individual's physical activity; and 4) electronic health records (EHRs), to which validated algorithms are applied to establish T2D case-control status. Using PLATO software, 314 environmental variables were tested for association with T2D using logistic regression, adjusting for sex, age, and BMI, in over 2,200 European Americans. When available, similar variables were tested with the same methods and adjustment in samples from NHANES III and NHANES 1999-2002. Twelve and 31 associations were identified in the Marshfield samples at p<0.01 and p<0.05, respectively. Seven and 13 measures replicated in at least one of the NHANES at p<0.01 and p<0.05, respectively, with the same direction of effect. The most significant environmental exposures associated with T2D status included decreased alcohol use as well as increased smoking exposure in childhood and adulthood. The results demonstrate the utility of the EWAS method and survey tools for identifying environmental components of complex diseases like type 2 diabetes. These high-throughput and comprehensive investigation methods can easily be applied to investigate the relation between environmental exposures and multiple phenotypes in future analyses.


Subjects
Diabetes Mellitus, Type 2/etiology , Environment , Biological Specimen Banks , Computational Biology , Diet Records , Environmental Exposure , Female , Gene-Environment Interaction , Humans , Male , Motor Activity , Nutrition Surveys , Phenotype , Precision Medicine , Software , Wisconsin