Results 1-14 of 14
1.
BMC Plant Biol ; 20(1): 120, 2020 Mar 18.
Article in English | MEDLINE | ID: mdl-32183694

ABSTRACT

BACKGROUND: Potato is the third most consumed crop in the world. Breeding for traits such as yield, product quality and pathogen resistance is a main priority. Identifying molecular signatures of these and other important traits will aid future breeding efforts. In this study, a progeny population from a cross between a breeding line, SW93-1015, and a cultivar, Désirée, was studied by trait analysis and RNA-seq in order to understand segregating traits at the molecular level and to identify transcripts whose expression correlates with these traits. Transcript markers with predictive value for field performance applicable under controlled environments would be of great value for plant breeding. RESULTS: A total of 34 progeny lines from SW93-1015 and Désirée were phenotyped for 17 different traits in a field under Nordic climate conditions and in controlled climate settings. A master transcriptome was constructed with all 34 progeny lines and the parents through a de novo assembly of RNA-seq reads. Gene expression data obtained in a controlled environment from the 34 lines were correlated to traits by different similarity indices, including Pearson and Spearman, as well as DUO, which calculates the co-occurrence between high and low values for gene expression and trait. Our study linked transcripts to traits such as yield, growth rate, high-laying tubers, late and tuber blight, tuber greening and early flowering. We found several transcripts associated with late blight resistance, and transcripts encoding receptors were associated with Dickeya solani susceptibility. Transcript levels of a UBX-domain protein were negatively associated with yield, and a GLABRA2 expression modulator was negatively associated with growth rate. CONCLUSION: In our study, we identify hundreds of transcripts putatively linked, based on expression, with 17 traits of potato, representing both well-known and novel associations. This approach can be used to link the transcriptome to traits.
We explore the possibility of associating transcript expression levels measured in controlled, optimal environments with traits in a progeny population using different methods, introducing the application of DUO to transcriptome data for the first time. We verify the expression pattern for five of the putative transcript markers in another progeny population.
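The abstract describes DUO only as measuring the co-occurrence of high and low values of expression and trait. As a rough illustration of that idea (not the published DUO formula, which this listing does not give), the sketch below splits each vector at its median into "high" and "low" and reports the fraction of lines in each of the four high/low combinations. The median split, the function name `high_low_cooccurrence` and the example values are assumptions.

```python
from statistics import median


def high_low_cooccurrence(expression, trait):
    """Simplified high/low co-occurrence between two equal-length vectors.

    Illustrative stand-in for DUO (an assumption, not the published metric):
    each vector is split at its median (values above the median are "high",
    the rest "low"), and the fraction of samples in each combination is returned.
    """
    if len(expression) != len(trait) or not expression:
        raise ValueError("vectors must be non-empty and of equal length")
    e_mid, t_mid = median(expression), median(trait)
    counts = {"HH": 0, "HL": 0, "LH": 0, "LL": 0}
    for e, t in zip(expression, trait):
        key = ("H" if e > e_mid else "L") + ("H" if t > t_mid else "L")
        counts[key] += 1
    n = len(expression)
    return {k: v / n for k, v in counts.items()}


# Hypothetical data: expression of one transcript and a yield value for 6 lines.
expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
yield_ = [12.0, 10.0, 11.0, 20.0, 22.0, 21.0]
print(high_low_cooccurrence(expr, yield_))  # {'HH': 0.5, 'HL': 0.0, 'LH': 0.0, 'LL': 0.5}
```

Unlike Pearson correlation, a count-based view like this records *which* combination co-occurs (high-high versus high-low), which is what lets a negative association such as the UBX-domain transcript and yield show up as an excess of HL/LH pairs.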


Subjects
Life History Traits; Phenotype; Solanum tuberosum/genetics; Transcriptome; Tetraploidy
2.
Genet Epidemiol ; 38(7): 610-21, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25168954

ABSTRACT

Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that addresses genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3-step process to identify candidate multi-SNP patterns: (1) pairwise (SNP-SNP) correlations are computed using CCC; (2) clusters of SNPs correlated in this way are identified; and (3) the frequencies of these clusters in disease cases and controls are compared to identify disease-associated multi-SNP patterns. This method identified 42 candidate multi-SNP associations with hypertensive heart disease (HHD), among which one cluster of 22 SNPs (six genes) included 13 in SLC8A1 (aka NCX1, an essential component of cardiac excitation-contraction coupling) and another of 32 SNPs had 29 from a different segment of SLC8A1. While allele frequencies show little difference between cases and controls, the cluster of 22 associated alleles was found in 20% of controls but no cases, and the other in 3% of controls but 20% of cases. These results suggest that both protective and risk effects on HHD could be exerted by combinations of variants in different regions of SLC8A1, modified by variants from other genes. The results demonstrate that this new correlation metric identifies disease-associated multi-SNP patterns overlooked by commonly used correlation measures. Furthermore, computation time using CCC is a small fraction of that required by other methods, thereby enabling the analysis of large GWAS datasets.
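Step (3) compares how often a whole cluster of alleles co-occurs in cases versus controls. The sketch below shows only that counting step; it does not implement CCC itself, whose formula the abstract does not give. Representing each individual as a set of "SNP:allele" labels, and the function names, are assumptions for illustration.

```python
def carrier_frequency(individuals, cluster):
    """Fraction of individuals whose allele set contains every allele in `cluster`."""
    if not individuals:
        raise ValueError("need at least one individual")
    carriers = sum(1 for alleles in individuals if cluster <= alleles)
    return carriers / len(individuals)


def compare_cluster(cases, controls, cluster):
    """Return (frequency in cases, frequency in controls) for one multi-SNP cluster."""
    return carrier_frequency(cases, cluster), carrier_frequency(controls, cluster)


# Hypothetical allele labels of the form "SNP:allele".
cluster = frozenset({"snp1:A", "snp2:G", "snp3:T"})
cases = [
    {"snp1:A", "snp2:G"},
    {"snp2:G", "snp3:T"},
    {"snp1:A", "snp3:C"},
    {"snp1:C", "snp2:G", "snp3:T"},
]
controls = [
    {"snp1:A", "snp2:G", "snp3:T"},
    {"snp1:A", "snp2:G", "snp3:T", "snp4:A"},
    {"snp1:A", "snp2:C"},
    {"snp2:G", "snp3:T"},
]
print(compare_cluster(cases, controls, cluster))  # (0.0, 0.5)
```

Note that every single allele in this toy cluster is common in both groups; only the joint pattern separates them, which is the situation the abstract describes for the 22-allele SLC8A1 cluster.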


Subjects
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Algorithms; Case-Control Studies; Cluster Analysis; Computer Simulation; Epistasis, Genetic; Gene Frequency; Genetic Predisposition to Disease; Genome, Human; Genotype; Heart Diseases/genetics; Humans; Models, Genetic
3.
PLoS Comput Biol ; 10(9): e1003766, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25233071

ABSTRACT

Hundreds of genetic markers have shown associations with various complex diseases, yet the "missing heritability" remains alarmingly elusive. Combinatorial interactions may account for a substantial portion of this missing heritability, but their discovery has been impeded by computational complexity and genetic heterogeneity. We present BlocBuster, a novel systems-level approach that efficiently constructs genome-wide, allele-specific networks that accurately segregate homogeneous combinations of genetic factors, tests the associations of these combinations with the given phenotype, and rigorously validates the results using a series of unbiased validation methods. BlocBuster employs a correlation measure that is customized for single nucleotide polymorphisms and returns a multi-faceted collection of values that captures genetic heterogeneity. We applied BlocBuster to analyze psoriasis, discovering a combinatorial pattern with an odds ratio of 3.64 and a Bonferroni-corrected p-value of 5.01×10⁻¹⁶. This pattern was replicated in independent data, reflecting the robustness of the method. In addition to improving prediction of disease susceptibility and broadening our understanding of the pathogenesis underlying psoriasis, these results demonstrate BlocBuster's potential for discovering combinatorial genetic associations within heterogeneous genome-wide data, thereby transcending the limiting "small effects" produced by individual markers examined in isolation.
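To make the two reported statistics concrete, the sketch below computes an odds ratio for carrying a pattern (cases vs. controls) from a 2x2 table and applies a Bonferroni adjustment. The counts are hypothetical and unrelated to the psoriasis data; BlocBuster's own testing and validation procedure is not reproduced here.

```python
def odds_ratio(case_carrier, case_noncarrier, control_carrier, control_noncarrier):
    """Odds ratio from a 2x2 table: (a * d) / (b * c)."""
    if case_noncarrier == 0 or control_carrier == 0:
        raise ValueError("odds ratio undefined with a zero in the denominator")
    return (case_carrier * control_noncarrier) / (case_noncarrier * control_carrier)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: 40 of 100 cases and 20 of 140 controls carry the pattern.
print(odds_ratio(40, 60, 20, 120))  # 4.0
print(bonferroni(1e-5, 1000))       # about 0.01
```

Bonferroni is conservative; for genome-wide pattern searches the number of tests `n_tests` is what drives the correction, so it should count every pattern that was evaluated, not only those reported.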


Subjects
Genetic Markers/genetics; Genome-Wide Association Study/methods; Psoriasis/genetics; Alleles; Computational Biology; Humans; Polymorphism, Single Nucleotide/genetics
4.
Front Comput Neurosci ; 18: 1388504, 2024.
Article in English | MEDLINE | ID: mdl-39309755

ABSTRACT

Late-onset Alzheimer disease (AD) is a highly complex disease with multiple subtypes, as demonstrated by its disparate risk factors, pathological manifestations, and clinical traits. Discovery of biomarkers to diagnose specific AD subtypes is a key step towards understanding biological mechanisms underlying this enigmatic disease, generating candidate drug targets, and selecting participants for drug trials. Popular statistical methods for evaluating candidate biomarkers, fold change (FC) and area under the receiver operating characteristic curve (AUC), were designed for homogeneous data and we demonstrate the inherent weaknesses of these approaches when used to evaluate subtypes representing less than half of the diseased cases. We introduce a unique evaluation metric that is based on the distribution of the values, rather than the magnitude of the values, to identify analytes that are associated with a subset of the diseased cases, thereby revealing potential biomarkers for subtypes. Our approach, Bimodality Coefficient Difference (BCD), computes the difference between the degrees of bimodality for the cases and controls. We demonstrate the effectiveness of our approach with large-scale synthetic data trials containing nearly perfect subtypes. In order to reveal novel AD biomarkers for heterogeneous subtypes, we applied BCD to gene expression data for 8,650 genes for 176 AD cases and 187 controls. Our results confirm the utility of BCD for identifying subtypes of heterogeneous diseases.
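Assuming BCD builds on the standard sample bimodality coefficient (BC), computed from bias-corrected skewness and excess kurtosis (the abstract does not spell out the formula), a minimal sketch of the case-minus-control difference looks like this. The example vectors are hypothetical.

```python
import math


def bimodality_coefficient(values):
    """Sample bimodality coefficient from bias-corrected skewness and excess kurtosis.

    BC = (G1**2 + 1) / (G2 + 3 * (n - 1)**2 / ((n - 2) * (n - 3)))
    Values above 5/9 (about 0.555) are conventionally read as suggesting bimodality.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("values must not be constant")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5                # population skewness
    g2 = m4 / m2 ** 2 - 3              # population excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def bcd(cases, controls):
    """Bimodality coefficient of the cases minus that of the controls."""
    return bimodality_coefficient(cases) - bimodality_coefficient(controls)


# Hypothetical expression of one gene: controls unimodal, cases split into two groups.
controls = [1, 2, 2, 3, 3, 3, 4, 4, 5]
cases = [1.0] * 10 + [9.0] * 10
print(round(bcd(cases, controls), 3))
```

Because BC depends on the shape of the distribution rather than on mean differences, a subtype that shifts only part of the cases can produce a large BCD even when fold change between group means is modest.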

5.
HGG Adv ; 4(1): 100150, 2023 01 12.
Article in English | MEDLINE | ID: mdl-36340933

ABSTRACT

The heritability of autism spectrum disorder (ASD), estimated from 680,000 families across five countries, is nearly 80%, yet heritability reported from SNP-based studies is consistently lower, and few significant loci have been identified with genome-wide association studies. This gap in genomic information may reside in rare variants, interactions among variants (epistasis), or cryptic structural variation (SV), and may provide mechanisms that underlie ASD. Here we use a method to identify potential SVs based on non-Mendelian inheritance patterns in pedigrees using parent-child genotypes from ASD families and demonstrate that they are enriched in ASD-risk genes. Most are in non-coding genic space and are over-represented in expression quantitative trait loci, suggesting that they affect gene regulation, which we confirm by their overlap with differentially expressed genes in postmortem brain tissue of ASD individuals. We then identify an SV in the GRIK2 gene that alters RNA splicing and a regulatory region of the ACMSD gene in the kynurenine pathway as significantly associated with a non-verbal ASD phenotype, supporting our hypothesis that these currently excluded loci can provide a clearer mechanistic understanding of ASD. Finally, we use an explainable artificial intelligence approach to define subgroups, demonstrating their use in the context of precision medicine.
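The method rests on genotypes that violate Mendelian inheritance in parent-child trios. For example, a parent carrying a deletion on one chromosome can be called homozygous for the allele on the other chromosome, so a child who inherits the deletion appears to lack any allele that parent could have transmitted. The sketch below only flags such violations at biallelic SNPs coded as alternate-allele counts (0, 1, 2); it is an illustration under that coding assumption, not the authors' pipeline, which must also separate genotyping error from real structural variation.

```python
def transmissible(genotype):
    """Alt-allele counts (0 or 1) a parent with `genotype` (0, 1 or 2 alt alleles) can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(mother, father, child):
    """True if the child's genotype can arise from one allele of each parent."""
    possible = {m + f for m in transmissible(mother) for f in transmissible(father)}
    return child in possible


def flag_non_mendelian(trios):
    """Indices of SNPs whose (mother, father, child) genotypes break Mendelian rules."""
    return [i for i, (m, f, c) in enumerate(trios) if not is_mendelian_consistent(m, f, c)]


# Hypothetical trio genotypes at five SNPs (alt-allele counts).
trios = [(0, 1, 1), (2, 2, 1), (0, 0, 0), (1, 2, 0), (1, 1, 2)]
print(flag_non_mendelian(trios))  # [1, 3]
```

A single flagged SNP is weak evidence; runs of adjacent violations in the same pedigrees are what would point to a candidate structural variant.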


Subjects
Autism Spectrum Disorder; Humans; Autism Spectrum Disorder/genetics; Genome-Wide Association Study/methods; Artificial Intelligence; Quantitative Trait Loci/genetics; Inheritance Patterns/genetics
6.
iScience ; 26(4): 106408, 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-36974157

ABSTRACT

Identification of proteins dysregulated by COVID-19 infection is critically important for better understanding its pathophysiology, building prognostic models, and identifying new targets. Plasma proteomic profiling of 4,301 proteins was performed in two independent datasets, and the proteins were tested for association with three COVID-19 outcomes (infection, ventilation, and death). We identified 1,449 proteins consistently associated in both datasets with any of these three outcomes. We subsequently created highly accurate models that distinctively predict infection, ventilation, and death. These proteins were enriched in specific biological processes, including cytokine signaling, Alzheimer's disease, and coronary artery disease. Mendelian randomization and gene network analyses identified eight causal proteins and 141 highly connected hub proteins, including 35 with known drug targets. Our findings provide distinctive prognostic biomarkers for two severe COVID-19 outcomes, reveal their relationship to Alzheimer's disease and coronary artery disease, and identify potential therapeutic targets for COVID-19 outcomes.
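Mendelian randomization treats genetic variants (here, protein QTLs) as instruments for a protein's level. The sketch below shows the textbook Wald ratio for a single instrument and the fixed-effect inverse-variance weighted (IVW) combination across several instruments. The summary statistics are hypothetical, and the abstract does not state which MR estimator the authors used.

```python
def wald_ratio(beta_outcome, beta_exposure):
    """Causal estimate from one instrument: SNP effect on outcome / SNP effect on exposure."""
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across several instruments."""
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den


# Hypothetical pQTL summary statistics for one protein (two instruments).
print(wald_ratio(0.2, 0.5))
print(ivw_estimate([0.5, 1.0], [0.2, 0.4], [0.1, 0.1]))
```

These estimators assume valid instruments (the variant affects the outcome only through the protein); checking that assumption, for instance against pleiotropy, is where most of the real analytic effort goes.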

7.
medRxiv ; 2022 Jul 25.
Article in English | MEDLINE | ID: mdl-35923315

ABSTRACT

Identification of the plasma proteomic changes of Coronavirus disease 2019 (COVID-19) is essential to understanding the pathophysiology of the disease and developing predictive models and novel therapeutics. We performed plasma deep proteomic profiling of 332 COVID-19 patients and 150 controls and pursued replication in an independent cohort (297 cases and 76 controls) to find potential biomarkers and causal proteins for three COVID-19 outcomes (infection, ventilation, and death). We identified and replicated 1,449 proteins associated with any of the three outcomes (841 for infection, 833 for ventilation, and 253 for death) that can be queried on a web portal (https://covid.proteomics.wustl.edu/). Using those proteins and machine learning approaches, we created and validated specific prediction models for ventilation (AUC > 0.91), death (AUC > 0.95) and either outcome (AUC > 0.80). These proteins were also enriched in specific biological processes, including immune and cytokine signaling (FDR ≤ 3.72×10⁻¹⁴), Alzheimer's disease (FDR ≤ 5.46×10⁻¹⁰) and coronary artery disease (FDR ≤ 4.64×10⁻²). Mendelian randomization using pQTLs as instrumental variables nominated BCAT2 and GOLM1 as causal proteins for COVID-19. Causal gene network analyses identified 141 highly connected key proteins, of which 35 have known drug targets with FDA-approved compounds. Our findings provide distinctive prognostic biomarkers for two severe COVID-19 outcomes (ventilation and death), reveal their relationship to Alzheimer's disease and coronary artery disease, and identify potential therapeutic targets for COVID-19 outcomes.
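The AUC values quoted above can be read as the probability that a randomly chosen patient with the outcome receives a higher model score than a randomly chosen patient without it. A minimal pairwise (Mann-Whitney) sketch of that quantity, with hypothetical scores, is below; it is O(n·m) and meant for illustration, not the study's evaluation code.

```python
def auc(case_scores, control_scores):
    """ROC AUC as P(random case outscores random control); ties count as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks: 8 of the 9 case-control pairs are ranked correctly.
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so "AUC > 0.95" for death means nearly every death/survivor pair was ordered correctly by the model.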

8.
Patterns (N Y) ; 2(5): 100260, 2021 May 14.
Article in English | MEDLINE | ID: mdl-33880455

ABSTRACT

The conundrums of choosing candidate genes, via differential expression between treated and mock specimens, are tackled by Ghandikota et al. in this issue of Patterns in their efforts to tease out genetic patterns that are characteristic of coronavirus disease 2019 (COVID-19) outcomes.

9.
Patterns (N Y) ; 2(12): 100374, 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34950902

ABSTRACT

Network modeling transforms data into a structure of nodes and edges such that edges represent relationships between pairs of objects, then extracts clusters of densely connected nodes in order to capture high-dimensional relationships hidden in the data. This efficient and flexible strategy holds potential for unveiling complex patterns concealed within massive datasets, but standard implementations overlook several key issues that can undermine research efforts. These issues range from data imputation and discretization to correlation metrics, clustering methods, and validation of results. Here, we enumerate these pitfalls and provide practical strategies for alleviating their negative effects. These guidelines increase prospects for future research endeavors as they reduce type I and type II (false-positive and false-negative) errors and are generally applicable for network modeling applications across diverse domains.
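The basic pipeline described above (pairwise relationships become edges, then clusters are extracted) can be sketched as follows. Pearson correlation, the 0.8 threshold and connected components are placeholders chosen for brevity; the abstract's point is that each of these choices (correlation metric, thresholds, clustering method) can introduce false positives or false negatives, so they should not be read as recommended settings.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def build_edges(variables, threshold):
    """Edges (i, j) between variables whose |Pearson r| meets the threshold."""
    return [(i, j) for i, j in combinations(range(len(variables)), 2)
            if abs(pearson(variables[i], variables[j])) >= threshold]


def connected_components(n_nodes, edges):
    """Clusters as connected components (the simplest possible clustering)."""
    adjacency = {i: set() for i in range(n_nodes)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, clusters = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(component))
    return clusters


# Hypothetical data: three variables measured on four samples.
data = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
edges = build_edges(data, threshold=0.8)
print(edges, connected_components(len(data), edges))  # [(0, 1)] [[0, 1], [2]]
```

Connected components merge any two clusters joined by a single spurious edge, which is one concrete example of the clustering pitfall the article warns about; density-aware methods such as Markov clustering are less sensitive to this.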

10.
Bioinformatics ; 25(1): 68-74, 2009 Jan 01.
Article in English | MEDLINE | ID: mdl-18987010

ABSTRACT

MOTIVATION: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first and most critical step is the ascertainment of a biologically sound model to be optimized. Many models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution. RESULTS: This article examines the parsimony of haplotypes using known haplotypes as well as genotypes from the HapMap project. Our study reveals that there are relatively few unique haplotypes, but not always the fewest possible, for the datasets with known solutions. Furthermore, we show that there are frequently very large numbers of parsimonious solutions, and the number increases exponentially with increasing cardinality. Moreover, these solutions are quite varied, and most of them are not consistent with the true solutions. These results quantify the limitations of the Pure Parsimony model and demonstrate the imperative need to consider additional properties for haplotype inference models. At a higher level, and with broad applicability, this article illustrates the power of combinatorial methods to tease out imperfections in a given biological model.
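To see why Pure Parsimony can admit many optimal answers, the brute-force sketch below enumerates every haplotype pair consistent with each genotype and keeps all resolutions that use the fewest distinct haplotypes. It is exponential and usable only on toy inputs; the 0/1/2 genotype encoding is an assumption, and the code is not the combinatorial method used in the article.

```python
from itertools import product


def resolutions(genotype):
    """All unordered haplotype pairs explaining a genotype.

    Encoding (an assumption): 0 and 1 are homozygous sites, 2 is heterozygous.
    A genotype with h heterozygous sites has 2**(h-1) resolutions (1 if h == 0).
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for site, bit in zip(het, bits):
            h1[site], h2[site] = bit, 1 - bit
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def pure_parsimony(genotypes):
    """Brute-force Pure Parsimony: fewest distinct haplotypes and every optimal solution."""
    best, solutions = None, []
    for choice in product(*(resolutions(g) for g in genotypes)):
        distinct = {h for pair in choice for h in pair}
        if best is None or len(distinct) < best:
            best, solutions = len(distinct), [choice]
        elif len(distinct) == best:
            solutions.append(choice)
    return best, solutions


# One doubly heterozygous genotype already has two equally parsimonious answers.
best, sols = pure_parsimony([(2, 2)])
print(best, len(sols))  # 2 2
```

Adding the homozygous genotype `(0, 0)` breaks the tie in favour of the resolution that reuses haplotype `(0, 0)`, a small instance of how the count of optimal solutions depends on the rest of the dataset.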


Subjects
Haplotypes; Nature; Databases, Genetic; Heterozygote; Humans
11.
Methods Mol Biol ; 2096: 197-215, 2020.
Article in English | MEDLINE | ID: mdl-32720156

ABSTRACT

We demonstrate a selection of network and machine learning techniques useful in the analysis of complex datasets, including 2-way similarity networks, Markov clustering, enrichment statistical networks, FCROS differential analysis, and random forests. We demonstrate each of these techniques on the Populus trichocarpa gene expression atlas.
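Of the techniques listed, Markov clustering is compact enough to sketch. The version below follows the usual MCL recipe (add self-loops, normalize columns, then alternate expansion and inflation until the matrix stops changing) on a small dense matrix in pure Python. The parameter defaults and the attractor-based cluster readout are common choices, not settings taken from the chapter, and a real analysis would use an established MCL implementation on sparse matrices.

```python
def _normalize_columns(m):
    """Scale each column to sum to 1 (columns are random-walk transition probabilities)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Minimal Markov clustering (MCL) on a small symmetric adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, as is customary in MCL, then normalize columns.
    m = _normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                            for i in range(n)])
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):
            expanded = _matmul(expanded, m)      # expansion: longer random walks
        # Inflation: raise entries to a power and renormalize, sharpening strong flows.
        inflated = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; the columns they hold form one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n) if any(v > 1e-6 for v in m[i])}
    return sorted(sorted(c) for c in clusters)


# Hypothetical network: two triangles joined by a single edge (2-3).
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(bridged))
```

Higher `inflation` values generally yield more, smaller clusters; on a real expression atlas this parameter is usually tuned rather than fixed.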


Subjects
Databases as Topic; Gene Regulatory Networks; Populus/genetics; Algorithms; Cluster Analysis; Gene Expression Regulation, Plant; Software
12.
G3 (Bethesda) ; 6(5): 1251-66, 2016 05 03.
Article in English | MEDLINE | ID: mdl-26921301

ABSTRACT

The well-documented latitudinal clines of genes affecting human skin color presumably arise from the need for protection from intense ultraviolet radiation (UVR) vs. the need to use UVR for vitamin D synthesis. Sampling 751 subjects from a broad range of latitudes and skin colors, we investigated possible multilocus correlated adaptation of skin color genes with the vitamin D receptor gene (VDR), using a vector correlation metric and network method called BlocBuster. We discovered two multilocus networks involving VDR promoter and skin color genes that display strong latitudinal clines as multilocus networks, even though many of their single gene components do not. Considered one by one, the VDR components of these networks show diverse patterns: no cline, a weak declining latitudinal cline outside of Africa, and a strong in- vs. out-of-Africa frequency pattern. We confirmed these results with independent data from HapMap. Standard linkage disequilibrium analyses did not detect these networks. We applied BlocBuster across the entire genome, showing that our networks are significant outliers for interchromosomal disequilibrium that overlap with environmental variation relevant to the genes' functions. These results suggest that these multilocus correlations most likely arose from a combination of parallel selective responses to a common environmental variable and coadaptation, given the known Mendelian epistasis among VDR and the skin color genes.
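For contrast with the network approach, the "standard linkage disequilibrium analyses" mentioned above are built on pairwise statistics such as D and r². The sketch below computes both for two biallelic loci from phased haplotypes; it illustrates the conventional pairwise measure and is not BlocBuster's vector correlation metric. The 0/1 allele coding and the example haplotypes are assumptions.

```python
def linkage_disequilibrium(haplotypes):
    """D and r^2 between two biallelic loci from phased haplotypes coded (0/1, 0/1)."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 undefined when a locus is monomorphic")
    return d, d * d / denom


# Hypothetical haplotypes in which allele 1 at A always travels with allele 1 at B.
print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))  # (0.25, 1.0)
```

Because D and r² look at one pair of loci at a time, a signal spread thinly across many loci (each pair only weakly associated) can stay below any pairwise threshold, consistent with the abstract's report that standard LD analyses missed these networks.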


Subjects
Altitude; Gene-Environment Interaction; Receptors, Calcitriol/genetics; Skin Pigmentation/genetics; Adaptation, Biological/genetics; Alleles; Computational Biology/methods; Epistasis, Genetic; Gene Frequency; Gene Regulatory Networks; Genetic Linkage; Genome, Human; Genomics/methods; Genotype; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide
13.
Front Genet ; 6: 301, 2015.
Article in English | MEDLINE | ID: mdl-26500678

ABSTRACT

The substantial progress in the last few years toward uncovering genetic causes and risk factors for autism spectrum disorders (ASDs) has opened new experimental avenues for identifying the underlying neurobiological mechanisms of the condition. The bounty of genetic findings has led to a variety of data-driven exploratory analyses aimed at deriving new insights about the shared features of these genes. These approaches leverage data from a variety of sources, such as co-expression in transcriptomic studies, protein-protein interaction networks, Gene Ontology (GO) annotations, or multi-level combinations of all of these. Here, we review the recurrent themes emerging from these analyses and highlight some of the challenges going forward. Themes include findings that ASD-associated genes discovered by a variety of methods have been shown to contain disproportionate numbers of neurite outgrowth/cytoskeletal, synaptic, and, more recently, Wnt-related and chromatin-modifying genes. Expression studies have highlighted disproportionate expression of ASD gene sets during mid-fetal cortical development, particularly for rare variants, with multiple analyses also highlighting the striatum, cortical projection neurons and interneurons. While these explorations have highlighted potentially interesting relationships among these ASD-related genes, there are challenges in how best to translate these insights into empirically testable hypotheses. Nonetheless, defining shared molecular or cellular pathology downstream of the diverse genes associated with ASDs could provide the cornerstones needed to build toward broadly applicable therapeutic approaches.
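Claims that ASD genes contain "disproportionate" numbers of, for example, synaptic genes are usually backed by an over-representation test. One common choice, assumed here rather than taken from the review, is the one-sided hypergeometric test sketched below; all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided hypergeometric P(X >= overlap).

    population: genes in the background set
    annotated:  background genes carrying the annotation (e.g. a synaptic GO term)
    selected:   genes in the ASD gene set
    overlap:    ASD genes that carry the annotation
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    hits = sum(comb(annotated, k) * comb(population - annotated, selected - k)
               for k in range(overlap, upper + 1))
    return hits / total


# Hypothetical: 20,000 background genes, 1,000 annotated, 100 ASD genes, 15 overlapping
# (about 5 would be expected by chance).
print(enrichment_p_value(20000, 1000, 100, 15))
```

The choice of background (all genes versus, say, brain-expressed genes) can change the result substantially, which is one of the practical challenges in turning such enrichments into testable hypotheses.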

14.
Nat Commun ; 6: 6534, 2015 Mar 27.
Article in English | MEDLINE | ID: mdl-25813846

ABSTRACT

Gephyrin is a highly conserved gene that is vital for the organization of proteins at inhibitory receptors, molybdenum cofactor biosynthesis and other diverse functions. Its specific function is intricately regulated and its aberrant activities have been observed for a number of human diseases. Here we report a remarkable yin-yang haplotype pattern encompassing gephyrin. Yin-yang haplotypes arise when a stretch of DNA evolves to present two disparate forms that bear differing states for nucleotide variations along their lengths. The gephyrin yin-yang pair consists of 284 divergent nucleotide states and both variants vary drastically from their mutual ancestral haplotype, suggesting rapid evolution. Several independent lines of evidence indicate strong positive selection on the region and suggest these high-frequency haplotypes represent two distinct functional mechanisms. This discovery holds potential to deepen our understanding of variable human-specific regulation of gephyrin while providing clues for rapid evolutionary events and allelic migrations buried within human history.
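A yin-yang pair, as defined above, is two haplotypes that carry differing states at the variant sites along a stretch of DNA. The sketch below counts those divergent states for two aligned haplotypes. The 90% cutoff in `is_yin_yang` is a hypothetical criterion for illustration, not the definition used in the article, and the sequences in the example are made up.

```python
def divergent_sites(hap1, hap2):
    """Positions where two aligned haplotypes carry different nucleotides."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned to the same variant sites")
    return [i for i, (a, b) in enumerate(zip(hap1, hap2)) if a != b]


def is_yin_yang(hap1, hap2, min_fraction=0.9):
    """Hypothetical criterion: the pair differs at (almost) every variant site."""
    return len(divergent_sites(hap1, hap2)) >= min_fraction * len(hap1)


# Made-up states at five variant sites for two common haplotypes.
print(divergent_sites("ACGTA", "TGCAA"))  # [0, 1, 2, 3]
print(is_yin_yang("ACGTA", "TGCAA"))      # False (4 of 5 sites differ)
```

For the gephyrin pair, the analogous count would be the 284 divergent states reported in the abstract; comparing each haplotype with the inferred ancestral haplotype in the same way is what reveals that both forms have diverged from it.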


Subjects
Carrier Proteins/genetics; Evolution, Molecular; Haplotypes/genetics; Membrane Proteins/genetics; Base Sequence; Databases, Genetic; Humans