Results 1 - 20 of 27
1.
Nucleic Acids Res ; 47(13): e76, 2019 07 26.
Article in English | MEDLINE | ID: mdl-31329928

ABSTRACT

Existing large gene expression data repositories hold enormous potential to elucidate disease mechanisms, characterize changes in cellular pathways, and to stratify patients based on molecular profiles. To achieve this goal, integrative resources and tools are needed that allow comparison of results across datasets and data types. We propose an intuitive approach for data-driven stratifications of molecular profiles and benchmark our methodology using the dimensionality reduction algorithm t-distributed stochastic neighbor embedding (t-SNE) with multi-study and multi-platform data on hematological malignancies. Our approach enables assessing the contribution of biological versus technical variation to sample clustering, direct incorporation of additional datasets to the same low dimensional representation, comparison of molecular disease subtypes identified from separate t-SNE representations, and characterization of the obtained clusters based on pathway databases and additional data. In this manner, we performed an integrative analysis across multi-omics acute myeloid leukemia studies. Our approach indicated new molecular subtypes with differential survival and drug responsiveness among samples lacking fusion genes, including a novel myelodysplastic syndrome-like cluster and a cluster characterized with CEBPA mutations and differential activity of the S-adenosylmethionine-dependent DNA methylation pathway. In summary, integration across multiple studies can help to identify novel molecular disease subtypes and generate insight into disease biology.


Subjects
Cluster Analysis , Computational Biology/methods , Data Mining/methods , Datasets as Topic , Gene Expression Profiling/methods , Gene Expression Regulation, Leukemic , Leukemia, Myeloid, Acute/genetics , Phenotype , Algorithms , Databases, Genetic , Genes, Neoplasm , Humans , Leukemia, Myeloid, Acute/classification , Mutation , Sample Size
2.
Gut ; 69(8): 1416-1422, 2020 08.
Article in English | MEDLINE | ID: mdl-31744911

ABSTRACT

OBJECTIVE: Higher gluten intake, frequent gastrointestinal infections and adenovirus, enterovirus, rotavirus and reovirus have been proposed as environmental triggers for coeliac disease. However, it is not known whether an interaction exists between the ingested gluten amount and viral exposures in the development of coeliac disease. This study investigated whether distinct viral exposures alone or together with gluten increase the risk of coeliac disease autoimmunity (CDA) in genetically predisposed children. DESIGN: The Environmental Determinants of Diabetes in the Young study prospectively followed children carrying the HLA risk haplotypes DQ2 and/or DQ8 and constructed a nested case-control design. From this design, 83 CDA case-control pairs were identified. Median age of CDA was 31 months. Stool samples collected monthly up to the age of 2 years were analysed for virome composition by Illumina next-generation sequencing followed by comprehensive computational virus profiling. RESULTS: The cumulative number of stool enteroviral exposures between 1 and 2 years of age was associated with an increased risk for CDA. In addition, there was a significant interaction between cumulative stool enteroviral exposures and gluten consumption. The risk conferred by stool enteroviruses was increased in cases reporting higher gluten intake. CONCLUSIONS: Frequent exposure to enterovirus between 1 and 2 years of age was associated with increased risk of CDA. The increased risk conferred by the interaction between enteroviruses and higher gluten intake indicates a cumulative effect of these factors in the development of CDA.


Subjects
Autoimmune Diseases/etiology , Celiac Disease/etiology , Enterovirus/isolation & purification , Feces/virology , Glutens/administration & dosage , Adenoviridae/isolation & purification , Autoantibodies/blood , Autoimmune Diseases/blood , Autoimmune Diseases/genetics , Autoimmunity , Case-Control Studies , Celiac Disease/blood , Celiac Disease/genetics , Child, Preschool , Diet , Female , GTP-Binding Proteins/immunology , Genetic Predisposition to Disease , HLA-DQ Antigens/genetics , Humans , Infant , Male , Metagenomics , Protein Glutamine gamma Glutamyltransferase 2 , Risk Factors , Transglutaminases/immunology
3.
BMC Genomics ; 18(1): 378, 2017 05 15.
Article in English | MEDLINE | ID: mdl-28506246

ABSTRACT

BACKGROUND: Next generation sequencing (NGS) technology allows laboratories to investigate virome composition in clinical and environmental samples in a culture-independent way. There is a need for bioinformatic tools capable of parallel processing of virome sequencing data by exactly identical methods: this is especially important in studies of multifactorial diseases, or in parallel comparison of laboratory protocols. RESULTS: We have developed a web-based application allowing direct upload of sequences from multiple virome samples using custom parameters. The samples are then processed in parallel using an identical protocol, and can be easily reanalyzed. The pipeline performs de-novo assembly, taxonomic classification of viruses as well as sample analyses based on user-defined grouping categories. Tables of virus abundance are produced from cross-validation by remapping the sequencing reads to a union of all observed reference viruses. In addition, read sets and reports are created after processing unmapped reads against known human and bacterial ribosome references. Secured interactive results are dynamically plotted with population and diversity charts, clustered heatmaps and a sortable and searchable abundance table. CONCLUSIONS: The Vipie web application is a unique tool for multi-sample metagenomic analysis of viral data, producing searchable hits tables, interactive population maps, alpha diversity measures and clustered heatmaps that are grouped in applicable custom sample categories. Known references such as human genome and bacterial ribosomal genes are optionally removed from unmapped ('dark matter') reads. Secured results are accessible and shareable on modern browsers. Vipie is a freely available web-based tool whose code is open source.
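The abstract lists alpha diversity among Vipie's outputs but does not say which index it computes. As a sketch only, assuming the Shannon index (one common alpha-diversity measure) and a hypothetical per-sample table of remapped read counts per virus reference, the calculation looks like this:

```python
import math

def shannon_diversity(counts: dict[str, int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over references with nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical abundance table: sample -> {virus reference: remapped read count}
abundance = {
    "sample_A": {"virus_1": 250, "virus_2": 250, "virus_3": 250, "virus_4": 250},
    "sample_B": {"virus_1": 970, "virus_2": 10, "virus_3": 10, "virus_4": 10},
}

for sample, counts in abundance.items():
    print(sample, round(shannon_diversity(counts), 3))
```

An evenly spread sample (sample_A, where H equals ln 4) scores higher than one dominated by a single virus (sample_B).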


Subjects
Genomics/methods , High-Throughput Nucleotide Sequencing , Internet , Software , Viruses/genetics , Genetic Variation , Humans , Microbiota/genetics
4.
Pediatr Diabetes ; 18(7): 588-598, 2017 Nov.
Article in English | MEDLINE | ID: mdl-27860030

ABSTRACT

BACKGROUND: We set out to explore associations between the stool bacteriome profiles and early-onset islet autoimmunity, taking into account the interactions with the virus component of the microbiome. METHODS: Serial stool samples were longitudinally collected from 18 infants and toddlers with early-onset islet autoimmunity (median age 17.4 months) followed by type 1 diabetes, and 18 tightly matched controls from the Finnish Diabetes Prediction and Prevention (DIPP) cohort. Three stool samples were analyzed, taken 3, 6, and 9 months before the first detection of serum autoantibodies in the case child. The risk of islet autoimmunity was evaluated in relation to the composition of the bacteriome 16S rDNA profiles assessed by mass sequencing, and to the composition of DNA and RNA viromes. RESULTS: Four operational taxonomic units were significantly less abundant in children who later on developed islet autoimmunity as compared to controls, most markedly the species of Bacteroides vulgatus and Bifidobacterium bifidum. The alpha or beta diversity, or the taxonomic levels of bacterial phyla, classes or genera, showed no differences between cases and controls. A correlation analysis suggested a possible relation between CrAssphage signals and quantities of Bacteroides dorei. No apparent associations were seen between development of islet autoimmunity and sequences of yet unknown origin. CONCLUSIONS: The results confirm previous findings that an imbalance within the prevalent Bacteroides genus is associated with islet autoimmunity. The detected quantitative relation of the novel "orphan" bacteriophage CrAssphage with a prevalent species of the Bacteroides genus may exemplify possible modifiers of the bacteriome.


Subjects
Autoimmune Diseases/etiology , Autoimmunity , Bacteriophages/immunology , Bacteroides/immunology , Diabetes Mellitus, Type 1/etiology , Dysbiosis/physiopathology , Gastrointestinal Microbiome/immunology , Autoimmune Diseases/blood , Autoimmune Diseases/epidemiology , Autoimmune Diseases/immunology , Bacteriophages/classification , Bacteriophages/isolation & purification , Bacteroides/classification , Bacteroides/isolation & purification , Bacteroides/virology , Case-Control Studies , Child , Cohort Studies , Computational Biology , Diabetes Mellitus, Type 1/blood , Diabetes Mellitus, Type 1/epidemiology , Diabetes Mellitus, Type 1/immunology , Dysbiosis/immunology , Dysbiosis/microbiology , Dysbiosis/virology , Feces/microbiology , Feces/virology , Female , Finland/epidemiology , Hospitals, University , Humans , Islets of Langerhans/immunology , Longitudinal Studies , Male , Molecular Typing , Phylogeny , Prospective Studies , RNA, Bacterial/chemistry , RNA, Bacterial/metabolism , RNA, Ribosomal, 16S/chemistry , RNA, Ribosomal, 16S/metabolism , RNA, Viral/chemistry , RNA, Viral/metabolism , Risk
5.
Nat Methods ; 10(7): 671-5, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23666411

ABSTRACT

Tetrad analysis has been a gold-standard genetic technique for several decades. Unfortunately, the need to manually isolate, disrupt and space tetrads has relegated its application to small-scale studies and limited its integration with high-throughput DNA sequencing technologies. We have developed a rapid, high-throughput method, called barcode-enabled sequencing of tetrads (BEST), that uses (i) a meiosis-specific GFP fusion protein to isolate tetrads by FACS and (ii) molecular barcodes that are read during genotyping to identify spores derived from the same tetrad. Maintaining tetrad information allows accurate inference of missing genetic markers and full genotypes of missing (and presumably nonviable) individuals. An individual researcher was able to isolate over 3,000 yeast tetrads in 3 h, an output equivalent to that of almost 1 month of manual dissection. BEST is transferable to other microorganisms for which meiotic mapping is significantly more laborious.
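The abstract notes that keeping tetrad information allows missing markers to be inferred. The principle can be sketched for one biallelic marker, assuming ordinary 2:2 Mendelian segregation (no gene conversion or aneuploidy) and hypothetical parental allele labels P1/P2; this illustrates the idea and is not the BEST analysis code:

```python
from collections import Counter
from typing import Optional

def infer_missing_spore(genotypes: list[Optional[str]]) -> list[str]:
    """Fill in one missing spore genotype at a biallelic marker in a tetrad.

    Assumes 2:2 segregation: each parental allele is carried by exactly two
    of the four spores (no gene conversion or aneuploidy).
    """
    if len(genotypes) != 4:
        raise ValueError("a tetrad has four spores")
    missing = [i for i, g in enumerate(genotypes) if g is None]
    if len(missing) != 1:
        raise ValueError("this sketch handles exactly one missing spore")
    counts = Counter(g for g in genotypes if g is not None)
    # With three observed spores, 2:2 segregation forces a 2 + 1 split
    if sorted(counts.values()) != [1, 2]:
        raise ValueError("observed spores are inconsistent with 2:2 segregation")
    allele_seen_once = next(a for a, n in counts.items() if n == 1)
    filled = list(genotypes)
    filled[missing[0]] = allele_seen_once
    return filled

# Hypothetical tetrad in which spore 4 was not recovered (e.g. nonviable)
print(infer_missing_spore(["P1", "P2", "P1", None]))  # ['P1', 'P2', 'P1', 'P2']
```

Spores would first be grouped into tetrads by their shared molecular barcode; a nonviable spore is the typical reason one genotype is missing.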


Subjects
Algorithms , Chromosome Mapping/methods , DNA, Fungal/genetics , Genetic Markers/genetics , High-Throughput Nucleotide Sequencing/methods , Meiosis/genetics , Saccharomyces cerevisiae/genetics
6.
Nucleic Acids Res ; 42(3): 1474-96, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24198249

ABSTRACT

Metabolic diseases and comorbidities represent an ever-growing epidemic where multiple cell types impact tissue homeostasis. Here, the link between the metabolic and gene regulatory networks was studied through experimental and computational analysis. Integrating gene regulation data with a human metabolic network prompted the establishment of an open-sourced web portal, IDARE (Integrated Data Nodes of Regulation), for visualizing various gene-related data in context of metabolic pathways. Motivated by increasing availability of deep sequencing studies, we obtained ChIP-seq data from widely studied human umbilical vein endothelial cells. Interestingly, we found that association of metabolic genes with multiple transcription factors (TFs) enriched disease-associated genes. To demonstrate further extensions enabled by examining these networks together, constraint-based modeling was applied to data from human preadipocyte differentiation. In parallel, data on gene expression, genome-wide ChIP-seq profiles for peroxisome proliferator-activated receptor (PPAR) γ, CCAAT/enhancer binding protein (CEBP) α, liver X receptor (LXR) and H3K4me3 and microRNA target identification for miR-27a, miR-29a and miR-222 were collected. Disease-relevant key nodes, including mitochondrial glycerol-3-phosphate acyltransferase (GPAM), were exposed from metabolic pathways predicted to change activity by focusing on association with multiple regulators. In both cell types, our analysis reveals the convergence of microRNAs and TFs within the branched chain amino acid (BCAA) metabolic pathway, possibly providing an explanation for its downregulation in obese and diabetic conditions.


Subjects
Disease/genetics , Gene Expression Regulation , Metabolic Networks and Pathways/genetics , Adipocytes/cytology , Adipocytes/metabolism , Cell Differentiation , Cell Line , Chromatin/genetics , Gene Expression Profiling , Human Umbilical Vein Endothelial Cells/metabolism , Humans , MicroRNAs/metabolism , Transcription Factors/metabolism , Transcription, Genetic
7.
BMC Genomics ; 15: 1154, 2014 Dec 20.
Article in English | MEDLINE | ID: mdl-25528190

ABSTRACT

BACKGROUND: The human neuroblastoma cell line, SH-SY5Y, is a commonly used cell line in studies related to neurotoxicity, oxidative stress, and neurodegenerative diseases. Although this cell line is often used as a cellular model for Parkinson's disease, the relevance of this cellular model in the context of Parkinson's disease (PD) and other neurodegenerative diseases has not yet been systematically evaluated. RESULTS: We have used a systems genomics approach to characterize the SH-SY5Y cell line using whole-genome sequencing to determine the genetic content of the cell line and used transcriptomics and proteomics data to determine molecular correlations. Further, we integrated genomic variants using a network analysis approach to evaluate the suitability of the SH-SY5Y cell line for perturbation experiments in the context of neurodegenerative diseases, including PD. CONCLUSIONS: The systems genomics approach showed consistency across different biological levels (DNA, RNA and protein concentrations). Most of the genes belonging to the major Parkinson's disease pathways and modules were intact in the SH-SY5Y genome. Specifically, each analysed gene related to PD has at least one intact copy in SH-SY5Y. The disease-specific network analysis approach ranked the genetic integrity of SH-SY5Y as higher for PD than for Alzheimer's disease but lower than for Huntington's disease and Amyotrophic Lateral Sclerosis for loss of function perturbation experiments.


Subjects
Genomics , Neuroblastoma/pathology , Parkinson Disease/genetics , Cell Line, Tumor , DNA Copy Number Variations , DNA Transposable Elements/genetics , Gene Expression Profiling , Genetic Variation , Humans , INDEL Mutation , Proteomics
8.
BMC Genomics ; 14: 918, 2013 Dec 24.
Article in English | MEDLINE | ID: mdl-24365393

ABSTRACT

BACKGROUND: Systems biology experiments studying different topics and organisms produce thousands of data values across different types of genomic data. Further, data mining analyses are yielding ranked and heterogeneous results and association networks distributed over the entire genome. The visualization of these results is often difficult, and standalone web tools allowing custom inputs and dynamic filtering are limited. RESULTS: We have developed POMO (http://pomo.cs.tut.fi), an interactive web-based application to visually explore omics data analysis results and associations in circular, network and grid views. The circular graph represents the chromosome lengths as perimeter segments of a reference outer ring, such as cytobands for human. The inner arcs between nodes represent the uploaded network. Further, multiple annotation rings, for example depicting gene copy number changes, can be uploaded as text files and represented as bar, histogram or heatmap rings. POMO has built-in references for human, mouse, nematode, fly, yeast, zebrafish, rice, tomato, Arabidopsis, and Escherichia coli. In addition, POMO provides custom options that allow integrated plotting of unsupported strains or of associations between closely related species, such as human and mouse orthologs or two yeast wild types, studied together within a single analysis. The web application also supports interactive label and weight filtering. Every iteratively filtered result in POMO can be exported as an image file and a text file for sharing or direct future input. CONCLUSIONS: The POMO web application is a unique tool for omics data analysis, which can be used to visualize and filter genome-wide networks in the context of chromosomal locations as well as in multiple network layouts. With its several illustration and filtering options, the tool supports the analysis and visualization of any heterogeneous omics data analysis association results for many organisms. POMO is freely available and does not require any installation or registration.
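POMO's circular view draws chromosome lengths as perimeter segments. The coordinate mapping such a view needs can be sketched as below, assuming chromosomes are laid end to end clockwise from 12 o'clock with no gaps; the chromosome names and lengths are hypothetical, and POMO's own layout code may differ (for example, by adding gaps between segments):

```python
import math

def genome_angle(chrom_lengths: dict[str, int], chrom: str, pos: int) -> float:
    """Angle in radians (0 to 2*pi) of a genomic position on a circular plot."""
    total = sum(chrom_lengths.values())
    offset = 0
    for name, length in chrom_lengths.items():  # dicts keep insertion order
        if name == chrom:
            if not 0 <= pos <= length:
                raise ValueError(f"position {pos} outside {chrom}")
            return 2 * math.pi * (offset + pos) / total
        offset += length
    raise KeyError(chrom)

def to_xy(angle: float, radius: float = 1.0) -> tuple[float, float]:
    """Convert an angle to x, y, starting at 12 o'clock and running clockwise."""
    return radius * math.sin(angle), radius * math.cos(angle)

lengths = {"chr1": 600, "chr2": 300, "chr3": 100}  # hypothetical lengths
# An inner arc for one network edge runs between the two endpoints' positions
start = to_xy(genome_angle(lengths, "chr1", 300))
end = to_xy(genome_angle(lengths, "chr3", 50))
```

Annotation rings reuse the same angles with a different radius, which is why a single position-to-angle function is enough for both arcs and rings in this sketch.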


Subjects
Computational Biology/methods , Genomics/methods , Software , Systems Biology , Internet
9.
JACC Basic Transl Sci ; 8(12): 1489-1499, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38205343

ABSTRACT

There are several established biomarkers for coronary heart disease (CHD), including blood pressure, cholesterol, and lipoproteins. It is of high interest to determine how a combined polygenic risk score (PRS) of CHD-associated biomarkers (BioPRS) can further improve genetic prediction of CHD. We developed CHDBioPRS, combining BioPRS with PRS of CHD in the UK Biobank and tested it on FinnGen. We found that BioPRS was clearly predictive of CHD and that CHDBioPRS improved the standard CHD PRS. The largest effect was observed with early onset cases in FinnGen, with HRs above 2 per standard deviation of CHDBioPRS.
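The abstract does not describe how BioPRS and the CHD PRS are combined into CHDBioPRS. One simple possibility, shown here purely as an assumption, is a weighted sum of standardized scores, re-standardized so that effects such as hazard ratios can be reported per standard deviation; the score names, values and weights below are hypothetical:

```python
import statistics

def standardize(values: list[float]) -> list[float]:
    """Rescale to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def combine_scores(scores: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Weighted sum of standardized scores, re-standardized so one unit = one SD."""
    z = {name: standardize(values) for name, values in scores.items()}
    n = len(next(iter(z.values())))
    combined = [sum(weights[name] * z[name][i] for name in z) for i in range(n)]
    return standardize(combined)

# Hypothetical scores for six individuals and placeholder weights; in practice the
# weights would be fitted in a training cohort and applied unchanged to a test cohort
scores = {
    "chd_prs": [0.2, -1.1, 0.5, 1.8, -0.4, 0.0],
    "bio_prs": [1.0, -0.3, 0.2, 2.1, -1.5, 0.4],
}
weights = {"chd_prs": 0.6, "bio_prs": 0.4}
combined = combine_scores(scores, weights)
```

Because the combined score has unit standard deviation, an HR "per standard deviation" corresponds to a one-unit increase in it.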

10.
Aging Cell ; 22(8): e13868, 2023 08.
Article in English | MEDLINE | ID: mdl-37184129

ABSTRACT

Identifying metabolic biomarkers of frailty, an age-related state of physiological decline, is important for understanding its metabolic underpinnings and developing preventive strategies. Here, we systematically examined 168 nuclear magnetic resonance-based metabolomic biomarkers and 32 clinical biomarkers for their associations with frailty. In up to 90,573 UK Biobank participants, we identified 59 biomarkers robustly and independently associated with the frailty index (FI). Of these, 34 associations were replicated in the Swedish TwinGene study (n = 11,025) and the Finnish Health 2000 Survey (n = 6073). Using two-sample Mendelian randomization, we showed that the genetically predicted level of glycoprotein acetyls, an inflammatory marker, was statistically significantly associated with an increased FI (β per SD increase = 0.37%, 95% confidence interval: 0.12-0.61). Creatinine and several lipoprotein lipids were also associated with increased FI, yet their effects were mostly driven by kidney and cardiometabolic diseases, respectively. Our findings provide new insights into the causal effects of metabolites on frailty and highlight the role of chronic inflammation underlying frailty development.
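The abstract does not name the two-sample Mendelian randomization estimator used. A widely used one is the fixed-effect inverse-variance weighted (IVW) estimate, which pools each variant's Wald ratio (effect on outcome divided by effect on exposure). A minimal sketch under that assumption, with hypothetical summary statistics:

```python
import math

def ivw_estimate(beta_exposure: list[float], beta_outcome: list[float],
                 se_outcome: list[float]) -> tuple[float, float]:
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Equivalent to averaging each variant's Wald ratio (beta_outcome / beta_exposure)
    with weights beta_exposure**2 / se_outcome**2 (first-order weights).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("summary statistic lists must have equal length")
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / se ** 2
        numerator += bx * by * weight
        denominator += bx ** 2 * weight
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Hypothetical per-variant effects on the exposure (e.g. a metabolite level)
# and on the outcome (e.g. a frailty index), with outcome standard errors
beta_x = [0.12, 0.08, 0.15]
beta_y = [0.040, 0.030, 0.050]
se_y = [0.010, 0.012, 0.015]
estimate, se = ivw_estimate(beta_x, beta_y, se_y)
```

This fixed-effect standard error ignores heterogeneity between variants; random-effects IVW and pleiotropy-robust estimators are common complements.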


Subjects
Frailty , Mendelian Randomization Analysis , Humans , Biomarkers , Frailty/genetics , Genome-Wide Association Study , Magnetic Resonance Spectroscopy , Metabolomics , Polymorphism, Single Nucleotide
11.
Nat Commun ; 14(1): 7630, 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-37993433

ABSTRACT

Although the genetic basis and pathogenesis of type 1 diabetes have been studied extensively, how host responses to environmental factors might contribute to autoantibody development remains largely unknown. Here, we use longitudinal blood transcriptome sequencing data to characterize host responses in children within 12 months prior to the appearance of type 1 diabetes-linked islet autoantibodies, as well as matched control children. We report that children who present with insulin-specific autoantibodies first have distinct transcriptional profiles from those who develop GADA autoantibodies first. In particular, gene dosage-driven expression of GSTM1 is associated with GADA autoantibody positivity. Moreover, compared with controls, we observe increased monocyte and decreased B cell proportions 9-12 months prior to autoantibody positivity, especially in children who developed antibodies against insulin first. Lastly, we show that control children present transcriptional signatures consistent with robust immune responses to enterovirus infection, whereas children who later developed islet autoimmunity do not. These findings highlight distinct immune-related transcriptomic differences between case and control children prior to case progression to islet autoimmunity and uncover deficient antiviral response in children who later develop islet autoimmunity.


Subjects
Diabetes Mellitus, Type 1 , Enterovirus Infections , Islets of Langerhans , Humans , Child , Autoantibodies , Transcriptome , Autoimmunity/genetics , Insulin/metabolism , Enterovirus Infections/genetics , Islets of Langerhans/metabolism
12.
BMC Bioinformatics ; 12: 411, 2011 Oct 24.
Article in English | MEDLINE | ID: mdl-22024252

ABSTRACT

BACKGROUND: In computational biology, permutation tests have become a widely used tool to assess the statistical significance of an event under investigation. However, the common way of computing the P-value, which expresses the statistical significance, requires a very large number of permutations when small (and thus interesting) P-values are to be accurately estimated. This is computationally expensive and often infeasible. Recently, we proposed an alternative estimator, which requires far fewer permutations compared to the standard empirical approach while still reliably estimating small P-values. RESULTS: The proposed P-value estimator has been enriched with additional functionalities and is made available to the general community through a public website and web service, called EPEPT. This means that the EPEPT routines can be accessed not only via a website, but also programmatically using any programming language that can interact with the web. Examples of web service clients in multiple programming languages can be downloaded. Additionally, EPEPT accepts data of various common experiment types used in computational biology. For these experiment types EPEPT first computes the permutation values and then performs the P-value estimation. Finally, the source code of EPEPT can be downloaded. CONCLUSIONS: Different types of users, such as biologists, bioinformaticians and software engineers, can use the method in an appropriate and simple way.
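For context, the standard empirical permutation P-value that EPEPT is designed to improve on can be sketched as below; EPEPT's own estimator, which needs fewer permutations, is not reproduced here. The sketch assumes a difference in group means as a hypothetical test statistic and uses the common (b + 1)/(m + 1) form, so the smallest P-value it can report is 1/(m + 1). That floor is why accurately estimating small P-values this way demands very many permutations.

```python
import random
import statistics

def mean_difference(x: list[float], y: list[float]) -> float:
    return statistics.fmean(x) - statistics.fmean(y)

def permutation_p_value(x: list[float], y: list[float],
                        n_perm: int = 9999, seed: int = 0) -> float:
    """One-sided empirical P-value for mean(x) - mean(y) being large."""
    rng = random.Random(seed)
    observed = mean_difference(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples into the two groups
        if mean_difference(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations
    return (exceed + 1) / (n_perm + 1)

# Clearly separated hypothetical groups: the estimate is bounded below by 1/1000
p = permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], n_perm=999)
```

Reporting a P-value around 1e-6 this way would need on the order of a million permutations, which is the cost the abstract describes as often infeasible.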


Subjects
Computational Biology/methods , Software , Humans , Internet , Programming Languages , Regression Analysis
13.
Metabolites ; 11(5)2021 May 06.
Article in English | MEDLINE | ID: mdl-34066448

ABSTRACT

Visual integration of experimental data in metabolic networks is an important step toward understanding their meaning. As genome-scale metabolic networks reach several thousand reactions, the task becomes more difficult and less revealing. While databases like KEGG and BioCyc provide curated pathways that allow navigation of the metabolic landscape of an organism, mapping data directly onto those pathways is rather laborious. There are programs that use these kinds of databases as a source for visualization; however, these programs are then restricted to the pathways available in the database. Here, we present IDARE2, a Cytoscape plugin that allows user-friendly visualization of multiomics data in Cytoscape. It further provides tools to disentangle highly connected network structures based on common properties of nodes and retains structural links between the generated subnetworks, offering a straightforward way to traverse the split network. The tool is extensible, allowing the implementation of specialised representations and data format parsers. We present the automated reproduction of the original IDARE nodes using our tool and show examples of other data being mapped on a network of E. coli. The extensibility is demonstrated with two plugins that are available on GitHub. IDARE2 provides an intuitive way to visualise data from multiple sources and allows one to disentangle the often complex network structure in large networks using predefined properties of the network nodes.
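IDARE2 splits highly connected networks into subnetworks by a shared node property while keeping the links between the resulting pieces. A minimal sketch of that idea, assuming a hypothetical compartment label on each node; this is plain Python and not the IDARE2 or Cytoscape API:

```python
from collections import defaultdict

def split_by_property(node_props: dict[str, str], edges: list[tuple[str, str]]):
    """Split a network into subnetworks of nodes sharing a property value.

    Returns (subnetworks, cross_links). Each subnetwork holds its nodes and the
    edges whose two endpoints it contains; cross_links keeps every edge that
    connects two different subnetworks, so the pieces can still be traversed.
    """
    subnetworks = defaultdict(lambda: {"nodes": set(), "edges": []})
    for node, prop in node_props.items():
        subnetworks[prop]["nodes"].add(node)
    cross_links = []
    for u, v in edges:
        pu, pv = node_props[u], node_props[v]
        if pu == pv:
            subnetworks[pu]["edges"].append((u, v))
        else:
            # Record which subnetwork each endpoint belongs to
            cross_links.append(((pu, u), (pv, v)))
    return dict(subnetworks), cross_links

# Hypothetical metabolite/reaction nodes labelled by compartment
node_props = {
    "glucose_ext": "extracellular",
    "glucose_transport": "cytosol",
    "glucose": "cytosol",
    "hexokinase": "cytosol",
    "g6p": "cytosol",
}
edges = [
    ("glucose_ext", "glucose_transport"),
    ("glucose_transport", "glucose"),
    ("glucose", "hexokinase"),
    ("hexokinase", "g6p"),
]
subnets, links = split_by_property(node_props, edges)
```

Keeping the cross-links as explicit (subnetwork, node) pairs is one way to retain the structural connections the abstract describes; a viewer could render them as jump points between the split views.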

14.
Eur J Hum Genet ; 29(2): 309-324, 2021 02.
Article in English | MEDLINE | ID: mdl-33110245

ABSTRACT

Multivariate methods are known to increase the statistical power to detect associations in the case of shared genetic basis between phenotypes. They have, however, lacked essential analytic tools to follow up and understand the biology underlying these associations. We developed a novel computational workflow for multivariate GWAS follow-up analyses, including fine-mapping and identification of the subset of traits driving associations (driver traits). Many follow-up tools require univariate regression coefficients which are lacking from multivariate results. Our method overcomes this problem by using Canonical Correlation Analysis to turn each multivariate association into its optimal univariate Linear Combination Phenotype (LCP). This enables an LCP-GWAS, which in turn generates the statistics required for follow-up analyses. We implemented our method on 12 highly correlated inflammatory biomarkers in a Finnish population-based study. Altogether, we identified 11 associations, four of which (F5, ABO, C1orf140 and PDGFRB) were not detected by biomarker-specific analyses. Fine-mapping identified 19 signals within the 11 loci and driver trait analysis determined the traits contributing to the associations. A phenome-wide association study on the 19 representative variants from the signals in 176,899 individuals from the FinnGen study revealed 53 disease associations (p < 1 × 10⁻⁴). Several reported pQTLs in the 11 loci provided orthogonal evidence for the biologically relevant functions of the representative variants. Our novel multivariate analysis workflow provides a powerful addition to standard univariate GWAS analyses by enabling multivariate GWAS follow-up and thus promoting the advancement of powerful multivariate methods in genomics.
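For a single variant, canonical correlation analysis between the genotype and k traits has a closed form: the trait weights that maximise correlation with the genotype are proportional to S_YY⁻¹ S_Yg (the inverse trait covariance matrix times the vector of trait-genotype covariances), and the resulting weighted sum of traits plays the role of the linear combination phenotype. The pure-Python sketch below uses hypothetical data; the published workflow may scale or normalise the weights differently.

```python
import statistics

def covariance(a: list[float], b: list[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def correlation(a: list[float], b: list[float]) -> float:
    return covariance(a, b) / (covariance(a, a) * covariance(b, b)) ** 0.5

def solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def lcp_weights(traits: list[list[float]], genotype: list[float]) -> list[float]:
    """Trait weights a = S_YY^-1 S_Yg for one variant (CCA with a single genotype)."""
    s_yy = [[covariance(ti, tj) for tj in traits] for ti in traits]
    s_yg = [covariance(t, genotype) for t in traits]
    return solve(s_yy, s_yg)

def lcp(traits: list[list[float]], weights: list[float]) -> list[float]:
    """Linear combination phenotype: per-sample weighted sum of traits."""
    return [sum(w * t[i] for w, t in zip(weights, traits)) for i in range(len(traits[0]))]

# Hypothetical data: allele dosages and two correlated traits for ten samples
genotype = [0, 1, 2, 0, 1, 2, 1, 0, 2, 1]
trait_a = [0.1, 1.2, 1.9, 0.3, 0.8, 2.2, 1.1, -0.2, 1.7, 0.9]
trait_b = [0.5, 0.4, 1.0, 0.9, 0.2, 0.8, 0.6, 0.7, 0.3, 0.1]
weights = lcp_weights([trait_a, trait_b], genotype)
phenotype = lcp([trait_a, trait_b], weights)
```

The LCP's correlation with the genotype is at least as large as that of any single trait, which is what makes it a usable univariate stand-in for follow-up tools that expect one phenotype.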


Subjects
Biomarkers , Disease/genetics , Genetic Variation/genetics , Genome-Wide Association Study/methods , Aged , Canonical Correlation Analysis , Cytokines/genetics , Female , Genomics , Humans , Male , Middle Aged , Phenotype , Serpin E2/genetics
15.
BMC Bioinformatics ; 11: 377, 2010 Jul 14.
Article in English | MEDLINE | ID: mdl-20630057

ABSTRACT

BACKGROUND: High-throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. RESULTS: Existing software solutions provide static and well-established algorithms in a restrictive package. However, as high-throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques, which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. CONCLUSION: The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.


Subjects
Computational Biology/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Base Sequence , Database Management Systems , Internet , Sequence Analysis, DNA/instrumentation
16.
Front Genet ; 11: 431, 2020.
Article in English | MEDLINE | ID: mdl-32499813

ABSTRACT

BACKGROUND: Multivariate testing tools that integrate multiple genome-wide association studies (GWAS) have become important as the number of phenotypes gathered from study cohorts and biobanks has increased. While these tools have been shown to boost statistical power considerably over univariate tests, an important remaining challenge is to interpret which traits are driving the multivariate association and which traits are just passengers with minor contributions to the genotype-phenotypes association statistic. RESULTS: We introduce MetaPhat, a novel bioinformatics tool to conduct GWAS of multiple correlated traits using univariate GWAS results and to decompose multivariate associations into sets of central traits based on intuitive trace plots that visualize Bayesian Information Criterion (BIC) and P-value statistics of multivariate association models. We validate MetaPhat with Global Lipids Genetics Consortium GWAS results, and we apply MetaPhat to univariate GWAS results for 21 heritable and correlated polyunsaturated lipid species from 2,045 Finnish samples, detecting seven independent loci associated with a cluster of lipid species. In most cases, we are able to decompose these multivariate associations to only three to five central traits out of all 21 traits included in the analyses. We release MetaPhat as an open source tool written in Python with built-in support for multi-processing, quality control, clumping and intuitive visualizations using the R software. CONCLUSION: MetaPhat efficiently decomposes associations between multivariate phenotypes and genetic variants into smaller sets of central traits and improves the interpretation and specificity of genome-phenome associations. MetaPhat is freely available under the MIT license at: https://sourceforge.net/projects/meta-pheno-association-tracer.
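MetaPhat's trace plots track the Bayesian Information Criterion (BIC) of multivariate association models. For reference, the standard definition is given below, with k the number of free parameters, n the sample size and L̂ the maximised likelihood; how MetaPhat counts k and computes L̂ for its models is not stated in the abstract. Lower BIC is preferred, so adding a trait whose log-likelihood gain is smaller than (Δk/2) ln n raises BIC, which is one way a passenger trait with a minor contribution can show up on such a trace.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}
```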

17.
Microorganisms ; 8(11)2020 Nov 15.
Article in English | MEDLINE | ID: mdl-33203081

ABSTRACT

Coxsackie B (CVB) viruses have been associated with type 1 diabetes. We have recently observed that CVB1 was linked to the initiation of the autoimmune process leading to type 1 diabetes in Finnish children. Viral persistency in the pancreas is currently considered as one possible mechanism. In the current study, persistent infection was established in pancreatic ductal and beta cell lines (PANC-1 and 1.1B4) using four different CVB1 strains, including the prototype strain and three clinical isolates. We sequenced the 5' untranslated region (UTR) and the regions coding for structural and non-structural proteins and the second single open reading frame (ORF) protein of all persisting CVB1 strains using next generation sequencing to identify mutations that are common to all of these strains. One mutation, K257R in VP1, was found in all persisting CVB1 strains. The mutations mainly accumulated in viral structural proteins, especially at the BC, DE and EF loops and C-terminus of viral capsid protein 1 (VP1), the puff region of VP2, the knob region of VP3 and the infection-enhancing epitope of VP4. This showed that the capsid region of the viruses sustains various changes during persistency, some of which could be hallmark(s) of persistency.

18.
Circ Genom Precis Med ; 13(2): e002725, 2020 04.
Article in English | MEDLINE | ID: mdl-32154731

ABSTRACT

BACKGROUND: Hyperlipidemia is a highly heritable risk factor for coronary artery disease (CAD). While monogenic familial hypercholesterolemia associates with severely increased CAD risk, it remains less clear to what extent a high polygenic load of a large number of LDL (low-density lipoprotein) cholesterol (LDL-C) or triglyceride (TG)-increasing variants associates with increased CAD risk. METHODS: We derived polygenic risk scores (PRSs) with ≈6M variants separately for LDL-C and TG with weights from a UK Biobank-based genome-wide association study with ≈324K samples. We evaluated the impact of polygenic hypercholesterolemia and hypertriglyceridemia on lipid levels in 27 039 individuals from the National FINRISK Study (FINRISK) cohort and on CAD risk in 135 638 individuals (13 753 CAD cases) from the FinnGen project (FinnGen). RESULTS: In FINRISK, median LDL-C was 3.39 (95% CI, 3.38-3.40) mmol/L, and it ranged from 2.87 (95% CI, 2.82-2.94) to 3.78 (95% CI, 3.71-3.83) mmol/L between the lowest and highest 5% of the LDL-C PRS distribution. Median TG was 1.19 (95% CI, 1.18-1.20) mmol/L, ranging from 0.97 (95% CI, 0.94-1.00) to 1.55 (95% CI, 1.48-1.61) mmol/L with the TG PRS. In FinnGen, comparing the highest 5% of the PRS to the lowest 95%, CAD odds ratio was 1.36 (95% CI, 1.24-1.49) for the LDL-C PRS and 1.31 (95% CI, 1.19-1.43) for the TG PRS. These estimates were only slightly attenuated when adjusting for a CAD PRS (odds ratio, 1.26 [95% CI, 1.16-1.38] for LDL-C and 1.24 [95% CI, 1.13-1.36] for TG PRS). CONCLUSIONS: The CAD risk associated with a high polygenic load for lipid-increasing variants was proportional to their impact on lipid levels and partially overlapping with a CAD PRS. In contrast with a PRS for CAD, the lipid PRSs point to known and directly modifiable risk factors providing additional guidance for clinical translation.
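A polygenic risk score of the kind described is a weighted sum of effect-allele dosages, with per-variant weights (effect sizes) taken from a GWAS. A toy sketch with hypothetical variant IDs, weights and dosages (the study itself used about 6M variants); it also flags the top 5% of the score distribution, the group compared against the lowest 95% for the FinnGen odds ratios:

```python
import math

def polygenic_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Sum of effect-allele dosage (0-2) times GWAS effect size.

    Variants absent from an individual's data are skipped, a simplification;
    real pipelines typically work from imputed genotypes instead.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)

def top_fraction_flags(scores: list[float], fraction: float = 0.05) -> list[bool]:
    """Flag individuals in the top `fraction` of scores (ties at the cutoff are all flagged)."""
    n_top = max(1, math.ceil(len(scores) * fraction))
    cutoff = sorted(scores, reverse=True)[n_top - 1]
    return [s >= cutoff for s in scores]

weights = {"var_1": 0.12, "var_2": -0.05, "var_3": 0.30}  # hypothetical effect sizes
cohort = [  # hypothetical effect-allele dosages per individual
    {"var_1": 2, "var_2": 1, "var_3": 0},
    {"var_1": 0, "var_2": 2, "var_3": 1},
    {"var_1": 1, "var_2": 0, "var_3": 2},
]
scores = [polygenic_score(person, weights) for person in cohort]
high_prs = top_fraction_flags(scores)
```

Separate LDL-C and TG scores would simply use two different weight dictionaries over the same dosages.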


Subjects
Cholesterol, LDL/blood, Coronary Artery Disease/epidemiology, Genetic Predisposition to Disease, Hyperlipidemias/genetics, Multifactorial Inheritance, Polymorphism, Single Nucleotide, Triglycerides/blood, Cohort Studies, Coronary Artery Disease/blood, Coronary Artery Disease/etiology, Female, Genome-Wide Association Study, Humans, Hyperlipidemias/complications, Male, Middle Aged, Prognosis, Risk Factors
19.
Cancer Res ; 79(10): 2466-2479, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30940663

ABSTRACT

Large collections of genome-wide data can facilitate the characterization of disease states and subtypes, permitting pan-cancer analysis of molecular phenotypes and evaluation of disease context for new therapeutic approaches. We analyzed 9,544 transcriptomes from more than 30 hematologic malignancies, normal blood cell types, and cell lines, and showed that disease types could be stratified in a data-driven manner. We then identified cluster-specific pathway activity and new biomarkers, and prioritized drug targets in silico by interrogating drug target databases. Using known vulnerabilities and available drug screens, we highlighted the importance of integrating molecular phenotype with drug target expression for in silico prediction of drug responsiveness. Our analysis implicated BCL2 expression level as an important indicator of venetoclax responsiveness and provided a rationale for its targeting in specific leukemia subtypes and multiple myeloma, linked several polycomb group proteins that could be targeted by small molecules (SFMBT1, CBX7, and EZH1) with chronic lymphocytic leukemia, and supported CDK6 as a disease-specific target in acute myeloid leukemia. Through integration with proteomics data, we characterized target protein expression for pre-B leukemia immunotherapy candidates, including DPEP1. These molecular data can be explored using our publicly available interactive resource, Hemap, to expedite therapeutic innovation in hematologic malignancies.

SIGNIFICANCE: This study describes a data resource for researching derailed cellular pathways and candidate drug targets across hematologic malignancies.


Subjects
Hematologic Neoplasms/genetics, Antineoplastic Agents/therapeutic use, Biomarkers, Tumor/genetics, Bridged Bicyclo Compounds, Heterocyclic/therapeutic use, Hematologic Neoplasms/drug therapy, Humans, Immunotherapy/methods, Internet, Leukemia, Myeloid, Acute/genetics, Leukemia, Myeloid, Acute/therapy, Lymphoma, B-Cell/drug therapy, Phenotype, Proto-Oncogene Proteins c-bcl-2/genetics, Small Molecule Libraries/therapeutic use, Sulfonamides/therapeutic use, Transcriptome/genetics
20.
Methods Mol Biol ; 1838: 261-272, 2018.
Article in English | MEDLINE | ID: mdl-30129002

ABSTRACT

Through the application of metagenomic next-generation sequencing techniques, the Human Microbiome Project has found surprisingly large and diverse amounts of microbial sequences across different body sites. A wave of investigators studying autoimmune-related diseases are designing case-control studies that follow subjects from birth in order to elucidate microbial associations and potential direct triggers. Sequencing analysis, considered big data because it typically involves millions of reads, is challenging; virome profiling is particularly demanding and complex because viruses lack a pan-viral genomic signature. Impressively, thousands of complete virus genomes have been deposited, and these high-quality references are core components of virus profiling pipelines and databases. Still, it is well known that most viral sequences do not map to known viruses. Moreover, human viruses, particularly RNA virus groups, are notoriously heterogeneous due to high mutation rates. Here, we present the related assembly challenges and a series of bioinformatics steps that were applied in the construction of the complete consensus genome of a novel clinical isolate of Coxsackievirus B1. We further describe how we called mutations between the prototype Coxsackievirus B1 sequence from GenBank and the genome of the clinical isolate serially grown in cell culture.
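At its core, a consensus genome assigns one base to each reference position from the reads that cover it. The deliberately simplified sketch below uses a per-position majority vote and assumes reads are already aligned and placed on 0-based reference coordinates with no insertions or deletions. `majority_consensus` is a hypothetical name, and real pipelines, presumably including the one described here, also handle indels, base qualities and coverage thresholds more carefully.

```python
from collections import Counter


def majority_consensus(reads, length, min_depth=1, no_call="N"):
    """Majority-vote consensus from reads given as (start, sequence) pairs.

    Assumes each read is already placed on 0-based reference coordinates
    without indels; positions covered by fewer than `min_depth` reads get `no_call`.
    """
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < length:
                columns[pos][base] += 1
    consensus = []
    for counts in columns:
        if sum(counts.values()) < min_depth:
            consensus.append(no_call)
        else:
            # most_common(1) picks the most frequent base at this position.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy reads (not Coxsackievirus data): position 3 is T in one read, A in two.
print(majority_consensus([(0, "ACGT"), (0, "ACGA"), (2, "GA"), (4, "TT")], 7))  # ACGATTN
```

Comparing such a consensus with the prototype sequence position by position would then yield candidate substitutions.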


Subjects
Computational Biology, Enterovirus B, Human/genetics, Genome, Viral, Genomics, Computational Biology/methods, Genomics/methods, Humans, Metagenome, Metagenomics/methods, Metagenomics/standards, Quality Control