1.
Nat Commun ; 14(1): 5562, 2023 09 09.
Article in English | MEDLINE | ID: mdl-37689782

ABSTRACT

Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies. Transcriptome-wide association studies have helped uncover the role of individual genes in disease-relevant mechanisms. However, modern models of the architecture of complex traits predict that gene-gene interactions play a crucial role in disease origin and progression. Here we introduce PhenoPLIER, a computational approach that maps gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. This representation is based on modules of genes with similar expression patterns across the same conditions. We observe that diseases are significantly associated with gene modules expressed in relevant cell types, and our approach is accurate in predicting known drug-disease pairs and inferring mechanisms of action. Furthermore, using a CRISPR screen to analyze lipid regulation, we find that functionally important players lack associations but are prioritized in trait-associated modules by PhenoPLIER. By incorporating groups of co-expressed genes, PhenoPLIER can contextualize genetic associations and reveal potential targets missed by single-gene strategies.


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Genetic Epistasis , Causality , Gene Regulatory Networks , Transcriptome
2.
bioRxiv ; 2023 Jan 23.
Article in English | MEDLINE | ID: mdl-36747665

ABSTRACT

In this work we investigate how models with advanced natural language processing capabilities can be used to reduce the time-consuming process of writing and revising scholarly manuscripts. To this end, we integrate large language models into the Manubot publishing ecosystem to suggest revisions for scholarly text. We tested our AI-based revision workflow in three case studies of existing manuscripts, including the present one. Our results suggest that these models can capture the concepts in the scholarly text and produce high-quality revisions that improve clarity. Given the amount of time that researchers put into crafting prose, we anticipate that this advance will revolutionize the type of knowledge work performed by academics.

4.
Nat Comput Sci ; 3(5): 403-417, 2023 May.
Article in English | MEDLINE | ID: mdl-38177845

ABSTRACT

Human diseases are traditionally studied as singular, independent entities, limiting researchers' capacity to view human illnesses as dependent states in a complex, homeostatic system. Here, using time-stamped clinical records of over 151 million unique Americans, we construct a disease representation as points in a continuous, high-dimensional space, where diseases with similar etiology and manifestations lie near one another. We use the UK Biobank cohort, with half a million participants, to perform a genome-wide association study of newly defined human quantitative traits reflecting individuals' health states, corresponding to patient positions in our disease space. We discover 116 genetic associations involving 108 genetic loci and then use ten disease constellations resulting from clustering analysis of diseases in the embedding space, as well as 30 common diseases, to demonstrate that these genetic associations can be used to robustly predict various morbidities.


Subjects
Genetic Loci , Genome-Wide Association Study , Humans , United States , Genome-Wide Association Study/methods , Phenotype
5.
Curr Protoc ; 2(11): e603, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36441943

ABSTRACT

Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of many complex diseases. Regardless of the context, the practical utility of this information ultimately depends upon the quality of the data used for statistical analyses. Quality control (QC) procedures for GWAS are constantly evolving. Here, we enumerate some of the challenges in QC of genotyped GWAS data and describe the approaches involving genotype imputation of a sample dataset along with post-imputation quality assurance, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of the GWAS data (genotyped and imputed), including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We provide detailed guidelines along with a sample dataset to suggest current best practices and discuss areas of ongoing and future research. © 2022 Wiley Periodicals LLC.


Subjects
Genome-Wide Association Study , Research Design , Humans , Quality Control , Genotype , Sex Chromosome Aberrations
6.
Nat Commun ; 13(1): 6712, 2022 11 07.
Article in English | MEDLINE | ID: mdl-36344522

ABSTRACT

Asthma is a heterogeneous, complex syndrome, and identifying asthma endotypes has been challenging. We hypothesize that distinct endotypes of asthma arise in disparate genetic variation and life-time environmental exposure backgrounds, and that disease comorbidity patterns serve as a surrogate for such genetic and exposure variations. Here, we computationally discover 22 distinct comorbid disease patterns among individuals with asthma (asthma comorbidity subgroups) using diagnosis records for >151 M US residents, and re-identify 11 of the 22 subgroups in the much smaller UK Biobank. GWASs to discern asthma risk loci for individuals within each subgroup and in all subgroups combined reveal 109 independent risk loci, of which 52 are replicated in multi-ancestry meta-analysis across different ethnicity subsamples in UK Biobank, US BioVU, and BioBank Japan. Fourteen loci confer asthma risk in multiple subgroups and in all subgroups combined. Importantly, another six loci confer asthma risk in only one subgroup. The strength of association between asthma and each of 44 health-related phenotypes also varies dramatically across subgroups. This work reveals subpopulations of asthma patients distinguished by comorbidity patterns, asthma risk loci, gene expression, and health-related phenotypes, and so reveals different asthma endotypes.


Subjects
Asthma , Humans , Asthma/epidemiology , Asthma/genetics , Genome-Wide Association Study , Phenotype , Comorbidity , Japan/epidemiology
7.
Genome Biol ; 23(1): 23, 2022 01 13.
Article in English | MEDLINE | ID: mdl-35027082

ABSTRACT

BACKGROUND: Polygenic risk scores (PRS) are valuable to translate the results of genome-wide association studies (GWAS) into clinical practice. To date, most GWAS have been based on individuals of European ancestry, leading to poor performance in populations of non-European ancestry. RESULTS: We introduce the polygenic transcriptome risk score (PTRS), which is based on predicted transcript levels (rather than SNPs), and explore the portability of PTRS across populations using UK Biobank data. CONCLUSIONS: We show that PTRS has significantly higher portability (Wilcoxon p=0.013) in the African-descent samples, where the loss of performance is most acute, and yields better performance than PRS when used in combination.
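
As a rough illustration of how the two scores differ, the sketch below computes a PRS as a weighted sum of SNP dosages and a PTRS as a weighted sum of predicted transcript levels. The abstract does not describe the authors' implementation; the function name and every weight, dosage, and expression value here are made-up toy assumptions.

```python
def weighted_score(weights, values):
    """Return the sum of weight * value over paired entries."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * v for w, v in zip(weights, values))


# Toy PRS: per-SNP effect sizes times allele dosages (0, 1 or 2).
snp_effects = [0.10, -0.05, 0.20]   # hypothetical GWAS effect sizes
dosages = [2, 1, 0]                 # hypothetical genotype of one person
prs = weighted_score(snp_effects, dosages)

# Toy PTRS: per-gene weights times predicted transcript levels.
gene_effects = [0.5, -0.25]         # hypothetical gene-level weights
predicted_expression = [1.0, 1.0]   # hypothetical predicted expression
ptrs = weighted_score(gene_effects, predicted_expression)

print(f"PRS = {prs:.2f}, PTRS = {ptrs:.2f}")
```

Both scores share the same linear form; only the predictors change, from genotypes to predicted expression.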


Subjects
Genome-Wide Association Study , Transcriptome , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance , Single Nucleotide Polymorphism , Risk Factors
8.
Genome Biol ; 22(1): 49, 2021 01 26.
Article in English | MEDLINE | ID: mdl-33499903

ABSTRACT

The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.


Subjects
Gene Expression , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genotype , Genes , Humans , Multifactorial Inheritance , Transcriptome
9.
Article in English | MEDLINE | ID: mdl-35391741

ABSTRACT

We present a new combinatorial model for identifying regulatory modules in gene co-expression data using a decomposition into weighted cliques. To capture complex interaction effects, we generalize the previously-studied weighted edge clique partition problem. As a first step, we restrict ourselves to the noise-free setting, and show that the problem is fixed parameter tractable when parameterized by the number of modules (cliques). We present two new algorithms for finding these decompositions, using linear programming and integer partitioning to determine the clique weights. Further, we implement these algorithms in Python and test them on a biologically-inspired synthetic corpus generated using real-world data from transcription factors and a latent variable analysis of co-expression in varying cell types.
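
To make the idea of a weighted clique decomposition concrete, the sketch below goes in the forward direction: given a set of modules (cliques) with weights, it rebuilds the weighted co-expression graph. It assumes, as an interpretation of the noise-free setting, that each edge weight is the sum of the weights of all cliques containing both endpoints. The gene names, weights, and function name are hypothetical, and this is not the authors' algorithm, which solves the harder inverse problem.

```python
from collections import defaultdict
from itertools import combinations


def edge_weights_from_cliques(cliques):
    """Sum clique weights over every gene pair that shares a clique.

    `cliques` is a list of (set_of_genes, weight) pairs.
    """
    weights = defaultdict(float)
    for members, weight in cliques:
        # Every pair of genes inside a module is connected by an edge.
        for u, v in combinations(sorted(members), 2):
            weights[(u, v)] += weight
    return dict(weights)


# Two overlapping toy modules; B and C belong to both.
modules = [({"A", "B", "C"}, 2.0), ({"B", "C", "D"}, 1.0)]
graph = edge_weights_from_cliques(modules)

for edge, weight in sorted(graph.items()):
    print(edge, weight)
```

The edge shared by both modules accumulates both weights, which is the overlap that a decomposition algorithm has to untangle.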

10.
Am J Hum Genet ; 108(1): 25-35, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33308443

ABSTRACT

Colocalization analysis has emerged as a powerful tool to uncover the overlapping of causal variants responsible for both molecular and complex disease phenotypes. The findings from colocalization analysis yield insights into the molecular pathways of complex diseases. In this paper, we conduct an in-depth investigation of the promise and limitations of the available colocalization analysis approaches. Focusing on variant-level colocalization approaches, we first establish the connections between various existing methods. We proceed to discuss the impacts of various controllable analytical factors and uncontrollable practical factors on outcomes of colocalization analysis through realistic simulations and real data examples. We identify a single analytical factor, the specification of prior enrichment levels, which can lead to severe inflation of false-positive colocalization findings. Meanwhile, the combination of many other analytical and practical factors all lead to diminished power. Consequently, we recommend the following strategies for the best practice of colocalization analysis: (1) estimating prior enrichment level from the observed data and (2) separating fine-mapping and colocalization analysis. Our analysis of 4,091 complex traits and the multi-tissue expression quantitative trait loci (eQTL) data from the GTEx (v.8) suggests that colocalizations of molecular QTLs and causal complex trait associations are widespread. However, only a small proportion can be confidently identified from currently available data due to a lack of power. Our findings set a benchmark for current and future integrative genetic association analysis applications.


Subjects
Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci/genetics , Genetic Predisposition to Disease/genetics , Humans , Linkage Disequilibrium/genetics , Phenotype
11.
Sci Adv ; 6(37)2020 09.
Article in English | MEDLINE | ID: mdl-32917697

ABSTRACT

Large-scale genomic and transcriptomic initiatives offer unprecedented insight into complex traits, but clinical translation remains limited by variant-level associations without biological context and lack of analytic resources. Our resource, PhenomeXcan, synthesizes 8.87 million variants from genome-wide association study summary statistics on 4091 traits with transcriptomic data from 49 tissues in Genotype-Tissue Expression v8 into a gene-based, queryable platform including 22,515 genes. We developed a novel Bayesian colocalization method, fast enrichment estimation aided colocalization analysis (fastENLOC), to prioritize likely causal gene-trait associations. We successfully replicate associations from the phenome-wide association studies (PheWAS) catalog, Online Mendelian Inheritance in Man, and an evidence-based curated gene list. Using PhenomeXcan results, we provide examples of novel and underreported genome-to-phenome associations, complex gene-trait clusters, shared causal genes between common and rare diseases via further integration of PhenomeXcan with ClinVar, and potential therapeutic targets. PhenomeXcan (phenomexcan.org) provides broad, user-friendly access to complex data for translational researchers.

12.
Lancet Respir Med ; 7(6): 509-522, 2019 06.
Article in English | MEDLINE | ID: mdl-31036433

ABSTRACT

BACKGROUND: Childhood-onset and adult-onset asthma differ with respect to severity and comorbidities. Whether they also differ with respect to genetic risk factors has not been previously investigated in large samples. The goals of this study were to identify shared and distinct genetic risk loci for childhood-onset and adult-onset asthma, and to identify the genes that might mediate the effects of associated variation. METHODS: We did genome-wide and transcriptome-wide studies, using data from the UK Biobank, in individuals with asthma, including adults with childhood-onset asthma (onset before 12 years of age), adults with adult-onset asthma (onset between 26 and 65 years of age), and adults without asthma (controls; aged older than 38 years). We did genome-wide association studies (GWAS) for childhood-onset asthma and adult-onset asthma each compared with shared controls, and for age of asthma onset in all asthma cases, with a genome-wide significance threshold of p<5 × 10⁻⁸. Enrichment studies determined the tissues in which genes at GWAS loci were most highly expressed, and PrediXcan, a transcriptome-wide gene-based test, was used to identify candidate risk genes. FINDINGS: Of 376 358 British white individuals from the UK Biobank, we included 37 846 with self-reports of doctor-diagnosed asthma: 9433 adults with childhood-onset asthma; 21 564 adults with adult-onset asthma; and an additional 6849 young adults with asthma with onset between 12 and 25 years of age. For the first and second GWAS analyses, 318 237 individuals older than 38 years without asthma were used as controls. We detected 61 independent asthma loci: 23 were childhood-onset specific, one was adult-onset specific, and 37 were shared. 19 loci were associated with age of asthma onset. The most significant asthma-associated locus was at 17q12 (odds ratio 1·406, 95% CI 1·365-1·448; p=1·45 × 10⁻¹¹¹) in the childhood-onset GWAS. 
Genes at the childhood onset-specific loci were most highly expressed in skin, blood, and small intestine; genes at the adult onset-specific loci were most highly expressed in lung, blood, small intestine, and spleen. PrediXcan identified 113 unique candidate genes at 22 of the 61 GWAS loci. Single-nucleotide polymorphism-based heritability estimates were more than three times larger for childhood-onset asthma (0·327) than for adult-onset disease (0·098). The onset of disease in childhood was associated with additional genes with relatively large effect sizes, with the largest odds ratio observed at the FLG locus at 1q21.3 (1·970, 95% CI 1·823-2·129). INTERPRETATION: Genetic risk factors for adult-onset asthma are largely a subset of the genetic risk for childhood-onset asthma but with overall smaller effects, suggesting a greater role for non-genetic risk factors in adult-onset asthma. Combined with gene expression and tissue enrichment patterns, we suggest that the establishment of disease in children is driven more by dysregulated allergy and epithelial barrier function genes, whereas the cause of adult-onset asthma is more lung-centred and environmentally determined, but with immune-mediated mechanisms driving disease progression in both children and adults. FUNDING: US National Institutes of Health.


Subjects
Age of Onset , Asthma/genetics , Genetic Predisposition to Disease/genetics , Adult , Aged , Case-Control Studies , Child , Female , Filaggrin Proteins , Genetic Loci , Genome-Wide Association Study , Humans , Male , Middle Aged , Single Nucleotide Polymorphism/genetics , Risk Factors , Transcriptome , White People/genetics
13.
PLoS Genet ; 15(1): e1007889, 2019 01.
Article in English | MEDLINE | ID: mdl-30668570

ABSTRACT

Integration of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is needed to improve our understanding of the biological mechanisms underlying GWAS hits, and our ability to identify therapeutic targets. Gene-level association methods such as PrediXcan can prioritize candidate targets. However, limited eQTL sample sizes and absence of relevant developmental and disease context restrict our ability to detect associations. Here we propose an efficient statistical method (MultiXcan) that leverages the substantial sharing of eQTLs across tissues and contexts to improve our ability to identify potential target genes. MultiXcan integrates evidence across multiple panels using multivariate regression, which naturally takes into account the correlation structure. We apply our method to simulated and real traits from the UK Biobank and show that, in realistic settings, we can detect a larger set of significantly associated genes than using each panel separately. To improve applicability, we developed a summary result-based extension called S-MultiXcan, which we show yields highly concordant results with the individual level version when LD is well matched. Our multivariate model-based approach allowed us to use the individual level results as a gold standard to calibrate and develop a robust implementation of the summary-based extension. Results from our analysis as well as software and necessary resources to apply our method are publicly available.


Subjects
Genome-Wide Association Study/statistics & numerical data , Quantitative Trait Loci/genetics , Transcriptome/genetics , Gene Expression/genetics , Humans , Single Nucleotide Polymorphism/genetics , Software/statistics & numerical data
14.
Brief Bioinform ; 20(5): 1607-1620, 2019 09 27.
Article in English | MEDLINE | ID: mdl-29800232

ABSTRACT

MOTIVATION: The importance of microRNAs (miRNAs) is widely recognized in the community nowadays because these short segments of RNA can play several roles in almost all biological processes. The computational prediction of novel miRNAs involves training a classifier for identifying sequences having the highest chance of being precursors of miRNAs (pre-miRNAs). The big issue with this task is that well-known pre-miRNAs are usually few in comparison with the hundreds of thousands of candidate sequences in a genome, which results in high class imbalance. This imbalance has a strong influence on most standard classifiers, and if not properly addressed in the model and the experiments, not only can the reported performance be completely unrealistic, but the classifier will also fail to work properly for pre-miRNA prediction. Another important issue is that for most of the machine learning (ML) approaches already used (supervised methods), it is necessary to have both positive and negative examples. The selection of positive examples is straightforward (well-known pre-miRNAs). However, it is difficult to build a representative set of negative examples because they should be sequences with hairpin structure that do not contain a pre-miRNA. RESULTS: This review provides a comprehensive study and comparative assessment of methods from these two ML approaches for dealing with the prediction of novel pre-miRNAs: supervised and unsupervised training. We present and analyze the ML proposals that have appeared during the past 10 years in the literature. They have been compared in several prediction tasks involving two model genomes and increasing imbalance levels. This work provides a review of existing ML approaches for pre-miRNA prediction and fair comparisons of the classifiers with the same features and data sets, instead of just a review of published software tools. 
The results and the discussion can help the community to select the most adequate bioinformatics approach according to the prediction task at hand. The comparative results obtained suggest that from low to mid-imbalance levels between classes, supervised methods can be the best. However, at very high imbalance levels, closer to real case scenarios, models including unsupervised and deep learning can provide better performance.
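
The effect of class imbalance on reported performance can be shown with simple arithmetic. The sketch below keeps a classifier's sensitivity and specificity fixed and computes its precision at two imbalance levels. The sensitivity, specificity, and class counts are illustrative assumptions and are not taken from the review.

```python
def precision_at_imbalance(sensitivity, specificity, n_pos, n_neg):
    """Expected precision of a classifier on n_pos positives and n_neg negatives."""
    true_pos = sensitivity * n_pos
    false_pos = (1.0 - specificity) * n_neg
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% sensitivity, 99% specificity.
balanced = precision_at_imbalance(0.90, 0.99, n_pos=1000, n_neg=1000)
imbalanced = precision_at_imbalance(0.90, 0.99, n_pos=1000, n_neg=1_000_000)

print(f"precision at 1:1    = {balanced:.3f}")
print(f"precision at 1:1000 = {imbalanced:.3f}")
```

With identical per-class error rates, most predicted positives at the 1:1000 ratio are false positives, which is why results measured on balanced test sets can be unrealistic for genome-wide prediction.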


Subjects
Machine Learning , MicroRNAs/physiology , Animals , Computational Biology , Humans , MicroRNAs/chemistry , MicroRNAs/genetics
15.
Bioinformatics ; 35(11): 1931-1939, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30357313

ABSTRACT

MOTIVATION: Heterogeneous and voluminous data sources are common in modern datasets, particularly in systems biology studies. For instance, in multi-holistic approaches in the fruit biology field, data sources can include a mix of measurements such as morpho-agronomic traits, different kinds of molecules (nucleic acids and metabolites) and consumer preferences. These sources not only have different types of data (quantitative and qualitative), but also large numbers of variables with possibly non-linear relationships among them. An integrative analysis is usually hard to conduct, since it requires several manual standardization steps, with a direct and critical impact on the results obtained. These are important issues in clustering applications, which highlight the need for new methods for uncovering complex relationships in such diverse repositories. RESULTS: We designed a new method named Clustermatch to easily and efficiently perform data-mining tasks on large and highly heterogeneous datasets. Our approach can derive a similarity measure between any quantitative or qualitative variables by looking at how they influence the clustering of the biological materials under study. Comparisons with other methods in both simulated and real datasets show that Clustermatch is better suited for finding meaningful relationships in complex datasets. AVAILABILITY AND IMPLEMENTATION: Files can be downloaded from https://sourceforge.net/projects/sourcesinc/files/clustermatch/ and https://bitbucket.org/sinc-lab/clustermatch/. In addition, a web-demo is available at http://sinc.unl.edu.ar/web-demo/clustermatch/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
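
One way a clustering-based similarity between mixed variable types could work is sketched below: each variable induces a partition of the samples (rank-based bins for a quantitative variable, the categories themselves for a qualitative one), and two variables are compared through the adjusted Rand index of their partitions. This is an illustrative assumption about the general idea, not the Clustermatch algorithm itself; the helper names and the toy fruit data are hypothetical.

```python
from collections import Counter
from math import comb


def bin_by_rank(values, n_bins):
    """Assign each value to one of n_bins equal-sized bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions (needs >= 2 samples)."""
    n = len(labels_a)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_both - expected) / (max_index - expected)


# Toy data: six fruits with a quantitative and two qualitative variables.
weight = [10, 12, 11, 30, 32, 31]
color = ["green", "green", "green", "red", "red", "red"]
taster = ["ann", "bob", "ann", "bob", "ann", "bob"]

weight_bins = bin_by_rank(weight, n_bins=2)
related = adjusted_rand_index(weight_bins, color)
unrelated = adjusted_rand_index(weight_bins, taster)

print(f"weight vs color:  {related:.2f}")
print(f"weight vs taster: {unrelated:.2f}")
```

Because only the induced partitions are compared, no manual scaling or encoding of the variables is needed, and a non-linear but monotonic relationship would score the same as a linear one.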


Subjects
Data Mining , Cluster Analysis , Reference Standards
16.
Bioinformatics ; 35(11): 1971-1973, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30395166

ABSTRACT

SUMMARY: Large biobanks, such as UK Biobank with half a million participants, are changing the scale and availability of genotypic and phenotypic data for researchers to ask fundamental questions about the biology of health and disease. The breadth of the UK Biobank data is enabling discoveries at an unprecedented pace. However, this size and complexity pose new challenges to investigators who need to keep the accruing data up to date, comply with potential consent changes, and efficiently and reproducibly extract subsets of the data to answer specific scientific questions. Here we propose a tool called ukbREST designed for the UK Biobank study (easily extensible to other biobanks), which allows authorized users to efficiently retrieve phenotypic and genetic data. It exposes a REST API that makes data highly accessible inside a private and secure network, allowing the data specification to be written in a human-readable text format that is easily shareable with other researchers. These characteristics make ukbREST an important tool to make biobanks' valuable data more readily accessible to the research community and facilitate reproducibility of the analysis, a key aspect of science. AVAILABILITY AND IMPLEMENTATION: It is implemented in Python using the Flask-RESTful framework for the API, and it is under the MIT license. It works with PostgreSQL and a Docker image is available for easy deployment. The source code and documentation are available on GitHub: https://github.com/hakyimlab/ukbrest.


Subjects
Software , Biological Specimen Banks , Documentation , Humans , Reproducibility of Results
17.
Brief Bioinform ; 17(1): 180-3, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26223526

ABSTRACT

The reproducibility of research in bioinformatics refers to the notion that new methodologies/algorithms and scientific claims have to be published together with their data and source code, in a way that other researchers may verify the findings to further build more knowledge on them. The replication and corroboration of research results are key to the scientific process, and many journals are discussing the matter nowadays, taking concrete steps in this direction. In this journal itself, a recent opinion note has appeared highlighting the increasing importance of this topic in bioinformatics and computational biology, inviting the community to further discuss the matter. In agreement with that article, we would like to propose here another step in that direction with a tool that allows the automatic generation of a web interface, named web-demo, directly from source code in a simple and straightforward way. We believe this contribution can help make research not only reproducible but also more easily accessible. A web-demo associated with a published paper can accelerate the validation of an algorithm with real data and spread its use with just a few clicks.


Subjects
Algorithms , Computational Biology/methods , Internet , Humans , Reproducibility of Results , Software