Results 1 - 16 of 16
1.
Bioinformatics ; 2021 Feb 09.
Article in English | MEDLINE | ID: mdl-33560296

ABSTRACT

MOTIVATION: The prediction performance of the Cox proportional hazards model suffers when there are only a few uncensored events in the training data. RESULTS: We propose a Sparse-Group regularized Cox regression method to improve the prediction performance of large-scale and high-dimensional survival data with few observed events. Our approach is applicable when there are one or more other survival responses that (1) have a large number of observed events and (2) share a common set of associated predictors with the rare-event response. This scenario is common in the UK Biobank (Sudlow et al., 2015) dataset, where records for a large number of common and less prevalent diseases of the same set of individuals are available. By analyzing these responses together, we hope to achieve higher prediction performance than when they are analyzed individually. To make this approach practical for large-scale data, we developed an accelerated proximal gradient optimization algorithm as well as a screening procedure inspired by Qian et al. (2020). AVAILABILITY: https://github.com/rivas-lab/multisnpnet-Cox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
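
The abstract above names a proximal gradient algorithm for a sparse-group penalty but gives no formulas. As a minimal sketch, assume the penalty is lam1 times the elementwise L1 norm plus lam2 times the sum of row-wise Euclidean norms of a coefficient matrix B, with one row per predictor and one column per survival response. This grouping is an assumption, not something the abstract states. Under that assumption the proximal step is elementwise soft-thresholding followed by row-wise shrinkage. The function names are hypothetical and are not part of the multisnpnet-Cox package.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_group_prox(B, lam1, lam2):
    """Proximal operator of lam1*||B||_1 + lam2*sum_i ||B[i, :]||_2.

    B has shape (n_predictors, n_responses); each row is treated as one
    group (an assumed grouping). lam1 and lam2 are expected to already
    include the gradient step size.
    """
    S = soft_threshold(B, lam1)
    row_norms = np.linalg.norm(S, axis=1, keepdims=True)
    # Shrink each row toward zero; rows with norm <= lam2 become exactly zero.
    scale = np.maximum(1.0 - lam2 / np.maximum(row_norms, 1e-12), 0.0)
    return S * scale

B = np.array([[3.0, -4.0],
              [0.5, 0.2]])
B_new = sparse_group_prox(B, lam1=1.0, lam2=1.0)
```

In this toy input the second row is zeroed for every response at once, which is the behaviour that lets a rare-event response borrow its selected predictors from a response with many events.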

2.
Eur J Hum Genet ; 2021 Feb 08.
Article in English | MEDLINE | ID: mdl-33558700

ABSTRACT

Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk based on components from Decomposition of Genetic Associations (DeGAs), which we call the DeGAs polygenic risk score (dPRS). We compute DeGAs using genetic associations for 977 traits and find that dPRS performs comparably to standard PRS while offering greater interpretability. We show how to decompose an individual's genetic risk for a trait across DeGAs components, with examples for body mass index (BMI) and myocardial infarction (heart attack) in 337,151 white British individuals in the UK Biobank, with replication in a further set of 25,486 non-British white individuals. We find that BMI polygenic risk factorizes into components related to fat-free mass, fat mass, and overall health indicators like physical activity. Most individuals with high dPRS for BMI have strong contributions from both a fat-mass component and a fat-free mass component, whereas a few "outlier" individuals have strong contributions from only one of the two components. Overall, our method enables fine-scale interpretation of the drivers of genetic risk for complex traits.
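
The decomposition of one individual's risk across components can be illustrated with a small sketch. It assumes the score is linear: each component has a vector of variant loadings, and the trait has one weight per component. The array names and the random toy data are hypothetical and do not come from the dPRS software.

```python
import numpy as np

def component_contributions(genotypes, variant_loadings, component_weights):
    """Split a linear polygenic score into per-component parts.

    genotypes:         (n_individuals, n_variants) allele counts
    variant_loadings:  (n_variants, n_components)
    component_weights: (n_components,) trait weight of each component
    Returns an (n_individuals, n_components) array whose rows sum to the score.
    """
    component_scores = genotypes @ variant_loadings
    return component_scores * component_weights

rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(4, 6)).astype(float)  # toy genotypes
V = rng.normal(size=(6, 3))                        # toy loadings
w = rng.normal(size=3)                             # toy component weights

contrib = component_contributions(G, V, w)
total_risk = contrib.sum(axis=1)
```

Each row of `contrib` shows how much each component adds to that individual's total, so a profile dominated by a single column corresponds to the "outlier" pattern the abstract describes.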

3.
Nat Genet ; 53(2): 185-194, 2021 02.
Article in English | MEDLINE | ID: mdl-33462484

ABSTRACT

Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.

4.
PLoS Genet ; 16(10): e1009141, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33095761

ABSTRACT

The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, has proved to be an effective method for simultaneous variable selection and estimation. However, the large scale and ultrahigh dimension seen in the UK Biobank pose new challenges for applying the lasso method, as many existing algorithms and their implementations are not scalable to large applications. In this paper, we propose a computational framework called batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and easily build a scalable solution for very large data, including those that are larger than the memory size. We introduce snpnet, an R package that implements the proposed algorithm on top of glmnet and optimizes for single nucleotide polymorphism (SNP) datasets. It currently supports the ℓ1-penalized linear model, logistic regression, and Cox model, and also extends to the elastic net with ℓ1/ℓ2 penalty. We demonstrate results on the UK Biobank dataset, where we achieve competitive predictive performance for all four phenotypes considered (height, body mass index, asthma, high cholesterol) using only a small fraction of the variants compared with other established polygenic risk score methods.
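
The batch-screening idea can be sketched for a single penalty value. This is a simplification under stated assumptions: the published method works along a path of penalties, reads variants from disk in batches, and calls glmnet, whereas the sketch below keeps everything in memory and uses a small coordinate-descent solver written for illustration. All function names are hypothetical and are not the snpnet API.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - new)
            b[j] = new
    return b

def screened_lasso(X, y, lam, batch=5, max_rounds=50):
    """Fit on a growing active set; stop when no inactive variable
    violates the KKT condition |x_j' r| / n <= lam."""
    n, p = X.shape
    active = np.zeros(p, dtype=bool)
    beta = np.zeros(p)
    for _ in range(max_rounds):
        resid = y - X @ beta
        grad = np.abs(X.T @ resid) / n
        violators = (~active) & (grad > lam + 1e-8)
        if not violators.any():
            return beta
        # Screening step: admit the strongest violators in one batch.
        cand = np.where(violators)[0]
        top = cand[np.argsort(grad[cand])[::-1][:batch]]
        active[top] = True
        beta = np.zeros(p)
        beta[active] = lasso_cd(X[:, active], y, lam)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.1 * rng.normal(size=100)
b_screen = screened_lasso(X, y, lam=0.1)
b_full = lasso_cd(X, y, lam=0.1)
```

The solver only ever sees the active columns, while the KKT check touches every column once per round; that split is what allows the full matrix to stay out of memory in a real implementation.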

5.
Article in English | MEDLINE | ID: mdl-33125279

ABSTRACT

Background - The aortic valve is an important determinant of cardiovascular physiology and the anatomic location of common human diseases. Methods - From a sample of 34,287 white British-ancestry participants, we estimated functional aortic valve area by planimetry from prospectively obtained cardiac MRI sequences of the aortic valve. Aortic valve area measurements were submitted to genome-wide association testing, followed by polygenic risk scoring and phenome-wide screening to identify genetic comorbidities. Results - A genome-wide association study of aortic valve area in these UK Biobank participants showed three significant associations, indexed by rs71190365 (chr13:50764607, DLEU1, p = 1.8 × 10^-9), rs35991305 (chr12:94191968, CRADD, p = 3.4 × 10^-8) and chr17:45013271:C:T (GOSR2, p = 5.6 × 10^-8). Replication on an independent set of 8,145 unrelated European-ancestry participants showed consistent effect sizes in all three loci, although rs35991305 did not meet nominal significance. We constructed a polygenic risk score for aortic valve area, which in a separate cohort of 311,728 individuals without imaging demonstrated that smaller aortic valve area is predictive of increased risk for aortic valve disease (odds ratio 1.14, p = 2.3 × 10^-6). After excluding subjects with a medical diagnosis of aortic valve stenosis (remaining n = 308,683 individuals), phenome-wide association of >10,000 traits showed multiple links between the polygenic score for aortic valve disease and key health-related comorbidities involving the cardiovascular system and autoimmune disease. Genetic correlation analysis supports a shared genetic etiology between aortic valve area and birthweight, along with other cardiovascular conditions.
Conclusions - These results illustrate the use of automated phenotyping of cardiac imaging data from the general population to investigate the genetic etiology of aortic valve disease, perform clinical prediction, and uncover new clinical and genetic correlates of cardiac anatomy.

6.
Eur J Hum Genet ; 2020 Sep 01.
Article in English | MEDLINE | ID: mdl-32873964

ABSTRACT

Sex differences have been shown in laboratory biomarkers; however, the extent to which this is due to genetics is unknown. In this study, we infer sex-specific genetic parameters (heritability and genetic correlation) across 33 quantitative biomarker traits in 181,064 females and 156,135 males from the UK Biobank study. We apply a Bayesian mixture model, the Sex Effects Mixture Model (SEMM), to genome-wide association study summary statistics in order to (1) estimate the contributions of sex to the genetic variance of these biomarkers and (2) identify variants whose statistical association with these traits is sex-specific. We find that the genetics of most biomarker traits are shared between males and females, with the notable exception of testosterone, where we identify 119 female-specific and 445 male-specific variants. These include protein-altering variants in steroid hormone production genes (POR, UGT2B7). Using the sex-specific variants as genetic instruments for Mendelian randomization, we find evidence for causal links between testosterone levels and height, body mass index, waist and hip circumference, and type 2 diabetes. We also show that sex-specific polygenic risk score models for testosterone outperform a combined model. Overall, these results demonstrate that while sex has a limited role in the genetics of most biomarker traits, sex plays an important role in testosterone genetics.

7.
Biostatistics ; 2020 Sep 29.
Article in English | MEDLINE | ID: mdl-32989444

ABSTRACT

We develop a scalable and highly efficient algorithm to fit a Cox proportional hazards model by maximizing the $L^1$-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale and high-dimensional data that do not fit in memory. The output of our algorithm is the full Lasso path (the parameter estimates at all predefined regularization parameters) as well as their validation accuracy measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.
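
The concordance index mentioned above can be illustrated with a short sketch. It assumes Harrell's definition for right-censored data: a pair is comparable when the individual with the shorter time had an observed event, and ties in the predicted risk count as half. This quadratic-time version is for illustration only and is not taken from snpnet-Cox.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index for right-censored survival data.

    time:  observed follow-up times
    event: 1 if the event was observed, 0 if censored
    risk:  predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored individual cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
c_perfect = concordance_index(times, events, [4, 3, 2, 1])
c_reversed = concordance_index(times, events, [1, 2, 3, 4])
c_constant = concordance_index(times, events, [1, 1, 1, 1])
```

A value of 1.0 means the risk ordering matches the event ordering on every comparable pair, and 0.5 is what a constant or random score achieves.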

8.
medRxiv ; 2020 Jul 29.
Article in English | MEDLINE | ID: mdl-32766602

ABSTRACT

During COVID-19 and other viral pandemics, rapid generation of host and pathogen genomic data is critical to tracking infection and informing therapies. There is an urgent need for efficient approaches to this data generation at scale. We have developed a scalable, high-throughput approach to generate high-fidelity low-pass whole-genome and HLA sequencing, viral genomes, and representation of the human transcriptome from single nasopharyngeal swabs of COVID-19 patients.

9.
PLoS Genet ; 16(5): e1008682, 2020 05.
Article in English | MEDLINE | ID: mdl-32369491

ABSTRACT

Protein-altering variants that are protective against human disease provide in vivo validation of therapeutic targets. Here we use genotyping data from UK Biobank (n = 337,151 unrelated White British individuals) and FinnGen (n = 176,899) to conduct a search for protein-altering variants conferring lower intraocular pressure (IOP) and protection against glaucoma. Through rare protein-altering variant association analysis, we find a missense variant in ANGPTL7 in UK Biobank (rs28991009, p.Gln175His, MAF = 0.8%, genotyped in 82,253 individuals with measured IOP and an independent set of 4,238 glaucoma patients and 250,660 controls) that significantly lowers IOP (β = -0.53 and -0.67 mmHg for heterozygotes, -3.40 and -2.37 mmHg for homozygotes, P = 5.96 × 10^-9 and 1.07 × 10^-13 for corneal compensated and Goldmann-correlated IOP, respectively) and is associated with 34% reduced risk of glaucoma (P = 0.0062). In FinnGen, we identify an ANGPTL7 missense variant at a greater than 50-fold increased frequency in Finland compared with other populations (rs147660927, p.Arg220Cys, MAF Finland = 4.3%), which was genotyped in 6,537 glaucoma patients and 170,362 controls and is associated with a 29% lower glaucoma risk (P = 1.9 × 10^-12 for all glaucoma types and also protection against its subtypes including exfoliation, primary open-angle, and primary angle-closure). We further find three rarer variants in UK Biobank, including a protein-truncating variant, which confer a strong composite lowering of IOP (P = 0.0012 and 0.24 for Goldmann-correlated and corneal compensated IOP, respectively), suggesting the protective mechanism likely resides in the loss of interaction or function. Our results support inhibition or down-regulation of ANGPTL7 as a therapeutic strategy for glaucoma.


Subjects
Angiopoietin-like Proteins/genetics; Glaucoma/genetics; Glaucoma/prevention & control; Intraocular Pressure/genetics; Polymorphism, Single Nucleotide; Adult; Aged; Aged, 80 and over; Biological Specimen Banks/statistics & numerical data; Case-Control Studies; Cohort Studies; Female; Finland/epidemiology; Gene Frequency; Genetic Predisposition to Disease; Genetics, Population; Genome-Wide Association Study; Glaucoma/epidemiology; Humans; Loss of Function Mutation/genetics; Male; Middle Aged; Mutation, Missense; United Kingdom/epidemiology
10.
Am J Hum Genet ; 106(5): 611-622, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32275883

ABSTRACT

Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS using cases ascertained on family history of disease agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.


Subjects
Disease/genetics; Genome-Wide Association Study; Phenotype; Asthma/genetics; Databases, Factual; Female; Genetics, Medical; Genotype; Humans; Male; Neoplasms/genetics; United Kingdom
11.
Mol Psychiatry ; 25(10): 2422-2430, 2020 10.
Article in English | MEDLINE | ID: mdl-30610202

ABSTRACT

Suicide accounts for nearly 800,000 deaths per year worldwide with rates of both deaths and attempts rising. Family studies have estimated substantial heritability of suicidal behavior; however, collecting the sample sizes necessary for successful genetic studies has remained a challenge. We utilized two different approaches in independent datasets to characterize the contribution of common genetic variation to suicide attempt. The first is a patient-reported suicide attempt phenotype asked as part of an online mental health survey taken by a subset of participants (n = 157,366) in the UK Biobank. After quality control, we leveraged a genotyped set of unrelated, white British ancestry participants including 2433 cases and 334,766 controls that included those that did not participate in the survey or were not explicitly asked about attempting suicide. The second leveraged electronic health record (EHR) data from the Vanderbilt University Medical Center (VUMC, 2.8 million patients, 3250 cases) and machine learning to derive probabilities of attempting suicide in 24,546 genotyped patients. We identified significant and comparable heritability estimates of suicide attempt from both the patient-reported phenotype in the UK Biobank (h^2_SNP = 0.035, p = 7.12 × 10^-4) and the clinically predicted phenotype from VUMC (h^2_SNP = 0.046, p = 1.51 × 10^-2). A significant genetic overlap was demonstrated between the two measures of suicide attempt in these independent samples through polygenic risk score analysis (t = 4.02, p = 5.75 × 10^-5) and genetic correlation (r_g = 1.073, SE = 0.36, p = 0.003). Finally, we show significant but incomplete genetic correlation of suicide attempt with insomnia (r_g = 0.34-0.81) as well as several psychiatric disorders (r_g = 0.26-0.79). This work demonstrates the contribution of common genetic variation to suicide attempt.
It points to a genetic underpinning to clinically predicted risk of attempting suicide that is similar to the genetic profile from a patient-reported outcome. Lastly, it presents an approach for using EHR data and clinical prediction to generate quantitative measures from binary phenotypes that can improve power for genetic studies.

12.
Nat Commun ; 10(1): 4064, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31492854

ABSTRACT

Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identify key components of genetic associations and the contributions of variants, genes, and phenotypes to each component. As an illustration of the utility of the approach to inform downstream experiments, we report putative loss-of-function variants, rs114285050 (GPR151) and rs150090666 (PDE3B), that substantially contribute to obesity-related traits and experimentally demonstrate the role of these genes in adipocyte biology. Our approach to dissect components of genetic associations across the human phenome will accelerate biomedical hypothesis generation by providing insights on previously unexplored latent structures.


Subjects
Adipocytes/metabolism; Biological Specimen Banks; Genetic Association Studies/methods; Genome-Wide Association Study/methods; 3T3-L1 Cells; Adipocytes/cytology; Animals; Cells, Cultured; Cyclic Nucleotide Phosphodiesterases, Type 3/genetics; Genetic Predisposition to Disease/genetics; Humans; Mice; Obesity/genetics; Phenotype; Polymorphism, Single Nucleotide; United Kingdom
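
The truncated singular value decomposition named in the abstract for entry 12 can be sketched with NumPy. The matrix orientation (phenotypes in rows, variants in columns), the random toy data, and the use of squared singular-vector entries as contribution scores are assumptions made for illustration; the abstract specifies none of them.

```python
import numpy as np

def truncated_svd(W, k):
    """Keep the k leading singular triplets of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def contribution_scores(singular_vectors):
    """Squared entries of unit-norm singular vectors.

    Each column sums to 1, so an entry can be read as the share of that
    component attributed to one phenotype (or to one variant, if applied
    to the transposed right singular vectors).
    """
    return singular_vectors ** 2

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 20))        # toy matrix: 8 phenotypes x 20 variants
U, s, Vt = truncated_svd(W, k=3)
W_approx = (U * s) @ Vt             # rank-3 approximation of W
pheno_contrib = contribution_scores(U)
variant_contrib = contribution_scores(Vt.T)
```

Sorting a column of `pheno_contrib` or `variant_contrib` in descending order lists the phenotypes or variants that dominate that latent component.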
13.
Pac Symp Biocomput ; 24: 184-195, 2019.
Article in English | MEDLINE | ID: mdl-30864321

ABSTRACT

Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging given that most single nucleotide polymorphisms (SNPs) fall into the non-coding region of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq) have made large-scale repositories of epigenetic data available, allowing investigation of coordinated mechanisms of epigenetic markers and transcriptional regulation and their influence on biological function. To address this, we propose SNPs2ChIP, a method to infer biological functions of non-coding variants through unsupervised statistical learning methods applied to publicly available epigenetic datasets. We systematically characterized latent factors by applying singular value decomposition to ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological function of each latent factor using the genomic region enrichment analysis tool. Using these annotated latent factors as reference, we developed SNPs2ChIP, a pipeline that takes genomic region(s) as an input, identifies the relevant latent factors with quantitative scores, and returns them along with their inferred functions. As a case study, we focused on systemic lupus erythematosus and demonstrated our method's ability to infer relevant biological function. We systematically applied SNPs2ChIP on publicly available datasets, including known GWAS associations from the GWAS catalogue and ChIP-seq peaks from a previously published study. Our approach to leverage latent patterns across genome-wide epigenetic datasets to infer the biological function will advance understanding of the genetics of human diseases by accelerating the interpretation of non-coding genomes.


Subjects
Chromatin Immunoprecipitation/statistics & numerical data; Polymorphism, Single Nucleotide; Algorithms; Cell Line; Computational Biology/methods; Databases, Nucleic Acid/statistics & numerical data; Epigenesis, Genetic; Genetic Association Studies; Genome, Human; Genome-Wide Association Study/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Lupus Erythematosus, Systemic/genetics; Lymphocytes/metabolism; Receptors, Calcitriol/genetics
14.
Bioinformatics ; 35(14): 2495-2497, 2019 07 15.
Article in English | MEDLINE | ID: mdl-30520965

ABSTRACT

SUMMARY: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. AVAILABILITY AND IMPLEMENTATION: GBE currently hosts data from the UK Biobank and is freely available at biobankengine.stanford.edu.


Subjects
Biological Specimen Banks; Genome-Wide Association Study; Genotype; Humans; Phenomics; Phenotype
15.
Nat Commun ; 9(1): 1612, 2018 04 24.
Article in English | MEDLINE | ID: mdl-29691392

ABSTRACT

Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. Here, we characterize the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank and find 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex. We perform phenome-wide analyses and directly measure the effect in homozygous carriers, commonly referred to as "human knockouts," across medical phenotypes for genes implicated as being protective against disease or associated with at least one phenotype in our study. We find several genes with strong pleiotropic or non-additive effects. Our results illustrate the importance of protein-truncating variants in a variety of diseases.


Subjects
Databases, Nucleic Acid; Proteins/genetics; Sequence Deletion; Databases, Nucleic Acid/statistics & numerical data; Genome-Wide Association Study; Humans; Phenotype; United Kingdom
16.
J Plant Res ; 131(4): 709-717, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29460198

ABSTRACT

Recent studies have shown that environmental DNA is found almost everywhere. Flower petal surfaces are an attractive tissue to use for investigation of the dispersal of environmental DNA in nature as they are isolated from the external environment until the bud opens and only then can the petal surface accumulate environmental DNA. Here, we performed a crowdsourced experiment, the "Ohanami Project", to obtain environmental DNA samples from petal surfaces of Cerasus × yedoensis 'Somei-yoshino' across the Japanese archipelago during spring 2015. C. × yedoensis is the most popular garden cherry species in Japan and clones of this cultivar bloom simultaneously every spring. Data collection spanned almost every prefecture and totaled 577 DNA samples from 149 collaborators. Preliminary amplicon-sequencing analysis showed the rapid attachment of environmental DNA onto the petal surfaces. Notably, we found DNA of other common plant species in samples obtained from a wide distribution; this DNA likely originated from the pollen of the Japanese cedar. Our analysis supports our belief that petal surfaces after blossoming are a promising target to reveal the dynamics of environmental DNA in nature. The success of our experiment also shows that crowdsourced environmental DNA analyses have considerable value in ecological studies.


Subjects
DNA, Plant/genetics; DNA/genetics; Environment; Flowers/genetics; Prunus/genetics; Chloroplasts/genetics; Cyanobacteria/genetics; Flowers/microbiology; Japan; Proteobacteria/genetics; Prunus/microbiology; Sequence Alignment; Sequence Analysis, DNA