Results 1 - 20 of 27
1.
Physiol Genomics ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38738317

ABSTRACT

BACKGROUND: Hypertonic dehydration is associated with muscle wasting and synthesis of organic osmolytes. We recently showed a metabolic shift to amino acid production and urea cycle activation in COVID-19, consistent with the aestivation response. The aim of the present investigation was to validate the metabolic shift and development of long-term physical outcome in the non-COVID cohort of the Biobanque Québécoise de la COVID-19 (BQC19). METHODS: We included 824 patients from BQC19, of whom 571 had dehydration data in the form of estimated osmolality (eOSM = 2Na + 2K + glucose + urea) and 284 had metabolome data and long-term follow-up. We correlated the degree of dehydration to mortality, invasive mechanical ventilation, acute kidney injury, and long-term symptoms. RESULTS: As found in the COVID cohort, higher eOSM correlated with a higher proportion of urea and glucose in total eOSM and an enrichment of amino acids compared to other metabolites. Sex-stratified analysis indicated that women may show a weaker aestivation response. More severe dehydration was associated with mortality, invasive mechanical ventilation, and acute kidney injury during the acute illness. Importantly, more severe dehydration was associated with physical long-term symptoms but not mental long-term symptoms after adjustment for age, sex, and disease severity. CONCLUSIONS: Patients with water deficit in the form of increased eOSM tend to have more severe disease and experience more physical symptoms after an acute episode of care. This is associated with amino acid and urea production, indicating dehydration-induced muscle wasting.
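
The eOSM formula quoted in the abstract above (eOSM = 2Na + 2K + glucose + urea) and the "proportion of urea and glucose of total eOSM" can be illustrated with a minimal sketch. The function names are hypothetical, and the assumption that all four inputs are in mmol/L is ours; the abstract does not state units.

```python
def estimated_osmolality(sodium, potassium, glucose, urea):
    """Return eOSM = 2*Na + 2*K + glucose + urea.

    Assumption (not stated in the abstract): all inputs are in mmol/L.
    """
    return 2 * sodium + 2 * potassium + glucose + urea


def osmolyte_shares(sodium, potassium, glucose, urea):
    """Split eOSM into its electrolyte share and its organic (glucose + urea) share."""
    total = estimated_osmolality(sodium, potassium, glucose, urea)
    if total <= 0:
        raise ValueError("eOSM must be positive to compute shares")
    return {
        "electrolytes": (2 * sodium + 2 * potassium) / total,
        "organic": (glucose + urea) / total,
    }


# Illustrative values only, not taken from the study.
eosm = estimated_osmolality(sodium=140, potassium=4, glucose=5, urea=5)
shares = osmolyte_shares(sodium=140, potassium=4, glucose=5, urea=5)
```

A rising "organic" share at higher eOSM is the pattern the abstract describes as consistent with the aestivation response.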

3.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders (refs. 1-4), but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
4.
Nat Commun ; 14(1): 6198, 2023 10 04.
Article in English | MEDLINE | ID: mdl-37794074

ABSTRACT

Alternative splicing generates functional diversity in isoforms, impacting immune response to infection. Here, we evaluate the causal role of alternative splicing in COVID-19 severity and susceptibility by applying two-sample Mendelian randomization to cis-splicing quantitative trait loci and the results from COVID-19 Host Genetics Initiative. We identify that alternative splicing in lung, rather than total expression of OAS1, ATP11A, DPP9 and NPNT, is associated with COVID-19 severity. MUC1 and PMF1 splicing is associated with COVID-19 susceptibility. Colocalization analyses support a shared genetic mechanism between COVID-19 severity with idiopathic pulmonary fibrosis at the ATP11A and DPP9 loci, and with chronic obstructive lung diseases at the NPNT locus. Last, we show that ATP11A, DPP9, NPNT, and MUC1 are highly expressed in lung alveolar epithelial cells, both in COVID-19 uninfected and infected samples. These findings clarify the importance of alternative splicing in lung for COVID-19 and respiratory diseases, providing isoform-based targets for drug discovery.


Subject(s)
COVID-19 , Pulmonary Disease, Chronic Obstructive , Respiration Disorders , Humans , Alternative Splicing/genetics , Genetic Predisposition to Disease , COVID-19/genetics , COVID-19/metabolism , Lung/metabolism , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/metabolism , Protein Isoforms/genetics , Respiration Disorders/metabolism , Genome-Wide Association Study/methods
5.
Hum Genet ; 142(10): 1461-1476, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37640912

ABSTRACT

Identifying causal genes at GWAS loci can help pinpoint targets for therapeutic interventions. Expression studies can disentangle such loci but signals from expression quantitative trait loci (eQTLs) often fail to colocalize, which means that the genetic control of measured expression is not shared with the genetic control of disease risk. This may be because gene expression is measured in the wrong cell type, physiological state, or organ. We tested whether Mendelian randomization (MR) could identify genes at loci influencing COVID-19 outcomes and whether the colocalization of genetic control of expression and COVID-19 outcomes was influenced by cell type, cell stimulation, and organ. We conducted MR of cis-eQTLs from single-cell (scRNA-seq) and bulk RNA sequencing. We then tested variables that could influence colocalization, including cell type, cell stimulation, RNA sequencing modality, organ, symptoms of COVID-19, and SARS-CoV-2 status among individuals with symptoms of COVID-19. The outcomes used to test colocalization were COVID-19 severity and susceptibility as assessed in the Host Genetics Initiative release 7. Most transcripts identified using MR did not colocalize when tested across cell types, cell state and in different organs. Most that did colocalize likely represented false positives due to linkage disequilibrium. In general, colocalization was highly variable and at times inconsistent for the same transcript across cell type, cell stimulation and organ. While we identified factors that influenced colocalization for select transcripts, identifying 33 that mediate COVID-19 outcomes, our study suggests that colocalization of expression with COVID-19 outcomes is partially due to noisy signals even after following quality control and sensitivity testing. These findings illustrate the present difficulty of linking expression transcripts to disease outcomes and the need for skepticism when observing eQTL MR results, even accounting for cell types, stimulation state and different organs.


Subject(s)
COVID-19 , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Linkage Disequilibrium , Quality Control , Quantitative Trait Loci
6.
Hum Genet ; 142(6): 749-758, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37009933

ABSTRACT

GWAS has identified thousands of loci associated with disease, yet the causal genes within these loci remain largely unknown. Identifying these causal genes would enable deeper understanding of the disease and assist in genetics-based drug development. Exome-wide association studies (ExWAS) are more expensive but can pinpoint causal genes offering high-yield drug targets, yet suffer from a high false-negative rate. Several algorithms have been developed to prioritize genes at GWAS loci, such as the Effector Index (Ei), Locus-2-Gene (L2G), Polygenic Prioritization score (PoPs), and Activity-by-Contact score (ABC), but it is not known whether these algorithms can predict ExWAS findings from GWAS data. If they could, thousands of associated GWAS loci could potentially be resolved to causal genes. Here, we quantified the performance of these algorithms by evaluating their ability to identify ExWAS-significant genes for nine traits. We found that Ei, L2G, and PoPs can identify ExWAS-significant genes with high areas under the precision-recall curve (Ei: 0.52, L2G: 0.37, PoPs: 0.18, ABC: 0.14). Furthermore, we found that for every unit increase in the normalized scores, there was an associated 1.3-4.6-fold increase in the odds of a gene reaching exome-wide significance (Ei: 4.6, L2G: 2.5, PoPs: 2.1, ABC: 1.3). Overall, we found that Ei, L2G, and PoPs can anticipate ExWAS findings from widely available GWAS results. These techniques are therefore promising when well-powered ExWAS data are not readily available and can be used to anticipate ExWAS findings, allowing for prioritization of genes at GWAS loci.


Subject(s)
Exome , Quantitative Trait Loci , Humans , Genome-Wide Association Study/methods , Phenotype , Algorithms , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide
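
The abstract above compares gene-prioritization scores by area under the precision-recall curve. As a rough illustration of that metric only, the sketch below computes the step-wise average-precision estimate from per-gene scores and binary labels. The function name, the toy inputs, and the choice of this particular estimator are assumptions; the abstract does not say how the area was computed, and tied scores are not handled specially here.

```python
def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve.

    scores: higher means the gene is ranked as more likely causal.
    labels: 1 if the gene is a true positive (e.g. ExWAS-significant), else 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank genes from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Toy example: four genes, two of them true positives.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Because the baseline for this metric equals the fraction of positives, values such as those quoted in the abstract are best read relative to how rare ExWAS-significant genes are among the candidates.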
7.
Sci Rep ; 13(1): 6236, 2023 04 17.
Article in English | MEDLINE | ID: mdl-37069249

ABSTRACT

Predicting COVID-19 severity is difficult, and the biological pathways involved are not fully understood. To approach this problem, we measured 4701 circulating human protein abundances in two independent cohorts totaling 986 individuals. We then trained prediction models including protein abundances and clinical risk factors to predict COVID-19 severity in 417 subjects and tested these models in a separate cohort of 569 individuals. For severe COVID-19, a baseline model including age and sex provided an area under the receiver operating characteristic curve (AUC) of 65% in the test cohort. Selecting 92 proteins from the 4701 unique protein abundances improved the AUC to 88% in the training cohort, which remained relatively stable in the testing cohort at 86%, suggesting good generalizability. Proteins selected from different COVID-19 severity were enriched for cytokine and cytokine receptors, but more than half of the enriched pathways were not immune-related. Taken together, these findings suggest that circulating proteins measured at early stages of disease progression are reasonably accurate predictors of COVID-19 severity. Further research is needed to understand how to incorporate protein measurement into clinical care.


Subject(s)
COVID-19 , Humans , COVID-19/diagnosis , Proteins , Risk Factors , Disease Progression , Retrospective Studies
8.
Int J Epidemiol ; 52(4): 1163-1174, 2023 08 02.
Article in English | MEDLINE | ID: mdl-36773317

ABSTRACT

OBJECTIVES: Increased iron stores have been associated with elevated risks of different infectious diseases, suggesting that iron supplementation may increase the risk of infections. However, these associations may be biased by confounding or reverse causation. This is important, since up to 19% of the population takes iron supplementation. We used Mendelian randomization (MR) to bypass these biases and estimate the causal effect of iron on infections. METHODS: As instrumental variables, we used genetic variants associated with iron biomarkers in two genome-wide association studies (GWASs) of European ancestry participants. For outcomes, we used GWAS results from the UK Biobank, FinnGen, the COVID-19 Host Genetics Initiative or 23andMe, for seven infection phenotypes: 'any infections' combined, COVID-19 hospitalization, candidiasis, pneumonia, sepsis, skin and soft tissue infection (SSTI) and urinary tract infection (UTI). RESULTS: Most of our analyses showed increasing iron (measured by its biomarkers) was associated with only modest changes in the odds of infectious outcomes, with all 95% confidence intervals for the odds ratios within the 0.88 to 1.26 range. However, for the three predominantly bacterial infections (sepsis, SSTI, UTI), at least one analysis showed a nominally elevated risk with increased iron stores (P < 0.05). CONCLUSION: Using MR, we did not observe an increase in risk of most infectious diseases with increases in iron stores. However, for bacterial infections, higher iron stores may increase odds of infections. Hence, using genetic variation in iron pathways as a proxy for iron supplementation, iron supplements are likely safe on a population level, but we should continue the current practice of conservative iron supplementation during bacterial infections or in those at high risk of developing them.


Subject(s)
COVID-19 , Communicable Diseases , Sepsis , Humans , Genome-Wide Association Study , Mendelian Randomization Analysis/methods , Iron , Biomarkers , Sepsis/epidemiology , Sepsis/genetics , Communicable Diseases/epidemiology , Communicable Diseases/genetics , Polymorphism, Single Nucleotide
9.
Nat Metab ; 5(2): 248-264, 2023 02.
Article in English | MEDLINE | ID: mdl-36805566

ABSTRACT

Obesity is a major risk factor for Coronavirus disease (COVID-19) severity; however, the mechanisms underlying this relationship are not fully understood. As obesity influences the plasma proteome, we sought to identify circulating proteins mediating the effects of obesity on COVID-19 severity in humans. Here, we screened 4,907 plasma proteins to identify proteins influenced by body mass index using Mendelian randomization. This yielded 1,216 proteins, whose effect on COVID-19 severity was assessed, again using Mendelian randomization. We found that an s.d. increase in nephronectin (NPNT) was associated with increased odds of critically ill COVID-19 (OR = 1.71, P = 1.63 × 10⁻¹⁰). The effect was driven by an NPNT splice isoform. Mediation analyses supported NPNT as a mediator. In single-cell RNA-sequencing, NPNT was expressed in alveolar cells and fibroblasts of the lung in individuals who died of COVID-19. Finally, decreasing body fat mass and increasing fat-free mass were found to lower NPNT levels. These findings provide actionable insights into how obesity influences COVID-19 severity.


Subject(s)
COVID-19 , Obesity , Proteome , Humans , COVID-19/genetics , Mendelian Randomization Analysis , Obesity/complications , Obesity/genetics
10.
Nat Genet ; 55(1): 44-53, 2023 01.
Article in English | MEDLINE | ID: mdl-36635386

ABSTRACT

Metabolic processes can influence disease risk and provide therapeutic targets. By conducting genome-wide association studies of 1,091 blood metabolites and 309 metabolite ratios, we identified associations with 690 metabolites at 248 loci and associations with 143 metabolite ratios at 69 loci. Integrating metabolite-gene and gene expression information identified 94 effector genes for 109 metabolites and 48 metabolite ratios. Using Mendelian randomization (MR), we identified 22 metabolites and 20 metabolite ratios having estimated causal effect on 12 traits and diseases, including orotate for estimated bone mineral density, α-hydroxyisovalerate for body mass index and ergothioneine for inflammatory bowel disease and asthma. We further measured the orotate level in a separate cohort and demonstrated that, consistent with MR, orotate levels were positively associated with incident hip fractures. This study provides a valuable resource describing the genetic architecture of metabolites and delivers insights into their roles in common diseases, thereby offering opportunities for therapeutic targets.


Subject(s)
Genome-Wide Association Study , Metabolome , Humans , Metabolome/genetics , Phenotype , Bone Density/genetics , Genomics , Polymorphism, Single Nucleotide/genetics
12.
Crit Care ; 26(1): 322, 2022 10 21.
Article in English | MEDLINE | ID: mdl-36271419

ABSTRACT

BACKGROUND: We have previously shown that iatrogenic dehydration is associated with a shift to organic osmolyte production in the general ICU population. The aim of the present investigation was to determine the validity of the physiological response to dehydration known as aestivation and its relevance for long-term disease outcome in COVID-19. METHODS: The study includes 374 COVID-19 patients from the Pronmed cohort admitted to the ICU at Uppsala University Hospital. Dehydration data were available for 165 of these patients and were used for the primary analysis. Validation was performed in Biobanque Québécoise de la COVID-19 (BQC19) using 1052 patients with dehydration data. Dehydration was assessed through estimated osmolality (eOSM = 2Na + 2K + glucose + urea), and correlated to important endpoints including death, invasive mechanical ventilation, acute kidney injury, and long COVID-19 symptom score grouped by physical or mental. RESULTS: Increasing eOSM was correlated with increasing role of organic osmolytes for eOSM, while the proportion of sodium and potassium of eOSM were inversely correlated to eOSM. Acute outcomes were associated with pronounced dehydration, and physical long-COVID was more strongly associated with dehydration than mental long-COVID after adjustment for age, sex, and disease severity. Metabolomic analysis showed enrichment of amino acids among metabolites that showed an aestivating pattern. CONCLUSIONS: Dehydration during acute COVID-19 infection causes an aestivation response that is associated with protein degradation and physical long-COVID. TRIAL REGISTRATION: The study was registered a priori (clinicaltrials.gov: NCT04316884 registered on 2020-03-13 and NCT04474249 registered on 2020-06-29).


Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Dehydration/etiology , Sodium , Urea , Potassium , Amino Acids , Glucose , Post-Acute COVID-19 Syndrome
13.
Clin Proteomics ; 19(1): 34, 2022 Sep 28.
Article in English | MEDLINE | ID: mdl-36171541

ABSTRACT

INTRODUCTION: Severe COVID-19 leads to important changes in circulating immune-related proteins. To date it has been difficult to understand their temporal relationship and identify cytokines that are drivers of severe COVID-19 outcomes and underlie differences in outcomes between sexes. Here, we measured 147 immune-related proteins during acute COVID-19 to investigate these questions. METHODS: We measured circulating protein abundances using the SOMAscan nucleic acid aptamer panel in two large independent hospital-based COVID-19 cohorts in Canada and the United States. We fit generalized additive models with cubic splines from the start of symptom onset to identify protein levels over the first 14 days of infection which were different between severe cases and controls, adjusting for age and sex. Severe cases were defined as individuals with COVID-19 requiring invasive or non-invasive mechanical respiratory support. RESULTS: 580 individuals were included in the analysis. Mean subject age was 64.3 (sd 18.1), and 47% were male. Of the 147 proteins, 69 showed a significant difference between cases and controls (p < 3.4 × 10⁻⁴). Three clusters were formed by 108 highly correlated proteins that replicated in both cohorts, making it difficult to determine which proteins have a true causal effect on severe COVID-19. Six proteins showed sex differences in levels over time, of which 3 were also associated with severe COVID-19: CCL26, IL1RL2, and IL3RA, providing insights to better understand the marked differences in outcomes by sex. CONCLUSIONS: Severe COVID-19 is associated with large changes in 69 immune-related proteins. Further, five proteins were associated with sex differences in outcomes. These results provide direct insights into immune-related proteins that are strongly influenced by severe COVID-19 infection.

16.
Commun Biol ; 3(1): 744, 2020 12 08.
Article in English | MEDLINE | ID: mdl-33293579

ABSTRACT

Existing cancer benchmark data sets for human sequencing data use germline variants, synthetic methods, or expensive validations, none of which are satisfactory for providing a large collection of true somatic variation across a whole genome. Here we propose a data set, Lineage derived Somatic Truth (LinST), of short somatic mutations in the HT115 colon cancer cell line that are validated using a known cell lineage that includes thousands of mutations and a high-confidence region covering 2.7 gigabases per sample.


Subject(s)
Gene Expression Regulation, Neoplastic/physiology , Genome, Human , Neoplasm Proteins/metabolism , Databases, Genetic , Genetic Predisposition to Disease , Humans , Mutation , Neoplasm Proteins/genetics , Reproducibility of Results , Software
17.
Nat Commun ; 11(1): 3697, 2020 07 29.
Article in English | MEDLINE | ID: mdl-32728101

ABSTRACT

As the number of genomics datasets grows rapidly, sample mislabeling has become a high stakes issue. We present CrosscheckFingerprints (Crosscheck), a tool for quantifying sample-relatedness and detecting incorrectly paired sequencing datasets from different donors. Crosscheck outperforms similar methods and is effective even when data are sparse or from different assays. Application of Crosscheck to 8851 ENCODE ChIP-, RNA-, and DNase-seq datasets enabled us to identify and correct dozens of mislabeled samples and ambiguous metadata annotations, representing ~1% of ENCODE datasets.


Subject(s)
High-Throughput Nucleotide Sequencing , Linkage Disequilibrium/genetics , Databases, Nucleic Acid , Genotype , HEK293 Cells , Human Umbilical Vein Endothelial Cells/metabolism , Humans , K562 Cells , Lod Score , Molecular Sequence Annotation
18.
Nature ; 581(7809): 434-443, 2020 05.
Article in English | MEDLINE | ID: mdl-32461654

ABSTRACT

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes (ref. 1). Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.


Subject(s)
Exome/genetics , Genes, Essential/genetics , Genetic Variation/genetics , Genome, Human/genetics , Adult , Brain/metabolism , Cardiovascular Diseases/genetics , Cohort Studies , Databases, Genetic , Female , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Loss of Function Mutation/genetics , Male , Mutation Rate , Proprotein Convertase 9/genetics , RNA, Messenger/genetics , Reproducibility of Results , Exome Sequencing , Whole Genome Sequencing
19.
Bioinformatics ; 36(7): 2060-2067, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31830260

ABSTRACT

SUMMARY: We investigate convolutional neural networks (CNNs) for filtering small genomic variants in short-read DNA sequence data. Errors created during sequencing and library preparation make variant calling a difficult task. Encoding the reference genome and aligned reads covering sites of genetic variation as numeric tensors allows us to leverage CNNs for variant filtration. Convolutions over these tensors learn to detect motifs useful for classifying variants. Variant filtering models are trained to classify variants as artifacts or real variation. Visualizing the learned weights of the CNN confirmed it detects familiar DNA motifs known to correlate with real variation, like homopolymers and short tandem repeats (STR). After confirmation of the biological plausibility of the learned features we compared our model to current state-of-the-art filtration methods like Gaussian Mixture Models, Random Forests and CNNs designed for image classification, like DeepVariant. We demonstrate improvements in both sensitivity and precision. The tensor encoding was carefully tailored for processing genomic data, respecting the qualitative differences in structure between DNA and natural images. Ablation tests quantitatively measured the benefits of our tensor encoding strategy. Bayesian hyper-parameter optimization confirmed our notion that architectures designed with DNA data in mind outperform off-the-shelf image classification models. Our cross-generalization analysis identified idiosyncrasies in truth resources pointing to the need for new methods to construct genomic truth data. Our results show that models trained on heterogeneous data types and diverse truth resources generalize well to new datasets, negating the need to train separate models for each data type. AVAILABILITY AND IMPLEMENTATION: This work is available in the Genome Analysis Toolkit (GATK) with the tool name CNNScoreVariants (https://github.com/broadinstitute/gatk). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , INDEL Mutation , Bayes Theorem , High-Throughput Nucleotide Sequencing , Neural Networks, Computer , Sequence Analysis
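
The abstract above describes encoding the reference genome and aligned reads around a variant site as numeric tensors. The sketch below shows one simple way such an encoding could look: a one-hot stack of shape (1 + number of reads) x window x 4. This is a hypothetical illustration, not the encoding used by CNNScoreVariants; the function names, the channel order, and the all-zero treatment of unknown bases are assumptions.

```python
BASES = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(sequence):
    """Encode a DNA string as a len(sequence) x 4 matrix.

    Bases outside A/C/G/T (e.g. N) become all-zero rows (an assumption).
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence.upper()]


def encode_site(reference, reads):
    """Stack the reference window and aligned reads into one nested-list tensor.

    Row 0 is the reference; each following row is one read. All sequences
    are assumed to be pre-aligned and trimmed to the same window.
    """
    rows = [reference] + list(reads)
    width = len(reference)
    if any(len(row) != width for row in rows):
        raise ValueError("all sequences must span the same window")
    return [one_hot(row) for row in rows]


# Toy window of four bases with two reads, one containing an unknown base.
tensor = encode_site("ACGT", ["ACGT", "ANGT"])
```

A real pipeline would typically add further channels (for example base qualities or strand), which this sketch leaves out.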
20.
Nat Commun ; 9(1): 4038, 2018 10 02.
Article in English | MEDLINE | ID: mdl-30279509

ABSTRACT

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.


Subject(s)
Human Genetics/standards , Whole Genome Sequencing/standards , Genome, Human , Humans