1.
BMC Genomics ; 25(1): 375, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38627641

ABSTRACT

BACKGROUND: Approximately 95% of samples analyzed in univariate genome-wide association studies (GWAS) are of European ancestry. This bias toward European ancestry populations in association screening also exists for other analyses and methods that are often developed and tested on European ancestry only. However, existing data in non-European populations, which are often of modest sample size, could benefit from innovative approaches as recently illustrated in the context of polygenic risk scores. METHODS: Here, we extend and assess the potential limitations and gains of our multi-trait GWAS pipeline, JASS (Joint Analysis of Summary Statistics), for the analysis of non-European ancestries. To this end, we conducted the joint GWAS of 19 hematological traits and glycemic traits across five ancestries (European (EUR), admixed American (AMR), African (AFR), East Asian (EAS), and South-East Asian (SAS)). RESULTS: We detected 367 new genome-wide significant associations in non-European populations (15 in Admixed American (AMR), 72 in African (AFR) and 280 in East Asian (EAS)). New associations detected represent 5%, 17% and 13% of associations in the AFR, AMR and EAS populations, respectively. Overall, multi-trait testing increases the replication of European associated loci in non-European ancestry by 15%. Pleiotropic effects were highly similar at significant loci across ancestries (e.g. the mean correlation between multi-trait genetic effects of EUR and EAS ancestries was 0.88). For hematological traits, strong discrepancies in multi-trait genetic effects are tied to known evolutionary divergences: the ACKR1 locus, which is adaptive to overcome P. vivax-induced malaria. CONCLUSIONS: Multi-trait GWAS can be a valuable tool to narrow the genetic knowledge gap between European and non-European populations.


Subject(s)
Asian People , Black People , Genome-Wide Association Study , Humans , Asian People/genetics , Black People/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Phenotype , Polymorphism, Single Nucleotide , European People/genetics
2.
bioRxiv ; 2023 Oct 27.
Article in English | MEDLINE | ID: mdl-37961722

ABSTRACT

Since the first Genome-Wide Association Studies (GWAS), thousands of variant-trait associations have been discovered. However, the sample size required to detect additional variants using standard univariate association screening is increasingly prohibitive. Multi-trait GWAS offers a relevant alternative: it can improve statistical power and lead to new insights about gene function and the joint genetic architecture of human phenotypes. Although many methodological hurdles of multi-trait testing have been discussed, the strategy to select traits, among overwhelming possibilities, has been overlooked. In this study, we conducted extensive multi-trait tests using JASS (Joint Analysis of Summary Statistics) and assessed which genetic features of the analysed sets were associated with an increased detection of variants as compared to univariate screening. Our analyses identified multiple factors associated with the gain in association detection in multi-trait tests. Together, these factors of the analysed sets are predictive of the gain of the multi-trait test (Pearson's ρ equal to 0.43 between the observed and predicted gain, P < 1.6 × 10^-60). Applying an alternative multi-trait approach (MTAG, multi-trait analysis of GWAS), we found that in most scenarios, and particularly those with larger numbers of traits, JASS outperformed MTAG. Finally, we benchmarked several strategies to select sets of traits, including the prevalent strategy of selecting clinically similar traits, which systematically underperformed relative to selecting clinically heterogeneous traits or sets derived from our data-driven models. This work provides a unique picture of the determinants of multi-trait GWAS statistical power and outlines practical strategies for multi-trait testing.

3.
PLoS One ; 18(6): e0286811, 2023.
Article in English | MEDLINE | ID: mdl-37285372

ABSTRACT

Success in STEM (Science, Technology, Engineering, and Math) remains influenced by race, gender, and socioeconomic status. Here, we focus on the impact of gender on question-asking behavior during the 2021 JOBIM virtual conference (Journées Ouvertes en Biologie et Mathématiques). We gathered quantitative and qualitative data including demographic information, question-asking motivations, live observations and interviews of participants. Quantitative analyses include unprecedented figures such as the fraction of the audience identifying as LGBTQIA+ and an increased attendance of women in virtual conferences. Although parity was reached in the audience, women asked half as many questions as men. This under-representation persisted after accounting for seniority of the asker. Interviews of participants highlighted several barriers to oral expression encountered by women and gender minorities: negative reactions to their speech, discouragement to pursue a career in research, and gender discrimination/sexual harassment. Informed by the study, guidelines for conference organizers have been written. The story behind the making of this study has been highlighted in a Nature Career article.


Subject(s)
Sexism , Sexual Harassment , Male , Humans , Female , Speech , Social Class , Bias
4.
medRxiv ; 2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36865145

ABSTRACT

Chronic Obstructive Pulmonary Disease (COPD) has a simple physiological diagnostic criterion but a wide range of clinical characteristics. The mechanisms underlying this variability in COPD phenotypes are unclear. To investigate the potential contribution of genetic variants to phenotypic heterogeneity, we examined the association of genome-wide associated lung function, COPD, and asthma variants with other phenotypes using phenome-wide association results derived in the UK Biobank. Our clustering analysis of the variants-phenotypes association matrix identified three clusters of genetic variants with different effects on white blood cell counts, height, and body mass index (BMI). To assess the potential clinical and molecular effects of these groups of variants, we investigated the association between cluster-specific genetic risk scores and phenotypes in the COPDGene cohort. We observed differences in steroid use, BMI, lymphocyte counts, chronic bronchitis, and differential gene and protein expression across the three genetic risk scores. Our results suggest that multi-phenotype analysis of obstructive lung disease-related risk variants may identify genetically driven phenotypic patterns in COPD.

5.
BMC Bioinformatics ; 23(1): 208, 2022 Jun 01.
Article in English | MEDLINE | ID: mdl-35650523

ABSTRACT

BACKGROUND: Bioinformatics investigators often gain insights by combining information across multiple and disparate data sets. Merging data from multiple sources frequently results in data sets that are incomplete or contain missing values. Although missing data are ubiquitous, existing implementations of Gaussian mixture models (GMMs) either cannot accommodate missing data, or do so by imposing simplifying assumptions that limit the applicability of the model. In the presence of missing data, a standard ad hoc practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. RESULTS: Here we present missingness-aware Gaussian mixture models (MGMM), an R package for fitting GMMs in the presence of missing data. Unlike existing GMM implementations that can accommodate missing data, MGMM places no restrictions on the form of the covariance matrix. Using three case studies on real and simulated 'omics data sets, we demonstrate that, when the underlying data distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than either the existing GMM implementations that accommodate missing data, or fitting a standard GMM after state-of-the-art imputation. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty, even when the generative distribution is not a GMM. CONCLUSION: Compared to state-of-the-art competitors, MGMM demonstrates a better ability to recover the true cluster assignments for a wide variety of data sets and a large range of missingness rates. MGMM provides the bioinformatics community with a powerful, easy-to-use, and statistically sound tool for performing clustering and density estimation in the presence of missing data. MGMM is publicly available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.


Subject(s)
Computational Biology , Cluster Analysis , Computational Biology/methods , Normal Distribution
7.
Nat Med ; 28(2): 303-314, 2022 02.
Article in English | MEDLINE | ID: mdl-35177860

ABSTRACT

Previous microbiome and metabolome analyses exploring non-communicable diseases have paid scant attention to major confounders of study outcomes, such as common, pre-morbid and co-morbid conditions, or polypharmacy. Here, in the context of ischemic heart disease (IHD), we used a study design that recapitulates disease initiation, escalation and response to treatment over time, mirroring a longitudinal study that would otherwise be difficult to perform given the protracted nature of IHD pathogenesis. We recruited 1,241 middle-aged Europeans, including healthy individuals, individuals with dysmetabolic morbidities (obesity and type 2 diabetes) but lacking overt IHD diagnosis and individuals with IHD at three distinct clinical stages (acute coronary syndrome, chronic IHD and IHD with heart failure), and characterized their phenome, gut metagenome and serum and urine metabolome. We found that about 75% of microbiome and metabolome features that distinguish individuals with IHD from healthy individuals after adjustment for effects of medication and lifestyle are present in individuals exhibiting dysmetabolism, suggesting that major alterations of the gut microbiome and metabolome might begin long before clinical onset of IHD. We further categorized microbiome and metabolome signatures related to prodromal dysmetabolism, specific to IHD in general or to each of its three subtypes or related to escalation or de-escalation of IHD. Discriminant analysis based on specific IHD microbiome and metabolome features could better differentiate individuals with IHD from healthy individuals or metabolically matched individuals as compared to the conventional risk markers, pointing to a pathophysiological relevance of these features.


Subject(s)
Cardiovascular Diseases , Diabetes Mellitus, Type 2 , Microbiota , Humans , Longitudinal Studies , Metabolome , Middle Aged
8.
PLoS Genet ; 17(8): e1009713, 2021 08.
Article in English | MEDLINE | ID: mdl-34460823

ABSTRACT

Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.


Subject(s)
Computational Biology/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Cluster Analysis , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Phenotype
9.
Sci Rep ; 11(1): 13252, 2021 06 24.
Article in English | MEDLINE | ID: mdl-34168163

ABSTRACT

Knowledge about in vivo effects of human circulating C-6 hydroxylated bile acids (BAs), also called muricholic acids, is sparse. It is unsettled if the gut microbiome might contribute to their biosynthesis. Here, we measured a range of serum BAs and related them to markers of human metabolic health and the gut microbiome. We examined 283 non-obese and obese Danish adults from the MetaHit study. Fasting concentrations of serum BAs were quantified using ultra-performance liquid chromatography-tandem mass-spectrometry. The gut microbiome was characterized with shotgun metagenomic sequencing and genome-scale metabolic modeling. We found that tauro- and glycohyocholic acid correlated inversely with body mass index (P = 4.1e-03, P = 1.9e-05, respectively), waist circumference (P = 0.017, P = 1.1e-04, respectively), body fat percentage (P = 2.5e-03, P = 2.3e-06, respectively), insulin resistance (P = 0.051, P = 4.6e-4, respectively), fasting concentrations of triglycerides (P = 0.06, P = 9.2e-4, respectively) and leptin (P = 0.067, P = 9.2e-4). Tauro- and glycohyocholic acids, and tauro-α-muricholic acid were directly linked with a distinct gut microbial community primarily composed of Clostridia species (P = 0.037, P = 0.013, P = 0.027, respectively). We conclude that serum conjugated C-6-hydroxylated BAs associate with measures of human metabolic health and gut communities of Clostridia species. The findings merit preclinical interventions and human feasibility studies to explore the therapeutic potential of these BAs in obesity and type 2 diabetes.


Subject(s)
Bile Acids and Salts/blood , Clostridium/metabolism , Gastrointestinal Microbiome , Adiposity , Body Mass Index , Cholic Acids/blood , Chromatography, High Pressure Liquid , Clostridium/genetics , Deoxycholic Acid/blood , Female , Gastrointestinal Microbiome/genetics , Humans , Logistic Models , Male , Metagenomics , Middle Aged , Obesity/blood , Obesity/microbiology , Tandem Mass Spectrometry , Taurocholic Acid/blood , Waist Circumference
10.
NAR Genom Bioinform ; 2(1): lqaa003, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32002517

ABSTRACT

Genome-wide association studies (GWAS) have been the driving force for identifying associations between genetic variants and human phenotypes. Thousands of GWAS summary statistics covering a broad range of human traits and diseases are now publicly available. These GWAS have proven their utility for a range of secondary analyses, including in particular the joint analysis of multiple phenotypes to identify new associated genetic variants. However, although several methods have been proposed, there are very few large-scale applications published so far because of challenges in implementing these methods on real data. Here, we present JASS (Joint Analysis of Summary Statistics), a polyvalent Python package that addresses this need. Our package incorporates recently developed joint tests such as the omnibus approach and various weighted sum of Z-score tests while solving all practical and computational barriers for large-scale multivariate analysis of GWAS summary statistics. This includes data cleaning and harmonization tools, an efficient algorithm for fast derivation of joint statistics, an optimized data management process and a web interface for exploration purposes. Both benchmark analyses and real data applications demonstrated the robustness and strong potential of JASS for the detection of new associated genetic variants. Our package is freely available at https://gitlab.pasteur.fr/statistical-genetics/jass.
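
The two families of joint tests named in this abstract can be illustrated with a minimal sketch. It uses the textbook forms of the statistics, not the JASS API: the function names `omnibus_statistic` and `weighted_sum_z` are hypothetical, and `sigma` is assumed to be a known, full-rank correlation matrix of the per-trait Z-scores under the null hypothesis. Under these assumptions the omnibus statistic z' Σ⁻¹ z follows a chi-square distribution with as many degrees of freedom as traits, and the weighted sum w'z / sqrt(w' Σ w) follows a standard normal distribution.

```python
import math

import numpy as np


def omnibus_statistic(z, sigma):
    """Omnibus statistic z' * sigma^-1 * z for one variant (hypothetical helper).

    z     : per-trait Z-scores of the variant
    sigma : assumed null correlation matrix of the Z-scores (full rank)
    Under the null, the result is chi-square with len(z) degrees of freedom.
    """
    z = np.asarray(z, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Solve sigma * x = z rather than inverting sigma explicitly.
    return float(z @ np.linalg.solve(sigma, z))


def weighted_sum_z(z, sigma, weights):
    """Weighted sum of Z-scores, standardised to unit variance under the null.

    Returns the statistic and its two-sided normal p-value.
    """
    z = np.asarray(z, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    w = np.asarray(weights, dtype=float)
    stat = float(w @ z) / math.sqrt(float(w @ sigma @ w))
    # Two-sided p-value of a standard normal: erfc(|x| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


if __name__ == "__main__":
    # Illustrative values only: two traits with correlated Z-scores.
    sigma = [[1.0, 0.5], [0.5, 1.0]]
    print(omnibus_statistic([3.0, -1.0], sigma))
    print(weighted_sum_z([3.0, -1.0], sigma, [1.0, 1.0]))
```

The contrast in design is that the omnibus form is sensitive to any direction of effect across traits, while a weighted sum concentrates power on the direction given by the chosen weights.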

11.
Nat Commun ; 10(1): 4788, 2019 10 21.
Article in English | MEDLINE | ID: mdl-31636271

ABSTRACT

Genetic studies of metabolites have identified thousands of variants, many of which are associated with downstream metabolic and obesogenic disorders. However, these studies have relied on univariate analyses, reducing power and limiting context-specific understanding. Here we aim to provide an integrated perspective of the genetic basis of metabolites by leveraging the Finnish Metabolic Syndrome In Men (METSIM) cohort, a unique genetic resource which contains metabolic measurements, mostly lipids, across distinct time points as well as information on statin usage. We increase effective sample size by an average of two-fold by applying the Covariates for Multi-phenotype Studies (CMS) approach, identifying 588 significant SNP-metabolite associations, including 228 new associations. Our analysis pinpoints a small number of master metabolic regulator genes, balancing the relative proportion of dozens of metabolite levels. We further identify associations to changes in metabolic levels across time as well as genetic interactions with statin at both the master metabolic regulator and genome-wide level.


Subject(s)
Genetic Pleiotropy , Metabolic Syndrome/genetics , Metabolome/genetics , Aged , Amino Acids/genetics , Amino Acids/metabolism , Cohort Studies , Fatty Acids/genetics , Fatty Acids/metabolism , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Lipoproteins, HDL/genetics , Lipoproteins, HDL/metabolism , Lipoproteins, IDL/genetics , Lipoproteins, IDL/metabolism , Lipoproteins, LDL/genetics , Lipoproteins, LDL/metabolism , Lipoproteins, VLDL/genetics , Lipoproteins, VLDL/metabolism , Magnetic Resonance Spectroscopy , Male , Middle Aged , Polymorphism, Single Nucleotide
12.
Bioinformatics ; 35(22): 4837-4839, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31173064

ABSTRACT

MOTIVATION: Multi-trait analyses using public summary statistics from genome-wide association studies (GWASs) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. Although methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect size. This is benign for univariate analyses where only variants with large effect size are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves the existing imputation methods and reaches a precision suitable for multi-trait analyses. RESULTS: We fine-tuned parameters to obtain a very high accuracy imputation from summary statistics. We demonstrate this accuracy for variants of all effect sizes on real data of 28 GWAS. We implemented the resulting methodology in a Python package specially designed to efficiently impute multiple GWAS in parallel. AVAILABILITY AND IMPLEMENTATION: The Python package is available at: https://gitlab.pasteur.fr/statistical-genetics/raiss, its accompanying documentation is accessible here http://statistical-genetics.pages.pasteur.fr/raiss/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Genotype , Phenotype , Polymorphism, Single Nucleotide
13.
PLoS Comput Biol ; 11(2): e1003969, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25658386

ABSTRACT

Epigenetic regulation of the replication program during mammalian cell differentiation remains poorly understood. We performed an integrative analysis of eleven genome-wide epigenetic profiles at 100 kb resolution of Mean Replication Timing (MRT) data in six human cell lines. Compared to the organization in four chromatin states shared by the five somatic cell lines, embryonic stem cell (ESC) line H1 displays (i) a gene-poor but highly dynamic chromatin state (EC4) associated with histone variant H2AZ rather than a HP1-associated heterochromatin state (C4) and (ii) a mid-S accessible chromatin state with bivalent gene marks instead of a polycomb-repressed heterochromatin state. Plastic MRT regions (≲ 20% of the genome) are predominantly localized at the borders of U-shaped timing domains. Whereas somatic-specific U-domain borders are gene-dense GC-rich regions, 31.6% of H1-specific U-domain borders are early EC4 regions enriched in pluripotency transcription factors NANOG and OCT4 despite being GC-poor gene deserts. Silencing of these ESC-specific "master" replication initiation zones during differentiation corresponds to a loss of H2AZ and an enrichment in the H3K9me3 mark characteristic of late replicating C4 heterochromatin. These results shed new light on the epigenetically regulated global chromatin reorganization that underlies the loss of pluripotency and lineage commitment.


Subject(s)
Chromatin/genetics , Embryonic Stem Cells/physiology , Epigenesis, Genetic/genetics , Histones/genetics , Replication Origin/genetics , Cell Differentiation/genetics , Cell Line , Chromatin/chemistry , Chromatin/metabolism , Cluster Analysis , Computational Biology , Histones/chemistry , Histones/metabolism , Humans
14.
PLoS Comput Biol ; 9(10): e1003233, 2013.
Article in English | MEDLINE | ID: mdl-24130466

ABSTRACT

Advances in genomic studies have led to significant progress in understanding the epigenetically controlled interplay between chromatin structure and nuclear functions. Epigenetic modifications were shown to play a key role in transcription regulation and genome activity during development and differentiation or in response to the environment. Paradoxically, the molecular mechanisms that regulate the initiation and the maintenance of the spatio-temporal replication program in higher eukaryotes, and in particular their links to epigenetic modifications, still remain elusive. By integrative analysis of the genome-wide distributions of thirteen epigenetic marks in the human cell line K562, at the 100 kb resolution of corresponding mean replication timing (MRT) data, we identify four major groups of chromatin marks with shared features. These states have different MRT: from early to late replicating, replication proceeds through a transcriptionally active euchromatin state (C1), a repressive type of chromatin (C2) associated with polycomb complexes, a silent state (C3) not enriched in any available marks, and a gene-poor HP1-associated heterochromatin state (C4). When mapping these chromatin states inside the megabase-sized U-domains (U-shaped MRT profile) covering about 50% of the human genome, we reveal that the associated replication fork polarity gradient corresponds to a directional path across the four chromatin states, from C1 at U-domain borders followed by C2, C3 and C4 at centers. Analysis of the other genome half is consistent with early and late replication loci occurring in separate compartments: the former correspond to gene-rich, high-GC domains of intermingled chromatin states C1 and C2, whereas the latter correspond to gene-poor, low-GC domains of alternating chromatin states C3 and C4 or long C4 domains. This new segmentation sheds new light on the epigenetic regulation of the spatio-temporal replication program in human and provides a framework for further studies in different cell types, in both health and disease.


Subject(s)
Chromatin/genetics , Computational Biology/methods , DNA Replication/genetics , Genome, Human/genetics , Chromatin/metabolism , Cluster Analysis , Gene Expression/genetics , Humans , K562 Cells , Principal Component Analysis , Statistics, Nonparametric
15.
Nat Protoc ; 8(1): 98-110, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23237832

ABSTRACT

In this protocol, we describe the use of the LastWave open-source signal-processing command language (http://perso.ens-lyon.fr/benjamin.audit/LastWave/) for analyzing cellular DNA replication timing profiles. LastWave makes use of a multiscale, wavelet-based signal-processing algorithm that is based on a rigorous theoretical analysis linking timing profiles to fundamental features of the cell's DNA replication program, such as the average replication fork polarity and the difference between replication origin density and termination site density. We describe the flow of signal-processing operations to obtain interactive visual analyses of DNA replication timing profiles. We focus on procedures for exploring the space-scale map of apparent replication speeds to detect peaks in the replication timing profiles that represent preferential replication initiation zones, and for delimiting U-shaped domains in the replication timing profile. In comparison with the generally adopted approach that involves genome segmentation into regions of constant timing separated by timing transition regions, the present protocol enables the recognition of more complex patterns of the spatio-temporal replication program and has a broader range of applications. Completing the full procedure should not take more than 1 h, although learning the basics of the program can take a few hours and achieving full proficiency in the use of the software may take days.


Subject(s)
Algorithms , DNA Replication Timing , Genome, Human , Software , Wavelet Analysis , HeLa Cells , Humans