ABSTRACT
Genome-wide association studies (GWAS) have identified thousands of loci associated with disease, yet the causal genes within these loci remain largely unknown. Identifying these causal genes would enable deeper understanding of the disease and assist in genetics-based drug development. Exome-wide association studies (ExWAS) are more expensive but can pinpoint causal genes offering high-yield drug targets, yet suffer from a high false-negative rate. Several algorithms have been developed to prioritize genes at GWAS loci, such as the Effector Index (Ei), Locus-2-Gene (L2G), Polygenic Prioritization score (PoPs), and Activity-by-Contact score (ABC), but it is not known whether these algorithms can predict ExWAS findings from GWAS data. If they can, thousands of associated GWAS loci could potentially be resolved to causal genes. Here, we quantified the performance of these algorithms by evaluating their ability to identify ExWAS significant genes for nine traits. We found that Ei, L2G, and PoPs can identify ExWAS significant genes with high areas under the precision-recall curve (Ei: 0.52, L2G: 0.37, PoPs: 0.18, ABC: 0.14). Furthermore, we found that for every unit increase in the normalized scores, there was an associated 1.3- to 4.6-fold increase in the odds of a gene reaching exome-wide significance (Ei: 4.6, L2G: 2.5, PoPs: 2.1, ABC: 1.3). Overall, we found that Ei, L2G, and PoPs can anticipate ExWAS findings from widely available GWAS results. These techniques are therefore promising when well-powered ExWAS data are not readily available and can be used to anticipate ExWAS findings, allowing for prioritization of genes at GWAS loci.
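The abstract reports areas under the precision-recall curve but does not say which estimator was used. As a minimal sketch, assuming each gene has a binary label (1 if it reached exome-wide significance) and a prioritization score, average precision is one common estimator of that area. The function name and the toy data are hypothetical and are not taken from the study.

```python
def average_precision(labels, scores):
    """Mean of the precision values at each rank where a positive gene is found.

    Assumption: ties in score are broken by input order; a tie-aware
    estimator may give slightly different values.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank genes from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy example: five genes, two of them ExWAS significant.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
ap = average_precision(labels, scores)
print(round(ap, 4))
```

Because the baseline for this metric is the fraction of positive genes, the reported values are best read against how rare ExWAS significant genes are in the evaluated set.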
Subjects
Exome , Quantitative Trait Loci , Humans , Genome-Wide Association Study/methods , Phenotype , Algorithms , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide
ABSTRACT
MarkerDB is a freely available electronic database that attempts to consolidate information on all known clinical and a selected set of pre-clinical molecular biomarkers into a single resource. The database includes four major types of molecular biomarkers (chemical, protein, DNA [genetic] and karyotypic) and four biomarker categories (diagnostic, predictive, prognostic and exposure). MarkerDB provides information such as: biomarker names and synonyms, associated conditions or pathologies, detailed disease descriptions, detailed biomarker descriptions, biomarker specificity, sensitivity and ROC curves, standard reference values (for protein and chemical markers), variants (for SNP or genetic markers), sequence information (for genetic and protein markers), molecular structures (for protein and chemical markers), tissue or biofluid sources (for protein and chemical markers), chromosomal location and structure (for genetic and karyotype markers), clinical approval status and relevant literature references. Users can browse the data by conditions, condition categories, biomarker types, biomarker categories or search by sequence similarity through the advanced search function. Currently, the database contains 142 protein biomarkers, 1089 chemical biomarkers, 154 karyotype biomarkers and 26 374 genetic markers. These are categorized into 25 560 diagnostic biomarkers, 102 prognostic biomarkers, 265 exposure biomarkers and 6746 predictive biomarkers or biomarker panels. Collectively, these markers can be used to detect, monitor or predict 670 specific human conditions which are grouped into 27 broad condition categories. MarkerDB is available at https://markerdb.ca.
Subjects
Biomarkers/metabolism , Databases, Factual , Disease/genetics , Genetic Markers , Proteins/genetics , Chromosome Aberrations , Disease/classification , Humans , Internet , Karyotyping , Predictive Value of Tests , Prognosis , Proteins/metabolism , ROC Curve , Software
ABSTRACT
Cholera has been endemic to the Ganges Delta for centuries. Although the causative agent, Vibrio cholerae, is autochthonous to coastal and brackish water, cholera occurs continually in Dhaka, the inland capital city of Bangladesh, which is surrounded by fresh water. Despite the persistence of this problem, little is known about the environmental abundance and distribution of lineages of V. cholerae, the most important being the pandemic generating (PG) lineage consisting mostly of serogroup O1 strains. To understand spatial and temporal dynamics of the PG lineage and other lineages belonging to the V. cholerae species in surface water in and around Dhaka City, we used qPCR and high-throughput amplicon sequencing. Seven different freshwater sites across Dhaka were investigated for six consecutive months, and physicochemical parameters were measured in situ. Total abundance of V. cholerae was found to be relatively stable throughout the 6-month sampling period, with 2 × 10⁵ to 4 × 10⁵ genome copies/L at six sites and around 5 × 10⁵ genome copies/L at the site located in the most densely populated part of Dhaka City. PG O1 V. cholerae was present in high abundance during the entire sampling period and composed between 24 and 92% of the total V. cholerae population, only showing occasional but sudden reductions in abundance. In instances where PG O1 lost its dominance, other lineages underwent a rapid expansion while the size of the total V. cholerae population remained almost unchanged. Intraspecies richness of V. cholerae was positively correlated with salinity, conductivity, and total dissolved solids (TDS), while it was negatively correlated with dissolved oxygen (DO) concentration in water. Interestingly, negative correlation was observed specifically between PG O1 and salinity, even though the changes in this variable were minor (0-0.8 ppt). Observations in this study suggest that at the subspecies level, population composition of naturally occurring V. cholerae can be influenced by fluctuations in environmental factors, which can lead to altered competition dynamics among the lineages.
Subjects
Cholera , Vibrio cholerae , Humans , Vibrio cholerae/genetics , Cholera/epidemiology , Bangladesh/epidemiology , Water
ABSTRACT
Most efforts to understand the biology of Vibrio cholerae have focused on a single group, the pandemic-generating lineage harboring the strains responsible for all known cholera pandemics. Consequently, little is known about the diversity of this species in its native aquatic environment. To understand the differences in the V. cholerae populations inhabiting regions with a history of cholera cases and those lacking such a history, a comparative analysis of population composition was performed. Little overlap was found in lineage compositions between those in Dhaka, Bangladesh (where cholera is endemic), located in the Ganges Delta, and those in Falmouth, MA (no known history of cholera), a small coastal town on the United States east coast. The most striking difference was the presence of a group of related lineages at high abundance in Dhaka, which was completely absent from Falmouth. Phylogenomic analysis revealed that these lineages form a cluster at the base of the phylogeny for the V. cholerae species and were sufficiently differentiated genetically and phenotypically to form a novel species. A retrospective search revealed that strains from this species have been found sporadically around the world and were isolated as early as 1916 from a British soldier in Egypt suffering from choleraic diarrhea. In 1935, Gardner and Venkatraman unofficially referred to a member of this group as Vibrio paracholerae. In recognition of this earlier designation, we propose the name Vibrio paracholerae sp. nov. for this bacterium. Genomic analysis suggests a link with human populations for this novel species and substantial interaction with its better-known sister species. IMPORTANCE Cholera remains a major public health threat around the globe. Understanding the ecology, evolution, and environmental adaptation of the causative agent (Vibrio cholerae) and tracking the emergence of novel lineages with pathogenic potential are essential to combat the problem. In this study, we investigated the population dynamics of Vibrio cholerae in an inland locality, which is known as endemic for cholera, and compared them with those of a cholera-free coastal location. We found the consistent presence of the pandemic-generating lineage of V. cholerae in Dhaka, where cholera is endemic, and an exclusive presence of a lineage phylogenetically distinct from other V. cholerae lineages. Our study suggests that this lineage represents a novel species that has pathogenic potential and a human link to its environmental abundance. The possible association with human populations and coexistence and interaction with toxigenic V. cholerae in the natural environment make this potential human pathogen an important subject for future studies.
Subjects
Cholera/microbiology , Disease Reservoirs/microbiology , Seawater/microbiology , Vibrio/isolation & purification , Bangladesh/epidemiology , Cholera/epidemiology , Evolution, Molecular , Humans , Phylogeny , Retrospective Studies , Vibrio/classification , Vibrio/genetics , Vibrio cholerae O1/classification , Vibrio cholerae O1/genetics
ABSTRACT
Core genome multilocus sequence typing (cgMLST) has gained popularity in recent years in epidemiological research and subspecies-level classification. cgMLST retains the intuitive nature of traditional MLST but offers much greater resolution by utilizing significantly larger portions of the genome. Here, we introduce a cgMLST scheme for Vibrio cholerae, a bacterium abundant in marine and freshwater environments and the etiologic agent of cholera. A set of 2,443 core genes ubiquitous in V. cholerae was used to analyze a comprehensive data set of 1,262 clinical and environmental strains collected from 52 countries, including 65 newly sequenced genomes in this study. We established a sublineage threshold based on 133 allelic differences that creates clusters nearly identical to traditional MLST types, providing backwards compatibility to new cgMLST classifications. We also defined an outbreak threshold based on seven allelic differences that is capable of identifying strains from the same outbreak and closely related isolates that could give clues on outbreak origin. Using cgMLST, we confirmed the South Asian origin of modern epidemics and identified clustering affinity among sublineages of environmental isolates from the same geographic origin. Advantages of this method are highlighted by direct comparison with existing classification methods, such as MLST and single-nucleotide polymorphism-based methods. cgMLST outperforms all existing methods in terms of resolution, standardization, and ease of use. We anticipate this scheme will serve as a basis for a universally applicable and standardized classification system for V. cholerae research and epidemiological surveillance in the future. This cgMLST scheme is publicly available on PubMLST (https://pubmlst.org/vcholerae/). IMPORTANCE Toxigenic Vibrio cholerae isolates of the O1 and O139 serogroups are the causative agents of cholera, an acute diarrheal disease that has plagued the world for centuries, if not millennia. Here, we introduce a core genome multilocus sequence typing scheme for V. cholerae. Using this scheme, we have standardized the definition for subspecies-level classification, facilitating global collaboration in the surveillance of V. cholerae. In addition, this typing scheme allows for quick identification of outbreak-related isolates that can guide subsequent analyses, serving as an important first step in epidemiological research. This scheme is also easily scalable to analyze thousands of isolates at various levels of resolution, making it an invaluable tool for large-scale ecological and evolutionary analyses.
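Threshold-based grouping of allelic profiles can be illustrated with a small sketch. It assumes that each isolate is a list of allele numbers in a fixed locus order, that loci with a missing call (`None`) are ignored, that "within threshold" means at most that many differences, and that isolates are joined by single linkage. None of these choices is stated in the abstract, and the function names and toy profiles are hypothetical.

```python
def allelic_distance(a, b):
    """Count loci where both isolates have a call and the alleles differ."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def single_linkage_clusters(profiles, threshold):
    """Group isolates linked by a chain of pairs within `threshold` differences."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allelic_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy profiles over 10 loci (the real scheme uses 2,443 core genes).
profiles = {
    "A": [1] * 10,
    "B": [1] * 8 + [2, 2],   # 2 differences from A
    "C": [3] * 10,           # 10 differences from A and from B
}
# Threshold scaled down for the toy data; the abstract reports 7 (outbreak)
# and 133 (sublineage) allelic differences for the full scheme.
clusters = single_linkage_clusters(profiles, threshold=2)
print(clusters)
```

The pairwise loop is quadratic in the number of isolates, which is fine for a sketch but would need a more efficient approach for thousands of genomes.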
Subjects
Bacterial Typing Techniques/methods , Cholera/microbiology , Multilocus Sequence Typing/methods , Vibrio cholerae/genetics , Alleles , Cholera/epidemiology , Epidemiologic Studies , Genome, Bacterial , Genotype , Humans , Phylogeny , Polymorphism, Single Nucleotide , Vibrio cholerae/classification , Vibrio cholerae/isolation & purification , Yemen/epidemiology
ABSTRACT
CONTEXT: Trinucleotide repeats in the androgen receptor have been proposed to influence testosterone signaling in men, but the clinical relevance of these trinucleotide repeats remains controversial. OBJECTIVE: To examine how androgen receptor trinucleotide repeat lengths affect androgen-related traits and disease risks and whether they influence the clinical importance of circulating testosterone levels. METHODS: We quantified CAG and GGC repeat lengths in the androgen receptor (AR) gene of European-ancestry male participants in UK Biobank from whole-genome and whole-exome sequence data using ExpansionHunter, and tested associations with androgen-related traits and diseases. We also examined whether the associations between testosterone levels and these outcomes were affected by adjustment for the repeat lengths. RESULTS: We successfully quantified the repeat lengths from whole-genome and/or whole-exome sequence data in 181,217 males. Both repeat lengths were shown to be positively associated with circulating total testosterone level and bone mineral density, whereas CAG repeat length was negatively associated with male-pattern baldness, but their effects were relatively small and were not associated with most of the other outcomes. Circulating total testosterone level was associated with various outcomes, but this relationship was not affected by adjustment for the repeat lengths. CONCLUSION: In this large-scale study, we found that longer CAG and GGC repeats in the AR gene influence androgen resistance, elevate circulating testosterone level via a feedback loop and play a role in some androgen-targeted tissues. Generally, however, circulating testosterone level is a more important determinant of androgen action in males than repeat lengths.
ABSTRACT
A novel algorithm, AlphaMissense, has been shown to have an improved ability to predict the pathogenicity of rare missense genetic variants. However, it is not known whether AlphaMissense improves the ability of gene-based testing to identify disease-influencing genes. Using whole-exome sequencing data from the UK Biobank, we compared gene-based association analysis strategies including sets of deleterious variants: predicted loss-of-function (pLoF) variants only, pLoF plus AlphaMissense pathogenic variants, pLoF with missense variants predicted to be deleterious by any of five commonly utilized annotation methods (Missense (1/5)) or only variants predicted to be deleterious by all five methods (Missense (5/5)). We measured performance to identify 519 previously identified positive control genes, which can lead to Mendelian diseases, or are the targets of successfully developed medicines. These strategies identified 0.85 million pLoF variants and 5 million deleterious missense variants, including 22,131 likely pathogenic missense variants identified exclusively by AlphaMissense. The gene-based association tests found 608 significant gene associations (at p < 1.25 × 10⁻⁷) across 24 common traits and diseases. Compared with pLoFs plus Missense (5/5), tests using pLoFs and AlphaMissense variants found slightly more significant gene-disease and gene-trait associations, albeit with a marginally lower proportion of positive control genes. Nevertheless, their overall performance was similar. Merging AlphaMissense with Missense (5/5), whether through their intersection or union, did not yield any further enhancement in performance. In summary, employing AlphaMissense to select deleterious variants for gene-based testing did not improve the ability to identify genes that are known to influence disease.
Subjects
Genetic Predisposition to Disease , Mutation, Missense , Humans , Mutation, Missense/genetics , Genetic Predisposition to Disease/genetics , Algorithms , Exome Sequencing/methods , Genome-Wide Association Study/methods , Computational Biology/methods
ABSTRACT
OBJECTIVES: Increased iron stores have been associated with elevated risks of different infectious diseases, suggesting that iron supplementation may increase the risk of infections. However, these associations may be biased by confounding or reverse causation. This is important, since up to 19% of the population takes iron supplementation. We used Mendelian randomization (MR) to bypass these biases and estimate the causal effect of iron on infections. METHODS: As instrumental variables, we used genetic variants associated with iron biomarkers in two genome-wide association studies (GWASs) of European ancestry participants. For outcomes, we used GWAS results from the UK Biobank, FinnGen, the COVID-19 Host Genetics Initiative or 23andMe, for seven infection phenotypes: 'any infections' combined, COVID-19 hospitalization, candidiasis, pneumonia, sepsis, skin and soft tissue infection (SSTI) and urinary tract infection (UTI). RESULTS: Most of our analyses showed that increasing iron (measured by its biomarkers) was associated with only modest changes in the odds of infectious outcomes, with all 95% confidence intervals for the odds ratios within the 0.88 to 1.26 range. However, for the three predominantly bacterial infections (sepsis, SSTI, UTI), at least one analysis showed a nominally elevated risk with increased iron stores (P < 0.05). CONCLUSION: Using MR, we did not observe an increase in risk of most infectious diseases with increases in iron stores. However, for bacterial infections, higher iron stores may increase odds of infections. Hence, using genetic variation in iron pathways as a proxy for iron supplementation, iron supplements are likely safe on a population level, but we should continue the current practice of conservative iron supplementation during bacterial infections or in those at high risk of developing them.
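The abstract does not name the MR estimator that was used. As a minimal sketch, assuming two-sample summary statistics and the widely used inverse-variance weighted (IVW) fixed-effect estimator, the causal effect can be computed from per-variant effects on the exposure and on the outcome. The function name and the numbers below are hypothetical and are not results from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Each variant contributes a ratio beta_outcome / beta_exposure, weighted
    by beta_exposure**2 / se_outcome**2. Uncertainty in the exposure effects
    is ignored, which is the usual first-order approximation.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return num / den, math.sqrt(1.0 / den)


# Toy summary statistics for two hypothetical variants.
beta_iron = [0.1, 0.2]           # effect on the iron biomarker (SD units)
beta_infection = [0.02, 0.04]    # effect on infection (log odds)
se_infection = [0.01, 0.01]

beta, se = ivw_estimate(beta_iron, beta_infection, se_infection)
odds_ratio = math.exp(beta)
ci_low = math.exp(beta - 1.96 * se)
ci_high = math.exp(beta + 1.96 * se)
print(round(odds_ratio, 3), round(ci_low, 3), round(ci_high, 3))
```

Exponentiating the estimate and its interval gives an odds ratio per unit increase in the exposure, which is the scale on which the abstract reports its 0.88 to 1.26 range.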
Subjects
COVID-19 , Communicable Diseases , Sepsis , Humans , Genome-Wide Association Study , Mendelian Randomization Analysis/methods , Iron , Biomarkers , Sepsis/epidemiology , Sepsis/genetics , Communicable Diseases/epidemiology , Communicable Diseases/genetics , Polymorphism, Single Nucleotide
ABSTRACT
Metabolic processes can influence disease risk and provide therapeutic targets. By conducting genome-wide association studies of 1,091 blood metabolites and 309 metabolite ratios, we identified associations with 690 metabolites at 248 loci and associations with 143 metabolite ratios at 69 loci. Integrating metabolite-gene and gene expression information identified 94 effector genes for 109 metabolites and 48 metabolite ratios. Using Mendelian randomization (MR), we identified 22 metabolites and 20 metabolite ratios with estimated causal effects on 12 traits and diseases, including orotate for estimated bone mineral density, α-hydroxyisovalerate for body mass index and ergothioneine for inflammatory bowel disease and asthma. We further measured the orotate level in a separate cohort and demonstrated that, consistent with MR, orotate levels were positively associated with incident hip fractures. This study provides a valuable resource describing the genetic architecture of metabolites and delivers insights into their roles in common diseases, thereby offering opportunities for therapeutic targets.
Subjects
Genome-Wide Association Study , Metabolome , Humans , Metabolome/genetics , Phenotype , Bone Density/genetics , Genomics , Polymorphism, Single Nucleotide/genetics
ABSTRACT
The human leukocyte antigen (HLA) region on chromosome 6 is strongly associated with many immune-mediated and infection-related diseases. Due to its highly polymorphic nature and complex linkage disequilibrium patterns, traditional genetic association studies of single nucleotide polymorphisms do not perform well in this region. Instead, the field has adopted the assessment of the association of HLA alleles (i.e., entire HLA gene haplotypes) with disease. Often based on genotyping arrays, these association studies impute HLA alleles, decreasing accuracy and thus statistical power for rare alleles and in non-European ancestries. Here, we use whole-exome sequencing (WES) from 454,824 UK Biobank (UKB) participants to directly call HLA alleles using the HLA-HD algorithm. We show this method is more accurate than imputing HLA alleles and harness the improved statistical power to identify 360 associations for 11 autoimmune phenotypes (at least 129 likely novel), leading to better insights into the specific coding polymorphisms that underlie these diseases. We show that HLA alleles with synonymous variants, often overlooked in HLA studies, can significantly influence these phenotypes. Lastly, we show that HLA sequencing may improve the accuracy of polygenic risk scores across ancestries. These findings allow better characterization of the role of the HLA region in human disease.
Subjects
Autoimmune Diseases , Biological Specimen Banks , Humans , Alleles , Exome Sequencing , Genetic Predisposition to Disease , Autoimmune Diseases/genetics , HLA Antigens/genetics , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class II , Polymorphism, Single Nucleotide , United Kingdom
ABSTRACT
BACKGROUND: Vibrio cholerae, the causative agent of cholera, is a well-studied species, whereas Vibrio metoecus is a recently described close relative that is also associated with human infections. The availability of V. metoecus genomes provides further insight into its genetic differences from V. cholerae. Additionally, both species have been co-isolated from a cholera-free brackish coastal pond and have been suggested to interact with each other by horizontal gene transfer (HGT). RESULTS: The genomes of 17 strains from each species were sequenced. All strains share a large core genome (2675 gene families) and very few genes are unique to each species (< 3% of the pan-genome of both species). This led to the identification of potential molecular markers (for nitrite reduction, as well as peptidase and rhodanese activities) to further distinguish V. metoecus from V. cholerae. Interspecies HGT events were inferred in 21% of the core genes and 45% of the accessory genes. A directional bias in gene transfer events was found in the core genome, where V. metoecus was the recipient in three times as many transfers from V. cholerae (75%) as it was the donor (25%). CONCLUSION: V. metoecus was misclassified as an atypical variant of V. cholerae due to their resemblance in a majority of biochemical characteristics. More distinguishing phenotypic assays can be developed based on the discovery of potential gene markers to avoid any future misclassifications. Furthermore, differences in relative abundance or seasonality were observed between the species and could contribute to the bias in directionality of HGT.
ABSTRACT
The family Rhodobacteraceae consists of alphaproteobacteria that are metabolically, phenotypically, and ecologically diverse. It includes the roseobacter clade, an informal designation, representing one of the most abundant groups of marine bacteria. The rapid pace of discovery of novel roseobacters in the last three decades meant that the best practice for taxonomic classification, a polyphasic approach utilizing phenotypic, genotypic, and phylogenetic characteristics, was not always followed. Early efforts for classification relied heavily on 16S rRNA gene sequence similarity and resulted in numerous taxonomic inconsistencies, with several poly- and paraphyletic genera within this family. Next-generation sequencing technologies have allowed whole-genome sequences to be obtained for most type strains, making a revision of their taxonomy possible. In this study, we performed whole-genome phylogenetic and genotypic analyses combined with a meta-analysis of phenotypic data to review taxonomic classifications of 331 type strains (under 119 genera) within the Rhodobacteraceae family. Representatives of the roseobacter clade not only have different environmental adaptations from other Rhodobacteraceae isolates but were also found to be distinct based on genomic, phylogenetic, and in silico-predicted phenotypic data. As such, we propose to move this group of bacteria into a new family, Roseobacteraceae fam. nov. In total, reclassifications resulted in 327 species and 128 genera, suggesting that misidentification is more problematic at the genus than species level. By resolving taxonomic inconsistencies of type strains within this family, we have established a set of coherent criteria based on whole-genome analyses that will help guide future taxonomic efforts and prevent the propagation of errors.