ABSTRACT
Identification of genes encoding β-lactamases (BLs) from short-read sequences remains challenging due to the high frequency of shared amino acid functional domains and motifs in proteins encoded by BL genes and related non-BL gene sequences. Divergent BL homologs can frequently be missed during similarity searches, which has important practical consequences for monitoring antibiotic resistance. To address this limitation, we built ROCker models that targeted broad classes (e.g., class A, B, C, and D) and individual families (e.g., TEM) of BLs and challenged them with mock 150-bp- and 250-bp-read data sets of known composition. ROCker identifies the most discriminant bit score thresholds in sliding windows along the target protein sequence and hence can account for nondiscriminative domains shared by unrelated proteins. BL ROCker models showed a 0% false-positive rate (FPR), a 0% to 4% false-negative rate (FNR), and an up-to-50-fold-higher F1 score [2 × precision × recall/(precision + recall)] compared to alternative methods, such as similarity searches using BLASTx with various e-value thresholds and BL hidden Markov models, or tools like DeepARG, ShortBRED, and AMRFinder. The ROCker models and the underlying protein sequence reference data sets and phylogenetic trees for read placement are freely available through http://enve-omics.ce.gatech.edu/data/rocker-bla. Application of these BL ROCker models to metagenomics, metatranscriptomics, and high-throughput PCR gene amplicon data should facilitate the reliable detection and quantification of BL variants encoded by environmental or clinical isolates and microbiomes and more accurate assessment of the associated public health risk, compared to the current practice.
IMPORTANCE Resistance genes encoding β-lactamases (BLs) confer resistance to the widely prescribed antibiotic class β-lactams. Therefore, it is important to assess the prevalence of BL genes in clinical or environmental samples for monitoring the spread of these genes into pathogens and estimating public health risk. However, detecting BLs in short-read sequence data is technically challenging. Our ROCker model-based bioinformatics approach showcases the reliable detection and typing of BLs in complex data sets and thus contributes toward solving an important problem in antibiotic resistance surveillance. The ROCker models developed substantially expand the toolbox for monitoring antibiotic resistance in clinical or environmental settings.
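The F1 score quoted in the abstract above, 2 × precision × recall/(precision + recall), can be illustrated with a minimal Python sketch. The function names and the read counts below are hypothetical and chosen only for illustration; they are not taken from the study or from the ROCker software.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts; empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical mock data set: 100 true BL reads, of which 96 are detected
# (4 missed, i.e. a 4% false-negative rate) and no non-BL read is called.
p, r = precision_recall(tp=96, fp=0, fn=4)
print(f"precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")
```

Because F1 is the harmonic mean of precision and recall, a method that loses either one is penalized more heavily than it would be by a simple average.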
Subjects
Anti-Bacterial Agents, beta-Lactamases, Humans, beta-Lactamases/genetics, Phylogeny, Anti-Bacterial Agents/pharmacology, beta-Lactams, Microbial Drug Resistance
ABSTRACT
Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled with an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This has necessitated a classification scheme detailing Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatic tools are means for major actionable results: maintaining quality assurance and checks, defining population structure, performing genomic epidemiology, and inferring lineage to allow reliable and actionable identification and classification. Additionally, the pandemic has required public health laboratories to reach high throughput proficiency in sequencing library preparation and downstream data analysis rapidly. However, both processes can be limited by a lack of a standardized sequence dataset. Methods: We identified six SARS-CoV-2 sequence datasets from recent publications, public databases and internal resources. In addition, we created a method to mine public databases to identify representative genomes for these datasets. Using this novel method, we identified several genomes as either VOI/VOC representatives or non-VOI/VOC representatives. To describe each dataset, we utilized a previously published datasets format, which describes accession information and whole dataset information. Additionally, a script from the same publication has been enhanced to download and verify all data from this study. Results: The benchmark datasets focus on the two most widely used sequencing platforms: long read sequencing data from the Oxford Nanopore Technologies platform and short read sequencing data from the Illumina platform. 
There are six datasets: three were derived from recent publications; two were derived from data mining public databases to answer common questions not covered by published datasets; one unique dataset representing common sequence failures was obtained by rigorously scrutinizing data that did not pass quality checks. The dataset summary table, data mining script and quality control (QC) values for all sequence data are publicly available on GitHub: https://github.com/CDCgov/datasets-sars-cov-2. Discussion: The datasets presented here were generated to help public health laboratories build sequencing and bioinformatics capacity, benchmark different workflows and pipelines, and calibrate QC thresholds to ensure sequencing quality. Together, improvements in these areas support accurate and timely outbreak investigation and surveillance, providing actionable data for pandemic management. Furthermore, these publicly available and standardized benchmark data will facilitate the development and adjudication of new pipelines.
Subjects
COVID-19, SARS-CoV-2, Humans, SARS-CoV-2/genetics, COVID-19/epidemiology, Benchmarking, Computational Biology, Sequence Analysis
ABSTRACT
Genome-wide association studies have uncovered thousands of genetic variants that are associated with a wide variety of human traits. Knowledge of how trait-associated variants are distributed within and between populations can provide insight into the genetic basis of group-specific phenotypic differences, particularly for health-related traits. We analyzed the genetic divergence levels for 1) individual trait-associated variants and 2) collections of variants that function together to encode polygenic traits, between two neighboring populations in Colombia that have distinct demographic profiles: Antioquia (Mestizo) and Chocó (Afro-Colombian). Genetic ancestry analysis showed 62% European, 32% Native American, and 6% African ancestry for Antioquia compared with 76% African, 10% European, and 14% Native American ancestry for Chocó, consistent with demography and previous results. Ancestry differences can confound cross-population comparison of polygenic risk scores (PRS); however, we did not find any systematic bias in PRS distributions for the two populations studied here, and population-specific differences in PRS were, for the most part, small and symmetrically distributed around zero. Both genetic differentiation at individual trait-associated single nucleotide polymorphisms and population-specific PRS differences between Antioquia and Chocó largely reflected anthropometric phenotypic differences that can be readily observed between the populations along with reported disease prevalence differences. Cases where population-specific differences in genetic risk did not align with observed trait (disease) prevalence point to the importance of environmental contributions to phenotypic variance, for both infectious and complex, common disease. The results reported here are distributed via a web-based platform for searching trait-associated variants and PRS divergence levels at http://map.chocogen.com (last accessed August 12, 2020).
Subjects
Genetic Predisposition to Disease, Human Genome, Multifactorial Inheritance, Phenotype, Racial Groups/genetics, Colombia, Humans
ABSTRACT
Differences in genetic ancestry and socioeconomic status (SES) among Latin American populations have been linked to health disparities for a number of complex diseases, such as diabetes. We used a population genomic approach to investigate the role that genetic ancestry and SES play in the epidemiology of type 2 diabetes (T2D) for two Colombian populations: Chocó (Afro-Latino) and Antioquia (Mestizo). Chocó has significantly higher predicted genetic risk for T2D compared to Antioquia, and the elevated predicted risk for T2D in Chocó is correlated with higher African ancestry. Despite its elevated predicted genetic risk, the population of Chocó has a threefold lower observed T2D prevalence than Antioquia, indicating that environmental factors better explain differences in T2D outcomes for Colombia. Chocó has substantially lower SES than Antioquia, suggesting that low SES in Chocó serves as a protective factor against T2D. The combination of lower prevalence of T2D and lower SES in Chocó may seem surprising given the protective nature of elevated SES in many populations in developed countries. However, low SES has also been documented to be a protective factor in rural populations in less developed countries, and this appears to be the case when comparing Chocó to Antioquia.
Subjects
Type 2 Diabetes Mellitus/genetics, Genetic Predisposition to Disease, Colombia, Type 2 Diabetes Mellitus/epidemiology, Humans, Pedigree, Prevalence, Socioeconomic Factors
ABSTRACT
Candidate gene and genome-wide association studies (GWAS) represent two complementary approaches to uncovering genetic contributions to common diseases. We systematically reviewed the contributions of these approaches to our knowledge of genetic associations with cancer risk by analyzing the data in the Cancer Genome-wide Association and Meta Analyses database (Cancer GAMAdb). The database catalogs studies published since January 1, 2000, by study and cancer type. In all, we found that meta-analyses and pooled analyses of candidate genes reported 349 statistically significant associations and GWAS reported 269, for a total of 577 unique associations. Only 41 (7.1%) associations were reported in both candidate gene meta-analyses and GWAS, usually with similar effect sizes. When considering only noteworthy associations (defined as those with false-positive report probabilities ≤ 0.2) and accounting for indirect overlap, we found 202 associations, with 27 of those appearing in both meta-analyses and GWAS. Our findings suggest that meta-analyses of well-conducted candidate gene studies may continue to add to our understanding of the genetic associations in the post-GWAS era.
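The counts in the abstract above follow from inclusion-exclusion: associations reported by both approaches are counted once. A minimal Python sketch of that arithmetic, using only the figures stated in the abstract (the function name is a hypothetical helper, not part of Cancer GAMAdb):

```python
def unique_total(count_a: int, count_b: int, overlap: int) -> int:
    """Size of the union of two sets given their sizes and their overlap."""
    return count_a + count_b - overlap


candidate_gene = 349   # associations from candidate gene meta-/pooled analyses
gwas = 269             # associations from GWAS
both = 41              # associations reported by both approaches

total = unique_total(candidate_gene, gwas, both)
share_both = 100 * both / total

print(total)                  # unique associations
print(round(share_both, 1))   # percentage reported by both approaches
```

The result reproduces the 577 unique associations and the 7.1% overlap reported in the abstract.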
Subjects
Genome-Wide Association Study, Neoplasms/genetics, Case-Control Studies, Humans
ABSTRACT
Pathogen genetics is already a mainstay of public health investigation and control efforts; now advances in technology make it possible to investigate the role of human genetic variation in the epidemiology of infectious diseases. To describe trends in this field, we analyzed articles that were published from 2001 through 2010 and indexed by the HuGE Navigator, a curated online database of PubMed abstracts in human genome epidemiology. We extracted the principal findings from all meta-analyses and genome-wide association studies (GWAS) with an infectious disease-related outcome. Finally, we compared the representation of diseases in HuGE Navigator with their contributions to morbidity worldwide. We identified 3,730 articles on infectious diseases, including 27 meta-analyses and 23 GWAS. The number published each year increased from 148 in 2001 to 543 in 2010 but remained a small fraction (about 7%) of all studies in human genome epidemiology. Most articles were by authors from developed countries, but the percentage by authors from resource-limited countries increased from 9% to 25% during the period studied. The most commonly studied diseases were HIV/AIDS, tuberculosis, hepatitis B infection, hepatitis C infection, sepsis, and malaria. As genomic research methods become more affordable and accessible, population-based research on infectious diseases will be able to examine the role of variation in human as well as pathogen genomes. This approach offers new opportunities for understanding infectious disease susceptibility, severity, treatment, control, and prevention.
Subjects
Communicable Diseases/epidemiology, Communicable Diseases/genetics, Genetic Variation, Epidemiologic Methods, Genetic Predisposition to Disease, Human Genome, Genome-Wide Association Study, Humans, Meta-Analysis as Topic, PubMed
ABSTRACT
BACKGROUND: The apolipoprotein E gene (apoE) has three major isoforms encoded by the ε2, ε3, and ε4 alleles, with the ε4 allele associated with hypercholesterolemia and the ε2 allele with the opposite effect. An inverse relationship between cholesterolemia and head and neck cancer (HNC) has been previously reported, although the relationship between apoE genotypes and HNC has not been explored to date. METHODS: Four hundred and seventeen HNC cases and 436 hospital controls were genotyped for apoE polymorphisms. Adjusted odds ratios (ORs) and 95% confidence intervals (CI) from logistic regression were used to explore the relationship between HNC and putative risk factors. A gene-environment interaction analysis was done. RESULTS: A borderline significant 40% decreased HNC risk (OR, 0.58; 95% CI, 0.31-1.05) was observed for individuals carrying at least one ε2 allele. Females carrying at least one ε2 allele showed a 60% risk reduction (OR, 0.43; 95% CI, 0.21-0.90) for HNC compared with ε3 homozygotes. A statistically significant interaction was found between alcohol use and the ε4 allele (P for interaction = 0.04), with a 2-fold increased risk (OR, 2.06; 95% CI, 0.95-4.48) among ever drinkers with an ε4 allele, with respect to ε3 homozygote nondrinkers. CONCLUSIONS: Our study provides novel evidence of a possible protective effect of the ε2 allele against HNC, probably due to its increased antioxidant properties. IMPACT: According to our results, apolipoprotein E may play a different role in carcinogenesis other than its well-known role in regulating blood serum cholesterol levels.