ABSTRACT
Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
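Read N50 (54 kbp here) and mean depth of coverage (37× here) are standard summary statistics for a long-read run. Below is a minimal Python sketch of how they are conventionally computed. It assumes a plain list of read lengths in bases and a known genome size. `n50` and `mean_depth` are illustrative helper names, not part of the consortium's pipeline, and the depth estimate is a naive total-bases-over-genome-size approximation that ignores unmapped reads and uneven coverage.

```python
def n50(read_lengths):
    """Length of the read at which a longest-first running total of read
    lengths first reaches half of all sequenced bases."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length


def mean_depth(read_lengths, genome_size):
    """Naive mean depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


# Toy read lengths in bases (not data from the study).
reads = [2_000, 3_000, 4_000, 5_000, 6_000, 10_000]
print(n50(reads))                # 6000
print(mean_depth(reads, 3_000))  # 10.0
```

Sorting longest-first is what makes N50 sensitive to the long tail of the read-length distribution. This is why it is the usual headline metric for long-read libraries.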
ABSTRACT
The Pharmacogene Variation Consortium (PharmVar) provides nomenclature for the human CYP2A gene locus, which contains the highly polymorphic CYP2A6 gene. CYP2A6 plays a role in the metabolism of nicotine and various drugs. Genetic variation can therefore substantially affect the function of this enzyme and, consequently, drug efficacy and safety. This GeneFocus provides an overview of the clinical significance of CYP2A6, including its genetic variation and function. We also highlight and discuss caveats in the identification and characterization of allelic variation in this complex pharmacogene, a prerequisite for accurate genotype determination and phenotype prediction.
Subject(s)
Cytochrome P-450 CYP2A6, Humans, Cytochrome P-450 CYP2A6/genetics, Cytochrome P-450 CYP2A6/metabolism, Pharmacogenetics/methods, Genetic Variation/genetics, Phenotype, Nicotine/metabolism, Genotype, Pharmacogenomic Variants, Alleles, Genetic Polymorphism
ABSTRACT
Cytochrome P450 2D6 (CYP2D6) plays a crucial role in metabolizing approximately 20% of clinically prescribed medications. This enzyme is encoded by the CYP2D6 gene, which is highly polymorphic. Over 170 haplotypes, or star alleles, have been catalogued, and they can profoundly affect drug efficacy and safety. Despite its importance, global genomic databases predominantly represent individuals of European ancestry, which limits comprehensive knowledge of CYP2D6 variation in ethnically diverse populations. To bridge this knowledge gap, we characterized the CYP2D6 variation landscape in a multi-ethnic Asian cohort of individuals of Chinese, Malay, and Indian descent. We analyzed 1850 whole genomes from the SG10K_Health dataset using an in-house consensus algorithm that integrates Cyrius, Aldy, and StellarPGx. This analysis revealed distinct population-specific star allele distributions, highlighting the unique genetic makeup of the Singaporean population. Notably, 46% of our cohort harbored actionable CYP2D6 variants, i.e., variants with direct implications for drug dosing and treatment strategies. Furthermore, we identified 14 potentially novel CYP2D6 star alleles, 7 of which were observed in multiple individuals, suggesting broader relevance. Overall, our study contributes novel data on CYP2D6 genetic variation specific to the Southeast Asian context. The findings are instrumental for advancing pharmacogenomics and personalized medicine, not only in Southeast Asia but also in other regions with comparable genetic diversity.
Subject(s)
Alleles, Asian People, Cytochrome P-450 CYP2D6, Cytochrome P-450 CYP2D6/genetics, Cytochrome P-450 CYP2D6/metabolism, Humans, Asian People/genetics, Ethnicity/genetics, Singapore, Genetic Variation, Gene Frequency, Single Nucleotide Polymorphism, Haplotypes
ABSTRACT
Hypervariable region sequencing of the 16S ribosomal RNA (rRNA) gene plays a critical role in microbial ecology by offering insights into bacterial communities within specific niches. It provides valuable genus-level information, but its reliance on data from targeted genetic regions limits its overall utility. Recent advances in sequencing technologies have enabled characterisation of the full-length 16S rRNA gene, enhancing species-level classification. Current short-read platforms are cost-effective and precise, but they cannot sequence full-length 16S rRNA amplicons. This study had two aims. First, using a standard mock microbial community, we evaluated the feasibility of a modified 150 bp paired-end, full-length 16S rRNA amplicon short-read sequencing technique on the Illumina iSeq 100, together with a 16S rRNA amplicon assembly workflow. Second, we performed an exploratory characterisation of the respiratory microbiota of captive (zoo) and free-ranging African elephants (Loxodonta africana). The assembled amplicons averaged only 869 bp in length. Nevertheless, our findings demonstrate that this sequencing technique provides taxonomic assignments consistent with the theoretical composition of the mock community and with the respiratory microbiota of other mammals. We visually identified tentative bacterial signatures that potentially represent distinct respiratory tract compartments (trunk and lower respiratory tract). Further investigation is needed to understand their implications for elephant physiology and health.
Subject(s)
Bacteria, Elephants, Microbiota, 16S Ribosomal RNA, Animals, Elephants/microbiology, Elephants/genetics, 16S Ribosomal RNA/genetics, Bacteria/genetics, Bacteria/classification, Microbiota/genetics, High-Throughput Nucleotide Sequencing/methods, Respiratory System/microbiology, Zoo Animals/microbiology, DNA Sequence Analysis/methods, Wild Animals/microbiology, Phylogeny
ABSTRACT
Fewer than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project Oxford Nanopore Technologies (ONT) Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
ABSTRACT
Genetic variation in CYP2B6 and CYP2A6 is known to affect interindividual response to antiretrovirals, nicotine, and bupropion, among other drugs. However, the full catalogue of clinically relevant pharmacogenetic variants in these genes has yet to be established, especially across African populations. This study therefore aimed to characterize the star allele (haplotype) distribution of CYP2B6 and CYP2A6 across diverse and understudied sub-Saharan African (SSA) populations. We called star alleles from 961 high-depth whole genomes using StellarPGx, Aldy, and PyPGx. In addition, we compared CYP2B6 and CYP2A6 star allele frequencies between SSA populations and other global biogeographical groups represented in the new high-coverage 1000 Genomes Project dataset (n = 2,000). This study presents frequency information for star alleles in CYP2B6 and CYP2A6 across various African populations, as well as predicted phenotypes for CYP2B6. In CYP2B6, for example, *6 and *18 had frequencies of 21-47% and 2-19%, respectively. In CYP2A6, *4, *9, and *17 had frequencies of 0-6%, 3-10%, and 6-20%, respectively. In addition, StellarPGx computationally predicted 50 potentially novel African-ancestry star alleles across CYP2B6 and CYP2A6 combined. For each gene, over 4% of the study participants carried predicted novel star alleles. Targeted Single-Molecule Real-Time resequencing further validated three novel star alleles in each of CYP2A6 (*54, *55, and *56) and CYP2B6, as well as several suballeles. Our findings can inform the design of comprehensive pharmacogenetic testing platforms. They are also highly relevant to personalized medicine strategies, especially for antiretroviral therapy and smoking cessation treatment in Africa and the African diaspora. More broadly, this study highlights the importance of sampling diverse African ethnolinguistic groups for accurate characterization of the pharmacogene variation landscape across the continent.
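Population-level star allele frequencies such as those above come from counting the two haplotypes of each called diplotype within each population. Below is a minimal sketch. It assumes diplotype calls are available as `(population, "*A/*B")` string pairs. The native output formats of StellarPGx, Aldy, and PyPGx differ and are not assumed here. The function name, population labels, and calls are hypothetical.

```python
from collections import Counter, defaultdict


def star_allele_frequencies(calls):
    """calls: iterable of (population, diplotype) pairs, e.g. ("PopA", "*1/*6").
    Each diplotype contributes two haplotypes to its population's counts.
    Returns {population: {star_allele: frequency}}."""
    counts = defaultdict(Counter)
    for population, diplotype in calls:
        hap1, hap2 = diplotype.split("/")
        counts[population][hap1] += 1
        counts[population][hap2] += 1
    return {
        population: {allele: n / sum(c.values()) for allele, n in c.items()}
        for population, c in counts.items()
    }


# Hypothetical calls for illustration only.
calls = [("PopA", "*1/*6"), ("PopA", "*6/*6"), ("PopB", "*1/*18"), ("PopB", "*1/*1")]
freqs = star_allele_frequencies(calls)
print(freqs["PopA"])  # {'*1': 0.25, '*6': 0.75}
```

This sketch treats copy-number labels such as `*1x2` as a single haplotype label, which is a simplification.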
Subject(s)
Nicotine, Pharmacogenetics, Humans, Cytochrome P-450 CYP2B6/genetics, Cytochrome P-450 CYP2A6/genetics, Gene Frequency, Africa South of the Sahara, Genotype, Alleles
ABSTRACT
Background: CYP2C19 is important in the metabolism of clopidogrel and several antidepressants. This study aimed to characterize the distribution of CYP2C19 star alleles (haplotypes) across diverse African populations compared with global populations. Methods: CYP2C19 star alleles and diplotypes were called from high-coverage genomes using the StellarPGx pipeline. Results: CYP2C19*1 (51%), *2 (17%), and *17 (22%) were the most common star alleles across the African populations in this study. Potentially novel CYP2C19 haplotypes were observed in 3% of African participants. Conclusion: This study supports the need for CYP2C19 pharmacogenetic testing in African and global clinical settings, as well as the importance of comprehensive star allele characterization in the African context.
Subject(s)
Pharmacogenetics, Platelet Aggregation Inhibitors, Humans, Genotype, Cytochrome P-450 CYP2C19/genetics, Haplotypes/genetics, Clopidogrel/therapeutic use, Alleles
ABSTRACT
Background: Genetic variation in cytochrome P450 (CYP) genes strongly influences drug response. However, many CYP star alleles (haplotypes) lack functional annotation, impeding our understanding of drug metabolism mechanisms. We aimed to investigate the impact of combinations of missense variants on CYP protein structures. Methods: Normal mode analysis was conducted on 261 missense variants within 91 CYP haplotypes. CYP2D6*2 and CYP2D6*17 were prioritized for molecular dynamics simulation. Results: Normal mode analysis and molecular dynamics highlighted the effects of known CYP missense variants on protein stability and conformational dynamics. Missense variants within the same haplotype may modulate one another's effects on protein structure and function. Conclusion: This study highlights the utility of multiscale modeling for interpreting CYP missense variants, particularly their combinations within star alleles.
Subject(s)
Cytochrome P-450 CYP2D6, Cytochrome P-450 Enzyme System, Humans, Cytochrome P-450 CYP2D6/genetics, Cytochrome P-450 CYP2D6/metabolism, Alleles, Cytochrome P-450 Enzyme System/genetics, Haplotypes/genetics, Missense Mutation/genetics
ABSTRACT
Cytochrome P450 2D6 (CYP2D6) is a key enzyme in drug response owing to its involvement in the metabolism of ~25% of clinically prescribed medications. The encoding CYP2D6 gene is highly polymorphic, and many pharmacogenetic studies worldwide have investigated the distribution of CYP2D6 star alleles (haplotypes); however, African populations have been relatively understudied to date. In this study, we used bioinformatics analysis of 961 high-depth whole genome sequences to examine the distributions of CYP2D6 star alleles and of predicted drug metabolizer phenotypes (derived from activity scores) across multiple sub-Saharan African populations. We then characterized novel star alleles and suballeles in a subset of the participants via targeted high-fidelity Single-Molecule Real-Time resequencing (Pacific Biosciences). This study revealed varying frequencies of known CYP2D6 alleles and predicted phenotypes across different African ethnolinguistic groups. Twenty-seven novel CYP2D6 star alleles were predicted computationally, and two of them were further validated. This study highlights the importance of studying variation in key pharmacogenes such as CYP2D6 in the African context to better understand population-specific allele frequencies. This will aid in the development of better genotyping panels and star allele detection approaches, with a view toward supporting effective implementation of precision medicine strategies in Africa and across the African diaspora.
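The activity-score approach assigns each star allele a numeric activity value, sums the two values of a diplotype, and maps the total to a metabolizer phenotype. Below is a minimal sketch. It assumes the CPIC/DPWG CYP2D6 genotype-to-phenotype conventions (2019 consensus) for the score cut-offs and uses only an illustrative subset of allele activity values. Verify both against the current CPIC tables before any real use. The function names and the `xN` duplication handling (activity multiplied by copy number) are choices made for this sketch.

```python
# Illustrative subset of CPIC CYP2D6 allele activity values (assumption:
# check the current CPIC allele functionality table before reuse).
ALLELE_ACTIVITY = {
    "*1": 1.0, "*2": 1.0,                           # normal function
    "*9": 0.5, "*17": 0.5, "*29": 0.5, "*41": 0.5,  # decreased function
    "*10": 0.25,                                    # decreased function
    "*3": 0.0, "*4": 0.0, "*5": 0.0, "*6": 0.0,     # no function
}


def allele_activity(allele):
    """Activity of one haplotype; '*2x2' is treated as two copies of *2."""
    base, _, copies = allele.partition("x")
    return ALLELE_ACTIVITY[base] * (int(copies) if copies else 1)


def activity_score(diplotype):
    """Sum of the activity values of the two haplotypes in '*A/*B'."""
    first, second = diplotype.split("/")
    return allele_activity(first) + allele_activity(second)


def predicted_phenotype(score):
    """CPIC/DPWG 2019 consensus cut-offs for CYP2D6 activity scores."""
    if score == 0:
        return "poor metabolizer"
    if score < 1.25:
        return "intermediate metabolizer"
    if score <= 2.25:
        return "normal metabolizer"
    return "ultrarapid metabolizer"


for d in ["*1/*2", "*1/*4", "*4/*5", "*1/*2x2"]:
    s = activity_score(d)
    print(d, s, predicted_phenotype(s))
```

Alleles missing from the table, such as uncharacterized novel star alleles, raise a `KeyError` in this sketch rather than being assigned a guessed value.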
Subject(s)
Cytochrome P-450 CYP2D6, Pharmacogenetics, Humans, Cytochrome P-450 CYP2D6/genetics, Cytochrome P-450 CYP2D6/metabolism, Gene Frequency, Haplotypes, Phenotype, Alleles, Africa South of the Sahara, Genotype
ABSTRACT
The CYP2D6 gene has been widely studied to characterize variants and star alleles, which account for a significant portion of the variability in drug response observed within and between populations. However, African populations remain under-represented in these studies. The increasing availability of high-coverage genomes from African populations has provided an opportunity to fill this knowledge gap. In this study, we characterized computationally predicted novel CYP2D6 star alleles in 30 African subjects for whom DNA samples were available from the Coriell Institute. Three laboratories collaborated to perform CYP2D6 genotyping and resequencing using a variety of commercially available and laboratory-developed tests. Fourteen novel CYP2D6 alleles and multiple novel suballeles were identified. This work adds to the growing catalogue of validated African-ancestry CYP2D6 allelic variation in pharmacogenomic databases. It lays the foundation for future functional studies, for more accurate CYP2D6 genotyping and phenotype prediction, and for refining clinical pharmacogenomic implementation guidelines in African and global settings.
ABSTRACT
Background & aim: POR (cytochrome P450 oxidoreductase) is an enzyme that mediates electron transfer to enable the drug-metabolizing activity of CYP450 proteins. However, despite this vital role, POR has been understudied in pharmacogenomics. This study aimed to characterize genetic variation in POR across African populations and to compare its star allele (haplotype) distribution with that of other global populations. Materials & methods: POR star alleles were called from whole-genome sequencing data using the StellarPGx pipeline. Results: In addition to the common POR*1 and *28 (defined by rs1057868), five novel rare haplotypes were computationally inferred. No significant frequency differences were observed among the majority of African populations. However, POR*28 was observed at a higher frequency in individuals of non-African ancestry. Conclusion: This study describes the distribution of POR alleles in Africa and across global populations, with a view toward informing future precision medicine implementation.
Subject(s)
Black People, Pharmacogenetics, Alleles, Black People/genetics, Gene Frequency/genetics, Haplotypes/genetics, Humans, Single Nucleotide Polymorphism/genetics
ABSTRACT
MOTIVATION: Wikipedia is one of the most important channels for the public communication of science and is frequently accessed as an educational resource in computational biology. Joint efforts between the International Society for Computational Biology (ISCB) and the Computational Biology taskforce of WikiProject Molecular Biology (a group of expert Wikipedia editors) have considerably improved the representation of computational biology on Wikipedia in recent years. However, further improvement in quality is still urgently needed, especially compared with related scientific fields such as genetics and medicine. Facilitating the involvement of members of ISCB Communities of Special Interest (COSIs) would improve a vital open educational resource in computational biology. It would also allow COSIs to provide a high-quality educational resource specific to their subfield. RESULTS: We generated a list of around 1500 English Wikipedia articles relating to computational biology. We describe the development of a binary COSI-Article matrix that links COSIs to relevant articles, thereby defining domain-specific open educational resources. Our analysis of the COSI-Article matrix provides a quantitative assessment of computational biology representation on Wikipedia, both relative to other fields and at the level of individual COSIs. Furthermore, we conducted similarity analysis and subsequent clustering of the COSI-Article data to provide insight into potential relationships between COSIs. Finally, based on our analysis, we suggest courses of action to improve the quality of computational biology representation on Wikipedia.
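The abstract does not state which similarity measure or clustering method was applied to the COSI-Article matrix. As one plausible illustration only, a binary matrix lends itself to Jaccard similarity between COSI rows, followed by single-linkage grouping at a similarity cut-off. The COSI names, article columns, and threshold below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def pairwise_similarity(matrix):
    """matrix: {cosi: [0/1 per article]} -> {(cosi1, cosi2): similarity}."""
    return {(p, q): jaccard(matrix[p], matrix[q])
            for p, q in combinations(sorted(matrix), 2)}


def threshold_clusters(matrix, threshold):
    """Single-linkage grouping: COSIs connected by any pair with
    similarity >= threshold end up in the same cluster (union-find)."""
    parent = {name: name for name in matrix}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (p, q), score in pairwise_similarity(matrix).items():
        if score >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for name in matrix:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical COSI-Article matrix: rows are COSIs, columns are articles.
matrix = {
    "COSI_A": [1, 1, 0, 0, 1],
    "COSI_B": [1, 1, 0, 0, 0],
    "COSI_C": [0, 0, 1, 1, 0],
}
print(pairwise_similarity(matrix))
print(threshold_clusters(matrix, 0.5))  # [['COSI_A', 'COSI_B'], ['COSI_C']]
```

Jaccard similarity ignores articles that neither COSI is linked to, which suits a sparse matrix where most COSI-Article cells are zero.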
Subject(s)
Computational Biology, Cluster Analysis
ABSTRACT
Introduction: Investigating variation in genes involved in the absorption, distribution, metabolism, and excretion (ADME) of drugs is key to characterizing pharmacogenomic (PGx) relationships. ADME gene variation is relatively well characterized in European and Asian populations, but African populations remain understudied, which has implications for drug safety and effective use in Africa. Results: We identified significant ADME gene variation in African populations using 458 high-coverage whole genome sequences, 412 of which are novel, together with previously available African sequences from the 1000 Genomes Project. ADME variation was not uniform across African populations, particularly for high-impact coding variation. Copy number variation was detected in 116 ADME genes, with duplications and deletions in equal proportions. We identified 930 potential high-impact coding variants, most of which are restricted to a single African population cluster. Common high-impact variants showed large frequency differences (i.e., >10%) between clusters. Several novel variants are predicted to have a significant impact on protein structure, but additional functional work is needed to confirm these effects before they can be used in PGx. Most variants with known clinical outcomes are rare in Africa compared with European populations, potentially reflecting a bias of clinical PGx research toward European populations. Discussion: The genetic diversity of ADME genes across sub-Saharan African populations is large. The Southern African population cluster is most distinct from that of far West Africa. PGx strategies based on European variants will be of limited use in African populations. Although established variants are important, PGx must take into account the full range of African variation.
This work calls for further characterization of variants in African populations, including in vitro and in silico studies. It also calls for the unique African ADME landscape to be considered when developing precision medicine guidelines and tools for African populations.
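The >10% frequency-difference criterion can be expressed as a pairwise comparison of allele frequencies between population clusters. Below is a minimal sketch. It assumes per-cluster allele frequencies have already been computed. The variant identifiers, cluster labels, and frequencies are hypothetical, and the function name is illustrative.

```python
from itertools import combinations


def differentiated_variants(freqs, min_diff=0.10):
    """freqs: {variant_id: {cluster: allele_frequency}}.
    Returns (variant_id, cluster_1, cluster_2, |difference|) for each pair
    of clusters whose frequencies differ by more than min_diff."""
    hits = []
    for variant, by_cluster in freqs.items():
        for c1, c2 in combinations(sorted(by_cluster), 2):
            diff = abs(by_cluster[c1] - by_cluster[c2])
            if diff > min_diff:
                hits.append((variant, c1, c2, round(diff, 4)))
    return hits


# Hypothetical frequencies for illustration only.
freqs = {
    "variant_1": {"cluster_1": 0.30, "cluster_2": 0.12, "cluster_3": 0.25},
    "variant_2": {"cluster_1": 0.05, "cluster_2": 0.08, "cluster_3": 0.04},
}
for hit in differentiated_variants(freqs):
    print(hit)
```

The strict `>` comparison mirrors the ">10%" wording. Differences are expressed as absolute allele-frequency units (0.10 = 10 percentage points).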
ABSTRACT
Bioinformatics pipelines for calling star alleles (haplotypes) in cytochrome P450 (CYP) genes are important for the implementation of precision medicine. Genotyping CYP genes from high-throughput sequencing data is complicated by their high degree of polymorphism and by structural variation, especially in CYP2D6, CYP2A6, and CYP2B6. Genome graph-based variant detection approaches have been shown to be reliable for genotyping HLA alleles. However, their application to enhancing star allele calling in CYP genes has not been extensively explored. We present StellarPGx, a Nextflow pipeline for accurately genotyping CYP genes. It combines genome graph-based variant detection, read coverage information from the original reference-based alignments, and combinatorial diplotype assignment. Implementing StellarPGx in Nextflow facilitates its portability, reproducibility, and scalability on various user platforms. StellarPGx is currently able to genotype 12 important pharmacogenes belonging to the CYP1, CYP2, and CYP3 families. For validation, we use CYP2D6 as a model gene owing to its high degree of polymorphism (over 130 star alleles defined to date, including complex structural variants) and its clinical importance. We applied StellarPGx and three existing callers to 109 whole genome sequenced samples for which the Genetic Testing Reference Material Coordination Program (GeT-RM) has recently provided consensus truth CYP2D6 diplotypes. StellarPGx had the highest CYP2D6 diplotype concordance with GeT-RM (99%), compared with Cyrius (98%), Aldy (82%), and Stargazer (84%). This demonstrates the high accuracy of StellarPGx and highlights its value for both research and clinical pharmacogenomics applications. The StellarPGx pipeline is open-source and available from https://github.com/SBIMB/StellarPGx.
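The combinatorial diplotype assignment step can be illustrated, in greatly simplified form, as a search over all pairs of star-allele definitions for those whose combined core variants exactly reproduce the observed genotype. This is a sketch of the general technique, not StellarPGx code. It ignores structural variants, read coverage, and phasing, and the allele definitions and variant names are hypothetical placeholders.

```python
from itertools import combinations_with_replacement

# Hypothetical star-allele definitions (allele -> core variants). "*1" is the
# reference haplotype; "*A" to "*D" and "v1" to "v3" are placeholders.
DEFINITIONS = {
    "*1": set(),
    "*A": {"v1"},
    "*B": {"v1", "v2"},
    "*C": {"v3"},
    "*D": {"v2"},
}


def candidate_diplotypes(observed, definitions=DEFINITIONS):
    """observed: {variant: allele_count}, count 1 = heterozygous, 2 = homozygous.
    Returns every unordered allele pair whose combined core variants
    reproduce the observed counts exactly (variants absent from
    `observed` must be absent from both alleles)."""
    variants = set(observed).union(*definitions.values())
    matches = []
    for a, b in combinations_with_replacement(sorted(definitions), 2):
        if all((v in definitions[a]) + (v in definitions[b]) == observed.get(v, 0)
               for v in variants):
            matches.append(f"{a}/{b}")
    return matches


print(candidate_diplotypes({"v1": 2, "v2": 1}))  # ['*A/*B']
print(candidate_diplotypes({"v1": 1, "v2": 1}))  # ['*1/*B', '*A/*D']
```

The second call returns two equally consistent diplotypes. Unphased heterozygous variants can produce this kind of ambiguity, so a real caller needs additional evidence or an explicit policy for reporting it.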
Subject(s)
Cytochrome P-450 Enzyme System/genetics, Haplotypes/genetics, Alleles, Computational Biology/methods, Genotype, High-Throughput Nucleotide Sequencing/methods, Humans, Pharmacogenetics/methods, Genetic Polymorphism/genetics, Reproducibility of Results, DNA Sequence Analysis/methods, Whole Genome Sequencing/methods
ABSTRACT
Genetic variation in genes encoding cytochrome P450 enzymes has important clinical implications for drug metabolism. Bioinformatics algorithms have recently been developed for genotyping these highly polymorphic genes from high-throughput sequence data and for automating phenotype prediction. The CYP2D6 gene is often used as a model when validating these algorithms because of its clinical importance, high degree of polymorphism, and structural variation. However, validation is often limited to common star alleles owing to the scarcity of reference datasets. In addition, there has been no comprehensive benchmark of these algorithms to date. We performed a systematic comparison of three star allele calling algorithms using 4618 simulations as well as 75 whole-genome sequence samples from the GeT-RM project. Overall, we found that Aldy and Astrolabe are better suited to calling both common and rare diplotypes than Stargazer, whose performance is affected by population structure. Aldy was the best-performing algorithm for calling CYP2D6 structural variants, followed by Stargazer, whereas Astrolabe had limitations, especially in calling hybrid rearrangements. We found that ensemble genotyping, i.e., taking a consensus of the genotypes called by all three algorithms, achieves higher haplotype concordance but is prone to ambiguity whenever the tools disagree completely. Further, we evaluated the effects of sequencing coverage and indel misalignment on genotyping accuracy. Our account of the strengths and limitations of these algorithms is highly relevant to clinicians and researchers in the pharmacogenomics and precision medicine communities who wish to haplotype CYP2D6 and other pharmacogenes using high-throughput sequencing data.
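Ensemble genotyping by consensus can be sketched as a majority vote over the diplotypes reported by each tool. Below is a minimal sketch. It assumes each tool's output has been reduced to a `"*A/*B"` string (or `None` if the tool made no call) and that a strict majority is required. The abstract does not give the benchmark's exact consensus rule, so this is an assumption. `shared_haplotypes` illustrates a per-haplotype concordance measure against a truth diplotype. The tool names are real, but the calls are made up.

```python
from collections import Counter


def normalize(diplotype):
    """Order-insensitive form so that '*4/*1' and '*1/*4' compare equal."""
    return "/".join(sorted(diplotype.split("/")))


def consensus(calls):
    """calls: {tool: diplotype or None}. Returns the diplotype called by a
    strict majority of the tools that produced a call, else None (ambiguous)."""
    votes = Counter(normalize(d) for d in calls.values() if d)
    if not votes:
        return None
    diplotype, count = votes.most_common(1)[0]
    return diplotype if 2 * count > sum(votes.values()) else None


def shared_haplotypes(call, truth):
    """Number of haplotypes (0, 1, or 2) shared by two diplotypes."""
    overlap = Counter(call.split("/")) & Counter(truth.split("/"))
    return sum(overlap.values())


# Hypothetical calls for two samples.
print(consensus({"Aldy": "*1/*4", "Astrolabe": "*4/*1", "Stargazer": "*1/*41"}))
print(consensus({"Aldy": "*1/*2", "Astrolabe": "*1/*4", "Stargazer": "*2/*41"}))
```

When all three tools disagree, no strict majority exists and `consensus` returns `None`, which corresponds to the ambiguity described above. The normalization sorts allele labels as strings, so it serves only for comparison, not for display order.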