ABSTRACT
The contribution of FOXP3-expressing naturally occurring regulatory T (Treg) cells to common polygenic autoimmune diseases remains ambiguous. Here, we characterized genome-wide epigenetic profiles (CpG methylation and histone modifications) of human Treg and conventional T (Tconv) cells in naive and activated states. We found that single-nucleotide polymorphisms (SNPs) associated with common autoimmune diseases were predominantly enriched in CpG demethylated regions (DRs) specifically present in naive Treg cells but much less enriched in activation-induced DRs common in Tconv and Treg cells. Naive Treg cell-specific DRs were largely included in Treg cell-specific super-enhancers and closely associated with transcription and other epigenetic changes in naive and effector Treg cells. Thus, naive Treg cell-specific CpG hypomethylation had a key role in controlling Treg cell-specific gene transcription and epigenetic modification. The results suggest possible contribution of altered function or development of natural Treg cells to the susceptibility to common autoimmune diseases.
Subject(s)
Autoimmune Diseases/genetics , Autoimmune Diseases/immunology , Epigenesis, Genetic , Epigenomics , Genetic Predisposition to Disease , T-Lymphocytes, Regulatory/immunology , T-Lymphocytes, Regulatory/metabolism , Biomarkers , Cell Differentiation/genetics , Cell Differentiation/immunology , Computational Biology , CpG Islands , DNA Methylation , Epigenomics/methods , Gene Expression Profiling , Genetic Variation , Humans , Immunophenotyping , Polymorphism, Single Nucleotide , T-Lymphocyte Subsets , T-Lymphocytes, Regulatory/cytology , Transcriptome
ABSTRACT
Mast cells (MCs) are versatile immune cells capable of rapidly responding to a diverse range of extracellular cues. Here, we mapped the genomic and transcriptomic changes in human MCs upon diverse stimuli. Our analyses revealed broad H3K4me3 domains and enhancers associated with activation. Notably, the rise of intracellular calcium concentration upon immunoglobulin E (IgE)-mediated crosslinking of the high-affinity IgE receptor (FcεRI) resulted in genome-wide reorganization of the chromatin landscape and was associated with a specific chromatin signature, which we term Ca2+-dependent open chromatin (COC) domains. Examination of differentially expressed genes revealed potential effectors of MC function, and we provide evidence for fibrinogen-like protein 2 (FGL2) as an MC mediator with potential relevance in chronic spontaneous urticaria. Disease-associated single-nucleotide polymorphisms mapped onto cis-regulatory regions of human MCs suggest that MC function may impact a broad range of pathologies. The datasets presented here constitute a resource for the further study of MC function.
Subject(s)
Chromatin/genetics , Disease Susceptibility , Genome-Wide Association Study , Genomics , Mast Cells/immunology , Mast Cells/metabolism , Biomarkers , Cells, Cultured , Chromatin/metabolism , Chromatin Assembly and Disassembly , Fibrinogen/genetics , Fibrinogen/metabolism , Gene Expression Profiling , Genomics/methods , Histones/metabolism , Humans , Hypersensitivity/etiology , Hypersensitivity/metabolism , Immunoglobulin E/immunology , Inflammation/etiology , Inflammation/metabolism , Polymorphism, Single Nucleotide
ABSTRACT
Mycobacterium tuberculosis is considered by many to be the deadliest microbe, with the estimated annual cases numbering more than 10 million. The bacteria, including Mycobacterium africanum, are classified into nine major lineages and hundreds of sublineages, each with different geographical distributions and levels of virulence. The phylogeographic patterns can be a result of recent and early human migrations as well as coevolution between the bacteria and various human populations, which may explain why many studies on human genetic factors contributing to tuberculosis have not been replicable in different areas. Moreover, several studies have revealed the significance of interactions between human genetic variations and bacterial genotypes in determining the development of tuberculosis, suggesting coadaptation. The increased availability of whole-genome sequence data from both humans and bacteria has enabled a better understanding of these interactions, which can inform the development of vaccines and other control measures.
Subject(s)
Genome, Bacterial , Mycobacterium tuberculosis , Tuberculosis , Humans , Mycobacterium tuberculosis/genetics , Mycobacterium tuberculosis/pathogenicity , Tuberculosis/microbiology , Tuberculosis/genetics , Genomics , Genetic Variation , Phylogeny , Genotype , Evolution, Molecular , Phylogeography , Host-Pathogen Interactions/genetics
ABSTRACT
Allele-specific expression plays a crucial role in unraveling various biological mechanisms, including genomic imprinting and gene expression controlled by cis-regulatory variants. However, existing methods for quantification from RNA-sequencing (RNA-seq) reads do not adequately and efficiently remove various allele-specific read mapping biases, such as reference bias arising from reads containing the alternative allele that do not map to the reference transcriptome or ambiguous mapping bias caused by reads containing the reference allele that map differently from reads containing the alternative allele. We present Ornaments, a computational tool for rapid and accurate estimation of allele-specific transcript expression at unphased heterozygous loci from RNA-seq reads while correcting for allele-specific read mapping biases. Ornaments removes reference bias by mapping reads to a personalized transcriptome and ambiguous mapping bias by probabilistically assigning reads to multiple transcripts and variant loci they map to. Ornaments is a lightweight extension of kallisto, a popular tool for fast RNA-seq quantification, that improves the efficiency and accuracy of WASP, a popular tool for bias correction in allele-specific read mapping. In experiments with simulated and human lymphoblastoid cell-line RNA-seq reads with the genomes of the 1000 Genomes Project, we demonstrate that Ornaments improves the accuracy of WASP and kallisto, is nearly as efficient as kallisto, and is an order of magnitude faster than WASP per sample, with the additional cost of constructing a personalized index for multiple samples. Additionally, we show that Ornaments finds imprinted transcripts with higher sensitivity than WASP, which detects imprinted signals only at gene level.
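To make the notion of allele-specific expression concrete at a single heterozygous site, the sketch below applies an exact two-sided binomial test to reference and alternative read counts. It is not the Ornaments algorithm, which the abstract describes as probabilistic read assignment against a personalized transcriptome. The function names and read counts are hypothetical, and the test assumes that mapping bias has already been removed upstream.

```python
from math import comb


def binomial_upper_tail(k, n):
    """Return P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Exact two-sided test of the null hypothesis that both alleles
    are expressed equally (expected allelic ratio 0.5).

    Hypothetical helper for illustration only.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    k = max(ref_reads, alt_reads)
    # The null distribution is symmetric, so double the upper tail
    # of the more abundant allele and cap the result at 1.
    return min(1.0, 2 * binomial_upper_tail(k, n))


# Toy counts (assumed): 8 reference reads and 2 alternative reads.
p_value = allelic_imbalance_pvalue(8, 2)
print(f"allelic ratio = {8 / 10:.2f}, p = {p_value:.4f}")
```

With only ten reads, an 8:2 split is not significant at the 0.05 level, which illustrates why coverage at each heterozygous site matters for this kind of analysis.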
Subject(s)
Alleles , Humans , Transcriptome/genetics , Genomic Imprinting , Sequence Analysis, RNA/methods , Software , Gene Expression Profiling/methods
ABSTRACT
α9-nAChR, a subtype of nicotinic acetylcholine receptor, is significantly overexpressed in female breast cancer tumor tissues compared to normal tissues. Previous studies have proposed that specific single nucleotide polymorphisms (SNPs) in the CHRNA9 (α9-nAChR) gene are associated with an increased risk of breast cancer in interaction with smoking. A breast cancer risk assessment of the α9-nAChR SNP rs10009228 (NM_017581.4:c.1325A > G) in the Taiwanese female population, including 308 breast cancer patients and 198 healthy controls, revealed that individuals with the heterozygous A/G or the wild-type A/A genotype have an increased susceptibility to developing breast cancer in the presence of smoking compared to carriers of the G/G variant genotype. Our investigation confirmed the presence of this missense variation, which alters the amino acid sequence from asparagine (N442) to serine (S442) and thereby facilitates phosphorylation within the α9-nAChR protein. Additionally, overexpression of N442 (A/A) in breast cancer cells significantly enhanced cell survival, migration, and cancer stemness compared to S442 (G/G). Four-line triple-negative breast cancer patient-derived xenograft (TNBC-PDX) models with distinct α9-nAChR rs10009228 SNP genotypes (A/A, A/G, G/G) further demonstrated that chronic nicotine exposure accelerated tumor growth through sustained activation of the α9-nAChR downstream oncogenic AKT/ERK/STAT3 pathway, particularly in individuals with the A/G or A/A genotype. Collectively, our study established the links between genetic variations in α9-nAChR and smoking exposure in promoting breast tumor development. This emphasizes the need to consider gene-environment interactions carefully while developing effective breast cancer prevention and treatment strategies.
ABSTRACT
Most of the single-nucleotide polymorphisms (SNPs) associated with insulin resistance (IR)-relevant phenotypes by genome-wide association studies (GWASs) are located in noncoding regions, complicating their functional interpretation. Here, we utilized an adapted STARR-seq to evaluate the regulatory activities of 5,987 noncoding SNPs associated with IR-relevant phenotypes. We identified 876 SNPs with biased allelic enhancer activity effects (baaSNPs) across 133 loci in three IR-relevant cell lines (HepG2, preadipocyte, and A673), which showed pervasive cell specificity and significant enrichment for cell-specific open chromatin regions or enhancer-indicative markers (H3K4me1, H3K27ac). Further functional characterization suggested several transcription factors (TFs) with preferential allelic binding to baaSNPs. We also incorporated multi-omics data to prioritize 102 candidate regulatory target genes for baaSNPs and revealed prevalent long-range regulatory effects and cell-specific IR-relevant biological functional enrichment on them. Specifically, we experimentally verified the distal regulatory mechanism at IRS1 locus, in which rs952227-A reinforces IRS1 expression by long-range chromatin interaction and preferential binding to the transcription factor HOXC6 to augment the enhancer activity. Finally, based on our STARR-seq screening data, we predicted the enhancer activity of 227,343 noncoding SNPs associated with IR-relevant phenotypes (fasting insulin adjusted for BMI, HDL cholesterol, and triglycerides) from the largest available GWAS summary statistics. We further provided an open resource (http://www.bigc.online/fnSNP-IR) for better understanding genetic regulatory mechanisms of IR-relevant phenotypes.
Subject(s)
Insulin Resistance , Polymorphism, Single Nucleotide , Humans , Polymorphism, Single Nucleotide/genetics , Genome-Wide Association Study , Insulin Resistance/genetics , Transcription Factors/genetics , Chromatin/genetics , Phenotype , Enhancer Elements, Genetic/genetics
ABSTRACT
N6-methyladenosine (m$^{6}$A) is a widely-studied methylation to messenger RNAs, which has been linked to diverse cellular processes and human diseases. Numerous databases that collate m$^{6}$A profiles of distinct cell types have been created to facilitate quick and easy mining of m$^{6}$A signatures associated with cell-specific phenotypes. However, these databases contain inherent complexities that have not been explicitly reported, which may lead to inaccurate identification and interpretation of m$^{6}$A-associated biology by end-users who are unaware of them. Here, we review various m$^{6}$A-related databases, and highlight several critical matters. In particular, differences in peak-calling pipelines across databases drive substantial variability in both peak number and coordinates with only moderate reproducibility, and the inclusion of peak calls from early m$^{6}$A sequencing protocols may lead to the reporting of false positives or negatives. The awareness of these matters will help end-users avoid the inclusion of potentially unreliable data in their studies and better utilize m$^{6}$A databases to derive biologically meaningful results.
Subject(s)
Adenosine , Humans , Adenosine/analogs & derivatives , Adenosine/genetics , Adenosine/metabolism , Databases, Genetic , RNA, Messenger/genetics , RNA, Messenger/metabolism
ABSTRACT
Exposure to stressful life events increases the risk for psychiatric disorders. Mechanistic insight into the genetic factors moderating the impact of stress can increase our understanding of disease processes. Here, we test 3,662 single nucleotide polymorphisms (SNPs) from preselected expression quantitative trait loci in massively parallel reporter assays to identify genetic variants that modulate the activity of regulatory elements sensitive to glucocorticoids, important mediators of the stress response. Of the tested SNP sequences, 547 were located in glucocorticoid-responsive regulatory elements of which 233 showed allele-dependent activity. Transcripts regulated by these functional variants were enriched for those differentially expressed in psychiatric disorders in the postmortem brain. Phenome-wide Mendelian randomization analysis in 4,439 phenotypes revealed potentially causal associations specifically in neurobehavioral traits, including major depression and other psychiatric disorders. Finally, a functional gene score derived from these variants was significantly associated with differences in the physiological stress response, suggesting that these variants may alter disease risk by moderating the individual set point of the stress response.
Subject(s)
Glucocorticoids , Mental Disorders , Humans , High-Throughput Screening Assays , Regulatory Sequences, Nucleic Acid , Quantitative Trait Loci , Mental Disorders/genetics , Polymorphism, Single Nucleotide , Genome-Wide Association Study , Genetic Predisposition to Disease
ABSTRACT
Ischemic stroke, caused by vessel blockage, results in cerebral infarction, the death of brain tissue. Previously, quantitative trait locus (QTL) mapping of cerebral infarct volume and collateral vessel number identified a single, strong genetic locus regulating both phenotypes. Additional studies identified RAB GTPase-binding effector protein 2 (Rabep2) as the causal gene. However, there is as yet no evidence that variation in the human ortholog of this gene plays any role in ischemic stroke outcomes. We established an in vivo evaluation platform in mice by using adeno-associated virus (AAV) gene replacement and verified that both mouse and human RABEP2 rescue the mouse Rabep2 knockout ischemic stroke volume and collateral vessel phenotypes. Importantly, this cross-species complementation enabled us to experimentally investigate the functional effects of coding sequence variation in human RABEP2. We chose four coding variants from the human population that are predicted by multiple in silico algorithms to be damaging to RABEP2 function. In vitro and in vivo analyses verified that all four led to decreased collateral vessel connections and increased infarct volume. Thus, there are naturally occurring loss-of-function alleles. This cross-species approach will expand the number of targets for therapeutics development for ischemic stroke.
Subject(s)
Ischemic Stroke , Alleles , Animals , Brain/metabolism , Chromosome Mapping , Humans , Mice , Vesicular Transport Proteins/genetics , rab GTP-Binding Proteins/genetics , rab GTP-Binding Proteins/metabolism
ABSTRACT
BACKGROUND: Single-nucleotide polymorphisms (SNPs) are the most widely used form of molecular genetic variation studies. As reference genomes and resequencing data sets expand exponentially, tools must be in place to call SNPs at a similar pace. The genome analysis toolkit (GATK) is one of the most widely used SNP calling software tools publicly available, but unfortunately, high-performance computing versions of this tool have yet to become widely available and affordable. RESULTS: Here we report an open-source high-performance computing genome variant calling workflow (HPC-GVCW) for GATK that can run on multiple computing platforms from supercomputers to desktop machines. We benchmarked HPC-GVCW on multiple crop species for performance and accuracy with comparable results with previously published reports (using GATK alone). Finally, we used HPC-GVCW in production mode to call SNPs on a "subpopulation aware" 16-genome rice reference panel with ~ 3000 resequenced rice accessions. The entire process took ~ 16 weeks and resulted in the identification of an average of 27.3 M SNPs/genome and the discovery of ~ 2.3 million novel SNPs that were not present in the flagship reference genome for rice (i.e., IRGSP RefSeq). CONCLUSIONS: This study developed an open-source pipeline (HPC-GVCW) to run GATK on HPC platforms, which significantly improved the speed at which SNPs can be called. The workflow is widely applicable as demonstrated successfully for four major crop species with genomes ranging in size from 400 Mb to 2.4 Gb. Using HPC-GVCW in production mode to call SNPs on a 25 multi-crop-reference genome data set produced over 1.1 billion SNPs that were publicly released for functional and breeding studies. For rice, many novel SNPs were identified and were found to reside within genes and open chromatin regions that are predicted to have functional consequences. 
Combined, our results demonstrate the usefulness of combining a high-performance SNP calling architecture solution with a subpopulation-aware reference genome panel for rapid SNP discovery and public deployment.
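One common way to spread variant calling across many compute nodes is to split each chromosome into fixed-size intervals that can be processed independently. The abstract does not say how HPC-GVCW partitions its work, so the sketch below is an assumption for illustration only. The function names, the toy chromosome lengths, and the chunk size are all hypothetical.

```python
def split_into_chunks(chrom_lengths, chunk_size):
    """Split each chromosome into 1-based, inclusive intervals of at
    most `chunk_size` bases. Returns a list of (chrom, start, end)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    intervals = []
    for chrom, length in chrom_lengths.items():
        start = 1
        while start <= length:
            end = min(start + chunk_size - 1, length)
            intervals.append((chrom, start, end))
            start = end + 1
    return intervals


def as_region_string(interval):
    """Format an interval as 'chrom:start-end'."""
    chrom, start, end = interval
    return f"{chrom}:{start}-{end}"


# Toy genome (assumed lengths, far smaller than any real crop genome).
toy_genome = {"chr1": 2500, "chr2": 1000}
chunks = split_into_chunks(toy_genome, chunk_size=1000)
regions = [as_region_string(c) for c in chunks]
print(regions)
```

Each region string could then be handed to a separate job, with the per-region outputs merged afterwards. How well this balances the load depends on how evenly variants and read depth are distributed across the intervals.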
Subject(s)
Genome, Plant , Polymorphism, Single Nucleotide , Workflow , Plant Breeding , Software , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Recently, elevated seawater temperatures have resulted in numerous adverse effects, including significant mortality among bivalves. The dwarf surf clam, Mulinia lateralis, is considered a valuable model species for bivalve research due to its rapid growth and short generation time. Its successful cultivation in a laboratory setting throughout its entire life cycle makes it an ideal candidate for exploring the potential mechanisms underlying bivalve responses to thermal stress. In this study, a total of 600 clams were subjected to a 17-day thermal stress experiment at 30 °C, the semi-lethal temperature for this species. Ninety individuals that perished initially were classified as the heat-sensitive population (HSP), while 89 individuals that survived the experiment were classified as the heat-tolerant population (HTP). Subsequently, these 179 individuals were sequenced, and 21,292 single nucleotide polymorphisms (SNPs) were genotyped for downstream analysis. The heritability estimate for survival status was 0.375 ± 0.127, suggesting a genetic basis for the thermal tolerance trait. Furthermore, a genome-wide association study (GWAS) identified three SNPs and 10 candidate genes associated with the thermal tolerance trait in M. lateralis. These candidate genes were involved in the ETHR/EHF signaling pathway and played pivotal roles in signal sensing, cell adhesion, oxidative stress, DNA damage repair, etc. Additionally, qPCR results indicated that, excluding the MGAT4A, ZAN, and RFC1 genes, all others exhibited significantly higher expression in the HTP (p < 0.05), underscoring the critical involvement of the ETHR/EHF signaling pathway in M. lateralis' thermal tolerance. These results unveil the presence of standing genetic variations associated with thermal tolerance in M. lateralis, highlighting the regulatory role of the ETHR/EHF signaling pathway in the bivalve's response to thermal stress, which contributes to the comprehension of the genetic basis of thermal tolerance in bivalves.
Subject(s)
Bivalvia , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Thermotolerance , Animals , Bivalvia/genetics , Bivalvia/physiology , Thermotolerance/genetics , Heat-Shock Response/genetics
ABSTRACT
Assigning function to single nucleotide polymorphisms (SNPs) to understand the mechanisms that link genetic and phenotypic variation and disease is an area of intensive research that is necessary to contribute to the continuing development of precision medicine. However, despite the apparent simplicity that is captured in the name SNP - 'single nucleotide' changes are not easy to functionally characterize. This complexity arises from multiple features of the genome including the fact that function is development and environment specific. As such, we are often fooled by our terminology and underlying assumptions that there is a single function for a SNP. Here we discuss some of what is known about SNPs, their functions and how we can go about characterizing them.
Subject(s)
Genetic Variation/genetics , Machine Learning/standards , Polymorphism, Single Nucleotide/genetics , Precision Medicine/methods , Humans
ABSTRACT
Recent discoveries establish tRNAs as central regulators of mRNA translation dynamics, and therefore cotranslational folding and function of the encoded protein. The tRNA pool, whose composition and abundance change in a cell- and tissue-dependent manner, is the main factor which determines mRNA translation velocity. In this review, we discuss a group of pathogenic mutations, in the coding sequences of either protein-coding genes or in tRNA genes, that alter mRNA translation dynamics. We also summarize advances in tRNA biology that have uncovered how variations in tRNA levels on account of genetic mutations affect protein folding and function, and thereby contribute to phenotypic diversity in clinical manifestations.
Subject(s)
Mutation , Protein Biosynthesis , RNA, Messenger , RNA, Transfer , Humans , Codon/genetics , Protein Biosynthesis/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Transfer/genetics , RNA, Transfer/metabolism , Polymorphism, Single Nucleotide , Time Factors
ABSTRACT
BACKGROUND: Several single nucleotide polymorphism (SNP) pipelines exist, each offering its own advantages. Among them, and described here, is vSNP, which has been developed over the past decade and is specifically tailored to meet the needs of diagnostic laboratories. Laboratories that aim to provide rapid whole genome sequencing results during outbreak investigations face unique challenges. vSNP addresses these challenges by enabling users to verify and validate sequence accuracy with ease, having utility across various pathogens, being fully auditable, and presenting results that are easy to interpret and can be comprehended by individuals with diverse backgrounds. RESULTS: vSNP has proven effective for real-time phylogenetic analysis of disease outbreaks and eradication efforts, including bovine tuberculosis, brucellosis, virulent Newcastle disease, SARS-CoV-2, African swine fever, and highly pathogenic avian influenza. The pipeline produces easy-to-read SNP matrices, sorted for convenience, as well as corresponding phylogenetic trees, making the output easily understandable. Essential data for verifying SNPs is included in the output, and the process has been divided into two steps for ease of use and faster processing times. vSNP requires minimal computational resources to run and can be run in a wide range of environments. Several utilities have been developed to make analysis more accessible for subject matter experts who may not have computational expertise. CONCLUSION: The vSNP pipeline integrates seamlessly into a diagnostic workflow and meets the criteria for quality control accreditation programs, such as the 17025 standard of the International Organization for Standardization. Its versatility and robustness make it suitable for use with a diverse range of organisms, providing detailed, reproducible, and transparent results, making it a valuable tool in various applications, including phylogenetic analysis performed in real time.
Subject(s)
Phylogeny , Polymorphism, Single Nucleotide , Whole Genome Sequencing , Whole Genome Sequencing/methods , Software , Animals , Humans , Computational Biology/methods
ABSTRACT
BACKGROUND: The skeletal muscle growth rate and body size of Tibetan pigs (TIB) are lower than those of Large White pigs (LW). However, the underlying genetic basis contributing to these differences remains uncertain. To address this knowledge gap, the present study employed whole-genome sequencing of TIB (slow growth) and LW (fast growth) individuals, integrated with existing NCBI sequencing datasets of TIB and LW individuals, enabling the identification of a comprehensive set of genetic variations for each breed. The specific and predominant SNPs in the TIB and LW populations were detected by using a cutoff value of 0.50 for SNP allele frequency and absolute allele frequency differences (ΔAF) between the TIB and LW populations. RESULTS: A total of 21,767,938 SNPs were retrieved from 44 TIB and 29 LW genomes. The analysis detected 2,893,106 (13.29%) and 813,310 (3.74%) specific and predominant SNPs in the TIB and LW populations, which were annotated to 24,560 genes. Further GO analysis revealed 291 genes involved in biological processes related to striated and/or skeletal muscle differentiation, proliferation, hypertrophy, regulation of striated muscle cell differentiation and proliferation, and myoblast differentiation and fusion. These 291 genes included crucial regulators of muscle cell determination, proliferation, differentiation, and hypertrophy, such as members of the myogenic regulatory factor (MRF) (MYOD, MYF5, MYOG, MYF6) and myocyte enhancer factor 2 (MEF2) (MEF2A, MEF2C, MEF2D) families, as well as muscle growth inhibitors (MSTN, ACVR1, and SMAD1). KEGG pathway analysis revealed 106 and 20 genes in muscle growth-related positive and negative regulatory signaling pathways, respectively. Notably, genes critical for protein synthesis, such as MTOR, IGF1, IGF1R, IRS1, INSR, and RPS6KA6, were implicated in these pathways. CONCLUSION: This study employed an effective methodology to rigorously identify the potential genes associated with skeletal muscle development. 
A substantial number of SNPs and genes that potentially play roles in the divergence observed in skeletal muscle growth between the TIB and LW breeds were identified. These findings offer valuable insights into the genetic underpinnings of skeletal muscle development and present opportunities for enhancing meat production through pig breeding.
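The frequency-based filter described in the abstract can be sketched in a few lines. The abstract gives only the 0.50 cutoff for allele frequency and for the absolute allele frequency difference between breeds. The exact classification rule, the function names, and the allele counts below are assumptions made for illustration, with the sample sizes (44 TIB, 29 LW) taken from the abstract.

```python
def allele_frequency(alt_allele_count, n_individuals):
    """Alternative-allele frequency in a diploid population."""
    return alt_allele_count / (2 * n_individuals)


def classify_snp(af_tib, af_lw, cutoff=0.50):
    """Label a SNP by the absolute allele frequency difference.

    Assumed rule: a SNP is 'predominant' in the breed with the higher
    frequency when |AF_TIB - AF_LW| >= cutoff. Because frequencies are
    non-negative, the higher frequency is then also >= cutoff.
    """
    delta_af = abs(af_tib - af_lw)
    if delta_af < cutoff:
        return "shared"
    return "predominant_in_TIB" if af_tib > af_lw else "predominant_in_LW"


# Hypothetical alternative-allele counts at two SNPs.
N_TIB, N_LW = 44, 29
snp1 = classify_snp(allele_frequency(80, N_TIB), allele_frequency(10, N_LW))
snp2 = classify_snp(allele_frequency(40, N_TIB), allele_frequency(29, N_LW))
print(snp1, snp2)
```

A breed-specific SNP would be the limiting case in which the allele is absent from the other breed. The abstract does not say whether that case is reported separately, so it is not modeled here.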
Subject(s)
Gene Frequency , Muscle Development , Muscle, Skeletal , Polymorphism, Single Nucleotide , Animals , Muscle, Skeletal/metabolism , Muscle, Skeletal/growth & development , Swine/genetics , Swine/growth & development , Muscle Development/genetics , Whole Genome Sequencing , Tibet , Genome
ABSTRACT
BACKGROUND: Although many studies have been done to reveal artificial selection signatures in commercial and indigenous chickens, a limited number of genes have been linked to specific traits. To identify more trait-related artificial selection signatures and genes, we re-sequenced a total of 85 individuals of five indigenous chicken breeds with distinct traits from Yunnan Province, China. RESULTS: We found 30 million non-redundant single nucleotide variants and small indels (< 50 bp) in the indigenous chickens, of which 10 million were not seen in 60 broilers, 56 layers and 35 red jungle fowls (RJFs) that we compared with. The variants in each breed are enriched in non-coding regions, while those in coding regions are largely tolerant, suggesting that most variants might affect cis-regulatory sequences. Based on 27 million bi-allelic single nucleotide polymorphisms identified in the chickens, we found numerous selective sweeps and affected genes in each indigenous chicken breed and substantially larger numbers of selective sweeps and affected genes in the broilers and layers than previously reported using a rigorous statistical model. Consistent with the locations of the variants, the vast majority (~ 98.3%) of the identified selective sweeps overlap known quantitative trait loci (QTLs). Meanwhile, 74.2% known QTLs overlap our identified selective sweeps. We confirmed most of previously identified trait-related genes and identified many novel ones, some of which might be related to body size and high egg production traits. Using RT-qPCR, we validated differential expression of eight genes (GHR, GHRHR, IGF2BP1, OVALX, ELF2, MGARP, NOCT, SLC25A15) that might be related to body size and high egg production traits in relevant tissues of relevant breeds. CONCLUSION: We identify 30 million single nucleotide variants and small indels in the five indigenous chicken breeds, 10 million of which are novel. 
We predict substantially more selective sweeps and affected genes than previously reported in both indigenous and commercial breeds. These variants and affected genes are good candidates for further experimental investigations of genotype-phenotype relationships and practical applications in chicken breeding programs.
Subject(s)
Chickens , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Selection, Genetic , Animals , Chickens/genetics , Genome , INDEL Mutation , Breeding , Phenotype , Genomics/methods
ABSTRACT
BACKGROUND: Structural variants (SVs) such as deletions, duplications, and insertions are known to contribute to phenotypic variation but remain challenging to identify and genotype. A more complete, accessible, and assessable collection of SVs will assist efforts to study SV function in cattle and to incorporate SV genotyping into animal evaluation. RESULTS: In this work we produced a large and deeply characterized collection of SVs in Holstein cattle using two popular SV callers (Manta and Smoove) and publicly available Illumina whole-genome sequence (WGS) read sets from 310 samples (290 male, 20 female, mean 20X coverage). Manta and Smoove identified 31 K and 68 K SVs, respectively. In total the SVs cover 5% (Manta) and 6% (Smoove) of the reference genome, in contrast to the 1% impacted by SNPs and indels. SV genotypes from each caller were confirmed to accurately recapitulate animal relationships estimated using WGS SNP genotypes from the same dataset, with Manta genotypes outperforming Smoove, and deletions outperforming duplications. To support efforts to link the SVs to phenotypic variation, overlapping and tag SNPs were identified for each SV, using genotype sets extracted from the WGS results corresponding to two bovine SNP chips (BovineSNP50 and BovineHD). 9% (Manta) and 11% (Smoove) of the SVs were found to have overlapping BovineHD panel SNPs, while 21% (Manta) and 9% (Smoove) have BovineHD panel tag SNPs. A custom interactive database ( https://svdb-dc.pslab.ca ) containing the identified sequence variants with extensive annotations, gene feature information, and BAM file content for all SVs was created to enable the evaluation and prioritization of SVs for further study. 
Illustrative examples involving the genes POPDC3, ORM1, G2E3, FANCI, TFB1M, FOXC2, N4BP2, GSTA3, and COPA show how this resource can be used to find well-supported genic SVs, determine SV breakpoints, design genotyping approaches, and identify processed pseudogenes masquerading as deletions. CONCLUSIONS: The resources developed through this study can be used to explore sequence variation in Holstein cattle and to develop strategies for studying SVs of interest. The lack of overlapping and tag SNPs from commonly used SNP chips for most of the SVs suggests that other genotyping approaches will be needed (for example direct genotyping) to understand their potential contributions to phenotype. The included SV genotype assessments point to challenges in characterizing SVs, especially duplications, using short-read data and support ongoing efforts to better characterize cattle genomes through long-read sequencing. Lastly, the identification of previously known functional SVs and additional CDS-overlapping SVs supports the phenotypic relevance of this dataset.
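The statement that SVs cover roughly 5% to 6% of the reference genome implies merging overlapping SV intervals before summing their lengths, since nested or overlapping calls would otherwise be counted twice. The abstract does not describe how this was computed, so the following is a generic sketch for a single chromosome using half-open coordinates. The function name, intervals, and chromosome length are all hypothetical.

```python
def covered_bases(intervals):
    """Number of bases covered by a set of half-open [start, end)
    intervals on one chromosome, counting overlaps only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # A gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Toy SV calls (assumed): two overlapping deletions and one duplication.
sv_calls = [(0, 100), (50, 150), (300, 400)]
chrom_length = 5000
fraction = covered_bases(sv_calls) / chrom_length
print(f"{fraction:.1%} of the chromosome is covered by SVs")
```

For a whole genome, the same function would be applied per chromosome and the totals summed before dividing by the genome length.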
Subject(s)
Genotype , Polymorphism, Single Nucleotide , Animals , Cattle/genetics , Female , Whole Genome Sequencing , Male , Genomic Structural Variation , Databases, Genetic , Phenotype , Genome , Genomics/methods
ABSTRACT
BACKGROUND: The application of biotechnologies that make use of genetic markers in chicken breeding is developing rapidly. Diversity Array Technology (DArT) is one of the current genotyping-by-sequencing techniques allowing genome-wide marker discovery. In livestock, DArT has been applied in cattle, sheep, and horses. Currently, there is no study on the application of DArT markers in chickens. The aim was to study the effectiveness of DArTseq markers in assessing the genetic diversity and population structure of indigenous chickens (IC) and SASSO chickens in the Eastern Province of Rwanda. METHODS: In total, 87 blood samples were randomly collected from 37 male and 40 female indigenous chickens and 10 female SASSO chickens purposively selected from 5 sites located in two districts of the Eastern Province of Rwanda. Genotyping by sequencing (GBS) using DArTseq technology was employed. This involved the complexity reduction method through digestion of genomic DNA and ligation of barcoded adapters, followed by PCR amplification of adapter-ligated fragments. RESULTS: From 45,677 DArTseq SNPs and 25,444 SilicoDArTs generated, only 8,715 and 6,817, respectively, remained for further analysis after quality control. The average call rates observed, 0.99 and 0.98 for DArTseq SNPs and SilicoDArTs, respectively, were quite similar. The polymorphic information content (PIC) from SilicoDArTs (0.33) was higher than that from DArTseq SNPs (0.22). DArTseq SNPs and SilicoDArTs had 34.4% and 34% of the loci, respectively, mapped on chromosome 1. DArTseq SNPs revealed average distances of 0.17 and 0.15 within IC and SASSO chickens, respectively, while the respective averages observed with SilicoDArTs were 0.42 and 0.36. The average genetic distance between IC and SASSO chickens was moderate for SilicoDArTs (0.120) and lower for DArTseq SNPs (0.048). 
The PCoA and population structure analyses clustered the chicken samples into two subpopulations (1 and 2): subpopulation 1 is composed of IC and subpopulation 2 of SASSO chickens. Admixture was observed in subpopulation 2, involving 12 chickens from subpopulation 1. CONCLUSIONS: DArTseq markers proved effective and efficient for assessing genetic relationships among IC and for separating IC from the exotic breed used, which indicates their suitability for genomic studies. However, further studies using all available chicken genetic resources and larger sample sizes are required.
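The abstract reports PIC values without stating the formula used. As a minimal sketch, assuming the classical definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², a per-locus value can be computed from allele frequencies as follows. The function name is hypothetical, and DArT's own reporting pipeline may define PIC differently.

```python
from itertools import combinations

def pic(allele_freqs):
    """Polymorphic information content for one locus (Botstein et al. 1980).

    allele_freqs -- allele frequencies at the locus; they must sum to 1.
    Assumption: this is the classical definition, not necessarily the one
    used in the study summarized above.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared allele frequencies.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term over all unordered pairs of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross

# A biallelic marker is most informative at equal allele frequencies.
print(pic([0.5, 0.5]))
print(pic([0.9, 0.1]))
```

Under this definition a biallelic marker cannot exceed 0.375, which is worth keeping in mind when comparing average PIC values between marker types.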
Subject(s)
Chickens , Genomics , Male , Female , Animals , Cattle , Horses , Sheep , Chickens/genetics , Genotype , Rwanda , Genomics/methods , Single Nucleotide Polymorphism , Genetic Variation
ABSTRACT
BACKGROUND: Kelps are not only ecologically important, being primary producers and habitat-forming species; they also hold substantial economic potential. Expansion of the kelp cultivation industry raises interest in genetic improvement of kelp for cultivation, as well as concerns about genetic introgression from cultivated to wild populations. Thus, increased understanding of population genetics in natural kelp populations is crucial. Genotyping-by-sequencing (GBS) is a powerful tool for studying population genetics. Here, using Saccharina latissima (sugar kelp) as our study species, we characterize the population genetics at a fine geographic scale, while also investigating the influence of marker type (biallelic SNPs versus multi-allelic short read-backed haplotypes) and minor allele count (MAC) thresholds on estimated population genetic metrics. RESULTS: We examined 150 sporophytes from 10 locations within a small area in Mid-Norway. Employing GBS, we detected 20,710 bi-allelic SNPs and 42,264 haplotype alleles at 20,297 high-quality GBS loci. We used both marker types as well as two MAC filtering thresholds (3 and 15) in the analyses. Overall, higher genetic diversity, more outbreeding, and stronger substructure were estimated using haplotypes compared to SNPs, and with MAC 15 compared to MAC 3. The population displayed high genetic diversity (HE ranging from 0.18 to 0.37) and significant outbreeding (FIS ≤ -0.076). Construction of a genomic relationship matrix, however, revealed a few close relatives within sampling locations. The connectivity between sampling locations was high (FST ≤ 0.09), but subtle, yet significant, genetic substructure was detected, even between sampling locations separated by less than 2 km. Isolation-by-distance was significant and explained 15% of the genetic variation, while incorporation of predicted currents in an "isolation-by-oceanography" model explained a larger proportion (~27%). 
CONCLUSION: The studied population is diverse, significantly outbred, and exhibits high connectivity, partly due to local currents. The use of genome-wide markers combined with permutation testing provides high statistical power to detect subtle population substructure and inbreeding or outbreeding. Short haplotypes extracted from GBS data and removal of rare alleles enhance the resolution. Careful consideration of marker type and filtering thresholds is crucial when comparing independent studies, as they profoundly influence numerical estimates of population genetic metrics.
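To make the effect of a MAC threshold concrete, the sketch below filters biallelic loci by minor allele count and computes expected heterozygosity (HE = 1 - Σ p_i²) for the loci that remain. It is an illustration under stated assumptions, not the study's workflow: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with None for missing calls, and all function names and data are hypothetical.

```python
def minor_allele_count(genotypes):
    """Count copies of the rarer allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    alt = sum(called)            # copies of the alternate allele
    total = 2 * len(called)      # two alleles per called individual
    return min(alt, total - alt)

def expected_heterozygosity(genotypes):
    """HE = 1 - (p^2 + q^2) for a biallelic locus (no sample-size correction)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    p = sum(called) / (2 * len(called))
    q = 1.0 - p
    return 1.0 - (p * p + q * q)

def filter_by_mac(loci, threshold):
    """Keep loci whose minor allele count is at least the threshold."""
    return {name: g for name, g in loci.items()
            if minor_allele_count(g) >= threshold}

# Toy genotype matrix (made up): locus -> per-individual genotype codes.
loci = {"locus_a": [0, 0, 1, None, 2], "locus_b": [0, 0, 0, 0, 1]}
kept = filter_by_mac(loci, threshold=3)
print(sorted(kept))
print({name: expected_heterozygosity(g) for name, g in kept.items()})
```

Raising the threshold removes loci with rare alleles, which by construction have low HE, so the mean HE of the retained set tends to rise. This is consistent with the direction of the MAC 15 versus MAC 3 comparison reported above.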
Subject(s)
Population Genetics , Haplotypes , Kelp , Single Nucleotide Polymorphism , Kelp/genetics , Genetic Markers , Alleles , Genetic Variation , Edible Seaweeds , Laminaria
ABSTRACT
Performing phylogenetic analysis with genome sequences maximizes the information used to estimate phylogenies and the resolution of closely related taxa. The use of single-nucleotide polymorphisms (SNPs) permits estimating trees without genome alignments and permits the use of data sets of hundreds of microbial genomes. kSNP4 is a program that identifies SNPs without using a reference genome, estimates parsimony, maximum likelihood, and neighbor-joining trees, and is able to annotate the discovered SNPs. kSNP4 is a command-line program that does not require any additional programs or dependencies to install or use. kSNP4 does not require any programming experience or bioinformatics experience to install and use. It is suitable for use by students through senior investigators. It includes a detailed user guide that explains all of the many features of kSNP4. In this study, we provide a detailed step-by-step protocol for downloading, installing, and using kSNP4 to build phylogenetic trees from genome sequences.
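The idea of finding SNPs without a reference genome can be illustrated with a toy k-mer comparison. The sketch below assumes the commonly described k-mer strategy in which a candidate SNP is an odd-length k-mer whose flanking bases match across genomes while the central base differs. It is not kSNP4's implementation: the function names are hypothetical, and it ignores reverse complements, sequencing errors, and every optimization a real tool needs.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def central_base_snps(genomes, k=5):
    """Find flank contexts whose central base differs between genomes.

    genomes -- dict mapping a (hypothetical) genome name to its sequence
    Returns a dict: masked context -> {genome name: central base}.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the k-mer has a central base")
    mid = k // 2
    # masked context (central base replaced by '.') -> genome -> observed bases
    contexts = defaultdict(dict)
    for name, seq in genomes.items():
        for kmer in kmers(seq.upper(), k):
            key = kmer[:mid] + "." + kmer[mid + 1:]
            contexts[key].setdefault(name, set()).add(kmer[mid])
    snps = {}
    for key, per_genome in contexts.items():
        # Skip contexts that are ambiguous within a single genome.
        if any(len(bases) != 1 for bases in per_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in per_genome.items()}
        # Keep only contexts with more than one allele across genomes.
        if len(set(alleles.values())) > 1:
            snps[key] = alleles
    return snps

# Two toy sequences differing at a single position.
toy = {"g1": "ACGTACC", "g2": "ACGGACC"}
print(central_base_snps(toy, k=5))
```

Because each genome is reduced to a set of k-mers, no alignment is required, which is the property that lets this class of methods scale to hundreds of microbial genomes.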