ABSTRACT
In all human cells, human leukocyte antigen (HLA) class I glycoproteins assemble with a peptide and take it to the cell surface for surveillance by lymphocytes. These include natural killer (NK) cells and γδ T cells of innate immunity and αβ T cells of adaptive immunity. In healthy cells, the presented peptides derive from human proteins, to which lymphocytes are tolerant. In pathogen-infected cells, HLA class I expression is perturbed. Reduced HLA class I expression is detected by KIR and CD94:NKG2A receptors of NK cells. Almost any change in peptide presentation can be detected by αβ CD8+ T cells. In responding to extracellular pathogens, HLA class II glycoproteins, expressed by specialized antigen-presenting cells, present peptides to αβ CD4+ T cells. In comparison to the families of major histocompatibility complex (MHC) class I, MHC class II and αβ T cell receptors, the antigenic specificity of the γδ T cell receptors is incompletely understood.
Subject(s)
Histocompatibility Antigens Class II/chemistry , Histocompatibility Antigens Class I/chemistry , Cellular Immunity , NK Cell Lectin-Like Receptor Subfamily D/chemistry , Alpha-Beta T-Cell Antigen Receptors/chemistry , Gamma-Delta T-Cell Antigen Receptors/chemistry , KIR Receptors/chemistry , Antigen Presentation , CD8-Positive T-Lymphocytes/cytology , CD8-Positive T-Lymphocytes/immunology , Molecular Evolution , Gene Expression Regulation , Haplotypes , Histocompatibility Antigens Class I/classification , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class II/classification , Histocompatibility Antigens Class II/genetics , Histocompatibility Antigens Class II/immunology , Humans , Innate Immunity , Natural Killer Cells/cytology , Natural Killer Cells/immunology , Molecular Models , NK Cell Lectin-Like Receptor Subfamily D/genetics , NK Cell Lectin-Like Receptor Subfamily D/immunology , Protein Interaction Domains and Motifs , Protein Secondary Structure , Alpha-Beta T-Cell Antigen Receptors/genetics , Alpha-Beta T-Cell Antigen Receptors/immunology , Gamma-Delta T-Cell Antigen Receptors/genetics , Gamma-Delta T-Cell Antigen Receptors/immunology , KIR Receptors/classification , KIR Receptors/genetics , KIR Receptors/immunology , Signal Transduction
ABSTRACT
With broad genetic diversity and as a source of key agronomic traits, wild grape species (Vitis spp.) are crucial to enhance viticulture's climatic resilience and sustainability. This review discusses how recent breakthroughs in the genome assembly and analysis of wild grape species have led to discoveries on grape evolution, from wild species' adaptation to environmental stress to grape domestication. We detail how diploid chromosome-scale genomes from wild Vitis spp. have enabled the identification of candidate disease-resistance and flower sex determination genes and the creation of the first Vitis graph-based pangenome. Finally, we explore how wild grape genomics can impact grape research and viticulture, including aspects such as data sharing, the development of functional genomics tools, and the acceleration of genetic improvement.
Subject(s)
Plant Genome , Genomics , Vitis , Vitis/genetics , Genomics/methods , Plant Genome/genetics , Genetic Variation , Disease Resistance/genetics , Domestication , Molecular Evolution
ABSTRACT
The C9orf72 hexanucleotide repeat expansion (HRE) is a common genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). The inheritance is autosomal dominant, but a high proportion of subjects with the mutation are simplex cases. One possible explanation is de novo expansions of unstable intermediate-length alleles (IAs). Using haplotype sharing trees (HSTs) with the haplotype analysis tool kit (HAPTK), we derived majority-based ancestral haplotypes of HRE samples and discovered that IAs containing ≥18-20 repeats share large haplotypes in common with the HRE. Using HSTs of HRE and IA samples, we demonstrate that the longer IA haplotypes are largely indistinguishable from HRE haplotypes and that several ≥18-20 IA haplotypes share over 5 Mb (>600 markers) haplotypes in common with the HRE haplotypes. These analysis tools allow physical understanding of the haplotype blocks shared with the majority-based ancestral haplotype. Our results demonstrate that the haplotypes with longer IAs belong to the same pool of haplotypes as the HRE and suggest that longer IAs represent potential premutation alleles.
Subject(s)
Amyotrophic Lateral Sclerosis , C9orf72 Protein , Trees , Humans , Alleles , Amyotrophic Lateral Sclerosis/genetics , C9orf72 Protein/genetics , DNA Repeat Expansion/genetics , Haplotypes/genetics , Receptor Protein-Tyrosine Kinases/genetics , Trees/genetics
ABSTRACT
Haplotype-resolved genome assemblies were produced for Chasselas and Ugni Blanc, two heterozygous Vitis vinifera cultivars by combining high-fidelity long-read sequencing and high-throughput chromosome conformation capture (Hi-C). The telomere-to-telomere full coverage of the chromosomes allowed us to assemble separately the two haplo-genomes of both cultivars and revealed structural variations between the two haplotypes of a given cultivar. The deletions/insertions, inversions, translocations, and duplications provide insight into the evolutionary history and parental relationship among grape varieties. Integration of de novo single long-read sequencing of full-length transcript isoforms (Iso-Seq) yielded a highly improved genome annotation. Given its higher contiguity, and the robustness of the IsoSeq-based annotation, the Chasselas assembly meets the standard to become the annotated reference genome for V. vinifera. Building on these resources, we developed VitExpress, an open interactive transcriptomic platform, that provides a genome browser and integrated web tools for expression profiling, and a set of statistical tools (StatTools) for the identification of highly correlated genes. Implementation of the correlation finder tool for MybA1, a major regulator of the anthocyanin pathway, identified candidate genes associated with anthocyanin metabolism, whose expression patterns were experimentally validated as discriminating between black and white grapes. These resources and innovative tools for mining genome-related data are anticipated to foster advances in several areas of grapevine research.
Subject(s)
Plant Genome , Haplotypes , Transcriptome , Vitis , Vitis/genetics , Haplotypes/genetics , Transcriptome/genetics , Molecular Sequence Annotation/methods , Gene Expression Profiling/methods , Software
ABSTRACT
The clinical severity of sickle cell disease (SCD) is strongly influenced by the level of fetal haemoglobin (HbF) persistent in each patient. Three major HbF loci (BCL11A, HBS1L-MYB, and Xmn1-HBG2) have been reported, but a considerable hidden heritability remains. We conducted a genome-wide association study for HbF levels in 1006 Nigerian patients with SCD (HbSS/HbSβ0), followed by a replication and meta-analysis exercise in four independent SCD cohorts (3,582 patients). To dissect association signals at the major loci, we performed stepwise conditional and haplotype association analyses and included public functional annotation datasets. Association signals were detected for BCL11A (lead SNP rs6706648, β = -0.39, P = 4.96 × 10^-34) and HBS1L-MYB (lead SNP rs61028892, β = 0.73, P = 1.18 × 10^-9), whereas the variant allele for Xmn1-HBG2 was found to be very rare. In addition, we detected three putative new trait-associated regions. Genetically, dissecting the two major loci BCL11A and HBS1L-MYB, we defined trait-increasing haplotypes (P < 0.0001) containing so far unidentified causal variants. At BCL11A, in addition to a haplotype harbouring the putative functional variant rs1427407-'T', we identified a second haplotype, tagged by the rs7565301-'A' allele, where a yet-to-be-discovered causal DNA variant may reside. Similarly, at HBS1L-MYB, one HbF-increasing haplotype contains the likely functional small indel rs66650371, and a second tagged by rs61028892-'C' is likely to harbour a presently unknown functional allele. Together, variants at BCL11A and HBS1L-MYB SNPs explained 24.1% of the trait variance. Our findings provide a path for further investigation of the causes of variable fetal haemoglobin persistence in sickle cell disease.
Subject(s)
Sickle Cell Anemia , GTP-Binding Proteins , Genome-Wide Association Study , Haplotypes , Female , Humans , Male , Alleles , Sickle Cell Anemia/genetics , Sickle Cell Anemia/blood , Genetic Predisposition to Disease , Nigeria , Nuclear Proteins/genetics , Single Nucleotide Polymorphism/genetics , Repressor Proteins/genetics
ABSTRACT
The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.
Subject(s)
Biological Specimen Banks , Genome , Humans , Dogs , Animals , Genotype , Haplotypes/genetics , Single Nucleotide Polymorphism/genetics , United Kingdom , Algorithms , DNA Sequence Analysis/methods
ABSTRACT
Oculocutaneous albinism (OCA) is a rare disorder of pigment production. Affected individuals have variably decreased global pigmentation and visual-developmental changes that lead to low vision. OCA is notable for significant missing heritability, particularly among individuals with residual pigmentation. Tyrosinase (TYR) is the rate-limiting enzyme in melanin pigment biosynthesis and mutations that decrease enzyme function are one of the most common causes of OCA. We present the analysis of high-depth short-read TYR sequencing data for a cohort of 352 OCA probands, ~50% of whom were previously sequenced without yielding a definitive diagnostic result. Our analysis identified 66 TYR single-nucleotide variants (SNVs) and small insertion/deletions (indels), 3 structural variants, and a rare haplotype comprised of two common frequency variants (p.Ser192Tyr and p.Arg402Gln) in cis-orientation, present in 149/352 OCA probands. We further describe a detailed analysis of the disease-causing haplotype, p.[Ser192Tyr; Arg402Gln] ("cis-YQ"). Haplotype analysis suggests that the cis-YQ allele arose by recombination and that multiple cis-YQ haplotypes are segregating in OCA-affected individuals and control populations. The cis-YQ allele is the most common disease-causing allele in our cohort, representing 19.1% (57/298) of TYR pathogenic alleles in individuals with type 1 (TYR-associated) OCA. Finally, among the 66 TYR variants, we found several additional alleles defined by a cis-oriented combination of minor, potentially hypomorph-producing alleles at common variant sites plus a second, rare pathogenic variant. Together, these results suggest that identification of phased variants for the full TYR locus is required for an exhaustive assessment for potentially disease-causing alleles.
Subject(s)
Oculocutaneous Albinism , Humans , Haplotypes/genetics , Oculocutaneous Albinism/genetics , Oculocutaneous Albinism/diagnosis , Mutation , Alleles
ABSTRACT
In this study, we present the results of community-engaged ancient DNA research initiated after the remains of 36 African-descended individuals dating to the late 18th century were unearthed in the port city of Charleston, South Carolina. The Gullah Society of Charleston, along with other Charleston community members, initiated a collaborative genomic study of these ancestors of presumed enslaved status, in an effort to visibilize their histories. We generated 18 low-coverage genomes and 31 uniparental haplotypes to assess their genetic origins and interrelatedness. Our results indicate that they have predominantly West and West-Central African genomic ancestry, with one individual exhibiting some genomic affiliation with populations in the Americas. Most were assessed as genetic males, and no autosomal kin were identified among them. Overall, this study expands our understanding of the colonial histories of African descendant populations in the US South.
Subject(s)
Black People , Ancient DNA , Humans , Male , Black People/genetics , Mitochondrial DNA/genetics , Genomics , Haplotypes/genetics , South Carolina/ethnology
ABSTRACT
Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet it is impossible to derive exact genotypes in all cases, often resulting in ambiguous genotype calls, that is, partially missing data. An example of such a gene region is the killer-cell immunoglobulin-like receptor (KIR) genes. These genes are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible as they cannot cope with the complexity of the data. We present an expectation-maximization (EM)-algorithm to estimate haplotype frequencies (HTFs) which deals with the missing data components, and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM-algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed in a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information of multiple genes on the estimates of these frequencies. We show that estimated HTFs are approximately unbiased. Our simulation study shows that the EM-algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. Linear regression models based on this EM, show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of KIR genotypes, we compare HTFs to those obtained in an independent study. 
Our new EM-algorithm-based method is the first to account for the full genetic architecture of complex gene regions, such as the KIR gene region. This algorithm can handle the numerous observed ambiguities, and allows for the collapsing of haplotypes to perform implicit dimension reduction. Combining information from multiple genes improves haplotype reconstruction.
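The core EM idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the basic E-step and M-step over ambiguous genotype calls plus a simple rare-haplotype collapse, and it omits the gene-by-gene iteration and the profiling of the rare group. The input encoding (each individual given as a list of compatible haplotype pairs), the function names, and the `RARE` label are all assumptions made for illustration.

```python
from collections import defaultdict


def em_haplotype_frequencies(individuals, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM.

    individuals: list where each entry is the list of (hap_a, hap_b)
    pairs compatible with that individual's (possibly ambiguous) genotype.
    This encoding is a hypothetical choice for the sketch.
    """
    haps = sorted({h for pairs in individuals for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in individuals:
            # E-step: posterior weight of each compatible pair under
            # Hardy-Weinberg proportions (factor 2 for unordered het pairs).
            weights = [(1.0 if a == b else 2.0) * freq[a] * freq[b]
                       for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        new_freq = {h: counts[h] / (2 * len(individuals)) for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq


def collapse_rare(freq, threshold, label="RARE"):
    """Pool haplotypes with frequency below threshold into one group."""
    collapsed = defaultdict(float)
    for hap, p in freq.items():
        collapsed[hap if p >= threshold else label] += p
    return dict(collapsed)


# Toy data: 4 unambiguous AB/AB, 4 unambiguous ab/ab, and 2 double
# heterozygotes whose phase is ambiguous (AB/ab or Ab/aB).
toy = ([[("AB", "AB")]] * 4 + [[("ab", "ab")]] * 4
       + [[("AB", "ab"), ("Ab", "aB")]] * 2)
toy_freq = em_haplotype_frequencies(toy)
toy_collapsed = collapse_rare(toy_freq, threshold=0.01)
```

In the toy data the unambiguous individuals pull the ambiguous double heterozygotes toward the AB/ab resolution, which is the sense in which linkage disequilibrium helps resolve ambiguity.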
Subject(s)
DNA Copy Number Variations , Genetic Models , Humans , Haplotypes , Gene Frequency , Genotype
ABSTRACT
The B2 haplotype of the major histocompatibility complex (MHC) has been extensively reported to confer resistance to various avian diseases, but its peptide-binding motif is unknown, and the peptides it presents have rarely been identified. Here, we identified its peptide-binding motif (X-A/V/I/L/P/S/G-X-X-X-X-X-X-V/I/L) in vitro using Random Peptide Library-based MHC I LC-MS/MS analysis. To further clarify the structural basis of the motif, we determined the crystal structure of the BF2*02:01-PB2(552-560) complex at 1.9 Å resolution. We found that BF2*02:01 had a relatively wide antigen-binding groove, and the structural characterization of the pockets was consistent with the characterization of the peptide-binding motif. The broader peptide-binding motif of BF2*02:01 and the greater number of peptides it binds compared with BF2*04:01 might explain the potential H9N2 resistance of B2 chickens. Afterward, we explored the H9N2 avian influenza virus (AIV)-induced cellular immune response in B2 haplotype chickens in vivo. We found that the ratio of CD8+ T cells and the kinetic expression of cytotoxicity genes including Granzyme K, interferon-γ, NK lysin, and poly-(ADP-ribose) polymerase in peripheral blood mononuclear cells were significantly increased in defending against H9N2 AIV infection. In particular, we selected 425 candidate epitopes based on the peptide-binding motif and further identified four CD8+ T-cell epitopes on H9N2 AIV, namely NS1(98-106), PB2(552-560), NP(182-190), and NP(455-463), via ELISpot interferon-γ detection after stimulating memory lymphocytes with peptides. More importantly, these epitopes were found to be conserved in H7N9 AIV and H9N2 AIV. These findings provide direction for developing effective T cell epitope vaccines using well-conserved internal viral antigens in chickens.
Subject(s)
Chickens , T-Lymphocyte Epitopes , Influenza A Virus H9N2 Subtype , Avian Influenza , Influenza A Virus H9N2 Subtype/immunology , Animals , T-Lymphocyte Epitopes/immunology , Avian Influenza/immunology , Avian Influenza/virology , CD8-Positive T-Lymphocytes/immunology , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class I/metabolism
ABSTRACT
Rosa roxburghii and Rosa sterilis, two species belonging to the Rosaceae family, are widespread in the southwest of China. These species have gained recognition for the remarkable abundance of ascorbate in their fresh fruits, making them an ideal vitamin C resource. In this study, we generated two high-quality chromosome-scale genome assemblies for R. roxburghii and R. sterilis, with genome sizes of 504 and 981.2 Mb, respectively. Notably, we present a haplotype-resolved, chromosome-scale assembly for diploid R. sterilis. Our results indicated that R. sterilis originated from the hybridization of R. roxburghii and R. longicuspis. Genome analysis revealed the absence of recent whole-genome duplications in both species and identified a series of duplicated genes that possibly contribute to the accumulation of flavonoids. We identified two genes in the ascorbate synthesis pathway, GGP and GalLDH, that show signs of positive selection, along with high expression levels of GDP-d-mannose 3', 5'-epimerase (GME) and GDP-l-galactose phosphorylase (GGP) during fruit development. Furthermore, through co-expression network analysis, we identified key hub genes (MYB5 and bZIP) that likely regulate genes in the ascorbate synthesis pathway, promoting ascorbate biosynthesis. Additionally, we observed the expansion of terpene synthase genes in these two species and tissue expression patterns, suggesting their involvement in terpenoid biosynthesis. Our research provides valuable insights into genome evolution and the molecular basis of the high concentration of ascorbate in these two Rosa species.
Subject(s)
Rosa , Rosa/genetics , Rosa/metabolism , Ascorbic Acid/metabolism , Plant Genes , Chromosomes , Molecular Evolution
ABSTRACT
Several dwarf and semi-dwarf genes have been identified in barley. However, only a limited number have been effectively utilized in breeding programs to cultivate lodging-resistant varieties. This is due to the common association of dwarf and semi-dwarf traits with negative effects on malt quality. In this study, we employed gene editing to generate three new haplotypes of the sdw1/denso candidate gene gibberellin (GA) 20-oxidase2 (GA20ox2). These haplotypes induced a dwarfing phenotype, enhanced yield potential, and promoted seed dormancy, thereby reducing pre-harvest sprouting. Moreover, β-amylase activity in the grains of the mutant lines was significantly increased, which is beneficial for malt quality. The haplotype analysis revealed significant genetic divergence of this gene during barley domestication and selection. A novel allele (sdw1.ZU9), containing a 96-bp fragment in the promoter region of HvGA20ox2, was discovered and primarily observed in East Asian and Russian barley varieties. The 96-bp fragment was associated with lower gene expression, leading to lower plant height but higher germination rate. In conclusion, HvGA20ox2 can be potentially used to develop semi-dwarf barley cultivars with high yield and improved malt quality.
ABSTRACT
The Phyllanthaceae family comprises a diverse range of plants with medicinal, edible, and ornamental value that are extensively cultivated worldwide. Polyploid species commonly occur in Phyllanthaceae. Owing to their rather complex genomes and evolutionary histories, their speciation processes remain poorly studied. In this study, we generated chromosome-scale haplotype-resolved genomes of two octoploid species (Phyllanthus emblica and Sauropus spatulifolius) in the Phyllanthaceae family. Combined with our previously reported one tetraploid (Sauropus androgynus) and one diploid species (Phyllanthus cochinchinensis) from the same family, we explored their speciation history. The three polyploid species were all identified as allopolyploids with subgenome A/B. Each of their two distinct subgenome groups from various species was found to independently share a common diploid ancestor (Ancestor-AA and Ancestor-BB). Via different evolutionary routes, comprising various scenarios of bifurcating divergence, allopolyploidization (hybrid polyploidization), and autopolyploidization, they finally evolved to the current tetraploid S. androgynus, and octoploid S. spatulifolius and P. emblica, respectively. We further discuss the variations in copy number of alleles and the potential impacts within the two octoploids. In addition, we investigated the fluctuation of metabolites with medicinal value and identified the key factor in their biosynthesis in the octoploid species. Our study reconstructed the evolutionary history of these Phyllanthaceae species, highlighting the critical roles of polyploidization and hybridization in their speciation processes. The high-quality genomes of the two octoploid species provide valuable genomic resources for further research on evolution and functional genomics.
Subject(s)
Plant Genome , Haplotypes , Genetic Hybridization , Polyploidy , Plant Genome/genetics , Haplotypes/genetics , Phylogeny , Genetic Speciation , Molecular Evolution
ABSTRACT
Phanera championii is a medicinal liana plant that has successfully adapted to hostile karst habitats. Despite extensive research on its medicinal components and pharmacological effects, the molecular mechanisms underlying the biosynthesis of critical flavonoids and its adaptation to karst habitats remain elusive. In this study, we performed high-coverage PacBio and Hi-C sequencing of P. championii, which revealed its high heterozygosity and phased the genome into two haplotypes: Hap1 (384.60 Mb) and Hap2 (383.70 Mb), encompassing a total of 58 612 annotated genes. Comparative genomic analysis revealed that P. championii experienced two whole-genome duplications (WGDs), with approximately 59.59% of genes originating from WGD events, thereby providing a valuable genetic resource for P. championii. Moreover, we identified a total of 112 genes that were strongly positively selected. Additionally, we detected about 81.60 Mb of structural variations between the two haplotypes. The allele-specific expression patterns suggested that the dominant effect of P. championii was the elimination of deleterious mutations and the promotion of beneficial mutations to enhance fitness. Moreover, our transcriptome and metabolome analysis revealed that alleles in different tissues or different haplotypes collectively regulate the synthesis of flavonoid metabolites. In summary, our comprehensive study highlights the significance of genomic and morphological adaptation in the successful adaptation of P. championii to karst habitats. The high-quality phased genomes obtained in this study serve as invaluable genomic resources for various applications, including germplasm conservation, breeding, evolutionary studies, and elucidation of pathways governing key biological traits of P. championii.
Subject(s)
Plant Genome , Genomics , Haplotypes , DNA Sequence Analysis , Plant Genome/genetics , Flavonoids/genetics
ABSTRACT
Identifying soft selective sweeps using genomic data is a challenging yet crucial task in population genetics. In this study, we present HaploSweep, a novel method for detecting and categorizing soft and hard selective sweeps based on haplotype structure. Through simulations spanning a broad range of selection intensities, softness levels, and demographic histories, we demonstrate that HaploSweep outperforms iHS, nSL, and H12 in detecting soft sweeps. HaploSweep achieves high classification accuracy (0.9247 for CHB, 0.9484 for CEU, and 0.9829 for YRI) when applied to simulations in line with the human Out-of-Africa demographic model. We also observe that the classification accuracy remains consistently robust across different demographic models. Additionally, we introduce a refined method to accurately distinguish soft shoulders adjacent to hard sweeps from soft sweeps. Application of HaploSweep to genomic data of CHB, CEU, and YRI populations from the 1000 Genomes Project has led to the discovery of several new genes that bear strong evidence of population-specific soft sweeps (HRNR, AMBRA1, CBFA2T2, DYNC2H1, RANBP2, etc.), with prevalent associations to immune functions and metabolic processes. The validated performance of HaploSweep, demonstrated through both simulated and real data, underscores its potential as a valuable tool for detecting and comprehending the role of soft sweeps in adaptive evolution.
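For readers unfamiliar with the baseline statistics named above, the following Python sketch computes the haplotype homozygosity statistics H1, H12, and H2/H1 for a single window, following their usual definitions in the population-genetics literature. It does not implement HaploSweep itself, whose algorithm is not described in this abstract; the function name and the representation of haplotypes as strings are assumptions for illustration.

```python
from collections import Counter


def h_statistics(haplotypes):
    """Compute H1, H12 and H2/H1 for one genomic window.

    haplotypes: list of hashable haplotype labels (e.g. allele strings),
    one per sampled chromosome. Hypothetical input format.
    """
    n = len(haplotypes)
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0
    h1 = sum(p * p for p in freqs)   # expected haplotype homozygosity
    h12 = h1 + 2.0 * p1 * p2         # pools the two most common haplotypes
    h2 = h1 - p1 * p1                # homozygosity without the top haplotype
    return {"H1": h1, "H12": h12, "H2/H1": h2 / h1}


# One dominant haplotype (hard-sweep-like) versus two common haplotypes
# (soft-sweep-like) in a sample of ten chromosomes.
hard_like = h_statistics(["AAAA"] * 9 + ["CCCC"])
soft_like = h_statistics(["AAAA"] * 5 + ["GGGG"] * 4 + ["CCCC"])
```

Both toy windows give a high H12, but H2/H1 is much larger when two haplotypes are common, which is the signal conventionally used to separate soft from hard sweeps.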
Subject(s)
Population Genetics , Haplotypes , Genetic Selection , Humans , Population Genetics/methods , Genetic Models , Computer Simulation , Human Genome , Software
ABSTRACT
With the widespread clinical adoption of noninvasive screening for fetal chromosomal aneuploidies based on cell-free DNA analysis from maternal plasma, more researchers are turning their attention to noninvasive prenatal assessment for single-gene disorders. The development of a spectrum of approaches to analyze cell-free DNA in maternal circulation, including relative mutation dosage, relative haplotype dosage, and size-based methods, has expanded the scope of noninvasive prenatal testing to sex-linked and autosomal recessive disorders. Cell-free fetal DNA analysis for several of the more prevalent single-gene disorders has recently been introduced into clinical service. This article reviews the analytical approaches currently available and discusses the extent of the clinical implementation of noninvasive prenatal testing for single-gene disorders.
Subject(s)
Cell-Free Nucleic Acids , Aneuploidy , DNA/genetics , Female , Fetus , Humans , Pregnancy , Prenatal Diagnosis/methods
ABSTRACT
Haplotypes can be estimated from unphased genotype data via statistical methods. When parent-offspring trios are available for inferring the true phase from Mendelian inheritance rules, the accuracy of statistical phasing is usually measured by the switch error rate, which is the proportion of pairs of consecutive heterozygotes that are incorrectly phased. We present a method for estimating the genotype error rate from parent-offspring trios and a method for estimating the bias that occurs in the observed switch error rate as a result of genotype error. We apply these methods to 485,301 genotyped UK Biobank samples that include 898 White British trios and to 38,387 sequenced TOPMed samples that include 217 African Caribbean trios and 669 European American trios. We show that genotype error inflates the observed switch error rate and that the relative bias increases with sample size. For the UK Biobank White British trios, the observed switch error rate in the trio offspring is 2.4 times larger than the estimated true switch error rate (1.4 × 10^-3 vs 5.8 × 10^-4). We propose an alternate definition of phase error that counts two consecutive switch errors as a single error because back-to-back switch errors arise when a single heterozygote is incorrectly phased with respect to the surrounding heterozygotes. With this definition, we estimate that the average distance between phase errors is 64 megabases in the UK Biobank White British individuals.
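The two error definitions described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: each haplotype is encoded as a list of 0/1 values giving the allele carried on the first haplotype at successive heterozygous sites, and the function names are hypothetical.

```python
def switch_flags(true_hap, inferred_hap):
    """Flag each pair of consecutive heterozygotes that is phased wrongly.

    A pair is a switch error when the inferred haplotype agrees with the
    truth at one site of the pair and disagrees at the other.
    """
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    return [agree[k] != agree[k + 1] for k in range(len(agree) - 1)]


def switch_error_rate(true_hap, inferred_hap):
    """Proportion of consecutive heterozygote pairs that are switch errors."""
    flags = switch_flags(true_hap, inferred_hap)
    return sum(flags) / len(flags)


def merged_phase_errors(true_hap, inferred_hap):
    """Count phase errors, treating two back-to-back switches as one."""
    flags = switch_flags(true_hap, inferred_hap)
    count = 0
    k = 0
    while k < len(flags):
        if flags[k]:
            count += 1
            if k + 1 < len(flags) and flags[k + 1]:
                k += 1  # skip the partner of a back-to-back pair
        k += 1
    return count


# Eight heterozygous sites: one isolated flipped site (index 2), which
# produces two back-to-back switches, and one true switch before index 5.
truth = [0, 0, 0, 0, 0, 0, 0, 0]
inferred = [0, 0, 1, 0, 0, 1, 1, 1]
rate = switch_error_rate(truth, inferred)
merged = merged_phase_errors(truth, inferred)
```

In this example the classical definition counts three switch errors among seven pairs, whereas the merged definition counts two phase errors, because the single mis-phased heterozygote contributes only once.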
Subject(s)
Heredity , Single Nucleotide Polymorphism , Bias , Genotype , Haplotypes/genetics , Humans , Single Nucleotide Polymorphism/genetics
ABSTRACT
Haplotype networks are graphs used to represent evolutionary relationships between a set of taxa and are characterized by intuitiveness in analyzing genealogical relationships of closely related genomes. We here propose a novel algorithm termed McAN that considers mutation spectrum history (mutations in an ancestral haplotype should be contained in its descendant haplotypes), node size (corresponding to sample count for a given node) and sampling time when constructing a haplotype network. We show that McAN is two orders of magnitude faster than state-of-the-art algorithms without losing accuracy, making it suitable for analysis of a large number of sequences. Based on our algorithm, we developed an online web server and offline tool for haplotype network construction, community lineage determination, and interactive network visualization. We demonstrate that McAN is highly suitable for analyzing and visualizing massive genomic data and is helpful to enhance the understanding of genome evolution. Availability: Source code is written in C/C++ and available at https://github.com/Theory-Lun/McAN and https://ngdc.cncb.ac.cn/biocode/tools/BT007301 under the MIT license. Web server is available at https://ngdc.cncb.ac.cn/bit/hapnet/. SARS-CoV-2 dataset are available at https://ngdc.cncb.ac.cn/ncov/. Contact: songshh@big.ac.cn (Song S), zhaowm@big.ac.cn (Zhao W), baoym@big.ac.cn (Bao Y), zhangzhang@big.ac.cn (Zhang Z), ybxue@big.ac.cn (Xue Y).
Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Haplotypes , SARS-CoV-2/genetics , COVID-19/genetics , Algorithms , Genomics , Software
ABSTRACT
Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed model (LMM) are the current standards for SNP associations, which are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, there have been only a few theoretical approaches proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.
Subject(s)
Genetic Models , Single Nucleotide Polymorphism , Haplotypes , Bayes Theorem , Phenotype , Genome-Wide Association Study
ABSTRACT
INTRODUCTION: Joint linkage and association (JLA) analysis combines two disease gene mapping strategies: linkage information contained in families and association information contained in populations. Such a JLA analysis can increase mapping power, especially when the evidence for both linkage and association is low to moderate. Similarly, an association analysis based on haplotypes instead of single markers can increase mapping power when the association pattern is complex. METHODS: In this paper, we present an extension to the GENEHUNTER-MODSCORE software package that enables a JLA analysis based on haplotypes and uses information from arbitrary pedigree types and unrelated individuals. Our new JLA method is an extension of the MOD score approach for linkage analysis, which allows the estimation of trait-model and linkage disequilibrium (LD) parameters, i.e., penetrance, disease-allele frequency, and haplotype frequencies. LD is modeled between alleles at a single diallelic disease locus and up to three diallelic test markers. Linkage information is contributed by additional multi-allelic flanking markers. We investigated the statistical properties of our JLA implementation using extensive simulations, and we compared our approach to another commonly used single-marker JLA test. To demonstrate the applicability of our new method in practice, we analyzed pedigree data from the German National Case Collection for Familial Pancreatic Cancer (FaPaCa). RESULTS: Based on the simulated data, we demonstrated the validity of our JLA-MOD score analysis implementation and identified scenarios in which haplotype-based tests outperformed the single-marker test. The estimated trait-model and LD parameters were in good accordance with the simulated values. Our method outperformed another commonly used JLA single-marker test when the LD pattern was complex. 
The exploratory analysis of the FaPaCa families led to the identification of a promising genetic region on chromosome 22q13.33, which can serve as a starting point for future mutation analysis and molecular research in pancreatic cancer. CONCLUSION: Our newly proposed JLA-MOD score method proves to be a valuable gene mapping and characterization tool, especially when either linkage or association information alone provides insufficient power to identify the disease-causing genetic variants.