Results 1 - 20 of 8,528
1.
Cell ; 179(3): 589-603, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31607513

ABSTRACT

Genome-wide association studies (GWASs) have focused primarily on populations of European descent, but it is essential that diverse populations become better represented. Increasing diversity among study participants will advance our understanding of genetic architecture in all populations and ensure that genetic research is broadly applicable. To facilitate and promote research in multi-ancestry and admixed cohorts, we outline key methodological considerations and highlight opportunities, challenges, solutions, and areas in need of development. Despite the perception that analyzing genetic data from diverse populations is difficult, it is scientifically and ethically imperative, and there is an expanding analytical toolbox to do it well.


Subject(s)
Genome-Wide Association Study/methods , Genotyping Techniques/methods , Human Genetics/methods , Data Accuracy , Genetic Variation , Genetics, Population/methods , Genetics, Population/standards , Genome-Wide Association Study/standards , Genotyping Techniques/standards , Human Genetics/standards , Humans , Pedigree
2.
Nat Immunol ; 22(6): 781-793, 2021 06.
Article in English | MEDLINE | ID: mdl-34031617

ABSTRACT

Multimodal T cell profiling can enable more precise characterization of elusive cell states underlying disease. Here, we integrated single-cell RNA and surface protein data from 500,089 memory T cells to define 31 cell states from 259 individuals in a Peruvian tuberculosis (TB) progression cohort. At immune steady state >4 years after infection and disease resolution, we found that, after accounting for significant effects of age, sex, season and genetic ancestry on T cell composition, a polyfunctional type 17 helper T (TH17) cell-like effector state was reduced in abundance and function in individuals who previously progressed from Mycobacterium tuberculosis (M.tb) infection to active TB disease. These cells are capable of responding to M.tb peptides. Deconvoluting this state (uniquely identifiable with multimodal analysis) from public data demonstrated that its depletion may precede and persist beyond active disease. Our study demonstrates the power of integrative multimodal single-cell profiling to define cell states relevant to disease and other traits.


Subject(s)
Immunologic Memory , Mycobacterium tuberculosis/immunology , Th17 Cells/immunology , Tuberculosis, Pulmonary/immunology , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Case-Control Studies , Child , Disease Progression , Female , Follow-Up Studies , Genetic Predisposition to Disease , Genotyping Techniques , Humans , Male , Middle Aged , Mycobacterium tuberculosis/isolation & purification , Peru , RNA-Seq , Sex Factors , Single-Cell Analysis , Socioeconomic Factors , Tuberculosis, Pulmonary/blood , Tuberculosis, Pulmonary/genetics , Tuberculosis, Pulmonary/microbiology , Young Adult
3.
Cell ; 175(3): 848-858.e6, 2018 10 18.
Article in English | MEDLINE | ID: mdl-30318150

ABSTRACT

In familial searching in forensic genetics, a query DNA profile is tested against a database to determine whether it represents a relative of a database entrant. We examine the potential for using linkage disequilibrium to identify pairs of profiles as belonging to relatives when the query and database rely on nonoverlapping genetic markers. Considering data on individuals genotyped with both microsatellites used in forensic applications and genome-wide SNPs, we find that ∼30%-32% of parent-offspring pairs and ∼35%-36% of sib pairs can be identified from the SNPs of one member of the pair and the microsatellites of the other. The method suggests the possibility of performing familial searches of microsatellite databases using query SNP profiles, or vice versa. It also reveals that privacy concerns arising from computations across multiple databases that share no genetic markers in common entail risks, not only for database entrants, but for their close relatives as well.


Subject(s)
Family , Forensic Genetics/methods , Genetics, Population/methods , Genotyping Techniques/methods , Polymorphism, Single Nucleotide , Female , Humans , Linkage Disequilibrium , Male , Microsatellite Repeats , Models, Genetic , Models, Statistical , Pedigree
4.
Nat Rev Genet ; 25(7): 460-475, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38366034

ABSTRACT

Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
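As a minimal illustration of the definition above (a 1-6-bp motif repeated in tandem), the following Python sketch counts the longest uninterrupted run of a motif in a sequence. The function name and the example sequence are hypothetical. The genotyping tools compared in the review work from sequencing reads and must deal with mapping and expansion-length inference, which this toy ignores.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    if not 1 <= len(motif) <= 6:
        raise ValueError("STR motifs are 1-6 bp long")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Extend the run while the next `step` bases match the motif exactly.
        while sequence.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


# Hypothetical sequence: five CAG units (the motif expanded in Huntington disease)
# between made-up flanking bases.
example = "TTG" + "CAG" * 5 + "CCA"
print(longest_tandem_run(example, "CAG"))
```

For the example sequence the function prints 5. Exact matching is a deliberate simplification: interrupted or imperfect repeats would need a scoring scheme rather than `str.startswith`.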


Subject(s)
Genome, Human , Microsatellite Repeats , Humans , Microsatellite Repeats/genetics , Sequence Analysis, DNA/methods , Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Genotype
5.
Nature ; 629(8014): 1149-1157, 2024 May.
Article in English | MEDLINE | ID: mdl-38720070

ABSTRACT

In somatic tissue differentiation, chromatin accessibility changes govern priming and precursor commitment towards cellular fates [1-3]. Therefore, somatic mutations are likely to alter chromatin accessibility patterns, as they disrupt differentiation topologies leading to abnormal clonal outgrowth. However, defining the impact of somatic mutations on the epigenome in human samples is challenging due to admixed mutated and wild-type cells. Here, to chart how somatic mutations disrupt epigenetic landscapes in human clonal outgrowths, we developed genotyping of targeted loci with single-cell chromatin accessibility (GoT-ChA). This high-throughput platform links genotypes to chromatin accessibility at single-cell resolution across thousands of cells within a single assay. We applied GoT-ChA to CD34+ cells from patients with myeloproliferative neoplasms with JAK2V617F-mutated haematopoiesis. Differential accessibility analysis between wild-type and JAK2V617F-mutant progenitors revealed both cell-intrinsic and cell-state-specific shifts within mutant haematopoietic precursors, including cell-intrinsic pro-inflammatory signatures in haematopoietic stem cells, and a distinct profibrotic inflammatory chromatin landscape in megakaryocytic progenitors. Integration of mitochondrial genome profiling and cell-surface protein expression measurement allowed expansion of genotyping onto DOGMA-seq through imputation, enabling single-cell capture of genotypes, chromatin accessibility, RNA expression and cell-surface protein expression. Collectively, we show that the JAK2V617F mutation leads to epigenetic rewiring in a cell-intrinsic and cell type-specific manner, influencing inflammation states and differentiation trajectories. We envision that GoT-ChA will empower broad future investigations of the critical link between somatic mutations and epigenetic alterations across clonal populations in malignant and non-malignant contexts.


Subject(s)
Chromatin , Epigenesis, Genetic , Genotype , Mutation , Single-Cell Analysis , Animals , Female , Humans , Male , Mice , Antigens, CD34/metabolism , Cell Differentiation/genetics , Chromatin/chemistry , Chromatin/genetics , Chromatin/metabolism , Epigenesis, Genetic/genetics , Epigenome/genetics , Genome, Mitochondrial/genetics , Genotyping Techniques , Hematopoiesis/genetics , Hematopoietic Stem Cells/metabolism , Hematopoietic Stem Cells/pathology , Inflammation/genetics , Inflammation/pathology , Janus Kinase 2/genetics , Janus Kinase 2/metabolism , Megakaryocytes/metabolism , Megakaryocytes/pathology , Membrane Proteins/genetics , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/metabolism , Myeloproliferative Disorders/pathology , RNA/genetics , Clone Cells/metabolism
6.
Nature ; 622(7984): 784-793, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821707

ABSTRACT

The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.


Subject(s)
Exome Sequencing , Genome, Human , Genotype , Hispanic or Latino , Adult , Humans , Africa/ethnology , Americas/ethnology , Europe/ethnology , Gene Frequency/genetics , Genetics, Population , Genome, Human/genetics , Genotyping Techniques , Hispanic or Latino/genetics , Homozygote , Loss of Function Mutation/genetics , Mexico , Prospective Studies
7.
Nature ; 617(7962): 764-768, 2023 05.
Article in English | MEDLINE | ID: mdl-37198478

ABSTRACT

Critical illness in COVID-19 is an extreme and clinically homogeneous disease phenotype that we have previously shown [1] to be highly efficient for discovery of genetic associations [2]. Despite the advanced stage of illness at presentation, we have shown that host genetics in patients who are critically ill with COVID-19 can identify immunomodulatory therapies with strong beneficial effects in this group [3]. Here we analyse 24,202 cases of COVID-19 with critical illness comprising a combination of microarray genotype and whole-genome sequencing data from cases of critical illness in the international GenOMICC (11,440 cases) study, combined with other studies recruiting hospitalized patients with a strong focus on severe and critical disease: ISARIC4C (676 cases) and the SCOURGE consortium (5,934 cases). To put these results in the context of existing work, we conduct a meta-analysis of the new GenOMICC genome-wide association study (GWAS) results with previously published data. We find 49 genome-wide significant associations, of which 16 have not been reported previously. To investigate the therapeutic implications of these findings, we infer the structural consequences of protein-coding variants, and combine our GWAS results with gene expression data using a monocyte transcriptome-wide association study (TWAS) model, as well as gene and protein expression using Mendelian randomization. We identify potentially druggable targets in multiple systems, including inflammatory signalling (JAK1), monocyte-macrophage activation and endothelial permeability (PDE4A), immunometabolism (SLC2A5 and AK5), and host factors required for viral entry and replication (TMPRSS2 and RAB2A).


Subject(s)
COVID-19 , Critical Illness , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study , Humans , COVID-19/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genotype , Genotyping Techniques , Monocytes/metabolism , Phenotype , rab GTP-Binding Proteins/genetics , Transcriptome , Whole Genome Sequencing
8.
Genome Res ; 34(7): 1008-1026, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39013593

ABSTRACT

Microsatellites are highly mutable sequences that can serve as markers for relationships among individuals or cells within a population. The accuracy and resolution of reconstructing these relationships depends on the fidelity of microsatellite profiling and the number of microsatellites profiled. However, current methods for targeted profiling of microsatellites incur significant "stutter" artifacts that interfere with accurate genotyping, and sequencing costs preclude whole-genome microsatellite profiling of a large number of samples. We developed a novel method for accurate and cost-effective targeted profiling of a panel of more than 150,000 microsatellites per sample, along with a computational tool for designing large-scale microsatellite panels. Our method addresses the greatest challenge for microsatellite profiling ("stutter" artifacts) with a low-temperature hybridization capture that significantly reduces these artifacts. We also developed a computational tool for accurate genotyping of the resulting microsatellite sequencing data that uses an ensemble approach integrating three microsatellite genotyping tools, which we optimize by analysis of de novo microsatellite mutations in human trios. Altogether, our suite of experimental and computational tools enables high-fidelity, large-scale profiling of microsatellites, which may find utility in diverse applications such as lineage tracing, population genetics, ecology, and forensics.


Subject(s)
Microsatellite Repeats , Humans , Genotyping Techniques/methods , Genotype , Sequence Analysis, DNA/methods
9.
Immunity ; 48(1): 59-74.e5, 2018 01 16.
Article in English | MEDLINE | ID: mdl-29343440

ABSTRACT

Toll-like receptors (TLRs) sense pathogen-associated molecular patterns to activate the production of inflammatory mediators. TLR4 recognizes lipopolysaccharide (LPS) and drives the secretion of inflammatory cytokines, often contributing to sepsis. We report that transient receptor potential melastatin-like 7 (TRPM7), a non-selective but Ca2+-conducting ion channel, mediates the cytosolic Ca2+ elevations essential for LPS-induced macrophage activation. LPS triggered TRPM7-dependent Ca2+ elevations essential for TLR4 endocytosis and the subsequent activation of the transcription factor IRF3. In a parallel pathway, the Ca2+ signaling initiated by TRPM7 was also essential for the nuclear translocation of NFκB. Consequently, TRPM7-deficient macrophages exhibited major deficits in the LPS-induced transcriptional programs in that they failed to produce IL-1β and other key pro-inflammatory cytokines. In accord with these defects, mice with myeloid-specific deletion of Trpm7 are protected from LPS-induced peritonitis. Our study highlights the importance of Ca2+ signaling in macrophage activation and identifies the ion channel TRPM7 as a central component of TLR4 signaling.


Subject(s)
Calcium/metabolism , Macrophage Activation/drug effects , TRPM Cation Channels/metabolism , Toll-Like Receptor 4/metabolism , Animals , Cell Culture Techniques , Endocytosis/drug effects , Female , Flow Cytometry , Fluorescent Antibody Technique , Gene Expression Regulation , Genotyping Techniques , Immunoblotting , Interferon Regulatory Factor-3/metabolism , Lipopolysaccharides/pharmacology , Macrophages/metabolism , Male , Mice , NF-kappa B/metabolism , Patch-Clamp Techniques , Real-Time Polymerase Chain Reaction , Signal Transduction/drug effects , Signal Transduction/physiology , TRPM Cation Channels/genetics
10.
Genome Res ; 33(5): 787-797, 2023 May.
Article in English | MEDLINE | ID: mdl-37127332

ABSTRACT

High-throughput genotyping enables the large-scale analysis of genetic diversity in population genomics and genome-wide association studies that combine the genotypic and phenotypic characterization of large collections of accessions. Sequencing-based approaches for genotyping are progressively replacing traditional genotyping methods because of the lower ascertainment bias. However, genome-wide genotyping based on sequencing becomes expensive in species with large genomes and a high proportion of repetitive DNA. Here we describe the use of CRISPR-Cas9 technology to deplete repetitive elements in the 3.76-Gb genome of lentil (Lens culinaris), 84% of which consists of repeats, thus concentrating the sequencing data on coding and regulatory regions (single-copy regions). We designed a custom set of 566,766 gRNAs targeting 2.9 Gbp of repeats and excluding repetitive regions overlapping annotated genes and putative regulatory elements based on ATAC-seq data. The novel depletion method removed ∼40% of reads mapping to repeats, increasing those mapping to single-copy regions by ∼2.6-fold. When analyzing 25 million fragments, this repeat-to-single-copy shift in the sequencing data increased the number of genotyped bases by ∼10-fold compared to nondepleted libraries. Under the same conditions, we were also able to identify ∼12-fold more genetic variants in the single-copy regions and increased the genotyping accuracy by rescuing thousands of heterozygous variants that otherwise would be missed because of low coverage. The method performed similarly regardless of the multiplexing level, type of library or genotypes, including different cultivars and a closely related species (L. orientalis). Our results showed that CRISPR-Cas9-driven repeat depletion focuses sequencing data on single-copy regions, thus improving high-density and genome-wide genotyping in large and repetitive genomes.


Subject(s)
CRISPR-Cas Systems , Genome-Wide Association Study , Genotype , Genome, Plant , Genotyping Techniques , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
11.
Nature ; 581(7809): 444-451, 2020 05.
Article in English | MEDLINE | ID: mdl-32461652

ABSTRACT

Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.


Subject(s)
Disease/genetics , Genetic Variation , Genetics, Medical/standards , Genetics, Population/standards , Genome, Human/genetics , Female , Genetic Testing , Genotyping Techniques , Humans , Male , Middle Aged , Mutation , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Reference Standards , Selection, Genetic , Whole Genome Sequencing
12.
Plant J ; 118(6): 2296-2317, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38459738

ABSTRACT

Next-generation sequencing (NGS) library construction often involves using restriction enzymes to decrease genome complexity, enabling versatile polymorphism detection in plants. However, plant leaves frequently contain impurities, such as polyphenols, necessitating DNA purification before enzymatic reactions. To overcome this problem, we developed a PCR-based method for expeditious NGS library preparation, offering flexibility in the number of detected polymorphisms. By substituting a segment of the simple sequence repeat sequence in the MIG-seq primer set (MIG-seq being a PCR method enabling library construction with low-quality DNA) with degenerate oligonucleotides, we introduced variability in detectable polymorphisms across various crops. This innovation, named degenerate oligonucleotide primer MIG-seq (dpMIG-seq), enabled a streamlined protocol for constructing dpMIG-seq libraries from unpurified DNA, which was implemented stably in several crop species, including fruit trees. Furthermore, dpMIG-seq facilitated efficient lineage selection in wheat and enabled linkage map construction and quantitative trait loci analysis in tomato, rice, and soybean without necessitating DNA concentration adjustments. These findings underscore the potential of the dpMIG-seq protocol for advancing genetic analyses across diverse plant species.


Subject(s)
Genotyping Techniques , High-Throughput Nucleotide Sequencing , Polymerase Chain Reaction , High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction/methods , Genotyping Techniques/methods , DNA Primers/genetics , Quantitative Trait Loci/genetics , Oryza/genetics , Triticum/genetics , Solanum lycopersicum/genetics , Chromosome Mapping , DNA, Plant/genetics , Glycine max/genetics , Gene Library , Polymorphism, Genetic , Crops, Agricultural/genetics , Genotype
13.
Bioinformatics ; 40(Suppl 2): ii11-ii19, 2024 09 01.
Article in English | MEDLINE | ID: mdl-39230689

ABSTRACT

MOTIVATION: Complex structural variants (SVs) are genomic rearrangements that involve multiple segments of DNA. They contribute to human diversity and have been shown to cause Mendelian disease. Nevertheless, our abilities to analyse complex SVs are very limited. As opposed to deletions and other canonical types of SVs, there are no established tools that have explicitly been designed for analysing complex SVs. RESULTS: Here, we describe a new computational approach that we specifically designed for genotyping complex SVs in short-read sequenced genomes. Given a variant description, our approach computes genotype-specific probability distributions for observing aligned read pairs with a wide range of properties. Subsequently, these distributions can be used to efficiently determine the most likely genotype for any set of aligned read pairs observed in a sequenced genome. In addition, we use these distributions to compute a genotyping difficulty for a given variant, which predicts the amount of data needed to achieve a reliable call. Careful evaluation confirms that our approach outperforms other genotypers by making reliable genotype predictions across both simulated and real data. On up to 7829 human genomes, we achieve high concordance with population-genetic assumptions and expected inheritance patterns. On simulated data, we show that precision correlates well with our prediction of genotyping difficulty. This together with low memory and time requirements makes our approach well-suited for application in biomedical studies involving small to very large numbers of short-read sequenced genomes. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/kehrlab/Complex-SV-Genotyping.
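The RESULTS section describes picking the most likely genotype from genotype-specific distributions over read-pair properties. A toy version of that idea can be sketched in Python as a maximum-likelihood choice. This is not the authors' implementation: the two read-pair categories, the probabilities, and the function names are invented for illustration, and the sketch assumes read pairs are independent.

```python
import math

# Hypothetical per-genotype probabilities of observing each read-pair category.
# Both the categories and the numbers are made up for illustration only.
READ_PAIR_MODEL = {
    "0/0": {"concordant": 0.98, "discordant": 0.02},
    "0/1": {"concordant": 0.60, "discordant": 0.40},
    "1/1": {"concordant": 0.05, "discordant": 0.95},
}


def genotype_log_likelihoods(observed):
    """Sum the log-probabilities of the observed categories under each genotype."""
    return {
        genotype: sum(math.log(dist[category]) for category in observed)
        for genotype, dist in READ_PAIR_MODEL.items()
    }


def most_likely_genotype(observed):
    """Return the genotype whose distribution best explains the observations."""
    scores = genotype_log_likelihoods(observed)
    return max(scores, key=scores.get)


# Six concordant and four discordant read pairs (assumed independent).
reads = ["concordant"] * 6 + ["discordant"] * 4
print(most_likely_genotype(reads))
```

With this made-up model the mixed evidence favours the heterozygous call, so the script prints `0/1`. Working in log space avoids numerical underflow when many read pairs are multiplied together.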


Subject(s)
Genome, Human , Genomic Structural Variation , Sequence Analysis, DNA , Software , Humans , Sequence Analysis, DNA/methods , Genotype , Genotyping Techniques/methods , Algorithms , High-Throughput Nucleotide Sequencing/methods , Genomics/methods
14.
Plant Physiol ; 196(1): 244-260, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-38743690

ABSTRACT

Veraison marks the transition from berry growth to berry ripening and is a crucial phenological stage in grapevine (Vitis vinifera): the berries become soft and begin to accumulate sugars, aromatic substances, and, in red cultivars, anthocyanins for pigmentation, while the organic acid levels begin to decrease. These changes determine the potential quality of wine. However, rising global temperatures lead to earlier flowering and ripening, which strongly influence wine quality. Here, we combined genotyping-by-sequencing with a bioinformatics pipeline on ∼150 F1 genotypes derived from a cross between the early ripening variety "Calardis Musqué" and the late-ripening variety "Villard Blanc". Starting from 20,410 haplotype-based markers, we generated a high-density genetic map and performed a quantitative trait locus analysis based on phenotypic datasets evaluated over 20 years. Through locus-specific marker enrichment and recombinant screening of ∼1,000 additional genotypes, we refined the originally postulated 5-Mb veraison locus, Ver1, on chromosome 16 to only 112 kb, allowing us to pinpoint the ethylene response factor VviERF027 (VCost.v3 gene ID: Vitvi16g00942, CRIBIv1 gene ID: VIT_16s0100g00400) as a veraison candidate gene. Furthermore, the early veraison allele could be traced back to a clonal "Pinot" variant first mentioned in the seventeenth century. "Pinot Precoce Noir" passed this allele via "Madeleine Royale" to the maternal grandparent "Bacchus Weiss" and, ultimately, to the maternal parent "Calardis Musqué". Our findings are crucial for ripening time control, thereby improving wine quality, and for breeding grapevines adjusted to climate change scenarios that have a major impact on agro-ecosystems in altering crop plant phenology.


Subject(s)
Chromosome Mapping , Fruit , Genotype , Quantitative Trait Loci , Vitis , Vitis/genetics , Vitis/growth & development , Chromosome Mapping/methods , Fruit/genetics , Fruit/growth & development , Quantitative Trait Loci/genetics , Genes, Plant/genetics , Phenotype , Haplotypes/genetics , Genotyping Techniques/methods , Plant Proteins/genetics , Plant Proteins/metabolism , Chromosomes, Plant/genetics , Genetic Markers
15.
PLoS Biol ; 20(1): e3001507, 2022 01.
Article in English | MEDLINE | ID: mdl-35041655

ABSTRACT

Genome editing can introduce designed mutations into a target genomic site. Recent research has revealed that it can also induce various unintended events such as structural variations, small indels, and substitutions at, and in some cases, away from the target site. These rearrangements may result in confounding phenotypes in biomedical research samples and cause a concern in clinical or agricultural applications. However, current genotyping methods do not allow a comprehensive analysis of diverse mutations for phasing and mosaic variant detection. Here, we developed a genotyping method with an on-target site analysis software named Determine Allele mutations and Judge Intended genotype by Nanopore sequencer (DAJIN) that can automatically identify and classify both intended and unintended diverse mutations, including point mutations, deletions, inversions, and cis double knock-in at single-nucleotide resolution. Our approach with DAJIN can handle approximately 100 samples under different editing conditions in a single run. With its high versatility, scalability, and convenience, DAJIN-assisted multiplex genotyping may become a new standard for validating genome editing outcomes.


Subject(s)
Gene Editing , Genotyping Techniques/methods , Software , Animals , Gene Knock-In Techniques , Genome , Genotype , INDEL Mutation , Machine Learning , Mice, Inbred C57BL , Mice, Inbred ICR , Mutation , Nanopore Sequencing , Sequence Analysis, DNA
16.
PLoS Comput Biol ; 20(9): e1012483, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39316624

ABSTRACT

Triploidy is very useful in both aquaculture and some cultivated plants as the induced sterility helps to enhance growth and product quality, as well as acting as a barrier against the contamination of wild populations by escapees. To use genetic information from triploids for academic or breeding purposes, an efficient and robust method to genotype triploids is needed. We developed such a method for genotype calling from SNP arrays, and we implemented it in the R package named GenoTriplo. Our method requires no prior information on cluster positions and remains unaffected by shifted luminescence signals. The method relies on starting the clustering algorithm with an initial higher number of groups than expected from the ploidy level of the samples, followed by merging groups that are too close to each other to be considered as distinct genotypes. Accurate classification of SNPs is achieved through multiple thresholds of quality controls. We compared the performance of GenoTriplo with that of fitPoly, the only published method for triploid SNP genotyping with freely accessible software. This was assessed by comparing the genotypes generated by both methods for a dataset of 1232 triploid rainbow trout genotyped for 38,033 SNPs. The two methods were consistent for 89% of the genotypes, but for 26% of the SNPs, they exhibited a discrepancy in the number of different genotypes identified. For these SNPs, GenoTriplo had >95% concordance with fitPoly when fitPoly genotyped better. On the contrary, when GenoTriplo genotyped better, fitPoly had less than 50% concordance with GenoTriplo. GenoTriplo was more robust with fewer genotyping errors. It is also efficient at identifying low-frequency genotypes in the sample set. Finally, we assessed parentage assignment based on GenoTriplo genotyping and observed significant differences in mismatch rates between the best and second-best couples, indicating high confidence in the results. GenoTriplo could also be used to genotype diploids as well as individuals with higher ploidy level by adjusting a few input parameters.
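The over-cluster-then-merge strategy described in this abstract can be sketched in a few lines of Python. This is a toy under stated assumptions, not GenoTriplo itself (which is an R package): the abstract does not name the clustering algorithm, so plain 1-D k-means stands in for it, and the signal values, starting centers, and the `min_gap` threshold are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain Lloyd iterations on 1-D data; clusters that end up empty are dropped."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) for g in groups if g]
    return sorted(centers)


def merge_close(centers, min_gap):
    """Merge sorted neighbouring centers that lie less than `min_gap` apart."""
    merged = [[centers[0]]]
    for c in centers[1:]:
        if c - merged[-1][-1] < min_gap:
            merged[-1].append(c)   # too close to be a distinct genotype
        else:
            merged.append([c])
    # Each merged group is summarised by the unweighted mean of its centers.
    return [sum(g) / len(g) for g in merged]


# Hypothetical per-sample signal contrasts for one triploid SNP; a biallelic
# triploid marker has four possible genotype classes (AAA, AAB, ABB, BBB).
values = [-0.96, -0.94, -0.86, -0.84, -0.31, -0.29, 0.29, 0.31, 0.89, 0.91]
# Deliberately start with six groups, more than the four expected.
initial = [-0.95, -0.85, -0.30, 0.30, 0.60, 0.90]

centers = kmeans_1d(values, initial)
genotype_centers = merge_close(centers, min_gap=0.2)
print(len(centers), len(genotype_centers))
```

On this made-up data k-means keeps five non-empty groups, and merging the two that sit only 0.1 apart leaves four, one per expected genotype class. Starting with too many groups means no prior cluster positions are needed, at the cost of having to choose a merge threshold.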


Subject(s)
Algorithms , Genotype , Polymorphism, Single Nucleotide , Software , Triploidy , Polymorphism, Single Nucleotide/genetics , Animals , Oncorhynchus mykiss/genetics , Genotyping Techniques/methods , Computational Biology/methods
17.
PLoS Genet ; 18(1): e1009604, 2022 01.
Article in English | MEDLINE | ID: mdl-35007277

ABSTRACT

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) were used to study Plasmodium genetic diversity, population structures and genomic signatures of selection and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).


Subject(s)
Genotyping Techniques/methods , Malaria/parasitology , Microsatellite Repeats , Plasmodium falciparum/genetics , Plasmodium vivax/genetics , Databases, Genetic , Genetics, Population , Humans , Logistic Models , Polymorphism, Single Nucleotide , Species Specificity , Whole Genome Sequencing
18.
Genes Dev ; 31(15): 1601-1614, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28882854

ABSTRACT

In eukaryotes, transcriptionally inactive loci are enriched within highly condensed heterochromatin. In plants, as in mammals, the DNA of heterochromatin is densely methylated and wrapped by histones displaying a characteristic subset of post-translational modifications. Growing evidence indicates that these chromatin modifications are not sufficient for silencing. Instead, they are prerequisites for further assembly of higher-order chromatin structures that are refractory to transcription but not fully understood. We show that silencing of transposons in the pericentromeric heterochromatin of Arabidopsis thaliana requires SMC4, a core subunit of condensins I and II, acting in conjunction with CG methylation by MET1 (DNA METHYLTRANSFERASE 1), CHG methylation by CMT3 (CHROMOMETHYLASE 3), the chromatin remodeler DDM1 (DECREASE IN DNA METHYLATION 1), and histone modifications, including histone H3 Lys 27 monomethylation (H3K27me1), imparted by ATXR5 and ATXR6. SMC4/condensin also acts within the mostly euchromatic chromosome arms to suppress conditionally expressed genes involved in flowering or DNA repair, including the DNA glycosylase ROS1, which facilitates DNA demethylation. Collectively, our genome-wide analyses implicate condensin in the suppression of hundreds of loci, acting in both DNA methylation-dependent and methylation-independent pathways.


Subject(s)
Adenosine Triphosphatases/genetics , Arabidopsis Proteins/genetics , Arabidopsis/genetics , Centrosome/metabolism , DNA Transposable Elements/genetics , DNA-Binding Proteins/genetics , Gene Expression Regulation, Plant , Multiprotein Complexes/genetics , Chromatin/metabolism , DNA Methylation/genetics , DNA Repair/genetics , Gene Silencing/physiology , Genome-Wide Association Study , Genotyping Techniques , Heterochromatin/metabolism , Histones/metabolism , Methyltransferases/genetics , Mutation/genetics , Sequence Analysis, DNA , Sequence Analysis, RNA
19.
BMC Genomics ; 25(1): 818, 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39210290

ABSTRACT

BACKGROUND: Cannabis sativa is seeing a global resurgence as a food, fiber and medicinal crop for the industrial hemp and medicinal Cannabis industries, respectively. However, a widespread moratorium on the use and research of C. sativa throughout most of the 20th century has seen the development of improved cultivars for specific end uses lag behind that of conventional crops. While C. sativa research and development has seen significant investments in the recent past, resulting in a suite of publicly available genomic resources and tools, a versatile and cost-effective mid-density genotyping platform for applied purposes in breeding and pre-breeding is lacking. Here we report on a first mid-density fixed-target SNP platform for C. sativa. RESULTS: The High-throughput Amplicon-based SNP-platform for medicinal Cannabis and industrial Hemp (HASCH) was designed using a combination of filtering and Integer Linear Programming on publicly available whole-genome sequencing and RNA sequencing data, supplemented with in-house generated genotyping-by-sequencing (GBS) data. HASCH contains 1,504 genome-wide targets of high call rate (97% mean) and even distribution across the genome, designed to be highly informative (> 0.3 minor allele frequency) across both medicinal cannabis and industrial hemp gene pools. The average numbers of mismatched SNPs between any two accessions were 251 for medicinal cannabis (N = 116) and 272 for industrial hemp (N = 87). Comparing HASCH data with corresponding GBS data on a collection of diverse C. sativa accessions demonstrated high concordance and resulted in comparable phylogenies and genetic distance matrices. Using HASCH on a segregating F2 population derived from a cross between a tetrahydrocannabinol (THC)-dominant and a cannabidiol (CBD)-dominant accession resulted in a genetic map consisting of 310 markers, comprising 10 linkage groups and a total size of 582.7 cM. Quantitative Trait Locus (QTL) mapping identified a major QTL for CBD content on chromosome 7, consistent with previous findings. CONCLUSION: HASCH constitutes a versatile, easy-to-use and cost-effective genotyping solution for the rapidly growing Cannabis research community. It provides consistent genetic fingerprints of 1,504 SNPs with wide applicability in genetic resource management, quantitative genetics and breeding.


Subject(s)
Cannabis , Genotyping Techniques , Medical Marijuana , Polymorphism, Single Nucleotide , Cannabis/genetics , Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Genome, Plant , Genotype
20.
BMC Genomics ; 25(1): 525, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38807041

ABSTRACT

BACKGROUND: The Rh blood group system is characterized by its complexity and polymorphism, encompassing 56 different antigens. Accurately predicting the presence of the C antigen using genotyping methods has been challenging. The objective of this study was to evaluate the accuracy of various genotyping methods for predicting the Rh C and to identify a suitable method for the Chinese Han population. METHODS: In total, 317 donors, consisting of 223 D+ (including 20 with the Del phenotype) and 94 D-, were randomly selected. For RHC genotyping, 48C and 109bp insertion were detected on the Real-time PCR platform and -292 substitution was analyzed via restriction fragment length polymorphism (RFLP). Moreover, the promoter region of the RHCE gene was sequenced to search for other nucleotide substitutions between RHC and RHc. Agreement between prediction methods was evaluated using the Kappa statistic, and comparisons between methods were conducted via the χ² test. RESULTS: The analysis revealed that the 48C allele, 109bp insertion, a specific pattern observed in RFLP results, and wild-type alleles of seven single nucleotide polymorphisms (SNPs) were in strong agreement with the Rh C, with Kappa coefficients exceeding 0.8. However, there were instances of false positives or false negatives (0.6% false negative rate for 109bp insertion and 5.4-8.2% false positive rates for other methods). The 109bp insertion method exhibited the highest accuracy in predicting the Rh C, at 99.4%, compared to other methods (P values ≤ 0.001). Although no statistical differences were found among other methods for predicting Rh C (P values > 0.05), the accuracies in descending order were 48C (94.6%) > rs586178 (92.7%) > rs4649082, rs2375313, rs2281179, rs2072933, rs2072932, and RFLP (92.4%) > rs2072931 (91.8%). CONCLUSIONS: None of the methods examined can independently and accurately predict the Rh C. However, the 109bp insertion test demonstrated the highest accuracy for predicting the Rh C in the Chinese Han population. Utilizing the 109bp insertion test in combination with other methods may enhance the accuracy of Rh C prediction.
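
For readers unfamiliar with the Kappa statistic used above to compare genotype-based predictions with the observed phenotype, the following Python sketch computes Cohen's kappa and raw accuracy for two label lists. It is only an illustration: the function name `cohen_kappa` is hypothetical and the toy labels are invented, not data from the study.

```python
from collections import Counter


def cohen_kappa(predicted, observed):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    n = len(predicted)
    if n == 0 or n != len(observed):
        raise ValueError("label lists must be non-empty and of equal length")
    # Observed agreement: fraction of samples where both labels match.
    p_o = sum(p == o for p, o in zip(predicted, observed)) / n
    # Expected chance agreement from the marginal label frequencies.
    pred_counts, obs_counts = Counter(predicted), Counter(observed)
    labels = set(pred_counts) | set(obs_counts)
    p_e = sum(pred_counts[c] * obs_counts[c] for c in labels) / (n * n)
    if p_e == 1:
        return 1.0  # both lists carry one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented toy data: 100 donors, serology as reference, genotype as prediction.
observed = ["C+"] * 50 + ["C-"] * 50
predicted = ["C+"] * 45 + ["C-"] * 5 + ["C+"] * 5 + ["C-"] * 45

accuracy = sum(p == o for p, o in zip(predicted, observed)) / len(observed)
kappa = cohen_kappa(predicted, observed)
print(round(accuracy, 3), round(kappa, 3))
```

In this toy case the raw accuracy is 0.9, while kappa is lower at 0.8 because half of the agreement would be expected by chance given the balanced labels.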


Subject(s)
Asian People , Genotyping Techniques , Polymorphism, Single Nucleotide , Rh-Hr Blood-Group System , Humans , Alleles , Asian People/genetics , China , East Asian People , Gene Frequency , Genotype , Genotyping Techniques/methods , Polymorphism, Restriction Fragment Length , Promoter Regions, Genetic , Rh-Hr Blood-Group System/genetics