1.

Developing an SNP dataset for efficiently evaluating soybean germplasm resources using the genome sequencing data of 3,661 soybean accessions.

Niu, Yongchao; Yung, Wai-Shing; Sze, Ching-Ching; Wong, Fuk-Ling; Li, Man-Wah; Chung, Gyuhwa; Lam, Hon-Ming.

BMC Genomics ; 25(1): 475, 2024 May 14.

Article En | MEDLINE | ID: mdl-38745120

BACKGROUND: Single nucleotide polymorphism (SNP) markers play significant roles in accelerating breeding and basic crop research. Several soybean SNP panels have been developed. However, there is still a lack of SNP panels for differentiating between wild and cultivated populations, as well as for detecting polymorphisms within both wild and cultivated populations. RESULTS: This study utilized publicly available resequencing data from over 3,000 soybean accessions to identify differentiating and highly conserved SNP and insertion/deletion (InDel) markers between wild and cultivated soybean populations. Additionally, a naturally occurring mutant gene library was constructed by analyzing large-effect SNPs and InDels in the population. CONCLUSION: The markers obtained in this study are associated with numerous genes governing agronomic traits, thus facilitating the evaluation of soybean germplasms and the efficient differentiation between wild and cultivated soybeans. The natural mutant gene library permits the quick identification of individuals with natural mutations in functional genes, providing convenience for accelerating soybean breeding using reverse genetics.

Glycine max , INDEL Mutation , Polymorphism, Single Nucleotide , Glycine max/genetics , Genome, Plant , Gene Library , Plant Breeding

2.

Fast and accurate variant identification tool for sequencing-based studies.

Gaston, Jeffry M; Alm, Eric J; Zhang, An-Ni.

BMC Biol ; 22(1): 90, 2024 Apr 22.

Article En | MEDLINE | ID: mdl-38644496

BACKGROUND: Accurate identification of genetic variants, such as point mutations and insertions/deletions (indels), is crucial for various genetic studies into epidemic tracking, population genetics, and disease diagnosis. Genetic studies into microbiomes often require processing numerous sequencing datasets, necessitating variant identifiers with high speed, accuracy, and robustness. RESULTS: We present QuickVariants, a bioinformatics tool that effectively summarizes variant information from read alignments and identifies variants. When tested on diverse bacterial sequencing data, QuickVariants demonstrates a ninefold higher median speed than bcftools, a widely used variant identifier, with higher accuracy in identifying both point mutations and indels. This accuracy extends to variant identification in virus samples, including SARS-CoV-2, particularly with significantly fewer false negative indels than bcftools. The high accuracy of QuickVariants is further demonstrated by its detection of a greater number of Omicron-specific indels (5 versus 0) and point mutations (61 versus 48-54) than bcftools in sewage metagenomes predominated by Omicron variants. Much of the reduced accuracy of bcftools was attributable to its misinterpretation of indels, often producing false negative indels and false positive point mutations at the same locations. CONCLUSIONS: We introduce QuickVariants, a fast, accurate, and robust bioinformatics tool designed for identifying genetic variants for microbial studies. QuickVariants is available at https://github.com/caozhichongchong/QuickVariants .

INDEL Mutation , SARS-CoV-2 , SARS-CoV-2/genetics , Computational Biology/methods , Humans , Software , COVID-19/virology , High-Throughput Nucleotide Sequencing/methods , Point Mutation , Genetic Variation , Sequence Analysis, DNA/methods

3.

Joint application of A-InDels and miniSTRs for forensic personal, full and half sibling identifications, and genetic differentiation analyses in two populations from China.

Cai, Meiming; Lei, Fanzhang; Liu, Yanfang; Wang, Xi; Wang, Hongdan; Xie, Weibing; Yang, Zi; Yang, Shangwu; Zhu, Bofeng.

BMC Genomics ; 25(1): 329, 2024 Apr 02.

Article En | MEDLINE | ID: mdl-38566035

BACKGROUND: Previously, a novel multiplex system of 64 loci was constructed based on a capillary electrophoresis platform, including 59 autosomal insertion/deletions (A-InDels), two Y-chromosome InDels, two mini short tandem repeats (miniSTRs), and an Amelogenin gene. The aim of this study is to evaluate the efficiencies of this multiplex system for individual identification, paternity testing and biogeographic ancestry inference in Chinese Hezhou Han (CHH) and Hubei Tujia (CTH) groups, providing valuable insights for forensic anthropology and population genetics research. RESULTS: The cumulative values of power of discrimination (CDP) and probability of exclusion (CPE) for the 59 A-InDels and two miniSTRs were 0.99999999999999999999999999754, 0.99999905; and 0.99999999999999999999999999998, 0.99999898 in CTH and CHH groups, respectively. When the likelihood ratio thresholds were set to 1 or 10, more than 95% of the full sibling pairs could be identified from unrelated individual pairs, and the false positive rates were less than 1.2% in both CTH and CHH groups. Biogeographic ancestry inference models based on 35 populations were constructed with three algorithms: random forest, adaptive boosting and extreme gradient boosting, and then 10-fold cross-validation analyses were applied to test these three models with the average accuracies of 86.59%, 84.22% and 87.80%, respectively. In addition, we also investigated the genetic relationships between the two studied groups with 33 reference populations using population statistical methods of FST, DA, phylogenetic tree, PCA, STRUCTURE and TreeMix analyses. The present results showed that compared to other continental populations, the CTH and CHH groups had closer genetic affinities to East Asian populations. CONCLUSIONS: This novel multiplex system has high CDP and CPE in CTH and CHH groups, which can be used as a powerful tool for individual identification and paternity testing. According to various genetic analysis methods, the genetic structures of CTH and CHH groups are relatively similar to the reference East Asian populations.

Genetics, Population , Siblings , Humans , Phylogeny , China , INDEL Mutation , Microsatellite Repeats , Forensic Genetics/methods , Gene Frequency
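
The cumulative values quoted in the abstract above are conventionally obtained by combining per-locus values under an independence assumption, as 1 minus the product of (1 - p_i). The sketch below illustrates that combination only; the function name and the example per-locus values are hypothetical and are not taken from the study.

```python
from math import prod

def cumulative_power(per_locus_values):
    """Combine per-locus values p_i as 1 - prod(1 - p_i).

    Assumes the loci are independent, which is the usual convention for
    cumulative power of discrimination and cumulative probability of exclusion.
    """
    return 1.0 - prod(1.0 - p for p in per_locus_values)

# Hypothetical per-locus power of discrimination values (illustration only).
example_pd = [0.60, 0.55, 0.62]
cdp = cumulative_power(example_pd)
print(f"cumulative power of discrimination: {cdp:.4f}")
```

With dozens of loci the result sits so close to 1 that a double-precision float rounds it to exactly 1.0, so in practice it can be more informative to report the product of the (1 - p_i) terms directly or to use arbitrary-precision arithmetic.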

4.

Genetic and phenotypic analysis of 225 Chinese children with developmental delay and/or intellectual disability using whole-exome sequencing.

Ma, Heqian; Zhu, Lina; Yang, Xiao; Ao, Meng; Zhang, Shunxiang; Guo, Meizhen; Dai, Xuelin; Ma, Xiuwei; Zhang, Xiaoying.

BMC Genomics ; 25(1): 391, 2024 Apr 22.

Article En | MEDLINE | ID: mdl-38649797

Developmental delay (DD) or intellectual disability (ID) is a very large group of early-onset disorders that affects 1-2% of children worldwide and has diverse genetic causes that should be identified. Genetic studies can elucidate the pathogenesis underlying DD/ID. In this study, whole-exome sequencing (WES) was performed on 225 Chinese DD/ID children (208 cases were sequenced as proband-parent trios) who were classified into seven phenotype subgroups. The phenotype and genomic data of patients with DD/ID were further retrospectively analyzed. A total of 96/225 (42.67%; 95% confidence interval [CI] 36.15-49.18%) patients were found to have causative single nucleotide variants (SNVs) and small insertions/deletions (Indels) associated with DD/ID based on WES data. The diagnostic yields among the seven subgroups ranged from 31.25 to 71.43%. Three specific clinical features, hearing loss, visual loss, and facial dysmorphism, can significantly increase the diagnostic yield of WES in patients with DD/ID (P = 0.005, P = 0.005, and P = 0.039, respectively). Of note, hearing loss (odds ratio [OR] = 1.86; 95% CI = 1.00-3.46, P = 0.046) or abnormal brainstem auditory evoked potential (BAEP) (OR = 1.91, 95% CI = 1.02-3.50, P = 0.042) was independently associated with causative genetic variants in DD/ID children. Our findings enrich the variation spectrum of SNVs/Indels associated with DD/ID, highlight the value of genetic testing for DD/ID children, stress the importance of BAEP screening in DD/ID children, and help to facilitate early diagnosis, clinical management, and reproductive decisions and to improve therapeutic response to medical treatment.

Developmental Disabilities , Exome Sequencing , Intellectual Disability , Child , Child, Preschool , Female , Humans , Infant , Male , Developmental Disabilities/genetics , Developmental Disabilities/diagnosis , East Asian People/genetics , INDEL Mutation , Intellectual Disability/genetics , Phenotype , Polymorphism, Single Nucleotide
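
The diagnostic yield of 96/225 in the abstract above is reported with a 95% confidence interval. The abstract does not say which interval method was used; the sketch below uses the simple normal-approximation (Wald) interval as an assumption, and it gives a close but not identical range (roughly 36.2% to 49.1%).

```python
from math import sqrt

def wald_interval(successes, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z=1.96 corresponds to a two-sided 95% interval. This is an assumed
    method, not necessarily the one used by the study's authors.
    """
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width

# Counts taken from the abstract: 96 diagnosed patients out of 225.
low, high = wald_interval(96, 225)
print(f"yield = {96 / 225:.2%}, approx. 95% CI = {low:.2%} to {high:.2%}")
```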

5.

Genetic variant of the sheep E2F8 gene and its associations with litter size.

Zhu, Leijing; Akhmet, Nazar; Bo, Didi; Pan, Chuanying; Wu, Jiyao; Lan, Xianyong.

Anim Biotechnol ; 35(1): 2337751, 2024 Nov.

Article En | MEDLINE | ID: mdl-38597900

Improving economic efficiency by enhancing productivity is a focal point of sheep breeding. Recent studies highlight the involvement of the Early Region 2 Binding Factor transcription factor 8 (E2F8) gene in female reproduction. Our group's recent genome-wide association study (GWAS) emphasizes the potential impact of the E2F8 gene on prolificacy traits in Australian White sheep (AUW). Herein, the purpose of this study was to assess the correlation of the E2F8 gene with litter size in the AUW sheep breed. This work encompassed 659 AUW sheep, subject to genotyping through PCR-based genotyping technology. Furthermore, the results of PCR-based genotyping showed significant associations between the P1-del-32bp InDel and the fourth and fifth parity litter sizes in AUW sheep; the litter size of those with genotype ID was superior to that of those with DD and II genotypes. Thus, these results indicate that the P1-del-32bp InDel within the E2F8 gene can be useful in marker-assisted selection (MAS) in sheep.

Genome-Wide Association Study , INDEL Mutation , Female , Animals , Sheep/genetics , Pregnancy , Australia , Litter Size/genetics , Genotype , INDEL Mutation/genetics

6.

Towards accurate indel calling for oncopanel sequencing through an international pipeline competition at precisionFDA.

Gong, Binsheng; Lababidi, Samir; Kusko, Rebecca; Bouri, Khaled; Prezek, Sarah; Thovarai, Vishal; Prasanna, Anish; Maier, Ezekiel J; Golkaram, Mahdi; Sun, Xingqiang; Kyriakidis, Konstantinos; Kitajima, João Paulo; Ebrahim Sahraeian, Sayed Mohammad; Guo, Yunfei; Johanson, Elaine; Jones, Wendell; Tong, Weida; Xu, Joshua.

Sci Rep ; 14(1): 8165, 2024 04 08.

Article En | MEDLINE | ID: mdl-38589653

Accurately calling indels with next-generation sequencing (NGS) data is critical for clinical application. The precisionFDA team collaborated with the U.S. Food and Drug Administration's (FDA's) National Center for Toxicological Research (NCTR) and successfully completed the NCTR Indel Calling from Oncopanel Sequencing Data Challenge, to evaluate the performance of indel calling pipelines. Top performers were selected based on precision, recall, and F1-score. The performance of many other pipelines was close to the top performers, which produced a top cluster of performers. The performance was significantly higher in high confidence regions and coding regions, and significantly lower in low complexity regions. Oncopanel capture and other issues may have occurred that affected the recall rate. Indels with higher variant allele frequency (VAF) may generally be called with higher confidence. Many of the indel calling pipelines had good performance. Some of them performed generally well across all three oncopanels, while others were better for a specific oncopanel. The performance of indel calling can further be improved by restricting the calls within high confidence intervals (HCIs) and coding regions, and by excluding low complexity regions (LCRs). Certain VAF cut-offs could be applied according to the applications.

High-Throughput Nucleotide Sequencing , INDEL Mutation , Polymorphism, Single Nucleotide
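
The challenge described above ranks pipelines by precision, recall, and F1-score. A minimal sketch of those three metrics follows, assuming calls and truth-set entries are compared by exact match on a (chromosome, position, ref, alt) tuple; that matching rule, the function name, and the toy variants are illustrative assumptions, not the challenge's actual evaluation procedure.

```python
def score_calls(called, truth):
    """Return (precision, recall, f1) for a set of calls against a truth set.

    Both arguments are sets of hashable variant keys, for example
    (chrom, pos, ref, alt) tuples. Exact-match comparison is an assumption.
    """
    tp = len(called & truth)   # calls that are in the truth set
    fp = len(called - truth)   # calls not supported by the truth set
    fn = len(truth - called)   # truth-set variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example with made-up indels.
truth = {("chr1", 100, "A", "AT"), ("chr1", 250, "GC", "G"), ("chr2", 40, "T", "TAA")}
called = {("chr1", 100, "A", "AT"), ("chr2", 40, "T", "TAA"), ("chr3", 7, "C", "CG")}
precision, recall, f1 = score_calls(called, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```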

7.

Measuring, visualizing, and diagnosing reference bias with biastools.

Lin, Mao-Jan; Iyer, Sheila; Chen, Nae-Chyun; Langmead, Ben.

Genome Biol ; 25(1): 101, 2024 Apr 19.

Article En | MEDLINE | ID: mdl-38641647

Many bioinformatics methods seek to reduce reference bias, but no methods exist to comprehensively measure it. Biastools analyzes and categorizes instances of reference bias. It works in various scenarios: when the donor's variants are known and reads are simulated; when donor variants are known and reads are real; and when variants are unknown and reads are real. Using biastools, we observe that more inclusive graph genomes result in fewer biased sites. We find that end-to-end alignment reduces bias at indels relative to local aligners. Finally, we use biastools to characterize how T2T references improve large-scale bias.

Genome , Genomics , Genomics/methods , Computational Biology , INDEL Mutation , Bias , Sequence Analysis, DNA/methods , Software , High-Throughput Nucleotide Sequencing/methods

8.

Artificial selection footprints in indigenous and commercial chicken genomes.

Wu, Siwen; Dou, Tengfei; Wang, Kun; Yuan, Sisi; Yan, Shixiong; Xu, Zhiqiang; Liu, Yong; Jian, Zonghui; Zhao, Jingying; Zhao, Rouhan; Wu, Hao; Gu, Dahai; Liu, Lixian; Li, Qihua; Wu, Dong-Dong; Ge, Changrong; Su, Zhengchang; Jia, Junjing.

BMC Genomics ; 25(1): 428, 2024 Apr 30.

Article En | MEDLINE | ID: mdl-38689225

BACKGROUND: Although many studies have been done to reveal artificial selection signatures in commercial and indigenous chickens, a limited number of genes have been linked to specific traits. To identify more trait-related artificial selection signatures and genes, we re-sequenced a total of 85 individuals of five indigenous chicken breeds with distinct traits from Yunnan Province, China. RESULTS: We found 30 million non-redundant single nucleotide variants and small indels (< 50 bp) in the indigenous chickens, of which 10 million were not seen in 60 broilers, 56 layers and 35 red jungle fowls (RJFs) that we compared with. The variants in each breed are enriched in non-coding regions, while those in coding regions are largely tolerant, suggesting that most variants might affect cis-regulatory sequences. Based on 27 million bi-allelic single nucleotide polymorphisms identified in the chickens, we found numerous selective sweeps and affected genes in each indigenous chicken breed and substantially larger numbers of selective sweeps and affected genes in the broilers and layers than previously reported using a rigorous statistical model. Consistent with the locations of the variants, the vast majority (~ 98.3%) of the identified selective sweeps overlap known quantitative trait loci (QTLs). Meanwhile, 74.2% of known QTLs overlap our identified selective sweeps. We confirmed most of the previously identified trait-related genes and identified many novel ones, some of which might be related to body size and high egg production traits. Using RT-qPCR, we validated differential expression of eight genes (GHR, GHRHR, IGF2BP1, OVALX, ELF2, MGARP, NOCT, SLC25A15) that might be related to body size and high egg production traits in relevant tissues of relevant breeds. CONCLUSION: We identify 30 million single nucleotide variants and small indels in the five indigenous chicken breeds, 10 million of which are novel. We predict substantially more selective sweeps and affected genes than previously reported in both indigenous and commercial breeds. These variants and affected genes are good candidates for further experimental investigations of genotype-phenotype relationships and practical applications in chicken breeding programs.

Chickens , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Selection, Genetic , Animals , Chickens/genetics , Genome , INDEL Mutation , Breeding , Phenotype , Genomics/methods

9.

A 224-bp Indel in the Promoter of PeMYB114 Accounts for Anthocyanin Accumulation of Skin in Passion Fruit (Passiflora spp.).

Fang, Ting; Wang, Mengzhen; He, Ruijie; Chen, Qiaowen; He, Dayi; Chen, Xuerong; Li, Yongkang; Ren, Rui; Yu, Weijun; Zeng, Lihui.

J Agric Food Chem ; 72(17): 10138-10148, 2024 May 01.

Article En | MEDLINE | ID: mdl-38637271

Passion fruit (Passiflora spp.) is an important fruit tree in the family Passifloraceae. The color of the fruit skin, a significant agricultural trait, is determined by the content of anthocyanin in passion fruit. However, the regulatory mechanisms behind the accumulation of anthocyanin in different passion fruit skin colors remain unclear. In this study, we identified and characterized an R2R3-MYB transcription factor, PeMYB114, which functions as a transcriptional activator in anthocyanin biosynthesis. Yeast one-hybrid system and dual-luciferase analysis showed that PeMYB114 could directly activate the expression of anthocyanin structural genes (PeCHS and PeDFR). Furthermore, a natural variation in the promoter region of PeMYB114 alters its expression. PeMYB114purple accessions with the 224-bp insertion have a higher anthocyanin level than PeMYB114yellow accessions with the 224-bp deletion. The findings enhance our understanding of anthocyanin accumulation in fruits and provide genetic resources for genome design for improving passion fruit quality.

Anthocyanins , Fruit , Gene Expression Regulation, Plant , Passiflora , Plant Proteins , Promoter Regions, Genetic , Transcription Factors , Anthocyanins/metabolism , Anthocyanins/genetics , Passiflora/genetics , Passiflora/metabolism , Passiflora/chemistry , Fruit/metabolism , Fruit/genetics , Fruit/chemistry , Plant Proteins/genetics , Plant Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , INDEL Mutation

10.

PySNV for complex intra-host variation detection.

Li, Liandong; Fu, Haoyi; Ma, Wentai; Li, Mingkun.

Bioinformatics ; 40(3)2024 Mar 04.

Article En | MEDLINE | ID: mdl-38426352

MOTIVATION: Intra-host variants refer to genetic variations or mutations that occur within an individual host organism. These variants are typically studied in the context of viruses, bacteria, or other pathogens to understand the evolution of pathogens. Moreover, intra-host variants are also explored in the field of tumor biology and mitochondrial biology to characterize somatic mutations and inherited heteroplasmic mutations. Intra-host variants can involve long insertions, deletions, and combinations of different mutation types, which poses challenges in their identification. The performance of current methods in detecting complex intra-host variants is unknown. RESULTS: First, we simulated a dataset comprising 10 samples with 1869 intra-host variants involving various mutation patterns and benchmarked current variant detection software. The results indicated that though current software can detect most variants with F1-scores between 0.76 and 0.97, their performance in detecting long indels and low frequency variants was limited. Thus, we developed a new software, PySNV, for the detection of complex intra-host variations. On the simulated dataset, PySNV successfully detected 1863 variant cases (F1-score: 0.99) and exhibited the highest Pearson correlation coefficient (PCC: 0.99) to the ground truth in predicting variant frequencies. The results demonstrated that PySNV delivered promising performance even for long indels and low frequency variants, while maintaining computational speed comparable to other methods. Finally, we tested its performance on SARS-CoV-2 replicate sequencing data and found that it reported 21% more variants compared to LoFreq, the best-performing benchmarked software, while showing higher consistency (62% over 54%) within replicates. The discrepancies mostly exist in low-depth regions and low frequency variants. AVAILABILITY AND IMPLEMENTATION: https://github.com/bnuLyndon/PySNV/.

High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Mutation , INDEL Mutation , Genetic Variation

11.

Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project.

Gong, Binsheng; Li, Dan; Zhang, Yifan; Kusko, Rebecca; Lababidi, Samir; Cao, Zehui; Chen, Mingyang; Chen, Ning; Chen, Qiaochu; Chen, Qingwang; Dai, Jiacheng; Gan, Qiang; Gao, Yuechen; Guo, Mingkun; Hariani, Gunjan; He, Yujie; Hou, Wanwan; Jiang, He; Kushwaha, Garima; Li, Jian-Liang; Li, Jianying; Li, Yulan; Liu, Liang-Chun; Liu, Ruimei; Liu, Shiming; Meriaux, Edwin; Mo, Mengqing; Moore, Mathew; Moss, Tyler J; Niu, Quanne; Patel, Ananddeep; Ren, Luyao; Saremi, Nedda F; Shang, Erfei; Shang, Jun; Song, Ping; Sun, Siqi; Urban, Brent J; Wang, Danke; Wang, Shangzi; Wen, Zhining; Xiong, Xiangyi; Yang, Jingcheng; Yin, Lihui; Zhang, Chao; Zhang, Ruolan; Bhandari, Ambica; Cai, Wanshi; Eterovic, Agda Karina; Megherbi, Dalila B.

Sci Rep ; 14(1): 7028, 2024 03 25.

Article En | MEDLINE | ID: mdl-38528062

Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the known indels in the known-positive set were limited. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer related regions. A thorough manual review process completed by 42 reviewers, two advisors, and a judging panel of three researchers significantly enriched the known indel set by an additional 516 indels. The extended benchmarking indel set has a large range of variant allele frequencies (VAFs), with 87% of them having a VAF below 20% in reference Sample A. The reference Sample A and the indel set can be used for comprehensive benchmarking of indel calling across a wider range of VAF values in the lower range. Indel length was also variable, but the majority were under 10 base pairs (bps). Most of the indels were within coding regions, with the remainder in the gene regulatory regions. Although high confidence can be derived from the robust study design and meticulous human review, this extensive indel set has not undergone orthogonal validation. The extended benchmarking indel set, along with the indels in the previously published known-positive set, was the truth set used to benchmark indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and reference samples can be utilized for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can aid in improving the performance of these pipelines.

Benchmarking , High-Throughput Nucleotide Sequencing , Humans , Computational Biology , Quality Control , INDEL Mutation , Polymorphism, Single Nucleotide
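
The abstract above characterizes the extended indel set by variant allele frequency (VAF), noting that most indels fall below 20%. As a hedged illustration, the sketch below computes VAF as alternate-allele reads divided by total reads at the site and then the share of variants under a cutoff; the read counts are invented, and real pipelines may define depth differently (for example, after filtering low-quality reads).

```python
def variant_allele_frequency(alt_reads, total_reads):
    """VAF as the fraction of reads at a site that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads

def fraction_below(vafs, cutoff=0.20):
    """Share of variants whose VAF is strictly below the cutoff."""
    return sum(v < cutoff for v in vafs) / len(vafs)

# Invented (alt_reads, total_reads) pairs, for illustration only.
read_counts = [(5, 100), (12, 100), (30, 100), (45, 300)]
vafs = [variant_allele_frequency(alt, total) for alt, total in read_counts]
print(f"{fraction_below(vafs):.0%} of these toy indels have VAF below 20%")
```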

12.

Contrasting somatic mutation patterns in aging human neurons and oligodendrocytes.

Ganz, Javier; Luquette, Lovelace J; Bizzotto, Sara; Miller, Michael B; Zhou, Zinan; Bohrson, Craig L; Jin, Hu; Tran, Antuan V; Viswanadham, Vinayak V; McDonough, Gannon; Brown, Katherine; Chahine, Yasmine; Chhouk, Brian; Galor, Alon; Park, Peter J; Walsh, Christopher A.

Cell ; 187(8): 1955-1970.e23, 2024 Apr 11.

Article En | MEDLINE | ID: mdl-38503282

Characterizing somatic mutations in the brain is important for disentangling the complex mechanisms of aging, yet little is known about mutational patterns in different brain cell types. Here, we performed whole-genome sequencing (WGS) of 86 single oligodendrocytes, 20 mixed glia, and 56 single neurons from neurotypical individuals spanning 0.4-104 years of age and identified >92,000 somatic single-nucleotide variants (sSNVs) and small insertions/deletions (indels). Although both cell types accumulate somatic mutations linearly with age, oligodendrocytes accumulated sSNVs 81% faster than neurons and indels 28% slower than neurons. Correlation of mutations with single-nucleus RNA profiles and chromatin accessibility from the same brains revealed that oligodendrocyte mutations are enriched in inactive genomic regions and are distributed across the genome similarly to mutations in brain cancers. In contrast, neuronal mutations are enriched in open, transcriptionally active chromatin. These stark differences suggest an assortment of active mutagenic processes in oligodendrocytes and neurons.

Aging , Brain , Neurons , Oligodendroglia , Humans , Aging/genetics , Aging/pathology , Chromatin/genetics , Chromatin/metabolism , Mutation , Neurons/metabolism , Neurons/pathology , Oligodendroglia/metabolism , Oligodendroglia/pathology , Single-Cell Gene Expression Analysis , Whole Genome Sequencing , Brain/metabolism , Brain/pathology , Polymorphism, Single Nucleotide , INDEL Mutation , Biological Specimen Banks , Oligodendrocyte Precursor Cells/metabolism , Oligodendrocyte Precursor Cells/pathology
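
The abstract above compares how quickly two cell types accumulate mutations with age. One simple way to express such a comparison is to fit a straight line of mutation count against age for each cell type and take the ratio of the slopes. The sketch below does this with ordinary least squares on invented numbers; the study's actual statistical model is not described in the abstract and is likely more elaborate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented toy data: ages in years and mutation counts per cell.
ages = [10, 40, 70, 100]
cell_type_a = [100, 400, 700, 1000]    # about 10 mutations per year
cell_type_b = [150, 600, 1050, 1500]   # about 15 mutations per year

slope_a, _ = fit_line(ages, cell_type_a)
slope_b, _ = fit_line(ages, cell_type_b)
relative_rate = slope_b / slope_a - 1.0
print(f"cell type B accumulates mutations {relative_rate:.0%} faster than A")
```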

13.

Accurate and sensitive mutational signature analysis with MuSiCal.

Jin, Hu; Gulhan, Doga C; Geiger, Benedikt; Ben-Isvy, Daniel; Geng, David; Ljungström, Viktor; Park, Peter J.

Nat Genet ; 56(3): 541-552, 2024 Mar.

Article En | MEDLINE | ID: mdl-38361034

Mutational signature analysis is a recent computational approach for interpreting somatic mutations in the genome. Its application to cancer data has enhanced our understanding of mutational forces driving tumorigenesis and demonstrated its potential to inform prognosis and treatment decisions. However, methodological challenges remain for discovering new signatures and assigning proper weights to existing signatures, thereby hindering broader clinical applications. Here we present Mutational Signature Calculator (MuSiCal), a rigorous analytical framework with algorithms that solve major problems in the standard workflow. Our simulation studies demonstrate that MuSiCal outperforms state-of-the-art algorithms for both signature discovery and assignment. By reanalyzing more than 2,700 cancer genomes, we provide an improved catalog of signatures and their assignments, discover nine indel signatures absent in the current catalog, resolve long-standing issues with the ambiguous 'flat' signatures and give insights into signatures with unknown etiologies. We expect MuSiCal and the improved catalog to be a step towards establishing best practices for mutational signature analysis.

Music , Neoplasms , Humans , Neoplasms/genetics , Mutation , Carcinogenesis/genetics , INDEL Mutation

14.

Deep whole-genome analysis of 494 hepatocellular carcinomas.

Chen, Lei; Zhang, Chong; Xue, Ruidong; Liu, Mo; Bai, Jian; Bao, Jinxia; Wang, Yin; Jiang, Nanhai; Li, Zhixuan; Wang, Wenwen; Wang, Ruiru; Zheng, Bo; Yang, Airong; Hu, Ji; Liu, Ke; Shen, Siyun; Zhang, Yangqianwen; Bai, Mixue; Wang, Yan; Zhu, Yanjing; Yang, Shuai; Gao, Qiang; Gu, Jin; Gao, Dong; Wang, Xin Wei; Nakagawa, Hidewaki; Zhang, Ning; Wu, Lin; Rozen, Steven G; Bai, Fan; Wang, Hongyang.

Nature ; 627(8004): 586-593, 2024 Mar.

Article En | MEDLINE | ID: mdl-38355797

Over half of hepatocellular carcinoma (HCC) cases diagnosed worldwide are in China [1-3]. However, whole-genome analysis of hepatitis B virus (HBV)-associated HCC in Chinese individuals is limited [4-8], with current analyses of HCC mainly from non-HBV-enriched populations [9,10]. Here we initiated the Chinese Liver Cancer Atlas (CLCA) project and performed deep whole-genome sequencing (average depth, 120×) of 494 HCC tumours. We identified 6 coding and 28 non-coding previously undescribed driver candidates. Five previously undescribed mutational signatures were found, including aristolochic-acid-associated indel and doublet base signatures, and a single-base-substitution signature that we termed SBS_H8. Pentanucleotide context analysis and experimental validation confirmed that SBS_H8 was distinct to the aristolochic-acid-associated SBS22. Notably, HBV integrations could take the form of extrachromosomal circular DNA, resulting in elevated copy numbers and gene expression. Our high-depth data also enabled us to characterize subclonal clustered alterations, including chromothripsis, chromoplexy and kataegis, suggesting that these catastrophic events could also occur in late stages of hepatocarcinogenesis. Pathway analysis of all classes of alterations further linked non-coding mutations to dysregulation of liver metabolism. Finally, we performed in vitro and in vivo assays to show that fibrinogen alpha chain (FGA), determined as both a candidate coding and non-coding driver, regulates HCC progression and metastasis. Our CLCA study depicts a detailed genomic landscape and evolutionary history of HCC in Chinese individuals, providing important clinical implications.

Carcinoma, Hepatocellular , Genome, Human , High-Throughput Nucleotide Sequencing , Liver Neoplasms , Mutation , Whole Genome Sequencing , Humans , Aristolochic Acids/metabolism , Carcinogenesis , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/virology , China , Chromothripsis , Disease Progression , DNA, Circular/genetics , East Asian People/genetics , Evolution, Molecular , Genome, Human/genetics , Hepatitis B virus/genetics , INDEL Mutation/genetics , Liver/metabolism , Liver Neoplasms/genetics , Liver Neoplasms/virology , Mutation/genetics , Neoplasm Metastasis/genetics , Open Reading Frames/genetics , Reproducibility of Results

15.

Genetic network analysis indicate that individuals affected by neurodevelopmental conditions have genetic variations associated with ophthalmologic alterations: A critical review of literature.

Shinsato, Rogério N; Correa, Camila Graczyk; Herai, Roberto H.

Gene ; 908: 148246, 2024 May 25.

Article En | MEDLINE | ID: mdl-38325665

Changes in the nervous system are related to a wide range of mental disorders, including neurodevelopmental disorders (NDD), which are characterized by early-onset mental conditions such as schizophrenia and autism spectrum disorders and correlated conditions (ASD). Previous studies have shown distinct genetic components associated with diverse schizophrenia and ASD phenotypes, but they have mostly focused on rescuing neural phenotypes and brain activity, and alterations related to vision have been overlooked. Because the eyes are themselves part of the brain, with the retina being formed by neurons and cells originating from the glia, genetic variations affecting the brain can also affect vision. Here, we performed a critical systematic literature review to screen for all genetic variations in individuals presenting NDD with reported alterations in vision. Using these restrictive criteria, we found 20 genes with distinct types of genetic variations, inherited or de novo, that include SNP, SNV, deletion, insertion, duplication or indel. The variations occurring within protein-coding regions have different impacts on protein formation, such as missense, nonsense or frameshift. Moreover, a molecular analysis of the 20 genes found revealed that 17 shared a common protein-protein or genetic interaction network. In addition, gene expression analysis in samples from the brain and other tissues indicates that 18 of the genes found are highly expressed in the brain and retina, indicating their potential role in adult vision phenotype. Finally, we found only 3 genes from our study described in standard public databanks of ophthalmogenetics, suggesting that the other 17 genes could be novel targets for vision diseases.

Autism Spectrum Disorder , Neurodevelopmental Disorders , Adult , Humans , Gene Regulatory Networks , Neurodevelopmental Disorders/genetics , Autism Spectrum Disorder/genetics , Phenotype , INDEL Mutation

16.

Whole-Genome Sequencing Reveals Rare Off-Target Mutations in MC1R-Edited Pigs Generated by Using CRISPR-Cas9 and Somatic Cell Nuclear Transfer.

Li, Zhenyang; Lan, Jin; Shi, Xuan; Lu, Tong; Hu, Xiaoli; Liu, Xiaohong; Chen, Yaosheng; He, Zuyong.

CRISPR J ; 7(1): 29-40, 2024 Feb.

Article En | MEDLINE | ID: mdl-38353621

The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has been widely used to create animal models for biomedical and agricultural use owing to its low cost and easy handling. However, the occurrence of erroneous cleavage (off-targeting) may raise certain concerns for the practical application of the CRISPR-Cas9 system. In this study, we created a melanocortin 1 receptor (MC1R)-edited pig model through somatic cell nuclear transfer (SCNT) by using porcine kidney cells modified by the CRISPR-Cas9 system. We then carried out whole-genome sequencing of two MC1R-edited pigs and two cloned wild-type siblings, together with the donor cells, to assess the genome-wide presence of single-nucleotide variants and small insertions and deletions (indels) and found only one candidate off-target indel in both MC1R-edited pigs. In summary, our study indicates that the minimal off-targeting effect induced by CRISPR-Cas9 may not be a major concern in gene-edited pigs created by SCNT.

CRISPR-Cas Systems , Receptor, Melanocortin, Type 1 , Animals , Swine/genetics , Receptor, Melanocortin, Type 1/genetics , CRISPR-Cas Systems/genetics , Gene Editing , Mutation , INDEL Mutation/genetics

17.

Systematic Comparison of Computational Tools for Sanger Sequencing-Based Genome Editing Analysis.

Aoki, Kanae; Yamasaki, Mai; Umezono, Riku; Hamamoto, Takanori; Kamachi, Yusuke.

Cells ; 13(3), 2024 Jan 30.

Article En | MEDLINE | ID: mdl-38334653

Successful genome editing depends on the cleavage efficiency of programmable nucleases (PNs) such as the CRISPR-Cas system. Various methods have been developed to assess the efficiency of PNs, most of which estimate the occurrence of indels caused by PN-induced double-strand breaks. In these methods, PN genomic target sites are amplified through PCR, and the resulting PCR products are subsequently analyzed using Sanger sequencing, high-throughput sequencing, or mismatch detection assays. Among these methods, Sanger sequencing of PCR products followed by indel analysis using online web tools has gained popularity due to its user-friendly nature. This approach estimates indel frequencies by computationally analyzing sequencing trace data. However, the accuracy of these computational tools remains uncertain. In this study, we compared the performance of four web tools, TIDE, ICE, DECODR, and SeqScreener, using artificial sequencing templates with predetermined indels. Our results demonstrated that these tools were able to estimate indel frequency with acceptable accuracy when the indels were simple and contained only a few base changes. However, the estimated values became more variable among the tools when the sequencing templates contained more complex indels or knock-in sequences. Moreover, although these tools effectively estimated the net indel sizes, their capability to deconvolute indel sequences exhibited variability with certain limitations. These findings underscore the importance of judiciously selecting and using an appropriate tool with caution, depending on the type of genome editing being performed.

CRISPR-Cas Systems , Gene Editing , Gene Editing/methods , CRISPR-Cas Systems/genetics , INDEL Mutation/genetics , Genome/genetics , Genomics

18.

Development of a novel five-dye panel for human identification insertion/deletion (INDEL) polymorphisms.

Avellaneda, Lucio L; Johnson, Damani T; Gutierrez, Ryan; Thompson, Lindsey; Sage, Kelly A; Sturm, Sarah A; Houston, Rachel M; LaRue, Bobby L.

J Forensic Sci ; 69(3): 814-824, 2024 May.

Article En | MEDLINE | ID: mdl-38291825

DNA analysis of forensic case samples relies on short tandem repeats (STRs), a key component of the combined DNA index system (CODIS) used to identify individuals. However, limitations arise when dealing with challenging samples, prompting the exploration of alternative markers such as single nucleotide polymorphisms (SNPs) and insertion/deletion (INDEL) polymorphisms. Unlike SNPs, INDELs can be differentiated easily by size, making them compatible with electrophoresis methods. It is possible to design small INDEL amplicons (<200 bp) to enhance recovery from degraded samples. To this end, a set of INDEL Human Identification Markers (HID) was curated from the 1000 Genomes Project, employing criteria including a fixation index (FST) ≤ 0.06, minor allele frequency (MAF) > 0.2, and high allele frequency divergence. A panel of 33 INDEL-HIDs was optimized and validated following the Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines, utilizing a five-dye multiplex electrophoresis system. A small sample set (n = 79 unrelated individuals) was genotyped to assess the assay's performance. The validation studies demonstrated reproducibility, inhibition tolerance, the ability to detect a two-person mixture at ratios from 4:1 to 1:6, robustness with challenging samples, and sensitivity down to 125 pg of DNA. In summary, the 33-locus INDEL-HID panel exhibited robust recovery with low-template and degraded samples and proved effective for individualization within a small sample set.

DNA Fingerprinting , Gene Frequency , INDEL Mutation , Humans , DNA Fingerprinting/methods , Reproducibility of Results , Genetic Markers , Genotype , Fluorescent Dyes , Polymerase Chain Reaction , Polymorphism, Genetic , Electrophoresis, Capillary , Microsatellite Repeats
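
The marker-selection thresholds quoted in the abstract above (FST ≤ 0.06, MAF > 0.2, amplicons under 200 bp) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `IndelCandidate` record, its field names, and the example values are hypothetical. The "high allele frequency divergence" criterion is left out because the abstract does not define it. Treating the 200 bp amplicon size as a hard filter is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndelCandidate:
    """Hypothetical record for one candidate INDEL marker."""
    marker_id: str    # illustrative identifier, not a real locus name
    fst: float        # fixation index across populations
    maf: float        # minor allele frequency
    amplicon_bp: int  # designed amplicon length in base pairs


def passes_hid_criteria(c, max_fst=0.06, min_maf=0.2, max_amplicon_bp=200):
    """Apply the thresholds quoted in the abstract (FST <= 0.06, MAF > 0.2).

    The amplicon cutoff (< 200 bp) is assumed to be a hard filter here.
    """
    return c.fst <= max_fst and c.maf > min_maf and c.amplicon_bp < max_amplicon_bp


def select_markers(candidates):
    """Return the candidates that satisfy every threshold."""
    return [c for c in candidates if passes_hid_criteria(c)]


# Made-up example values, for illustration only.
candidates = [
    IndelCandidate("indel_A", fst=0.02, maf=0.45, amplicon_bp=120),
    IndelCandidate("indel_B", fst=0.10, maf=0.40, amplicon_bp=110),  # FST too high
    IndelCandidate("indel_C", fst=0.05, maf=0.15, amplicon_bp=90),   # MAF too low
    IndelCandidate("indel_D", fst=0.06, maf=0.30, amplicon_bp=250),  # amplicon too long
]
selected = [c.marker_id for c in select_markers(candidates)]
print(selected)
```

A low FST keeps allele frequencies similar across populations, and a high MAF keeps both alleles common. Together these make a marker informative for individual identification rather than for ancestry.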

19.

BCFtools/liftover: an accurate and comprehensive tool to convert genetic variants across genome assemblies.

Genovese, Giulio; Rockweiler, Nicole B; Gorman, Bryan R; Bigdeli, Tim B; Pato, Michelle T; Pato, Carlos N; Ichihara, Kiku; McCarroll, Steven A.

Bioinformatics ; 40(2), 2024 01 02.

Article En | MEDLINE | ID: mdl-38261650

MOTIVATION: Many genetics studies report results tied to genomic coordinates of a legacy genome assembly. However, as assemblies are updated and improved, researchers are faced with either realigning raw sequence data using the updated coordinate system or converting legacy datasets to the updated coordinate system to be able to combine results with newer datasets. Currently available tools to perform the conversion of genetic variants have numerous shortcomings, including poor support for indels and multi-allelic variants, that lead to a higher rate of variants being dropped or incorrectly converted. As a result, many researchers continue to work with and publish using legacy genomic coordinates. RESULTS: Here we present BCFtools/liftover, a tool to convert genomic coordinates across genome assemblies for variants encoded in the variant call format, with improved support for indels represented by different reference alleles across genome assemblies and full support for multi-allelic variants. It further supports updates to variant annotation fields whenever the reference allele changes across genome assemblies. The tool has the lowest rate of variants being dropped, with an order of magnitude fewer indels dropped or incorrectly converted, and is an order of magnitude faster than other tools typically used for the same task. It is particularly suited for converting variant callsets from large cohorts to novel telomere-to-telomere assemblies as well as summary statistics from genome-wide association studies tied to legacy genome assemblies. AVAILABILITY AND IMPLEMENTATION: The tool is written in C and freely available under the MIT open source license as a BCFtools plugin available at http://github.com/freeseek/score.

Genome-Wide Association Study , Software , Genomics/methods , Alleles , INDEL Mutation
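
The coordinate conversion described in the abstract above can be illustrated with a toy Python sketch of the general idea. It maps a position through a list of aligned blocks and drops it when it falls in an unaligned gap. This is not the BCFtools/liftover algorithm or its interface. The `AlignedBlock` type, the function name, and the example blocks are all hypothetical. The sketch also ignores strand flips, reference-allele changes, and multi-allelic records, which the abstract identifies as the hard cases.

```python
from typing import NamedTuple, Optional, Sequence


class AlignedBlock(NamedTuple):
    """Hypothetical ungapped alignment block between two assemblies."""
    old_start: int  # 1-based start on the legacy assembly
    new_start: int  # 1-based start on the updated assembly
    length: int     # number of aligned bases


def lift_position(pos: int, blocks: Sequence[AlignedBlock]) -> Optional[int]:
    """Map a legacy 1-based position to the new assembly.

    Returns None when the position lies outside every aligned block,
    i.e. the variant would be dropped by this toy conversion.
    """
    for block in blocks:
        offset = pos - block.old_start
        if 0 <= offset < block.length:
            return block.new_start + offset
    return None


# Illustrative blocks: legacy bases 101-110 have no counterpart in the
# new assembly, so everything downstream shifts by 10.
blocks = [AlignedBlock(1, 1, 100), AlignedBlock(111, 101, 50)]

print(lift_position(50, blocks))   # inside the first block
print(lift_position(120, blocks))  # inside the second block, shifted
print(lift_position(105, blocks))  # in the gap, so dropped
```

An indel is harder to convert than a single position because its reference allele spans several bases, and these can straddle a block boundary or differ between assemblies.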

20.

Statistical framework to determine indel-length distribution.

Wygoda, Elya; Loewenthal, Gil; Moshe, Asher; Alburquerque, Michael; Mayrose, Itay; Pupko, Tal.

Bioinformatics ; 40(2), 2024 02 01.

Article En | MEDLINE | ID: mdl-38269647

MOTIVATION: Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels were shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably for alignment software. Previous studies counted the number of continuous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased due to the assumption of alignment methods that indel lengths follow a geometric distribution. RESULTS: We aimed to determine which indel-length distribution best characterizes alignments using statistically rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test if current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in Github, at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.

Algorithms , Software , Bayes Theorem , Sequence Alignment , INDEL Mutation , Evolution, Molecular
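
The gap-counting approach that the abstract above describes, and criticizes as not statistically rigorous, can be sketched in a few lines of Python. The sketch collects runs of gap characters from aligned rows and compares the log-likelihood of a geometric length model with that of a truncated Zipf model. It is not the authors' method, which relies on bias correction and Approximate Bayesian Computation. The function names, the truncation at 50, and the grid search over the exponent are assumptions made for this example.

```python
import math
import re
from collections import Counter


def gap_run_lengths(aligned_rows):
    """Lengths of contiguous '-' runs in each aligned sequence row."""
    lengths = []
    for row in aligned_rows:
        lengths.extend(len(m.group()) for m in re.finditer(r"-+", row))
    return lengths


def geometric_loglik(lengths):
    """Log-likelihood under P(L=k) = (1-p)^(k-1) * p with the MLE p = 1/mean."""
    p = len(lengths) / sum(lengths)
    if p == 1.0:  # every run has length 1, so the likelihood is exactly 1
        return 0.0
    return sum((k - 1) * math.log(1 - p) + math.log(p) for k in lengths)


def zipf_loglik(lengths, max_len=50):
    """Best log-likelihood and exponent for a truncated Zipf, P(L=k) ~ k^-a.

    The exponent a is chosen by a coarse grid search over 1.01..5.00
    (an assumption of this sketch, not a recommended estimator).
    """
    if max(lengths) > max_len:
        raise ValueError("a gap run exceeds the truncation length")
    counts = Counter(lengths)
    best_ll, best_a = -math.inf, None
    for step in range(101, 501):
        a = step / 100
        log_norm = math.log(sum(k ** -a for k in range(1, max_len + 1)))
        ll = sum(c * (-a * math.log(k) - log_norm) for k, c in counts.items())
        if ll > best_ll:
            best_ll, best_a = ll, a
    return best_ll, best_a


rows = ["AC--GT-A", "A---CGTA"]  # toy alignment rows
runs = gap_run_lengths(rows)
print(runs)
print(geometric_loglik(runs), zipf_loglik(runs))
```

As the abstract notes, a gap block is not necessarily a single indel event, and alignments are themselves biased toward geometric lengths. A comparison like this one should therefore be read as a rough descriptive check only.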