Results 1-20 of 12,587
1.
Sci Data ; 11(1): 840, 2024 Aug 03.
Article in English | MEDLINE | ID: mdl-39097649

ABSTRACT

Recent advancements in sequencing and genome assembly technologies have led to the rapid generation of high-quality genome assemblies for various species and breeds. Despite the importance of minipigs as an animal model in biomedical research, the construction of high-quality minipig genome assemblies still lags behind that of other pig breeds. To address this problem, we constructed a high-quality chromosome-level genome assembly of the Korean minipig (KMP) utilizing multiple types of sequencing reads and reference genomes. The KMP assembly included 19 chromosome-level sequences with a total length of 2.52 Gb and an N50 of 137 Mb. Comparative analyses with the pig reference genome (Sscrofa11.1) demonstrated comparable contiguity and completeness of the KMP assembly. Additionally, genome annotation analyses identified 22,666 protein-coding genes and repetitive elements occupying 40.10% of the genome. The KMP assembly and genome annotation provide valuable resources for future research on minipigs and other pig breeds.
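The N50 values quoted here and in several other abstracts on this page follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches at least half of the assembly size. A minimal Python sketch of that definition, run on made-up lengths rather than on the study's data:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences whose combined length covers at least half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # not reached for non-empty, non-negative input


# Hypothetical scaffold lengths in Mb (illustrative only).
print(n50([137, 120, 90, 60, 30, 10]))  # 120
```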


Subject(s)
Genome , Swine, Miniature , Animals , Swine, Miniature/genetics , Swine/genetics , Sus scrofa/genetics , Molecular Sequence Annotation , Chromosomes
2.
Sci Data ; 11(1): 850, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39117633

ABSTRACT

Rhabdophis nuchalis, a snake widely distributed in China, possesses a unique trait: glands beneath the skin of its neck and back, known as nucho-dorsal glands. These features make it a valuable subject for studying genetic diversity and the evolution of complex traits. In this study, we obtained a high-quality chromosome-level reference genome of R. nuchalis using MGI short-read sequencing, PacBio Revio long-read sequencing, and Hi-C sequencing. The final 1.92 Gb assembly was anchored to 20 chromosomes (9 macrochromosomes and 11 microchromosomes), with a contig N50 of 104.79 Mb, a scaffold N50 of 204.96 Mb, and a BUSCO completeness of 97.50%. Additionally, we annotated a total of 1.09 Gb of repetitive sequences (56.51% of the entire genome) and identified 22,057 protein-coding genes. This high-quality reference genome provides essential genomic data for understanding the genetic diversity and evolutionary history of R. nuchalis, as well as for supporting species conservation efforts and comparative genomics studies.


Subject(s)
Chromosomes , Genome , Animals , Molecular Sequence Annotation , Snakes/genetics
3.
Sci Data ; 11(1): 866, 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39127825

ABSTRACT

Anthocidaris crassispina is a popular edible sea urchin distributed along the coast of the South China Sea. In this study, we performed whole-genome sequencing and generated a chromosome-level assembly of this species. Assembly with Hifiasm yielded contigs with a total length of 891.02 Mb and a contig N50 of 808.15 kb. A Hi-C library was constructed and sequenced, yielding approximately 68.61 Gb of data. After Hi-C scaffolding, approximately 886.72 Mb of sequence was anchored to 21 chromosomes, accounting for 99.52% of the total genome length. Of the sequence anchored to chromosomes, approximately 826.82 Mb (93.24%) could be both ordered and oriented. These results provide valuable resources for further genetic studies of A. crassispina.


Subject(s)
Molecular Sequence Annotation , Sea Urchins , Animals , Sea Urchins/genetics , Genome , Whole Genome Sequencing , Chromosomes , China
4.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-39110622

ABSTRACT

BACKGROUND: Rhododendron nivale subsp. boreale Philipson et M. N. Philipson is an alpine woody species with ornamental qualities that is the predominant species in mountainous scrub habitats at altitudes of ∼4,200 m. As a high-altitude woody polyploid, this species may serve as a model for understanding how plants adapt to alpine environments. Despite its ecological significance, the lack of genomic resources has hindered a comprehensive understanding of its evolutionary and adaptive characteristics in high-altitude mountainous environments. FINDINGS: We sequenced and assembled the genome of R. nivale subsp. boreale, the first assembly in the subgenus Rhododendron and the first of a high-altitude woody flowering tetraploid, contributing an important genomic resource for alpine woody flora. The assembly included 52 pseudochromosomes (scaffold N50 = 42.93 Mb; BUSCO = 98.8%; QV = 45.51; S-AQI = 98.69) belonging to 4 haplotypes and harboring 127,810 predicted protein-coding genes. Together, k-mer analysis, collinearity assessment, and phylogenetic investigation corroborated its autotetraploid identity. Comparative genomic analysis revealed that R. nivale subsp. boreale originated as a neopolyploid of R. nivale and underwent 2 rounds of ancient polyploidy. Transcriptional expression analysis showed that expression differences between alleles were common and randomly distributed across the genome. We identified expanded gene families and signatures of positive selection involved not only in adaptation to the mountaintop ecosystem (stress response and developmental regulation) but also in autotetraploid reproduction (meiotic stabilization). Additionally, the expression levels of group VII ethylene response factor transcription factors (ERF VIIs) were significantly higher than mean global gene expression. We suspect that these changes have enabled the success of this species at high altitudes.
CONCLUSIONS: We assembled the first high-altitude autopolyploid genome and achieved a chromosome-level assembly within the subgenus Rhododendron. In addition, we proposed a plausible high-altitude adaptation strategy for R. nivale subsp. boreale. This study provides valuable data for exploring alpine mountaintop adaptations and the correlation between extreme environments and species polyploidization.


Subject(s)
Altitude , Genome, Plant , Haplotypes , Phylogeny , Rhododendron , Tetraploidy , Rhododendron/genetics , Adaptation, Physiological/genetics , Molecular Sequence Annotation , Polyploidy , Gene Expression Regulation, Plant
5.
BMC Genomics ; 25(1): 775, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39118001

ABSTRACT

BACKGROUND: Appropriate regulation of genes expressed in oocytes and embryos is essential for the acquisition of developmental competence in mammals. Here, we hypothesized that several genes expressed in oocytes and pre-implantation embryos remain unknown. Our goal was to reconstruct the transcriptome of oocytes (germinal vesicle and metaphase II) and pre-implantation cattle embryos (blastocysts) using short-read and long-read sequences to identify putative new genes. RESULTS: We identified 274,342 transcript sequences, and 3,033 of those loci did not match a gene present in official annotations and are thus potential new genes. Notably, 63.67% (1,931/3,033) of the potential new genes exhibited coding potential. Also noteworthy, 97.92% of the putative new genes overlapped annotated transposable elements. Comparative analysis of transcript abundance identified 1,840 novel genes (recently added to the annotation) or potential new genes that were differentially expressed between developmental stages (FDR < 0.01). We also determined that 522 novel or potential new genes (448 and 34, respectively) were upregulated in eight-cell embryos compared to oocytes (FDR < 0.01). In eight-cell embryos, 102 novel or putative new genes were co-expressed (|r| > 0.85, P < 1 × 10⁻⁸) with several genes annotated with gene ontology biological processes related to pluripotency maintenance and embryo development. CRISPR-Cas9 genome editing confirmed that disruption of one of the novel genes highly expressed in eight-cell embryos (ENSBTAG00000068261) reduced blastocyst development (P = 1.55 × 10⁻⁷). CONCLUSIONS: Our results revealed several putative new genes that need careful annotation. Many of the putative new genes are dynamically regulated during pre-implantation development and are important components of gene regulatory networks involved in pluripotency and blastocyst formation.
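The co-expression criterion above combines a correlation cutoff with a significance threshold. The sketch below reproduces only the |r| > 0.85 part using Pearson's coefficient; Pearson is an assumption (the abstract does not name the coefficient), the P-value filter is not computed, and the gene names and expression values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; treated as uncorrelated
    return cov / (sx * sy)

def coexpressed(query, others, threshold=0.85):
    """Names of genes whose absolute correlation with `query` exceeds threshold."""
    return sorted(name for name, expr in others.items()
                  if abs(pearson_r(query, expr)) > threshold)

# Hypothetical expression values across five eight-cell embryo samples.
candidate = [1.0, 2.0, 3.0, 4.0, 5.0]
known = {
    "geneA": [2.1, 3.9, 6.2, 8.0, 9.8],  # strongly positive
    "geneB": [5.0, 4.1, 2.9, 2.2, 1.0],  # strongly negative
    "geneC": [3.0, 1.0, 4.0, 1.0, 5.0],  # weak
}
print(coexpressed(candidate, known))  # ['geneA', 'geneB']
```

Using the absolute value keeps strongly anti-correlated genes, matching the |r| notation in the abstract.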


Subject(s)
Blastocyst , Embryonic Development , Gene Expression Regulation, Developmental , Oocytes , Animals , Cattle , Embryonic Development/genetics , Oocytes/metabolism , Blastocyst/metabolism , Transcriptome , Molecular Sequence Annotation , Gene Expression Profiling , Female
6.
BMC Genomics ; 25(1): 773, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118028

ABSTRACT

BACKGROUND: Fritillaria ussuriensis is an endangered medicinal plant known for its notable therapeutic properties. Unfortunately, its population has declined drastically due to the destruction of forest habitats. Protecting F. ussuriensis from extinction therefore poses a significant challenge, and a thorough understanding of its genetic basis is crucial. To date, the complete mitochondrial genome of F. ussuriensis has not been reported. RESULTS: The complete mitochondrial genome of F. ussuriensis was sequenced and assembled by integrating PacBio and Illumina sequencing technologies, revealing 13 circular chromosomes totaling 737,569 bp with an average GC content of 45.41%. A total of 55 genes were annotated in this mitogenome, including 2 rRNA genes, 12 tRNA genes, and 41 protein-coding genes (PCGs). The mitochondrial genome of F. ussuriensis contained 192 simple sequence repeats (SSRs) and 4,027 dispersed repeats. In the PCGs of the F. ussuriensis mitogenome, 90.00% of the codons with relative synonymous codon usage (RSCU) values greater than 1 were A- or U-ended. In addition, 505 RNA editing sites were predicted across these PCGs. Selective pressure analysis suggested negative selection on most PCGs to preserve mitochondrial function, with the notable exception of nad3, which showed positive selection. Comparison between the mitochondrial and chloroplast genomes of F. ussuriensis revealed 20 homologous fragments totaling 8,954 bp. Nucleotide diversity analysis revealed variation among genes, with atp9 showing the most notable variation. Despite the conservation of GC content, mitogenome sizes varied significantly among six closely related species, and collinearity analysis confirmed the lack of conservation in their genomic structures. Phylogenetic analysis indicated a close relationship between F. ussuriensis and Lilium tsingtauense. CONCLUSIONS: In this study, we sequenced and annotated the mitogenome of F. ussuriensis and compared it with the mitogenomes of other closely related species. In addition to characterizing its genomic features and evolutionary position, this study provides valuable genomic resources for further understanding and utilizing this medicinal plant.
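RSCU divides each codon's observed count by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark preferred codons. The sketch below computes RSCU and the share of preferred codons ending in A or U on a hypothetical coding sequence; the three-family codon table is a deliberately tiny subset chosen for illustration, not a full genetic code.

```python
from collections import Counter

# Toy subset of the standard genetic code (amino acid -> synonymous codons).
# Assumption for illustration: a real analysis would list every codon family
# and use the genetic code appropriate to plant mitochondria.
SYNONYMOUS = {
    "Phe": ["UUU", "UUC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
}

def codon_counts(cds):
    """Count codons in an in-frame coding sequence (DNA or RNA alphabet)."""
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def rscu(counts, families=SYNONYMOUS):
    """RSCU = observed codon count / mean count within its synonymous family."""
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid not observed: RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values

def au_ending_share(values):
    """Fraction of codons with RSCU > 1 whose third base is A or U."""
    preferred = [c for c, v in values.items() if v > 1]
    if not preferred:
        return 0.0
    return sum(c[-1] in "AU" for c in preferred) / len(preferred)

# Hypothetical CDS built from known codons so the counts are easy to check.
cds = "".join(["UUU", "UUU", "UUC", "AAA", "AAA", "AAG",
               "GGU", "GGU", "GGA", "GGC"])
values = rscu(codon_counts(cds))
print(round(values["UUU"], 3), au_ending_share(values))  # 1.333 1.0
```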


Subject(s)
Endangered Species , Fritillaria , Genome, Mitochondrial , Phylogeny , Plants, Medicinal , RNA Editing , Fritillaria/genetics , Plants, Medicinal/genetics , Base Composition , RNA, Transfer/genetics , Molecular Sequence Annotation
7.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39120646

ABSTRACT

Cell-type annotation is a critical step in single-cell data analysis. With the development of numerous cell annotation methods, it is necessary to evaluate these methods to help researchers use them effectively. Reference datasets are essential for evaluation, but currently the cell labels of reference datasets mainly come from computational methods, which may carry computational biases and may not reflect the actual cell types. In this study, we first constructed an experimentally labeled single-cell dataset of immune cell subtypes from a single batch and systematically evaluated 18 cell annotation methods. We assessed these methods under five scenarios: intra-dataset validation, immune cell-subtype validation, unsupervised clustering, inter-dataset annotation, and unknown cell-type prediction. Accuracy and the adjusted Rand index (ARI) were used as evaluation metrics. The results showed that SVM, scBERT, and scDeepSort were the best-performing supervised methods. Seurat was the best-performing unsupervised clustering method, but it could not fully capture the actual cell-type distribution. Our results indicate that experimentally labeled immune cell-subtype datasets reveal the deficiencies of unsupervised clustering methods and provide new dataset support for supervised methods.
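Accuracy requires predicted labels to share names with the reference labels, whereas the adjusted Rand index (ARI) compares two partitions regardless of label names and corrects pair-counting agreement for chance, which is why it suits unsupervised clustering. A minimal pure-Python sketch of both metrics on hypothetical cell labels; it uses the standard ARI formula and is not necessarily the exact implementation used in the study.

```python
from collections import Counter
from math import comb

def accuracy(labels_true, labels_pred):
    """Fraction of cells whose predicted label equals the reference label."""
    if len(labels_true) != len(labels_pred) or not labels_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(labels_true, labels_pred)) / len(labels_true)

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have equal length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # degenerate case: no pairs to compare
    # Number of cell pairs sharing a contingency cell, a true label, a predicted label.
    contingency = Counter(zip(labels_true, labels_pred))
    pairs_cells = sum(comb(c, 2) for c in contingency.values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (pairs_cells - expected) / (maximum - expected)

# Hypothetical reference and predicted labels for six cells.
truth = ["T", "T", "B", "B", "NK", "NK"]
pred = ["T", "T", "B", "NK", "NK", "NK"]
print(round(accuracy(truth, pred), 3), round(adjusted_rand_index(truth, pred), 3))
# 0.833 0.444
```

Renaming every predicted cluster (for example "T" to "cluster_0") leaves the ARI unchanged but drops accuracy to zero, which is the practical reason clustering methods are scored with ARI.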


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Cluster Analysis , Computational Biology/methods , Molecular Sequence Annotation , RNA-Seq/methods , Single-Cell Gene Expression Analysis
8.
Sci Data ; 11(1): 875, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138223

ABSTRACT

Flueggea virosa (Roxb. ex Willd.) Royle, an evergreen shrub or small tree in the Phyllanthaceae family, holds significant potential for garden landscaping and pharmacological applications. However, the lack of genomic data has hindered further scientific understanding of its horticultural and medicinal value. In this study, we assembled a haplotype-resolved genome of F. virosa for the first time. The two haploid genomes, haplotype A and haplotype B, are 487.33 Mb and 477.53 Mb in size, respectively, with contig N50 lengths of 31.45 Mb and 32.81 Mb. More than 99% of the assembled sequences were anchored to 13 pairs of pseudochromosomes. Furthermore, 21,587 and 21,533 protein-coding genes were predicted in the haplotype A and haplotype B genomes, respectively. This chromosome-level genome fills the gap in genomic data for F. virosa and provides valuable resources for molecular studies of this species, supporting future research on speciation, functional genomics, and comparative genomics within the Phyllanthaceae family.


Subject(s)
Genome, Plant , Chromosomes, Plant , Haplotypes , Molecular Sequence Annotation
9.
Genome Biol ; 25(1): 218, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138517

ABSTRACT

Genome sequencing has become a routine task for biologists, but gene structure annotation remains a challenge that impedes accurate genomic and genetic research. Here, we present a bioinformatics toolkit, SynGAP (Synteny-based Gene structure Annotation Polisher), which uses gene synteny information to accomplish precise and automated polishing of genome gene structure annotations. SynGAP offers exceptional capabilities for improving gene structure annotation quality and for profiling integrative gene synteny between species. Furthermore, an expression variation index was designed for comparative transcriptomic analysis to explore candidate genes responsible for the distinct traits observed in phylogenetically related species.


Subject(s)
Molecular Sequence Annotation , Synteny , Software , Computational Biology/methods , Genomics/methods , Animals
10.
Genome Biol Evol ; 16(8), 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39101619

ABSTRACT

Arabidopsis thaliana is a model system used throughout much of plant research. Recent efforts have focused on discovering the genomic variation found in naturally occurring ecotypes isolated from around the world. These ecotypes come from diverse climates and have therefore faced, and adapted to, a variety of abiotic and biotic stressors. Sequencing and comparative analysis of these genomes can offer insight into the adaptive strategies of plants. Although a large number of ecotype genome sequences are available, the majority were generated using short-read technology. Mapping short reads that contain structural variation to a reference genome lacking that variation leads to incorrect mapping, resulting in a loss of genetic information and the introduction of false heterozygosity. For this reason, long-read de novo sequencing is required to resolve structural variation events. In this article, we sequenced the genomes of eight natural variants of A. thaliana using nanopore sequencing. This resulted in highly contiguous assemblies with >95% of the genome contained within five contigs. The sequencing results include five ecotypes from relict and African populations, a source of untapped genetic diversity. This study expands our knowledge of diversity across A. thaliana ecotypes and contributes to the ongoing production of an A. thaliana pan-genome.


Subject(s)
Arabidopsis , Ecotype , Genome, Plant , Arabidopsis/genetics , Chromosomes, Plant/genetics , Molecular Sequence Annotation , Genetic Variation
11.
PLoS Comput Biol ; 20(8): e1011831, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39102416

ABSTRACT

Bacteriophages (phages) are viruses that infect bacteria. Many of them produce specific enzymes called depolymerases to break down external polysaccharide structures. Accurate annotation and domain identification of these depolymerases are challenging due to their inherent sequence diversity. Here, we present DepoScope, a machine learning tool that combines a fine-tuned ESM-2 model with a convolutional neural network to precisely identify depolymerase sequences and their enzymatic domains. To accomplish this, we curated a dataset from the INPHARED phage genome database, created a polysaccharide-degrading domain database, and applied sequential filters to construct a high-quality dataset, which was subsequently used to train DepoScope. Our work is the first approach that combines sequence-level predictions with amino-acid-level predictions for accurate depolymerase detection and functional domain identification. We believe that DepoScope can thereby greatly enhance our understanding of phage-host interactions at the level of depolymerases.


Subject(s)
Bacteriophages , Computational Biology , Bacteriophages/genetics , Bacteriophages/enzymology , Computational Biology/methods , Molecular Sequence Annotation , Viral Proteins/genetics , Viral Proteins/metabolism , Viral Proteins/chemistry , Neural Networks, Computer , Machine Learning , Software , Protein Domains , Genome, Viral/genetics , Carboxylic Ester Hydrolases/genetics , Carboxylic Ester Hydrolases/metabolism , Carboxylic Ester Hydrolases/chemistry
12.
Sci Data ; 11(1): 741, 2024 Jul 07.
Article in English | MEDLINE | ID: mdl-38972874

ABSTRACT

Our study presents the assembly of a high-quality Taihu goose genome at the telomere-to-telomere (T2T) level. Using multiple sequencing technologies, including Pacific Biosciences HiFi reads, Oxford Nanopore long reads, Illumina short reads, and chromatin conformation capture (Hi-C), we achieved a highly contiguous assembly. The T2T assembly has a total length of 1,197,991,206 bp, with a contig N50 of 33,928,929 bp and a scaffold N50 of 81,007,908 bp. It consists of 73 scaffolds, including 38 autosomes and one pair of Z/W sex chromosomes. Importantly, 33 autosomes were assembled without any gaps. Furthermore, gene annotation identified 34,898 genes with 436,162 RNA transcripts, encompassing 806,158 exons, 743,910 introns, 651,148 coding sequences (CDS), and 135,622 untranslated regions (UTRs). This T2T-level, chromosome-scale goose genome assembly provides a vital foundation for future genetic improvement and for understanding the genetic mechanisms underlying important traits in geese.
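Gap-free status is commonly checked by scanning each chromosome sequence for runs of the ambiguity code N, which scaffolders typically use to mark gaps. A minimal sketch under that assumption; the sequences are toy strings, and real pipelines may encode gaps differently (for example in AGP files).

```python
import re

# Assumption: assembly gaps are represented as runs of N/n in the sequence.
GAP_PATTERN = re.compile(r"[Nn]+")

def find_gaps(sequence):
    """Return 0-based, half-open (start, end) coordinates of every run of N."""
    return [(m.start(), m.end()) for m in GAP_PATTERN.finditer(sequence)]

def gap_free(chromosomes):
    """Return the sorted names of chromosomes that contain no N runs."""
    return sorted(name for name, seq in chromosomes.items() if not find_gaps(seq))

# Hypothetical chromosome sequences.
toy = {
    "chr1": "ACGTACGTTT",
    "chr2": "ACGTNNNNACGNT",
    "chr3": "GGCCAATT",
}
print(find_gaps(toy["chr2"]))  # [(4, 8), (11, 12)]
print(gap_free(toy))           # ['chr1', 'chr3']
```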


Subject(s)
Geese , Genome , Telomere , Animals , Geese/genetics , Telomere/genetics , Molecular Sequence Annotation
13.
Sci Data ; 11(1): 745, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982096

ABSTRACT

Black scorch disease (BSD), caused by the fungal pathogen Thielaviopsis punctulata (Tp) DSM102798, poses a significant threat to date palm cultivation in the United Arab Emirates (UAE). In this study, Chicago and Hi-C libraries were prepared as input for the Dovetail HiRise pipeline to scaffold the genome of Tp DSM102798. We generated an assembly with a total length of 28.23 Mb comprising 1,256 scaffolds; the assembly had a contig N50 of 18.56 kb, an L50 of three, and a BUSCO completeness score of 98.6% for 758 orthologous genes. Annotation of this assembly produced 7,169 genes and 3,501 Gene Ontology (GO) terms. Compared with five other Thielaviopsis genomes, Tp DSM102798 exhibited the highest contiguity, with a cumulative size of 27.598 Mb for the first seven scaffolds, surpassing the assemblies of all examined strains. These findings offer a foundation for targeted strategies to enhance date palm resistance to BSD and to foster more sustainable and resilient agricultural systems.


Subject(s)
Genome, Fungal , Molecular Sequence Annotation , Plant Diseases , Plant Diseases/microbiology , Plant Diseases/genetics , Arecaceae/genetics , Arecaceae/microbiology , United Arab Emirates
14.
Methods Mol Biol ; 2836: 3-17, 2024.
Article in English | MEDLINE | ID: mdl-38995532

ABSTRACT

Proteogenomics has revealed the translation of unannotated open reading frames (ORFs) present in mRNAs and in noncoding RNAs (ncRNAs). OpenProt annotates all ORFs of at least 30 codons in the transcriptomes of several species and displays many functional features associated with the corresponding proteins. Two types of proteins are annotated: reference (canonical) proteins, which are already annotated in UniProt, RefSeq, or Ensembl, and noncanonical proteins. Noncanonical proteins form two groups: predicted novel isoforms, which display a significant level of homology with a reference protein, and alternative proteins, which are new proteins with no significant homology to known proteins. This chapter describes how to check whether a gene and/or transcript contains multiple open reading frames and how to use OpenProt databases to detect alternative proteins and novel isoforms by mass spectrometry-based proteomics.
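As a rough illustration of the 30-codon threshold, the sketch below scans the three forward reading frames of a transcript for ATG-initiated ORFs. It assumes the minimum is counted from the start codon up to but excluding the stop codon, reports only the first ATG upstream of each stop in a frame, and ignores non-ATG starts; OpenProt's own ORF rules may differ on each of these points.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(transcript, min_codons=30):
    """Find ATG-initiated ORFs in the three forward frames of a transcript.

    Returns sorted (start, end, frame) tuples with 0-based, half-open
    coordinates; `end` includes the stop codon. Length is counted in codons
    from ATG up to, but excluding, the stop. Only the first ATG upstream of
    each in-frame stop is reported, so nested in-frame ATGs are not listed.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open an ORF at the first in-frame ATG
            elif codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF at the stop codon
    return sorted(orfs)

# Hypothetical transcript: 2-nt leader, ATG + 29 codons, TAA stop, 2-nt trailer.
transcript = "GC" + "ATG" + "GCT" * 29 + "TAA" + "CC"
print(find_orfs(transcript))  # [(2, 95, 2)]
```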


Subject(s)
Mass Spectrometry , Open Reading Frames , Proteome , Mass Spectrometry/methods , Proteomics/methods , Databases, Protein , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , Molecular Sequence Annotation , Proteogenomics/methods
15.
Methods Mol Biol ; 2836: 285-298, 2024.
Article in English | MEDLINE | ID: mdl-38995546

ABSTRACT

The Gene Ontology (GO) project describes the functions of the gene products of organisms from all kingdoms of life in a standardized way, enabling powerful analyses of genome-wide experiments. The scientific literature is used to convert experimental results into GO annotations that systematically classify the functions of gene products. However, because only a minor fraction of all genes has been characterized experimentally, multiple predictive methods for assigning GO annotations have been developed since the inception of GO. Sequence homology between novel genes and genes with known functions helps to approximate the roles of these uncharacterized genes. Here we describe the main sequence homology methods used to produce annotations: pairwise comparison (BLAST), protein profile models (InterPro), and phylogeny-based annotation (PAINT). Some of these methods can be run within genome analysis pipelines (BLAST and InterPro2GO), whereas PAINT is curated by the GO Consortium.


Subject(s)
Computational Biology , Gene Ontology , Molecular Sequence Annotation , Molecular Sequence Annotation/methods , Computational Biology/methods , Phylogeny , Software , Sequence Homology , Databases, Genetic , Humans
16.
Sci Data ; 11(1): 776, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003298

ABSTRACT

Fructus hippophae (Hippophae rhamnoides spp. mongolica × Hippophae rhamnoides sinensis), a hybrid sea buckthorn variety with Hippophae rhamnoides spp. mongolica as the female parent and Hippophae rhamnoides sinensis as the male parent, is a traditional plant with great economic and medicinal potential. Herein, we obtained a chromosome-level genome assembly of Fructus hippophae of about 918.59 Mb, with a scaffold N50 of 83.65 Mb. We then anchored 440 contigs, representing 97.17% of the total genome sequence, onto 12 pseudochromosomes. Next, de novo, homology-based, and transcriptome assembly strategies were adopted for gene structure prediction, which predicted 36,475 protein-coding genes, of which 36,226 could be functionally annotated. Several strategies were used for quality assessment; both the complete BUSCO score (98.80%) and the mapping rate indicated high assembly quality. Repetitive elements, occupying 63.68% of the genome, and 1,483,600 bp of non-coding RNA were annotated. Here, we provide genomic information on female plants of a popular variety, which can supply data for constructing a sea buckthorn pan-genome and for resolving the mechanism of sex differentiation.


Subject(s)
Chromosomes, Plant , Genome, Plant , Hippophae , Hippophae/genetics , Chromosomes, Plant/genetics , Transcriptome , Molecular Sequence Annotation
17.
Int J Mol Sci ; 25(13), 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-39000263

ABSTRACT

Cydia pomonella granulovirus (CpGV) is a natural pathogen of Cydia pomonella that is used as a biocontrol agent against its populations. The study of granulovirus virulence is of particular interest because resistance has developed in natural populations of C. pomonella during long-term use of the Mexican isolate of CpGV. In our study, we present the genomes of 18 CpGV strains endemic to southern Russia and Kazakhstan, as well as that of a strain included in the commercial preparation "Madex Twin", which were sequenced and analyzed. We performed comparative genomic analysis using several tools. Comparisons at the level of genes and protein products involved in the viral infection process identified synonymous and missense substitution variants. Average nucleotide identity demonstrated high similarity to other granulovirus genomes of different geographic origins. Whole-genome alignment of the 18 genomes relative to the reference revealed regions of low similarity. Analysis of gene repertoire variation showed that the BZR GV 4, BZR GV 6, and BZR GV L-7 strains were closest in gene content to the commercial "Madex Twin" strain. Using read-depth coverage data, we confirmed two deletions in granuloviruses BZR GV L-4 and BZR GV L-6, in regions where homology analysis showed missing genes; however, these deletions are not related to the known genes responsible for viral pathogenicity. Thus, we isolated novel CpGV strains and analyzed their potential for producing highly effective bioinsecticides against C. pomonella.
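Confirming a deletion from read-depth data amounts to finding stretches where coverage drops to (near) zero over a meaningful length. The sketch below finds such runs in a per-position depth list, for example one parsed from `samtools depth -a` output, which also reports zero-coverage positions. The thresholds `max_depth` and `min_length` are hypothetical defaults, not values from the study.

```python
def low_coverage_runs(depths, max_depth=0, min_length=100):
    """Return half-open (start, end) runs where depth <= max_depth.

    Only runs spanning at least `min_length` consecutive positions are kept.
    Coordinates are 0-based indices into `depths`.
    """
    runs = []
    start = None
    for pos, depth in enumerate(depths):
        if depth <= max_depth:
            if start is None:
                start = pos  # entering a low-coverage stretch
        elif start is not None:
            if pos - start >= min_length:
                runs.append((start, pos))
            start = None  # leaving the stretch
    # A low-coverage stretch may run to the end of the genome.
    if start is not None and len(depths) - start >= min_length:
        runs.append((start, len(depths)))
    return runs

# Hypothetical per-base depths: a 120-bp zero-coverage block and a short dip.
depths = [30] * 50 + [0] * 120 + [28] * 40 + [0] * 10 + [31] * 5
print(low_coverage_runs(depths))  # [(50, 170)]
```

The minimum-length filter is what separates a candidate deletion from short dropouts caused by uneven coverage.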


Subject(s)
Genome, Viral , Granulovirus , Moths , Phylogeny , Granulovirus/genetics , Granulovirus/pathogenicity , Granulovirus/classification , Animals , Moths/virology , Molecular Sequence Annotation
18.
Genome Biol ; 25(1): 170, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38951884

ABSTRACT

Microbial pangenome analysis identifies the genes present in or absent from prokaryotic genomes. However, current tools are limited when analyzing species with higher sequence diversity or higher taxonomic ranks such as genera or families. The Roary ILP Bacterial core Annotation Pipeline (RIBAP) uses an integer linear programming approach to refine gene clusters predicted by Roary for identifying core genes. RIBAP successfully handles the complexity and diversity of Chlamydia, Klebsiella, Brucella, and Enterococcus genomes, outperforming other established and recent pangenome tools for identifying all-encompassing core genes at the genus level. RIBAP is a freely available Nextflow pipeline at github.com/hoelzer-lab/ribap and zenodo.org/doi/10.5281/zenodo.10890871.
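Once genes have been clustered across genomes, core genes are those whose cluster contains a member from every genome. The sketch below shows only that final classification step on a hypothetical presence/absence table; it does not reproduce Roary's clustering or RIBAP's integer linear programming refinement, and the `min_fraction` parameter for a relaxed (soft-core) cutoff is an illustrative addition.

```python
def split_pangenome(presence, genomes, min_fraction=1.0):
    """Classify gene clusters as core or accessory.

    `presence` maps a cluster name to the set of genomes containing it.
    A cluster is core if it occurs in at least `min_fraction` of `genomes`
    (1.0 = strict core, present in every genome).
    """
    genome_set = set(genomes)
    threshold = min_fraction * len(genome_set)
    core, accessory = [], []
    for cluster in sorted(presence):
        hits = len(genome_set & set(presence[cluster]))
        (core if hits >= threshold else accessory).append(cluster)
    return core, accessory

# Hypothetical clusters across three genomes of one genus.
genomes = ["genomeA", "genomeB", "genomeC"]
presence = {
    "cluster1": {"genomeA", "genomeB", "genomeC"},
    "cluster2": {"genomeA", "genomeC"},
    "cluster3": {"genomeA", "genomeB", "genomeC"},
}
print(split_pangenome(presence, genomes))
# (['cluster1', 'cluster3'], ['cluster2'])
```

The hard part that tools like RIBAP address lies upstream: if divergent orthologs are split into separate clusters, a truly core gene looks accessory in this final step.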


Subject(s)
Genome, Bacterial , Molecular Sequence Annotation , Software , Brucella/genetics , Brucella/classification , Bacteria/genetics , Bacteria/classification , Chlamydia/genetics , Enterococcus/genetics , Klebsiella/genetics
19.
Nat Commun ; 15(1): 5573, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956036

ABSTRACT

Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of transposable elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness, requiring extensive manual editing. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, a state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are inserted within crucial genes and directly alter gene expression. A Nextflow version of HiTE is also available, offering enhanced parallelism, reproducibility, and portability.


Subject(s)
DNA Transposable Elements , Molecular Sequence Annotation , DNA Transposable Elements/genetics , Molecular Sequence Annotation/methods , Animals , Software , Humans , Reproducibility of Results , Computational Biology/methods , Databases, Genetic , Algorithms , Genome/genetics
20.
Sci Rep ; 14(1): 15228, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956286

ABSTRACT

To identify the key genes underlying weed control by Trichoderma polysporum at the genomic level, we extracted genomic DNA and sequenced the whole genome of T. polysporum strain HZ-31 on the Illumina HiSeq platform. The raw data were cleaned with Trimmomatic and quality-checked with FastQC. The sequencing data were assembled using SPAdes, and GeneMark was used for gene prediction on the assembly. The genome of T. polysporum HZ-31 was 39,325,746 bp in size, with 48% GC content, and encoded 11,998 genes. A total of 148 tRNAs and 45 rRNAs were predicted. A total of 782 genes were annotated in the Carbohydrase Database, 757 genes were annotated in the Pathogen-Host Interaction Database, and 67 gene clusters were identified. In addition, 1,023 genes were predicted to encode proteins with signal peptides. The annotation and functional analysis of the whole-genome sequence of T. polysporum HZ-31 provide a basis for in-depth study of the molecular mechanism of its herbicidal action and for its more effective use in weed control.
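GC content is the fraction of G and C among the scored bases of the assembly. The sketch below excludes ambiguous bases such as N from the denominator, which is a common choice but an assumption here; the toy contigs are not from the HZ-31 assembly.

```python
def gc_content(sequence):
    """Percent G+C among unambiguous bases (A, C, G, T); N and others ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)

def assembly_gc(contigs):
    """Pooled GC content over all contigs (every scored base counts once)."""
    return gc_content("".join(contigs))

# Hypothetical contigs.
contigs = ["ATGCNNGC", "AATT", "GGCC"]
print(round(assembly_gc(contigs), 2))  # 57.14
```

Pooling the bases, rather than averaging per-contig percentages, keeps short contigs from distorting the genome-wide value.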


Subject(s)
Genome, Fungal , Trichoderma , Whole Genome Sequencing , Trichoderma/genetics , Whole Genome Sequencing/methods , Molecular Sequence Annotation , Base Composition , Fungal Proteins/genetics , Host-Pathogen Interactions/genetics