Results 1 - 20 of 12,590
1.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-39110622

ABSTRACT

BACKGROUND: Rhododendron nivale subsp. boreale Philipson et M. N. Philipson is an alpine woody species with ornamental qualities that serves as the predominant species in mountainous scrub habitats at an altitude of ∼4,200 m. As a high-altitude woody polyploid, this species may serve as a model for understanding how plants adapt to alpine environments. Despite its ecological significance, the lack of genomic resources has hindered a comprehensive understanding of its evolutionary and adaptive characteristics in high-altitude mountainous environments. FINDINGS: We sequenced and assembled the genome of R. nivale subsp. boreale, the first assembly in the subgenus Rhododendron and the first of a high-altitude woody flowering tetraploid, contributing an important genomic resource for alpine woody flora. The assembly comprised 52 pseudochromosomes (scaffold N50 = 42.93 Mb; BUSCO = 98.8%; QV = 45.51; S-AQI = 98.69) belonging to 4 haplotypes and harboring 127,810 predicted protein-coding genes. Combined k-mer analysis, collinearity assessment, and phylogenetic investigation corroborated its autotetraploid identity. Comparative genomic analysis revealed that R. nivale subsp. boreale originated as a neopolyploid of R. nivale and underwent 2 rounds of ancient polyploidy events. Transcriptional expression analysis showed that expression differences between alleles were common and randomly distributed across the genome. We identified expanded gene families and signatures of positive selection involved not only in adaptation to the mountaintop ecosystem (stress response and developmental regulation) but also in autotetraploid reproduction (meiotic stabilization). Additionally, the expression levels of group VII ethylene response factor transcription factors (ERF VIIs) were significantly higher than mean global gene expression. We suspect that these changes have enabled the success of this species at high altitudes.
CONCLUSIONS: We assembled the first high-altitude autopolyploid genome and achieved a chromosome-level assembly within the subgenus Rhododendron. In addition, we propose a plausible high-altitude adaptation strategy for R. nivale subsp. boreale. This study provides valuable data for exploring alpine mountaintop adaptations and the correlation between extreme environments and species polyploidization.
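K-mer analyses of polyploid genomes start from a k-mer frequency spectrum, in which k-mers shared by different numbers of haplotypes form peaks at different multiples of the per-haplotype coverage. A minimal sketch of counting canonical k-mers and building that spectrum, assuming error-free reads given as plain strings (function names are illustrative; on real data a dedicated counter such as Jellyfish or KMC would be used, and the study's own pipeline is not described here):

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_spectrum(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))
```

For example, `kmer_spectrum(count_kmers(["ACGTACGT", "ACGTT"], 3))` gives `{1: 1, 2: 1, 6: 1}`: ACG and its reverse complement CGT collapse into one canonical k-mer seen 6 times.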


Subject(s)
Altitude , Genome, Plant , Haplotypes , Phylogeny , Rhododendron , Tetraploidy , Rhododendron/genetics , Adaptation, Physiological/genetics , Molecular Sequence Annotation , Polyploidy , Gene Expression Regulation, Plant
2.
Sci Data ; 11(1): 840, 2024 Aug 03.
Article in English | MEDLINE | ID: mdl-39097649

ABSTRACT

Recent advancements in sequencing and genome assembly technologies have led to the rapid generation of high-quality genome assemblies for various species and breeds. Despite the importance of minipigs as an animal model in biomedical research, the construction of high-quality minipig genome assemblies still lags behind that of other pig breeds. To address this problem, we constructed a high-quality chromosome-level genome assembly of the Korean minipig (KMP) using multiple types of sequencing reads and reference genomes. The KMP assembly included 19 chromosome-level sequences with a total length of 2.52 Gb and an N50 of 137 Mb. Comparative analyses with the pig reference genome (Sscrofa11.1) demonstrated comparable contiguity and completeness of the KMP assembly. Additionally, genome annotation identified 22,666 protein-coding genes and repetitive elements occupying 40.10% of the genome. The KMP assembly and genome annotation provide valuable resources for future research on minipigs and other pig breeds.
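N50, quoted for most of the assemblies in these results, is the length of the shortest sequence among the longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that definition (the helper name `n50` is illustrative; lengths can be in any consistent unit):

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 8, 10])` returns 6: the total is 38, and 10 + 8 + 6 = 24 is the first running sum to reach half of it (19).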


Subject(s)
Genome , Swine, Miniature , Animals , Swine, Miniature/genetics , Suidae/genetics , Sus scrofa/genetics , Molecular Sequence Annotation , Chromosomes
3.
Sci Data ; 11(1): 850, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39117633

ABSTRACT

Rhabdophis nuchalis, a snake widely distributed in China, possesses a unique trait: glands beneath the skin of its neck and back, known as nucho-dorsal glands. This feature makes it a valuable subject for studying genetic diversity and the evolution of complex traits. In this study, we obtained a high-quality chromosome-level reference genome of R. nuchalis using MGI short-read sequencing, PacBio Revio long-read sequencing, and Hi-C sequencing. The final assembly comprised 1.92 Gb of the R. nuchalis genome, anchored to 20 chromosomes (9 macrochromosomes and 11 microchromosomes), with a contig N50 of 104.79 Mb, a scaffold N50 of 204.96 Mb, and a BUSCO completeness of 97.50%. Additionally, we annotated a total of 1.09 Gb of repetitive sequences (56.51% of the entire genome) and identified 22,057 protein-coding genes. This high-quality reference genome provides essential genomic data for understanding the genetic diversity and evolutionary history of the species and for supporting species conservation efforts and comparative genomics studies.


Subject(s)
Chromosomes , Genome , Animals , Molecular Sequence Annotation , Snakes/genetics
4.
BMC Genomics ; 25(1): 773, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118028

ABSTRACT

BACKGROUND: Fritillaria ussuriensis is an endangered medicinal plant known for its notable therapeutic properties. Unfortunately, its population has declined drastically due to the destruction of forest habitats, so effectively protecting F. ussuriensis from extinction poses a significant challenge that requires a profound understanding of its genetic foundation. To date, the complete mitochondrial genome of F. ussuriensis has not been reported. RESULTS: The complete mitochondrial genome of F. ussuriensis was sequenced and assembled by integrating PacBio and Illumina sequencing technologies, revealing 13 circular chromosomes totaling 737,569 bp with an average GC content of 45.41%. A total of 55 genes were annotated in this mitogenome, including 2 rRNA genes, 12 tRNA genes, and 41 protein-coding genes (PCGs). The mitogenome contained 192 simple sequence repeats (SSRs) and 4,027 dispersed repeats. In its PCGs, 90.00% of the codons with relative synonymous codon usage (RSCU) values greater than 1 ended in A or U. In addition, 505 RNA editing sites were predicted across these PCGs. Selective pressure analysis suggested negative selection on most PCGs to preserve mitochondrial function, with the notable exception of nad3, which showed positive selection. Comparison of the mitochondrial and chloroplast genomes of F. ussuriensis revealed 20 homologous fragments totaling 8,954 bp. Nucleotide diversity analysis revealed variation among genes, with atp9 the most notable. Despite conserved GC content, mitogenome sizes varied significantly among six closely related species, and collinearity analysis confirmed the lack of conservation in their genomic structures. Phylogenetic analysis indicated a close relationship between F. ussuriensis and Lilium tsingtauense. CONCLUSIONS: In this study, we sequenced and annotated the mitogenome of F. ussuriensis and compared it with the mitogenomes of other closely related species.
Beyond describing its genomic features and evolutionary position, this study provides valuable genomic resources for further understanding and utilizing this medicinal plant.
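RSCU is the observed count of a codon divided by the count expected if all synonymous codons for its amino acid were used equally, so values above 1 mark preferred codons. The sketch below computes RSCU from coding sequences, assuming the standard genetic code (NCBI translation table 1) and ignoring RNA editing, then reports the share of preferred codons ending in A or U (T in DNA), the statistic quoted above. Function names are illustrative, not taken from the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group synonymous codons by amino acid ("*" marks stop codons).
SYNONYMS = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS[_aa].append(_codon)


def rscu(coding_sequences):
    """Return {codon: RSCU} for every amino acid observed at least once."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        # Read in-frame codons; a trailing partial codon is ignored.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    values = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # count under uniform synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


def fraction_a_or_t_ending(rscu_values):
    """Fraction of preferred codons (RSCU > 1) whose third base is A or T/U."""
    preferred = [c for c, v in rscu_values.items() if v > 1]
    if not preferred:
        return 0.0
    return sum(c[2] in "AT" for c in preferred) / len(preferred)
```

With the toy sequence GCT GCT GCA GCC (all alanine), GCT gets an RSCU of 2.0, GCA and GCC 1.0, and GCG 0.0, so the only preferred codon ends in T.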


Subject(s)
Endangered Species , Fritillaria , Genome, Mitochondrial , Phylogeny , Plants, Medicinal , RNA Editing , Fritillaria/genetics , Plants, Medicinal/genetics , Base Composition , RNA, Transfer/genetics , Molecular Sequence Annotation
5.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39120646

ABSTRACT

Cell-type annotation is a critical step in single-cell data analysis. With the development of numerous cell annotation methods, these methods need to be evaluated to help researchers use them effectively. Reference datasets are essential for evaluation, but currently the cell labels of reference datasets mainly come from computational methods, which may carry computational biases and may not reflect actual cell types. This study first constructed an experimentally labeled immune cell-subtype single-cell dataset from a single batch and systematically evaluated 18 cell annotation methods. We assessed these methods under five scenarios: intra-dataset validation, immune cell-subtype validation, unsupervised clustering, inter-dataset annotation, and unknown cell-type prediction. Accuracy and the adjusted Rand index (ARI) were used as evaluation metrics. The results showed that SVM, scBERT, and scDeepSort were the best-performing supervised methods. Seurat was the best-performing unsupervised clustering method, but it could not fully recover the actual cell-type distribution. Our results indicate that experimentally labeled immune cell-subtype datasets reveal the deficiencies of unsupervised clustering methods and provide new dataset support for supervised methods.
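Accuracy is the fraction of cells whose predicted label matches the experimental label, while ARI compares two partitions regardless of how clusters are named and corrects pair-counting agreement for chance. A minimal pure-Python sketch of both metrics (function names are illustrative; scikit-learn's `adjusted_rand_score` is a widely used implementation of ARI):

```python
from collections import Counter
from math import comb


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the reference label."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and equally long")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be equally long")
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pairs of cells placed together in both labelings, in the reference, and in the prediction.
    pair_sum = sum(comb(c, 2) for c in Counter(zip(true_labels, predicted_labels)).values())
    true_sum = sum(comb(c, 2) for c in Counter(true_labels).values())
    pred_sum = sum(comb(c, 2) for c in Counter(predicted_labels).values())
    expected = true_sum * pred_sum / comb(n, 2)
    maximum = (true_sum + pred_sum) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put every cell in one cluster
    return (pair_sum - expected) / (maximum - expected)
```

Relabeling clusters does not change ARI: `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0, whereas `accuracy` on the same lists is 0.0, which illustrates why ARI suits unsupervised clustering, where cluster names are arbitrary.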


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Cluster Analysis , Computational Biology/methods , Molecular Sequence Annotation , RNA-Seq/methods , Single-Cell Gene Expression Analysis
6.
Sci Data ; 11(1): 866, 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39127825

ABSTRACT

Anthocidaris crassispina is a popular edible sea urchin distributed along the coast of the South China Sea. In this study, we performed whole-genome sequencing and generated a chromosome-level assembly of this species. Assembly with Hifiasm yielded contigs totaling 891.02 Mb, with a contig N50 of 808.15 kb. A Hi-C library was constructed and sequenced, yielding approximately 68.61 Gb of data. After Hi-C scaffolding, approximately 886.72 Mb of sequence could be anchored to 21 chromosomes, accounting for 99.52% of the total genome length. Of the sequence anchored to chromosomes, approximately 826.82 Mb (93.24% of the anchored length) could be ordered and oriented. These results provide valuable resources for further genetic study of A. crassispina.


Subject(s)
Molecular Sequence Annotation , Echinoidea , Animals , Echinoidea/genetics , Genome , Whole Genome Sequencing , Chromosomes , China
7.
BMC Genomics ; 25(1): 775, 2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39118001

ABSTRACT

BACKGROUND: Appropriate regulation of genes expressed in oocytes and embryos is essential for the acquisition of developmental competence in mammals. Here, we hypothesized that several genes expressed in oocytes and pre-implantation embryos remain unknown. Our goal was to reconstruct the transcriptome of oocytes (germinal vesicle and metaphase II) and pre-implantation cattle embryos (blastocysts) using short-read and long-read sequences to identify putative new genes. RESULTS: We identified 274,342 transcript sequences, and 3,033 of those loci do not match a gene present in official annotations and are thus potential new genes. Notably, 63.67% (1,931/3,033) of the potential new genes exhibited coding potential. Also noteworthy, 97.92% of the putative new genes overlapped annotated transposable elements. Comparative analysis of transcript abundance identified 1,840 novel genes (recently added to the annotation) or potential new genes that were differentially expressed between developmental stages (FDR < 0.01). We also determined that 522 novel or potential new genes (448 and 34, respectively) were upregulated in eight-cell embryos compared with oocytes (FDR < 0.01). In eight-cell embryos, 102 novel or putative new genes were co-expressed (|r| > 0.85, P < 1 × 10⁻⁸) with several genes annotated with gene ontology biological processes related to pluripotency maintenance and embryo development. CRISPR-Cas9 genome editing confirmed that disruption of one of the novel genes highly expressed in eight-cell embryos (ENSBTAG00000068261) reduced blastocyst development (P = 1.55 × 10⁻⁷). CONCLUSIONS: Our results revealed several putative new genes that need careful annotation. Many of the putative new genes are dynamically regulated during pre-implantation development and are important components of gene regulatory networks involved in pluripotency and blastocyst formation.
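The co-expression criterion above (|r| > 0.85) can be sketched as a Pearson correlation filter between one gene's expression profile and every other gene's profile across samples. This is a minimal illustration with hypothetical gene names and values; the P-value threshold used in the study is not reproduced here.

```python
from math import isnan, nan, sqrt


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either profile has zero variance."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return nan
    return sxy / sqrt(sxx * syy)


def coexpressed_with(target, expression, threshold=0.85):
    """Return {gene: r} for genes whose |r| with `target` exceeds `threshold`."""
    partners = {}
    for gene, profile in expression.items():
        if gene == target:
            continue
        r = pearson(expression[target], profile)
        if not isnan(r) and abs(r) > threshold:
            partners[gene] = r
    return partners
```

With hypothetical profiles `novel = [1, 2, 3, 4]`, `geneA = [2, 4, 6, 8]`, `geneB = [4, 3, 2, 1]` and `geneC = [1, 3, 2, 4]`, `coexpressed_with("novel", ...)` keeps geneA (r = 1.0) and geneB (r = -1.0) but drops geneC (r = 0.8).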


Subject(s)
Blastocyst , Embryonic Development , Gene Expression Regulation, Developmental , Oocytes , Animals , Cattle , Embryonic Development/genetics , Oocytes/metabolism , Blastocyst/metabolism , Transcriptome , Molecular Sequence Annotation , Gene Expression Profiling , Female
8.
Genome Biol Evol ; 16(8), 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39101619

ABSTRACT

The plant Arabidopsis thaliana is a model system used throughout much of plant research. Recent efforts have focused on discovering the genomic variation found in naturally occurring ecotypes isolated from around the world. These ecotypes come from diverse climates and have therefore faced and adapted to a variety of abiotic and biotic stressors. Sequencing and comparative analysis of these genomes can offer insight into the adaptive strategies of plants. Although a large number of ecotype genome sequences are available, most were generated using short-read technology. Mapping short reads that contain structural variation to a reference genome lacking that variation leads to incorrect mapping of those reads, resulting in a loss of genetic information and the introduction of false heterozygosity. For this reason, long-read de novo sequencing is required to resolve structural variation events. In this article, we sequenced the genomes of eight natural variants of A. thaliana using nanopore sequencing. This resulted in highly contiguous assemblies, with >95% of the genome contained within five contigs. The sequencing results from this study include five ecotypes from relict and African populations, an area of untapped genetic diversity. This study expands knowledge of diversity across A. thaliana ecotypes and contributes to the ongoing production of an A. thaliana pan-genome.


Subject(s)
Arabidopsis , Ecotype , Genome, Plant , Arabidopsis/genetics , Chromosomes, Plant/genetics , Molecular Sequence Annotation , Genetic Variation
9.
PLoS Comput Biol ; 20(8): e1011831, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39102416

ABSTRACT

Bacteriophages (phages) are viruses that infect bacteria. Many of them produce specific enzymes called depolymerases to break down external polysaccharide structures. Accurate annotation and domain identification of these depolymerases are challenging due to their inherent sequence diversity. Hence, we present DepoScope, a machine learning tool that combines a fine-tuned ESM-2 model with a convolutional neural network to precisely identify depolymerase sequences and their enzymatic domains. To accomplish this, we curated a dataset from the INPHARED phage genome database, created a polysaccharide-degrading domain database, and applied sequential filters to construct a high-quality dataset, which was then used to train DepoScope. Our work is the first approach that combines sequence-level predictions with amino-acid-level predictions for accurate depolymerase detection and functional domain identification. We therefore believe that DepoScope can greatly enhance our understanding of phage-host interactions at the level of depolymerases.


Subject(s)
Bacteriophages , Computational Biology , Bacteriophages/genetics , Bacteriophages/enzymology , Computational Biology/methods , Molecular Sequence Annotation , Viral Proteins/genetics , Viral Proteins/metabolism , Viral Proteins/chemistry , Machine Learning , Software , Protein Domains , Genome, Viral/genetics , Carboxylic Ester Hydrolases/genetics , Carboxylic Ester Hydrolases/metabolism , Carboxylic Ester Hydrolases/chemistry
10.
Sci Data ; 11(1): 875, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138223

ABSTRACT

Flueggea virosa (Roxb. ex Willd.) Royle, an evergreen shrub or small tree in the Phyllanthaceae family, holds significant potential for garden landscaping and pharmacological applications. However, the lack of genomic data has hindered further scientific understanding of its horticultural and medicinal value. In this study, we assembled a haplotype-resolved genome of F. virosa for the first time. The two haploid genomes, named haplotype A and haplotype B, are 487.33 Mb and 477.53 Mb in size, respectively, with contig N50 lengths of 31.45 Mb and 32.81 Mb. More than 99% of the assembled sequences were anchored to 13 pairs of pseudo-chromosomes. Furthermore, 21,587 and 21,533 protein-coding genes were predicted in the haplotype A and haplotype B genomes, respectively. This chromosome-level genome fills a gap in genomic data for F. virosa and provides valuable resources for molecular studies of this species, supporting future research on speciation, functional genomics, and comparative genomics within the Phyllanthaceae family.


Subject(s)
Genome, Plant , Chromosomes, Plant , Haplotypes , Molecular Sequence Annotation
11.
Genome Biol ; 25(1): 218, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138517

ABSTRACT

Genome sequencing has become a routine task for biologists, but gene structure annotation remains a challenge that impedes accurate genomic and genetic research. Here, we present a bioinformatics toolkit, SynGAP (Synteny-based Gene structure Annotation Polisher), which uses gene synteny information to precisely and automatically polish the gene structure annotation of genomes. SynGAP improves gene structure annotation quality and profiles integrative gene synteny between species. Furthermore, we designed an expression variation index for comparative transcriptomics analysis to explore candidate genes responsible for the distinct traits observed in phylogenetically related species.


Subject(s)
Molecular Sequence Annotation , Synteny , Software , Computational Biology/methods , Genomics/methods , Animals
12.
Sci Data ; 11(1): 891, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39152143

ABSTRACT

Paspalum notatum Flüggé is an economically important subtropical fodder grass that is widely used in the Americas. Here, we report a new chromosome-scale genome assembly and annotation of a diploid biotype collected in the center of origin of the species. Using Oxford Nanopore long reads, we generated a 557.81 Mb genome assembly (N50 = 56.1 Mb) with high gene completeness (BUSCO = 98.73%). Genome annotation identified 320 Mb (57.86%) of repetitive elements and 45,074 gene models, of which 36,079 have a high level of confidence. Further characterisation included the identification of 59 miRNA precursors together with their putative targets. The present work provides a comprehensive genomic resource for P. notatum improvement and a reference frame for functional and evolutionary research within the genus.


Subject(s)
Genome, Plant , Molecular Sequence Annotation , Paspalum , Paspalum/genetics , Chromosomes, Plant/genetics , MicroRNAs/genetics , Repetitive Sequences, Nucleic Acid
13.
Mol Biol Rep ; 51(1): 863, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39073678

ABSTRACT

BACKGROUND: Tetramethylpyrazine has been extensively studied as an anticancer substance and as a flavor substance in medicine and the food industry. A strain with high tetramethylpyrazine production was screened from the fermented grains of the Danquan winery. Genome sequencing can reveal the potential roles of bacteria by thoroughly examining the connection between genes and phenotypes from a genomic perspective. METHODS AND RESULTS: In this study, the whole genome of this strain was sequenced and analyzed. This paper summarizes the genomic characteristics of strain TTMP2 and analyzes genes related to the synthesis of tetramethylpyrazine. Bacillus sp. TTMP2 has a complete metabolic pathway for acetoin and tetramethylpyrazine metabolism. Gene function was analyzed by COG, GO, and KEGG annotation and by functional annotation of lipoproteins, carbohydrate-active enzymes, and pathogen-host interactions. Phylogenetic analysis indicated that Bacillus velezensis had the highest homology with Bacillus sp. TTMP2. Analysis of the genomes of 16 Bacillus species, which together cover the genes of the genus, suggested that Bacillus has an open pan-genome and can survive in diverse environments. CONCLUSION: Analysis of the genome sequencing data from Bacillus sp. TTMP2 provided a deeper understanding of its metabolic characteristics and indicated that this bacterium has a particular role in tetramethylpyrazine synthesis.


Subject(s)
Bacillus , Genome, Bacterial , Phylogeny , Pyrazines , Whole Genome Sequencing , Bacillus/genetics , Bacillus/metabolism , Pyrazines/metabolism , Whole Genome Sequencing/methods , Genome, Bacterial/genetics , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation
14.
Commun Biol ; 7(1): 920, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39080448

ABSTRACT

Lettuce is one of the most widely cultivated and consumed dicotyledonous vegetables globally. Despite the availability of its reference genome sequence, lettuce gene annotation remains incomplete, impeding comprehensive research and the broad application of genomic resources. Long-read RNA isoform sequencing (Iso-Seq) offers substantial advantages for analyzing RNA alternative splicing and aiding gene annotation, yet it faces throughput limitations. We present the HIT-ISOseq method, tailored for bulk sample analysis, which significantly increases RNA sequencing throughput on the PacBio platform by concatenating cDNA. We show that HIT-ISOseq generates 3-4 cDNA molecules per CCS read in lettuce, yielding 15.7 million long reads per PacBio Sequel II SMRT Cell 8M. We validate its effectiveness by analyzing six lettuce tissue samples, including roots, stems, and leaves, revealing tissue-specific gene expression patterns and RNA isoforms. Leveraging long-read RNA sequencing of diverse tissues, we refine the transcript annotation of the lettuce reference genome, expanding its GO and KEGG annotation repertoire. Collectively, this study serves as a foundational reference for genome annotation and for the analysis of multi-sample isoform expression using high-throughput long-read transcriptome sequencing.


Subject(s)
High-Throughput Nucleotide Sequencing , Lactuca , Sequence Analysis, RNA , Lactuca/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , RNA, Plant/genetics , Organ Specificity/genetics , Gene Expression Regulation, Plant , Molecular Sequence Annotation , Alternative Splicing , RNA Isoforms/genetics , Genes, Plant
15.
Sci Data ; 11(1): 776, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003298

ABSTRACT

Fructus hippophae (Hippophae rhamnoides subsp. mongolica × Hippophae rhamnoides subsp. sinensis), a hybrid sea buckthorn variety with H. rhamnoides subsp. mongolica as the female parent and H. rhamnoides subsp. sinensis as the male parent, is a traditional plant of great economic and medicinal potential. Herein, we obtained a chromosome-level genome of Fructus hippophae of about 918.59 Mb, with a scaffold N50 of 83.65 Mb. We then anchored 440 contigs, representing 97.17% of the total genome sequence, onto 12 pseudochromosomes. Next, de novo, homology-based, and transcriptome assembly strategies were adopted for gene structure prediction, which predicted 36,475 protein-coding genes, of which 36,226 could be functionally annotated. Various strategies were used for quality assessment; both the complete BUSCO value (98.80%) and the mapping rate indicated high assembly quality. Repetitive elements, which occupied 63.68% of the genome, and 1,483,600 bp of non-coding RNA were annotated. We thus provide genomic information on female plants of a popular variety, which can provide data for constructing a sea buckthorn pan-genome and for resolving the mechanism of sex differentiation.


Subject(s)
Chromosomes, Plant , Genome, Plant , Hippophae , Hippophae/genetics , Chromosomes, Plant/genetics , Transcriptome , Molecular Sequence Annotation
16.
BMC Genom Data ; 25(1): 70, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39009995

ABSTRACT

OBJECTIVES: Ants are ecologically dominant insects in most terrestrial ecosystems, with more than 14,000 extant species in about 340 genera recorded to date. However, genomic resources are still scarce for most species, especially those endemic to East or Southeast Asia, limiting the study of the phylogeny, speciation, and adaptation of this evolutionarily successful animal lineage. Here, we assemble and annotate the genomes of Odontoponera transversa and Camponotus friedae, two ant species naturally distributed in China, to facilitate future study of ant evolution. DATA DESCRIPTION: We obtained a total of 16 Gb and 51 Gb of PacBio HiFi data for O. transversa and C. friedae, respectively, which were assembled into draft genomes of 339 Mb for O. transversa and 233 Mb for C. friedae. Genome assessments using multiple metrics showed good completeness and high accuracy for both assemblies. RNA-seq-assisted gene annotation yielded comparable numbers of protein-coding genes in the two genomes (10,892 for O. transversa and 11,296 for C. friedae), whereas repeat annotation revealed a remarkable difference in repeat content between the two species (149.4 Mb for O. transversa versus 49.7 Mb for C. friedae). In addition, complete mitochondrial genomes for the two species were assembled and annotated.


Subject(s)
Ants , Genome, Insect , Animals , Ants/genetics , Ants/classification , Genome, Insect/genetics , Molecular Sequence Annotation , Phylogeny , Genomics/methods
17.
Sci Data ; 11(1): 780, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39013888

ABSTRACT

Euglena gracilis (E. gracilis), pivotal in the study of photosynthesis, endosymbiosis, and chloroplast development, is also an industrial microalga used for paramylon production. Despite its importance, exploration of the E. gracilis genome has been challenging because of its intricate nature. In this study, we achieved a chromosome-level de novo assembly (2.37 Gb) using Illumina, PacBio, Bionano, and Hi-C data. The assembly had a contig N50 of 619 kb and a scaffold N50 of 1.12 Mb. Approximately 99.83% of the genome was anchored to 46 chromosomes. Repetitive elements constituted 58.84% of the sequences, and functional annotations were assigned to 39,362 proteins. BUSCO analysis estimated assembly completeness at 80.39%. As the first high-quality E. gracilis genome, this assembly overcomes previous limitations and provides a foundational resource for genetics and genomics studies in both academic and industrial research.


Subject(s)
Euglena gracilis , Euglena gracilis/genetics , Chromosomes , Microalgae/genetics , Molecular Sequence Annotation , Glucans
18.
mSystems ; 9(7): e0050524, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38953320

ABSTRACT

Nanopore direct RNA sequencing (DRS) enables the capture and full-length sequencing of native RNAs, without recoding or amplification bias. The resulting data sets may be interrogated to define the identity and location of chemically modified ribonucleotides, as well as the length of poly(A) tails, on individual RNA molecules. The success of these analyses is highly dependent on the provision of high-resolution transcriptome annotations in combination with workflows that minimize misalignments and other analysis artifacts. Existing software solutions for generating high-resolution transcriptome annotations are poorly suited to the small, gene-dense genomes of viruses because distinct transcript isoforms are difficult to identify where alternative splicing and overlapping RNAs are prevalent. To resolve this, we identified key characteristics of DRS data sets that inform the resulting read alignments and developed the nanopore guided annotation of transcriptome architectures (NAGATA) software package (https://github.com/DepledgeLab/NAGATA). Using a combination of synthetic and original DRS data sets derived from adenoviruses, herpesviruses, coronaviruses, and human cells, we demonstrate that NAGATA outperforms existing transcriptome annotation software and yields consistently high precision and recall when reconstructing both gene-sparse and gene-dense transcriptomes. Finally, we apply NAGATA to generate the first high-resolution transcriptome annotation of the neglected pathogen human adenovirus type F41 (HAdV-41), for which we identify 77 distinct transcripts encoding at least 23 different proteins. IMPORTANCE: The transcriptome of an organism denotes the full repertoire of encoded RNAs that may be expressed. This is critical to understanding the biology of an organism and for accurate transcriptomic and epitranscriptomic analyses.
Annotating transcriptomes remains a complex task, particularly in small gene-dense organisms such as viruses, which maximize their coding capacity through overlapping RNAs. To resolve this, we developed a new software package, nanopore guided annotation of transcriptome architectures (NAGATA), which uses nanopore direct RNA sequencing (DRS) datasets to rapidly produce high-resolution transcriptome annotations for diverse viruses and other organisms.
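Precision and recall for transcriptome reconstruction can be illustrated by treating each transcript as its strand plus its ordered exon coordinates and counting exact matches against a reference set. This is a deliberately simplified sketch: the representation and the exact-match rule are assumptions, NAGATA's own benchmarking criteria are not described here, and comparison tools such as gffcompare use more nuanced matching levels.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (strand, ((exon_start, exon_end), ...))
Transcript = Tuple[str, Tuple[Tuple[int, int], ...]]


def precision_recall(predicted: Iterable[Transcript], reference: Iterable[Transcript]):
    """Exact-structure precision and recall of predicted vs. reference transcripts."""
    pred = set(predicted)
    ref = set(reference)
    true_positives = len(pred & ref)  # predicted transcripts matching a reference exactly
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall
```

If three predicted transcripts include two exact matches to a four-transcript reference set, precision is 2/3 and recall is 1/2; an isoform with a shifted splice site counts as a miss under this strict rule.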


Subject(s)
Molecular Sequence Annotation , Software , Transcriptome , Humans , Transcriptome/genetics , Molecular Sequence Annotation/methods , Sequence Analysis, RNA/methods , Herpesviridae/genetics , Coronavirus/genetics , Nanopore Sequencing/methods , Nanopores , Adenoviridae/genetics
19.
Sci Data ; 11(1): 812, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039100

ABSTRACT

Reaumuria soongarica is a xerophytic shrub in the Tamaricaceae family. The species is widely distributed in the deserts of Central Asia and is characterized by its remarkable adaptability to saline and barren desert environments. Using PacBio long-read sequencing and Hi-C technologies, we assembled a chromosome-level genome of R. soongarica. The genome assembly is 1.28 Gb in size with a scaffold N50 of 116.15 Mb, and approximately 1.25 Gb of sequence was anchored to 11 pseudo-chromosomes. A completeness assessment of the assembled genome yielded a BUSCO score of 97.5% and an LTR Assembly Index of 12.37. Repeat sequences make up approximately 60.07% of the R. soongarica genome. In total, 21,791 protein-coding genes were predicted, of which 95.64% were functionally annotated. This high-quality genome will serve as a foundation for studying genomic evolution and the mechanisms of adaptation to arid-saline environments in R. soongarica, facilitating the exploration and utilization of its unique genetic resources.


Subject(s)
Genome, Plant , Molecular Sequence Annotation , Tamaricaceae , Tamaricaceae/genetics , Chromosomes, Plant
20.
Nature ; 632(8023): 166-173, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39020176

ABSTRACT

Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs¹. To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs.


Subject(s)
Arabidopsis Proteins , Arabidopsis , Gene Expression Regulation, Plant , Protein Domains , Transcription Factors , Transcriptional Activation , Arabidopsis/chemistry , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/classification , Arabidopsis Proteins/metabolism , Conserved Sequence/genetics , Datasets as Topic , Gene Expression Regulation, Plant/genetics , Indoleacetic Acids/metabolism , Intrinsically Disordered Proteins , Molecular Sequence Annotation , Proteome/chemistry , Proteome/metabolism , Transcription Factors/chemistry , Transcription Factors/classification , Transcription Factors/metabolism , Transcriptional Activation/genetics