Results 1 - 20 of 44
1.
Trends Plant Sci ; 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39079769

ABSTRACT

Regulating gene expression in plant development and environmental responses is vital for mitigating the effects of climate change on crop growth and productivity. The eukaryotic genome largely shows the canonical B-DNA structure that is organized into nucleosomes with histone modifications shaping the epigenome. Nuclear proteins and RNA interactions influence chromatin conformations and dynamically modulate gene activity. Non-B DNA conformations and their transitions introduce novel aspects to gene expression modulation, particularly in response to environmental shifts. We explore the current understanding of non-B DNA structures in plant genomes, their interplay with epigenomics and gene expression, and advances in methods for their mapping and characterization. The exploration of so far uncharacterized non-B DNA structures remains an intriguing area in plant chromatin research and offers insights into their potential role in gene regulation.

2.
Genome Biol ; 25(1): 91, 2024 04 08.
Article in English | MEDLINE | ID: mdl-38589937

ABSTRACT

BACKGROUND: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes. RESULTS: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions. CONCLUSIONS: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.


Subjects
Algorithms, Benchmarking, Humans, Genotype, Genomics/methods, Genotyping Techniques/methods, Plant Genome, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods
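
The abstract of entry 2 reports genotyping recall and precision without defining them. The following is a minimal sketch under one common benchmarking convention, in which a call counts as correct only when the variant is present in the truth set with the same genotype. This is an assumption, not necessarily the definition used by the EVG authors, and the function name, variant keys, and genotype strings are hypothetical.

```python
def genotype_precision_recall(truth, calls):
    """Return (precision, recall) for genotype calls against a truth set.

    truth, calls: dict mapping a variant key to a genotype string such as '0/1'.
    Homozygous-reference ('0/0') and missing ('./.') entries are ignored.
    """
    skip = {"0/0", "./."}
    truth_nonref = {k: g for k, g in truth.items() if g not in skip}
    calls_nonref = {k: g for k, g in calls.items() if g not in skip}
    # A call is correct only if the same variant has the same genotype in truth.
    correct = sum(1 for k, g in calls_nonref.items() if truth_nonref.get(k) == g)
    precision = correct / len(calls_nonref) if calls_nonref else 0.0
    recall = correct / len(truth_nonref) if truth_nonref else 0.0
    return precision, recall


# Hypothetical toy data: (chromosome, position, variant type) -> genotype.
truth = {
    ("chr1", 100, "DEL"): "0/1",
    ("chr1", 500, "INS"): "1/1",
    ("chr2", 40, "SNP"): "0/1",
    ("chr2", 90, "DEL"): "1/1",
}
calls = {
    ("chr1", 100, "DEL"): "0/1",  # correct
    ("chr1", 500, "INS"): "0/1",  # right site, wrong genotype
    ("chr2", 40, "SNP"): "0/1",   # correct
    ("chr3", 7, "INS"): "1/1",    # not in the truth set
}
precision, recall = genotype_precision_recall(truth, calls)
```

Under this convention a call at the right site with the wrong genotype lowers both precision and recall, which makes it a stricter measure than site-level detection.
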
3.
Planta ; 259(5): 117, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38592421

ABSTRACT

MAIN CONCLUSION: In this review, we give an overview of plant sequencing efforts and how this impacts plant functional genomics research. Plant genome sequence information greatly facilitates the studies of plant biology, functional genomics, evolution of genomes and genes, domestication processes, phylogenetic relationships, among many others. More than two decades of sequencing efforts have boosted the number of available sequenced plant genomes. The first plant genome, of Arabidopsis, was published in the year 2000 and currently, 4604 plant genomes from 1482 plant species have been published. Various large sequence initiatives are running, which are planning to produce tens of thousands of sequenced plant genomes in the near future. In this review, we give an overview on the status of sequenced plant genomes and on the use of genome information in different research areas.


Subjects
Arabidopsis, Plant Genome, Phylogeny, Plant Genome/genetics, Genomics, Arabidopsis/genetics, Domestication
4.
Mol Biol Evol ; 41(2)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38262464

ABSTRACT

The 5S rRNA genes are among the most conserved nucleotide sequences across all species. Similar to the 5S preservation we observe the occurrence of 5S-related nonautonomous retrotransposons, so-called Cassandras. Cassandras harbor highly conserved 5S rDNA-related sequences within their long terminal repeats, advantageously providing them with the 5S internal promoter. However, the dynamics of Cassandra retrotransposon evolution in the context of 5S rRNA gene sequence information and structural arrangement are still unclear, especially: (1) do we observe repeated or gradual domestication of the highly conserved 5S promoter by Cassandras and (2) do changes in 5S organization such as in the linked 35S-5S rDNA arrangements impact Cassandra evolution? Here, we show evidence for gradual co-evolution of Cassandra sequences with their corresponding 5S rDNAs. To follow the impact of 5S rDNA variability on Cassandra TEs, we investigate the Asteraceae family where highly variable 5S rDNAs, including 5S promoter shifts and both linked and separated 35S-5S rDNA arrangements have been reported. Cassandras within the Asteraceae mirror 5S rDNA promoter mutations of their host genome, likely as an adaptation to the host's specific 5S transcription factors and hence compensating for evolutionary changes in the 5S rDNA sequence. Changes in the 5S rDNA sequence and in Cassandras seem uncorrelated with linked/separated rDNA arrangements. We place all these observations into the context of angiosperm 5S rDNA-Cassandra evolution, discuss Cassandra's origin hypotheses (single or multiple) and Cassandra's possible impact on rDNA and plant genome organization, giving new insights into the interplay of ribosomal genes and transposable elements.


Subjects
5S Ribosomal RNA, Retroelements, 5S Ribosomal RNA/genetics, Retroelements/genetics, rRNA Genes, Base Sequence, Ribosomal DNA/genetics, Plant Genome, Mutation, Molecular Evolution
5.
Front Plant Sci ; 14: 1237426, 2023.
Article in English | MEDLINE | ID: mdl-37810401

ABSTRACT

LTR-retrotransposons (LTR-RTs) are a class of RNA-replicating transposable elements (TEs) that can alter genome structure and function by moving positions, repositioning genes, shifting exons, and causing chromosomal rearrangements. LTR-RTs are widespread in many plant genomes and constitute a significant portion of the genome. Their movement and activity in eukaryotic genomes can provide insight into genome evolution and gene function, especially when LTR-RTs are located near or within genes. Building redundant and non-redundant LTR-RT libraries and their annotations for species lacking this resource requires extensive bioinformatics pipelines and expensive computing power to analyze large amounts of genomic data. This increases the need for online services that provide computational resources with minimal overhead and maximum efficiency. Here, we present MegaLTR as a web server and standalone pipeline that detects intact LTR-RTs at the whole-genome level and integrates multiple tools for structure-based, homology-based, and de novo identification, classification, annotation, insertion time determination, and LTR-RT gene chimera analysis. MegaLTR also provides statistical analysis and visualization with multiple tools and can be used to accelerate plant species discovery and assist breeding programs in their efforts to improve genomic resources. We hope that the development of online services such as MegaLTR, which can analyze large amounts of genomic data, will become increasingly important for the automated detection and annotation of LTR-RT elements.
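
The abstract of entry 5 lists insertion time determination without saying how it is computed. A widely used approach, sketched here as an assumption rather than as MegaLTR's confirmed method, relies on the two LTRs of an element being identical at insertion: their divergence K, corrected with the Jukes-Cantor model, gives the age T = K / (2r). The function name is hypothetical, and the default rate of 1.3e-8 substitutions per site per year is a value often quoted for rice; a rate appropriate to the species should be substituted.

```python
import math


def ltr_insertion_time(ltr5, ltr3, rate_per_site_per_year=1.3e-8):
    """Estimate insertion age in years from two aligned LTR sequences.

    ltr5, ltr3: aligned 5' and 3' LTRs of equal length ('-' marks a gap).
    rate_per_site_per_year: assumed substitution rate (illustrative default).
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed divergence
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    # Jukes-Cantor distance: K = -3/4 * ln(1 - 4p/3).
    k = -0.75 * math.log(1 - 4 * p / 3) if p else 0.0
    # Both LTRs accumulate substitutions independently, hence the factor 2.
    return k / (2 * rate_per_site_per_year)


# Hypothetical 100 bp alignment with two mismatches.
ltr_a = "ACGT" * 25
ltr_b = "TCGA" + "ACGT" * 24
age_years = ltr_insertion_time(ltr_a, ltr_b)
```

The estimate is only as good as the assumed rate, so ages from different species should be compared with caution.
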

6.
Life (Basel) ; 13(8)2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37629524

ABSTRACT

Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.

7.
Appl Plant Sci ; 11(4): e11533, 2023.
Article in English | MEDLINE | ID: mdl-37601314

ABSTRACT

Premise: Robust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein-coding gene predictions. Methods: The impact of repeat masking, long-read and short-read inputs, and de novo and genome-guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity. Results: Benchmarks that reflect gene structures, reciprocal similarity search alignments, and mono-exonic/multi-exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA-read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence-based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome-guided transcriptome assemblies, or full-length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post-processing with functional and structural filters is highly recommended. Discussion: While the annotation of non-model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.

8.
Methods Mol Biol ; 2703: 59-70, 2023.
Article in English | MEDLINE | ID: mdl-37646937

ABSTRACT

Transposable elements (TEs) are repeat elements that can relocate or create novel copies of themselves in the genome and contribute to genomic complexity and expansion, via events such as chromosome recombination or regulation of gene expression. However, given the large number of such repeats across the genome, identifying repeats of interest can be a challenge in even well-annotated genomes, especially in more complex, TE-rich plant genomes. Here, we describe a protocol for PlanTEnrichment, a database we created comprising information on 11 plant genomes to analyze stress-associated TEs using publicly available data. By selecting a genome and providing a list of genes or genomic regions whose TE associations the user wants to identify, the user can rapidly obtain TE subfamilies found near the provided regions, as well as their superfamily and class, and the enrichment values of the repeats. The results also provide the locations of individual repeat instances found, alongside the input regions or genes they are associated with, and a bar graph of the top ten most significant repeat subfamilies identified. PlanTEnrichment is freely available at http://tools.ibg.deu.edu.tr/plantenrichment/ and can be used by researchers with rudimentary or no proficiency in computational analysis of TE elements, allowing for expedience in the identification of TEs of interest and helping further our understanding of the potential contributions of TEs in plant genomes.


Subjects
DNA Transposable Elements, Plant Genome, Humans, DNA Transposable Elements/genetics, Factual Databases, Genomics, Research Personnel, Tellurium
9.
Biomolecules ; 13(7)2023 07 03.
Article in English | MEDLINE | ID: mdl-37509105

ABSTRACT

The Caulimoviridae is a family of double-stranded DNA viruses that infect plants. The genomes of most vascular plants contain endogenous caulimovirids (ECVs), a class of repetitive DNA elements that is abundant in some plant genomes, resulting from the integration of viral DNA in the chromosomes of germline cells during episodes of infection that have sometimes occurred millions of years ago. In this review, we reflect on 25 years of research on ECVs that has shown that members of the Caulimoviridae have occupied an unprecedented range of ecological niches over time and shed light on their diversity and macroevolution. We highlight gaps in knowledge and prospects of future research fueled by increased access to plant genome sequence data and new tools for genome annotation for addressing the extent, impact, and role of ECVs on plant biology and the origin and evolutionary trajectories of the Caulimoviridae.


Subjects
Caulimoviridae, Tracheophyta, Fossils, Caulimoviridae/genetics, Plants/genetics, Plant Genome, Phylogeny
10.
Methods Mol Biol ; 2672: 3-21, 2023.
Article in English | MEDLINE | ID: mdl-37335467

ABSTRACT

Chromosomes have been studied since the late nineteenth century in the disciplines of cytology and cytogenetics. Analyzing their numbers, features, and dynamics has been tightly linked to the technical development of preparation methods, microscopes, and chemicals to stain them, with latest continuing developments described in this volume. At the end of the twentieth and beginning of the twenty-first centuries, DNA technology, genome sequencing, and bioinformatics have revolutionized how we see, use, and analyze chromosomes. The advent of in situ hybridization has shaped our understanding of genome organization and behavior by linking molecular sequence information with the physical location along chromosomes and genomes. Microscopy is the best technique to accurately determine chromosome number. Many features of chromosomes in interphase nuclei or pairing and disjunction at meiosis, involving physical movement of chromosomes, can only be studied by microscopy. In situ hybridization is the method of choice to characterize the abundance and chromosomal distribution of repetitive sequences that make up the majority of most plant genomes. These most variable components of a genome are found to be species- and occasionally chromosome-specific and give information about evolution and phylogeny. Multicolor fluorescence hybridization and large pools of BAC or synthetic probes can paint chromosomes and we can follow them through evolution involving hybridization, polyploidization, and rearrangements, important at a time when structural variations in the genome are being increasingly recognized. This volume discusses many of the most recent developments in the field of plant cytogenetics and gives carefully compiled protocols and useful resources.


Subjects
Chromosomes, DNA, Fluorescence In Situ Hybridization/methods, Cytogenetics/methods, Plant Genome
11.
Front Plant Sci ; 14: 1134627, 2023.
Article in English | MEDLINE | ID: mdl-36950350

ABSTRACT

LTR-retrotransposons (LTR-RTs) are a large group of transposable elements that replicate through an RNA intermediate and alter genome structure. The activities of LTR-RTs in plant genomes provide helpful information about genome evolution and gene function. LTR-RTs near or within genes can directly alter gene function. This work introduces PlantLTRdb, an intact LTR-RT database for 195 plant species. Using homology- and de novo structure-based methods, a total of 150.18 Gbp representing 3,079,469 pseudomolecules/scaffolds were analyzed to identify, characterize, and annotate LTR-RTs, estimate insertion ages, detect LTR-RT-gene chimeras, and determine nearby genes. Accordingly, 520,194 intact LTR-RTs were discovered, including 29,462 autonomous and 490,732 nonautonomous LTR-RTs. The autonomous LTR-RTs included 10,286 Gypsy and 19,176 Copia, while the nonautonomous were divided into 224,906 Gypsy, 218,414 Copia, 1,768 BARE-2, 3,147 TR-GAG and 42,497 unknown. Analysis of the identified LTR-RTs located within genes showed that a total of 36,236 LTR-RTs were LTR-RT-gene chimeras and 11,619 LTR-RTs were within pseudo-genes. In addition, 50,026 genes are within 1 kbp of LTR-RTs, and 250,587 had a distance of 1 to 10 kbp from LTR-RTs. PlantLTRdb allows researchers to search, visualize, BLAST and analyze plant LTR-RTs. PlantLTRdb can contribute to the understanding of structural variations, genome organization, functional genomics, and the development of LTR-RT target markers for molecular plant breeding. PlantLTRdb is available at https://bioinformatics.um6p.ma/PlantLTRdb.
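
The category counts in the abstract of entry 11 can be checked against the reported totals. The short sketch below does the arithmetic; the "unknown" category is taken as 42,497, the only value consistent with the stated nonautonomous total of 490,732. The variable names are illustrative.

```python
# Counts as reported in the PlantLTRdb abstract.
autonomous = {"Gypsy": 10_286, "Copia": 19_176}
nonautonomous = {
    "Gypsy": 224_906,
    "Copia": 218_414,
    "BARE-2": 1_768,
    "TR-GAG": 3_147,
    "unknown": 42_497,  # assumed reading; see the note above
}

autonomous_total = sum(autonomous.values())
nonautonomous_total = sum(nonautonomous.values())
intact_total = autonomous_total + nonautonomous_total

# Share of intact LTR-RTs that are autonomous, as a percentage.
autonomous_share = 100 * autonomous_total / intact_total
```
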

12.
New Phytol ; 237(5): 1505-1507, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36727306
13.
Methods Mol Biol ; 2632: 57-77, 2023.
Article in English | MEDLINE | ID: mdl-36781721

ABSTRACT

Although the nanopore sequencer is a great tool, many plant scientists have obtained poor sequencing results even when they followed the official library preparation protocol exactly. This is because the protocol is not optimized for plant genomic DNA. The protocol may be good for sequencing animal or bacterial genomes, but not for plants. However, if the protocol is properly modified, one can obtain many long reads and achieve a telomere-to-telomere assembly. Here I present a protocol to that end.


Subjects
High-Throughput Nucleotide Sequencing, Nanopores, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods, Genomics/methods, Plants/genetics, Plant Genome, Plant DNA/genetics, Bacterial Genome
14.
Genome ; 66(3): 51-61, 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36623262

ABSTRACT

Transposable elements (TEs) are mobile elements found in the majority of eukaryotic genomes. TEs deeply impact the structure and evolution of chromosomes and can induce mutations affecting coding genes. In plants, the major group of TEs is long terminal repeat retrotransposons (LTR-RTs). They are classified into superfamilies (Gypsy, Copia) and subclassified into lineages. Horizontal transfer (HT), defined as the nonsexual transmission of genetic material between species, is a process allowing LTR-RTs to invade a new genome. Although this phenomenon was considered rare, recent studies demonstrate numerous transfers of LTR-RTs. This study aims to determine which LTR-RT lineages are shared with high similarity among 69 plant genomes. We identified and classified 88 450 LTR-RTs and determined 143 cases of high similarity between pairs of genomes. Most of them involved three Copia lineages (Oryco/Ivana, Retrofit/Ale, and Tork/Tar/Ikeros). A detailed analysis of three cases of high similarity involving the Tork/Tar/Ikeros group shows an uneven distribution of the elements in the phylogeny and incongruence between phylogenetic tree topologies, indicating that they could have originated from HTs. Overall, our results suggest that LTR-RT Copia lineages share outstanding similarity between distant species and may be involved in HT more frequently than initially estimated.


Subjects
Nucleotides, Retroelements, Phylogeny, Plant Genome, Terminal Repeat Sequences/genetics, Molecular Evolution
15.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36502372

ABSTRACT

LTR-retrotransposons are the most abundant repeat sequences in plant genomes and play an important role in evolution and biodiversity. Their characterization is of great importance to understand their dynamics. However, the identification and classification of these elements remains a challenge today. Moreover, current software can be relatively slow (from hours to days), sometimes involve a lot of manual work and do not reach satisfactory levels in terms of precision and sensitivity. Here we present Inpactor2, an accurate and fast application that creates LTR-retrotransposon reference libraries in a very short time. Inpactor2 takes an assembled genome as input and follows a hybrid approach (deep learning and structure-based) to detect elements, filter partial sequences and finally classify intact sequences into superfamilies and, as very few tools do, into lineages. This tool takes advantage of multi-core and GPU architectures to decrease execution times. Using the rice genome, Inpactor2 showed a run time of 5 minutes (faster than other tools) and has the best accuracy and F1-Score of the tools tested here, also having the second best accuracy and specificity only surpassed by EDTA, but achieving 28% higher sensitivity. For large genomes, Inpactor2 is up to seven times faster than other available bioinformatics tools.


Subjects
Deep Learning, Retroelements, Retroelements/genetics, Terminal Repeat Sequences/genetics, Plant Genome, Software, Molecular Evolution, Phylogeny
16.
Mob DNA ; 13(1): 31, 2022 Dec 03.
Article in English | MEDLINE | ID: mdl-36463202

ABSTRACT

Plant, animal and protist genomes often contain endogenous viral elements (EVEs), which correspond to partial and sometimes entire viral genomes that have been captured in the genome of their host organism through a variety of integration mechanisms. While the number of sequenced eukaryotic genomes is rapidly increasing, the annotation and characterization of EVEs remains largely overlooked. EVEs that derive from members of the family Caulimoviridae are widespread across tracheophyte plants, and sometimes they occur in very high copy numbers. However, existing programs for annotating repetitive DNA elements in plant genomes are poor at identifying and then classifying these EVEs. Other than accurately annotating plant genomes, there is intrinsic value in a tool that could identify caulimovirid EVEs as they testify to recent or ancient host-virus interactions and provide valuable insights into virus evolution. In response to this research need, we have developed CAULIFINDER, an automated and sensitive annotation software package. CAULIFINDER consists of two complementary workflows, one to reconstruct, annotate and group caulimovirid EVEs in a given plant genome and the second to classify these genetic elements into officially recognized or tentative genera in the Caulimoviridae. We have benchmarked the CAULIFINDER package using the Vitis vinifera reference genome, which contains a rich assortment of caulimovirid EVEs that have previously been characterized using manual methods. The CAULIFINDER package is distributed in the form of a Docker image.

17.
J Adv Res ; 42: 315-329, 2022 12.
Article in English | MEDLINE | ID: mdl-36513421

ABSTRACT

INTRODUCTION: Legume crops are an important source of protein and oil for human health and in fixing atmospheric N2 for soil enrichment. With an objective to accelerate much-needed genetic analyses and breeding applications, draft genome assemblies were generated in several legume crops; many of them are not high quality because they are mainly based on short reads. However, the superior quality of genome assembly is crucial for a detailed understanding of genomic architecture, genome evolution, and crop improvement. OBJECTIVES: The present study was undertaken with the objective of developing improved chromosome-length genome assemblies in six different legumes, followed by their systematic investigation to unravel different aspects of genome organization and legume evolution. METHODS: We employed in situ Hi-C data to improve the existing draft genomes and performed different evolutionary and comparative analyses using improved genome assemblies. RESULTS: We have developed chromosome-length genome assemblies in chickpea, pigeonpea, soybean, subterranean clover, and two wild progenitor species of cultivated groundnut (A. duranensis and A. ipaensis). A comprehensive comparative analysis of these genome assemblies offered improved insights into various evolutionary events that shaped the present-day legume species. We highlighted the expansion of gene families contributing to unique traits such as nodulation in legumes, gravitropism in groundnut, and oil biosynthesis in oilseed legume crops such as groundnut and soybean. As examples, we have demonstrated the utility of improved genome assemblies for enhancing the resolution of "QTL-hotspot" identification for drought tolerance in chickpea and marker-trait associations for agronomic traits in pigeonpea through genome-wide association study. Genomic resources developed in this study are publicly available through an online repository, 'Legumepedia'. CONCLUSION: This study reports chromosome-length genome assemblies of six legume species and demonstrates the utility of these assemblies in crop improvement. The genomic resources developed here will have a significant role in accelerating genetic improvement applications of legume crops.


Subjects
Cicer, Fabaceae, Humans, Fabaceae/genetics, Chromosome Mapping, Plant Genome, Genome-Wide Association Study, Plant Breeding, Cicer/genetics, Agricultural Crops/genetics, Glycine max/genetics, Chromosomes
18.
Int J Mol Sci ; 23(20)2022 Oct 11.
Article in English | MEDLINE | ID: mdl-36292971

ABSTRACT

GDSL-type esterase/lipase (GELP) enzymes have key functions in plants, such as developmental processes, anther and pollen development, and responses to biotic and abiotic stresses. Genes that encode GELP belong to a complex and large gene family, ranging from tens to more than hundreds of members per plant species. To facilitate functional transfer between them, we conducted a genome-wide classification of GELP in 46 plant species. First, we applied an iterative phylogenetic method using a selected set of representative angiosperm genomes (three monocots and five dicots) and identified 10 main clusters, subdivided into 44 orthogroups (OGs). An expert curation for gene structures, orthogroup composition, and functional annotation was made based on a literature review. Then, using the HMM profiles as seeds, we expanded the classification to 46 plant species. Our results revealed the variable evolutionary dynamics between OGs in which some expanded, mostly through tandem duplications, while others were maintained as single copies. Among these, dicot-specific clusters and specific amplifications in monocots and wheat were characterized. This approach, by combining manual curation and automatic identification, was effective in characterizing a large gene family, allowing the establishment of a classification framework for gene function transfer and a better understanding of the evolutionary history of GELP.


Subjects
Esterases, Magnoliopsida, Esterases/genetics, Phylogeny, Lipase/metabolism, Magnoliopsida/genetics, Magnoliopsida/metabolism, Genome, Plants/metabolism, Plant Gene Expression Regulation, Plant Genome, Plant Proteins/genetics
19.
Plant J ; 112(4): 1112-1119, 2022 11.
Article in English | MEDLINE | ID: mdl-36196656

ABSTRACT

PlantRNA (http://plantrna.ibmp.cnrs.fr/) is a comprehensive database of transfer RNA (tRNA) gene sequences retrieved from fully annotated nuclear, plastidial and mitochondrial genomes of photosynthetic organisms. In the first release (PlantRNA 1.0), tRNA genes from 11 organisms were annotated. In this second version, the annotation was extended to 51 photosynthetic species covering the whole phylogenetic tree of photosynthetic organisms, from the most basal group of Archaeplastida, the glaucophyte Cyanophora paradoxa, to various land plants. tRNA genes from lower photosynthetic organisms such as streptophyte algae or lycophytes as well as extremophile photosynthetic species such as Eutrema parvulum were incorporated in the database. As a whole, about 37 000 tRNA genes were accurately annotated. During annotation of the tRNA genes from the genome of the rhodophyte Chondrus crispus, non-canonical splicing sites in the D- or T-regions of tRNA molecules were identified and experimentally validated. As for PlantRNA 1.0, comprehensive biological information is included, covering 5'- and 3'-flanking sequences, A and B box sequences, regions of transcription initiation and poly(T) transcription termination stretches, tRNA intron sequences and tRNA mitochondrial import.


Subjects
Eukaryota, Mitochondrial Genome, Eukaryota/genetics, Phylogeny, Transfer RNA/genetics, Photosynthesis/genetics
20.
Front Plant Sci ; 13: 845835, 2022.
Article in English | MEDLINE | ID: mdl-35237293

ABSTRACT

DNA N6-methyladenine (6mA) is a common epigenetic modification that plays significant roles in the growth and development of plants. Identifying 6mA sites is crucial for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites in plants. First, DNA sequences were encoded into six feature vectors with diverse strategies based on the density, physicochemical properties, and position of nucleotides. To find the best encoding strategy, the feature vectors were compared on several machine learning classifiers. The results suggested that the position of nucleotides has a significant positive effect on 6mA site identification. Thus, the dinucleotide one-hot strategy, which describes the positional characteristics of nucleotides well, was employed to extract DNA features in our method. Second, DNA sequences of Rosaceae were randomly divided into a training dataset and a test dataset. Finally, i6mA-vote was constructed by combining five different base classifiers under a majority voting strategy and trained on the Rosaceae training dataset. i6mA-vote was evaluated on the task of predicting 6mA sites from the genomes of Rosaceae, rice, and Arabidopsis separately. In Rosaceae, i6mA-vote achieved an accuracy (ACC) of 0.955, a Matthews correlation coefficient (MCC) of 0.909, a sensitivity (SN) of 0.955, and a specificity (SP) of 0.954. The corresponding values, in the order ACC, MCC, SN, SP, were 0.882, 0.774, 0.961, and 0.803 on rice and 0.798, 0.617, 0.666, and 0.929 on Arabidopsis. According to these indicators, our method was effective and performed better than the other methods considered. The results also showed that i6mA-vote performs well in 6mA site prediction not only within a species but also across plant species. Moreover, the specificity is distinctly lower than the sensitivity in rice, while the opposite holds in Arabidopsis. This may result from sequence similarity among Rosaceae, rice and Arabidopsis.
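
The abstract of entry 20 names two building blocks: a position-preserving dinucleotide one-hot encoding and majority voting over five base classifiers. The sketch below shows one plausible reading of each. It assumes that every overlapping dinucleotide is mapped to a 16-dimensional one-hot vector and that the final label is the one predicted by more than half of the classifiers. The function names are hypothetical, and the authors' exact encoding and classifiers are not given in the abstract.

```python
from itertools import product

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
INDEX = {d: i for i, d in enumerate(DINUCLEOTIDES)}


def dinucleotide_one_hot(seq):
    """Encode a DNA sequence of length L as a flat (L - 1) * 16 binary vector.

    Each overlapping dinucleotide contributes one 16-dimensional one-hot
    block, so position information is preserved in the feature order.
    """
    seq = seq.upper()
    features = []
    for i in range(len(seq) - 1):
        block = [0] * 16
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:  # ambiguous bases such as N leave the block all-zero
            block[idx] = 1
        features.extend(block)
    return features


def majority_vote(predictions):
    """Return 1 if more than half of the binary predictions are 1, else 0."""
    return int(2 * sum(predictions) > len(predictions))


encoded = dinucleotide_one_hot("ACG")       # two dinucleotides: AC, CG
label = majority_vote([1, 1, 0, 1, 0])      # three of five classifiers vote 1
```

With an odd number of base classifiers, such as the five mentioned in the abstract, a binary majority vote cannot tie.
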
