Results 1 - 20 of 20
1.
BMC Genomics ; 23(1): 451, 2022 Jun 20.
Article in English | MEDLINE | ID: mdl-35725380

ABSTRACT

BACKGROUND: Insertion sequences (ISs) are mobile repeat sequences, most of which can copy themselves to new locations in the host genome, leading to genome plasticity and gene regulation in prokaryotes. In this study, we present functional and evolutionary relationships between ISs and neighboring genes in a large-scale comparative genomic analysis. RESULTS: IS families were found in all prokaryotic phyla, with preferential occurrence of the IS3, IS4, IS481, and IS5 families in Alpha-, Beta-, and Gammaproteobacteria, Actinobacteria and Firmicutes, as well as in eukaryote host-associated organisms and autotrophic opportunistic pathogens. We defined the concept of the IS-Gene couple (IG), which allowed us to highlight the functional and regulatory impacts of an IS on the closest gene. Genes involved in transcriptional regulation and transport activities were overrepresented in IGs. In particular, major facilitator superfamily (MFS) transporters, ATP-binding proteins and transposases emerged as the favored neighboring gene functions of IS hotspots. Evolutionarily conserved IS-Gene sets across taxonomic lineages then enabled the classification of IS-Gene couples into phylum, class-to-genus, and species syntenic IS-Gene couples. The IS5, IS21, IS4, IS607, IS91, ISL3 and IS200 families displayed two to four times more ISs in the phylum and/or class-to-genus syntenic IGs than other IS families. This indicates that these families were probably inserted earlier than others and then subjected to horizontal transfer, transposition and deletion events over time. In the phylum syntenic IG category, the Betaproteobacteria, Crenarchaeota, Calditrichae, Planctomycetes, Acidithiobacillia and Cyanobacteria phyla act as IS reservoirs for other phyla, and neighboring gene functions are mostly related to transcriptional regulators. Comparing IS occurrences with predicted regulatory motifs showed that ~26.5% of ISs contain motifs, with 2 motifs per IS on average. These results, together with short IS-Gene distances, suggest that these ISs may interfere with the expression of neighboring genes and thus form strong candidates for adaptive pairing. CONCLUSIONS: Altogether, our large-scale study provides new insights into the genetic context of ISs and strongly suggests their regulatory roles.
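The IS-Gene couple (IG) pairs an IS with its closest gene. A minimal Python sketch of that pairing, assuming IS and gene coordinates lie on one replicon as 1-based inclusive intervals and ignoring strand; the `Feature` type and the function names are hypothetical and this is not the authors' pipeline:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    name: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive

def gap(a: Feature, b: Feature) -> int:
    """Bases separating two features; 0 if they overlap or are adjacent."""
    if a.end < b.start:
        return b.start - a.end - 1
    if b.end < a.start:
        return a.start - b.end - 1
    return 0

def is_gene_couples(insertions, genes):
    """Pair each IS with its closest gene; ties go to the gene listed first."""
    couples = []
    for ins in insertions:
        closest = min(genes, key=lambda g: gap(ins, g))
        couples.append((ins.name, closest.name, gap(ins, closest)))
    return couples

genes = [Feature("geneA", 1, 1000), Feature("geneB", 2000, 3000)]
insertions = [Feature("IS1", 1200, 1500), Feature("IS2", 2900, 3100)]
print(is_gene_couples(insertions, genes))
# [('IS1', 'geneA', 199), ('IS2', 'geneB', 0)]
```

The IS-Gene distance returned alongside each couple is the quantity the abstract refers to when it mentions short IS-Gene distances.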


Subjects
Archaea; Bacteria; DNA Transposable Elements; Archaea/genetics; Bacteria/genetics; DNA Transposable Elements/genetics; Eukaryota/genetics; Genomics; Phylogeny; Transposases/genetics
2.
Methods Mol Biol ; 2443: 327-385, 2022.
Article in English | MEDLINE | ID: mdl-35037215

ABSTRACT

Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of the available software that can help biologists automatically scan sequence data for these repeats or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats, and we provide two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field, so the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We first present the machine learning approach, which seeks to automatically build predictors for a given TE family from a set of sequences known to belong to that family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and check their validity.
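Indexing a sequence once so that many queries can be answered quickly is the core idea mentioned above. A small Python sketch using a suffix array, one common index of this kind; the naive O(n^2 log n) construction is for illustration only (production tools use linear-time or compressed indexes), and the function names are hypothetical:

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for all suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])

genome = "ACGTACGTAC"
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ACG"))  # [0, 4]
```

Once the index is built, each query costs two binary searches instead of a full scan of the sequence.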


Subjects
Genome, Plant; Software; DNA Transposable Elements/genetics; Plants/genetics; Prospective Studies
3.
Appl Environ Microbiol ; 85(18)2019 09 15.
Article in English | MEDLINE | ID: mdl-31300400

ABSTRACT

The genus Shewanella is well known for its genetic diversity, its outstanding respiratory capacity, and its high potential for bioremediation. Here, a novel strain isolated from sediments of the Indian Ocean was characterized. A 16S rRNA analysis indicated that it belongs to the species Shewanella decolorationis, and it was named Shewanella decolorationis LDS1. This strain showed an unusual ability to grow efficiently at temperatures from 24°C to 40°C without apparent modification of its metabolism, as shown by testing respiratory activities and carbon assimilation, and over a wide range of salt concentrations. Moreover, S. decolorationis LDS1 tolerates high chromate concentrations: it was able to grow in the presence of 4 mM chromate at 28°C and 3 mM chromate at 40°C. Interestingly, whatever the temperature, when the culture reached the stationary phase, the strain reduced the chromate present in the growth medium. In addition, S. decolorationis LDS1 degrades different toxic dyes, including anthraquinone, triarylmethane, and azo dyes. Thus, compared to Shewanella oneidensis, this strain has a better capacity to cope with various abiotic stresses, particularly at high temperatures. Preliminary analysis of the genome sequence indicated that, in contrast to S. oneidensis and S. decolorationis S12, S. decolorationis LDS1 possesses the phosphorothioate modification machinery, which has been described as contributing to survival under various abiotic stresses by protecting DNA. We demonstrate that its heterologous production in S. oneidensis allows S. oneidensis to resist higher concentrations of chromate.
IMPORTANCE: Shewanella species have long been described as interesting microorganisms because of their ability to reduce many organic and inorganic compounds, including metals. However, members of the Shewanella genus are often depicted as cold-water microorganisms, although their optimal growth temperature usually ranges from 25 to 28°C under laboratory conditions. Shewanella decolorationis LDS1 is highly attractive, since its metabolism allows it to grow efficiently at temperatures from 24 to 40°C while retaining its ability to respire alternative substrates and to reduce toxic compounds such as chromate or toxic dyes. Our results clearly indicate that this novel strain has the potential to be a powerful tool for bioremediation and unveil one of the mechanisms involved in its chromate resistance.


Subjects
Chromates/metabolism; Drug Resistance, Bacterial; Shewanella/metabolism; Biotechnology; Geologic Sediments/microbiology; Indian Ocean; Phylogeny; RNA, Bacterial/analysis; RNA, Ribosomal, 16S/analysis; Shewanella/classification; Shewanella/genetics; Shewanella/growth & development
4.
Environ Microbiol ; 19(3): 1103-1119, 2017 03.
Article in English | MEDLINE | ID: mdl-27902881

ABSTRACT

Magnetotactic bacteria (MTB) are a phylogenetically and physiologically diverse group of Gram-negative bacteria that synthesize intracellular magnetic crystals called magnetosomes. MTB are affiliated with three classes of the phylum Proteobacteria, the phylum Nitrospirae, the phylum Omnitrophica and probably the candidate phylum Latescibacteria. The evolutionary origin and physiological diversity of MTB compared with other bacterial taxonomic groups remain to be elucidated. Here, we analysed the genome of the marine magneto-ovoid strain MO-1 and found that it is closely related to Magnetococcus marinus MC-1. Detailed analyses of the ribosomal proteins and whole proteomes of 390 genomes reveal that, among the Proteobacteria analysed, only MO-1 and MC-1 have coding sequences (CDSs) with a similarly high proportion of origins from Alphaproteobacteria, Betaproteobacteria, Deltaproteobacteria and Gammaproteobacteria. Interestingly, a comparative metabolic network analysis with anoxic network enzymes from sequenced MTB and non-MTB makes it possible to predict whether an organism has a metabolic profile compatible with magnetosome production. Altogether, our genomic analysis reveals multiple origins of the MO-1 and M. marinus MC-1 genomes and suggests a metabolism-restriction model to explain whether a bacterium could become an MTB upon acquiring magnetosome-encoding genes.


Subjects
Genome, Bacterial; Magnetosomes; Proteobacteria/classification; Proteobacteria/genetics; Base Sequence; Deltaproteobacteria/genetics; Evolution, Molecular; Magnetosomes/genetics; Phylogeny; Proteobacteria/ultrastructure
5.
Nucleic Acids Res ; 44(W1): W181-4, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27242364

ABSTRACT

Computational methods are required for predicting non-coding RNAs (ncRNAs), which are involved in many biological processes, especially at the post-transcriptional level. Among these ncRNAs, miRNAs have been extensively studied, and biologists need efficient and fast tools to identify them. In particular, ab initio methods are usually required to predict novel miRNAs. Here we present a web server dedicated to the large-scale identification of miRNA precursors in genomes. It is based on an algorithm called miRNAFold that predicts miRNA hairpin structures quickly and with high sensitivity. miRNAFold is implemented as a web server with an intuitive, user-friendly interface, as well as a standalone version. The web server is freely available at: http://EvryRNA.ibisc.univ-evry.fr/miRNAFold.


Subjects
Algorithms; Genome; MicroRNAs/genetics; RNA Precursors/genetics; Software; Animals; Computer Graphics; Humans; Information Storage and Retrieval; Internet; MicroRNAs/classification; Plants/genetics; RNA Folding; RNA Precursors/classification; Sequence Analysis, RNA
6.
Front Genet ; 7: 223, 2016.
Article in English | MEDLINE | ID: mdl-28083017

ABSTRACT

Microbial molecular hydrogen (H2) cycling plays an important role in several ecological niches. Hydrogenases (H2ases), the enzymes involved in H2 metabolism, are of great interest for investigating microbial communities and for producing BioH2. To obtain an overall picture of the genetic ability of Cyanobacteria to produce H2ases, we conducted a phylum-wide analysis of the distribution of the genes encoding these enzymes in 130 cyanobacterial genomes. Three minimal criteria were used to classify a strain as able to produce a functional H2ase: the presence of the H2ase genes, the concomitant presence of genes involved in the maturation process, and well-conserved catalytic sites in the enzymes. [NiFe] H2ases were found to be the only H2ases present in this phylum. Fifty-five strains were found to be potentially able to produce the bidirectional Hox enzyme and 33 to produce the uptake (Hup) enzyme. H2 metabolism in Cyanobacteria has a broad ecological distribution, since only the genomes of strains collected from the open ocean lack hox genes. In addition, the presence of H2ases was found to increase in the late-branching clades of the species phylogenetic tree. Surprisingly, five cyanobacterial genomes were found to possess homologs of oxygen-tolerant H2ases belonging to groups 1, 3b, and 3d. Overall, these data show that H2ases are widely distributed and are therefore probably of great functional importance in Cyanobacteria. The finding that homologs of oxygen-tolerant H2ases are present in this phylum opens new perspectives for applying photosynthesis to H2 production.
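The three minimal criteria combine as a strict conjunction: a strain counts as able to produce a functional H2ase only when all three hold. A Python sketch of that rule, assuming the per-strain screening results are already available as booleans; the `GenomeScreen` type and the strain names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenomeScreen:
    strain: str
    hydrogenase_genes: bool  # H2ase genes detected in the genome
    maturation_genes: bool   # genes of the maturation process detected
    conserved_sites: bool    # catalytic sites well conserved in the enzyme

def can_produce_functional_h2ase(g: GenomeScreen) -> bool:
    """All three minimal criteria must hold at the same time."""
    return g.hydrogenase_genes and g.maturation_genes and g.conserved_sites

screens = [
    GenomeScreen("strain_1", True, True, True),
    GenomeScreen("strain_2", True, False, True),  # maturation genes missing
    GenomeScreen("strain_3", True, True, False),  # catalytic site not conserved
]
print([g.strain for g in screens if can_produce_functional_h2ase(g)])  # ['strain_1']
```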

7.
Methods Mol Biol ; 1374: 293-337, 2016.
Article in English | MEDLINE | ID: mdl-26519414

ABSTRACT

Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of available software that can help biologists look for these repeats and check hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. They are representative of the range of tools available for other classes of repeats, and we provide a whole section on this topic as well as a selection of the main existing software. To better understand how these tools work and how repeats can be found efficiently in genomes, it is necessary to look at the technical issues involved in large-scale searches for these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field, and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts needed to understand the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis, which deals with natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In practice, biologists need to represent more complex entities, where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, and composition constraints, as well as ordering and distance constraints between these elementary blocks. In linguistic terms, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.
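At the lexical level, a repeat is simply a word that occurs more than once. A minimal Python sketch that lists every fixed-length word (k-mer) occurring at least twice, with its start positions; fixed-length words are a simplifying assumption, and the function name is hypothetical:

```python
from collections import defaultdict

def repeated_words(seq, k):
    """Map every k-letter word occurring at least twice to its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)  # overlapping occurrences are counted
    return {word: pos for word, pos in positions.items() if len(pos) >= 2}

print(repeated_words("ACGTTACGTT", 4))  # {'ACGT': [0, 5], 'CGTT': [1, 6]}
```

Syntactic-level models go further by combining such words with order, distance and orientation constraints.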


Subjects
Computational Biology/methods; Genome, Plant; Genomics/methods; Plants/genetics; Repetitive Sequences, Nucleic Acid; Software; DNA Transposable Elements
8.
BMC Genomics ; 16: 139, 2015 Feb 27.
Article in English | MEDLINE | ID: mdl-25881276

ABSTRACT

BACKGROUND: Transposable elements are mobile DNA repeat sequences known to have a high impact on genes, genome structure and evolution. This has stimulated broad interest in detailed biological studies of transposable elements. Hence, we have developed an easy-to-use tool for the comparative analysis of the structural organization and functional relationships of transposable elements, to help understand their functional role in genomes. RESULTS: We named our new software VisualTE and describe it here. VisualTE is a stand-alone Java graphical interface that allows users to visualize and analyze all occurrences of transposable element families in annotated genomes. VisualTE reads and extracts transposable elements and genomic information from annotation and repeat data. Results are displayed in several graphical panels that include location and distribution on the chromosome, the occurrence of transposable elements in the genome, their size distribution, and the features and ontologies of neighboring genes. With these hallmarks, VisualTE provides a convenient tool for studying transposable element copies and their functional relationships with genes at the whole-genome scale in diverse organisms. CONCLUSIONS: The VisualTE graphical interface makes comparative analyses of transposable elements possible in any annotated sequence, as well as of the structural organization and functional relationships between transposable elements and other genetic objects. This tool is freely available at: http://lcb.cnrs-mrs.fr/spip.php?article867 .
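One of the panels described above shows the distribution of TE copies along a chromosome. A Python sketch of the underlying computation, assuming 0-based start coordinates and fixed-size windows; the function name and the 100 kb default window are hypothetical choices, not VisualTE's:

```python
def copies_per_window(te_starts, chrom_length, window=100_000):
    """Count TE copies whose 0-based start falls in each fixed-size window."""
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in te_starts:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"start {pos} is outside the chromosome")
        counts[pos // window] += 1
    return counts

print(copies_per_window([0, 50_000, 150_000, 299_999], 300_000))  # [2, 1, 1]
```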


Subjects
DNA Transposable Elements/genetics; Genomics; Software; Arabidopsis/genetics; Arabidopsis Proteins/genetics; Arabidopsis Proteins/metabolism; Chromosome Mapping; Databases, Genetic; Internet
9.
RNA ; 21(5): 775-85, 2015 May.
Article in English | MEDLINE | ID: mdl-25795417

ABSTRACT

Identification of microRNAs (miRNAs) is an important step toward understanding post-transcriptional gene regulation and miRNA-related pathology. The difficulty of identifying miRNAs experimentally, combined with the huge amount of data from new sequencing technologies, has made in silico discrimination of bona fide miRNA precursors from non-miRNA hairpin-like structures an important topic in bioinformatics. Among the various techniques developed for this classification problem, machine learning approaches have proved the most promising. However, these approaches require training data, which is problematic because of the imbalance between the number of miRNAs (positive data) and non-miRNAs (negative data), which degrades their performance. To address this issue, we present an ensemble method that uses a boosting technique with support vector machine components to deal with imbalanced training data. Classification is performed after feature selection on 187 novel and existing features. The algorithm, miRBoost, performed better than state-of-the-art methods on imbalanced human and cross-species data. It also showed the highest ability among the tested methods to discover novel miRNA precursors. In addition, miRBoost was over 1400 times faster than the second most accurate tool tested and significantly faster than most of the other tools. miRBoost thus provides a good compromise between prediction efficiency and execution time, making it highly suitable for genome-wide miRNA precursor prediction. The software miRBoost is available on our web server http://EvryRNA.ibisc.univ-evry.fr.
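Why class imbalance is a problem can be seen with a classifier that always predicts the majority class: its accuracy stays high while it finds no miRNA at all. This Python sketch uses the geometric mean of sensitivity and specificity, a metric commonly used for imbalanced data; it is an illustration with toy labels, not necessarily the evaluation measure used for miRBoost:

```python
import math

def confusion(y_true, y_pred):
    """Counts of true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity; 0 if a class is entirely missed."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# 1 = miRNA precursor (rare), 0 = pseudo-hairpin (abundant); toy labels
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
print(sum(t == p for t, p in zip(y_true, always_negative)) / 10)  # 0.9
print(g_mean(y_true, always_negative))  # 0.0
```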


Subjects
Computational Biology/methods; MicroRNAs/classification; RNA Precursors/classification; Software; Support Vector Machine; Animals; Databases, Genetic; Humans; Information Storage and Retrieval/methods; MicroRNAs/genetics; RNA Precursors/genetics; Sensitivity and Specificity; Sequence Alignment/methods
10.
Mob DNA ; 5: 9, 2014.
Article in English | MEDLINE | ID: mdl-24678954

ABSTRACT

BACKGROUND: DNA repeats, such as transposable elements, minisatellites and palindromic sequences, are abundant in sequences and have been shown to have significant, functional roles in the evolution of host genomes. In a previous study, we introduced the concept of a repeat DNA module, a flexible motif present in at least two occurrences in the sequences. This concept was embedded into ModuleOrganizer, a tool for detecting repeat modules in a set of sequences. However, applying it to larger sequences remains difficult. RESULTS: Here we present Visual ModuleOrganizer, a Java graphical interface to a new, optimized version of the ModuleOrganizer tool. For this version, the tool was recoded in C++ with compressed suffix tree data structures. This reduces memory usage (at least a 120-fold decrease on average) and cuts the computation time of module detection in large sequences by at least a factor of four. The Visual ModuleOrganizer interface allows users to easily choose ModuleOrganizer parameters and to display the results graphically. Moreover, Visual ModuleOrganizer dynamically handles graphical results through four main parameters: gene annotations, overlap of modules with known annotations, location of the module in a minimal number of sequences, and the minimal length of the modules. As a case study, the analysis of FoldBack4 sequences clearly demonstrated that our tools can be extended to comparative and evolutionary analyses of any repeat sequence elements in a set of genomic sequences. With the increasing number of sequences available in public databases, it is now possible to perform comparative analyses of repeated DNA modules in a user-friendly graphical manner within a reasonable time. AVAILABILITY: The Visual ModuleOrganizer interface and the new version of the ModuleOrganizer tool are freely available at: http://lcb.cnrs-mrs.fr/spip.php?rubrique313.

11.
Environ Sci Pollut Res Int ; 21(8): 5619-27, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24420563

ABSTRACT

Sulfonylurea herbicides are used on a wide range of crops to control weeds. Chevalier® OnePass is a sulfonylurea herbicide used intensively on cereal crops in Algeria. No information is yet available about the biodegradation of this herbicide or about its effect on the soil bacterial community. In this study, we collected an untreated soil sample and a second sample 1 month after treatment with the herbicide. Using a high-resolution DNA melting technique, we showed that treatment with Chevalier® OnePass herbicide only slightly changed the composition of the whole bacterial community. Two hundred fifty-nine macroscopically different clones were isolated from the untreated and treated soil under both aerobic and microaerobic conditions. The strains were identified by sequencing a conserved fragment of the 16S rRNA gene. Phylogenetic trees constructed from the sequencing results confirmed that the bacterial populations were similar in the two soil samples. Species belonging to the genera Lysinibacillus, Bacillus, Pseudomonas, and Paenibacillus were the most abundant. Surprisingly, among ten strains isolated from the treated soil, only six were resistant to the herbicide. Furthermore, bacterial overlay experiments showed that only one resistant strain (related to Stenotrophomonas maltophilia) allowed all the sensitive strains tested to grow in the presence of the herbicide. The other resistant strains allowed only certain sensitive strains to grow. On the basis of these results, we propose that there must be several biodegradation pathways for this sulfonylurea herbicide.


Subjects
Herbicides/toxicity; Soil Microbiology; Sulfonylurea Compounds/toxicity; Algeria; Biodegradation, Environmental; DNA, Bacterial; Pseudomonas/genetics; Pseudomonas/metabolism; RNA, Ribosomal, 16S; Risk Assessment; Soil/chemistry
12.
BMC Bioinformatics ; 13: 246, 2012 Sep 25.
Article in English | MEDLINE | ID: mdl-23009561

ABSTRACT

BACKGROUND: Inverted repeat genes encode precursor RNAs characterized by hairpin structures. These RNA hairpins are then processed by biosynthetic pathways to produce functional small RNAs. In eukaryotic genomes, short non-autonomous transposable elements can have sizes and hairpin structures similar to those of non-coding precursor RNAs. This resemblance causes problems when annotating small RNAs. RESULTS: We mapped all microRNA precursors from miRBase to several genomes and studied the repetition and dispersion of the corresponding loci. We then searched for repetitive elements overlapping these loci. We developed an automatic method called ncRNAclassifier to classify pre-ncRNAs according to their relationship with transposable elements (TEs). We showed that there is a correlation between the number of scattered occurrences of ncRNA precursor candidates and the presence of TEs. We applied ncRNAclassifier to six chordate genomes and report our findings. Among the 1,426 human and 721 mouse pre-miRNAs of miRBase, we identified 235 and 68 mis-annotated pre-miRNAs, respectively, that correspond completely to TEs. CONCLUSIONS: We provide a tool enabling the identification of repetitive elements in precursor ncRNA sequences. ncRNAclassifier is available at http://EvryRNA.ibisc.univ-evry.fr.
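Deciding whether a precursor "corresponds completely" to TEs comes down to how much of its locus is covered by overlapping TE annotations. A Python sketch of that coverage test, assuming 0-based half-open intervals on one sequence; the three labels and the function names are hypothetical and are not ncRNAclassifier's actual categories:

```python
def covered_bases(locus, tes):
    """Number of locus bases covered by the union of TE intervals (0-based, half-open)."""
    start, end = locus
    clipped = sorted((max(s, start), min(e, end)) for s, e in tes if s < end and e > start)
    total, cur_start, cur_end = 0, None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:  # disjoint from the current merged block
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:  # overlapping or touching: extend the current block
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def classify_locus(locus, tes):
    """Hypothetical three-way label for a precursor locus."""
    length = locus[1] - locus[0]
    covered = covered_bases(locus, tes)
    if covered == length:
        return "fully TE-derived"
    return "partial TE overlap" if covered else "no TE overlap"

print(classify_locus((100, 200), [(50, 150), (140, 260)]))  # fully TE-derived
```

Merging overlapping TE hits first avoids counting the same bases twice when annotations overlap.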


Subjects
Interspersed Repetitive Sequences; Inverted Repeat Sequences; MicroRNAs/genetics; RNA Precursors/chemistry; Software; Animals; Genome; High-Throughput Nucleotide Sequencing; Humans; Mice; MicroRNAs/chemistry; MicroRNAs/classification; RNA Precursors/classification; RNA Precursors/genetics; RNA, Small Untranslated/chemistry; RNA, Small Untranslated/classification; RNA, Small Untranslated/genetics; Rats
13.
Nucleic Acids Res ; 40(11): e80, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22362754

ABSTRACT

miRNAs are small non-coding RNAs that play important roles in biological processes. Finding miRNA precursors in genomes is therefore an important task, for which computational methods are required. The goal of these methods is to select potential pre-miRNAs that can then be validated by experimental methods. With new-generation sequencing techniques, it is important to have fast algorithms that can process whole genomes in acceptable time. We developed an algorithm based on an original method in which approximate miRNA hairpins are searched for first, before the pre-miRNA structure is reconstituted. The approximation step substantially decreases the number of possibilities and thus the search time. Our method was tested on different genomic sequences and compared with CID-miRNA, miRPara and VMir. In almost all cases, it gives better sensitivity and selectivity. It is also faster than CID-miRNA, miRPara and VMir: it takes ≈30 s to process a 1 Mb sequence, whereas VMir takes 30 min, miRPara 20 h and CID-miRNA 55 h. We present here a fast ab initio algorithm, called miRNAFold, for searching for pre-miRNAs in genomes. miRNAFold is available at http://EvryRNA.ibisc.univ-evry.fr/.
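A crude illustration of searching for approximate hairpins before any structure is folded: look for a short stem whose reverse complement reappears just downstream, separated by a loop. This Python sketch is not miRNAFold's algorithm; it assumes perfect Watson-Crick stems only (no G-U wobble, mismatches or bulges), and the stem and loop parameters are hypothetical:

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    return "".join(PAIR[base] for base in reversed(rna))

def stem_loop_seeds(rna, stem=6, min_loop=3, max_loop=20):
    """(i, j) such that rna[i:i+stem] pairs perfectly with rna[j:j+stem] across a loop."""
    seeds = []
    for i in range(len(rna) - stem + 1):
        partner = reverse_complement(rna[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the 3' arm
            if j + stem > len(rna):
                break
            if rna[j:j + stem] == partner:
                seeds.append((i, j))
    return seeds

hairpin = "GGGAAA" + "CCCC" + "UUUCCC"
print(stem_loop_seeds(hairpin))  # [(0, 10)]
```

Each seed restricts where a full secondary-structure computation would need to be run, which is the kind of pruning the approximation step above relies on.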


Subjects
Algorithms; Genomics/methods; MicroRNAs/chemistry; RNA Precursors/chemistry; Animals; Data Interpretation, Statistical; Humans; Mice; Nucleic Acid Conformation; Software
14.
Methods Mol Biol ; 859: 29-51, 2012.
Article in English | MEDLINE | ID: mdl-22367864

ABSTRACT

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. In this chapter, we present the procedure to routinely use this program on a personal computer.
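After a run, a typical next step is to summarize the masked repeats by class. A Python sketch, assuming the standard whitespace-delimited RepeatMasker `.out` table (query begin and end in columns 6 and 7, repeat class/family in column 11, counting from 1) and assuming its 1-based inclusive coordinates; the function name is hypothetical:

```python
from collections import defaultdict

def summarize_rm_out(text):
    """Return {class/family: (copies, covered bp)} from RepeatMasker .out text."""
    summary = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 15 or not fields[0].isdigit():
            continue  # header, blank or malformed line
        begin, end = int(fields[5]), int(fields[6])
        family = fields[10]
        summary[family][0] += 1
        summary[family][1] += end - begin + 1  # 1-based inclusive coordinates
    return {family: tuple(v) for family, v in summary.items()}
```

For example, `summarize_rm_out(Path("genome.fa.out").read_text())` with `from pathlib import Path`, where `genome.fa.out` is a hypothetical output file name.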


Subjects
DNA Transposable Elements/genetics; Software; Algorithms; Animals; Base Sequence; Humans; Molecular Sequence Annotation/methods; Search Engine; Sequence Analysis, DNA/methods
15.
BMC Bioinformatics ; 11: 474, 2010 Sep 22.
Article in English | MEDLINE | ID: mdl-20860790

ABSTRACT

BACKGROUND: Most known eukaryotic genomes contain mobile copied elements called transposable elements. In some species, these elements account for the majority of the genome sequence. They have been subject to many mutations and other genomic events (copies, deletions, captures) during transposition. Identifying these transformations remains difficult. The study of transposable element families is generally founded on a multiple alignment of their sequences, a critical step suited to transposons containing mostly localized nucleotide mutations. Many transposons that have lost their protein-coding capacity have undergone more complex rearrangements, requiring the development of more complex methods to characterize the architecture of sequence variations. RESULTS: In this study, we introduce the concept of a transposable element module, a flexible motif present in at least two sequences of a family of transposable elements and built on a succession of maximal repeats. The paper proposes an assembly method that works on a set of exact maximal repeats of a set of sequences to create such modules. It results in a graphical view of sequences segmented into modules, a representation that allows a flexible analysis of the transformations that have occurred between them. As a demonstration data set, we chose an in-depth analysis of the transposable element Foldback in Drosophila melanogaster. Comparison with multiple alignment methods shows that our method is more sensitive for highly variable sequences. The study of this family and two other families, AtREP21 and SIDER2, reveals new copies of very different sizes and various combinations of modules, which show the potential of our method. CONCLUSIONS: ModuleOrganizer is available on the Genouest bioinformatics center at http://moduleorganizer.genouest.org.
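Modules are built from exact repeats shared by at least two sequences of a family. As a much simpler stand-in (fixed-length exact words instead of maximal repeats, and no assembly step), this Python sketch finds the words present in at least two sequences; the function name and the toy sequences are hypothetical:

```python
from collections import defaultdict

def shared_kmers(sequences, k, min_sequences=2):
    """Exact k-mers found in at least `min_sequences` distinct sequences."""
    owners = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(idx)
    return {word: sorted(ids) for word, ids in owners.items() if len(ids) >= min_sequences}

family = ["ACGTAC", "TTACGT", "GGGG"]
print(shared_kmers(family, 4))  # {'ACGT': [0, 1]}
```

A maximal-repeat approach would instead extend such shared words as far as they stay identical, which avoids fixing k in advance.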


Subjects
DNA Transposable Elements/genetics; Genome; Animals; Base Sequence; Drosophila melanogaster/genetics; Genetic Variation; Sequence Alignment; Software
16.
Gene ; 448(2): 207-13, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-19651192

ABSTRACT

The rapidly growing number of sequenced genomes requires fast and accurate computational tools for the analysis of different transposable elements (TEs). In this paper we focus on a rapid and reliable procedure for classifying autonomous non-LTR retrotransposons based on the alignment and clustering of their reverse transcriptase (RT) domains. Typically, the RT domain protein sequences encoded by different non-LTR retrotransposons are similar to each other in terms of significant BLASTP E-values. Therefore, they can easily be detected by routine BLASTP searches of genomic DNA sequences coding for proteins similar to the RT domains of known non-LTR retrotransposons. However, detailed classification of non-LTR retrotransposons, i.e. their assignment to specific clades, is a slow and complex procedure that has not been formalized or integrated as a standard set of computational methods and data. Here we describe a tool (RTclass1) designed for fast and accurate automated assignment of novel non-LTR retrotransposons to known or novel clades using phylogenetic analysis of the RT domain protein sequences. RTclass1 classifies a particular non-LTR retrotransposon based on its RT domain in less than 10 min on a standard desktop computer and achieves 99.5% accuracy. RTclass1 works either as a stand-alone program installed locally or as a web server that can be accessed remotely by uploading sequence data through the internet (http://www.girinst.org/RTphylogeny/RTclass1).


Subjects
Classification/methods; Phylogeny; RNA-Directed DNA Polymerase/genetics; Retroelements; Algorithms; Amino Acid Sequence; Models, Genetic; Protein Structure, Tertiary/genetics; RNA-Directed DNA Polymerase/chemistry; Reproducibility of Results; Retroelements/genetics; Sequence Analysis, DNA/methods; Terminal Repeat Sequences/genetics
17.
BMC Bioinformatics ; 9: 345, 2008 Aug 18.
Article in English | MEDLINE | ID: mdl-18710569

ABSTRACT

BACKGROUND: Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Repbase already has software for entering new sequence families and for comparing the user's sequence with the database of consensus sequences. RESULTS: We describe the software named VisualRepbase and the associated database, which allow users to display and analyze all occurrences of transposable element families present in an annotated genome. VisualRepbase is a Java-based interface that can download selected occurrences of transposable elements, show the distribution of given families on the chromosome, and present the localization of these occurrences with regard to gene annotations and other families of transposable elements in Repbase. In addition, it has several features for saving the graphical representation of occurrences, saving all sequences in FASTA format, and searching for and saving all annotated genes surrounded by these occurrences. CONCLUSION: VisualRepbase is available as a downloadable version. It can be found at http://girinst.org/repbase/update/visual repbase.html.
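Saving occurrences in FASTA format, as mentioned above, means writing a `>` header line per record followed by the sequence, conventionally wrapped at a fixed width. A Python sketch with hypothetical headers and a configurable line width; it is not VisualRepbase's exporter:

```python
def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)

occurrences = [("copy1 chr1:101-110", "ACGTACGTAC"), ("copy2 chr2:6-9", "GGCC")]
print(to_fasta(occurrences, width=4), end="")
```

This prints each header followed by the sequence split into lines of at most four letters; with the default width of 60, short sequences stay on one line.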


Subjects
DNA Transposable Elements; Databases, Genetic; Software; Chromosome Mapping; Computer Graphics; Repetitive Sequences, Nucleic Acid; User-Computer Interface
18.
Gene ; 403(1-2): 18-28, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-17889452

ABSTRACT

Helitrons are a class of prolific transposable elements in the Arabidopsis thaliana genome. Although 37 families have been identified since the recent discovery of Helitrons, no systematic classification is available because of the high variability of helitronic sequences. Since transposition proteins are assumed to interact with Helitron termini, a Helitron model was formalized based on terminus characterization in order to carry out an exhaustive analysis of all possible combinations of the terminus pairs present. This combinatorial approach led to the discovery of a number of new Helitron elements corresponding to terminus associations from distinct, previously described Helitron families. The occurrence matrix of terminus combinations yielded a structure that revealed clusters of Helitron families.
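The exhaustive pairing of termini and the resulting occurrence matrix can be sketched as follows. This Python example assumes terminus hits are given as (position, family) pairs on one strand and that a pair is valid when the 3' terminus lies downstream of the 5' terminus within a maximum element length; the family names, positions and `max_len` value are hypothetical:

```python
from collections import Counter

def pair_termini(five_prime, three_prime, max_len=20_000):
    """Count every (5' family, 3' family) pairing in which a 3' terminus lies
    downstream of a 5' terminus within max_len bases; strands are ignored here."""
    matrix = Counter()
    for pos5, fam5 in five_prime:
        for pos3, fam3 in three_prime:
            if 0 < pos3 - pos5 <= max_len:
                matrix[(fam5, fam3)] += 1
    return matrix

five = [(100, "HelA"), (5_000, "HelB")]
three = [(3_000, "HelA"), (8_000, "HelB"), (50_000, "HelA")]
for (fam5, fam3), n in sorted(pair_termini(five, three, max_len=10_000).items()):
    print(fam5, fam3, n)
```

Off-diagonal cells such as (HelA, HelB) are the cross-family terminus associations; clustering the rows and columns of this matrix groups families that share termini.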


Subjects
Arabidopsis/classification; Arabidopsis/genetics; DNA Transposable Elements/genetics; Models, Genetic; Base Sequence; Chromosomes, Plant; Cluster Analysis; Computational Biology/methods; DNA, Plant/chemistry; DNA, Plant/genetics; Genome, Plant; Molecular Sequence Data; Plant Proteins/genetics; Plant Proteins/metabolism; Sequence Analysis, DNA; Terminology as Topic
19.
Bioinformatics ; 22(16): 1948-54, 2006 Aug 15.
Article in English | MEDLINE | ID: mdl-16809391

ABSTRACT

MOTIVATION: The analysis of repeated elements in genomes is a fascinating research domain that lacks relevant tools for transposable elements (TEs), the most complex of these elements. The dynamics of TEs, which provide the main mechanism of mutation in some genomes, are an essential component of genome evolution. In this study we introduce a new concept of domain, a segmentation unit useful for describing the architecture of different copies of TEs. Our method extracts occurrences of a terminus-defined family of TEs, aligns the sequences, finds the domains in the alignment and determines the distribution of each domain in the sequences. After a classification step based on the presence or absence of domains, the method produces a graphical view of sequences segmented into domains. RESULTS: Analysis of the new non-autonomous TE AtREP21 in the model plant Arabidopsis thaliana reveals copies of very different sizes and various combinations of domains, which show the potential of our method. AVAILABILITY: The DomainOrganizer web page is available at www.irisa.fr/symbiose/DomainOrganizer/.
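The classification step groups TE copies by which domains they contain. A Python sketch, assuming domain detection has already produced a set of domain names per copy; the copy and domain names are hypothetical, and the 0/1 encoding is an illustrative choice rather than DomainOrganizer's internal representation:

```python
from collections import defaultdict

def group_by_domain_pattern(copies, domains):
    """Group TE copies by domain presence, encoded as a 0/1 string in `domains` order."""
    groups = defaultdict(list)
    for name, present in copies.items():
        pattern = "".join("1" if d in present else "0" for d in domains)
        groups[pattern].append(name)
    return dict(groups)

domains = ["D1", "D2", "D3"]
copies = {"copy_a": {"D1", "D2", "D3"}, "copy_b": {"D1", "D3"}, "copy_c": {"D1", "D3"}}
print(group_by_domain_pattern(copies, domains))
# {'111': ['copy_a'], '101': ['copy_b', 'copy_c']}
```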


Subjects
Computational Biology/methods; DNA Transposable Elements/genetics; Sequence Analysis, DNA/methods; Algorithms; Amino Acid Sequence; Arabidopsis/genetics; Genes, Plant; Markov Chains; Models, Biological; Models, Statistical; Molecular Sequence Data; Plant Proteins/chemistry; Protein Structure, Tertiary
20.
Bioinformatics ; 21(24): 4408-10, 2005 Dec 15.
Article in English | MEDLINE | ID: mdl-16223791

ABSTRACT

SUMMARY: We have developed STAN (suffix-tree analyser), a tool to search for nucleotide and peptide patterns within whole chromosomes. The pattern syntax uses a string-variable, grammar-like formalism that allows the description of complex patterns including ambiguities, insertions/deletions, gaps, repeats and palindromes. STAN is based on a reduction to multipart matching on a suffix-tree data structure and can handle large DNA sequences, whether assembled or not.
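Two of the pattern features listed above, IUPAC ambiguity codes with a bounded gap and reverse-complement palindromes, can be illustrated with Python's `re` module. This is a stand-in for illustration only: it does not reproduce STAN's grammar or its suffix-tree matching, and the function names are hypothetical:

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def iupac_to_regex(motif):
    return "".join(IUPAC[c] for c in motif.upper())

def find_gapped(seq, left, right, min_gap, max_gap):
    """(start, text) of `left`, then min_gap..max_gap bases, then `right`.
    The lookahead reports overlapping hits; per start, the longest fitting gap wins."""
    core = f"{iupac_to_regex(left)}[ACGT]{{{min_gap},{max_gap}}}{iupac_to_regex(right)}"
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({core}))", seq)]

def is_reverse_complement_palindrome(site):
    return site == site.translate(COMPLEMENT)[::-1]

print(find_gapped("GGTTGACACCCCCTATAATGG", "TTGRCA", "TATAAT", 3, 8))
# [(2, 'TTGACACCCCCTATAAT')]
print(is_reverse_complement_palindrome("GAATTC"))  # True
```

A regular-expression scan reads the whole sequence for every query; an index such as a suffix tree is what lets a tool answer repeated queries on whole chromosomes efficiently.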


Subjects
Chromosomes/genetics; Genomics/statistics & numerical data; Pattern Recognition, Automated; Amino Acid Sequence; Arabidopsis/genetics; Base Sequence; Chromosomes, Plant/genetics; Computational Biology; DNA Transposable Elements/genetics; DNA, Plant/genetics; Databases, Nucleic Acid; Software