Results 1 - 20 of 34
1.
Mol Biol Evol ; 40(1)2023 01 04.
Article in English | MEDLINE | ID: mdl-36508357

ABSTRACT

Interspecies RNA-Seq datasets are increasingly common, and have the potential to answer new questions about the evolution of gene expression. Single-species differential expression analysis is now a well-studied problem that benefits from sound statistical methods. Extensive reviews on biological or synthetic datasets have provided the community with a clear picture on the relative performances of the available methods in various settings. However, synthetic dataset simulation tools are still missing in the interspecies gene expression context. In this work, we develop and implement a new simulation framework. This tool builds on both the RNA-Seq and the phylogenetic comparative methods literatures to generate realistic count datasets, while taking into account the phylogenetic relationships between the samples. We illustrate the usefulness of this new framework through a targeted simulation study, that reproduces the features of a recently published dataset, containing gene expression data in adult eye tissue across blind and sighted freshwater crayfish species. Using our simulated datasets, we perform a fair comparison of several approaches used for differential expression analysis. This benchmark reveals some of the strengths and weaknesses of both the classical and phylogenetic approaches for interspecies differential expression analysis, and allows for a reanalysis of the crayfish dataset. The tool has been integrated in the R package compcodeR, freely available on Bioconductor.


Subject(s)
Gene Expression Profiling , Software , RNA-Seq , Phylogeny , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods
2.
Genome Res ; 31(12): 2303-2315, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34810219

ABSTRACT

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states' diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.

3.
RNA Biol ; 19(1): 1208-1227, 2022 01.
Article in English | MEDLINE | ID: mdl-36384383

ABSTRACT

This study investigates the importance of the structural context in the formation of a type I/II A-minor motif. This very frequent structural motif has been shown to be important in the spatial folding of RNA molecules. We developed an automated method to classify A-minor motif occurrences according to their 3D context similarities, and we used a graph approach to represent both the structural A-minor motif occurrences and their classes at different scales. This approach leads us to uncover new subclasses of A-minor motif occurrences according to their local 3D similarities. The majority of classes are composed of homologous occurrences, but some of them are composed of non-homologous occurrences. The different classifications we obtain allow us to better understand the importance of the context in the formation of A-minor motifs. In a second step, we investigate how much knowledge of the context around an A-minor motif can help to infer its presence (and position). More specifically, we want to determine what kind of information, contained in the structural context, can be useful to characterize and predict A-minor motifs. We show that, for some A-minor motifs, the topology combined with a sequence signal is sufficient to predict the presence and the position of an A-minor motif occurrence. In most other cases, these signals are not sufficient for predicting the A-minor motif; however, we show that they are still good signals for this purpose. All the classification and prediction pipelines rely on automated processes, for which we describe the underlying algorithms and parameters.


Subject(s)
Imaging, Three-Dimensional , RNA , Algorithms , Predictive Value of Tests , Humans , RNA/chemistry
4.
PLoS Comput Biol ; 14(3): e1005992, 2018 03.
Article in English | MEDLINE | ID: mdl-29543809

ABSTRACT

We present a new educational initiative called Meet-U that aims to train students for collaborative work in computational biology and to bridge the gap between education and research. Meet-U mimics the setup of collaborative research projects and takes advantage of the most popular tools for collaborative work and of cloud computing. Students are grouped in teams of 4-5 people and have to realize a project from A to Z that answers a challenging question in biology. Meet-U promotes "coopetition," as the students collaborate within and across the teams and are also in competition with each other to develop the best final product. Meet-U fosters interactions between different actors of education and research through the organization of a meeting day, open to everyone, where the students present their work to a jury of researchers and jury members give research seminars. This very unique combination of education and research is strongly motivating for the students and provides a formidable opportunity for a scientific community to unite and increase its visibility. We report on our experience with Meet-U in two French universities with master's students in bioinformatics and modeling, with protein-protein docking as the subject of the course. Meet-U is easy to implement and can be straightforwardly transferred to other fields and/or universities. All the information and data are available at www.meet-u.org.


Subject(s)
Computational Biology/education , Computational Biology/methods , Research/education , Humans , Research Design , Students , Universities
5.
BMC Genomics ; 18(1): 667, 2017 Aug 29.
Article in English | MEDLINE | ID: mdl-28851275

ABSTRACT

BACKGROUND: The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture. RESULTS: Here, we re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology and, in combination with optical map data, this provided a gapless assembly of all twelve chromosomes except for the ribosomal DNA repeat cluster on chromosome 7. The more accurate gene annotation made possible by this new assembly revealed a large repertoire of secondary metabolism (SM) key genes (89) and putative biosynthetic pathways (77 SM gene clusters). The two mini-chromosomes differed from the ten core chromosomes in being repeat- and AT-rich and gene-poor but were significantly enriched with genes encoding putative secreted effector proteins. Transposable elements (TEs) were found to occupy 7% of the genome by length. Certain TE families showed a statistically significant association with effector genes and SM cluster genes and were transcriptionally active at particular stages of fungal development. All 24 subtelomeres were found to contain one of three highly-conserved repeat elements which, by providing sites for homologous recombination, were probably instrumental in four segmental duplications. CONCLUSION: The gapless genome of C. higginsianum provides access to repeat-rich regions that were previously poorly assembled, notably the mini-chromosomes and subtelomeres, and allowed prediction of the complete SM gene repertoire. It also provides insights into the potential role of TEs in gene and genome evolution and host adaptation in this asexual pathogen.


Subject(s)
Chromosomes, Fungal/genetics , Colletotrichum/genetics , Colletotrichum/metabolism , DNA Transposable Elements/genetics , Genomics , Multigene Family/genetics , Homologous Recombination/genetics , Molecular Sequence Annotation , Phylogeny , Point Mutation/genetics
6.
BMC Genomics ; 15 Suppl 6: S16, 2014.
Article in English | MEDLINE | ID: mdl-25573073

ABSTRACT

BACKGROUND: In comparative genomics, orthologs are used to transfer annotation from genes already characterized to newly sequenced genomes. Many methods have been developed for finding orthologs in sets of genomes. However, the application of different methods on the same proteome set can lead to distinct orthology predictions. METHODS: We developed a method based on a meta-approach that is able to combine the results of several methods for orthologous group prediction. The purpose of this method is to produce better quality results by using the overlapping results obtained from several individual orthologous gene prediction procedures. Our method proceeds in two steps. The first aims to construct seeds for groups of orthologous genes; these seeds correspond to the exact overlaps between the results of all or several methods. In the second step, these seed groups are expanded by using HMM profiles. RESULTS: We evaluated our method on two standard reference benchmarks, OrthoBench and Orthology Benchmark Service. Our method presents a higher level of accurately predicted groups than the individual input methods of orthologous group prediction. Moreover, our method increases the number of annotated orthologous pairs without decreasing the annotation quality compared to twelve state-of-the-art methods. CONCLUSIONS: The meta-approach based method appears to be a reliable procedure for predicting orthologous groups. Since a large number of methods for predicting groups of orthologous genes exist, it is quite conceivable to apply this meta-approach to several combinations of different methods.
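
The first step described in this abstract (seeds defined as exact overlaps between the results of all or several methods) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_seeds`, the `min_support` parameter, the method names and the gene identifiers are all hypothetical, and the second step (expansion with HMM profiles) is not covered.

```python
from collections import Counter


def build_seeds(predictions, min_support):
    """Return the groups predicted identically by at least `min_support` methods.

    `predictions` maps a method name to the list of orthologous groups it
    predicts, each group being a set of gene identifiers (hypothetical input
    format chosen for this sketch).
    """
    counts = Counter()
    for groups in predictions.values():
        # Count each distinct group at most once per method.
        for group in {frozenset(g) for g in groups}:
            counts[group] += 1
    return {group for group, n in counts.items() if n >= min_support}


# Toy predictions from three hypothetical methods.
predictions = {
    "method_a": [{"g1", "g2", "g3"}, {"g4", "g5"}],
    "method_b": [{"g1", "g2", "g3"}, {"g4", "g6"}],
    "method_c": [{"g1", "g2", "g3"}, {"g4", "g5"}],
}

seeds_all = build_seeds(predictions, min_support=3)   # agreed on by all methods
seeds_most = build_seeds(predictions, min_support=2)  # agreed on by several methods
```

Using `frozenset` makes group identity independent of gene order, so only exact membership matches count as an overlap.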


Subject(s)
Computational Biology/methods , Genomics/methods , Molecular Sequence Annotation/methods , Software , Evolution, Molecular , Reproducibility of Results
7.
NAR Genom Bioinform ; 6(2): lqae069, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38915823

ABSTRACT

Microbial specialized metabolite biosynthetic gene clusters (SMBGCs) are a formidable source of natural products of pharmaceutical interest. With the multiplication of genomic data available, very efficient bioinformatic tools for automatic SMBGC detection have been developed. Nevertheless, most of these tools identify SMBGCs based on sequence similarity with enzymes typically involved in specialised metabolism and thus may miss SMBGCs coding for undercharacterised enzymes. Here we present Synteruptor (https://bioi2.i2bc.paris-saclay.fr/synteruptor), a program that identifies genomic islands, known to be enriched in SMBGCs, in the genomes of closely related species. With this tool, we identified a SMBGC in the genome of Streptomyces ambofaciens ATCC23877, undetected by antiSMASH versions prior to antiSMASH 5, and experimentally demonstrated that it directs the biosynthesis of two metabolites, one of which was identified as sphydrofuran. Synteruptor is also a valuable resource for the delineation of individual SMBGCs within antiSMASH regions that may encompass multiple clusters, and for refining the boundaries of these SMBGCs.

8.
BMC Genomics ; 14: 623, 2013 Sep 14.
Article in English | MEDLINE | ID: mdl-24034898

ABSTRACT

BACKGROUND: Candida glabrata follows C. albicans as the second or third most prevalent cause of candidemia worldwide. These two pathogenic yeasts are distantly related, C. glabrata being part of the Nakaseomyces, a group more closely related to Saccharomyces cerevisiae. Although C. glabrata was thought to be the only pathogenic Nakaseomyces, two new pathogens have recently been described within this group: C. nivariensis and C. bracarensis. To gain insight into the genomic changes underlying the emergence of virulence, we sequenced the genomes of these two, and three other non-pathogenic Nakaseomyces, and compared them to other sequenced yeasts. RESULTS: Our results indicate that the two new pathogens are more closely related to the non-pathogenic N. delphensis than to C. glabrata. We uncover duplications and accelerated evolution that specifically affected genes in the lineage preceding the group containing N. delphensis and the three pathogens, which may provide clues to the higher propensity of this group to infect humans. Finally, the number of Epa-like adhesins is specifically enriched in the pathogens, particularly in C. glabrata. CONCLUSIONS: Remarkably, some features thought to be the result of adaptation of C. glabrata to a pathogenic lifestyle, are present throughout the Nakaseomyces, indicating these are rather ancient adaptations to other environments. Phylogeny suggests that human pathogenesis evolved several times, independently within the clade. The expansion of the EPA gene family in pathogens establishes an evolutionary link between adhesion and virulence phenotypes. Our analyses thus shed light onto the relationships between virulence and the recent genomic changes that occurred within the Nakaseomyces. 
SEQUENCE ACCESSION NUMBERS: Nakaseomyces delphensis: CAPT01000001 to CAPT01000179; Candida bracarensis: CAPU01000001 to CAPU01000251; Candida nivariensis: CAPV01000001 to CAPV01000123; Candida castellii: CAPW01000001 to CAPW01000101; Nakaseomyces bacillisporus: CAPX01000001 to CAPX01000186.


Subject(s)
Candida glabrata/classification , Genome, Fungal , Phylogeny , Saccharomycetales/classification , Candida glabrata/genetics , DNA, Fungal/genetics , Evolution, Molecular , Saccharomycetales/genetics , Selection, Genetic , Sequence Analysis, DNA
9.
Sci Rep ; 13(1): 1417, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36697464

ABSTRACT

We report here a new application, CustomProteinSearch (CusProSe), whose purpose is to help users to search for proteins of interest based on their domain composition. The application is customizable. It consists of two independent tools, IterHMMBuild and ProSeCDA. IterHMMBuild allows the iterative construction of Hidden Markov Model (HMM) profiles for conserved domains of selected protein sequences, while ProSeCDA scans a proteome of interest against an HMM profile database, and annotates identified proteins using user-defined rules. CusProSe was successfully used to identify, in fungal genomes, genes encoding key enzyme families involved in secondary metabolism, such as polyketide synthases (PKS), non-ribosomal peptide synthetases (NRPS), hybrid PKS-NRPS and dimethylallyl tryptophan synthases (DMATS), as well as to characterize distinct terpene synthases (TS) sub-families. The highly configurable characteristics of this application makes it a generic tool, which allows the user to refine the function of predicted proteins, to extend detection to new enzymes families, and may also be applied to biological systems other than fungi and to other proteins than those involved in secondary metabolism.


Subject(s)
Fungi , Molecular Sequence Annotation , Secondary Metabolism , Software , Amino Acid Sequence , Molecular Sequence Annotation/methods , Peptide Synthases/genetics , Polyketide Synthases/genetics , Secondary Metabolism/genetics , Fungi/enzymology , Fungi/genetics , Tryptophan Synthase/genetics , Conserved Sequence/genetics
10.
J Mol Evol ; 73(3-4): 230-43, 2011 Oct.
Article in English | MEDLINE | ID: mdl-22094890

ABSTRACT

The recent availability of genome sequences of four different Fusarium species offers the opportunity to perform extensive comparative analyses, in particular of repeated sequences. In a recent work, the overall content of such sequences in the genomes of three phylogenetically related Fusarium species, F. graminearum, F. verticillioides, and F. oxysporum f. sp. lycopersici, has been estimated. In this study, we present an exhaustive characterization of pogo-like elements, named Fots, in four Fusarium genomes. Overall, 10 Fot and two Fot-related miniature inverted-repeat transposable element families were identified, revealing a diversification of multiple lineages of pogo-like elements, some of which were accompanied by a gain of introns. This analysis also showed that such elements are present in an unusually high proportion in the genomes of F. oxysporum f. sp. lycopersici and Nectria haematococca (anamorph F. solani f. sp. pisi), in contrast with most other fungal genomes, in which retroelements are the most represented. Interestingly, our analysis showed that the most numerous Fot families all contain potentially active or mobilisable copies, thus conferring a mutagenic potential on these transposable elements and consequently a role in strain adaptation and genome evolution. This role is strongly reinforced when examining their genomic distribution, which is clearly biased, with a high proportion (more than 80%) located in strain- or species-specific regions enriched in genes involved in pathogenicity and/or adaptation. Finally, the different reproductive characteristics of the four Fusarium species allowed us to investigate the impact of the process of repeat-induced point mutations on the expansion and diversification of Fot elements.


Subject(s)
DNA Transposable Elements/genetics , Fusarium/genetics , Genome, Fungal , Base Sequence , Cluster Analysis , Evolution, Molecular , Gene Dosage , Likelihood Functions , Models, Genetic , Multigene Family , Open Reading Frames , Phylogeny , Polymorphism, Genetic
11.
Nat Commun ; 12(1): 5221, 2021 09 01.
Article in English | MEDLINE | ID: mdl-34471117

ABSTRACT

Bacteria of the genus Streptomyces are prolific producers of specialized metabolites, including antibiotics. The linear chromosome includes a central region harboring core genes, as well as extremities enriched in specialized metabolite biosynthetic gene clusters. Here, we show that chromosome structure in Streptomyces ambofaciens correlates with genetic compartmentalization during exponential phase. Conserved, large and highly transcribed genes form boundaries that segment the central part of the chromosome into domains, whereas the terminal ends tend to be transcriptionally quiescent compartments with different structural features. The onset of metabolic differentiation is accompanied by a rearrangement of chromosome architecture, from a rather 'open' to a 'closed' conformation, in which highly expressed specialized metabolite biosynthetic genes form new boundaries. Thus, our results indicate that the linear chromosome of S. ambofaciens is partitioned into structurally distinct entities, suggesting a link between chromosome folding, gene expression and genome evolution.


Subject(s)
Anti-Bacterial Agents/metabolism , Chromosomes, Bacterial , Streptomyces/genetics , Streptomyces/metabolism , Chromosome Structures , Gene Expression Regulation, Bacterial , Genome, Bacterial , Multigene Family , Transcriptome
12.
BMC Genomics ; 11: 81, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-20122162

ABSTRACT

BACKGROUND: More and more completely sequenced fungal genomes are becoming available and many more sequencing projects are in progress. This deluge of data should improve our knowledge of the various primary and secondary metabolisms of Fungi, including their synthesis of useful compounds such as antibiotics or toxic molecules such as mycotoxins. Functional annotation of many fungal genomes is imperfect, especially of genes encoding enzymes, so we need dedicated tools to analyze their metabolic pathways in depth. DESCRIPTION: FUNGIpath is a new tool built using a two-stage approach. Groups of orthologous proteins predicted using complementary methods of detection were collected in a relational database. Each group was further mapped on to steps in the metabolic pathways published in the public databases KEGG and MetaCyc. As a result, FUNGIpath allows the primary and secondary metabolisms of the different fungal species represented in the database to be compared easily, making it possible to assess the level of specificity of various pathways at different taxonomic distances. It is freely accessible at http://www.fungipath.u-psud.fr. CONCLUSIONS: As more and more fungal genomes are expected to be sequenced during the coming years, FUNGIpath should help progressively to reconstruct the ancestral primary and secondary metabolisms of the main branches of the fungal tree of life and to elucidate the evolution of these ancestral fungal metabolisms to various specific derived metabolisms.


Subject(s)
Computational Biology/methods , Databases, Protein , Genome, Fungal , Metabolic Networks and Pathways , Data Mining , Fungi/genetics
13.
Microb Genom ; 7(6)2021 06.
Article in English | MEDLINE | ID: mdl-33749576

ABSTRACT

Streptomyces possess a large linear chromosome (6-12 Mb) consisting of a conserved central region flanked by variable arms covering several megabases. In order to study the evolution of the chromosome across evolutionary times, a representative panel of Streptomyces strains and species (125) whose chromosomes are completely sequenced and assembled was selected. The pan-genome of the genus was modelled and shown to be open with a core-genome reaching 1018 genes. The evolution of Streptomyces chromosome was analysed by carrying out pairwise comparisons, and by monitoring indexes measuring the conservation of genes (presence/absence) and their synteny along the chromosome. Using the phylogenetic depth offered by the chosen panel, it was possible to infer that within the central region of the chromosome, the core-genes form a highly conserved organization, which can reveal the existence of an ancestral chromosomal skeleton. Conversely, the chromosomal arms, enriched in variable genes evolved faster than the central region under the combined effect of rearrangements and addition of new information from horizontal gene transfer. The genes hosted in these regions may be localized there because of the adaptive advantage that their rapid evolution may confer. We speculate that (i) within a bacterial population, the variability of these genes may contribute to the establishment of social characters by the production of 'public goods' (ii) at the evolutionary scale, this variability contributes to the diversification of the genetic pool of the bacteria.

14.
Microbiol Resour Announc ; 8(38)2019 Sep 19.
Article in English | MEDLINE | ID: mdl-31537669

ABSTRACT

The genomes of 11 conspecific Streptomyces strains, i.e., from the same species and inhabiting the same ecological niche, were sequenced and assembled. This data set offers an ideal framework to assess the genome evolution of Streptomyces species in their ecological context.

15.
mBio ; 10(5)2019 09 03.
Article in English | MEDLINE | ID: mdl-31481382

ABSTRACT

In this work, by comparing genomes of closely related individuals of Streptomyces isolated at a spatial microscale (millimeters or centimeters), we investigated the extent and impact of horizontal gene transfer in the diversification of a natural Streptomyces population. We show that despite these conspecific strains sharing a recent common ancestor, all harbored significantly different gene contents, implying massive and rapid gene flux. The accessory genome of the strains was distributed across insertion/deletion events (indels) ranging from one to several hundreds of genes. Indels were preferentially located in the arms of the linear chromosomes (ca. 12 Mb) and appeared to form recombination hot spots. Some of them harbored biosynthetic gene clusters (BGCs) whose products confer an inhibitory capacity and may constitute public goods that can favor the cohesiveness of the bacterial population. Moreover, a significant proportion of these variable genes were either plasmid borne or harbored signatures of actinomycete integrative and conjugative elements (AICEs). We propose that conjugation is the main driver for the indel flux and diversity in Streptomyces populations. IMPORTANCE: Horizontal gene transfer is a rapid and efficient way to diversify bacterial gene pools. Currently, little is known about this gene flux within natural soil populations. Using comparative genomics of Streptomyces strains belonging to the same species and isolated at microscale, we reveal frequent transfer of a significant fraction of the pangenome. We show that it occurs at a time scale enabling the population to diversify and to cope with its changing environment, notably, through the production of public goods.


Subject(s)
Gene Transfer, Horizontal , Genes, Bacterial/genetics , Genetic Variation , Streptomyces/genetics , Actinobacteria/genetics , Biosynthetic Pathways/genetics , Chromosomes, Bacterial , Conjugation, Genetic , DNA, Bacterial/genetics , Genome, Bacterial , Multigene Family , Multilocus Sequence Typing , Phylogeny , Plasmids
16.
BMC Bioinformatics ; 9: 536, 2008 Dec 16.
Article in English | MEDLINE | ID: mdl-19087285

ABSTRACT

BACKGROUND: It has been repeatedly observed that gene order is rapidly lost in prokaryotic genomes. However, persistent synteny blocks are found when comparing more or less distant species. These genes that remain consistently adjacent are appealing candidates for the study of genome evolution and a more accurate definition of their functional role. Such studies require visualizing conserved synteny blocks in a large number of genomes at all taxonomic distances. RESULTS: After comparing nearly 600 completely sequenced genomes encompassing the whole prokaryotic tree of life, the computed synteny data were assembled in a relational database, SynteBase. SynteView was designed to visualize conserved synteny blocks in a large number of genomes after choosing one of them as a reference. SynteView functions with data stored either in SynteBase or in a home-made relational database of personal data. In addition, this software can compute on-the-fly and display the distribution of synteny blocks which are conserved in pairs of genomes. This tool has been designed to provide a wealth of information on each positional orthologous gene, to be user-friendly and customizable. It is also possible to download sequences of genes belonging to these synteny blocks for further studies. SynteView is accessible through Java Webstart at http://www.synteview.u-psud.fr. CONCLUSION: SynteBase answers queries about gene order conservation and SynteView visualizes the obtained results in a flexible and powerful way which provides a comparative overview of the conserved synteny in a large number of genomes, whatever their taxonomic distances.


Subject(s)
Gene Order/genetics , Genome, Archaeal , Genome, Bacterial , Software , Synteny/genetics , Computational Biology/methods , Conserved Sequence , Databases, Genetic , Evolution, Molecular , Genomics/methods
17.
Biochimie ; 90(4): 595-608, 2008 Apr.
Article in English | MEDLINE | ID: mdl-17961904

ABSTRACT

The remarkable development of comparative genomics during the last decade has required a correct use of the concept of homology, which was previously used only by evolutionary biologists. Unfortunately, this concept has often been misunderstood and thus misused when applied outside its evolutionary context. This review returns to the correct definition of homology and explains how this definition has been progressively refined to adapt it to the various new kinds of analysis of genes and their products that have appeared with the progress of comparative genomics. We then illustrate the power of this concept when using the available genomic data to study the evolution of individual genes, of entire genomes and of species, respectively. After explaining how we detect homologues by an exhaustive comparison of a hundred complete proteomes, we describe three main lines of research we have developed in recent years. The first one exploits synteny and gene context data to better understand the mechanisms of genome evolution in prokaryotes. The second one is based on phylogenomic approaches to reconstruct the tree of life. The last one emphasizes that protein homology is often limited to structural segments (SOH = segment of homology, or module). Detecting and enumerating modules makes it possible to trace back protein history by identifying the events of gene duplication and gene fusion. We stress that one of the main current difficulties in such studies is the lack of a reliable method to identify genuine orthologues. Finally, we show how these homology studies help to annotate genes and genomes and to study the complexity of the relationships between the sequence and the function of a gene.


Subject(s)
Evolution, Molecular , Genes/genetics , Genome , Genomics , Animals , Bacteria/classification , Bacteria/genetics , Phylogeny , Proteome/analysis , Sequence Homology, Nucleic Acid
18.
Antibiotics (Basel) ; 7(4)2018 Oct 02.
Article in English | MEDLINE | ID: mdl-30279346

ABSTRACT

Specialized metabolites are of great interest due to their possible industrial and clinical applications. The increasing number of antimicrobial-resistant infectious agents is a major health threat, and therefore the discovery of chemical diversity and new antimicrobials is crucial. Extensive genomic data from Streptomyces spp. confirm their production potential and great importance. Genome sequencing of strains of the same species indicates that specialized metabolite biosynthetic gene cluster (SMBGC) diversity is not exhausted, and instead, a pool of novel specialized metabolites still exists. Here, we analyze the genome sequence data from six phylogenetically close Streptomyces strains. The results reveal that the closer the strains are phylogenetically, the higher the number of shared gene clusters. Eight specialized metabolites comprise the core metabolome, although some strains have only six core gene clusters. The number of conserved gene clusters common between the isolated strains and their closest phylogenetic counterparts varies from nine to 23 SMBGCs. However, the analysis of these phylogenetic relationships is not affected by the acquisition of gene clusters, probably through horizontal gene transfer events, as each strain also harbors strain-specific SMBGCs. Between one and 15 strain-specific gene clusters were identified, of which up to six gene clusters in a single strain are unknown and have no identifiable orthologs in other species, attesting to the existing SMBGC novelty at the strain level.

19.
BMC Evol Biol ; 7: 237, 2007 Nov 29.
Article in English | MEDLINE | ID: mdl-18047665

ABSTRACT

BACKGROUND: Comparison of completely sequenced microbial genomes has revealed how fluid these genomes are. Detecting synteny blocks requires reliable methods for determining the orthologs among the whole set of homologs detected by exhaustive comparisons between each pair of completely sequenced genomes. This is a complex and difficult problem in the field of comparative genomics, but solving it will help to better understand the way prokaryotic genomes are evolving. RESULTS: We have developed a suite of programs that automate three essential steps in the study of gene order conservation, and validated them with a set of 107 bacteria and archaea that cover the majority of the prokaryotic taxonomic space. We identified the whole set of shared homologs between two or more species and computed the evolutionary distance separating each pair of homologs. We applied two strategies to extract from the set of homologs a collection of valid orthologs shared by at least two genomes. The first computes the Reciprocal Smallest Distance (RSD) using the PAM distances separating pairs of homologs. The second groups homologs into families and reconstructs each family's evolutionary tree, distinguishing bona fide orthologs as well as paralogs created after the last speciation event. Although the phylogenetic tree method often succeeds where RSD fails, the reverse could occasionally be true. Accordingly, we used the data obtained with either method, or with their intersection, to enumerate the orthologs that are adjacent in each pair of genomes, the Positional Orthologous Genes (POGs), and to further study their properties. Once all these synteny blocks had been detected, we showed that POGs are subject to more evolutionary constraints than orthologs outside synteny groups, whatever the taxonomic distance separating the compared organisms.
CONCLUSION: The suite of programs described in this paper allows a reliable detection of orthologs and is useful for evaluating gene order conservation in prokaryotes, whatever their taxonomic distance. Thus, our approach will facilitate the rapid identification of POGs in the next few years, as we expect to be inundated with thousands of completely sequenced microbial genomes.
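
The reciprocal-smallest-distance idea mentioned in the RESULTS can be sketched in a few lines of Python. This is a simplified illustration, not the authors' suite of programs: it assumes the pairwise evolutionary distances between homologs of two genomes have already been computed and are supplied as a dictionary, and the function name and gene identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def reciprocal_smallest_distance(
    distances: Dict[Tuple[str, str], float],
) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is the closest homolog of a in genome B
    and a is the closest homolog of b in genome A.

    `distances` maps (gene_in_A, gene_in_B) to a precomputed evolutionary
    distance (for example a PAM distance); ties keep the first pair seen.
    """
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), d in distances.items():
        # Track the nearest partner seen so far for each gene.
        if a not in best_for_a or d < best_for_a[a][1]:
            best_for_a[a] = (b, d)
        if b not in best_for_b or d < best_for_b[b][1]:
            best_for_b[b] = (a, d)
    # Keep only pairs whose nearest-partner relation is mutual.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Hypothetical toy distances between genes of genome A and genome B.
toy = {
    ("a1", "b1"): 10.0,
    ("a1", "b2"): 50.0,
    ("a2", "b1"): 20.0,
    ("a2", "b2"): 60.0,
    ("a3", "b2"): 5.0,
}
pairs = reciprocal_smallest_distance(toy)
```

In this toy input, `a2` is closest to `b1`, but `b1` is closer to `a1`, so `a2` is left without a putative ortholog. A case like this is where a family tree might assign `a2` as a paralog instead.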


Subject(s)
Archaea/genetics , Bacteria/genetics , Molecular Evolution , Archaeal Genes , Bacterial Genes , Synteny , Algorithms , Cluster Analysis , Phylogeny , Proteome , Species Specificity
20.
BMC Bioinformatics ; 7: 436, 2006 Oct 06.
Article in English | MEDLINE | ID: mdl-17026747

ABSTRACT

BACKGROUND: Despite the current availability of several hundreds of thousands of amino acid sequences, more than 36% of the enzyme activities (EC numbers) defined by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) are not associated with any amino acid sequence in major public databases. This wide gap separating knowledge of biochemical function and sequence information is found for nearly all classes of enzymes. Thus, there is an urgent need to explore these sequence-less EC numbers, in order to progressively close this gap. DESCRIPTION: We designed ORENZA, a PostgreSQL database of ORphan ENZyme Activities, to collate information about the EC numbers defined by the NC-IUBMB with specific emphasis on orphan enzyme activities. Complete lists of all EC numbers and of orphan EC numbers are available and will be periodically updated. ORENZA allows one to browse the complete list of EC numbers or the subset associated with orphan enzymes or to query a specific EC number, an enzyme name or a species name for those interested in particular organisms. It is possible to search ORENZA for the different biochemical properties of the defined enzymes, the metabolic pathways in which they participate, the taxonomic data of the organisms whose genomes encode them, and many other features. The association of an enzyme activity with an amino acid sequence is clearly underlined, making it easy to identify at once the orphan enzyme activities. Interactive publishing of suggestions by the community would provide expert evidence for re-annotation of orphan EC numbers in public databases. CONCLUSION: ORENZA is a Web resource designed to progressively bridge the unwanted gap between function (enzyme activities) and sequence (dataset present in public databases). ORENZA should increase interactions between communities of biochemists and of genomicists. This is expected to reduce the number of orphan enzyme activities by allocating gene sequences to the relevant enzymes.
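
The notion of an orphan enzyme activity used in this abstract (an EC number with no associated amino acid sequence) amounts to a set difference. The Python sketch below is an illustration only: the function names and the toy EC lists are hypothetical, and it does not reproduce ORENZA's PostgreSQL schema or query interface.

```python
from typing import Iterable, List, Tuple


def ec_sort_key(ec: str) -> Tuple[Tuple[int, object], ...]:
    """Sort EC numbers field by field, numeric fields before non-numeric ones."""
    return tuple(
        (0, int(field)) if field.isdigit() else (1, field)
        for field in ec.split(".")
    )


def orphan_ec_numbers(
    all_ec: Iterable[str], ec_with_sequence: Iterable[str]
) -> List[str]:
    """EC numbers that are defined but have no associated sequence."""
    return sorted(set(all_ec) - set(ec_with_sequence), key=ec_sort_key)


def orphan_fraction(all_ec: Iterable[str], ec_with_sequence: Iterable[str]) -> float:
    """Fraction of defined EC numbers that are orphans (0.0 for an empty list)."""
    defined = set(all_ec)
    if not defined:
        return 0.0
    return len(defined - set(ec_with_sequence)) / len(defined)


# Hypothetical toy lists, for illustration only.
defined = ["1.1.1.1", "1.1.1.10", "1.1.1.2", "2.7.1.1"]
sequenced = ["1.1.1.1", "2.7.1.1"]
orphans = orphan_ec_numbers(defined, sequenced)
share = orphan_fraction(defined, sequenced)
```

The custom sort key keeps `1.1.1.2` ahead of `1.1.1.10`, which plain string sorting would not.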


Subject(s)
Genetic Databases , Enzymes/genetics , Internet , Amino Acid Sequence/genetics , Internet/statistics & numerical data , Protein Sequence Analysis/methods