Results 1 - 20 of 24
1.
Bioinformatics ; 39(12), 2023 12 01.
Article in English | MEDLINE | ID: mdl-37975872

ABSTRACT

MOTIVATION: Phylogenetic placement enables phylogenetic analysis of massive collections of newly sequenced DNA, when de novo tree inference is too unreliable or inefficient. Assuming that a high-quality reference tree is available, the idea is to seek the correct placement of the new sequences in that tree. Recently, alignment-free approaches to phylogenetic placement have emerged, both to circumvent the need to align the new sequences and to avoid the calculations that typically follow the alignment step. A promising approach is based on the inference of k-mers that can be potentially related to the reference sequences, also called phylo-k-mers. However, its usage is limited by the time and memory-consuming stage of reference data preprocessing and the large numbers of k-mers to consider. RESULTS: We suggest a filtering method for selecting informative phylo-k-mers based on mutual information, which can significantly improve the efficiency of placement, at the cost of a small loss in placement accuracy. This method is implemented in IPK, a new tool for computing phylo-k-mers that significantly outperforms the software previously available. We also present EPIK, a new software for phylogenetic placement, supporting filtered phylo-k-mer databases. Our experiments on real-world data show that EPIK is the fastest phylogenetic placement tool available, when placing hundreds of thousands and millions of queries while still providing accurate placements. AVAILABILITY AND IMPLEMENTATION: IPK and EPIK are freely available at https://github.com/phylo42/IPK and https://github.com/phylo42/EPIK. Both are implemented in C++ and Python and supported on Linux and MacOS.


Subjects
Algorithms; Software; Phylogeny; Sequence Analysis, DNA; Base Sequence
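
The abstract above mentions selecting informative phylo-k-mers by mutual information but gives no formula. The following is a minimal sketch of one way such a filter could look, not the IPK implementation. It assumes a uniform prior over branches and a hypothetical database that maps each k-mer to its per-branch presence probabilities; the function names and the `keep_fraction` parameter are illustrative only.

```python
import math
from typing import Dict, List


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mutual_information(presence_probs: List[float]) -> float:
    """Mutual information (bits) between the branch and k-mer presence.

    Assumption: branches are equally likely a priori, and presence_probs[i]
    is the probability of observing the k-mer given branch i.
    """
    n = len(presence_probs)
    marginal = sum(presence_probs) / n
    conditional = sum(binary_entropy(p) for p in presence_probs) / n
    return binary_entropy(marginal) - conditional


def filter_kmers(db: Dict[str, List[float]], keep_fraction: float) -> List[str]:
    """Keep the most informative fraction of k-mers (at least one)."""
    ranked = sorted(db, key=lambda kmer: mutual_information(db[kmer]), reverse=True)
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:n_keep]
```

A k-mer that is equally likely on every branch scores zero and is dropped first, while a k-mer seen on only some branches scores high. This is the intuition behind trading a small loss in accuracy for a smaller database.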
2.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2889-2897, 2023.
Article in English | MEDLINE | ID: mdl-37204943

ABSTRACT

Finding the correct position of new sequences within an established phylogenetic tree is an increasingly relevant problem in evolutionary bioinformatics and metagenomics. Recently, alignment-free approaches for this task have been proposed. One such approach is based on the concept of phylogenetically informative k-mers, or phylo-k-mers for short. In practice, phylo-k-mers are inferred from a set of related reference sequences and are equipped with scores expressing the probability of their appearance in different locations within the input reference phylogeny. Computing phylo-k-mers, however, represents a computational bottleneck to their applicability in real-world problems such as the phylogenetic analysis of metabarcoding reads and the detection of novel recombinant viruses. Here we consider the problem of phylo-k-mer computation: how can we efficiently find all k-mers whose probability lies above a given threshold for a given tree node? We describe and analyze algorithms for this problem, relying on branch-and-bound and divide-and-conquer techniques. We exploit the redundancy of adjacent windows of the alignment to save on computation. Besides computational complexity analyses, we provide an empirical evaluation of the relative performance of their implementations on simulated and real-world data. The divide-and-conquer algorithms are found to surpass the branch-and-bound approach, especially when many phylo-k-mers are found.
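
The problem stated above (find all k-mers whose probability exceeds a threshold at a given node) can be illustrated with a small branch-and-bound sketch. This is not the authors' code. It assumes that one alignment window is given as a list of per-position probability dictionaries over A, C, G, T, and that a k-mer's probability is the product of its per-position probabilities; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

ALPHABET = "ACGT"


def kmers_above_threshold(
    window: List[Dict[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Enumerate k-mers of one window whose probability is >= threshold."""
    k = len(window)
    # best_suffix[i] is the largest product achievable over positions i..k-1.
    best_suffix = [1.0] * (k + 1)
    for i in range(k - 1, -1, -1):
        best_suffix[i] = best_suffix[i + 1] * max(window[i][c] for c in ALPHABET)

    results: List[Tuple[str, float]] = []

    def extend(prefix: str, prob: float) -> None:
        i = len(prefix)
        if i == k:
            results.append((prefix, prob))
            return
        for c in ALPHABET:
            p = prob * window[i][c]
            # Bound: prune when even the best completion stays below threshold.
            if p * best_suffix[i + 1] >= threshold:
                extend(prefix + c, p)

    extend("", 1.0)
    return results
```

Pruning on the best possible suffix means that low-probability prefixes are abandoned early. A divide-and-conquer variant would instead combine the surviving halves of a window, which this sketch does not attempt.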

3.
Mol Ecol ; 32(23): 6147-6160, 2023 Dec.
Article in English | MEDLINE | ID: mdl-36271787

ABSTRACT

To help address the underrepresentation of arthropods and Asian biodiversity in climate-change assessments, we carried out year-long, weekly sampling campaigns with Malaise traps at different elevations and latitudes in Gaoligongshan National Park in southwestern China. From these 623 samples, we barcoded 10,524 beetles and compared scenarios of climate-change-induced biodiversity loss by designating seasonal, elevational, and latitudinal subsets of beetles as communities that plausibly could go extinct as a group, which we call "loss sets". The availability of a published mitochondrial-genome-based phylogeny of the Coleoptera allowed us to compare the loss of species diversity with and without accounting for phylogenetic relatedness. We hypothesised that phylogenetic relatedness would mitigate extinction, since the extinction of any loss set would result in the disappearance of all its species but only part of its evolutionary history, which is still extant in the remaining loss sets. We found different patterns of community clustering by season and latitude, depending on whether phylogenetic information was incorporated. However, contrary to our expectations, accounting for phylogeny only slightly mitigated the amount of biodiversity loss under climate change scenarios: there is no phylogenetic "escape clause" for biodiversity conservation. We obtained the same results whether phylogenetic information was derived from the mitogenome phylogeny or from a de novo barcode-gene tree. We encourage interested researchers to use this data set to study lineage-specific community assembly patterns in conjunction with life-history traits and environmental covariates.


Subjects
Arthropods; Coleoptera; Animals; Phylogeny; Biodiversity; Insecta; Biological Evolution; Coleoptera/genetics
4.
Mol Biol Evol ; 38(8): 3033-3045, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33822172

ABSTRACT

Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, whether two genes directly descended from a common ancestor by a speciation event (orthologs) or a duplication event (paralogs) is more challenging to establish, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan, in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought to light several challenges to come: leveraging orthology computations to get the most out of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.


Subjects
Genetic Speciation; Genomics/trends; Phylogeny; Genome, Viral; Genomics/methods
5.
Bioinformatics ; 36(21): 5264-5266, 2021 01 29.
Article in English | MEDLINE | ID: mdl-32697844

ABSTRACT

MOTIVATION: Phylogenetic placement (PP) is a process of taxonomic identification for which several tools are now available. However, it remains difficult to assess which tool is best suited to particular genomic data or a particular reference taxonomy. We developed Placement Evaluation WOrkflows (PEWO), the first benchmarking tool dedicated to PP assessment. Its automated workflows can evaluate PP at many levels, from parameter optimization for a particular tool to the selection of the most appropriate genetic marker when PP-based species identifications are targeted. Our goal is that PEWO will become a community effort and a standard support for future developments and applications of PP. AVAILABILITY AND IMPLEMENTATION: https://github.com/phylo42/PEWO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Benchmarking; Software; Genome; Phylogeny; Workflow
6.
Bioinformatics ; 36(22-23): 5351-5360, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33331849

ABSTRACT

MOTIVATION: Novel recombinant viruses may have important medical and evolutionary significance, as they sometimes display new traits not present in the parental strains. This is particularly concerning when the new viruses combine fragments coming from phylogenetically distinct viral types. Here, we consider the task of screening large collections of sequences for such novel recombinants. A number of methods already exist for this task. However, these methods rely on complex models and heavy computations that are not always practical for a quick scan of a large number of sequences. RESULTS: We have developed SHERPAS, a new program to detect novel recombinants and provide a first estimate of their parental composition. Our approach is based on the precomputation of a large database of 'phylogenetically-informed k-mers', an idea recently introduced in the context of phylogenetic placement in metagenomics. Our experiments show that SHERPAS is hundreds to thousands of times faster than existing software, and enables the analysis of thousands of whole genomes, or long-sequencing reads, within minutes or seconds, and with limited loss of accuracy. AVAILABILITY AND IMPLEMENTATION: The source code is freely available for download at https://github.com/phylo42/sherpas. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
Mol Biol Evol ; 37(3): 683-694, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31670799

ABSTRACT

High-throughput DNA methods hold great promise for phylogenetic analysis of lineages that are difficult to study with conventional molecular and morphological approaches. The mites (Acari), and in particular the highly diverse soil-dwelling lineages, are among the least known branches of the metazoan Tree-of-Life. We extracted numerous minute mites from soils in an area of mixed forest and grassland in southern Iberia. Selected specimens representing the full morphological diversity were shotgun sequenced in bulk, followed by genome assembly of short reads from the mixture, which produced >100 mitochondrial genomes representing diverse acarine lineages. Phylogenetic analyses in combination with taxonomically limited mitogenomes available publicly resulted in plausible trees defining basal relationships of the Acari. Several critical nodes were supported by ancestral-state reconstructions of mitochondrial gene rearrangements. Molecular calibration placed the minimum age for the common ancestor of the superorder Acariformes, which includes most soil-dwelling mites, to the Cambrian-Ordovician (likely within 455-552 Ma), whereas the origin of the superorder Parasitiformes was placed later in the Carboniferous-Permian. Most family-level taxa within the Acariformes were dated to the Jurassic and Triassic. The ancient origin of Acariformes and the early diversification of major extant lineages linked to the soil are consistent with a pioneering role for mites in building the earliest terrestrial ecosystems.


Subjects
Mites/classification; Mitochondria/genetics; Soil/parasitology; Animals; DNA, Mitochondrial/genetics; Metagenomics; Mites/genetics; Phylogeny; Sequence Analysis, DNA
8.
Bioinformatics ; 35(18): 3303-3312, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30698645

ABSTRACT

MOTIVATION: Taxonomic classification is at the core of environmental DNA analysis. When a phylogenetic tree can be built as a prior hypothesis to such classification, phylogenetic placement (PP) provides the most informative type of classification because each query sequence is assigned to its putative origin in the tree. This is useful whenever precision is sought (e.g. in diagnostics). However, likelihood-based PP algorithms struggle to scale with the ever-increasing throughput of DNA sequencing. RESULTS: We have developed RAPPAS (Rapid Alignment-free Phylogenetic Placement via Ancestral Sequences) which uses an alignment-free approach, removing the hurdle of query sequence alignment as a preliminary step to PP. Our approach relies on the precomputation of a database of k-mers that may be present with non-negligible probability in relatives of the reference sequences. The placement is performed by inspecting the stored phylogenetic origins of the k-mers in the query, and their probabilities. The database can be reused for the analysis of several different metagenomes. Experiments show that the first implementation of RAPPAS is already faster than competing likelihood-based PP algorithms, while keeping similar accuracy for short reads. RAPPAS scales PP for the era of routine metagenomic diagnostics. AVAILABILITY AND IMPLEMENTATION: Program and sources freely available for download at https://github.com/blinard-BIOINFO/RAPPAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Metagenome; Software; Algorithms; Likelihood Functions; Phylogeny; Sequence Alignment; Sequence Analysis, DNA
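
The placement step described in the abstract above (look up the query's k-mers in a precomputed database and combine their stored origins and probabilities) can be sketched as follows. This is an illustrative simplification, not the RAPPAS scoring scheme. It assumes a hypothetical database that maps each k-mer to log-probabilities per branch, and a fixed `default_logp` for branches where a k-mer is not stored; both are assumptions.

```python
import math
from typing import Dict, List, Tuple


def place_query(
    query: str,
    db: Dict[str, Dict[str, float]],
    k: int,
    branches: List[str],
    default_logp: float,
) -> Tuple[str, Dict[str, float]]:
    """Score every branch by summing log-probabilities of the query's k-mers.

    Returns the best-scoring branch and the full score table.
    """
    scores = {branch: 0.0 for branch in branches}
    for i in range(len(query) - k + 1):
        hits = db.get(query[i:i + k], {})
        for branch in branches:
            # Unstored k-mer/branch pairs fall back to the default score.
            scores[branch] += hits.get(branch, default_logp)
    best = max(scores, key=lambda branch: scores[branch])
    return best, scores
```

No alignment of the query is needed because each k-mer is a plain dictionary lookup. The database is built once per reference tree and can be reused for every query.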
9.
Nucleic Acids Res ; 47(D1): D411-D418, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30380106

ABSTRACT

OrthoInspector is one of the leading software suites for orthology relations inference. In this paper, we describe a major redesign of the OrthoInspector online resource along with a significant increase in the number of species: 4753 organisms are now covered across the three domains of life, making OrthoInspector the most exhaustive orthology resource to date in terms of covered species (excluding viruses). The new website integrates original data exploration and visualization tools in an ergonomic interface. Distributions of protein orthologs are represented by heatmaps summarizing their evolutionary histories, and proteins with similar profiles can be directly accessed. Two novel tools have been implemented for comparative genomics: a phylogenetic profile search that can be used to find proteins with a specific presence-absence profile and investigate their functions and, inversely, a GO profiling tool aimed at deciphering evolutionary histories of molecular functions, processes or cell components. In addition to the re-designed website, the OrthoInspector resource now provides a REST interface for programmatic access. OrthoInspector 3.0 is available at http://lbgi.fr/orthoinspectorv3.


Subjects
Databases, Genetic; Genomics; Algorithms; Bacteria/genetics; Classification; Eukaryota/genetics; Evolution, Molecular; Forecasting; Gene Ontology; Internet; Phylogeny; Proteome; Sequence Homology, Nucleic Acid; Software; Species Specificity
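
The phylogenetic profile search mentioned in the abstract above (find proteins with a specific presence-absence profile) can be sketched generically. This is not the OrthoInspector REST interface. It assumes a hypothetical in-memory table that maps each protein family to the set of species in which an ortholog was found; all names and data are illustrative.

```python
from typing import Dict, Iterable, List, Set


def profile_search(
    profiles: Dict[str, Set[str]],
    present_in: Iterable[str],
    absent_from: Iterable[str],
) -> List[str]:
    """Return families present in all required species and absent from all excluded ones."""
    required = set(present_in)
    excluded = set(absent_from)
    return sorted(
        family
        for family, species in profiles.items()
        if required <= species and not (excluded & species)
    )


# Hypothetical toy data: family name -> species with a detected ortholog.
toy_profiles = {
    "famA": {"human", "mouse", "yeast"},
    "famB": {"human", "mouse"},
    "famC": {"yeast"},
}
```

Calling `profile_search(toy_profiles, ["human", "mouse"], ["yeast"])` keeps only `famB`, the one family found in both mammals and missing from yeast.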
10.
Mitochondrial DNA B Resour ; 4(2): 2447-2450, 2019 Jul 12.
Article in English | MEDLINE | ID: mdl-33365580

ABSTRACT

High-throughput DNA methods hold great promise for the study of the hyperdiverse arthropod fauna of the soil. We used the mitochondrial metagenomic approach to generate 39 mitochondrial genomes from adult and larval specimens of Coleoptera collected from soil samples. The mitogenomes correspond to species from the families Carabidae (6), Chrysomelidae (1), Curculionidae (9), Dermestidae (1), Elateridae (1), Latridiidae (1), Scarabaeidae (3), Silvanidae (1), Staphylinidae (12), and Tenebrionidae (4). All the mitogenomes followed the putative ancestral gene order for Coleoptera. We provide the first available mitogenome for 30 genera of Coleoptera, including endogean representatives of the genera Torneuma, Coiffaitiella, Otiorhynchus, Oligotyphlopsis, and Typhlocharis.

11.
Mol Phylogenet Evol ; 128: 1-11, 2018 11.
Article in English | MEDLINE | ID: mdl-30055354

ABSTRACT

A phylogenetic tree at the species level is still far off for highly diverse insect orders, including the Coleoptera, but the taxonomic breadth of public sequence databases is growing. In addition, new types of data may contribute to increasing taxon coverage, such as metagenomic shotgun sequencing for assembly of mitogenomes from bulk specimen samples. The current study explores the application of these techniques for large-scale efforts to build the tree of Coleoptera. We used shotgun data from 17 different ecological and taxonomic datasets (5 unpublished) to assemble a total of 1942 mitogenome contigs of >3000 bp. These sequences were combined into a single dataset together with all mitochondrial data available at GenBank, in addition to nuclear markers widely used in molecular phylogenetics. The resulting matrix of nearly 16,000 species with two or more loci produced trees (RAxML) showing overall congruence with the Linnaean taxonomy at hierarchical levels from suborders to genera. We tested the role of full-length mitogenomes in stabilizing the tree from GenBank data, as mitogenomes might link terminals with non-overlapping gene representation. However, the mitogenome data were only partly useful in this respect, presumably because of the purely automated approach to assembly and gene delimitation, but improvements in future may be possible by using multiple assemblers and manual curation. In conclusion, the combination of data mining and metagenomic sequencing of bulk samples provided the largest phylogenetic tree of Coleoptera to date, which represents a summary of existing phylogenetic knowledge and a defensible tree of great utility, in particular for studies at the intra-familial level, despite some shortcomings for resolving basal nodes.


Subjects
Coleoptera/genetics; Metagenomics; Mitochondria/genetics; Phylogeny; Algorithms; Animals; Base Sequence; Coleoptera/classification; Databases, Genetic
12.
Mitochondrial DNA A DNA Mapp Seq Anal ; 28(2): 156-158, 2017 03.
Article in English | MEDLINE | ID: mdl-27211011

ABSTRACT

The complete mitochondrial genome of the recently discovered beetle family Iberobaeniidae is described and compared with known coleopteran mitogenomes. The mitochondrial sequence was obtained by shotgun metagenomic sequencing using Illumina MiSeq technology, with an average coverage of 130× and a minimum coverage of 35×. The mitochondrial genome of Iberobaeniidae includes 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and 1 putative control region, and shows a unique rearrangement of protein-coding genes. This is the first rearrangement affecting the relative position of protein-coding and ribosomal genes reported for the order Coleoptera.


Subjects
Coleoptera/genetics; Genes, Mitochondrial; Genome, Mitochondrial; Phylogeny; Sequence Analysis, DNA; Animals; DNA, Mitochondrial; Gene Order; Genome, Insect; Genomics
13.
PLoS One ; 11(9): e0161841, 2016.
Article in English | MEDLINE | ID: mdl-27622637

ABSTRACT

Characterizing trophic networks is fundamental to many questions in ecology, but this typically requires painstaking efforts, especially to identify the diet of small generalist predators. Several attempts have been devoted to develop suitable molecular tools to determine predatory trophic interactions through gut content analysis, and the challenge has been to achieve simultaneously high taxonomic breadth and resolution. General and practical methods are still needed, preferably independent of PCR amplification of barcodes, to recover a broader range of interactions. Here we applied shotgun-sequencing of the DNA from arthropod predator gut contents, extracted from four common coccinellid and dermapteran predators co-occurring in an agroecosystem in Brazil. By matching unassembled reads against six DNA reference databases obtained from public databases and newly assembled mitogenomes, and filtering for high overlap length and identity, we identified prey and other foreign DNA in the predator guts. Good taxonomic breadth and resolution was achieved (93% of prey identified to species or genus), but with low recovery of matching reads. Two to nine trophic interactions were found for these predators, some of which were only inferred by the presence of parasitoids and components of the microbiome known to be associated with aphid prey. Intraguild predation was also found, including among closely related ladybird species. Uncertainty arises from the lack of comprehensive reference databases and reliance on low numbers of matching reads accentuating the risk of false positives. We discuss caveats and some future prospects that could improve the use of direct DNA shotgun-sequencing to characterize arthropod trophic networks.


Subjects
Coleoptera/physiology; Food Chain; Gastrointestinal Contents/chemistry; Insecta/physiology; Sequence Analysis, DNA/methods; Animals
14.
Nat Methods ; 13(5): 425-30, 2016 05.
Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.


Subjects
Computational Biology/standards; Genomics/standards; Phylogeny; Proteomics/standards; Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Eukaryota/classification; Eukaryota/genetics; Gene Ontology; Genomics/methods; Models, Genetic; Proteomics/methods; Sequence Analysis, Protein; Sequence Homology; Species Specificity
15.
Genome Biol Evol ; 7(6): 1474-89, 2015 May 14.
Article in English | MEDLINE | ID: mdl-25979752

ABSTRACT

Metagenomic analyses are challenging in metazoans, but high-copy number and repeat regions can be assembled from low-coverage sequencing by "genome skimming," which is applied here as a new way of characterizing metagenomes obtained in an ecological or taxonomic context. Illumina shotgun reads from two pools of Coleoptera (beetles), of approximately 200 species each, were assembled into tens of thousands of scaffolds. Repeated low-coverage sequencing recovered similar scaffold sets consistently, although approximately 70% of scaffolds could not be identified against existing genome databases. Identifiable scaffolds included mitochondrial DNA, conserved sequences with hits to expressed sequence tag and protein databases, and known repeat elements of high and low complexity, including numerous copies of rRNA and histone genes. Assemblies of histones captured a diversity of gene order and primary sequence in Coleoptera. Scaffolds with similarity to multiple sites in available coleopteran genome sequences for Dendroctonus and Tribolium revealed high specificity of scaffolds to either of these genomes, in particular for high-copy number repeats. Numerous "clusters" of scaffolds mapped to the same genomic site revealed intra- and/or intergenomic variation within a metagenome pool. In addition to the effect of the taxonomic composition of the metagenomes, the number of mapped scaffolds also revealed structural differences between the two reference genomes, although the significance of this striking finding remains unclear. Finally, apparently exogenous sequences were recovered, including potential food plants, fungal pathogens, and bacterial symbionts. The "metagenome skimming" approach is useful for capturing the genomic diversity of poorly studied, species-rich lineages and opens new prospects in environmental genomics.


Subjects
Coleoptera/genetics; Metagenome; Metagenomics/methods; Animals; Bacteria/genetics; Coleoptera/microbiology; DNA/chemistry; DNA, Plant/chemistry; Gene Library; Genomics; Histones/genetics; Multigene Family; Phylogeny; Repetitive Sequences, Nucleic Acid
16.
Bioinformatics ; 31(3): 447-8, 2015 Feb 01.
Article in English | MEDLINE | ID: mdl-25273105

ABSTRACT

SUMMARY: We previously developed OrthoInspector, a package incorporating an original algorithm for the detection of orthology and inparalogy relations between different species. We have added new functionalities to the package. While its original algorithm was not modified, performing similar orthology predictions, we facilitated the prediction of very large databases (thousands of proteomes), refurbished its graphical interface, added new visualization tools for comparative genomics/protein family analysis and facilitated its deployment in a network environment. Finally, we have released three online databases of precomputed orthology relationships. AVAILABILITY: Package and databases are freely available at http://lbgi.fr/orthoinspector with all major browsers supported. CONTACT: odile.lecompte@unistra.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Computer Graphics; Databases, Factual; Proteomics/methods; Sequence Analysis, Protein/methods; Software; Humans; Molecular Sequence Annotation; Phylogeny
17.
Mol Ecol Resour ; 15(4): 880-92, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25545417

ABSTRACT

DNA methods are useful to identify ingested prey items from the gut of predators, but reliable detection is hampered by low amounts of degraded DNA. PCR-based methods can retrieve minute amounts of starting material but suffer from amplification biases and cross-reactions with the predator and related species genomes. Here, we use PCR-free direct shotgun sequencing of total DNA isolated from the gut of the harlequin ladybird Harmonia axyridis at five time points after feeding on a single pea aphid Acyrthosiphon pisum. Sequence reads were matched to three reference databases: Insecta mitogenomes of 587 species, including H. axyridis sequenced here; A. pisum nuclear genome scaffolds; and scaffolds and complete genomes of 13 potential bacterial symbionts. Immediately after feeding, multicopy mtDNA of A. pisum was detected in tens of reads, while hundreds of matches to nuclear scaffolds were detected. Aphid nuclear DNA and mtDNA decayed at similar rates (0.281 and 0.11 h⁻¹, respectively), and the detectability periods were 32.7 and 23.1 h. Metagenomic sequencing also revealed thousands of reads of the obligate Buchnera aphidicola and facultative Regiella insecticola aphid symbionts, which showed exponential decay rates significantly faster than aphid DNA (0.694 and 0.80 h⁻¹, respectively). However, the facultative aphid symbionts Hamiltonella defensa, Arsenophonus spp. and Serratia symbiotica showed an unexpected temporary increase in population size by 1-2 orders of magnitude in the predator guts before declining. Metagenomics is a powerful tool that can reveal complex relationships and the dynamics of interactions among predators, prey and their symbionts.


Subjects
Aphids/genetics; Coleoptera/physiology; DNA/genetics; DNA/isolation & purification; Enterobacteriaceae/genetics; Gastrointestinal Tract/chemistry; Metagenomics; Animals; Aphids/classification; Aphids/microbiology; Enterobacteriaceae/classification; Molecular Sequence Data; Predatory Behavior; Sequence Analysis, DNA
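
To put the decay rates quoted in the abstract above into more familiar terms, a rate constant can be converted to a half-life. The sketch below assumes a simple first-order model, N(t) = N0 · exp(-rate · t); the abstract calls the symbiont decay exponential, but how the authors fitted their rates and detectability periods is not stated here, so this is only an illustration with hypothetical function names.

```python
import math


def half_life(rate_per_hour: float) -> float:
    """Time in hours for the signal to halve under first-order decay."""
    return math.log(2) / rate_per_hour


def remaining_fraction(rate_per_hour: float, hours: float) -> float:
    """Fraction of the initial signal left after the given time."""
    return math.exp(-rate_per_hour * hours)
```

Under this model a rate of 0.694 h⁻¹ corresponds to a half-life of almost exactly one hour, since ln 2 ≈ 0.693.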
18.
Genomics ; 101(3): 178-86, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23147676

ABSTRACT

TFIIH is a eukaryotic complex composed of two subcomplexes, the CAK (Cdk activating kinase) and the core-TFIIH. The core-TFIIH, composed of seven subunits (XPB, XPD, P62, P52, P44, P34, and P8), plays a crucial role in transcription and repair. Here, we performed an extended sequence analysis to establish the accurate phylogenetic distribution of the core-TFIIH in 63 eukaryotic organisms. In spite of the high conservation of the seven subunits at the sequence and genomic levels, the non-enzymatic P8, P34, P52 and P62 are absent from one or a few unicellular species. To gain insight into their respective roles, we undertook a comparative genomic analysis of the whole proteome to identify the gene sets sharing similar presence/absence patterns. While little information was inferred for P8 and P62, our studies confirm the known role of P52 in repair and suggest for the first time the implication of the core TFIIH in mRNA splicing via P34.


Subjects
Evolution, Molecular; Multiprotein Complexes/genetics; Phylogeny; Transcription Factor TFIIH/genetics; Animals; Cyclin-Dependent Kinases/genetics; DNA-Binding Proteins; Humans; Protein Subunits/genetics; Transcription, Genetic
19.
Nucleic Acids Res ; 40(Web Server issue): W71-5, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22641855

ABSTRACT

A major challenge in the post-genomic era is a better understanding of how human genetic alterations involved in disease affect the gene products. The KD4v (Comprehensible Knowledge Discovery System for Missense Variant) server characterizes and predicts the phenotypic effects (deleterious/neutral) of missense variants. The server provides a set of rules learned by Inductive Logic Programming (ILP) on a set of missense variants described by conservation, physico-chemical, functional and 3D structure predicates. These rules are interpretable by non-expert humans and are used to accurately predict the deleterious/neutral status of an unknown mutation. The web server is available at http://decrypthon.igbmc.fr/kd4v.


Subjects
Disease/genetics; Mutation, Missense; Polymorphism, Single Nucleotide; Software; Genetic Association Studies; Humans; Internet; Knowledge Bases; Phenotype; Proteins/chemistry; Proteins/genetics
20.
BMC Genomics ; 13: 5, 2012 Jan 04.
Article in English | MEDLINE | ID: mdl-22217008

ABSTRACT

BACKGROUND: The data from high throughput genomics technologies provide unique opportunities for studies of complex biological systems, but also pose many new challenges. The shift to the genome scale in evolutionary biology, for example, has led to many interesting, but often controversial studies. It has been suggested that part of the conflict may be due to errors in the initial sequences. Most gene sequences are predicted by bioinformatics programs and a number of quality issues have been raised, concerning DNA sequencing errors or badly predicted coding regions, particularly in eukaryotes. RESULTS: We investigated the impact of these errors on evolutionary studies and specifically on the identification of important genetic events. We focused on the detection of asymmetric evolution after duplication, which has been the subject of controversy recently. Using the human genome as a reference, we established a reliable set of 688 duplicated genes in 13 complete vertebrate genomes, where significantly different evolutionary rates are observed. We estimated the rates at which protein sequence errors occur and are accumulated in the higher-level analyses. We showed that the majority of the detected events (57%) are in fact artifacts due to the putative erroneous sequences and that these artifacts are sufficient to mask the true functional significance of the events. CONCLUSIONS: Initial errors are accumulated throughout the evolutionary analysis, generating artificially high rates of event predictions and leading to substantial uncertainty in the conclusions. This study emphasizes the urgent need for error detection and quality control strategies in order to efficiently extract knowledge from the new genome data.


Subjects
Evolution, Molecular; Genomics; Sequence Analysis, DNA/standards; Amino Acid Sequence; Animals; Artifacts; Computational Biology; Humans; Molecular Sequence Data; Phylogeny; Quality Control; Reproducibility of Results; Sequence Alignment; Sequence Homology