1.
Genes (Basel) ; 14(8), 2023 08 15.
Article in English | MEDLINE | ID: mdl-37628678

ABSTRACT

Repetitive elements are a major component of DNA sequences due to their ability to propagate through the genome. Characterization of Metazoan repetitive profiles is improving; however, current pipelines fail to identify a significant proportion of divergent repeats in non-model organisms. The Decapoda order, for which repeat content analyses are largely lacking, is characterized by extremely variable genome sizes that suggest an important presence of repetitive elements. Here, we developed a new standardized pipeline to annotate repetitive elements in non-model organisms, which we applied to twenty Decapoda and six other Crustacea genomes. Using this new tool, we identified 10% more repetitive elements than standard pipelines. Repetitive elements were more abundant in Decapoda species than in other Crustacea, with a very large number of highly repeated satellite DNA families. Moreover, we demonstrated a high correlation between assembly size and transposable elements and different repeat dynamics between Dendrobranchiata and Reptantia. The patterns of repetitive elements largely reflect the phylogenetic relationships of Decapoda and the distinct evolutionary trajectories within Crustacea. In summary, our results highlight the impact of repetitive elements on genome evolution in Decapoda and the value of our novel annotation pipeline, which will provide a baseline for future comparative analyses.


Subjects
DNA Transposable Elements , Decapoda , Animals , Phylogeny , DNA Transposable Elements/genetics , Satellite DNA
2.
Front Bioinform ; 3: 1178926, 2023.
Article in English | MEDLINE | ID: mdl-37151482

ABSTRACT

Protein annotation errors can have significant consequences in a wide range of fields, ranging from protein structure and function prediction to biomedical research, drug discovery, and biotechnology. By comparing the domains of different proteins, scientists can identify common domains, classify proteins based on their domain architecture, and highlight proteins that have evolved differently in one or more species or clades. However, genome-wide identification of different protein domain architectures involves a complex error-prone pipeline that includes genome sequencing, prediction of gene exon/intron structures, and inference of protein sequences and domain annotations. Here we developed an automated fact-checking approach to distinguish true domain loss/gain events from false events caused by errors that occur during the annotation process. Using genome-wide ortholog sets and taking advantage of the high-quality human and Saccharomyces cerevisiae genome annotations, we analyzed the domain gain and loss events in the predicted proteomes of 9 non-human primates (NHP) and 20 non-S. cerevisiae fungi (NSF) as annotated in the UniProt and InterPro databases. Our approach allowed us to quantify the impact of errors on estimates of protein domain gains and losses, and we show that domain losses are over-estimated ten-fold and three-fold in the NHP and NSF proteins respectively. This is in line with previous studies of gene-level losses, where issues with genome sequencing or gene annotation led to genes being falsely inferred as absent. In addition, we show that inconsistent protein domain annotations are a major factor contributing to the false events. For the first time, to our knowledge, we show that domain gains are also over-estimated by three-fold and two-fold respectively in NHP and NSF proteins. Based on our more accurate estimates, we infer that true domain losses and gains in NHP with respect to humans are observed at similar rates, while domain gains in the more divergent NSF are observed twice as frequently as domain losses with respect to S. cerevisiae. This study highlights the need to critically examine the scientific validity of protein annotations, and represents a significant step toward scalable computational fact-checking methods that may one day mitigate the propagation of wrong information in protein databases.

3.
Nucleic Acids Res ; 51(W1): W39-W45, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37216590

ABSTRACT

Much of the human genetics variant repertoire is composed of single nucleotide variants (SNV) and small insertions/deletions (indel), but structural variants (SV) remain a major part of our modified DNA. SV detection has often been a complex question to answer, either because of the necessity to use different technologies (array CGH, SNP array, karyotype, optical genome mapping…) to detect each category of SV or to get an appropriate resolution (whole genome sequencing). Thanks to the deluge of pangenomic analysis, human geneticists are accumulating SV, and their interpretation remains time-consuming and challenging. The AnnotSV webserver (https://www.lbgi.fr/AnnotSV/) aims to be an efficient tool to (i) annotate and interpret SV potential pathogenicity in the context of human diseases, (ii) recognize potential false positive variants among all the SV identified and (iii) visualize the patient variant repertoire. The most recent developments in the AnnotSV webserver are: (i) updated annotation sources and ranking, (ii) three novel output formats to allow diverse utilization (analysis, pipelines), as well as (iii) two novel user interfaces including an interactive circos view.


Subjects
INDEL Mutation , Single Nucleotide Polymorphism , Software , Humans , Human Genome , High-Throughput Nucleotide Sequencing , Restriction Mapping , DNA Sequence Analysis , Whole Genome Sequencing , Disease/genetics
4.
BMC Res Notes ; 15(1): 281, 2022 Aug 22.
Article in English | MEDLINE | ID: mdl-35989321

ABSTRACT

OBJECTIVES: Crayfish plague disease, caused by the oomycete pathogen Aphanomyces astaci, represents one of the greatest risks for the biodiversity of freshwater crayfish. This data article covers the de novo transcriptome assembly and annotation data of the noble crayfish and the marbled crayfish challenged with Ap. astaci. Following the controlled infection experiment (Francesconi et al. in Front Ecol Evol, 2021, https://doi.org/10.3389/fevo.2021.647037), we conducted a differential gene expression analysis described in (Bostjancic et al. in BMC Genom, 2022, https://doi.org/10.1186/s12864-022-08571-z). DATA DESCRIPTION: In total, 25 noble crayfish and 30 marbled crayfish were selected. Hepatopancreas tissue was isolated, followed by RNA sequencing using the Illumina NovaSeq 6000 platform. Raw data were checked for quality with FastQC, adapter and quality trimming were conducted using Trimmomatic, and de novo assembly was performed with Trinity. Assembly quality was assessed with BUSCO, at 93.30% and 93.98% completeness for the noble crayfish and the marbled crayfish, respectively. Transcripts were annotated using the Dammit! pipeline and assigned to KEGG pathways. The respective transcriptomes and raw datasets may be reused as reference transcriptome assemblies for future expression studies.


Subjects
Aphanomyces , Astacoidea , Animals , Aphanomyces/genetics , Astacoidea/genetics , Hepatopancreas , RNA Sequence Analysis , Transcriptome/genetics
5.
BMC Genomics ; 23(1): 600, 2022 Aug 22.
Article in English | MEDLINE | ID: mdl-35989333

ABSTRACT

BACKGROUND: For over a century, scientists have studied host-pathogen interactions between the crayfish plague disease agent Aphanomyces astaci and freshwater crayfish. It has been hypothesised that North American crayfish hosts are disease-resistant due to the long-lasting coevolution with the pathogen. Similarly, the increasing number of latent infections reported in the historically sensitive European crayfish hosts seems to indicate that similar coevolutionary processes are occurring between European crayfish and A. astaci. Our current understanding of these host-pathogen interactions is largely focused on the innate immunity processes in the crayfish haemolymph and cuticle, but the molecular basis of the observed disease-resistance and susceptibility remains unclear. To understand how coevolution is shaping the host's molecular response to the pathogen, susceptible native European noble crayfish and invasive disease-resistant marbled crayfish were challenged with two A. astaci strains of different origin: a haplogroup A strain (introduced to Europe at least 50 years ago, low virulence) and a haplogroup B strain (signal crayfish in Lake Tahoe, USA, high virulence). Here, we compare the gene expression profiles of the hepatopancreas, an integrated organ of crayfish immunity and metabolism. RESULTS: We characterised several novel innate immune-related gene groups in both crayfish species. Across all challenge groups, we detected 412 differentially expressed genes (DEGs) in the noble crayfish, and 257 DEGs in the marbled crayfish. In the noble crayfish, a clear immune response was detected to the haplogroup B strain, but not to the haplogroup A strain. In contrast, in the marbled crayfish we detected an immune response to the haplogroup A strain, but not to the haplogroup B strain. CONCLUSIONS: We highlight the hepatopancreas as an important hub for the synthesis of immune molecules in the response to A. astaci. A clear distinction between the innate immune response in the marbled crayfish and the noble crayfish is the capability of the marbled crayfish to mobilise a higher variety of innate immune response effectors. With this study we outline that the type and strength of the host immune response to the pathogen is strongly influenced by the coevolutionary history of the crayfish with specific A. astaci strains.


Subjects
Aphanomyces , Animals , Aphanomyces/genetics , Astacoidea/genetics , Disease Resistance , Lakes , Transcriptome
6.
Nucleic Acids Res ; 50(W1): W623-W632, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35552456

ABSTRACT

The Orthology Benchmark Service (https://orthology.benchmarkservice.org) is the gold standard for orthology inference evaluation, supported and maintained by the Quest for Orthologs consortium. It is an essential resource to compare existing and new methods of orthology inference (the bedrock for many comparative genomics and phylogenetic analyses) over a standard dataset and through common procedures. The Quest for Orthologs Consortium is dedicated to keeping the resource up to date, through regular updates of the Reference Proteomes and increasingly accessible data through the OpenEBench platform. For this update, we have added a new benchmark based on curated orthology assertions from the Vertebrate Gene Nomenclature Committee, and provided an example meta-analysis of the public predictions present on the platform.


Subjects
Benchmarking , Genomics , Phylogeny , Genomics/methods , Proteome
7.
BMC Bioinformatics ; 22(1): 561, 2021 Nov 23.
Article in English | MEDLINE | ID: mdl-34814826

ABSTRACT

BACKGROUND: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. RESULTS: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89-92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. CONCLUSIONS: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy.


Subjects
Algorithms , Computer Neural Networks , Animals , Genome , Humans
8.
Genes (Basel) ; 12(9), 2021 09 21.
Article in English | MEDLINE | ID: mdl-34573434

ABSTRACT

Multiciliogenesis is a complex process that allows the generation of hundreds of motile cilia on the surface of specialized cells, to create fluid flow across epithelial surfaces. Dysfunction of human multiciliated cells is associated with diseases of the brain, airway and reproductive tracts. Despite recent efforts to characterize the transcriptional events responsible for the differentiation of multiciliated cells, a lot of actors remain to be identified. In this work, we capitalize on the ever-growing quantity of high-throughput data to search for new candidate genes involved in multiciliation. After performing a large-scale screening using 10 transcriptomics datasets dedicated to multiciliation, we established a specific evolutionary signature involving Otomorpha fish to use as a criterion to select the most likely targets. Combining both approaches highlighted a list of 114 potential multiciliated candidates. We characterized these genes first by generating protein interaction networks, which showed various clusters of ciliated and multiciliated genes, and then by computing phylogenetic profiles. In the end, we selected 11 poorly characterized genes that seem like particularly promising multiciliated candidates. By combining functional and comparative genomics methods, we developed a novel type of approach to study biological processes and identify new promising candidates linked to that process.


Subjects
Cilia/physiology , Fish Proteins/genetics , Fishes , Genomics/methods , Animals , Biological Evolution , Cell Differentiation/genetics , Cilia/genetics , Genetic Databases , Fish Proteins/metabolism , Gene Expression , Humans , Phylogeny , Transcriptome
9.
Nucleic Acids Res ; 49(W1): W21-W28, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34023905

ABSTRACT

With the dramatic increase of pangenomic analysis, human geneticists have generated large amounts of genomic data, including millions of small variants (SNV/indel) but also thousands of structural variations (SV), mainly from next-generation sequencing and array-based techniques. While the identification of the complete SV repertoire of a patient is becoming possible, the interpretation of each SV remains challenging. To help identify human pathogenic SV, we have developed a web server dedicated to their annotation and ranking (AnnotSV) as well as their visualization and interpretation (knotAnnotSV), freely available at the following address: https://www.lbgi.fr/AnnotSV/. A large number of annotations from >20 sources is integrated in our web server including, among others, genes, haploinsufficiency, triplosensitivity, regulatory elements, known pathogenic or benign genomic regions, and phenotypic data. An ACMG/ClinGen compliant prioritization module allows the scoring and the ranking of SV into 5 SV classes from pathogenic to benign. Finally, the visualization interface displays the annotated SV in an interactive way, including popups, search fields, filtering options, advanced colouring to highlight pathogenic SV and hyperlinks to the UCSC genome browser or other public databases. This web server is designed for diagnostic and research analysis by providing important resources to the user.


Subjects
Genomic Structural Variation , Software , Human Genome , Genomics , Humans , Internet , Molecular Sequence Annotation , Phenotype , Single Nucleotide Polymorphism
10.
Genome Biol Evol ; 13(1), 2021 01 07.
Article in English | MEDLINE | ID: mdl-33211099

ABSTRACT

In the multiomics era, comparative genomics studies based on gene repertoire comparison are increasingly used to investigate evolutionary histories of species, to study genotype-phenotype relations, species adaptation to various environments, or to predict gene function using phylogenetic profiling. However, comparisons of orthologs have highlighted the prevalence of sequence plasticity among species, showing the benefits of combining protein and subprotein levels of analysis to allow for a more comprehensive study of genotype/phenotype correlations. In this article, we introduce a new approach called BLUR (BLAST Unexpected Ranking), capable of detecting genotype divergence or specialization between two related clades at different levels: gain/loss of proteins but also of subprotein regions. These regions can correspond to known domains, uncharacterized regions, or even small motifs. Our method was created to allow two types of research strategies: 1) the comparison of two groups of species with no previous knowledge, with the aim of predicting phenotype differences or specializations between close species or 2) the study of specific phenotypes by comparing species that present the phenotype of interest with species that do not. We designed a website to facilitate the use of BLUR with a possibility of in-depth analysis of the results with various tools, such as functional enrichments, protein-protein interaction networks, and multiple sequence alignments. We applied our method to the study of two different biological pathways and to the comparison of several groups of close species, all with very promising results. BLUR is freely available at http://lbgi.fr/blur/.


Subjects
Molecular Evolution , Genomics/methods , Proteins/genetics , Proteome/genetics , Proteome/metabolism , Animals , Armadillo Domain Proteins , Bacteria , Conserved Sequence/genetics , Fungi , Genotype , Humans , Phenotype , Phylogeny , Sequence Alignment , Sequence Analysis , Software
11.
PLoS One ; 15(7): e0236962, 2020.
Article in English | MEDLINE | ID: mdl-32735577

ABSTRACT

The diffusion of next-generation sequencing technologies has revolutionized research and diagnosis in the field of rare Mendelian disorders, notably via whole-exome sequencing (WES). However, one of the main issues hampering achievement of a diagnosis via WES analyses is the extended list of variants of unknown significance (VUS), mostly composed of missense variants. Hence, improved solutions are needed to address the challenges of identifying potentially deleterious variants and ranking them in a prioritized short list. We present MISTIC (MISsense deleTeriousness predICtor), a new prediction tool based on an original combination of two complementary machine learning algorithms using a soft voting system that integrates 113 missense features, ranging from multi-ethnic minor allele frequencies and evolutionary conservation, to physicochemical and biochemical properties of amino acids. Our approach also uses training sets with a wide spectrum of variant profiles, including both high-confidence positive (deleterious) and negative (benign) variants. Compared to recent state-of-the-art prediction tools in various benchmark tests and independent evaluation scenarios, MISTIC exhibits the best and most consistent performance, notably with the highest AUC value (> 0.95). Importantly, MISTIC maintains its high performance in the specific case of discriminating deleterious variants from benign variants that are rare or population-specific. In a clinical context, MISTIC drastically reduces the list of VUS (<30%) and significantly improves the ranking of "causative" deleterious variants. Pre-computed MISTIC scores for all possible human missense variants are available at http://lbgi.fr/mistic.
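
As an illustration of the soft voting idea mentioned in this abstract, the following minimal Python sketch averages the class probabilities of two classifiers. It is not MISTIC's implementation: the function names `soft_vote` and `classify`, the equal default weights and the 0.5 threshold are all assumptions made for the example, and the abstract does not say which two algorithms or which weights MISTIC uses.

```python
from typing import List, Sequence


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float],
              weight_a: float = 1.0, weight_b: float = 1.0) -> List[float]:
    """Combine two models' 'deleterious' probabilities by weighted averaging.

    Hypothetical helper: prob_a and prob_b hold, for the same variants in the
    same order, the probability each model assigns to the deleterious class.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same variants")
    total = weight_a + weight_b
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [(weight_a * a + weight_b * b) / total
            for a, b in zip(prob_a, prob_b)]


def classify(probabilities: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Label each variant from its combined probability (threshold is an assumption)."""
    return ["deleterious" if p >= threshold else "benign" for p in probabilities]
```

Soft voting differs from hard (majority) voting in that it keeps each model's confidence, so a variant that one model scores very high can still rank above one that both models score just over the threshold.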


Subjects
Exome Sequencing/methods , Inborn Genetic Diseases , Missense Mutation , Software , Computational Biology , Inborn Genetic Diseases/diagnosis , Inborn Genetic Diseases/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Machine Learning
12.
Nucleic Acids Res ; 47(D1): D411-D418, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30380106

ABSTRACT

OrthoInspector is one of the leading software suites for orthology relations inference. In this paper, we describe a major redesign of the OrthoInspector online resource along with a significant increase in the number of species: 4753 organisms are now covered across the three domains of life, making OrthoInspector the most exhaustive orthology resource to date in terms of covered species (excluding viruses). The new website integrates original data exploration and visualization tools in an ergonomic interface. Distributions of protein orthologs are represented by heatmaps summarizing their evolutionary histories, and proteins with similar profiles can be directly accessed. Two novel tools have been implemented for comparative genomics: a phylogenetic profile search that can be used to find proteins with a specific presence-absence profile and investigate their functions and, inversely, a GO profiling tool aimed at deciphering evolutionary histories of molecular functions, processes or cell components. In addition to the re-designed website, the OrthoInspector resource now provides a REST interface for programmatic access. OrthoInspector 3.0 is available at http://lbgi.fr/orthoinspectorv3.
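
The phylogenetic profile search described in this abstract can be pictured with a small Python sketch. It assumes each protein's profile is stored as the set of species in which an ortholog was found; the function name `profile_search`, the toy protein names and the species lists are hypothetical and do not reflect OrthoInspector's actual data model or REST interface.

```python
from typing import Dict, List, Set


def profile_search(profiles: Dict[str, Set[str]],
                   present_in: Set[str],
                   absent_from: Set[str]) -> List[str]:
    """Return proteins with an ortholog in every species of `present_in`
    and in none of the species of `absent_from` (sorted by name)."""
    return sorted(
        protein
        for protein, species in profiles.items()
        if present_in <= species and not (absent_from & species)
    )


# Toy presence-absence data, invented purely for illustration.
TOY_PROFILES: Dict[str, Set[str]] = {
    "protA": {"human", "mouse", "zebrafish"},
    "protB": {"human", "yeast"},
    "protC": {"human", "mouse"},
}
```

For example, calling `profile_search(TOY_PROFILES, {"human", "mouse"}, {"yeast"})` selects the proteins conserved in both mammals but missing from yeast.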


Subjects
Genetic Databases , Genomics , Algorithms , Bacteria/genetics , Classification , Eukaryota/genetics , Molecular Evolution , Forecasting , Gene Ontology , Internet , Phylogeny , Proteome , Nucleic Acid Sequence Homology , Software , Species Specificity
13.
Bioinformatics ; 34(19): 3390-3392, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29741582

ABSTRACT

Summary: Comparative studies of protein sequences are widely used in evolutionary and comparative genomics studies, but there is a lack of efficient tools to identify conserved regions ab initio within a protein multiple alignment. PROBE provides a fully automatic analysis of protein family conservation, to identify conserved regions, or 'blocks', that may correspond to structural/functional domains or motifs. Conserved blocks are identified at two different levels: (i) family level blocks indicate sites that are probably of central importance to the protein's structure or function, and (ii) sub-family level blocks highlight regions that may signify functional specialization, such as binding partners, etc. All conserved blocks are mapped onto a phylogenetic tree and can also be visualized in the context of the multiple sequence alignment. PROBE thus facilitates in-depth studies of sequence-structure-function-evolution relationships, and opens the way to block-level phylogenetic profiling. Availability and implementation: Freely available on the web at http://www.lbgi.fr/~julie/probe/web.


Subjects
Molecular Evolution , Proteins/genetics , Software , Amino Acid Sequence , Computational Biology , Conserved Sequence , Phylogeny , Sequence Alignment
14.
Bioinformatics ; 34(20): 3572-3574, 2018 10 15.
Article in English | MEDLINE | ID: mdl-29669011

ABSTRACT

Summary: Structural Variations (SV) are a major source of variability in the human genome that shaped its current structure during evolution. Moreover, many human diseases are caused by SV, highlighting the need to accurately detect those genomic events but also to annotate them and assist their biological interpretation. Therefore, we developed AnnotSV, which compiles functionally, regulatory and clinically relevant information and aims at providing annotations useful to (i) interpret SV potential pathogenicity and (ii) filter out potential false positive SV. In particular, AnnotSV reports heterozygous and homozygous counts of single nucleotide variations (SNVs) and small insertions/deletions called within each SV for the analyzed patients, this genomic information being extremely useful to support or question the existence of an SV. We also report the computed allelic frequency relative to overlapping variants from DGV (MacDonald et al., 2014), which is especially powerful to filter out common SV. To illustrate the strength of AnnotSV, we annotated the 4751 SV from one sample of the 1000 Genomes Project, integrating the sample information of four million SNV/indel, in less than 60 s. Availability and implementation: AnnotSV is implemented in Tcl and runs in command line on all platforms. The source code is available under the GNU GPL license. Source code, README and Supplementary data are available at http://lbgi.fr/AnnotSV/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Software , Gene Frequency , Human Genome , Genomics , Humans , Molecular Sequence Annotation
15.
J Med Internet Res ; 19(6): e212, 2017 06 16.
Article in English | MEDLINE | ID: mdl-28623182

ABSTRACT

BACKGROUND: The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction. OBJECTIVE: MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data. METHODS: MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization. RESULTS: MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user's specific interests and provides an efficient way to share information with collaborators. Furthermore, the user's behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents. CONCLUSIONS: We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data. MyGeneFriends is available at lbgi.fr/mygenefriends.


Subjects
Inborn Genetic Diseases/genetics , Genetic Testing/methods , Social Networking , Telemedicine/statistics & numerical data , Friends , Humans , Research Personnel
16.
Mol Biol Evol ; 34(8): 2016-2034, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28460059

ABSTRACT

Cilia (flagella) are important eukaryotic organelles, present in the Last Eukaryotic Common Ancestor, and are involved in cell motility and integration of extracellular signals. Ciliary dysfunction causes a class of genetic diseases known as ciliopathies; however, current knowledge of the underlying mechanisms is still limited and a better characterization of genes is needed. As cilia have been lost independently several times during evolution and they are subject to important functional variation between species, ciliary genes can be investigated through comparative genomics. We performed phylogenetic profiling by predicting orthologs of human protein-coding genes in 100 eukaryotic species. The analysis integrated three independent methods to predict a consensus set of 274 ciliary genes, including 87 new promising candidates. A fine-grained analysis of the phylogenetic profiles allowed a partitioning of ciliary genes into modules with distinct evolutionary histories and ciliary functions (assembly, movement, centriole, etc.) and thus propagation of potential annotations to previously undocumented genes. The cilia/basal body localization was experimentally confirmed for five of these previously unannotated proteins (LRRC23, LRRC34, TEX9, WDR27, and BIVM), validating the relevance of our approach. Furthermore, our multi-level analysis sheds light on the core gene sets retained in gamete-only flagellates or Ecdysozoa, for instance. By combining gene-centric and species-oriented analyses, this work reveals new ciliary and ciliopathy gene candidates and provides clues about the evolution of ciliary processes in the eukaryotic domain. Additionally, the positive and negative reference gene sets and the phylogenetic profile of human genes constructed during this study can be exploited in future work.


Subjects
Cilia/genetics , Ciliopathies/genetics , Animals , Cell Movement/genetics , Cilia/metabolism , Ciliopathies/metabolism , Nucleic Acid Databases , Eukaryota , Eukaryotic Cells , Molecular Evolution , Flagella/genetics , Flagella/metabolism , Genomics , Humans , Phylogeny , DNA Sequence Analysis/methods
17.
BMC Bioinformatics ; 17(1): 271, 2016 Jul 07.
Article in English | MEDLINE | ID: mdl-27387560

ABSTRACT

BACKGROUND: A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sophisticated, are generally highly sensitive to the alignment used, and neglecting non-homologous or uncertain regions in the alignment can lead to significant bias in the subsequent inferences. RESULTS: Here, we present a new method, LEON-BIS, which uses a robust Bayesian framework to estimate the homologous relations between sequences in a protein multiple alignment. Sequences are clustered into sub-families and relations are predicted at different levels, including 'core blocks', 'regions' and full-length proteins. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons using well annotated alignment databases, where the homologous sequence segments are detected with very high sensitivity and specificity. CONCLUSIONS: LEON-BIS uses robust Bayesian statistics to distinguish the portions of multiple sequence alignments that are conserved either across the whole family or within subfamilies. LEON-BIS should thus be useful for automatic, high-throughput genome annotations, 2D/3D structure predictions, protein-protein interaction predictions etc.


Subjects
Bayes Theorem , Computational Biology/methods , Proteins/chemistry , Sequence Alignment/methods , Amino Acid Sequence , Humans , Proteins/genetics , Amino Acid Sequence Homology