Results 1 - 16 of 16
1.
PLoS One ; 14(9): e0216885, 2019.
Article in English | MEDLINE | ID: mdl-31498807

ABSTRACT

Unknown sequences, or gaps, are present in many published genomes across public databases. Gap filling is an important finishing step in de novo genome assembly, especially in large genomes. The gap filling problem is nontrivial and while there are many computational tools partially solving the problem, several have shortcomings as to the reliability and correctness of the output, i.e. the gap filled draft genome. SSPACE-LongRead is a scaffolding tool that utilizes long reads from multiple third-generation sequencing platforms in finding links between contigs and combining them. The long reads potentially contain sequence information to fill the gaps created in the scaffolding, but SSPACE-LongRead currently lacks this functionality. We present an automated pipeline called gapFinisher to process SSPACE-LongRead output to fill gaps after the scaffolding. gapFinisher is based on the controlled use of a previously published gap filling tool FGAP and works on all standard Linux/UNIX command lines. We compare the performance of gapFinisher against two other published gap filling tools PBJelly and GMcloser. We conclude that gapFinisher can fill gaps in draft genomes quickly and reliably. In addition, the serial design of gapFinisher makes it scale well from prokaryote genomes to larger genomes with no increase in the computational footprint.


Subjects
Algorithms; Contig Mapping/statistics & numerical data; Genome; Genomics/methods; Sequence Analysis, DNA/statistics & numerical data; Software; Animals; Bacteria/genetics; Benchmarking; Databases, Genetic; Genomics/statistics & numerical data; High-Throughput Nucleotide Sequencing; Seals, Earless/genetics
2.
BMC Bioinformatics ; 19(1): 257, 2018 07 05.
Article in English | MEDLINE | ID: mdl-29976145

ABSTRACT

BACKGROUND: Current high-throughput sequencing platforms provide capacity to sequence multiple samples in parallel. Different samples are labeled by attaching a short sample-specific nucleotide sequence, a barcode, to each DNA molecule prior to pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read. In order to tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. An additional constraint due to both nucleotide usage and basecalling accuracy is that the proportion of different nucleotides should be in balance in each barcode position. The number of samples to be mixed in each sequencing run may vary, which raises the problem of how to select the best subset of the barcodes available at a sequencing core facility for each sequencing run. There are plenty of tools available for de novo barcode design, but they are not suitable for subset selection. RESULTS: We have developed a tool which can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach the selection process is formulated as a minimization problem. We define the cost function and a set of constraints and use integer programming to solve the resulting combinatorial problem. Based on the desired number of barcodes to be selected and the set of candidate sequences given by the user, the necessary constraints are automatically generated and the optimal solution can be found. The method is implemented in the C programming language and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel.
CONCLUSIONS: Increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes among the larger existing barcode set so that both sequencing errors are tolerated and the nucleotide balance is optimized. The tool is easy to access via web browser.
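
The two constraints named in the abstract above (barcodes far apart in sequence space, nucleotides balanced at each position) can be illustrated with a small compatibility check. This is a minimal sketch only, not the published tool: it does not perform the integer-programming optimization, and the function names, the thresholds `min_distance` and `max_imbalance`, the use of Hamming distance as the separation measure, and the example barcodes are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: list[str]) -> int:
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def position_imbalance(barcodes: list[str]) -> list[int]:
    """Per position: count of the most used base minus count of the least used."""
    result = []
    for column in zip(*barcodes):
        counts = Counter(column)
        per_base = [counts.get(base, 0) for base in "ACGT"]
        result.append(max(per_base) - min(per_base))
    return result


def is_compatible(barcodes: list[str], min_distance: int = 3,
                  max_imbalance: int = 1) -> bool:
    """Hypothetical acceptance rule combining both constraints."""
    return (min_pairwise_distance(barcodes) >= min_distance
            and max(position_imbalance(barcodes)) <= max_imbalance)


# Illustrative pool (made-up sequences): every pair differs at all four
# positions and each base occurs exactly once per position.
pool = ["ACGT", "CGTA", "GTAC", "TACG"]
```

A checker of this kind corresponds loosely to the second task listed in the abstract (testing whether a given set can be pooled); choosing the best subset from many candidates is the combinatorial problem the authors solve with integer programming.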


Subjects
DNA Barcoding, Taxonomic/methods; High-Throughput Screening Assays/methods; Humans
3.
Stand Genomic Sci ; 12: 87, 2017.
Article in English | MEDLINE | ID: mdl-29276572

ABSTRACT

Bacteria of the genus Pectobacterium are economically important plant pathogens that cause soft rot disease on a wide variety of plant species. Here, we report the genome sequence of Pectobacterium carotovorum strain SCC1, a Finnish soft rot model strain isolated from a diseased potato tuber in the early 1980s. The genome of strain SCC1 consists of one circular chromosome of 4,974,798 bp and one circular plasmid of 5524 bp. In total, 4451 genes were predicted, of which 4349 are protein coding and 102 are RNA genes.

4.
Genome Biol ; 17(1): 184, 2016 09 07.
Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.


Subjects
Computational Biology; Proteins/chemistry; Software; Structure-Activity Relationship; Algorithms; Databases, Protein; Gene Ontology; Humans; Molecular Sequence Annotation; Proteins/genetics
5.
Curr Biol ; 26(15): 1990-1997, 2016 08 08.
Article in English | MEDLINE | ID: mdl-27426519

ABSTRACT

Despite the crucial roles of phytohormones in plant development, comparison of the exact distribution profiles of different hormones within plant meristems has thus far remained scarce. Vascular cambium, a wide lateral meristem with an extensive developmental zonation, provides an optimal system for hormonal and genetic profiling. By taking advantage of this spatial resolution, we show here that two major phytohormones, cytokinin and auxin, display different yet partially overlapping distribution profiles across the cambium. In contrast to auxin, which has its highest concentration in the actively dividing cambial cells, cytokinins peak in the developing phloem tissue of a Populus trichocarpa stem. Gene expression patterns of cytokinin biosynthetic and signaling genes coincided with this hormonal gradient. To explore the functional significance of cytokinin signaling for cambial development, we engineered transgenic Populus tremula × tremuloides trees with an elevated cytokinin biosynthesis level. Confirming that cytokinins function as major regulators of cambial activity, these trees displayed stimulated cambial cell division activity resulting in dramatically increased (up to 80% in dry weight) production of the lignocellulosic trunk biomass. To connect the increased growth to hormonal status, we analyzed the hormone distribution and genome-wide gene expression profiles in unprecedentedly high resolution across the cambial zone. Interestingly, in addition to showing an elevated cambial cytokinin content and signaling level, the cambial auxin concentration and auxin-responsive gene expression were also increased in the transgenic trees. Our results indicate that cytokinin signaling specifies meristematic activity through a graded distribution that influences the amplitude of the cambial auxin gradient.


Subjects
Cambium/growth & development; Cytokinins/metabolism; Indoleacetic Acids/metabolism; Plant Growth Regulators/metabolism; Populus/physiology; Signal Transduction; Genome, Plant; Plants, Genetically Modified/genetics; Plants, Genetically Modified/growth & development; Plants, Genetically Modified/physiology; Populus/genetics; Populus/growth & development; Transcriptome
6.
Stand Genomic Sci ; 10: 83, 2015.
Article in English | MEDLINE | ID: mdl-26500719

ABSTRACT

Propionibacterium freudenreichii subsp. freudenreichii DSM 20271(T) is the type strain of the species Propionibacterium freudenreichii, which has a long history of safe use in the production of dairy products and vitamin B12. P. freudenreichii is the type species of the genus Propionibacterium, which contains Gram-positive, non-motile and non-sporeforming bacteria with a high G + C content. We describe the genome of P. freudenreichii subsp. freudenreichii DSM 20271(T), consisting of a 2,649,166 bp chromosome containing 2320 protein-coding genes and 50 RNA-only encoding genes.

7.
Mol Ecol ; 24(19): 4886-900, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26331775

ABSTRACT

Insect flight is one of the most energetically demanding activities in the animal kingdom, yet for many insects flight is necessary for reproduction and foraging. Moreover, dispersal by flight is essential for the viability of species living in fragmented landscapes. Here, working on the Glanville fritillary butterfly (Melitaea cinxia), we use transcriptome sequencing to investigate gene expression changes caused by 15 min of flight in two contrasting populations and the two sexes. Male butterflies and individuals from a large metapopulation had significantly higher peak flight metabolic rate (FMR) than female butterflies and those from a small inbred population. In the pooled data, FMR was significantly positively correlated with genome-wide heterozygosity, a surrogate of individual inbreeding. The flight experiment changed the expression level of 1513 genes, including genes related to major energy metabolism pathways, ribosome biogenesis and RNA processing, and stress and immune responses. Males and butterflies from the population with high FMR had higher basal expression of genes related to energy metabolism, whereas females and butterflies from the small population with low FMR had higher expression of genes related to ribosome/RNA processing and immune response. Following the flight treatment, genes related to energy metabolism were generally down-regulated, while genes related to ribosome/RNA processing and immune response were up-regulated. These results suggest that common molecular mechanisms respond to flight and can influence differences in flight metabolic capacity between populations and sexes.


Subjects
Butterflies/genetics; Flight, Animal; Gene Expression; Sex Characteristics; Transcriptome; Animals; Butterflies/physiology; Energy Metabolism/genetics; Female; Finland; Male; Molecular Sequence Data; Polymorphism, Single Nucleotide; Sequence Analysis, RNA
8.
BMC Genomics ; 16: 348, 2015 May 02.
Article in English | MEDLINE | ID: mdl-25933608

ABSTRACT

BACKGROUND: The symbiotic phenotype of Neorhizobium galegae, with strains specifically fixing nitrogen with either Galega orientalis or G. officinalis, has made it a target in research on determinants of host specificity in nitrogen fixation. The genomic differences between representative strains of the two symbiovars are, however, relatively small. This introduced a need for a dataset representing a larger bacterial population in order to make better conclusions on characteristics typical for a subset of the species. In this study, we produced draft genomes of eight strains of N. galegae having different symbiotic phenotypes, both with regard to host specificity and nitrogen fixation efficiency. These genomes were analysed together with the previously published complete genomes of N. galegae strains HAMBI 540T and HAMBI 1141. RESULTS: The results showed that the presence of an additional rpoN sigma factor gene in the symbiosis gene region is a characteristic specific to symbiovar orientalis, required for nitrogen fixation. Also the nifQ gene was shown to be crucial for functional symbiosis in both symbiovars. Genome-wide analyses identified additional genes characteristic of strains of the same symbiovar and of strains having similar plant growth promoting properties on Galega orientalis. Many of these genes are involved in transcriptional regulation or in metabolic functions. CONCLUSIONS: The results of this study confirm that the only symbiosis-related gene that is present in one symbiovar of N. galegae but not in the other is an rpoN gene. The specific function of this gene remains to be determined, however. New genes that were identified as specific for strains of one symbiovar may be involved in determining host specificity, while others are defined as potential determinant genes for differences in efficiency of nitrogen fixation.


Subjects
Genome, Bacterial; Rhizobiaceae/genetics; Symbiosis/genetics; Amino Acid Sequence; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; DNA, Bacterial/analysis; DNA, Bacterial/isolation & purification; DNA, Bacterial/metabolism; Galega/growth & development; Galega/microbiology; Molecular Sequence Data; Nitrogen Fixation/genetics; Phenotype; Seeds/growth & development; Seeds/metabolism; Seeds/microbiology; Sequence Alignment; Sequence Analysis, DNA; Sigma Factor/chemistry; Sigma Factor/genetics; Sigma Factor/metabolism
9.
Bioinformatics ; 31(10): 1544-52, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25653249

ABSTRACT

MOTIVATION: The last decade has seen a remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of sequences submitted to the most comprehensive protein database, UniProtKB, are labelled as 'Unknown protein' or the like. Also, the functionally annotated parts are reported to contain 30-40% errors. Here, we introduce a high-throughput tool for more reliable functional annotation called Protein ANNotation with Z-score (PANNZER). PANNZER predicts Gene Ontology (GO) classes and free text descriptions about protein functionality. PANNZER uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation. RESULTS: Our results in free text description line prediction show that we outperformed all competing methods by a clear margin. In GO prediction we show clear improvement over our older method, which performed well in the CAFA 2011 challenge.


Subjects
Data Mining; Databases, Protein; Molecular Sequence Annotation; Proteins/metabolism; Vocabulary, Controlled; Cluster Analysis; Computational Biology/methods; Data Interpretation, Statistical; Databases, Genetic; Gene Ontology; Humans; Proteins/genetics
10.
Nat Commun ; 5: 4737, 2014 Sep 05.
Article in English | MEDLINE | ID: mdl-25189940

ABSTRACT

Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the number of haploid chromosomes varies widely from 5 to 223. Here we report the genome (393 Mb) of the Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a widely recognized model species in metapopulation biology and eco-evolutionary research, which has the putative ancestral karyotype of n=31. Using phylogenetic analyses of Nymphalidae and of other Lepidoptera, combined with orthologue-level comparisons of chromosomes, we conclude that the ancestral lepidopteran karyotype has been n=31 for at least 140 My. We show that fusion chromosomes have retained the ancestral chromosome segments and very few rearrangements have occurred across the fusion sites. The same, shortest ancestral chromosomes have independently participated in fusion events in species with smaller karyotypes. The short chromosomes have a higher rearrangement rate than long ones. These characteristics highlight distinctive features of the evolutionary dynamics of butterflies and moths.


Subjects
Butterflies/genetics; Chromosome Aberrations; Evolution, Molecular; Genome/genetics; Phylogeny; Synteny; Animals; Base Sequence; Chromosome Mapping; Karyotype; Likelihood Functions; Models, Genetic; Molecular Sequence Data; Sequence Analysis, DNA
11.
PLoS One ; 9(7): e101467, 2014.
Article in English | MEDLINE | ID: mdl-24988207

ABSTRACT

We characterize allelic and gene expression variation between populations of the Glanville fritillary butterfly (Melitaea cinxia) from two fragmented and two continuous landscapes in northern Europe. The populations exhibit significant differences in their life history traits, e.g. butterflies from fragmented landscapes have higher flight metabolic rate and dispersal rate in the field, and higher larval growth rate, than butterflies from continuous landscapes. In fragmented landscapes, local populations are small and have a high risk of local extinction, and hence the long-term persistence at the landscape level is based on frequent re-colonization of vacant habitat patches, which is predicted to select for increased dispersal rate. Using RNA-seq data and a common garden experiment, we found that a large number of genes (1,841) were differentially expressed between the landscape types. Hexamerin genes, the expression of which has previously been shown to have high heritability and which correlate strongly with larval development time in the Glanville fritillary, had higher expression in fragmented than continuous landscapes. Genes that were more highly expressed in butterflies from newly-established than old local populations within a fragmented landscape were also more highly expressed, at the landscape level, in fragmented than continuous landscapes. This result suggests that recurrent extinctions and re-colonizations in fragmented landscapes select for a specific expression profile. Genes that were significantly up-regulated following an experimental flight treatment had higher basal expression in fragmented landscapes, indicating that these butterflies are genetically primed for frequent flight. Active flight causes oxidative stress, but butterflies from fragmented landscapes were more tolerant of hypoxia. We conclude that differences in gene expression between the landscape types reflect genomic adaptations to landscape fragmentation.


Subjects
Adaptation, Physiological; Butterflies/genetics; Gene Expression Profiling; Animals; Butterflies/physiology; Carrier Proteins/genetics; Cluster Analysis; Ecosystem; Gene Expression; Gene Frequency; Genetic Variation; Genome; Insect Proteins/genetics; Polymorphism, Single Nucleotide; Up-Regulation
12.
J Proteome Res ; 13(8): 3748-3762, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-25014494

ABSTRACT

The present study reports comparative genomics and proteomics of Staphylococcus epidermidis (SE) strains isolated from bovine intramammary infection (PM221) and human hosts (ATCC12228 and RP62A). Genome-level profiling and protein expression analyses revealed that the bovine strain and the mildly infectious ATCC12228 strain are highly similar. Their genomes share high sequence identity and synteny, and both were predicted to encode the commensal-associated fdr marker gene. In contrast, PM221 was judged to differ from the sepsis-associated virulent human RP62A strain on the basis of distinct protein expression patterns and overall lack of genome synteny. The 2D DIGE and phenotypic analyses suggest that PM221 and ATCC12228 coordinate the TCA cycle activity and the formation of small colony variants in a way that could result in increased viability. Pilot experimental infection studies indicated that although ATCC12228 was able to infect a bovine host, the PM221 strain caused more severe clinical signs. Further investigation revealed strain- and condition-specific differences among surface bound proteins with likely roles in adhesion, biofilm formation, and immunomodulatory functions. Thus, our findings revealed a close link between the bovine and commensal-type human strains and suggest that humans could act as a reservoir of bovine mastitis-causing SE strains.

13.
Nat Methods ; 10(3): 221-7, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.


Subjects
Computational Biology/methods; Molecular Biology/methods; Molecular Sequence Annotation; Proteins/physiology; Algorithms; Animals; Databases, Protein; Exoribonucleases/classification; Exoribonucleases/genetics; Exoribonucleases/physiology; Forecasting; Humans; Proteins/chemistry; Proteins/classification; Proteins/genetics; Species Specificity
14.
PLoS Pathog ; 8(11): e1003013, 2012.
Article in English | MEDLINE | ID: mdl-23133391

ABSTRACT

Soft rot disease is economically one of the most devastating bacterial diseases affecting plants worldwide. In this study, we present novel insights into the phylogeny and virulence of the soft rot model Pectobacterium sp. SCC3193, which was isolated from a diseased potato stem in Finland in the early 1980s. Genomic approaches, including proteome and genome comparisons of all sequenced soft rot bacteria, revealed that SCC3193, previously included in the species Pectobacterium carotovorum, can now be more accurately classified as Pectobacterium wasabiae. Together with the recently revised phylogeny of a few P. carotovorum strains and an increasing number of studies on P. wasabiae, our work indicates that P. wasabiae has been unnoticed but present in potato fields worldwide. A combination of genomic approaches and in planta experiments identified features that separate SCC3193 and other P. wasabiae strains from the rest of soft rot bacteria, such as the absence of a type III secretion system that contributes to virulence of other soft rot species. Experimentally established virulence determinants include the putative transcriptional regulator SirB, two partially redundant type VI secretion systems and two horizontally acquired clusters (Vic1 and Vic2), which contain predicted virulence genes. Genome comparison also revealed other interesting traits that may be related to life in planta or other specific environmental conditions. These traits include a predicted benzoic acid/salicylic acid carboxyl methyltransferase of eukaryotic origin. The novelties found in this work indicate that soft rot bacteria have a reservoir of unknown traits that may be utilized in the poorly understood latent stage in planta. The genomic approaches and the comparison of the model strain SCC3193 to other sequenced Pectobacterium strains, including the type strain of P. wasabiae, provide a solid basis for further investigation of the virulence, distribution and phylogeny of soft rot bacteria and, potentially, other bacteria as well.


Subjects
Gene Transfer, Horizontal; Multigene Family; Pectobacterium/genetics; Pectobacterium/pathogenicity; Phylogeny; Plant Diseases/genetics; Virulence Factors/genetics; Plant Diseases/microbiology; Plant Roots/microbiology; Solanum tuberosum/microbiology; Virulence Factors/metabolism
15.
PLoS One ; 7(12): e52492, 2012.
Article in English | MEDLINE | ID: mdl-23300684

ABSTRACT

BACKGROUND: Molecular tools may greatly improve our understanding of pathogen evolution and epidemiology but technical constraints have hindered the development of genetic resources for parasites compared to free-living organisms. This study aims at developing molecular tools for Podosphaera plantaginis, an obligate fungal pathogen of Plantago lanceolata. This interaction has been intensively studied in the Åland archipelago of Finland with epidemiological data collected from over 4,000 host populations annually since 2001. PRINCIPAL FINDINGS: A cDNA library of a pooled sample of fungal conidia was sequenced on the 454 GS-FLX platform. Over 549,411 reads were obtained and annotated into 45,245 contigs. Annotation data was acquired for 65.2% of the assembled sequences. The transcriptome assembly was screened for SNP loci, as well as for functionally important genes (mating-type genes and potential effector proteins). A genotyping assay of 27 SNP loci was designed and tested on 380 infected leaf samples from 80 populations within the Åland archipelago. With this panel we identified 85 multilocus genotypes (MLG) with uneven frequencies across the pathogen metapopulation. Approximately half of the sampled populations contain polymorphism. Our genotyping protocol revealed mixed-genotype infection within a single host leaf to be common. Mixed infection has been proposed as one of the main drivers of pathogen evolution, and hence may be an important process in this pathosystem. SIGNIFICANCE: The developed SNP panel offers exciting research perspectives for future studies in this well-characterized pathosystem. Also, the transcriptome provides an invaluable novel genomic resource for powdery mildews, which cause significant yield losses on commercially important crops annually. Furthermore, the features that render genetic studies in this system a challenge are shared with the majority of obligate parasitic species, and hence our results provide methodological insights from SNP calling to field sampling protocols for a wide range of biological systems.


Subjects
Ascomycota/genetics; Ascomycota/physiology; Gene Expression Profiling; Genotype; Mycoses/microbiology; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, RNA; Evolution, Molecular; Genetic Markers/genetics; Host-Pathogen Interactions; Molecular Sequence Annotation; Plantago/microbiology; RNA, Fungal/genetics; Reproducibility of Results
16.
Bioinformatics ; 27(5): 700-6, 2011 Mar 01.
Article in English | MEDLINE | ID: mdl-21169380

ABSTRACT

MOTIVATION: Functional linkages imply pairwise relationships between proteins that work together to implement biological tasks. During evolution, functionally linked proteins are likely to be preserved or eliminated across a range of genomes in a correlated fashion. Based on this hypothesis, phylogenetic profiling-based approaches try to detect pairs of protein families that show similar evolutionary patterns. Traditionally, the evolutionary pattern of a protein is encoded by either a binary profile of presence and absence of this protein across species or an occurrence profile that indicates the distribution of copies of this protein across species. RESULTS: In our study, we characterize each protein by its enhanced phylogenetic tree, a novel graphical model of the evolution of a protein family that is explicitly marked with speciation and duplication events. By topological comparison between enhanced phylogenetic trees, we are able to detect functionally associated protein pairs. Because the enhanced phylogenetic trees contain more evolutionary information about proteins, our method shows greater performance and discovers functional linkages among proteins more reliably than the conventional approaches.
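
The traditional binary-profile approach mentioned in the abstract above can be sketched in a few lines. This is a hedged illustration of the conventional baseline only, not the enhanced phylogenetic tree method the authors propose; the choice of Jaccard similarity, the function names, and the toy presence/absence profiles are assumptions introduced for the example.

```python
from itertools import combinations


def jaccard(profile_a: list[int], profile_b: list[int]) -> float:
    """Jaccard similarity of two presence (1) / absence (0) profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def ranked_pairs(profiles: dict[str, list[int]]) -> list[tuple[str, str, float]]:
    """All family pairs, most similar evolutionary pattern first."""
    scored = [(x, y, jaccard(profiles[x], profiles[y]))
              for x, y in combinations(profiles, 2)]
    return sorted(scored, key=lambda item: -item[2])


# Toy data (hypothetical): one column per genome, 1 = family present.
profiles = {
    "famA": [1, 1, 0, 1, 0, 1],
    "famB": [1, 1, 0, 1, 0, 0],
    "famC": [0, 0, 1, 0, 1, 0],
}
```

Under this scoring, families that are gained and lost together across genomes rank highest as candidate functional partners. A binary profile discards copy number and the order of speciation and duplication events, which is the information the tree-based representation in the abstract is meant to retain.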


Subjects
Evolution, Molecular; Phylogeny; Proteins/classification; Algorithms; Computational Biology/methods; Humans; Proteins/genetics; Saccharomyces cerevisiae/genetics