Search | Portal Regional da BVS

BIR Pipeline for Preparation of Phylogenomic Data.

Kumar, Surendra; Krabberød, Anders K; Neumann, Ralf S; Michalickova, Katerina; Zhao, Sen; Zhang, Xiaoli; Shalchian-Tabrizi, Kamran.

Evol Bioinform Online ; 11: 79-83, 2015.

Article in English | MEDLINE | ID: mdl-25987827

ABSTRACT

SUMMARY: We present a pipeline named BIR (Blast, Identify and Realign) developed for phylogenomic analyses. BIR is intended for the identification of gene sequences applicable for phylogenomic inference. The pipeline allows users to apply their own manually curated sequence alignments (seeds) to search for homologous genes in sequence databases and available genomes. BIR automatically adds the identified sequences from these databases to the seed alignments and reconstructs a phylogenetic tree from each. The BIR pipeline is an efficient tool for the identification of orthologous gene copies because it expands user-defined sequence alignments and conducts massively parallel phylogenetic reconstruction. The application is also particularly useful for large-scale sequencing projects that require management of a large number of single-gene alignments for gene comparison, functional annotation, and evolutionary analyses. AVAILABILITY: The BIR user manual is available at http://www.bioportal.no/ and can be accessed through Lifeportal at https://lifeportal.uio.no. Access is free but requires a user account registration using the link "Register for BIR access" from the Lifeportal homepage.

A survey of protein interaction data and multigenic inherited disorders.

Mora, Antonio; Michalickova, Katerina; Donaldson, Ian M.

BMC Bioinformatics ; 14: 47, 2013 Feb 11.

Article in English | MEDLINE | ID: mdl-23398688

ABSTRACT

BACKGROUND: Multigenic diseases are often associated with protein complexes or interactions involved in the same pathway. We wanted to estimate to what extent this is true given a consolidated protein interaction data set. The study stresses data integration and data representation issues. RESULTS: We constructed 497 multigenic disease groups from OMIM and tested for overlaps with interaction and pathway data. A total of 159 disease groups had significant overlaps with protein interaction data consolidated by iRefIndex. A further 68 disease overlaps were found only in the KEGG pathway database. No single database contained all significant overlaps thus stressing the importance of data integration. We also found that disease groups overlapped with all three interaction data types: n-ary, spoke-represented complexes and binary data - thus stressing the importance of considering each of these data types separately. CONCLUSIONS: Almost half of our multigenic disease groups could potentially be explained by protein complexes and pathways. However, the fact that no database or data type was able to cover all disease groups suggests that no single database has systematically covered all disease groups for potential related complex and pathway data. This survey provides a basis for further curation efforts to confirm and search for overlaps between diseases and interaction data. The accompanying R script can be used to reproduce the work and track progress in this area as databases change. Disease group overlaps can be further explored using the iRefscape plugin for Cytoscape.

Subjects

Genetic Diseases, Inborn/genetics; Multiprotein Complexes/genetics; Algorithms; Databases, Genetic; Databases, Protein; Humans; Hyperglycinemia, Nonketotic/genetics; Liddle Syndrome/genetics; Nephritis, Hereditary/genetics; Protein Interaction Mapping

iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex.

Razick, Sabry; Mora, Antonio; Michalickova, Katerina; Boddie, Paul; Donaldson, Ian M.

BMC Bioinformatics ; 12: 388, 2011 Oct 05.

Article in English | MEDLINE | ID: mdl-21975162

ABSTRACT

BACKGROUND: The iRefIndex consolidates protein interaction data from ten databases in a rigorous manner using sequence-based hash keys. Working with consolidated interaction data comes with distinct challenges: data are redundant, overlapping, highly interconnected and may be collected and represented using different curation practices. These phenomena were quantified in our previous studies. RESULTS: The iRefScape plug-in for the Cytoscape graphical viewer addresses these challenges. We show how these factors impact on data-mining tasks and how our solutions resolve them in a simple and efficient manner. A uniform accession space is used to limit redundancy and support search expansion and searching on multiple accession types. Multiple node and edge features support data filtering and mining. Node colours and features supply information about search result provenance. Overlapping evidence is presented using a multi-graph and a bi-partite representation is used to distinguish binary and n-ary source data. Searching for interactions between sets of proteins is supported and specifically includes searches on disease-related genes found in OMIM. Finally, a synchronized adjacency-matrix view facilitates visualization of relationships between sets of user defined groups. CONCLUSIONS: The iRefScape plug-in will be of interest to advanced users of interaction data. The plug-in provides access to a consolidated data set in a uniform accession space while remaining faithful to the underlying source data. Tools are provided to facilitate a range of tasks from a simple search to knowledge discovery. The plug-in uses a number of strategies that will be of interest to other plug-in developers.
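The idea of consolidating redundant records with sequence-based hash keys can be illustrated with a minimal sketch. The abstract does not specify the key format, so the choice of SHA-1, the Base64 encoding, the appended taxonomy identifier, and the names `sequence_key` and `consolidate` are all assumptions made for illustration, not a description of how iRefIndex actually builds its keys. The accessions and sequences are made up.

```python
import base64
import hashlib
from collections import defaultdict


def sequence_key(sequence: str, taxon_id: int) -> str:
    """Build a hypothetical redundancy key from a protein sequence and taxon ID."""
    # Normalize so that case and whitespace differences do not change the key.
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    # Base64 without padding keeps the key short and printable.
    encoded = base64.b64encode(digest).decode("ascii").rstrip("=")
    return f"{encoded}{taxon_id}"


def consolidate(records):
    """Group (accession, sequence, taxon_id) records that share the same key."""
    groups = defaultdict(list)
    for accession, sequence, taxon_id in records:
        groups[sequence_key(sequence, taxon_id)].append(accession)
    return dict(groups)


# Made-up records from three imaginary source databases.
records = [
    ("DB1:P001", "MKTAYIAKQR", 9606),
    ("DB2:X17", "mktayiakqr", 9606),    # same protein, different source
    ("DB3:Q9", "MKTAYIAKQR", 10090),    # same sequence, different organism
]
groups = consolidate(records)
for key, accessions in groups.items():
    print(key, accessions)
```

Under these assumptions, the first two records collapse into one group because they share sequence and organism, while the third stays separate. Keying on the sequence rather than on database accessions is what lets a search on any one accession expand to all equivalent records.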

Subjects

Data Mining; Databases, Protein; Proteins/metabolism; Database Management Systems; Databases, Genetic; Protein Interaction Mapping; Software

The contribution of DNA base damage to human cancer is modulated by the base excision repair interaction network.

Arczewska, Katarzyna D; Michalickova, Katerina; Donaldson, Ian M; Nilsen, Hilde.

Crit Rev Oncog ; 14(4): 217-73, 2008.

Article in English | MEDLINE | ID: mdl-19645683

ABSTRACT

Base excision repair (BER) is a major mode of repair of DNA base damage. BER is required for maintenance of genetic stability, which is important in the prevention of cancer. However, direct genetic associations between BER deficiency and human cancer have been difficult to firmly establish, and the first-generation mouse models deficient in individual DNA-glycosylases, which are the enzymes that give lesion specificity to the BER pathway, generally do not develop spontaneous tumors. This review summarizes our current understanding of the contribution of DNA base damage to human cancer, with a particular focus on DNA-glycosylases and two of the main enzymes that prevent misincorporation of damaged deoxynucleotide triphosphates into DNA: the dUTPase and MTH1. The available evidence suggests that the most important factors determining individual susceptibility to cancer are not mutations in individual DNA repair enzymes but rather the regulation of expression and modulation of function by protein modification and interaction partners. With this in mind, we present a comprehensive list of protein-protein interactions involving DNA-glycosylases or either of the two enzymes that limit incorporation of damaged nucleotides into DNA. Interacting partners with a known role in human cancer are specifically highlighted.

Subjects

DNA Damage/physiology; DNA Repair/physiology; Neoplasms/genetics; Animals; Base Sequence; DNA Damage/genetics; DNA Repair/genetics; DNA, Neoplasm/genetics; DNA, Neoplasm/metabolism; Gene Regulatory Networks/physiology; Humans; Mice; Models, Biological; Neoplasms/metabolism; Protein Binding/physiology

PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine.

Donaldson, Ian; Martin, Joel; de Bruijn, Berry; Wolting, Cheryl; Lay, Vicki; Tuekam, Brigitte; Zhang, Shudong; Baskin, Berivan; Bader, Gary D; Michalickova, Katerina; Pawson, Tony; Hogue, Christopher W V.

BMC Bioinformatics ; 4: 11, 2003 Mar 27.

Article in English | MEDLINE | ID: mdl-12689350

ABSTRACT

BACKGROUND: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles where they are inaccessible to computational methods. The Biomolecular interaction network database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task-size of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND. RESULTS: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information was 92%, 90% and 92% respectively. We estimated that the system would be able to recall up to 60% of all non-high throughput interactions present in another yeast-protein interaction database. Finally, this system was applied to a real-world curation problem and its use was found to reduce the task duration by 70% thus saving 176 days. CONCLUSIONS: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.
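For readers unfamiliar with the three measures reported above, the following sketch shows how precision, accuracy and recall are conventionally computed from a binary classifier's confusion counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (precision, accuracy, recall) from confusion-matrix counts.

    tp/fp: abstracts predicted as describing an interaction, correctly/incorrectly.
    tn/fn: abstracts predicted as not describing one, correctly/incorrectly.
    """
    precision = tp / (tp + fp)                  # how many positive calls were right
    recall = tp / (tp + fn)                     # how many true positives were found
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # overall fraction classified correctly
    return precision, accuracy, recall


# Hypothetical counts for 100 abstracts (not taken from the paper).
precision, accuracy, recall = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"precision={precision:.2f} accuracy={accuracy:.2f} recall={recall:.2f}")
```

Precision and recall are reported alongside accuracy because accuracy alone can look high when one class dominates the test set.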

Subjects

Artificial Intelligence; Information Storage and Retrieval/trends; Protein Interaction Mapping/methods; Algorithms; Computational Biology/methods; Computational Biology/statistics & numerical data; Databases, Factual/trends; Databases, Protein/trends; Genome, Fungal; Protein Interaction Mapping/classification; Protein Interaction Mapping/statistics & numerical data; PubMed/classification; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/chemistry

Species-specific protein sequence and fold optimizations.

Dumontier, Michel; Michalickova, Katerina; Hogue, Christopher W V.

BMC Bioinformatics ; 3: 39, 2002 Dec 17.

Article in English | MEDLINE | ID: mdl-12487631

ABSTRACT

BACKGROUND: An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino-acid signatures that make species-specific protein adaptation possible across 100 complete genomes. RESULTS: Environmental niche was determined to be a significant factor in variability from correspondence analysis using the amino acid composition of over 360,000 predicted open reading frames (ORFs) from 17 archaea, 76 bacteria and 7 eukaryote complete genomes. Additionally, we found clusters of phylogenetically unrelated archaea and bacteria that share similar environments by amino acid composition clustering. Composition analyses of conservative, domain-based homology modeling suggested an enrichment of small hydrophobic residues Ala, Gly, Val and charged residues Asp, Glu, His and Arg across all genomes. However, larger aromatic residues Phe, Trp and Tyr are reduced in folds, and these results were not affected by low complexity biases. We derived two simple log-odds scoring functions from ORFs (CG) and folds (CF) for each of the complete genomes. CF achieved an average cross-validation success rate of 85 +/- 8% whereas the CG detected 73 +/- 9% species-specific sequences when competing against all other non-redundant CG. Continuously updated results are available at http://genome.mshri.on.ca. CONCLUSION: Our analysis of amino acid compositions from the complete genomes provides stronger evidence for species-specific and environmental residue preferences in genomic sequences as well as in folds. Scoring functions derived from this work will be useful in future protein engineering experiments and possibly in identifying horizontal transfer events.
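The abstract does not define the CG and CF scoring functions, so the sketch below shows only the generic form a composition-based log-odds score usually takes: each residue contributes the logarithm of its frequency in one genome divided by its frequency in a background set. The pseudocount, the pooled background, the toy sequences, and all function names are assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences, pseudocount=1.0):
    """Amino acid frequencies over a set of sequences, smoothed by a pseudocount."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log_odds_table(genome_freqs, background_freqs):
    """Per-residue log-odds of the genome composition against the background."""
    return {aa: math.log(genome_freqs[aa] / background_freqs[aa]) for aa in AMINO_ACIDS}


def score(sequence, table):
    """Sum the per-residue log-odds; positive values favour the genome."""
    return sum(table[ch] for ch in sequence.upper() if ch in table)


# Toy "genomes" (made-up sequences, far smaller than real ORF sets).
genome_a = ["AAAGGV", "AVAG"]
genome_b = ["WWFYF", "FYW"]
background = composition(genome_a + genome_b)

table_a = log_odds_table(composition(genome_a), background)
table_b = log_odds_table(composition(genome_b), background)

query = "AAG"
print(score(query, table_a), score(query, table_b))
```

In this toy setting the query, rich in Ala and Gly, scores positive against the first table and negative against the second, which is the behaviour a species-specific scoring function is meant to exploit.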

Subjects

Computational Biology/methods; Protein Folding; Proteins/chemistry; Adaptation, Physiological/genetics; Animals; Archaeal Proteins/chemistry; Archaeal Proteins/genetics; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Caenorhabditis elegans Proteins/chemistry; Caenorhabditis elegans Proteins/genetics; Fungal Proteins/chemistry; Fungal Proteins/genetics; Genome; Genome, Archaeal; Genome, Bacterial; Genome, Fungal; Genome, Human; Humans; Predictive Value of Tests; Protein Structure, Secondary/genetics; Proteins/genetics; Proteome/chemistry; Proteome/genetics; Proteomics/methods; Species Specificity

SeqHound: biological sequence and structure database as a platform for bioinformatics research.

Michalickova, Katerina; Bader, Gary D; Dumontier, Michel; Lieu, Hao; Betel, Doron; Isserlin, Ruth; Hogue, Christopher W V.

BMC Bioinformatics ; 3: 32, 2002 Oct 25.

Article in English | MEDLINE | ID: mdl-12401134

ABSTRACT

BACKGROUND: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. RESULTS: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. CONCLUSIONS: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.

Subjects

Computational Biology/methods; Databases, Genetic; Software; Amino Acid Sequence; Base Sequence; Databases, Genetic/classification; Information Storage and Retrieval/methods; Internet; Models, Genetic; Models, Molecular; Molecular Sequence Data; Structure-Activity Relationship

Mutation profiling of mismatch repair-deficient colorectal cancers using an in silico genome scan to identify coding microsatellites.

Park, Jane; Betel, Doron; Gryfe, Robert; Michalickova, Katerina; Di Nicola, Nando; Gallinger, Steven; Hogue, Christopher W V; Redston, Mark.

Cancer Res ; 62(5): 1284-8, 2002 Mar 01.

Article in English | MEDLINE | ID: mdl-11888892

ABSTRACT

Human colorectal, endometrial, and gastric cancers with defective DNA mismatch repair (MMR) have microsatellite instability, a unique molecular alteration characterized by widespread frameshift mutations of repetitive DNA sequences. We developed "Kangaroo," a bioinformatics program for searches in nucleotide and protein sequence databases, and performed an in silico genome scan for DNA coding microsatellites that may have novel mutations in MMR-deficient cancers. Examination of 29 previously untested coding polyadenines revealed widespread mutations in MMR-deficient colorectal cancers, with the highest frequencies in ERCC5, CASP8AP2, p72, RAD50, CDC25, RECQL1, CBF2, RACK7, GRK4, and DNAPK (range, 10-33%). This algorithm allows comprehensive mutation profiling of MMR-deficient cancers, an important step in understanding the pathogenesis of these neoplasms.
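A scan for coding polyadenine tracts of the kind described above can be sketched with a regular expression. This is not the Kangaroo program: the function name, the minimum run length of 8, and the example sequence are assumptions chosen only to show the idea of locating mononucleotide repeats inside a coding sequence.

```python
import re


def find_mononucleotide_runs(cds: str, base: str = "A", min_length: int = 8):
    """Return (start, length) for each run of `base` at least `min_length` long.

    `start` is the 0-based offset of the run within the coding sequence.
    """
    # Builds a pattern such as "A{8,}" to match eight or more consecutive A's.
    pattern = re.compile(f"{re.escape(base)}{{{min_length},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(cds.upper())]


# Made-up coding sequence containing an A9 and an A8 tract plus a short AAA.
cds = "ATGGCC" + "A" * 9 + "GGTAAAC" + "A" * 8 + "TGA"
runs = find_mononucleotide_runs(cds)
print(runs)
```

Long mononucleotide tracts are the natural targets because insertions or deletions of one base within them cause the frameshift mutations characteristic of microsatellite instability.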

Subjects

Base Pair Mismatch; Colorectal Neoplasms/genetics; Microsatellite Repeats; Mutation; Algorithms; Computational Biology; DNA Repair; Humans

Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry.

Ho, Yuen; Gruhler, Albrecht; Heilbut, Adrian; Bader, Gary D; Moore, Lynda; Adams, Sally-Lin; Millar, Anna; Taylor, Paul; Bennett, Keiryn; Boutilier, Kelly; Yang, Lingyun; Wolting, Cheryl; Donaldson, Ian; Schandorff, Søren; Shewnarane, Juanita; Vo, Mai; Taggart, Joanne; Goudreault, Marilyn; Muskat, Brenda; Alfarano, Cris; Dewar, Danielle; Lin, Zhen; Michalickova, Katerina; Willems, Andrew R; Sassi, Holly; Nielsen, Peter A; Rasmussen, Karina J; Andersen, Jens R; Johansen, Lene E; Hansen, Lykke H; Jespersen, Hans; Podtelejnikov, Alexandre; Nielsen, Eva; Crawford, Janne; Poulsen, Vibeke; Sørensen, Birgitte D; Matthiesen, Jesper; Hendrickson, Ronald C; Gleeson, Frank; Pawson, Tony; Moran, Michael F; Durocher, Daniel; Mann, Matthias; Hogue, Christopher W V; Figeys, Daniel; Tyers, Mike.

Nature ; 415(6868): 180-3, 2002 Jan 10.

Article in English | MEDLINE | ID: mdl-11805837

ABSTRACT

The recent abundance of genome sequence data has brought an urgent need for systematic proteomics to decipher the encoded protein networks that dictate cellular function. To date, generation of large-scale protein-protein interaction maps has relied on the yeast two-hybrid system, which detects binary interactions through activation of reporter gene expression. With the advent of ultrasensitive mass spectrometric protein identification methods, it is feasible to identify directly protein complexes on a proteome-wide scale. Here we report, using the budding yeast Saccharomyces cerevisiae as a test case, an example of this approach, which we term high-throughput mass spectrometric protein complex identification (HMS-PCI). Beginning with 10% of predicted yeast proteins as baits, we detected 3,617 associated proteins covering 25% of the yeast proteome. Numerous protein complexes were identified, including many new interactions in various signalling pathways and in the DNA damage response. Comparison of the HMS-PCI data set with interactions reported in the literature revealed an average threefold higher success rate in detection of known complexes compared with large-scale two-hybrid studies. Given the high degree of connectivity observed in this study, even partial HMS-PCI coverage of complex proteomes, including that of humans, should allow comprehensive identification of cellular networks.

Subjects

Cell Cycle Proteins; Saccharomyces cerevisiae Proteins/isolation & purification; Saccharomyces cerevisiae/chemistry; Amino Acid Sequence; Cloning, Molecular; DNA Damage; DNA Repair; DNA, Fungal; Humans; Macromolecular Substances; Mass Spectrometry; Molecular Sequence Data; Phosphoric Monoester Hydrolases/metabolism; Protein Binding; Protein Kinases/chemistry; Protein Kinases/metabolism; Protein Serine-Threonine Kinases; Proteome; Saccharomyces cerevisiae Proteins/chemistry; Sequence Alignment; Signal Transduction
