Results 1 - 20 of 30
1.
Molecules ; 26(20), 2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34684805

ABSTRACT

Xmipp is an open-source software package consisting of multiple programs for processing data originating from electron microscopy and electron tomography, designed and managed by the Biocomputing Unit of the Spanish National Center for Biotechnology, although with contributions from many other developers around the world. During its 25 years of existence, Xmipp underwent multiple changes and updates. While there were many publications related to new programs and functionality added to Xmipp, there has been no single publication on Xmipp as a package since 2013. In this article, we give an overview of the changes and new work since 2013, describe technologies and techniques used during the development, and take a peek at the future of the package.

2.
Bioinformatics ; 35(14): 2427-2433, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-30500892

ABSTRACT

MOTIVATION: Cryo electron microscopy (EM) is currently one of the main tools to reveal the structural information of biological macromolecules. The re-construction of three-dimensional (3D) maps is typically carried out following an iterative process that requires an initial estimation of the 3D map to be refined in subsequent steps. Therefore, its determination is key in the quality of the final results, and there are cases in which it is still an open issue in single particle analysis (SPA). Small angle X-ray scattering (SAXS) is a well-known technique applied to structural biology. It is useful from small nanostructures up to macromolecular ensembles for its ability to obtain low resolution information of the biological sample measuring its X-ray scattering curve. These curves, together with further analysis, are able to yield information on the sizes, shapes and structures of the analyzed particles. RESULTS: In this paper, we show how the low resolution structural information revealed by SAXS is very useful for the validation of EM initial 3D models in SPA, helping the following refinement process to obtain more accurate 3D structures. For this purpose, we approximate the initial map by pseudo-atoms and predict the SAXS curve expected for this pseudo-atomic structure. The match between the predicted and experimental SAXS curves is considered as a good sign of the correctness of the EM initial map. AVAILABILITY AND IMPLEMENTATION: The algorithm is freely available as part of the Scipion 1.2 software at http://scipion.i2pc.es/.


Subjects
Cryoelectron Microscopy; Scattering, Small Angle; X-Ray Diffraction; X-Rays
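Editor's illustration for record 2. The abstract says that a SAXS curve is predicted from a pseudo-atomic approximation of the initial map, but it does not say how. One standard way to compute the scattering of a set of point scatterers is the Debye formula, I(q) = sum over i and j of w_i w_j sin(q r_ij) / (q r_ij). The sketch below assumes that formula and equal weights; the function name and parameters are hypothetical and are not taken from Scipion.

```python
import math


def debye_intensity(coords, q_values, weights=None):
    """Scattering intensity I(q) of point scatterers via the Debye formula.

    coords   : list of (x, y, z) pseudo-atom positions (e.g. in angstroms)
    q_values : list of momentum-transfer values q (in inverse angstroms)
    weights  : optional per-pseudo-atom scattering weights (assumed 1.0 if omitted)
    """
    n = len(coords)
    if weights is None:
        weights = [1.0] * n

    # Pairwise distances do not depend on q, so compute them only once.
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = math.dist(coords[i], coords[j])
            pairs.append((weights[i] * weights[j], r_ij))

    # The i == j terms contribute w_i^2 at every q (sin(x)/x -> 1 as x -> 0).
    self_term = sum(w * w for w in weights)

    curve = []
    for q in q_values:
        total = self_term
        for w_prod, r_ij in pairs:
            x = q * r_ij
            sinc = 1.0 if x == 0.0 else math.sin(x) / x
            total += 2.0 * w_prod * sinc  # factor 2 covers (i, j) and (j, i)
        curve.append(total)
    return curve
```

A predicted curve of this kind could then be compared with the experimental one, for example through a scaled least-squares misfit; the abstract does not state which comparison measure the authors use.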
3.
J Struct Biol ; 194(2): 156-63, 2016 May.
Article in English | MEDLINE | ID: mdl-26873784

ABSTRACT

Three-dimensional electron microscopy (3DEM) of ice-embedded samples allows the structural analysis of large biological macromolecules close to their native state. Different techniques have been developed during the last forty years to process cryo-electron microscopy (cryo-EM) data. Not surprisingly, success in analysis and interpretation is highly correlated with the continuous development of image processing packages. The field has matured to the point where further progress in data and methods sharing depends on an agreement between the packages on how to describe common image processing tasks. Such standardization will facilitate the use of software as well as seamless collaboration, allowing the sharing of rich information between different platforms. Our aim here is to describe the Electron Microscopy eXchange (EMX) initiative, launched at the 2012 Instruct Image Processing Center Developer Workshop, with the intention of developing a first set of standard conventions for the interchange of information for single-particle analysis (EMX version 1.0). These conventions cover the specification of the metadata for micrograph and particle images, including contrast transfer function (CTF) parameters and particle orientations. EMX v1.0 has already been implemented in the Bsoft, EMAN, Xmipp and Scipion image processing packages. It has been and will be used in the CTF and EMDataBank Validation Challenges respectively. It is also being used in EMPIAR, the Electron Microscopy Pilot Image Archive, which stores raw image data related to the 3DEM reconstructions in EMDB.


Subjects
Cryoelectron Microscopy/standards; Image Processing, Computer-Assisted/standards; Software/standards; Algorithms; Cryoelectron Microscopy/instrumentation; Humans; Image Processing, Computer-Assisted/statistics & numerical data; Information Dissemination
4.
J Virol ; 89(18): 9653-64, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26178997

ABSTRACT

UNLABELLED: Adenovirus is one of the most complex icosahedral, nonenveloped viruses. Even after its structure was solved at near-atomic resolution by both cryo-electron microscopy and X-ray crystallography, the location of minor coat proteins is still a subject of debate. The elaborated capsid architecture is the product of a correspondingly complex assembly process, about which many aspects remain unknown. Genome encapsidation involves the concerted action of five virus proteins, and proteolytic processing by the virus protease is needed to prime the virion for sequential uncoating. Protein L1 52/55k is required for packaging, and multiple cleavages by the maturation protease facilitate its release from the nascent virion. Light-density particles are routinely produced in adenovirus infections and are thought to represent assembly intermediates. Here, we present the molecular and structural characterization of two different types of human adenovirus light particles produced by a mutant with delayed packaging. We show that these particles lack core polypeptide V but do not lack the density corresponding to this protein in the X-ray structure, thereby adding support to the adenovirus cryo-electron microscopy model. The two types of light particles present different degrees of proteolytic processing. Their structures provide the first glimpse of the organization of L1 52/55k protein inside the capsid shell and of how this organization changes upon partial maturation. Immature, full-length L1 52/55k is poised beneath the vertices to engage the virus genome. Upon proteolytic processing, L1 52/55k disengages from the capsid shell, facilitating genome release during uncoating. IMPORTANCE: Adenoviruses have been extensively characterized as experimental systems in molecular biology, as human pathogens, and as therapeutic vectors. However, a clear picture of many aspects of their basic biology is still lacking. 
Two of these aspects are the location of minor coat proteins in the capsid and the molecular details of capsid assembly. Here, we provide evidence supporting one of the two current models for capsid architecture. We also show for the first time the location of the packaging protein L1 52/55k in particles lacking the virus genome and how this location changes during maturation. Our results contribute to clarifying standing questions in adenovirus capsid architecture and provide new details on the role of L1 52/55k protein in assembly.


Subjects
Adenoviridae/chemistry; Capsid Proteins/chemistry; Capsid/chemistry; Models, Molecular; Adenoviridae/physiology; Capsid/metabolism; Capsid Proteins/metabolism; Crystallography, X-Ray; HEK293 Cells; Humans; Protein Structure, Tertiary; Virus Assembly/physiology
5.
J Struct Biol ; 190(3): 348-59, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25913484

ABSTRACT

Image formation in bright field electron microscopy can be described with the help of the contrast transfer function (CTF). In this work the authors describe the "CTF Estimation Challenge", called by the Madrid Instruct Image Processing Center (I2PC) in collaboration with the National Center for Macromolecular Imaging (NCMI) at Houston. Correcting for the effects of the CTF requires accurate knowledge of the CTF parameters, but these have often been difficult to determine. In this challenge, researchers have had the opportunity to test their ability in estimating some of the key parameters of the electron microscope CTF on a large micrograph data set produced by well-known laboratories on a wide set of experimental conditions. This work presents the first analysis of the results of the CTF Estimation Challenge, including an assessment of the performance of the different software packages under different conditions, so as to identify those areas of research where further developments would be desirable in order to achieve high-resolution structural information.


Subjects
Macromolecular Substances/chemistry; Microscopy, Electron/methods; Algorithms; Image Processing, Computer-Assisted/methods; Software
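Editor's illustration for record 5. The abstract refers to the key parameters of the electron microscope CTF without listing them. As a rough sketch, a commonly used one-dimensional CTF model depends on defocus, accelerating voltage, spherical aberration and amplitude contrast. The sign convention, the default values and the function names below are assumptions made for illustration; software packages differ on these conventions, and astigmatism and envelope functions are left out.

```python
import math


def electron_wavelength(voltage_kv):
    """Relativistic electron wavelength in angstroms for a voltage in kV."""
    v = voltage_kv * 1e3  # volts
    return 12.2643 / math.sqrt(v + 0.97845e-6 * v * v)


def ctf_1d(s, defocus, voltage_kv=300.0, cs_mm=2.7, amp_contrast=0.07):
    """Contrast transfer function at spatial frequency s (1/angstrom).

    defocus      : defocus in angstroms (positive = underfocus, assumed convention)
    cs_mm        : spherical aberration in millimetres
    amp_contrast : fraction of amplitude contrast, between 0 and 1
    """
    lam = electron_wavelength(voltage_kv)
    cs = cs_mm * 1e7  # millimetres to angstroms
    # Phase shift introduced by defocus and spherical aberration.
    chi = math.pi * lam * defocus * s ** 2 - 0.5 * math.pi * cs * lam ** 3 * s ** 4
    phase_part = math.sqrt(1.0 - amp_contrast ** 2) * math.sin(chi)
    amplitude_part = amp_contrast * math.cos(chi)
    return -(phase_part + amplitude_part)
```

Evaluating `ctf_1d` over a range of frequencies for two different defocus values shows how the positions of the zeros shift, which is why an inaccurate defocus estimate limits the attainable resolution.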
6.
Bioinformatics ; 30(20): 2891-8, 2014 Oct 15.
Article in English | MEDLINE | ID: mdl-24974203

ABSTRACT

MOTIVATION: Structural information of macromolecular complexes provides key insights into the way they carry out their biological functions. The reconstruction process leading to the final 3D map requires an approximate initial model. Generation of an initial model is still an open and challenging problem in single-particle analysis. RESULTS: We present a fast and efficient approach to obtain a reliable, low-resolution estimation of the 3D structure of a macromolecule, without any a priori knowledge, addressing the well-known issue of initial volume estimation in the field of single-particle analysis. The input of the algorithm is a set of class average images obtained from individual projections of a biological object at random and unknown orientations by transmission electron microscopy micrographs. The proposed method is based on an initial nonlinear dimensionality reduction approach, which allows the automatic selection of small representative sets of class average images that capture most of the structural information of the particle under study. These reduced sets are then used to generate volumes from random orientation assignments. The best volume is determined from these guesses using a random sample consensus (RANSAC) approach. We have tested our proposed algorithm, which we will term 3D-RANSAC, with simulated and experimental data, obtaining satisfactory results under the low signal-to-noise conditions typical of cryo-electron microscopy. AVAILABILITY: The algorithm is freely available as part of the Xmipp 3.1 package [http://xmipp.cnb.csic.es]. CONTACT: jvargas@cnb.csic.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Imaging, Three-Dimensional/methods; Microscopy, Electron, Transmission/methods; Macromolecular Substances/chemistry; Models, Molecular; Time Factors
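Editor's illustration for record 6. The abstract names random sample consensus (RANSAC) as the way the best volume is chosen among random guesses. The generic RANSAC loop is: draw a minimal random subset, build a candidate model from it, count how many data points agree with the model, and keep the candidate with the largest consensus. The sketch below shows that loop on a toy line-fitting problem. It is not the 3D-RANSAC algorithm itself, and the function name, tolerance and iteration count are hypothetical.

```python
import random


def ransac_line(points, n_iter=200, tol=0.1, seed=0):
    """Fit y = a*x + b to points containing outliers with a basic RANSAC loop.

    Returns ((a, b), inliers) for the candidate with the largest consensus set,
    or (None, []) if no valid candidate was drawn.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        # 1. Draw a minimal sample: two points define a line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line, not representable as y = a*x + b
        # 2. Build the candidate model from the sample.
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 3. Consensus set: points whose residual is within the tolerance.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        # 4. Keep the candidate supported by the most points.
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers
```

In the setting described by the abstract, the "model" would be a volume reconstructed from a small set of class averages with random orientations and the consensus would be measured against the remaining class averages; how that agreement is scored is not stated in the abstract.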
7.
Opt Express ; 23(8): 9567-72, 2015 Apr 20.
Article in English | MEDLINE | ID: mdl-25968993

ABSTRACT

Soft X-ray tomography (SXT) is becoming a powerful imaging technique to analyze eukaryotic whole cells close to their native state. Central to the analysis of the quality of SXT 3D reconstruction is the estimation of the spatial resolution and Depth of Field of the X-ray microscope. In turn, the characterization of the Modulation Transfer Function (MTF) of the optical system is key to calculate both parameters. Consequently, in this work we introduce a fully automated technique to accurately estimate the transfer function of such an optical system. Our proposal is based on the preprocessing of the experimental images to obtain an estimate of the input pattern, followed by the analysis in Fourier space of multiple orders of a Siemens Star test sample, extending in this way its measured frequency range.

8.
J Struct Biol ; 178(1): 29-37, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22343468

ABSTRACT

Soft X-ray Tomographic (TomoX) microscopy has become a reality in the last years. The resolution range of this technique nicely fits between confocal and electron microscopies and will play a key role in the elucidation of the organization between the molecular and the organelle levels. In fact, it offers the possibility of imaging three-dimensional structures of hydrated biological specimens near their native state without chemical pre-treatment. Ideally, TomoX reconstructs the specimen absorption coefficients from projections of this specimen, but, unfortunately, X-ray micrographs are only an approximation to projections of the specimen, resulting in inaccuracies if a tomographic reconstruction is performed without explicitly incorporating these approximations. In an attempt to mitigate some of these inaccuracies, we develop in this work an image formation model within the approximation of assuming incoherent illumination.


Subjects
Imaging, Three-Dimensional/methods; X-Ray Microtomography/methods; Candida albicans/ultrastructure; Image Processing, Computer-Assisted/methods; Models, Theoretical; Phantoms, Imaging
9.
Nucleic Acids Res ; 38(Web Server issue): W228-32, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20513648

ABSTRACT

The enormous amount of data available in public gene expression repositories such as Gene Expression Omnibus (GEO) offers an inestimable resource to explore gene expression programs across several organisms and conditions. This information can be used to discover experiments that induce similar or opposite gene expression patterns to a given query, which in turn may lead to the discovery of new relationships among diseases, drugs or pathways, as well as the generation of new hypotheses. In this work, we present MARQ, a web-based application that allows researchers to compare a query set of genes, e.g. a set of over- and under-expressed genes, against a signature database built from GEO datasets for different organisms and platforms. MARQ offers an easy-to-use and integrated environment to mine GEO, in order to identify conditions that induce similar or opposite gene expression patterns to a given experimental condition. MARQ also includes additional functionalities for the exploration of the results, including a meta-analysis pipeline to find genes that are differentially expressed across different experiments. The application is freely available at http://marq.dacya.ucm.es.


Subjects
Gene Expression Profiling; Software; Animals; Databases, Genetic; Humans; Internet; Mice; Oligonucleotide Array Sequence Analysis; Rats; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism
10.
IUCrJ ; 9(Pt 5): 632-638, 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36071808

ABSTRACT

Single-particle cryo-electron microscopy has become a powerful technique for the 3D structure determination of biological molecules. The last decade has seen an astonishing development of both hardware and software, and an exponential growth of new structures obtained at medium-high resolution. However, the knowledge accumulated in this field over the years has hardly been utilized as feedback in the reconstruction of new structures. In this context, this article explores the use of the deep-learning approach deepEMhancer as a regularizer in the RELION refinement process. deepEMhancer introduces prior information derived from macromolecular structures, and contributes to noise reduction and signal enhancement, as well as a higher degree of isotropy. These features have a direct effect on image alignment and reduction of overfitting during iterative refinement. The advantages of this combination are demonstrated for several membrane proteins, for which it is especially useful because of their high disorder and flexibility.

11.
Proteomics ; 11(2): 334-7, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21204261

ABSTRACT

Current standardization initiatives have greatly contributed to the sharing of information derived from proteomics experiments. One of these initiatives is the XML-based repository PRIDE (PRoteomics IDEntification database), although an XML-based document does not appear to present a user-friendly view at first glance. PRIDEViewer is a novel Java-based application that presents the information available in a PRIDE XML file in a user-friendly manner, facilitating the interaction among end users as well as the understanding and evaluation of the compiled information. PRIDEViewer is freely available at: http://proteo.cnb.csic.es/prideviewer/.


Subjects
Databases, Protein; Proteomics/methods; Software; User-Computer Interface
12.
Structure ; 15(4): 461-72, 2007 Apr.
Article in English | MEDLINE | ID: mdl-17437718

ABSTRACT

The existence of similar folds among major structural subunits of viral capsids has shown unexpected evolutionary relationships suggesting common origins irrespective of the capsids' host life domain. Tailed bacteriophages are emerging as one such family, and we have studied the possible existence of the HK97-like fold in bacteriophage T7. The procapsid structure at approximately 10 Å resolution was used to obtain a quasi-atomic model by fitting a homology model of the T7 capsid protein gp10 that was based on the atomic structure of the HK97 capsid protein. A number of fold similarities, such as the fitting of domains A and P into the L-shaped procapsid subunit, are evident between both viral systems. A different feature is related to the presence of the amino-terminal domain of gp10 found at the inner surface of the capsid that might play an important role in the interaction of capsid and scaffolding proteins.


Subjects
Bacteriophage T7/chemistry; Biological Evolution; Capsid/chemistry; Amino Acid Sequence; Bacteriophage T7/genetics; DNA; Molecular Sequence Data; Protein Binding; Protein Folding; Protein Structure, Tertiary
13.
Commun Biol ; 2: 241, 2019.
Article in English | MEDLINE | ID: mdl-31263785

ABSTRACT

Monoclonal antibody (mAb) cooperativity is a phenomenon triggered when mAb pairs promote increased bactericidal killing compared to individual partners. Cooperativity has been deeply investigated among mAbs elicited by factor H-binding protein (fHbp), a Neisseria meningitidis surface-exposed lipoprotein and one of the key antigens included in both serogroup B meningococcus vaccines, Bexsero and Trumenba. Here we report the structural and functional characterization of two cooperative mAb pairs isolated from Bexsero vaccines. The 3D electron microscopy structures of the human mAb-fHbp-mAb cooperative complexes indicate that the angle formed between the antigen binding fragments (fAbs) is regular and that fHbp is able to bind the cooperative mAb pairs and human factor H (fH) simultaneously and stably in vitro. These findings shed light on the molecular basis of the antibody-based mechanism of protection driven by simultaneous recognition of the different epitopes of the fHbp and underline that cooperativity is crucial in vaccine efficacy.


Subjects
Antibodies, Monoclonal/chemistry; Antigens, Bacterial/immunology; Bacterial Proteins/immunology; Antibodies, Monoclonal/immunology; Blood Bactericidal Activity; Complement Factor H/metabolism; Epitope Mapping; Humans; Meningococcal Vaccines/immunology; Microscopy, Electron, Transmission; Surface Plasmon Resonance
14.
BMC Bioinformatics ; 9: 444, 2008 Oct 20.
Article in English | MEDLINE | ID: mdl-18937846

ABSTRACT

BACKGROUND: Analysis of large-scale experimental datasets frequently produces one or more sets of proteins that are subsequently mined for functional interpretation and validation. To this end, a number of computational methods have been devised that rely on the analysis of functional annotations. Although current methods provide valuable information (e.g. significantly enriched annotations, pairwise functional similarities), they do not specifically measure the degree of homogeneity of a protein set. RESULTS: In this work we present a method that scores the degree of functional homogeneity, or coherence, of a set of proteins on the basis of the global similarity of their functional annotations. The method uses statistical hypothesis testing to assess the significance of the set in the context of the functional space of a reference set. As such, it can be used as a first step in the validation of sets expected to be homogeneous prior to further functional interpretation. CONCLUSION: We evaluate our method by analysing known biologically relevant sets as well as random ones. The known relevant sets comprise macromolecular complexes, cellular components and pathways described for Saccharomyces cerevisiae, which are mostly significantly coherent. Finally, we illustrate the usefulness of our approach for validating 'functional modules' obtained from computational analysis of protein-protein interaction networks. Matlab code and supplementary data are available at http://www.cnb.csic.es/~monica/coherence/


Subjects
Computational Biology/methods; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/metabolism; Candida albicans/chemistry; Candida albicans/metabolism; Databases, Protein; Fungal Proteins/metabolism; Metabolic Networks and Pathways; Protein Interaction Mapping; Saccharomyces cerevisiae/chemistry
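Editor's illustration for record 14. The abstract describes scoring the functional coherence of a protein set from the global similarity of its annotations and assessing significance against a reference set, without giving the similarity measure or the test. The sketch below makes two assumptions that are not taken from the paper: coherence is the mean pairwise Jaccard similarity between annotation sets, and significance is an empirical p-value obtained by drawing random sets of the same size from the reference. All names are hypothetical.

```python
import itertools
import random


def jaccard(a, b):
    """Jaccard similarity between two sets of annotation terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coherence(proteins, annotations):
    """Mean pairwise annotation similarity of a protein set (assumed score)."""
    pairs = list(itertools.combinations(proteins, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard(annotations[p], annotations[q]) for p, q in pairs)
    return total / len(pairs)


def empirical_p_value(protein_set, annotations, n_samples=1000, seed=0):
    """Compare the observed coherence with random sets from the reference.

    annotations maps every reference protein to its set of annotation terms.
    Returns (observed_coherence, p_value).
    """
    rng = random.Random(seed)
    reference = sorted(annotations)
    observed = coherence(protein_set, annotations)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(reference, len(protein_set))
        if coherence(sample, annotations) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_samples + 1)
```

A small p-value here means that random sets of the same size are rarely as homogeneous as the set under study, which is the kind of first-step validation the abstract describes.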
15.
PLoS One ; 12(5): e0178316, 2017.
Article in English | MEDLINE | ID: mdl-28542306

ABSTRACT

Benign neurofibromas, the main phenotypic manifestations of the rare neurological disorder neurofibromatosis type 1, degenerate to malignant tumors associated with poor prognosis in about 10% of patients. Despite efforts in the field of (epi)genomics, the lack of prognostic biomarkers with which to predict disease evolution frustrates the adoption of appropriate early therapeutic measures. To identify potential biomarkers of malignant neurofibroma transformation, we integrated four human experimental studies and one mouse study, using a gene score-based meta-analysis method, from which we obtained a score-ranked signature of 579 genes. Genes with the highest absolute scores were classified as promising disease biomarkers. By grouping genes with similar neurofibromatosis-related profiles, we derived panels of potential biomarkers. The addition of promoter methylation data to gene profiles indicated a panel of genes probably silenced by hypermethylation. To identify possible therapeutic treatments, we used the gene signature to query drug expression databases. Trichostatin A and other histone deacetylase inhibitors, as well as cantharidin and tamoxifen, were retrieved as putative therapeutic means to reverse the aberrant regulation that drives malignant cell proliferation and metastasis. This in silico prediction corroborated reported experimental results that suggested the inclusion of these compounds in clinical trials. This experimental validation supported the suitability of the meta-analysis method used to integrate several sources of public genomic information, and the reliability of the gene signature associated with the malignant evolution of neurofibromas to generate working hypotheses for prognostic and drug-responsive biomarkers or therapeutic measures, thus showing the potential of this in silico approach for biomarker discovery.


Subjects
Nerve Sheath Neoplasms/genetics; Neurofibroma/genetics; Animals; Biomarkers, Tumor/genetics; Cantharidin/pharmacology; Chromosome Mapping; Computer Simulation; CpG Islands; DNA Methylation; Drug Screening Assays, Antitumor; Gene Silencing; Histone Deacetylase Inhibitors/pharmacology; Humans; Mice; Nerve Sheath Neoplasms/drug therapy; Nerve Sheath Neoplasms/pathology; Neurofibroma/drug therapy; Neurofibroma/pathology; Neurofibromatosis 1/drug therapy; Neurofibromatosis 1/genetics; Neurofibromatosis 1/pathology; Prognosis; Promoter Regions, Genetic; Tamoxifen/pharmacology; Transcriptome
16.
Front Mol Biosci ; 4: 17, 2017.
Article in English | MEDLINE | ID: mdl-28396859

ABSTRACT

Centrosomal P4.1-associated protein (CPAP) is a cell cycle regulated protein fundamental for centrosome assembly and centriole elongation. In humans, the region between residues 897-1338 of CPAP mediates interactions with other proteins and includes a homodimerization domain. CPAP mutations cause primary autosomal recessive microcephaly and Seckel syndrome. Despite the biological/clinical relevance of CPAP, its mechanistic behavior remains unclear and its C-terminus (the G-box/TCP domain) is the only part whose structure has been solved. This situation is perhaps due in part to the challenges of obtaining the protein in a soluble, homogeneous state for structural studies. Our work constitutes a systematic structural analysis of multiple oligomers of HsCPAP897-1338, using single-particle electron microscopy (EM) of negatively stained (NS) samples. Based on image classification into clearly different regular 3D maps (putatively corresponding to dimers and tetramers) and direct observation of individual images representing other complexes of HsCPAP897-1338 (i.e., putative flexible monomers and higher-order multimers), we report a dynamic oligomeric behavior of this protein, where different homo-oligomers coexist in variable proportions. We propose that dimerization of the putative homodimer forms a putative tetramer which could be the structural unit for the scaffold that either tethers the pericentriolar material to centrioles or promotes procentriole elongation. A coarse fitting of atomic models into the NS 3D maps at resolutions around 20 Å is performed only to complement our experimental data, allowing us to hypothesize on the oligomeric composition of the different complexes. In this way, the current EM work represents an initial step toward the structural characterization of different oligomers of CPAP, suggesting further insights to understand how this protein works, contributing to the elucidation of control mechanisms for centriole biogenesis.

17.
Sci Rep ; 7: 45808, 2017 Apr 04.
Article in English | MEDLINE | ID: mdl-28374769

ABSTRACT

We have developed a new data collection method and processing framework in full field cryo soft X-ray tomography to computationally extend the depth of field (DOF) of a Fresnel zone plate lens. Structural features of 3D-reconstructed eukaryotic cells that are affected by DOF artifacts in standard reconstruction are now recovered. This approach, based on focal series projections, is easily applicable with closed expressions to select specific data acquisition parameters.


Subjects
Imaging, Three-Dimensional/methods; Tomography, X-Ray/methods; Algorithms; Image Processing, Computer-Assisted
18.
BMC Bioinformatics ; 7: 366, 2006 Jul 28.
Article in English | MEDLINE | ID: mdl-16875499

ABSTRACT

BACKGROUND: In the bioinformatics field, a great deal of interest has been given to the non-negative matrix factorization (NMF) technique, due to its capability of providing new insights and relevant information about the complex latent relationships in experimental data sets. This method, and some of its variants, have been successfully applied to gene expression, sequence analysis, functional characterization of genes and text mining. Although interest in this technique within the bioinformatics community has increased during the last few years, few simple standalone tools are available to specifically perform these types of data analysis in an integrated environment. RESULTS: In this work we propose a versatile and user-friendly tool that implements the NMF methodology in different analysis contexts to support some of the most important reported applications of this new methodology. This includes clustering and biclustering gene expression data, protein sequence analysis, text mining of biomedical literature and sample classification using gene expression. The tool, which is named bioNMF, also contains a user-friendly graphical interface to explore results in an interactive manner and facilitate in this way the exploratory data analysis process. CONCLUSION: bioNMF is a standalone versatile application which does not require any special installation or libraries. It can be used for most of the multiple applications proposed in the bioinformatics field or to support new research using this method. This tool is publicly available at http://www.dacya.ucm.es/apascual/bioNMF.


Subjects
Computational Biology/methods; Gene Expression Regulation; Algorithms; Animals; Cluster Analysis; Computer Graphics; Data Interpretation, Statistical; Databases, Genetic; Gene Expression Profiling/methods; Humans; Models, Genetic; Pattern Recognition, Automated; Software; User-Computer Interface
19.
BMC Bioinformatics ; 7: 363, 2006 Jul 26.
Article in English | MEDLINE | ID: mdl-16872502

ABSTRACT

BACKGROUND: Recent analyses in systems biology pursue the discovery of functional modules within the cell. Recognition of such modules requires the integrative analysis of genome-wide experimental data together with available functional schemes. Along these lines, methods to bridge the gap between the abstract definitions of cellular processes in current schemes and the interlinked nature of biological networks are required. RESULTS: This work explores the use of the scientific literature to establish potential relationships among cellular processes. To this end we have used a document based similarity method to compute pair-wise similarities of the biological processes described in the Gene Ontology (GO). The method has been applied to the biological processes annotated for the Saccharomyces cerevisiae genome. We compared our results with similarities obtained with two ontology-based metrics, as well as with gene product annotation relationships. We show that the literature-based metric conserves most direct ontological relationships, while revealing biologically sound similarities that are not obtained using ontology-based metrics and/or genome annotation. CONCLUSION: The scientific literature is a valuable source of information from which to compute similarities among biological processes. The associations discovered by literature analysis are a valuable complement to those encoded in existing functional schemes, and those that arise by genome annotation. These similarities can be used to conveniently map the interlinked structure of cellular processes in a particular organism.


Subjects
Databases, Bibliographic; Physiological Phenomena; Saccharomyces cerevisiae/genetics; Databases, Genetic; Genes, Fungal; Ion Transport/genetics; Natural Language Processing; Pattern Recognition, Automated; Saccharomyces cerevisiae/physiology; Saccharomyces cerevisiae Proteins/classification; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/physiology; Signal Transduction/genetics
20.
BMC Bioinformatics ; 7: 41, 2006 Jan 26.
Article in English | MEDLINE | ID: mdl-16438716

ABSTRACT

BACKGROUND: Experimental techniques such as DNA microarray, serial analysis of gene expression (SAGE) and mass spectrometry proteomics, among others, are generating large amounts of data related to genes and proteins at different levels. As in any other experimental approach, it is necessary to analyze these data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research. RESULTS: We present a method to create literature profiles for large sets of genes or proteins based on common semantic features extracted from a corpus of relevant documents. These profiles can be used to establish pair-wise similarities among genes, utilized in gene/protein classification or can be even combined with experimental measurements. Semantic features can be used by researchers to facilitate the understanding of the commonalities indicated by experimental results. Our approach is based on non-negative matrix factorization (NMF), a machine-learning algorithm for data analysis, capable of identifying local patterns that characterize a subset of the data. The literature is thus used to establish putative relationships among subsets of genes or proteins and to provide coherent justification for this clustering into subsets. We demonstrate the utility of the method by applying it to two independent and vastly different sets of genes. CONCLUSION: The presented method can create literature profiles from documents relevant to sets of genes. 
The representation of genes as additive linear combinations of semantic features allows for the exploration of functional associations as well as for clustering, suggesting a valuable methodology for the validation and interpretation of high-throughput experimental data.


Subjects
Artificial Intelligence; Genes; MEDLINE; Natural Language Processing; Periodicals as Topic; Proteins; Terminology as Topic; Abstracting and Indexing/methods; Algorithms; Bibliometrics; Databases, Bibliographic; Semantics; Vocabulary, Controlled
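Editor's illustration for record 20. The abstract represents genes as additive combinations of semantic features obtained with non-negative matrix factorization (NMF), which approximates a non-negative matrix V by a product W H of two non-negative factors. The abstract does not say which NMF algorithm is used. The sketch below assumes the classic multiplicative update rules for the squared-error objective and uses plain Python lists so that it runs without third-party libraries. Function names, the iteration count and the small constant `eps` are illustrative choices.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix v (n x m) as w (n x k) times h (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update h:  h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update w:  w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the residual v - w h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, approx)
               for a, b in zip(ra, rb)) ** 0.5
```

If the rows of V were genes and the columns were term weights from the literature, each row of H could be read as a semantic feature and each row of W as the non-negative weights with which a gene combines those features. Because the updates only multiply by non-negative ratios, both factors stay non-negative, which is what makes the additive interpretation possible.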