ABSTRACT
The Rat Genome Database (RGD, http://rgd.mcw.edu) was developed to provide a core resource for rat researchers combining genetic, genomic, pathway, phenotype and strain information with a focus on disease. RGD users are provided with access to structured and curated data from the molecular level through to the level of the whole organism, including the variations associated with disease phenotypes. To fully support use of the rat as a translational model for biological systems and human disease, RGD continues to curate these datasets while enhancing and developing tools to allow efficient and effective access to the data in a variety of formats including linear genome viewers, pathway diagrams and biological ontologies. To support pathophysiological analysis of data, RGD Disease Portals provide an entryway to integrated gene, QTL and strain data specific to a particular disease. In addition to tool and content development and maintenance, RGD promotes rat research and provides user education by creating and disseminating tutorials on the curated datasets, submission processes, and tools available at RGD. By curating, storing, integrating, visualizing and promoting rat data, RGD ensures that the investment made into rat genomics and genetics can be leveraged by all interested investigators.
Subjects
Databases, Genetic; Genomics; Rats/genetics; Animals; Disease/genetics; Disease Models, Animal; Genetic Variation; Genome; Phenotype; Rats/metabolism; Rats/physiology; Signal Transduction; Software; Terminology as Topic
ABSTRACT
The Rat Genome Database (RGD, http://rgd.mcw.edu) is one of the core resources for rat genomics and recent developments have focused on providing support for disease-based research using the rat model. Recognizing the importance of the rat as a disease model we have employed targeted curation strategies to curate genes, QTL and strain data for neurological and cardiovascular disease areas. This work has centered on rat but also includes data for mouse and human to create 'disease portals' that provide a unified view of the genes, QTL and strain models for these diseases across the three species. The disease curation efforts combined with normal curation activities have served to greatly increase the content of the database, particularly for biological information, including gene ontology, disease, pathway and phenotype ontology annotations. In addition to improving the features and database content, community outreach has been expanded to demonstrate how investigators can leverage the resources at RGD to facilitate their research and to elicit suggestions and needs for future developments. We have published a number of papers that provide additional information on the ontology annotations and the tools at RGD for data mining and analysis to better enable researchers to fully utilize the database.
Subjects
Databases, Genetic; Disease Models, Animal; Genomics; Rats/genetics; Animals; Cardiovascular Diseases/genetics; Chromosome Mapping; Humans; Internet; Mice; Nervous System Diseases/genetics; Quantitative Trait Loci; User-Computer Interface
ABSTRACT
One of the core activities of high-throughput proteomics is the identification of peptides from mass spectra. Some peptides can be identified using spectral matching programs like Sequest or Mascot, but many spectra do not produce high quality database matches. De novo peptide sequencing is an approach to determine partial peptide sequences for some of the unidentified spectra. A drawback of de novo peptide sequencing is that it produces a series of ordered and disordered sequence tags and mass tags rather than a complete, non-degenerate peptide amino acid sequence. This incomplete data is difficult to use in conventional search programs such as BLAST or FASTA. DeNovoID is a program that has been specifically designed to use degenerate amino acid sequence and mass data derived from MS experiments to search a peptide database. Since the algorithm employed depends on the amino acid composition of the peptide and not its sequence, DeNovoID does not have to consider all possible sequences, but rather a smaller number of compositions consistent with a spectrum. DeNovoID also uses a geometric indexing scheme that reduces the number of calculations required to determine the best peptide match in the database. DeNovoID is available at http://proteomics.mcw.edu/denovoid.
Subjects
Mass Spectrometry; Peptides/analysis; Proteomics/methods; Sequence Analysis, Protein/methods; Software; Algorithms; Amino Acids/analysis; Databases, Protein; Internet; Peptides/chemistry; User-Computer Interface
ABSTRACT
ProMoST is a flexible web tool that calculates the effect of single or multiple posttranslational modifications (PTMs) on protein isoelectric point (pI) and molecular weight and displays the calculated patterns as two-dimensional (2D) gel images. PTMs of proteins control many biological regulatory and signaling mechanisms and 2D gel electrophoresis is able to resolve many PTM-induced isoforms, such as those due to phosphorylation, acetylation, deamination, alkylation, cysteine oxidation or tyrosine nitration. These modifications cause changes in the pI of the protein by adding, removing or changing titratable groups. Proteins differ widely in buffering capacity and pI and therefore the same PTMs may give rise to quite different patterns of pI shifts in different proteins. It is impossible by visual inspection of a pattern of spots on a gel to determine which modifications are most likely to be present. The patterns of PTM shifts for different proteins can be calculated and are often quite distinctive. The theoretical gel images produced by ProMoST can be compared to the experimental 2D gel results to implicate probable PTMs and focus efforts on more detailed study of modified proteins. ProMoST has been implemented as a CGI script in Perl and is available on a WWW server at http://proteomics.mcw.edu/promost.
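The pI shift described above can be illustrated with a minimal sketch. It is not ProMoST's code: the pKa values are illustrative assumptions (published pKa tables differ), the function names are hypothetical, and phosphorylation is assumed to occur on Ser or Thr so that each phosphate simply adds two acidic titratable groups. The net charge comes from the Henderson-Hasselbalch equation, and the pI is located by bisection because the net charge falls monotonically as pH rises. Molecular weight is not modeled here.

```python
# Illustrative pKa values (assumptions; not taken from ProMoST).
PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PHOSPHATE_PKAS = (1.2, 6.5)  # assumed pKa1 and pKa2 of a phosphate monoester


def net_charge(sequence, ph, n_phospho=0):
    """Net charge of a protein at a given pH (Henderson-Hasselbalch)."""
    positive = ["N_term"] + [aa for aa in sequence if aa in ("K", "R", "H")]
    negative = ["C_term"] + [aa for aa in sequence if aa in ("D", "E", "C", "Y")]
    charge = 0.0
    for group in positive:
        # Fraction of the basic group that is protonated (+1).
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[group]))
    for group in negative:
        # Fraction of the acidic group that is deprotonated (-1).
        charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[group] - ph))
    for pka in PHOSPHATE_PKAS:
        # Each phosphate on Ser/Thr contributes two acidic groups.
        charge -= n_phospho / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence, n_phospho=0, tol=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid, n_phospho) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged (or neutral): pI lies lower
    return (low + high) / 2.0
```

Calling `isoelectric_point(seq, n_phospho=k)` for k = 0, 1, 2, ... gives the horizontal positions of a train of phospho-isoforms; how far each spot moves depends on the buffering capacity of the protein near its pI, which is why the same PTM produces different patterns in different proteins.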
Subjects
Electrophoresis, Gel, Two-Dimensional; Protein Processing, Post-Translational; Software; Algorithms; Internet; Isoelectric Point; Molecular Weight; Proteins/chemistry; User-Computer Interface
ABSTRACT
The broad goal of physiological genomics research is to link genes to their functions using appropriate experimental and computational techniques. Modern genomics experiments enable the generation of vast quantities of data, and interpretation of this data requires the integration of information derived from many diverse sources. Computational biology and bioinformatics offer the ability to manage and channel this information torrent. The Rat Genome Database (RGD; http://rgd.mcw.edu) has developed computational tools and strategies specifically supporting the goal of linking genes to their functional roles in rat and, using comparative genomics, to human and mouse. We present an overview of the database with a focus on these unique computational tools and describe strategies for the use of these resources in the area of physiological genomics.
Subjects
Databases, Genetic; Genome/genetics; Genomics/methods; Rats/genetics; Rats/physiology; Animals; Cloning, Molecular; Gene Expression Profiling
ABSTRACT
Stable isotope labeling with (18)O is a promising technique for obtaining both qualitative and quantitative information from a single differential protein expression experiment. The small 4 Da mass shift produced by incorporation of two molecules of (18)O, and the lack of available methods for automated quantification of large data sets, have limited the use of this approach with electrospray ionization-ion trap (ESI-IT) mass spectrometers. In this paper, we describe a method of acquiring ESI-IT mass spectrometric data that provides accurate calculation of relative ratios of peptides that have been differentially labeled using (18)O. The method utilizes zoom scans to provide high resolution data. This allows for accurate calculation of (18)O/(16)O ratios for peptides even when as much as 50% of a (18)O labeled peptide is present as the singly labeled species. The use of zoom scan data also provides sufficient resolution for calculating accurate ratios for peptides of +3 and lower charge states. Sequence coverage is comparable to that obtained with data acquisition modes that use only MS and MS/MS scans. We have employed a newly developed analysis software tool, ZoomQuant, which allows for the automated analysis of large data sets. We show that the combination of zoom scan data acquisition and analysis using ZoomQuant provides calculation of isotopic ratios accurate to approximately 21%. This compares well with data produced from (18)O labeling experiments using time of flight (TOF) and Fourier transform-ion cyclotron resonance (FT-ICR) MS instruments.
Subjects
Mass Spectrometry/methods; Software; Amino Acid Sequence; Animals; Horses; Humans; Isotope Labeling; Molecular Sequence Data; Myoglobin/analysis; Oxygen Isotopes; Protein Tyrosine Phosphatase, Non-Receptor Type 1; Protein Tyrosine Phosphatases/analysis; Rabbits; Rats; Reproducibility of Results; Vascular Endothelial Growth Factor A/analysis
ABSTRACT
The main goal of comparative proteomics is the quantitation of the differences in abundance of many proteins between two different biological samples in a single experiment. By differentially labeling the peptides from the two samples and combining them in a single analysis, relative ratios of protein abundance can be accurately determined. Protease catalyzed (18)O exchange is a simple method to differentially label peptides, but the lack of robust software tools to analyze the data from mass spectra of (18)O labeled peptides generated by common ion trap mass spectrometers has been a limitation. ZoomQuant is a stand-alone computational tool that analyzes the mass spectra of (18)O labeled peptides from ion trap instruments and determines relative abundance ratios between two samples. Starting with a filtered list of candidate peptides that have been successfully identified by Sequest, ZoomQuant analyzes the isotopic forms of the peptides using high-resolution zoom scan spectrum data. The theoretical isotope distribution is determined from the peptide sequence and is used to deconvolute the peak areas associated with the unlabeled, partially labeled, and fully labeled species. The ratio between the labeled and unlabeled peptides is then calculated using several different methods. ZoomQuant's graphical user interface allows the user to view and adjust the parameters for peak calling and quantitation and select which peptides should contribute to the overall abundance ratio calculation. Finally, ZoomQuant generates a summary report of the relative abundance of the peptides identified in the two samples.
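The deconvolution step described above can be sketched as follows. This is an assumed, simplified scheme rather than ZoomQuant's actual algorithm: it uses only the peak areas at M, M+2 and M+4, treats the observed peaks as a sum of three species (unlabeled, singly labeled, doubly labeled) that share the same theoretical isotope pattern, and solves for the species by forward substitution. The function names and the single ratio definition are hypothetical; the abstract notes that ZoomQuant offers several ratio methods.

```python
def deconvolve_18o(observed, theoretical):
    """Separate unlabeled, singly and doubly (18)O-labeled contributions.

    observed:    (I0, I2, I4) peak areas at M, M+2 and M+4.
    theoretical: (m0, m2, m4) relative natural isotope abundances of the
                 unlabeled peptide at the same offsets (from its sequence).
    Returns (unlabeled, single, double) amounts in arbitrary units.
    """
    i0, i2, i4 = observed
    m0, m2, m4 = theoretical
    # The M peak contains only the unlabeled species.
    unlabeled = i0 / m0
    # M+2 = natural M+2 isotope of the unlabeled species + singly labeled M.
    single = max((i2 - unlabeled * m2) / m0, 0.0)
    # M+4 = unlabeled M+4 + singly labeled M+2 + doubly labeled M.
    double = max((i4 - unlabeled * m4 - single * m2) / m0, 0.0)
    return unlabeled, single, double


def labeled_to_unlabeled_ratio(observed, theoretical):
    """Ratio of all (18)O-labeled peptide to unlabeled peptide."""
    unlabeled, single, double = deconvolve_18o(observed, theoretical)
    if unlabeled <= 0:
        raise ValueError("no unlabeled signal; ratio is undefined")
    return (single + double) / unlabeled
```

Counting the singly labeled species toward the labeled sample is what keeps the ratio usable when label incorporation is incomplete; negative intermediate values, which can arise from noise, are clamped to zero in this sketch.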
Subjects
Mass Spectrometry/methods; Peptides/analysis; Software; Animals; Horses; Isotope Labeling; Myoglobin/analysis; Oxygen Isotopes; Proteomics; Trypsin
ABSTRACT
The set of interacting molecules collectively referred to as a pathway or network represents a fundamental structural unit, the building block of the larger, highly integrated networks of biological systems. The scientific community's interest in understanding the fine details of how pathways work, communicate with each other and synergize, and how alterations in one or several pathways may converge into a disease phenotype, places heightened demands on pathway data and information providers. To meet such demands, the Rat Genome Database (RGD, http://rgd.mcw.edu) has adopted a multitiered approach to pathway data acquisition and presentation. Resources and tools are continuously added or expanded to offer more comprehensive pathway data sets as well as enhanced pathway data manipulation, exploration and visualization capabilities. At RGD, users can easily identify genes in pathways, see how pathways relate to each other and visualize pathways in a dynamic and integrated manner. They can access these and other components from several entry points, effortlessly navigate between them, and download the data of interest. The Pathway Portal resources at RGD are presented, and future directions are discussed. Database URL: http://rgd.mcw.edu.
Subjects
Databases, Genetic; Genome/genetics; Internet; Signal Transduction/genetics; Animals; Gene Regulatory Networks/genetics; Humans; Male; Molecular Sequence Annotation; Prostatic Neoplasms/genetics; Rats
ABSTRACT
The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data and currently houses over 40,000 rat gene records as well as human and mouse orthologs, 1771 rat and 1911 human quantitative trait loci (QTLs) and 2209 rat strains. Biological information curated for these data objects includes disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components. A suite of tools has been developed to aid curators in acquiring and validating data objects, assigning nomenclature, attaching biological information to objects and making connections among data types. The software used to assign nomenclature, to create and edit objects and to make annotations to the data objects has been specifically designed to make the curation process as fast and efficient as possible. The user interfaces have been adapted to the work routines of the curators, creating a suite of tools that is intuitive and powerful. Database URL: http://rgd.mcw.edu.
Subjects
Databases, Genetic; Genome/genetics; Molecular Sequence Annotation/methods; Rats/genetics; Software; Animals; Computational Biology; Humans; Mice; Quantitative Trait Loci/genetics
ABSTRACT
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of currently available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site (http://proteomics.mcw.edu/vipdac).
Subjects
Algorithms; Proteomics/methods; Software; Cluster Analysis; Databases, Protein; Internet
ABSTRACT
It has been four years since the original publication of the draft sequence of the rat genome. Five groups are now working together to assemble, annotate and release an updated version of the rat genome. Because the rat is the prevailing model for physiology, complex disease and pharmacological studies, there is an acute need for its genomic resources to keep pace with its prominence in the laboratory. In this commentary, we describe the current status of the rat genome sequence and the plans for its impending 'upgrade'. We then cover the key online resources providing access to the rat genome, including the new SNP views at Ensembl, the RefSeq and Genes databases at the US National Center for Biotechnology Information, Genome Browser at the University of California Santa Cruz and the disease portals for cardiovascular disease and obesity at the Rat Genome Database.
Subjects
Databases, Genetic; Genome; Rats/genetics; Animals; Computational Biology; Disease Models, Animal; Genetic Diseases, Inborn/genetics; Genetic Variation; Genomics; Haplotypes; Humans; Internet; Polymorphism, Single Nucleotide; Rats, Mutant Strains; Sequence Analysis, DNA
ABSTRACT
The rat is an important system for modeling human disease. Four years ago, the rich 150-year history of rat research was transformed by the sequencing of the rat genome, ushering in an era of exceptional opportunity for identifying genes and pathways underlying disease phenotypes. Genome-wide association studies in human populations have recently provided a direct approach for finding robust genetic associations in common diseases, but identifying the precise genes and their mechanisms of action remains problematic. In the context of significant progress in rat genomic resources over the past decade, we outline achievements in rat gene discovery to date, show how these findings have been translated to human disease, and document an increasing pace of discovery of new disease genes, pathways and mechanisms. Finally, we present a set of principles that justify continuing and strengthening genetic studies in the rat model, and further development of genomic infrastructure for rat research.
Subjects
Disease Models, Animal; Genetic Diseases, Inborn/genetics; Genome; Genomics/trends; Rats/genetics; Animals; Animals, Genetically Modified; Chromosome Mapping; Gene Targeting; Humans
ABSTRACT
The Center for Eukaryotic Structural Genomics (CESG) produces and solves the structures of proteins from eukaryotes. We have developed and operate a pipeline both to solve structures and to test new methodologies. Both NMR and X-ray crystallography methods are used for structure solution. CESG chooses targets based on sequence dissimilarity to known structures, medical relevance, and nominations from members of the scientific community. Proteins often qualify in more than one of these categories. Here we review some of the structures that have connections to human health and disease.
Subjects
Genomics; Proteins/chemistry; Crystallography, X-Ray/trends; Genomics/methods; Genomics/trends; Humans; Nuclear Magnetic Resonance, Biomolecular
ABSTRACT
The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have direct relevance to human-based research. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model-organism database that provides access to a wide variety of curated rat data such as genes and their homologs, quantitative trait loci, phenotypes, comparative mapping, and genome analysis. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. We show how to make associations with the genome and use comparative tools to link the rat with human and mouse in order to integrate results from these three species of critical biomedical importance.
Subjects
Chromosome Mapping/methods; Database Management Systems; Databases, Genetic; Information Storage and Retrieval/methods; Phenotype; User-Computer Interface; Animals; Computer Graphics; Rats
ABSTRACT
Microsatellite length polymorphisms are useful for the mapping of heritable traits in rats. Over 4000 such microsatellites have been characterized for 48 inbred rat strains and used successfully to map phenotypes that differ between strains. At present, however, it is difficult to use this microsatellite database for mapping phenotypes in selectively bred rats of unknown genotype derived from outbred populations because it is not immediately obvious which markers might differ between strains and be informative. We predicted that markers represented by many alleles among the known inbred rat strains would also be most likely to differ between selectively bred strains derived from outbred populations. Here we describe the development and successful application of a new genotyping tool (HUMMER) that assigns "heterozygosity" (Het) and "uncertainty" (Unc) scores to each microsatellite marker that corresponds to its degree of heterozygosity among the 48 genotyped inbred strains. We tested the efficiency of HUMMER on two rat strains that were selectively bred from an outbred Sprague-Dawley stock for either high or low activity in the forced swim test (SwHi rats and SwLo rats, respectively). We found that the markers with high Het and Unc scores allowed the efficient selection of markers that differed between SwHi and SwLo rats, while markers with low Het and Unc scores typically identified markers that did not differ between strains. Thus, picking markers based on Het and Unc scores is a valuable method for identifying informative microsatellite markers in selectively bred rodent strains derived from outbred populations.
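The idea of ranking markers by allelic diversity among the genotyped inbred strains can be sketched as below. The abstract does not give HUMMER's Het or Unc formulas, so this sketch substitutes the standard expected-heterozygosity (gene diversity) statistic, 1 - sum(p_i^2), as an assumed stand-in for the Het score and does not model Unc at all. The function names, marker names and data layout are hypothetical.

```python
from collections import Counter


def het_score(alleles):
    """Expected heterozygosity, 1 - sum(p_i^2), over the known alleles.

    alleles: allele sizes for one marker, one entry per inbred strain;
             None marks a strain with no genotype for this marker.
    Stand-in statistic only; HUMMER's own scoring is not reproduced here.
    """
    known = [a for a in alleles if a is not None]
    if not known:
        return 0.0
    n = len(known)
    counts = Counter(known)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def rank_markers(genotypes):
    """Marker names sorted from most to least diverse.

    genotypes: dict mapping marker name -> list of allele sizes per strain.
    """
    return sorted(genotypes, key=lambda m: het_score(genotypes[m]), reverse=True)
```

A marker that carries the same allele in every genotyped strain scores 0 and is unlikely to be informative, while a marker with many equally frequent alleles approaches 1; under the prediction stated in the abstract, the latter would be tried first in selectively bred lines of unknown genotype.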
Subjects
Genetic Carrier Screening/methods; Genetic Markers; Microsatellite Repeats; Polymorphism, Genetic; Rats, Sprague-Dawley/genetics; Animals; Chromosome Mapping; Crosses, Genetic; Female; Genomic Library; Genotype; Male; Mice; Motor Activity/genetics; Rats; Swimming
ABSTRACT
We describe the theoretical basis for a peptide identification method wherein peptides are represented as vectors based on their amino acid composition and grouped into clusters. Unknown peptides are identified by finding the database cluster and peptide entries with the shortest Euclidean distance. We demonstrate that the amino acid composition of peptides is virtually as informative as the sequence and allows rapid peptide identification more accurately than peptide mass alone.
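The composition-vector representation can be illustrated with a minimal sketch. It assumes a 20-dimensional count vector over the standard amino acids and a plain linear scan of the database; the clustering step that the method uses to narrow the search is omitted, and all function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_vector(peptide):
    """Count of each standard amino acid in the peptide, in a fixed order."""
    return [peptide.count(aa) for aa in AMINO_ACIDS]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_peptide(query_composition, database):
    """Database peptide whose composition vector is nearest to the query.

    Linear scan for clarity; the clustering described in the abstract,
    which would limit the comparison to one cluster, is not implemented.
    """
    return min(
        database,
        key=lambda p: euclidean_distance(query_composition, composition_vector(p)),
    )
```

Because the vector ignores residue order, a query built from a scrambled or partially ordered de novo result (for example, the composition of "KEDITPEP") still lands at distance zero from the database entry "PEPTIDEK".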
Subjects
Peptides/analysis; Proteomics/methods; Algorithms; Animals; Computational Biology/methods; Humans; Peptides/chemistry; Proteins/chemistry; Rats; Sequence Analysis, Protein
ABSTRACT
Integration of the large variety of genome maps from several organisms provides the mechanism by which physiological knowledge obtained in model systems such as the rat can be projected onto the human genome to further the research on human disease. The release of the rat genome sequence provides new information for studies using the rat model and is a key reference against which existing and new rat physiological results can be aligned. Previously, we described comparative maps of the rat, mouse, and human based on EST sequence comparisons combined with radiation hybrid maps. Here, we use new data and introduce the Integrated Genomics Environment, an extensive database of curated and integrated maps, markers, and physiological results. These results are integrated by using VCMapview, a Java-based map integration and visualization tool. This unique environment allows researchers to relate results from cytogenetic, genetic, and radiation hybrid studies to the genome sequence and compare regions of interest between human, mouse, and rat. Integrating rat physiology with mouse genetics and clinical results from human by using the respective genomes provides a novel route to capitalize on comparative genomics and the strengths of model organism biology.