ABSTRACT
BACKGROUND: Peach (Prunus persica (L.) Batsch) is a major temperate fruit crop with intense breeding activity. Breeding is facilitated by knowledge of the inheritance of key traits, which are often quantitative. QTLs have traditionally been studied using the phenotype of a single progeny (usually a full-sib progeny) and its correlation with a set of markers covering the genome. This approach has allowed the identification of various genes and QTLs but is limited by the small number of individuals used and by the narrow transect of the variability analyzed. In this article we propose a multi-progeny mapping strategy that uses pedigree information and Bayesian approaches and supports a more precise and complete survey of the available genetic variability. RESULTS: Seven key agronomic characters (data from 1 to 3 years) were analyzed in 18 progenies from crosses between occidental commercial genotypes and various exotic lines, including accessions of other Prunus species. A total of 1467 plants from these progenies were genotyped with a 9K SNP array. Forty-seven QTLs were identified: 22 coincided with major genes and QTLs that have been consistently found in the same populations when studied individually, and 25 were new. A substantial part of the QTLs observed (47%) would not have been detected in crosses between only commercial materials, showing the high value of exotic lines as a source of novel alleles for the commercial gene pool. Our strategy also provided estimates of the narrow-sense heritability of each character, of the QTL genotypes of each parent for the different QTLs, and of their breeding values. CONCLUSIONS: The integrated strategy provides a broader and more accurate picture of the variability available for peach breeding, with the identification of many new QTLs, information on the sources of the alleles of interest and the breeding values of the potential donors of such valuable alleles. These results are first-hand information for breeders and a step forward towards the implementation of DNA-informed strategies to facilitate selection of new cultivars with improved productivity and quality.
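The narrow-sense heritability mentioned in the results is, by its standard definition, the fraction of phenotypic variance attributable to additive genetic effects (h² = V_A / V_P). The sketch below only illustrates that ratio; the function name and the variance values are hypothetical and are not taken from the study, which estimated its quantities with a Bayesian pedigree-based analysis.

```python
def narrow_sense_heritability(additive_var: float, phenotypic_var: float) -> float:
    """Return h^2 = V_A / V_P, the share of phenotypic variance due to additive effects."""
    if phenotypic_var <= 0:
        raise ValueError("phenotypic variance must be positive")
    if not 0 <= additive_var <= phenotypic_var:
        raise ValueError("additive variance must lie between 0 and the phenotypic variance")
    return additive_var / phenotypic_var


# Hypothetical variance components for one trait (illustrative values only).
h2 = narrow_sense_heritability(additive_var=2.0, phenotypic_var=8.0)
print(h2)
```

A value close to 1 would suggest that selection on the phenotype alone is already effective, whereas a low value is where marker information is expected to help most.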
Subjects
Breeding , Prunus persica/genetics , Quantitative Trait Loci/genetics , Flowers/growth & development , Fruit/growth & development , Genotype , Polymorphism, Single Nucleotide , Probability , Prunus persica/growth & development , Solubility
ABSTRACT
The distribution of the N-glycoproteome in integral membrane proteins of the vacuolar membrane (tonoplast) or the plasma membrane of Arabidopsis thaliana and, for further comparison, of the Rattus norvegicus lysosomal and plasma membranes, was analyzed. In silico analysis showed that potential N-glycosylation sites are much less frequent in tonoplast proteins. Biochemical analysis of Arabidopsis subcellular fractions with the lectin concanavalin A, which recognizes mainly unmodified N-glycans, or with antiserum against Golgi-modified N-glycans confirmed the in silico results and showed that, unlike the plant plasma membrane, the tonoplast is almost or totally devoid of N-glycoproteins with Golgi-modified glycans. Lysosomes share with vacuoles the hydrolytic functions and the position along the secretory pathway; however, our results indicate that their membranes had a divergent evolution. We propose that protection against the luminal hydrolases that are abundant in inner hydrolytic compartments, which seems to have been achieved in many lysosomal membrane proteins by extensive N-glycosylation of the luminal domains, has instead been obtained in the vast majority of tonoplast proteins by limiting the length of such domains.
Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Glycoproteins/metabolism , Intracellular Membranes/metabolism , Lysosomes/metabolism , Polysaccharides/metabolism , Vacuoles/metabolism , Animals , Arabidopsis Proteins/chemistry , Cell Membrane/metabolism , Computer Simulation , Endoplasmic Reticulum/metabolism , Glycoproteins/chemistry , Glycosylation , Membrane Proteins/metabolism , Microsomes/metabolism , Oligosaccharides/metabolism , Peptides/metabolism , Proteome/metabolism , Rats
ABSTRACT
BACKGROUND: In recent years, the use of genomic information in livestock species for genetic improvement, association studies and many other fields has become routine. To accommodate different market requirements in terms of genotyping cost, manufacturers of single nucleotide polymorphism (SNP) arrays, private companies and international consortia have developed a large number of arrays with different content and different SNP density. The number of currently available SNP arrays differs among species, ranging from one for goats to more than ten for cattle, and the number of arrays available is increasing rapidly. However, there has been little or no effort to standardize and integrate array-specific (e.g. SNP IDs, allele coding) and species-specific (i.e. past and current assemblies) SNP information. RESULTS: Here we present SNPchiMp v.3, a solution to these issues for the six major livestock species (cow, pig, horse, sheep, goat and chicken). Original data were collected directly from SNP array producers and specific international genome consortia, and stored in a MySQL database. The database was then linked to an open-access web tool and to public databases. SNPchiMp v.3 ensures fast access to the database (retrieving within/across SNP array data) and the possibility of annotating SNP array data in a user-friendly fashion. CONCLUSIONS: This platform allows easy integration and standardization, and it is aimed at both industry and research. It also enables users to easily link the information available from the array producer with data in public databases, without the need for additional bioinformatics tools or pipelines. In recognition of the open-access use of Ensembl resources, SNPchiMp v.3 was officially credited as an Ensembl E!mpowered tool. Availability at http://bioinformatics.tecnoparco.org/SNPchimp.
Subjects
Databases, Genetic , Polymorphism, Single Nucleotide , Animals , Cattle , Computational Biology , Genome , Goats/genetics , Internet , Species Specificity , User-Computer Interface
ABSTRACT
BACKGROUND: Porcine reproductive and respiratory syndrome (PRRS) is one of the most significant swine diseases worldwide. Despite its relevance, serum biomarkers associated with early-onset viral infection, when clinical signs are not detectable and the disease is characterized by a weak anti-viral response and persistent infection, have not yet been identified. Surface-enhanced laser desorption ionization time of flight mass spectrometry (SELDI-TOF MS) is a reproducible, accurate, and simple method for the identification of biomarker proteins related to disease in serum. This work describes the SELDI-TOF MS analyses of sera from 60 PRRSV-positive and 60 PRRSV-negative (as determined by PCR) asymptomatic Large White piglets at weaning. Sera with comparable and low hemoglobin content (< 4.52 µg/mL) were separated into 6 fractions by anion-exchange chromatography, and protein profiles in the mass range 1-200 kDa were obtained with the CM10, IMAC30, and H50 surfaces. RESULTS: A total of 200 significant peaks (p < 0.05) were identified in the initial discovery phase of the study and 47 of them were confirmed in the validation phase. The majority of peaks (42) were up-regulated in PRRSV-positive piglets, while 5 were down-regulated. A panel of 14 discriminatory peaks identified in fraction 1 (pH = 9), on the CM10 surface, and acquired at low focus mass provided a diagnostic serum protein profile that discriminated between PRRSV-positive and PRRSV-negative piglets with a sensitivity and specificity of 77% and 73%, respectively. CONCLUSIONS: SELDI-TOF MS profiling of sera from PRRSV-positive and PRRSV-negative asymptomatic piglets provided a proteomic signature with large-scale diagnostic potential for early identification of PRRSV infection in weaning piglets. Furthermore, SELDI-TOF protein markers represent a refined phenotype of PRRSV infection that might be useful for whole genome association studies.
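For readers less familiar with the two diagnostic measures reported above, the following minimal sketch shows how sensitivity and specificity are derived from a confusion matrix. The counts are hypothetical and chosen only for illustration; the abstract does not report the underlying confusion matrix, and the function name is an assumption.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of infected animals correctly flagged.
    specificity = TN / (TN + FP): share of uninfected animals correctly cleared.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes need at least one sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts (not the study's data): 50 positive and 50 negative sera.
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=35, fp=15)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```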
ABSTRACT
BACKGROUND: The NCBI dbEST currently contains more than eight million human Expressed Sequence Tags (ESTs). This wide collection represents an important source of information for gene expression studies, provided it can be inspected according to biologically relevant criteria. EST data can be browsed using different dedicated web resources, which allow users to investigate library-specific gene expression levels and to compare libraries, highlighting significant differences in gene expression. Nonetheless, no tool is available to examine distributions of quantitative EST collections in Gene Ontology (GO) categories, nor to retrieve information concerning library-dependent EST involvement in metabolic pathways. In this work we present the Human EST Ontology Explorer (HEOE; http://www.itb.cnr.it/ptp/human_est_explorer), a web facility for comparison of expression levels among libraries from several healthy and diseased tissues. RESULTS: The HEOE provides library-dependent statistics on the distribution of sequences in the GO Directed Acyclic Graph (DAG) that can be browsed at each GO hierarchical level. The tool is based on large-scale BLAST annotation of EST sequences. Due to the huge number of input sequences, this BLAST analysis was performed with the aid of grid computing technology, which is particularly suitable for data-parallel tasks. Relying on the resulting annotation, library-specific distributions of ESTs in the GO graph were inferred. A pathway-based search interface was also implemented, for a quick evaluation of the representation of libraries in metabolic pathways. EST processing steps were integrated in a semi-automatic procedure that relies on Perl scripts and stores results in a MySQL database. A PHP-based web interface makes it possible to simultaneously visualize, retrieve and compare data from the different libraries. Statistically significant differences in GO categories among user-selected libraries can also be computed. CONCLUSION: The HEOE provides an alternative and complementary way to inspect EST expression levels with respect to approaches currently offered by other resources. Furthermore, BLAST computation on the whole human EST dataset was a suitable test of grid scalability in the context of large-scale bioinformatics analysis. The HEOE currently comprises sequence analysis from 70 non-normalized libraries, representing a comprehensive overview of healthy and unhealthy tissues. As the analysis procedure can be easily applied to other libraries, the number of represented tissues is intended to increase.
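The library-dependent distribution of sequences over the GO graph described above can be pictured with a small sketch. It assumes a common convention, not confirmed by the abstract, that a sequence annotated to a term also counts for every ancestor of that term, once per term. The term names, sequence identifiers and function names are hypothetical placeholders, not real GO identifiers or HEOE code (the tool itself relies on Perl scripts and MySQL).

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def term_counts(annotations, parents):
    """Count sequences per term; each sequence counts once for every term it reaches."""
    counts = defaultdict(int)
    for terms in annotations.values():
        reached = set()
        for term in terms:
            reached |= ancestors(term, parents)
        for term in reached:
            counts[term] += 1
    return dict(counts)


# Toy DAG (child -> parents) with hypothetical term names; one term has two parents.
PARENTS = {
    "metabolism": ["process"],
    "signaling": ["process"],
    "lipid_metabolism": ["metabolism"],
    "lipid_signaling": ["metabolism", "signaling"],
}
# Hypothetical annotations for one library: sequence id -> directly assigned terms.
ANNOTATIONS = {
    "est1": ["lipid_metabolism"],
    "est2": ["lipid_signaling"],
    "est3": ["signaling"],
}

counts = term_counts(ANNOTATIONS, PARENTS)
print(counts["process"], counts["metabolism"], counts["signaling"])
```

Collecting the reached terms in a set before counting keeps a sequence from being counted twice for an ancestor that is reachable along two paths, which is the main difference between a DAG and a tree here.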
Subjects
Computational Biology/methods , Expressed Sequence Tags , Databases, Genetic , Gene Library , Humans , Oligonucleotide Array Sequence Analysis , User-Computer Interface
ABSTRACT
BACKGROUND: With the rapid growth in the availability of genome sequence data, the automated identification of orthologous genes between species (orthologs) is of fundamental importance to facilitate functional annotation and studies on comparative and evolutionary genomics. Genes with no apparent orthologs between the bovine and human genomes may be responsible for major differences between the species; however, such genes are often neglected in functional genomics studies. RESULTS: A BLAST-based method was used to explore the current annotation and orthology predictions in Ensembl. Genes with no orthologs between the two genomes were classified into groups based on alignments, ontology, manual curation and publicly available information. Starting from a high-quality and specific set of orthology predictions, as provided by Ensembl, hidden relationships between genes and genomes of different mammalian species were unveiled using a highly sensitive approach based on sequence similarity and genomic comparison. CONCLUSIONS: The analysis identified 3,801 bovine genes with no orthologs in human and 1,010 human genes with no orthologs in cow, among which 411 and 43 genes, respectively, had no match at all in the other species. Most of the apparently non-orthologous genes may have orthologs that were missed in the annotation process, despite a high percentage of identity, because of differences in gene length and structure. The comparative analysis reported here identified gene variants, new genes and species-specific features and gave an overview of the other side of orthology, which may help to improve the annotation of the bovine genome and the knowledge of structural differences between species.
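The two-level result reported above (genes without an ortholog, and among them genes with no match at all) amounts to simple set operations once orthology predictions and similarity hits are available. The sketch below is an illustration under that assumption only; the gene identifiers, the input structures and the function name are hypothetical and do not reproduce the authors' pipeline.

```python
def classify_unpaired(all_genes, ortholog_pairs, genes_with_hit):
    """Split genes lacking an ortholog by whether they have any similarity hit.

    Returns (similar_but_unpaired, no_match).
    """
    with_ortholog = {gene for gene, _ in ortholog_pairs}
    no_ortholog = set(all_genes) - with_ortholog
    hits = set(genes_with_hit)
    return no_ortholog & hits, no_ortholog - hits


# Hypothetical inputs: four genes, one predicted ortholog pair, two genes with a hit.
GENES = ["geneA", "geneB", "geneC", "geneD"]
ORTHOLOGS = [("geneA", "humanA")]
HITS = {"geneA", "geneB"}

similar, unmatched = classify_unpaired(GENES, ORTHOLOGS, HITS)
print(sorted(similar), sorted(unmatched))
```

Genes in the first group are the candidates for orthologs missed during annotation, while the second group corresponds to genes with no counterpart detected in the other species.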
Subjects
Cattle/genetics , Comparative Genomic Hybridization , Animals , Computational Biology , Dogs , Genome, Human , Genomic Library , Genomics/methods , Humans , Mice , Sequence Alignment , Sequence Analysis, DNA , Species Specificity
ABSTRACT
BACKGROUND: Two complete genome sequences are available for Vitis vinifera Pinot noir. Based on the sequence and gene predictions produced by the IASMA, we performed an in silico detection of putative microRNA genes and of their targets, and collected the most reliable microRNA predictions in a web database. The application is available at http://www.itb.cnr.it/ptp/grapemirna/. DESCRIPTION: The program FindMiRNA was used to detect putative microRNA genes in the grape genome. A very high number of predictions was retrieved, calling for validation. Nine parameters were calculated and, based on the grape microRNA dataset available at miRBase, thresholds were defined and applied to FindMiRNA predictions having targets in gene exons. In the resulting subset, predictions were ranked according to precursor positions and sequence similarity, and to target identity. To further validate FindMiRNA predictions, comparisons to the Arabidopsis genome, to the grape Genoscope genome, and to the grape EST collection were performed. Results were stored in a MySQL database and a web interface was prepared to query the database and retrieve predictions of interest. CONCLUSION: The GrapeMiRNA database encompasses 5,778 microRNA predictions spanning the whole grape genome. Predictions are integrated with information that can be of use in selection procedures. Tools added to the web interface also allow predictions to be inspected according to the gene ontology classes and metabolic pathways of their targets. The GrapeMiRNA database can help in selecting candidate microRNA genes to be validated.
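The filtering step described above, in which thresholds derived from known microRNAs are applied to new predictions, could look roughly like the sketch below. It assumes, purely for illustration, that each threshold is the minimum-to-maximum range observed in the reference set; the abstract does not state how the thresholds were defined, and the parameter names, values and function names are hypothetical.

```python
def thresholds_from_reference(reference):
    """Derive a (min, max) range per parameter from a reference set of records."""
    params = reference[0].keys()
    return {p: (min(r[p] for r in reference), max(r[p] for r in reference)) for p in params}


def passes(prediction, thresholds):
    """Keep a prediction only if every parameter lies inside its reference range."""
    return all(low <= prediction[p] <= high for p, (low, high) in thresholds.items())


# Hypothetical reference microRNAs described by two illustrative parameters.
REFERENCE = [
    {"precursor_length": 90, "gc_content": 0.40},
    {"precursor_length": 120, "gc_content": 0.50},
]
# Hypothetical predictions to be filtered.
PREDICTIONS = {
    "pred1": {"precursor_length": 100, "gc_content": 0.45},
    "pred2": {"precursor_length": 300, "gc_content": 0.45},
    "pred3": {"precursor_length": 95, "gc_content": 0.60},
}

limits = thresholds_from_reference(REFERENCE)
kept = [name for name, pred in PREDICTIONS.items() if passes(pred, limits)]
print(kept)
```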
Subjects
Databases, Nucleic Acid , MicroRNAs/genetics , RNA, Plant/genetics , Vitis/genetics , Computational Biology , Expressed Sequence Tags , Genome, Plant , Genomics/methods , Internet , Sequence Analysis, RNA/methods , User-Computer Interface
ABSTRACT
BACKGROUND: The Tissue MicroArray technique is becoming increasingly important in pathology for the validation of experimental data from transcriptomic analysis. This approach produces many images which need to be properly managed, if possible with an infrastructure able to support tissue sharing between institutes. Moreover, the available frameworks oriented to Tissue MicroArray provide good storage for clinical patient, sample treatment and block construction information, but their utility is limited by the lack of data integration with biomolecular information. RESULTS: In this work we propose a web-oriented Tissue MicroArray system that supports researchers in managing bio-samples and, through the use of ontologies, enables tissue sharing aimed at the design of Tissue MicroArray experiments and the evaluation of results. Our system provides an ontological description both for pre-analysis tissue images and for post-process analysis image results, which is crucial for information exchange. Moreover, because it works on well-defined terms, it is possible to query web resources for literature articles to integrate both pathology and bioinformatics data. CONCLUSIONS: Using this system, users associate an ontology-based description with each image uploaded into the database and also integrate results with the ontological description of biosequences identified in every tissue. Moreover, it is possible to integrate the ontological description provided by the user with a fully compliant gene ontology definition, enabling statistical studies of the correlation between the analyzed pathology and the most commonly related biological processes.
Subjects
Database Management Systems , Medical Records Systems, Computerized/organization & administration , Natural Language Processing , Terminology as Topic , Tissue Array Analysis/methods , Algorithms , Databases, Factual , Italy , Semantics , Software
ABSTRACT
BACKGROUND: The ESTree database (db) is a collection of Prunus persica and Prunus dulcis EST sequences that in its current version encompasses 75,404 sequences from 3 almond and 19 peach libraries. Nine peach genotypes and four peach tissues are represented, from four fruit developmental stages. The aim of this work was to extend the existing ESTree db by adding new sequences and analysis programs. Particular care was given to the implementation of the web interface, which allows querying each of the database features. RESULTS: A modular Perl pipeline is the backbone of sequence analysis in the ESTree db project. Outputs obtained during the pipeline steps are automatically arrayed into the fields of a MySQL database. Apart from standard clustering and annotation analyses, version VI of the ESTree db encompasses new tools for tandem repeat identification, annotation against genomic Rosaceae sequences, and positioning on the database of oligomer sequences that were used in a peach microarray study. Furthermore, known protein patterns and motifs were identified by comparison to PROSITE. Based on data retrieved from sequence annotation against the UniProtKB database, a script was prepared to track positions of homologous hits on the GO tree and build statistics on the distribution of ontologies in GO functional categories. EST mapping data were also integrated in the database. The PHP-based web interface was upgraded and extended. The aim of the authors was to enable querying the database according to all the biological aspects that can be investigated from the analysis of data available in the ESTree db. This is achieved by allowing multiple searches on logical subsets of sequences that represent different biological situations or features. CONCLUSIONS: Version VI of the ESTree db offers a broad overview of peach gene expression. The sequence analysis results contained in the database, extensively linked to external related resources, represent a large amount of information that can be queried via the tools offered in the web interface. The flexibility and modularity of the ESTree analysis pipeline and of the web interface allowed the authors to set up similar structures for different datasets, with limited manual intervention.
Subjects
Chromosome Mapping/methods , Databases, Genetic , Expressed Sequence Tags , Plant Proteins/genetics , Proteome/genetics , Prunus/genetics , Software , Transcription Factors/genetics , Base Sequence , Database Management Systems , Information Storage and Retrieval/methods , Molecular Sequence Data , User-Computer Interface
ABSTRACT
UNLABELLED: The GoSh database is a collection of 58 990 Capra hircus and Ovis aries expressed sequence tags. A Perl pipeline was prepared to process sequences, and data were collected in a MySQL database. A PHP-based web interface allows browsing and querying the database. Putative single nucleotide polymorphism (SNP) detection and a search for repeats were performed, and links to external related resources were provided. Sequences were annotated against three different databases and an algorithm was implemented to create statistics of the distribution of retrieved homologous ontologies in the Gene Ontology categories. The GoSh database is a repository of data and links related to goat and sheep expressed genes. AVAILABILITY: The GoSh database is available at http://www.itb.cnr.it/gosh/
Subjects
Databases, Genetic , Expressed Sequence Tags , Goats/genetics , Information Storage and Retrieval/methods , Internet , Sheep/genetics , User-Computer Interface , Algorithms , Animals , Computer Graphics , Database Management Systems , Sequence Analysis, DNA/methods
ABSTRACT
Online resources for bovine genome analysis are provided at the most important Web sites. Nonetheless, retrieval of single-nucleotide polymorphism (SNP)-related information is not always easy when searches must focus on complementary features. In this work, we present the Bovine SNP Retriever: a user-friendly tool for bovine SNP retrieval that also facilitates the retrieval of SNP-related information within user-selected quantitative trait loci regions and reverse electronic polymerase chain reaction analysis on the bovine genome. The Bovine SNP Retriever is available at http://www.itb.cnr.it/ptp/bovine_snp_retriever/.
Subjects
Cattle/genetics , Genome , Polymorphism, Single Nucleotide , Software , Animals , Databases, Nucleic Acid
ABSTRACT
Despite the availability of whole genome sequences of apple and peach, there has been a considerable gap between genomics and breeding. To bridge the gap, the European Union funded the FruitBreedomics project (March 2011 to August 2015) involving 28 research institutes and private companies. Three complementary approaches were pursued: (i) tool and software development, (ii) deciphering genetic control of main horticultural traits taking into account allelic diversity and (iii) developing plant materials, tools and methodologies for breeders. Decisive breakthroughs were made, including the release of ready-to-go DNA diagnostic tests for Marker Assisted Breeding, the development of new, dense SNP arrays in apple and peach, new phenotypic methods for some complex traits, and software for gene/QTL discovery on breeding germplasm via Pedigree Based Analysis (PBA). This resulted in the discovery of highly predictive molecular markers for traits of horticultural interest via PBA and via Genome Wide Association Studies (GWAS) on several European genebank collections. FruitBreedomics also developed pre-breeding plant materials in which multiple sources of resistance were pyramided, and software that can support breeders in their selection activities. Through FruitBreedomics, significant progress was made in the fields of apple and peach breeding, genetics, genomics and bioinformatics, from which breeders, germplasm curators and scientists will benefit. A major part of the data collected during the project has been stored in the FruitBreedomics database and has been made available to the public. This review covers the scientific discoveries made in this major endeavour, and perspectives for apple and peach breeding and genomics in Europe and beyond.
ABSTRACT
BACKGROUND: The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (ESTs). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated with in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a PHP-based web interface. RESULTS: Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungal genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including a tandem repeat search and a comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for the identification of protein patterns. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. CONCLUSION: The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the core structures of the web interface which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.
Subjects
Ascomycota/genetics , Chromosome Mapping/methods , DNA, Fungal/genetics , Databases, Genetic , Information Storage and Retrieval/methods , Sequence Analysis, DNA/methods , User-Computer Interface , Database Management Systems , Internet , Online Systems , Sequence Alignment/methods
ABSTRACT
BACKGROUND: Activated Protein C (ProC) is an anticoagulant plasma serine protease which also plays an important role in controlling inflammation and cell proliferation. Several mutations of the gene are associated with phenotypic functional deficiency of protein C, and with the risk of developing venous thrombosis. Structure prediction and computational analysis of the mutants have proven to be a valuable aid in understanding the molecular aspects of clinical thrombophilia. RESULTS: We have built a specialized relational database and a search tool for natural mutants of protein C. It contains 195 entries, comprising 182 missense and 13 stop mutations. A menu-driven search engine allows the user to retrieve the stored information for each variant, which includes genetic as well as structural data and a multiple alignment highlighting the substituted position. Molecular models of variants can be visualized with interactive tools; PDB coordinates of the models are also available for further analysis. Furthermore, an automatic modelling interface allows the user to generate multiple alignments and 3D models of new variants. CONCLUSION: ProCMD is an up-to-date interactive mutant database that integrates phenotypical descriptions with functional and structural data obtained by computational approaches. It will be useful in the research and clinical fields to help elucidate the chain of events leading from a molecular defect to the related disease. It is available to academic users at http://www.itb.cnr.it/procmd/.
Subjects
Database Management Systems , Databases, Protein , Imaging, Three-Dimensional/methods , Information Storage and Retrieval/methods , Internet , Protein C/chemistry , Protein C/genetics , User-Computer Interface , Amino Acid Sequence , Computer Graphics , Molecular Sequence Data , Mutation
ABSTRACT
Alzheimer's disease (AD) is considered to be a conformational disease arising from the accumulation of misfolded and unfolded proteins in the endoplasmic reticulum (ER). SEL1L is a component of the ER stress degradation system, which serves to remove unfolded proteins by retrograde degradation using the ubiquitin-proteasome system. In order to identify genetic variations possibly involved in the disease, we analysed the entire SEL1L gene sequence in Italian sporadic AD patients. Here we report the identification of a new polymorphism within SEL1L intron 3 (IVS3-88 A>G), which contains potential binding sites for transcription factors involved in ER-induced stress. Our statistical analysis shows a possible role of the novel polymorphism as an independent susceptibility factor for Alzheimer's dementia.
Subjects
Alzheimer Disease/genetics , Genetic Predisposition to Disease , Proteins/genetics , Aged , Female , Humans , Introns , Male , Polymorphism, Genetic
ABSTRACT
BACKGROUND: The ESTree db (http://www.itb.cnr.it/estree/) is a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated with in-house developed Perl scripts, and data were collected in a MySQL database. A PHP-based web interface was developed to query the database. RESULTS: The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. CONCLUSION: The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the core structures of the web interface, giving quick access to data related to each sequence/contig.
Subjects
Computational Biology/methods , Databases, Genetic , Expressed Sequence Tags , Gene Expression Regulation, Plant , Genes, Plant , Genomics/methods , Prunus/genetics , Chromosome Mapping , DNA, Complementary/metabolism , Database Management Systems , Gene Library , Genome, Plant , Internet , Polymorphism, Single Nucleotide , Programming Languages , Sequence Alignment , Sequence Analysis, DNA , Software , User-Computer Interface
ABSTRACT
Sturgeons are archaic fishes phylogenetically distinct from Teleosts. They represent an important niche for aquaculture, particularly for the production of caviar and high quality fillets, while many natural populations in various areas of the world are today threatened with extinction. Knowledge of the sturgeon genome is limited, as is the case for many other species of interest for fishery, aquaculture and conservation. Sequences from non-normalized libraries of skin and spleen of the American sturgeon (A. transmontanus) produced in our laboratories were analysed via a bioinformatic procedure, and compared to similar resources available for three Teleost species. Data collected during the analyses were stored in a database, the Sturgeon database (db), that can be queried via a web interface. The Sturgeon db contains a total of 16,404 sequences from Acipenser transmontanus, Ictalurus punctatus, Salmo salar and Takifugu rubripes, each species being represented by expressed sequence tags (ESTs) from skin and spleen. Data contained in the database are the results of a number of analyses that mostly focus on sequence annotation and intra- and inter-species comparison. Putative SNP sites, tandem repeats, and sequences matching known protein patterns and motifs were also identified. The Sturgeon db is currently the only online resource dedicated to the analysis of A. transmontanus EST sequences, and represents a starting point for the investigation of the genome of sturgeons from a physiological perspective; it will be used to identify polymorphic markers to study, for example, fish pathologies or to survey fish disease resistance, and to produce gene expression arrays. Introduction of sequences from other species in the analysis pipeline allowed inter-species comparisons of transcript distribution in Gene Ontology categories, as well as the identification of orthologs, despite the large phylogenetic distance of sturgeon from other fish species. As a result of the EST analysis procedure, 1058 novel sturgeon unigenes were identified.