Results 1 - 15 of 15
1.
BMC Bioinformatics ; 17(Suppl 19): 513, 2016 Dec 22.
Article in English | MEDLINE | ID: mdl-28155708

ABSTRACT

BACKGROUND: Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences, and even fewer have well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. RESULTS: Here, we propose a new analysis strategy and quantification methods for expression levels that not only generate a virtual reference from sequencing data but also enable comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against the NCBI NR database to find potential homologous sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. To demonstrate the feasibility of our strategy, we implemented it in the web service PARRoT (Pipeline for Analyzing RNA Reads of Transcriptomes). It analyzes gene expression profiles for two transcriptome sequencing datasets. To aid understanding of the biological meaning of the comparison among transcriptomes, PARRoT further links these virtual transcripts to their potential functions by showing best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours. CONCLUSIONS: In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms, which offers the opportunity to quantify and compare transcriptome profiles through a homolog-based virtual transcriptome reference. By using the homolog-based reference, our strategy effectively avoids the problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on the field of comparative genomics for non-model organisms. We have implemented PARRoT as a web service, which is freely available at http://parrot.cgu.edu.tw.
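
The abstract does not spell out how eRPKM and eTPM are estimated, so the following is only a minimal sketch of the standard RPKM and TPM normalizations that those names refer to, applied to read counts on a shared set of virtual transcripts. The function names, transcript IDs, and numbers are hypothetical and are not taken from PARRoT.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads.

    counts:  {transcript_id: mapped read count}
    lengths: {transcript_id: transcript length in bases}
    Assumes at least one mapped read and non-zero lengths.
    """
    total_reads = sum(counts.values())
    return {t: counts[t] * 1e9 / (lengths[t] * total_reads) for t in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {t: counts[t] / lengths[t] for t in counts}
    rate_sum = sum(rate.values())
    return {t: rate[t] * 1e6 / rate_sum for t in counts}


# Hypothetical read counts for two virtual transcripts in one dataset.
counts = {"vt1": 100, "vt2": 300}
lengths = {"vt1": 1000, "vt2": 3000}
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because every dataset is quantified against the same virtual transcript set, values computed this way for different datasets refer to the same transcript IDs and lengths, which is what makes them comparable.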


Subjects
Cnidaria/genetics , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Models, Biological , Sequence Analysis, RNA/methods , Software , Transcriptome , Animals , Genomics/methods , Internet , Molecular Sequence Annotation
2.
BMC Genomics ; 16: 648, 2015 Aug 28.
Article in English | MEDLINE | ID: mdl-26315384

ABSTRACT

BACKGROUND: Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood. RESULTS: We used 250 bp Illumina MiSeq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but can sometimes introduce insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results. CONCLUSIONS: In summary, this study provides systematic evaluations of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction.
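
To make "overlapping paired-end reads" concrete, here is a minimal sketch of a naive merge: the second read is reverse-complemented and the longest suffix/prefix overlap within a mismatch budget is accepted. The abstract does not name the merging tool used in the study, so `merge_pair`, its parameters, and the test fragment are hypothetical illustrations rather than the authors' method.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch=0):
    """Merge a read pair into one longer read, or return None.

    read2 is reverse-complemented so both reads are on the same strand,
    then the suffix of read1 is compared with the prefix of the mate,
    trying the longest possible overlap first.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for olen in range(longest, min_overlap - 1, -1):
        tail, head = read1[-olen:], mate[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            return read1 + mate[olen:]
    return None


# Hypothetical 32 bp fragment sequenced from both ends with 20 bp reads.
fragment = "ACGTTGCAAGGCTTACCGATAGCTAGGATCCA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[12:])
print(merge_pair(read1, read2, min_overlap=5) == fragment)
```

A greedy overlap search like this can pick the wrong offset when the overlapping region is repetitive, which is one plausible way a merge step could introduce the insertions or deletions the abstract mentions.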


Subjects
Escherichia coli/genetics , Genome, Bacterial , High-Throughput Nucleotide Sequencing/methods , Streptococcus/genetics , Base Pair Mismatch/genetics , Base Pairing/genetics , Contig Mapping , Gene Library , Genes, Bacterial , INDEL Mutation/genetics , Reference Standards
3.
BMC Genomics ; 15: 539, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24974934

ABSTRACT

BACKGROUND: Chromatin is a dynamic but highly regulated structure. DNA-binding proteins such as transcription factors and epigenetic and chromatin modifiers are responsible for regulating specific gene expression patterns and may result in different phenotypes. To reveal the identity of the proteins associated with a specific region of DNA, chromatin immunoprecipitation (ChIP) is the most widely used technique. A ChIP assay followed by next generation sequencing (ChIP-seq) or microarray (ChIP-chip) is often used to study patterns of protein-binding profiles in different cell types and in cancer samples on a genome-wide scale. However, only a limited number of bioinformatics tools are available for ChIP dataset analysis. RESULTS: We present ChIPseek, a web-based tool for ChIP data analysis providing summary statistics in graphs and offering several commonly demanded analyses. ChIPseek can provide a statistical summary of the dataset, including a histogram of peak length distribution, a histogram of distances to the nearest transcription start site (TSS), and a pie chart (or bar chart) of genomic locations, so that users have a comprehensive view of the dataset for further analysis. For examining the potential functions of peaks, ChIPseek provides peak annotation, visualization of peak genomic location, motif identification, sequence extraction, and comparison between datasets. Beyond that, ChIPseek also offers users the flexibility to filter peaks and re-analyze the filtered subset of peaks. ChIPseek supports 20 different genome assemblies for 12 model organisms including human, mouse, rat, worm, fly, frog, zebrafish, chicken, yeast, fission yeast, Arabidopsis, and rice. We use demo datasets to demonstrate the usage and intuitive user interface of ChIPseek. CONCLUSIONS: ChIPseek provides a user-friendly interface for biologists to analyze large-scale ChIP data without requiring any programming skills. All the results and figures produced by ChIPseek can be downloaded for further analysis. The analysis tools built into ChIPseek, especially the ones for selecting and examining a subset of peaks from ChIP data, provide invaluable help for exploring high-throughput data from either ChIP-seq or ChIP-chip. ChIPseek is freely available at http://chipseek.cgu.edu.tw.
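
One of the summaries listed above, the histogram of distances to the nearest TSS, can be sketched in a few lines. This is an illustrative simplification that assumes a single chromosome, ignores strand, and uses the peak midpoint; the function names, bin size, and coordinates are hypothetical and do not describe ChIPseek's internals.

```python
from bisect import bisect_left
from collections import Counter


def nearest_tss_distance(position, tss_sorted):
    """Absolute distance from a position to the closest TSS.

    tss_sorted must be a non-empty, ascending list of TSS coordinates.
    """
    i = bisect_left(tss_sorted, position)
    # Only the TSS just before and just after the position can be closest.
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    return min(abs(position - t) for t in candidates)


def distance_histogram(peaks, tss_positions, bin_size=1000):
    """Count peaks per distance bin; keys are the lower edge of each bin."""
    tss_sorted = sorted(tss_positions)
    hist = Counter()
    for start, end in peaks:
        center = (start + end) // 2
        distance = nearest_tss_distance(center, tss_sorted)
        hist[distance // bin_size * bin_size] += 1
    return dict(sorted(hist.items()))


# Hypothetical TSS coordinates and peak intervals on one chromosome.
tss = [1000, 5000, 20000]
peaks = [(900, 1100), (7000, 7400), (19000, 19500)]
print(distance_histogram(peaks, tss))
```

Keeping the TSS list sorted and using binary search keeps each lookup logarithmic in the number of TSSs, which matters when a dataset contains tens of thousands of peaks.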


Subjects
Chromatin Immunoprecipitation , High-Throughput Nucleotide Sequencing , Software , Web Browser , Animals , Computational Biology/methods , Genomics/methods , Humans
4.
Genomics ; 100(3): 149-56, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22735743

ABSTRACT

During the viral infection and replication processes, viral proteins are highly regulated and may interact with host proteins. However, the functions and interaction partners of many viral proteins have yet to be explored. Here, we compiled a VIral Protein domain DataBase (VIP DB) to associate viral proteins with putative functions and interaction partners. We systematically assign domains and infer the functions of proteins and their protein interaction partners from their domain annotations. A total of 2,322 unique domains identified from 2,404 viruses are used as a starting point to correlate GO classification, KEGG metabolic pathway annotation and domain-domain interactions. Of the unique domains, 42.7% have GO records, 39.6% have at least one domain-domain interaction record and 26.3% can also be found in either mammals or plants. This database provides a resource to help virologists identify potential roles for viral proteins. All of the information is available at http://vipdb.cgu.edu.tw.


Subjects
Databases, Protein , User-Computer Interface , Viral Proteins/chemistry , Viruses/chemistry , Animals , Computational Biology/methods , Humans , Internet , Metabolic Networks and Pathways , Molecular Sequence Annotation , Protein Interaction Mapping , Protein Structure, Tertiary , Structure-Activity Relationship , Viral Proteins/analysis
5.
iScience ; 26(8): 107269, 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37609633

ABSTRACT

We present DoSurvive, a user-friendly survival analysis web tool and a database centered on cancer prognostic biomarkers. DoSurvive is the first database that allows users to perform multivariant survival analysis for cancers with a customized gene/patient list. DoSurvive offers three survival analysis methods, the log-rank test, Cox regression and the accelerated failure time (AFT) model, for users to analyze five types of quantitative features (mRNA, miRNA, lncRNA, protein and methylation of CpG islands) with four survival types, i.e. overall survival, disease-specific survival, disease-free interval, and progression-free interval, in 33 cancer types. Notably, the implemented AFT model provides an alternative method for genes/features that fail the proportional hazards assumption in Cox regression. With the unprecedented number of survival models implemented and high flexibility in analysis, DoSurvive is a unique platform for the identification of clinically relevant targets for cancer researchers and practitioners. DoSurvive is freely available at http://dosurvive.lab.nycu.edu.tw/.
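
Of the three methods named above, the log-rank test is the simplest to write down. The sketch below is a generic textbook two-group log-rank statistic in plain Python, included only to show what the test compares (observed versus expected events at each event time); it is an assumption-level illustration, not DoSurvive's code, and the function name and toy data are hypothetical.

```python
def logrank_statistic(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times_*:  follow-up times
    events_*: 1 if the event was observed, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0
    return (observed_a - expected_a) ** 2 / variance


# Hypothetical groups, e.g. patients split by high vs low expression.
high = ([1, 2, 3], [1, 1, 1])
low = ([4, 5, 6], [1, 1, 1])
print(logrank_statistic(high[0], high[1], low[0], low[1]))
```

The statistic is compared against a chi-square distribution with one degree of freedom; values above roughly 3.84 correspond to p < 0.05.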

6.
BMC Genomics ; 13 Suppl 7: S12, 2012.
Article in English | MEDLINE | ID: mdl-23282184

ABSTRACT

BACKGROUND: For more than a decade, research has been conducted to identify differentially expressed genes (DEGs) by generating and mining cDNA expressed sequence tags (ESTs). Although the availability of public databases makes possible the comprehensive mining of DEGs among the ESTs from multiple tissue types, existing studies usually employed statistics suitable only for two categories. Multi-class tests have been developed to enable the finding of tissue-specific genes, but the subsequent search for cancer genes involves a separate two-category test only on the ESTs of the tissue of interest. This restricts the amount of data used. On the other hand, simple pooling of cancer and normal genes from multiple tissue types runs the risk of Simpson's paradox. Here we present a different approach, which searches for multi-cancer DEG candidates by analyzing all pertinent ESTs in all categories and narrows down the cancer biomarker candidates via integrative analysis with microarray data, selection of secretory and membrane protein genes, and incorporation of network analysis. Finally, the differential expression patterns of three selected cancer biomarker candidates were confirmed by real-time qPCR analysis. RESULTS: Seven hundred and twenty-three primary DEG candidates (p-value < 0.05 and lower bound of the confidence interval of the odds ratio ≥ 1.65) were selected from a curated EST database by applying the Cochran-Mantel-Haenszel (CMH) statistic. GeneGO analysis results indicated that this set is neoplasm enriched. Cross-examination with microarray data further narrowed the list down to 235 genes, among which 96 had membrane or secretory annotations. After examining the candidates in a protein interaction network, public tissue expression databases, and the literature, we selected three genes for further evaluation by real-time qPCR with eight major normal and cancer tissues. The higher-than-normal tissue expression of COL3A1, DLG3, and RNF43 in some of the cancer tissues is in agreement with our in silico predictions. CONCLUSIONS: Searching the digitized transcriptome using CMH enabled us to identify multi-cancer differentially expressed gene candidates. Our methodology demonstrated simultaneous analysis for cancer biomarkers of multiple tissue types with the EST data. With the revived interest in digitizing transcriptomes by NGS, cancer biomarkers could be more precisely detected from the ESTs. The three candidates identified in this study, COL3A1, DLG3, and RNF43, are valuable targets for further evaluation with a larger sample size of normal and cancer tissue or serum samples.
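
The contrast between naive pooling and a stratified analysis can be illustrated with the Mantel-Haenszel common odds ratio, the estimator that usually accompanies the CMH test. The sketch below is a generic illustration with made-up counts; the table layout (gene ESTs vs other ESTs, cancer vs normal, one 2x2 table per tissue) is an assumed reading of the setup, not the authors' code or data.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio across strata.

    Each table is (a, b, c, d):
      a = gene ESTs in cancer,  b = other ESTs in cancer,
      c = gene ESTs in normal,  d = other ESTs in normal.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def pooled_or(tables):
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a, b, c, d = (sum(column) for column in zip(*tables))
    return (a * d) / (b * c)


# Made-up counts for two tissues; each tissue alone has an odds ratio > 1.
tissues = [
    (81, 6, 234, 36),    # tissue 1: odds ratio about 2.08
    (192, 71, 55, 25),   # tissue 2: odds ratio about 1.23
]
print(round(mantel_haenszel_or(tissues), 2))  # stratified estimate, above 1
print(round(pooled_or(tissues), 2))           # pooled estimate, below 1
```

With these counts the gene looks over-represented in cancer within every tissue, yet the pooled table points the other way, which is Simpson's paradox; stratifying by tissue keeps the direction of the effect intact.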


Subjects
Biomarkers, Tumor/metabolism , Expressed Sequence Tags , Biomarkers, Tumor/genetics , Collagen Type III/genetics , Collagen Type III/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Databases, Factual , Gene Regulatory Networks , Genome, Human , Humans , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Odds Ratio , Oncogene Proteins/genetics , Oncogene Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , Ubiquitin-Protein Ligases
7.
BMC Genomics ; 13 Suppl 7: S9, 2012.
Article in English | MEDLINE | ID: mdl-23281853

ABSTRACT

BACKGROUND: Recent developments in high-throughput sequencing (HTS) technologies have made it feasible to sequence the complete transcriptomes of non-model organisms or metatranscriptomes from environmental samples. The challenge after generating hundreds of millions of sequences is to annotate these transcripts and classify them based on their putative functions. Because many biological scientists lack the knowledge to install Linux-based software packages or maintain databases used for transcript annotation, we developed an automatic annotation tool with an easy-to-use interface. METHODS: To elucidate the potential functions of gene transcripts, we integrated the well-established annotation tools Blast2GO, PRIAM and RPS BLAST in a web-based service, FastAnnotator, which can assign Gene Ontology (GO) terms, Enzyme Commission numbers (EC numbers) and functional domains to query sequences. RESULTS: Using six transcriptome sequence datasets as examples, we demonstrated the ability of FastAnnotator to assign functional annotations. FastAnnotator annotated 88.1% and 81.3% of the transcripts from the well-studied organisms Caenorhabditis elegans and Streptococcus parasanguinis, respectively. Furthermore, FastAnnotator annotated 62.9%, 20.4%, 53.1% and 42.0% of the sequences from the transcriptomes of sweet potato, clam, amoeba, and Trichomonas vaginalis, respectively, which lack reference genomes. We demonstrated that FastAnnotator can complete the annotation process in a reasonable amount of time and is suitable for the annotation of transcriptomes from model organisms or organisms for which annotated reference genomes are not available. CONCLUSIONS: The sequencing process no longer represents the bottleneck in the study of genomics, and automatic annotation tools have become invaluable as the annotation procedure has become the limiting step. We present FastAnnotator, an automated annotation web tool designed to efficiently annotate sequences with their gene functions, enzyme functions or domains. FastAnnotator is useful in transcriptome studies, especially those focusing on non-model organisms or metatranscriptomes. FastAnnotator does not require local installation and is freely available at http://fastannotator.cgu.edu.tw.


Subjects
Software , Transcriptome/genetics , Animals , Base Sequence , Caenorhabditis elegans/genetics , Databases, Genetic , Genome , Genome, Bacterial , Internet , Streptococcus/genetics , User-Computer Interface
8.
BMC Genomics ; 12 Suppl 3: S16, 2011 Nov 30.
Article in English | MEDLINE | ID: mdl-22369477

ABSTRACT

BACKGROUND: Gene duplication provides resources for developing novel genes and new functions while retaining the original functions. In addition, alternative splicing could increase the complexity of expression at the transcriptome and proteome level without increasing the number of gene copies in the genome. Duplication and alternative splicing are thought to work together to provide the diverse functions or expression patterns for eukaryotes. Previously, it was believed that duplication and alternative splicing were negatively correlated and probably interchangeable. RESULTS: We looked into the relationship between the occurrence of alternative splicing and duplication at different times after duplication events. We found that duplication and alternative splicing were indeed inversely correlated if only recently duplicated genes were considered, but they became positively correlated when we took ancient duplications into account. Specifically, for slightly or moderately duplicated genes with gene families containing 2 - 7 paralogs, genes were more likely to evolve alternative splicing and had on average a greater number of alternative splicing isoforms after long-term evolution compared to singleton genes. On the other hand, large gene families (containing at least 8 paralogs) had a lower proportion of alternative splicing and fewer alternative splicing isoforms on average, even when ancient duplicated genes were taken into consideration. We also found that duplicated genes with alternative splicing were under tighter evolutionary constraints than those without alternative splicing and were enriched in genes that participate in molecular transducer activities. CONCLUSIONS: We studied the association between occurrences of alternative splicing and gene duplication. Our results imply that there are key differences in functions and evolutionary constraints among singleton genes and duplicated genes with or without alternative splicing. They also imply that gene duplication and alternative splicing may have different functional significance in the evolution of speciation diversity.


Subjects
Alternative Splicing , Evolution, Molecular , Gene Duplication , Animals , Databases, Factual , Humans , Mice , Protein Isoforms/chemistry , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Search Engine
9.
BMC Bioinformatics ; 11 Suppl 7: S6, 2010 Oct 15.
Article in English | MEDLINE | ID: mdl-21106128

ABSTRACT

BACKGROUND: Orthologs are genes derived from the same ancestral gene locus after speciation events. Orthologous proteins usually have similar sequences and perform comparable biological functions. Therefore, ortholog identification is useful in the annotation of newly sequenced genomes. With the rapidly increasing number of sequenced genomes, constructing or updating ortholog relationships between all genomes requires considerable effort and computation time. In addition, elucidating ortholog relationships between distantly related genomes is challenging because of the lower sequence similarity. Therefore, an efficient ortholog detection method that can deal with a large number of distantly related genomes is desired. RESULTS: In this study, an efficient ortholog detection pipeline, DODO (DOmain based Detection of Orthologs), was created on the basis of domain architectures. Supported by domain composition, which is usually directly related to protein function, DODO could facilitate ortholog detection across distantly related genomes. DODO works in two main steps. Starting from domain information, it first assigns proteins to groups according to their domain architectures and then identifies orthologs within those groups with much reduced complexity. Here DODO is shown to detect orthologs between two genomes in a considerably shorter period of time than traditional reciprocal best hit methods, and the advantage is more significant when a large number of genomes are analyzed. The output results of DODO are highly comparable with other known ortholog databases. CONCLUSIONS: DODO provides a new efficient pipeline for detection of orthologs in a large number of genomes. In addition, a database established with DODO is also easier to maintain and could be updated relatively effortlessly. The DODO pipeline can be downloaded from http://140.109.42.19:16080/dodo_web/home.htm.
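
The two-step idea described above (group proteins by domain architecture, then look for orthologs only inside each group) can be sketched as follows. This shows only the grouping and candidate-pair step under simplifying assumptions: an architecture is taken to be the ordered tuple of domain IDs, and the within-group sequence comparison is left out. Function names, protein IDs, and the domain lists are hypothetical, not DODO's actual data model.

```python
from collections import defaultdict


def group_by_architecture(proteins):
    """Group proteins that share the same ordered domain architecture.

    proteins: {protein_id: (genome, [domain_id, ...])}
    Returns {architecture tuple: [(genome, protein_id), ...]}.
    """
    groups = defaultdict(list)
    for protein_id, (genome, domains) in proteins.items():
        groups[tuple(domains)].append((genome, protein_id))
    return groups


def candidate_pairs(proteins, genome_a, genome_b):
    """Cross-genome pairs restricted to proteins in the same group."""
    pairs = []
    for members in group_by_architecture(proteins).values():
        in_a = [p for g, p in members if g == genome_a]
        in_b = [p for g, p in members if g == genome_b]
        pairs.extend((a, b) for a in in_a for b in in_b)
    return sorted(pairs)


# Hypothetical proteins annotated with illustrative domain identifiers.
proteins = {
    "h1": ("human", ["PF00069", "PF00017"]),
    "m1": ("mouse", ["PF00069", "PF00017"]),
    "h2": ("human", ["PF00001"]),
    "m2": ("mouse", ["PF00001"]),
    "m3": ("mouse", ["PF00400"]),
}
print(candidate_pairs(proteins, "human", "mouse"))
```

Restricting comparisons to proteins with matching architectures replaces an all-against-all search with many small within-group searches, which is the source of the reduced complexity the abstract refers to.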


Subjects
Computational Biology/methods , Genome/genetics , Proteins/chemistry , Proteins/genetics , Algorithms , Amino Acid Sequence , Animals , Databases, Genetic , Humans , Mice , Molecular Sequence Annotation/methods , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Alignment
10.
BMC Bioinformatics ; 11 Suppl 7: S11, 2010 Oct 15.
Article in English | MEDLINE | ID: mdl-21106118

ABSTRACT

BACKGROUND: sRNAs, which belong to the non-coding RNA family and range from approximately 50 to 400 nucleotides, serve various important gene regulatory roles. Most are believed to be trans-regulating and to function by being complementary to their target mRNAs, inhibiting translation by ribosome occlusion. Despite this understanding of their functionality, the global properties associated with regulation by sRNAs are not yet understood. Here we use topological analysis of sRNA targets in terms of protein-protein interaction and transcription-regulatory networks in Escherichia coli to shed light on the global correlation between sRNA regulation and cellular control networks. RESULTS: The analysis of sRNA targets in terms of their networks showed that some specific network properties could be identified. In the protein-protein interaction network, sRNA targets tend to occupy more central positions (higher closeness centrality, p-val = 0.022) and to be more cliquish (larger clustering coefficient, p-val = 0.037). The targets of the same sRNA tend to form a network module (shorter characteristic path length, p-val = 0.015; larger density, p-val = 0.019; higher in-degree ratio, p-val = 0.009). In the transcription-regulatory network, sRNA targets tend to be under multiple regulation (higher in-degree, p-val = 0.013), and the targets usually are important to the transfer of regulatory signals (higher betweenness, p-val = 0.012). As was found for the protein-protein interaction network, the targets that are regulated by the same sRNA also tend to be closely knit within the transcription-regulatory network (larger density, p-val = 0.036), and inward interactions between them are greater than the outward interactions (higher in-degree ratio, p-val = 0.023). However, after incorporating information on predicted sRNAs and downstream targets, the results are not as clear-cut, but the overall network modularity is still evident. CONCLUSIONS: Our results indicate that sRNA targeting tends to show a clustering pattern similar to that of human microRNA regulation in the protein-protein interaction network, as observed in a previous study. Namely, the sRNA targets show close interaction and form a closely knit network module in both the protein-protein interaction and the transcription-regulatory networks. Thus, targets of the same sRNA work in a concerted way toward a specific goal. In addition, in the transcription-regulatory network, sRNA targets act as "multiplexors", accepting regulatory control from multiple sources and acting accordingly. Our results indicate that sRNA targeting shows different properties when compared to the proteins that form cellular networks.
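
Two of the node-level measures named above, closeness centrality and the clustering coefficient, are easy to compute on a small undirected graph. The sketch below uses their standard textbook definitions in plain Python; the abstract does not say which software or exact variants the authors used, and the toy network and node names are hypothetical.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked.

    graph: {node: set of neighbors} for an undirected network.
    """
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def closeness_centrality(graph, node):
    """Reachable node count divided by the sum of shortest-path lengths."""
    distance = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search from the node
        u = queue.popleft()
        for v in graph[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


# Toy interaction network: a triangle a-b-c with d attached to c.
network = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(clustering_coefficient(network, "c"), closeness_centrality(network, "c"))
```

In a comparison like the one in the abstract, such values would be computed for the sRNA targets and for a background set of nodes, and the two distributions compared with a statistical test.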


Subjects
Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Regulation, Bacterial , Gene Regulatory Networks , RNA, Small Untranslated/genetics , RNA, Small Untranslated/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , Repressor Proteins/genetics , Repressor Proteins/metabolism , Reproducibility of Results
11.
Anticancer Res ; 39(11): 6317-6324, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31704862

ABSTRACT

BACKGROUND/AIM: The aim of this study was to evaluate N-acetylgalactosamine-6-sulfatase (GALNS) as a new biomarker candidate for detecting lung cancer. Glycodelin or PAEP, the serum levels of which are known to be elevated in lung and other cancers, served as a benchmark for comparison. PATIENTS AND METHODS: A total of 170 serum samples from healthy controls and patients with pneumonia, lung cancer, breast cancer, colon cancer, liver cancer, and head and neck cancer were analyzed for the levels of GALNS and PAEP by ELISA. RESULTS: The median serum levels of GALNS and PAEP in all cancer types as well as pneumonia patients were significantly higher than those of the healthy controls. CONCLUSION: In addition to previously known cancers, the median serum levels of PAEP were also found to be higher in liver and head and neck cancer patients. GALNS and PAEP are promising general biomarkers for multiple cancers and deserve further evaluation.


Subjects
Biomarkers, Tumor/blood , Chondroitinsulfatases/blood , Glycodelin/blood , Lung Neoplasms/blood , Area Under Curve , Benchmarking , Breast Neoplasms/blood , Case-Control Studies , Cell Line, Tumor , Colonic Neoplasms/blood , Enzyme-Linked Immunosorbent Assay , Female , Head and Neck Neoplasms/blood , Humans , Liver Neoplasms/blood , Lung/metabolism , Lung Neoplasms/diagnosis , Male , Pneumonia/blood
12.
Sci Rep ; 7(1): 10430, 2017 09 05.
Article in English | MEDLINE | ID: mdl-28874813

ABSTRACT

Along with the constant improvement in high-throughput sequencing technology, an increasing number of transcriptome sequencing projects are carried out in organisms without decoded genome information and even on environmental biological samples. To study the biological functions of novel transcripts, the very first task is to identify their potential functions. We present a web-based annotation tool, FunctionAnnotator, which offers comprehensive annotations, including GO term assignment, enzyme annotation, domain/motif identification and predictions for subcellular localization. To accelerate the annotation process, we have optimized the computation processes and used parallel computing for all annotation steps. Moreover, FunctionAnnotator is designed to be versatile, and it generates a variety of useful outputs for facilitating other analyses. Here, we demonstrate how FunctionAnnotator can be helpful in annotating non-model organisms. We further illustrate that FunctionAnnotator can estimate the taxonomic composition of environmental samples and assist in the identification of novel proteins by combining RNA-Seq data with proteomics technology. In summary, FunctionAnnotator can efficiently annotate transcriptomes and greatly benefits studies focusing on non-model organisms or metatranscriptomes. FunctionAnnotator, a comprehensive annotation web-service tool, is freely available online at http://fa.cgu.edu.tw/. This new web-based annotator will shed light on field studies involving organisms without a reference genome.

14.
J Proteomics ; 91: 375-84, 2013 Oct 08.
Article in English | MEDLINE | ID: mdl-23933159

ABSTRACT

Mass measurement and precursor mass assignment are independent processes in proteomic data acquisition. Due to misassignments to the C-13 peak, or for other reasons, extensive precursor mass shifts (i.e., deviations of the measured from the calculated precursor neutral masses) in LC-MS/MS data obtained with high-accuracy LTQ-Orbitrap mass spectrometers have been reported in previous studies. Although computational methods for post-acquisition reassignment to monoisotopic mass have been developed to curate the MS/MS spectra prior to database search, a simpler method for estimating the fraction of spectra with precursor mass shift, so as to determine whether the data require curation, remains desirable. Here, we provide evidence that an easy approach, which applies a large precursor tolerance (2.1 Da or higher) in a SEQUEST search against a forward and decoy protein sequence database and then filters the data by PeptideProphet peptide identification probability (p≥0.9), could detect most of the MS/MS spectra containing inaccurate precursor masses. Furthermore, by introducing artificial mass shifts into 4000 randomly selected MS/MS spectra, which originally had accurate precursor masses assigned by the mass spectrometers, we demonstrated that the accuracy of the precursor mass has almost negligible influence on the efficacy and fidelity of peptide identification. BIOLOGICAL SIGNIFICANCE: Integral precursor mass shift is a known problem, and thus proteomic data should be handled and analyzed properly to avoid losing important protein identification and/or quantification information. A quick and easy approach for estimating the number of MS/MS spectra with inaccurate precursor mass assignments would be helpful for evaluating the performance of the instrument and for determining whether the data require curation prior to database search or should be searched with specific search parameter(s). Here we demonstrated that most of the MS/MS spectra with inaccurate mass assignments (integral or non-integral changes) could be easily identified by database search with large precursor tolerance windows.
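
The distinction between integral and non-integral precursor mass shifts can be illustrated with a small classifier that compares a measured precursor neutral mass with the calculated peptide mass. This is a sketch under stated assumptions: the 10 ppm tolerance, the limit of two isotope steps (chosen to fit inside a 2.1 Da window), and the function names are hypothetical choices, not parameters reported in the abstract.

```python
C13_C12_DELTA = 1.0033548  # Da, mass difference between 13C and 12C


def classify_shift(measured, calculated, ppm_tol=10.0, max_isotopes=2):
    """Classify a precursor mass deviation.

    Returns ("accurate", 0) when the masses agree within ppm_tol,
    ("integral", n) when the deviation matches n isotope spacings,
    and ("non-integral", None) otherwise.
    """
    delta = measured - calculated
    for n in range(-max_isotopes, max_isotopes + 1):
        residual = delta - n * C13_C12_DELTA
        if abs(residual) / calculated * 1e6 <= ppm_tol:
            return ("accurate" if n == 0 else "integral", n)
    return ("non-integral", None)


def fraction_shifted(pairs, **kwargs):
    """Fraction of (measured, calculated) pairs that are not accurate."""
    shifted = sum(1 for m, c in pairs if classify_shift(m, c, **kwargs)[0] != "accurate")
    return shifted / len(pairs)


# Hypothetical identifications: (measured mass, calculated peptide mass).
identified = [
    (1500.003, 1500.0),                  # within 10 ppm
    (1500.0 + C13_C12_DELTA, 1500.0),    # assigned to the first 13C peak
    (1500.5, 1500.0),                    # unexplained deviation
]
print([classify_shift(m, c) for m, c in identified])
print(fraction_shifted(identified))
```

Applied to confident identifications from a wide-tolerance search, a tally like `fraction_shifted` gives the kind of quick estimate the abstract describes for deciding whether the data need curation.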


Subjects
Databases, Protein , Halobacterium salinarum/chemistry , Proteomics , Tandem Mass Spectrometry , Bacterial Proteins/chemistry , Carbon Isotopes/chemistry , Cell Line, Tumor , Expressed Sequence Tags , Humans , Peptides/chemistry , Probability , Proteome , Reproducibility of Results , Software
15.
Mol Cell Proteomics ; 5(6): 987-97, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16497792

ABSTRACT

To better understand the extremely halophilic archaeon Halobacterium species NRC-1, we analyzed its soluble proteome by two-dimensional liquid chromatography coupled to electrospray ionization tandem mass spectrometry. A total of 888 unique proteins were identified with a ProteinProphet probability (P) between 0.9 and 1.0. To evaluate the biochemical activities of the organism, the proteomic data were subjected to a biological network analysis using our BMSorter software. This allowed us to examine the proteins expressed in different biomodules and study the interactions between pertinent biomodules. Interestingly, an integrated analysis of the enzymes in the amino acid metabolism and citrate cycle networks suggested that up to eight amino acids may be converted to oxaloacetate, fumarate, or oxoglutarate in the citrate cycle for energy production. In addition, glutamate and aspartate may be interconverted from other amino acids or synthesized from citrate cycle intermediates to meet the high demand for the acidic amino acids that are required to build the highly acidic proteome of the organism. Thus, this study demonstrated that proteome analysis can provide useful information and help systems analyses of organisms.


Subjects
Archaeal Proteins/analysis , Halobacterium/chemistry , Halobacterium/metabolism , Proteome/analysis , Amino Acids/metabolism , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Biological Transport , Chromatography, Liquid , Computational Biology/methods , Genetic Phenomena , Genome, Archaeal , Halobacterium/genetics , Mass Spectrometry , Software