Results 1 - 12 of 12
1.
Sci Rep ; 7(1): 10430, 2017 09 05.
Article in English | MEDLINE | ID: mdl-28874813

ABSTRACT

Along with the constant improvement in high-throughput sequencing technology, an increasing number of transcriptome sequencing projects are carried out in organisms without decoded genome information and even on environmental biological samples. To study the biological functions of novel transcripts, the very first task is to identify their potential functions. We present a web-based annotation tool, FunctionAnnotator, which offers comprehensive annotations, including GO term assignment, enzyme annotation, domain/motif identification and predictions for subcellular localization. To accelerate the annotation process, we have optimized the computation processes and used parallel computing for all annotation steps. Moreover, FunctionAnnotator is designed to be versatile, and it generates a variety of useful outputs for facilitating other analyses. Here, we demonstrate how FunctionAnnotator can be helpful in annotating non-model organisms. We further illustrate that FunctionAnnotator can estimate the taxonomic composition of environmental samples and assist in the identification of novel proteins by combining RNA-Seq data with proteomics technology. In summary, FunctionAnnotator can efficiently annotate transcriptomes and greatly benefits studies focusing on non-model organisms or metatranscriptomes. FunctionAnnotator, a comprehensive annotation web-service tool, is freely available online at: http://fa.cgu.edu.tw/. This new web-based annotator will shed light on field studies involving organisms without a reference genome.

2.
DNA Res ; 24(3): 327-332, 2017 Jun 01.
Article in English | MEDLINE | ID: mdl-28419256

ABSTRACT

Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been the subject of investigation of topics ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. From the overexpression point of view, it is critical to systematically analyze the codon usage patterns of the organelle genomes. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first-ever curated, web-based reference catalog of the codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU allows the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns based on user-submitted transcripts or assembled organelle genes, and comparative analysis with the pre-compiled patterns across species of interest. ChloroMitoCU can increase our understanding of the biased patterns of codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/.


Subjects
Codon/genetics, Molecular Evolution, Chloroplast Genome, Mitochondrial Genome, Genomics/methods, Chloroplasts/genetics, Codon/analysis, Eukaryota/genetics, Mitochondria/genetics, Software
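As a rough illustration of what a "codon usage pattern" means in the ChloroMitoCU record, the sketch below computes relative codon frequencies for one coding sequence. It is not ChloroMitoCU's method; the function name `codon_usage` and the toy sequence are assumptions made for this example.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence.

    Hypothetical helper for illustration; not taken from ChloroMitoCU.
    """
    seq = cds.upper().replace("U", "T")
    # Ignore a trailing partial codon, if any.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Skip codons that contain ambiguous bases such as N.
    counts = Counter(c for c in codons if set(c) <= set("ACGT"))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Toy CDS (made up): ATG GCT GCT GCA TAA
usage = codon_usage("ATGGCTGCTGCATAA")
for codon, freq in sorted(usage.items()):
    print(codon, round(freq, 2))
```

Comparing such frequency tables across genomes is one simple way to see the biased usage the abstract refers to; a curated resource would also need to handle the different organelle genetic codes.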
3.
BMC Bioinformatics ; 17(Suppl 19): 513, 2016 Dec 22.
Article in English | MEDLINE | ID: mdl-28155708

ABSTRACT

BACKGROUND: Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences, and even fewer have well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. RESULTS: Here, we propose a new analysis strategy and quantification methods for expression levels that not only generate a virtual reference from sequencing data but also enable comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against the NCBI NR database to find potential homologous sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. To aid biological interpretation of the comparison among transcriptomes, PARRoT further links these virtual transcripts to their potential functions by showing best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours. CONCLUSIONS: In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms which offers the opportunity to quantify and compare transcriptome profiles through a homolog-based virtual transcriptome reference. By using the homolog-based reference, our strategy effectively avoids the problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on the field of comparative genomics for non-model organisms. We have implemented PARRoT as a web service which is freely available at http://parrot.cgu.edu.tw .


Subjects
Cnidaria/genetics, Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing/methods, Biological Models, RNA Sequence Analysis/methods, Software, Transcriptome, Animals, Genomics/methods, Internet, Molecular Sequence Annotation
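The PARRoT abstract names eRPKM and eTPM but does not define the "estimated" variants. For orientation only, the sketch below shows the standard RPKM and TPM definitions computed from read counts against a shared reference; the function names and the toy counts and lengths are assumptions, and PARRoT's own estimators may differ.

```python
def rpkm(read_counts, lengths_bp):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(read_counts)
    return [rc * 1e9 / (total_reads * length)
            for rc, length in zip(read_counts, lengths_bp)]


def tpm(read_counts, lengths_bp):
    """Standard TPM: length-normalized rates rescaled to sum to one million."""
    rates = [rc / length for rc, length in zip(read_counts, lengths_bp)]
    total_rate = sum(rates)
    return [rate * 1e6 / total_rate for rate in rates]


# Toy data (made up): three virtual transcripts quantified in one dataset.
counts = [100, 200, 700]
lengths = [1000, 2000, 1000]
rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
print(rpkm_values)
print(tpm_values)
```

Because TPM values always sum to one million within a dataset, they are often easier to compare across datasets than RPKM, which is one reason both are commonly reported together.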
4.
BMC Genomics ; 16: 648, 2015 Aug 28.
Article in English | MEDLINE | ID: mdl-26315384

ABSTRACT

BACKGROUND: Whole genome sequence construction is becoming increasingly feasible because of advances in next generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood. RESULTS: We used 250 bp Illumina MiSeq paired-end reads of different library sizes generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213 to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes cause insertions or deletions. Regarding genome assembly, merged reads only outcompete original paired-end reads when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results. CONCLUSIONS: In summary, this study provides systematic evaluations of de novo assembly from paired-end sequencing data. Among the assembly strategies, we find that overlapping paired-end reads is not always beneficial for bacterial genome assembly and should be avoided or used with caution, especially for genomes containing a high fraction of repetitive sequences. Because increasing numbers of projects aim at bacterial genome sequencing, our study provides valuable suggestions for the field of genomic sequence construction.


Subjects
Escherichia coli/genetics, Bacterial Genome, High-Throughput Nucleotide Sequencing/methods, Streptococcus/genetics, Base Pair Mismatch/genetics, Base Pairing/genetics, Contig Mapping, Gene Library, Bacterial Genes, INDEL Mutation/genetics, Reference Standards
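To make "overlapping paired-end reads" concrete, here is a deliberately naive sketch that merges a read pair by the longest exact overlap between the forward read and the reverse complement of the reverse read. It is not the merging method evaluated in the study; `merge_pair`, `min_overlap` and the toy fragment are assumptions, and practical mergers also tolerate mismatches and use base qualities.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 10):
    """Merge a read pair by the longest exact suffix/prefix overlap.

    Returns the merged sequence, or None when no overlap of at least
    min_overlap bases is found. Hypothetical helper for illustration.
    """
    rc = reverse_complement(reverse)
    for k in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    return None


# Toy 26 bp fragment (made up), sequenced from both ends with 18 bp reads.
fragment = "ACGTTGCAAGGCTTACCGATAGGCTA"
forward_read = fragment[:18]
reverse_read = reverse_complement(fragment[-18:])
merged = merge_pair(forward_read, reverse_read)
print(merged == fragment)
```

In a repetitive region, more than one overlap length can match exactly, so a merger of this kind may pick the wrong one and produce a merged read that is too short or too long, which is consistent with the abstract's caution about repeat-rich genomes.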
5.
Hum Mutat ; 36(2): 167-74, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25196204

ABSTRACT

Next-generation sequencing (NGS) technologies have revolutionized the field of genetics and are trending toward clinical diagnostics. Exome and targeted sequencing in a disease context represent a major NGS clinical application, considering its utility and cost-effectiveness. With the ongoing discovery of disease-associated genes, various gene panels have been launched for both basic research and diagnostic tests. However, the fundamental inconsistencies among the diverse annotation sources, software packages, and data formats have complicated the subsequent analysis. To manage disease-associated NGS data, we developed Vanno, a Web-based application for in-depth analysis and rapid evaluation of disease-causative genome sequence alterations. Vanno integrates information from biomedical databases, functional predictions from available evaluation models, and mutation landscapes from TCGA cancer types. A highly integrated framework that incorporates filtering, sorting, clustering, and visual analytic modules is provided to facilitate exploration of oncogenomics datasets at different levels, such as gene, variant, protein domain, or three-dimensional structure. Such design is crucial for the extraction of knowledge from sequence alterations and translating biological insights into clinical applications. Taken together, Vanno supports almost all disease-associated gene tests and exome sequencing panels designed for NGS, providing a complete solution for targeted and exome sequencing analysis. Vanno is freely available at http://cgts.cgu.edu.tw/vanno.


Subjects
Software, Data Curation, Exome, Human Genome, Genomics, High-Throughput Nucleotide Sequencing, Humans, Molecular Sequence Annotation, DNA Sequence Analysis
6.
Nucleic Acids Res ; 43(Database issue): D849-55, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25398898

ABSTRACT

Whole-exome sequencing, which centres on the protein coding regions of disease/cancer associated genes, represents the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole exome/genome sequencing projects have been launched by various institutions, such as NCI, Broad Institute and TCGA, to provide a comprehensive catalogue of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exomic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge to integrate genomic and proteomics datasets, CMPD (http://cgbc.cgu.edu.tw/cmpd) collected over 2 million genetic alterations, which not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify mutated proteins encoded by mutated genes.


Subjects
Protein Databases, Mutant Proteins/genetics, Neoplasm Proteins/genetics, Neoplasms/genetics, Proteome/genetics, Tumor Cell Line, Humans, Internet, Mutation
7.
BMC Genomics ; 15: 539, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24974934

ABSTRACT

BACKGROUND: Chromatin is a dynamic but highly regulated structure. DNA-binding proteins such as transcription factors, epigenetic and chromatin modifiers are responsible for regulating specific gene expression patterns and may result in different phenotypes. To reveal the identity of the proteins associated with the specific region on DNA, chromatin immunoprecipitation (ChIP) is the most widely used technique. ChIP assay followed by next generation sequencing (ChIP-seq) or microarray (ChIP-chip) is often used to study patterns of protein-binding profiles in different cell types and in cancer samples on a genome-wide scale. However, only a limited number of bioinformatics tools are available for ChIP dataset analysis. RESULTS: We present ChIPseek, a web-based tool for ChIP data analysis providing summary statistics in graphs and offering several commonly demanded analyses. ChIPseek can provide statistical summary of the dataset including histogram of peak length distribution, histogram of distances to the nearest transcription start site (TSS), and pie chart (or bar chart) of genomic locations for users to have a comprehensive view on the dataset for further analysis. For examining the potential functions of peaks, ChIPseek provides peak annotation, visualization of peak genomic location, motif identification, sequence extraction, and comparison between datasets. Beyond that, ChIPseek also offers users the flexibility to filter peaks and re-analyze the filtered subset of peaks. ChIPseek supports 20 different genome assemblies for 12 model organisms including human, mouse, rat, worm, fly, frog, zebrafish, chicken, yeast, fission yeast, Arabidopsis, and rice. We use demo datasets to demonstrate the usage and intuitive user interface of ChIPseek. CONCLUSIONS: ChIPseek provides a user-friendly interface for biologists to analyze large-scale ChIP data without requiring any programming skills. All the results and figures produced by ChIPseek can be downloaded for further analysis. The analysis tools built into ChIPseek, especially the ones for selecting and examining a subset of peaks from ChIP data, provide invaluable help for exploring the high-throughput data from either ChIP-seq or ChIP-chip. ChIPseek is freely available at http://chipseek.cgu.edu.tw.


Subjects
Chromatin Immunoprecipitation, High-Throughput Nucleotide Sequencing, Software, Web Browser, Animals, Computational Biology/methods, Genomics/methods, Humans
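Two of the summary statistics listed in the ChIPseek abstract, peak length and distance to the nearest TSS, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ChIPseek's implementation: peaks are `(start, end)` tuples on a single chromosome, the TSS list is non-empty and sorted, strand is ignored, and all coordinates are made up.

```python
from bisect import bisect_left


def distance_to_nearest_tss(peak_mid: int, sorted_tss: list[int]) -> int:
    """Signed distance (bp) from a peak midpoint to the closest TSS.

    Assumes sorted_tss is non-empty and sorted in ascending order.
    """
    i = bisect_left(sorted_tss, peak_mid)
    # Only the TSS just before and just after the midpoint can be nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda tss: abs(peak_mid - tss))
    return peak_mid - nearest


# Toy data (made up): three peaks and three TSS positions on one chromosome.
peaks = [(900, 1300), (4000, 4400), (8800, 9000)]
tss_positions = [1000, 5000, 9000]

peak_lengths = [end - start for start, end in peaks]
distances = [distance_to_nearest_tss((start + end) // 2, tss_positions)
             for start, end in peaks]
print(peak_lengths)
print(distances)
```

Binning `peak_lengths` and `distances` would give the two histograms the abstract mentions; a strand-aware version would flip the sign of the distance for genes on the minus strand.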
8.
Parasitol Res ; 112(9): 3193-202, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23828188

ABSTRACT

Angiostrongylus cantonensis is an important zoonotic nematode. It is the causative agent of eosinophilic meningitis and eosinophilic meningoencephalitis in humans. However, information of this parasite at the genomic level is very limited. In the present study, the transcriptomic profiles of the fifth-stage larvae (L5) of A. cantonensis were investigated by next-generation sequencing (NGS). In the NGS database established from the larvae isolated from the brain of Sprague-Dawley rats, 31,487 unique genes with a mean length of 617 nucleotides were assembled. These genes were found to have a 46.08% significant similarity to Caenorhabditis elegans by BLASTx. They were then compared with the expressed sequence tags of 18 other nematodes, and significant matches of 36.09-59.12% were found. Among these genes, 3,338 were found to participate in 124 Kyoto Encyclopedia of Genes and Genomes pathways. These pathways included 1,514 metabolisms, 846 genetic information processing, 358 environmental information processing, 264 cellular processes, and 91 organismal systems. Analysis of 30,816 sequences with the gene ontology database indicated that their annotations included 5,656 biological processes (3,364 cellular processes, 3,061 developmental processes, and 3,191 multicellular organismal processes), 7,218 molecular functions (4,597 binding and 3,084 catalytic activities), and 4,719 cellular components (4,459 cell parts and 4,466 cells). Moreover, stress-related genes (112 heat stress and 33 oxidation stress) and genes for proteases (159) were not uncommon. This study is the first NGS-based study to set up a transcriptomic database of A. cantonensis L5. The results provide new insights into the survival, development, and host-parasite interactions of this blood-feeding nematode.


Subjects
Angiostrongylus cantonensis/genetics, Meningoencephalitis/parasitology, Strongylida Infections/parasitology, Transcriptome/genetics, Angiostrongylus cantonensis/cytology, Angiostrongylus cantonensis/growth & development, Animals, Complementary DNA/chemistry, Complementary DNA/genetics, Helminth DNA/genetics, Expressed Sequence Tags, Female, Gene Expression Profiling, High-Throughput Nucleotide Sequencing, Host-Parasite Interactions, Humans, Larva, Male, Molecular Sequence Annotation, Helminth RNA/genetics, Rats, Sprague-Dawley Rats, DNA Sequence Analysis, Sequence Homology, Zoonoses
9.
Hum Mutat ; 34(10): 1340-6, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23893859

ABSTRACT

Targeted sequencing using next-generation sequencing technologies is currently being rapidly adopted for clinical sequencing and cancer marker tests. However, no existing bioinformatics tool is available for the analysis and visualization of multiple targeted sequencing datasets. In the present study, we use cancer panel targeted sequencing datasets generated by the Life Technologies Ion Personal Genome Machine Sequencer as an example to illustrate how to develop an automated pipeline for the comparative analyses of multiple datasets. Cancer Panel Analysis Pipeline (CPAP) uses standard output files from variant calling software to generate a distribution map of SNPs among all of the samples in a circular diagram generated by Circos. The diagram is hyperlinked to a dynamic HTML table that allows the users to identify target SNPs by using different filters. CPAP also integrates additional information about the identified SNPs by linking to an integrated SQL database compiled from SNP-related databases, including dbSNP, 1000 Genomes Project, COSMIC, and dbNSFP. CPAP only takes 17 min to complete a comparative analysis of 500 datasets. CPAP not only provides an automated platform for the analysis of multiple cancer panel datasets but can also serve as a model for any customized targeted sequencing project.


Subjects
Tumor Biomarkers/genetics, Computational Biology/methods, Neoplasms/genetics, Software, Genetic Databases, Humans, Web Browser
10.
J Microbiol Immunol Infect ; 46(5): 366-73, 2013 Oct.
Article in English | MEDLINE | ID: mdl-22921107

ABSTRACT

BACKGROUND/PURPOSE(S): Trichomoniasis caused by Trichomonas vaginalis is the most common non-viral sexually transmitted infection. Morphological transformation from the trophozoite stage to the amoeboid or pseudocyst stage is crucial for T. vaginalis infection and survival. Protein phosphorylation is a key post-translational modification involved in the regulation of several biological processes in various prokaryotes and eukaryotes. More than 880 protein kinases have been identified in the T. vaginalis genome. However, little is known about the phosphorylation of specific proteins and the distribution of phosphorylated proteins in different stages of the morphological transformation of T. vaginalis. METHODS: To obtain a more comprehensive understanding of the T. vaginalis phosphoproteome, we analyzed phosphorylated proteins in the three morphological stages using titanium dioxide combined with LC-MS/MS. RESULTS: A total of 93 phosphopeptides originating from 82 unique proteins were identified. Among these proteins, 21 were detected in all stages, 29 were identified in two different stages, and 32 were stage specific. CONCLUSION: Identification of stage-specific phosphorylated proteins indicates that phosphorylation of these proteins may play a key role in the morphological transformation of T. vaginalis.


Subjects
Phosphoproteins/analysis, Proteome/analysis, Protozoan Proteins/analysis, Trichomonas vaginalis/chemistry, Liquid Chromatography, Gene Expression Profiling, Post-Translational Protein Processing, Tandem Mass Spectrometry, Trichomonas vaginalis/growth & development
11.
BMC Genomics ; 13 Suppl 7: S9, 2012.
Article in English | MEDLINE | ID: mdl-23281853

ABSTRACT

BACKGROUND: Recent developments in high-throughput sequencing (HTS) technologies have made it feasible to sequence the complete transcriptomes of non-model organisms or metatranscriptomes from environmental samples. The challenge after generating hundreds of millions of sequences is to annotate these transcripts and classify the transcripts based on their putative functions. Because many biological scientists lack the knowledge to install Linux-based software packages or maintain databases used for transcript annotation, we developed an automatic annotation tool with an easy-to-use interface. METHODS: To elucidate the potential functions of gene transcripts, we integrated well-established annotation tools: Blast2GO, PRIAM and RPS BLAST in a web-based service, FastAnnotator, which can assign Gene Ontology (GO) terms, Enzyme Commission numbers (EC numbers) and functional domains to query sequences. RESULTS: Using six transcriptome sequence datasets as examples, we demonstrated the ability of FastAnnotator to assign functional annotations. FastAnnotator annotated 88.1% and 81.3% of the transcripts from the well-studied organisms Caenorhabditis elegans and Streptococcus parasanguinis, respectively. Furthermore, FastAnnotator annotated 62.9%, 20.4%, 53.1% and 42.0% of the sequences from the transcriptomes of sweet potato, clam, amoeba, and Trichomonas vaginalis, respectively, which lack reference genomes. We demonstrated that FastAnnotator can complete the annotation process in a reasonable amount of time and is suitable for the annotation of transcriptomes from model organisms or organisms for which annotated reference genomes are not available. CONCLUSIONS: The sequencing process no longer represents the bottleneck in the study of genomics, and automatic annotation tools have become invaluable as the annotation procedure has become the limiting step. We present FastAnnotator, an automated annotation web tool designed to efficiently annotate sequences with their gene functions, enzyme functions or domains. FastAnnotator is useful in transcriptome studies and especially for those focusing on non-model organisms or metatranscriptomes. FastAnnotator does not require local installation and is freely available at http://fastannotator.cgu.edu.tw.


Subjects
Software, Transcriptome/genetics, Animals, Base Sequence, Caenorhabditis elegans/genetics, Genetic Databases, Genome, Bacterial Genome, Internet, Streptococcus/genetics, User-Computer Interface
12.
Nucleic Acids Res ; 34(Web Server issue): W95-8, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845117

ABSTRACT

The large number of experimentally determined protein 3D structures is a rich resource for studying protein function and evolution, and protein structure comparison (PSC) is a key method for such studies. When comparing two protein structures, almost all currently available PSC servers report a single and sequential (i.e. topological) alignment, whereas the existence of good alternative alignments, including those involving permutations (i.e. non-sequential or non-topological alignments), is well known. We have recently developed a novel PSC method that can detect alternative alignments of statistical significance (alignment similarity P-value < 10^-5), including structural permutations at all levels of complexity. OPAAS, the server for this PSC method, is freely accessible at our website (http://opaas.ibms.sinica.edu.tw) and provides an easy-to-read hierarchical layout of output to display detailed information on all of the significant alternative alignments detected. Because these alternative alignments can offer a more complete picture on the structural, evolutionary and functional relationship between two proteins, OPAAS can be used in structural bioinformatics research to gain additional insight that is not readily provided by existing PSC servers.


Subjects
Protein Secondary Structure, Software, Protein Structural Homology, Internet, User-Computer Interface