Results 1 - 11 of 11
1.
BMC Genomics ; 21(Suppl 3): 163, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32241255

ABSTRACT

BACKGROUND: DNA methylation is a crucial epigenomic mechanism in various biological processes. Using whole-genome bisulfite sequencing (WGBS) technology, methylated cytosine sites can be revealed at the single nucleotide level. However, the WGBS data analysis process is usually complicated and challenging. RESULTS: To alleviate the associated difficulties, we integrated the WGBS data processing steps and downstream analysis into a two-phase approach. First, we set up the required tools in Galaxy and developed workflows to calculate the methylation level from raw WGBS data and generate a methylation status summary, the mtable. This computation environment is wrapped into the Docker container image DocMethyl, which allows users to rapidly deploy an executable environment without tedious software installation and library dependency problems. Next, the mtable files were uploaded to the web server EpiMOLAS_web to link with the gene annotation databases that enable rapid data retrieval and analyses. CONCLUSION: To our knowledge, the EpiMOLAS framework, consisting of DocMethyl and EpiMOLAS_web, is the first approach to include containerization technology and a web-based system for WGBS data analysis from raw data processing to downstream analysis. EpiMOLAS will help users cope with their WGBS data and also conduct reproducible analyses of publicly available data, thereby gaining insights into the mechanisms underlying complex biological phenomena. The Galaxy Docker image DocMethyl is available at https://hub.docker.com/r/lsbnb/docmethyl/. EpiMOLAS_web is publicly accessible at http://symbiosis.iis.sinica.edu.tw/epimolas/.
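
The abstract does not say exactly how DocMethyl computes methylation levels. A common WGBS convention is that, at each cytosine, the level is the fraction of covering reads that still show C after bisulfite treatment. The sketch below illustrates that convention, assuming a hypothetical per-cytosine record (`CytosineCall`) that is not the actual mtable layout, plus an optional minimum read depth.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CytosineCall:
    """Bisulfite read counts at one cytosine (hypothetical layout, not the mtable format)."""
    chrom: str
    pos: int           # 0-based position on the chromosome
    methylated: int    # reads still showing C (protected from bisulfite conversion)
    unmethylated: int  # reads showing T (unmethylated C converted by bisulfite)


def methylation_level(call: CytosineCall, min_depth: int = 1) -> Optional[float]:
    """Fraction of covering reads that report a methylated C, or None if coverage is too low."""
    depth = call.methylated + call.unmethylated
    if depth < min_depth:
        return None
    return call.methylated / depth


def region_methylation(calls: Iterable[CytosineCall], chrom: str, start: int, end: int,
                       min_depth: int = 1) -> Optional[float]:
    """Read-weighted methylation level over [start, end) on one chromosome."""
    meth = unmeth = 0
    for call in calls:
        in_region = call.chrom == chrom and start <= call.pos < end
        if in_region and call.methylated + call.unmethylated >= min_depth:
            meth += call.methylated
            unmeth += call.unmethylated
    total = meth + unmeth
    return meth / total if total else None
```

For example, two CpGs on chr1 at positions 100 (3 methylated, 1 unmethylated reads) and 150 (1 and 1) give a region level of 4/6 over [0, 200). Weighting by reads rather than averaging per-site levels is one design choice among several; per-site averaging is also used in practice.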


Subject(s)
Computational Biology/methods , DNA Methylation/genetics , Genome, Human/genetics , Whole Genome Sequencing/methods , CpG Islands/genetics , Humans , Internet , Software
2.
BMC Genomics ; 19(Suppl 9): 238, 2019 Apr 18.
Article in English | MEDLINE | ID: mdl-30999844

ABSTRACT

BACKGROUND: With the rapid increase in genome sequencing projects for non-model organisms, numerous genome assemblies are currently in progress or available as drafts, but not made available as satisfactory, usable genomes. Data quality assessment of genome assemblies is gaining importance not only for people who perform the assembly/re-assembly processes, but also for those who attempt to use assemblies as maps in downstream analyses. Recent studies on quality control and quality evaluation/assessment of genome assemblies have focused on either quality control of reads before assembly or evaluation of the assemblies with respect to their contiguity and correctness. However, correctness assessment depends on a reference and is not applicable to de novo assembly projects. Hence, methods providing both pre-assembly and post-assembly quality assessment reports for examining the quality/correctness of de novo assemblies and of the input reads are worth developing. RESULTS: We present SQUAT, an efficient tool for both pre-assembly and post-assembly quality assessment of de novo genome assemblies. The pre-assembly module of SQUAT computes quality statistics of reads and presents the analysis in a well-designed interface that visualizes the distribution of high- and poor-quality reads in a portable HTML report. The post-assembly module of SQUAT provides read mapping analytics in an HTML format. We categorized reads into several groups, including uniquely mapped, multiply mapped, and unmapped reads; uniquely mapped reads were further categorized as perfectly matched, with substitutions, containing clips, and others. We carefully defined several groups of poorly mapped (PM) reads to prevent the underestimation of unmapped reads; indeed, a high PM% would be a sign of a poor assembly that requires researchers' attention for further examination or improvement before the assembly is used.
Finally, we evaluate SQUAT with six datasets, including the genome assemblies for eel, worm, mushroom, and three bacteria. The results show that SQUAT reports provide useful information with details for assessing the quality of assemblies and reads. AVAILABILITY: The SQUAT software with links to both its docker image and the on-line manual is freely available at https://github.com/luke831215/SQUAT .
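
The abstract lists SQUAT's mapping categories but not the exact rules behind them. A minimal sketch of such a classification, assuming each read is reduced to its number of reported alignments, its CIGAR string, and its SAM `NM` tag (edit distance to the reference). The precedence used here (clips checked before indels and substitutions) and the category names are assumptions, not SQUAT's documented behaviour.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_read(n_alignments: int, cigar: Optional[str], nm: int) -> str:
    """Assign a read to a mapping category (illustrative rules, not SQUAT's own)."""
    if n_alignments == 0 or cigar is None:
        return "unmapped"
    if n_alignments > 1:
        return "multiply_mapped"
    ops = {op for _, op in CIGAR_OP.findall(cigar)}
    if ops & {"S", "H"}:          # soft or hard clipping present
        return "unique_clipped"
    if ops & {"I", "D", "N"}:     # insertions, deletions or skipped regions
        return "unique_other"
    if nm == 0:                   # NM = edit distance to the reference
        return "unique_perfect"
    return "unique_substitutions"


def summarize(reads: Iterable[Tuple[int, Optional[str], int]]) -> Dict[str, float]:
    """Percentage of reads falling in each category."""
    counts = Counter(classify_read(*read) for read in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}
```

In a real pipeline these three values would come from a SAM/BAM parser; the tuples here keep the sketch free of external dependencies.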


Subject(s)
Data Accuracy , Genome , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Agaricales/genetics , Animals , Caenorhabditis elegans/genetics , Chromosome Mapping , Electrophorus/genetics , Quality Control
3.
BMC Genomics ; 16 Suppl 12: S9, 2015.
Article in English | MEDLINE | ID: mdl-26678408

ABSTRACT

BACKGROUND: Recent progress in next-generation sequencing technology has afforded several improvements such as ultra-high throughput at low cost, very high read quality, and substantially increased sequencing depth. State-of-the-art high-throughput sequencers, such as the Illumina MiSeq system, can generate ~15 Gbp sequencing data per run, with >80% bases above Q30 and a sequencing depth of up to several 1000x for small genomes. Illumina HiSeq 2500 is capable of generating up to 1 Tbp per run, with >80% bases above Q30 and often >100x sequencing depth for large genomes. To speed up otherwise time-consuming genome assembly and/or to obtain a skeleton of the assembly quickly for scaffolding or progressive assembly, methods for noise removal and reduction of redundancy in the original data, with almost equal or better assembly results, are worth studying. RESULTS: We developed two subset selection methods for single-end reads and a method for paired-end reads based on base quality scores and other read analytic tools using the MapReduce framework. We proposed two strategies to select reads: MinimalQ and ProductQ. MinimalQ selects reads with minimal base-quality above a threshold. ProductQ selects reads with probability of no incorrect base above a threshold. In the single-end experiments, we used Escherichia coli and Bacillus cereus datasets of MiSeq, Velvet assembler for genome assembly, and GAGE benchmark tools for result evaluation. In the paired-end experiments, we used the giant grouper (Epinephelus lanceolatus) dataset of HiSeq, ALLPATHS-LG genome assembler, and QUAST quality assessment tool for comparing genome assemblies of the original set and the subset. The results show that subset selection not only can speed up the genome assembly but also can produce substantially longer scaffolds. AVAILABILITY: The software is freely available at https://github.com/moneycat/QReadSelector.
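
The two selection rules are simple enough to state directly. A minimal single-machine sketch, assuming Phred+33 FASTQ quality strings and independent base errors; it does not reproduce QReadSelector's MapReduce implementation, and the inclusive `>=` comparison ("above a threshold" in the abstract) is an assumption.

```python
import math
from typing import Iterable, List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def prob_no_error(quality: str) -> float:
    """Probability that every base is correct: product of (1 - 10^(-Q/10)) over bases."""
    return math.prod(1.0 - 10 ** (-q / 10) for q in phred_scores(quality))


def minimal_q(quality: str, threshold: float) -> bool:
    """MinimalQ-style test: the lowest base quality must reach the threshold."""
    return min(phred_scores(quality)) >= threshold


def product_q(quality: str, threshold: float) -> bool:
    """ProductQ-style test: P(no incorrect base) must reach the threshold."""
    return prob_no_error(quality) >= threshold


def select_reads(records: Iterable[Tuple[str, str, str]], method: str,
                 threshold: float) -> List[Tuple[str, str, str]]:
    """Keep (name, sequence, quality) records that pass the chosen test."""
    tests = {"MinimalQ": minimal_q, "ProductQ": product_q}
    if method not in tests:
        raise ValueError(f"unknown method: {method}")
    keep = tests[method]
    return [rec for rec in records if keep(rec[2], threshold)]
```

MinimalQ rejects a read because of a single poor base, whereas ProductQ lets many high-quality bases compensate for one weaker base, so the two can select different subsets from the same data. Note that ProductQ's threshold is a probability (e.g. 0.99), while MinimalQ's is a Phred score (e.g. 30).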


Subject(s)
Computational Biology/methods , Contig Mapping/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Animals , Bacillus cereus/genetics , Escherichia coli/genetics , Genome Size , High-Throughput Nucleotide Sequencing/instrumentation , Perciformes/genetics , Sequence Analysis, DNA/instrumentation , Software
4.
PLoS One ; 9(6): e98146, 2014.
Article in English | MEDLINE | ID: mdl-24897343

ABSTRACT

BACKGROUND: Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce. RESULTS: We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard. CONCLUSIONS: CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. 
Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark. AVAILABILITY: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Algorithms
5.
Dev Comp Immunol ; 38(1): 78-87, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22564858

ABSTRACT

The members of the inhibitor of apoptosis protein (IAP) family are involved in the regulation of diverse cellular processes, including apoptosis, signal transduction and mitosis. Here, we report the cloning and characterization of three IAP genes from the Pacific white shrimp Litopenaeus vannamei: LvIAP1, LvIAP2 and LvSurvivin. LvIAP1, the orthologue of Penaeus monodon IAP (PmIAP), consists of three BIR domains and one RING domain; LvIAP2 consists of two BIR domains, and LvSurvivin has only one BIR domain. Expression profiling by absolute quantitative real-time RT-PCR revealed that, of the three IAP genes, LvIAP1 had the highest expression levels in almost all examined tissues and LvSurvivin had the lowest. Furthermore, among the examined tissues, all three genes were most strongly expressed in the lymphoid organ. When LvIAP1 expression was silenced by injection of its corresponding dsRNA, the shrimp died within 48 h after injection, whereas injection of the other two dsRNAs did not cause shrimp death. In LvIAP1-silenced shrimp, the number of circulating haemocytes decreased dramatically because of extensive apoptosis. This suggests that LvIAP1 is central to the regulation of shrimp haemocyte apoptosis.


Subject(s)
Arthropod Proteins/physiology , Inhibitor of Apoptosis Proteins/physiology , Penaeidae/physiology , Animals , Apoptosis , Arthropod Proteins/chemistry , Arthropod Proteins/genetics , Cloning, Molecular , Inhibitor of Apoptosis Proteins/chemistry , Inhibitor of Apoptosis Proteins/genetics , Protein Structure, Tertiary , RNA, Double-Stranded/metabolism , Sequence Analysis, DNA
6.
BMC Genomics ; 13 Suppl 7: S28, 2012.
Article in English | MEDLINE | ID: mdl-23282094

ABSTRACT

BACKGROUND: State-of-the-art high-throughput sequencers, e.g., the Illumina HiSeq series, generate sequencing reads longer than 150 bp, up to a total of 600 Gbp of data per run. These sequencers generate longer reads with greater sequencing depth than previous technologies. Two major challenges exist in using this high-throughput technology for de novo genome assembly. First, the amount of physical memory may be insufficient to store the data structure of the assembly algorithm, even on high-end multicore processors. Second, the graph-theoretical model used to capture intersection relationships of the reads may contain structural defects that are not well managed by existing assembly algorithms. RESULTS: We developed a distributed genome assembler based on string graphs and the MapReduce framework, called CloudBrush. The assembler includes a novel edge-adjustment algorithm that detects structural defects by examining the neighboring reads of a specific read for sequencing errors and adjusting the edges of the string graph if necessary. CloudBrush is evaluated against GAGE benchmarks to compare its assembly quality with that of other assemblers. The results show that our assemblies have a moderate N50 and a low misassembly rate in terms of misjoins and indels of > 5 bp. In addition, we introduced two measures, precision and recall, to assess how faithfully contigs align to the target genomes. Compared with the assembly tools used in the GAGE benchmarks, CloudBrush produces contigs with high precision and recall. We also verified the effectiveness of the edge-adjustment algorithm using simulated datasets and ran CloudBrush on a nematode dataset using a commercial cloud. The CloudBrush assembler is available at https://github.com/ice91/CloudBrush.
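
N50, one of the contiguity statistics quoted above, is the length L such that contigs of length L or longer together cover at least half of the total assembly length. A short sketch of that usual definition follows; benchmark suites such as GAGE may differ in details (for example a minimum contig length, or NG50, which uses the genome size instead of the assembly size).

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # cumulative length has reached half the assembly
            return length
    return 0
```

For contigs of lengths 2 through 10 (total 54), the three longest (10 + 9 + 8 = 27) reach half the total, so N50 is 8. Comparing `2 * running` with `total` avoids rounding issues when the total is odd.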


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Algorithms , Databases, Factual , Information Storage and Retrieval , Internet , User-Computer Interface
7.
ISRN Bioinform ; 2012: 139842, 2012.
Article in English | MEDLINE | ID: mdl-25969743

ABSTRACT

In this postgenomic era, a huge volume of information derived from expressed sequence tags (ESTs) has accumulated for the functional description of gene expression profiles. Comparative studies have become increasingly important to biologists. To facilitate these comparative studies, we have constructed a user-friendly EST annotation pipeline with comparison tools on an integrated EST service website, Bio301. Bio301 includes standard EST preprocessing, BLAST similarity search, gene ontology (GO) annotation, statistics reporting, a graphical GO browsing interface, and microarray probe selection tools. In addition, Bio301 provides statistical comparison of multiple EST libraries based on GO annotations for mining meaningful biological information.

8.
ISRN Bioinform ; 2012: 816402, 2012.
Article in English | MEDLINE | ID: mdl-25969752

ABSTRACT

Background. The emergence of next-generation sequencing platforms has given rise to a new generation of assembly algorithms. Compared with Sanger sequencing data, next-generation sequence data have shorter reads, higher coverage depth, and different error profiles. These features pose new challenges for de novo transcriptome assembly. Methodology. To explore the influence of these features on assembly algorithms, we studied the relationship between read overlap size, coverage depth, and error rate using simulated data. Based on this relationship, we propose a de novo transcriptome assembly procedure, called Euler-mix, and demonstrate its performance on a real mouse transcriptome dataset. The simulation tool and evaluation tool are freely available as open source. Significance. Euler-mix is a straightforward pipeline that focuses on handling the variation in coverage depth of short-read datasets. The experimental results showed that Euler-mix improves the performance of de novo transcriptome assembly.
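
The abstract says simulated data were used to relate overlap size, coverage depth and error rate, but does not describe the simulator. A minimal sketch of such a simulation, assuming uniformly placed reads with independent substitution errors only (no indels, no coverage bias); `simulate_reads` and `p_error_free_overlap` are hypothetical helpers, not the authors' released tool.

```python
import random
from typing import List


def simulate_reads(reference: str, read_len: int, depth: float,
                   error_rate: float, seed: int = 0) -> List[str]:
    """Sample reads uniformly from `reference` and add random substitution errors."""
    if read_len > len(reference):
        raise ValueError("read length exceeds reference length")
    rng = random.Random(seed)
    # Number of reads needed for the requested mean coverage depth.
    n_reads = round(depth * len(reference) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitute with one of the other three nucleotides.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(bases))
    return reads


def p_error_free_overlap(overlap: int, error_rate: float) -> float:
    """Chance that an overlap of `overlap` bases is error-free in both reads."""
    return (1.0 - error_rate) ** (2 * overlap)
```

Under these assumptions, a longer required overlap makes spurious overlaps less likely but also lowers the chance that a true overlap is error-free, and `p_error_free_overlap` quantifies the second effect. For example, with a 1% error rate, a 50-base overlap is error-free in both reads with probability 0.99^100, roughly 0.37.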

9.
Mar Biotechnol (NY) ; 13(4): 608-21, 2011 Aug.
Article in English | MEDLINE | ID: mdl-20401624

ABSTRACT

By economic value, shrimp is currently the most important seafood commodity worldwide, and these animals are often the subject of scientific research in shrimp farming countries. High throughput methods, such as expressed sequence tags (ESTs), were originally developed to study human genomics, but they are now available for studying other important organisms, including shrimp. ESTs are short sequences generated by sequencing randomly selected cDNA clones from a cDNA library. This is currently the most efficient and powerful method for providing transcriptomic data for organisms with an uncharacterized genome. This review will summarize the sixteen major shrimp EST studies that have been conducted to date. In addition, we analyzed the EST data downloaded from NCBI dbEST for the four major penaeid shrimp species and constructed a database to host all of these EST data as well as our own analysis results. This database provides the shrimp aquaculture research community with an outline of the shrimp transcriptome as well as a tool for shrimp gene identification.


Subject(s)
Databases, Genetic , Expressed Sequence Tags , Gene Expression Regulation/genetics , Immunity, Innate/genetics , Penaeidae/genetics , Transcriptome/genetics , Animals , Reproduction/genetics , Sex Determination Processes/genetics , Species Specificity
10.
Bioinformatics ; 22(17): 2180-2, 2006 Sep 01.
Article in English | MEDLINE | ID: mdl-16820423

ABSTRACT

UNLABELLED: In this article, we combined EST information from the UniGene database and orthologous relationships from the Ensembl database to construct the ZooDDD database. The primary function of ZooDDD is to mine evolutionarily conserved, highly expressed, tissue-specific orthologues in model animals. The candidate genes of interest derived from the ZooDDD database will provide biologists with a good starting point for comparing the expression, functions and evolution of animal genomes. AVAILABILITY: http://bio301.iis.sinica.edu.tw/~ZooDDDNew/main.php.


Subject(s)
Chromosome Mapping/methods , Databases, Genetic , Evolution, Molecular , Information Storage and Retrieval/methods , Software , Species Specificity , User-Computer Interface , Base Sequence , Computer Graphics , Conserved Sequence/genetics , Expressed Sequence Tags , Genome/genetics , Molecular Sequence Data , Sequence Homology, Nucleic Acid , Signal Processing, Computer-Assisted
11.
Bioinformatics ; 20(17): 3156-65, 2004 Nov 22.
Article in English | MEDLINE | ID: mdl-15217808

ABSTRACT

MOTIVATION: Synteny mapping, or detecting regions that are orthologous between two genomes, is a key step in studies of comparative genomics. For completely sequenced genomes, this is increasingly accomplished by whole-genome sequence alignment. However, such methods are computationally expensive, especially for large genomes, and require rather complicated post-processing procedures to filter out non-orthologous sequence matches. RESULTS: We have developed a novel method that does not require sequence alignment for synteny mapping of two large genomes, such as the human and mouse. In this method, the occurrence spectra of genome-wide unique 16mer sequences present in both the human and mouse genome are used to directly detect orthologous genomic segments. Being sequence alignment-free, the method is very fast and able to map the two mammalian genomes in one day of computing time on a single Pentium IV personal computer. The resulting human-mouse synteny map was shown to be in excellent agreement with those produced by the Mouse Genome Sequencing Consortium (MGSC) and by the Ensembl team; furthermore, the syntenic relationship of segments found only by our method was supported by BLASTZ sequence alignment.
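
The first step the abstract describes, collecting k-mers that are unique within each genome and present in both, can be sketched in a few lines. This is a simplified illustration, assuming in-memory strings and ignoring reverse complements and the occurrence-spectrum analysis itself; grouping the resulting anchors into syntenic segments (for example by chaining collinear anchors) is left out, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def unique_kmer_positions(seq: str, k: int) -> Dict[str, int]:
    """Map each k-mer that occurs exactly once in `seq` to its 0-based position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    positions = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if counts[kmer] == 1:
            positions[kmer] = i
    return positions


def shared_unique_anchors(genome_a: str, genome_b: str,
                          k: int = 16) -> List[Tuple[int, int]]:
    """(position in A, position in B) for k-mers unique in both genomes, sorted by A."""
    pos_a = unique_kmer_positions(genome_a, k)
    pos_b = unique_kmer_positions(genome_b, k)
    return sorted((pos_a[kmer], pos_b[kmer]) for kmer in pos_a.keys() & pos_b.keys())
```

With k = 4, `"AAAACCCCGGGGTTTT"` and `"TTAAAACCCCGGGGTTTTTT"` share 12 anchors that all lie on the diagonal `pos_b - pos_a = 2`; `TTTT` is dropped because it repeats in the second sequence. A run of anchors on a common diagonal is the kind of signal a synteny mapper groups into an orthologous segment. Whole mammalian genomes would need a far more memory-efficient k-mer index than a Python dictionary.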


Subject(s)
Algorithms , Chromosome Mapping/methods , Chromosomes, Human, Pair 16/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , User-Computer Interface , Animals , Conserved Sequence , Evolution, Molecular , Genome, Human , Genomic Islands/genetics , Humans , Mice , Species Specificity