Results 1 - 20 of 89
1.
Cell ; 184(4): 1047-1063.e23, 2021 02 18.
Article in English | MEDLINE | ID: mdl-33539780

ABSTRACT

DNA has not been utilized to record temporal information, although DNA has been used to record biological information and to compute mathematical problems. Here, we found that indel generation by Cas9 and guide RNA can occur at steady rates, in contrast to typical dynamic biological reactions, and the accumulated indel frequency can be a function of time. By measuring indel frequencies, we developed a method for recording and measuring absolute time periods over hours to weeks in mammalian cells. These time-recordings were conducted in several cell types, with different promoters and delivery vectors for Cas9, and in both cultured cells and cells of living mice. As applications, we recorded the duration of chemical exposure and the lengths of elapsed time since the onset of biological events (e.g., heat exposure and inflammation). We propose that our systems could serve as synthetic "DNA clocks."


Subject(s)
CRISPR-Associated Protein 9/metabolism , Animals , Base Sequence , Cellular Microenvironment , Computer Simulation , HEK293 Cells , Half-Life , Humans , INDEL Mutation/genetics , Inflammation/pathology , Integrases/metabolism , Male , Mice, Nude , Promoter Regions, Genetic/genetics , RNA, Guide, Kinetoplastida/genetics , Reproducibility of Results , Time Factors
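
The abstract of record 1 describes accumulated indel frequency as a function of time. One way to picture this is the sketch below, which assumes that each unedited target site acquires an indel at a constant rate k, so that the frequency follows f(t) = 1 - exp(-k t) and can be inverted to recover elapsed time. The first-order model, the rate value, and the function names are illustrative assumptions, not the authors' published model.

```python
import math


def indel_frequency(t_hours, rate_per_hour):
    """Expected indel frequency after t_hours under an assumed constant editing rate."""
    return 1.0 - math.exp(-rate_per_hour * t_hours)


def elapsed_time(frequency, rate_per_hour):
    """Invert the assumed model: recover elapsed hours from a measured indel frequency."""
    if not 0.0 <= frequency < 1.0:
        raise ValueError("frequency must be in [0, 1)")
    return -math.log(1.0 - frequency) / rate_per_hour


# Hypothetical calibration: 1% of the remaining unedited sites gain an indel per hour.
k = 0.01
measured = indel_frequency(48.0, k)
print(elapsed_time(measured, k))
```

In practice the rate would have to be calibrated per cell type, promoter, and guide RNA, which this sketch does not attempt.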
2.
Bioinformatics ; 38(3): 671-677, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34677573

ABSTRACT

MOTIVATION: MicroRNAs (miRNAs) play pivotal roles in gene expression regulation by binding to target sites of messenger RNAs (mRNAs). While identifying functional targets of miRNAs is of utmost importance, their prediction remains a great challenge. Previous computational algorithms have major limitations. They use conservative candidate target site (CTS) selection criteria mainly focusing on canonical site types, rely on laborious and time-consuming manual feature extraction, and do not fully capitalize on the information underlying miRNA-CTS interactions. RESULTS: In this article, we introduce TargetNet, a novel deep learning-based algorithm for functional miRNA target prediction. To address the limitations of previous approaches, TargetNet has three key components: (i) relaxed CTS selection criteria accommodating irregularities in the seed region, (ii) a novel miRNA-CTS sequence encoding scheme incorporating extended seed region alignments and (iii) a deep residual network-based prediction model. The proposed model was trained with miRNA-CTS pair datasets and evaluated with miRNA-mRNA pair datasets. TargetNet advances the previous state-of-the-art algorithms used in functional miRNA target classification. Furthermore, it demonstrates great potential for distinguishing high-functional miRNA targets. AVAILABILITY AND IMPLEMENTATION: The codes and pre-trained models are available at https://github.com/mswzeus/TargetNet.


Subject(s)
MicroRNAs , MicroRNAs/genetics , MicroRNAs/metabolism , Neural Networks, Computer , Algorithms , RNA, Messenger/genetics , Gene Expression Regulation , Computational Biology
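
To make the notion of candidate target site (CTS) selection in record 2 concrete, the sketch below scans an mRNA for windows complementary to miRNA nucleotides 2-8 (the seed) and optionally tolerates a few mismatches. This is a generic seed-matching illustration; the `max_mismatches` parameter and the function names are assumptions and do not reproduce TargetNet's actual selection criteria or encoding scheme.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, mrna, max_mismatches=0):
    """Return 0-based mRNA start positions that pair with miRNA nucleotides 2-8."""
    site = reverse_complement(mirna[1:8])  # mRNA sequence that would pair with the seed
    hits = []
    for i in range(len(mrna) - len(site) + 1):
        window = mrna[i:i + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# let-7a sequence against a toy mRNA fragment (the fragment is made up).
print(seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAGG"))
```

Raising `max_mismatches` widens the candidate set, which is the general trade-off behind relaxed selection criteria.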
3.
Bioinformatics ; 37(11): 1562-1570, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-29474530

ABSTRACT

MOTIVATION: Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. A typical metagenomic sequencing produces a large amount of data (often in the order of terabytes or more), and computational tools are indispensable for efficient processing. In particular, error correction in metagenomics is crucial for accurate and robust genetic cataloging of microbial communities. However, many existing error-correction tools take a prohibitively long time and often bottleneck the whole analysis pipeline. RESULTS: To overcome this computational hurdle, we analyzed and exploited the data-level parallelism that exists in the error-correction procedure and proposed a tool named MUGAN that exploits both multi-core central processing units and multiple graphics processing units for co-processing. According to the experimental results, our approach reduced not only the time demand for denoising amplicons from approximately 59 h to only 46 min, but also the overestimation of the number of OTUs, estimating 6.7 times less species-level OTUs than the baseline. In addition, our approach provides web-based intuitive visualization of results. Given its efficiency and convenience, we anticipate that our approach would greatly facilitate denoising efforts in metagenomics studies. AVAILABILITY AND IMPLEMENTATION: http://data.snu.ac.kr/pub/mugan. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Methods ; 179: 65-72, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32445695

ABSTRACT

Drug metabolism is determined by the biochemical and physiological properties of the drug molecule. To improve the performance of a drug property prediction model, it is important to extract complex molecular dynamics from limited data. Recent machine learning or deep learning based models have employed the atom- and bond-type information, as well as the structural information, to predict drug properties. However, many of these methods can be used only for graph representations. The message passing neural network (MPNN) (Gilmer et al., 2017) is a framework used to learn both local and global features from irregularly formed data, and is invariant to permutations. This network performs an iterative message passing (MP) operation on each object and its neighbors, and obtains the final output from all messages regardless of their order. In this study, we applied the MP-based attention network (Nikolentzos et al., 2019), originally developed for text learning, to perform chemical classification tasks. Before training, we tokenized the characters and obtained embeddings of each molecular sequence. We conducted various experiments to maximize the predictivity of the model. We trained and evaluated our model using various chemical classification benchmark tasks. Our results are comparable to or better than those of previous state-of-the-art and baseline models. To the best of our knowledge, this is the first attempt to learn chemical strings using an MP-based algorithm. We will extend our work to more complex tasks such as regression or generation tasks in the future.


Subject(s)
Cheminformatics/methods , Chemistry, Pharmaceutical/methods , Deep Learning , Pharmacology, Clinical/methods , Forecasting/methods , Humans
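
The iterative message-passing operation described in record 4 can be sketched as follows. Each node repeatedly adds the sum of its neighbors' feature vectors to its own, and the final readout sums over all nodes; because sums ignore ordering, the result does not depend on how nodes or neighbors are listed. This is a minimal illustration with fixed (not learned) message and update functions, and every name in it is hypothetical; it is not the attention network used in the paper.

```python
def message_passing(features, neighbors, rounds=2):
    """features: dict node -> list of floats; neighbors: dict node -> list of nodes."""
    h = {node: list(vec) for node, vec in features.items()}
    for _ in range(rounds):
        new_h = {}
        for node in h:
            # Sum aggregation: the order in which neighbors are visited does not matter.
            msg = [sum(h[u][k] for u in neighbors[node]) for k in range(len(h[node]))]
            new_h[node] = [own + m for own, m in zip(h[node], msg)]
        h = new_h
    # Readout: sum over all nodes, again independent of ordering.
    dim = len(next(iter(h.values())))
    return [sum(h[node][k] for node in h) for k in range(dim)]


# Toy path graph a - b - c with two-dimensional node features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(message_passing(features, neighbors))
```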
5.
BMC Bioinformatics ; 20(1): 521, 2019 Oct 26.
Article in English | MEDLINE | ID: mdl-31655545

ABSTRACT

BACKGROUND: Quantitative structure-activity relationship (QSAR) is a computational modeling method for revealing relationships between structural properties of chemical compounds and biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based machine learning approaches have been used to overcome constraints and obtain reliable predictions. Ensemble learning builds a set of diversified models and combines them. However, the most prevalent approach, random forest, and other ensemble approaches in QSAR prediction limit their model diversity to a single subject. RESULTS: The proposed ensemble method consistently outperformed thirteen individual models on 19 bioassay datasets and demonstrated superiority over other ensemble approaches that are limited to a single subject. The comprehensive ensemble method is publicly available at http://data.snu.ac.kr/QSAR/. CONCLUSIONS: We propose a comprehensive ensemble method that builds multi-subject diversified models and combines them through second-level meta-learning. In addition, we propose an end-to-end neural network-based individual classifier that can automatically extract sequential features from a simplified molecular-input line-entry system (SMILES). The proposed individual model did not show impressive results as a single model, but it was considered the most important predictor when combined, according to the interpretation of the meta-learning.


Subject(s)
Quantitative Structure-Activity Relationship , Drug Discovery/methods , Machine Learning
6.
Brief Bioinform ; 18(5): 851-869, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27473064

ABSTRACT

In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.


Subject(s)
Machine Learning , Computational Biology , Humans , Neural Networks, Computer
7.
Bioinformatics ; 34(22): 3889-3897, 2018 11 15.
Article in English | MEDLINE | ID: mdl-29850775

ABSTRACT

Motivation: Long non-coding RNAs (lncRNAs) are important regulatory elements in biological processes. LncRNAs share similar sequence characteristics with messenger RNAs, but they play completely different roles, thus providing novel insights for biological studies. The development of next-generation sequencing has helped in the discovery of lncRNA transcripts. However, the experimental verification of numerous transcriptomes is time consuming and costly. To alleviate these issues, a computational approach is needed to distinguish lncRNAs from the transcriptomes. Results: We present a deep learning-based approach, lncRNAnet, to identify lncRNAs that incorporates recurrent neural networks for RNA sequence modeling and convolutional neural networks for detecting stop codons to obtain an open reading frame indicator. lncRNAnet performed clearly better than the other tools for sequences of short lengths, on which most lncRNAs are distributed. In addition, lncRNAnet successfully learned features and showed 7.83%, 5.76%, 5.30% and 3.78% improvements over the alternatives on a human test set in terms of specificity, accuracy, F1-score and area under the curve, respectively. Availability and implementation: Data and codes are available in http://data.snu.ac.kr/pub/lncRNAnet.


Subject(s)
Deep Learning , RNA, Long Noncoding/genetics , Databases, Genetic , High-Throughput Nucleotide Sequencing , Humans , Open Reading Frames
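
Record 7 mentions detecting stop codons to obtain an open reading frame (ORF) indicator. A rule-based version of that idea is sketched below: it scans the three forward reading frames for the longest AUG-to-stop stretch and marks the covered nucleotides with 1. lncRNAnet itself uses a convolutional network for this step; the scan here, and the function names, are illustrative assumptions only.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_orf(rna):
    """Return (start, end) of the longest AUG...stop ORF in the forward frames, or None."""
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)  # end index is exclusive and includes the stop
                start = None
    return best


def orf_indicator(rna):
    """Per-nucleotide 0/1 vector marking positions inside the longest ORF."""
    span = longest_orf(rna)
    return [1 if span and span[0] <= i < span[1] else 0 for i in range(len(rna))]


print(orf_indicator("GGAUGAAAUGACC"))
```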
8.
Acta Derm Venereol ; 99(3): 284-290, 2019 Mar 01.
Article in English | MEDLINE | ID: mdl-30460369

ABSTRACT

The aim of this study was to evaluate changes in the skin surface microbiome in patients with atopic dermatitis during treatment. The effect of narrowband ultraviolet B phototherapy was also studied to determine the influence of exposure to ultraviolet. A total of 18 patients with atopic dermatitis were included in the study. Patients were divided into 2 groups based on treatment: 1 group treated with narrowband ultraviolet B phototherapy and topical corticosteroid, and the other group treated with topical corticosteroid only. Skin swabs and high-throughput sequencing of 16S ribosomal RNA bacterial genes were performed at 3 time-points. The microbial diversity of lesional skin increased greatly after treatment. The proportion of Staphylococcus aureus showed a significant positive correlation with eczema severity. In conclusion, a drastic increase in microbial diversity and decrease in S. aureus proportion were observed with eczema treatment. Narrowband ultraviolet B treatment did not exert additive effects on eczema improvement; however, it appeared to reduce the recurrence of eczema.


Subject(s)
Adrenal Cortex Hormones/administration & dosage , Dermatitis, Atopic/therapy , Microbiota/drug effects , Microbiota/radiation effects , Skin/drug effects , Skin/radiation effects , Staphylococcus aureus/drug effects , Staphylococcus aureus/radiation effects , Ultraviolet Therapy , Administration, Cutaneous , Adolescent , Adrenal Cortex Hormones/adverse effects , Adult , Child , Child, Preschool , Dermatitis, Atopic/diagnosis , Dermatitis, Atopic/microbiology , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Recurrence , Ribotyping , Seoul , Skin/microbiology , Staphylococcus aureus/genetics , Time Factors , Treatment Outcome , Ultraviolet Therapy/adverse effects , Young Adult
9.
BMC Bioinformatics ; 19(1): 170, 2018 05 11.
Article in English | MEDLINE | ID: mdl-29751737

ABSTRACT

After publication of the original article [1], it has been found that the author affiliations have been accidentally left out in the PDF. The full affiliations can be found in this correction.

10.
BMC Bioinformatics ; 19(Suppl 1): 44, 2018 02 19.
Article in English | MEDLINE | ID: mdl-29504903

ABSTRACT

BACKGROUND: DNA damage causes aging, cancer, and other serious diseases. The comet assay can detect multiple types of DNA lesions with high sensitivity, and it has been widely applied. Although comet assay platforms have improved the limited throughput and reproducibility of traditional assays in recent times, analyzing large quantities of comet data often requires a tremendous human effort. To overcome this challenge, we proposed HiComet, a computational tool that can rapidly recognize and characterize a large number of comets, using little user intervention. RESULTS: We tested HiComet with real data from 35 high-throughput comet assay experiments, with over 700 comets in total. The proposed method provided unprecedented levels of performance as an automated comet recognition tool in terms of robustness (measured by precision and recall) and throughput. CONCLUSIONS: HiComet is an automated tool for high-throughput comet-assay analysis and could significantly facilitate characterization of individual comets by accelerating its most rate-limiting step. An online implementation of HiComet is freely available at https://github.com/taehoonlee/HiComet/ .


Subject(s)
Comet Assay/methods , DNA Damage , Software , Algorithms , Image Processing, Computer-Assisted
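
Record 10 reports robustness in terms of precision and recall. For readers unfamiliar with the two measures, the sketch below computes them from detection counts. The counts in the example are invented for illustration and are not results from the HiComet paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Returns 0.0 for empty denominators."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical run: 90 comets correctly found, 10 spurious detections, 30 comets missed.
print(precision_recall(90, 10, 30))
```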
11.
Brief Bioinform ; 17(4): 713-27, 2016 07.
Article in English | MEDLINE | ID: mdl-26330577

ABSTRACT

A wide variety of large-scale data have been produced in bioinformatics. In response, the need for efficient handling of biomedical big data has been partly met by parallel computing. However, the time demand of many bioinformatics programs still remains high for large-scale practical uses because of factors that hinder acceleration by parallelization. Recently, new generations of storage devices have emerged, such as NAND flash-based solid-state drives (SSDs), and with the renewed interest in near-data processing, they are increasingly becoming acceleration methods that can accompany parallel processing. In certain cases, a simple drop-in replacement of hard disk drives by SSDs results in dramatic speedup. Despite the various advantages and continuous cost reduction of SSDs, there has been little review of SSD-based profiling and performance exploration of important but time-consuming bioinformatics programs. For an informative review, we perform in-depth profiling and analysis of 23 key bioinformatics programs using multiple types of devices. Based on the insight we obtain from this research, we further discuss issues related to designing and optimizing bioinformatics algorithms and pipelines to fully exploit SSDs. The programs we profile cover traditional and emerging areas of importance, such as alignment, assembly, mapping, expression analysis, variant calling and metagenomics. We explain how acceleration by parallelization can be combined with SSDs for improved performance and also how using SSDs can expedite important bioinformatics pipelines, such as variant calling by the Genome Analysis Toolkit and transcriptome analysis using RNA sequencing. We hope that this review can provide useful directions and tips to accompany future bioinformatics algorithm design procedures that properly consider new generations of powerful storage devices.


Subject(s)
Computational Biology , Algorithms , Gene Expression Profiling
12.
Methods ; 129: 33-40, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28323040

ABSTRACT

A "miRNA sponge" is an artificial oligonucleotide-based miRNA inhibitor containing multiple binding sites for a specific miRNA. Each miRNA sponge can bind and sequester several miRNA copies, thereby decreasing the cellular levels of the target miRNA. In addition to developing artificial miRNA sponges, scientists have sought endogenous RNA transcripts and found that long non-coding RNAs, competing endogenous RNAs, pseudogenes, circular RNAs, and coding RNAs could act as miRNA sponges under precise conditions. Here we present a computational approach for the prediction of endogenous human miRNA sponge candidates targeting viral miRNAs derived from pathogenic human viruses. Viral miRNA binding sites were predicted using a newly-developed machine learning-based method, and candidate interactions between miRNAs and sponge RNAs were experimentally validated using luciferase reporter assay, western blot analysis, and flow cytometry. We found that BX649188.1 functions as a potential natural miRNA sponge against kshv-miR-K12-7-3p.


Subject(s)
MicroRNAs/genetics , RNA, Long Noncoding/genetics , RNA, Viral/genetics , RNA/genetics , Binding Sites , Humans , Machine Learning , MicroRNAs/isolation & purification , Oligonucleotides/genetics , RNA/isolation & purification , RNA, Circular , RNA, Viral/isolation & purification
13.
Methods ; 129: 50-59, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28813689

ABSTRACT

From May to July 2015, there was a nation-wide outbreak of Middle East respiratory syndrome (MERS) in Korea. MERS is caused by MERS-CoV, an enveloped, positive-sense, single-stranded RNA virus belonging to the family Coronaviridae. Despite expert opinions that the danger of MERS might be exaggerated, there was an overreaction by the public according to the Korean mass media, which led to a noticeable reduction in social and economic activities during the outbreak. To explain this phenomenon, we presumed that machine learning-based analysis of media outlets would be helpful and collected a number of Korean mass media articles and short-text comments produced during the 10-week outbreak. To process and analyze the collected data (over 86 million words in total) effectively, we created a methodology composed of machine-learning and information-theoretic approaches. Our proposal included techniques for extracting emotions from emoticons and Internet slang, which allowed us to significantly (approximately 73%) increase the number of emotion-bearing texts needed for robust sentiment analysis of social media. As a result, we discovered a plausible explanation for the public overreaction to MERS in terms of the interplay between the disease, mass media, and public emotions.


Subject(s)
Coronavirus Infections/epidemiology , Disease Outbreaks , Machine Learning , Mass Media , Coronavirus Infections/virology , Humans , Middle East Respiratory Syndrome Coronavirus/pathogenicity , Republic of Korea
14.
BMC Geriatr ; 18(1): 234, 2018 10 03.
Article in English | MEDLINE | ID: mdl-30285646

ABSTRACT

BACKGROUND: The conventional scores of the neuropsychological batteries are not fully optimized for diagnosing dementia despite their variety and abundance of information. To achieve low-cost, high-accuracy diagnostic performance for dementia using a neuropsychological battery, a novel framework is proposed using the response profiles of 2666 cognitively normal elderly individuals and 435 dementia patients who have participated in the Korean Longitudinal Study on Cognitive Aging and Dementia (KLOSCAD). METHODS: The key idea of the proposed framework is a cost-effective and precise two-stage classification procedure that employs the Mini Mental Status Examination (MMSE) as a screening test and the KLOSCAD Neuropsychological Assessment Battery (KLOSCAD-N) as a diagnostic test using deep learning. In addition, an evaluation procedure of redundant variables is introduced to prevent performance degradation. A missing data imputation method is also presented to increase the robustness by recovering information loss. The proposed deep neural network (DNN) architecture for the classification is validated through rigorous evaluation in comparison with various classifiers. RESULTS: The k-nearest-neighbor imputation has been induced according to the proposed framework, and the proposed DNNs for two-stage classification show the best accuracy compared to the other classifiers. Also, 49 redundant variables were removed, which improved diagnostic performance and suggested the potential of simplifying the assessment. Using this two-stage framework, we obtained 8.06% higher diagnostic accuracy for dementia than with MMSE alone and 64.13% lower cost than with KLOSCAD-N alone. CONCLUSION: The proposed framework could be applied to general dementia early detection programs to improve robustness, preciseness, and cost-effectiveness.


Subject(s)
Cost-Benefit Analysis/methods , Deep Learning/economics , Dementia/diagnosis , Dementia/economics , Neuropsychological Tests , Aged , Aged, 80 and over , Alzheimer Disease/diagnosis , Alzheimer Disease/economics , Alzheimer Disease/psychology , Cognition/physiology , Cognitive Aging/physiology , Cognitive Aging/psychology , Cohort Studies , Dementia/psychology , Female , Humans , Longitudinal Studies , Male , Middle Aged , Republic of Korea/epidemiology
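
The two-stage procedure in record 14 (a cheap screening test followed by a costly diagnostic battery only for flagged subjects) can be pictured with the control-flow sketch below. The cutoff value, the stand-in classifier, and all names are hypothetical; the paper uses deep neural networks at the classification stages, which this sketch does not implement.

```python
def two_stage_diagnosis(subjects, screen_cutoff, full_battery):
    """subjects: list of dicts with 'id' and 'mmse'; full_battery: callable returning True for dementia."""
    results = {}
    batteries_run = 0
    for subject in subjects:
        if subject["mmse"] > screen_cutoff:
            # Stage 1: screened out as cognitively normal, no further cost incurred.
            results[subject["id"]] = False
        else:
            # Stage 2: only flagged subjects receive the expensive full battery.
            batteries_run += 1
            results[subject["id"]] = full_battery(subject)
    return results, batteries_run


# Hypothetical cohort, cutoff, and stand-in second-stage rule.
cohort = [{"id": "a", "mmse": 29}, {"id": "b", "mmse": 20}, {"id": "c", "mmse": 23}]
print(two_stage_diagnosis(cohort, 23, lambda s: s["mmse"] < 22))
```

The cost saving comes from `batteries_run` being smaller than the cohort size; how much smaller depends on the screening cutoff.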
15.
Proc Natl Acad Sci U S A ; 111(6): 2122-7, 2014 Feb 11.
Article in English | MEDLINE | ID: mdl-24469816

ABSTRACT

Self-assembling RNA molecules present compelling substrates for the rational interrogation and control of living systems. However, imperfect in silico models (even at the secondary structure level) hinder the design of new RNAs that function properly when synthesized. Here, we present a unique and potentially general approach to such empirical problems: the Massive Open Laboratory. The EteRNA project connects 37,000 enthusiasts to RNA design puzzles through an online interface. Uniquely, EteRNA participants not only manipulate simulated molecules but also control a remote experimental pipeline for high-throughput RNA synthesis and structure mapping. We show herein that the EteRNA community leveraged dozens of cycles of continuous wet laboratory feedback to learn strategies for solving in vitro RNA design problems on which automated methods fail. The top strategies, including several previously unrecognized negative design rules, were distilled by machine learning into an algorithm, EteRNABot. Over a rigorous 1-year testing phase, both the EteRNA community and EteRNABot significantly outperformed prior algorithms in a dozen RNA secondary structure design tests, including the creation of dendrimer-like structures and scaffolds for small molecule sensors. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.


Subject(s)
Laboratories/organization & administration , RNA/chemistry , Algorithms , Nucleic Acid Conformation , Software , User-Computer Interface
16.
Bioinformatics ; 31(17): 2808-15, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-25943472

ABSTRACT

MOTIVATION: Capillary electrophoresis (CE) is a powerful approach for structural analysis of nucleic acids, with recent high-throughput variants enabling three-dimensional RNA modeling and the discovery of new rules for RNA structure design. Among the steps composing CE analysis, the process of finding each band in an electrophoretic trace and mapping it to a position in the nucleic acid sequence has required significant manual inspection and remains the most time-consuming and error-prone step. The few available tools seeking to automate this band annotation have achieved limited accuracy and have not taken advantage of information across dozens of profiles routinely acquired in high-throughput measurements. RESULTS: We present a dynamic-programming-based approach to automate band annotation for high-throughput capillary electrophoresis. The approach is uniquely able to define and optimize a robust target function that takes into account multiple CE profiles (sequencing ladders, different chemical probes, different mutants) collected for the RNA. Over a large benchmark of multi-profile datasets for biological RNAs and designed RNAs from the EteRNA project, the method outperforms prior tools (QuSHAPE and FAST) significantly in terms of accuracy compared with gold-standard manual annotations. The amount of computation required is reasonable at a few seconds per dataset. We also introduce an 'E-score' metric to automatically assess the reliability of the band annotation and show it to be practically useful in flagging uncertainties in band annotation for further inspection. AVAILABILITY AND IMPLEMENTATION: The implementation of the proposed algorithm is included in the HiTRACE software, freely available as an online server and for download at http://hitrace.stanford.edu. CONTACT: sryoon@snu.ac.kr or rhiju@stanford.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology/methods , Electrophoresis, Capillary/methods , RNA/chemistry , Sequence Analysis, RNA/methods , Software , Humans , Reproducibility of Results
17.
Methods ; 69(3): 220-9, 2014 Oct 01.
Article in English | MEDLINE | ID: mdl-25088780

ABSTRACT

MicroRNAs (miRNAs) regulate the function of their target genes by down-regulating gene expression, participating in various biological processes. Since the discovery of the first miRNA, computational tools have been essential to predict targets of given miRNAs that can be biologically verified. The precise mechanism underlying miRNA-mRNA interaction has not yet been elucidated completely, and it is still difficult to predict miRNA targets computationally in a robust fashion, despite the large number of in silico prediction methodologies in existence. Because of this limitation, different target prediction tools often report different (and occasionally conflicting) sets of targets. Therefore, we propose a novel target prediction methodology called stacking-based miRNA interaction learner ensemble (SMILE) that employs the concept of stacked generalization (stacking), which is a type of ensemble learning that integrates the outcomes of individual prediction tools with the aim of surpassing the performance of the individual tools. We tested the proposed SMILE method on human miRNA-mRNA interaction data derived from public databases. In our experiments, SMILE improved the accuracy of the target prediction significantly in terms of the area under the receiver operating characteristic curve. Any new target prediction tool can easily be incorporated into the proposed methodology as a component learner, and we anticipate that SMILE will provide a flexible and effective framework for elucidating in vivo miRNA-mRNA interaction.


Subject(s)
MicroRNAs/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Algorithms , Computer Simulation , Humans
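
Stacked generalization, as named in record 17, trains a second-level model on the outputs of individual predictors. The sketch below fits a small logistic-regression meta-learner by batch gradient descent on scores from three base tools. The scores and labels are invented, and the choice of logistic regression is an assumption; SMILE's actual component tools and meta-learner are not reproduced here. For brevity the meta-learner is fit on in-sample scores, whereas stacking is normally trained on held-out base predictions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, scores):
    """Combined probability from a bias term plus one weight per base tool."""
    return sigmoid(weights[0] + sum(w * s for w, s in zip(weights[1:], scores)))


def log_loss(weights, rows, labels):
    """Mean negative log-likelihood of the labels under the meta-learner."""
    total = 0.0
    for scores, y in zip(rows, labels):
        p = predict(weights, scores)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def fit_meta_learner(rows, labels, lr=0.5, epochs=200):
    """Logistic regression over base-tool scores, trained by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for scores, y in zip(rows, labels):
            err = predict(weights, scores) - y
            grad[0] += err
            for j, s in enumerate(scores):
                grad[j + 1] += err * s
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical scores from three base prediction tools for six miRNA-mRNA pairs.
base_scores = [
    [0.9, 0.7, 0.8], [0.8, 0.4, 0.9], [0.6, 0.9, 0.7],
    [0.3, 0.2, 0.4], [0.5, 0.1, 0.2], [0.2, 0.6, 0.1],
]
labels = [1, 1, 1, 0, 0, 0]
weights = fit_meta_learner(base_scores, labels)
print([round(predict(weights, row), 3) for row in base_scores])
```

Adding another base tool only means adding one more column to `base_scores`, which mirrors the abstract's point that new tools can be incorporated as component learners.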
18.
Nucleic Acids Res ; 41(Web Server issue): W492-8, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23761448

ABSTRACT

To facilitate the analysis of large-scale high-throughput capillary electrophoresis data, we previously proposed a suite of efficient analysis software named HiTRACE (High Throughput Robust Analysis of Capillary Electrophoresis). HiTRACE has been used extensively for quantitating data from RNA and DNA structure mapping experiments, including mutate-and-map contact inference, chromatin footprinting, the Eterna RNA design project and other high-throughput applications. However, HiTRACE is based on a suite of command-line MATLAB scripts that requires nontrivial efforts to learn, use and extend. Here, we present HiTRACE-Web, an online version of HiTRACE that includes standard features previously available in the command-line version and additional features such as automated band annotation and flexible adjustment of annotations, all via a user-friendly environment. By making use of parallelization, the on-line workflow is also faster than software implementations available to most users on their local computers. Free access: http://hitrace.org.


Subject(s)
DNA/chemistry , Electrophoresis, Capillary/methods , RNA/chemistry , Software , Internet
19.
BMC Bioinformatics ; 15 Suppl 9: S10, 2014.
Article in English | MEDLINE | ID: mdl-25252785

ABSTRACT

Merging the forward and reverse reads from paired-end sequencing is a critical task that can significantly improve the performance of downstream tasks, such as genome assembly and mapping, by providing them with virtually elongated reads. However, due to the inherent limitations of most paired-end sequencers, the chance of observing erroneous bases grows rapidly as the end of a read is approached, which becomes a critical hurdle for accurately merging paired-end reads. Although there exist several sophisticated approaches to this problem, their performance in terms of quality of merging often remains unsatisfactory. To address this issue, here we present a context-aware scheme for paired-end reads (CASPER): a computational method to rapidly and robustly merge overlapping paired-end reads. Being particularly well suited to amplicon sequencing applications, CASPER is thoroughly tested with both simulated and real high-throughput amplicon sequencing data. According to our experimental results, CASPER significantly outperforms existing state-of-the-art paired-end merging tools in terms of accuracy and robustness. CASPER also exploits the parallelism in the task of paired-end merging and effectively speeds up by multithreading. CASPER is freely available for academic use at http://best.snu.ac.kr/casper.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Sequence Analysis, DNA/methods
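
The basic task behind record 19, merging an overlapping read pair into one longer read, is sketched below in its simplest form: reverse-complement the reverse read, then look for the longest exact overlap between the end of the forward read and the start of that sequence. This naive version ignores sequencing errors and quality scores, so it is not CASPER's context-aware scheme; the `min_overlap` parameter and the function names are assumptions.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=4):
    """Merge a read pair by the longest exact suffix/prefix overlap; None if no overlap is found."""
    rc = reverse_complement(reverse)
    for k in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    return None


# Toy fragment sequenced from both ends (the sequence is made up).
fragment = "ACGTTGCAAGGCTTACG"
forward_read = fragment[:12]
reverse_read = reverse_complement(fragment[-10:])
print(merge_pair(forward_read, reverse_read))
```

Erroneous bases near read ends, which the abstract identifies as the main hurdle, would make the exact comparison fail; handling them is where the real methods differ.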
20.
Biochemistry ; 53(19): 3063-5, 2014 May 20.
Article in English | MEDLINE | ID: mdl-24766159

ABSTRACT

Chemical mapping experiments offer powerful information about RNA structure but currently involve ad hoc assumptions in data processing. We show that simple dilutions, referencing standards (GAGUA hairpins), and HiTRACE/MAPseeker analysis allow rigorous overmodification correction, background subtraction, and normalization for electrophoretic data and a ligation bias correction needed for accurate deep sequencing data. Comparisons across six noncoding RNAs stringently test the proposed standardization of dimethyl sulfate (DMS), 2'-OH acylation (SHAPE), and carbodiimide measurements. Identification of new signatures for extrahelical bulges and DMS "hot spot" pockets (including tRNA A58, methylated in vivo) illustrates the utility and necessity of standardization for quantitative RNA mapping.


Subject(s)
Nucleic Acid Conformation , RNA/chemistry , Sulfuric Acid Esters/chemistry , Acylation