Results 1 - 13 of 13
1.
ACS Omega ; 8(42): 39759-39769, 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37901490

ABSTRACT

In recent years, molecular representation learning has emerged as a key area of focus in various chemical tasks. However, many existing models fail to fully consider the geometric information of molecular structures, resulting in less intuitive representations. Moreover, the widely used message passing mechanism is limited in its ability to provide interpretations of experimental results from a chemical perspective. To address these challenges, we introduce a novel transformer-based framework for molecular representation learning, named the geometry-aware transformer (GeoT). The GeoT learns molecular graph structures through attention-based mechanisms specifically designed to offer reliable interpretability as well as molecular property prediction. Consequently, the GeoT can generate attention maps of the interatomic relationships associated with training objectives. In addition, the GeoT demonstrates performance comparable to that of MPNN-based models while achieving reduced computational complexity. Our comprehensive experiments, including an empirical simulation, reveal that the GeoT effectively learns chemical insights into molecular structures, bridging the gap between artificial intelligence and molecular sciences.

2.
ACS Omega ; 7(5): 4234-4244, 2022 Feb 08.
Article in English | MEDLINE | ID: mdl-35155916

ABSTRACT

A molecule is a complex of heterogeneous components, and the spatial arrangements of these components determine the overall properties and characteristics of the molecule. With the advent of deep learning in computational chemistry, several studies have focused on how to predict molecular properties based on molecular configurations. A message-passing neural network provides an effective framework for capturing molecular geometric features by treating a molecule as a graph. However, most of these studies assumed that all heterogeneous molecular features, such as atomic charge, bond length, or other geometric features, always contribute equivalently to the target prediction, regardless of the task type. In this study, we propose a dual-branched neural network for molecular property prediction based on both the message-passing framework and standard multilayer perceptron neural networks. Our model learns heterogeneous molecular features with different scales, which are trained flexibly according to each prediction target. In addition, we introduce a discrete branch to learn single-atom features without local aggregation, apart from message-passing steps. We verify that this novel structure can improve the model performance. The proposed model outperforms other recent models with sparser representations. Our experimental results indicate that, in the chemical property prediction tasks, the diverse chemical nature of targets should be carefully considered for both model performance and generalizability. Finally, we provide an intuitive analysis relating the experimental results to the chemical meaning of the target.

3.
Bioinformatics ; 38(3): 671-677, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34677573

ABSTRACT

MOTIVATION: MicroRNAs (miRNAs) play pivotal roles in gene expression regulation by binding to target sites of messenger RNAs (mRNAs). While identifying functional targets of miRNAs is of utmost importance, their prediction remains a great challenge. Previous computational algorithms have major limitations. They use conservative candidate target site (CTS) selection criteria mainly focusing on canonical site types, rely on laborious and time-consuming manual feature extraction, and do not fully capitalize on the information underlying miRNA-CTS interactions. RESULTS: In this article, we introduce TargetNet, a novel deep learning-based algorithm for functional miRNA target prediction. To address the limitations of previous approaches, TargetNet has three key components: (i) relaxed CTS selection criteria accommodating irregularities in the seed region, (ii) a novel miRNA-CTS sequence encoding scheme incorporating extended seed region alignments and (iii) a deep residual network-based prediction model. The proposed model was trained with miRNA-CTS pair datasets and evaluated with miRNA-mRNA pair datasets. TargetNet advances the previous state-of-the-art algorithms used in functional miRNA target classification. Furthermore, it demonstrates great potential for distinguishing high-functional miRNA targets. AVAILABILITY AND IMPLEMENTATION: The codes and pre-trained models are available at https://github.com/mswzeus/TargetNet.


Subjects
MicroRNAs; MicroRNAs/genetics; MicroRNAs/metabolism; Neural Networks, Computer; Algorithms; RNA, Messenger/genetics; Gene Expression Regulation; Computational Biology
4.
PLoS One ; 16(5): e0251865, 2021.
Article in English | MEDLINE | ID: mdl-34003870

ABSTRACT

Heat shock proteins (HSPs) play a pivotal role as molecular chaperones against unfavorable conditions. Although HSPs are of great importance, their computational identification remains a significant challenge. Previous studies have two major limitations. First, they relied heavily on amino acid composition features, which inevitably limited their prediction performance. Second, their prediction performance was overestimated because of the independent two-stage evaluations and train-test data redundancy. To overcome these limitations, we introduce two novel deep learning algorithms: (1) time-efficient DeepHSP and (2) high-performance DeeperHSP. We propose a convolutional neural network (CNN)-based DeepHSP that classifies both non-HSPs and six HSP families simultaneously. It outperforms state-of-the-art algorithms, despite taking 14-15 times less time for both training and inference. We further improve the performance of DeepHSP by taking advantage of protein transfer learning. While DeepHSP is trained on raw protein sequences, DeeperHSP is trained on top of pre-trained protein representations. Therefore, DeeperHSP remarkably outperforms state-of-the-art algorithms increasing F1 scores in both cross-validation and independent test experiments by 20% and 10%, respectively. We envision that the proposed algorithms can provide a proteome-wide prediction of HSPs and help in various downstream analyses for pathology and clinical research.


Subjects
Heat-Shock Proteins/genetics; Machine Learning; Molecular Chaperones/genetics; Neural Networks, Computer; Algorithms; Amino Acid Sequence/genetics; Computational Biology/trends; Deep Learning; Heat-Shock Proteins/isolation & purification; Humans; Protein Transport/genetics
5.
Bioinformatics ; 37(11): 1562-1570, 2021 Jul 12.
Article in English | MEDLINE | ID: mdl-29474530

ABSTRACT

MOTIVATION: Metagenomic sequencing has become a crucial tool for obtaining a gene catalogue of operational taxonomic units (OTUs) in a microbial community. A typical metagenomic sequencing run produces a large amount of data (often on the order of terabytes or more), and computational tools are indispensable for efficient processing. In particular, error correction in metagenomics is crucial for accurate and robust genetic cataloging of microbial communities. However, many existing error-correction tools take a prohibitively long time and often bottleneck the whole analysis pipeline. RESULTS: To overcome this computational hurdle, we analyzed and exploited the data-level parallelism that exists in the error-correction procedure and proposed a tool named MUGAN that exploits both multi-core central processing units and multiple graphics processing units for co-processing. According to the experimental results, our approach reduced not only the time demand for denoising amplicons from approximately 59 h to only 46 min, but also the overestimation of the number of OTUs, estimating 6.7 times fewer species-level OTUs than the baseline. In addition, our approach provides web-based intuitive visualization of results. Given its efficiency and convenience, we anticipate that our approach would greatly facilitate denoising efforts in metagenomics studies. AVAILABILITY AND IMPLEMENTATION: http://data.snu.ac.kr/pub/mugan. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.
Pac Symp Biocomput ; 24: 88-99, 2019.
Article in English | MEDLINE | ID: mdl-30864313

ABSTRACT

Recent advances in next-generation sequencing technologies have facilitated the use of deoxyribonucleic acid (DNA) as a novel covert channel in steganography. Various methods exist in other domains to detect hidden messages in conventional covert channels. However, they have not been applied to DNA steganography. The current most common detection approaches, namely frequency analysis-based methods, often overlook important signals when directly applied to DNA steganography because those methods depend on the distribution of the number of sequence characters. To address this limitation, we propose a general sequence learning-based DNA steganalysis framework. The proposed approach learns the intrinsic distribution of coding and non-coding sequences and detects hidden messages by exploiting distribution variations after hiding these messages. Using deep recurrent neural networks (RNNs), our framework identifies the distribution variations by using the classification score to predict whether a sequence is a coding or non-coding sequence. We compare our proposed method to various existing methods and biological sequence analysis methods implemented on top of our framework. According to our experimental results, our approach delivers a robust detection performance compared to other tools.


Subjects
Computational Biology/methods; DNA/genetics; Deep Learning; Neural Networks, Computer; Communications Media; Humans; Information Theory; Sequence Analysis, DNA
7.
Bioinformatics ; 34(22): 3889-3897, 2018 11 15.
Article in English | MEDLINE | ID: mdl-29850775

ABSTRACT

Motivation: Long non-coding RNAs (lncRNAs) are important regulatory elements in biological processes. LncRNAs share similar sequence characteristics with messenger RNAs, but they play completely different roles, thus providing novel insights for biological studies. The development of next-generation sequencing has helped in the discovery of lncRNA transcripts. However, the experimental verification of numerous transcriptomes is time-consuming and costly. To alleviate these issues, a computational approach is needed to distinguish lncRNAs from the transcriptomes. Results: We present a deep learning-based approach, lncRNAnet, to identify lncRNAs that incorporates recurrent neural networks for RNA sequence modeling and convolutional neural networks for detecting stop codons to obtain an open reading frame indicator. lncRNAnet performed clearly better than the other tools for sequences of short lengths, on which most lncRNAs are distributed. In addition, lncRNAnet successfully learned features and showed 7.83%, 5.76%, 5.30% and 3.78% improvements over the alternatives on a human test set in terms of specificity, accuracy, F1-score and area under the curve, respectively. Availability and implementation: Data and codes are available at http://data.snu.ac.kr/pub/lncRNAnet.
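
The open reading frame indicator mentioned in this abstract can be pictured with a plain rule-based scan for start and stop codons. The following is a minimal sketch of that idea only; it is not lncRNAnet's convolutional stop-codon detector, and the function name `longest_orf` is a hypothetical label chosen for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF in the three
    forward reading frames, or None if no complete ORF is found.

    Rule-based illustration only (assumption: forward strand, ATG start).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon in the span
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best
```

A transcript with no complete ORF returns `None`, which is the kind of signal a coding versus non-coding classifier could use as one input feature among others.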


Subjects
Deep Learning; RNA, Long Noncoding/genetics; Databases, Genetic; High-Throughput Nucleotide Sequencing; Humans; Open Reading Frames
8.
Article in English | MEDLINE | ID: mdl-26930691

ABSTRACT

To assess the genetic diversity of an environmental sample in metagenomics studies, the amplicon sequences of 16S rRNA genes need to be clustered into operational taxonomic units (OTUs). Many existing tools for OTU clustering trade off between accuracy and computational efficiency. We propose a novel OTU clustering algorithm, hc-OTU, which achieves high accuracy and fast runtime by exploiting homopolymer compaction and k-mer profiling to significantly reduce the computing time for pairwise distances of amplicon sequences. We compare the proposed method comprehensively with other widely used methods, including UCLUST, CD-HIT, MOTHUR, ESPRIT, ESPRIT-TREE, and CLUSTOM, using nine different experimental datasets and many evaluation metrics, such as normalized mutual information, adjusted Rand index, measure of concordance, and F-score. Our evaluation reveals that the proposed method achieves a level of accuracy comparable to the respective accuracy levels of MOTHUR and ESPRIT-TREE, two widely used OTU clustering methods, while delivering orders-of-magnitude speedups.
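
The two ingredients named in this abstract, homopolymer compaction and k-mer profiling, can be sketched in a few lines. This is an illustration under stated assumptions and not the hc-OTU implementation: the Manhattan distance between k-mer count profiles and the default `k=3` are hypothetical choices, since the abstract does not specify the distance measure or the k-mer length.

```python
from collections import Counter
from itertools import groupby


def compact_homopolymers(seq):
    """Collapse each run of identical bases to one base: AAACC -> AC."""
    return "".join(base for base, _ in groupby(seq))


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def profile_distance(a, b, k=3):
    """Manhattan distance between the k-mer profiles of two compacted
    sequences (an assumed distance, chosen for simplicity)."""
    pa = kmer_profile(compact_homopolymers(a), k)
    pb = kmer_profile(compact_homopolymers(b), k)
    # Counter returns 0 for k-mers missing from one profile.
    return sum(abs(pa[m] - pb[m]) for m in set(pa) | set(pb))
```

Two reads that differ only in homopolymer run lengths, such as `AAACGT` and `ACGGGT`, compact to the same string and therefore have distance 0, which is why compaction makes the comparison insensitive to homopolymer indel errors.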


Subjects
Cluster Analysis; Metagenomics/methods; Sequence Analysis, DNA/methods; Algorithms; RNA, Ribosomal, 16S/genetics
9.
Sci Rep ; 7(1): 8238, 2017 08 15.
Article in English | MEDLINE | ID: mdl-28811672

ABSTRACT

Nonporous silica nanoparticles (SiNPs) have potential as promising carriers for ophthalmic drugs. However, the in vivo safety of ocular topical SiNPs remains unclear. This study investigated the in vivo safety of oral and ocular topical applications of 100 nm-sized SiNPs in Sprague-Dawley rats. The rats were divided into the following four groups: low-dose oral administration (total 100 mg/kg of SiNPs mixed with food for one week), high-dose oral administration (total 1000 mg/kg of SiNPs mixed with food for one week), ocular topical administration (10 mg/ml concentration, one drop, applied to the right eyes four times a day for one month), or a negative control (no SiNP treatment). The rats were observed for 12 weeks to investigate any signs of general or ocular toxicity. During the observation period, no differences were observed among the four groups in body weight, food and water intake, behavior, or abnormal symptoms. No animal deaths occurred. After 12 weeks, hematologic and blood biochemical parameters and ophthalmic examinations revealed no abnormal findings in any of the animals. The lack of toxicity of the SiNPs was further verified in autopsy findings of brain, liver, lung, spleen, heart, kidneys, intestine, eyeballs, and ovaries or testes.


Subjects
Nanoparticles; Silicon Dioxide; Administration, Oral; Administration, Topical; Animals; Biomarkers; Diagnostic Techniques, Ophthalmological; Drug Carriers/chemistry; Drug Delivery Systems; Eye/drug effects; Eye/pathology; Female; Immunohistochemistry; Male; Nanoparticles/administration & dosage; Nanoparticles/adverse effects; Nanoparticles/chemistry; Organ Size; Rats; Silicon Dioxide/chemistry
10.
PLoS One ; 12(7): e0181463, 2017.
Article in English | MEDLINE | ID: mdl-28749987

ABSTRACT

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq.


Subjects
Algorithms; High-Throughput Nucleotide Sequencing/methods; Base Sequence; Computer Simulation; Databases, Nucleic Acid
11.
Brief Bioinform ; 18(5): 851-869, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27473064

ABSTRACT

In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.


Subjects
Machine Learning; Computational Biology; Humans; Neural Networks, Computer
12.
BMC Bioinformatics ; 15 Suppl 9: S10, 2014.
Article in English | MEDLINE | ID: mdl-25252785

ABSTRACT

Merging the forward and reverse reads from paired-end sequencing is a critical task that can significantly improve the performance of downstream tasks, such as genome assembly and mapping, by providing them with virtually elongated reads. However, due to the inherent limitations of most paired-end sequencers, the chance of observing erroneous bases grows rapidly as the end of a read is approached, which becomes a critical hurdle for accurately merging paired-end reads. Although several sophisticated approaches to this problem exist, their merging quality often remains unsatisfactory. To address this issue, here we present a context-aware scheme for paired-end reads (CASPER): a computational method to rapidly and robustly merge overlapping paired-end reads. Being particularly well suited to amplicon sequencing applications, CASPER is thoroughly tested with both simulated and real high-throughput amplicon sequencing data. According to our experimental results, CASPER significantly outperforms existing state-of-the-art paired-end merging tools in terms of accuracy and robustness. CASPER also exploits the parallelism in the task of paired-end merging and achieves effective speedups through multithreading. CASPER is freely available for academic use at http://best.snu.ac.kr/casper.
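
To make the merging task concrete, the sketch below joins a forward read with the reverse complement of its mate at the longest overlap whose mismatch rate stays under a threshold. It is a deliberately naive baseline and not CASPER's context-aware scheme; the function names and the parameters `min_overlap` and `max_mismatch_rate` are hypothetical, and quality scores are ignored.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=4, max_mismatch_rate=0.1):
    """Merge a read pair at the longest acceptable overlap.

    Returns the merged sequence, or None when no overlap of at least
    min_overlap bases passes the mismatch threshold.
    """
    rc = reverse_complement(reverse)
    # Try the longest possible overlap first, then shrink.
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        tail, head = forward[-overlap:], rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            # On disagreement this naive version keeps the forward base.
            return forward + rc[overlap:]
    return None
```

Because base-call errors concentrate near read ends, which is exactly where the overlap lies, a fixed mismatch threshold like this one is fragile; that weakness is the motivation the abstract gives for a more careful merging method.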


Subjects
High-Throughput Nucleotide Sequencing/methods; Software; Algorithms; Sequence Analysis, DNA/methods
13.
Article in English | MEDLINE | ID: mdl-24109767

ABSTRACT

In sequencing results, the quality score is reported for each base, representing the probability that the base is called incorrectly. The notion of quality scores was initially developed for conventional Sanger sequencing, but is widely used for next-generation sequencing techniques, including Illumina. In this paper, we carry out an in-depth analysis of quality scores reported for Illumina reads and present how they are related to real errors in the reads. We confirmed a strong relationship between quality scores and real errors in Illumina reads, and observed that, in paired-end reads, reverse reads tend to have lower quality scores than forward reads. In addition, we discovered other interesting patterns from quality score analysis. Our hope is that the findings in this paper will be helpful for designing error-correction and/or filtering methods for next-generation sequencing.
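
The relationship between a quality score and the error probability it encodes follows the standard Phred definition, Q = -10 log10(P). The sketch below converts between the two and decodes a FASTQ quality string; the ASCII offset of 33 is an assumption that matches Sanger-style and recent Illumina encodings (some older Illumina pipelines used 64), and the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by Phred score q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Convert FASTQ quality characters to integer Phred scores.

    offset=33 is assumed here; pass offset=64 for older encodings.
    """
    return [ord(c) - offset for c in qual]


def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities over a read."""
    return sum(phred_to_error_prob(q)
               for q in decode_quality_string(qual, offset))
```

For example, a score of 20 corresponds to a 1% chance that the base is wrong and a score of 30 to 0.1%, so comparing `expected_errors` against the observed error count in a read is one simple way to check how well reported scores track real errors.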


Subjects
High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Base Sequence; High-Throughput Nucleotide Sequencing/standards; Probability; Quality Control; Sequence Analysis, DNA/standards; Software