1.
Nucleic Acids Res ; 49(5): 2700-2720, 2021 Mar 18.
Article in English | MEDLINE | ID: mdl-33590099

ABSTRACT

In animal gonads, transposable elements are actively repressed to preserve genome integrity through the PIWI-interacting RNA (piRNA) pathway. In mice, piRNAs are abundantly expressed in male germ cells, and form effector complexes with three distinct PIWIs. The depletion of individual Piwi genes causes male-specific sterility with no discernible phenotype in female mice. Unlike mice, most other mammals have four PIWI genes, some of which are expressed in the ovary. Here, purification of PIWI complexes from oocytes of the golden hamster revealed that the size of the PIWIL1-associated piRNAs changed during oocyte maturation. In contrast, PIWIL3, an ovary-specific PIWI in most mammals, associates with short piRNAs only in metaphase II oocytes, which coincides with intense phosphorylation of the protein. An improved high-quality genome assembly and annotation revealed that PIWIL1- and PIWIL3-associated piRNAs appear to share the 5'-ends of common piRNA precursors and are mostly derived from unannotated sequences with a diminished contribution from TE-derived sequences, most of which correspond to endogenous retroviruses. Our findings show the complex and dynamic nature of biogenesis of piRNAs in hamster oocytes, and together with the new genome sequence generated, serve as the foundation for developing useful models to study the piRNA pathway in mammalian oocytes.


Subjects
Argonaute Proteins/metabolism, Oocytes/growth & development, Oocytes/metabolism, Small Interfering RNA/metabolism, Animals, Argonaute Proteins/genetics, Female, Genomics, Male, Mesocricetus, Metaphase, Phosphorylation, Small Interfering RNA/genetics, Testis/metabolism
2.
BMC Bioinformatics ; 22(Suppl 6): 427, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34078257

ABSTRACT

BACKGROUND: The increasing use of whole metagenome sequencing has spurred the need to improve de novo assemblers to facilitate the discovery of unknown species and the analysis of their genomic functions. MetaVelvet-SL is a short-read de novo metagenome assembler that partitions a multi-species de Bruijn graph into single-species sub-graphs. This study aimed to improve the performance of MetaVelvet-SL by using a deep learning-based model to predict the partition nodes in a multi-species de Bruijn graph. RESULTS: This study showed that the recent advances in deep learning offer the opportunity to better exploit sequence information and differentiate genomes of different species in a metagenomic sample. We developed an extension to MetaVelvet-SL, which we named MetaVelvet-DL, that builds an end-to-end architecture using Convolutional Neural Network and Long Short-Term Memory units. The deep learning model in MetaVelvet-DL can more accurately predict how to partition a de Bruijn graph than the Support Vector Machine-based model in MetaVelvet-SL can. Assembly of the Critical Assessment of Metagenome Interpretation (CAMI) dataset showed that after removing chimeric assemblies, MetaVelvet-DL produced longer single-species contigs, with fewer misassembled contigs than MetaVelvet-SL did. CONCLUSIONS: MetaVelvet-DL provides more accurate de novo assemblies of whole metagenome data. The authors believe that this improvement can help in furthering the understanding of microbiomes by providing a more accurate description of the metagenomic samples under analysis.


Subjects
Deep Learning, Metagenome, Algorithms, Genomics, High-Throughput Nucleotide Sequencing, Metagenomics, DNA Sequence Analysis, Software
3.
Brief Bioinform ; 20(3): 866-876, 2019 May 21.
Article in English | MEDLINE | ID: mdl-29112696

ABSTRACT

Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms.


Subjects
Genome, DNA Sequence Analysis/methods, Animals, Caenorhabditis elegans/genetics, Escherichia coli/genetics, Ipomoea/genetics, Plasmodium falciparum/genetics
4.
Esophagus ; 18(3): 612-620, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33635412

ABSTRACT

BACKGROUND: Because cancers of hollow organs such as the esophagus are hard to detect even for expert physicians, it is important to establish diagnostic systems to support physicians and increase the accuracy of diagnosis. In recent years, deep learning-based artificial intelligence (AI) technology has been employed for medical image recognition. However, no optimal CT diagnostic system employing deep learning technology has been attempted and established for esophageal cancer so far. PURPOSE: To establish an AI-based diagnostic system for esophageal cancer from CT images. MATERIALS AND METHODS: In this single-center, retrospective cohort study, 457 patients with primary esophageal cancer referred to our division between 2005 and 2018 were enrolled. We fine-tuned VGG16, an image recognition model of deep learning convolutional neural network (CNN), for the detection of esophageal cancer. We evaluated the diagnostic accuracy of the CNN using a test data set including 46 cancerous CT images and 100 non-cancerous images and compared it to that of two radiologists. RESULTS: Pre-treatment esophageal cancer stages of the patients included in the test data set were clinical T1 (12 patients), clinical T2 (9 patients), clinical T3 (20 patients), and clinical T4 (5 patients). The CNN-based system showed a diagnostic accuracy of 84.2%, F value of 0.742, sensitivity of 71.7%, and specificity of 90.0%. CONCLUSIONS: Our AI-based diagnostic system succeeded in detecting esophageal cancer with high accuracy. More training with vast datasets collected from multiple centers would lead to even higher diagnostic accuracy and aid better decision making.


Subjects
Deep Learning, Esophageal Neoplasms, Artificial Intelligence, Esophageal Neoplasms/diagnostic imaging, Humans, Retrospective Studies, X-Ray Computed Tomography/methods
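
The four figures reported in the abstract above can be checked against a single confusion matrix. The sketch below is only an illustration: the counts (33 true positives, 90 true negatives) are not stated in the abstract and are inferred here from the reported sensitivity and specificity on 46 cancerous and 100 non-cancerous images, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the cancerous images
    specificity = tn / (tn + fp)          # recall on the non-cancerous images
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f_value = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_value": f_value,
    }


# ASSUMPTION: counts inferred from the reported rates, not given in the abstract.
# 46 cancerous images -> 33 detected, 13 missed; 100 non-cancerous -> 90 correct, 10 false alarms.
metrics = binary_metrics(tp=33, fn=13, tn=90, fp=10)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the rounded values agree with the abstract (sensitivity 71.7%, specificity 90.0%, accuracy 84.2%, F value 0.742), which suggests the F value is the usual harmonic mean of precision and sensitivity.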
5.
BMC Genomics ; 21(Suppl 3): 243, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32241258

ABSTRACT

BACKGROUND: The common marmoset (Callithrix jacchus) is one of the most studied primate model organisms. However, the marmoset genomes available in the public databases are highly fragmented and filled with sequence gaps, hindering research advances related to marmoset genomics and transcriptomics. RESULTS: Here we utilize single-molecule, long-read sequence data to improve and update the existing genome assembly and report a near-complete genome of the common marmoset. The assembly is of 2.79 Gb size, with a contig N50 length of 6.37 Mb and a chromosomal scaffold N50 length of 143.91 Mb, representing the most contiguous and high-quality marmoset genome to date. Approximately 90% of the assembled genome was represented in contigs longer than 1 Mb, with approximately 104-fold improvement in contiguity over the previously published marmoset genome. More than 98% of the gaps from the previously published genomes were filled successfully, which improved the mapping rates of genomic and transcriptomic data onto the assembled genome. CONCLUSIONS: Altogether, the updated, high-quality common marmoset genome assembly provides improvements at various levels over the previous versions of the marmoset genome assemblies. This will allow researchers working on primate genomics to apply the genome more efficiently for their genomic and transcriptomic sequence data.


Subjects
Callithrix/genetics, Chromosome Mapping/methods, Genome/genetics, Animals, Computational Biology/methods, Contig Mapping/methods, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Sequence Alignment
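
The contig and scaffold N50 values quoted in the abstract above follow the usual definition: the length L such that sequences of length at least L together cover at least half of the assembly. A minimal sketch of that calculation, using made-up toy lengths rather than the marmoset data:

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths for illustration only (not taken from the paper).
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))
```

For the toy input the total length is 30; the two longest contigs (10 and 6) already cover 16, so the N50 is 6.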
6.
Bioinformatics ; 34(13): i237-i244, 2018 Jul 01.
Article in English | MEDLINE | ID: mdl-29949978

ABSTRACT

Motivation: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites. Results: We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation sequence reads, the training of CNNs for classification of alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed the existing clustering methods for ncRNA sequences. Several interesting sequence motifs and secondary-structure motifs known for the snoRNA family and specific to microRNA and tRNA families were identified. Availability and implementation: The source code of our CNN software in the deep-learning framework Chainer is available at http://www.dna.bio.keio.ac.jp/cnn/, and the dataset used for performance evaluation in this work is available at the same URL.


Subjects
Computational Biology/methods, Neural Networks (Computer), Untranslated RNA/metabolism, Software, Adenocarcinoma/metabolism, Binding Sites, Cluster Analysis, Humans, Male, MicroRNAs/chemistry, MicroRNAs/classification, MicroRNAs/metabolism, Nucleic Acid Conformation, Prostatic Neoplasms/metabolism, Protein Binding, Small Nucleolar RNA/chemistry, Small Nucleolar RNA/classification, Small Nucleolar RNA/metabolism, Transfer RNA/chemistry, Transfer RNA/classification, Transfer RNA/metabolism, Untranslated RNA/chemistry, Untranslated RNA/classification
7.
BMC Bioinformatics ; 19(Suppl 19): 526, 2018 Dec 31.
Article in English | MEDLINE | ID: mdl-30598075

ABSTRACT

BACKGROUND: Previous studies have suggested deep learning to be a highly effective approach for screening lead compounds for new drugs. Several deep learning models have been developed by addressing the use of various kinds of fingerprints and graph convolution architectures. However, these methods are either advantageous or disadvantageous depending on whether they (1) can distinguish structural differences including chirality of compounds, and (2) can automatically discover effective features. RESULTS: We developed another deep learning model for compound classification. In this method, we constructed a distributed representation of compounds based on the SMILES notation, which linearly represents a compound structure, and applied the SMILES-based representation to a convolutional neural network (CNN). The use of SMILES allows us to process all types of compounds while incorporating a broad range of structure information, and representation learning by CNN automatically acquires a low-dimensional representation of input features. In a benchmark experiment using the TOX 21 dataset, our method outperformed conventional fingerprint methods, and performed comparably against the winning model of the TOX 21 Challenge. Multivariate analysis confirmed that the chemical space consisting of the features learned by SMILES-based representation learning adequately expressed a richer feature space that enabled the accurate discrimination of compounds. Using motif detection with the learned filters, not only important known structures (motifs) such as protein-binding sites but also structures of unknown functional groups were detected. CONCLUSIONS: The source code of our SMILES-based convolutional neural network software in the deep learning framework Chainer is available at http://www.dna.bio.keio.ac.jp/smiles/ , and the dataset used for performance evaluation in this work is available at the same URL.


Subjects
DNA/metabolism, Deep Learning, Neural Networks (Computer), Pharmaceutical Preparations/metabolism, Proteins/metabolism, Software, Binding Sites, DNA/chemistry, Humans, Chemical Models, Pharmaceutical Preparations/chemistry, Protein Binding, Proteins/chemistry
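
To make the idea of feeding SMILES strings to a CNN concrete, the sketch below turns each string into a fixed-size matrix with one row per symbol position. It is a simplified stand-in, not the authors' representation: the abstract describes a distributed representation, whereas this example uses plain one-hot columns, and the tokenizer, the function names, and the three example compounds are all assumptions made for illustration.

```python
# ASSUMPTION: only these two-letter element symbols are treated as single tokens.
TWO_CHAR_ATOMS = ("Cl", "Br")


def tokenize(smiles):
    """Split a SMILES string into symbols, keeping Cl and Br together."""
    tokens = []
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in TWO_CHAR_ATOMS:
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def one_hot(smiles, vocab, max_len):
    """Encode a SMILES string as a max_len x len(vocab) matrix, zero-padded."""
    index = {tok: col for col, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for row, tok in enumerate(tokenize(smiles)[:max_len]):
        matrix[row][index[tok]] = 1
    return matrix


# Illustrative inputs: ethanol, L-alanine (with a chirality mark), dichloromethane.
compounds = ["CCO", "C[C@H](N)C(=O)O", "ClCCl"]
vocab = sorted({tok for s in compounds for tok in tokenize(s)})
max_len = max(len(tokenize(s)) for s in compounds)
encoded = [one_hot(s, vocab, max_len) for s in compounds]

print(vocab)
print(len(encoded[1]), "rows x", len(encoded[1][0]), "columns")
```

Because the `@` chirality mark is kept as its own symbol, stereoisomers produce different matrices, which is the property the abstract highlights as a weakness of some fingerprint-based methods. A one-dimensional convolution over the rows of such a matrix is the natural next step.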
8.
Bioinformatics ; 32(12): i369-i377, 2016 Jun 15.
Article in English | MEDLINE | ID: mdl-27307639

ABSTRACT

MOTIVATION: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns, called read mapping profiles, can be detected; these are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. RESULTS: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. AVAILABILITY AND IMPLEMENTATION: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. CONTACT: yasu@bio.keio.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, High-Throughput Nucleotide Sequencing, Cluster Analysis, Nucleic Acid Databases, Untranslated RNA, RNA Sequence Analysis, Software
9.
Mol Ther ; 21(3): 526-32, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23337983

ABSTRACT

Lysosomal β-galactosidase (β-Gal) deficiency causes a group of disorders that include neuronopathic GM1 gangliosidosis and non-neuronopathic Morquio B disease. We have previously proposed the use of small molecule ligands of β-Gal as pharmacological chaperones (PCs) for the treatment of GM1 gangliosidosis brain pathology. Although it is still under development, PC therapy has yielded promising preclinical results in several lysosomal diseases. In this study, we evaluated the effect of a bicyclic 1-deoxygalactonojirimycin (DGJ) derivative of the sp²-iminosugar type, namely 5N,6S-(N'-butyliminomethylidene)-6-thio-1-deoxygalactonojirimycin (6S-NBI-DGJ), as a novel PC for human mutant β-Gal. In vitro, 6S-NBI-DGJ had the ability to inhibit the activity of human β-Gal in a competitive manner and was able to protect this enzyme from heat-induced degradation. Computational analysis supported that the rigid glycone bicyclic core of 6S-NBI-DGJ binds to the active site of the enzyme, with the aglycone N'-butyl substituent, in a precise E-orientation, located at a hydrophobic region nearby. Chaperone potential profiling indicated significant increases of enzyme activity in 24 of 88 β-Gal mutants, including four common mutations. Finally, oral administration of 6S-NBI-DGJ ameliorated the brain pathology of GM1 gangliosidosis model mice. These results suggest that 6S-NBI-DGJ is a novel PC that may be effective on a broad range of β-Gal mutants.


Subjects
1-Deoxynojirimycin/analogs & derivatives, GM1 Gangliosidosis/drug therapy, Molecular Chaperones/pharmacology, 1-Deoxynojirimycin/pharmacology, Oral Administration, Animals, Heterocyclic Bridged Bicyclo Compounds/pharmacology, Cultured Cells, Computational Biology, Animal Disease Models, Enzyme Inhibitors/pharmacology, Fibroblasts/drug effects, Fibroblasts/metabolism, GM1 Gangliosidosis/genetics, Imino Sugars/chemistry, Imino Sugars/pharmacology, Lysosomes/metabolism, Mice, Inbred C57BL Mice, Knockout Mice, Mucopolysaccharidosis IV/drug therapy, Mucopolysaccharidosis IV/genetics, Mutation, Genetic Recombination, beta-Galactosidase/chemistry, beta-Galactosidase/genetics
10.
Nucleic Acids Res ; 40(20): e155, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22821567

ABSTRACT

An important step in 'metagenomics' analysis is the assembly of multiple genomes from mixed sequence reads of multiple species in a microbial community. Most conventional pipelines use a single-genome assembler with carefully optimized parameters. A limitation of a single-genome assembler for de novo metagenome assembly is that sequences of highly abundant species are likely misidentified as repeats in a single genome, resulting in a number of small fragmented scaffolds. We extended a single-genome assembler for short reads, known as 'Velvet', to metagenome assembly, which we called 'MetaVelvet', for mixed short reads of multiple species. Our fundamental concept was to first decompose a de Bruijn graph constructed from mixed short reads into individual sub-graphs, and second, to build scaffolds based on each decomposed de Bruijn sub-graph as an isolate species genome. We made use of two features, the coverage (abundance) difference and graph connectivity, for the decomposition of the de Bruijn graph. For simulated datasets, MetaVelvet succeeded in generating significantly higher N50 scores than any single-genome assemblers. MetaVelvet also reconstructed relatively low-coverage genome sequences as scaffolds. On real datasets of human gut microbial read data, MetaVelvet produced longer scaffolds and increased the number of predicted genes.


Subjects
Metagenomics/methods, DNA Sequence Analysis, Algorithms, Bacteria/classification, Gastrointestinal Tract/microbiology, Bacterial Genome, Humans, Software
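
The two features named in the abstract above, coverage difference and graph connectivity, can be illustrated on a toy de Bruijn graph. The sketch below is not the MetaVelvet decomposition itself: it only builds the graph from reads, counts how often each k-mer edge is seen (a proxy for coverage), and lists weakly connected components. The function names and the reads are assumptions chosen so that the two "species" share no k-mers.

```python
from collections import Counter


def de_bruijn_edges(reads, k):
    """Count k-mer edges; each k-mer links its (k-1)-mer prefix to its suffix."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def weak_components(edges):
    """Group nodes into weakly connected components with a depth-first search."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Toy reads: an abundant "species" sequenced three times and a rare one seen once.
reads = ["ACGTAC"] * 3 + ["TTGGA"]
edges = de_bruijn_edges(reads, k=3)
components = weak_components(edges)

for component in components:
    coverage = {e: c for e, c in edges.items() if e[0] in component}
    print(sorted(component), "edge counts:", sorted(coverage.values()))
```

In this toy case the graph already falls apart into two components with edge counts of 3 and 1. Real metagenomes share k-mers between species, which is why the abstract describes decomposing a connected graph using coverage differences rather than relying on connectivity alone.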
11.
mSystems ; 9(5): e0140523, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38557130

ABSTRACT

The gut microbiome affects the health status of the host through complex interactions with the host's intestinal wall. These host-microbiome interactions may spatially vary along the physical and chemical environment of the intestine, but these changes remain unknown. This study investigated these intricate relationships through a gene co-expression network analysis based on dual transcriptome profiling of different intestinal sites (cecum, transverse colon, and rectum) of the primate common marmoset. We proposed a gene module extraction algorithm based on graph theory to find tightly interacting gene modules of the host and the microbiome from a vast co-expression network. The 27 gene modules identified by this method, which include both host and microbiome genes, not only produced results consistent with previous studies regarding the host-microbiome relationships, but also provided new insights into microbiome genes acting as potential mediators in host-microbiome interplays. Specifically, we discovered associations between the host gene FBP1, a cancer marker, and polysaccharide degradation-related genes (pfkA and fucI) coded by Bacteroides vulgatus, as well as relationships between host B cell-specific genes (CD19, CD22, CD79B, and PTPN6) and a tryptophan synthesis gene (trpB) coded by Parabacteroides distasonis. Furthermore, our proposed module extraction algorithm surpassed existing approaches by successfully defining more functionally related gene modules, providing insights for understanding the complex relationship between the host and the microbiome. IMPORTANCE: We unveiled the intricate dynamics of the host-microbiome interactions along the colon by identifying closely interacting gene modules from a vast gene co-expression network, constructed based on simultaneous profiling of both host and microbiome transcriptomes. Our proposed gene module extraction algorithm, designed to interpret inter-species interactions, enabled the identification of functionally related gene modules encompassing both host and microbiome genes, which was challenging with conventional modularity maximization algorithms. Through these identified gene modules, we discerned previously unrecognized bacterial genes that potentially mediate known relationships between host genes and specific bacterial species. Our findings underscore the spatial variations in host-microbiome interactions along the colon, rather than displaying a uniform pattern throughout the colon.


Subjects
Gastrointestinal Microbiome, Gene Regulatory Networks, Animals, Gastrointestinal Microbiome/genetics, Callithrix/microbiology, Host Microbial Interactions/genetics, Gene Expression Profiling/methods, Transcriptome, Intestines/microbiology, Algorithms
12.
Genome Res ; 20(9): 1219-28, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20534883

ABSTRACT

The centromere is essential for faithful chromosome segregation by providing the site for kinetochore assembly. Although the role of the centromere is conserved throughout evolution, the DNA sequences associated with centromere regions are highly divergent among species and it remains to be determined how centromere DNA directs kinetochore formation. Despite the active use of chicken DT40 cells in studies of chromosome segregation, the sequence of the chicken centromere was unclear. Here, we performed a comprehensive analysis of chicken centromere DNA which revealed unique features of chicken centromeres compared with previously studied vertebrates. Centromere DNA sequences from the chicken macrochromosomes, with the exception of chromosome 5, contain chromosome-specific homogenous tandem repetitive arrays that span several hundred kilobases. In contrast, the centromeres of chromosomes 5, 27, and Z do not contain tandem repetitive sequences and span non-tandem-repetitive sequences of only approximately 30 kb. To test the function of these centromere sequences, we conditionally removed the centromere from the Z chromosome using genetic engineering and have shown that the non-tandem-repeat sequence of chromosome Z is a functional centromere.


Subjects
Centromere/genetics, Chickens/genetics, Chromosomes/genetics, Nucleic Acid Repetitive Sequences, Tandem Repeat Sequences, Animals, Base Sequence, DNA/chemistry, Fluorescence In Situ Hybridization, Molecular Sequence Data, Physical Chromosome Mapping
13.
Bioinformatics ; 28(9): 1276-7, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22419785

ABSTRACT

SUMMARY: Existing SAM visualization tools like 'samtools tview' (Li et al., 2009) are limited to a small region of the genome, and tools like Tablet (Milne et al., 2010) are limited to a relatively small number of reads and may fail outright on large datasets. We need to visualize complex ChIP-Seq and RNA-Seq features such as polarity as well as coverage across whole 3 Gbp genomes such as Human. We have addressed these problems in a lightweight visualization system called SAMSCOPE accelerated by OpenGL. The extensive pre-processing and fast OpenGL interface of SAMSCOPE provides instantaneous and intuitive browsing of complex data at all levels of detail across multiple experiments. AVAILABILITY AND IMPLEMENTATION: The SAMSCOPE software, implemented in C++ for Linux, with source code, binary packages and documentation are freely available from http://samscope.dna.bio.keio.ac.jp.


Subjects
Computer Graphics, Sequence Alignment/methods, DNA Sequence Analysis/methods, Software, Animals, Bacillus subtilis/genetics, Genome, Humans, Programming Languages
14.
Bioinformatics ; 28(24): 3218-24, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23060618

ABSTRACT

MOTIVATION: It is well known that the accuracy of RNA secondary structure prediction from a single sequence is limited, and thus a comparative approach that predicts a common secondary structure from aligned sequences is a better choice if homologous sequences with reliable alignments are available. However, correct secondary structure information is needed to produce reliable alignments of RNA sequences. To tackle this dilemma, we require a fast and accurate aligner that takes structural information into consideration to yield reliable structural alignments, which are suitable for common secondary structure prediction. RESULTS: We develop DAFS, a novel algorithm that simultaneously aligns and folds RNA sequences based on maximizing expected accuracy of a predicted common secondary structure and its alignment. DAFS decomposes the pairwise structural alignment problem into two independent secondary structure prediction problems and one pairwise (non-structural) alignment problem by the dual decomposition technique, and maintains the consistency of a pairwise structural alignment by imposing penalties on inconsistent base pairs and alignment columns that are iteratively updated. Furthermore, we extend DAFS to consider pseudoknots in RNA structural alignments by integrating IPknot for predicting a pseudoknotted structure. The experiments on publicly available datasets showed that DAFS can produce reliable structural alignments from unaligned sequences in terms of accuracy of common secondary structure prediction.


Subjects
Algorithms, RNA/chemistry, Sequence Alignment/methods, RNA Sequence Analysis/methods, Base Pairing, Nucleic Acid Conformation, RNA Folding
15.
Bioinformatics ; 28(5): 745-6, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22257668

ABSTRACT

Since tens of millions of chemical compounds have been accumulated in public chemical databases, fast comprehensive computational methods to predict interactions between chemical compounds and proteins are needed for virtual screening of lead compounds. Previously, we proposed a novel method for predicting protein-chemical interactions using two-layer Support Vector Machine classifiers that require only readily available biochemical data, i.e. amino acid sequences of proteins and structure formulas of chemical compounds. In this article, the method has been implemented as the COPICAT web service, with an easy-to-use front-end interface. Users can simply submit a protein-chemical interaction prediction job using a pre-trained classifier, or can even train their own classification model by uploading training data. COPICAT's fast and accurate computational prediction has enhanced lead compound discovery against a database of tens of millions of chemical compounds, implying that the search space for drug discovery is extended by >1000 times compared with currently well-used high-throughput screening methodologies. AVAILABILITY: The COPICAT server is available at http://copicat.dna.bio.keio.ac.jp. All functions, including the prediction function, are freely available via anonymous login without registration. Registered users, however, can use the system more intensively.


Subjects
Factual Databases, Ligands, Proteins/metabolism, Software, Support Vector Machine, Protein Binding, Proteins/chemistry
16.
Healthcare (Basel) ; 11(4)2023 Feb 07.
Article in English | MEDLINE | ID: mdl-36833018

ABSTRACT

Ultrasonography is widely used for diagnosis of diseases in internal organs because it is nonradioactive, noninvasive, real-time, and inexpensive. In ultrasonography, a set of measurement markers is placed at two points to measure organs and tumors, then the position and size of the target finding are measured on this basis. Among the measurement targets of abdominal ultrasonography, renal cysts occur in 20-50% of the population regardless of age. Therefore, the frequency of measurement of renal cysts in ultrasound images is high, and the effect of automating measurement would be high as well. The aim of this study was to develop a deep learning model that can automatically detect renal cysts in ultrasound images and predict the appropriate position of a pair of salient anatomical landmarks to measure their size. The deep learning model adopted fine-tuned YOLOv5 for detection of renal cysts and fine-tuned UNet++ for prediction of saliency maps, representing the position of salient landmarks. Ultrasound images were input to YOLOv5, and images cropped inside the bounding box and detected from the input image by YOLOv5 were input to UNet++. For comparison with human performance, three sonographers manually placed salient landmarks on 100 unseen items of the test data. These salient landmark positions annotated by a board-certified radiologist were used as the ground truth. We then evaluated and compared the accuracy of the sonographers and the deep learning model. Their performances were evaluated using precision-recall metrics and the measurement error. The evaluation results show that the precision and recall of our deep learning model for detection of renal cysts are comparable to standard radiologists; the positions of the salient landmarks were predicted with an accuracy close to that of the radiologists, and in a shorter time.

17.
Commun Chem ; 6(1): 249, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37973971

ABSTRACT

The structural diversity of chemical libraries, which are systematic collections of compounds that have potential to bind to biomolecules, can be represented by chemical latent space. A chemical latent space is a projection of a compound structure into a mathematical space based on several molecular features, and it can express structural diversity within a compound library in order to explore a broader chemical space and generate novel compound structures for drug candidates. In this study, we developed a deep-learning method, called NP-VAE (Natural Product-oriented Variational Autoencoder), based on variational autoencoder for managing hard-to-analyze datasets from DrugBank and large molecular structures such as natural compounds with chirality, an essential factor in the 3D complexity of compounds. NP-VAE was successful in constructing the chemical latent space from large-sized compounds that were unable to be handled in existing methods, achieving higher reconstruction accuracy, and demonstrating stable performance as a generative model across various indices. Furthermore, by exploring the acquired latent space, we succeeded in comprehensively analyzing a compound library containing natural compounds and generating novel compound structures with optimized functions.

18.
BMC Bioinformatics ; 13 Suppl 17: S8, 2012.
Article in English | MEDLINE | ID: mdl-23282285

ABSTRACT

BACKGROUND: Prediction of biochemical (metabolic) pathways has a wide range of applications, including the optimization of drug candidates and the elucidation of toxicity mechanisms. Recently, several methods have been developed for pathway prediction to derive a goal compound from a start compound. However, these methods incur high computational costs and cannot perform comprehensive prediction of novel metabolic pathways. The aim of this study is to develop a prediction method for reconstructing metabolic pathways and predicting unknown biosynthetic pathways that is de novo in the sense that it does not require any initial network, such as the KEGG metabolic network, to be explored. RESULTS: We formulated pathway prediction between a start compound and a goal compound as a shortest path search problem in terms of the number of enzyme reactions applied. We propose an efficient search method based on the A* algorithm and heuristic techniques that use a Linear Programming (LP) solution to estimate the distance to the goal. First, a chemical compound is represented by a feature vector that counts the frequencies of substructure occurrences in the structural formula. Second, an enzyme reaction is represented as an operator vector by detecting the structural changes to compounds before and after the reaction. By defining compound vectors as nodes and operator vectors as edges, prediction of the reaction pathway is reduced to a shortest path search problem in the vector space. In experiments on the DDT degradation pathway, we verify that the shortest paths predicted by our method are biologically correct pathways registered in the KEGG database. The results also demonstrate that the LP heuristics can achieve a significant reduction in computation time. Furthermore, we apply our method to a secondary metabolite pathway of plant origin and successfully find a novel biochemical pathway that cannot be predicted by the existing method. For the reconstruction of a known biochemical pathway, our method is over 40 times as fast as the existing method. CONCLUSIONS: Our method enables fast and accurate de novo pathway predictions and novel pathway detection.
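
The search formulation in this abstract (compound feature vectors as nodes, reaction operator vectors as edges, A* over the number of reactions) can be sketched roughly as follows. This is an illustrative toy, not the published method: the LP-based heuristic is replaced by a much simpler admissible bound (the L1 distance to the goal divided by the largest change any single operator can make), and the feature layout and operator names in the usage note are hypothetical.

```python
import heapq
from math import ceil


def a_star_pathway(start, goal, operators, max_steps=10):
    """Find a shortest sequence of operators turning `start` into `goal`.

    start, goal : tuples of substructure counts (feature vectors).
    operators   : dict mapping a reaction name to a tuple of count changes
                  (assumed to contain at least one non-zero operator).
    Returns a list of reaction names, or None if no path within max_steps.
    """
    # Largest L1 change a single reaction can make to a feature vector.
    max_change = max(sum(abs(x) for x in op) for op in operators.values())

    def heuristic(vec):
        # Simple admissible lower bound on the remaining number of reactions
        # (a stand-in for the LP-based estimate named in the abstract).
        distance = sum(abs(a - b) for a, b in zip(vec, goal))
        return ceil(distance / max_change)

    frontier = [(heuristic(start), 0, start, [])]
    best_cost = {start: 0}
    while frontier:
        _, cost, vec, path = heapq.heappop(frontier)
        if vec == goal:
            return path
        if cost >= max_steps:
            continue
        for name, op in operators.items():
            nxt = tuple(a + b for a, b in zip(vec, op))
            if min(nxt) < 0:
                continue  # a substructure count cannot become negative
            if best_cost.get(nxt, max_steps + 1) <= cost + 1:
                continue  # already reached at least as cheaply
            best_cost[nxt] = cost + 1
            heapq.heappush(
                frontier,
                (cost + 1 + heuristic(nxt), cost + 1, nxt, path + [name]),
            )
    return None
```

With hypothetical features (Cl, OH, C=C) and operators such as `{"dechlorination": (-1, 0, 0), "hydroxylation": (0, 1, 0), "dehydrochlorination": (-1, 0, 1)}`, calling `a_star_pathway((3, 0, 0), (1, 1, 1), operators)` returns a three-reaction path. A tighter heuristic, such as the LP estimate the abstract describes, would prune more of the search space while leaving the structure of the loop unchanged.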


Subjects
Algorithms , Computational Biology/methods , Computer Simulation , Enzymes/metabolism , Metabolic Networks and Pathways , Organic Chemicals/metabolism , Anthocyanins/metabolism , Benzopyrans , Glucosides/biosynthesis , Lutein/biosynthesis

19.
NAR Genom Bioinform ; 4(1): lqac012, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35211670

ABSTRACT

Deep learning is being actively applied to obtain effective embeddings of biomolecular information. Obtaining better embeddings enhances the quality of downstream analyses, such as DNA sequence motif detection and protein function prediction. In this study, we adopt a pre-training algorithm for the effective embedding of RNA bases to acquire semantically rich representations and apply this algorithm to two fundamental RNA sequence problems: structural alignment and clustering. By using the pre-training algorithm to embed the four bases of RNA in a position-dependent manner using a large number of RNA sequences from various RNA families, a context-sensitive embedding representation is obtained. As a result, not only base information but also secondary structure and context information of RNA sequences are embedded for each base. We call this 'informative base embedding' and use it to achieve accuracies superior to those of existing state-of-the-art methods on RNA structural alignment and RNA family clustering tasks. Furthermore, upon performing RNA sequence alignment by combining this informative base embedding with a simple Needleman-Wunsch alignment algorithm, we succeed in calculating structural alignments with a time complexity of O(n^2) instead of the O(n^6) time complexity of the naive implementation of a Sankoff-style algorithm for input RNA sequences of length n.
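
To make the O(n^2) claim concrete, the following minimal sketch runs Needleman-Wunsch over per-base embedding vectors instead of base letters. It is an illustration under stated assumptions, not the authors' code: the match score is assumed to be the cosine similarity between two base embeddings, the gap penalty is a hypothetical linear cost, and the embeddings themselves would come from a pre-trained model that is not shown here.

```python
import numpy as np


def needleman_wunsch(emb_a, emb_b, gap_penalty=1.0):
    """Globally align two sequences of per-base embedding vectors.

    emb_a : array of shape (n, d), one embedding per base of sequence A.
    emb_b : array of shape (m, d), one embedding per base of sequence B.
    Returns (alignment score, list of aligned index pairs (i, j)).
    """
    # Cosine similarity between every pair of bases (assumed scoring).
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T
    n, m = sim.shape

    # Fill the (n+1) x (m+1) dynamic-programming table: O(n * m) cells.
    score = np.zeros((n + 1, m + 1))
    score[:, 0] = -gap_penalty * np.arange(n + 1)
    score[0, :] = -gap_penalty * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(
                score[i - 1, j - 1] + sim[i - 1, j - 1],  # align i with j
                score[i - 1, j] - gap_penalty,            # gap in B
                score[i, j - 1] - gap_penalty,            # gap in A
            )

    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if np.isclose(score[i, j], score[i - 1, j - 1] + sim[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j] - gap_penalty):
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return float(score[n, m]), pairs
```

The alignment becomes structure-aware only through the embeddings: if each vector already encodes secondary structure and context, the plain two-dimensional recurrence above is enough, which is where the quadratic cost comes from.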

20.
Genes (Basel) ; 13(11), 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36421829

ABSTRACT

Existing approaches to predicting RNA secondary structures depend on how the secondary structure is decomposed into substructures, that is, the architecture, to define their parameter space. However, architecture dependency has not been sufficiently investigated, especially for pseudoknotted secondary structures. In this study, we propose a novel algorithm for directly inferring base-pairing probabilities with neural networks that do not depend on the architecture of RNA secondary structures, and then implement this approach using two maximum expected accuracy (MEA)-based decoding algorithms: Nussinov-style decoding for pseudoknot-free structures and IPknot-style decoding for pseudoknotted structures. To train the neural networks connected to each base pair, we adopt a max-margin framework, called structured support vector machines (SSVM), as the output layer. Our benchmarks for predicting RNA secondary structures with and without pseudoknots show that our algorithm outperforms existing methods in prediction accuracy.
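
The pseudoknot-free decoding step mentioned in this abstract can be sketched as a Nussinov-style dynamic program over a base-pairing probability matrix. This is a simplified illustration, not the published algorithm: the objective is assumed to be the sum of `P[i, j] - threshold` over the chosen pairs, `threshold` and `min_loop` are hypothetical parameters, and the probabilities would come from a trained network that is not shown. IPknot-style decoding for pseudoknotted structures is not covered.

```python
import numpy as np


def nussinov_decode(pair_prob, threshold=0.5, min_loop=3):
    """Pick a nested (pseudoknot-free) set of base pairs from probabilities.

    pair_prob : (n, n) array; entry [i, j] with i < j is the predicted
                probability that bases i and j pair.
    threshold : a pair is only worth adding if its probability exceeds this.
    min_loop  : minimum number of unpaired bases enclosed by a pair.
    Returns a sorted list of (i, j) pairs.
    """
    n = pair_prob.shape[0]
    gain = pair_prob - threshold
    best = np.zeros((n, n))  # best[i, j]: best total gain on bases i..j

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            candidates = [
                best[i + 1, j],                   # base i unpaired
                best[i, j - 1],                   # base j unpaired
                best[i + 1, j - 1] + gain[i, j],  # i pairs with j
            ]
            # Split the interval into two independent sub-intervals.
            candidates += [best[i, k] + best[k + 1, j]
                           for k in range(i + 1, j)]
            best[i, j] = max(candidates)

    # Trace back through the table to recover the chosen pairs.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if np.isclose(best[i, j], best[i + 1, j]):
            stack.append((i + 1, j))
        elif np.isclose(best[i, j], best[i, j - 1]):
            stack.append((i, j - 1))
        elif np.isclose(best[i, j], best[i + 1, j - 1] + gain[i, j]):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if np.isclose(best[i, j], best[i, k] + best[k + 1, j]):
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)
```

Because the recurrence only consumes pairing probabilities, it does not depend on how a structure is decomposed into loops or stacks, which is the architecture independence the abstract emphasizes.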


Subjects
RNA , Software , Base Pairing , RNA/genetics , RNA/chemistry , Nucleic Acid Conformation , Sequence Analysis, RNA/methods , Base Sequence , Neural Networks, Computer , Probability