Results 1 - 20 of 76
1.
mSystems ; : e0140523, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38557130

ABSTRACT

The gut microbiome affects the health status of the host through complex interactions with the host's intestinal wall. These host-microbiome interactions may vary spatially along the physical and chemical environment of the intestine, but these changes remain unknown. This study investigated these intricate relationships through a gene co-expression network analysis based on dual transcriptome profiling of different intestinal sites (cecum, transverse colon, and rectum) of the primate common marmoset. We proposed a gene module extraction algorithm based on graph theory to find tightly interacting gene modules of the host and the microbiome from a vast co-expression network. The 27 gene modules identified by this method, which include both host and microbiome genes, not only produced results consistent with previous studies regarding the host-microbiome relationships, but also provided new insights into microbiome genes acting as potential mediators in host-microbiome interplays. Specifically, we discovered associations between the host gene FBP1, a cancer marker, and polysaccharide degradation-related genes (pfkA and fucI) coded by Bacteroides vulgatus, as well as relationships between host B cell-specific genes (CD19, CD22, CD79B, and PTPN6) and a tryptophan synthesis gene (trpB) coded by Parabacteroides distasonis. Furthermore, our proposed module extraction algorithm surpassed existing approaches by successfully defining more functionally related gene modules, providing insights for understanding the complex relationship between the host and the microbiome. IMPORTANCE: We unveiled the intricate dynamics of the host-microbiome interactions along the colon by identifying closely interacting gene modules from a vast gene co-expression network, constructed based on simultaneous profiling of both host and microbiome transcriptomes. Our proposed gene module extraction algorithm, designed to interpret inter-species interactions, enabled the identification of functionally related gene modules encompassing both host and microbiome genes, which was challenging with conventional modularity maximization algorithms. Through these identified gene modules, we discerned previously unrecognized bacterial genes that potentially mediate known relationships between host genes and specific bacterial species. Our findings underscore that host-microbiome interactions vary spatially along the colon rather than following a uniform pattern throughout it.

2.
Commun Chem ; 6(1): 249, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37973971

ABSTRACT

The structural diversity of chemical libraries, which are systematic collections of compounds that have the potential to bind to biomolecules, can be represented by a chemical latent space. A chemical latent space is a projection of compound structures into a mathematical space based on several molecular features, and it can express structural diversity within a compound library in order to explore a broader chemical space and generate novel compound structures for drug candidates. In this study, we developed a deep-learning method, called NP-VAE (Natural Product-oriented Variational Autoencoder), based on a variational autoencoder for managing hard-to-analyze datasets from DrugBank and large molecular structures such as natural compounds with chirality, an essential factor in the 3D complexity of compounds. NP-VAE succeeded in constructing the chemical latent space from large compounds that existing methods could not handle, achieving higher reconstruction accuracy and demonstrating stable performance as a generative model across various indices. Furthermore, by exploring the acquired latent space, we succeeded in comprehensively analyzing a compound library containing natural compounds and generating novel compound structures with optimized functions.

3.
Healthcare (Basel) ; 11(4)2023 Feb 07.
Article in English | MEDLINE | ID: mdl-36833018

ABSTRACT

Ultrasonography is widely used for diagnosis of diseases in internal organs because it is nonradioactive, noninvasive, real-time, and inexpensive. In ultrasonography, a set of measurement markers is placed at two points to measure organs and tumors, then the position and size of the target finding are measured on this basis. Among the measurement targets of abdominal ultrasonography, renal cysts occur in 20-50% of the population regardless of age. Therefore, renal cysts are measured frequently in ultrasound images, and automating the measurement would have a correspondingly large effect. The aim of this study was to develop a deep learning model that can automatically detect renal cysts in ultrasound images and predict the appropriate position of a pair of salient anatomical landmarks to measure their size. The deep learning model adopted fine-tuned YOLOv5 for detection of renal cysts and fine-tuned UNet++ for prediction of saliency maps, representing the position of salient landmarks. Ultrasound images were input to YOLOv5, and the regions inside the bounding boxes detected by YOLOv5 were cropped from the input image and input to UNet++. For comparison with human performance, three sonographers manually placed salient landmarks on 100 unseen items of the test data. Salient landmark positions annotated by a board-certified radiologist were used as the ground truth. We then evaluated and compared the accuracy of the sonographers and the deep learning model. Their performances were evaluated using precision-recall metrics and the measurement error. The evaluation results show that the precision and recall of our deep learning model for detection of renal cysts are comparable to those of standard radiologists; the positions of the salient landmarks were predicted with an accuracy close to that of the radiologists, and in a shorter time.

4.
Genes (Basel) ; 13(11)2022 11 18.
Article in English | MEDLINE | ID: mdl-36421829

ABSTRACT

Existing approaches to predicting RNA secondary structures depend on how the secondary structure is decomposed into substructures, that is, the architecture, to define their parameter space. However, architecture dependency has not been sufficiently investigated, especially for pseudoknotted secondary structures. In this study, we propose a novel algorithm for directly inferring base-pairing probabilities with neural networks that do not depend on the architecture of RNA secondary structures, and then implement this approach using two maximum expected accuracy (MEA)-based decoding algorithms: Nussinov-style decoding for pseudoknot-free structures and IPknot-style decoding for pseudoknotted structures. To train the neural networks connected to each base pair, we adopt a max-margin framework, called structured support vector machines (SSVM), as the output layer. Our benchmarks for predicting RNA secondary structures with and without pseudoknots show that our algorithm outperforms existing methods in prediction accuracy.
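The Nussinov-style decoding step mentioned in this abstract can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `mea_decode`, the `threshold` and `min_loop` parameters, and the toy probabilities are all hypothetical, and the base-pairing probabilities are simply given as input rather than inferred by a neural network.

```python
def mea_decode(probs, n, min_loop=3, threshold=0.5):
    """Nussinov-style decoding of a pseudoknot-free structure.

    probs: dict mapping (i, j) with i < j to a base-pairing probability.
    A pair contributes (p - threshold) to the score and is only allowed
    when that gain is positive and the hairpin loop has > min_loop bases.
    Returns (best_score, list_of_pairs).
    """
    def gain(i, j):
        if j - i <= min_loop:
            return None
        g = probs.get((i, j), 0.0) - threshold
        return g if g > 0 else None

    if n == 0:
        return 0.0, []

    # dp[i][j] = best score achievable on the subsequence i..j
    dp = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])
            g = gain(i, j)
            if g is not None:
                best = max(best, dp[i + 1][j - 1] + g)
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recompute the same expressions to find which case was used.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        g = gain(i, j)
        if g is not None and dp[i][j] == dp[i + 1][j - 1] + g:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
            continue
        for k in range(i + 1, j):
            if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                stack.append((i, k))
                stack.append((k + 1, j))
                break
    return dp[0][n - 1], sorted(pairs)


# Toy input: two confident nested pairs and one weak pair that is rejected.
toy_probs = {(0, 9): 0.9, (1, 8): 0.8, (2, 7): 0.3}
score, pairs = mea_decode(toy_probs, n=10)
print(pairs)
```

The cubic-time recursion only produces nested pairs; pseudoknotted structures need a different decoder, which is why the abstract names a separate IPknot-style step.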


Subjects
RNA , Software , Base Pairing , RNA/genetics , RNA/chemistry , Nucleic Acid Conformation , Sequence Analysis, RNA/methods , Base Sequence , Neural Networks, Computer , Probability
5.
mSystems ; 7(5): e0052022, 2022 10 26.
Article in English | MEDLINE | ID: mdl-36005400

ABSTRACT

The intestinal microbiome is closely related to host health, and metatranscriptomic analysis can be used to assess the functional activity of microbiomes by quantifying microbial gene expression levels, helping elucidate the interactions between the microbiome and the environment. However, the functional changes in the microbiome along the host intestinal tract remain unknown, and previous analytical methods have limitations, such as potentially overlooking unknown genes due to dependence on existing databases. The objective of this study is to develop a computational pipeline combined with next-generation sequencing for spatial covariation analysis of the functional activity of microbiomes at multiple intestinal sites (biogeographic locations) within the same individual. This method reconstructs a reference metagenomic sequence across multiple intestinal sites and integrates the metagenome and metatranscriptome, allowing the gene expression levels of the microbiome, including unknown bacterial genes, to be compared among multiple sites. When this method was applied to metatranscriptomic analysis in the intestinal tract of common marmosets, a New World monkey, the reconstructed metagenome covered most of the expressed genes and revealed that the differences in microbial gene expression among the cecum, transverse colon, and feces were more dynamic and sensitive to environmental shifts than the abundances of the genes. In addition, metatranscriptomic profiling at three intestinal sites of the same individual enabled covariation analysis incorporating spatial relevance, accurately predicting the function of a total of 10,856 unknown genes. Our findings demonstrate that our proposed analytical method captures functional changes in microbiomes at the gene resolution level. IMPORTANCE: We developed an analysis method that integrates metagenomes and metatranscriptomes from multiple intestinal sites to elucidate how microbial function varies along the intestinal tract. This method enables spatial covariation analysis of the functional activity of microbiomes and accurate identification of gene expression changes among intestinal sites, including changes in the expression of unknown bacterial genes. Moreover, we applied this method to the investigation of the common marmoset intestine, which is anatomically and pharmacologically similar to that of humans. Our findings indicate the expression pattern of the microbiome varies in response to changes in the internal environment along the intestinal tract, and this microbial change may affect the intestinal environment.


Subjects
Gastrointestinal Microbiome , Microbiota , Animals , Humans , Callithrix/genetics , Microbiota/genetics , Metagenome , Intestines , Gastrointestinal Microbiome/genetics
6.
Genes (Basel) ; 13(8)2022 07 26.
Article in English | MEDLINE | ID: mdl-35893066

ABSTRACT

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a newly emerging virus well known as the cause of the worldwide Coronavirus Disease 2019 (COVID-19) pandemic. Major breakthroughs in the Next Generation Sequencing (NGS) field were achieved following the first release of a full-length SARS-CoV-2 genome on 10 January 2020, with the hope of turning the tables against the worsening pandemic situation. Previous studies in respiratory virus characterization required mapping of raw sequences to the human genome in the downstream bioinformatics pipeline as part of metagenomic principles. Illumina, as the major player in the NGS arena, took action by releasing guidelines for improved enrichment kits called the Respiratory Virus Oligo Panel (RVOP), based on a hybridization capture method capable of capturing targeted respiratory viruses, including SARS-CoV-2, thereby allowing raw sequence data to be mapped directly to the SARS-CoV-2 genome in the downstream bioinformatics pipeline. Consequently, two bioinformatics pipelines emerged, with no previous studies benchmarking them. This study focuses on gaining insight into and understanding of the Illumina target enrichment workflow by applying two different bioinformatics pipelines, named 'Fast Pipeline' and 'Normal Pipeline', to SARS-CoV-2 strains isolated from Yogyakarta and Central Java, Indonesia. Overall, both pipelines work well in the characterization of SARS-CoV-2 samples, including in the identification of major studied nucleotide substitutions and amino acid mutations. A higher number of reads mapped to the SARS-CoV-2 genome in Fast Pipeline, which was found to be a contributing factor in its higher coverage depth and higher number of identified variations (SNPs, insertions, and deletions). Fast Pipeline ultimately works well in a situation where time is a critical factor. On the other hand, Normal Pipeline requires a longer time because it maps reads to the human genome. Certain limitations of the pipeline algorithms were identified, and future studies are strongly encouraged to design a pipeline in an integrated framework, for instance, by using NextFlow, a workflow framework, to combine all scripts into one fully integrated pipeline.


Subjects
COVID-19 , SARS-CoV-2 , COVID-19/virology , Computational Biology/methods , Genome, Viral , High-Throughput Nucleotide Sequencing/methods , Humans , SARS-CoV-2/genetics
7.
Commun Med (Lond) ; 2: 9, 2022.
Article in English | MEDLINE | ID: mdl-35603277

ABSTRACT

Background: Approximately 2.4 million patients in Japan would benefit from treatment for thyroid disease, including Graves' disease and Hashimoto's disease. However, only 450,000 of them are receiving treatment, and many patients with thyroid dysfunction remain largely overlooked. In this retrospective study, we aimed to develop and conduct preliminary testing on a machine learning method for screening patients with hyperthyroidism and hypothyroidism who would benefit from prompt medical treatment. Methods: We collected electronic medical records and medical checkup data from four hospitals in Japan. We applied four machine learning algorithms to construct classification models to distinguish patients with hyperthyroidism and hypothyroidism from control subjects using routine laboratory tests. Performance evaluation metrics such as sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) were obtained. Techniques such as feature importance were further applied to understand the contribution of each feature to the machine learning output. Results: The results of cross-validation and external evaluation indicated that we achieved high classification accuracies (AUROC = 93.8% for the hyperthyroidism model and AUROC = 90.9% for the hypothyroidism model). Serum creatinine (S-Cr), mean corpuscular volume (MCV), and total cholesterol were the three features that were most strongly correlated with the hyperthyroidism model, and S-Cr, lactic acid dehydrogenase (LDH), and total cholesterol were correlated with the hypothyroidism model. Conclusions: We demonstrated the potential of machine learning approaches for diagnosing the presence of thyroid dysfunction from routine laboratory tests. Further validation, including prospective clinical studies, is necessary prior to application of our method in the clinic.
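The AUROC figures in this abstract can be read as the probability that a randomly chosen patient receives a higher model score than a randomly chosen control. The following is a minimal sketch of that pairwise definition; the function name `auroc` and the toy labels and scores are illustrative assumptions and do not come from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0 (control) or 1 (patient).
    scores: model scores, higher meaning "more likely a patient".
    Ties between a patient and a control count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four subjects, two of them patients.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic pairwise loop is fine for illustration; on real cohort sizes a rank-based computation gives the same value more efficiently.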

8.
Chem Commun (Camb) ; 58(47): 6693-6696, 2022 Jun 09.
Article in English | MEDLINE | ID: mdl-35608215

ABSTRACT

The protein kinase C (PKC) family consists of ten isozymes and is a potential target for treating cancer, Alzheimer's disease, and HIV infection. Since known natural PKC agonists have little selectivity among the PKC isozymes, a new scaffold is needed to develop PKC ligands with remarkable isozyme selectivity. Taking advantage of machine-learning and computational chemistry approaches, we screened the PubChem database to select the sesterterpenoid alotaketals as potential PKC ligands, then designed and synthesized alotaketal analogues with a different ring system and stereochemistry from the natural products. The analogue exhibited an affinity for PKCα-C1A one order of magnitude higher than that for the PKCδ-C1B domain. Thus, this compound is expected to serve as the basis for developing PKC ligands with isozyme selectivity.


Subjects
HIV Infections , Isoenzymes , Artificial Intelligence , Computational Chemistry , Humans , Isoenzymes/metabolism , Ligands , Protein Kinase C/metabolism
9.
NAR Genom Bioinform ; 4(1): lqac012, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35211670

ABSTRACT

Deep learning is being actively applied to obtain effective embeddings of biomolecular information. Obtaining better embeddings enhances the quality of downstream analyses, such as DNA sequence motif detection and protein function prediction. In this study, we adopt a pre-training algorithm for the effective embedding of RNA bases to acquire semantically rich representations and apply this algorithm to two fundamental RNA sequence problems: structural alignment and clustering. By using the pre-training algorithm to embed the four bases of RNA in a position-dependent manner using a large number of RNA sequences from various RNA families, a context-sensitive embedding representation is obtained. As a result, not only base information but also secondary structure and context information of RNA sequences are embedded for each base. We call this 'informative base embedding' and use it to achieve accuracies superior to those of existing state-of-the-art methods on RNA structural alignment and RNA family clustering tasks. Furthermore, upon performing RNA sequence alignment by combining this informative base embedding with a simple Needleman-Wunsch alignment algorithm, we succeed in calculating structural alignments with a time complexity of O(n^2) instead of the O(n^6) time complexity of the naive implementation of the Sankoff-style algorithm for input RNA sequences of length n.
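The quadratic-time alignment described in this abstract can be pictured as an ordinary Needleman-Wunsch recursion whose match score compares per-base embedding vectors instead of nucleotide letters. The sketch below is an illustration under assumptions: cosine similarity as the match score, the linear gap penalty, the function names, and the toy two-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def align_embeddings(emb_a, emb_b, gap=-0.5):
    """Global alignment of two sequences of per-base embedding vectors.

    Fills an (n+1) x (m+1) table, so time and memory are O(n * m).
    Returns (score, alignment), where alignment is a list of (i, j)
    index pairs and None marks a gap.
    """
    n, m = len(emb_a), len(emb_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner.
    alignment, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
        ):
            alignment.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            alignment.append((i - 1, None))
            i -= 1
        else:
            alignment.append((None, j - 1))
            j -= 1
    alignment.reverse()
    return score[n][m], alignment


# Toy embeddings (hypothetical): the middle base of A has no partner in B.
A = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
B = [(1.0, 0.0), (1.0, 0.0)]
print(align_embeddings(A, B))
```

The point of the design is that any structural context has to be carried by the vectors themselves, which is what lets the alignment step stay a plain two-dimensional table.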

10.
Life (Basel) ; 11(11)2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34833011

ABSTRACT

Protein-RNA interactions (PRIs) are essential for many biological processes, so understanding aspects of the sequences and structures involved in PRIs is important for unraveling such processes. Because of the expensive and time-consuming techniques required for experimental determination of complex protein-RNA structures, various computational methods have been developed to predict PRIs. However, most of these methods focus on predicting only RNA-binding regions in proteins or only protein-binding motifs in RNA. Methods for predicting entire residue-base contacts in PRIs have not yet achieved sufficient accuracy. Furthermore, some of these methods require the identification of 3D structures or homologous sequences, which are not available for all protein and RNA sequences. Here, we propose a method for predicting residue-base contacts between proteins and RNAs using only sequence information and structural information predicted from sequences. The method can be applied to any protein-RNA pair, even when rich information, such as its 3D structure, is not available. In this method, residue-base contact prediction is formalized as an integer programming problem. We predict a residue-base contact map that maximizes a scoring function based on sequence-based features such as k-mers of sequences and the predicted secondary structure. The scoring function is trained using a max-margin framework from known PRIs with 3D structures. To verify our method, we conducted several computational experiments. The results suggest that our method, which is based on only sequence information, is comparable with RNA-binding residue prediction methods based on known binding data.

11.
BMC Bioinformatics ; 22(Suppl 6): 427, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34078257

ABSTRACT

BACKGROUND: The increasing use of whole metagenome sequencing has spurred the need to improve de novo assemblers to facilitate the discovery of unknown species and the analysis of their genomic functions. MetaVelvet-SL is a short-read de novo metagenome assembler that partitions a multi-species de Bruijn graph into single-species sub-graphs. This study aimed to improve the performance of MetaVelvet-SL by using a deep learning-based model to predict the partition nodes in a multi-species de Bruijn graph. RESULTS: This study showed that the recent advances in deep learning offer the opportunity to better exploit sequence information and differentiate genomes of different species in a metagenomic sample. We developed an extension to MetaVelvet-SL, which we named MetaVelvet-DL, that builds an end-to-end architecture using Convolutional Neural Network and Long Short-Term Memory units. The deep learning model in MetaVelvet-DL can more accurately predict how to partition a de Bruijn graph than the Support Vector Machine-based model in MetaVelvet-SL can. Assembly of the Critical Assessment of Metagenome Interpretation (CAMI) dataset showed that after removing chimeric assemblies, MetaVelvet-DL produced longer single-species contigs, with fewer misassembled contigs than MetaVelvet-SL did. CONCLUSIONS: MetaVelvet-DL provides more accurate de novo assemblies of whole metagenome data. The authors believe that this improvement can help in furthering the understanding of microbiomes by providing a more accurate description of the metagenomic samples under analysis.
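For readers unfamiliar with the data structure this abstract relies on, the sketch below builds a small de Bruijn graph from short reads and lists the nodes where the path forks. It is a generic illustration only: the function names, the toy reads, and the use of out-degree to flag forks are assumptions, and nothing here reproduces how MetaVelvet-SL or MetaVelvet-DL choose partition nodes.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Returns a dict mapping each (k-1)-mer prefix to the set of
    (k-1)-mer suffixes that follow it in at least one read.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Nodes with more than one distinct successor, i.e. where paths fork."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Hypothetical toy reads that share a prefix and then diverge.
reads = ["ACGTAC", "CGTAG"]
graph = build_de_bruijn(reads, k=3)
print(graph)
print(branching_nodes(graph))  # ['TA']
```

In a metagenome, forks like this can arise where sequences from different species share a stretch of bases, which is the situation a graph-partitioning assembler has to resolve.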


Subjects
Deep Learning , Metagenome , Algorithms , Genomics , High-Throughput Nucleotide Sequencing , Metagenomics , Sequence Analysis, DNA , Software
12.
Sci Data ; 8(1): 159, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34183680

ABSTRACT

Cynomolgus macaque (Macaca fascicularis) and common marmoset (Callithrix jacchus) have been widely used in human biomedical research. Long-standing primate genome assemblies used the human genome as a reference for ordering and orienting the assembled fragments into chromosomes. Here we performed de novo genome assembly of these two species without any human genome-based bias observed in the genome assemblies released earlier. We assembled PacBio long reads, and the resultant contigs were scaffolded with Hi-C data, which were further refined based on Hi-C contact maps and alternate de novo assemblies. The assemblies achieved scaffold N50 lengths of 149 Mb and 137 Mb for cynomolgus macaque and common marmoset, respectively. The high fidelity of our assembly is also ascertained by BAC-end concordance in common marmoset. Our assembly of cynomolgus macaque outperformed all the available assemblies of this species in terms of contiguity. The chromosome-scale genome assemblies produced in this study are valuable resources for non-human primate models and provide an important baseline in human biomedical research.
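The scaffold N50 values quoted in this abstract summarize contiguity: N50 is the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the toy lengths are illustrative and are not the assemblies' real scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least half of all bases
            return length
    return 0


# Toy example: total is 20, and 6 + 5 = 11 covers at least half of it.
print(n50([2, 3, 4, 5, 6]))  # 5
```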


Subjects
Callithrix/genetics , Contig Mapping , Macaca fascicularis/genetics , Animals , Chromosomes , Gene Order
13.
J Cheminform ; 13(1): 36, 2021 May 01.
Article in English | MEDLINE | ID: mdl-33933121

ABSTRACT

MOTIVATION: Virtual screening, which can computationally predict the presence or absence of protein-compound interactions, has attracted attention as a large-scale, low-cost, and short-term search method for seed compounds. Existing machine learning methods for predicting protein-compound interactions are largely divided into those based on molecular structure data and those based on network data. The former utilize information on proteins and compounds, such as amino acid sequences and chemical structures; the latter rely on interaction network data, such as protein-protein interactions and compound-compound interactions. However, there have been few attempts to combine both types of data in molecular information and interaction networks. RESULTS: We developed a deep learning-based method that integrates protein features, compound features, and multiple types of interactome data to predict protein-compound interactions. We designed three benchmark datasets with different difficulties and applied them to evaluate the prediction method. The performance evaluations show that our deep learning framework for integrating molecular structure data and interactome data outperforms state-of-the-art machine learning methods for protein-compound interaction prediction tasks. The performance improvement is statistically significant according to the Wilcoxon signed-rank test. This finding reveals that the multi-interactome data captures perspectives other than amino acid sequence homology and chemical structure similarity and that both types of data synergistically improve the prediction accuracy. Furthermore, experiments on the three benchmark datasets show that our method is more robust than existing methods in accurately predicting interactions between proteins and compounds that are unseen in training samples.

14.
Esophagus ; 18(3): 612-620, 2021 07.
Article in English | MEDLINE | ID: mdl-33635412

ABSTRACT

BACKGROUND: Because cancers of hollow organs such as the esophagus are hard to detect even for expert physicians, it is important to establish diagnostic systems to support physicians and increase the accuracy of diagnosis. In recent years, deep learning-based artificial intelligence (AI) technology has been employed for medical image recognition. However, no optimal CT diagnostic system employing deep learning technology has been attempted and established for esophageal cancer so far. PURPOSE: To establish an AI-based diagnostic system for esophageal cancer from CT images. MATERIALS AND METHODS: In this single-center, retrospective cohort study, 457 patients with primary esophageal cancer referred to our division between 2005 and 2018 were enrolled. We fine-tuned VGG16, an image recognition model of deep learning convolutional neural network (CNN), for the detection of esophageal cancer. We evaluated the diagnostic accuracy of the CNN using a test data set including 46 cancerous CT images and 100 non-cancerous images and compared it to that of two radiologists. RESULTS: Pre-treatment esophageal cancer stages of the patients included in the test data set were clinical T1 (12 patients), clinical T2 (9 patients), clinical T3 (20 patients), and clinical T4 (5 patients). The CNN-based system showed a diagnostic accuracy of 84.2%, F value of 0.742, sensitivity of 71.7%, and specificity of 90.0%. CONCLUSIONS: Our AI-based diagnostic system succeeded in detecting esophageal cancer with high accuracy. More training with vast datasets collected from multiple centers would lead to even higher diagnostic accuracy and aid better decision making.
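The four reported metrics are tied together by a single confusion matrix. The abstract does not state the counts, so the sketch below uses counts inferred from the percentages and the 46/100 test split (33 true positives, 13 false negatives, 90 true negatives, 10 false positives); these counts, the function name, and the reading of "F value" as the F1 score are assumptions made for illustration.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity, and F1 from raw counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),      # recall on cancerous images
        "specificity": tn / (tn + fp),      # recall on non-cancerous images
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Counts inferred (not reported) from 46 cancerous and 100 non-cancerous images.
m = diagnostic_metrics(tp=33, fn=13, tn=90, fp=10)
print({name: round(value, 3) for name, value in m.items()})
```

With these inferred counts the rounded values agree with the figures in the abstract (84.2%, 71.7%, 90.0%, and 0.742), which suggests the four numbers are mutually consistent.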


Subjects
Deep Learning , Esophageal Neoplasms , Artificial Intelligence , Esophageal Neoplasms/diagnostic imaging , Humans , Retrospective Studies , Tomography, X-Ray Computed/methods
15.
Nucleic Acids Res ; 49(5): 2700-2720, 2021 03 18.
Article in English | MEDLINE | ID: mdl-33590099

ABSTRACT

In animal gonads, transposable elements are actively repressed to preserve genome integrity through the PIWI-interacting RNA (piRNA) pathway. In mice, piRNAs are abundantly expressed in male germ cells, and form effector complexes with three distinct PIWIs. The depletion of individual Piwi genes causes male-specific sterility with no discernible phenotype in female mice. Unlike mice, most other mammals have four PIWI genes, some of which are expressed in the ovary. Here, purification of PIWI complexes from oocytes of the golden hamster revealed that the size of the PIWIL1-associated piRNAs changed during oocyte maturation. In contrast, PIWIL3, an ovary-specific PIWI in most mammals, associates with short piRNAs only in metaphase II oocytes, which coincides with intense phosphorylation of the protein. An improved high-quality genome assembly and annotation revealed that PIWIL1- and PIWIL3-associated piRNAs appear to share the 5'-ends of common piRNA precursors and are mostly derived from unannotated sequences with a diminished contribution from TE-derived sequences, most of which correspond to endogenous retroviruses. Our findings show the complex and dynamic nature of biogenesis of piRNAs in hamster oocytes, and together with the new genome sequence generated, serve as the foundation for developing useful models to study the piRNA pathway in mammalian oocytes.


Subjects
Argonaute Proteins/metabolism , Oocytes/growth & development , Oocytes/metabolism , RNA, Small Interfering/metabolism , Animals , Argonaute Proteins/genetics , Female , Genomics , Male , Mesocricetus , Metaphase , Phosphorylation , RNA, Small Interfering/genetics , Testis/metabolism
16.
Nat Commun ; 12(1): 941, 2021 02 11.
Article in English | MEDLINE | ID: mdl-33574226

ABSTRACT

Accurate predictions of RNA secondary structures can help uncover the roles of functional non-coding RNAs. Although machine learning-based models have achieved high performance in terms of prediction accuracy, overfitting is a common risk for such highly parameterized models. Here we show that overfitting can be minimized when RNA folding scores learnt using a deep neural network are integrated together with Turner's nearest-neighbor free energy parameters. Training the model with thermodynamic regularization ensures that folding scores and the calculated free energy are as close as possible. In computational experiments designed for newly discovered non-coding RNAs, our algorithm (MXfold2) achieves the most robust and accurate predictions of RNA secondary structures without sacrificing computational efficiency compared to several other algorithms. The results suggest that integrating thermodynamic information could help improve the robustness of deep learning-based predictions of RNA secondary structure.


Subjects
Deep Learning , RNA Folding , RNA/chemistry , Algorithms , Computational Biology/methods , Humans , Machine Learning , Neural Networks, Computer , Thermodynamics
17.
Bioinform Adv ; 1(1): vbab039, 2021.
Article in English | MEDLINE | ID: mdl-36700086

ABSTRACT

Motivation: Biological sequence classification is the most fundamental task in bioinformatics analysis. For example, in metagenome analysis, binning is a typical type of DNA sequence classification. In order to classify sequences, it is necessary to define sequence features. The k-mer frequency, base composition and alignment-based metrics are commonly used. On the other hand, in the field of image recognition using machine learning, image classification is broadly divided into approaches based on shape and approaches based on style. A style matrix was introduced as a method of expressing the style of an image (e.g. color usage and texture). Results: We propose a novel sequence feature, called genomic style, inspired by image classification approaches, for classifying and clustering DNA sequences. As with the style of images, the DNA sequence is considered to have a genomic style unique to the bacterial species, and the style matrix concept is applied to the DNA sequence. Our main aim is to introduce the genomic style as yet another basic sequence feature for the metagenome binning problem in place of the most commonly used sequence feature, k-mer frequency. Performance evaluations showed that our method using a style matrix has the potential for accurate binning when compared with state-of-the-art binning tools based on k-mer frequency. Availability and implementation: The source code for the implementation of this genomic style method, along with the dataset for the performance evaluation, is available from https://github.com/friendflower94/binning-style. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
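The k-mer frequency feature that this abstract treats as the conventional baseline can be sketched in a few lines. This is a generic illustration, not code from the linked repository: the function name `kmer_frequencies`, the lexicographic ordering of k-mers, and the choice to skip windows containing non-ACGT characters are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector has 4**k entries ordered lexicographically over "ACGT".
    Windows containing other characters (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Each base occurs once, so every 1-mer has frequency 0.25.
print(kmer_frequencies("ACGT", 1))  # [0.25, 0.25, 0.25, 0.25]
```

Binning tools built on this kind of feature typically use tetranucleotides (k = 4), giving a 256-dimensional vector per contig before any reduction for reverse complements.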

18.
Contemp Clin Trials Commun ; 19: 100649, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32913919

ABSTRACT

INTRODUCTION: Depressive and neurocognitive disorders are debilitating conditions that are among the leading causes of years lived with disability worldwide. However, there are no biomarkers that are objective or easy to obtain in daily clinical practice, which leads to difficulties in assessing treatment response and developing new drugs. New technology allows quantification of features that clinicians perceive as reflective of disorder severity, such as facial expressions, phonic/speech information, body motion, daily activity, and sleep. METHODS: Patients with major depressive disorder, bipolar disorder, and major and minor neurocognitive disorders, as well as healthy controls, are recruited for the study. A psychiatrist/psychologist conducts conversational 10-min interviews with participants ≤10 times within up to five years of follow-up. Interviews are recorded using RGB and infrared cameras, and an array microphone. As an option, participants are asked to wear wrist-band type devices during the observational period. Various software is used to process the raw video, voice, infrared, and wearable device data. A machine learning approach is used to predict the presence of symptoms, severity, and the improvement/deterioration of symptoms. DISCUSSION: The overall goal of this proposed study, the Project for Objective Measures Using Computational Psychiatry Technology (PROMPT), is to develop objective, noninvasive, and easy-to-use biomarkers for assessing the severity of depressive and neurocognitive disorders in the hopes of guiding decision-making in clinical settings as well as reducing the risk of clinical trial failure. Challenges may include the large variability of samples, which makes it difficult to extract the features that commonly reflect disorder severity. TRIAL REGISTRATION: UMIN000021396, University Hospital Medical Information Network (UMIN).

19.
BMC Genomics ; 21(Suppl 3): 243, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32241258

ABSTRACT

BACKGROUND: The common marmoset (Callithrix jacchus) is one of the most studied primate model organisms. However, the marmoset genomes available in the public databases are highly fragmented and filled with sequence gaps, hindering research advances related to marmoset genomics and transcriptomics. RESULTS: Here we utilize single-molecule, long-read sequence data to improve and update the existing genome assembly and report a near-complete genome of the common marmoset. The assembly is 2.79 Gb in size, with a contig N50 length of 6.37 Mb and a chromosomal scaffold N50 length of 143.91 Mb, representing the most contiguous and high-quality marmoset genome to date. Approximately 90% of the assembled genome was represented in contigs longer than 1 Mb, with an approximately 104-fold improvement in contiguity over the previously published marmoset genome. More than 98% of the gaps from the previously published genomes were filled successfully, which improved the mapping rates of genomic and transcriptomic data onto the assembled genome. CONCLUSIONS: Altogether, the updated, high-quality common marmoset genome assembly provides improvements at various levels over the previous versions of the marmoset genome assemblies. This will allow researchers working on primate genomics to apply the genome more efficiently to their genomic and transcriptomic sequence data.
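
For readers unfamiliar with the contiguity metrics quoted in this abstract, the sketch below uses the standard definition of N50: the largest length L such that sequences of length L or more together cover at least half of the total assembly length. The function names `n50` and `fraction_longer_than` and the toy lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def fraction_longer_than(lengths, threshold):
    """Fraction of the total assembly length held in sequences longer than threshold."""
    total = sum(lengths)
    return sum(x for x in lengths if x > threshold) / total


# Toy contig lengths in arbitrary units, for illustration only.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
toy_fraction = fraction_longer_than(toy_lengths, 3)
```

The same two functions apply to contigs and to scaffolds; only the input lengths differ, which is why an assembly reports a separate N50 for each.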


Subjects
Callithrix/genetics , Chromosome Mapping/methods , Genome/genetics , Animals , Computational Biology/methods , Contig Mapping/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Alignment

20.
Sci Rep ; 9(1): 12719, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31481684

ABSTRACT

Genetically modified nonhuman primates (NHP) are useful models for biomedical research. Gene editing technologies have enabled production of target-gene knock-out (KO) NHP models. However, the target-gene KO/knock-in (KI) efficiency of CRISPR/Cas9 has not been extensively investigated in marmosets. In this study, optimum conditions for target gene modification efficacies of CRISPR/mRNA and CRISPR/nuclease in marmoset embryos were examined. CRISPR/nuclease was more effective than CRISPR/mRNA in avoiding mosaic genetic alteration. Furthermore, optimal conditions to generate KI marmoset embryos were investigated using CRISPR/Cas9 and two different lengths (36 nt and 100 nt) each of a sense or anti-sense single-strand oligonucleotide (ssODN). KIs were observed when CRISPR/nuclease and 36 nt sense or anti-sense ssODNs were injected into embryos. All embryos exhibited mosaic mutations with KI and KO, or imprecise KI, of c-kit. Although further improvement of KI strategies is required, these results indicated that CRISPR/Cas9 may be utilized to produce KO/KI marmosets via gene editing.


Subjects
Animals, Genetically Modified/genetics , CRISPR-Cas Systems , Embryo, Mammalian , Gene Editing , Gene Knock-In Techniques , Gene Knockout Techniques , Animals , Callithrix
...