Results 1 - 20 of 49
1.
BMC Med Inform Decis Mak ; 21(1): 163, 2021 05 20.
Article in English | MEDLINE | ID: mdl-34016115

ABSTRACT

BACKGROUND: Sepsis is a severe illness that affects millions of people worldwide, and its early detection is critical for effective treatment outcomes. In recent years, researchers have used models to classify positive patients or identify the probability for sepsis using vital signs and other time-series variables as input. METHODS: In our study, we analyzed patients' conditions by their kinematics (position, velocity, and acceleration) in a six-dimensional space defined by six vital signs. The patient is affected by the disease after a period if the position gets "near" to a calculated sepsis position in space. We used these kinematics features as explanatory variables of long short-term memory (LSTM), convolutional neural network (CNN) and linear neural network (LNN) models and compared the prediction accuracies with those obtained using only the vital signs as input. The dataset used contained information on approximately 4800 patients, each with 48 hourly records. RESULTS: We demonstrated that the kinematics features models had an improved performance compared with vital signs models. The kinematics features model of LSTM achieved the best accuracy, 0.803, which was nine points higher than the vital signs model. Although with lesser accuracies, the kinematics features models of the CNN and LNN showed better performances than vital signs models. CONCLUSION: Applying our novel approach for early detection of sepsis using neural networks will prove to be an invaluable, more accurate method than considering only simple vital signs as input variables. We expect that other researchers with similar objectives can use the model presented in this innovative approach to improve their results.
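The kinematics features described in this abstract can be illustrated with a minimal sketch. The abstract does not say how velocity and acceleration were computed, so the backward differences, the zero-padded first sample, and the names `finite_difference` and `kinematic_features` are all assumptions made for illustration only.

```python
from typing import List, Sequence

Vector = List[float]


def finite_difference(series: Sequence[Sequence[float]], dt: float = 1.0) -> List[Vector]:
    """Backward difference between consecutive samples.

    The first entry is zero-padded so the output has the same length
    as the input (an assumption, not taken from the paper).
    """
    if not series:
        return []
    dims = len(series[0])
    out: List[Vector] = [[0.0] * dims]
    for prev, curr in zip(series, series[1:]):
        out.append([(c - p) / dt for p, c in zip(prev, curr)])
    return out


def kinematic_features(positions: Sequence[Sequence[float]], dt: float = 1.0) -> List[Vector]:
    """Concatenate position, velocity and acceleration for every time step.

    With six vital signs per hourly record, each output row has 18 values.
    """
    velocity = finite_difference(positions, dt)
    acceleration = finite_difference(velocity, dt)
    return [list(p) + v + a for p, v, a in zip(positions, velocity, acceleration)]


if __name__ == "__main__":
    # Two hypothetical vital signs over three hours, for brevity.
    records = [[0.0, 1.0], [1.0, 3.0], [3.0, 6.0]]
    for row in kinematic_features(records):
        print(row)
```

Because of the zero padding, the velocity at the first time step and the acceleration at the first two time steps are artifacts of the scheme rather than measurements; a real pipeline might drop those rows instead.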


Subjects
Neural Networks, Computer , Sepsis , Biomechanical Phenomena , Early Diagnosis , Humans , Sepsis/diagnosis , Vital Signs
2.
BMC Genomics ; 21(1): 679, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-32998685

ABSTRACT

BACKGROUND: Species of the genus Monascus are considered to be economically important and have been widely used in the production of yellow and red food colorants. In particular, three Monascus species, namely, M. pilosus, M. purpureus, and M. ruber, are used for food fermentation in the cuisine of East Asian countries such as China, Japan, and Korea. These species have also been utilized in the production of various kinds of natural pigments. However, there is a paucity of information on the genomes and secondary metabolites of these strains. Here, we report the genomic analysis and secondary metabolites produced by M. pilosus NBRC4520, M. purpureus NBRC4478 and M. ruber NBRC4483, which are NBRC standard strains. We believe that this report will lead to a better understanding of red yeast rice food. RESULTS: We examined the diversity of secondary metabolite production in three Monascus species (M. pilosus, M. purpureus, and M. ruber) at both the metabolome level by LCMS analysis and at the genome level. Specifically, M. pilosus NBRC4520, M. purpureus NBRC4478 and M. ruber NBRC4483 strains were used in this study. Illumina MiSeq 300 bp paired-end sequencing generated 17 million high-quality short reads in each species, corresponding to 200 times the genome size. We measured the pigments and their related metabolites using LCMS analysis. The colors in the liquid media corresponding to the pigments and their related metabolites produced by the three species were very different from each other. The gene clusters for secondary metabolite biosynthesis of the three Monascus species also diverged, confirming that M. pilosus and M. purpureus are chemotaxonomically different. M. ruber has similar biosynthetic and secondary metabolite gene clusters to M. pilosus. The comparison of secondary metabolites produced also revealed divergence in the three species. 
CONCLUSIONS: Our findings are important for improving the utilization of Monascus species in the food industry and industrial field. However, in view of food safety, we need to determine if the toxins produced by some Monascus strains exist in the genome or in the metabolome.


Subjects
Genes, Plant , Genetic Speciation , Monascus/genetics , Pigments, Biological/genetics , Secondary Metabolism , Monascus/classification , Monascus/metabolism , Multigene Family , Phylogeny , Pigments, Biological/biosynthesis
3.
BMC Bioinformatics ; 20(1): 380, 2019 Jul 09.
Article in English | MEDLINE | ID: mdl-31288752

ABSTRACT

BACKGROUND: Alkaloids, a class of organic compounds that contain nitrogen bases, are mainly synthesized as secondary metabolites in plants and fungi, and they have a wide range of bioactivities. Although there are thousands of compounds in this class, few of their biosynthesis pathways are fully identified. In this study, we constructed a model to predict their precursors based on a novel kind of neural network called the molecular graph convolutional neural network. Molecular similarity is a crucial metric in the analysis of qualitative structure-activity relationships. However, it is sometimes difficult for current fingerprint representations to emphasize specific features for the target problems efficiently. It is advantageous to allow the model to select the appropriate features according to data-driven decisions for extracting more useful information, which influences a classification or regression problem substantially. RESULTS: In this study, we applied a neural network architecture for undirected graph representation of molecules. By encoding a molecule as an abstract graph and applying "convolution" on the graph and training the weight of the neural network framework, the neural network can optimize feature selection for the training problem. By incorporating the effects from adjacent atoms recursively, graph convolutional neural networks can extract the features of latent atoms that represent chemical features of a molecule efficiently. In order to investigate alkaloid biosynthesis, we trained the network to distinguish the precursors of 566 alkaloids, which are almost all of the alkaloids whose biosynthesis pathways are known, and showed that the model could predict starting substances with an averaged accuracy of 97.5%. 
CONCLUSION: We have shown that our model can predict more accurately than the random forest and general neural network when the variables and fingerprints are not selected, while the performance is comparable when we carefully select 507 variables from 18000 dimensions of descriptors. The prediction of pathways contributes to understanding of alkaloid synthesis mechanisms and the application of graph based neural network models to similar problems in bioinformatics would therefore be beneficial. We applied our model to evaluate the precursors of biosynthesis of 12000 alkaloids found in various organisms and found a power-law-like distribution.


Subjects
Alkaloids/classification , Biosynthetic Pathways , Neural Networks, Computer , Algorithms , Alkaloids/chemistry , Metabolome , Models, Theoretical
4.
BMC Bioinformatics ; 19(1): 264, 2018 07 13.
Article in English | MEDLINE | ID: mdl-30005591

ABSTRACT

BACKGROUND: There are different and complicated associations between genes and diseases. Finding the causal associations between genes and specific diseases is still challenging. In this work we present a method to predict novel associations of genes and pathways with inflammatory bowel disease (IBD) by integrating information of differential gene expression, protein-protein interaction and known disease genes related to IBD. RESULTS: We downloaded IBD gene expression data from NCBI's Gene Expression Omnibus, performed statistical analysis to determine differentially expressed genes, and collected known IBD genes from the DisGeNet database; these were used to construct an IBD-related PPI network with the HIPPIE database. We adapted our graph-based clustering algorithm DPClusO to cluster the disease PPI network. We evaluated the statistical significance of the identified clusters in the context of determining the richness of IBD genes using Fisher's exact test and predicted novel genes related to IBD. We showed that 93.8% of our predictions are correct in the context of other databases and published literature related to IBD. CONCLUSIONS: Finding disease-causing genes is necessary for developing drugs with synergistic effect targeting many genes simultaneously. Here we present an approach to identify novel disease genes and pathways and discuss our approach in the context of IBD. The approach can be generalized to find disease-associated genes for other diseases.
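The cluster-significance step mentioned above can be sketched as a one-sided Fisher's exact test, which is equivalent to a hypergeometric upper tail. The abstract does not give the exact contingency table or sidedness the authors used, so the function name `enrichment_p_value`, its parameters, and the example counts below are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(cluster_size: int, known_in_cluster: int,
                       network_size: int, known_in_network: int) -> float:
    """One-sided Fisher's exact test for enrichment of known disease genes.

    Returns the probability of seeing at least `known_in_cluster` known genes
    in a cluster of `cluster_size` genes drawn at random, without replacement,
    from a network of `network_size` genes containing `known_in_network`
    known genes.
    """
    total = comb(network_size, cluster_size)
    upper = min(cluster_size, known_in_network)
    tail = sum(
        comb(known_in_network, k)
        * comb(network_size - known_in_network, cluster_size - k)
        for k in range(known_in_cluster, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: a 12-gene cluster holding 6 of the 40 known
    # disease genes in a 1000-gene network.
    print(enrichment_p_value(12, 6, 1000, 40))
```

A small p-value suggests the cluster contains more known disease genes than chance would explain, which is the sense in which the remaining genes of such a cluster become candidate predictions.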


Subjects
Gene Regulatory Networks , Inflammatory Bowel Diseases/genetics , Algorithms , Area Under Curve , Databases, Genetic , Gene Ontology , Humans , Protein Interaction Maps/genetics , ROC Curve , Reproducibility of Results
5.
BMC Bioinformatics ; 17(1): 520, 2016 Dec 07.
Article in English | MEDLINE | ID: mdl-27927171

ABSTRACT

BACKGROUND: The binary similarity and dissimilarity measures have critical roles in the processing of data consisting of binary vectors in various fields including bioinformatics and chemometrics. These metrics express the similarity and dissimilarity values between two binary vectors in terms of the positive matches, absence mismatches or negative matches. To our knowledge, there is no published work presenting a systematic way of finding an appropriate equation to measure binary similarity that performs well for a certain data type or application. A proper method to select a suitable binary similarity or dissimilarity measure is needed to obtain better classification results. RESULTS: In this study, we proposed a novel approach to select binary similarity and dissimilarity measures. We collected 79 binary similarity and dissimilarity equations by extensive literature search and implemented those equations as an R package called bmeasures. We applied these metrics to quantify the similarity and dissimilarity between herbal medicine formulas belonging to the Indonesian Jamu and Japanese Kampo separately. We assessed the capability of binary equations to classify herbal medicine pairs into match and mismatch efficacies based on their similarity or dissimilarity coefficients using the Receiver Operating Characteristic (ROC) curve analysis. According to the area under the ROC curve results, we found that the Indonesian Jamu and Japanese Kampo datasets yielded different rankings of binary similarity and dissimilarity measures. Out of all the equations, the Forbes-2 similarity and the Variant of Correlation similarity measures are recommended for studying the relationship between Jamu formulas and Kampo formulas, respectively. CONCLUSIONS: The selection of binary similarity and dissimilarity measures for multivariate analysis is data dependent. The proposed method can be used to find the most suitable binary similarity and dissimilarity equation for a particular dataset. 
Our finding suggests that all four types of matching quantities in the Operational Taxonomic Unit (OTU) table are important to calculate the similarity and dissimilarity coefficients between herbal medicine formulas. Also, the binary similarity and dissimilarity measures that include the negative match quantity d achieve better capability to separate herbal medicine pairs compared to equations that exclude d.
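The four matching quantities referred to above are conventionally written a (both present), b (present only in the first vector), c (present only in the second) and d (both absent). The sketch below counts them and contrasts one measure that ignores d (Jaccard) with one that includes it (simple matching). These two well-known coefficients are chosen for illustration only; this is not the bmeasures package, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def otu_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # positive matches
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # mismatch: only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # mismatch: only in y
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # negative matches
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard similarity a / (a + b + c); excludes the negative matches d."""
    a, b, c, _ = otu_counts(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / (a + b + c + d); includes d."""
    a, b, c, d = otu_counts(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical plant-presence vectors for two herbal formulas.
    formula_1 = [1, 1, 0, 0, 1, 0]
    formula_2 = [1, 0, 1, 0, 1, 0]
    print(otu_counts(formula_1, formula_2))
    print(jaccard(formula_1, formula_2), simple_matching(formula_1, formula_2))
```

For sparse vectors such as plant lists, d is usually far larger than a, b and c, so measures that include d can rank pairs very differently from those that exclude it.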


Subjects
Plants, Medicinal/classification , Cluster Analysis , Herbal Medicine/methods , Indonesia , Japan , ROC Curve
6.
J Biomed Inform ; 61: 194-202, 2016 06.
Article in English | MEDLINE | ID: mdl-27064123

ABSTRACT

Conventionally, workflows examining transcription regulation networks from gene expression data involve distinct analytical steps. There is a need for pipelines that unify data mining and inference deduction into a singular framework to enhance interpretation and hypotheses generation. We propose a workflow that merges network construction with gene expression data mining focusing on regulation processes in the context of transcription factor driven gene regulation. The pipeline implements pathway-based modularization of expression profiles into functional units to improve biological interpretation. The integrated workflow was implemented as a web application software (TransReguloNet) with functions that enable pathway visualization and comparison of transcription factor activity between sample conditions defined in the experimental design. The pipeline merges differential expression, network construction, pathway-based abstraction, clustering and visualization. The framework was applied in analysis of actual expression datasets related to lung, breast and prostate cancer.


Subjects
Data Mining , Gene Expression Regulation , Software , Transcriptome , Cluster Analysis , Data Display , Humans
7.
Plant Cell Physiol ; 56(5): 843-51, 2015 May.
Article in English | MEDLINE | ID: mdl-25637373

ABSTRACT

Curcuminoids, namely curcumin and its analogs, are secondary metabolites that act as the primary active constituents of turmeric (Curcuma longa). The contents of these curcuminoids vary among species in the genus Curcuma. For this reason, we compared two wild strains and two cultivars to understand the differences in the synthesis of curcuminoids. Because the fluxes of metabolic reactions depend on the amounts of their substrate and the activity of the catalysts, we analyzed the metabolite concentrations and gene expression of related enzymes. We developed a method based on RNA sequencing (RNA-Seq) analysis that focuses on a specific set of genes to detect expression differences between species in detail. We developed a 'selection-first' method for RNA-Seq analysis in which short reads are mapped to selected enzymes in the target biosynthetic pathways in order to reduce the effect of mapping errors. Using this method, we found that the difference in the contents of curcuminoids among the species, as measured by gas chromatography-mass spectrometry, could be explained by the changes in the expression of genes encoding diketide-CoA synthase, and curcumin synthase at the branching point of the curcuminoid biosynthesis pathway.


Subjects
Biosynthetic Pathways/genetics , Curcuma/genetics , Curcuma/metabolism , Curcumin/metabolism , Metabolomics/methods , Sequence Analysis, RNA/methods , Cluster Analysis , Gene Expression Profiling , Gene Expression Regulation, Plant , Metabolic Networks and Pathways/genetics , Species Specificity , Transcriptome/genetics
8.
Plant Cell Physiol ; 55(1): e7, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24285751

ABSTRACT

Databases (DBs) are required by various omics fields because the volume of molecular biology data is increasing rapidly. In this study, we provide instructions for users and describe the current status of our metabolite activity DB. To facilitate a comprehensive understanding of the interactions between the metabolites of organisms and the chemical-level contribution of metabolites to human health, we constructed a metabolite activity DB known as the KNApSAcK Metabolite Activity DB. It comprises 9,584 triplet relationships (metabolite-biological activity-target species), including 2,356 metabolites, 140 activity categories, 2,963 specific descriptions of biological activities and 778 target species. Approximately 46% of the activities described in the DB are related to chemical ecology, most of which are attributed to antimicrobial agents and plant growth regulators. The majority of the metabolites with antimicrobial activities are flavonoids and phenylpropanoids. The metabolites with plant growth regulatory effects include plant hormones. Over half of the DB contents are related to human health care and medicine. The five largest groups are toxins, anticancer agents, nervous system agents, cardiovascular agents and non-therapeutic agents, such as flavors and fragrances. The KNApSAcK Metabolite Activity DB is integrated within the KNApSAcK Family DBs to facilitate further systematized research in various omics fields, especially metabolomics, nutrigenomics and foodomics. The KNApSAcK Metabolite Activity DB could also be utilized for developing novel drugs and materials, as well as for identifying viable drug resources and other useful compounds.


Subjects
Biological Phenomena , Databases as Topic , Metabolome , Cluster Analysis , Humans , Statistics as Topic
9.
Plant Cell Physiol ; 54(5): 711-27, 2013 May.
Article in English | MEDLINE | ID: mdl-23509110

ABSTRACT

Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.
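The dipeptide-frequency matrix described in this abstract can be illustrated by computing one row of it: the relative frequency of each of the 400 possible dipeptides in a protein sequence segment. The abstract does not state whether counts were normalized or how non-standard residues were handled, so the normalization, the skipping of unknown residues, and the name `dipeptide_frequencies` are assumptions. The self-organizing map itself is not sketched here.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 entries


def dipeptide_frequencies(segment: str) -> List[float]:
    """Relative frequency of every overlapping dipeptide in `segment`.

    Pairs containing a non-standard residue (for example 'X') are skipped.
    The result always has 400 values, ordered as in DIPEPTIDES.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    sequence = segment.upper()
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


if __name__ == "__main__":
    vector = dipeptide_frequencies("MKTAYIAKQR")  # hypothetical segment
    nonzero = {dp: f for dp, f in zip(DIPEPTIDES, vector) if f > 0}
    print(len(vector), nonzero)
```

Stacking one such vector per segment gives a segments-by-400 matrix of the kind that a self-organizing map could then be trained on.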


Subjects
Databases as Topic , Plant Proteins/chemistry , Plants/metabolism , Secondary Metabolism , Alkaloids/metabolism , Alkyl and Aryl Transferases/metabolism , Amino Acid Sequence , Cytochrome P-450 Enzyme System/metabolism , Flavonoids/metabolism , Metabolomics , Peptides/chemistry , Plants/enzymology
10.
Plant Cell Physiol ; 54(5): 728-39, 2013 May.
Article in English | MEDLINE | ID: mdl-23574698

ABSTRACT

Metabolomics analysis tools can provide quantitative information on the concentration of metabolites in an organism. In this paper, we propose the minimum pathway model generator tool for simulating the dynamics of metabolite concentrations (SS-mPMG) and a tool for parameter estimation by genetic algorithm (SS-GA). SS-mPMG can extract a subsystem of the metabolic network from the genome-scale pathway maps to reduce the complexity of the simulation model and automatically construct a dynamic simulator to evaluate the experimentally observed behavior of metabolites. Using this tool, we show that stochastic simulation can reproduce experimentally observed dynamics of amino acid biosynthesis in Arabidopsis thaliana. In this simulation, SS-mPMG extracts the metabolic network subsystem from published databases. The parameters needed for the simulation are determined using a genetic algorithm to fit the simulation results to the experimental data. We expect that SS-mPMG and SS-GA will help researchers to create relevant metabolic networks and carry out simulations of metabolic reactions derived from metabolomics data.


Subjects
Algorithms , Arabidopsis/metabolism , Computer Simulation , Metabolic Networks and Pathways , Metabolomics , Kinetics , Models, Biological , Principal Component Analysis , Stochastic Processes
11.
Nucleic Acids Res ; 39(13): e90, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21576222

ABSTRACT

We identified the sequence-specific starting positions of consecutive miscalls in the mapping of reads obtained from the Illumina Genome Analyser (GA). Detailed analysis of the miscall pattern indicated that the underlying mechanism involves sequence-specific interference of the base elongation process during sequencing. The two major sequence patterns that trigger this sequence-specific error (SSE) are: (i) inverted repeats and (ii) GGC sequences. We speculate that these sequences favor dephasing by inhibiting single-base elongation, by: (i) folding single-stranded DNA and (ii) altering enzyme preference. This phenomenon is a major cause of sequence coverage variability and of the unfavorable bias observed for population-targeted methods such as RNA-seq and ChIP-seq. Moreover, SSE is a potential cause of false single-nucleotide polymorphism (SNP) calls and also significantly hinders de novo assembly. This article highlights the importance of recognizing SSE and its underlying mechanisms in the hope of enhancing the potential usefulness of the Illumina sequencers.


Subjects
Sequence Analysis, DNA , Sequence Analysis, RNA , Bacillus subtilis/genetics , Base Pair Mismatch , Chromosome Mapping , Genome, Bacterial , Inverted Repeat Sequences
12.
Comput Methods Programs Biomed ; 236: 107543, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37100024

ABSTRACT

BACKGROUND AND OBJECTIVE: Defining and separating cancer subtypes is essential for facilitating personalized therapy modality and prognosis of patients. The definition of subtypes has been constantly recalibrated as a result of our deepened understanding. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data such as transcriptomics that have strong correlations to the underlying biological mechanism. However, while existing studies have shown promising results, they suffer from issues associated with omics data, namely sample scarcity and high dimensionality, and they impose unrealistic assumptions to extract useful features from the data while avoiding overfitting to spurious correlations. METHODS: This paper proposes to leverage a recent strong generative model, Vector-Quantized Variational AutoEncoder, to tackle the data issues and extract discrete representations that are crucial to the quality of subsequent clustering by retaining only information relevant to reconstructing the input. RESULTS: Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate the proposed clustering results can significantly and robustly improve prognosis over prevalent subtyping systems. CONCLUSION: Our proposal does not impose strict assumptions on data distribution; moreover, its latent features are better representations of the transcriptomic data in different cancer subtypes, capable of yielding superior clustering performance with any mainstream clustering method.


Subjects
Neoplasms , Humans , Gene Expression Profiling , Transcriptome , Cluster Analysis
13.
Life (Basel) ; 13(2), 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36836796

ABSTRACT

The use of herbal medicines in recent decades has increased because their side effects are considered lower than conventional medicine. Unani herbal medicines are often used in Southern Asia. These herbal medicines are usually composed of several types of medicinal plants to treat various diseases. Research on herbal medicine usually focuses on insight into the composition of plants used as ingredients. However, in the present study, we extended to the level of metabolites that exist in the medicinal plants. This study aimed to develop a predictive model of the Unani therapeutic usage based on its constituent metabolites using deep learning and data-intensive science approaches. Furthermore, the best prediction model was then utilized to extract important metabolites for each therapeutic usage of Unani. In this study, it was observed that the deep neural network approach provided a much better prediction model than other algorithms including random forest and support vector machine. Moreover, according to the best prediction model using the deep neural network, we identified 118 important metabolites for nine therapeutic usages of Unani.

14.
Article in English | MEDLINE | ID: mdl-37022825

ABSTRACT

Stage-based sleep screening is a widely-used tool in both healthcare and neuroscientific research, as it allows for the accurate assessment of sleep patterns and stages. In this paper, we propose a novel framework that is based on authoritative guidance in sleep medicine and is designed to automatically capture the time-frequency characteristics of sleep electroencephalogram (EEG) signals in order to make staging decisions. Our framework consists of two main phases: a feature extraction process that partitions the input EEG spectrograms into a sequence of time-frequency patches, and a staging phase that searches for correlations between the extracted features and the defining characteristics of sleep stages. To model the staging phase, we utilize a Transformer model with an attention-based module, which allows for the extraction of global contextual relevance among time-frequency patches and the use of this relevance for staging decisions. The proposed method is validated on the large-scale Sleep Heart Health Study dataset and achieves new state-of-the-art results for the wake, N2, and N3 stages, with respective F1 scores of 0.93, 0.88, and 0.87 using only EEG signals. Our method also demonstrates high inter-rater reliability, with a kappa score of 0.80. Moreover, we provide visualizations of the correspondence between sleep staging decisions and features extracted by our method, which enhances the interpretability of the proposal. Overall, our work represents a significant contribution to the field of automated sleep staging and has important implications for both healthcare and neuroscience research.


Subjects
Sleep Stages , Sleep , Humans , Reproducibility of Results , Polysomnography/methods , Electroencephalography/methods
15.
Plant Cell Physiol ; 53(2): e1, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22123792

ABSTRACT

A database (DB) describing the relationships between species and their metabolites would be useful for metabolomics research, because it targets systematic analysis of enormous numbers of organic compounds with known or unknown structures in metabolomics. We constructed an extensive species-metabolite DB for plants, the KNApSAcK Core DB, which contains 101,500 species-metabolite relationships encompassing 20,741 species and 50,048 metabolites. We also developed a search engine within the KNApSAcK Core DB for use in metabolomics research, making it possible to search for metabolites based on an accurate mass, molecular formula, metabolite name or mass spectra in several ionization modes. We also have developed databases for retrieving metabolites related to plants used for a range of purposes. In our multifaceted plant usage DB, medicinal/edible plants are related to the geographic zones (GZs) where the plants are used, their biological activities, and formulae of Japanese and Indonesian traditional medicines (Kampo and Jamu, respectively). These data are connected to the species-metabolites relationship DB within the KNApSAcK Core DB, keyed via the species names. All databases can be accessed via the website http://kanaya.naist.jp/KNApSAcK_Family/. KNApSAcK WorldMap DB comprises 41,548 GZ-plant pair entries, including 222 GZs and 15,240 medicinal/edible plants. The KAMPO DB consists of 336 formulae encompassing 278 medicinal plants; the JAMU DB consists of 5,310 formulae encompassing 550 medicinal plants. The Biological Activity DB consists of 2,418 biological activities and 33,706 pairwise relationships between medicinal plants and their biological activities. Current statistics of the binary relationships between individual databases were characterized by the degree distribution analysis, leading to a prediction of at least 1,060,000 metabolites within all plants. 
In the future, the study of metabolomics will need to take this huge number of metabolites into consideration.


Subjects
Computational Biology , Databases, Factual , Metabolomics/methods , Plants, Medicinal/metabolism , Geography , Indonesia , Internet , Japan , Medicine, East Asian Traditional , Search Engine
16.
Mol Inform ; 41(7): e2100247, 2022 07.
Article in English | MEDLINE | ID: mdl-35014190

ABSTRACT

Plants produce numerous types of secondary metabolites which have pharmacological importance in drug development for different diseases. Computational methods widely use the fingerprints of the metabolites to understand different properties and similarities among metabolites and for the prediction of chemical reactions etc. In this work, we developed three different deep neural network models (DNN) to predict the antibacterial property of plant metabolites. We developed the first DNN model using the fingerprint set of metabolites as features. In the second DNN model, we searched the similarities among fingerprints using correlation and used one representative feature from each group of highly correlated fingerprints. In the third model, the fingerprints of metabolites were used to find structurally similar chemical compound clusters. From each cluster a representative metabolite is selected and made part of the training dataset. The second model reduced the number of features whereas the third model achieved better classification results for test data. In both cases, we applied the simple graph clustering method to cluster the corresponding network. The correlation-based DNN model reduced some features while retaining an almost similar performance compared to the first DNN model. The third model improves classification results for test data by capturing wider variance within training data using the graph clustering method. This third model is a somewhat novel approach and can be applied to build DNN models for other purposes.


Subjects
Anti-Bacterial Agents , Neural Networks, Computer , Anti-Bacterial Agents/pharmacology , Cluster Analysis
17.
Plant Methods ; 18(1): 118, 2022 Nov 05.
Article in English | MEDLINE | ID: mdl-36335358

ABSTRACT

BACKGROUND: Phytochemicals or secondary metabolites are low molecular weight organic compounds with little function in plant growth and development. Nevertheless, the metabolite diversity governs not only the phenetics of an organism but may also inform the evolutionary pattern and adaptation of green plants to the changing environment. Plant chemoinformatics analyzes the chemical system of natural products using computational tools and robust mathematical algorithms. It has been a powerful approach for species-level differentiation and is widely employed for species classifications and reinforcement of previous classifications. RESULTS: This study attempts to classify Angiosperms using plant sulfur-containing compound (SCC) or sulphated compound information. The SCC dataset of 692 plant species was collected from the comprehensive species-metabolite relationship family (KNApSAcK) database. The structural similarity score of metabolite pairs under all possible combinations (plant species-metabolite) was determined and metabolite pairs with a Tanimoto coefficient value > 0.85 were selected for clustering using a machine learning algorithm. Metabolite clustering showed association between the similar structural metabolite clusters and metabolite content among the plant species. Phylogenetic tree construction of Angiosperms displayed three major clades, of which clade 1 and clade 2 represented the eudicots only, and clade 3 a mixture of both eudicots and monocots. The SCC-based construction of Angiosperm phylogeny is a subset of the existing monocot-dicot classification. The majority of eudicots present in clade 1 and 2 were represented by glucosinolate compounds. These clades with SCC may have been a mixture of ancestral species whilst the combinatorial presence of monocot-dicot in clade 3 suggests sulphated-chemical structure diversification in the event of adaptation during evolutionary change. 
CONCLUSIONS: Sulphated chemoinformatics informs classification of Angiosperms via a machine learning technique.
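The pair-selection step in this abstract (keeping metabolite pairs whose Tanimoto coefficient exceeds 0.85) can be sketched as follows. Fingerprints are represented here as sets of "on" bit positions; the abstract does not say which fingerprint type was used, so the representation, the function names, and the toy data are assumptions. Only the 0.85 threshold comes from the abstract.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_pairs(fingerprints: Dict[str, Set[int]],
                  threshold: float = 0.85) -> List[Tuple[str, str]]:
    """Return all metabolite pairs whose Tanimoto coefficient exceeds `threshold`."""
    names = sorted(fingerprints)
    return [
        (m1, m2)
        for m1, m2 in combinations(names, 2)
        if tanimoto(fingerprints[m1], fingerprints[m2]) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical fingerprints; the names and bit positions carry no chemical meaning.
    toy = {
        "metabolite_1": {1, 2, 3, 4, 5, 6, 7},
        "metabolite_2": {1, 2, 3, 4, 5, 6, 7, 8},
        "metabolite_3": {1, 9},
    }
    print(similar_pairs(toy))
```

The selected pairs can be read as the edges of a similarity network, which is the kind of input a graph clustering step would then take.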

18.
Antibiotics (Basel) ; 11(9), 2022 Sep 05.
Article in English | MEDLINE | ID: mdl-36139978

ABSTRACT

Jamu is the traditional Indonesian herbal medicine system that is considered to have many benefits such as serving as a cure for diseases or maintaining sound health. A Jamu medicine is generally made from a mixture of several herbs. Natural antibiotics can provide a way to handle the problem of antibiotic resistance. This research aims to discover the potential of herbal plants as natural antibiotic candidates based on a machine learning approach. Our input data consists of a list of herbal formulas with plants as their constituents. The target class corresponds to bacterial diseases that can be cured by herbal formulas. The best model has been observed by implementing the Random Forest (RF) algorithm. For 10-fold cross-validations, the maximum accuracy, recall, and precision are 91.10%, 91.10%, and 90.54% with standard deviations 1.05, 1.05, and 1.48, respectively, which imply that the model obtained is good and robust. This study has shown that 14 plants can be potentially used as natural antibiotic candidates. Furthermore, according to scientific journals, 10 of the 14 selected plants have direct or indirect antibacterial activity.

19.
Life (Basel) ; 12(1), 2021 Dec 24.
Article in English | MEDLINE | ID: mdl-35054420

ABSTRACT

Recent advances in information technology have brought forth a paradigm shift in science, especially in the biology and medical fields [...].

20.
Database (Oxford) ; 2021, 2021 03 11.
Article in English | MEDLINE | ID: mdl-33705530

ABSTRACT

A biomarker is a measurable indicator of a disease or abnormal state of a body that plays an important role in disease diagnosis, prognosis and treatment. The biomarker has become a significant topic due to its versatile usage in the medical field and in rapid detection of the presence or severity of some diseases. The volume of biomarker data is rapidly increasing and the identified data are scattered. To provide comprehensive information, the explosively growing data need to be recorded in a single platform. There is no open-source freely available comprehensive online biomarker database. To fulfill this purpose, we have developed a human biomarker database as part of the KNApSAcK family databases which contain a vast quantity of information on the relationships between biomarkers and diseases. We have classified the diseases into 18 disease classes, mostly according to the National Center for Biotechnology Information definitions. Apart from this database development, we also have performed disease classification by separately using protein and metabolite biomarkers based on the network clustering algorithm DPClusO and hierarchical clustering. Finally, we reached a conclusion about the relationships among the disease classes. The human biomarker database can be accessed online and the inter-disease relationships may be helpful in understanding the molecular mechanisms of diseases. To our knowledge, this is one of the first approaches to classify diseases based on biomarkers. Database URL:  http://www.knapsackfamily.com/Biomarker/top.php.


Subjects
Algorithms , Proteins , Biomarkers , Cluster Analysis , Databases, Factual , Humans