Results 1 - 20 of 45

1.
Brief Bioinform ; 24(2), 2023 Mar 19.
Article in English | MEDLINE | ID: mdl-36750041

ABSTRACT

Drug-drug interactions (DDIs) are the combined effects that occur when patients take two or more drugs at the same time; they may weaken the efficacy of drugs or cause unexpected side effects. Thus, accurately predicting DDIs is of great significance for drug development and drug safety surveillance. Although many methods have been proposed for the task, existing methods do not fully utilize the biological knowledge related to DDIs and do not effectively capture the complex semantics among drug-related biological entities, leading to suboptimal performance. Moreover, the lack of interpretability of the predicted results also limits the wide application of existing methods for DDI prediction. In this study, we propose a novel framework for predicting DDIs with interpretability. Specifically, we construct a heterogeneous information network (HIN) by explicitly utilizing the biological knowledge related to the procedure of inducing DDIs. To capture the complex semantics in the HIN, a meta-path-based information fusion mechanism is proposed to learn high-quality representations of drugs. In addition, an attention mechanism is designed to combine semantic information obtained from meta-paths of different lengths into the final drug representations used for DDI prediction. Comprehensive experiments are conducted on 2410 approved drugs, and the results of the predictive performance comparison show that our proposed framework outperforms selected representative baselines on the task of DDI prediction. The results of the ablation study and the cold-start scenario indicate that the meta-path-based information fusion mechanism is beneficial for capturing the complex semantics among drug-related biological entities. Moreover, the results of the case study demonstrate that the designed attention mechanism is able to provide partial interpretability for the predicted DDIs. Therefore, the proposed method is a feasible solution to the task of predicting DDIs.


Subjects
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Interactions , Semantics
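The attention step described in this abstract (combining semantic information from meta-paths of different lengths) can be illustrated with a minimal sketch. It assumes each drug already has one embedding per meta-path length and that a learnable query vector scores them by dot product; `attention_fuse`, `ddi_score` and the sigmoid scoring are hypothetical choices, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(path_embeddings, query):
    """Fuse one drug's per-meta-path embeddings into a single vector.

    path_embeddings: list of equal-length vectors, one per meta-path length.
    query: hypothetical learnable attention vector (same length).
    Returns the fused vector and the attention weights.
    """
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in path_embeddings]
    weights = softmax(scores)
    dim = len(path_embeddings[0])
    fused = [sum(w * emb[i] for w, emb in zip(weights, path_embeddings))
             for i in range(dim)]
    return fused, weights


def ddi_score(drug_a, drug_b):
    """Hypothetical interaction probability: sigmoid of the inner product."""
    z = sum(a * b for a, b in zip(drug_a, drug_b))
    return 1.0 / (1.0 + math.exp(-z))


# Two meta-path lengths for one drug; the query favours the first one.
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], query=[2.0, 0.0])
print(weights)  # larger weight on the first meta-path
```

Inspecting the per-drug `weights` is one way such a mechanism could expose which meta-path lengths drove a prediction, in line with the partial interpretability the abstract mentions.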
2.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38569882

ABSTRACT

MOTIVATION: The crisis of antibiotic resistance, which makes the antibiotics used to treat bacterial infections less effective, has emerged as one of the foremost challenges to public health. Identifying the properties of antibiotic resistance genes (ARGs) is an essential way to mitigate this issue. Although numerous methods have been proposed for this task, most of them concentrate solely on predicting the antibiotic class, disregarding other important properties of ARGs. In addition, existing methods for simultaneously predicting multiple properties of ARGs fail to account for the causal relationships among these properties, limiting predictive performance. RESULTS: In this study, we propose a causality-guided framework for annotating properties of ARGs, in which causal inference is utilized for representation learning. More specifically, the hidden biological patterns determining the properties of ARGs are described by a Gaussian Mixture Model, and a causal representation learning procedure is used to derive the hidden features. In addition, a causal graph among the different properties is constructed to capture their causal relationships, and this graph is integrated into the task of annotating properties of ARGs. The experimental results on a real-world dataset demonstrate the effectiveness of the proposed framework on the task of annotating properties of ARGs. AVAILABILITY AND IMPLEMENTATION: The data and source code are available on GitHub at https://github.com/David-WZhao/CausalARG.


Subjects
Anti-Bacterial Agents , Genes, Bacterial , Anti-Bacterial Agents/pharmacology , Drug Resistance, Microbial/genetics , Software
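The abstract states that hidden biological patterns are described by a Gaussian Mixture Model. As a generic illustration only (one-dimensional features, plain expectation-maximization, no causal component), the sketch below shows how such a mixture assigns each observation a posterior probability for each hidden component; the function names and toy data are assumptions, not part of CausalARG.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """E-step: posterior probability that x was generated by each component."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def em_step(data, weights, means, variances, min_var=1e-6):
    """One EM iteration for a 1-D Gaussian mixture; returns updated parameters."""
    resp = [responsibilities(x, weights, means, variances) for x in data]
    new_w, new_m, new_v = [], [], []
    for k in range(len(weights)):
        nk = sum(r[k] for r in resp)  # effective number of points in component k
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
        new_w.append(nk / len(data))
        new_m.append(mean)
        new_v.append(max(var, min_var))  # floor prevents a collapsing component
    return new_w, new_m, new_v


data = [0.0, 0.1, -0.1, 10.0, 10.1, 9.9]
params = ([0.5, 0.5], [1.0, 9.0], [1.0, 1.0])
for _ in range(10):
    params = em_step(data, *params)
print(params[1])  # component means near 0 and 10
```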
3.
Brief Bioinform ; 23(3), 2022 May 13.
Article in English | MEDLINE | ID: mdl-35272349

ABSTRACT

The increasing prevalence of antibiotic resistance has become a global health crisis. For the purpose of safety regulation, it is highly important to identify antibiotic resistance genes (ARGs) in bacteria. Although culture-based methods can identify ARGs more accurately, the identification process is time-consuming and requires specialized knowledge. With the rapid development of whole genome sequencing technology, researchers have attempted to identify ARGs by computing sequence similarity against public databases. However, these computational methods might fail to detect ARGs that have low sequence identity to known ARGs. Moreover, existing methods cannot effectively address multidrug resistance prediction for ARGs, which is a great challenge to clinical treatment. To address these challenges, we propose an end-to-end multi-label learning framework for predicting ARGs. More specifically, ARG prediction is modeled as a multi-label learning problem, and a deep neural network-based end-to-end framework is proposed, in which a specific loss function is introduced to exploit the advantages of multi-label learning for ARG prediction. In addition, a dual-view modeling mechanism is employed to make full use of the semantic associations between two views of ARGs, i.e. sequence-based information and structure-based information. Extensive experiments are conducted on publicly available data, and the experimental results demonstrate the effectiveness of the proposed framework on the task of ARG prediction.


Subjects
Anti-Bacterial Agents , Genes, Bacterial , Anti-Bacterial Agents/pharmacology , Bacteria/genetics , Drug Resistance, Microbial/genetics , Neural Networks, Computer
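Treating ARG prediction as multi-label learning means one gene can carry several resistance labels at once, which is how multidrug resistance is represented. The abstract does not specify its loss function, so the sketch below uses standard per-label binary cross-entropy as a stand-in; the helper names and threshold are hypothetical.

```python
import math


def multilabel_bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over independent labels.

    probs: predicted probability for each label (e.g. each antibiotic class).
    targets: 0/1 ground truth for each label; several may be 1 at once.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def predict_labels(probs, threshold=0.5):
    """Indices of all labels whose probability reaches the threshold."""
    return [i for i, p in enumerate(probs) if p >= threshold]


# A gene predicted to confer resistance to classes 0 and 2 simultaneously.
print(predict_labels([0.9, 0.2, 0.6]))  # [0, 2]
```

Because each label gets its own sigmoid-style probability rather than competing in a softmax, nothing prevents several classes from being predicted for the same gene.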
4.
Methods ; 218: 48-56, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37516260

ABSTRACT

Drug repurposing, which typically relies on predicting drug-disease associations (DDAs), is a feasible approach to drug discovery. Compared with traditional methods, drug repurposing can reduce the cost and time of drug development and improve the success rate of drug discovery. Although many methods for drug repurposing have been proposed and their results are relatively acceptable, there is still room to improve predictive performance, since those methods do not fully consider the sparseness of known drug-disease associations. In this paper, we propose a novel multi-task learning framework based on graph representation learning to identify DDAs for drug repurposing. In our proposed framework, a heterogeneous information network is first constructed by combining multiple biological datasets. Then, a module consisting of multiple graph convolutional network layers is utilized to learn low-dimensional representations of the nodes in the constructed heterogeneous information network. Finally, two types of auxiliary tasks are designed to help train the target task of DDA prediction in the multi-task learning framework. Comprehensive experiments are conducted on real data, and the results demonstrate the effectiveness of the proposed method for drug repurposing.


Subjects
Drug Development , Drug Repositioning , Drug Discovery
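A single graph convolutional layer is commonly written as H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), the Kipf and Welling formulation; the abstract does not say which variant the authors use, so the sketch below assumes that one. The weighted sum of a target-task loss and auxiliary losses is likewise a generic assumption about how the multi-task objectives are combined, and `aux_weight` is a hypothetical parameter.

```python
import math


def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W), using plain lists.

    adj: n x n symmetric 0/1 adjacency matrix of the heterogeneous network.
    features: n x f node feature matrix H.
    weight: f x c weight matrix W.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    hw = [[sum(features[i][k] * weight[k][c] for k in range(len(weight)))
           for c in range(len(weight[0]))] for i in range(n)]
    out = [[sum(norm[i][j] * hw[j][c] for j in range(n)) for c in range(len(hw[0]))]
           for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]


def multitask_loss(main_loss, aux_losses, aux_weight=0.5):
    """Target-task loss plus down-weighted auxiliary-task losses (assumed form)."""
    return main_loss + aux_weight * sum(aux_losses)


# Tiny drug-disease graph: node 0 (drug) linked to node 1 (disease).
h1 = gcn_layer([[0, 1], [1, 0]], [[1.0], [1.0]], [[1.0]])
print(h1)  # [[1.0], [1.0]]
```

Stacking several such layers lets each node's representation draw on multi-hop neighbours, which is one plausible reason auxiliary tasks can help when known drug-disease links are sparse.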
5.
Environ Res ; 227: 115777, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-36966989

ABSTRACT

The present study aims to use lipids in a novel way to improve the efficiency of methane production from the anaerobic digestion of lignite. The results showed that the cumulative biomethane content of lignite anaerobic fermentation increased 3.13 times when 1.8 g of lipid was added. The gene expression of functional metabolic enzymes was also enhanced during anaerobic fermentation. Moreover, enzymes related to fatty acid degradation, such as long-chain acyl-CoA synthetase and acyl-CoA dehydrogenase, increased 1.72 and 10.48 times, respectively, which accelerated the conversion of fatty acids. Furthermore, the addition of lipid enhanced the carbon dioxide trophic and acetic acid trophic metabolic pathways. Hence, the addition of lipids is argued to promote methane production from lignite anaerobic fermentation, which provides new insight into the conversion and utilization of lipid waste.


Subjects
Fatty Acids , Methane , Fermentation , Anaerobiosis , Fatty Acids/metabolism , Catalysis , Bioreactors
6.
J Environ Manage ; 343: 118058, 2023 Oct 01.
Article in English | MEDLINE | ID: mdl-37229851

ABSTRACT

Metagenomic sequencing was applied to evaluate differences in the anaerobic fermentation of coal slimes of different coal ranks by analyzing microbial diversity, functional activity structure, and cooperative relationships during fermentation. The results showed that biomethane production from coal slime decreased as the degree of metamorphism increased. The underlying reason was that low-rank coal slimes harbored a more abundant microbial community than high-rank coal, with higher activity in the expression of genes for key steps such as hydrolysis and acidification, methanation, and the production of hydrogen and acetic acid. Acetic acid decarboxylation and CO2 reduction are the two key pathways of the methanation process. At the same time, the K11261 (formylmethanofuran dehydrogenase subunit) and K01499 (methenyltetrahydromethanopterin cyclohydrolase) genes were further enriched in low-rank slime systems, which increased the proportion of CO2 reduction in the methanation pathway and was beneficial to biomethane production. This research revealed the roles of different coal slime ranks in the biomethane production process and provides an important reference for further exploration of coal slime resource utilization.


Subjects
Coal , Metagenomics , Fermentation , Carbon Dioxide , Methane , Anaerobiosis , Acetates , Bioreactors
7.
Bioinformatics ; 37(13): 1891-1899, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33492356

ABSTRACT

MOTIVATION: Multiple event extraction from biomedical literature is a challenging task for the biomedical community. Biomedical event extraction is usually modeled as two sub-tasks: trigger identification and argument detection. Most existing methods perform these two sub-tasks sequentially and fail to make full use of the interaction between them, leading to suboptimal results for multiple biomedical event extraction. RESULTS: We propose a novel reinforcement learning (RL) framework for the task of multiple biomedical event extraction. More specifically, trigger identification and argument detection are treated as the main task and the subsidiary task, respectively. Assigning the event type of a trigger (in the main task) is viewed as the action taken in RL, and the result of the corresponding argument detection (i.e. the subsidiary task) for the identified trigger is used to compute the reward of the taken action. Moreover, the result of the subsidiary task is modeled as part of the environment information in RL to help the trigger identification procedure. In addition, external biomedical knowledge bases (KBs) are employed for representation learning of biomedical text, which can improve the performance of biomedical event extraction. Results on two widely used biomedical corpora demonstrate that the proposed framework performs better than the selected baselines on the task of multiple event extraction. The ablation test indicates the contributions of RL and external KBs to the performance improvement of the proposed method. In addition, by modeling multiple event extraction under the RL framework, supervised information is exploited more effectively than in the classical supervised learning paradigm. AVAILABILITY AND IMPLEMENTATION: Source code will be available at: https://github.com/David-WZhao/BioEE-RL.
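The abstract says the reward for an event-type action comes from the argument detection result for that trigger, but it does not give a formula. The sketch below assumes a simple choice (argument-level F1, with a fixed penalty for a wrong event type) and a REINFORCE-style policy-gradient loss; all names and the penalty value are hypothetical.

```python
import math


def argument_f1(predicted, gold):
    """F1 between predicted and gold (role, argument) pairs for one trigger."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def trigger_reward(predicted_type, gold_type, predicted_args, gold_args, penalty=-1.0):
    """Hypothetical reward: fixed penalty for a wrong event type, else argument F1."""
    if predicted_type != gold_type:
        return penalty
    return argument_f1(predicted_args, gold_args)


def reinforce_loss(log_prob_action, reward, baseline=0.0):
    """Loss whose gradient w.r.t. the policy is the REINFORCE estimator."""
    return -(reward - baseline) * log_prob_action


r = trigger_reward("Binding", "Binding",
                   [("Theme", "p53"), ("Theme", "MDM2")], [("Theme", "p53")])
print(r)  # roughly 0.67: one of two predicted arguments is correct
loss = reinforce_loss(math.log(0.8), r)
```

Minimizing `loss` raises the probability of event-type choices whose arguments were detected well, which is the coupling between the main task and the subsidiary task that the abstract describes.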

8.
Traffic ; 19(2): 122-137, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29112302

ABSTRACT

Iron is essential for most living organisms. Iron-regulated transporter 1 (IRT1) plays a major role in iron uptake in roots, and its trafficking from the endoplasmic reticulum (ER) to the plasma membrane (PM) is tightly coordinated with changes in the iron environment. However, studies on the IRT1 response are limited. Here, we report that Malus xiaojinensis IRT1 (MxIRT1) associates with detergent-resistant membranes (DRMs, a biochemical counterpart of PM microdomains), and PM microdomains are known platforms for signal transduction in the PM. Depending on the shift of MxIRT1 from microdomains to homogeneous regions of the PM, MxIRT1-mediated iron absorption is activated by the cholesterol recognition/interaction amino acid consensus (CRAC) motif of MxIRT1. MxIRT1 initially associates with DRMs in the ER via its transmembrane domain 1 (TMD1) and thus begins DRM-dependent intracellular trafficking. Subsequently, MxIRT1 is sequestered in COPII vesicles via the ER export signal sequence in MxIRT1. These studies suggest that iron homeostasis is influenced by the CRAC motif and the TMD1 domain because they determine the association of MxIRT1 with DRMs.


Subjects
Cell Membrane/metabolism , Endoplasmic Reticulum/metabolism , Membrane Proteins/metabolism , Plant Proteins/metabolism , Plant Roots/metabolism , Cholesterol/metabolism , Detergents , Malus , Protein Sorting Signals/physiology
9.
Plant J ; 90(1): 147-163, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28103409

ABSTRACT

Iron (Fe) homeostasis in plastids is closely associated with Fe transport proteins that prevent Fe from occurring in its toxic free ionic forms. However, the number of known protein families related to Fe transport in plastids (about five) is limited, as is knowledge of the function of iron in non-green plastids. In the present study, we report the functional characterization of Zea mays Fe deficiency-related 4 (ZmFDR4), which was isolated from a differentially expressed clone of a cDNA library of Fe deficiency-induced maize roots. ZmFDR4 is homologous to the bacterial FliP superfamily, is present in both algae and terrestrial plants, and is capable of restoring the normal growth of the yeast mutant fet3fet4, which has defective Fe uptake systems. ZmFDR4 mRNA is ubiquitous in maize and is inducible by iron deficiency in wheat. Transient expression of a 35S:ZmFDR4-eGFP fusion protein in rice protoplasts indicated that ZmFDR4 may localize to the plastid envelope and thylakoid. In 35S:c-Myc-ZmFDR4 transgenic tobacco, immunohistochemistry and immunoblotting confirmed that ZmFDR4 is targeted to both the chloroplast envelope and the thylakoid. Meanwhile, ultrastructure analysis indicates that ZmFDR4 increases plastid density and the accumulation of starch grains. Moreover, bathophenanthroline disulfonate (BPDS) colorimetry and inductively coupled plasma mass spectrometry (ICP-MS) indicate that ZmFDR4 is related to Fe uptake by plastids and increases seed Fe content. Finally, 35S:c-Myc-ZmFDR4 transgenic tobacco plants show enhanced photosynthetic efficiency. Therefore, the results of the present study demonstrate that ZmFDR4 functions as an iron transporter in monocot plastids and provide insight into the process of Fe uptake by plastids.


Subjects
Iron Deficiencies , Iron/metabolism , Plant Proteins/metabolism , Plastids/metabolism , Zea mays/metabolism , Gene Expression Regulation, Plant , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Proteins/genetics , Plant Roots/genetics , Plant Roots/metabolism , Nicotiana/genetics , Nicotiana/metabolism , Zea mays/genetics
10.
BMC Bioinformatics ; 17(1): 213, 2016 May 13.
Article in English | MEDLINE | ID: mdl-27177941

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) technologies have opened vast possibilities for researchers in various biological and biomedical research areas. Efficient data mining strategies are in high demand for performing large-scale comparative and evolutionary studies on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. METHODS: We report a novel procedure to analyze NGS data using topic modeling. It consists of four major steps: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS dataset of Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity for different numbers of topics and the convergence efficiency of Gibbs sampling were calculated and discussed to achieve the best result from the proposed procedure. RESULTS: The topics output by the LDA algorithm could be treated as features of Salmonella strains to accurately describe the genetic diversity of the fliC gene in various serotypes. Two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in the NGS data analysis procedure provides a new way to elucidate genetic information from NGS data and to identify gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data. CONCLUSION: The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data and to identify gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.


Subjects
Algorithms , Data Mining/methods , High-Throughput Nucleotide Sequencing/methods , Biomarkers/analysis , Cluster Analysis , Models, Theoretical , Polymorphism, Single Nucleotide/genetics , Salmonella/classification , Salmonella/genetics , Serotyping
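The procedure above relies on LDA fitted by Gibbs sampling. A minimal collapsed Gibbs sampler is sketched below. It assumes each strain's NGS data has already been tokenized into integer "words" (for example, SNP alleles or k-mers; this tokenization is an assumption, not the study's documented preprocessing), and the hyperparameters shown are illustrative defaults rather than values from the study.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]                # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    nk = [0] * n_topics                                 # total tokens per topic
    z = []                                              # topic of every token

    # Random initial assignment of a topic to each token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
           for t in topics]
    return theta, phi


# Toy corpus: words 0-1 co-occur, words 2-3 co-occur.
docs = [[0, 0, 1, 1], [0, 1, 0, 1], [2, 3, 3, 2], [3, 2, 2, 3]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4, iters=100)
```

The rows of `theta` are the per-strain topic proportions that, in a workflow like the one described, could be used as features for the clustering and data matrix analyses.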
11.
BMC Public Health ; 16: 279, 2016 Mar 19.
Article in English | MEDLINE | ID: mdl-26993983

ABSTRACT

BACKGROUND: Both adolescent substance use and adolescent depression are major public health problems, and they tend to co-occur. Thousands of articles on adolescent substance use or depression have been published. Extracting large amounts of information from these accumulated collections is labor-intensive and time-consuming. Topic modeling offers a computational tool to find relevant topics by capturing meaningful structure among collections of documents. METHODS: In this study, a total of 17,723 abstracts on adolescent substance use and depression published from 2000 to 2014 were downloaded from PubMed, and Latent Dirichlet allocation (LDA) was applied to perform text mining on the dataset. Word clouds were used to visually display the content of topics and to show the distribution of vocabulary over each topic. RESULTS: The LDA topics recaptured the search keywords used in PubMed and further discovered relevant issues, such as intervention programs; links between adolescent substance use and adolescent depression, such as sexual experience and violence; and risk factors for adolescent substance use, such as family factors and peer networks. Using trend analysis to explore the dynamics of topic proportions, we found that the coefficient of the trend test identified brain research as a hot issue. CONCLUSIONS: Topic modeling can segregate a large collection of articles into distinct themes, and it could be used as a tool to understand the literature, not only by recapturing known facts but also by discovering other relevant topics.


Subjects
Data Mining/methods , Depression/epidemiology , Substance-Related Disorders/epidemiology , Adolescent , Adolescent Behavior , Humans
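The abstract reports a trend test on topic proportions over time without specifying the test. As an assumption, the sketch below averages each document's LDA topic proportion by publication year and uses the ordinary least-squares slope as the trend coefficient, so a positive slope would mark a rising ("hot") topic; the function names are hypothetical.

```python
def ols_slope(years, proportions):
    """Least-squares slope of topic proportion against publication year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(proportions) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, proportions))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def yearly_topic_proportion(doc_topics, doc_years, topic):
    """Mean proportion of one topic over the documents published in each year."""
    by_year = {}
    for theta, year in zip(doc_topics, doc_years):
        by_year.setdefault(year, []).append(theta[topic])
    years = sorted(by_year)
    return years, [sum(by_year[y]) / len(by_year[y]) for y in years]


years, props = yearly_topic_proportion(
    [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]], [2000, 2000, 2001], topic=0)
print(years, props)             # [2000, 2001] and roughly [0.3, 0.9]
print(ols_slope(years, props))  # positive: topic 0 is rising
```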
12.
BMC Bioinformatics ; 16 Suppl 13: S8, 2015.
Article in English | MEDLINE | ID: mdl-26424364

ABSTRACT

BACKGROUND: Topic modelling is an active research field in machine learning. While it is mainly used to build models from unstructured textual data, it also offers an effective means of data mining in which samples represent documents and different biological endpoints or omics data represent words. Latent Dirichlet Allocation (LDA) is the most commonly used topic modelling method across a wide range of technical fields. However, model development can be arduous and tedious, and it requires burdensome, systematic sensitivity studies to find the best set of model parameters. Time-consuming subjective evaluations are often needed to compare models. Currently, research has yielded no easy way to choose the proper number of topics in a model beyond an extensive iterative approach. METHODS AND RESULTS: Based on an analysis of the variation of statistical perplexity during topic modelling, a heuristic approach is proposed in this study to estimate the most appropriate number of topics. Specifically, the rate of perplexity change (RPC) as a function of the number of topics is proposed as a suitable selector. We test the stability and effectiveness of the proposed method on three markedly different types of ground-truth datasets: Salmonella next-generation sequencing, pharmacological side effects, and textual abstracts on computational biology and bioinformatics (TCBB) from PubMed. CONCLUSION: The proposed RPC-based method is shown to choose the best number of topics in three numerical experiments with widely different data types and for databases of very different sizes. The work required was markedly less arduous than if full systematic sensitivity studies had been carried out with the number of topics as a parameter. We understand that additional investigation is needed to substantiate the method's theoretical basis and to establish its generalizability in terms of dataset characteristics.


Subjects
Computational Biology/methods , Data Mining/methods , Heuristics/physiology , Databases, Factual , High-Throughput Nucleotide Sequencing
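The RPC heuristic can be sketched as follows. Perplexity is computed in the usual way, exp(-log-likelihood / number of tokens), and RPC_i = |(P_i - P_{i-1}) / (t_i - t_{i-1})| for successive topic counts t_i. The stopping rule used here (take the first t_i after which RPC increases) is our reading of the method and should be checked against the full paper; the perplexity values in the example are made up for illustration.

```python
import math


def perplexity(total_log_likelihood, n_tokens):
    """Perplexity = exp(-log p(corpus) / number of tokens)."""
    return math.exp(-total_log_likelihood / n_tokens)


def rate_of_perplexity_change(topic_counts, perplexities):
    """RPC_i = |(P_i - P_{i-1}) / (t_i - t_{i-1})| for i = 1 .. n-1."""
    return [abs((perplexities[i] - perplexities[i - 1])
                / (topic_counts[i] - topic_counts[i - 1]))
            for i in range(1, len(topic_counts))]


def select_num_topics(topic_counts, perplexities):
    """First topic count after which RPC stops decreasing (assumed rule)."""
    rpc = rate_of_perplexity_change(topic_counts, perplexities)
    for j in range(len(rpc) - 1):
        if rpc[j] < rpc[j + 1]:
            return topic_counts[j + 1]  # rpc[j] belongs to topic_counts[j + 1]
    return topic_counts[-1]  # RPC never increased: fall back to the largest count


# Illustrative (made-up) perplexities for 10, 20, ..., 50 topics.
counts = [10, 20, 30, 40, 50]
perps = [1000.0, 800.0, 700.0, 680.0, 620.0]
print(rate_of_perplexity_change(counts, perps))  # [20.0, 10.0, 2.0, 6.0]
print(select_num_topics(counts, perps))  # 40
```

The appeal over a full sensitivity study is that only one LDA fit per candidate topic count is needed, and the choice is made from the shape of the perplexity curve rather than by subjective comparison of models.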
13.
Yeast ; 32(7): 499-517, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25871543

ABSTRACT

Iron is essential for plants but highly toxic when present in excess. Consequently, iron uptake by root transporters must be finely tuned to avoid excessive uptake from soil under iron excess. The iron-regulated transporter of Malus xiaojinensis (MxIRT1), which is induced in roots under iron deficiency, is a highly effective iron(II) transporter. Here, we investigated how excessive iron leads to MxIRT1 degradation in yeast expressing this plant iron transporter protein. To determine the relationship between iron abundance and MxIRT1 degradation, relative levels of autophagy-related gene 8 (ATG8) mRNA and of the active ATG8-phosphatidylethanolamine-conjugated (PE) protein were measured in wild-type yeast and in the autophagic mutant strains atg1∆, atg5∆, atg7∆, ypt7∆ and tor1∆ under normal and excessive iron conditions. The data showed that exposure of MxIRT1-eGFP-transformed wild-type and tor1∆ strains to excessive iron led to significantly increased levels of ATG8 transcript and ATG8-PE protein, which resulted in enhanced MxIRT1 degradation. Co-localization of mCherry-ATG8 and MxIRT1-eGFP provided evidence that these proteins interact during autophagy in yeast. In addition, inhibition of autophagic initiation, autophagosome formation and vacuole fusion all decreased MxIRT1 degradation. PMSF inhibition of autophagy prevented degradation, leading to the accumulation of MxIRT1-containing vesicles in the vacuoles. MxIRT1 vesicles were sorted into autophagosomes for iron-induced degradation in yeast, whereas the endogenous iron(II) transporter Fet4 was degraded in an autophagy-independent manner. Moreover, immunoprecipitation showed that multimono-ubiquitins provided MxIRT1 with the ubiquitination signal. Together, three factors (iron excess, autophagy and mono-ubiquitination) affect the functional activity and stability of exogenous MxIRT1 in yeast, thereby preventing iron uptake via this root transporter.


Subjects
Autophagy , Cation Transport Proteins/metabolism , Iron/metabolism , Malus/genetics , Plant Proteins/metabolism , Proteolysis , Saccharomyces/physiology , Autophagy-Related Protein 8 Family , Cation Transport Proteins/genetics , Gene Expression , Gene Expression Profiling , Microtubule-Associated Proteins/analysis , Microtubule-Associated Proteins/genetics , Plant Proteins/genetics , Protein Interaction Mapping , RNA, Messenger/analysis , RNA, Messenger/genetics , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Saccharomyces/genetics , Saccharomyces/metabolism , Saccharomyces cerevisiae Proteins/analysis , Saccharomyces cerevisiae Proteins/genetics , Ubiquitination
14.
Transgenic Res ; 24(1): 109-22, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25099285

ABSTRACT

Iron and zinc are essential in plant and human nutrition. Iron deficiency has been one of the causes of human mortality, especially in developing countries with high rice consumption. MxIRT1 is a ferrous transporter that was screened from an iron-efficient genotype of the apple tree Malus xiaojinensis Cheng et Jiang. To produce Fe-biofortified rice with MxIRT1 and address the Fe-deficiency problem, the plant expression vectors pCAMBIA1302-MxIRT1:GFP and pCAMBIA1302-anti MxIRT1:GFP were constructed, leading to the successful production of transgenic rice. The transgenic plant phenotypes showed that the expression of endogenous OsIRT1 was suppressed by anti-MxIRT1 in antisense lines, which acted as an opposing control, whereas sense lines had higher tolerance under Zn- and Fe-deficient conditions. The iron and zinc concentrations in T3 seeds of sense lines increased three times compared with the wild type. To understand cadmium uptake by MxIRT1, its cadmium absorption trait was compared with those of AtIRT1 and OsIRT1 in transgenic rice protoplasts, and MxIRT1 was found to have the lowest Cd uptake capacity. MxIRT1 transgenic tobacco bright yellow-2 (BY-2) cells and rice lines were subjected to different Fe conditions, and results from the non-invasive micro-test technique showed that iron was transported more actively than cadmium as long as iron was readily available in the environment. This suggests that MxIRT1 is a good candidate gene for plant Fe and Zn biofortification.


Subjects
Iron/metabolism , Oryza/genetics , Plants, Genetically Modified , Zinc/metabolism , Gene Expression Regulation, Plant , Humans , Malus/genetics , Oryza/metabolism , Plant Proteins/biosynthesis , Plant Proteins/genetics , Seeds/genetics
15.
BMC Bioinformatics ; 15 Suppl 11: S11, 2014.
Article in English | MEDLINE | ID: mdl-25350106

ABSTRACT

BACKGROUND: The big data moniker is nowhere better deserved than in describing the ever-increasing size and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large, high-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means of clustering, or of overcoming clustering difficulties, in large biological and medical datasets. RESULTS: In this study, three topic model-derived clustering methods (highest probable topic assignment, feature selection and feature extraction) are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the effectiveness of clustering results on the three datasets in comparison with traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. CONCLUSION: Topic modeling could be advantageously applied to large datasets in biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. The resulting clusters represent true groupings and subgroupings in the data more effectively than those from traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.


Subjects
Data Mining/methods , Algorithms , Breast Neoplasms/classification , Breast Neoplasms/mortality , Cluster Analysis , Electrophoresis, Gel, Pulsed-Field , Female , Humans , Lung Neoplasms/classification , Models, Statistical , Salmonella/classification , Salmonella/isolation & purification , Survival Analysis
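Of the three clustering methods, highest probable topic assignment is the simplest to illustrate: each sample is placed in the cluster of its most probable topic. The sketch below assumes a document-topic matrix (for example, the theta output of an LDA fit) is already available; the function names and toy values are hypothetical.

```python
def highest_probable_topic(doc_topic):
    """Cluster label of each sample = index of its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def group_by_cluster(labels):
    """Map each cluster label to the indices of the samples assigned to it."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    return clusters


theta = [[0.7, 0.2, 0.1],   # sample 0
         [0.1, 0.1, 0.8],   # sample 1
         [0.3, 0.6, 0.1],   # sample 2
         [0.6, 0.3, 0.1]]   # sample 3
labels = highest_probable_topic(theta)
print(labels)                    # [0, 2, 1, 0]
print(group_by_cluster(labels))  # {0: [0, 3], 2: [1], 1: [2]}
```

The feature selection and feature extraction variants would instead feed topic-derived features into a conventional clustering algorithm, which this sketch does not cover.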
16.
Article in English | MEDLINE | ID: mdl-38051617

ABSTRACT

Computational drug repositioning can identify potential associations between drugs and diseases. This technology has been shown to be effective in accelerating drug development and reducing experimental costs. Although there has been plenty of research on this task, existing methods make insufficient use of the complex relationships among biological entities, which may hinder the subsequent simulation of drug treatment processes. In this article, we propose a heterogeneous graph embedding method called HMLKGAT to infer novel potential drugs for diseases. More specifically, we first construct a heterogeneous information network by combining drug-disease, drug-protein and disease-protein biological networks. Then, a multi-layer graph attention model is utilized to capture the complex associations in the network and derive representations for drugs and diseases. Finally, to maintain the relationships of nodes in different feature spaces, we propose a multi-kernel learning method to transform and combine the representations. Experimental results demonstrate that HMLKGAT outperforms six state-of-the-art methods in drug-related disease prediction, and case studies of five classical drugs further demonstrate its effectiveness.


Subjects
Deep Learning , Computer Simulation , Drug Development , Drug Repositioning
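Multi-kernel learning, as mentioned in this abstract, combines similarity (kernel) matrices computed in different feature spaces. The sketch below assumes Gaussian kernels over node representations taken from different attention layers and a fixed convex combination; in practice the kernel weights would typically be learned, and HMLKGAT's actual kernels and weighting scheme are not given in the abstract.

```python
import math


def gaussian_kernel(vectors, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(vectors)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
             for j in range(n)] for i in range(n)]


def combine_kernels(kernels, weights):
    """Convex combination sum_m w_m * K_m; weights are normalized to sum to 1."""
    total = sum(weights)
    w = [x / total for x in weights]
    n = len(kernels[0])
    return [[sum(wm * k[i][j] for wm, k in zip(w, kernels)) for j in range(n)]
            for i in range(n)]


# Toy node embeddings from two hypothetical attention layers (3 nodes each).
layer1 = [[0.1, 0.3], [0.2, 0.1], [0.9, 0.8]]
layer2 = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
K = combine_kernels([gaussian_kernel(layer1), gaussian_kernel(layer2)], [0.6, 0.4])
```

A non-negative combination of valid kernels is itself a valid kernel, which is why this form keeps the combined similarity usable by downstream kernel-based predictors.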
17.
Article in English | MEDLINE | ID: mdl-38640044

ABSTRACT

The crisis of antibiotic resistance has become a significant global threat to human health. Understanding the properties of antibiotic resistance genes (ARGs) is the first step to mitigating this issue. Although many methods have been proposed for predicting properties of ARGs, most of them focus only on predicting antibiotic classes while ignoring other properties of ARGs, such as resistance mechanisms and transferability. Acquiring all of these properties, however, can help researchers gain a more comprehensive understanding of the essence of antibiotic resistance, which will facilitate the development of antibiotics. In this paper, the task of predicting properties of ARGs is modeled as a multi-task learning problem, and an effective subtask-aware representation learning-based framework is proposed accordingly. More specifically, property-specific expert networks and shared expert networks are utilized to learn, respectively, subtask-specific features for each subtask and features shared among different subtasks. In addition, a gating-controlled mechanism is employed to dynamically allocate weights to the subtask-specific semantics and shared semantics obtained from the property-specific and shared expert networks, respectively, thus adjusting the distinctive contributions of subtask-specific and shared features to achieve optimal performance for each subtask simultaneously. Extensive experiments are conducted on publicly available data, and the experimental results demonstrate the effectiveness of the proposed framework on the task of ARG property prediction. The data and source code are available on GitHub at https://github.com/David-WZhao/GCM-ARG.
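The gating-controlled mechanism can be illustrated with a mixture-of-experts style sketch: for one property (subtask), a softmax gate weights the outputs of that property's own experts together with the shared experts. The gate logits would come from a learned gating network; here they are passed in directly, and all names are hypothetical rather than taken from GCM-ARG.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_representation(specific_outputs, shared_outputs, gate_logits):
    """Gate-weighted sum of expert outputs for one property (subtask).

    specific_outputs: output vectors of this property's own experts.
    shared_outputs: output vectors of the experts shared by all properties.
    gate_logits: one score per expert (specific first, then shared), assumed
        to come from a task-specific gating network.
    """
    experts = specific_outputs + shared_outputs
    if len(gate_logits) != len(experts):
        raise ValueError("need exactly one gate logit per expert")
    weights = softmax(gate_logits)
    dim = len(experts[0])
    rep = [sum(w * e[i] for w, e in zip(weights, experts)) for i in range(dim)]
    return rep, weights


# One property-specific expert and one shared expert, equally weighted.
rep, weights = gated_representation([[1.0, 0.0]], [[0.0, 1.0]], [0.0, 0.0])
print(rep)  # [0.5, 0.5]
```

Because each property has its own gate, one property (say, antibiotic class) can lean on shared experts while another (say, transferability) relies more on its specific experts, which is the balance the abstract describes.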

18.
BMC Bioinformatics ; 14 Suppl 14: S15, 2013.
Article in English | MEDLINE | ID: mdl-24267777

ABSTRACT

BACKGROUND: Pulsed field gel electrophoresis (PFGE) is currently the method most widely and routinely used by the Centers for Disease Control and Prevention (CDC) and state health laboratories in the United States for Salmonella surveillance and outbreak tracking. Major drawbacks of commercially available PFGE analysis programs have been their difficulty in handling large datasets and the limited availability of analysis tools. New analytical tools for PFGE data mining are needed to make full use of the valuable data in large surveillance databases. RESULTS: In this study, a software package was developed that implements five bioinformatics approaches for the analysis and visualization of PFGE fingerprints: PFGE band standardization, Salmonella serotype prediction, hierarchical cluster analysis, distance matrix analysis and two-way hierarchical cluster analysis. PFGE band standardization makes cross-group analysis of large datasets possible. The Salmonella serotype prediction approach allows users to predict the serotypes of Salmonella isolates from their PFGE patterns. The hierarchical cluster analysis approach can be used to clarify subtypes and phylogenetic relationships among groups of PFGE patterns. The distance matrix and two-way hierarchical cluster analysis tools allow users to directly visualize the similarities and dissimilarities between any two individual patterns and the inter- and intra-serotype relationships of two or more serotypes, and they summarize the overall relationships between user-selected serotypes as well as the band markers that distinguish these serotypes. The functionality of these tools was illustrated on PFGE fingerprinting data from the CDC's PulseNet. CONCLUSIONS: The bioinformatics approaches in the software package developed in this study were integrated with the PFGE database to enhance the data mining of PFGE fingerprints. Fast and accurate prediction makes it possible to elucidate Salmonella serotype information before conventional serological methods are pursued. Bioinformatics tools that distinguish PFGE markers and serotype-specific patterns will enhance PFGE data retrieval, interpretation and serotype identification, and will likely accelerate source tracking to identify the Salmonella isolates implicated in foodborne diseases.
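The abstract does not state which similarity measure or linkage the package uses. As an illustration only, the sketch below assumes the Dice coefficient with a band-position tolerance (a hypothetical 1.5% default) and UPGMA average-linkage clustering, choices commonly used for PFGE patterns. The band sizes and isolate names are made up, and all function names are hypothetical.

```python
def dice_similarity(bands_a, bands_b, tolerance=0.015):
    """Dice coefficient 2m / (|A| + |B|) between two PFGE band lists.

    Bands are fragment sizes (e.g., in kb). Two bands count as a match when
    their relative difference is within `tolerance` (hypothetical default);
    each band is matched at most once, greedily.
    """
    if not bands_a and not bands_b:
        return 1.0
    unmatched = sorted(bands_b)
    matches = 0
    for a in sorted(bands_a):
        for idx, b in enumerate(unmatched):
            if abs(a - b) <= tolerance * max(a, b):
                matches += 1
                del unmatched[idx]
                break
    return 2.0 * matches / (len(bands_a) + len(bands_b))


def distance_matrix(patterns):
    """Pairwise 1 - Dice distances for a {name: band list} mapping."""
    names = list(patterns)
    dist = [[1.0 - dice_similarity(patterns[p], patterns[q]) for q in names]
            for p in names]
    return names, dist


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of current clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            # Size-weighted average distance from the merged cluster to k.
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


patterns = {
    "isolate_A": [20.0, 50.0, 100.0, 200.0],
    "isolate_B": [20.0, 50.0, 101.0, 200.0],
    "isolate_C": [30.0, 80.0, 150.0],
}
names, dist = distance_matrix(patterns)
tree = upgma(names, dist)
print(tree)
```

The distance matrix is the kind of pairwise view a distance matrix tool visualizes, and the nested tuple is a text form of the dendrogram that hierarchical clustering produces.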


Subjects
Computational Biology/methods, Electrophoresis, Gel, Pulsed-Field/methods, Salmonella/classification, Cluster Analysis, Data Mining, Databases, Genetic, Humans, Salmonella/chemistry, Salmonella/genetics, Serotyping
19.
IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3635-3647, 2023.
Article in English | MEDLINE | ID: mdl-37616131

ABSTRACT

Side effects of drugs have gained increasing attention in the biomedical field, and accurately identifying drug side effects is essential for drug development and drug safety surveillance. Although traditional pharmacological experiments can accurately detect the side effects of drugs, the process is time-consuming and costly, and may leave some side effects unidentified. With the expansion of biomedical databases, many computational methods have been developed for predicting drug-side effect associations (DSAs). However, existing methods have three drawbacks: (1) multiple drug-related databases are not fully used; (2) the complex semantics among drugs and side effects are not effectively captured; and (3) most methods do not explain the DSAs they predict. A more effective method for predicting DSAs is therefore urgently needed. To address these issues, we propose a novel meta-path-based graph neural network model for drug-side effect association prediction (MPGNN-DSA). In MPGNN-DSA, a heterogeneous information network (HIN) is first constructed by combining multiple biological datasets. Then, a meta-path-based feature learning module learns high-quality representations of drugs and side effects by capturing the semantics contained in the meta-paths of the constructed HIN. Using the learned features, a prediction module derives the predicted side effects of each drug. In addition, the semantics contained in meta-paths can be used to explain the predicted DSAs. Comprehensive experiments demonstrate the effectiveness of MPGNN-DSA, suggesting that the proposed method is a feasible solution to the task of DSA prediction.
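The explainability claim rests on meta-path instances: concrete chains of typed nodes that connect a drug to a side effect. The sketch below enumerates such instances in a toy heterogeneous network and counts them per side effect. The node types, the drug-protein-drug-side effect meta-path, and the function names are hypothetical, and raw path counts are a simple heuristic, not MPGNN-DSA's learned graph neural network.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets; nodes are (type, name) tuples."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def metapath_instances(adj, start, metapath):
    """All simple paths from `start` whose node types follow `metapath`."""
    if start[0] != metapath[0]:
        return []
    paths = [[start]]
    for node_type in metapath[1:]:
        paths = [
            path + [nxt]
            for path in paths
            for nxt in sorted(adj[path[-1]])
            if nxt[0] == node_type and nxt not in path
        ]
    return paths


def metapath_counts(adj, start, metapath):
    """Count meta-path instances from `start` to each end node."""
    counts = defaultdict(int)
    for path in metapath_instances(adj, start, metapath):
        counts[path[-1]] += 1
    return dict(counts)


# Hypothetical toy HIN: drug D1 shares target P1 with drug D2, which causes S1.
edges = [
    (("drug", "D1"), ("protein", "P1")),
    (("drug", "D2"), ("protein", "P1")),
    (("drug", "D2"), ("side_effect", "S1")),
]
adj = build_graph(edges)
metapath = ["drug", "protein", "drug", "side_effect"]
paths = metapath_instances(adj, ("drug", "D1"), metapath)
counts = metapath_counts(adj, ("drug", "D1"), metapath)
print(paths)
print(counts)
```

Each returned path is a human-readable reason a drug might be linked to a side effect (here, D1 shares a target with D2, which is associated with S1).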


Subjects
Algorithms, Drug-Related Side Effects and Adverse Reactions, Humans, Neural Networks, Computer, Drug Discovery/methods, Data Management
20.
IEEE J Biomed Health Inform ; 27(6): 3061-3071, 2023 06.
Article in English | MEDLINE | ID: mdl-37030796

ABSTRACT

In the treatment of bacterial infectious diseases, overuse of antibiotics may lead not only to bacterial resistance to antibiotics but also to dysbiosis of beneficial bacteria, which are essential for maintaining normal human life activities. By contrast, phage therapy, in which phages invade and lyse specific pathogenic bacteria without affecting beneficial bacteria, is becoming increasingly popular for treating bacterial infectious diseases. Effective phage therapy requires accurate prediction of potential phage-host interactions from a heterogeneous information network consisting of bacteria and phages. Although many models have been proposed for predicting phage-host interactions, most fail to fully account for the sparsity and unconnectedness of the phage-host heterogeneous information network, which results in unsatisfactory prediction performance. To address this challenge, we propose an effective model called GERMAN-PHI for predicting Phage-Host Interactions via Graph Embedding Representation learning with Multi-head Attention mechaNism. In GERMAN-PHI, the multi-head attention mechanism learns representations of phages and hosts from multiple perspectives of phage-host associations, addressing the sparsity and unconnectedness of the network. More specifically, a GAT module with talking-heads attention learns representations of phages and bacteria, on which neural inductive matrix completion is performed to reconstruct the phage-host association matrix. Results of comprehensive experiments demonstrate that GERMAN-PHI outperforms state-of-the-art methods on phage-host interaction prediction. In addition, case studies on two high-risk human pathogens show that GERMAN-PHI predicts validated phages with high accuracy and also suggests some potential new associated phages.
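Talking-heads attention, as generally described, differs from standard multi-head attention by linearly mixing the attention logits across heads before the softmax, and the attention weights across heads after it. The sketch below shows that step for one node. It assumes the per-head logits (which a GAT typically computes by scoring neighbour pairs with a LeakyReLU-activated attention vector) are already given and that the mixing matrices are fixed rather than learned; the function names are hypothetical, and GERMAN-PHI's neural inductive matrix completion step is not shown.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mix_heads(per_head, proj):
    """Linear mixing across heads: out[g][n] = sum_h proj[g][h] * per_head[h][n]."""
    n_items = len(per_head[0])
    return [
        [sum(proj[g][h] * per_head[h][n] for h in range(len(per_head)))
         for n in range(n_items)]
        for g in range(len(proj))
    ]


def talking_heads_aggregate(logits, neighbor_feats, pre_mix, post_mix):
    """Talking-heads attention for one node over its neighbours.

    logits[h][n]      : raw attention logit of head h for neighbour n (given)
    neighbor_feats[n] : feature vector of neighbour n (shared by all heads here)
    pre_mix, post_mix : head-mixing matrices applied before / after the softmax
    Returns one aggregated feature vector per output head.
    """
    weights = [softmax(row) for row in mix_heads(logits, pre_mix)]
    weights = mix_heads(weights, post_mix)
    dim = len(neighbor_feats[0])
    return [
        [sum(w[n] * neighbor_feats[n][k] for n in range(len(neighbor_feats)))
         for k in range(dim)]
        for w in weights
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
logits = [[0.0, 0.0], [2.0, 0.0]]  # 2 heads x 2 neighbours
feats = [[1.0, 0.0], [0.0, 1.0]]   # 2 neighbours, 2-dim features
standard = talking_heads_aggregate(logits, feats, identity, identity)
mixing = [[0.5, 0.5], [0.5, 0.5]]  # hypothetical fixed pre-softmax mixing
talking = talking_heads_aggregate(logits, feats, mixing, identity)
print(standard)
print(talking)
```

With identity mixing matrices the function reduces to ordinary multi-head attention, which makes it easy to see what the extra cross-head mixing changes.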


Subjects
Bacteriophages, Communicable Diseases, Humans, Bacteria, Anti-Bacterial Agents