Results 1 - 20 of 65
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600664

ABSTRACT

Small open reading frames (smORFs) have been acknowledged to play various roles in essential biological pathways and to affect human health in conditions ranging from diabetes to tumorigenesis. Predicting smORFs in silico is a prerequisite for processing omics data. Here, we propose the smORF-coding-potential-predicting framework, sOCP, which provides functions to construct a model for predicting novel smORFs in a given species. The sOCP model constructed for human was based on in-frame features and the nucleotide bias around the start codon, and this small feature subset proved to be sufficient while avoiding the overfitting problems of more complicated models. It showed better prediction metrics than previous methods and correlated closely with experimental evidence in a heterogeneous dataset. The model was applied to Rattus norvegicus and exhibited satisfactory performance. We then scanned smORFs with ATG and non-ATG start codons from the human genome and generated a database containing about a million novel smORFs with coding potential. Around 72 000 smORFs are located in the lncRNA regions of the genome. The smORF-encoded peptides may be involved in biological pathways rare for canonical proteins, including glucocorticoid catabolic process and the prokaryotic defense system. Our work provides a model and database for human smORF investigation and a convenient tool for further smORF prediction in other species.


Subjects
Genome, Human , Peptides , Animals , Humans , Rats , Open Reading Frames , Peptides/genetics , Proteins/genetics
2.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36750041

ABSTRACT

Drug-drug interactions (DDIs) are compound effects when patients take two or more drugs at the same time, which may weaken the efficacy of drugs or cause unexpected side effects. Thus, accurately predicting DDIs is of great significance for drug development and drug safety surveillance. Although many methods have been proposed for the task, the biological knowledge related to DDIs is not fully utilized and the complex semantics among drug-related biological entities are not effectively captured in existing methods, leading to suboptimal performance. Moreover, the lack of interpretability for the predicted results also limits the wide application of existing methods for DDI prediction. In this study, we propose a novel framework for predicting DDIs with interpretability. Specifically, we construct a heterogeneous information network (HIN) by explicitly utilizing the biological knowledge related to the procedure of inducing DDIs. To capture the complex semantics in the HIN, a meta-path-based information fusion mechanism is proposed to learn high-quality representations of drugs. In addition, an attention mechanism is designed to combine semantic information obtained from meta-paths with different lengths to obtain final representations of drugs for DDI prediction. Comprehensive experiments are conducted on 2410 approved drugs, and the results of the predictive performance comparison show that our proposed framework outperforms selected representative baselines on the task of DDI prediction. The results of the ablation study and the cold-start scenario indicate that the meta-path-based information fusion mechanism is beneficial for capturing the complex semantics among drug-related biological entities. Moreover, the results of the case study demonstrate that the designed attention mechanism is able to provide partial interpretability for the predicted DDIs. Therefore, the proposed method will be a feasible solution to the task of predicting DDIs.


Subjects
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Interactions , Semantics
3.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36702753

ABSTRACT

Microbes can affect the metabolism and immunity of human body incessantly, and the dysbiosis of human microbiome drives not only the occurrence but also the progression of disease (i.e. multiple statuses of disease). Recently, microbiome-based association tests have been widely developed to detect the association between the microbiome and host phenotype. However, the existing methods have not achieved satisfactory performance in testing the association between the microbiome and ordinal/nominal multicategory phenotypes (e.g. disease severity and tumor subtype). In this paper, we propose an optimal microbiome-based association test for multicategory phenotypes, namely, multiMiAT. Specifically, under the multinomial logit model framework, we first introduce a microbiome regression-based kernel association test for multicategory phenotypes (multiMiRKAT). As a data-driven optimal test, multiMiAT then integrates multiMiRKAT, score test and MiRKAT-MC to maintain excellent performance in diverse association patterns. Massive simulation experiments prove the success of our method. Furthermore, multiMiAT is also applied to real microbiome data experiments to detect the association between the gut microbiome and clinical statuses of colorectal cancer as well as for diverse statuses of Clostridium difficile infections.


Subjects
Gastrointestinal Microbiome , Microbiota , Humans , Microbiota/genetics , Computer Simulation , Phenotype , Logistic Models
4.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38569882

ABSTRACT

MOTIVATION: The crisis of antibiotic resistance, which causes antibiotics used to treat bacterial infections to become less effective, has emerged as one of the foremost challenges to public health. Identifying the properties of antibiotic resistance genes (ARGs) is an essential way to mitigate this issue. Although numerous methods have been proposed for this task, most of these approaches concentrate solely on predicting antibiotic class, disregarding other important properties of ARGs. In addition, existing methods for simultaneously predicting multiple properties of ARGs fail to account for the causal relationships among these properties, limiting the predictive performance. RESULTS: In this study, we propose a causality-guided framework for annotating properties of ARGs, in which causal inference is utilized for representation learning. More specifically, the hidden biological patterns determining the properties of ARGs are described by a Gaussian Mixture Model, and procedure of causal representation learning is used to derive the hidden features. In addition, a causal graph among different properties is constructed to capture the causal relationships among properties of ARGs, which is integrated into the task of annotating properties of ARGs. The experimental results on a real-world dataset demonstrate the effectiveness of the proposed framework on the task of annotating properties of ARGs. AVAILABILITY AND IMPLEMENTATION: The data and source codes are available in GitHub at https://github.com/David-WZhao/CausalARG.


Subjects
Anti-Bacterial Agents , Genes, Bacterial , Anti-Bacterial Agents/pharmacology , Drug Resistance, Microbial/genetics , Software
5.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35561307

ABSTRACT

The association between the compositions of microbial communities and various host phenotypes is an important research topic. Microbiome association research addresses multiple domains, such as human disease and diet. Statistical methods for testing microbiome-phenotype associations have been studied recently to determine their ability to assess longitudinal microbiome data. However, existing methods fail to detect sparse association signals in longitudinal microbiome data. In this paper, we developed a novel method, namely aGEEMiHC, which is a data-driven adaptive microbiome higher criticism analysis based on generalized estimating equations to detect sparse microbial association signals from longitudinal microbiome data. aGEEMiHC adopts a generalized estimating equations framework that fully considers the correlation among different observations from the same subject in longitudinal data. To be robust to diverse correlation structures for longitudinal data, aGEEMiHC integrates multiple microbiome higher criticism analyses based on generalized estimating equations with different working correlation structures. Extensive simulation experiments demonstrate that aGEEMiHC can control the type I error correctly and achieve superior performance according to a statistical power comparison. We also applied it to longitudinal microbiome data with various types of host phenotypes to demonstrate the stability of our method. aGEEMiHC is also utilized for real longitudinal microbiome data, and we found a significant association between the gut microbiome and Crohn's disease. In addition, our method ranks the significant factors associated with the host phenotype to provide potential biomarkers.


Subjects
Crohn Disease , Gastrointestinal Microbiome , Microbiota , Biomarkers , Computer Simulation , Crohn Disease/genetics , Gastrointestinal Microbiome/genetics , Humans , Models, Statistical
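
Note on entry 5: the abstract builds on higher criticism, a statistic designed to detect sparse signals among many p-values. The sketch below shows only the classical Donoho-Jin form of that statistic, as an illustration of the idea. It is not the authors' aGEEMiHC procedure, which additionally relies on generalized estimating equations and an adaptive combination step that the abstract does not specify. The function name `higher_criticism` is a hypothetical choice for this example.

```python
import math


def higher_criticism(p_values):
    """Classical higher criticism statistic over a list of p-values.

    Illustrative only: this is the textbook form, not the aGEEMiHC method.
    Returns -inf if every p-value is exactly 0 or 1.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    best = float("-inf")
    for i, p in enumerate(ordered, start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid dividing by zero
        # Standardized gap between the empirical fraction i/n and the
        # i-th smallest p-value expected under the global null.
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Evenly spread p-values (no signal) versus the same set with five tiny ones.
null_like = [i / 101 for i in range(1, 101)]
sparse_signal = [1e-6] * 5 + null_like[5:]
print(higher_criticism(null_like), higher_criticism(sparse_signal))
```

A handful of very small p-values is enough to push the statistic far above its value on the evenly spread set, which is why this family of tests suits sparse association signals.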
6.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35272349

ABSTRACT

The increasing prevalence of antibiotic resistance has become a global health crisis. For the purpose of safety regulation, it is of high importance to identify antibiotic resistance genes (ARGs) in bacteria. Although culture-based methods can identify ARGs relatively more accurately, the identifying process is time-consuming and specialized knowledge is required. With the rapid development of whole genome sequencing technology, researchers attempt to identify ARGs by computing sequence similarity from public databases. However, these computational methods might fail to detect ARGs due to the low sequence identity to known ARGs. Moreover, existing methods cannot effectively address the issue of multidrug resistance prediction for ARGs, which is a great challenge to clinical treatments. To address the challenges, we propose an end-to-end multi-label learning framework for predicting ARGs. More specifically, the task of ARGs prediction is modeled as a problem of multi-label learning, and a deep neural network-based end-to-end framework is proposed, in which a specific loss function is introduced to employ the advantage of multi-label learning for ARGs prediction. In addition, a dual-view modeling mechanism is employed to make full use of the semantic associations among two views of ARGs, i.e. sequence-based information and structure-based information. Extensive experiments are conducted on publicly available data, and experimental results demonstrate the effectiveness of the proposed framework on the task of ARGs prediction.


Subjects
Anti-Bacterial Agents , Genes, Bacterial , Anti-Bacterial Agents/pharmacology , Bacteria/genetics , Drug Resistance, Microbial/genetics , Neural Networks, Computer
7.
Plant Physiol ; 191(3): 1535-1545, 2023 03 17.
Article in English | MEDLINE | ID: mdl-36548962

ABSTRACT

As one of the essential life forms in the biosphere, research on cyanobacteria has been growing remarkably for decades. Biological functions in organisms are often accomplished through protein-protein interactions (PPIs), which help to regulate interacting proteins or organize them into an integral machine. However, the study of PPIs in cyanobacteria falls far behind that in mammals and has not been integrated for ease of use. Thus, we built CyanoMapDB (http://www.cyanomapdb.msbio.pro/), a database providing cyanobacterial PPIs with experimental evidence, consisting of 52,304 PPIs among 6,789 proteins from 23 cyanobacterial species. We collected available data in UniProt, STRING, and IntAct, and mined numerous PPIs from co-fractionation MS data in cyanobacteria. The integrated data are accessible in CyanoMapDB (http://www.cyanomapdb.msbio.pro/), enabling users to easily query proteins of interest, investigate interacting proteins with evidence from different sources, and acquire a visual network of the target protein. We believe that CyanoMapDB will promote research involved with cyanobacteria and plants.


Subjects
Cyanobacteria , Protein Interaction Mapping , Animals , Databases, Protein , Proteins/metabolism , Cyanobacteria/genetics , Cyanobacteria/metabolism , Mammals/metabolism
8.
Methods ; 218: 48-56, 2023 10.
Article in English | MEDLINE | ID: mdl-37516260

ABSTRACT

Drug repurposing, which typically applies the procedure of drug-disease associations (DDAs) prediction, is a feasible solution to drug discovery. Compared with traditional methods, drug repurposing can reduce the cost and time for drug development and advance the success rate of drug discovery. Although many methods for drug repurposing have been proposed and the obtained results are relatively acceptable, there is still some room for improving the predictive performance, since those methods fail to consider fully the issue of sparseness in known drug-disease associations. In this paper, we propose a novel multi-task learning framework based on graph representation learning to identify DDAs for drug repurposing. In our proposed framework, a heterogeneous information network is first constructed by combining multiple biological datasets. Then, a module consisting of multiple layers of graph convolutional networks is utilized to learn low-dimensional representations of nodes in the constructed heterogeneous information network. Finally, two types of auxiliary tasks are designed to help to train the target task of DDAs prediction in the multi-task learning framework. Comprehensive experiments are conducted on real data and the results demonstrate the effectiveness of the proposed method for drug repurposing.


Subjects
Drug Development , Drug Repositioning , Drug Discovery
9.
Small ; 19(45): e2302683, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37466274

ABSTRACT

Orderly heterostructured catalysts, which integrate nanomaterials of complementary structures and dimensions into single-entity structures, hold great promise for sustainability applications. In this work, air, as a green reagent, is shown to trigger an in situ localized phase transformation that converts metal carbonate hydroxide nanowires into an ordered heterostructured catalyst. In the single-crystal nanowire heterostructure, the in situ generated, nanosized Co3O4 is anchored spontaneously in single-crystal Co6(CO3)2(OH)8 nanowires, triggered by the lattice matching between the (220) plane of Co3O4 and the (001) plane of Co6(CO3)2(OH)8. The lattice matching allows intimate contact at the heterointerface with well-defined orientation and strong interfacial coupling, and thus significantly expedites the transfer of photogenerated electrons from tiny Co3O4 to catalytically active Co6(CO3)2(OH)8 in the single-crystal nanowire, which elevates the catalytic efficiency of the metal carbonate catalyst in the CO2 reduction reaction (V_CO = 19.46 mmol g⁻¹ h⁻¹ and V_H2 = 11.53 mmol g⁻¹ h⁻¹). The present findings add to the growing body of knowledge on exploiting Earth-abundant metal-carbonate catalysts, and demonstrate the utility of localized phase transformation in constructing advanced catalysts for energy and environmental sustainability applications.

10.
Methods ; 203: 604-613, 2022 07.
Article in English | MEDLINE | ID: mdl-35605749

ABSTRACT

The microbial community is an important part of organisms or ecosystems for maintaining health and stability. Analyzing the interaction of microorganisms in the ecosystem and mining the co-occurrence modules of the microbial community can deepen the understanding of microbial community function. This could also improve the ability to manipulate the microbial community, thus providing new means for ecological restoration, disease treatment and drug development. Beyond the investigation of pairwise relationships, more and more studies have recognized that higher-order interactions may play important roles in explaining the diversity and complexity of the community. In this study, a hypergraph clustering method based on modularity feature projection (HCMFP) is proposed to detect the microbial community in a higher-order interaction network among microbes. Specifically, HCMFP uses information entropy to mine the higher-order logical relationships among microbes, and constructs a hypergraph learning model based on modularity feature projection to detect the microbial community. The experimental results show that, compared with other methods, HCMFP has better clustering performance and reliable convergence speed. The proposed method is an effective tool for detecting high-order organization in microbial interaction networks. The code and data in this study are freely available at https://github.com/Mayingjun20179/HCMFP.


Subjects
Ecosystem , Microbial Consortia , Cluster Analysis
11.
Methods ; 205: 11-17, 2022 09.
Article in English | MEDLINE | ID: mdl-35636652

ABSTRACT

Microorganisms play important roles in our lives, especially in metabolism and disease. Determining the probability that a person suffers from a specific disease, and the severity of that disease, based on microbial genes is crucial for understanding the relationship between microbes and diseases. Previous studies could extract the topological information of phylogenetic trees and integrate it into metagenomic datasets, enabling classifiers to learn more information from limited datasets and improving the performance of the models. In this paper, we proposed a GNPI model to better learn the structure of phylogenetic trees. GNPI maintained the original vector format of metagenomic datasets, while previous research had to change the input form to matrices. The vector-like form of the input data can be easily adopted in the baseline machine learning models and is available for deep learning models. The datasets processed with GNPI help enhance the accuracy of machine learning and deep learning models in three different datasets. GNPI is an interpretable data processing method for host phenotype prediction and other bioinformatics tasks.


Subjects
Metagenome , Metagenomics , Humans , Machine Learning , Metagenomics/methods , Phenotype , Phylogeny
12.
Brief Bioinform ; 21(1): 1-10, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30239587

ABSTRACT

Sequence clustering is a basic bioinformatics task that is attracting renewed attention with the development of metagenomics and microbiomics. The latest sequencing techniques have decreased costs and as a result, massive amounts of DNA/RNA sequences are being produced. The challenge is to cluster the sequence data using stable, quick and accurate methods. For microbiome sequencing data, 16S ribosomal RNA operational taxonomic units are typically used. However, there is often a gap between algorithm developers and bioinformatics users. Different software tools can produce diverse results and users can find them difficult to analyze. Understanding the different clustering mechanisms is crucial to understanding the results that they produce. In this review, we selected several popular clustering tools, briefly explained the key computing principles, analyzed their characters and compared them using two independent benchmark datasets. Our aim is to assist bioinformatics users in employing suitable clustering tools effectively to analyze big sequencing data. Related data, codes and software tools were accessible at the link http://lab.malab.cn/~lg/clustering/.

13.
Bioinformatics ; 37(13): 1891-1899, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33492356

ABSTRACT

MOTIVATION: Multiple events extraction from biomedical literature is a challenging task for the biomedical community. Usually, biomedical event extraction is modeled as two sub-tasks, trigger identification and argument detection. Most existing methods perform these two sub-tasks sequentially, and fail to make full use of the interaction between them, leading to suboptimal results for multiple biomedical events extraction. RESULTS: We propose a novel framework of reinforcement learning (RL) for the task of multiple biomedical events extraction. More specifically, trigger identification and argument detection are treated as main-task and subsidiary-task, respectively. Assigning the event type of triggers (in the main-task) is viewed as the action taken in RL, and the result of corresponding argument detection (i.e. the subsidiary-task) for the identified trigger is used for computing the reward of the taken action. Moreover, the result of the subsidiary-task is modeled as part of environment information in RL to help the procedure of trigger identification. In addition, external biomedical knowledge bases (KBs) are employed for representation learning of biomedical text, which can improve the performance of biomedical event extraction. Results on two widely used biomedical corpora demonstrate that the proposed framework performs better than the selected baselines on the task of multiple events extraction. The ablation test indicates the contributions of RL and external KBs to the performance improvement in the proposed method. In addition, by modeling multiple events extraction under the RL framework, the supervised information is exploited more effectively than in the classical supervised learning paradigm. AVAILABILITY AND IMPLEMENTATION: Source codes will be available at: https://github.com/David-WZhao/BioEE-RL.

14.
Biomarkers ; 27(2): 188-195, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35001797

ABSTRACT

Background: Vitamin D deficiency has been associated with increased sepsis incidence and mortality in various populations. Vitamin D exerts its effect through the vitamin D receptor (VDR), and various single nucleotide polymorphisms have been reported to affect the expression and structure of the VDR. In the present study, we investigated the possible role of vitamin D deficiency and VDR polymorphisms in susceptibility to sepsis. Methods: 576 sepsis patients and 421 healthy controls were enrolled in the present study. Plasma vitamin D levels in patients and healthy controls were quantified by ELISA. Genetic variants in the VDR (FokI, TaqI, BsmI, and ApaI) were genotyped by TaqMan assay. Results: Reduced serum vitamin D level was observed in subjects with sepsis compared to healthy controls (p ≤ 0.0001). Further, subjects with septic shock had diminished 25(OH) vitamin D compared to severe sepsis cases (p ≤ 0.0001). FokI variants and minor alleles were more prevalent in sepsis patients compared to healthy controls (Ff: p ≤ 0.0001, χ2 = 17.39; ff: p = 0.001, χ2 = 10.79; f: p ≤ 0.0001, χ2 = 23.51). Furthermore, combined plasma levels of 25(OH) vitamin D and FokI polymorphism revealed a significant role in predisposition to sepsis and septic shock. However, the prevalence of other VDR polymorphisms (TaqI, BsmI and ApaI) was comparable among different clinical categories. Conclusions: Low 25(OH) vitamin D levels and FokI mutants are associated with an increased risk of sepsis and septic shock in a Chinese cohort. Clinical significance: Lower levels of 25(OH) vitamin D are highly prevalent in sepsis patients. Subjects harbouring VDR FokI variants are predisposed to sepsis in the studied cohort.


Subjects
Receptors, Calcitriol , Sepsis , Case-Control Studies , Genetic Predisposition to Disease , Genotype , Hospitals , Humans , Polymorphism, Single Nucleotide , Receptors, Calcitriol/genetics , Sepsis/genetics , Vitamin D
15.
Environ Microbiol ; 23(1): 327-339, 2021 01.
Article in English | MEDLINE | ID: mdl-33185973

ABSTRACT

Microbial taxon-taxon co-occurrences may directly or indirectly reflect the potential relationships between the members within a microbial community. However, to what extent and the specificity by which these co-occurrences are influenced by environmental factors remains unclear. In this report, we evaluated how the dynamics of microbial taxon-taxon co-occurrence is associated with the changes of environmental factors in Nan Lake at Wuhan city, China with a Modified Liquid Association method. We were able to detect more than 1000 taxon-taxon co-occurrences highly correlated with one or more environmental factors across a phytoplankton bloom using 16S rRNA gene amplicon community profiles. These co-occurrences, referred to as environment dependent co-occurrences (ED_co-occurrences), delineate a unique network in which a taxon-taxon pair exhibits specific, and potentially dynamic correlations with an environmental parameter, while the individual relative abundance of each may not. Microcystis involved ED_co-occurrences are in important topological positions in the network, suggesting relationships between the bloom dominant species and other taxa could play a role in the interplay of microbial community and environment across various bloom stages. Our results may broaden our understanding of the response of a microbial community to the environment, particularly at the level of microbe-microbe associations.


Subjects
Cyanobacteria/growth & development , Cyanobacteria/isolation & purification , Lakes/microbiology , China , Cyanobacteria/genetics , Cyanobacteria/metabolism , DNA, Bacterial/genetics , Microbiota , Phytoplankton/classification , Phytoplankton/genetics , Phytoplankton/growth & development , Phytoplankton/isolation & purification , RNA, Ribosomal, 16S/genetics
16.
Methods ; 173: 44-51, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31238097

ABSTRACT

According to the advances of high-throughput sequencing technology, massive microbiome data accumulated from environmental investigations to human studies. The microbiome-wide association studies are to study the relationship between the microbiome and human health or environment. Recently, Deep Neural Networks (DNNs) are encouraging due to their layer-wise learning ability for representation learning. However, DNNs are considered as black boxes and they require a large amount of training data which makes them impractical to conduct microbiome-wide association studies directly. Meanwhile, the microbiome data is high dimension with many features and noise. A single feature selection method for dealing with the kind of dataset is often unstable. In this work, we introduced a deep learning model named Deep Forest to conduct the microbiome-wide association studies and an ensemble feature selection method is proposed to guide microbial biomarkers' identification. The experiments showed that our ensemble feature method based on Deep Forest had good stability and robustness. The results of feature selection could guide the discovery of microbial biomarkers and help to diagnose microbial-related diseases. The code is available at https://github.com/MicroAVA/MWAS-Biomarkers.git.


Subjects
Biomarkers , Biomedical Research/methods , Genome-Wide Association Study/methods , Microbiota/genetics , Humans , Neural Networks, Computer
17.
Scott Med J ; 66(1): 16-22, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32990500

ABSTRACT

BACKGROUND AND AIMS: The neurological damage caused by cardiac arrest (CA) can seriously affect quality of life. We investigated the effect of metformin pretreatment on brain injury and survival in a rat CA/cardiopulmonary resuscitation (CPR) model. METHODS AND RESULTS: After 14 days of pretreatment with metformin, rats underwent 9 minutes of asphyxia CA/CPR. Survival was evaluated 7 days after restoration of spontaneous circulation; neurological deficit scale (NDS) score was evaluated at days 1, 3, and 7. Proteins related to the endoplasmic reticulum (ER) stress response and autophagy were measured using immunoblotting. Seven-day survival was significantly improved and NDS score was significantly improved in rats pretreated with metformin. Metformin enhanced AMPK-induced autophagy activation. AMPK and autophagy inhibitors removed the metformin neuroprotective effect. Although metformin inhibited the ER stress response, its inhibitory effect was weaker than 4-PBA. CONCLUSION: In a CA/CPR rat model, 14-day pretreatment with metformin has a neuroprotective effect. This effect is closely related to the activation of AMPK-induced autophagy and inhibition of the ER stress response. Long-term use of metformin can reduce brain damage following CA/CPR.


Subjects
Brain Injuries/prevention & control , Cardiopulmonary Resuscitation/adverse effects , Metformin/pharmacology , Neuroprotective Agents/pharmacology , AMP-Activated Protein Kinases/drug effects , Animals , Autophagy/drug effects , Brain Injuries/etiology , Disease Models, Animal , Endoplasmic Reticulum Stress/drug effects , Rats
18.
BMC Bioinformatics ; 20(Suppl 16): 583, 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31787075

ABSTRACT

BACKGROUND: Microbes have been shown to play a crucial role in various ecosystems. Many human diseases have been proved to be associated with bacteria, so it is essential to extract the interaction between bacteria for medical research and application. At the same time, many bacterial interactions with certain experimental evidences have been reported in biomedical literature. Integrating this knowledge into a database or knowledge graph could accelerate the progress of biomedical research. A crucial and necessary step in interaction extraction (IE) is named entity recognition (NER). However, due to the specificity of bacterial naming, there are still challenges in bacterial named entity recognition. RESULTS: In this paper, we propose a novel method for bacterial named entity recognition, which integrates domain features into a deep learning framework combining bidirectional long short-term memory network and convolutional neural network. When domain features are not added, F1-measure of the model achieves 89.14%. After part-of-speech (POS) features and dictionary features are added, F1-measure of the model achieves 89.7%. Hence, our model achieves an advanced performance in bacterial NER with the domain features. CONCLUSIONS: We propose an efficient method for bacterial named entity recognition which combines domain features and deep learning models. Compared with the previous methods, the effect of our model has been improved. At the same time, the process of complex manual extraction and feature design are significantly reduced.


Subjects
Algorithms , Bacteria/genetics , Deep Learning , Databases as Topic , Humans , Models, Theoretical , Neural Networks, Computer
19.
BMC Bioinformatics ; 20(Suppl 16): 594, 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31787095

ABSTRACT

BACKGROUND: Viruses are closely related to bacteria and human diseases. It is of great significance to predict associations between viruses and hosts for understanding the dynamics and complex functional networks in microbial community. With the rapid development of the metagenomics sequencing, some methods based on sequence similarity and genomic homology have been used to predict associations between viruses and hosts. However, the known virus-host association network was ignored in these methods. RESULTS: We proposed a kernelized logistic matrix factorization with integrating different information to predict potential virus-host associations on the heterogeneous network (ILMF-VH) which is constructed by connecting a virus network with a host network based on known virus-host associations. The virus network is constructed based on oligonucleotide frequency measurement, and the host network is constructed by integrating oligonucleotide frequency similarity and Gaussian interaction profile kernel similarity through similarity network fusion. The host prediction accuracy of our method is better than other methods. In addition, case studies show that the host of crAssphage predicted by ILMF-VH is consistent with presumed host in previous studies, and another potential host Escherichia coli is also predicted. CONCLUSIONS: The proposed model is an effective computational tool for predicting interactions between viruses and hosts effectively, and it has great potential for discovering novel hosts of viruses.


Subjects
Algorithms , Viruses/genetics , Area Under Curve , Databases as Topic , Host-Pathogen Interactions , Humans , Logistic Models
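
Note on entry 19: the abstract mentions Gaussian interaction profile kernel similarity as one ingredient of the host network. As a rough illustration, the sketch below computes that kernel in its commonly used form, where each entity is described by a binary vector of its known associations and the bandwidth is normalized by the mean squared norm of those vectors. The function name `gip_kernel` and the parameter `gamma_prime` are assumptions made for this example, and the abstract does not state which normalization ILMF-VH itself uses.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of `profiles`.

    profiles: list of equal-length numeric lists, one interaction profile
    per entity (for example, one row per host, one column per virus).
    gamma_prime: assumed bandwidth parameter, commonly set to 1.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Three hypothetical hosts described by their known associations with three viruses.
hosts = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
for row in gip_kernel(hosts):
    print([round(value, 3) for value in row])
```

Entities with identical profiles get similarity 1, and similarity decays with the number of associations on which two profiles disagree.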
20.
BMC Med Inform Decis Mak ; 19(Suppl 9): 251, 2019 12 12.
Article in English | MEDLINE | ID: mdl-31830960

ABSTRACT

BACKGROUND: In order to better help doctors make decisions in the clinical setting, research is necessary to connect the electronic health record (EHR) with the biomedical literature. Pseudo Relevance Feedback (PRF) is a classical query modification technique that has been shown to be effective in many retrieval models and is thus suitable for handling the terse language and clinical jargon in EHRs. Previous work has introduced a set of constraints (axioms) for the traditional PRF model. However, most methods do not consider both the importance degree of a candidate term in the feedback document and the co-occurrence relationship between a candidate term and a query term. Intuitively, terms that have a higher co-occurrence degree with a query term are more likely to be related to the query topic. METHODS: In this paper, we incorporate the original HAL model into Rocchio's model, and propose a new concept of term proximity feedback weight. A HAL-based Rocchio's model for query expansion, called HRoc, is proposed. Meanwhile, we design three normalization methods to better incorporate proximity information into query expansion. Finally, we introduce an adaptive parameter to replace the length of the sliding window of the HAL model, so that the window size can be selected according to document length. RESULTS: Based on the 2016 TREC Clinical Support medicine dataset, experimental results demonstrate that the proposed HRoc and HRoc_AP models are superior to other advanced models, such as the PRoc2 and TF-PRF methods, on various evaluation metrics. Compared with the PRoc2 and TF-PRF models, the MAP of our model is increased by 8.5% and 12.24% respectively, while the F1 score of our model is increased by 7.86% and 9.88% respectively. CONCLUSIONS: The proposed HRoc model can effectively enhance the precision and the recall rate of information retrieval and gets a more precise result than other models. Furthermore, after introducing the self-adaptive parameter, the advanced HRoc_AP model uses fewer hyper-parameters than other models while achieving equivalent performance, which greatly improves the efficiency and applicability of the model and thus helps clinicians to retrieve clinical support documents effectively.


Subjects
Decision Support Systems, Clinical , Information Storage and Retrieval , Models, Theoretical , Algorithms , Feedback
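
Note on entry 20: the abstract extends Rocchio's model for pseudo relevance feedback. For orientation, here is a minimal sketch of the plain Rocchio update on which such extensions build, assuming term-weight dictionaries as vectors and treating the top-ranked documents as relevant. It does not reproduce the HAL-based proximity weights, the normalization methods, or the adaptive window of HRoc, none of which are specified in the abstract. The function name `rocchio_expand` and the parameters `alpha`, `beta` and `top_k` are illustrative assumptions.

```python
from collections import defaultdict


def rocchio_expand(query_vec, feedback_docs, alpha=1.0, beta=0.75, top_k=5):
    """Plain Rocchio query expansion with pseudo relevance feedback.

    query_vec: dict mapping term -> weight for the original query.
    feedback_docs: list of dicts (term -> weight) for the top-ranked documents.
    Returns the original terms plus the top_k highest-weighted new terms.
    """
    expanded = defaultdict(float)
    for term, weight in query_vec.items():
        expanded[term] += alpha * weight
    for doc in feedback_docs:
        for term, weight in doc.items():
            # Add the centroid of the feedback documents, scaled by beta.
            expanded[term] += beta * weight / len(feedback_docs)
    original_terms = set(query_vec)
    candidates = [t for t in expanded if t not in original_terms]
    # Sort by descending weight, breaking ties alphabetically for determinism.
    candidates.sort(key=lambda t: (-expanded[t], t))
    kept = original_terms | set(candidates[:top_k])
    return {term: expanded[term] for term in kept}


# Hypothetical query and two pseudo-relevant documents.
query = {"sepsis": 1.0}
docs = [{"sepsis": 1.0, "antibiotic": 2.0}, {"antibiotic": 1.0, "fever": 1.0}]
print(rocchio_expand(query, docs, alpha=1.0, beta=0.5, top_k=1))
```

In this baseline every feedback term is weighted only by its frequency-style weight in the documents; the approach described in the abstract additionally rewards terms that occur close to query terms.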