Results 1 - 12 of 12
1.
J Biomed Inform ; 115: 103688, 2021 03.
Article in English | MEDLINE | ID: mdl-33545331

ABSTRACT

One of the key missions of biology and medical science is to find disease-related genes. Recent research uses gene/protein networks to find such genes. Because these networks contain false positive interactions, the results are often inaccurate and unreliable. Integrating multiple gene/protein networks can overcome this drawback, yielding a network with fewer false positive interactions. The integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. From the different gene/protein sources, four gene-gene similarity networks are first constructed and then integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (itself the outcome of integrating four sources) through a bipartite disease-gene network. We propose a novel algorithm, namely random walk with restart on the heterogeneous network method with fuzzy fusion (RWRHN-FF). Running RWRHN-FF over the heterogeneous network determines the disease-related genes. Experimental results using leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric, and colon cancers. Since RWRHN-FF converges slowly on large heterogeneous networks, we propose a parallel implementation of RWRHN-FF on the Apache Spark platform for high-throughput and reliable network inference. Experiments on heterogeneous networks of different sizes indicate faster convergence compared with non-distributed implementations.
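The abstract does not give the RWRHN-FF update rule. As a point of reference, the minimal Python sketch below implements the standard random walk with restart, p(t+1) = (1 - r) W p(t) + r p0. It runs on a single weighted graph whose nodes may be genes or diseases. It is not the authors' method: the fuzzy fusion step, any separate jump probability between the gene and disease subnetworks, and the Spark parallelisation are all omitted. The restart probability of 0.7 and the toy network are assumptions chosen for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    adj: dict node -> dict neighbour -> non-negative edge weight; every
         neighbour must also be a key, and an undirected network must list
         each edge in both directions.
    seeds: nodes the walker restarts from (e.g. a query disease).
    restart: probability of jumping back to the seed set at each step.
    """
    nodes = list(adj)
    seeds = set(seeds)
    # Restart vector p0: uniform probability mass over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        # Each node passes (1 - restart) of its mass to its neighbours,
        # split in proportion to edge weight (column-normalised transitions).
        for u in nodes:
            out = sum(adj[u].values())
            if out == 0:
                continue  # dangling node: its mass is dropped in this sketch
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical toy network: two diseases (d1, d2) linked through a chain of genes.
edges = [("d1", "g1"), ("g1", "g2"), ("g2", "g3"), ("g3", "d2")]
graph = {}
for a, b in edges:
    graph.setdefault(a, {})[b] = 1.0
    graph.setdefault(b, {})[a] = 1.0

scores = random_walk_with_restart(graph, seeds=["d1"])
genes = sorted((n for n in scores if n.startswith("g")), key=scores.get, reverse=True)
print(genes)  # prints ['g1', 'g2', 'g3']
```

Each iteration touches every edge once, so the cost per iteration grows with network size. A distributed platform such as Spark can spread exactly this kind of work across machines.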


Subject(s)
Computational Biology, Gene Regulatory Networks, Algorithms, Humans, Male
2.
J Biomed Inform ; 116: 103706, 2021 04.
Article in English | MEDLINE | ID: mdl-33610879

ABSTRACT

Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting sentences, causing the potential loss of essential information. In this study, we propose a domain-specific method that models a document as a multi-layer graph to enable multiple features of the text to be processed at the same time. The features we used in this paper are word similarity, semantic similarity, and co-reference similarity, which are modelled as three different layers. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts the concepts and relationships using different tools such as SemRep, MetaMap, and OGER. Extensive evaluation by ROUGE and BERTScore shows increased F-measure values.


Subject(s)
Data Mining, Semantics, Algorithms, Language, Natural Language Processing
3.
Animals (Basel) ; 12(1)2021 Dec 23.
Article in English | MEDLINE | ID: mdl-35011134

ABSTRACT

Mastitis, a disease with high incidence worldwide, is the most prevalent and costly disease in the dairy industry. Gram-negative bacteria such as Escherichia coli (E. coli) are assumed to be among the leading agents causing acute severe infection with clinical signs. E. coli, an environmental mastitis pathogen, is the primary etiological agent of bovine mastitis in well-managed dairy farms. The response to E. coli infection follows a complex pattern affected by genetic and environmental factors. Moreover, the efficacy of antibiotic and/or anti-inflammatory treatment in E. coli mastitis is still a topic of scientific debate, and studies on the treatment of clinical cases show conflicting results. Unraveling the bio-signature of mastitis in dairy cattle can open new avenues for drug repurposing. In the current research, Heter-LP, a novel semi-supervised heterogeneous label propagation algorithm that applies both local and global network features for data integration, was used to identify potential novel therapeutic avenues for E. coli mastitis. The input data for the Heter-LP algorithm came from online data repositories on known diseases, drugs, and gene targets, along with other specialized biological information for E. coli mastitis, including critical genes with robust bio-signatures, drugs, and related disorders. Our research identified drugs such as Glibenclamide, Ipratropium, Salbutamol, and Carbidopa as possible new therapeutics against E. coli mastitis. Pharmaceutical scientists or veterinarians can use the predicted relationships to find commercially efficacious medicines, or combinations of two or more active compounds, to treat this infectious disease.

4.
Sci Rep ; 10(1): 8846, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32483162

ABSTRACT

Rare or orphan diseases affect only small populations, thereby limiting the economic incentive for the drug development process, often resulting in a lack of progress towards treatment. Drug repositioning is a promising approach in these cases, due to its low cost. In this approach, one attempts to identify new purposes for existing drugs that have already been developed and approved for use. By applying the process of drug repositioning to identify novel treatments for rare diseases, we can overcome the lack of economic incentives and make concrete progress towards new therapies. Adrenocortical Carcinoma (ACC) is a rare disease with no practical and definitive therapeutic approach. We apply Heter-LP, a new method of drug repositioning, to suggest novel therapeutic avenues for ACC. Our analysis identifies innovative putative drug-disease, drug-target, and disease-target relationships for ACC, which include Cosyntropin (drug) and DHCR7, IGF1R, MC1R, MAP3K3, TOP2A (protein targets). When results are analyzed using all available information, a number of novel predicted associations related to ACC appear to be valid according to current knowledge. We expect the predicted relations will be useful for drug repositioning in ACC since the resulting ranked lists of drugs and protein targets can be used to expedite the necessary clinical processes.


Subject(s)
Adrenal Cortex Neoplasms/pathology, Drug Repositioning/methods, Adrenal Cortex Neoplasms/drug therapy, Adrenocortical Carcinoma/drug therapy, Adrenocortical Carcinoma/pathology, Computational Biology, Cosyntropin/therapeutic use, DNA Topoisomerases, Type II/metabolism, Humans, Oxidoreductases Acting on CH-CH Group Donors/antagonists & inhibitors, Oxidoreductases Acting on CH-CH Group Donors/metabolism, Poly-ADP-Ribose Binding Proteins/antagonists & inhibitors, Poly-ADP-Ribose Binding Proteins/metabolism, Receptor, IGF Type 1/antagonists & inhibitors, Receptor, IGF Type 1/metabolism
5.
Methods Mol Biol ; 1903: 291-316, 2019.
Article in English | MEDLINE | ID: mdl-30547450

ABSTRACT

Using existing drugs to treat diseases for which they were not originally developed (drug repositioning) offers a cheaper, faster, and safer approach to drug development. We propose a method for drug repositioning that can predict simple and complex relationships between drugs, drug targets, and diseases. Biological networks are a suitable model for relationships between different biological concepts, so our primary approach is to analyze graphs and complex networks in the study of drugs and their therapeutic effects. Given the nature of the available data, semi-supervised learning methods are essential. Therefore, we developed Heter-LP, a label propagation method that predicts drug-target, drug-disease, and disease-target interactions and integrates various data sources at different levels. The predicted interactions are the most prominent among millions of candidate relationships, and they are suggested to researchers for further investigation. The main advantages of Heter-LP are effective integration of the input data, no need for negative samples, and the combined use of local and global features. The research proceeds in the following steps. The first step is the construction of a heterogeneous network as a data modeling task, in which data are collected and prepared. The second step is predicting potential interactions. For this step, we present a new label propagation algorithm for heterogeneous networks with two parts: a mapping, and an iterative method that determines the final labels of all network vertices. Finally, for evaluation, we calculated the AUC and AUPR with tenfold cross-validation. We compared the results with the best available methods for label propagation in heterogeneous networks and for drug repositioning. A series of experimental evaluations and some specific case studies are also presented. The AUC and AUPR of Heter-LP were much higher than the average of the best available methods.
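The abstract evaluates Heter-LP with AUC and AUPR under tenfold cross-validation. The minimal sketch below computes both metrics for one held-out fold. It assumes known interactions are labelled 1 and unobserved pairs 0. AUC is computed through the pairwise (Mann-Whitney) formulation. Average precision stands in as one common estimate of AUPR; the paper does not state its exact AUPR computation, so this choice is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical predicted scores for four drug-target pairs in one held-out fold.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

In a tenfold setup, one would compute these on each fold's held-out pairs and average them across the ten folds.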


Subject(s)
Algorithms, Computational Biology/methods, Drug Repositioning/methods, Humans, Supervised Machine Learning
6.
J Biomed Inform ; 84: 42-58, 2018 08.
Article in English | MEDLINE | ID: mdl-29906584

ABSTRACT

OBJECTIVE: Automatic text summarization offers an efficient way to access the ever-growing amounts of scientific and clinical literature in the biomedical domain by summarizing source documents while retaining their most informative content. In this paper, we propose a novel graph-based summarization method that takes advantage of domain-specific knowledge and a well-established data mining technique called frequent itemset mining. METHODS: Our summarizer exploits the Unified Medical Language System (UMLS) to construct a concept-based model of the source document by mapping the document to concepts. It then discovers frequent itemsets to account for correlations among multiple concepts. The method uses these correlations to define a similarity function, from which a graph representation of the document is constructed. The summarizer then employs a minimum spanning tree based clustering algorithm to discover the various subthemes of the document. Finally, it generates the summary by selecting the most informative and relevant sentences from all subthemes within the text. RESULTS: We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. CONCLUSION: This research suggests that incorporating domain-specific knowledge and frequent itemset mining better equips the summarization system to measure the informativeness of sentences. Moreover, clustering the graph nodes (sentences) enables the summarizer to efficiently target the different main subthemes of a source document. The evaluation results show that the proposed approach can significantly improve the performance of summarization systems in the biomedical domain.
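The clustering step above can be illustrated with a common form of MST-based clustering. First build a spanning tree that keeps the strongest sentence-to-sentence links, then cut its k - 1 weakest edges so that each remaining component is one subtheme. In this sketch the pairwise sentence similarities are hypothetical and given directly; the abstract derives them from frequent itemsets. The number of subthemes k is also an assumed input, because the abstract does not say how the number of clusters is chosen.

```python
def mst_clusters(n, similarity, k):
    """Group n sentences into k subthemes by cutting a spanning tree.

    similarity: dict (i, j) -> similarity score for sentence pairs i < j.
    A maximum spanning tree on similarity is the same tree as a minimum
    spanning tree on a distance such as 1 - similarity.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal's algorithm: scan edges from most to least similar, skip cycles.
    tree = []
    for (i, j), s in sorted(similarity.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((s, i, j))

    # Keep all but the k - 1 weakest tree edges and rebuild the components.
    tree.sort(reverse=True)
    kept = tree[: max(len(tree) - (k - 1), 0)]
    parent[:] = range(n)
    for _, i, j in kept:
        parent[find(i)] = find(j)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Hypothetical similarities between four sentences.
sims = {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.2, (0, 2): 0.15, (0, 3): 0.1, (1, 3): 0.05}
print(mst_clusters(4, sims, k=2))  # prints [[0, 1], [2, 3]]
```

A summarizer built this way would then pick the most informative sentences from each returned group.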


Subject(s)
Cluster Analysis, Data Mining/methods, Medical Informatics/methods, Semantics, Algorithms, Electronic Health Records, Pattern Recognition, Automated, Unified Medical Language System
7.
Brief Bioinform ; 19(5): 878-892, 2018 09 28.
Article in English | MEDLINE | ID: mdl-28334136

ABSTRACT

Experimental drug development is time-consuming, expensive, and limited to a relatively small number of targets. However, recent studies show that repositioning existing drugs can minimize costs and risks more efficiently than de novo experimental drug development. Previous studies have shown that network analysis is a versatile platform for this purpose, because biological networks can model interactions between many different biological concepts. The present study reviews network-based methods for predicting drug targets for drug repositioning. For each method, we describe the preferred type of data set and discuss the method's advantages and limitations. We also provide a brief description of each method and an evaluation based on its performance metrics. We conclude that distinct and complementary data should be integrated, because each type of data set reveals a unique aspect of information about an organism. We also suggest that a standard set of evaluation metrics and data sets is essential in this fast-growing research domain.


Subject(s)
Drug Repositioning/methods, Computational Biology/methods, Databases, Pharmaceutical/statistics & numerical data, Drug Interactions, Drug Repositioning/classification, Drug Repositioning/statistics & numerical data, Drug-Related Side Effects and Adverse Reactions, Gene Regulatory Networks, Humans, Machine Learning, Metabolic Networks and Pathways, Molecular Docking Simulation/statistics & numerical data, Protein Interaction Maps
8.
Artif Intell Med ; 84: 101-116, 2018 01.
Article in English | MEDLINE | ID: mdl-29208328

ABSTRACT

Automatic text summarization tools help users in the biomedical domain acquire the information they need from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection on the frequency of concepts extracted from the input text. However, other measures may be more useful for this type of summarization. One option is to identify valuable content within an input document with measures other than raw frequency; another is to consider the correlations between concepts. In this paper, we describe a Bayesian summarization method for biomedical text documents. The Bayesian summarizer first maps the input text to Unified Medical Language System (UMLS) concepts. It then selects the important ones to be used as classification features. We introduce six different feature selection approaches to identify the most important concepts of the text and select the most informative content according to the distribution of these concepts. We show that with an appropriate feature selection approach, the Bayesian summarizer can improve the performance of biomedical summarization. Using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit, we perform extensive evaluations on a corpus of scientific papers in the biomedical domain. The results show one condition under which the Bayesian summarizer performs best: when it uses feature selection methods that do not rely on raw frequency. In that case it can outperform biomedical summarizers that rely on concept frequency, as well as domain-independent and baseline methods.
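Several of these abstracts report ROUGE scores. The sketch below computes ROUGE-N recall, precision, and F1 from clipped n-gram counts, which is the core of the metric. It is a simplified reimplementation, not the official ROUGE toolkit used in the study. It tokenises by lower-cased whitespace splitting and applies no stemming or stop-word removal, so its scores would not exactly match the toolkit's.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, F1) of a candidate summary against one reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter & keeps the minimum count of each shared n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1


reference = "the cat is on the mat"
candidate = "the cat sat on the mat"
print(rouge_n(candidate, reference, n=1))
print(rouge_n(candidate, reference, n=2))
```

Recall is the headline ROUGE figure because it measures how much of the reference summary the candidate covers.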


Subject(s)
Abstracting and Indexing/methods, Biomedical Research/methods, Data Mining/methods, Semantics, Unified Medical Language System, Bayes Theorem, Reading
9.
Comput Methods Programs Biomed ; 146: 77-89, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28688492

ABSTRACT

OBJECTIVE: Automatic text summarization tools can help users in the biomedical domain access information efficiently from a large volume of scientific literature and other text sources. In this paper, we propose a summarization method that combines itemset mining and domain knowledge to construct a concept-based model and to extract the main subtopics from an input document. Our summarizer quantifies the informativeness of each sentence using the support values of the itemsets appearing in the sentence. METHODS: To address the concept-level analysis of text, our method first maps the original document to biomedical concepts using the Unified Medical Language System (UMLS). Then, it discovers the essential subtopics of the text using a data mining technique, namely itemset mining, and constructs the summarization model. The itemset mining algorithm extracts a set of frequent itemsets containing correlated and recurrent concepts of the input document. The summarizer selects the most related and informative sentences and generates the final summary. RESULTS: We evaluate the performance of our itemset-based summarizer in a set of experiments using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. We compare the proposed method with GraphSum, TexLexAn, SweSum, SUMMA, AutoSummarize, the term-based version of the itemset-based summarizer, and two baselines. The results show that the itemset-based summarizer performs better than the compared methods, achieving the best scores for all the assessed ROUGE metrics (R-1: 0.7583, R-2: 0.3381, R-W-1.2: 0.0934, and R-SU4: 0.3889). We also perform a set of preliminary experiments to determine the best value for the minimum support threshold used in the itemset mining algorithm. The results demonstrate that this threshold directly affects the accuracy of the summarization model: extreme thresholds cause a significant decrease in summarization performance. CONCLUSION: Compared with statistical, similarity, and word frequency methods, the proposed method demonstrates that the summarization model obtained from concept extraction and itemset mining provides the summarizer with an effective metric for measuring the informative content of sentences. This can improve the performance of biomedical literature summarization.
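The sentence-scoring idea above can be sketched as follows. Each sentence is treated as a transaction of concepts, and frequent itemsets are mined with an Apriori-style search under a minimum support threshold. Each sentence is then scored by the summed support of the frequent itemsets it contains. The hand-written concept labels are hypothetical stand-ins for UMLS concepts, and the UMLS mapping step is not reproduced. Summing supports and picking the top-scoring sentences are assumptions: the abstract states only that informativeness is quantified from the support values of itemsets appearing in the sentence.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori search for concept sets whose support (the fraction of
    sentences containing the whole set) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {s: support(s) for s in singletons}
    level = {s: v for s, v in level.items() if v >= min_support}
    result = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        result.update(level)
        k += 1
    return result


def summarize(sentence_concepts, min_support, n_sentences):
    """Indices (in document order) of the n_sentences highest-scoring sentences,
    where a sentence's score is the summed support of the frequent itemsets
    fully contained in its concept set."""
    itemsets = frequent_itemsets(sentence_concepts, min_support)
    scored = []
    for idx, concepts in enumerate(sentence_concepts):
        concepts = frozenset(concepts)
        score = sum(v for s, v in itemsets.items() if s <= concepts)
        scored.append((score, idx))
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:n_sentences]
    return sorted(idx for _, idx in top)


# Hypothetical concept sets for four sentences (stand-ins for UMLS concepts).
doc = [{"gene", "cancer", "mutation"}, {"gene", "cancer"}, {"weather"}, {"gene", "mutation"}]
print(summarize(doc, min_support=0.5, n_sentences=2))  # prints [0, 1]
```

Lowering min_support admits more, and more incidental, itemsets, while raising it leaves only a few very common concepts. That trade-off is one way to see why extreme thresholds could hurt summary quality.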


Subject(s)
Biomedical Research, Data Mining, Publications, Algorithms, Unified Medical Language System
10.
Comput Biol Med ; 88: 18-31, 2017 09 01.
Article in English | MEDLINE | ID: mdl-28672176

ABSTRACT

Detecting protein complexes is an important task in analyzing protein interaction networks. Many algorithms predict protein complexes in different ways, but surveys of interaction networks indicate that about 50% of detected interactions are false positives. Consequently, the accuracy of existing methods needs to be improved. In this paper, we propose a novel algorithm to detect protein complexes in 'noisy' protein interaction data. First, we integrate several biological data sources to determine the reliability of each interaction and assign more accurate weights to the interactions. This step uses a data fusion component based on the interval type-2 fuzzy voter, which combines the information sources efficiently. The fusion component detects errors and diminishes their effect on protein complex detection. Thus, the first step assigns a reliability score to every interaction in the network. In the second step, we propose a general protein complex detection algorithm that exploits and adopts the strong points of other algorithms and existing hypotheses regarding real complexes. Finally, the proposed method is applied to yeast interaction datasets. The results show that our framework achieves better precision and F-measure than existing approaches.


Subject(s)
Computational Biology/methods, Databases, Protein, Fuzzy Logic, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism, Protein Interaction Mapping/methods, Algorithms, Gene Expression Profiling, Protein Interaction Maps, Semantics
11.
J Biomed Inform ; 68: 167-183, 2017 04.
Article in English | MEDLINE | ID: mdl-28300647

ABSTRACT

Drug repositioning offers an effective solution to drug discovery, saving both time and resources by finding new indications for existing drugs. Typically, a drug takes effect via its protein targets in the cell. As a result, it is necessary for drug development studies to conduct an investigation into the interrelationships of drugs, protein targets, and diseases. Although previous studies have made a strong case for the effectiveness of integrative network-based methods for predicting these interrelationships, little progress has been achieved in this regard within drug repositioning research. Moreover, the interactions of new drugs and targets (lacking any known targets and drugs, respectively) cannot be accurately predicted by most established methods. In this paper, we propose a novel semi-supervised heterogeneous label propagation algorithm named Heter-LP, which applies both local and global network features for data integration. To predict drug-target, disease-target, and drug-disease associations, we use information about drugs, diseases, and targets as collected from multiple sources at different levels. Our algorithm integrates these various types of data into a heterogeneous network and implements a label propagation algorithm to find new interactions. Statistical analyses of 10-fold cross-validation results and experimental analyses support the effectiveness of the proposed algorithm.


Subject(s)
Algorithms, Drug Discovery, Drug Repositioning, Humans, Proteins
12.
J Theor Biol ; 350: 49-56, 2014 Jun 07.
Article in English | MEDLINE | ID: mdl-24491253

ABSTRACT

Human haplotypes contain essential information about SNPs, which in turn is valuable for studies such as finding relationships between diseases and their potential genetic causes, e.g., Genome Wide Association Studies. Directly determining haplotypes is expensive, and high-throughput sequencing has made recent progress. Both factors have increased the motivation for haplotype assembly, the problem of finding a pair of haplotypes from a set of aligned fragments. The problem has been extensively studied and a number of algorithms have been proposed for it. Still, because haplotype information is so important, more accurate methods remain beneficial. In this paper, we first develop a probabilistic model that incorporates the Minor Allele Frequency (MAF) of SNP sites, which existing maximum likelihood models omit. We then show that the probabilistic model reduces to the Minimum Error Correction (MEC) model when the MAF information is omitted and some approximations are made. This result provides novel theoretical support for MEC, despite some criticisms of it in the recent literature. Next, under the same approximations, we simplify the model to an extension of MEC that uses the MAF information. Finally, we extend the haplotype assembly algorithm HapSAT by developing a weighted Max-SAT formulation for the simplified model, which is evaluated empirically with positive results.
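To make the MEC objective concrete, the sketch below computes the MEC cost of a candidate haplotype pair. Each fragment is assigned to whichever of the two haplotypes it disagrees with least, and the cost is the total number of allele corrections needed. The sketch assumes all sites are heterozygous, so the second haplotype is the complement of the first. It finds the optimum by brute force, which only works for a handful of SNP sites. It does not include the MAF weighting or the weighted Max-SAT encoding of the extended HapSAT, and the reads are hypothetical.

```python
from itertools import product


def mec_cost(fragments, haplotype):
    """MEC cost of explaining the fragments with a complementary haplotype pair.

    fragments: strings over {'0', '1', '-'} aligned to the SNP sites,
               where '-' marks a site the read does not cover.
    haplotype: string over {'0', '1'}; its partner is the bitwise complement.
    """
    partner = "".join("1" if a == "0" else "0" for a in haplotype)
    total = 0
    for frag in fragments:
        # Count disagreements at covered sites against each haplotype.
        d1 = sum(1 for f, h in zip(frag, haplotype) if f != "-" and f != h)
        d2 = sum(1 for f, h in zip(frag, partner) if f != "-" and f != h)
        total += min(d1, d2)  # the fragment is assigned to the closer haplotype
    return total


def best_haplotype_bruteforce(fragments):
    """Exhaustive MEC minimisation over all 2^m haplotypes (tiny m only)."""
    m = len(fragments[0])
    candidates = ("".join(bits) for bits in product("01", repeat=m))
    return min(candidates, key=lambda h: mec_cost(fragments, h))


# Hypothetical reads over three SNP sites; the last read conflicts with the others.
reads = ["01-", "-10", "10-", "011"]
best = best_haplotype_bruteforce(reads)
print(best, mec_cost(reads, best))  # prints: 010 1
```

The exhaustive search grows as 2^m, which is why practical assemblers encode the objective for a solver, as HapSAT does with Max-SAT.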


Subject(s)
Algorithms, Gene Frequency/genetics, Haplotypes/genetics, Databases, Genetic, Humans, Likelihood Functions, Models, Genetic