Results 1 - 20 of 31
1.
Article in English | MEDLINE | ID: mdl-38691429

ABSTRACT

DNA damage is a critical factor in the onset and progression of cancer. When DNA is damaged, the number of genetic mutations increases, making it necessary to activate DNA repair mechanisms. A crucial factor in the base excision repair process, which helps maintain the stability of the genome, is the enzyme DNA polymerase β (Pol β), encoded by the POLB gene. It plays a vital role in the repair of damaged DNA. Additionally, single nucleotide polymorphisms (SNPs) in the POLB gene can potentially affect the ability to repair DNA. This study uses bioinformatics tools that extract important features from SNPs to construct a feature matrix, which is then used in combination with machine learning algorithms to predict the likelihood of developing cancer associated with a specific mutation. Eight different machine learning algorithms were used to investigate the relationship between POLB gene variations and their potential role in cancer onset. This study not only highlights the complex link between POLB gene SNPs and cancer, but also underscores the effectiveness of machine learning approaches in genomic studies, paving the way for advanced predictive models in genetic and cancer research.

2.
Biology (Basel) ; 12(4)2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37106719

ABSTRACT

Gene expression profiling is one of the most recognized techniques for inferring gene regulators and their potential targets in gene regulatory networks (GRNs). The purpose of this study is to build a regulatory network for the genome of the budding yeast Saccharomyces cerevisiae using RNA-seq and microarray data covering a wide range of experimental conditions. We introduce a pipeline for data analysis, data preparation, and model training. Several kernel classification models, including one-class, two-class, and rare-event classification methods, are used to categorize genes. We test the impact of normalization techniques on the overall performance of RNA-seq. Our findings provide new insights into the interactions between genes in the yeast regulatory network. The conclusions of our study are significant because they highlight the effectiveness of classification and its contribution to improving the current understanding of the yeast regulatory network. When assessed, our pipeline demonstrates strong performance across different statistical metrics, such as a 99% recall rate and a 98% AUC score.

3.
Front Mol Biosci ; 9: 900771, 2022.
Article in English | MEDLINE | ID: mdl-35769908

ABSTRACT

DNA polymerase β (pol β) is a member of the X-family of DNA polymerases that catalyze the distributive addition of nucleoside triphosphates during base excision DNA repair. Previous studies showed that the enzyme was phosphorylated in vitro with PKC at two serines (44 and 55), causing loss of DNA polymerase activity but not DNA binding. In this work, we have investigated the phosphorylation-induced conformational changes in DNA polymerase β in the presence of Mg ions. We report a comprehensive atomic resolution study of wild type and phosphorylated DNA polymerase using molecular dynamics (MD) simulations. The results are examined via novel methods of internal dynamics and energetics analysis to reveal the underlying mechanism of conformational transitions observed in DNA pol β. The results show drastic conformational changes in the structure of DNA polymerase β due to S44 phosphorylation. Phosphorylation-induced conformational changes transform the enzyme from a closed to an open structure. The dynamic cross-correlation shows that phosphorylation enhances the correlated motions between the different domains. Centrality network analysis reveals that the S44 phosphorylation causes structural rearrangements and modulates the information pathway between the lyase domain and the base pair binding domain. Further analysis of our simulations reveals that the disruption of a critical hydrogen bond (between S44 and E335) and the formation of three additional salt bridges are potential drivers of these conformational changes. In addition, we found that two of these additional salt bridges form in the presence of Mg ions on the active sites of the enzyme. These results agree with our previous study of DNA pol β S44 phosphorylation without Mg ions, which predicted the deactivation of DNA pol β. However, the phase space of structural transitions induced by S44 phosphorylation is much richer in the presence of Mg ions.
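
The dynamic cross-correlation mentioned in this abstract is commonly computed as C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>), where Δr_i is the displacement of atom i from its mean position. The following is a minimal sketch of that standard definition, not the authors' analysis code. The function name and the three-atom toy trajectory are invented for illustration, and the frames are assumed to be already aligned to a reference structure.

```python
import math

def dynamic_cross_correlation(traj):
    """Return C[i][j] = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>).

    traj is a list of frames; each frame is a list of (x, y, z) tuples,
    one per atom. Frames are assumed to be pre-aligned (an assumption).
    """
    n_frames = len(traj)
    n_atoms = len(traj[0])
    # Mean position of every atom over the trajectory.
    mean = [[sum(f[i][k] for f in traj) / n_frames for k in range(3)]
            for i in range(n_atoms)]
    # Displacement of every atom from its mean, frame by frame.
    disp = [[[f[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
            for f in traj]

    def avg_dot(i, j):
        # Time average of the dot product of the two displacement vectors.
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    corr = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        for j in range(n_atoms):
            norm = math.sqrt(avg_dot(i, i) * avg_dot(j, j))
            corr[i][j] = avg_dot(i, j) / norm if norm > 0 else 0.0
    return corr

# Hypothetical toy trajectory: atoms 0 and 1 move together, atom 2 moves opposite.
toy_traj = [
    [(0, 0, 0), (1, 0, 0), (5, 0, 0)],
    [(1, 0, 0), (2, 0, 0), (4, 0, 0)],
    [(2, 0, 0), (3, 0, 0), (3, 0, 0)],
]
dcc = dynamic_cross_correlation(toy_traj)
```

Values near +1 indicate correlated motion and values near -1 indicate anti-correlated motion; in practice the matrix is usually computed over C-alpha atoms with a dedicated trajectory-analysis library.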

4.
PLoS One ; 17(4): e0264771, 2022.
Article in English | MEDLINE | ID: mdl-35439250

ABSTRACT

Most realistic social communities are multi-profiled cross-communities constructed from users sharing commonalities that include adaptive social profile ingredients (i.e., natural adaptation to certain social traits). The most important types of such cross-communities are the densest holonic ones, because they exhibit many interesting properties. For example, such a cross-community can represent a portion of users who share all of the following traits: ethnicity, religion, neighbourhood, and age range. The denser a multi-profiled cross-community is, the more granular and holonic it is, and the greater the number of its members whose interests are reflected in the common interests of the entire cross-community. Moreover, the denser a cross-community is, the more specific and distinguishable its interests are (e.g., more distinguishable from those of other cross-communities). Unfortunately, methods for detecting granular multi-profiled cross-communities have been under-researched. Most current methods detect multi-profiled communities without considering their granularity. To overcome this, we introduce in this paper a novel methodology for detecting the smallest and most granular multi-profiled cross-community to which an active user belongs. The methodology is implemented in a system called ID_CC. To improve the accuracy of detecting such cross-communities, we first uncover missing links in social networks. Uncovering such missing links is imperative because they may contain valuable information (social characteristic commonalities, cross-memberships, etc.). We evaluated ID_CC by comparing it experimentally with eight methods. The results of the experiments revealed marked improvement.


Subjects
Religion , Social Networking , Data Collection , Ethnicity , Humans , Residence Characteristics
5.
PLoS One ; 16(1): e0243127, 2021.
Article in English | MEDLINE | ID: mdl-33406077

ABSTRACT

A traceable biomarker is a member of a disease's molecular pathway. A disease may be associated with several molecular pathways. Each different combination of these molecular pathways, to which detected traceable biomarkers belong, may serve as an indication of the elicitation of the disease at a different time frame in the future. Based on this notion, we introduce a novel methodology for personalizing an individual's degree of future susceptibility to a specific disease. We implemented the methodology in a working system called Susceptibility Degree to a Disease Predictor (SDDP). For a specific disease d, let S be the set of molecular pathways to which traceable biomarkers detected from most patients of d belong. For the same disease d, let S' be the set of molecular pathways to which traceable biomarkers detected from a certain individual belong. SDDP is able to infer the subset S'' ⊆ (S − S') of undetected molecular pathways for the individual. Thus, SDDP can infer undetected molecular pathways of a disease for an individual based on the few molecular pathways detected from the individual. SDDP can also help in inferring the combination of molecular pathways in the set S' ∪ S'' whose traceable biomarkers are collectively indicative of the disease. SDDP is composed of the following four components: information extractor, interrelationship between molecular pathways modeler, logic inferencer, and risk indicator. The information extractor takes advantage of the exponential increase of biomedical literature to automatically extract the common traceable biomarkers for a specific disease. The interrelationship between molecular pathways modeler models the hierarchical interrelationships between the molecular pathways of the traceable biomarkers. The logic inferencer transforms the hierarchical interrelationships between the molecular pathways into rule-based specifications. It employs the specification rules and the inference rules for predicate logic to infer as many undetected molecular pathways of a disease as possible for an individual. The risk indicator outputs a risk indicator value that reflects the individual's degree of future susceptibility to the disease. We evaluated SDDP by comparing it experimentally with other methods. Results revealed marked improvement.
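
The inference step described in this abstract (deriving undetected pathways from detected ones through rule-based specifications) can be illustrated with simple forward chaining over if-then rules. This is only a sketch under stated assumptions: the pathway names P1 to P6, the rules, and the function name `infer_pathways` are hypothetical, and SDDP's actual rule format and inference procedure are not given in the abstract.

```python
def infer_pathways(detected, rules):
    """Forward chaining: repeatedly fire rules whose premises are all known.

    rules is a list of (premises, conclusion) pairs, where premises is a set
    of pathway names. Returns only the newly inferred pathways.
    """
    known = set(detected)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known - set(detected)

# Hypothetical example data (not from the paper).
S = {"P1", "P2", "P3", "P4"}          # pathways seen in most patients of disease d
S_prime = {"P1", "P2"}                # pathways detected for one individual
rules = [
    ({"P1", "P2"}, "P3"),             # P1 and P2 together imply P3
    ({"P3"}, "P4"),                   # P3 implies P4
    ({"P5"}, "P6"),                   # never fires: P5 is not known
]
inferred = infer_pathways(S_prime, rules)
S_double_prime = inferred & (S - S_prime)   # S'' is a subset of S - S'
```

Intersecting with S − S' keeps the result consistent with the definition of S'' in the abstract.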


Subjects
Algorithms , Disease Susceptibility , Biomarkers/metabolism , Chemokines, CXC/metabolism , Diabetes Mellitus, Type 2/pathology , Humans , Information Storage and Retrieval , Logic , Risk
6.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 5284-5287, 2020 07.
Article in English | MEDLINE | ID: mdl-33019176

ABSTRACT

Each mixture of deficient molecular families of a specific disease induces the disease at a different time frame in the future. Based on this, we propose a novel methodology for personalizing a person's level of future susceptibility to a specific disease by inferring the mixture of his/her molecular families whose combined deficiencies are likely to induce the disease. We implemented the methodology in a working system called DRIT, which consists of the following components: information extractor, interrelationship between molecular families modeler, logic inferencer, and risk indicator. The information extractor takes advantage of the exponential increase of biomedical literature to extract the common biomarkers that test positive among most patients with a specific disease. The interrelationship between molecular families modeler models the hierarchical interrelationships between the molecular families whose biomarkers were extracted by the information extractor. The logic inferencer transforms the hierarchical interrelationships between the molecular families of a disease into rule-based specifications. It employs the specification rules and the inference rules for predicate logic to infer as many probable deficient molecular families as possible for a person, based on the few molecular families whose biomarkers tested positive in his/her medical screening. The risk indicator outputs a risk indicator value that reflects a person's level of future susceptibility to the disease. We evaluated DRIT by comparing it experimentally with a comparable method. Results revealed marked improvement.


Subjects
Logic , Publications , Biomarkers , Female , Humans , Male , Risk Factors
7.
Evol Bioinform Online ; 16: 1176934320920310, 2020.
Article in English | MEDLINE | ID: mdl-35173404

ABSTRACT

Computational prediction of gene-gene associations is one of the productive directions in the study of bioinformatics. Many tools have been developed to infer the relation between genes using different biological data sources. The association of a pair of genes deduced from the analysis of biological data becomes meaningful when it reflects the directionality and the type of reaction between genes. In this work, we follow another method to construct a causal gene co-expression network while identifying transcription factors in each pair of genes using microarray expression data. We adopt a machine learning technique based on a logistic regression model to tackle the sparsity of the network and to improve prediction accuracy. The proposed system classifies each pair of genes as either connected or nonconnected using the correlation between these genes across the whole Saccharomyces cerevisiae genome. The accuracy of the classification model in predicting related genes was evaluated using several data sets for the yeast regulatory network. Our system achieves high performance in terms of several statistical measures.

8.
IEEE Trans Cybern ; 50(2): 525-535, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30281507

ABSTRACT

Over the past decade, keystroke-based pattern recognition techniques have gained increasing attention as a forensic tool for behavioral biometrics. Although a number of machine learning-based approaches have been proposed, they are limited in their capability to recognize and profile a set of an individual's characteristics. In addition, to date they have focused primarily on gender and age, which seem more appropriate for commercial applications (such as developing commercial software), leaving other characteristics, such as educational level, unexplored. Educational level is an acquired user characteristic that, when known, can improve targeted advertising as well as provide valuable information in a digital forensic investigation. In this context, this paper proposes a novel machine learning model, the randomized radial basis function network, which recognizes and profiles the educational level of the individual behind the keyboard. The performance of the proposed model is evaluated using empirical data obtained by recording volunteers' keystrokes during their daily usage of a computer. Its performance is also compared with other well-referenced machine learning models using our keystroke dynamics datasets. Although the proposed model achieves high accuracy in predicting the educational level of an unknown user, it suffers from high computational cost. For this reason, we examine ways to reduce the time needed to build our model, including the use of a novel data condensation method, and discuss the tradeoff between an accurate and a fast prediction. To the best of our knowledge, this is the first model in the literature that predicts the educational level of an individual based on keystroke dynamics information alone.

9.
Environ Sci Pollut Res Int ; 27(3): 3086-3099, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31838698

ABSTRACT

The aim of this work is the synthesis of a nanomaterial for removing heavy metal ions from aqueous solutions. Al-doped ZnO (ZnO:Alx%) nanopowders with 0 to 5% Al content are prepared via an amended sol-gel method. The morphology and microstructure of the prepared ZnO:Alx% are probed by means of scanning electron microscopy (SEM), X-ray diffraction (XRD) analysis, energy dispersive X-ray spectroscopy (EDS) and elemental mapping. The findings reveal the prevalence of the hexagonal wurtzite ZnO structure, with crystallite size increasing (45 to 60 nm) as a result of Al doping. SEM images show nearly spherical nanoparticles with considerable aggregation. EDS and elemental mapping analysis confirm the incorporation of Al within the ZnO host lattice. The relatively large surface area, as estimated from N2 adsorption, makes the nanopowders very favorable for the uptake of Cd(II), Cr(IV), Co(II) and Ni(II) from aqueous solution. The ZnO:Alx% with 1 wt% Al exhibits the highest uptake rate of heavy metal ions. The adsorption process has been found to be spontaneous and endothermic and to obey the Langmuir adsorption model. The high tendency of the prepared nanoparticles to eliminate heavy metal ions renders them suitable candidates for environmental remediation. Desorption studies with 0.1 M NaOH indicate that ZnO:Alx% can be regenerated effectively.
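
The Langmuir adsorption model referred to in this abstract has the standard form q_e = q_max · K_L · C_e / (1 + K_L · C_e), and its parameters are often estimated from the linearized form C_e/q_e = C_e/q_max + 1/(K_L · q_max). The sketch below shows that fit with ordinary least squares. The function names and all numerical values are invented for illustration and are not data from the paper.

```python
def langmuir_q(c_e, q_max, k_l):
    """Equilibrium uptake q_e predicted by the Langmuir isotherm."""
    return q_max * k_l * c_e / (1.0 + k_l * c_e)

def fit_langmuir_linear(c_e, q_e):
    """Estimate (q_max, K_L) from the linearized form.

    A straight line is fitted to y = C_e/q_e against x = C_e;
    slope = 1/q_max and intercept = 1/(K_L * q_max).
    """
    x = list(c_e)
    y = [c / q for c, q in zip(c_e, q_e)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    q_max = 1.0 / slope
    k_l = slope / intercept
    return q_max, k_l

# Synthetic, noise-free example (hypothetical values, e.g. mg/L and mg/g).
concentrations = [1.0, 2.0, 5.0, 10.0, 20.0]
uptakes = [langmuir_q(c, q_max=50.0, k_l=0.3) for c in concentrations]
q_max_fit, k_l_fit = fit_langmuir_linear(concentrations, uptakes)
```

With real measurements the linearized fit weights points unevenly, so a nonlinear least-squares fit of the original equation is often preferred.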


Subjects
Metals, Heavy , Nanoparticles , Zinc Oxide , Aluminum/chemistry , Ions
10.
BMC Bioinformatics ; 20(1): 71, 2019 Feb 08.
Article in English | MEDLINE | ID: mdl-30736739

ABSTRACT

BACKGROUND: A large number of computational methods have been proposed for predicting protein functions. The underlying techniques adopted by most of these methods revolve around predicting the functions of an unannotated protein p from already annotated proteins that have characteristics similar to those of p. Recent Information Extraction methods take advantage of the huge growth of biomedical literature to predict protein functions. They extract biological molecule terms that directly describe protein functions from biomedical texts. However, they consider only explicitly mentioned terms that co-occur with proteins in texts. We observe that some important biological molecule terms pertaining to functional categories may implicitly co-occur with proteins in texts. Therefore, the methods that rely solely on explicitly mentioned terms in texts may miss vital functional information implicitly mentioned in the texts. RESULTS: To overcome the limitations of methods that rely solely on explicitly mentioned terms in texts to predict protein functions, we propose in this paper an Information Extraction system called PL-PPF. The proposed system employs techniques for predicting the functions of proteins based on their co-occurrences with explicitly and implicitly mentioned biological molecule terms that pertain to functional categories in biomedical literature. That is, PL-PPF employs a combination of statistical-based explicit term extraction techniques and logic-based implicit term extraction techniques. The statistical component of PL-PPF predicts some of the functions of a protein by extracting the explicitly mentioned functional terms that directly describe the functions of the protein from the biomedical texts associated with the protein. The logic-based component of PL-PPF predicts additional functions of the protein by inferring the functional terms that co-occur implicitly with the protein in the biomedical texts associated with it. First, the system employs its statistical-based component to extract the explicitly mentioned functional terms. Then, it employs its logic-based component to infer additional functions of the protein. Our hypothesis is that important biological molecule terms pertaining to functional categories of proteins are likely to co-occur implicitly with the proteins in biomedical texts. We evaluated PL-PPF experimentally and compared it with five systems. Results revealed better prediction performance. CONCLUSIONS: The experimental results showed that PL-PPF outperformed the other five systems. This is an indication of the effectiveness and practical viability of PL-PPF's combination of explicit and implicit techniques. We also evaluated two versions of PL-PPF: one adopting the complete techniques (i.e., adopting both the implicit and explicit techniques) and the other adopting only the explicit term co-occurrence extraction techniques (i.e., without the inference rules for predicate logic). The experimental results showed that the complete version significantly outperformed the other version. This is attributed to the effectiveness of the rules of predicate logic in inferring functional terms that co-occur implicitly with proteins in biomedical texts. A demo application of PL-PPF can be accessed through the following link: http://ecesrvr.kustar.ac.ae:8080/plppf/.


Subjects
Logic , Proteins/metabolism , Publications , Databases, Genetic , Gene Ontology , Genome, Fungal , Information Storage and Retrieval , Molecular Sequence Annotation , Reproducibility of Results , Saccharomyces cerevisiae/genetics
11.
BMC Bioinformatics ; 20(1): 70, 2019 Feb 08.
Article in English | MEDLINE | ID: mdl-30736752

ABSTRACT

BACKGROUND: Understanding genetic networks and their role in chronic diseases (e.g., cancer) is one of the important objectives of biological researchers. In this work, we present a text mining system that constructs a gene-gene-interaction network for the entire human genome and then performs network analysis to identify disease-related genes. We recognize the interacting genes based on their co-occurrence frequency within the biomedical literature and by employing linear and non-linear rare-event classification models. We analyze the constructed network of genes by using different network centrality measures to decide on the importance of each gene. Specifically, we apply betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network and to identify possible cancer-related genes. RESULTS: We evaluated the top 15 ranked genes for different cancer types (i.e., prostate, breast, and lung cancer). The average precisions for identifying breast, prostate, and lung cancer genes range from 80% to 100%. In a prostate case study, the system predicted an average of 80% prostate-related genes. CONCLUSIONS: The results show that our system has the potential to improve the accuracy of predicting gene-gene interactions and disease-gene associations. We also conduct a prostate cancer case study by using the threshold property in logistic regression, and we compare our approach with some of the state-of-the-art methods.
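
Two of the centrality measures named in this abstract, degree and closeness, can be sketched in a few lines for an undirected, unweighted network. This is an illustration of the standard definitions only. The gene labels G1 to G4 and the star-shaped toy network are hypothetical, and the paper's own network construction and ranking procedure are not reproduced here.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """(Reachable nodes) / (sum of shortest-path lengths), via breadth-first search."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical co-occurrence network: G1 is linked to every other gene.
network = {
    "G1": {"G2", "G3", "G4"},
    "G2": {"G1"},
    "G3": {"G1"},
    "G4": {"G1"},
}
degree = degree_centrality(network)
closeness = closeness_centrality(network)
ranked = sorted(network, key=lambda g: closeness[g], reverse=True)
```

Ranking genes by such scores and inspecting the top of the list mirrors the evaluation of top-ranked genes described in the abstract; betweenness and eigenvector centrality need more code and are usually taken from a graph library.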


Subjects
Epistasis, Genetic , Gene Regulatory Networks , Genetic Predisposition to Disease , Humans , Logistic Models , Male , Prostatic Neoplasms/genetics , ROC Curve
12.
Article in English | MEDLINE | ID: mdl-27723600

ABSTRACT

This study proposes a new method to determine the functions of an unannotated protein p. The proteins and amino acid residues mentioned in biomedical texts associated with p can be considered as characteristic terms for p, which are highly predictive of the potential functions of p. Similarly, proteins and amino acid residues mentioned in biomedical texts associated with proteins annotated with a functional category f can be considered as characteristic terms of f. We introduce in this paper an information extraction system called IFP_IFC that predicts the functions of an unannotated protein p by representing p and each functional category f by a vector of weights. Each weight reflects the degree of association between a characteristic term and p (or a characteristic term and f). First, IFP_IFC constructs a network whose nodes represent the different functional categories and whose edges represent the interrelationships between the nodes. Then, it determines the functions of p by employing random walks with restarts on this network. The walker is the vector of p. Finally, p is assigned to the functional categories of the nodes in the network that are visited most by the walker. We evaluated the quality of IFP_IFC by comparing it experimentally with two other systems. Results showed marked improvement.
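
A random walk with restart, as named in this abstract, is usually iterated as p(t+1) = (1 − r) · W · p(t) + r · p(0), where W is the column-normalized adjacency matrix and r is the restart probability. The sketch below implements that generic iteration. The four-node chain, the restart value 0.3, and the choice of a one-hot start vector are assumptions for illustration; how IFP_IFC builds its network and start vector is not specified beyond the abstract.

```python
def random_walk_with_restart(adj, p0, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adj is a square adjacency matrix (list of lists); p0 is the restart
    distribution and should sum to 1.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Column-normalize so each column of W sums to 1 (or 0 for isolated nodes).
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p = list(p0)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical network of four functional categories arranged in a chain 0-1-2-3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
start = [1.0, 0.0, 0.0, 0.0]   # the walker restarts at category 0
scores = random_walk_with_restart(chain, start)
```

Nodes with the highest stationary scores are the ones "visited most by the walker", which is the quantity the abstract uses to assign functional categories.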


Subjects
Computational Biology/methods , Data Mining/methods , Proteins/classification , Proteins/physiology , Databases, Protein , Molecular Sequence Annotation , Reproducibility of Results
13.
Sci Rep ; 7(1): 15784, 2017 Nov 17.
Article in English | MEDLINE | ID: mdl-29150626

ABSTRACT

Text mining has become an important tool in bioinformatics research with the massive growth in the biomedical literature over the past decade. Mining the biomedical literature has resulted in a very large number of computational algorithms that assist many bioinformatics researchers. In this paper, we present a text mining system called Gene Interaction Rare Event Miner (GIREM) that constructs gene-gene-interaction networks for the human genome using information extracted from biomedical literature. GIREM identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, GIREM first extracts the set of genes found within the abstracts of biomedical literature associated with g. GIREM aims at enhancing biological text mining approaches by identifying the semantic relationship between each co-occurrence of a pair of genes in abstracts using the syntactic structures of sentences and linguistics theories. It uses a supervised learning algorithm, weighted logistic regression, to label pairs of genes as related or unrelated and to reflect the population proportion using smaller samples. We evaluated GIREM by comparing it experimentally with other well-known approaches and a protein-protein interactions database. Results showed marked improvement.


Subjects
Data Mining , Gene Regulatory Networks , Publications , Genes , ROC Curve
15.
Water Sci Technol ; 73(4): 881-9, 2016.
Article in English | MEDLINE | ID: mdl-26901732

ABSTRACT

Herein, the degradation of malachite green (MG) dye in aqueous medium by vanadium-doped zinc oxide (ZnO:V3%) nanopowder was investigated. The specific surface area and pore volume of the nanopowder were characterized by the nitrogen adsorption method. Batch experimental procedures were conducted to investigate the adsorption and photocatalytic degradation of MG dye. Adsorption kinetics investigations were performed by varying the amount of the catalyst and the initial dye concentrations. Adsorption and photocatalytic degradation data were modeled using the Lagergren pseudo-first-order and second-order kinetic equations. The results showed that the ZnO:V3% nanopowder was particularly effective for the removal of MG, and the data were found to comply with the Lagergren pseudo-first-order kinetic model.
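
The two kinetic models named in this abstract are usually written as q_t = q_e · (1 − exp(−k1 · t)) for the Lagergren pseudo-first-order model and q_t = k2 · q_e² · t / (1 + k2 · q_e · t) for the pseudo-second-order model. The sketch below evaluates both and recovers k1 from the linearized first-order form ln(q_e − q_t) = ln(q_e) − k1 · t. The function names and all numbers are hypothetical and are not measurements from the paper.

```python
import math

def pseudo_first_order(t, q_e, k1):
    """Lagergren pseudo-first-order uptake at time t."""
    return q_e * (1.0 - math.exp(-k1 * t))

def pseudo_second_order(t, q_e, k2):
    """Pseudo-second-order uptake at time t."""
    return k2 * q_e ** 2 * t / (1.0 + k2 * q_e * t)

def fit_k1(times, q_t, q_e):
    """Estimate k1 as minus the slope of ln(q_e - q_t) against t.

    q_e is assumed known (e.g. measured at equilibrium) and larger than
    every value in q_t.
    """
    y = [math.log(q_e - q) for q in q_t]
    n = len(times)
    mean_t, mean_y = sum(times) / n, sum(y) / n
    slope = (sum((a - mean_t) * (b - mean_y) for a, b in zip(times, y))
             / sum((a - mean_t) ** 2 for a in times))
    return -slope

# Synthetic, noise-free example (hypothetical values, e.g. minutes and mg/g).
times = [1.0, 2.0, 5.0, 10.0]
uptake = [pseudo_first_order(t, q_e=20.0, k1=0.15) for t in times]
k1_fit = fit_k1(times, uptake, q_e=20.0)
```

Comparing the goodness of fit of the two models on the same data is the usual way to decide which one the kinetics "comply with".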


Subjects
Environmental Restoration and Remediation/methods , Nanoparticles/chemistry , Rosaniline Dyes/chemistry , Vanadium/chemistry , Water Pollutants, Chemical/chemistry , Zinc Oxide/chemistry , Adsorption/radiation effects , Catalysis/radiation effects , Environmental Restoration and Remediation/instrumentation , Hydrogen-Ion Concentration , Kinetics , Light
16.
BMC Bioinformatics ; 17: 34, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26767846

ABSTRACT

BACKGROUND: All proteins associate with other molecules. These associated molecules are highly predictive of the potential functions of proteins. The association of a protein and a molecule can be determined from their co-occurrences in biomedical abstracts. Extensive semantically related co-occurrences of a protein's name and a molecule's name in the sentences of biomedical abstracts can be considered as indicative of the association between the protein and the molecule. Dependency parsers extract textual relations from a text by determining the grammatical relations between words in a sentence. They can be used for determining the textual relations between proteins and molecules. Despite their success, they may extract textual relations with low precision. This is because they do not consider the semantic relationships between terms in a sentence (i.e., they consider only the structural relationships between the terms). Moreover, they may not be well suited for complex sentences and for long-distance textual relations. RESULTS: We introduce an information extraction system called PPFBM that predicts the functions of unannotated proteins from the molecules that associate with these proteins. PPFBM represents each protein by the other molecules that associate with it in the abstracts referenced in the protein's entries in reliable biological databases. It automatically extracts each co-occurrence of a protein-molecule pair that represents a semantic relationship between the pair. To this end, we present novel semantic rules that identify the semantic relationship between each co-occurrence of a protein-molecule pair using the syntactic structures of sentences and linguistics theories. PPFBM determines the functions of an unannotated protein p as follows. First, it determines the set S_r of annotated proteins that are semantically similar to p by matching the molecules representing p and the annotated proteins. Then, it assigns p the functional category FC if the significance of the frequency of occurrences of S_r in abstracts associated with proteins annotated with FC is statistically significantly different from the significance of the frequency of occurrences of S_r in abstracts associated with proteins annotated with all other functional categories. We evaluated the quality of PPFBM by comparing it experimentally with two other systems. Results showed marked improvement. CONCLUSIONS: The experimental results demonstrated that PPFBM outperforms other systems that predict protein function from the textual information found within biomedical abstracts. This is because these systems do not consider the semantic relationships between terms in a sentence (i.e., they consider only the structural relationships between the terms). PPFBM's advantage over these systems increases steadily as the number of training proteins increases. That is, PPFBM's predictions become steadily more accurate as the set of training proteins grows. This is because each new set of test proteins is added to the current set of training proteins. A demo of PPFBM that annotates each input yeast protein (from the Saccharomyces Genome Database (SGD), available at http://www.yeastgenome.org/download-data/curation) with the functions of Gene Ontology terms is available at http://ecesrvr.kustar.ac.ae:8080/PPFBM/ (see the Appendix for more details about the demo).


Subjects
Information Storage and Retrieval/methods , Molecular Sequence Annotation , Multiprotein Complexes/metabolism , Software , Databases, Factual , Genome, Fungal , Proteins/metabolism , Proteins/physiology , PubMed , Saccharomyces/genetics , Saccharomyces/metabolism , Semantics
17.
IEEE Trans Cybern ; 46(8): 1796-806, 2016 08.
Article in English | MEDLINE | ID: mdl-26540724

ABSTRACT

Botnets, which consist of remotely controlled compromised machines called bots, provide a distributed platform for several threats against cyber world entities and enterprises. An intrusion detection system (IDS) provides an efficient countermeasure against botnets. It continually monitors and analyzes network traffic for potential vulnerabilities and the possible existence of active attacks. A payload-inspection-based IDS (PI-IDS) identifies active intrusion attempts by inspecting the payloads of transmission control protocol and user datagram protocol packets and comparing them with previously seen attack signatures. However, the ability of a PI-IDS to detect intrusions might be incapacitated by packet encryption. A traffic-based IDS (T-IDS) alleviates the shortcomings of the PI-IDS, as it does not inspect packet payloads; instead, it analyzes packet headers to identify intrusions. As network traffic grows rapidly, not only is the detection rate critical, but the efficiency and scalability of the IDS also become more significant. In this paper, we propose a state-of-the-art T-IDS built on a novel randomized data partitioned learning model (RDPLM), relying on a compact network feature set and feature selection techniques, simplified subspacing, and a multiple randomized meta-learning technique. The proposed model has achieved 99.984% accuracy and 21.38 s training time on a well-known benchmark botnet dataset. Experimental results demonstrate that the proposed methodology outperforms other well-known machine-learning models used in the same detection task, namely, sequential minimal optimization, deep neural network, C4.5, reduced error pruning tree, and randomTree.

18.
Article in English | MEDLINE | ID: mdl-26415184

ABSTRACT

Biologists often need to know the set of genes associated with a given set of genes or a given disease. We propose in this paper a classifier system called Monte Carlo for Genetic Network (MCforGN) that can construct genetic networks, identify functionally related genes, and predict gene-disease associations. MCforGN identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, the system first extracts the set of genes found within the abstracts of biomedical literature associated with g. It then ranks these genes to determine the ones with high co-occurrences with g. It overcomes the limitations of current approaches that employ analytical deterministic algorithms by applying Monte Carlo simulation to approximate genetic networks. It does so by conducting repeated random sampling to obtain numerical results and to optimize these results. Moreover, it analyzes the results to obtain the probabilities of different genes' co-occurrences using a series of statistical tests. MCforGN can detect gene-disease associations by employing a combination of centrality measures (to identify the central genes in disease-specific genetic networks) and Monte Carlo simulation. MCforGN aims at enhancing state-of-the-art biological text mining by applying novel extraction techniques. We evaluated MCforGN by comparing it experimentally with nine approaches. Results showed marked improvement.
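
One generic way to use repeated random sampling on co-occurrence counts is a Monte Carlo significance test: estimate how often a co-occurrence count at least as large as the observed one would arise if the second gene's mentions were scattered at random over the abstracts. The sketch below shows that idea only. It is not MCforGN's algorithm, and the function name, abstract indices, corpus size, and trial count are all invented for illustration.

```python
import random

def cooccurrence_p_value(abstracts_g, abstracts_h, n_abstracts, trials=2000, seed=0):
    """Monte Carlo estimate of P(co-occurrence >= observed) under random placement.

    abstracts_g and abstracts_h are sets of abstract indices that mention
    gene g and gene h; n_abstracts is the corpus size.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = len(abstracts_g & abstracts_h)
    population = range(n_abstracts)
    hits = 0
    for _ in range(trials):
        # Place gene h's mentions uniformly at random, keeping their number fixed.
        sampled = set(rng.sample(population, len(abstracts_h)))
        if len(abstracts_g & sampled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)

# Hypothetical corpus of 50 abstracts.
gene_g = set(range(10))                        # g appears in abstracts 0..9
gene_related = set(range(10))                  # always co-occurs with g
gene_unrelated = {0} | set(range(10, 19))      # shares only abstract 0 with g
p_related = cooccurrence_p_value(gene_g, gene_related, 50)
p_unrelated = cooccurrence_p_value(gene_g, gene_unrelated, 50)
```

A small estimated probability suggests that the pair co-occurs more often than chance would explain, which could then be used to rank candidate genes for g.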


Subjects
Computer Simulation, Data Mining/methods, Gene Regulatory Networks/genetics, Monte Carlo Method, Algorithms, Computational Biology/methods, Genetic Association Studies, Humans, Neoplasms/genetics
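
The co-occurrence ranking described in this abstract can be sketched roughly as follows. This is not the MCforGN algorithm itself, whose details the abstract does not give; it is a minimal illustration that assumes each abstract has already been reduced to a set of gene symbols, and that uses bootstrap resampling as the repeated random sampling step. The function name, parameters, and toy data are all hypothetical.

```python
import random
from collections import Counter


def rank_cooccurring_genes(abstracts, target, n_rounds=200, seed=0):
    """Rank genes by a Monte Carlo estimate of how often they co-occur
    with `target` in abstracts that mention `target`."""
    rng = random.Random(seed)
    relevant = [genes for genes in abstracts if target in genes]
    if not relevant:
        return []

    totals = Counter()
    for _ in range(n_rounds):
        # Resample the relevant abstracts with replacement (bootstrap).
        sample = [rng.choice(relevant) for _ in relevant]
        for genes in sample:
            for gene in genes:
                if gene != target:
                    totals[gene] += 1

    # Average co-occurrence rate per sampled abstract, in [0, 1].
    denom = n_rounds * len(relevant)
    scores = {gene: count / denom for gene, count in totals.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus: each set holds the gene symbols found in one abstract.
TOY_ABSTRACTS = [
    {"TP53", "MDM2"},
    {"TP53", "MDM2", "ATM"},
    {"TP53", "MDM2"},
    {"TP53", "ATM"},
    {"BRCA1", "ATM"},
]
ranking = rank_cooccurring_genes(TOY_ABSTRACTS, "TP53")
```

In the toy corpus, MDM2 appears in three of the four abstracts that mention TP53 and ATM in two, so MDM2 ranks first. A fuller treatment would add the statistical tests and centrality measures that the abstract mentions.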
19.
Article in English | MEDLINE | ID: mdl-26357314

ABSTRACT

Proline residues are a common source of kinetic complications during folding. The X-Pro peptide bond is the only peptide bond for which the stability of the cis and trans conformations is comparable. The cis-trans isomerization (CTI) of X-Pro peptide bonds is a widely recognized rate-limiting factor, which can not only induce additional slow phases in protein folding but also modify the millisecond and sub-millisecond dynamics of the protein. An accurate computational prediction of proline CTI is of great importance for the understanding of protein folding, splicing, cell signaling, and transmembrane active transport in both humans and animals. In our earlier work, we developed a biophysically motivated proline CTI predictor utilizing a novel tree-based consensus model with a powerful metalearning technique and achieved 86.58 percent Q2 accuracy and 0.74 MCC, which improves on the results (70-73 percent Q2 accuracy) reported in the literature for the well-referenced benchmark dataset. In this paper, we describe experiments with novel randomized subspace learning and bootstrap seeding techniques, as an extension of our earlier work on consensus models and entropy-based learning methods, to obtain better accuracy through a precise and robust learning scheme for proline CTI prediction.


Subjects
Computational Biology/methods, Machine Learning, Proline/chemistry, Algorithms, Isomerism, Molecular Models
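
As a generic illustration of randomized subspace learning combined with bootstrap sampling, the sketch below trains an ensemble of one-feature decision stumps, each on a bootstrap resample of the data and a random subset of the features, and predicts by majority vote. It is not the authors' predictor: the stump learner, the function names, the parameters, and the toy data are assumptions made only to show the general technique.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Best single-feature threshold rule; returns (errors, feature, threshold, above)."""
    values = sorted({row[feature] for row in rows})
    # Candidate thresholds are midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for threshold in thresholds:
        for above in (0, 1):  # label predicted when the value exceeds the threshold
            errors = sum(
                (above if row[feature] > threshold else 1 - above) != y
                for row, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, above)
    return best


def fit_ensemble(rows, labels, n_models=25, subspace_size=2, seed=0):
    """Train stumps on bootstrap samples restricted to random feature subsets."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(rows)) for _ in rows]           # bootstrap sample
        features = rng.sample(range(n_features), subspace_size)  # random subspace
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        candidates = [fit_stump(sample_rows, sample_labels, f) for f in features]
        _, feature, threshold, above = min(candidates)
        models.append((feature, threshold, above))
    return models


def predict(models, row):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(
        above if row[feature] > threshold else 1 - above
        for feature, threshold, above in models
    )
    return votes.most_common(1)[0][0]


# Toy binary data with three features; class 0 is low, class 1 is high.
rows = [
    (0.0, 1.0, 0.5), (1.0, 0.0, 0.2), (0.5, 0.5, 1.0), (0.2, 0.8, 0.0),
    (5.0, 6.0, 5.5), (6.0, 5.0, 5.2), (5.5, 5.5, 6.0), (5.2, 5.8, 5.0),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_ensemble(rows, labels)
```

Each model sees different rows and different features, so the individual stumps disagree on hard cases while the vote stays stable; that diversity is the usual motivation for combining the two randomization steps.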
20.
Article in English | MEDLINE | ID: mdl-26357323

ABSTRACT

We propose a classifier system called iPFPi that predicts the functions of un-annotated proteins. iPFPi assigns an un-annotated protein P the functions of GO annotation terms that are semantically similar to P. An un-annotated protein P and a GO annotation term T are represented by their characteristics. The characteristics of P are GO terms found within the abstracts of biomedical literature associated with P. The characteristics of T are GO terms found within the abstracts of biomedical literature associated with the proteins annotated with the function of T. Let F and F′ be the important (dominant) sets of characteristic terms representing T and P, respectively. iPFPi annotates P with the function of T if F and F′ are semantically similar. We constructed a novel semantic similarity measure that takes into consideration several factors, such as the dominance degree of each characteristic term t in set F. The dominance degree is based on the term's score, a value that reflects the dominance status of t relative to other characteristic terms and is computed using a pairwise beats-and-loses procedure. Every time a protein P is annotated with the function of T, iPFPi updates and optimizes the current scores of the characteristic terms for T based on the weights of the characteristic terms for P, and set F is updated accordingly. Thus, the accuracy of predicting the function of T as the function of subsequent proteins improves iteratively over time, through the cumulative weights of the characteristic terms representing proteins that are successively annotated with the function of T. We evaluated the quality of iPFPi by comparing it experimentally with two recent protein function prediction systems. Results showed marked improvement.


Subjects
Computational Biology/methods, Protein Databases, Proteins/classification, Proteins/metabolism, Molecular Sequence Annotation, Proteins/chemistry, Reproducibility of Results, Semantics
...