Results 1 - 20 of 37
1.
Health Inf Sci Syst ; 12(1): 6, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38125666

ABSTRACT

Prostate cancer is the most common cancer in men worldwide and has a high mortality rate. The complex and heterogeneous development of prostate cancer has become a core obstacle in its treatment. Simultaneously, the issues of overtreatment in early-stage diagnosis, oligometastasis and dormant tumor recognition, as well as personalized drug utilization, are also specific concerns that require attention in the clinical management of prostate cancer. Some typical genetic mutations have been proven to be associated with prostate cancer's initiation and progression. However, single-omic studies are usually not able to explain the causal relationship between molecular alterations and clinical phenotypes. Exploration from a systems genetics perspective is also lacking in this field, that is, of the impact of gene networks, environmental factors, and even lifestyle behaviors on disease progression. Meanwhile, the current trend emphasizes the utilization of artificial intelligence (AI) and machine learning techniques to process extensive multidimensional data, including multi-omics. These technologies unveil the potential patterns, correlations, and insights related to diseases, thereby aiding interpretable clinical decision making and applications, namely intelligent medicine. Therefore, there is a pressing need to integrate multidimensional data for identification of molecular subtypes, prediction of cancer progression and aggressiveness, and delivery of personalized treatment. In this review, we systematically elaborated the landscape from molecular mechanism discovery of prostate cancer to clinical translational applications. We discussed the molecular profiles and clinical manifestations of prostate cancer heterogeneity, the identification of different states of prostate cancer, as well as corresponding precision medicine practices. Taking multi-omics fusion, systems genetics, and intelligent medicine as the main perspectives, the current research results and knowledge-driven research path of prostate cancer were summarized.

2.
Sensors (Basel) ; 20(3), 2020 Jan 29.
Article in English | MEDLINE | ID: mdl-32013244

ABSTRACT

Radon gas has been declared a human carcinogen by the United States Environmental Protection Agency (USEPA) and the International Agency for Research on Cancer (IARC). Several studies carried out in Spain highlighted the high radon concentrations in several regions, with Galicia (northwestern Spain) being one of the regions with the highest radon concentrations. The objective of this work was to create a safe and low-cost radon monitoring and alert system, based on open source technologies. To achieve this objective, the system uses devices (each a collection of sensors with a processing unit and a communication module) and a backend responsible for managing all the information, predicting radon levels and issuing alerts using open source technologies. Security is one of the largest challenges for the internet of things, and it is critically important in the current scenario, given that high radon concentrations pose a health risk. For this reason, this work focuses on securing the entire end-to-end communication path to avoid data forging. The results of this work indicate that the development of a low-cost, yet secured, radon monitoring system is feasible, allowing one to create a network of sensors that can help mitigate the health hazards that high radon concentrations pose.
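The abstract does not say how the communication path is secured. As one hedged illustration of protecting a sensor reading against forgery, the sketch below signs a payload with an HMAC and verifies it on the backend side. The shared key, the field names, and the helpers `sign_reading` and `verify_reading` are assumptions made for this example, not details taken from the paper.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; a real deployment would provision one key per device.
SHARED_KEY = b"example-device-key"


def sign_reading(reading: dict, key: bytes = SHARED_KEY) -> dict:
    """Serialize a reading deterministically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(reading, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_reading(message: dict, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag on the backend and compare it in constant time."""
    expected = hmac.new(key, message["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


# A genuine message, and a copy whose value was altered in transit.
message = sign_reading({"device": "sensor-01", "radon_bq_m3": 312})
forged = {"payload": message["payload"].replace("312", "12"), "tag": message["tag"]}
```

Here `verify_reading(message)` accepts the untouched message, while the altered copy is rejected because its tag no longer matches the payload. An HMAC only covers integrity and authenticity; confidentiality would need transport encryption on top.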


Subjects
Environmental Exposure , Environmental Monitoring , Radon/isolation & purification , Indoor Air Pollution/prevention & control , Humans , Radon/toxicity , Risk Factors , Spain , United States
3.
Int J Mol Sci ; 20(18), 2019 Sep 05.
Article in English | MEDLINE | ID: mdl-31491969

ABSTRACT

In this work, we improved a previous model used for the prediction of proteomes as new B-cell epitopes in vaccine design. The predicted epitope activity of a queried peptide is based on its sequence, a known reference epitope sequence under specific experimental conditions. The peptide sequences were transformed into molecular descriptors of sequence recurrence networks and were mixed under experimental conditions. The new models were generated using 709,100 instances of pair descriptors for query and reference peptide sequences. Using perturbations of the initial descriptors under sequence or assay conditions, 10 transformed features were used as inputs for seven Machine Learning methods. The best model was obtained with random forest classifiers, with an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.981 ± 0.0005 for the external validation series (five-fold cross-validation). The database included information about 83,683 peptide sequences, 1448 epitope organisms, 323 host organisms, 15 types of in vivo processes, 28 experimental techniques, and 505 adjuvant additives. The current model could improve the in silico predictions of epitopes for vaccine design. The script and results are available as a free repository.
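To make the five-fold cross-validation mentioned above concrete, here is a minimal sketch of how sample indices can be partitioned into k folds. It is a generic illustration, not the authors' script: the function name `k_fold_indices` is hypothetical, and the split is contiguous and unshuffled, whereas real pipelines usually shuffle or stratify first.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(12, k=5)
for train, test in folds:
    # A model would be fitted on `train` and scored on `test` here.
    print(len(train), test)
```

Each sample lands in exactly one test fold, so averaging a metric over the five folds uses every sample once for evaluation.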


Subjects
Epitope Mapping , Machine Learning , Peptides/immunology , Amino Acid Sequence , Humans , Peptides/chemistry , ROC Curve , Structure-Activity Relationship
4.
PeerJ ; 4: e2721, 2016.
Article in English | MEDLINE | ID: mdl-27920952

ABSTRACT

The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on a correct comparison between the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets, and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is important to use a statistical approach to indicate whether the differences are statistically significant when using this kind of algorithm. Furthermore, our results with three real complex datasets report different best models than the previously published methodology. Our final goal is to provide a complete methodology, with the different steps needed to compare the results obtained in Computational Intelligence problems as well as in other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

5.
Sci Rep ; 6: 19256, 2016 Jan 13.
Article in English | MEDLINE | ID: mdl-26758643

ABSTRACT

Texture information could be used in proteomics to improve the quality of the image analysis of proteins separated on a gel. In order to evaluate the best technique to identify relevant textures, we use several different kernel-based machine learning techniques to classify proteins in 2-DE images into spot and noise. We evaluate the classification accuracy of each of these techniques with proteins extracted from ten 2-DE images of different types of tissues and different experimental conditions. We found that the best classification model was FSMKL, a data integration method using multiple kernel learning, which achieved AUROC values above 95% while using a reduced number of features. This technique allows us to increase the interpretability of the complex combinations of textures and to weight the importance of each particular feature in the final model. In particular, the Inverse Difference Moment exhibited the highest discriminating power. A higher value can be associated with a homogeneous structure, as this feature describes the homogeneity; the larger the value, the more symmetric. The final model is built from the combination of different groups of textural features. Here we demonstrated the feasibility of combining different groups of textures in 2-DE image analysis for spot detection.
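The Inverse Difference Moment is a standard Haralick texture feature computed from a gray-level co-occurrence matrix (GLCM) as IDM = Σ p(i, j) / (1 + (i − j)²). The sketch below computes it for two toy patches. The horizontal one-pixel offset, the non-symmetric GLCM, and the function names are assumptions chosen for brevity; the abstract does not state the settings used in the study.

```python
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def inverse_difference_moment(p):
    """IDM = sum over (i, j) of p(i, j) / (1 + (i - j)^2)."""
    return sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]      # uniform patch
striped = [[0, 9, 0], [9, 0, 9], [0, 9, 0]]   # high-contrast patch

idm_flat = inverse_difference_moment(glcm(flat))
idm_striped = inverse_difference_moment(glcm(striped))
```

The uniform patch reaches the maximum value of 1, while the high-contrast patch scores close to 0, which matches the reading of the feature as a homogeneity measure.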


Subjects
Electrophoresis , Computer-Assisted Image Processing , Machine Learning , Reproducibility of Results
6.
J Theor Biol ; 384: 50-8, 2015 Nov 07.
Article in English | MEDLINE | ID: mdl-26297890

ABSTRACT

Signaling proteins are an important topic in drug development due to the increased importance of finding fast, accurate and cheap methods to evaluate new molecular targets involved in specific diseases. The complexity of the protein structure hinders the direct association of the signaling activity with the molecular structure. Therefore, the proposed solution uses protein star graphs to encode the peptide sequence information into specific topological indices calculated with the S2SNet tool. The Quantitative Structure-Activity Relationship classification model obtained with Machine Learning techniques is able to predict new signaling peptides. The best classification model is the first signaling prediction model; it is based on eleven descriptors and was obtained using the Support Vector Machines-Recursive Feature Elimination (SVM-RFE) technique with the Laplacian kernel (RFE-LAP), with an AUROC of 0.961. The prediction performance of the model was assessed by testing a set of 3114 proteins of unknown function from the PDB database. Important signaling pathways are presented for three UniprotIDs (34 PDBs) with a signaling prediction greater than 98.0%.


Subjects
Intracellular Signaling Peptides and Proteins/chemistry , Machine Learning , Protein Databases , Humans , Quantitative Structure-Activity Relationship , Signal Transduction/physiology
7.
Anal Biochem ; 454: 53-9, 2014 Jun 01.
Article in English | MEDLINE | ID: mdl-24613260

ABSTRACT

Block-matching techniques have been widely used in the task of estimating displacement in medical images, and they represent the best approach in scenes with deformable structures such as tissues, fluids, and gels. In this article, a new iterative block-matching technique, based on successive deformation, search, fitting, filtering, and interpolation stages, is proposed to measure elastic displacements in two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) images. The proposed technique uses different deformation models in the task of correlating proteins in real 2D electrophoresis gel images, obtaining an accuracy of 96.6% and improving the results obtained with other techniques. This technique represents a general solution, being easy to adapt to different 2D deformable cases and providing an experimental reference for block-matching algorithms.
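For readers unfamiliar with block matching, the sketch below shows only the basic search stage: an exhaustive search for the displacement that minimizes the sum of absolute differences (SAD) between a reference block and candidate blocks in a second image. It is a textbook baseline, not the iterative method proposed in the article, and the function names and toy images are assumptions for illustration.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def get_block(image, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def match_block(reference, target, top, left, size, search):
    """Return the (dy, dx) displacement minimizing SAD within +/- search pixels."""
    block = get_block(reference, top, left, size)
    rows, cols = len(target), len(target[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > rows or l + size > cols:
                continue  # candidate block would fall outside the image
            cost = sad(block, get_block(target, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


# Toy 6x6 images: a bright 2x2 spot moves one pixel down and two pixels right.
reference = [[0] * 6 for _ in range(6)]
target = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        reference[1 + r][1 + c] = 9
        target[2 + r][3 + c] = 9

displacement = match_block(reference, target, top=1, left=1, size=2, search=2)
```

On this toy pair the search recovers the displacement `(1, 2)`. Rigid block search like this assumes pure translation, which is exactly the limitation that deformation models are meant to address in elastic gel images.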


Subjects
Two-Dimensional Gel Electrophoresis/methods , Computer-Assisted Image Processing/methods , Theoretical Models , Proteomics/methods , Algorithms
8.
Mol Biosyst ; 10(5): 1063-71, 2014 May.
Article in English | MEDLINE | ID: mdl-24556806

ABSTRACT

Enzyme regulation proteins are very important due to their involvement in many biological processes that sustain life. The complexity of these proteins, the impossibility of identifying direct quantification molecular properties associated with the regulation of enzymatic activities, and their structural diversity create the necessity for new theoretical methods that can predict the enzyme regulatory function of new proteins. The current work presents the first classification model that predicts protein enzyme regulators using the Markov mean properties. These protein descriptors encode the topological information of the amino acid into contact networks based on amino acid distances and physicochemical properties. MInD-Prot software calculated these molecular descriptors for 2415 protein chains (350 enzyme regulators) using five atom physicochemical properties (Mulliken electronegativity, Kang-Jhon polarizability, vdW area, atom contribution to P) and the protein 3D regions. The best classification models to predict enzyme regulators have been obtained with machine learning algorithms from Weka using 18 features. K* has been demonstrated to be the most accurate algorithm for this protein function classification. Wrapper Subset Evaluator and SVM-RFE approaches were used to perform a feature subset selection, with the best results obtained from SVM-RFE. The classification performance achieved with all the available features can be reached using only the 8 most relevant features selected by SVM-RFE. Thus, the current work has demonstrated the possibility of predicting new molecular targets involved in enzyme regulation using fast theoretical algorithms.


Subjects
Enzymes/metabolism , Support Vector Machine , Artificial Intelligence , Protein Databases , Markov Chains , ROC Curve , Reference Standards , Software
9.
J Theor Biol ; 349: 12-21, 2014 May 21.
Article in English | MEDLINE | ID: mdl-24491256

ABSTRACT

Cell death (CD) is a dynamic biological function involved in physiological and pathological processes. Due to the complexity of CD, there is a demand for fast theoretical methods that can help to find new CD molecular targets. The current work presents the first classification model to predict CD-related proteins based on Markov Mean Properties. These protein descriptors have been calculated with the MInD-Prot tool using the topological information of the amino acid contact networks of 2423 protein chains, five atom physicochemical properties and the protein 3D regions. The Machine Learning algorithms from Weka were used to find the best classification model for CD-related protein chains using all 20 attributes. The most accurate algorithm to solve this problem was K*. After several feature subset selection methods were applied, the best model found is based on only 11 variables and is characterized by an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.992 and a true positive rate (TP Rate) of 88.2% (validation set). A total of 7409 protein chains labeled with "unknown function" in the PDB Databank were analyzed with the best model in order to predict the CD-related biological activity. Thus, several proteins have been predicted to have CD-related function in Homo sapiens: 3DRX (involved in virus-host interaction biological process, protein homooligomerization); 4DWF (involved in cell differentiation, chromatin modification, DNA damage response, protein stabilization); 1IUR (involved in ATP binding, chaperone binding); 1J7D (involved in DNA double-strand break processing, histone ubiquitination, nucleotide-binding oligomerization); 1UTU (linked with DNA repair, regulation of transcription); 3EEC (participating in the cellular membrane organization, egress of virus within host cell, class mediator resulting in cell cycle arrest, negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle and apoptotic process). Other proteins from bacteria predicted as CD-related are 2G3V (a CAG pathogenicity island protein 13 from Helicobacter pylori), 4G5A (a hypothetical protein in Bacteroides thetaiotaomicron), 1YLK (involved in the nitrogen metabolism of Mycobacterium tuberculosis), and 1XSV (with possible DNA/RNA binding domains). The results demonstrated the possibility of predicting CD-related proteins using molecular information encoded into the protein 3D structure. Thus, the current work demonstrated the possibility of predicting new molecular targets involved in cell-death processes.
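Several abstracts in this listing report AUROC values. As a reminder of what the number measures, the sketch below computes it directly from its probabilistic definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are made-up toy values, not data from any of the studies.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: three positives, four negatives, one positive ranked too low.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]

area = auroc(labels, scores)
```

Of the 12 positive-negative pairs, 11 are ordered correctly, so `area` is 11/12. The pairwise loop is quadratic in the number of samples; rank-based formulas give the same value more efficiently on large datasets.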


Subjects
Markov Chains , Proteins/classification , Algorithms , Cell Death , Protein Databases , Reference Standards
10.
Comput Methods Programs Biomed ; 113(2): 569-84, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24286729

ABSTRACT

This paper describes a novel weighted voting tree classification scheme for breast density classification. Breast parenchymal density is an important risk factor in breast cancer. Moreover, it is known that mammogram interpretation is more difficult when dense tissue is involved. Therefore, automated breast density classification may aid in breast lesion detection and analysis. Several classification methods have been compared and a novel hierarchical classification procedure of combined classifiers with linear discriminant analysis (LDA) is proposed as the best solution to classify the mammograms into the four BIRADS tissue classes. The classification scheme is based on 298 texture features. Statistical analysis to test the normality and homoscedasticity of the data was carried out for feature selection. Thus, only features that are influenced by the tissue type were considered. The novel classification techniques have been incorporated into a CADe system to drive the detection algorithms and tested with 1459 images. The results obtained on the 322 screen-film mammograms (SFM) of the mini-MIAS dataset show that 99.75% of samples were correctly classified. On the 1137 full-field digital mammograms (FFDM) dataset results show 91.58% agreement. The results of the lesion detection algorithms were obtained from modules integrated within the CADe system developed by the authors and show that using breast tissue classification prior to lesion detection leads to an improvement of the detection results. The tools enhance the detectability of lesions and they are able to distinguish their local attenuation without local tissue density constraints.


Subjects
Breast Neoplasms/diagnostic imaging , Computer-Assisted Diagnosis/standards , False Positive Reactions , Mammography , Breast Neoplasms/classification , Female , Humans , Radiographic Image Enhancement/standards
11.
Mol Inform ; 33(4): 276-85, 2014 Apr.
Article in English | MEDLINE | ID: mdl-27485774

ABSTRACT

Lectins (Ls) play an important role in many diseases, such as different types of cancer and parasitic infections. Interestingly, the Protein Data Bank (PDB) contains more than 3000 protein 3D structures with unknown function. Thus, we can, in principle, discover new Ls by mining non-annotated structures from PDB or other sources. However, there are no general models to predict new biologically relevant Ls based on 3D chemical structures. We used the MARCH-INSIDE software to calculate the Markov-Shannon 3D electrostatic entropy parameters for the complex networks of protein structure of 2200 different protein 3D structures, including 1200 Ls. We performed a Linear Discriminant Analysis (LDA) using these parameters as inputs in order to seek a new Quantitative Structure-Activity Relationship (QSAR) model able to discriminate the 3D structures of Ls from those of other proteins. We implemented this predictor in the web server named LECTINPred, freely available at http://bio-aims.udc.es/LECTINPred.php. This web server showed the following goodness-of-fit statistics: Sensitivity = 96.7% (for Ls), Specificity = 87.6% (non-active proteins), and Accuracy = 92.5% (for all proteins), considering both the training and external prediction series together. In mode 2, users can carry out an automatic retrieval of protein structures from PDB. We illustrated the use of this server, in operation mode 1, by performing data mining of PDB. We predicted Ls scores for more than 2000 proteins with unknown function and selected the top-scored ones as possible lectins. In operation mode 2, LECTINPred can also upload 3D structural models generated with structure-prediction tools like LOMETS or PHYRE2. The new Ls are expected to be of relevance as cancer biomarkers or useful in parasite vaccine design.
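The sensitivity, specificity, and accuracy quoted above all derive from the four cells of a binary confusion matrix. The sketch below shows the arithmetic on a small invented example; the label vectors are toy values and the function names are illustrative, not part of LECTINPred.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that were predicted negative."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Toy data: 4 positives (one missed) and 6 negatives (one false alarm).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
```

For this toy data the counts are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, giving a sensitivity of 0.75, a specificity of 5/6, and an accuracy of 0.8. Accuracy sits between the other two and is weighted by class sizes, which is why the abstract reports all three.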

12.
Article in English | MEDLINE | ID: mdl-23920753

ABSTRACT

The development of personalized medicine is tightly linked with the correct exploitation of molecular data, especially those associated with the genome sequence. Along with the use of genomic data, there is an increasing demand to share these data for research purposes. The transition of clinical data to research is based on the anonymization of these data so that the patient cannot be identified; the use of genomic data poses a great challenge because of its identifying nature. In this work we have analyzed current methods for genome anonymization and propose a one-way encryption method that may enable genomic data sharing by giving access only to certain regions of genomes for research purposes.


Subjects
Algorithms , Computer Security , Confidentiality , Genetic Databases , Electronic Health Records , Genome-Wide Association Study/methods , Personal Health Records , Genomics/methods , Information Dissemination/methods , Information Storage and Retrieval/methods , Software
13.
Curr Top Med Chem ; 13(14): 1681-91, 2013.
Article in English | MEDLINE | ID: mdl-23889046

ABSTRACT

The transport of molecules inside cells is a very important topic, especially in Drug Metabolism. Experimental testing of new proteins for the transporter molecular function is expensive and inefficient due to the large number of new peptides. Therefore, there is a need for cheap and fast theoretical models to predict transporter proteins. In the current work, the primary structure of a protein is represented as a molecular Star graph, characterized by a series of topological indices. The dataset was made up of 2,503 protein chains, out of which 413 have transporter molecular function and 2,090 have no transporter function. These indices were used as input to several classification techniques to find the best Quantitative Structure Activity Relationship (QSAR) model that can evaluate the transporter function of a new protein chain. Among several feature selection techniques, Support Vector Machine Recursive Feature Elimination allows us to obtain a classification model based on 20 attributes with a true positive rate of 83% and a false positive rate of 16.7%.


Subjects
Carrier Proteins/chemistry , Support Vector Machine , Animals , Carrier Proteins/metabolism , Humans , Protein Conformation , Quantitative Structure-Activity Relationship
14.
Curr Comput Aided Drug Des ; 9(2): 206-25, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23700999

ABSTRACT

The successful high throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of relevant variables in the context of the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and the future trend of joint or multi-task feature selection methods.
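As a rough illustration of evolutionary feature selection, the sketch below runs a small genetic algorithm over bit masks that mark which descriptors are kept. Everything specific here is an assumption made for the example: the synthetic data, the correlation-based fitness with a size penalty (standing in for a real QSAR model's validation score), and the parameter values. It is not a method taken from the review.

```python
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def fitness(mask, columns, target, penalty=0.1):
    """Toy objective: reward correlated descriptors, penalize subset size."""
    score = sum(abs(pearson(col, target)) for bit, col in zip(mask, columns) if bit)
    return score - penalty * sum(mask)


def evolve(columns, target, rng, pop_size=20, generations=30, mutation_rate=0.1):
    """Return the best mask found and the best fitness seen in each generation."""
    n = len(columns)

    def score(mask):
        return fitness(mask, columns, target)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        next_gen = [population[0][:]]  # elitism: the best mask survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(population, 3), key=score)
            b = max(rng.sample(population, 3), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    history.append(score(best))
    return best, history


# Synthetic data: six descriptors, of which only columns 0 and 2 drive the target.
rng = random.Random(0)
columns = [[rng.random() for _ in range(30)] for _ in range(6)]
target = [2 * a + 3 * b for a, b in zip(columns[0], columns[2])]
best_mask, history = evolve(columns, target, rng)
```

Because the best mask is copied unchanged into each new generation, the best fitness in `history` never decreases. In a real QSAR setting the fitness function would typically be a cross-validated model score, which is far more expensive and is the main reason population size and generation count need care.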


Subjects
Artificial Intelligence , Quantitative Structure-Activity Relationship , Algorithms , Drug Design
16.
Curr Top Med Chem ; 13(5): 591-601, 2013.
Article in English | MEDLINE | ID: mdl-23548022

ABSTRACT

Advances made in "-Omics" technologies in the last 20 years have made available to researchers huge amounts of data spanning a wide variety of biological processes, from gene sequences to the metabolites present in a cell at a particular time. The management, analysis and representation of these data have been facilitated by means of the advances made by biomedical informatics in areas such as data architecture and integration systems. However, despite the efforts made by biologists in this area, research in drug design adds a new level of information by incorporating data related to small molecules, which increases the complexity of these integration systems. Current knowledge in molecular biology has shown that it is possible to use comprehensive and integrative approaches to understand biological processes from a systems perspective and that pathological processes can be mapped into biological networks. Therefore, current strategies for drug design are focusing on how to interact with or modify those networks to achieve the desired effects, in what is called systems chemical biology. In this review several approaches for data integration in systems chemical biology will be analysed and described. Furthermore, because of the increasing relevance of the development and use of nanomaterials and their expected impact in the near future, the requirements of integration systems that incorporate these new data types associated with nanomaterials will also be analysed.


Subjects
Systems Biology/methods , Systems Integration , Humans , Nanostructures
17.
Mar Drugs ; 11(3): 830-41, 2013 Mar 12.
Article in English | MEDLINE | ID: mdl-23481679

ABSTRACT

Okadaic Acid (OA) constitutes the main active principle in Diarrhetic Shellfish Poisoning (DSP) toxins produced during Harmful Algal Blooms (HABs), representing a serious threat for human consumers of edible shellfish. Furthermore, OA conveys critical deleterious effects for marine organisms due to its genotoxic potential. Many efforts have been dedicated to OA biomonitoring during the last three decades. However, it is only now with the current availability of detailed molecular information on DNA organization and the mechanisms involved in the maintenance of genome integrity, that a new arena starts opening up for the study of OA contamination. In the present work we address the links between OA genotoxicity and chromatin by combining Next Generation Sequencing (NGS) technologies and bioinformatics. To this end, we introduce CHROMEVALOAdb, a public database containing the chromatin-associated transcriptome of the mussel Mytilus galloprovincialis (a sentinel model organism) in response to OA exposure. This resource constitutes a leap forward for the development of chromatin-based biomarkers, paving the road towards the generation of powerful and sensitive tests for the detection and evaluation of the genotoxic effects of OA in coastal areas.


Subjects
Factual Databases , Mutagens/analysis , Mytilus/genetics , Okadaic Acid/analysis , Animals , Carcinogens/analysis , Carcinogens/isolation & purification , Carcinogens/toxicity , Chromatin/metabolism , Environmental Monitoring/methods , Humans , Mutagenicity Tests/methods , Mutagens/isolation & purification , Mutagens/toxicity , Okadaic Acid/toxicity , DNA Sequence Analysis , Transcriptome
18.
Curr Comput Aided Drug Des ; 9(1): 108-17, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23294434

ABSTRACT

In recent years, in the post-genomic era, more and more data are being generated by biological high-throughput technologies, such as proteomics and transcriptomics. These omics data can be very useful, but the real challenge is to analyze all of them as a whole after integrating them. Biomedical data integration enables queries across different, heterogeneous and distributed biomedical data sources. Data integration solutions can be very useful not only in the context of drug design, but also in biomedical information retrieval, clinical diagnosis, systems biology, etc. In this review, we analyze the most common approaches to biomedical data integration, such as federated databases, data warehousing, multi-agent systems and semantic technology, as well as the solutions developed using these approaches in the past few years.


Subjects
Computational Biology/methods , Computer-Aided Design , Factual Databases , Drug Design , Animals , Humans , Internet
19.
Front Biosci (Elite Ed) ; 5(2): 446-60, 2013 01 01.
Article in English | MEDLINE | ID: mdl-23277001

ABSTRACT

It can usually take more than ten years from the time a new drug is discovered until it can be launched on the market. Regulatory requirements are part of the process of drug discovery and drug development, and they act at every developmental stage. Regulatory affairs works to establish an effective and uniform balance between voluntary and regulatory compliance and agency responsiveness to consumer needs. It evaluates and coordinates all proposed legal actions to ascertain compliance with regulatory policy. The ontology presented for regulatory affairs and drug research and development gives us the possibility to correlate information from different levels and to discover new relationships between the legal aspects. In addition, the transparency of the information is affected by the inability of existing integration strategies to organize and apply the available knowledge to the range of real scientific and business issues in critical safety and regulatory applications. Therefore, semantic technologies based on ontologies make the knowledge reusable by several applications across the business, from discovery to corporate affairs.


Subjects
Synthetic Chemistry Techniques/methods , Factual Databases , Drug Discovery/legislation & jurisprudence , Drug Discovery/methods , Preclinical Drug Evaluation/methods , Government Regulation , Information Dissemination/methods , Information Dissemination/legislation & jurisprudence , Internet , Software
20.
J Theor Biol ; 317: 331-7, 2013 Jan 21.
Article in English | MEDLINE | ID: mdl-23116665

ABSTRACT

Aging and quality of life are important research topics nowadays in areas such as life sciences, chemistry, pharmacology, etc. People live longer and, thus, want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, named antioxidant proteins, that may influence the aging process. However, testing every single protein in order to identify its properties is quite expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation will help us describe the complex system by using topological indices. More specifically, in this work, Randic's Star Networks have been used as well as the associated indices, calculated with the S2SNet tool. In order to simulate the existing proportion of antioxidant proteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using this data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques utilised, the Random Forest has shown the best performance, achieving a score of 94% correctly classified instances. Although the target class (antioxidant proteins) represents a tiny subset inside the dataset, the proposed model is able to achieve a percentage of 81.8% correctly classified instances for this class, with a precision of 81.3%.
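To give a feel for the star-graph encoding, the sketch below builds a simplified, non-embedded star graph from a sequence: a dummy central node with one branch per residue type, where each residue is appended to the end of the branch for its type. It then computes the Wiener index, one classical topological index. This is an assumption-laden illustration: the exact construction and the set of indices computed by S2SNet are not given in the abstract, and the function names are hypothetical.

```python
from collections import deque


def star_graph(sequence):
    """Adjacency list of a non-embedded star graph; node 0 is the dummy center."""
    adjacency = {0: []}
    branch_tip = {}  # residue letter -> last node placed on that branch
    for position, residue in enumerate(sequence, start=1):
        parent = branch_tip.get(residue, 0)
        adjacency[position] = [parent]
        adjacency[parent].append(position)
        branch_tip[residue] = position
    return adjacency


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered node pairs (BFS per node)."""
    total = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    return total // 2  # each pair was counted from both endpoints


# Toy sequence with two residue types: branch A holds positions 1, 3, 4; branch C holds 2, 5.
graph = star_graph("ACAAC")
index = wiener_index(graph)
```

For the toy sequence the graph is a six-node path (4-3-1-0-2-5), whose Wiener index is 35. A real protein would produce up to 20 branches, and indices like this one become the numeric inputs to the classifiers.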


Subjects
Algorithms , Antioxidants/metabolism , Proteins/metabolism , Amino Acid Sequence , Protein Databases , Molecular Sequence Data , Proteins/chemistry , Quantitative Structure-Activity Relationship , ROC Curve