ABSTRACT
Radon gas has been declared a human carcinogen by the United States Environmental Protection Agency (USEPA) and the International Agency for Research on Cancer (IARC). Several studies carried out in Spain have highlighted high radon concentrations in several regions, with Galicia (northwestern Spain) being one of the regions with the highest concentrations. The objective of this work was to create a secure and low-cost radon monitoring and alert system based on open source technologies. To achieve this objective, the system uses devices, each comprising a collection of sensors with a processing unit and a communication module, and a backend responsible for managing all the information, predicting radon levels and issuing alerts using open source technologies. Security is one of the largest challenges for the internet of things, and it is especially important in the current scenario, given that high radon concentrations pose a health risk. For this reason, this work focuses on securing the entire end-to-end communication path to prevent data forgery. The results of this work indicate that the development of a low-cost, yet secured, radon monitoring system is feasible, allowing one to create a network of sensors that can help mitigate the health hazards that high radon concentrations pose.
Subject(s)
Environmental Exposure , Environmental Monitoring , Radon/isolation & purification , Air Pollution, Indoor/prevention & control , Humans , Radon/toxicity , Risk Factors , Spain , United States
ABSTRACT
In this work, we improved a previous model used for the prediction of proteomes as new B-cell epitopes in vaccine design. The predicted epitope activity of a queried peptide is based on its sequence and on a known reference epitope sequence under specific experimental conditions. The peptide sequences were transformed into molecular descriptors of sequence recurrence networks and were mixed under experimental conditions. The new models were generated using 709,100 instances of pair descriptors for query and reference peptide sequences. Using perturbations of the initial descriptors under sequence or assay conditions, 10 transformed features were used as inputs for seven Machine Learning methods. The best model was obtained with random forest classifiers, with an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.981 ± 0.0005 for the external validation series (five-fold cross-validation). The database included information about 83,683 peptide sequences, 1448 epitope organisms, 323 host organisms, 15 types of in vivo processes, 28 experimental techniques, and 505 adjuvant additives. The current model could improve the in silico predictions of epitopes for vaccine design. The script and results are available as a free repository.
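The AUROC quoted above can be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. A minimal sketch of that rank-based computation in plain Python follows; the function name `auroc` and the toy labels and scores are illustrative assumptions, not the authors' script or data.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example with made-up values: 1 = epitope, 0 = non-epitope.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 8 of the 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of instances, which is fine for illustration; for a dataset of the size reported above, a sort-based implementation would be the practical choice.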
Subject(s)
Epitope Mapping , Machine Learning , Peptides/immunology , Amino Acid Sequence , Humans , Peptides/chemistry , ROC Curve , Structure-Activity Relationship
ABSTRACT
Signaling proteins are an important topic in drug development due to the increased importance of finding fast, accurate and cheap methods to evaluate new molecular targets involved in specific diseases. The complexity of the protein structure hinders the direct association of the signaling activity with the molecular structure. Therefore, the proposed solution uses protein star graphs to encode the peptide sequence information into specific topological indices calculated with the S2SNet tool. The Quantitative Structure-Activity Relationship classification model obtained with Machine Learning techniques is able to predict new signaling peptides. The best classification model, which is the first signaling prediction model, is based on eleven descriptors and was obtained using the Support Vector Machines-Recursive Feature Elimination (SVM-RFE) technique with the Laplacian kernel (RFE-LAP), reaching an AUROC of 0.961. The prediction performance of the model was assessed by testing a set of 3114 proteins of unknown function from the PDB database. Important signaling pathways are presented for three UniprotIDs (34 PDBs) with a signaling prediction greater than 98.0%.
Subject(s)
Intracellular Signaling Peptides and Proteins/chemistry , Machine Learning , Databases, Protein , Humans , Quantitative Structure-Activity Relationship , Signal Transduction/physiology
ABSTRACT
Block-matching techniques have been widely used in the task of estimating displacement in medical images, and they represent the best approach in scenes with deformable structures such as tissues, fluids, and gels. In this article, a new iterative block-matching technique, based on successive deformation, search, fitting, filtering, and interpolation stages, is proposed to measure elastic displacements in two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) images. The proposed technique uses different deformation models in the task of correlating proteins in real 2D electrophoresis gel images, obtaining an accuracy of 96.6% and improving the results obtained with other techniques. This technique represents a general solution, being easy to adapt to different 2D deformable cases and providing an experimental reference for block-matching algorithms.
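To make the search stage of block matching concrete, the sketch below finds the displacement of one block between two images by exhaustive search over a small window, using the sum of absolute differences (SAD) as the matching cost. It is only a generic illustration under stated assumptions: the SAD cost, the function names and the synthetic images are not taken from the article, and the deformation, fitting, filtering and interpolation stages are not shown.

```python
def sad(ref, tgt, top, left, dy, dx, size):
    """Sum of absolute differences between a block in `ref` and the
    block of the same size displaced by (dy, dx) in `tgt`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(ref[top + r][left + c] - tgt[top + dy + r][left + dx + c])
    return total


def match_block(ref, tgt, top, left, size, radius):
    """Exhaustive search for the displacement (dy, dx) with the lowest SAD."""
    height, width = len(tgt), len(tgt[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates whose block would fall outside the target image.
            if not (0 <= top + dy and top + dy + size <= height):
                continue
            if not (0 <= left + dx and left + dx + size <= width):
                continue
            cost = sad(ref, tgt, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


def make_image(height, width, spot_row, spot_col):
    """Synthetic image: dark background with one bright 2x2 'spot'."""
    img = [[0] * width for _ in range(height)]
    for r in range(2):
        for c in range(2):
            img[spot_row + r][spot_col + c] = 255
    return img


reference = make_image(12, 12, 4, 4)
target = make_image(12, 12, 5, 7)  # same spot moved by (+1, +3)
displacement = match_block(reference, target, top=3, left=3, size=4, radius=3)
```

On this synthetic pair the search recovers the shift of one row and three columns. Real gel images would need sub-pixel fitting and regularisation across neighbouring blocks, which is where the additional stages named in the abstract come in.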
Subject(s)
Electrophoresis, Gel, Two-Dimensional/methods , Image Processing, Computer-Assisted/methods , Models, Theoretical , Proteomics/methods , Algorithms
ABSTRACT
Cell death (CD) is a dynamic biological function involved in physiological and pathological processes. Due to the complexity of CD, there is a demand for fast theoretical methods that can help to find new CD molecular targets. The current work presents the first classification model to predict CD-related proteins based on Markov Mean Properties. These protein descriptors have been calculated with the MInD-Prot tool using the topological information of the amino acid contact networks of 2423 protein chains, five atom physicochemical properties and the protein 3D regions. The Machine Learning algorithms from Weka were used to find the best classification model for CD-related protein chains using all 20 attributes. The most accurate algorithm to solve this problem was K*. After several feature subset selection methods, the best model found is based on only 11 variables and is characterized by an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.992 and a true positive rate (TP Rate) of 88.2% (validation set). A total of 7409 protein chains labeled with "unknown function" in the PDB Databank were analyzed with the best model in order to predict the CD-related biological activity. Thus, several proteins have been predicted to have CD-related function in Homo sapiens: 3DRX, involved in the virus-host interaction biological process and protein homooligomerization; 4DWF, involved in cell differentiation, chromatin modification, DNA damage response and protein stabilization; 1IUR, involved in ATP binding and chaperone binding; 1J7D, involved in DNA double-strand break processing, histone ubiquitination and nucleotide-binding oligomerization; 1UTU, linked with DNA repair and regulation of transcription; and 3EEC, participating in cellular membrane organization, egress of virus within host cell, class mediator resulting in cell cycle arrest, negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle and apoptotic process.
Other proteins from bacteria predicted as CD-related are 2G3V, a CAG pathogenicity island protein 13 from Helicobacter pylori; 4G5A, a hypothetical protein in Bacteroides thetaiotaomicron; 1YLK, involved in the nitrogen metabolism of Mycobacterium tuberculosis; and 1XSV, with possible DNA/RNA binding domains. The results demonstrated that CD-related proteins can be predicted using molecular information encoded in the protein 3D structure. Thus, the current work demonstrated the possibility of predicting new molecular targets involved in cell-death processes.
Subject(s)
Markov Chains , Proteins/classification , Algorithms , Cell Death , Databases, Protein , Reference Standards
ABSTRACT
Prostate cancer is the most common cancer in men worldwide and has a high mortality rate. Its complex and heterogeneous development has become a core obstacle to treatment. At the same time, overtreatment at early-stage diagnosis, recognition of oligometastasis and dormant tumors, and personalized drug utilization are specific concerns that require attention in the clinical management of prostate cancer. Some typical genetic mutations have been shown to be associated with the initiation and progression of prostate cancer. However, single-omic studies are usually unable to explain the causal relationship between molecular alterations and clinical phenotypes. Exploration from a systems genetics perspective is also lacking in this field, that is, the impact of gene networks, environmental factors, and even lifestyle behaviors on disease progression. Meanwhile, the current trend emphasizes the use of artificial intelligence (AI) and machine learning techniques to process extensive multidimensional data, including multi-omics. These technologies unveil potential patterns, correlations, and insights related to diseases, thereby aiding interpretable clinical decision making and applications, namely intelligent medicine. Therefore, there is a pressing need to integrate multidimensional data for the identification of molecular subtypes, the prediction of cancer progression and aggressiveness, and the delivery of personalized treatment. In this review, we systematically describe the landscape from the discovery of molecular mechanisms of prostate cancer to clinical translational applications. We discuss the molecular profiles and clinical manifestations of prostate cancer heterogeneity, the identification of different states of prostate cancer, and the corresponding precision medicine practices.
Taking multi-omics fusion, systems genetics, and intelligent medicine as the main perspectives, we summarize the current research results and the knowledge-driven research path of prostate cancer.
ABSTRACT
The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 machine learning linear and non-linear classifiers. The optimal classifier was achieved with the support vector machine method, utilizing 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristics (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify drugs with the highest affinity capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins .
Subject(s)
Artificial Intelligence , Neoplasms , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Neoplasms/metabolism , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/chemistry , Machine Learning , Neoplasm Proteins/metabolism , Neoplasm Proteins/genetics , Neoplasm Proteins/chemistry , Support Vector Machine , Drug Repositioning/methods , Computational Biology/methods , Multiomics
ABSTRACT
Aging and quality of life are important research topics nowadays in areas such as life sciences, chemistry, pharmacology, etc. People live longer and, thus, want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, named antioxidant proteins, that may influence the aging process. However, testing every single protein in order to identify its properties is quite expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, and which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation helps describe the complex system by using topological indices. More specifically, in this work, Randic's Star Networks have been used as well as the associated indices, calculated with the S2SNet tool. In order to simulate the existing proportion of antioxidant proteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using this data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques utilised, the Random Forest has shown the best performance, achieving a score of 94% correctly classified instances. Although the target class (antioxidant proteins) represents a tiny subset inside the dataset, the proposed model is able to achieve a percentage of 81.8% correctly classified instances for this class, with a precision of 81.3%.
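The three figures reported above (overall accuracy, per-class rate and precision) are tied together by the confusion matrix. The sketch below works backwards from the reported dataset size and rates to an approximate confusion matrix, reading the 81.8% figure as the recall of the antioxidant class. The rounded counts are a reconstruction for illustration only, not values reported by the authors, and the function name is hypothetical.

```python
def implied_confusion(n_total, n_positive, recall, precision):
    """Approximate confusion-matrix counts implied by recall and precision.

    Counts are rounded to whole proteins, so they are estimates only.
    """
    tp = round(recall * n_positive)            # positives correctly found
    predicted_positive = round(tp / precision)  # everything labelled positive
    fp = predicted_positive - tp                # negatives labelled positive
    fn = n_positive - tp                        # positives that were missed
    tn = n_total - n_positive - fp              # negatives correctly rejected
    return tp, fp, fn, tn


# Figures taken from the abstract: 1999 proteins, 324 antioxidant,
# 81.8% of the antioxidant class recovered, precision 81.3%.
tp, fp, fn, tn = implied_confusion(1999, 324, 0.818, 0.813)
accuracy = (tp + tn) / 1999
```

Under these assumptions the implied overall accuracy comes out very close to the reported 94%, which suggests the three figures are mutually consistent even though the positive class is only about 16% of the dataset.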
Subject(s)
Algorithms , Antioxidants/metabolism , Proteins/metabolism , Amino Acid Sequence , Databases, Protein , Molecular Sequence Data , Proteins/chemistry , Quantitative Structure-Activity Relationship , ROC Curve
ABSTRACT
Okadaic Acid (OA) constitutes the main active principle in Diarrhetic Shellfish Poisoning (DSP) toxins produced during Harmful Algal Blooms (HABs), representing a serious threat for human consumers of edible shellfish. Furthermore, OA conveys critical deleterious effects for marine organisms due to its genotoxic potential. Many efforts have been dedicated to OA biomonitoring during the last three decades. However, it is only now with the current availability of detailed molecular information on DNA organization and the mechanisms involved in the maintenance of genome integrity, that a new arena starts opening up for the study of OA contamination. In the present work we address the links between OA genotoxicity and chromatin by combining Next Generation Sequencing (NGS) technologies and bioinformatics. To this end, we introduce CHROMEVALOAdb, a public database containing the chromatin-associated transcriptome of the mussel Mytilus galloprovincialis (a sentinel model organism) in response to OA exposure. This resource constitutes a leap forward for the development of chromatin-based biomarkers, paving the road towards the generation of powerful and sensitive tests for the detection and evaluation of the genotoxic effects of OA in coastal areas.
Subject(s)
Databases, Factual , Mutagens/analysis , Mytilus/genetics , Okadaic Acid/analysis , Animals , Carcinogens/analysis , Carcinogens/isolation & purification , Carcinogens/toxicity , Chromatin/metabolism , Environmental Monitoring/methods , Humans , Mutagenicity Tests/methods , Mutagens/isolation & purification , Mutagens/toxicity , Okadaic Acid/toxicity , Sequence Analysis, DNA , Transcriptome
ABSTRACT
Trypanosoma brucei causes African trypanosomiasis in humans (HAT or African sleeping sickness) and Nagana in cattle. The disease threatens over 60 million people and uncounted numbers of cattle in 36 countries of sub-Saharan Africa and has a devastating impact on human health and the economy. In South America, Trypanosoma cruzi is responsible for Chagas disease, which can cause acute illness and death, especially in young children. In this context, the discovery of novel drug targets in the Trypanosome proteome is a major focus for the scientific community. Recently, many researchers have devoted considerable effort to the study of protein-protein interactions (PPIs) in pathogenic Trypanosome species, concluding that the low sequence identities between some parasite proteins and those of their human host render these PPIs highly promising drug targets. To the best of our knowledge, there are no general models to predict unique PPIs in Trypanosome (TPPIs). At the same time, the 3D structure of an increasing number of Trypanosome proteins is reported in databases. In this regard, the introduction of a new model to predict TPPIs from the 3D structure of the proteins involved in a PPI is very important. For this purpose, we introduced new protein-protein complex invariants based on the Markov average electrostatic potential $\xi_k(R_i)$ for amino acids located in different regions ($R_i$) of the i-th protein and placed at a distance k from each other. We calculated more than 30 different types of parameters for 7866 pairs of proteins (1023 TPPIs and 6823 non-TPPIs) from more than 20 organisms, including parasites and human or cattle hosts. We found a very simple linear model that predicts above 90% of TPPIs and non-TPPIs both in training and independent test subsets using only two parameters. The parameters were $^{d}\xi_k(s) = |\xi_k(s_1) - \xi_k(s_2)|$, the absolute difference between the $\xi_k(s_i)$ values on the surface of the two proteins of the pairs.
We also tested nonlinear ANN models for comparison purposes, but the linear model gives the best results. We implemented this predictor in the web server named TrypanoPPI, freely available to the public at http://miaja.tic.udc.es/Bio-AIMS/TrypanoPPI.php. This is the first model that predicts how unique a protein-protein complex in the Trypanosome proteome is with respect to other parasites and hosts, opening new opportunities for antitrypanosome drug target discovery.
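The descriptor described above is the absolute difference between the surface electrostatic potential values of the two proteins in a pair, taken at each distance k, and the reported classifier is linear in two such parameters. The sketch below shows that arithmetic only. The potential values, weights and bias are hypothetical placeholders chosen for the example; the abstract does not give the fitted coefficients or which values of k were used.

```python
def surface_differences(xi_surface_1, xi_surface_2):
    """Absolute difference |xi_k(s1) - xi_k(s2)| for each distance k.

    Each argument is a list of surface potential values for one protein,
    indexed by k, in the same order for both proteins.
    """
    if len(xi_surface_1) != len(xi_surface_2):
        raise ValueError("both proteins need values for the same k range")
    return [abs(a - b) for a, b in zip(xi_surface_1, xi_surface_2)]


def linear_score(features, weights, bias):
    """Generic linear discriminant score: bias + sum(w_i * x_i)."""
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical surface potentials for the two proteins of one pair (two k values).
protein_1 = [1.5, 2.0]
protein_2 = [1.0, 2.75]
pair_features = surface_differences(protein_1, protein_2)

# Hypothetical coefficients: NOT the published model.
score = linear_score(pair_features, weights=[2.0, -4.0], bias=1.0)
is_predicted_tppi = score > 0.0
```

Because the features are absolute differences, the descriptor is symmetric in the two proteins: swapping them leaves the score unchanged.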
Subject(s)
Internet , Proteins/chemistry , Protozoan Proteins/chemistry , Trypanosoma/chemistry , Markov Chains , Models, Molecular , Neural Networks, Computer , Protein Binding , Static Electricity
ABSTRACT
We have developed a collaborative web tool for computational biology by using open-source technologies. It allows the cooperative construction of computational models with NEURON, a powerful local environment for modeling and simulating the nervous system. Our web tool helps researchers who are located far apart to build computational models of the brain and to share knowledge and opinions. The portal integrates all the necessary tools in one place. It allows the creation of, and participation in, work sessions with NEURON, as well as synchronous and asynchronous file sharing. Moreover, it allows the analysis of the changes introduced in the models by the users, by means of a version control system, as well as real-time comments about each step in the development of each model. It requires only an Internet browser and minimal bandwidth, thanks to the simplified data exchange process. In this paper, we present the tool NEURONSESSIONS, whose cooperative sessions also allow a virtual community to emerge for advancing Neuroscience.
Subject(s)
Brain/physiology , Computer Simulation , Models, Neurological , Computational Biology , Computer Systems , Cooperative Behavior , Humans , Internet , Software , User-Computer Interface
ABSTRACT
This article describes our experience in using a Picture Archiving and Communications System, known as the Secure Medical Image Information System, which is based on the Digital Imaging and Communications in Medicine standard and supports secure transmissions. We examine how secure sending methods affect transmission efficiency depending on the network employed, in order to quantify the productivity loss due to encryption, secure transmission, and subsequent decryption. To test the Secure Medical Image Information System, a series of medical data transmissions were conducted from A Coruña (Spain) to the Virgen de las Nieves Hospital, situated 1,000 km away, in Granada (Spain). Once we had studied the networking infrastructure of the hospital and its available image generation devices, we carried out a series of measurements during the transmissions, which allowed us to analyze the behavior of the system with different network schemes and connection speeds. The results obtained from these investigations demonstrate that the impact of secure data-sending methods on the productivity of the system is higher in networks with higher capacities, and that it is not affected by sending data during different periods of the day. In this regard, the presented approach may serve as a model for other small, and possibly mid-sized, medical centers.
Subject(s)
Computer Security , Medical Record Linkage , Radiology Information Systems/organization & administration , Systems Integration , Computer Security/instrumentation , Computer Security/standards , Computer Security/statistics & numerical data , Computer Systems , Efficiency, Organizational , Guidelines as Topic , Hospital Information Systems/organization & administration , Humans , Internet/organization & administration , Local Area Networks , Medical Record Linkage/instrumentation , Medical Record Linkage/methods , Medical Record Linkage/standards , Software Validation , Spain , Time Factors
ABSTRACT
Single nucleotide polymorphisms (SNPs) can be used as inputs in computational studies of disease, such as pattern searching and classification models. Schizophrenia is an example of a complex disease with an important social impact. The multiple causes of this disease create the need for new genetic or proteomic patterns that can diagnose patients using biological information. This work presents a computational study of machine learning classification models for the disease, using only single nucleotide polymorphisms at the HTR2A and DRD3 genes from Galician (Northwest Spain) schizophrenic patients. These classification models establish for the first time, to the best knowledge of the authors, a relationship between the sequence of the nucleic acid molecule and schizophrenia (Quantitative Genotype-Disease Relationships) that can automatically recognize schizophrenia DNA sequences and correctly classify between 78.3% and 93.8% of schizophrenia subjects when using datasets that include simulated negative subjects and a linear artificial neural network.
Subject(s)
Artificial Intelligence , Polymorphism, Single Nucleotide , Schizophrenia/diagnosis , Schizophrenia/genetics , Base Sequence , Genetic Predisposition to Disease , Humans , Receptors, Dopamine D3/genetics , Receptors, Serotonin, 5-HT3/genetics , Research Design , Spain
ABSTRACT
Research at the protein level is a useful practice in personalized medicine. More specifically, 2D gel images obtained after the electrophoresis process can lead to an accurate diagnosis. Several computational approaches try to help clinicians establish the correspondence between pairs of proteins in multiple 2D gel images. Most of them align a patient image to a reference image. In this work, an approach based on block-matching techniques is developed. Its main characteristic is that it does not need to perform the whole alignment between two images considering each protein separately. A comparison with other published methods is presented. It can be concluded that this method works over a broad range of proteomic images, even those with a high level of difficulty.
Subject(s)
Electrophoresis, Gel, Two-Dimensional/methods , Image Processing, Computer-Assisted/methods , Proteins/analysis
ABSTRACT
The development of methods that can predict the metal-mediated biological activity based only on the 3D structure of metal-unbound proteins has become a goal of major importance. This work is dedicated to the amino terminal Cu(II)- and Ni(II)-binding (ATCUN) motifs that participate in the DNA cleavage and have antitumor activity. We have calculated herein, for the first time, the 3D electrostatic spectral moments for 415 different proteins, including 133 potential ATCUN antitumor proteins. Using these parameters as input for Linear Discriminant Analysis, we have found a model that discriminates between ATCUN-DNA cleavage proteins and nonactive proteins with 91.32% Accuracy (379 out of 415 of proteins including both training and external validation series). Finally, the model has predicted for the first time the DNA cleavage function of proteins from the pathogen parasites. We have predicted possible ATCUN-like proteins with a probability higher than 99% in nine parasite families such as Trypanosoma, Plasmodium, Leishmania, or Toxoplasma. The distribution by biological function of the ATCUN proteins predicted has been the following: oxidoreductases 70.5%, signaling proteins 62.5%, lyases 58.2%, membrane proteins 45.5%, ligases 44.4%, hydrolases 41.3%, transferases 39.2%, cell adhesion proteins 34.5%, metal binders 33.5%, translation proteins 25.0%, transporters 16.7%, structural proteins 9.1%, and isomerases 8.2%. The model is implemented at http://miaja.tic.udc.es/Bio-AIMS/ATCUNPred.php.
Subject(s)
Algorithms , Base Sequence , DNA Cleavage , Parasites , Animals , Discriminant Analysis , Humans , Markov Chains , Models, Molecular , Molecular Sequence Data , Parasites/chemistry , Parasites/pathogenicity , Protein Conformation , Proteins/chemistry , Proteins/metabolism , ROC Curve , Static Electricity
ABSTRACT
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on a correct comparison between the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets, and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. When using this kind of algorithm, it is relevant to apply a statistical approach to indicate whether the differences are statistically significant. Furthermore, our results with three real complex datasets report different best models than the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as in other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
ABSTRACT
Texture information could be used in proteomics to improve the quality of the image analysis of proteins separated on a gel. In order to evaluate the best technique to identify relevant textures, we use several different kernel-based machine learning techniques to classify proteins in 2-DE images into spot and noise. We evaluate the classification accuracy of each of these techniques with proteins extracted from ten 2-DE images of different types of tissues and different experimental conditions. We found that the best classification model was FSMKL, a data integration method using multiple kernel learning, which achieved AUROC values above 95% while using a reduced number of features. This technique allows us to increase the interpretability of the complex combinations of textures and to weight the importance of each particular feature in the final model. In particular, the Inverse Difference Moment exhibited the highest discriminating power. A higher value can be associated with a homogeneous structure, as this feature describes the homogeneity; the larger the value, the more symmetric. The final model is formed by the combination of different groups of textural features. Here we demonstrated the feasibility of combining different groups of textures in 2-DE image analysis for spot detection.
Subject(s)
Electrophoresis , Image Processing, Computer-Assisted , Machine Learning , Reproducibility of Results
ABSTRACT
This paper describes a novel weighted voting tree classification scheme for breast density classification. Breast parenchymal density is an important risk factor in breast cancer. Moreover, it is known that mammogram interpretation is more difficult when dense tissue is involved. Therefore, automated breast density classification may aid in breast lesion detection and analysis. Several classification methods have been compared and a novel hierarchical classification procedure of combined classifiers with linear discriminant analysis (LDA) is proposed as the best solution to classify the mammograms into the four BIRADS tissue classes. The classification scheme is based on 298 texture features. Statistical analysis to test the normality and homoscedasticity of the data was carried out for feature selection. Thus, only features that are influenced by the tissue type were considered. The novel classification techniques have been incorporated into a CADe system to drive the detection algorithms and tested with 1459 images. The results obtained on the 322 screen-film mammograms (SFM) of the mini-MIAS dataset show that 99.75% of samples were correctly classified. On the 1137 full-field digital mammograms (FFDM) dataset results show 91.58% agreement. The results of the lesion detection algorithms were obtained from modules integrated within the CADe system developed by the authors and show that using breast tissue classification prior to lesion detection leads to an improvement of the detection results. The tools enhance the detectability of lesions and they are able to distinguish their local attenuation without local tissue density constraints.
Subject(s)
Breast Neoplasms/diagnostic imaging , Diagnosis, Computer-Assisted/standards , False Positive Reactions , Mammography , Breast Neoplasms/classification , Female , Humans , Radiographic Image Enhancement/standards
ABSTRACT
Lectins (Ls) play an important role in many diseases such as different types of cancer, parasitic infections and other diseases. Interestingly, the Protein Data Bank (PDB) contains more than 3000 protein 3D structures with unknown function. Thus, we can, in principle, discover new Ls by mining non-annotated structures from PDB or other sources. However, there are no general models to predict new biologically relevant Ls based on 3D chemical structures. We used the MARCH-INSIDE software to calculate the Markov-Shannon 3D electrostatic entropy parameters for the complex networks of protein structure of 2200 different protein 3D structures, including 1200 Ls. We have performed a Linear Discriminant Analysis (LDA) using these parameters as inputs in order to seek a new Quantitative Structure-Activity Relationship (QSAR) model, which is able to discriminate the 3D structure of Ls from other proteins. We implemented this predictor in the web server named LECTINPred, freely available at http://bio-aims.udc.es/LECTINPred.php. This web server showed the following goodness-of-fit statistics: Sensitivity = 96.7% (for Ls), Specificity = 87.6% (non-active proteins), and Accuracy = 92.5% (for all proteins), considering altogether both the training and external prediction series. In mode 2, users can carry out an automatic retrieval of protein structures from PDB. We illustrated the use of this server, in operation mode 1, performing a data mining of PDB. We predicted Ls scores for more than 2000 proteins with unknown function and selected the top-scored ones as possible lectins. In operation mode 2, LECTINPred can also upload 3D structural models generated with structure-prediction tools like LOMETS or PHYRE2. The new Ls are expected to be of relevance as cancer biomarkers or useful in parasite vaccine design.
ABSTRACT
Enzyme regulation proteins are very important due to their involvement in many biological processes that sustain life. The complexity of these proteins, the impossibility of identifying directly quantifiable molecular properties associated with the regulation of enzymatic activities, and their structural diversity create the need for new theoretical methods that can predict the enzyme regulatory function of new proteins. The current work presents the first classification model that predicts protein enzyme regulators using the Markov mean properties. These protein descriptors encode the topological information of the amino acid contact networks based on amino acid distances and physicochemical properties. MInD-Prot software calculated these molecular descriptors for 2415 protein chains (350 enzyme regulators) using five atom physicochemical properties (Mulliken electronegativity, Kang-Jhon polarizability, vdW area, atom contribution to P) and the protein 3D regions. The best classification models to predict enzyme regulators have been obtained with machine learning algorithms from Weka using 18 features. K* has been demonstrated to be the most accurate algorithm for this protein function classification. Wrapper Subset Evaluator and SVM-RFE approaches were used to perform a feature subset selection, with the best results obtained from SVM-RFE. The classification performance obtained with all the available features can be reached using only the 8 most relevant features selected by SVM-RFE. Thus, the current work has demonstrated the possibility of predicting new molecular targets involved in enzyme regulation using fast theoretical algorithms.