Results 1 - 20 of 43
1.
Genes (Basel) ; 15(3)2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38540371

ABSTRACT

The analysis of gene expression quantification data is a powerful and widely used approach in cancer research. This work provides new insights into the transcriptomic changes that occur in healthy uterine tissue compared to those in cancerous tissues and explores the differences associated with uterine cancer localizations and histological subtypes. To achieve this, RNA-Seq data from the TCGA database were preprocessed and analyzed using the KnowSeq package. Firstly, a kNN model was applied to classify uterine cervix cancer, uterine corpus cancer, and healthy uterine samples. Through variable selection, a three-gene signature was identified (VWCE, CLDN15, ADCYAP1R1), achieving consistent 100% test accuracy across 20 repetitions of a 5-fold cross-validation. A supplementary similar analysis using miRNA-Seq data from the same samples identified an optimal two-gene miRNA-coding signature potentially regulating the three-gene signature previously mentioned, which attained optimal classification performance with an 82% F1-macro score. Subsequently, a kNN model was implemented for the classification of cervical cancer samples into their two main histological subtypes (adenocarcinoma and squamous cell carcinoma). A uni-gene signature (ICA1L) was identified, achieving 100% test accuracy through 20 repetitions of a 5-fold cross-validation and externally validated through the CGCI program. Finally, an examination of six cervical adenosquamous carcinoma (mixed) samples revealed a pattern where the gene expression value in the mixed class aligned closer to the histological subtype with lower expression, prompting a reconsideration of the diagnosis for these mixed samples. In summary, this study provides valuable insights into the molecular mechanisms of uterine cervix and corpus cancers. The newly identified gene signatures demonstrate robust predictive capabilities, guiding future research in cancer diagnosis and treatment methodologies.


Subject(s)
Adenosquamous Carcinoma , Squamous Cell Carcinoma , MicroRNAs , Uterine Cervical Neoplasms , Female , Humans , Uterine Cervical Neoplasms/genetics , Uterine Cervical Neoplasms/metabolism , Squamous Cell Carcinoma/pathology , Gene Expression Profiling , Adenosquamous Carcinoma/genetics , Adenosquamous Carcinoma/pathology , MicroRNAs/genetics
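
The evaluation protocol named in this abstract (a kNN classifier scored over 20 repetitions of 5-fold cross-validation) can be sketched as follows. This is a minimal illustration in plain Python, not the KnowSeq implementation: the synthetic three-feature samples, the cluster centres, the choice `k=3`, and all function names are assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stratified_folds(labels, n_folds, rng):
    """Split sample indices into n_folds folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def repeated_cv_accuracy(xs, ys, k=3, n_folds=5, n_repeats=20, seed=0):
    """Mean test accuracy over n_repeats reshuffled runs of n_folds-fold CV."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        for test_idx in stratified_folds(ys, n_folds, rng):
            held_out = set(test_idx)
            train_x = [x for i, x in enumerate(xs) if i not in held_out]
            train_y = [y for i, y in enumerate(ys) if i not in held_out]
            hits = sum(
                knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
            )
            accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical data: 20 samples per class, three "gene expression" features,
# drawn around well-separated centres so the classes do not overlap.
data_rng = random.Random(42)
centres = {
    "cervix": (0.0, 0.0, 0.0),
    "corpus": (5.0, 5.0, 5.0),
    "healthy": (10.0, 0.0, 5.0),
}
xs, ys = [], []
for label, centre in centres.items():
    for _ in range(20):
        xs.append(tuple(c + data_rng.uniform(-1.0, 1.0) for c in centre))
        ys.append(label)

mean_accuracy = repeated_cv_accuracy(xs, ys)
```

Because the toy classes are fully separated, every fold is classified perfectly here; real expression data would need the preprocessing and feature selection steps the abstract describes before such a score is meaningful.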
2.
Comput Biol Med ; 168: 107713, 2024 01.
Article in English | MEDLINE | ID: mdl-38000243

ABSTRACT

Cancer disease is one of the most important pathologies in the world, as it causes the death of millions of people, and the cure of this disease is limited in most cases. Rapid spread is one of the most important features of this disease, so many efforts are focused on its early-stage detection and localization. Medicine has made numerous advances in the recent decades with the help of artificial intelligence (AI), reducing costs and saving time. In this paper, deep learning models (DL) are used to present a novel method for detecting and localizing cancerous zones in WSI images, using tissue patch overlay to improve performance results. A novel overlapping methodology is proposed and discussed, together with different alternatives to evaluate the labels of the patches overlapping in the same zone to improve detection performance. The goal is to strengthen the labeling of different areas of an image with multiple overlapping patch testing. The results show that the proposed method improves the traditional framework and provides a different approach to cancer detection. The proposed method, based on applying 3x3 step 2 average pooling filters on overlapping patch labels, provides a better result with a 12.9% correction percentage for misclassified patches on the HUP dataset and 15.8% on the CINIJ dataset. In addition, a filter is implemented to correct isolated patches that were also misclassified. Finally, a CNN decision threshold study is performed to analyze the impact of the threshold value on the accuracy of the model. The alteration of the threshold decision along with the filter for isolated patches and the proposed method for overlapping patches, corrects about 20% of the patches that are mislabeled in the traditional method. As a whole, the proposed method achieves an accuracy rate of 94.6%. The code is available at https://github.com/sergioortiz26/Cancer_overlapping_filter_WSI_images.


Subject(s)
Medicine , Neoplasms , Humans , Artificial Intelligence , Neoplasms/diagnostic imaging
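
The core operation this abstract mentions, 3x3 average pooling with a step of 2 over the labels of overlapping patches followed by a decision threshold, can be illustrated with a short sketch. The grid layout, the absence of edge padding, the 0.5 threshold, and the function names are assumptions for the example; the authors' actual code is in the repository linked in the abstract.

```python
def average_pool(grid, size=3, stride=2):
    """Average each size x size window of a 2D grid, moving by stride (no padding)."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [grid[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            pooled_row.append(sum(window) / len(window))
        pooled.append(pooled_row)
    return pooled


def to_labels(grid, threshold=0.5):
    """Binarise pooled scores: 1 = cancerous zone, 0 = healthy zone."""
    return [[int(value >= threshold) for value in row] for row in grid]


# Hypothetical per-patch labels from a CNN on a 5 x 5 grid of overlapping patches.
# The 0 at the centre of the top-left tumour block and the isolated 1 in the
# bottom-right corner play the role of misclassified patches.
patch_labels = [
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]

pooled = average_pool(patch_labels)
zone_labels = to_labels(pooled)
```

In this toy grid the top-left zone averages 8/9 and stays cancerous despite the stray 0, while the isolated 1 in the corner is outvoted by its neighbours, which is the kind of correction the abstract attributes to combining overlapping patch labels.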
4.
BMC Bioinformatics ; 24(Suppl 2): 361, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37853364

ABSTRACT

This supplement issue presents five research articles from the 8th International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2020), which was held online from 30 September to 2 October 2020. The articles are distributed mainly according to the subject they address. These contributions were chosen because of their quality and the importance of their findings, and were then invited to participate in this supplement for the following BMC journals: BMC Bioinformatics and BMC Genomics. In this Editorial, we summarize the contributions, which provide a clear overview of the thematic areas covered by the IWBBIO conference, ranging from theoretical/review aspects to real-world applications of bioinformatics and biomedical engineering.


Subject(s)
Biomedical Engineering , Computational Biology
5.
Genes (Basel) ; 14(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37628626

ABSTRACT

Bioinformatics is revolutionizing Biomedicine in the way we treat and diagnose pathologies related to biological manifestations resulting from variations or mutations of our DNA [...].


Subject(s)
Bioengineering , Biomedical Engineering , Computational Biology , Machine Learning , Mutation
6.
Cancer Imaging ; 23(1): 66, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37365659

ABSTRACT

BACKGROUND: Pancreatic ductal carcinoma patients have a very poor prognosis, given the difficulty of early detection and the lack of early symptoms. Digital pathology is routinely used by pathologists to diagnose the disease. However, visually inspecting the tissue is a time-consuming task, which slows down the diagnostic procedure. With the advances in artificial intelligence, specifically in deep learning models, and the growing availability of public histology data, clinical decision support systems are being created. However, the generalization capabilities of these systems are not always tested, nor is the integration of publicly available datasets for pancreatic ductal carcinoma (PDAC) detection. METHODS: In this work, we explored the performance of two weakly-supervised deep learning models using the two most widely available datasets with pancreatic ductal carcinoma histology images, The Cancer Genome Atlas Project (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). In order to have sufficient training data, the TCGA dataset was integrated with the Genotype-Tissue Expression (GTEx) project dataset, which contains healthy pancreatic samples. RESULTS: We showed how the model trained on CPTAC generalizes better than the one trained on the integrated dataset, obtaining an inter-dataset accuracy of 90.62% ± 2.32 and an outer-dataset accuracy of 92.17% when evaluated on TCGA + GTEx. Furthermore, we tested the performance on another dataset formed by tissue micro-arrays, obtaining an accuracy of 98.59%. We showed how the features learned in an integrated dataset do not differentiate between the classes, but between the datasets, noticing that a stronger normalization might be needed when creating clinical decision support systems with datasets obtained from different sources.
To mitigate this effect, we proposed to train on the three available datasets, improving the detection performance and generalization capabilities of a model trained only on TCGA + GTEx and achieving a similar performance to the model trained only on CPTAC. CONCLUSIONS: The integration of datasets where both classes are present can mitigate the batch effect present when integrating datasets, improving the classification performance, and accurately detecting PDAC across different datasets.


Subject(s)
Pancreatic Ductal Carcinoma , Deep Learning , Pancreatic Neoplasms , Humans , Artificial Intelligence , Pancreatic Ductal Carcinoma/diagnosis , Pancreatic Ductal Carcinoma/pathology , Proteomics , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms
7.
Front Cell Dev Biol ; 11: 959611, 2023.
Article in English | MEDLINE | ID: mdl-37020464

ABSTRACT

Introduction: Deciphering the biological and physical requirements for the outset of multicellularity is limited to few experimental models. The early embryonic development of annual killifish represents an almost unique opportunity to investigate de novo cellular aggregation in a vertebrate model. As an adaptation to seasonal drought, annual killifish employs a unique developmental pattern in which embryogenesis occurs only after undifferentiated embryonic cells have completed epiboly and dispersed in low density on the egg surface. Therefore, the first stage of embryogenesis requires the congregation of embryonic cells at one pole of the egg to form a single aggregate that later gives rise to the embryo proper. This unique process presents an opportunity to dissect the self-organizing principles involved in early organization of embryonic stem cells. Indeed, the physical and biological processes required to form the aggregate of embryonic cells are currently unknown. Methods: Here, we developed an in silico, agent-based biophysical model that allows testing how cell-specific and environmental properties could determine the aggregation dynamics of early Killifish embryogenesis. In a forward engineering approach, we then proceeded to test two hypotheses for cell aggregation (cell-autonomous and a simple taxis model) as a proof of concept of modeling feasibility. In a first approach (cell autonomous system), we considered how intrinsic biophysical properties of the cells such as motility, polarity, density, and the interplay between cell adhesion and contact inhibition of locomotion drive cell aggregation into self-organized clusters. Second, we included guidance of cell migration through a simple taxis mechanism to resemble the activity of an organizing center found in several developmental models. 
Results: Our numerical simulations showed that random migration combined with low cell-cell adhesion is sufficient to maintain cells in dispersion and that aggregation can indeed arise spontaneously under a limited set of conditions, but, without environmental guidance, the dynamics and resulting structures do not recapitulate in vivo observations. Discussion: Thus, an environmental guidance cue seems to be required for correct execution of early aggregation in early killifish development. However, the nature of this cue (e.g., chemical or mechanical) can only be determined experimentally. Our model provides a predictive tool that could be used to better characterize the process and, importantly, to design informed experimental strategies.
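
The contrast this abstract draws between purely random migration and migration guided by a simple taxis cue can be illustrated with a toy agent-based sketch. It is not the authors' biophysical model: the flat 2D plane, the cue fixed at the origin, the step rule, and every parameter value are assumptions, and adhesion, polarity, and contact inhibition of locomotion are left out entirely.

```python
import math
import random


def simulate(n_cells=100, n_steps=200, taxis_strength=0.0, speed=1.0, seed=1):
    """Return the mean distance of the cells from a cue at (0, 0) after n_steps."""
    rng = random.Random(seed)
    # Cells start dispersed over a 20 x 20 square centred on the cue.
    cells = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(n_cells)]
    for _ in range(n_steps):
        moved = []
        for x, y in cells:
            # Random migration: a step of fixed length in a random direction.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            dx, dy = speed * math.cos(angle), speed * math.sin(angle)
            dist = math.hypot(x, y)
            if dist > 0.0:
                # Taxis: constant-magnitude drift pointing towards the cue.
                dx -= taxis_strength * x / dist
                dy -= taxis_strength * y / dist
            moved.append((x + dx, y + dy))
        cells = moved
    return sum(math.hypot(x, y) for x, y in cells) / n_cells


unguided = simulate(taxis_strength=0.0)  # cells keep dispersing
guided = simulate(taxis_strength=0.5)    # cells congregate near the cue
```

Comparing `guided` with `unguided` shows the qualitative point only: with a drift term the cells end up clustered around the cue, whereas without it they continue to spread. Reproducing the in vivo dynamics would require the cell-cell interaction terms and egg-surface geometry described in the abstract.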

8.
J Pers Med ; 12(4)2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35455654

ABSTRACT

The coronavirus disease 2019 (COVID-19) has caused millions of deaths and one of the greatest health crises of all time. In this disease, one of the most important aspects is the early detection of the infection to avoid the spread. In addition to this, it is essential to know how the disease progresses in patients, to improve patient care. This contribution presents a novel method based on a hierarchical intelligent system, that analyzes the application of deep learning models to detect and classify patients with COVID-19 using both X-ray and chest computed tomography (CT). The methodology was divided into three phases, the first being the detection of whether or not a patient suffers from COVID-19, the second step being the evaluation of the percentage of infection of this disease and the final phase is to classify the patients according to their severity. Stratification of patients suffering from COVID-19 according to their severity using automatic systems based on machine learning on medical images (especially X-ray and CT of the lungs) provides a powerful tool to help medical experts in decision making. In this article, a new contribution is made to a stratification system with three severity levels (mild, moderate and severe) using a novel histogram database (which defines how the infection is in the different CT slices for a patient suffering from COVID-19). The first two phases use CNN Densenet-161 pre-trained models, and the last uses SVM with LDA supervised learning algorithms as classification models. The initial stage detects the presence of COVID-19 through X-ray multi-class (COVID-19 vs. No-Findings vs. Pneumonia) and the results obtained for accuracy, precision, recall, and F1-score values are 88%, 91%, 87%, and 89%, respectively. 
The following stage manifested the percentage of COVID-19 infection in the slices of the CT-scans for a patient and the results in the metrics evaluation are 0.95 in Pearson Correlation coefficient, 5.14 in MAE and 8.47 in RMSE. The last stage finally classifies a patient in three degrees of severity as a function of global infection of the lungs and the results achieved are 95% accurate.
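
The three regression metrics reported for the infection-percentage stage (Pearson correlation coefficient, MAE, and RMSE) follow standard definitions, sketched below in plain Python. The sample values are invented for illustration and are unrelated to the study's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical infection percentages per CT slice: reference vs. model output.
reference = [0.0, 10.0, 20.0, 30.0, 40.0]
predicted = [2.0, 8.0, 23.0, 29.0, 45.0]

scores = {
    "mae": mae(reference, predicted),
    "rmse": rmse(reference, predicted),
    "pearson": pearson(reference, predicted),
}
```

MAE and RMSE are expressed in the same units as the target (percentage points of infection here), and RMSE penalises large individual errors more heavily, which is why it is never smaller than MAE.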

9.
J Pers Med ; 12(4)2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35455716

ABSTRACT

Differentiation between the various non-small-cell lung cancer subtypes is crucial for providing an effective treatment to the patient. For this purpose, machine learning techniques have been used in recent years over the available biological data from patients. However, in most cases this problem has been treated using a single-modality approach, not exploring the potential of the multi-scale and multi-omic nature of cancer data for the classification. In this work, we study the fusion of five multi-scale and multi-omic modalities (RNA-Seq, miRNA-Seq, whole-slide imaging, copy number variation, and DNA methylation) by using a late fusion strategy and machine learning techniques. We train an independent machine learning model for each modality and we explore the interactions and gains that can be obtained by fusing their outputs in an increasing manner, by using a novel optimization approach to compute the parameters of the late fusion. The final classification model, using all modalities, obtains an F1 score of 96.81±1.07, an AUC of 0.993±0.004, and an AUPRC of 0.980±0.016, improving those results that each independent model obtains and those presented in the literature for this problem. These obtained results show that leveraging the multi-scale and multi-omic nature of cancer data can enhance the performance of single-modality clinical decision support systems in personalized medicine, consequently improving the diagnosis of the patient.
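
A late fusion strategy of the kind described here combines the class probabilities produced by independent per-modality models. The sketch below uses a plain weighted average with hand-picked weights. That is an assumption for illustration: the abstract states that the fusion parameters are computed by a novel optimization approach, which is not reproduced here, and the modality names, class names, and probabilities are hypothetical.

```python
def late_fusion(prob_by_modality, weights):
    """Weighted average of per-modality class-probability vectors."""
    total_weight = sum(weights[m] for m in prob_by_modality)
    n_classes = len(next(iter(prob_by_modality.values())))
    fused = [0.0] * n_classes
    for modality, probs in prob_by_modality.items():
        for i, p in enumerate(probs):
            fused[i] += weights[modality] * p / total_weight
    return fused


classes = ["LUAD", "LUSC", "healthy"]  # hypothetical class order

# Hypothetical outputs of three independent models for one patient.
prob_by_modality = {
    "rna_seq": [0.6, 0.3, 0.1],
    "wsi": [0.2, 0.7, 0.1],
    "methylation": [0.5, 0.4, 0.1],
}
# Hypothetical weights; in practice these would be tuned on validation data.
weights = {"rna_seq": 5, "wsi": 2, "methylation": 3}

fused = late_fusion(prob_by_modality, weights)
predicted = classes[max(range(len(fused)), key=fused.__getitem__)]
```

Because only the modalities present in `prob_by_modality` contribute to `total_weight`, the same function still returns a valid probability vector when a patient lacks one of the data types, which is one practical reason to fuse at the output level.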

10.
Neuroinformatics ; 20(3): 765-775, 2022 07.
Article in English | MEDLINE | ID: mdl-35262881

ABSTRACT

Neurodegenerative diseases represent a growing healthcare problem, mainly related to an aging population worldwide and thus their increasing prevalence. In particular, Alzheimer's disease (AD) and Parkinson's disease (PD) are leading neurodegenerative diseases. To aid their diagnosis and optimize treatment, we have developed a classification algorithm for AD to manipulate magnetic resonance images (MRI) stored in a large database of patients, containing 1,200 images. The algorithm can predict whether a patient is healthy, has mild cognitive impairment, or already has AD. We then applied this classification algorithm to therapeutic outcomes in PD after treatment with deep brain stimulation (DBS), to assess which stereotactic variables were the most important to consider when performing surgery in this indication. Here, we describe the stereotactic system used for DBS procedures, and compare different planning methods with the gold standard normally used (i.e., neurophysiological coordinates recorded intraoperatively). We used information collected from a database of 72 DBS electrodes implanted in PD patients, and assessed the potentially most beneficial ranges of deviation between planning and neurophysiological coordinates from the operating room, to provide neurosurgeons with additional landmarks that may help to optimize outcomes. We observed that the x coordinate deviation between the CT scan and the gold standard intra-operative neurophysiological coordinates is a robust metric to pre-assess positive therapy outcomes: "good therapy" is predicted if the deviation is higher than 2.5 mm. When it is less than 2.5 mm, adding the deviations of directly calculated variables (on the Y and Z axes) would lead to a specific assessment of "very good therapy".


Subject(s)
Alzheimer Disease , Deep Brain Stimulation , Neurodegenerative Diseases , Parkinson Disease , Aged , Algorithms , Alzheimer Disease/diagnostic imaging , Alzheimer Disease/therapy , Deep Brain Stimulation/methods , Implanted Electrodes , Humans , Magnetic Resonance Imaging , Neurodegenerative Diseases/diagnostic imaging , Neurodegenerative Diseases/therapy , Parkinson Disease/diagnostic imaging , Parkinson Disease/therapy
11.
BMC Bioinformatics ; 22(1): 454, 2021 Sep 22.
Article in English | MEDLINE | ID: mdl-34551733

ABSTRACT

BACKGROUND: Adenocarcinoma and squamous cell carcinoma are the two most prevalent lung cancer types, and their distinction requires different screenings, such as the visual inspection of histology slides by an expert pathologist, the analysis of gene expression or computed tomography scans, among others. In recent years, there has been an increasing gathering of biological data for decision support systems in the diagnosis (e.g. histology imaging, next-generation sequencing technologies data, clinical information, etc.). Using all these sources to design integrative classification approaches may improve the final diagnosis of a patient, in the same way that doctors can use multiple types of screenings to reach a final decision on the diagnosis. In this work, we present a late fusion classification model using histology and RNA-Seq data for adenocarcinoma, squamous-cell carcinoma and healthy lung tissue. RESULTS: The classification model improves results over using each source of information separately, reducing the diagnosis error rate by up to 64% over the isolated histology classifier and 24% over the isolated gene expression classifier, reaching a mean F1-Score of 95.19% and a mean AUC of 0.991. CONCLUSIONS: These findings suggest that a classification model using a late fusion methodology can considerably help clinicians in the diagnosis between the aforementioned lung cancer subtypes over using each source of information separately. This approach can also be applied to any cancer type or disease with heterogeneous sources of information.


Subject(s)
Adenocarcinoma , Non-Small-Cell Lung Carcinoma , Lung Neoplasms , Non-Small-Cell Lung Carcinoma/diagnostic imaging , Non-Small-Cell Lung Carcinoma/genetics , Humans , Lung Neoplasms/genetics , Probability , RNA-Seq
12.
Comput Biol Med ; 133: 104387, 2021 06.
Article in English | MEDLINE | ID: mdl-33872966

ABSTRACT

KnowSeq R/Bioc package is designed as a powerful, scalable and modular software focused on automatizing and assembling renowned bioinformatic tools with new features and functionalities. It comprises a unified environment to perform complex gene expression analyses, covering all the needed processing steps to identify a gene signature for a specific disease to gather understandable knowledge. This process may be initiated from raw files either available at well-known platforms or provided by the users themselves, and in either case coming from different information sources and different Transcriptomic technologies. The pipeline makes use of a set of advanced algorithms, including the adaptation of a novel procedure for the selection of the most representative genes in a given multiclass problem. Similarly, an intelligent system able to classify new patients, providing the user the opportunity to choose one among a number of well-known and widespread classification and feature selection methods in Bioinformatics, is embedded. Furthermore, KnowSeq is engineered to automatically develop a complete and detailed HTML report of the whole process which is also modular and scalable. Biclass breast cancer and multiclass lung cancer study cases were addressed to rigorously assess the usability and efficiency of KnowSeq. The models built by using the Differential Expressed Genes achieved from both experiments reach high classification rates. Furthermore, biological knowledge was extracted in terms of Gene Ontologies, Pathways and related diseases with the aim of helping the expert in the decision-making process. KnowSeq is available at Bioconductor (https://bioconductor.org/packages/KnowSeq), GitHub (https://github.com/CasedUgr/KnowSeq) and Docker (https://hub.docker.com/r/casedugr/knowseq).


Subject(s)
Computational Biology , Software , Algorithms , Humans , Transcriptome
13.
Curr Pharm Des ; 26(34): 4246-4260, 2020.
Article in English | MEDLINE | ID: mdl-32640953

ABSTRACT

Stroke is the second leading cause of mortality and the major cause of adult physical disability worldwide. The currently available treatment to recanalize the blood flow in acute ischemic stroke is intravenous administration of tissue plasminogen activator (t-PA) and endovascular treatment. Nevertheless, those treatments have the disadvantage that reperfusion leads to a highly harmful reactive oxygen species (ROS) production, generating oxidative stress (OS), which is responsible for most of the ischemia-reperfusion injury and thus causing brain tissue damage. In addition, OS can lead brain cells to apoptosis, autophagy and necrosis. The aims of this review are to provide an updated overview of the role of OS in brain IRI, providing some bases for therapeutic interventions based on counteracting the OS-related mechanism of injury and thus suggesting novel possible strategies in the prevention of IRI after stroke.


Subject(s)
Brain Ischemia , Ischemic Stroke , Reperfusion Injury , Stroke , Brain Ischemia/drug therapy , Humans , Oxidative Stress , Reperfusion Injury/drug therapy , Stroke/drug therapy , Tissue Plasminogen Activator
14.
BMC Bioinformatics ; 21(Suppl 7): 153, 2020 May 05.
Article in English | MEDLINE | ID: mdl-32366219

ABSTRACT

In the current supplement, we are proud to present seventeen relevant contributions from the 6th International Work-Conference on Bioinformatics and Biomedical Engineering (IWBBIO 2018), which was held during April 25-27, 2018 in Granada (Spain). These contributions have been chosen because of their quality and the importance of their findings.


Subject(s)
Biomedical Engineering , Computational Biology , High-Throughput Nucleotide Sequencing , Humans , Biological Models
15.
IEEE J Biomed Health Inform ; 24(7): 2119-2130, 2020 07.
Article in English | MEDLINE | ID: mdl-31871000

ABSTRACT

Many clinical studies have revealed the high biological similarities existing among different skin pathological states. These similarities create difficulties in the efficient diagnosis of skin cancer, and encourage the study and design of new intelligent clinical decision support systems. In this sense, gene expression analysis can help find differentially expressed genes (DEGs) simultaneously discerning multiple skin pathological states in a single test. The integration of multiple heterogeneous transcriptomic datasets requires different pipeline stages to be properly designed: from suitable batch merging and efficient biomarker selection to automated classification assessment. This article presents a novel approach addressing all these technical issues, with the intention of providing new insights into skin cancer diagnosis. Although further efforts will have to be made in the search for better biomarkers recognizing specific skin pathological states, our study found a panel of 8 highly relevant multiclass DEGs for discerning up to 10 skin pathological states: 2 healthy skin conditions a priori, 2 cataloged precancerous skin diseases and 6 cancerous skin states. Their power of diagnosis over new samples was widely tested by previously well-trained classification models. Robust performance metrics such as overall and mean multiclass F1-score exceeded 94% and 80%, respectively. Clinicians should give special attention to highlighted multiclass DEGs that have high gene expression changes present among them, and understand their biological relationship to different skin pathological states.


Subject(s)
Computer-Assisted Diagnosis/methods , Gene Expression Profiling/methods , Machine Learning , RNA-Seq/methods , Skin Neoplasms/diagnosis , Tumor Biomarkers/analysis , Tumor Biomarkers/genetics , Tumor Biomarkers/metabolism , Computational Biology , Humans , Skin Neoplasms/genetics , Skin Neoplasms/metabolism
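
The gap between the two scores quoted in this abstract is typical of imbalanced multiclass problems, where an overall score can hide weak classes that a per-class mean exposes. The sketch below computes overall accuracy and the macro-averaged (mean per-class) F1-score. Reading "overall" as plain accuracy is an assumption, and the labels are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for eight skin samples.
y_true = ["healthy"] * 4 + ["nevus"] * 2 + ["melanoma"] * 2
y_pred = ["healthy", "healthy", "healthy", "nevus",
          "nevus", "nevus",
          "melanoma", "healthy"]

overall = accuracy(y_true, y_pred)
mean_f1 = macro_f1(y_true, y_pred)
```

Here the per-class F1-scores are 0.75 (healthy), 0.80 (nevus), and about 0.67 (melanoma), so the rarest class pulls the macro average below the overall accuracy of 0.75.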
16.
PLoS One ; 14(2): e0212127, 2019.
Article in English | MEDLINE | ID: mdl-30753220

ABSTRACT

In recent years, a significant increase in the number of available biological experiments has taken place due to the widespread use of massive sequencing data. Furthermore, the continuous developments in machine learning and high-performance computing are allowing a faster and more efficient analysis and processing of this type of data. However, biological information about a certain disease is normally widespread due to the use of different sequencing technologies and different manufacturers, in different experiments over the years around the world. Thus, nowadays it is of paramount importance to attain a correct integration of biologically-related data in order to achieve genuine benefits from them. For this purpose, this work presents an integration of multiple Microarray and RNA-seq platforms, which has led to the design of a multiclass study by collecting samples from the four main types of leukemia, quantified at gene expression. Subsequently, in order to find a set of differentially expressed genes with the highest discernment capability among different types of leukemia, an innovative parameter referred to as coverage is presented here. This parameter allows assessing the number of different pathologies that a certain gene is able to discern. It has been evaluated together with other widely known parameters under assessment of an ANOVA statistical test which corroborated its filtering power when the identified genes are subjected to a machine learning process at multiclass level. The optimal tuning of gene extraction evaluated parameters by means of this statistical test led to the selection of 42 highly relevant expressed genes. By the use of minimum-Redundancy Maximum-Relevance (mRMR) feature selection algorithm, these genes were reordered and assessed under the operation of four different classification techniques. Outstanding results were achieved by taking exclusively the first ten genes of the ranking into consideration.
Finally, specific literature was consulted on this last subset of genes, revealing the occurrence of practically all of them with biological processes related to leukemia. In light of these results, this study underlines the relevance of considering a new parameter which facilitates the identification of highly valid expressed genes for simultaneously discerning multiple types of leukemia.


Subject(s)
Computational Biology , Gene Expression Profiling , Leukemia/genetics , Oligonucleotide Array Sequence Analysis , RNA Sequence Analysis , Tumor Biomarkers/metabolism , Humans , Leukemia/metabolism , Machine Learning
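
The abstract defines coverage only informally, as the number of pathologies a given gene is able to discern. One possible reading is sketched below: a class counts as discerned when its mean log2 expression differs from that of every other class by at least a log fold-change threshold. This definition, the threshold of 1.0, the gene names, and the expression values are all assumptions for illustration; the paper's exact formula may differ.

```python
def coverage(mean_expr_by_class, min_lfc=1.0):
    """Count the classes separated from every other class by at least min_lfc.

    mean_expr_by_class maps a class name to the gene's mean log2 expression
    in that class (an assumed input format).
    """
    discerned = 0
    for cls, mean in mean_expr_by_class.items():
        others = [m for c, m in mean_expr_by_class.items() if c != cls]
        if all(abs(mean - other) >= min_lfc for other in others):
            discerned += 1
    return discerned


# Hypothetical mean log2 expression of three genes in four leukemia types.
genes = {
    "gene_A": {"ALL": 8.0, "AML": 5.0, "CLL": 5.2, "CML": 5.1},  # marks ALL only
    "gene_B": {"ALL": 2.0, "AML": 4.0, "CLL": 6.0, "CML": 8.0},  # separates all four
    "gene_C": {"ALL": 5.0, "AML": 5.0, "CLL": 5.0, "CML": 5.0},  # uninformative
}

coverage_by_gene = {name: coverage(means) for name, means in genes.items()}
```

Under this reading, ranking genes by coverage favours markers such as the hypothetical `gene_B`, which help a multiclass classifier more than genes that flag a single pathology.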
17.
Int J Neural Syst ; 28(9): 1850022, 2018 Nov.
Article in English | MEDLINE | ID: mdl-29914313

ABSTRACT

Computer-Aided Diagnosis (CAD) represents a relevant instrument to automatically classify between patients with and without Alzheimer's Disease (AD) using several current imaging techniques. This study analyzes the optimization of volumes of interest (VOIs) to extract three-dimensional (3D) textures from Magnetic Resonance Image (MRI) in order to diagnose AD, Mild Cognitive Impairment converter (MCIc), Mild Cognitive Impairment nonconverter (MCInc) and Normal subjects. A relevant feature of the proposed approach is the use of 3D features instead of traditional two-dimensional (2D) features, by using 3D discrete wavelet transform (3D-DWT) approach for performing feature extraction from T1-weighted MRI. Due to the high number of coefficients when applying 3D-DWT to each of the VOIs, a feature selection algorithm based on mutual information is used, as is the minimum Redundancy Maximum Relevance (mRMR) algorithm. Region optimization has been performed in order to discover the most relevant regions (VOIs) in the brain with the use of Multi-Objective Genetic Algorithms, one of the objectives being to optimize the accuracy of the system. The error index of the system is computed by the confusion matrix obtained by the multi-class support vector machine (SVM) classifier. Principal Component Analysis (PCA) is used with the purpose of reducing the number of features to the classifier. The cohort of subjects used in the study consisted of 296 different patients. A first group of 206 patients was used to optimize VOI selection and another group of 90 independent subjects (that did not belong to the first group) was used to test the solutions yielded by the genetic algorithm. The proposed methodology obtains excellent results in multi-class classification achieving accuracies of 94.4% and also extracting significant information on the location of the most relevant points of the brain.
This suggests that the proposed method could aid in the research of other neurodegenerative diseases, improving the accuracy of the diagnosis and finding the most relevant regions of the brain associated with them.


Subject(s)
Algorithms , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Cognitive Dysfunction/diagnostic imaging , Computer-Assisted Diagnosis/methods , Magnetic Resonance Imaging , Aged , Aged 80 and over , Alzheimer Disease/pathology , Brain/pathology , Cognitive Dysfunction/pathology , Female , Humans , Three-Dimensional Imaging/methods , Information Theory , Magnetic Resonance Imaging/methods , Male , Middle Aged , Organ Size , Wavelet Analysis
18.
PLoS One ; 13(5): e0196836, 2018.
Article in English | MEDLINE | ID: mdl-29750795

ABSTRACT

Most research studies that apply microarray technology to characterize the pathological states of a disease may fail to reach statistically significant results. This is largely due to the small number of samples analysed and to the limited number of states or pathologies usually addressed. Moreover, the influence of potential deviations on gene expression quantification is usually disregarded. Despite the continuous changes in omic sciences, reflected for instance in the emergence of new Next-Generation Sequencing-related technologies, the vast amount of gene expression microarray datasets already available should be properly exploited. Therefore, this work proposes a novel methodological approach involving the integration of several heterogeneous skin cancer series, followed by the design of a multiclass classifier. The approach aims to provide clinicians with an intelligent diagnosis support tool based on a robust set of selected biomarkers that simultaneously distinguishes among different cancer-related skin states. To achieve this, microarray datasets from two platforms, Affymetrix and Illumina, were combined. This integration is expected to strengthen the statistical robustness of the study and to help find highly reliable skin cancer biomarkers. Specifically, the designed pipeline allowed the identification of a small subset of 17 differentially expressed genes (DEGs) with which to distinguish among the 7 skin states involved. These genes were obtained after assessing a number of potential batch effects on the gene expression data. The biological role of these genes was examined in the specialized literature to understand their relation to skin cancer. Finally, to assess their possible effectiveness in cancer diagnosis, a cross-validated Support Vector Machine (SVM)-based classification including feature ranking was performed. The accuracy attained exceeded 92% in overall recognition of the 7 different cancer-related skin states. The proposed integration scheme is expected to allow co-integration with other state-of-the-art technologies such as RNA-seq.


Subject(s)
Gene Expression Regulation, Neoplastic/genetics , Gene Expression/genetics , Skin Neoplasms/genetics , Biomarkers, Tumor/genetics , Gene Expression Profiling/methods , Humans , Oligonucleotide Array Sequence Analysis/methods
19.
PLoS One ; 13(4): e0194844, 2018.
Article in English | MEDLINE | ID: mdl-29617451

ABSTRACT

Using differentially expressed genes (DEGs) to identify feasible disease biomarkers can be a hard task when working with heterogeneous datasets. Expression data are strongly influenced by technology, sample preparation processes, and/or labeling methods. The proliferation of different microarray platforms for measuring gene expression increases the need to develop models able to compare their results, especially when different technologies can lead to signal values that vary greatly. Integrative meta-analysis can significantly improve the reliability and robustness of DEG detection. The objective of this work was to develop an integrative approach for identifying potential cancer biomarkers by integrating gene expression data from two different platforms. Pancreatic ductal adenocarcinoma (PDAC), where there is an urgent need to find new biomarkers due to its late diagnosis, is an ideal candidate for testing this technology. Expression data from two different datasets, namely Affymetrix and Illumina (18 and 36 PDAC patients, respectively), as well as from 18 healthy controls, were used for this study. A meta-analysis based on an empirical Bayesian methodology (ComBat) was then proposed to integrate these datasets. DEGs were finally identified from the integrated data by using the statistical programming language R. After our integrative meta-analysis, 5 genes were commonly identified within the individual analyses of the independent datasets. In addition, 28 novel genes that were not reported by the individual analyses ('gained' genes) were discovered. Several of these gained genes have already been related to other gastroenterological tumors. The proposed integrative meta-analysis has revealed novel DEGs that may play an important role in PDAC and could be potential biomarkers for diagnosing the disease.


Subject(s)
Biomarkers, Tumor/metabolism , Carcinoma, Pancreatic Ductal/diagnosis , Pancreatic Neoplasms/diagnosis , Area Under Curve , Biomarkers, Tumor/genetics , Carcinoma, Pancreatic Ductal/metabolism , Databases, Factual , Guanine Nucleotide Exchange Factors/genetics , Guanine Nucleotide Exchange Factors/metabolism , Humans , Interleukin-1 Receptor-Associated Kinases/genetics , Interleukin-1 Receptor-Associated Kinases/metabolism , Leukocytes, Mononuclear/cytology , Leukocytes, Mononuclear/metabolism , Pancreatic Neoplasms/metabolism , ROC Curve , Transcriptome , Tumor Suppressor Proteins/genetics , Tumor Suppressor Proteins/metabolism
20.
BMC Bioinformatics ; 18(1): 506, 2017 Nov 21.
Article in English | MEDLINE | ID: mdl-29157215

ABSTRACT

BACKGROUND: Nowadays, many public repositories containing large microarray gene expression datasets are available. However, microarray technology is less powerful and accurate than more recent Next Generation Sequencing technologies, such as RNA-Seq. In any case, information from microarrays is truthful and robust, so it can be exploited through the integration of microarray data with RNA-Seq data. Additionally, extracting information from and acquiring a large number of samples in RNA-Seq still entails very high costs in terms of time and computational resources. This paper proposes a new model to find the gene signature of breast cancer cell lines through the integration of heterogeneous data from different breast cancer datasets, obtained from microarray and RNA-Seq technologies. Consequently, data integration is expected to provide more robust statistical significance to the results obtained. Finally, a classification method is proposed in order to test the robustness of the Differentially Expressed Genes when unseen data is presented for diagnosis. RESULTS: The proposed data integration allows the analysis of gene expression samples coming from different technologies. The most significant genes of the whole integrated data were obtained through the intersection of three gene sets, corresponding to the expressed genes identified within the microarray data alone, within the RNA-Seq data alone, and within the integrated data from both technologies. This intersection reveals 98 possible technology-independent biomarkers. Two different heterogeneous datasets were distinguished for the classification tasks: a training dataset for gene expression identification and classifier validation, and a test dataset with unseen data for testing the classifier. Both of them achieved high classification accuracies, supporting the validity of the obtained set of genes as possible biomarkers for breast cancer. Through a feature selection process, a final small subset made up of six genes was considered for breast cancer diagnosis. CONCLUSIONS: This work proposes a novel data integration stage in the traditional gene expression analysis pipeline through the combination of heterogeneous data from microarray and RNA-Seq technologies. Available samples have been successfully classified using a subset of six genes obtained by a feature selection method. Consequently, a new classification and diagnosis tool was built and its performance was validated using previously unseen samples.
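The three-way intersection described in the RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `technology_independent_genes` and the gene identifiers are hypothetical placeholders, and the sketch assumes each analysis has already produced its own list of differentially expressed genes.

```python
def technology_independent_genes(microarray_degs, rnaseq_degs, integrated_degs):
    """Return the genes reported by all three analyses (set intersection)."""
    return set(microarray_degs) & set(rnaseq_degs) & set(integrated_degs)


# Hypothetical placeholder identifiers, not genes reported in the study.
microarray_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
rnaseq_degs = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}
integrated_degs = {"GENE_C", "GENE_D", "GENE_F"}

shared = technology_independent_genes(microarray_degs, rnaseq_degs, integrated_degs)
print(sorted(shared))  # prints ['GENE_C', 'GENE_D']
```

Only genes present in every set survive, which is why the intersection can be read as a conservative, technology-independent candidate list.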


Subject(s)
Breast Neoplasms/genetics , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, RNA/methods , Algorithms , Cluster Analysis , Databases, Genetic , Female , Gene Expression Regulation, Neoplastic , Humans , Reproducibility of Results