Results 1 - 17 of 17
1.
Genes (Basel) ; 15(3)2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38540371

ABSTRACT

The analysis of gene expression quantification data is a powerful and widely used approach in cancer research. This work provides new insights into the transcriptomic changes that occur in healthy uterine tissue compared with cancerous tissue and explores the differences associated with uterine cancer localizations and histological subtypes. To achieve this, RNA-Seq data from the TCGA database were preprocessed and analyzed using the KnowSeq package. First, a kNN model was applied to classify uterine cervix cancer, uterine corpus cancer, and healthy uterine samples. Through variable selection, a three-gene signature was identified (VWCE, CLDN15, ADCYAP1R1), consistently achieving 100% test accuracy across 20 repetitions of 5-fold cross-validation. A similar supplementary analysis using miRNA-Seq data from the same samples identified an optimal two-gene miRNA-coding signature, potentially regulating the previously mentioned three-gene signature, which attained optimal classification performance with an 82% macro-F1 score. Subsequently, a kNN model was implemented to classify cervical cancer samples into their two main histological subtypes (adenocarcinoma and squamous cell carcinoma). A single-gene signature (ICA1L) was identified, achieving 100% test accuracy across 20 repetitions of 5-fold cross-validation, and was externally validated through the CGCI program. Finally, an examination of six cervical adenosquamous carcinoma (mixed) samples revealed a pattern in which the gene expression value in the mixed class aligned more closely with the histological subtype showing lower expression, prompting a reconsideration of the diagnosis for these mixed samples. In summary, this study provides valuable insights into the molecular mechanisms of uterine cervix and corpus cancers. The newly identified gene signatures demonstrate robust predictive capabilities and can guide future research into cancer diagnosis and treatment.


Subject(s)
Adenosquamous Carcinoma , Squamous Cell Carcinoma , MicroRNAs , Uterine Cervical Neoplasms , Female , Humans , Uterine Cervical Neoplasms/genetics , Uterine Cervical Neoplasms/metabolism , Squamous Cell Carcinoma/pathology , Gene Expression Profiling , Adenosquamous Carcinoma/genetics , Adenosquamous Carcinoma/pathology , MicroRNAs/genetics
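The evaluation protocol described in the first abstract (a kNN classifier scored by macro-F1 over 20 repetitions of 5-fold cross-validation) can be sketched in plain Python. This is an illustrative sketch only, not the KnowSeq implementation the authors used: the choice of k = 3, Euclidean distance, round-robin stratified fold assignment and the synthetic three-class data are all assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_folds(y, n_folds, rng):
    """Split sample indices into n_folds folds, dealing each class out round-robin.

    Assumes every class has at least n_folds samples, so no fold is empty.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % n_folds].append(i)
    return folds


def repeated_cv_macro_f1(X, y, k=3, n_folds=5, n_repeats=20, seed=0):
    """Mean test macro-F1 over n_repeats rounds of stratified n_folds-fold CV."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        for test_idx in stratified_folds(y, n_folds, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            train_X = [X[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            preds = [knn_predict(train_X, train_y, X[t], k) for t in test_idx]
            scores.append(macro_f1([y[t] for t in test_idx], preds))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic stand-in for a 3-gene expression matrix with three classes (not TCGA data).
    rng = random.Random(42)
    centers = {"cervix": (0.0, 0.0, 0.0), "corpus": (5.0, 5.0, 0.0), "healthy": (0.0, 5.0, 5.0)}
    X, y = [], []
    for label, center in centers.items():
        for _ in range(10):
            X.append(tuple(c + rng.gauss(0, 0.5) for c in center))
            y.append(label)
    print(f"mean macro-F1: {repeated_cv_macro_f1(X, y):.3f}")
```

Repeating the fold assignment with fresh shuffles is what makes "20 repetitions" meaningful: a single 5-fold split can be lucky or unlucky, and averaging over many splits reduces that variance.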
4.
Cancer Imaging ; 23(1): 66, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37365659

ABSTRACT

BACKGROUND: Patients with pancreatic ductal carcinoma have a very poor prognosis, given the difficulty of early detection and the lack of early symptoms. Digital pathology is routinely used by pathologists to diagnose the disease. However, visually inspecting the tissue is a time-consuming task, which slows down the diagnostic procedure. With advances in artificial intelligence, specifically deep learning models, and the growing availability of public histology data, clinical decision support systems are being created. However, the generalization capabilities of these systems are not always tested, nor is the integration of publicly available datasets for pancreatic ductal carcinoma (PDAC) detection. METHODS: In this work, we explored the performance of two weakly supervised deep learning models using the two most widely available datasets of pancreatic ductal carcinoma histology images, The Cancer Genome Atlas Project (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). To obtain sufficient training data, the TCGA dataset was integrated with the Genotype-Tissue Expression (GTEx) project dataset, which contains healthy pancreatic samples. RESULTS: We showed that the model trained on CPTAC generalizes better than the one trained on the integrated dataset, obtaining an inter-dataset accuracy of 90.62% ± 2.32 and an outer-dataset accuracy of 92.17% when evaluated on TCGA + GTEx. Furthermore, we tested the performance on another dataset formed by tissue microarrays, obtaining an accuracy of 98.59%. We showed that the features learned on an integrated dataset do not differentiate between the classes but between the datasets, suggesting that stronger normalization might be needed when creating clinical decision support systems with datasets obtained from different sources.
To mitigate this effect, we proposed training on all three available datasets, which improved the detection performance and generalization capabilities relative to a model trained only on TCGA + GTEx and achieved performance similar to that of the model trained only on CPTAC. CONCLUSIONS: Integrating datasets in which both classes are present can mitigate the batch effect that arises when datasets are combined, improving classification performance and enabling accurate PDAC detection across different datasets.


Subject(s)
Pancreatic Ductal Carcinoma , Deep Learning , Pancreatic Neoplasms , Humans , Artificial Intelligence , Pancreatic Ductal Carcinoma/diagnosis , Pancreatic Ductal Carcinoma/pathology , Proteomics , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms
5.
Viruses ; 14(9)2022 08 27.
Article in English | MEDLINE | ID: mdl-36146700

ABSTRACT

OBJECTIVES: More than two years into the COVID-19 pandemic, SARS-CoV-2 remains a global public health problem. Successive waves of infection have produced new SARS-CoV-2 variants carrying new mutations whose impact on COVID-19 severity and patient survival is uncertain. METHODS: A total of 764 SARS-CoV-2 genomes, sequenced from COVID-19 patients hospitalized from 19 February 2020 to 30 April 2021, along with their clinical data, were used for survival analysis. RESULTS: A significant association of B.1.1.7, the alpha lineage, with patient mortality (log hazard ratio (LHR) = 0.51, C.I. = [0.14, 0.88]) was found after adjusting for all the covariates known to affect COVID-19 prognosis. Moreover, survival analysis of mutations in the SARS-CoV-2 genome revealed that 27 of them were significantly associated with higher patient mortality. Most of these mutations were located in the genes coding for the S, ORF8, and N proteins. CONCLUSIONS: This study illustrates how a combination of genomic and clinical data can provide solid evidence for the impact of viral lineage on patient survival.


Subject(s)
COVID-19 , SARS-CoV-2 , Viral Genome , Humans , Mutation , Pandemics , Phylogeny , SARS-CoV-2/genetics
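As a worked reading of the effect size reported in the abstract above, and assuming the LHR is a natural-log hazard ratio from a Cox-type model (the abstract does not state the confidence level of the interval), exponentiating converts the estimate and its interval to the hazard-ratio scale:

```latex
\mathrm{HR} = e^{\mathrm{LHR}} = e^{0.51} \approx 1.67,
\qquad
\mathrm{CI}_{\mathrm{HR}} = \left[e^{0.14},\; e^{0.88}\right] \approx [1.15,\; 2.41]
```

Under that assumption, the alpha lineage was associated with roughly 1.67 times the hazard of death, and the interval excludes 1, which is consistent with the reported significance.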
6.
Gigascience ; 10(12)2021 12 02.
Article in English | MEDLINE | ID: mdl-34865008

ABSTRACT

BACKGROUND: The current SARS-CoV-2 pandemic has emphasized the utility of viral whole-genome sequencing in the surveillance and control of the pathogen. An unprecedented ongoing global initiative is producing hundreds of thousands of sequences worldwide. However, the complex circumstances in which viruses are sequenced, along with the demand for urgent results, cause a high rate of incomplete and, therefore, unusable sequences. Viral sequences evolve in the context of a complex phylogeny, and different positions along the genome are in linkage disequilibrium. Therefore, an imputation method should be able to predict missing positions from the available sequencing data. RESULTS: We have developed the impuSARS application, which takes advantage of the enormous number of available SARS-CoV-2 genomes, using a reference panel containing 239,301 sequences, to impute missing data in viral genomes. ImpuSARS was tested in a wide range of conditions (missing continuous fragments, amplicons, or sparse individual positions), showing great fidelity when reconstructing the original sequences and recovering the lineage with 100% precision for almost all lineages, even in very poorly covered genomes (<20% coverage). CONCLUSIONS: Imputation can improve the pace of SARS-CoV-2 sequencing production by recovering many incomplete or low-quality sequences that would otherwise be discarded. ImpuSARS can be incorporated into any primary data processing pipeline for SARS-CoV-2 whole-genome sequencing.


Subject(s)
Viral Genome , SARS-CoV-2 , Phylogeny , SARS-CoV-2/genetics , Whole Genome Sequencing
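To illustrate the general idea of filling missing positions from a reference panel, the sketch below copies each missing base from the panel sequence that best matches the query at its observed positions. This nearest-neighbour rule is a deliberately simplified stand-in and not the statistical imputation model used by impuSARS; the `N` placeholder convention, the requirement that sequences are pre-aligned to equal length, and the toy panel are assumptions.

```python
MISSING = "N"  # assumed placeholder for an uncalled base


def hamming_on_observed(query: str, ref: str) -> int:
    """Count mismatches between query and ref, skipping positions missing in query."""
    return sum(q != r for q, r in zip(query, ref) if q != MISSING)


def impute(query: str, panel: list[str]) -> str:
    """Fill each missing position of query with the base of the closest panel sequence."""
    if any(len(ref) != len(query) for ref in panel):
        raise ValueError("panel sequences must be aligned to the query length")
    best = min(panel, key=lambda ref: hamming_on_observed(query, ref))
    # Observed bases are kept; only missing ones are copied from the best match.
    return "".join(r if q == MISSING else q for q, r in zip(query, best))


if __name__ == "__main__":
    panel = [
        "ACGTACGTAC",  # hypothetical lineage 1
        "ACGTTCGTAA",  # hypothetical lineage 2
        "TCGTTCGAAA",  # hypothetical lineage 3
    ]
    print(impute("ACGTTNNNNA", panel))
```

Matching only on observed positions is what lets linkage disequilibrium do the work: the bases a sample does have point to the closest haplotype, whose remaining bases are then borrowed.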
7.
PLoS Comput Biol ; 17(2): e1008748, 2021 02.
Article in English | MEDLINE | ID: mdl-33571195

ABSTRACT

MIGNON is a workflow for the analysis of RNA-Seq experiments that not only efficiently manages the estimation of gene expression levels from raw sequencing reads but also calls genomic variants present in the transcripts analyzed. Moreover, it is the first workflow to provide a framework for integrating transcriptomic and genomic data based on a mechanistic model of signaling pathway activities, which allows a detailed biological interpretation of the results, including a comprehensive functional profiling of cell activity. MIGNON covers the whole process, from reads to signaling circuit activity estimations, using state-of-the-art tools; it is easy to use and deployable in different computational environments, allowing optimized use of the available resources.


Subject(s)
Computational Biology/methods , Genomics , RNA-Seq , Signal Transduction , Algorithms , Tumor Cell Line , Factual Databases , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Theoretical Models , Mutation , Software , Transcriptome , Exome Sequencing , Workflow
8.
Nucleic Acids Res ; 49(D1): D1130-D1137, 2021 01 08.
Article in English | MEDLINE | ID: mdl-32990755

ABSTRACT

Knowledge of the genetic variability of the local population is of utmost importance in personalized medicine and has proven to be a critical factor in the discovery of new disease variants. Here, we present the Collaborative Spanish Variability Server (CSVS), which currently contains more than 2000 genomes and exomes of unrelated Spanish individuals. This database has been generated in a collaborative crowdsourcing effort, collecting sequencing data produced by local genomic projects and for other purposes. Sequences have been grouped by ICD10 upper categories. A web interface allows the database to be queried with one or more ICD10 categories removed. In this way, aggregated allele frequency counts of the pseudo-control Spanish population can be obtained for diseases belonging to the removed category. In addition to pseudo-control studies, some population studies can be carried out, such as estimating the prevalence of pharmacogenomic variants. These genomic data have also been used to define the first Spanish Genome Reference Panel (SGRP1.0) for imputation. This is the first local repository of variability produced entirely by a crowdsourcing effort, and it constitutes an example for future initiatives to characterize local variability worldwide. CSVS is also part of the GA4GH Beacon network. CSVS can be accessed at: http://csvs.babelomics.org/.


Subject(s)
Crowdsourcing , Genetic Databases , Population Genetics/methods , Human Genome , Software , Alleles , Chromosome Mapping , Exome , Gene Frequency , Genetic Variation , Genomics , Humans , Internet , Precision Medicine/methods , Spain
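The pseudo-control query described in the CSVS abstract (aggregating allele counts after excluding one or more ICD10 categories) can be sketched as follows. The data layout, one record per individual holding an ICD10 category and an alternate-allele count of 0, 1 or 2 at a single biallelic autosomal site, and the category labels are hypothetical; this does not reproduce the CSVS schema or web interface.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    icd10_category: str  # hypothetical upper-level ICD10 chapter label
    alt_alleles: int     # genotype at one biallelic site: 0, 1 or 2 copies of ALT


def pseudo_control_frequency(individuals, excluded_categories):
    """ALT allele frequency among individuals outside the excluded categories."""
    kept = [ind for ind in individuals if ind.icd10_category not in excluded_categories]
    if not kept:
        raise ValueError("no individuals left after exclusion")
    alt = sum(ind.alt_alleles for ind in kept)
    total = 2 * len(kept)  # diploid assumption: two alleles per individual
    return alt / total


if __name__ == "__main__":
    cohort = [
        Individual("IX", 1),
        Individual("IX", 2),
        Individual("II", 0),
        Individual("VI", 1),
        Individual("VI", 0),
    ]
    # Controls for a circulatory-disease study (ICD10 chapter IX): everyone else.
    print(pseudo_control_frequency(cohort, {"IX"}))
```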
9.
IEEE J Biomed Health Inform ; 24(7): 2119-2130, 2020 07.
Article in English | MEDLINE | ID: mdl-31871000

ABSTRACT

Many clinical studies have revealed high biological similarity among different skin pathological states. These similarities make the efficient diagnosis of skin cancer difficult and encourage the study and design of new intelligent clinical decision support systems. In this sense, gene expression analysis can help find differentially expressed genes (DEGs) that simultaneously discern multiple skin pathological states in a single test. The integration of multiple heterogeneous transcriptomic datasets requires several pipeline stages to be properly designed: from suitable batch merging and efficient biomarker selection to automated classification assessment. This article presents a novel approach addressing all these technical issues, with the intention of providing new insights into skin cancer diagnosis. Although further efforts will be needed in the search for better biomarkers recognizing specific skin pathological states, our study found a panel of 8 highly relevant multiclass DEGs for discerning up to 10 skin pathological states: 2 a priori healthy skin conditions, 2 cataloged precancerous skin diseases, and 6 cancerous skin states. Their diagnostic power on new samples was extensively tested with previously well-trained classification models. Robust performance metrics, namely the overall and mean multiclass F1-scores, exceeded 94% and 80%, respectively. Clinicians should pay special attention to the highlighted multiclass DEGs that show large gene expression changes among the states, and understand their biological relationship to the different skin pathological states.


Subject(s)
Computer-Assisted Diagnosis/methods , Gene Expression Profiling/methods , Machine Learning , RNA-Seq/methods , Skin Neoplasms/diagnosis , Tumor Biomarkers/analysis , Tumor Biomarkers/genetics , Tumor Biomarkers/metabolism , Computational Biology , Humans , Skin Neoplasms/genetics , Skin Neoplasms/metabolism
11.
Toxicol Appl Pharmacol ; 311: 113-116, 2016 Nov 15.
Article in English | MEDLINE | ID: mdl-27720938

ABSTRACT

Erlotinib is an epidermal growth factor receptor (EGFR) tyrosine kinase inhibitor that has shown activity against pancreatic ductal adenocarcinoma (PDAC). The drug's most frequently reported side effect resulting from EGFR inhibition is skin rash (SR), a symptom that has been associated with a better therapeutic response to the drug. Gene expression profiling can be used as a tool to predict which patients will develop this important cutaneous manifestation. The aim of the present study was to identify which genes may influence the appearance of SR in PDAC patients. The study included 34 PDAC patients treated with erlotinib: 21 patients developed SR of any grade, while 13 patients did not (controls). Before any chemotherapy regimen was administered and before SR developed, we collected RNA from peripheral blood samples of all patients and studied the differential gene expression pattern using the Illumina HumanHT-12 v4 Expression BeadChip microarray platform. Seven genes (FAM46C, IFITM3, GMPR, DENND6B, SELENBP1, NOL10, and SIAH2), involved in different pathways including regulatory, migratory, and signaling processes, were downregulated in PDAC patients with SR. Our results suggest the existence of a gene expression profile significantly correlated with erlotinib-induced SR in PDAC that could be used as a prognostic indicator in these patients.


Subject(s)
Adenocarcinoma/drug therapy , Erlotinib Hydrochloride/adverse effects , Gene Expression Profiling , Pancreatic Neoplasms/drug therapy , Skin/drug effects , Aged , Female , Humans , Male , Middle Aged
12.
Biomed Res Int ; 2015: 518284, 2015.
Article in English | MEDLINE | ID: mdl-26346854

ABSTRACT

The overall survival of patients with pancreatic ductal adenocarcinoma is extremely low. Although gemcitabine is the standard chemotherapy for this disease, clinical outcomes have not improved significantly, even when it is combined with adjuvant treatments. There is an urgent need to find prognostic markers. The aim of this study was to analyze the potential value of serum cytokines in finding a profile that can predict the clinical outcome of patients with pancreatic cancer and to establish a practical prognostic index that significantly predicts patients' outcomes. We conducted an extensive analysis of serum prognostic biomarkers using an antibody array comprising 507 human cytokines. Overall survival was estimated using the Kaplan-Meier method. Univariate and multivariate Cox proportional hazards models were used to analyze prognostic factors. To determine the extent to which survival could be predicted based on this index, we used leave-one-out cross-validation. The multivariate model performed better and could represent a novel panel of serum cytokines that correlates with poor prognosis in pancreatic cancer. Expression of B7-1/CD80, EG-VEGF/PK1, IL-29, NRG1-beta1/HRG1-beta1, and PD-ECGF portends a poor prognosis for patients with pancreatic cancer, and these cytokines could represent novel therapeutic targets for this disease.


Subject(s)
Pancreatic Ductal Carcinoma/blood , Pancreatic Ductal Carcinoma/mortality , Cytokines/blood , Pancreatic Neoplasms/blood , Pancreatic Neoplasms/mortality , Adult , Aged , Disease-Free Survival , Female , Humans , Male , Middle Aged , Survival Rate
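The Kaplan-Meier method named in the abstract above estimates the survival function as a running product over death times, S(t) = product over death times t_i <= t of (1 - d_i / n_i), where d_i deaths occur among the n_i patients still at risk just before t_i. A minimal stdlib sketch with made-up follow-up times is shown here; it is not the authors' analysis code, and it does not cover the Cox models or the leave-one-out validation.

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct time with at least one death.

    times  -- follow-up time for each patient (any consistent unit)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, S(time)) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        deaths = sum(event for _, event in group)
        if deaths:
            # Patients censored at t still count as at risk at t (usual convention).
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= len(group)  # everyone observed at t leaves the risk set afterwards
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up in months; these are not data from the study.
    months = [2, 3, 3, 5, 8]
    died = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(months, died):
        print(f"S({t}) = {s:.2f}")
```

Censored patients never lower the curve directly; they only shrink the risk set, which is why each later death causes a proportionally larger drop.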
14.
Pancreas ; 43(7): 1042-9, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24979617

ABSTRACT

OBJECTIVE: Pancreatic ductal adenocarcinoma is a deadly disease because of late diagnosis and chemoresistance. We aimed to find a panel of serum cytokines representing diagnostic and predictive biomarkers for pancreatic cancer. METHODS: A cytokine antibody array was used to simultaneously detect 507 cytokines in sera from patients with pancreatic cancer and healthy controls. The nonparametric Mann-Whitney U test was used for pairwise comparisons of the controls, the pretreated patients, and the posttreated patients. Fold changes greater than or equal to 1.5 or less than or equal to 1/1.5 were considered significant. Receiver operating characteristic curves were used to assess the performance of the model. Leave-one-out cross-validation was used to estimate prediction error. RESULTS: Comparing the sera of pretreated patients with the control samples, the cytokines fibroblast growth factor 10/keratinocyte growth factor-2 (FGF-10/KGF-2), interferon-inducible T cell alpha chemokine/chemokine (C-X-C motif) ligand 11 (I-TAC/CXCL11), oncostatin M (OSM), osteoactivin/glycoprotein nonmetastatic melanoma protein B, and stem cell factor (SCF) were found to be significantly overexpressed. In addition, the cytokines CD30 ligand/tumor necrosis factor superfamily member 8 (TNFSF8), chordin-like 2, FGF-10/KGF-2, growth/differentiation factor 15, I-TAC/CXCL11, OSM, and SCF were differentially expressed in response to treatment. CONCLUSIONS: We propose a role for FGF-10/KGF-2, I-TAC/CXCL11, OSM, osteoactivin/glycoprotein nonmetastatic melanoma protein B, and SCF as novel diagnostic biomarkers. CD30 ligand/TNFSF8, chordin-like 2, FGF-10/KGF-2, growth/differentiation factor 15, I-TAC/CXCL11, OSM, and SCF might serve as predictive biomarkers of response to gemcitabine and erlotinib in patients with pancreatic cancer.


Subject(s)
Tumor Biomarkers/blood , Pancreatic Ductal Carcinoma/blood , Cytokines/blood , Neoplasm Proteins/blood , Pancreatic Neoplasms/blood , Aged , Neoplasm Antigens/blood , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Area Under Curve , CA-19-9 Antigen/blood , Carcinoembryonic Antigen/blood , Pancreatic Ductal Carcinoma/drug therapy , Pancreatic Ductal Carcinoma/epidemiology , Comorbidity , Deoxycytidine/administration & dosage , Deoxycytidine/analogs & derivatives , Type 2 Diabetes Mellitus/epidemiology , Erlotinib Hydrochloride , Female , Humans , Male , Middle Aged , Pancreatic Neoplasms/drug therapy , Pancreatic Neoplasms/epidemiology , Predictive Value of Tests , Quinazolines/administration & dosage , ROC Curve , Sensitivity and Specificity , Smoking/epidemiology , Tumor Microenvironment , Gemcitabine
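The screening rule in the Methods above (a Mann-Whitney U comparison combined with a fold-change cut-off of at least 1.5 or at most 1/1.5) can be illustrated as below. The sketch computes only the U statistic, using mid-ranks for ties, and a fold change taken as a ratio of group means; obtaining a p-value would additionally need the exact or normal-approximation null distribution, which is omitted. The ratio-of-means definition and the intensity values are assumptions.

```python
def ranks_with_ties(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mid_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mid_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic of sample x relative to sample y."""
    ranks = ranks_with_ties(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def passes_fold_change(group_a, group_b, threshold=1.5):
    """True if mean(group_a) / mean(group_b) >= threshold or <= 1 / threshold."""
    fold_change = (sum(group_a) / len(group_a)) / (sum(group_b) / len(group_b))
    return fold_change >= threshold or fold_change <= 1 / threshold


if __name__ == "__main__":
    # Hypothetical array intensities for one cytokine (not values from the study).
    patients = [820.0, 910.0, 1005.0, 760.0, 1130.0]
    controls = [480.0, 530.0, 610.0, 455.0, 590.0]
    print("U =", mann_whitney_u(patients, controls))
    print("passes 1.5-fold filter:", passes_fold_change(patients, controls))
```

Requiring both criteria guards against two failure modes: a rank test alone can flag tiny but consistent shifts, while a fold change alone can be driven by one or two extreme samples.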
15.
Bioinformatics ; 29(17): 2112-21, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-23793754

ABSTRACT

MOTIVATION: Multiple sequence alignments (MSAs) are widely used in bioinformatics as a basis for other tasks such as structure prediction, biological function analysis or phylogenetic modeling. However, current tools usually provide only partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, especially when the sequences are less similar. Consequently, researchers and biologists do not agree on the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores that include further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved than sequences in proteins, scores with structural information are better suited to evaluating more distant relationships between sequences. RESULTS: The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gap percentage and totally conserved columns. It was evaluated on the BAliBASE benchmark, with significant results according to the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas its results are not significantly different from those of 3D-COFFEE (P > 0.05), with the advantage of requiring fewer structures. Structural information is included within the objective function to evaluate the obtained alignments more accurately. AVAILABILITY: The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.


Subject(s)
Algorithms , Sequence Alignment/methods , Protein Sequence Analysis , Protein Databases , Phylogeny , Protein Conformation , Proteins/classification
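The non-dominated sorting step at the core of NSGA-style algorithms, on which the multiobjective aligner above is based, ranks candidate solutions into Pareto fronts. The sketch below is a minimal version of that sorting for objectives that are all maximized, in the spirit of the STRIKE score, non-gap percentage and totally conserved columns; the objective values are invented, the crowding-distance and genetic operators are omitted, and this is not the MO-SAStrE source code.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(points):
    """Return fronts as lists of indices into points; front 0 is the Pareto-optimal set."""
    dominated_by = [[] for _ in points]   # indices that each point dominates
    domination_count = [0] * len(points)  # number of points dominating each point
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if dominates(p, q):
                dominated_by[i].append(j)
            elif dominates(q, p):
                domination_count[i] += 1
    fronts = [[i for i, c in enumerate(domination_count) if c == 0]]
    while fronts[-1]:
        next_front = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # Hypothetical candidates: (STRIKE-like score, non-gap fraction, conserved columns)
    candidates = [(0.80, 0.90, 10), (0.85, 0.85, 12), (0.70, 0.80, 8), (0.60, 0.70, 5)]
    print(non_dominated_sort(candidates))
```

The first two candidates trade off against each other (one has more non-gap positions, the other a higher score and more conserved columns), so neither dominates and both land in the first front.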
16.
BMC Bioinformatics ; 14: 113, 2013 Mar 27.
Article in English | MEDLINE | ID: mdl-23537461

ABSTRACT

BACKGROUND: A common query from scientists reading a biomedical abstract is a search for topic-related documents in bibliographic databases. Such a query is challenging because the amount of information attached to a single abstract is small, whereas classification-based retrieval algorithms are optimally trained with large sets of relevant documents. As a solution to this problem, we propose a query expansion method that extends the information related to a manuscript using its cited references. RESULTS: Data on cited references and text sections in 249,108 full-text biomedical articles were extracted from the Open Access subset of the PubMed Central® database (PMC-OA). Of the five standard sections of a scientific article, the Introduction and Discussion sections contained most of the citations (mean = 10.2 and 9.9 citations, respectively). A large proportion of articles (98.4%) and their cited references (79.5%) were indexed in the PubMed® database. Using the MedlineRanker abstract classification tool, cited references allowed accurate retrieval of the citing document in a test set of 10,000 documents, and also of documents related to six biomedical topics defined by particular MeSH® terms from the entire PMC-OA (p-value<0.01). Classification performance was sensitive to the topic and also to the text sections from which the references were selected. Classifiers trained on the baseline (i.e., only text from the query document and not from the references) were outperformed in almost all cases. The best performance was often obtained when using all cited references, though using the references from the Introduction and Discussion sections led to similarly good results. This query expansion method performed significantly better than pseudo relevance feedback in 4 out of 6 topics. CONCLUSIONS: The retrieval of documents related to a single document can be significantly improved by using the references cited by this document (p-value<0.01).
Using references from the Introduction and Discussion sections performs almost as well as using all references, which might be useful for methods that require reduced datasets due to computational limitations. Cited references from particular sections might not be appropriate for all topics. Our method could be a better alternative to pseudo relevance feedback, though it is limited by full-text availability.


Subject(s)
Data Mining/methods , PubMed , Algorithms , Medical Subject Headings
17.
Nucleic Acids Res ; 41(1): e26, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-23066102

ABSTRACT

Multiple sequence alignments (MSAs) have become one of the most studied approaches in bioinformatics for performing other important tasks such as structure prediction, biological function analysis or next-generation sequencing. However, current MSA algorithms do not always provide consistent solutions, since alignment becomes increasingly difficult when dealing with low-similarity sequences. As is widely known, these algorithms depend directly on specific features of the sequences, which strongly influence alignment accuracy. Many MSA tools have been designed recently, but it is not possible to know in advance which one is most suitable for a particular set of sequences. In this work, we analyze some of the most widely used algorithms in the literature and their dependence on several features. A novel intelligent algorithm based on a least squares support vector machine is then developed to predict how accurate each alignment could be, depending on its analyzed features. This algorithm is applied to a dataset of 2180 MSAs. The proposed system first estimates the accuracy of possible alignments. The most promising methodologies are then selected to align each set of sequences. Since only one selected algorithm is run, the computational time is not excessively increased.


Subject(s)
Sequence Alignment/methods , Support Vector Machine , Genetic Databases , Least-Squares Analysis , Reproducibility of Results , Protein Sequence Analysis
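The last abstract describes training a least squares support vector machine (LS-SVM) to predict alignment accuracy from sequence features and then running only the aligner with the best prediction. A minimal sketch of LS-SVM regression in its standard dual form, solving the linear system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] with an RBF kernel, is given below. The kernel, the hyperparameters gamma and sigma, the single "mean pairwise identity" feature and the two aligner names are assumptions, not the authors' configuration.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Fit LS-SVM regression by solving [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization term on the kernel diagonal
        A.append(row)
    solution = solve(A, [0.0] + list(y))
    return solution[1:], solution[0]  # (alpha, b)


def lssvm_predict(X_train, alpha, b, x, sigma=1.0):
    """f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X_train)) + b


if __name__ == "__main__":
    # Hypothetical training data: feature = mean pairwise identity, target = accuracy.
    X = [(0.2,), (0.4,), (0.6,), (0.8,)]
    training = {
        "aligner_A": [0.35, 0.55, 0.70, 0.82],
        "aligner_B": [0.50, 0.58, 0.66, 0.74],
    }
    query = (0.3,)
    predicted = {}
    for name, y in training.items():
        alpha, b = lssvm_fit(X, y)
        predicted[name] = lssvm_predict(X, alpha, b, query)
    best = max(predicted, key=predicted.get)
    print(predicted, "-> run only", best)
```

Unlike a standard SVM, which solves a quadratic program, the LS-SVM replaces inequality constraints with equalities, so training reduces to one linear solve. That is cheap enough that fitting one predictor per candidate aligner adds little overhead compared with running every aligner.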