1.
Genes (Basel); 15(3), 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38540371

ABSTRACT

The analysis of gene expression quantification data is a powerful and widely used approach in cancer research. This work provides new insights into the transcriptomic changes that distinguish healthy uterine tissue from cancerous tissue and explores the differences associated with uterine cancer localizations and histological subtypes. To achieve this, RNA-Seq data from the TCGA database were preprocessed and analyzed using the KnowSeq package. First, a kNN model was applied to classify uterine cervix cancer, uterine corpus cancer, and healthy uterine samples. Through variable selection, a three-gene signature was identified (VWCE, CLDN15, ADCYAP1R1), achieving consistent 100% test accuracy across 20 repetitions of a 5-fold cross-validation. A similar supplementary analysis using miRNA-Seq data from the same samples identified an optimal two-gene miRNA-coding signature potentially regulating the three-gene signature, which attained optimal classification performance with an 82% F1-macro score. Subsequently, a kNN model was implemented to classify cervical cancer samples into their two main histological subtypes (adenocarcinoma and squamous cell carcinoma). A single-gene signature (ICA1L) was identified, achieving 100% test accuracy across 20 repetitions of a 5-fold cross-validation, and was externally validated through the CGCI program. Finally, an examination of six cervical adenosquamous carcinoma (mixed) samples revealed a pattern in which the gene expression value in the mixed class aligned more closely with the histological subtype with lower expression, prompting a reconsideration of the diagnosis for these mixed samples. In summary, this study provides valuable insights into the molecular mechanisms of uterine cervix and corpus cancers. The newly identified gene signatures demonstrate robust predictive capabilities, guiding future research in cancer diagnosis and treatment methodologies.
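
The evaluation scheme named in this abstract (a kNN classifier scored over 20 repetitions of 5-fold cross-validation) can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the KnowSeq package, the data are synthetic, the class labels and function names are hypothetical, and the folds are built by simple shuffling rather than by whatever splitting the authors used.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def repeated_kfold_accuracy(X, y, k_neighbors=3, folds=5, repeats=20, seed=0):
    """Mean test accuracy over `repeats` reshuffled runs of `folds`-fold cross-validation."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)  # a new random partition for every repetition
        for f in range(folds):
            test_idx = idx[f::folds]  # every folds-th shuffled sample forms the test fold
            test_set = set(test_idx)
            train_idx = [i for i in idx if i not in test_set]
            train_X = [X[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            correct = sum(
                knn_predict(train_X, train_y, X[i], k_neighbors) == y[i]
                for i in test_idx
            )
            accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Synthetic, well-separated three-feature data standing in for a three-gene signature.
# The labels and cluster centers are hypothetical, not values from the study.
data_rng = random.Random(42)
centers = {"cervix": (0, 0, 0), "corpus": (10, 10, 10), "healthy": (-10, 10, -10)}
X, y = [], []
for label, center in centers.items():
    for _ in range(20):
        X.append([v + data_rng.gauss(0, 1) for v in center])
        y.append(label)

mean_acc = repeated_kfold_accuracy(X, y)
print(f"mean test accuracy over 20 x 5 folds: {mean_acc:.3f}")
```

Averaging over repeated partitions reduces the dependence of the estimate on any single random split, which is presumably why the abstract reports accuracy across all repetitions.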


Subjects
Adenosquamous Carcinoma, Squamous Cell Carcinoma, MicroRNAs, Uterine Cervical Neoplasms, Female, Humans, Uterine Cervical Neoplasms/genetics, Uterine Cervical Neoplasms/metabolism, Squamous Cell Carcinoma/pathology, Gene Expression Profiling, Adenosquamous Carcinoma/genetics, Adenosquamous Carcinoma/pathology, MicroRNAs/genetics
4.
Cancer Imaging; 23(1): 66, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37365659

ABSTRACT

BACKGROUND: Patients with pancreatic ductal carcinoma have a very poor prognosis because the disease is difficult to detect early and lacks early symptoms. Digital pathology is routinely used by pathologists to diagnose the disease. However, visually inspecting the tissue is a time-consuming task, which slows down the diagnostic procedure. With the advances in artificial intelligence, specifically in deep learning models, and the growing availability of public histology data, clinical decision support systems are being created. However, the generalization capabilities of these systems are not always tested, nor is the integration of publicly available datasets for pancreatic ductal carcinoma (PDAC) detection. METHODS: In this work, we explored the performance of two weakly supervised deep learning models using the two most widely available datasets with pancreatic ductal carcinoma histology images, The Cancer Genome Atlas Project (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC). To have sufficient training data, the TCGA dataset was integrated with the Genotype-Tissue Expression (GTEx) project dataset, which contains healthy pancreatic samples. RESULTS: We showed that the model trained on CPTAC generalizes better than the one trained on the integrated dataset, obtaining an inter-dataset accuracy of 90.62% ± 2.32 and an outer-dataset accuracy of 92.17% when evaluated on TCGA + GTEx. Furthermore, we tested the performance on another dataset formed by tissue micro-arrays, obtaining an accuracy of 98.59%. We showed that the features learned in an integrated dataset do not differentiate between the classes, but between the datasets, indicating that a stronger normalization might be needed when creating clinical decision support systems with datasets obtained from different sources. To mitigate this effect, we proposed training on the three available datasets, which improved on the detection performance and generalization capabilities of a model trained only on TCGA + GTEx and achieved a performance similar to that of the model trained only on CPTAC. CONCLUSIONS: The integration of datasets where both classes are present can mitigate the batch effect present when integrating datasets, improving the classification performance and accurately detecting PDAC across different datasets.


Subjects
Pancreatic Ductal Carcinoma, Deep Learning, Pancreatic Neoplasms, Humans, Artificial Intelligence, Pancreatic Ductal Carcinoma/diagnosis, Pancreatic Ductal Carcinoma/pathology, Proteomics, Pancreatic Neoplasms/diagnosis, Pancreatic Neoplasms
5.
Viruses; 14(9), 2022 Aug 27.
Article in English | MEDLINE | ID: mdl-36146700

ABSTRACT

OBJECTIVES: More than two years into the COVID-19 pandemic, SARS-CoV-2 remains a global public health problem. Successive waves of infection have produced new SARS-CoV-2 variants with new mutations whose impact on COVID-19 severity and patient survival is uncertain. METHODS: A total of 764 SARS-CoV-2 genomes, sequenced from COVID-19 patients hospitalized from 19 February 2020 to 30 April 2021, were used together with the patients' clinical data for survival analysis. RESULTS: A significant association of B.1.1.7, the alpha lineage, with patient mortality (log hazard ratio (LHR) = 0.51, C.I. = [0.14,0.88]) was found after adjustment for all the covariates known to affect COVID-19 prognosis. Moreover, survival analysis of mutations in the SARS-CoV-2 genome revealed that 27 of them were significantly associated with higher patient mortality. Most of these mutations were located in the genes coding for the S, ORF8, and N proteins. CONCLUSIONS: This study illustrates how a combination of genomic and clinical data can provide solid evidence for the impact of viral lineage on patient survival.
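
The association above is reported on the log scale. As a small worked conversion (not part of the original abstract), exponentiating the log hazard ratio and its interval bounds gives the hazard ratio itself; the variable names below are illustrative only.

```python
import math

# Values reported in the abstract for the B.1.1.7 (alpha) lineage.
lhr, ci_low, ci_high = 0.51, 0.14, 0.88

# A hazard ratio is the exponential of the log hazard ratio.
hr = math.exp(lhr)
hr_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"HR = {hr:.2f}, CI = [{hr_ci[0]:.2f}, {hr_ci[1]:.2f}]")
# prints "HR = 1.67, CI = [1.15, 2.41]"
```

Because the whole interval lies above 1 (equivalently, the log-scale interval excludes 0), the conversion is consistent with the abstract's statement that the association with mortality is significant.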


Subjects
COVID-19, SARS-CoV-2, Viral Genome, Humans, Mutation, Pandemics, Phylogeny, SARS-CoV-2/genetics
6.
Gigascience; 10(12), 2021 Dec 02.
Article in English | MEDLINE | ID: mdl-34865008

ABSTRACT

BACKGROUND: The current SARS-CoV-2 pandemic has emphasized the utility of viral whole-genome sequencing in the surveillance and control of the pathogen. An unprecedented ongoing global initiative is producing hundreds of thousands of sequences worldwide. However, the complex circumstances in which viruses are sequenced, along with the demand for urgent results, cause a high rate of incomplete and, therefore, useless sequences. Viral sequences evolve in the context of a complex phylogeny, and different positions along the genome are in linkage disequilibrium. Therefore, an imputation method would be able to predict missing positions from the available sequencing data. RESULTS: We have developed the impuSARS application, which takes advantage of the enormous number of SARS-CoV-2 genomes available, using a reference panel containing 239,301 sequences, to impute missing data in viral genomes. ImpuSARS was tested in a wide range of conditions (continuous fragments, amplicons or sparse individual positions missing), showing great fidelity when reconstructing the original sequences and recovering the lineage with 100% precision for almost all the lineages, even in very poorly covered genomes (<20%). CONCLUSIONS: Imputation can improve the pace of SARS-CoV-2 sequencing production by recovering many incomplete or low-quality sequences that would otherwise be discarded. ImpuSARS can be incorporated in any primary data processing pipeline for SARS-CoV-2 whole-genome sequencing.


Subjects
Viral Genome, SARS-CoV-2, Phylogeny, SARS-CoV-2/genetics, Whole Genome Sequencing
7.
PLoS Comput Biol; 17(2): e1008748, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33571195

ABSTRACT

MIGNON is a workflow for the analysis of RNA-Seq experiments, which not only efficiently manages the estimation of gene expression levels from raw sequencing reads, but also calls genomic variants present in the transcripts analyzed. Moreover, this is the first workflow that provides a framework for the integration of transcriptomic and genomic data based on a mechanistic model of signaling pathway activities that allows a detailed biological interpretation of the results, including a comprehensive functional profiling of cell activity. MIGNON covers the whole process, from reads to signaling circuit activity estimations, using state-of-the-art tools. It is easy to use and deployable in different computational environments, allowing an optimized use of the resources available.


Subjects
Computational Biology/methods, Genomics, RNA-Seq, Signal Transduction, Algorithms, Tumor Cell Line, Factual Databases, Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing/methods, Humans, Theoretical Models, Mutation, Software, Transcriptome, Exome Sequencing, Workflow
8.
Nucleic Acids Res; 49(D1): D1130-D1137, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-32990755

ABSTRACT

Knowledge of the genetic variability of the local population is of utmost importance in personalized medicine and has been revealed as a critical factor for the discovery of new disease variants. Here, we present the Collaborative Spanish Variability Server (CSVS), which currently contains more than 2000 genomes and exomes of unrelated Spanish individuals. This database has been generated in a collaborative crowdsourcing effort collecting sequencing data produced by local genomic projects and for other purposes. Sequences have been grouped by ICD10 upper categories. A web interface allows querying the database while removing one or more ICD10 categories. In this way, aggregated counts of allele frequencies of the pseudo-control Spanish population can be obtained for diseases belonging to the category removed. In addition to pseudo-control studies, some population studies can be carried out, such as studies of the prevalence of pharmacogenomic variants. This genomic data has also been used to define the first Spanish Genome Reference Panel (SGRP1.0) for imputation. This is the first local repository of variability entirely produced by a crowdsourcing effort and constitutes an example for future initiatives to characterize local variability worldwide. CSVS is also part of the GA4GH Beacon network. CSVS can be accessed at: http://csvs.babelomics.org/.
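
The pseudo-control idea described above (pooling allele counts over every disease category except the ones being studied) can be illustrated with a toy calculation. This is a sketch under assumptions and not the CSVS interface: the function name, the category labels, and all counts are hypothetical.

```python
def pseudo_control_frequency(counts_by_category, excluded):
    """Pooled allele frequency over all categories that are not in `excluded`.

    counts_by_category maps a category name to a pair
    (alternate allele count, total alleles genotyped).
    Returns None when no alleles remain after the exclusion.
    """
    kept = [pair for cat, pair in counts_by_category.items() if cat not in excluded]
    alt = sum(a for a, _ in kept)
    total = sum(n for _, n in kept)
    return alt / total if total else None

# Hypothetical counts for a single variant, grouped by category.
counts = {
    "healthy": (4, 400),
    "category_A": (6, 200),  # the disease group under study, to be removed
    "category_B": (2, 200),
}

freq = pseudo_control_frequency(counts, excluded={"category_A"})
print(f"pseudo-control allele frequency: {freq:.3f}")
# prints "pseudo-control allele frequency: 0.010"
```

Removing the studied category keeps carriers of that disease from inflating the control frequency of a variant that may be associated with it.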


Subjects
Crowdsourcing, Genetic Databases, Population Genetics/methods, Human Genome, Software, Alleles, Chromosome Mapping, Exome, Gene Frequency, Genetic Variation, Genomics, Humans, Internet, Precision Medicine/methods, Spain
9.
IEEE J Biomed Health Inform; 24(7): 2119-2130, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31871000

ABSTRACT

Many clinical studies have revealed strong biological similarities among different skin pathological states. These similarities make the efficient diagnosis of skin cancer difficult and encourage the study and design of new intelligent clinical decision support systems. In this sense, gene expression analysis can help find differentially expressed genes (DEGs) that simultaneously discern multiple skin pathological states in a single test. The integration of multiple heterogeneous transcriptomic datasets requires different pipeline stages to be properly designed: from suitable batch merging and efficient biomarker selection to automated classification assessment. This article presents a novel approach addressing all these technical issues, with the intention of providing new insights into skin cancer diagnosis. Although further efforts will have to be made in the search for better biomarkers recognizing specific skin pathological states, our study found a panel of 8 highly relevant multiclass DEGs for discerning up to 10 skin pathological states: 2 a priori healthy skin conditions, 2 cataloged precancerous skin diseases and 6 cancerous skin states. Their diagnostic power over new samples was widely tested with previously well-trained classification models. Robust performance metrics such as overall and mean multiclass F1-score exceeded 94% and 80%, respectively. Clinicians should give special attention to the highlighted multiclass DEGs that show large gene expression changes among them, and understand their biological relationship to different skin pathological states.
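
The abstract reports both an overall and a mean multiclass F1-score. As a minimal sketch, assuming that "mean multiclass F1" corresponds to the usual macro average (the unweighted mean of per-class F1 values), the calculation looks like this; the labels and predictions are made up for illustration.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """One-vs-rest F1 for every label that appears in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Hypothetical three-state example.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "C", "C", "C"]

print(per_class_f1(y_true, y_pred))
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the per-class scores are 1.0, 2/3 and 0.8, so the macro average (about 0.822) sits below the plain accuracy of 5/6. A gap in the same direction would be one possible reading of the two different figures in the abstract, since the macro average weights rare, harder classes as heavily as common ones.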


Subjects
Computer-Assisted Diagnosis/methods, Gene Expression Profiling/methods, Machine Learning, RNA-Seq/methods, Skin Neoplasms/diagnosis, Tumor Biomarkers/analysis, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Computational Biology, Humans, Skin Neoplasms/genetics, Skin Neoplasms/metabolism
11.
Toxicol Appl Pharmacol; 311: 113-116, 2016 Nov 15.
Article in English | MEDLINE | ID: mdl-27720938

ABSTRACT

Erlotinib is an epidermal growth factor receptor (EGFR) tyrosine kinase inhibitor that has shown activity against pancreatic ductal adenocarcinoma (PDAC). The drug's most frequently reported side effect as a result of EGFR inhibition is skin rash (SR), a symptom which has been associated with a better therapeutic response to the drug. Gene expression profiling can be used as a tool to predict which patients will develop this important cutaneous manifestation. The aim of the present study was to identify which genes may influence the appearance of SR in PDAC patients. The study included 34 PDAC patients treated with erlotinib: 21 patients developed SR of any grade, while 13 patients did not (controls). Before any chemotherapy regimen was administered and before SR developed, we collected RNA from peripheral blood samples of all patients and studied the differential gene expression pattern using the Illumina microarray platform HumanHT-12 v4 Expression BeadChip. Seven genes (FAM46C, IFITM3, GMPR, DENND6B, SELENBP1, NOL10, and SIAH2), involved in different pathways including regulatory, migratory, and signalling processes, were downregulated in PDAC patients with SR. Our results suggest the existence of a gene expression profile significantly correlated with erlotinib-induced SR in PDAC that could be used as a prognostic indicator in these patients.


Subjects
Adenocarcinoma/drug therapy, Erlotinib Hydrochloride/adverse effects, Gene Expression Profiling, Pancreatic Neoplasms/drug therapy, Skin/drug effects, Aged, Female, Humans, Male, Middle Aged
12.
Biomed Res Int; 2015: 518284, 2015.
Article in English | MEDLINE | ID: mdl-26346854

ABSTRACT

The overall survival of patients with pancreatic ductal adenocarcinoma is extremely low. Although gemcitabine is the standard chemotherapy for this disease, clinical outcomes do not reflect significant improvements, even when it is combined with adjuvant treatments. Prognostic markers are urgently needed. The aim of this study was to analyze the potential value of serum cytokines to find a profile that can predict the clinical outcome in patients with pancreatic cancer and to establish a practical prognostic index that significantly predicts patients' outcomes. We conducted an extensive analysis of serum prognostic biomarkers using an antibody array comprising 507 human cytokines. Overall survival was estimated using the Kaplan-Meier method. Univariate and multivariate Cox proportional hazards models were used to analyze prognostic factors. To determine the extent to which survival could be predicted based on this index, we used leave-one-out cross-validation. The multivariate model showed a better performance and could represent a novel panel of serum cytokines that correlates with poor prognosis in pancreatic cancer. B7-1/CD80, EG-VEGF/PK1, IL-29, NRG1-beta1/HRG1-beta1, and PD-ECGF expressions portend a poor prognosis for patients with pancreatic cancer, and these cytokines could represent novel therapeutic targets for this disease.
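
For readers unfamiliar with the Kaplan-Meier method mentioned above, the product-limit estimate can be computed in a few lines. The sketch below is a generic textbook version, not the authors' analysis code, and the follow-up times and event indicators are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death was observed.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up (months) for seven patients; 0 marks a censored observation.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 0, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

Censored patients still count as at risk up to their last follow-up time, which is what lets the estimator use incomplete observations instead of discarding them.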


Subjects
Pancreatic Ductal Carcinoma/blood, Pancreatic Ductal Carcinoma/mortality, Cytokines/blood, Pancreatic Neoplasms/blood, Pancreatic Neoplasms/mortality, Adult, Aged, Disease-Free Survival, Female, Humans, Male, Middle Aged, Survival Rate
14.
Pancreas; 43(7): 1042-9, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24979617

ABSTRACT

OBJECTIVE: Pancreatic ductal adenocarcinoma is a deadly disease because of late diagnosis and chemoresistance. We aimed to find a panel of serum cytokines representing diagnostic and predictive biomarkers for pancreatic cancer. METHODS: A cytokine antibody array was performed to simultaneously identify 507 cytokines in sera of patients with pancreatic cancer and healthy controls. The nonparametric Mann-Whitney U test was used to pairwise compare the controls, the pretreated patients, and the posttreated patients. Fold changes greater than or equal to 1.5 or less than or equal to 1/1.5 were considered significant. Receiver operating characteristic curves were used to assess the performance of the model. A leave-one-out cross-validation was used for estimating prediction error. RESULTS: Comparing the sera of pretreated patients against the control samples, the cytokines fibroblast growth factor 10 (FGF-10)/keratinocyte growth factor-2 (KGF-2), interferon inducible T cell alpha chemokine (I-TAC)/chemokine [C-X-C motif] ligand 11 (CXCL11), oncostatin M (OSM), osteoactivin/glycoprotein nonmetastatic melanoma protein B, and stem cell factor (SCF) were found to be significantly overexpressed. In addition, the cytokines CD30 ligand/tumor necrosis factor superfamily, member 8 (TNFSF8), chordin-like 2, FGF-10/KGF-2, growth/differentiation factor 15, I-TAC/CXCL11, OSM, and SCF were differentially expressed in response to treatment. CONCLUSIONS: We propose a role for FGF-10/KGF-2, I-TAC/CXCL11, OSM, osteoactivin/glycoprotein nonmetastatic melanoma protein B, and SCF as novel diagnostic biomarkers. CD30 ligand/TNFSF8, chordin-like 2, FGF-10/KGF-2, growth/differentiation factor 15, I-TAC/CXCL11, OSM, and SCF might represent predictive biomarkers for gemcitabine and erlotinib response in patients with pancreatic cancer.
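
The two screening criteria in the METHODS (a fold change of at least 1.5 in either direction, and a Mann-Whitney U comparison) can be sketched as follows. The abstract does not say how the fold change was computed, so the ratio of group means used here is an assumption, and the signal values are invented. Only the U statistic is computed; a p-value would additionally require its null distribution, for example from a statistics library.

```python
from statistics import mean

def fold_change_flag(case, control, threshold=1.5):
    """Return (fold change, flag); flag is True when FC >= threshold or FC <= 1/threshold.

    Assumption: fold change is the ratio of group means.
    """
    fc = mean(case) / mean(control)
    return fc, (fc >= threshold or fc <= 1 / threshold)

def mann_whitney_u(a, b):
    """U statistic for sample a: each pair with a > b counts 1, each tie counts 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical array signal intensities for one cytokine.
control = [10.0, 12.0, 11.0, 9.0]
pretreated = [18.0, 20.0, 17.0, 21.0]

fc, flagged = fold_change_flag(pretreated, control)
u = mann_whitney_u(pretreated, control)
print(f"fold change = {fc:.2f}, passes threshold: {flagged}")
print(f"U = {u} out of a maximum of {len(pretreated) * len(control)}")
```

Using the symmetric pair of cutoffs (1.5 and 1/1.5) treats over- and under-expression equally, since a halving and a doubling are the same distance from 1 on a ratio scale.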


Subjects
Tumor Biomarkers/blood, Pancreatic Ductal Carcinoma/blood, Cytokines/blood, Neoplasm Proteins/blood, Pancreatic Neoplasms/blood, Aged, Neoplasm Antigens/blood, Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Area Under Curve, CA-19-9 Antigen/blood, Carcinoembryonic Antigen/blood, Pancreatic Ductal Carcinoma/drug therapy, Pancreatic Ductal Carcinoma/epidemiology, Comorbidity, Deoxycytidine/administration & dosage, Deoxycytidine/analogs & derivatives, Type 2 Diabetes Mellitus/epidemiology, Erlotinib Hydrochloride, Female, Humans, Male, Middle Aged, Pancreatic Neoplasms/drug therapy, Pancreatic Neoplasms/epidemiology, Predictive Value of Tests, Quinazolines/administration & dosage, ROC Curve, Sensitivity and Specificity, Smoking/epidemiology, Tumor Microenvironment, Gemcitabine
15.
Bioinformatics; 29(17): 2112-21, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-23793754

ABSTRACT

MOTIVATION: Multiple sequence alignments (MSAs) are widely used in bioinformatics as a basis for other tasks such as structure prediction, biological function analysis or phylogenetic modeling. However, current tools usually provide only partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree on the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores that include further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. RESULTS: The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was assessed on the BAliBASE benchmark, with significance established by the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas its results are not significantly different from those of 3D-COFFEE (P > 0.05), with the advantage of being able to use fewer structures. Structural information is included within the objective function to evaluate the obtained alignments more accurately. AVAILABILITY: The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.
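
The core of non-dominated sorting is the Pareto dominance test over the objectives being optimized. The sketch below extracts only the first non-dominated front from a handful of candidates; it is not the MO-SAStrE source code, the candidate scores are invented, and it assumes that all three objectives (STRIKE score, non-gaps percentage, totally conserved columns) are to be maximized.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one.

    Assumption: every objective is maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Names of the candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]

# Hypothetical (STRIKE score, non-gaps percentage, totally conserved columns) per alignment.
alignments = {
    "aln_1": (2.1, 90.0, 12),
    "aln_2": (2.4, 85.0, 10),
    "aln_3": (2.0, 88.0, 11),  # worse than aln_1 on all three objectives
    "aln_4": (2.4, 85.0, 9),   # tied with aln_2 on two objectives, worse on the third
}

front = pareto_front(alignments)
print(front)  # prints ['aln_1', 'aln_2']
```

Neither aln_1 nor aln_2 beats the other on every objective, so both survive. A multiobjective genetic algorithm keeps such trade-off solutions together instead of collapsing the objectives into a single weighted score.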


Subjects
Algorithms, Sequence Alignment/methods, Protein Sequence Analysis, Protein Databases, Phylogeny, Protein Conformation, Proteins/classification
16.
BMC Bioinformatics; 14: 113, 2013 Mar 27.
Article in English | MEDLINE | ID: mdl-23537461

ABSTRACT

BACKGROUND: Scientists reading a biomedical abstract often want to search bibliographic databases for topic-related documents. Such a query is challenging because a single abstract carries little information, whereas classification-based retrieval algorithms are optimally trained with large sets of relevant documents. As a solution to this problem, we propose a query expansion method that extends the information related to a manuscript using its cited references. RESULTS: Data on cited references and text sections in 249,108 full-text biomedical articles were extracted from the Open Access subset of the PubMed Central® database (PMC-OA). Of the five standard sections of a scientific article, the Introduction and Discussion sections contained most of the citations (mean = 10.2 and 9.9 citations, respectively). A large proportion of articles (98.4%) and their cited references (79.5%) were indexed in the PubMed® database. Using the MedlineRanker abstract classification tool, cited references allowed accurate retrieval of the citing document in a test set of 10,000 documents and also of documents related to six biomedical topics defined by particular MeSH® terms from the entire PMC-OA (p-value<0.01). Classification performance was sensitive to the topic and also to the text sections from which the references were selected. Classifiers trained on the baseline (i.e., only text from the query document and not from the references) were outperformed in almost all cases. The best performance was often obtained when using all cited references, though using the references from the Introduction and Discussion sections led to similarly good results. This query expansion method performed significantly better than pseudo relevance feedback in 4 out of 6 topics. CONCLUSIONS: The retrieval of documents related to a single document can be significantly improved by using the references cited by this document (p-value<0.01). Using references from the Introduction and Discussion performs almost as well as using all references, which might be useful for methods that require reduced datasets due to computational limitations. Cited references from particular sections might not be appropriate for all topics. Our method could be a better alternative to pseudo relevance feedback, though it is limited by full-text availability.


Subjects
Data Mining/methods, PubMed, Algorithms, Medical Subject Headings
17.
Nucleic Acids Res; 41(1): e26, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-23066102

ABSTRACT

Multiple sequence alignments (MSAs) have become one of the most studied approaches in bioinformatics, supporting other important tasks such as structure prediction, biological function analysis or next-generation sequencing. However, current MSA algorithms do not always provide consistent solutions, since alignments become increasingly difficult when dealing with low-similarity sequences. As is widely known, these algorithms depend directly on specific features of the sequences, which significantly influence alignment accuracy. Many MSA tools have been designed recently, but it is not possible to know in advance which one is the most suitable for a particular set of sequences. In this work, we analyze some of the most used algorithms in the literature and their dependence on several features. A novel intelligent algorithm based on least squares support vector machines is then developed to predict how accurate each alignment could be, depending on its analyzed features. This algorithm is applied to a dataset of 2180 MSAs. The proposed system first estimates the accuracy of possible alignments. The most promising methodologies are then selected in order to align each set of sequences. Since only one selected algorithm is run, the computational time is not excessively increased.


Subjects
Sequence Alignment/methods, Support Vector Machine, Genetic Databases, Least-Squares Analysis, Reproducibility of Results, Protein Sequence Analysis