1.
BMC Bioinformatics ; 24(1): 17, 2023 Jan 16.
Article in English | MEDLINE | ID: mdl-36647008

ABSTRACT

Colorectal cancer (CRC) is the third most common cancer and the second deadliest worldwide. It is a very heterogeneous disease that can develop via distinct pathways, and metastasis is the primary cause of death. Therefore, it is crucial to understand the molecular mechanisms underlying metastasis. RNA sequencing is an essential tool for studying the transcriptional landscape. However, the high dimensionality of gene expression data makes selecting novel metastatic biomarkers problematic. To distinguish early-stage CRC patients at risk of developing metastasis from those who are not, three types of binary classification approaches were used: (1) classification methods (decision trees, linear and radial kernel support vector machines, logistic regression, and random forest) using differentially expressed genes (DEGs) as input features; (2) regularized logistic regression based on the Elastic Net penalty and on the proposed iTwiner, a network-based regularizer accounting for gene correlation information; and (3) classification methods based on the genes pre-selected using regularized logistic regression. Classifiers using the DEGs as features showed similar results, with random forest showing the highest accuracy. Using regularized logistic regression on the full dataset yielded no improvement in accuracy. Further classification using the genes pre-selected under different penalty factors, instead of the DEGs, significantly improved the accuracy of the binary classifiers. Moreover, using network-based correlation information (iTwiner) for gene selection produced the best classification results and more stable and robust gene sets. Some of the selected genes are known to be tumor suppressor genes (OPCML-IT2), to be related to resistance to cancer therapies (RAC1P3), or to be involved in several cancer processes such as genome stability (XRCC6P2), tumor growth and metastasis (MIR602), and regulation of gene transcription (NME2P2).
We show that the classification of CRC patients based on pre-selected features by regularized logistic regression is a valuable alternative to using DEGs, significantly increasing the models' predictive performance. Moreover, the use of correlation-based penalization for biomarker selection stands as a promising strategy for predicting patients' groups based on RNA-seq data.
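Approach (2) above can be illustrated, in its simplest form, as logistic regression with a per-feature weighted Elastic Net penalty fitted by proximal gradient descent. This is a minimal sketch under stated assumptions: the function names, step size, toy data, and the solver itself are illustrative choices, and `penalty_factors` only stands in for the penalty factors mentioned in the abstract; how iTwiner derives those factors from gene correlations is not reproduced here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrinks z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_enet_logistic(X, y, lam=0.1, alpha=0.5, penalty_factors=None,
                      step=0.1, n_iter=2000):
    """Proximal gradient (ISTA) fit of logistic regression with the weighted
    elastic-net penalty
        lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j**2).
    The intercept b0 is left unpenalized (hypothetical helper)."""
    n, p = len(X), len(X[0])
    w = penalty_factors if penalty_factors is not None else [1.0] * p
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean negative log-likelihood.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= step * g0
        for j in range(p):
            # The smooth ridge term goes into the gradient step ...
            z = b[j] - step * (g[j] + lam * (1 - alpha) * w[j] * b[j])
            # ... and the lasso term is applied by soft-thresholding.
            b[j] = soft_threshold(z, step * lam * alpha * w[j])
    return b0, b


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 0.3], [2.0, -0.1], [-1.0, 0.2], [-2.0, -0.4]]
y = [1, 1, 0, 0]
# A very large penalty factor effectively excludes feature 1.
b0, b = fit_enet_logistic(X, y, penalty_factors=[1.0, 1e6])
print(b0, b)
```

Lowering a gene's penalty factor makes it cheaper to keep in the model, which is the lever a correlation-aware regularizer can pull.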


Subject(s)
Colorectal Neoplasms , Humans , Biomarkers , Logistic Models , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Cell Adhesion Molecules , GPI-Linked Proteins
2.
Brief Bioinform ; 22(1): 77-87, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32597465

ABSTRACT

The development of new molecular and cell technologies is having a significant impact on the quantity of data generated nowadays. The growth of omics databases is creating considerable potential for knowledge discovery and, concomitantly, is bringing new challenges to statistical learning and computational biology for health applications. Indeed, the high dimensionality of these data may hamper the use of traditional regression methods and parameter estimation algorithms due to the intrinsic non-identifiability of the inherent optimization problem. Regularized optimization has emerged as a promising and useful strategy to solve these ill-posed problems by imposing additional constraints on the solution parameter space. In particular, the field of statistical learning with sparsity has contributed significantly to building accurate models that also bring interpretability to biological observations and phenomena. Beyond the now-classic elastic net, one of the best-known methods combining lasso and ridge penalizations, we briefly overview recent literature on structured regularizers and penalty functions that have been applied to biomedical data to build parsimonious models in a variety of underlying contexts, from survival to generalized linear models. These methods include functions of $\ell_k$-norms and network-based penalties that take into account the inherent relationships between the features. The successful application to omics data illustrates the potential of sparse structured regularization for identifying molecular signatures of disease and for creating high-performance clinical decision support systems towards more personalized healthcare. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
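To make the two penalty families concrete, the sketch below computes the elastic net penalty and one common network-based penalty, $\sum_{(i,j)} w_{ij}(\beta_i - \beta_j)^2$, and checks that the latter equals $\beta^\top L \beta$ for the unnormalized graph Laplacian $L = D - W$. This is an illustrative assumption about the form of the network penalty (the review covers several variants, some with degree normalization), and the coefficients and edges are hypothetical.

```python
def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2_sq)


def network_penalty(beta, edges):
    """Sum over weighted edges (i, j, w) of w * (beta_i - beta_j)^2:
    connected features are pushed toward similar coefficients."""
    return sum(w * (beta[i] - beta[j]) ** 2 for i, j, w in edges)


def laplacian(p, edges):
    """Unnormalized graph Laplacian L = D - W for p features."""
    L = [[0.0] * p for _ in range(p)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L


def quadratic_form(beta, M):
    """beta^T M beta."""
    p = len(beta)
    return sum(beta[i] * M[i][j] * beta[j] for i in range(p) for j in range(p))


beta = [1.0, 0.5, -2.0]
edges = [(0, 1, 1.0), (1, 2, 0.5)]  # hypothetical gene-gene network
print(elastic_net_penalty(beta, lam=1.0, alpha=0.5))  # 3.0625
print(network_penalty(beta, edges))                   # 3.375
print(quadratic_form(beta, laplacian(3, edges)))      # 3.375 as well
```

The Laplacian form is what lets such penalties be folded into standard quadratic optimization.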


Subject(s)
Computational Biology/methods , Algorithms , Decision Support Systems, Clinical/standards , Humans , Precision Medicine/methods
3.
J Biomed Inform ; 140: 104328, 2023 04.
Article in English | MEDLINE | ID: mdl-36924843

ABSTRACT

In the healthcare sector, resorting to big data and advanced analytics is a great advantage when dealing with patient groups that are complex in terms of comorbidities, and represents a significant step towards personalized targeting. In this work, we focus on understanding key features and clinical pathways of patients with multimorbidity suffering from Dementia. This disease can result from many heterogeneous factors and may become more prevalent as the population ages. We present a set of methods to identify medical appointment patterns within a cohort of 1924 patients followed from January 2007 to August 2021 at Hospital da Luz (Lisbon), and to stratify patients into subgroups that exhibit similar patterns of interaction. Using Markov chains, we identify the most prevalent medical appointments attended by Dementia patients, as well as recurring transitions between them. To perform patient stratification, we applied AliClu, a temporal sequence alignment algorithm for clustering longitudinal clinical data, which successfully identified patient subgroups with similar medical appointment activity. A per-cluster feature analysis then identifies distinct patterns and characteristics. This pipeline provides a tool to identify the prevailing clinical pathways of medical appointments within the dataset, as well as the most common transitions between medical specialities among Dementia patients. This methodology, alongside demographic and clinical data, has the potential to provide early signalling of the most likely clinical pathways and to serve as a support tool for health providers in deciding the best course of treatment, considering the patient as a whole.
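The Markov chain step described above amounts to estimating first-order transition probabilities between appointment types from each patient's visit sequence. A minimal sketch, assuming appointments are already coded as specialty names; the sequences and specialty labels below are hypothetical, not data from the study.

```python
from collections import Counter, defaultdict


def transition_matrix(sequences):
    """Estimate first-order Markov transition probabilities
    P(next = b | current = a) by counting consecutive pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}


# Hypothetical per-patient appointment sequences (specialty names only).
visits = [
    ["Neurology", "Psychiatry", "Neurology"],
    ["Neurology", "Cardiology"],
    ["Cardiology", "Neurology", "Psychiatry"],
]
P = transition_matrix(visits)
for src, row in P.items():
    best = max(row, key=row.get)
    print(f"{src} -> {best} (p = {row[best]:.2f})")
```

Each row of `P` sums to 1, and its largest entry gives the most frequent next appointment from that specialty.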


Subject(s)
Dementia , Multimorbidity , Humans , Markov Chains , Comorbidity , Algorithms , Dementia/diagnosis
4.
J Math Biol ; 83(4): 39, 2021 09 22.
Article in English | MEDLINE | ID: mdl-34553267

ABSTRACT

Bone is constantly being renewed: in the adult skeleton, bone resorption and formation are in a tightly coupled balance, allowing a constant bone density to be maintained. Yet this microenvironment provides the necessary conditions for the growth and proliferation of tumor cells, and thus bone is a common site for the development of metastases, mainly from primary breast and prostate cancer. Mathematical and computational models with differential equations can replicate this bone remodeling process. These models have been extended to include the effects of disruptive tumor pathologies on bone dynamics, as metastases contribute to the decoupling between bone resorption and formation and to the self-perpetuating tumor growth cycle. Such models may also contemplate the counteracting effects of currently used therapies and, in the case of drug treatments, their pharmacokinetics and pharmacodynamics. We present a thorough overview of biochemical models for bone remodeling in the presence of a tumor, together with anti-cancer and anti-resorptive therapy, formulated as systems of first-order differential equations or simplified using variable-order derivatives. The latter models, some of which are new to this paper, result in equations with fewer parameters and allow anomalous diffusion processes to be accounted for. In this way, more compact and parsimonious models that promptly highlight tumor-bone interactions are achieved, providing an effective framework to counteract the loss of bone integrity in the affected areas.
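As an example of the first-order systems surveyed above, the sketch integrates a Komarova-type power-law model of osteoclast (C) and osteoblast (B) populations with forward Euler. It is a minimal, tumor-free illustration: the parameter values are assumptions chosen for demonstration, not values taken from the review, and variable-order derivatives are not covered.

```python
def komarova_rhs(C, B, p):
    """Power-law osteoclast/osteoblast model:
    dC/dt = a1 * C^g11 * B^g21 - b1 * C
    dB/dt = a2 * C^g12 * B^g22 - b2 * B"""
    dC = p["a1"] * C ** p["g11"] * B ** p["g21"] - p["b1"] * C
    dB = p["a2"] * C ** p["g12"] * B ** p["g22"] - p["b2"] * B
    return dC, dB


def simulate(C0, B0, p, dt=0.01, t_end=100.0):
    """Forward-Euler integration; returns a list of (t, C, B)."""
    steps = int(round(t_end / dt))
    C, B = C0, B0
    traj = [(0.0, C, B)]
    for k in range(1, steps + 1):
        dC, dB = komarova_rhs(C, B, p)
        C, B = C + dt * dC, B + dt * dB
        traj.append((k * dt, C, B))
    return traj


# Illustrative parameter values (assumed for this sketch).
params = dict(a1=3.0, a2=4.0, b1=0.2, b2=0.02,
              g11=0.5, g21=-0.5, g12=1.0, g22=0.0)
# Start above the steady state to trigger a remodeling cycle.
traj = simulate(C0=11.0, B0=212.0, p=params)
print(traj[-1])
```

With these values the steady state is C = 15/sqrt(200) ≈ 1.06 and B = 200·C ≈ 212. A tumor term would enter as extra factors on the production and removal rates.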


Subject(s)
Bone Neoplasms , Prostatic Neoplasms , Bone Neoplasms/drug therapy , Bone Remodeling , Humans , Male , Radiopharmaceuticals , Tumor Microenvironment
5.
BMC Bioinformatics ; 21(1): 59, 2020 Feb 18.
Article in English | MEDLINE | ID: mdl-32070274

ABSTRACT

BACKGROUND: Understanding cellular and molecular heterogeneity in glioblastoma (GBM), the most common and aggressive primary brain malignancy, is a crucial step towards the development of effective therapies. Besides inter-patient variability, the presence of multiple cell populations within tumors calls for modeling strategies able to extract the molecular signatures driving tumor evolution and treatment failure. With the advances in single-cell RNA sequencing (scRNA-Seq), tumors can now be dissected at the cell level, unveiling information from their life history to their clinical implications. RESULTS: We propose a classification setting based on GBM scRNA-Seq data, through sparse logistic regression, where different cell populations (neoplastic and normal cells) are taken as classes. The goal is to identify gene features discriminating between the classes, but also those shared by different neoplastic clones. The latter are approached via the network-based twiner regularizer to identify gene signatures shared by neoplastic cells from the tumor core and infiltrating neoplastic cells originating from the tumor periphery, as putative disease biomarkers to target multiple neoplastic clones. Our analysis is supported by the literature through the identification of several known molecular players in GBM. Moreover, the relevance of the selected genes was confirmed by their significance for survival outcomes in bulk GBM RNA-Seq data, as well as by their association with several Gene Ontology (GO) biological process terms. CONCLUSIONS: We present a methodology intended to identify genes discriminating between GBM clones, but also those playing a similar role in different GBM neoplastic clones (including migrating cells), and therefore potential targets for therapy research. Our results contribute to a deeper understanding of the genetic features behind GBM, disclosing novel therapeutic directions that account for GBM heterogeneity.


Subject(s)
Brain Neoplasms/genetics , Glioblastoma/genetics , RNA-Seq , Brain Neoplasms/metabolism , Classification/methods , Gene Ontology , Glioblastoma/metabolism , Humans , Single-Cell Analysis
6.
BMC Bioinformatics ; 21(1): 69, 2020 Feb 24.
Article in English | MEDLINE | ID: mdl-32093622

ABSTRACT

BACKGROUND: In this paper, we explore multi-objective optimization in metabolic engineering when the model involves both continuous and integer decision variables. In particular, we propose a multi-objective model that may be used to suggest reaction deletions that maximize and/or minimize several functions simultaneously. Applications include, among others, the concurrent maximization of a bioproduct and of biomass, or the maximization of a bioproduct while minimizing the formation of a given by-product, two common requirements in microbial metabolic engineering. RESULTS: Ethanol production by the widely used cell factory Saccharomyces cerevisiae was adopted as a case study to demonstrate the usefulness of the proposed approach in identifying genetic manipulations that improve the productivity and yield of this economically highly relevant bioproduct. An in vivo validation showed that some of the predicted deletions exhibit increased ethanol levels compared with the wild-type strain. CONCLUSIONS: The multi-objective programming framework we developed, called MOMO, is open-source and uses POLYSCIP (available at http://polyscip.zib.de/) as the underlying multi-objective solver. MOMO is available at http://momo-sysbio.gforge.inria.fr.
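The core notion behind multi-objective selection of deletions is Pareto dominance: a candidate is kept only if no other candidate is at least as good in every objective and strictly better in one. The sketch below filters a small, already-enumerated list of hypothetical deletions. It does not reproduce MOMO, which formulates the problem as a mixed-integer program solved by POLYSCIP; the candidate names and objective values are invented for illustration.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates not dominated by any other candidate."""
    return [name for name, obj in candidates.items()
            if not any(dominates(other, obj)
                       for o_name, other in candidates.items() if o_name != name)]


# Hypothetical (ethanol flux, biomass flux) predictions for reaction deletions.
candidates = {
    "wild_type": (1.0, 1.0),
    "del_A": (1.4, 0.8),
    "del_B": (1.2, 0.7),
    "del_C": (0.9, 1.1),
}
print(pareto_front(candidates))  # ['wild_type', 'del_A', 'del_C']
```

To minimize an objective, such as a by-product, negate it before the comparison.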


Subject(s)
Metabolic Engineering/methods , Software , Biomass , Ethanol/metabolism , Models, Biological , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
7.
BMC Bioinformatics ; 20(1): 356, 2019 Jun 25.
Article in English | MEDLINE | ID: mdl-31238876

ABSTRACT

BACKGROUND: Breast and prostate cancers are typical examples of hormone-dependent cancers, showing remarkable similarities at the level of hormone-related signaling pathways and exhibiting a high tropism to bone. While the identification of genes playing a specific role in each cancer type brings invaluable insights for gene therapy research by targeting disease-specific cell functions not accounted for so far, identifying a gene signature common to breast and prostate cancers could unravel new targets to tackle shared hormone-dependent disease features, like bone relapse. This would potentially allow the development of new targeted therapies directed at genes regulating both cancer types, with a consequent positive impact on cancer management and health economics. RESULTS: We address the challenge of extracting gene signatures from transcriptomic data of prostate adenocarcinoma (PRAD) and breast invasive carcinoma (BRCA) samples, particularly estrogen receptor-positive (ER+) BRCA and androgen receptor-positive (AR+) triple-negative breast cancer (TNBC), using sparse logistic regression. The introduction of gene network information based on the distances between BRCA and PRAD correlation matrices is investigated through the proposed twin networks recovery (twiner) penalty, as a strategy to ensure that gene features with similar correlation patterns in the two diseases are less penalized during the feature selection procedure. CONCLUSIONS: Our analysis led to the identification of genes that show a similar correlation pattern in BRCA and PRAD transcriptomic data, are selected as key players in the classification of breast and prostate samples into ER+ BRCA/AR+ TNBC/PRAD tumor and normal tissues, and are also associated with survival time distributions. The results obtained are supported by the literature and are expected to unveil the similarities between the diseases, disclose common disease biomarkers, and help define new strategies for more effective therapies.
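The twiner idea described above (genes whose correlation pattern is similar in both diseases are penalized less) can be sketched by comparing each gene's row in the two correlation matrices and turning the discrepancy into a penalty factor. This is a hypothetical reconstruction under stated assumptions: the use of angular distance, the normalization by the maximum, and the toy expression matrices are choices made for this sketch, and the exact twiner definition is given in the paper.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (assumed non-constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def correlation_matrix(X):
    """Gene-by-gene correlation matrix; X is a list of samples x genes."""
    genes = list(zip(*X))
    return [[pearson(gi, gj) for gj in genes] for gi in genes]


def angular_distance(u, v):
    """Angle between two vectors, scaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm))) / math.pi


def twiner_like_weights(X1, X2):
    """Hypothetical per-gene penalty factors: genes whose correlation profile
    differs more between the two datasets get larger weights (max = 1)."""
    R1, R2 = correlation_matrix(X1), correlation_matrix(X2)
    d = [angular_distance(r1, r2) for r1, r2 in zip(R1, R2)]
    m = max(d)
    return [x / m for x in d] if m > 0 else [0.0] * len(d)


# Hypothetical expression matrices (4 samples x 3 genes) for two diseases.
X_brca = [[1, 2, 3], [2, 4, 1], [3, 6, 2], [4, 8, 5]]
X_prad = [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 0]]
print(twiner_like_weights(X_brca, X_prad))
```

Weights like these would be passed as per-gene penalty factors to a sparse logistic regression, so that genes with similar correlation patterns in both diseases are penalized less.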


Subject(s)
Gene Expression Profiling/methods , Prostatic Neoplasms/genetics , Transcriptome , Triple Negative Breast Neoplasms/genetics , Estrogens/metabolism , Female , Gene Regulatory Networks , Humans , Logistic Models , Male , Principal Component Analysis , Prostatic Neoplasms/mortality , Prostatic Neoplasms/pathology , Receptors, Androgen/metabolism , Survival Analysis , Triple Negative Breast Neoplasms/mortality , Triple Negative Breast Neoplasms/pathology
8.
BMC Med Inform Decis Mak ; 19(1): 289, 2019 12 30.
Article in English | MEDLINE | ID: mdl-31888660

ABSTRACT

BACKGROUND: Patient stratification is a critical task in clinical decision making, since it can allow physicians to choose treatments in a personalized way. Given the increasing availability of electronic medical records (EMRs) with longitudinal data, one crucial problem is how to efficiently cluster patients based on the temporal information from medical appointments. In this work, we propose applying the Temporal Needleman-Wunsch (TNW) algorithm to align discrete sequences with the transition time information between symbols. These symbols may correspond to a patient's current therapy, their overall health status, or any other discrete state. The transition time information represents the duration of each of those states. The obtained TNW pairwise scores are then used to perform hierarchical clustering. To find the best number of clusters and assess their stability, a resampling technique is applied. RESULTS: We propose AliClu, a novel tool for clustering temporal clinical data based on the TNW algorithm coupled with clustering validity assessments through bootstrapping. AliClu was applied to the analysis of rheumatoid arthritis EMRs obtained from the Portuguese database of rheumatologic patient visits (Reuma.pt). In particular, AliClu was used to analyse therapy switches, which were coded as letters corresponding to biologic drugs and included their durations before each change occurred. The optimized clusters obtained allow one to stratify patients based on their temporal therapy profiles and support the identification of common features for those groups. CONCLUSIONS: AliClu is a promising computational strategy to analyse longitudinal patient data by providing validated clusters and by unravelling the patterns that exist in clinical outcomes. Patient stratification is performed in an automatic or semi-automatic way, allowing one to tune the alignment, clustering, and validation parameters.
AliClu is freely available at https://github.com/sysbiomed/AliClu.
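The alignment step can be illustrated with a simplified Needleman-Wunsch global alignment over (symbol, duration) pairs, where aligned symbols are scored as a match or mismatch and then penalized by the difference in their durations. This sketch is in the spirit of TNW only: the published TNW scoring of transition times differs in its details, and the parameters `match`, `mismatch`, `gap`, and `t_weight`, like the therapy histories, are hypothetical.

```python
def tnw_score(s1, s2, match=1.0, mismatch=-1.0, gap=-1.0, t_weight=0.25):
    """Global alignment score of two sequences of (symbol, duration) pairs.
    Aligned pairs score match/mismatch minus t_weight * |duration difference|;
    each gap costs a fixed penalty. Simplified stand-in for TNW scoring."""
    n, m = len(s1), len(s2)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        a, ta = s1[i - 1]
        for j in range(1, m + 1):
            b, tb = s2[j - 1]
            sub = (match if a == b else mismatch) - t_weight * abs(ta - tb)
            F[i][j] = max(F[i - 1][j - 1] + sub,  # align a with b
                          F[i - 1][j] + gap,      # gap in s2
                          F[i][j - 1] + gap)      # gap in s1
    return F[n][m]


# Hypothetical therapy histories: (biologic drug code, months on it).
p1 = [("A", 6), ("B", 12)]
p2 = [("A", 6), ("B", 10)]
p3 = [("C", 3)]
print(tnw_score(p1, p2), tnw_score(p1, p3))
```

Pairwise scores like these would then be converted to distances and fed to hierarchical clustering, as the abstract describes.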


Subject(s)
Algorithms , Cluster Analysis , Electronic Health Records , Humans , Longitudinal Studies , Time Factors
9.
BMC Med Inform Decis Mak ; 19(1): 13, 2019 01 17.
Article in English | MEDLINE | ID: mdl-30654776

ABSTRACT

BACKGROUND: Joint models (JM) have emerged as a promising statistical framework to concurrently analyse survival data and multiple longitudinal responses. This is particularly relevant in clinical studies where the goal is to estimate the association between time-to-event data and biomarker evolution. In the context of oncological data, JM can indeed provide interesting prognostic markers for the event under study and thus support clinical decisions and treatment choices. However, several problems arise when dealing with this type of data, such as the high dimensionality of the covariate space, the lack of knowledge about the functional form of the time series, and the presence of missing data, all of which may hamper accurate estimation of the JM. METHODS: We propose to apply JM to the analysis of patients with bone metastases and to infer the association of their survival with several covariates, in particular the dynamics of N-telopeptide of type I collagen (NTX). This biomarker has been identified as a relevant prognostic factor in patients with metastatic cancer, but only using static information at specific time points. RESULTS: We extended this analysis using the full NTX time series for a larger cohort of patients with bone metastasis, and compared the results obtained by the JM and the extended Cox regression model. Imputation based on fuzzy clustering was used to deal with missing values, and several functions for NTX evolution were compared, such as rational, exponential and cubic splines. CONCLUSIONS: The fitted JMs confirm the association between NTX values and patients' response, attesting to the importance of this time series, and additionally provide a deeper understanding of the key survival covariates.


Subject(s)
Biomarkers, Tumor/metabolism , Bone Neoplasms/metabolism , Collagen Type I/metabolism , Models, Theoretical , Peptides/metabolism , Survival Analysis , Bone Neoplasms/secondary , Humans , Longitudinal Studies
10.
BMC Bioinformatics ; 19(1): 168, 2018 05 04.
Article in English | MEDLINE | ID: mdl-29728051

ABSTRACT

BACKGROUND: Learning accurate models from 'omics data is bringing many challenges due to their inherent high-dimensionality, e.g. the number of gene expression variables, and comparatively lower sample sizes, which leads to ill-posed inverse problems. Furthermore, the presence of outliers, either experimental errors or interesting abnormal clinical cases, may severely hamper a correct classification of patients and the identification of reliable biomarkers for a particular disease. We propose to address this problem through an ensemble classification setting based on distinct feature selection and modeling strategies, including logistic regression with elastic net regularization, Sparse Partial Least Squares - Discriminant Analysis (SPLS-DA) and Sparse Generalized PLS (SGPLS), coupled with an evaluation of the individuals' outlierness based on the Cook's distance. The consensus is achieved with the Rank Product statistics corrected for multiple testing, which gives a final list of sorted observations by their outlierness level. RESULTS: We applied this strategy for the classification of Triple-Negative Breast Cancer (TNBC) RNA-Seq and clinical data from the Cancer Genome Atlas (TCGA). The detected 24 outliers were identified as putative mislabeled samples, corresponding to individuals with discrepant clinical labels for the HER2 receptor, but also individuals with abnormal expression values of ER, PR and HER2, contradictory with the corresponding clinical labels, which may invalidate the initial TNBC label. Moreover, the model consensus approach leads to the selection of a set of genes that may be linked to the disease. These results are robust to a resampling approach, either by selecting a subset of patients or a subset of genes, with a significant overlap of the outlier patients identified. 
CONCLUSIONS: The proposed ensemble outlier detection approach constitutes a robust procedure to identify abnormal cases and consensus covariates, which may improve biomarker selection for precision medicine applications. The method can also be easily extended to other regression models and datasets.
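The consensus step can be illustrated with the rank product itself: each method ranks the observations by outlierness (here, hypothetical Cook's distances), and each observation's ranks are combined by a geometric mean, so that small values flag observations consistently ranked as outlying. This sketch covers only the rank product; the multiple-testing correction used in the paper is not reproduced, and the scores are invented.

```python
import math


def ranks_desc(scores):
    """Rank observations by decreasing score (1 = most outlying); ties by index."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def rank_product(score_lists):
    """Geometric mean of each observation's ranks across methods."""
    rank_lists = [ranks_desc(s) for s in score_lists]
    k = len(rank_lists)
    return [math.prod(rl[i] for rl in rank_lists) ** (1.0 / k)
            for i in range(len(rank_lists[0]))]


# Hypothetical Cook's distances for 4 patients under three models.
cook = [
    [0.10, 2.50, 0.30, 0.20],  # elastic-net logistic regression
    [0.20, 1.90, 0.10, 0.40],  # SPLS-DA
    [0.05, 3.00, 0.20, 0.10],  # SGPLS
]
rp = rank_product(cook)
order = sorted(range(len(rp)), key=rp.__getitem__)
print(order)  # patients ordered from most to least outlying by consensus
```

`math.prod` requires Python 3.8 or later.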


Subject(s)
Triple Negative Breast Neoplasms/genetics , Whole Genome Sequencing/methods , Female , Humans , Sample Size , Triple Negative Breast Neoplasms/pathology
11.
BMC Bioinformatics ; 17(Suppl 16): 449, 2016 Dec 13.
Article in English | MEDLINE | ID: mdl-28105908

ABSTRACT

BACKGROUND: Modeling oncological survival data has become a major challenge, because the amount of molecular information now available means that the number of features greatly exceeds the number of observations. One possible solution to this dimensionality problem is to impose additional constraints in the cost function optimization. LASSO and other sparsity-inducing methods have already been successfully applied with this idea. Although this leads to more interpretable models, these methods still do not fully profit from the relations between the features, especially when these can be represented through graphs. We propose DEGREECOX, a method that applies network-based regularizers to infer Cox proportional hazards models when the features are genes and the outcome is patient survival. In particular, we propose to use network centrality measures to constrain the model in terms of significant genes. RESULTS: We applied DEGREECOX to three ovarian carcinoma datasets and tested several centrality measures, such as weighted degree, betweenness and closeness centrality. The a priori network information was retrieved from Gene Co-Expression Networks and Gene Functional Maps. When compared with RIDGE and LASSO, DEGREECOX shows an improvement in the classification of high- and low-risk patients, on a par with NET-COX. The use of network information is especially relevant for datasets that are not easily separated. In terms of RMSE and C-index, DEGREECOX gives results similar to those of the best-performing methods, and in a few cases slightly better. CONCLUSIONS: Network-based regularization seems a promising framework to deal with the dimensionality problem. The proposed centrality metrics can easily be extended to accommodate other topological properties of different biological networks.
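One way centrality can enter a regularizer is through per-gene penalty factors. The sketch computes weighted degree from a symmetric co-expression adjacency matrix and maps it to smaller penalties for better-connected genes. The mapping `1 / (1 + degree)` is an assumption made for this sketch, not DEGREECOX's published penalty; betweenness and closeness are not shown, and the adjacency weights are hypothetical.

```python
def weighted_degree(adj):
    """Weighted degree of each node in a symmetric adjacency matrix,
    ignoring self-loops on the diagonal."""
    return [sum(w for j, w in enumerate(row) if j != i) for i, row in enumerate(adj)]


def centrality_penalty_factors(adj):
    """Hypothetical mapping: better-connected genes receive smaller penalties."""
    return [1.0 / (1.0 + d) for d in weighted_degree(adj)]


# Hypothetical co-expression weights among three genes.
adj = [
    [0.0, 0.8, 0.2],
    [0.8, 0.0, 0.5],
    [0.2, 0.5, 0.0],
]
print(weighted_degree(adj))
print(centrality_penalty_factors(adj))
```

The resulting factors could then weight an L1 penalty inside a Cox model, in the same way penalty factors weight the logistic penalties sketched elsewhere in this listing.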


Subject(s)
Algorithms , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Ovarian Neoplasms/genetics , Proportional Hazards Models , Female , Humans , Models, Genetic
12.
Brief Bioinform ; 15(3): 376-89, 2014 May.
Article in English | MEDLINE | ID: mdl-24058049

ABSTRACT

Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison have greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
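Block entropy, one of the genome-level measures mentioned above, is the Shannon entropy (in bits) of the distribution of overlapping k-mers in a sequence. A minimal plug-in estimator follows. It ignores the finite-sample bias corrections discussed in the literature, and the example sequence is arbitrary.

```python
import math
from collections import Counter


def block_entropy(seq, k=1):
    """Plug-in Shannon entropy (bits) of the overlapping k-mer distribution."""
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(blocks)
    return -sum((c / n) * math.log2(c / n) for c in Counter(blocks).values())


for k in (1, 2, 3):
    print(k, round(block_entropy("ACGTACGTTTGACA", k), 3))
```

A uniform base composition gives the maximum of 2 bits for k = 1, while a homopolymer gives 0. Comparing growth with k indicates how much structure the sequence has beyond single-base frequencies.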


Subject(s)
Computational Biology/methods , Information Theory , Sequence Analysis/methods , Binding Sites/genetics , Genomics/methods , Genomics/statistics & numerical data , Humans , Models, Statistical , Nonlinear Dynamics , Phylogeny , Saccharomyces cerevisiae/genetics , Sequence Alignment , Sequence Analysis/statistics & numerical data , Software , Transcription Factors/metabolism
13.
J Theor Biol ; 391: 1-12, 2016 Feb 21.
Article in English | MEDLINE | ID: mdl-26657065

ABSTRACT

Bone is a common site for the development of metastasis, as its microenvironment provides the necessary conditions for the growth and proliferation of cancer cells. Several mathematical models have been proposed to describe the bone remodeling process and how the coupled action of osteoclasts and osteoblasts ensures bone homeostasis, and they have been further extended to include the effect of cancer cells. The model proposed here includes the influence of the parathyroid hormone (PTH) as capable of triggering and regulating the bone remodeling cycle. It also considers the secretion of PTH-related protein (PTHrP) by cancer cells, which stimulates the production of receptor activator of nuclear factor kappa-B ligand (RANKL) by osteoblasts, which in turn activates osteoclasts, increasing bone resorption and the subsequent release of growth factors entrapped in the bone matrix; these induce tumor growth, giving rise to a self-perpetuating cycle known as the vicious cycle of bone metastases. The model additionally describes how the presence of metastases contributes to the decoupling between bone resorption and formation. Moreover, the effects of anti-cancer and anti-resorptive treatments, through chemotherapy and the administration of bisphosphonates or denosumab, are also included, along with their corresponding pharmacokinetics (PK) and pharmacodynamics (PD). The simulated models, available at http://sels.tecnico.ulisboa.pt/software/, are able to describe bone remodeling cycles, the growth of bone metastases, and how treatment can effectively reduce tumor burden on bone and prevent loss of bone strength.
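The PK/PD coupling mentioned above links a drug concentration profile to a treatment effect. As a generic illustration only, the sketch pairs a one-compartment intravenous bolus model, C(t) = (dose/V)·exp(−k_el·t), with an Emax effect model. These are textbook forms assumed for this sketch, not the PK/PD specifications used in the paper for chemotherapy, bisphosphonates or denosumab, and all parameter values are illustrative.

```python
import math


def concentration(t, dose, volume, k_el):
    """One-compartment IV bolus: C(t) = dose / V * exp(-k_el * t)."""
    return dose / volume * math.exp(-k_el * t)


def emax_effect(c, e_max, ec50):
    """Emax pharmacodynamic model: E = E_max * C / (EC50 + C)."""
    return e_max * c / (ec50 + c)


# Illustrative parameters only (not fitted to any drug).
dose, volume, k_el = 100.0, 10.0, 0.1
e_max, ec50 = 1.0, 2.0
for t in (0, 7, 14, 28):
    c = concentration(t, dose, volume, k_el)
    print(t, round(c, 3), round(emax_effect(c, e_max, ec50), 3))
```

In a bone-metastasis model, an effect term like E(t) would typically scale a rate constant, for example osteoclast activity or tumor growth.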


Subject(s)
Bone Neoplasms , Denosumab/therapeutic use , Diphosphonates/therapeutic use , Models, Biological , Parathyroid Hormone/metabolism , Tumor Microenvironment , Animals , Bone Neoplasms/drug therapy , Bone Neoplasms/metabolism , Bone Neoplasms/pathology , Bone Neoplasms/secondary , Humans , Neoplasm Metastasis , Osteoblasts/metabolism , Osteoblasts/pathology , Osteoclasts/metabolism , Osteoclasts/pathology
14.
BMC Bioinformatics ; 14: 283, 2013 Sep 25.
Article in English | MEDLINE | ID: mdl-24067087

ABSTRACT

BACKGROUND: Existing tools to model cell growth curves do not offer a flexible, integrative approach to manage large datasets and automatically estimate parameters. With the increase in experimental time series from microbiology and oncology, software that allows researchers to easily organize experimental data and simultaneously extract relevant parameters efficiently is crucial. RESULTS: BGFit provides a web-based unified platform where a rich set of dynamic models can be fitted to experimental time-series data, while results are managed efficiently in a structured, hierarchical way. The data management system allows users to organize projects, experiments and measurement data, and to define teams with different editing and viewing permissions. Several dynamic and algebraic models are already implemented, such as polynomial regression, Gompertz, Baranyi, Logistic and Live Cell Fraction models, and users can easily add new models, thus expanding the current ones. CONCLUSIONS: BGFit allows users to easily manage their data and models in an integrated way, even if they are not familiar with databases or existing computational tools for parameter estimation. BGFit is designed with a flexible architecture that focuses on extensibility and leverages free software with existing tools and methods, allowing different data modeling techniques to be compared and evaluated. The application is described in the context of bacterial and tumor cell growth data fitting, but it is also applicable to any type of two-dimensional data, e.g. physical chemistry and macroeconomic time series, and is fully scalable to a high number of projects, data and model complexity.


Subject(s)
Computational Biology/methods , Databases, Factual , Models, Biological , Software , Algorithms , Cell Proliferation , Internet , User-Computer Interface
15.
Stat Methods Med Res ; 31(5): 947-958, 2022 05.
Article in English | MEDLINE | ID: mdl-35072570

ABSTRACT

The extraction of novel information from omics data is a challenging task, in particular since the number of features (e.g. genes) often far exceeds the number of samples. In such a setting, conventional parameter estimation leads to ill-posed optimization problems, and regularization may be required. In addition, outliers can greatly impact classification accuracy. Here we introduce ROSIE, an ensemble classification approach that combines three sparse and robust classification methods for outlier detection and feature selection, and further performs a bootstrap-based validity check. Outliers of ROSIE are determined by the rank product test using the outlier rankings of all three methods, and important features are those commonly selected by all methods. We apply ROSIE to RNA-Seq data from The Cancer Genome Atlas (TCGA) to classify observations into Triple-Negative Breast Cancer (TNBC) and non-TNBC tissue samples. The pre-processed dataset consists of 16,600 genes and more than 1,000 samples. We demonstrate that ROSIE selects important features and outliers in a robust way. Identified outliers are concordant with the distribution of the genes commonly selected by the three methods, and results are in line with other independent studies. Furthermore, we discuss the association of some of the selected genes with the TNBC subtype in other investigations. In summary, ROSIE constitutes a robust and sparse procedure to identify outliers and important genes through binary classification. Our approach is readily applicable to other datasets, fulfilling the overall goal of simultaneously identifying outliers and candidate disease biomarkers to be targeted in therapy research and personalized medicine frameworks.


Subject(s)
Triple Negative Breast Neoplasms , Humans , Triple Negative Breast Neoplasms/genetics
16.
Cells ; 11(15)2022 07 27.
Article in English | MEDLINE | ID: mdl-35954157

ABSTRACT

Clear cell renal cell carcinoma (ccRCC) is the most common subtype of RCC and shows a significant mortality rate. One of the priorities of kidney cancer research is to identify RCC-specific biomarkers for early detection and screening of the disease. With the development of high-throughput technology, it is now possible to measure the expression levels of thousands of genes in parallel and assess the molecular profile of individual tumors. Studying the relationship between gene expression and survival outcome is a widely used approach for finding genes associated with cancer survival, providing new information for clinical decision-making. One of the challenges of using transcriptomics data is their high dimensionality, which can lead to instability in the selection of gene signatures. Here we identify potential prognostic biomarkers correlated with the survival outcome of ccRCC patients using two network-based regularizers (EN and TCox) applied to Cox models. Some genes were always selected by each method (COPS7B, DONSON, GTF2E2, HAUS8, PRH2, and ZNF18), and these have known roles in cancer formation and progression. Afterward, different lists of genes ranked by distinct metrics (logFC of DEGs or β coefficients of regression) were analyzed using GSEA to find over- or under-represented mechanisms and pathways. Some ontologies were common to the gene sets tested, such as nuclear division, microtubule and tubulin binding, and plasma membrane and chromosome regions. Additionally, the genes most involved in these ontologies and the genes selected by the regularizers were used to create a new gene set, to which we applied the Cox regression model. With this smaller gene set, we were able to split patients into significantly different high- and low-risk groups, showing the importance of studying these genes as potential prognostic factors to help clinicians better identify and monitor patients with ccRCC.
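For reference, an elastic-net (EN) penalized Cox model is commonly written as below, where $t_i$ is the follow-up time of patient $i$, $\delta_i = 1$ if the event was observed, $R(t_i)$ is the set of patients still at risk at $t_i$, and $x_i$ is the gene expression vector. This is the standard textbook form; the mixing weight $\alpha$ follows the usual glmnet-style parameterization, which is an assumption here, and the network-based TCox penalty is not shown.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1} \Big[ x_i^\top \beta - \log \sum_{j \in R(t_i)} \exp\big(x_j^\top \beta\big) \Big]

\hat{\beta} = \arg\min_{\beta} \Big\{ -\ell(\beta) + \lambda \Big[ \alpha \lVert \beta \rVert_1 + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Big] \Big\}
```

The $\ell_1$ term drives most coefficients exactly to zero, so the genes with non-zero $\hat{\beta}_k$ at the chosen $\lambda$ form the selected signature, while the $\ell_2$ term helps keep groups of correlated genes together.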


Subject(s)
Carcinoma, Renal Cell , Kidney Neoplasms , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Carcinoma, Renal Cell/metabolism , Humans , Kidney/pathology , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Transcriptome/genetics
17.
Toxins (Basel) ; 14(10)2022 09 30.
Article in English | MEDLINE | ID: mdl-36287948

ABSTRACT

Diarrhetic Shellfish Poisoning (DSP) is an acute intoxication, common in many regions of the world, caused by the consumption of contaminated shellfish. To safeguard human health, most countries implement programs focused on monitoring toxic phytoplankton abundance and shellfish toxicity levels, an effort that can be complemented by a deeper understanding of the underlying phenomena. In this work, we identify patterns of seasonality in shellfish toxicity along the Portuguese coast and analyse time-lagged correlations between this toxicity and various potential risk factors. We extend the understanding of these relations by introducing temporal lags, which allows time series to be compared at different points in time and the predictive power of the tested variables to be studied. This study confirms previous findings about toxicity seasonality patterns on the Portuguese coast and provides further quantitative data on the relations between shellfish toxicity and geographical location, shellfish species, toxic phytoplankton abundance, and environmental conditions. Furthermore, multiple pairs of areas and shellfish species are identified as having correlations high enough to allow for a predictive analysis. These results represent the first step towards understanding the dynamics of DSP toxicity in Portuguese shellfish-producing areas, such as their temporal and spatial variability, and towards the development of a shellfish safety forecasting system.
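To illustrate what a time-lagged correlation means here, the sketch below correlates a hypothetical risk-factor series (e.g. weekly toxic phytoplankton counts) at time t with shellfish toxicity at time t + lag. Plain Pearson correlation on aligned, equally spaced, gap-free series is an assumption; the study's exact correlation measure and its handling of missing samples are not stated in the abstract, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag.

    Assumes both series are aligned, equally spaced (e.g. weekly samples)
    and free of missing values.
    """
    out = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]  # drop the last `lag` driver values
        y = response[lag:]               # drop the first `lag` response values
        out[lag] = pearson(x, y)
    return out


if __name__ == "__main__":
    # Hypothetical series: toxicity repeats the phytoplankton signal two steps later.
    phyto = [0, 1, 0, 3, 1, 4, 2, 5, 3, 6]
    toxicity = [9, 9] + phyto[:-2]
    corr = lagged_correlations(phyto, toxicity, max_lag=4)
    print(max(corr, key=corr.get), corr)
```

A lag with a high correlation means the risk factor measured today carries information about toxicity that many sampling steps ahead, which is what makes such pairs candidates for forecasting.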


Subject(s)
Shellfish Poisoning , Humans , Marine Toxins/toxicity , Marine Toxins/analysis , Shellfish/analysis , Phytoplankton
18.
Front Genet ; 13: 815476, 2022.
Article in English | MEDLINE | ID: mdl-35281848

ABSTRACT

Motivation: The increasing availability of metabolomic data and their analysis are improving the understanding of cellular mechanisms and of how biological systems respond to different perturbations. Currently, there is a need for novel computational methods that facilitate the analysis and integration of the increasing volume of available data. Results: In this paper, we present Totoro, a new constraint-based approach that integrates quantitative non-targeted metabolomic data of two different metabolic states into genome-wide metabolic models and predicts the reactions that were most likely active during the transient state. We applied Totoro to real data from three different growth experiments on Escherichia coli (pulses of glucose, pyruvate, and succinate) and were able to predict known active pathways and gain new insights into the different metabolisms related to each substrate. We used both the E. coli core and the iJO1366 models to demonstrate that our approach is applicable to both smaller and larger networks. Availability: Totoro is an open-source method (available at https://gitlab.inria.fr/erable/totoro) suitable for any organism with an available metabolic model. It is implemented in C++ and depends on IBM CPLEX, which is freely available for academic purposes.

20.
BioData Min ; 14(1): 25, 2021 Apr 14.
Article in English | MEDLINE | ID: mdl-33853663

ABSTRACT

BACKGROUND: Longitudinal gene expression analysis and survival modeling have been shown to add valuable biological and clinical knowledge. This study proposes a novel framework to discover gene signatures and patterns in high-dimensional time-series transcriptomics data and to assess their association with hospital length of stay. METHODS: We investigated a longitudinal, high-dimensional gene expression dataset from 168 blunt-force trauma patients followed during the first 28 days after injury. To model the length of stay, an initial dimensionality reduction step was performed by applying Cox regression with elastic net regularization to gene expression data from the first days of hospitalization. We also proposed a novel methodology to impute missing values for the previously selected genes. We then applied multivariate time series (MTS) clustering to analyse gene expression over time and to stratify patients with similar trajectories. The patient partitions obtained by MTS clustering were validated using Kaplan-Meier curves and log-rank tests. RESULTS: We identified 22 genes strongly associated with hospital discharge. Their expression values in the first days after trauma proved to be good predictors of the length of stay. The proposed mixed imputation method yielded a complete dataset of short time series with minimal loss of information over the 28 days of follow-up. MTS clustering grouped patients with similar gene trajectories and, notably, with similar days of discharge from the hospital. Patients within each cluster have comparable gene trajectories and may have an analogous response to injury. CONCLUSION: The proposed framework was able to tackle the joint analysis of time-to-event information and longitudinal, high-dimensional multivariate data. The application to length of stay and transcriptomics data revealed a strong relationship between gene expression trajectory and patients' recovery, which may improve the management of trauma patients by healthcare systems. The proposed methodology can be easily adapted to other medical data, towards more effective clinical decision support systems for health applications.
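The validation step above compares the discharge times of patient clusters with log-rank tests. A minimal two-group log-rank test in plain Python is sketched below; the function name and inputs are hypothetical, the "event" is taken to be hospital discharge with patients still hospitalized at the end of follow-up treated as censored (an assumption about how such data would be coded), and comparing more than two clusters would need the k-group generalization or pairwise tests.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times (e.g. days until discharge);
    events_*: 1 if the event was observed, 0 if censored.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0        # hypergeometric variance of that difference
    for t in event_times:
        n_a = sum(1 for s, _, g in data if s >= t and g == 0)  # at risk, group A
        n_b = sum(1 for s, _, g in data if s >= t and g == 1)  # at risk, group B
        d_a = sum(1 for s, e, g in data if s == t and e and g == 0)
        d = sum(1 for s, e, _ in data if s == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    stat = o_minus_e ** 2 / var
    # Survival function of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical clusters: cluster A is discharged earlier than cluster B.
    print(logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5))
```

The Kaplan-Meier curves give the visual comparison, while this statistic quantifies whether the observed separation between clusters is larger than expected by chance.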
