Results 1 - 20 of 31
1.
Bioinformatics ; 40(2)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38390963

ABSTRACT

MOTIVATION: A patient's disease phenotype can be driven and determined by specific groups of cells whose marker genes are either unknown or can only be detected at a late stage using conventional bulk assays such as RNA-Seq technology. Recent advances in single-cell RNA sequencing (scRNA-seq) enable gene expression profiling at cell-level resolution, and therefore have the potential to identify the cells driving the disease phenotype even when the number of these cells is small. However, most existing methods rely heavily on accurate cell type detection, and the number of available annotated samples is usually too small for training deep learning predictive models. RESULTS: Here, we propose the method ScRAT for phenotype prediction using scRNA-seq data. To train ScRAT with a limited number of samples of different phenotypes, such as coronavirus disease (COVID) and non-COVID, ScRAT first applies a mixup module to increase the number of training samples. A multi-head attention mechanism is then employed to learn the most informative cells for each phenotype without relying on a given cell type annotation. Using three public COVID datasets, we show that ScRAT outperforms other phenotype prediction methods. The performance edge of ScRAT over its competitors increases as the number of training samples decreases, indicating the efficacy of our sample mixup. Critical cell types detected based on high-attention cells also support novel findings in the original papers and the recent literature. This suggests that ScRAT overcomes the challenge of missing marker genes and limited sample numbers, with great potential for revealing novel molecular mechanisms and/or therapies. AVAILABILITY AND IMPLEMENTATION: The code of our proposed method ScRAT is published at https://github.com/yuzhenmao/ScRAT.
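
The mixup idea mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not ScRAT's implementation (see the repository above for that): it assumes each sample is already a fixed-length feature vector with a one-hot phenotype label, and the function name `mixup_samples` and the parameter `alpha` are hypothetical.

```python
import random


def mixup_samples(x_a, y_a, x_b, y_b, alpha=1.0, rng=None):
    """Blend two samples and their one-hot labels (generic mixup sketch).

    Assumption: x_a and x_b are equal-length feature vectors, and
    y_a and y_b are equal-length one-hot (or soft) label vectors.
    """
    rng = rng or random.Random()
    # Mixing weight drawn from Beta(alpha, alpha); it lies in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the features and of the labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Hypothetical toy data: one "COVID" and one "non-COVID" sample.
covid_x, covid_y = [0.2, 1.5, 0.0], [1.0, 0.0]
control_x, control_y = [0.9, 0.1, 0.4], [0.0, 1.0]
new_x, new_y, weight = mixup_samples(covid_x, covid_y, control_x, control_y)
```

Each call yields one synthetic training sample whose label is a soft mixture of the two phenotypes, which is how mixup enlarges a small training set.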


Subjects
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Humans , Single-Cell Analysis/methods , RNA-Seq , Gene Expression Profiling , Neural Networks, Computer , Phenotype , Sequence Analysis, RNA , Cluster Analysis
2.
Pac Symp Biocomput ; 28: 299-310, 2023.
Article in English | MEDLINE | ID: mdl-36540986

ABSTRACT

Several biomedical applications involve multiple treatments whose causal effects on a given outcome we want to estimate. Most existing causal inference methods, however, focus on single treatments. In this work, we propose M3E2, a neural network that adopts a multi-task learning approach to estimate the effect of multiple treatments. We validated M3E2 on three synthetic benchmark datasets that mimic biomedical datasets. Our analysis showed that our method makes more accurate estimations than existing baselines.


Subjects
Computational Biology , Neural Networks, Computer , Humans , Machine Learning , Benchmarking
3.
Molecules ; 27(16)2022 Aug 11.
Article in English | MEDLINE | ID: mdl-36014351

ABSTRACT

Computational prediction of ligand-target interactions is a crucial part of modern drug discovery, as it helps to bypass the high costs and labor demands of in vitro and in vivo screening. As the wealth of bioactivity data accumulates, it provides opportunities for the development of deep learning (DL) models with increasing predictive power. Conventionally, such models were limited either to very simplified representations of proteins or to ineffective voxelization of their 3D structures. Herein, we present the development of the PSG-BAR (Protein Structure Graph-Binding Affinity Regression) approach, which utilizes 3D structural information of the proteins along with 2D graph representations of ligands. The method also introduces attention scores to selectively weight the protein regions that are most important for ligand binding. Results: The developed approach demonstrates state-of-the-art performance on several binding affinity benchmarking datasets. The attention-based pooling of protein graphs enables identification of surface residues as critical residues for protein-ligand binding. Finally, we validate our model predictions against an experimental assay on a viral main protease (Mpro), the hallmark target of the SARS-CoV-2 coronavirus.


Subjects
COVID-19 , SARS-CoV-2 , Humans , Ligands , Protein Binding , Proteins/chemistry
4.
BMC Bioinformatics ; 23(1): 42, 2022 Jan 15.
Article in English | MEDLINE | ID: mdl-35033007

ABSTRACT

BACKGROUND: There has been a simultaneous increase in demand and accessibility across genomics, transcriptomics, proteomics and metabolomics data, known as omics data. This has encouraged widespread application of omics data in life sciences, from personalized medicine to the discovery of underlying pathophysiology of diseases. Causal analysis of omics data may provide important insight into the underlying biological mechanisms. Existing causal analysis methods yield promising results when identifying potential general causes of an observed outcome based on omics data. However, they may fail to discover the causes specific to a particular stratum of individuals and missing from others. METHODS: To fill this gap, we introduce the problem of stratified causal discovery and propose a method, Aristotle, for solving it. Aristotle addresses the two challenges intrinsic to omics data: high dimensionality and hidden stratification. It employs existing biological knowledge and a state-of-the-art patient stratification method to tackle the above challenges and applies a quasi-experimental design method to each stratum to find stratum-specific potential causes. RESULTS: Evaluation based on synthetic data shows better performance for Aristotle in discovering true causes under different conditions compared to existing causal discovery methods. Experiments on a real dataset on Anthracycline Cardiotoxicity indicate that Aristotle's predictions are consistent with the existing literature. Moreover, Aristotle makes additional predictions that suggest further investigations.


Subjects
Genomics , Proteomics , Humans , Metabolomics , Precision Medicine , Transcriptome
5.
Surg Obes Relat Dis ; 18(4): 546-554, 2022 04.
Article in English | MEDLINE | ID: mdl-34961735

ABSTRACT

BACKGROUND: Major concerns years after sleeve gastrectomy (SG) include weight regain, development of hiatal hernia (HH) and gastroesophageal reflux disease, with esophagitis and Barrett's esophagus (BE). Both problems could be related, and the incidence of asymptomatic patients is troubling. OBJECTIVE: To study the incidence of reflux symptoms, esophagitis, BE, HH, and asymptomatic pathology and their relationship with weight regain in patients 5 years after undergoing SG at different bariatric centers in Spain. SETTING: Public and private hospitals with bariatric surgery units. METHODS: Prospective, multicenter, nonrandomized study involving 13 Spanish hospitals with a cumulative experience of 4,500 patients having undergone the SG procedure and patients who had been subjected to the procedure at least 5 years previously along with preoperative gastroscopy. The clinical history, preoperative gastroscopy, and technical details of the SG were recorded. A specific clinical questionnaire was given that recorded the intake volume, perception of satiety, and gastroesophageal reflux (GER) symptoms. Gastroscopy, pH-metry, and manometry studies were carried out, and the data were analyzed statistically. The study has been authorized by the official Spanish ethics committee CEI/CEIm Hospital Universitario Gran Canaria Dr Negrín (code 2019-216-1). RESULTS: One hundred and five patients who underwent SG and who had at least 5 years of follow-up were included. All procedures were performed laparoscopically. The mean age of patients was 51.1 years, and 70.5% were women. The mean characteristics of the SG procedure were a 37.2F probe, at 4.6 cm from the pylorus, and a crura closure was performed in 5 cases. There were no major complications (Clavien-Dindo grade >3) or deaths. The average preoperative body mass index was 46.3 kg/m², the minimum reached was 20.6 kg/m², whereas the average after 5 years was 34.5 kg/m².
GER, HH, and esophagitis symptoms went from 17.1%, 28.6%, and 5.7%, respectively, before the SG to 76%, 30.5%, and 31.4%, respectively, 5 years after the procedure. Symptoms persisted over the years in 37.1% of cases and presented de novo in 52.8% of cases. Fifty-three percent of manometries (n = 27, total 51) and 64% of pH-metries (n = 32, total 53; DeMeester average score was 65) were pathologic 5 years after the procedure. Concerning gastroscopies, 5 years after the procedure, HH was found in 33 patients (30.5% of total) and esophagitis in 32 patients (31.4% of total). Eighty patients (76%) had GER symptoms, and 25 patients (24%) were asymptomatic. Only 1 patient (0.9%) developed BE. CONCLUSIONS: Our study has confirmed a high rate of both persistent and de novo esophagitis and hiatal hernia, many of which were asymptomatic, 5 years after SG had been performed. Weight regain and a striking increase in gastric capacity are risk factors indicative of esophagitis, even when patients are asymptomatic. We consider a control gastroscopy and the preventive use of proton pump inhibitors necessary in these cases regardless of symptoms. We recommend that a control gastroscopy be performed in all cases regardless of symptoms 5 years after SG. Further studies are needed to validate these recommendations.


Subjects
Barrett Esophagus , Esophagitis , Gastroesophageal Reflux , Hernia, Hiatal , Obesity, Morbid , Barrett Esophagus/diagnosis , Barrett Esophagus/epidemiology , Barrett Esophagus/etiology , Esophagitis/epidemiology , Esophagitis/etiology , Female , Gastrectomy/methods , Gastroesophageal Reflux/complications , Gastroesophageal Reflux/etiology , Hernia, Hiatal/epidemiology , Hernia, Hiatal/etiology , Hernia, Hiatal/surgery , Humans , Middle Aged , Obesity, Morbid/complications , Prospective Studies , Retrospective Studies , Spain/epidemiology , Weight Gain
6.
Aging (Albany NY) ; 13(24): 25643-25652, 2021 12 16.
Article in English | MEDLINE | ID: mdl-34915450

ABSTRACT

As the number of older adults increases, so does the pressure on health care systems due to age-related disorders. Attempts to reduce cognitive decline have focused on individual interventions such as exercise or diet, with limited success. This study adopted a different approach by investigating the impact of combined daily activities on memory decline. We used data from the National Institute on Aging's Health and Retirement Study to explore two new questions: does combining activities affect memory decline, and if so, does this impact change across the lifespan? We created a new machine learning model using 33 daily activities and involving 3210 participants. Our results showed that the effect of combined activities on memory decline was stronger than the impact of any individual activity. Moreover, this effect increased with age, whereas the importance of historical factors such as education and baseline memory decreased. The present findings point to the importance of selecting multiple, diverse activities for older adults as they age. These results could have a significant impact on aging health policies promoting new programs such as social prescribing.


Subjects
Cognition/physiology , Cognitive Dysfunction/prevention & control , Healthy Aging/physiology , Leisure Activities/psychology , Activities of Daily Living , Aged , Aged, 80 and over , Family Relations/psychology , Female , Humans , Machine Learning , Male
7.
Front Aging Neurosci ; 13: 693791, 2021.
Article in English | MEDLINE | ID: mdl-34483879

ABSTRACT

Introduction: Rates of dementia are projected to increase over the coming years as global populations age. Without a treatment to slow the progression of dementia, many health policies are focusing on preventing dementia by slowing the rate of cognitive decline with age. However, it is unclear which lifestyle changes in old age meaningfully reduce the rate of cognitive decline associated with aging. Objectives: Use existing, multi-year longitudinal health data to determine whether engagement in a variety of different lifestyle activities can slow the rate of cognitive decline as older adults age. Method: Data from the English Longitudinal Study of Ageing were analyzed using a quasi-experimental, efficient matched-pair design inspired by clinical trial methodology. Changes in short-term memory scores were assessed over a multi-year interval for groups who undertook one of 11 different lifestyle activities, compared to control groups matched across confounding socioeconomic and lifestyle factors. Results: Two factors, moderate-intensity physical activity and learning activities, had a significant positive impact on cognitive function. Conclusion: Our analysis supports a cognitive benefit for two lifestyle activities, moderate-intensity physical activity and learning activities, while rejecting other factors advanced by the literature, such as vigorous-intensity physical activity. These findings justify and encourage the development of new lifestyle health programs by health authorities and bring forward the new health system solution, social prescribing.

8.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34382071

ABSTRACT

The goal of precision oncology is to tailor treatment for patients individually using the genomic profile of their tumors. Pharmacogenomics datasets such as cancer cell lines are among the most valuable resources for drug sensitivity prediction, a crucial task of precision oncology. Machine learning methods have been employed to predict drug sensitivity based on the multiple omics data available for large panels of cancer cell lines. However, there are no comprehensive guidelines on how to properly train and validate such machine learning models for drug sensitivity prediction. In this paper, we introduce a set of guidelines for different aspects of training gene expression-based predictors using cell line datasets. These guidelines provide extensive analysis of the generalization of drug sensitivity predictors and challenge many current practices in the community including the choice of training dataset and measure of drug sensitivity. The application of these guidelines in future studies will enable the development of more robust preclinical biomarkers.


Subjects
Drug Resistance, Neoplasm , Machine Learning , Pharmacogenetics , Algorithms , Cell Line, Tumor , Datasets as Topic , Humans
9.
Pac Symp Biocomput ; 26: 196-207, 2021.
Article in English | MEDLINE | ID: mdl-33691017

ABSTRACT

Methods for causal inference from observational data are an alternative for scenarios where collecting counterfactual data or conducting a randomized experiment is not possible. Our proposed method, ParKCA, combines the results of several causal inference methods to learn new causes in applications with some known causes and many potential causes. We validate ParKCA on two genome-wide association study datasets, one real-world and one simulated. Our results show that ParKCA can infer more causes than existing methods.


Subjects
Computational Biology , Genome-Wide Association Study , Causality , Humans
10.
Bioinformatics ; 37(12): 1691-1698, 2021 Jul 19.
Article in English | MEDLINE | ID: mdl-33325506

ABSTRACT

MOTIVATION: Identification of differentially expressed genes is necessary for unraveling disease pathogenesis. This task is complicated by the fact that many diseases are heterogeneous at the molecular level and samples representing distinct disease subtypes may demonstrate different patterns of dysregulation. Biclustering methods are capable of identifying genes that follow a similar expression pattern only in a subset of samples and hence can account for disease heterogeneity. However, identifying biologically significant and reproducible sets of genes and samples remains challenging for the existing tools. Many recent studies have shown that the integration of gene expression and protein interaction data improves the robustness of prediction and classification and advances biomarker discovery. RESULTS: Here, we present DESMOND, a new method for identification of Differentially ExpreSsed gene MOdules iN Diseases. DESMOND performs network-constrained biclustering on gene expression data and identifies gene modules, i.e. connected sets of genes up- or down-regulated in subsets of samples. We applied DESMOND to expression profiles of samples from two large breast cancer cohorts and have shown that the capability of DESMOND to incorporate protein interactions allows the identification of biologically meaningful gene and sample subsets and improves the reproducibility of the results. AVAILABILITY AND IMPLEMENTATION: https://github.com/ozolotareva/DESMOND. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
Sci Rep ; 10(1): 19389, 2020 11 09.
Article in English | MEDLINE | ID: mdl-33168895

ABSTRACT

This project aimed to develop and evaluate a fast and fully-automated deep-learning method applying convolutional neural networks with deep supervision (CNN-DS) for accurate hematoma segmentation and volume quantification in computed tomography (CT) scans. Non-contrast whole-head CT scans of 55 patients with hemorrhagic stroke were used. Individual scans were standardized to 64 axial slices of 128 × 128 voxels. Each voxel was annotated independently by experienced raters, generating a binary label of hematoma versus normal brain tissue based on majority voting. The dataset was split randomly into training (n = 45) and testing (n = 10) subsets. A CNN-DS model was built applying the training data and examined using the testing data. Performance of the CNN-DS solution was compared with three previously established methods. The CNN-DS achieved a Dice coefficient score of 0.84 ± 0.06 and recall of 0.83 ± 0.07, higher than patch-wise U-Net (< 0.76). CNN-DS average running time of 0.74 ± 0.07 s was faster than PItcHPERFeCT (> 1412 s) and slice-based U-Net (> 12 s). Comparable interrater agreement rates were observed between "method-human" vs. "human-human" (Cohen's kappa coefficients > 0.82). The fully automated CNN-DS approach demonstrated expert-level accuracy in fast segmentation and quantification of hematoma, substantially improving over previous methods. Further research is warranted to test the CNN-DS solution as a software tool in clinical settings for effective stroke management.
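
The two headline metrics in this abstract, the Dice coefficient and recall, can be computed from binary voxel labels as in the sketch below. It is an illustration only, not the authors' evaluation code; it assumes the predicted and reference masks have been flattened into equal-length sequences of 0/1 values, and the function name `dice_and_recall` is hypothetical.

```python
def dice_and_recall(pred, truth):
    """Return (Dice coefficient, recall) for two flattened binary masks.

    Assumption: pred and truth are equal-length sequences of 0/1 labels,
    where 1 marks a hematoma voxel and 0 marks normal tissue.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)  # false negatives
    # Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    # Recall = TP / (TP + FN); defined as 1.0 when there is nothing to find.
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, recall


# Hypothetical five-voxel example: TP = 2, FP = 1, FN = 1.
dice, recall = dice_and_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In practice the per-scan scores would be averaged over the test set to obtain a mean and standard deviation like those reported above.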


Subjects
Deep Learning , Head/diagnostic imaging , Intracranial Hemorrhages/diagnostic imaging , Tomography, X-Ray Computed , Aged , Female , Humans , Male , Middle Aged
12.
Int J Mol Sci ; 21(16)2020 Aug 14.
Article in English | MEDLINE | ID: mdl-32823970

ABSTRACT

Gain-of-function mutations in the human androgen receptor (AR) are among the major causes of drug resistance in prostate cancer (PCa). Identifying mutations that cause a resistant phenotype is of critical importance for guiding treatment protocols, as well as for designing drugs that do not elicit adverse responses. However, experimental characterization of these mutations is time-consuming and costly; thus, predictive models are needed to anticipate resistant mutations and to guide the drug discovery process. In this work, we leverage experimental data collected on 68 AR mutants, either observed in the clinic or described in the literature, to train a deep neural network (DNN) that predicts the response of these mutants to currently used and experimental anti-androgens and testosterone. We demonstrate that the use of this DNN, with general 2D descriptors, provides a more accurate prediction of the biological outcome (inhibition, activation, no-response, mixed-response) in AR mutant-drug pairs compared to other machine learning approaches. Finally, the developed approach was used to make predictions of AR mutant response to the latest AR inhibitor, darolutamide, which were then validated by in vitro experiments.


Subjects
Deep Learning , Prostatic Neoplasms/metabolism , Receptors, Androgen/metabolism , Androgen Receptor Antagonists/chemistry , Androgen Receptor Antagonists/pharmacology , Cell Line, Tumor , Humans , Male , Mutation/genetics , Neural Networks, Computer , Prostatic Neoplasms/drug therapy , Prostatic Neoplasms/genetics , ROC Curve , Receptors, Androgen/genetics , Transcription, Genetic/drug effects
13.
Bioinformatics ; 36(Suppl_1): i380-i388, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657371

ABSTRACT

MOTIVATION: The goal of pharmacogenomics is to predict drug response in patients using their single- or multi-omics data. A major challenge is that clinical data (i.e. patients) with drug response outcomes are very limited, creating a need for transfer learning to bridge the gap between large pre-clinical pharmacogenomics datasets (e.g. cancer cell lines), as a source domain, and clinical datasets as a target domain. Two major discrepancies exist between pre-clinical and clinical datasets: (i) in the input space, the gene expression data differ due to differences in the basic biology, and (ii) in the output space, the measures of drug response differ. Therefore, training a computational model on cell lines and testing it on patients violates the i.i.d. assumption that train and test data are from the same distribution. RESULTS: We propose Adversarial Inductive Transfer Learning (AITL), a deep neural network method for addressing discrepancies in input and output space between the pre-clinical and clinical datasets. AITL takes gene expression of patients and cell lines as the input, employs adversarial domain adaptation and multi-task learning to address these discrepancies, and predicts the drug response as the output. To the best of our knowledge, AITL is the first adversarial inductive transfer learning method to address both input and output discrepancies. Experimental results indicate that AITL outperforms state-of-the-art pharmacogenomics and transfer learning baselines and may guide precision oncology more accurately. AVAILABILITY AND IMPLEMENTATION: https://github.com/hosseinshn/AITL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms , Pharmacogenetics , Humans , Machine Learning , Neural Networks, Computer , Precision Medicine
14.
PLoS Comput Biol ; 15(11): e1007451, 2019 11.
Article in English | MEDLINE | ID: mdl-31710622

ABSTRACT

Cancer is driven by genetic mutations that dysregulate pathways important for proper cell function. Therefore, discovering these cancer pathways and their dysregulation order is key to understanding and treating cancer. However, the heterogeneity of mutations between different individuals makes this challenging and requires that cancer progression is studied in a subtype-specific way. To address this challenge, we provide a mathematical model, called Subtype-specific Pathway Linear Progression Model (SPM), that simultaneously captures cancer subtypes and pathways and order of dysregulation of the pathways within each subtype. Experiments with synthetic data indicate the robustness of SPM to problem specifics including noise compared to an existing method. Moreover, experimental results on glioblastoma multiforme and colorectal adenocarcinoma show the consistency of SPM's results with the existing knowledge and its superiority to an existing method in certain cases. The implementation of our method is available at https://github.com/Dalton386/SPM.


Subjects
Computational Biology/methods , Metabolic Networks and Pathways/genetics , Neoplasms/genetics , Algorithms , Colorectal Neoplasms/genetics , Disease Progression , Glioblastoma/genetics , Humans , Linear Models , Models, Theoretical , Mutation , Neoplasms/metabolism , Signal Transduction/genetics
15.
Bioinformatics ; 35(14): i379-i388, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510674

ABSTRACT

MOTIVATION: Despite the remarkable advances in sequencing and computational techniques, noise in the data and the complexity of the underlying biological mechanisms make it difficult to deconvolve the phylogenetic relationships between cancer mutations. In addition, the majority of the existing datasets consist of bulk sequencing data from a single tumor sample per individual. Accurate inference of the phylogenetic order of mutations is particularly challenging in these cases, and the existing methods face several theoretical limitations. To overcome these limitations, new methods are required for integrating and harnessing the full potential of the existing data. RESULTS: We introduce a method called Hintra for intra-tumor heterogeneity detection. Hintra integrates sequencing data for a cohort of tumors and infers tumor phylogeny for each individual based on the evolutionary information shared between different tumors. Through an iterative process, Hintra learns the repeating evolutionary patterns and uses this information for resolving the phylogenetic ambiguities of individual tumors. The results of synthetic experiments show improved performance compared to two state-of-the-art methods. The experimental results with a recent breast cancer dataset are consistent with the existing knowledge and provide potentially interesting findings. AVAILABILITY AND IMPLEMENTATION: The source code for Hintra is available at https://github.com/sahandk/HINTRA.


Subjects
Neoplasms , Software , Humans , Mutation , Phylogeny , Sequence Analysis
16.
Bioinformatics ; 35(14): i501-i509, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510700

ABSTRACT

MOTIVATION: Historically, gene expression has been shown to be the most informative data for drug response prediction. Recent evidence suggests that integrating additional omics can improve the prediction accuracy which raises the question of how to integrate the additional omics. Regardless of the integration strategy, clinical utility and translatability are crucial. Thus, we reasoned a multi-omics approach combined with clinical datasets would improve drug response prediction and clinical relevance. RESULTS: We propose MOLI, a multi-omics late integration method based on deep neural networks. MOLI takes somatic mutation, copy number aberration and gene expression data as input, and integrates them for drug response prediction. MOLI uses type-specific encoding sub-networks to learn features for each omics type, concatenates them into one representation and optimizes this representation via a combined cost function consisting of a triplet loss and a binary cross-entropy loss. The former makes the representations of responder samples more similar to each other and different from the non-responders, and the latter makes this representation predictive of the response values. We validate MOLI on in vitro and in vivo datasets for five chemotherapy agents and two targeted therapeutics. Compared to state-of-the-art single-omics and early integration multi-omics methods, MOLI achieves higher prediction accuracy in external validations. Moreover, a significant improvement in MOLI's performance is observed for targeted drugs when training on a pan-drug input, i.e. using all the drugs with the same target compared to training only on drug-specific inputs. MOLI's high predictive power suggests it may have utility in precision oncology. AVAILABILITY AND IMPLEMENTATION: https://github.com/hosseinshn/MOLI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
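
The combined cost function described in this abstract (a triplet loss plus a binary cross-entropy loss) can be sketched for a single training triplet as follows. This is a simplified illustration, not MOLI's code (see the repository above): it assumes squared Euclidean distances between representations, and the names `triplet_loss`, `binary_cross_entropy`, `combined_loss`, `margin`, and the weighting factor `gamma` are hypothetical.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances (assumed metric)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    # Zero once the negative is at least `margin` farther away than the positive.
    return max(0.0, d_pos - d_neg + margin)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy between a 0/1 label and a predicted probability."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


def combined_loss(anchor, positive, negative, y_true, p_pred, gamma=1.0, margin=1.0):
    """Classification loss plus a weighted triplet term (gamma is an assumed weight)."""
    return binary_cross_entropy(y_true, p_pred) + gamma * triplet_loss(
        anchor, positive, negative, margin
    )


# Hypothetical 2-D representations: two responders and one non-responder.
loss = combined_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], y_true=1.0, p_pred=0.5)
```

The triplet term pulls responder representations together and pushes non-responders away, while the cross-entropy term keeps the representation predictive of the response label.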


Subjects
Antineoplastic Agents , Neoplasms , Neural Networks, Computer , Algorithms , Forecasting , Humans , Neoplasms/drug therapy , Pharmaceutical Preparations , Precision Medicine
17.
PLoS One ; 14(3): e0213584, 2019.
Article in English | MEDLINE | ID: mdl-30897097

ABSTRACT

Large survey databases for aging-related analysis are often examined to discover key factors that affect a dependent variable of interest. Typically, this analysis is performed with methods that assume linear dependencies between variables. Such assumptions, however, do not hold in many cases, where the data are linked by non-linear dependencies. This in turn requires the application of analytic methods that are more accurate in identifying potentially non-linear dependencies. Here, we objectively compared the feature selection performance of several frequently used linear selection methods and three non-linear selection methods in the context of large survey data. These methods were assessed using both synthetic and real-world datasets in which the relationships between the features and dependent variables were known in advance. We found that the non-linear methods offered better overall feature selection performance than the linear methods in all usage conditions. Moreover, the performance of the non-linear methods was more stable, being unaffected by the inclusion or exclusion of variables from the datasets. These properties make non-linear feature selection methods a potentially preferable tool for both hypothesis-driven and exploratory analyses of aging-related datasets.


Subjects
Algorithms , Databases, Factual , Electronic Data Processing , Models, Theoretical , Humans
18.
Bioinformatics ; 35(18): 3263-3272, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30768166

ABSTRACT

MOTIVATION: Patient stratification methods are key to the vision of precision medicine. Here, we consider transcriptional data to segment the patient population into subsets relevant to a given phenotype. Whereas most existing patient stratification methods focus either on predictive performance or interpretable features, we developed a method striking a balance between these two important goals. RESULTS: We introduce a Bayesian method called SUBSTRA that uses regularized biclustering to identify patient subtypes and interpretable subtype-specific transcript clusters. The method iteratively re-weights feature importance to optimize phenotype prediction performance by producing more phenotype-relevant patient subtypes. We investigate the performance of SUBSTRA in finding relevant features using simulated data and successfully benchmark it against state-of-the-art unsupervised stratification methods and supervised alternatives. Moreover, SUBSTRA achieves predictive performance competitive with the supervised benchmark methods and provides interpretable transcriptional features in diverse biological settings, such as drug response prediction, cancer diagnosis, or kidney transplant rejection. AVAILABILITY AND IMPLEMENTATION: The R code of SUBSTRA is available at https://github.com/sahandk/SUBSTRA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Bayes Theorem , Phenotype , Precision Medicine
19.
IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1471-1482, 2019.
Article in English | MEDLINE | ID: mdl-30736003

ABSTRACT

Understanding the subcellular localization (SCL) of proteins and the variation of the proteome across the tissues and organs of the human body are two crucial aspects of increasing our knowledge of the dynamic rules of proteins, cell biology, and the mechanisms of disease. Although there have been tremendous contributions to these two fields independently, little is still known about how the spatial distribution of proteins varies across tissues. Here, we propose an approach that predicts tissue-specific protein SCL through the use of tissue-specific functional associations and physical protein-protein interactions (PPIs). We applied our previously developed Bayesian collective Markov random fields (BCMRFs) to tissue-specific protein-protein interaction networks (PPI networks) for nine types of tissue, focusing on eight high-level SCLs. The evaluation results demonstrate the strength of our approach in predicting tissue-specific SCL. We identified 1,314 proteins whose SCL had previously been shown to be cell-line dependent. We predicted 549 novel tissue-specific localized candidate proteins, some of which were validated via text mining.


Subjects
Computational Biology/methods , Intracellular Space/metabolism , Organ Specificity/genetics , Algorithms , Bayes Theorem , Humans , Intracellular Space/chemistry , Intracellular Space/genetics , Markov Chains , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Proteome/chemistry , Proteome/genetics , Proteome/metabolism , Reproducibility of Results
20.
Bioinformatics ; 34(24): 4274-4283, 2018 12 15.
Article in English | MEDLINE | ID: mdl-29931042

ABSTRACT

Motivation: Adverse drug reactions (ADRs) are one of the major factors that affect the wellbeing of patients and the financial costs of healthcare systems. Genetic variations of patients have been shown to be a key factor in the occurrence and severity of many ADRs. However, the large number of confounding drugs and genetic biomarkers for each adverse reaction case demands a method that evaluates all potential genetic causes of ADRs simultaneously. Results: To address this challenge, we propose HUME, a multi-phase algorithm that recommends genetic factors for ADRs that are causally supported by the patient record data. HUME consists of the construction of a network from co-prevalence between significant genetic biomarkers and ADRs, a link score phase for predicting candidate relations based on the Adamic-Adar measure, and a causal refinement phase based on multiple hypothesis testing of quasi-experimental designs for evaluating evidence and counter-evidence of candidate relations in the patient records. Supplementary information: Supplementary data are available at Bioinformatics online.
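
The Adamic-Adar measure used in the link score phase rates a candidate pair of nodes by summing 1/log(degree) over their shared neighbors, so that rare shared neighbors count more than hubs. The sketch below shows the standard measure on a small undirected graph; it is not HUME's implementation, and the node names and the helpers `build_adjacency` and `adamic_adar` are hypothetical.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def adamic_adar(adj, u, v):
    """Adamic-Adar score: sum of 1 / log(degree(w)) over common neighbors w."""
    common = adj.get(u, set()) & adj.get(v, set())
    # A neighbor shared by two distinct nodes has degree >= 2, so log(degree) > 0;
    # the guard only protects against degenerate inputs such as u == v.
    return sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1)


# Hypothetical toy network: "a" and "b" share neighbors "c" (degree 2) and "d" (degree 3).
graph = build_adjacency([("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")])
score = adamic_adar(graph, "a", "b")  # 1/log(2) + 1/log(3)
```

Pairs with higher scores would be ranked as stronger candidate relations before any causal refinement is applied.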


Subjects
Algorithms , Computational Biology/methods , Drug-Related Side Effects and Adverse Reactions , Genetic Markers , Drug-Related Side Effects and Adverse Reactions/genetics , Genetic Markers/genetics , Humans