Results 1 - 20 of 68
1.
Hum Mol Genet ; 33(19): 1697-1710, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39017605

ABSTRACT

Disease risk prediction based on genomic sequence and transcriptional profile can improve disease screening and prevention. Despite identifying many disease-associated DNA variants, distinguishing deleterious non-coding DNA variations remains poor for most common diseases. In this study, we designed in vitro experiments to uncover the significance of occupancy and competitive binding between P53 and cMYC on common target genes. Analyzing publicly available ChIP-seq data for P53 and cMYC in embryonic stem cells showed that ~344-366 regions are co-occupied, and on average, two cis-overlapping motifs (CisOMs) per region were identified, suggesting that co-occupancy is evolutionarily conserved. Using U2OS and Raji cells untreated and treated with doxorubicin to increase P53 protein level while potentially reducing cMYC level, ChIP-seq analysis illustrated that around 16 to 922 genomic regions were co-occupied by P53 and cMYC, and substitutions of cMYC signals by P53 were detected post doxorubicin treatment. Around 187 expressed genes near co-occupied regions were altered at mRNA level according to RNA-seq data analysis. We utilized a computational motif-matching approach to illustrate that changes in predicted P53 binding affinity in CisOMs of co-occupied elements significantly correlate with alterations in reporter gene expression. We performed a similar analysis using SNPs mapped in CisOMs for P53 and cMYC from ChIP-seq data, and expression of target genes from GTEx portal. We found significant correlation between change in cMYC-motif binding affinity in CisOMs and altered expression. Our study brings us closer to developing a generally applicable approach to filter etiological non-coding variations associated with common diseases.


Subjects
Proto-Oncogene Proteins c-myc; Tumor Suppressor Protein p53; Humans; Tumor Suppressor Protein p53/genetics; Tumor Suppressor Protein p53/metabolism; Proto-Oncogene Proteins c-myc/genetics; Proto-Oncogene Proteins c-myc/metabolism; Polymorphism, Single Nucleotide; Doxorubicin/pharmacology; Binding Sites; Protein Binding; Cell Line, Tumor
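
The abstract of record 1 mentions a computational motif-matching approach that relates a change in predicted binding affinity to a change in expression. The following is only a minimal sketch of that general idea, not the authors' method: it scores a reference and a variant sequence against a position weight matrix and reports the difference. The matrix `PWM`, the uniform background of 0.25, and all function names are hypothetical and chosen for illustration.

```python
import math

# Hypothetical 4-position probability matrix (one dict per motif position).
# Real P53 or cMYC motifs are longer; these values are illustrative only.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def log_odds_score(window, pwm=PWM):
    """Sum of log2(probability / background) over one motif-length window."""
    if len(window) != len(pwm):
        raise ValueError("window length must equal motif length")
    return sum(math.log2(col[base] / BACKGROUND) for base, col in zip(window, pwm))


def best_window_score(seq, pwm=PWM):
    """Highest log-odds score over all motif-length windows of seq."""
    width = len(pwm)
    return max(log_odds_score(seq[i:i + width], pwm)
               for i in range(len(seq) - width + 1))


def affinity_change(ref_seq, alt_seq, pwm=PWM):
    """Predicted score difference (variant minus reference); negative means weaker match."""
    return best_window_score(alt_seq, pwm) - best_window_score(ref_seq, pwm)
```

For example, `affinity_change("TTACGTAA", "TTACTTAA")` is negative because the substitution replaces the preferred base at the third motif position.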
2.
BMC Bioinformatics ; 25(1): 250, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080535

ABSTRACT

BACKGROUND: The potential benefits of drug combination synergy in cancer medicine are significant, yet the risks must be carefully managed due to the possibility of increased toxicity. Although artificial intelligence applications have demonstrated notable success in predicting drug combination synergy, several key challenges persist: (1) Existing models often predict average synergy values across a restricted range of testing dosages, neglecting crucial dose amounts and the mechanisms of action of the drugs involved. (2) Many graph-based models rely on static protein-protein interactions, failing to adapt to dynamic and higher-order relationships. These limitations constrain the applicability of current methods. RESULTS: We introduce SAFER, a Sub-hypergraph Attention-based graph model, addressing these issues by incorporating complex relationships among biological knowledge networks and considering dosing effects on subject-specific networks. SAFER outperformed previous models on the benchmark and the independent test set. The analysis of subgraph attention weight for the lung cancer cell line highlighted JAK-STAT signaling pathway, PRDM12, ZNF781, and CDC5L that have been implicated in lung fibrosis. CONCLUSIONS: SAFER presents an interpretable framework designed to identify drug-responsive signals. Tailored for comprehending dose effects on subject-specific molecular contexts, our model uniquely captures dose-level drug combination responses. This capability unlocks previously inaccessible avenues of investigation compared to earlier models. Furthermore, the SAFER framework can be leveraged by future inquiries to investigate molecular networks that uniquely characterize individual patients and can be applied to prioritize personalized effective treatment based on safe dose combinations.


Subjects
Neural Networks, Computer; Humans; Cell Line, Tumor; Drug Synergism; Lung Neoplasms/drug therapy; Lung Neoplasms/metabolism; Dose-Response Relationship, Drug; Signal Transduction/drug effects; Antineoplastic Agents/pharmacology
3.
Brief Bioinform ; 23(3), 2022 05 13.
Article in English | MEDLINE | ID: mdl-35419584

ABSTRACT

Gene Ontology (GO) is widely used in the biological domain. It is the most comprehensive ontology providing formal representation of gene functions (GO concepts) and relations between them. However, unintentional quality defects (e.g. missing or erroneous relations) in GO may exist due to the large number of GO concepts and the complexity of GO structures. Such quality defects would impact the results of GO-based analyses and applications. In this work, we introduce a novel evidence-based lexical pattern approach for quality assurance of GO relations. We leverage two layers of evidence to suggest potentially missing relations in GO as follows. We first utilize related concept pairs (i.e. existing relations) in GO to extract relationship-specific lexical patterns, which serve as the first layer of evidence to automatically suggest potentially missing relations between unrelated concept pairs. For each suggested missing relation, we further identify two other existing relations as the second layer of evidence that resemble the difference between the missing relation and the existing relation based on which the missing relation is suggested. Applied to the 15 December 2021 release of GO, this approach suggested a total of 866 potentially missing relations. Local domain experts evaluated the entire set of potentially missing relations, and identified 821 as missing relations and 45 as indicating erroneous existing relations. We submitted these findings to the GO consortium for further validation and received encouraging feedback. These results indicate that our evidence-based approach can be utilized to uncover missing relations and erroneous existing relations in GO.


Subjects
Gene Ontology
4.
Bioinformatics ; 38(8): 2096-2101, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35176131

ABSTRACT

MOTIVATION: Cross-sectional analyses of primary cancer genomes have identified regions of recurrent somatic copy-number alteration, many of which result from positive selection during cancer formation and contain driver genes. However, no effective approach exists for identifying genomic loci under significantly different degrees of selection in cancers of different subtypes, anatomic sites or disease stages. RESULTS: CNGPLD is a new tool for performing case-control somatic copy-number analysis that facilitates the discovery of differentially amplified or deleted copy-number aberrations in a case group of cancer compared with a control group of cancer. This tool uses a Gaussian process statistical framework in order to account for the covariance structure of copy-number data along genomic coordinates and to control the false discovery rate at the region level. AVAILABILITY AND IMPLEMENTATION: CNGPLD is freely available at https://bitbucket.org/djhshih/cngpld as an R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome; Neoplasms; Humans; Cross-Sectional Studies; Genomics; DNA Copy Number Variations; Neoplasms/genetics; Case-Control Studies; Software
5.
Am J Respir Cell Mol Biol ; 66(1): 53-63, 2022 01.
Article in English | MEDLINE | ID: mdl-34370624

ABSTRACT

Idiopathic pulmonary fibrosis (IPF), a devastating, fibroproliferative, chronic lung disorder, is associated with expansion of fibroblasts/myofibroblasts, which leads to excessive production and deposition of extracellular matrix. IPF is typically clinically identified as end-stage lung disease, after fibrotic processes are well-established and advanced. Fibroblasts have been shown to be critically important in the development and progression of IPF. We hypothesize that differential chromatin access can drive genetic differences in IPF fibroblasts relative to healthy fibroblasts. To this end, we performed assay of transposase-accessible chromatin sequencing to identify differentially accessible regions within the genomes of fibroblasts from healthy and IPF lungs. Multiple motifs were identified to be enriched in IPF fibroblasts compared with healthy fibroblasts, including binding motifs for TWIST1 and FOXA1. RNA sequencing identified 93 genes that could be annotated to differentially accessible regions. Pathway analysis of the annotated genes identified cellular adhesion, cytoskeletal anchoring, and cell differentiation as important biological processes. In addition, single nucleotide polymorphism analysis showed that linkage disequilibrium blocks of IPF risk single nucleotide polymorphisms with IPF-accessible regions that have been identified to be located in genes that are important in IPF, including MUC5B, TERT, and TOLLIP. Validation studies in isolated lung tissue confirmed increased expression for TWIST1 and FOXA1 in addition to revealing SHANK2 and CSPR2 as novel targets. Thus, modulation of differential chromatin access may be an important mechanism in the pathogenesis of lung fibrosis.


Subjects
Epigenesis, Genetic; Fibroblasts/metabolism; Idiopathic Pulmonary Fibrosis/genetics; Idiopathic Pulmonary Fibrosis/pathology; Transcriptome/genetics; Base Sequence; Chromatin/metabolism; Gene Expression Profiling; Gene Regulatory Networks; Humans; Molecular Sequence Annotation; Polymorphism, Single Nucleotide/genetics; Transcription Factors/metabolism; Transposases/metabolism
6.
Ann Rheum Dis ; 81(6): 854-860, 2022 06.
Article in English | MEDLINE | ID: mdl-35190386

ABSTRACT

OBJECTIVES: To characterise the peripheral blood cell (PBC) gene expression changes ensuing from mycophenolate mofetil (MMF) or cyclophosphamide (CYC) treatment and to determine the predictive significance of baseline PBC transcript scores for response to immunosuppression in systemic sclerosis (SSc)-related interstitial lung disease (ILD). METHODS: PBC RNA samples from baseline and 12-month visits, corresponding to the active treatment period of both arms in Scleroderma Lung Study II, were investigated by global RNA sequencing. Joint models were created to examine the predictive significance of baseline composite modular scores for the course of forced vital capacity (FVC) per cent predicted measurements from 3 to 12 months. RESULTS: 134 patients with SSc-ILD (CYC=69 and MMF=65) were investigated. CYC led to an upregulation of erythropoiesis, inflammation and myeloid lineage-related modules and a downregulation of lymphoid lineage-related modules. The modular changes resulting from MMF treatment were more modest and included a downregulation of plasmablast module. In the longitudinal analysis, none of the baseline transcript module scores showed predictive significance for FVC% course in the CYC arm. In contrast, in the MMF arm, higher baseline lymphoid lineage modules predicted better subsequent FVC% course, while higher baseline myeloid lineage and inflammation modules predicted worse subsequent FVC% course. CONCLUSION: Consistent with the primary mechanism of action of MMF on lymphocytes, patients with SSc-ILD with higher baseline lymphoid module scores had better FVC% course, while those with higher myeloid cell lineage activation score had poorer FVC% course on MMF.


Subjects
Lung Diseases, Interstitial; Scleroderma, Systemic; Cyclophosphamide/therapeutic use; Gene Expression Profiling; Humans; Immunosuppressive Agents/therapeutic use; Inflammation; Lung; Lung Diseases, Interstitial/etiology; Lung Diseases, Interstitial/genetics; Mycophenolic Acid/therapeutic use; Scleroderma, Systemic/complications; Scleroderma, Systemic/drug therapy; Scleroderma, Systemic/genetics; Vital Capacity
7.
Article in English | MEDLINE | ID: mdl-39257897

ABSTRACT

In longitudinal cohort studies, it is often of interest to predict the risk of a terminal clinical event using longitudinal predictor data among subjects at risk by the time of the prediction. The at-risk population changes over time; so does the association between predictors and the outcome, as well as the accumulating longitudinal predictor history. The dynamic nature of this prediction problem has received increasing interest in the literature, but computation often poses a challenge. The widely used joint model of longitudinal and survival data often comes with intensive computation and excessive model fitting time, due to numerical optimization and the analytically intractable high-dimensional integral in the likelihood function. This problem is exacerbated when the model is fit to a large dataset or the model involves multiple longitudinal predictors with nonlinear trajectories. This challenge can be addressed from an algorithmic perspective, by a novel two-stage estimation procedure, and from a computing perspective, by Graphics Processing Unit (GPU) programming. The latter is implemented through PyTorch, an emerging deep learning framework. The numerical studies demonstrate that the proposed algorithm and software can substantially speed up the estimation of the joint model, particularly with large datasets. The numerical studies also show that accounting for nonlinearity in longitudinal predictor trajectories can improve the prediction accuracy in comparison to joint modeling that ignores nonlinearity.

8.
Hum Genet ; 139(10): 1261-1272, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32318854

ABSTRACT

Nonsyndromic cleft lip with or without cleft palate (NSCLP) is a common birth defect for which only ~20% of the underlying genetic variation has been identified. Variants in noncoding regions have been increasingly suggested to contribute to the missing heritability. In this study, we investigated whether variation in craniofacial enhancers contributes to NSCLP. Candidate enhancers were identified using VISTA Enhancer Browser and previous publications. Prioritization was based on patterning defects in knockout mice, deletion/duplication of craniofacial genes in animal models and results of whole exome/whole genome sequencing studies. This resulted in 20 craniofacial enhancers to be investigated. Custom amplicon-based sequencing probes were designed and used for sequencing 380 NSCLP probands (from multiplex and simplex families of non-Hispanic white (NHW) and Hispanic ethnicities) using Illumina MiSeq. The frequencies of identified variants were compared to ethnically matched European (CEU) and Los Angeles Mexican (MXL) control genomes and used for association analyses. Variants in mm427/MSX1 and hs1582/SPRY1 showed genome-wide significant association with NSCLP (p ≤ 6.4 × 10⁻¹¹). In silico analysis showed that these enhancer variants may disrupt important transcription factor binding sites. Haplotypes involving these enhancers and also mm435/ABCA4 were significantly associated with NSCLP, especially in NHW (p ≤ 6.3 × 10⁻⁷). Importantly, groupwise burden analysis showed several enhancer combinations significantly over-represented in NSCLP individuals, revealing novel NSCLP pathways and supporting a polygenic inheritance model. Our findings support the role of craniofacial enhancer sequence variation in the etiology of NSCLP.


Subjects
Cleft Lip/genetics; Cleft Palate/genetics; Enhancer Elements, Genetic; Genetic Predisposition to Disease; Genetic Variation; Multifactorial Inheritance; ATP-Binding Cassette Transporters/genetics; Animals; Asymptomatic Diseases; Cleft Lip/ethnology; Cleft Lip/pathology; Cleft Palate/ethnology; Cleft Palate/pathology; Embryo, Mammalian; Female; Genetic Association Studies; High-Throughput Nucleotide Sequencing; Hispanic or Latino; Humans; MSX1 Transcription Factor/genetics; Male; Membrane Proteins/genetics; Mice; Pedigree; Phosphoproteins/genetics; United States; White People
9.
Ann Rheum Dis ; 79(3): 379-386, 2020 03.
Article in English | MEDLINE | ID: mdl-31767698

ABSTRACT

OBJECTIVES: Determine global skin transcriptome patterns of early diffuse systemic sclerosis (SSc) and how they differ from later disease. METHODS: Skin biopsy RNA from 48 patients in the Prospective Registry for Early Systemic Sclerosis (PRESS) cohort (mean disease duration 1.3 years) and 33 matched healthy controls was examined by next-generation RNA sequencing. Data were analysed for cell type-specific signatures and compared with similarly obtained data from 55 previously biopsied patients in Genetics versus Environment in Scleroderma Outcomes Study cohort with longer disease duration (mean 7.4 years) and their matched controls. Correlations with histological features and clinical course were also evaluated. RESULTS: SSc patients in PRESS had a high prevalence of M2 (96%) and M1 (94%) macrophage and CD8 T cell (65%), CD4 T cell (60%) and B cell (69%) signatures. Immunohistochemical staining of immune cell markers correlated with the gene expression-based immune cell signatures. The prevalence of immune cell signatures in early diffuse SSc patients was higher than in patients with longer disease duration. In the multivariable model, adaptive immune cell signatures were significantly associated with shorter disease duration, while fibroblast and macrophage cell type signatures were associated with higher modified Rodnan Skin Score (mRSS). Immune cell signatures also correlated with skin thickness progression rate prior to biopsy, but did not predict subsequent mRSS progression. CONCLUSIONS: Skin in early diffuse SSc has prominent innate and adaptive immune cell signatures. As a prominently affected end organ, these signatures reflect the preceding rate of disease progression. These findings could have implications in understanding SSc pathogenesis and clinical trial design.


Subjects
Adaptive Immunity/genetics; Immunity, Innate/genetics; Scleroderma, Diffuse/genetics; Scleroderma, Diffuse/immunology; Adult; Biomarkers/analysis; Biopsy; Female; High-Throughput Nucleotide Sequencing; Humans; Male; Middle Aged; Multivariate Analysis; Prospective Studies; Registries; Regression Analysis; Scleroderma, Diffuse/pathology; Sequence Analysis, RNA; Severity of Illness Index; Skin/immunology; Skin/pathology; Transcriptome
10.
J Biomed Inform ; 104: 103399, 2020 04.
Article in English | MEDLINE | ID: mdl-32151769

ABSTRACT

OBJECTIVE: The centrality of data to biomedical research is difficult to overstate, and the same is true for the importance of the biomedical literature in disseminating empirical findings to scientific questions made on such data. But the connections between the literature and related datasets are often weak, hampering the ability of scientists to easily move between existing datasets and existing findings to derive new scientific hypotheses. This work aims to recommend relevant literature articles for datasets with the ultimate goal of increasing the productivity of researchers. Our approach to literature recommendation for datasets is a part of the dataset reusability platform developed at the University of Texas Health Science Center at Houston for datasets related to gene expression. This platform incorporates datasets from Gene Expression Omnibus (GEO). An average of 34 datasets were added to GEO daily in the last five years (i.e. 2014 to 2018), demonstrating the need for automatic methods to connect these datasets with relevant literature. The relevant literature for a given dataset may describe that dataset, provide a scientific finding based on that dataset, or even describe prior and related work to the dataset's topic that is of interest to users of the dataset. MATERIALS AND METHODS: We adopt an information retrieval paradigm for literature recommendation. In our experiments, distributional semantic features are created from the title and abstract of MEDLINE articles. Then, related articles are identified for datasets in GEO. We evaluate multiple distributional methods such as TF-IDF, BM25, Latent Semantic Analysis, Latent Dirichlet Allocation, word2vec, and doc2vec. Top similar papers are recommended for each dataset using cosine similarity between the dataset's vector representation and every paper's vector representation. We also propose several novel re-ranking and normalization methods over embeddings to improve the recommendations. RESULTS: The top-performing literature recommendation technique achieved a strict precision at 10 of 0.8333 and a partial precision at 10 of 0.9000 using BM25 based on a manual evaluation of 36 datasets. Evaluation on a larger, automatically-collected benchmark shows small but consistent gains by emphasizing the similarity of dataset and article titles. CONCLUSION: This work is the first step toward developing a literature recommendation tool by recommending relevant literature for datasets. This will hopefully lead to a better data reuse experience.


Subjects
Biomedical Research; Information Storage and Retrieval; Gene Expression; Humans; Publications; Semantics
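
The abstract of record 10 describes ranking papers by cosine similarity between vector representations and evaluating with precision at 10. The following is a minimal sketch of that retrieval pattern using a plain TF-IDF weighting, under stated assumptions: the smoothed IDF formula, the pre-tokenized inputs, and all function names are illustrative choices and are not taken from the paper, which reports its best result with BM25.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_articles(dataset_tokens, articles):
    """Rank article ids (dict: id -> token list) by similarity to the dataset text."""
    ids = list(articles)
    vectors = tfidf_vectors([dataset_tokens] + [articles[i] for i in ids])
    query, rest = vectors[0], vectors[1:]
    scored = sorted(zip(ids, rest), key=lambda p: cosine(query, p[1]), reverse=True)
    return [i for i, _ in scored]


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked ids that are judged relevant."""
    return sum(1 for i in ranked_ids[:k] if i in relevant_ids) / k
```

Here `precision_at_k` scores a single ranked list; averaging it over many datasets would give a benchmark-level figure.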
11.
J Med Internet Res ; 22(7): e16981, 2020 07 31.
Article in English | MEDLINE | ID: mdl-32735224

ABSTRACT

BACKGROUND: Asthma exacerbation is an acute or subacute episode of progressive worsening of asthma symptoms and can have a significant impact on patients' quality of life. However, efficient methods that can help identify personalized risk factors and make early predictions are lacking. OBJECTIVE: This study aims to use advanced deep learning models to better predict the risk of asthma exacerbations and to explore potential risk factors involved in progressive asthma. METHODS: We proposed a novel time-sensitive, attentive neural network to predict asthma exacerbation using clinical variables from large electronic health records. The clinical variables were collected from the Cerner Health Facts database between 1992 and 2015, including 31,433 adult patients with asthma. Interpretations on both patient and cohort levels were investigated based on the model parameters. RESULTS: The proposed model obtained an area under the curve value of 0.7003 through a five-fold cross-validation, which outperformed the baseline methods. The results also demonstrated that the addition of elapsed time embeddings considerably improved the prediction performance. Further analysis observed diverse distributions of contributing factors across patients as well as some possible cohort-level risk factors, such as respiratory diseases and esophageal reflux, for which supporting evidence could be found in peer-reviewed literature. CONCLUSIONS: The proposed neural network model performed better than previous methods for the prediction of asthma exacerbation. We believe that personalized risk scores and analyses of contributing factors can help clinicians better assess the individual's level of disease progression and afford the opportunity to adjust treatment, prevent exacerbation, and improve outcomes.


Subjects
Asthma/physiopathology; Deep Learning/standards; Neural Networks, Computer; Quality of Life/psychology; Disease Progression; Female; Humans; Male; Retrospective Studies; Risk Assessment; Risk Factors
12.
Ann Rheum Dis ; 78(10): 1371-1378, 2019 10.
Article in English | MEDLINE | ID: mdl-31391177

ABSTRACT

OBJECTIVE: In the randomised Scleroderma: Cyclophosphamide Or Transplantation (SCOT) trial (NCT00114530), myeloablation, followed by haematopoietic stem cell transplantation (HSCT), led to improved clinical outcomes compared with monthly cyclophosphamide (CYC) treatment in systemic sclerosis (SSc). Herein, the study aimed to determine global molecular changes at the whole blood transcript and serum protein levels ensuing from HSCT in comparison to intravenous monthly CYC in 62 participants enrolled in the SCOT study. METHODS: Global transcript studies were performed at pretreatment baseline, 8 months and 26 months postrandomisation using Illumina HT-12 arrays. Levels of 102 proteins were measured in the concomitantly collected serum samples. RESULTS: At the baseline visit, interferon (IFN) and neutrophil transcript modules were upregulated and the cytotoxic/NK module was downregulated in SSc compared with unaffected controls. A paired comparison of the 26 months to the baseline samples revealed a significant decrease of the IFN and neutrophil modules and an increase in the cytotoxic/NK module in the HSCT arm while there was no significant change in the CYC control arm. Also, a composite score of correlating serum proteins with IFN and neutrophil transcript modules, as well as a multilevel analysis, showed significant changes in SSc molecular signatures after HSCT while similar changes were not observed in the CYC arm. Lastly, a decline in the IFN and neutrophil modules was associated with an improvement in pulmonary forced vital capacity and an increase in the cytotoxic/NK module correlated with improvement in skin score. CONCLUSION: HSCT, in contrast to conventional treatment, leads to a significant 'correction' in disease-related molecular signatures.


Subjects
Interferons/blood; Neutrophils/metabolism; Scleroderma, Systemic/genetics; Transcriptome; Transplantation Conditioning/methods; Adult; Cyclophosphamide/therapeutic use; Down-Regulation; Female; Hematopoietic Stem Cell Transplantation/methods; Humans; Male; Middle Aged; Multilevel Analysis; Myeloablative Agonists/therapeutic use; Randomized Controlled Trials as Topic; Scleroderma, Systemic/blood; Scleroderma, Systemic/therapy; Transplantation, Autologous; Treatment Outcome; Up-Regulation
13.
J Biomed Inform ; 85: 149-154, 2018 09.
Article in English | MEDLINE | ID: mdl-30081101

ABSTRACT

The synergistic effect of drug combination is one of the most desirable properties for treating cancer. However, systematically predicting effective drug combinations is a significant challenge. We report here a novel method based on a deep belief network to predict drug synergy from gene expression, pathway and the Ontology Fingerprints, a literature-derived ontological profile of genes. Using data sets provided by the 2015 DREAM competition, our analysis shows that this integrative method outperforms published results from the DREAM website for 4999 drug pairs, demonstrating the feasibility of predicting drug synergy from literature and the -omics data using an advanced artificial intelligence approach.


Subjects
Deep Learning; Drug Combinations; Drug Synergism; Antineoplastic Combined Chemotherapy Protocols; Cell Line, Tumor; Computational Biology; Databases, Pharmaceutical; Gene Expression Profiling; Gene Ontology; Gene Regulatory Networks; Humans; Neoplasms/drug therapy; Neoplasms/genetics
14.
J Biomed Inform ; 84: 11-16, 2018 08.
Article in English | MEDLINE | ID: mdl-29908902

ABSTRACT

Recently, recurrent neural networks (RNNs) have been applied in predicting disease onset risks with Electronic Health Record (EHR) data. While these models demonstrated promising results on relatively small data sets, the generalizability and transferability of those models and their applicability to different patient populations across hospitals have not been evaluated. In this study, we evaluated an RNN model, RETAIN, over Cerner Health Facts® EMR data, for heart failure onset risk prediction. Our data set included over 150,000 heart failure patients and over 1,000,000 controls from nearly 400 hospitals. Convincingly, RETAIN achieved an AUC of 82% in comparison to an AUC of 79% for logistic regression, demonstrating the power of more expressive deep learning models for EHR predictive modeling. The prediction performance fluctuated across different patient groups and varied from hospital to hospital. Also, we trained RETAIN models on individual hospitals and found that the model can be applied to other hospitals with a reduction in AUC of only about 3.6%. Our results demonstrated the capability of RNNs for predictive modeling with large and heterogeneous EHR data, and pave the way for future improvements.


Subjects
Deep Learning; Electronic Health Records; Heart Failure/diagnosis; Neural Networks, Computer; Aged; Aged, 80 and over; Algorithms; Area Under Curve; Case-Control Studies; Computer Simulation; Databases, Factual; Female; Humans; Logistic Models; Male; Medical Informatics/methods; Middle Aged; Reproducibility of Results
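
Record 14 compares models by AUC (82% versus 79%). As a reminder of what that number measures, the following minimal sketch computes AUC as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The function name and the simple quadratic-time pairwise loop are illustrative assumptions; they are not taken from the paper and would be too slow for a cohort of the size it describes.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases (1) and controls (0)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.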
15.
BMC Bioinformatics ; 18(Suppl 11): 405, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28984189

ABSTRACT

The 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) was held on December 8-10, 2016 in Houston, Texas, USA. ICIBM included eight scientific sessions, four tutorials, one poster session, four highlighted talks and four keynotes that covered topics on 3D genomics structural analysis, next generation sequencing (NGS) analysis, computational drug discovery, medical informatics, cancer genomics, and systems biology. Here, we present a summary of the nine research articles selected from ICIBM 2016 program for publishing in BMC Bioinformatics.


Subjects
Biology; Congresses as Topic; Internationality; Medicine; Statistics as Topic; Algorithms; DNA Copy Number Variations/genetics; Humans; Machine Learning; Neural Networks, Computer; RNA Splicing/genetics; Sequence Analysis, RNA
16.
Stat Med ; 36(22): 3461-3474, 2017 Sep 30.
Article in English | MEDLINE | ID: mdl-28675924

ABSTRACT

In systems biology, it is of great interest to identify new genes that were not previously reported to be associated with biological pathways related to various functions and diseases. Identification of these new pathway-modulating genes not only promotes understanding of pathway regulation mechanisms but also allows identification of novel targets for therapeutics. Recently, biomedical literature has been considered as a valuable resource to investigate pathway-modulating genes. While the majority of currently available approaches are based on the co-occurrence of genes within an abstract, it has been reported that these approaches show only sub-optimal performance because 70% of abstracts contain information only for a single gene. To overcome this limitation, we propose a novel statistical framework based on the concept of ontology fingerprint that uses gene ontology to extract information from large biomedical literature data. The proposed framework simultaneously identifies pathway-modulating genes and facilitates interpreting functions of these new genes. We also propose a computationally efficient posterior inference procedure based on Metropolis-Hastings within Gibbs sampler for parameter updates and the poor man's reversible jump Markov chain Monte Carlo approach for model selection. We evaluate the proposed statistical framework with simulation studies, experimental validation, and an application to studies of pathway-modulating genes in yeast. The R implementation of the proposed model is currently available at https://dongjunchung.github.io/bayesGO/. Copyright © 2017 John Wiley & Sons, Ltd.


Subjects
Bayes Theorem; Data Mining/methods; Gene Ontology; Systems Biology/methods; Algorithms; Biometry; Computer Simulation; DNA Fingerprinting; Genes; Humans; Markov Chains; Monte Carlo Method; Reproducibility of Results; Saccharomyces cerevisiae/genetics; Sphingolipids/genetics
17.
Nucleic Acids Res ; 42(18): e138, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25063300

ABSTRACT

To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints (each a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene, along with those terms' corresponding enrichment P-values). The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general.
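The abstract describes an Ontology Fingerprint as the Gene Ontology terms overrepresented in a gene's linked abstracts, each paired with an enrichment P-value. A minimal sketch of that idea follows, assuming a one-sided hypergeometric test and a fixed significance cutoff; the abstract does not state which test or cutoff was used, and all function names, argument names, and counts here are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def ontology_fingerprint(gene_term_counts, n_gene_abstracts,
                         corpus_term_counts, n_corpus_abstracts, alpha=0.05):
    """Return {term: p_value} for terms enriched in one gene's abstracts.

    gene_term_counts: abstracts linked to the gene that mention each term.
    corpus_term_counts: abstracts in the whole corpus that mention each term.
    The hypergeometric test and alpha cutoff are illustrative assumptions.
    """
    fingerprint = {}
    for term, k in gene_term_counts.items():
        p_value = hypergeom_upper_tail(
            k, n_corpus_abstracts, corpus_term_counts[term], n_gene_abstracts
        )
        if p_value < alpha:
            fingerprint[term] = p_value
    return fingerprint
```

Two genes' fingerprints built this way could then be compared pairwise (for example, by the overlap of their enriched terms) to score an edge in a gene network; the specific similarity measure is left open here because the abstract does not specify it.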


Subjects
Gene Ontology , Gene Regulatory Networks , Bayes Theorem , Genes, Fungal , Metabolic Networks and Pathways/genetics , PubMed , Sphingolipids/metabolism , Yeasts/genetics , Yeasts/metabolism
18.
Res Sq ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38746131

ABSTRACT

Background: The potential benefits of drug combination synergy in cancer medicine are significant, yet the risks must be carefully managed due to the possibility of increased toxicity. Although artificial intelligence applications have demonstrated notable success in predicting drug combination synergy, several key challenges persist: (1) Existing models often predict average synergy values across a restricted range of testing dosages, neglecting crucial dose amounts and the mechanisms of action of the drugs involved. (2) Many graph-based models rely on static protein-protein interactions, failing to adapt to dynamic and context-dependent networks. This limitation constrains the applicability of current methods. Results: We introduced SAFER, a Sub-hypergraph Attention-based graph model, addressing these issues by incorporating complex relationships among biological knowledge networks and considering dosing effects on subject-specific networks. SAFER outperformed previous models on the benchmark and the independent test set. The analysis of subgraph attention weights for the lung cancer cell line highlighted the JAK-STAT signaling pathway, PRDM12, ZNF781, and CDC5L, which have been implicated in lung fibrosis. Conclusions: SAFER presents an interpretable framework designed to identify drug-responsive signals. Tailored for comprehending dose effects on subject-specific molecular contexts, our model uniquely captures dose-level drug combination responses. This capability unlocks previously inaccessible avenues of investigation compared to earlier models. Finally, the SAFER framework can be leveraged by future inquiries to investigate molecular networks that uniquely characterize individual patients.

19.
ArXiv ; 2024 Oct 17.
Article in English | MEDLINE | ID: mdl-39483354

ABSTRACT

By leveraging GPT-4 for ontology narration, we developed GPTON to infuse structured knowledge into LLMs through verbalized ontology terms, achieving accurate text and ontology annotations for over 68% of gene sets in the top five predictions. Manual evaluations confirm GPTON's robustness, highlighting its potential to harness LLMs and structured knowledge to significantly advance biomedical research beyond gene set annotation.

20.
J Am Med Inform Assoc ; 31(9): 1904-1911, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38520725

ABSTRACT

OBJECTIVES: The rapid expansion of biomedical literature necessitates automated techniques to discern relationships between biomedical concepts from extensive free text. Such techniques facilitate the development of detailed knowledge bases and highlight research deficiencies. The LitCoin Natural Language Processing (NLP) challenge, organized by the National Center for Advancing Translational Sciences, aims to evaluate such potential and provides a manually annotated corpus for methodology development and benchmarking. MATERIALS AND METHODS: For the named entity recognition (NER) task, we utilized ensemble learning to merge predictions from three domain-specific models, namely BioBERT, PubMedBERT, and BioM-ELECTRA, devised a rule-driven detection method for cell line and taxonomy names, and annotated 70 more abstracts as additional corpus. We further finetuned the T0pp model, with 11 billion parameters, to boost the performance on relation extraction and leveraged entities' location information (eg, title, background) to enhance novelty prediction performance in relation extraction (RE). RESULTS: Our pioneering NLP system designed for this challenge secured first place in Phase I-NER and second place in Phase II-relation extraction and novelty prediction, outpacing over 200 teams. We tested OpenAI ChatGPT 3.5 and ChatGPT 4 in a Zero-Shot setting using the same test set, revealing that our finetuned model considerably surpasses these broad-spectrum large language models. DISCUSSION AND CONCLUSION: Our outcomes depict a robust NLP system excelling in NER and RE across various biomedical entities, emphasizing that task-specific models remain superior to generic large ones. Such insights are valuable for endeavors like knowledge graph development and hypothesis formulation in biomedical research.
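One common way to merge NER predictions from several models is token-level majority voting. The abstract does not say which ensemble rule was used, so the sketch below is only an assumed illustration: the function name, the fallback to the outside tag "O" when no strict majority exists, and the tag labels in the usage note are all hypothetical.

```python
from collections import Counter


def majority_vote(tag_sequences, fallback="O"):
    """Merge per-token tag sequences from several models by strict majority.

    tag_sequences: list of equal-length tag lists, one list per model.
    A token keeps a tag only if more than half of the models agree on it;
    otherwise it receives the fallback tag (an illustrative tie-break rule).
    """
    if not tag_sequences:
        raise ValueError("at least one tag sequence is required")
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all tag sequences must have the same length")

    merged = []
    for tags in zip(*tag_sequences):
        tag, count = Counter(tags).most_common(1)[0]
        merged.append(tag if count > len(tags) / 2 else fallback)
    return merged
```

For example, given three models' tags for the same four tokens, `majority_vote([["B-Gene", "I-Gene", "O", "B-Disease"], ["B-Gene", "O", "O", "B-Disease"], ["B-Gene", "I-Gene", "O", "B-Chemical"]])` keeps each tag that at least two of the three models agree on.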


Subjects
Natural Language Processing , Data Mining/methods , Machine Learning , Humans