Search | VHL Search Portal

1.

TA-RNN: an attention-based time-aware recurrent neural network architecture for electronic health records.

Al Olaimat, Mohammad; Bozdag, Serdar.

Bioinformatics ; 40(Supplement_1): i169-i179, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38940180

ABSTRACT

MOTIVATION: Electronic health records (EHRs) represent a comprehensive resource of a patient's medical history. EHRs are essential for utilizing advanced technologies such as deep learning (DL), enabling healthcare providers to analyze extensive data, extract valuable insights, and make precise, data-driven clinical decisions. DL methods such as recurrent neural networks (RNNs) have been used to analyze EHRs to model disease progression and predict diagnosis. However, these methods do not address some inherent irregularities in EHR data, such as irregular time intervals between clinical visits. Furthermore, most DL models are not interpretable. In this study, we propose two interpretable DL architectures based on RNNs, namely time-aware RNN (TA-RNN) and TA-RNN-autoencoder (TA-RNN-AE), to predict a patient's clinical outcome in EHRs at the next visit and multiple visits ahead, respectively. To mitigate the impact of irregular time intervals, we propose incorporating a time embedding of the elapsed times between visits. For interpretability, we propose employing a dual-level attention mechanism that operates between visits and between features within each visit. RESULTS: The results of the experiments conducted on the Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets indicated the superior performance of the proposed models for predicting Alzheimer's Disease (AD) compared to state-of-the-art and baseline approaches based on F2 and sensitivity. Additionally, TA-RNN showed superior performance on the Medical Information Mart for Intensive Care (MIMIC-III) dataset for mortality prediction. In our ablation study, we observed enhanced predictive performance by incorporating time embedding and attention mechanisms. Finally, investigating attention weights helped identify influential visits and features in predictions. AVAILABILITY AND IMPLEMENTATION: https://github.com/bozdaglab/TA-RNN.
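
Illustrative note: the abstract above reports results in terms of F2 and sensitivity. As a minimal sketch of what those metrics measure, the standard F-beta formula can be computed from confusion-matrix counts as below. The function names `sensitivity` and `fbeta` are assumptions made for this example and are not taken from the TA-RNN repository.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): fraction of true positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def fbeta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion counts.

    beta > 1 weights recall more heavily than precision; beta = 2 gives F2.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = sensitivity(tp, fn)
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
f2 = fbeta(tp=8, fp=4, fn=2)
f1 = fbeta(tp=8, fp=4, fn=2, beta=1.0)
```

Because F2 weights recall four times as heavily as precision, it favors models that miss fewer true cases, which is a common choice when a missed diagnosis is costlier than a false alarm.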

Subjects

Deep Learning , Electronic Health Records , Neural Networks, Computer , Humans , Alzheimer Disease

2.

PPAD: a deep learning architecture to predict progression of Alzheimer's disease.

Al Olaimat, Mohammad; Martinez, Jared; Saeed, Fahad; Bozdag, Serdar.

Bioinformatics ; 39(Suppl 1): i149-i157, 2023 Jun 30.

Article in English | MEDLINE | ID: mdl-37387135

ABSTRACT

MOTIVATION: Alzheimer's disease (AD) is a neurodegenerative disease that affects millions of people worldwide. Mild cognitive impairment (MCI) is an intermediary stage between the cognitively normal state and AD. Not all people who have MCI convert to AD. The diagnosis of AD is made after significant symptoms of dementia, such as short-term memory loss, are already present. Since AD is currently an irreversible disease, diagnosis at the onset of the disease brings a huge burden on patients, their caregivers, and the healthcare sector. Thus, there is a crucial need to develop methods for the early prediction of AD in patients who have MCI. Recurrent neural networks (RNNs) have been successfully used to handle electronic health records (EHRs) for predicting conversion from MCI to AD. However, RNNs ignore the irregular time intervals between successive events, which are common in electronic health record data. In this study, we propose two deep learning architectures based on RNNs, namely Predicting Progression of Alzheimer's Disease (PPAD) and PPAD-Autoencoder. PPAD and PPAD-Autoencoder are designed for early prediction of conversion from MCI to AD at the next visit and multiple visits ahead, respectively. To minimize the effect of the irregular time intervals between visits, we propose using age at each visit as an indicator of time change between successive visits. RESULTS: Our experimental results on the Alzheimer's Disease Neuroimaging Initiative and National Alzheimer's Coordinating Center datasets showed that our proposed models outperformed all baseline models for most prediction scenarios in terms of F2 and sensitivity. We also observed that the age feature was one of the top features and was able to address the irregular time interval problem. AVAILABILITY AND IMPLEMENTATION: https://github.com/bozdaglab/PPAD.

Subjects

Alzheimer Disease , Cognitive Dysfunction , Deep Learning , Neurodegenerative Diseases , Humans , Alzheimer Disease/diagnostic imaging , Cognitive Dysfunction/diagnostic imaging , Electronic Health Records

3.

Do In-Hospital Rothman Index Scores Predict Postdischarge Adverse Events and Discharge Location After Total Knee Arthroplasty?

Kleven, Andrew D; Middleton, Austin H; Kesimoglu, Ziynet Nesibe; Slagel, Isaac C; Creager, Ashley E; Hanson, Ryan; Bozdag, Serdar; Edelstein, Adam I.

J Arthroplasty ; 37(4): 668-673, 2022 Apr.

Article in English | MEDLINE | ID: mdl-34954019

ABSTRACT

BACKGROUND: There have been efforts to reduce adverse events and unplanned readmissions after total joint arthroplasty. The Rothman Index (RI) is a real-time, composite measure of medical acuity for hospitalized patients. We aimed to examine the association among in-hospital RI scores and complications, readmissions, and discharge location after total knee arthroplasty (TKA). We hypothesized that RI scores could be used to predict the outcomes of interest. METHODS: This is a retrospective study of an institutional database of elective, primary TKA from July 2018 until December 2019. Complications and readmissions were defined per Centers for Medicare and Medicaid Services. Analysis included multivariate regression, computation of the area under the curve (AUC), and the Youden Index to set RI thresholds. RESULTS: The study cohort's (n = 957) complications (2.4%), readmissions (3.6%), and nonhome discharge (13.7%) were reported. All RI metrics (minimum, maximum, last, mean, range, 25th%, and 75th%) were significantly associated with increased odds of readmission and home discharge (all P < .05). RI scores were not significantly associated with complications. The optimal RI thresholds for increased risk of readmission were last ≤ 71 (AUC = 0.65), mean ≤ 67 (AUC = 0.66), or maximum ≤ 80 (AUC = 0.63). The optimal RI thresholds for increased risk of home discharge were minimum ≥ 53 (AUC = 0.65), mean ≥ 69 (AUC = 0.65), or maximum ≥ 81 (AUC = 0.60). CONCLUSION: RI values may be used to predict readmission or home discharge after TKA.
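
Illustrative note: the abstract above uses the Youden Index to set Rothman Index (RI) thresholds. A minimal sketch of that idea follows, assuming that a lower score indicates higher risk (as in the readmission thresholds of the form "last ≤ 71"). The function name `youden_threshold` and the sample data are hypothetical and are not the study's code or data.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is predicted positive when score <= threshold, i.e. lower scores
    are treated as higher risk. labels are 1 (event) or 0 (no event).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")

    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s <= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical scores and outcomes, for illustration only.
example_scores = [55, 60, 65, 70, 75, 80, 85, 90]
example_labels = [1, 1, 0, 1, 0, 0, 0, 0]
threshold, j_stat = youden_threshold(example_scores, example_labels)
```

For outcomes where a higher score is the favorable direction (such as the "minimum ≥ 53" thresholds for home discharge), the comparison would be reversed.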

Subjects

Arthroplasty, Replacement, Hip , Arthroplasty, Replacement, Knee , Aftercare , Aged , Arthroplasty, Replacement, Hip/adverse effects , Arthroplasty, Replacement, Knee/adverse effects , Hospitals , Humans , Medicare , Patient Discharge , Patient Readmission , Postoperative Complications/epidemiology , Postoperative Complications/etiology , Retrospective Studies , Risk Factors , United States/epidemiology

4.

The Impact of Pre-Operative Healthcare Utilization on Complications, Readmissions, and Post-Operative Healthcare Utilization Following Total Joint Arthroplasty.

Creager, Ashley E; Kleven, Andrew D; Kesimoglu, Ziynet Nesibe; Middleton, Austin H; Holub, Meaghan N; Bozdag, Serdar; Edelstein, Adam I.

J Arthroplasty ; 37(3): 414-418, 2022 Mar.

Article in English | MEDLINE | ID: mdl-34793857

ABSTRACT

BACKGROUND: Identifying risk factors for adverse outcomes and increased costs following total joint arthroplasty (TJA) is needed to ensure quality. The interaction between pre-operative healthcare utilization (pre-HU) and outcomes following TJA has not been fully characterized. METHODS: This is a retrospective cohort study of patients undergoing elective, primary total hip arthroplasty (THA, N = 1785) or total knee arthroplasty (TKA, N = 2159) between 2015 and 2019 at a single institution. Pre-HU and post-operative healthcare utilization (post-HU) included non-elective healthcare utilization in the 90 days prior to and following TJA, respectively (emergency department, urgent care, observation admission, inpatient admission). Multivariate regression models including age, gender, American Society of Anesthesiologists, Medicaid status, and body mass index were fit for 30-day readmission, Centers for Medicare and Medicaid services (CMS)-defined complications, length of stay, and post-HU. RESULTS: The 30-day readmission rate was 3.2% and 3.4% and the CMS-defined complication rate was 3.8% and 2.9% for THA and TKA, respectively. Multivariate regression showed that for THA, presence of any pre-HU was associated with increased risk of 30-day readmission (odds ratio [OR] 2.85, 95% confidence interval [CI] 1.48-5.50, P = .002), CMS complications (OR 2.42, 95% CI 1.27-4.59, P = .007), and post-HU (OR 3.65, 95% CI 2.54-5.26, P < .001). For TKA, ≥2 pre-HU events were associated with increased risk of 30-day readmission (OR 3.52, 95% CI 1.17-10.61, P = .026) and post-HU (OR 2.64, 95% CI 1.29-5.40, P = .008). There were positive correlations for THA (any pre-HU) and TKA (≥2 pre-HU) with length of stay and number of post-HU events. CONCLUSION: Patients who utilize non-elective healthcare in the 90 days prior to TJA are at increased risk of readmission, complications, and unplanned post-HU. LEVEL OF EVIDENCE: Level III.
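
Illustrative note: the abstract above reports odds ratios (OR) with 95% confidence intervals (CI) from multivariate regression. The sketch below shows only the simpler unadjusted case, an OR and Wald interval from a 2x2 table, to make the reported quantities concrete. It is not the study's adjusted model; the function name `odds_ratio_ci` and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    z: normal quantile (1.96 for a 95% interval)
    All four cells must be non-zero.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not data from the study).
or_value, ci_low, ci_high = odds_ratio_ci(a=10, b=90, c=20, d=380)
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level, which is how the ORs in the abstract should be read.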

Subjects

Arthroplasty, Replacement, Hip , Patient Readmission , Aged , Arthroplasty, Replacement, Hip/adverse effects , Humans , Length of Stay , Medicare , Patient Acceptance of Health Care , Postoperative Complications/etiology , Retrospective Studies , Risk Factors , United States/epidemiology

5.

GSEPD: a Bioconductor package for RNA-seq gene set enrichment and projection display.

Stamm, Karl; Tomita-Mitchell, Aoy; Bozdag, Serdar.

BMC Bioinformatics ; 20(1): 115, 2019 Mar 06.

Article in English | MEDLINE | ID: mdl-30841846

ABSTRACT

BACKGROUND: RNA-seq, wherein RNA transcripts expressed in a sample are sequenced and quantified, has become a widely used technique to study disease and development. With RNA-seq, transcript abundance can be measured, and differentially expressed genes between groups and the functional enrichment of those genes can be computed. However, biological insights from RNA-seq are often limited by computational analysis and the enormous volume of resulting data, preventing facile and meaningful review and interpretation of gene expression profiles. In particular, in cases where the samples under study exhibit uncontrolled variation, deeper analysis of functional enrichment would be necessary to visualize samples' gene expression activity under each biological function. RESULTS: We developed a Bioconductor package, rgsepd, that streamlines RNA-seq data analysis by wrapping the commonly used tools DESeq2 and GOSeq in a user-friendly interface and performs a gene-subset linear projection to cluster heterogeneous samples by Gene Ontology (GO) terms. Rgsepd computes significantly enriched GO terms for each experimental condition and generates multidimensional projection plots highlighting how each predefined gene set's multidimensional expression may delineate samples. CONCLUSIONS: Rgsepd serves to automate differential expression, functional annotation, and exploratory data analyses to highlight subtle expression differences among samples based on each significant biological function.

Subjects

Sequence Analysis, RNA/methods , Software , Gene Ontology , Heart Atria/metabolism , Humans , RNA/genetics , RNA/metabolism

6.

Cancerin: A computational pipeline to infer cancer-associated ceRNA interaction networks.

Do, Duc; Bozdag, Serdar.

PLoS Comput Biol ; 14(7): e1006318, 2018 Jul.

Article in English | MEDLINE | ID: mdl-30011266

ABSTRACT

MicroRNAs (miRNAs) inhibit expression of target genes by binding to their RNA transcripts. It has been recently shown that RNA transcripts targeted by the same miRNA could "compete" for the miRNA molecules and thereby indirectly regulate each other. Experimental evidence has suggested that the aberration of such miRNA-mediated interaction between RNAs, called competing endogenous RNA (ceRNA) interaction, can play important roles in tumorigenesis. Given the difficulty of deciphering context-specific miRNA binding, and the existence of various gene regulatory factors such as DNA methylation and copy number alteration, inferring context-specific ceRNA interactions accurately is a computationally challenging task. Here we propose a computational method called Cancerin to identify cancer-associated ceRNA interactions. Cancerin incorporates DNA methylation, copy number alteration, and gene and miRNA expression datasets to construct cancer-specific ceRNA networks. We applied Cancerin to three cancer datasets from The Cancer Genome Atlas (TCGA) project. Our results indicated that ceRNAs were enriched with cancer-related genes, and ceRNA modules in the inferred ceRNA networks were involved in cancer-associated biological processes. When assessed against LINCS-L1000 shRNA-mediated gene knockdown experiments in a breast cancer cell line, Cancerin was able to predict the expression outcome of ceRNA genes with high accuracy.

Subjects

Breast Neoplasms/genetics , Computer Simulation , Gene Regulatory Networks , Genes, Neoplasm , RNA, Neoplasm/genetics , Atlases as Topic , Cell Line, Tumor , DNA Copy Number Variations , DNA Methylation , Datasets as Topic , Female , Gene Expression Regulation, Neoplastic , Humans , MicroRNAs/genetics , Neoplasm Proteins/metabolism , Prognosis , Protein Binding , RNA Processing, Post-Transcriptional

7.

Genome resources for climate-resilient cowpea, an essential crop for food security.

Muñoz-Amatriaín, María; Mirebrahim, Hamid; Xu, Pei; Wanamaker, Steve I; Luo, MingCheng; Alhakami, Hind; Alpert, Matthew; Atokple, Ibrahim; Batieno, Benoit J; Boukar, Ousmane; Bozdag, Serdar; Cisse, Ndiaga; Drabo, Issa; Ehlers, Jeffrey D; Farmer, Andrew; Fatokun, Christian; Gu, Yong Q; Guo, Yi-Ning; Huynh, Bao-Lam; Jackson, Scott A; Kusi, Francis; Lawley, Cynthia T; Lucas, Mitchell R; Ma, Yaqin; Timko, Michael P; Wu, Jiajie; You, Frank; Barkley, Noelle A; Roberts, Philip A; Lonardi, Stefano; Close, Timothy J.

Plant J ; 89(5): 1042-1054, 2017 Mar.

Article in English | MEDLINE | ID: mdl-27775877

ABSTRACT

Cowpea (Vigna unguiculata L. Walp.) is a legume crop that is resilient to hot and drought-prone climates, and a primary source of protein in sub-Saharan Africa and other parts of the developing world. However, genome resources for cowpea have lagged behind most other major crops. Here we describe foundational genome resources and their application to the analysis of germplasm currently in use in West African breeding programs. Resources developed from the African cultivar IT97K-499-35 include a whole-genome shotgun (WGS) assembly, a bacterial artificial chromosome (BAC) physical map, and assembled sequences from 4355 BACs. These resources and WGS sequences of an additional 36 diverse cowpea accessions supported the development of a genotyping assay for 51 128 SNPs, which was then applied to five bi-parental RIL populations to produce a consensus genetic map containing 37 372 SNPs. This genetic map enabled the anchoring of 100 Mb of WGS and 420 Mb of BAC sequences, an exploration of genetic diversity along each linkage group, and clarification of macrosynteny between cowpea and common bean. The SNP assay enabled a diversity analysis of materials from West African breeding programs. Two major subpopulations exist within those materials, one of which has significant parentage from South and East Africa and more diversity. There are genomic regions of high differentiation between subpopulations, one of which coincides with a cluster of nodulin genes. The new resources and knowledge help to define goals and accelerate the breeding of improved varieties to address food security issues related to limited-input small-holder farming and climate stress.

Subjects

Crops, Agricultural/genetics , Crops, Agricultural/physiology , Vigna/genetics , Vigna/physiology , Chromosomes, Artificial, Bacterial , Chromosomes, Plant/genetics , Climate , Food Supply , Genome, Plant/genetics , Genotype

8.

ProcessDriver: A computational pipeline to identify copy number drivers and associated disrupted biological processes in cancer.

Baur, Brittany; Bozdag, Serdar.

Genomics ; 109(3-4): 233-240, 2017 Jul.

Article in English | MEDLINE | ID: mdl-28438487

ABSTRACT

Copy number amplifications and deletions that are recurrent in cancer samples harbor genes that confer a fitness advantage to cancer tumor proliferation and survival. One important challenge in computational biology is to separate the causal (i.e., driver) genes from passenger genes in large, aberrated regions. Many previous studies focus on the genes within the aberration (i.e., cis genes), but do not utilize the genes that are outside of the aberrated region and dysregulated as a result of the aberration (i.e., trans genes). We propose a computational pipeline, called ProcessDriver, that prioritizes candidate drivers by relating cis genes to dysregulated trans genes and biological processes. ProcessDriver is based on the assumption that a driver cis gene should be closely associated with the dysregulated trans genes and biological processes, as opposed to previous studies that assume a driver cis gene should be the most correlated gene to the copy number of an aberrated region. We applied our method on breast, bladder and ovarian cancer data from the Cancer Genome Atlas database. Our results included previously known driver genes and cancer genes, as well as potentially novel driver genes. Additionally, many genes in the final set of drivers were linked to new tumor events after initial treatment using survival analysis. Our results highlight the importance of selecting driver genes based on their widespread downstream effects in trans.

Subjects

Breast Neoplasms/genetics , Gene Dosage , Genomics/methods , Oncogenes , Ovarian Neoplasms/genetics , Urinary Bladder Neoplasms/genetics , Algorithms , Breast Neoplasms/pathology , DNA Copy Number Variations , Disease Progression , Female , Humans , Ovarian Neoplasms/pathology , Urinary Bladder Neoplasms/pathology

9.

Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J.

Plant J ; 84(1): 216-27, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26252423

ABSTRACT

Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant.

Subjects

Chromosomes, Artificial, Bacterial/genetics , Genome, Plant/genetics , Hordeum/genetics , Molecular Sequence Data

10.

Combinatorial pooling enables selective sequencing of the barley gene space.

Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J.

PLoS Comput Biol ; 9(4): e1003010, 2013 Apr.

Article in English | MEDLINE | ID: mdl-23592960

ABSTRACT

For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
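
Illustrative note: the abstract above describes assigning reads back to BAC clones (deconvolution) after combinatorial pooling. The toy sketch below shows only the core idea: each clone is placed in a distinct combination of pools, so the set of pools in which a read is observed acts as a signature for its clone. The pooling design, the function names `assign_pools` and `deconvolve`, and the clone names are assumptions for this example; the published protocol's actual design and its handling of overlapping clones and sequencing errors are not reproduced here.

```python
from itertools import combinations, islice


def assign_pools(clones, n_pools, k):
    """Give each clone a distinct k-subset of the n_pools pools (its signature)."""
    signatures = list(islice(combinations(range(n_pools), k), len(clones)))
    if len(signatures) < len(clones):
        raise ValueError("not enough distinct pool combinations for all clones")
    return {clone: frozenset(sig) for clone, sig in zip(clones, signatures)}


def deconvolve(observed_pools, design):
    """Map the set of pools in which a read was seen back to a clone.

    Returns None when the observed set matches no clone signature exactly.
    """
    lookup = {sig: clone for clone, sig in design.items()}
    return lookup.get(frozenset(observed_pools))


# Hypothetical clones and pool counts, for illustration only.
design = assign_pools(["BAC1", "BAC2", "BAC3"], n_pools=4, k=2)
clone = deconvolve({0, 2}, design)
```

With n pools and signatures of size k, up to "n choose k" clones can be distinguished, which is why far fewer pools than clones are needed compared with giving every clone its own barcode.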

Subjects

Contig Mapping/methods , Hordeum/genetics , Sequence Analysis, DNA , Chromosomes, Artificial, Bacterial , Cloning, Molecular , Computational Biology/methods , Computer Simulation , Genes, Plant , Genetic Markers/genetics , Genomic Library , Genomics , Models, Genetic , Oryza/genetics , Physical Chromosome Mapping , Species Specificity

11.

Involvement of microRNA families in cancer.

Wuchty, Stefan; Arjona, Dolores; Bozdag, Serdar; Bauer, Peter O.

Nucleic Acids Res ; 40(17): 8219-26, 2012 Sep 01.

Article in English | MEDLINE | ID: mdl-22743268

ABSTRACT

Collecting representative sets of cancer microRNAs (miRs) from the literature, we show that their corresponding families are enriched in sets of highly interacting miR families. Targeting cancer genes at a statistically significant level, such cancer miR families strongly intervene in signaling pathways that harbor numerous cancer genes. Clustering miR family-specific profiles of pathway intervention, we found that different miR families share similar interaction patterns. Resembling the corresponding patterns of cancer miR families, such interaction patterns may indicate a miR family's potential role in cancer. As we find that the number of targeted cancer genes is a naïve proxy for a cancer miR family, we design a simple method to predict candidate miR families based on gene-specific interaction profiles. Assessing the impact of miR families in distinguishing between cancer and non-cancer genes, we predict a set of 84 potential candidate families, including 75% of the initially collected cancer miR families. Further confirming their relevance, predicted cancer miR families are significantly indicated in increasing, non-random numbers of tumor types.

Subjects

MicroRNAs/metabolism , Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Genes, Neoplasm , Humans , MicroRNAs/classification , MicroRNAs/physiology , Neoplasms/metabolism , Protein Interaction Mapping , RNA, Messenger/metabolism , Signal Transduction/genetics

12.

SUPREME: multiomics data integration using graph convolutional networks.

Kesimoglu, Ziynet Nesibe; Bozdag, Serdar.

NAR Genom Bioinform ; 5(2): lqad063, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37680392

ABSTRACT

To pave the road towards precision medicine in cancer, patients with similar biology ought to be grouped into the same cancer subtypes. Utilizing high-dimensional multiomics datasets, integrative approaches have been developed to uncover cancer subtypes. Recently, Graph Neural Networks have been shown to learn node embeddings utilizing node features and associations on graph-structured data. Some integrative prediction tools have been developed leveraging these advances on multiple networks, with some limitations. Addressing these limitations, we developed SUPREME, a node classification framework that integrates multiple data modalities on graph-structured data. On breast cancer subtyping, unlike existing tools, SUPREME generates patient embeddings from multiple similarity networks utilizing multiomics features and integrates them with raw features to capture complementary signals. On breast cancer subtype prediction tasks from three datasets, SUPREME outperformed other tools. SUPREME-inferred subtypes had significant survival differences, mostly with more significance than the ground truth, and outperformed nine other approaches. These results suggest that, with proper multiomics data utilization, SUPREME could demystify undiscovered characteristics in cancer subtypes that cause significant survival differences and could improve the ground truth labels, which depend mainly on one datatype. In addition, to show the model-agnostic property of SUPREME, we applied it to two additional datasets and observed a clear performance advantage.

13.

PPAD: A deep learning architecture to predict progression of Alzheimer's disease.

Al Olaimat, Mohammad; Martinez, Jared; Saeed, Fahad; Bozdag, Serdar.

bioRxiv ; 2023 Jan 31.

Article in English | MEDLINE | ID: mdl-36778453

ABSTRACT

Alzheimer's disease (AD) is a neurodegenerative disease that affects millions of people worldwide. Mild cognitive impairment (MCI) is an intermediary stage between the cognitively normal (CN) state and AD. Not all people who have MCI convert to AD. The diagnosis of AD is made after significant symptoms of dementia, such as short-term memory loss, are already present. Since AD is currently an irreversible disease, diagnosis at the onset of disease brings a huge burden on patients, their caregivers, and the healthcare sector. Thus, there is a crucial need to develop methods for the early prediction of AD in patients who have MCI. Recurrent Neural Networks (RNNs) have been successfully used to handle Electronic Health Records (EHRs) for predicting conversion from MCI to AD. However, RNNs ignore the irregular time intervals between successive events, which are common in EHR data. In this study, we propose two deep learning architectures based on RNNs, namely Predicting Progression of Alzheimer's Disease (PPAD) and PPAD-Autoencoder (PPAD-AE). PPAD and PPAD-AE are designed for early prediction of conversion from MCI to AD at the next visit and multiple visits ahead, respectively. To minimize the effect of the irregular time intervals between visits, we propose using age at each visit as an indicator of time change between successive visits. Our experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets showed that our proposed models outperformed all baseline models for most prediction scenarios in terms of F2 and sensitivity. We also observed that the age feature was one of the top features and was able to address the irregular time interval problem.

14.

NRPreTo: A Machine Learning-Based Nuclear Receptor and Subfamily Prediction Tool.

Madugula, Sita Sirisha; Pandey, Suman; Amalapurapu, Shreya; Bozdag, Serdar.

ACS Omega ; 8(23): 20379-20388, 2023 Jun 13.

Article in English | MEDLINE | ID: mdl-37323377

ABSTRACT

The nuclear receptor (NR) superfamily includes phylogenetically related ligand-activated proteins, which play a key role in various cellular activities. NR proteins are subdivided into seven subfamilies based on their function, mechanism, and the nature of the interacting ligand. Developing robust tools to identify NRs could give insights into their functional relationships and involvement in disease pathways. Existing NR prediction tools use only a few types of sequence-based features and are tested on relatively similar independent datasets; thus, they may suffer from overfitting when extended to new genera of sequences. To address this problem, we developed the Nuclear Receptor Prediction Tool (NRPreTo), a two-level NR prediction tool with a unique training approach in which, in addition to the sequence-based features used by existing NR prediction tools, six additional feature groups depicting various physicochemical, structural, and evolutionary features of proteins were utilized. The first level of NRPreTo predicts whether a query protein is an NR or a non-NR, and the second level further subclassifies the protein into one of the seven NR subfamilies. We developed Random Forest classifiers and tested them on benchmark datasets, as well as on the entire human protein datasets from RefSeq and the Human Protein Reference Database (HPRD). We observed that using additional feature groups improved the performance. We also observed that NRPreTo achieved high performance on the external datasets and predicted 59 novel NRs in the human proteome. The source code of NRPreTo is publicly available at https://github.com/bozdaglab/NRPreTo.

15.

PVTAD: Alzheimer's disease diagnosis using pyramid vision transformer applied to white matter of T1-weighted structural MRI data.

Aghdam, Maryam Akhavan; Bozdag, Serdar; Saeed, Fahad.

bioRxiv ; 2023 Dec 04.

Article in English | MEDLINE | ID: mdl-38045324

ABSTRACT

Alzheimer's disease (AD) is a neurodegenerative disorder, and timely diagnosis is crucial for early interventions. AD is known to involve disrupted local and global neural connections in the brain, which may be instrumental in understanding and extracting specific biomarkers. Previous machine-learning approaches are mostly based on convolutional neural network (CNN) and standard vision transformer (ViT) models, which may not sufficiently capture the multidimensional local and global patterns that may be indicative of AD. Therefore, in this paper, we propose a novel approach called PVTAD to classify AD and cognitively normal (CN) cases using a pretrained pyramid vision transformer (PVT) and white matter (WM) of T1-weighted structural MRI (sMRI) data. Our approach combines the advantages of CNN and standard ViT to extract both local and global features indicative of AD from the WM coronal middle slices. We performed experiments on subjects with T1-weighted MPRAGE sMRI scans from the ADNI dataset. Our results demonstrate that PVTAD achieves an average accuracy of 97.7% and an F1-score of 97.6%, outperforming the single and parallel CNN and standard ViT architectures based on sMRI data for AD vs. CN classification.

16.

Computing microRNA-gene interaction networks in pan-cancer using miRDriver.

Bose, Banabithi; Moravec, Matthew; Bozdag, Serdar.

Sci Rep ; 12(1): 3717, 2022 03 08.

Article in English | MEDLINE | ID: mdl-35260634

ABSTRACT

DNA copy number aberrated regions in cancer are known to harbor cancer driver genes and short non-coding RNA molecules, i.e., microRNAs. In this study, we integrated multi-omics datasets such as copy number aberration, DNA methylation, and gene and microRNA expression to identify signature microRNA-gene associations from frequently aberrated DNA regions across pan-cancer, utilizing a LASSO-based regression approach. We studied 7294 patient samples associated with eighteen different cancer types from The Cancer Genome Atlas (TCGA) database and identified several cancer-specific and common microRNA-gene interactions enriched in experimentally validated microRNA-target interactions. We highlighted several oncogenic and tumor suppressor microRNAs that were cancer-specific or common to several cancer types. Our method substantially outperformed five state-of-the-art methods in selecting significantly known microRNA-gene interactions in multiple cancer types. Several microRNAs and genes were found to be associated with tumor survival and progression. Selected target genes were found to be significantly enriched in cancer-related pathways, cancer hallmarks, and Gene Ontology (GO) terms. Furthermore, subtype-specific potential gene signatures were discovered in multiple cancer types.

Subjects

MicroRNAs; Neoplasms; DNA Methylation; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Humans; MicroRNAs/genetics; MicroRNAs/metabolism; Neoplasms/genetics; Oncogenes

17.

PhenoGeneRanker: Gene and Phenotype Prioritization Using Multiplex Heterogeneous Networks.

Dursun, Cagatay; Kwitek, Anne E; Bozdag, Serdar.

IEEE/ACM Trans Comput Biol Bioinform ; 19(5): 2950-2962, 2022.

Article in English | MEDLINE | ID: mdl-34283720

ABSTRACT

Uncovering genotype-phenotype relationships is a fundamental challenge in genomics. Gene prioritization is an important step in this endeavor, reducing the thousands of genes coming from high-throughput studies to a short, manageable list. Network propagation methods are promising, state-of-the-art methods for gene prioritization, based on the premise that functionally related genes tend to be close to each other in biological networks. Recently, we introduced PhenoGeneRanker, a network-propagation algorithm for multiplex heterogeneous networks. PhenoGeneRanker allows multi-layer gene and phenotype networks. It also calculates empirical p values of gene and phenotype ranks using random stratified sampling of seeds of genes and phenotypes based on their connectivity degree in the network. In this study, we introduce the PhenoGeneRanker Bioconductor package and its application to multi-omics rat genome datasets to rank hypertension disease-related genes and strains. We showed that PhenoGeneRanker performed better at ranking hypertension disease-related genes using multiplex gene networks than using aggregated gene networks. We also showed that PhenoGeneRanker performed better at ranking hypertension disease-related strains using a multiplex phenotype network than using single or aggregated phenotype networks. We performed a rigorous hyperparameter analysis and, finally, showed that Gene Ontology (GO) enrichment of statistically significant top-ranked genes resulted in hypertension disease-related GO terms.

Subjects

Algorithms; Hypertension; Animals; Gene Regulatory Networks/genetics; Genomics/methods; Phenotype; Rats
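The premise stated in the PhenoGeneRanker abstract, that functionally related genes tend to lie close together in a network, can be illustrated with one common form of network propagation: a random walk with restart. The sketch below is a minimal single-layer version on a toy graph. It is not the PhenoGeneRanker implementation, which handles multiplex heterogeneous networks and empirical p values; the function name, the gene labels, and the restart probability of 0.7 are assumptions made only for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Rank nodes by proximity to the seed nodes.

    adjacency: dict mapping node -> dict of neighbor -> edge weight.
    seeds: iterable of seed nodes; the walker restarts uniformly over them.
    Returns a list of (node, score) pairs sorted by decreasing score.
    """
    nodes = list(adjacency)
    seeds = set(seeds)
    # Restart distribution: uniform over seeds, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    # Total outgoing weight per node, used to normalize transition probabilities.
    strength = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                continue  # isolated node: it has no edges to pass probability along
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return sorted(p.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy network: a path g1 - g2 - g3 - g4, seeded at g1.
network = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}
ranking = random_walk_with_restart(network, seeds=["g1"])
for gene, score in ranking:
    print(f"{gene}\t{score:.4f}")
```

On this path graph the scores decrease with distance from the seed, which is the behavior gene prioritization relies on: candidates near known disease genes rank higher.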

18.

Classification of Autism Spectrum Disorder Using rs-fMRI data and Graph Convolutional Networks.

Yang, Tianren; Al-Duailij, Mai A; Bozdag, Serdar; Saeed, Fahad.

Proc IEEE Int Conf Big Data ; 2022: 3131-3138, 2022 Dec.

Article in English | MEDLINE | ID: mdl-38952948

ABSTRACT

Autism spectrum disorder (ASD) affects a large number of children and adults in the US and worldwide. Early and quick diagnosis of ASD can significantly improve the quality of life for both patients and their families. Prior research provides strong evidence that structural and functional magnetic resonance imaging (MRI) data collected from individuals with ASD exhibit distinguishing characteristics that differ in local and global, spatial and temporal neural patterns of the brain, and therefore can be used for diagnostic purposes for various mental disorders. However, MRI data are high-dimensional, and advanced methods are needed to make sense of these datasets. In this paper, we present a novel model based on a graph convolutional network (GCN) that can utilize resting-state fMRI (rs-fMRI) data to classify ASD subjects from healthy controls (HC). In addition to using the graph from traditional correlation matrices, our proposed GCN model incorporates graphlet topological counting as one of the training features. Our results show that graphlets can preserve the topological information of the graphs obtained from fMRI data. Combined with our GCN, the graphlets retain enough topological information to differentiate between ASD and HC. Our proposed model gives an average accuracy of 64.27% on the whole ABIDE-I dataset (1035 subjects) and a highest site-specific accuracy of 75.9%, which is comparable to other state-of-the-art methods, while potentially being more interpretable.
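For readers unfamiliar with graph convolution, the sketch below shows a single layer of the widely used propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} X W), where A is the adjacency matrix, X holds per-node features, and W is a weight matrix. It is an assumption that the model in this abstract uses this exact rule; the abstract does not say. The three-node graph, the single feature column, and the weight value are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Hypothetical 3-node path graph (0 - 1 - 2) with one feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]  # e.g. a per-node topological count
weights = [[1.0]]                 # would be learned in practice
out = gcn_layer(adj, features, weights)
print(out)
```

After one layer, node 0's signal reaches its neighbor node 1 but not node 2, which is two hops away; stacking layers widens the neighborhood each node can see.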

19.

FastMEDUSA: a parallelized tool to infer gene regulatory networks.

Bozdag, Serdar; Li, Aiguo; Wuchty, Stefan; Fine, Howard A.

Bioinformatics ; 26(14): 1792-3, 2010 Jul 15.

Article in English | MEDLINE | ID: mdl-20513661

ABSTRACT

MOTIVATION: In order to construct gene regulatory networks of higher organisms from gene expression and promoter sequence data efficiently, we developed FastMEDUSA. In this parallelized version of the regulatory network-modeling tool MEDUSA, expression and sequence data are shared among a user-defined number of processors on a single multi-core machine or cluster. Our results show that FastMEDUSA allows a more efficient utilization of computational resources. While the determination of a regulatory network of brain tumor in Homo sapiens takes 12 days with MEDUSA, FastMEDUSA obtained the same results in 6 h by utilizing 100 processors. AVAILABILITY: Source code and documentation of FastMEDUSA are available at https://wiki.nci.nih.gov/display/NOBbioinf/FastMEDUSA

Subjects

Gene Regulatory Networks; Genomics/methods; Software; Gene Expression Profiling/methods; Sequence Analysis, DNA
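The runtimes quoted in the FastMEDUSA abstract (12 days with MEDUSA, 6 h with FastMEDUSA on 100 processors) imply a speedup and a parallel efficiency that the abstract does not state explicitly. The short calculation below derives them using the usual definitions, speedup = serial time / parallel time and efficiency = speedup / number of processors; the function name is illustrative.

```python
def parallel_speedup(serial_hours, parallel_hours, n_processors):
    """Return (speedup, efficiency) for a parallel run."""
    speedup = serial_hours / parallel_hours
    efficiency = speedup / n_processors
    return speedup, efficiency


# Figures quoted in the abstract: 12 days serial vs. 6 h on 100 processors.
speedup, efficiency = parallel_speedup(12 * 24, 6, 100)
print(f"speedup = {speedup:.0f}x, efficiency = {efficiency:.0%}")
# prints: speedup = 48x, efficiency = 48%
```

A 48-fold speedup on 100 processors means that, on average, each processor did useful work a little under half the time. The abstract does not say where the rest of the time went.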

20.

Crinet: A computational tool to infer genome-wide competing endogenous RNA (ceRNA) interactions.

Kesimoglu, Ziynet Nesibe; Bozdag, Serdar.

PLoS One ; 16(5): e0251399, 2021.

Article in English | MEDLINE | ID: mdl-33983999

ABSTRACT

To understand the driving biological factors for complex diseases like cancer, the regulatory circuitry of genes needs to be discovered. Recently, a new gene regulation mechanism called competing endogenous RNA (ceRNA) interactions has been discovered. Certain genes targeted by common microRNAs (miRNAs) "compete" for these miRNAs, thereby regulating each other by freeing others from miRNA regulation. Several computational tools have been published to infer ceRNA networks. In most existing tools, however, expression abundance sufficiency, collective regulation, and the groupwise effect of ceRNAs are not considered. In this study, we developed a computational tool named Crinet to infer genome-wide ceRNA networks, addressing these critical drawbacks. Crinet considers all mRNAs, lncRNAs, and pseudogenes as potential ceRNAs and incorporates a network deconvolution method to exclude spurious ceRNA pairs. We tested Crinet on breast cancer data in TCGA. Crinet inferred reproducible ceRNA interactions and groups, which were significantly enriched in cancer-related genes and processes. We validated the selected miRNA-target interactions with protein expression-based benchmarks and also evaluated the inferred ceRNA interactions by predicting gene expression change in knockdown assays. The hub genes in the inferred ceRNA network included known suppressor/oncogene lncRNAs in breast cancer, showing the importance of including non-coding RNAs in ceRNA inference. Crinet-inferred ceRNA groups that were consistently involved in immune system-related processes could be important assets in light of the studies confirming the relation between immunotherapy and cancer. The source code of Crinet is in R and available at https://github.com/bozdaglab/crinet.

Subjects

Gene Regulatory Networks; MicroRNAs/genetics; RNA, Long Noncoding/genetics; RNA, Messenger/genetics; Gene Expression Regulation, Neoplastic; Genomics/methods; Humans; Neoplasms/genetics
