Search | BVS Ecuador Search Portal

1.

PHACTboost: A Phylogeny-aware Pathogenicity Predictor for the Missense Mutations via Boosting.

Dereli, Onur; Kuru, Nurdan; Akkoyun, Emrah; Bircan, Aylin; Tastan, Oznur; Adebali, Ogün.

Mol Biol Evol ; 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38934805

ABSTRACT

Most algorithms that are used to predict the effects of variants rely on evolutionary conservation. However, a majority of such techniques compute evolutionary conservation by solely using the alignment of multiple sequences while overlooking the evolutionary context of substitution events. In our previous study, we introduced PHACT, a scoring-based pathogenicity predictor for missense mutations that can leverage phylogenetic trees. Building on this foundation, we now propose PHACTboost, a gradient boosting tree-based classifier that combines PHACT scores with information from multiple sequence alignments, phylogenetic trees, and ancestral reconstruction. By learning from data, PHACTboost outperforms PHACT. Furthermore, the results of comprehensive experiments on carefully constructed sets of variants demonstrated that PHACTboost can outperform 40 prevalent pathogenicity predictors reported in dbNSFP, including conventional tools, meta-predictors, and deep learning-based approaches, as well as more recent tools such as AlphaMissense, EVE, and CPT-1. The superiority of PHACTboost over these methods was particularly evident in the case of hard variants for which different pathogenicity predictors offered conflicting results. We provide predictions of 215 million amino acid alterations over 20,191 proteins. PHACTboost is available at https://github.com/CompGenomeLab/PHACTboost. PHACTboost can improve our understanding of genetic diseases and facilitate more accurate diagnoses.

2.

Discovering misannotated lncRNAs using deep learning training dynamics.

Nabi, Afshan; Dilekoglu, Berke; Adebali, Ogun; Tastan, Oznur.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36571493

ABSTRACT

MOTIVATION: Recent experimental evidence has shown that some long non-coding RNAs (lncRNAs) contain small open reading frames (sORFs) that are translated into functional micropeptides, suggesting that these lncRNAs are misannotated as non-coding. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (Ribo-Seq) and mass-spectrometry experiments, which are cell-type dependent and expensive. RESULTS: Here, we propose a computational method to identify possible misannotated lncRNAs from sequence information alone. Our approach first builds deep learning models to discriminate coding and non-coding transcripts and leverages these models' training dynamics to identify misannotated lncRNAs, i.e., lncRNAs with coding potential. The set of misannotated lncRNAs we identified significantly overlaps with experimentally validated ones and closely resembles coding protein sequences, as evidenced by significant BLAST hits. Our analysis of a subset of misannotated lncRNA candidates also shows that some ORFs they contain yield high-confidence folded structures as predicted by AlphaFold2. This methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/nabiafshan/DetectingMisannotatedLncRNAs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Deep Learning ; RNA, Long Noncoding ; RNA, Long Noncoding/genetics ; Amino Acid Sequence ; Proteome/genetics ; Open Reading Frames ; Micropeptides

3.

MuDCoD: multi-subject community detection in personalized dynamic gene networks from single-cell RNA sequencing.

Sapci, Ali Osman Berk; Lu, Shan; Yan, Shuchen; Ay, Ferhat; Tastan, Oznur; Keles, Sündüz.

Bioinformatics ; 39(10), 2023 10 03.

Article in English | MEDLINE | ID: mdl-37740957

ABSTRACT

MOTIVATION: With the wide availability of single-cell RNA-seq (scRNA-seq) technology, population-scale scRNA-seq datasets across multiple individuals and time points are emerging. While the initial investigations of these datasets tend to focus on standard analysis of clustering and differential expression, leveraging the power of scRNA-seq data at the personalized dynamic gene co-expression network level has the potential to unlock subject and/or time-specific network-level variation, which is critical for understanding phenotypic differences. Community detection from co-expression networks of multiple time points or conditions has been well-studied; however, none of the existing settings included networks from multiple subjects and multiple time points simultaneously. To address this, we develop Multi-subject Dynamic Community Detection (MuDCoD) for multi-subject community detection in personalized dynamic gene networks from scRNA-seq. MuDCoD builds on the spectral clustering framework and promotes information sharing among the networks of the subjects as well as networks at different time points. It clusters genes in the personalized dynamic gene networks and reveals gene communities that are variable or shared not only across time but also among subjects. RESULTS: Evaluation and benchmarking of MuDCoD against existing approaches reveal that MuDCoD effectively leverages apparent shared signals among networks of the subjects at individual time points, and performs robustly when there is no or little information sharing among the networks. Applications to population-scale scRNA-seq datasets of human induced pluripotent stem cells during dopaminergic neuron differentiation and CD4+ T cell activation indicate that MuDCoD enables robust inference for identifying time-varying personalized gene modules. Our results illustrate how personalized dynamic community detection can aid in the exploration of subject-specific biological processes that vary across time. AVAILABILITY AND IMPLEMENTATION: MuDCoD is publicly available at https://github.com/bo1929/MuDCoD as a Python package. Implementation includes simulation and real-data experiments together with extensive documentation.

Subject(s)

Gene Expression Profiling ; Gene Regulatory Networks ; Humans ; Gene Expression Profiling/methods ; Software ; Sequence Analysis, RNA/methods ; Single-Cell Analysis/methods ; Cluster Analysis

4.

PHACT: Phylogeny-Aware Computing of Tolerance for Missense Mutations.

Kuru, Nurdan; Dereli, Onur; Akkoyun, Emrah; Bircan, Aylin; Tastan, Oznur; Adebali, Ogun.

Mol Biol Evol ; 39(6), 2022 06 02.

Article in English | MEDLINE | ID: mdl-35639618

ABSTRACT

Evolutionary conservation is a fundamental resource for predicting the substitutability of amino acids and the loss of function in proteins. The use of multiple sequence alignment alone, without considering the evolutionary relationships among sequences, results in the redundant counting of evolutionarily related alteration events, as if they were independent. Here, we propose a new method, PHACT, that predicts the pathogenicity of missense mutations directly from the phylogenetic tree of proteins. PHACT travels through the nodes of the phylogenetic tree and evaluates the deleteriousness of a substitution based on the probability differences of ancestral amino acids between neighboring nodes in the tree. Moreover, PHACT assigns weights to each node in the tree based on their distance to the query organism. For each potential amino acid substitution, the algorithm generates a score that is used to calculate the effect of substitution on protein function. To analyze the predictive performance of PHACT, we performed various experiments over the subsets of two datasets that include 3,023 proteins and 61,662 variants in total. The experiments demonstrated that our method outperformed the widely used pathogenicity prediction tools (i.e., SIFT and PolyPhen-2) and achieved a better predictive performance than other conventional statistical approaches presented in dbNSFP. The PHACT source code is available at https://github.com/CompGenomeLab/PHACT.

Subject(s)

Mutation, Missense ; Software ; Amino Acids ; Phylogeny ; Proteins/chemistry ; Proteins/genetics ; Sequence Alignment

5.

From cell lines to cancer patients: personalized drug synergy prediction.

Kuru, Halil Ibrahim; Cicek, A Ercument; Tastan, Oznur.

Bioinformatics ; 40(5), 2022 Jan 01.

Article in English | MEDLINE | ID: mdl-38718189

ABSTRACT

MOTIVATION: Combination drug therapies are effective treatments for cancer. However, the genetic heterogeneity of the patients and exponentially large space of drug pairings pose significant challenges for finding the right combination for a specific patient. Current in silico prediction methods can be instrumental in reducing the vast number of candidate drug combinations. However, existing powerful methods are trained with cancer cell line gene expression data, which limits their applicability in clinical settings. While synergy measurements on cell line models are available at large scale, patient-derived samples are too few to train a complex model. On the other hand, patient-specific single-drug response data are relatively more available. RESULTS: In this work, we propose a deep learning framework, Personalized Deep Synergy Predictor (PDSP), that enables us to use the patient-specific single drug response data for customizing patient drug synergy predictions. PDSP is first trained to learn synergy scores of drug pairs and their single drug responses for a given cell line using drug structures and large scale cell line gene expression data. Then, the model is fine-tuned for patients with their patient gene expression data and associated single drug response measured on the patient ex vivo samples. In this study, we evaluate PDSP on data from three leukemia patients and observe that it improves the prediction accuracy by 27% compared to models trained on cancer cell line data. AVAILABILITY AND IMPLEMENTATION: PDSP is available at https://github.com/hikuru/PDSP.

6.

Uncovering complementary sets of variants for predicting quantitative phenotypes.

Yilmaz, Serhan; Fakhouri, Mohamad; Koyutürk, Mehmet; Çiçek, A Ercüment; Tastan, Oznur.

Bioinformatics ; 38(4): 908-917, 2022 01 27.

Article in English | MEDLINE | ID: mdl-34864867

ABSTRACT

MOTIVATION: Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning. RESULTS: We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are likely to be in linkage disequilibrium. Our method features two interpretable parameters that control the time/performance trade-off without requiring parameter tuning. In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least two orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with ~10^7 variants in a matter of minutes while taking the dependencies between the variants into account. AVAILABILITY AND IMPLEMENTATION: Macarons is available in Matlab and Python at https://github.com/serhan-yilmaz/macarons. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms ; Genome-Wide Association Study ; Humans ; Genome-Wide Association Study/methods ; Phenotype ; Linkage Disequilibrium ; Genome, Human ; Polymorphism, Single Nucleotide
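
The selection idea described in the Macarons abstract above (entry 6), picking variants that are individually predictive while skipping those redundant with variants already chosen, can be illustrated with a generic greedy filter. The sketch below is not the published Macarons algorithm: it uses absolute Pearson correlation between genotype columns as a stand-in for linkage disequilibrium, and the names `pearson`, `select_complementary`, `k`, and `max_abs_corr` are hypothetical, introduced only for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def select_complementary(columns, phenotype, k, max_abs_corr=0.8):
    """Greedily pick up to k variant indices (illustrative, not the Macarons method).

    Variants are visited in order of decreasing |correlation with the phenotype|;
    a variant is skipped when it is too correlated with any variant already chosen.
    """
    ranked = sorted(
        range(len(columns)),
        key=lambda j: -abs(pearson(columns[j], phenotype)),
    )
    chosen = []
    for j in ranked:
        if all(abs(pearson(columns[j], columns[i])) < max_abs_corr for i in chosen):
            chosen.append(j)
        if len(chosen) == k:
            break
    return chosen


# Toy genotype matrix, one list per variant (values are allele counts).
genotypes = [
    [0, 1, 2, 1, 0, 2],  # variant 0
    [0, 1, 2, 1, 0, 2],  # variant 1: identical to variant 0 (fully redundant)
    [2, 0, 1, 0, 2, 1],  # variant 2: only moderately correlated with variant 0
]
phenotype = [0.1, 1.0, 2.1, 0.9, 0.0, 2.0]

selected = select_complementary(genotypes, phenotype, k=2)
print(selected)
```

In this toy input the duplicate variant is passed over in favour of the less redundant one, which is the behaviour the abstract describes at a high level; the real tool's parameters and redundancy criterion should be taken from its repository.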

7.

PAMOGK: a pathway graph kernel-based multiomics approach for patient clustering.

Tepeli, Yasin Ilkagan; Ünal, Ali Burak; Akdemir, Furkan Mustafa; Tastan, Oznur.

Bioinformatics ; 36(21): 5237-5246, 2021 01 29.

Article in English | MEDLINE | ID: mdl-32730565

ABSTRACT

MOTIVATION: Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multiomics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. RESULTS: We develop Pathway-based MultiOmic Graph Kernel clustering (PAMOGK) that integrates multiomics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, we use multiview kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients results in four clusters with significantly different survival times (P-value =1.24e-11). When we compare PAMOGK to eight other state-of-the-art multiomics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. The pathways identified as important are highly relevant to KIRC. AVAILABILITY AND IMPLEMENTATION: github.com/tastanlab/pamogk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms ; Neoplasms ; Cluster Analysis ; Humans ; Neoplasms/genetics

8.

PRER: A patient representation with pairwise relative expression of proteins on biological networks.

Kuru, Halil Ibrahim; Buyukozkan, Mustafa; Tastan, Oznur.

PLoS Comput Biol ; 17(5): e1008998, 2021 05.

Article in English | MEDLINE | ID: mdl-34038408

ABSTRACT

Changes in protein and gene expression levels are often used as features in predictive modeling such as survival prediction. A common strategy to aggregate information contained in individual proteins is to integrate the expression levels with the biological networks. In this work, we propose a novel patient representation where we integrate proteins' expression levels with the protein-protein interaction (PPI) networks: Patient representation with PRER (Pairwise Relative Expressions with Random walks). PRER captures the dysregulation patterns of proteins based on the neighborhood of a protein in the PPI network. Specifically, PRER computes a feature vector for a patient by comparing the source protein's expression level with the levels of other proteins that are within its neighborhood. The neighborhood of the source protein is derived by a biased random-walk strategy on the network. We test PRER's performance in a survival prediction task in 10 different cancers using random forest survival models. PRER yields a statistically significant improvement in predictive performance in 9 out of 10 cancers when compared to the same model trained with features based on individual protein expressions. Furthermore, we identified pairs of proteins whose interactions are predictive of patient survival even though their individual expression levels are not. The set of identified relations provides a valuable collection of protein biomarkers with high prognostic value. PRER can be used for other complex diseases and prediction tasks that use molecular expression profiles as input. PRER is freely available at: https://github.com/hikuru/PRER.

Subject(s)

Computational Biology/methods ; Proteins/metabolism ; Biomarkers/metabolism ; Prognosis ; Protein Interaction Maps
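
The comparison step described in the PRER abstract above (entry 8), where a source protein's expression is compared with that of proteins in its network neighborhood, might look roughly like the following. This is a minimal sketch under stated assumptions: the neighborhoods are given directly rather than derived by a biased random walk, the feature is a plain expression difference, and the names `pairwise_relative_features`, `expression`, and `neighborhoods` are hypothetical rather than taken from the PRER code base.

```python
def pairwise_relative_features(expression, neighborhoods):
    """Build one feature per (source, neighbor) pair for a single patient.

    expression: dict mapping protein name -> expression level for this patient.
    neighborhoods: dict mapping source protein -> list of neighbor proteins
                   (assumed precomputed; the abstract derives them by random walks).
    Returns a dict keyed by (source, neighbor) holding source minus neighbor level.
    """
    features = {}
    for source, neighbors in neighborhoods.items():
        for other in neighbors:
            features[(source, other)] = expression[source] - expression[other]
    return features


# Toy patient profile and toy neighborhoods (illustrative values only).
expression = {"A": 2.0, "B": 0.5, "C": 1.25}
neighborhoods = {"A": ["B", "C"], "B": ["A"]}

features = pairwise_relative_features(expression, neighborhoods)
print(features)
```

A representation of this kind encodes how proteins stand relative to their network neighbors rather than their absolute levels, which is the intuition the abstract gives for capturing dysregulation patterns.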

9.

NoRCE: non-coding RNA sets cis enrichment tool.

Olgun, Gulden; Nabi, Afshan; Tastan, Oznur.

BMC Bioinformatics ; 22(1): 294, 2021 Jun 02.

Article in English | MEDLINE | ID: mdl-34078267

ABSTRACT

BACKGROUND: While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close-by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. RESULTS: We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding & ncRNAs specific to cancer. Additionally, the users can utilize custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast. CONCLUSIONS: NoRCE is a platform-independent, user-friendly, comprehensive R package that can be used to gain insight into the functional importance of a list of ncRNAs of any type. The tool offers flexibility to conduct the users' preferred set of analyses by designing their own pipeline of analysis. NoRCE is available in Bioconductor and https://github.com/guldenolgun/NoRCE .

Subject(s)

MicroRNAs ; Zebrafish ; Animals ; Genome ; Mice ; RNA, Untranslated/genetics ; Rats ; Zebrafish/genetics

10.

DeepKinZero: zero-shot learning for predicting kinase-phosphosite associations involving understudied kinases.

Deznabi, Iman; Arabaci, Busra; Koyutürk, Mehmet; Tastan, Oznur.

Bioinformatics ; 36(12): 3652-3661, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32044914

ABSTRACT

MOTIVATION: Protein phosphorylation is a key regulator of protein function in signal transduction pathways. Kinases are the enzymes that catalyze the phosphorylation of other proteins in a target-specific manner. The dysregulation of phosphorylation is associated with many diseases including cancer. Although the advances in phosphoproteomics enable the identification of phosphosites at the proteome level, most of the phosphoproteome is still in the dark: more than 95% of the reported human phosphosites have no known kinases. Determining which kinase is responsible for phosphorylating a site remains an experimental challenge. Existing computational methods require several examples of known targets of a kinase to make accurate kinase-specific predictions, yet for a large body of kinases, only a few or no target sites are reported. RESULTS: We present DeepKinZero, the first zero-shot learning approach to predict the kinase acting on a phosphosite for kinases with no known phosphosite information. DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model. The kinase-specific positional amino acid preferences are learned using a bidirectional recurrent neural network. We show that DeepKinZero achieves significant improvement in accuracy for kinases with no known phosphosites in comparison to the baseline model and other methods available. By expanding our knowledge on understudied kinases, DeepKinZero can help to chart the phosphoproteome atlas. AVAILABILITY AND IMPLEMENTATION: The source codes are available at https://github.com/Tastanlab/DeepKinZero. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Phosphoproteins ; Phosphotransferases ; Humans ; Phosphoproteins/metabolism ; Phosphorylation ; Proteome ; Software

11.

Revisiting the complex architecture of ALS in Turkey: Expanding genotypes, shared phenotypes, molecular networks, and a public variant database.

Tunca, Ceren; Seker, Tuncay; Akçimen, Fulya; Coskun, Cemre; Bayraktar, Elif; Palvadeau, Robin; Zor, Seyit; Koçoglu, Cemile; Kartal, Ece; Sen, Nesli Ece; Hamzeiy, Hamid; Özoguz Erimis, Aslihan; Norman, Utku; Karakahya, Oguzhan; Olgun, Gülden; Akgün, Tahsin; Durmus, Hacer; Sahin, Erdi; Çakar, Arman; Basar Gürsoy, Esra; Babacan Yildiz, Gülsen; Isak, Baris; Uluç, Kayihan; Hanagasi, Hasmet; Bilgiç, Basar; Turgut, Nilda; Aysal, Fikret; Ertas, Mustafa; Boz, Cavit; Kotan, Dilcan; Idrisoglu, Halil; Soysal, Aysun; Uzun Adatepe, Nurten; Akalin, Mehmet Ali; Koç, Filiz; Tan, Ersin; Oflazer, Piraye; Deymeer, Feza; Tastan, Öznur; Çiçek, A Ercüment; Kavak, Ersen; Parman, Yesim; Basak, A Nazli.

Hum Mutat ; 41(8): e7-e45, 2020 08.

Article in English | MEDLINE | ID: mdl-32579787

ABSTRACT

The last decade has proven that amyotrophic lateral sclerosis (ALS) is clinically and genetically heterogeneous, and that the genetic component in sporadic cases might be stronger than expected. This study investigates 1,200 patients to revisit ALS in the ethnically heterogeneous yet inbred Turkish population. Familial ALS (fALS) accounts for 20% of our cases. The rates of consanguinity are 30% in fALS and 23% in sporadic ALS (sALS). Major ALS genes explained the disease cause in only 35% of fALS, as compared with ~70% in Europe and North America. Whole exome sequencing resulted in a discovery rate of 42% (53/127). Whole genome analyses in 623 sALS cases and 142 population controls, sequenced within Project MinE, revealed well-established fALS gene variants, solidifying the concept of incomplete penetrance in ALS. Genome-wide association studies (GWAS) with whole genome sequencing data did not indicate a new risk locus. Coupling GWAS with a coexpression network of disease-associated candidates points to a significant enrichment for cell cycle- and division-related genes. Within this network, literature text-mining highlights DECR1, ATL1, HDAC2, GEMIN4, and HNRNPA3 as important genes. Finally, information on ALS-related gene variants in the Turkish cohort sequenced within Project MinE was compiled in the GeNDAL variant browser (www.gendal.org).

Subject(s)

Amyotrophic Lateral Sclerosis/genetics ; Databases, Genetic ; Genome-Wide Association Study ; Genotype ; Humans ; Internet ; Phenotype ; Turkey ; Whole Genome Sequencing

12.

A utility maximizing and privacy preserving approach for protecting kinship in genomic databases.

Kale, Gulce; Ayday, Erman; Tastan, Oznur.

Bioinformatics ; 34(2): 181-189, 2018 Jan 15.

Article in English | MEDLINE | ID: mdl-28968635

ABSTRACT

MOTIVATION: Rapid and low-cost sequencing of genomes has enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data is shared in public databases. Although the identities of the participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such piece of information is kinship. RESULTS: We define two routes through which kinship privacy can leak and propose a technique to protect kinship privacy against these risks while maximizing the utility of shared data. The method involves systematic identification of minimal portions of genomic data to mask as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem in which the number of positions to mask is minimized subject to privacy constraints that ensure the familial relationships are not revealed. We evaluate the proposed technique on real genomic data. Results indicate that concurrent sharing of data pertaining to a parent and an offspring results in high risks to kinship privacy, whereas sharing data from more distant relatives together is often safer. We also show that the arrival order of family members has a high impact on the level of privacy risks and on the utility of sharing data. AVAILABILITY AND IMPLEMENTATION: https://github.com/tastanlab/Kinship-Privacy. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

13.

Discovering lncRNA mediated sponge interactions in breast cancer molecular subtypes.

Olgun, Gulden; Sahin, Ozgur; Tastan, Oznur.

BMC Genomics ; 19(1): 650, 2018 Sep 04.

Article in English | MEDLINE | ID: mdl-30180792

ABSTRACT

BACKGROUND: Long non-coding RNAs (lncRNAs) can indirectly regulate mRNA expression levels by sequestering microRNAs (miRNAs), and act as competing endogenous RNAs (ceRNAs) or as sponges. Previous studies identified lncRNA-mediated sponge interactions in various cancers including breast cancer. However, breast cancer subtypes are quite distinct in terms of their molecular profiles; therefore, ceRNAs are expected to be subtype-specific as well. RESULTS: To find lncRNA-mediated ceRNA interactions in breast cancer subtypes, we develop an integrative approach. We conduct partial correlation analysis and kernel independence tests on patient gene expression profiles and further refine the candidate interactions with miRNA target information. We find that although there are sponges common to multiple subtypes, there are also distinct subtype-specific interactions. Functional enrichment of mRNAs that participate in these interactions highlights distinct biological processes for different subtypes. Interestingly, some of the ceRNAs also reside in close proximity in the genome; for example, those involving HOX genes, HOTAIR, miR-196a-1 and miR-196a-2. We also discover subtype-specific sponge interactions with high prognostic potential. We found that patients differ significantly in their survival distributions if they are grouped based on the expression patterns of specific ceRNA interactions. However, this is not the case if the expression of individual RNAs participating in a ceRNA interaction is used. CONCLUSION: These results can help shed light on subtype-specific mechanisms of breast cancer, and the methodology developed herein can help uncover sponges in other diseases.

Subject(s)

Breast Neoplasms/genetics ; Carcinoma, Basal Cell/genetics ; Gene Regulatory Networks ; MicroRNAs/genetics ; RNA, Long Noncoding/genetics ; RNA, Messenger/genetics ; Receptor, ErbB-2/metabolism ; Breast Neoplasms/classification ; Breast Neoplasms/metabolism ; Breast Neoplasms/pathology ; Carcinoma, Basal Cell/classification ; Carcinoma, Basal Cell/metabolism ; Carcinoma, Basal Cell/pathology ; Computational Biology ; Female ; Gene Expression Regulation, Neoplastic ; Humans ; Prognosis ; Survival Rate
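
The partial correlation analysis mentioned in the sponge-interaction abstract above (entry 13) asks how much of the lncRNA-mRNA correlation remains once a shared miRNA is accounted for. The sketch below shows only the standard first-order partial correlation formula on toy data; it is an assumption-laden illustration, not the authors' pipeline (which also uses kernel independence tests and miRNA target information), and the names `pearson`, `partial_corr`, `mirna`, `lncrna`, and `mrna` are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    # Clamp at zero to guard against tiny negative values from rounding.
    denom = sqrt(max(0.0, 1 - r_xz ** 2) * max(0.0, 1 - r_yz ** 2))
    if denom == 0:
        return 0.0
    return (r_xy - r_xz * r_yz) / denom


# Toy expression profiles across four patients (illustrative values only).
# Both RNAs are built as 2 * mirna plus independent noise, so their
# correlation is driven entirely by the shared miRNA signal.
mirna = [-1, -1, 1, 1]
lncrna = [-1, -3, 3, 1]
mrna = [-3, -1, 3, 1]

direct = pearson(lncrna, mrna)
conditioned = partial_corr(lncrna, mrna, mirna)
print(direct, conditioned)
```

Here the direct correlation is high while the partial correlation is essentially zero, the pattern one would look for when a miRNA is hypothesized to mediate the association between two RNAs.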

14.

GLANET: genomic loci annotation and enrichment tool.

Otlu, Burçak; Firtina, Can; Keles, Sündüz; Tastan, Oznur.

Bioinformatics ; 33(18): 2818-2828, 2017 Sep 15.

Article in English | MEDLINE | ID: mdl-28541490

ABSTRACT

MOTIVATION: Genomic studies identify genomic loci representing genetic variations, transcription factor (TF) occupancy, or histone modification through next generation sequencing (NGS) technologies. Interpreting these loci requires evaluating them with known genomic and epigenomic annotations. RESULTS: We present GLANET as a comprehensive annotation and enrichment analysis tool which implements a sampling-based enrichment test that accounts for GC content and/or mappability biases, jointly or separately. GLANET annotates and performs enrichment analysis on these loci with a rich library. We introduce and perform novel data-driven computational experiments for assessing the power and Type-I error of its enrichment procedure which show that GLANET has attained high statistical power and well-controlled Type-I error rate. As a key feature, users can easily extend its library with new gene sets and genomic intervals. Other key features include assessment of impact of single nucleotide variants (SNPs) on TF binding sites and regulation based pathway enrichment analysis. AVAILABILITY AND IMPLEMENTATION: GLANET can be run using its GUI or on command line. GLANET's source code is available at https://github.com/burcakotlu/GLANET . Tutorials are provided at https://glanet.readthedocs.org . CONTACT: burcak@ceng.metu.edu.tr or oznur.tastan@cs.bilkent.edu.tr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genetic Loci ; Genomics/methods ; High-Throughput Nucleotide Sequencing/methods ; Molecular Sequence Annotation/methods ; Software ; DNA/metabolism ; Epigenomics/methods ; Genome, Human ; Humans ; Polymorphism, Single Nucleotide ; Protein Binding ; Sequence Analysis, DNA/methods ; Transcription Factors/metabolism

15.

Correction to: NoRCE: noncoding RNA sets cis enrichment tool.

Olgun, Gulden; Nabi, Afshan; Tastan, Oznur.

BMC Bioinformatics ; 22(1): 393, 2021 Aug 04.

Article in English | MEDLINE | ID: mdl-34348639

16.

Integromic analysis of genetic variation and gene expression identifies networks for cardiovascular disease phenotypes.

Yao, Chen; Chen, Brian H; Joehanes, Roby; Otlu, Burcak; Zhang, Xiaoling; Liu, Chunyu; Huan, Tianxiao; Tastan, Oznur; Cupples, L Adrienne; Meigs, James B; Fox, Caroline S; Freedman, Jane E; Courchesne, Paul; O'Donnell, Christopher J; Munson, Peter J; Keles, Sunduz; Levy, Daniel.

Circulation ; 131(6): 536-49, 2015 Feb 10.

Article in English | MEDLINE | ID: mdl-25533967

ABSTRACT

BACKGROUND: Cardiovascular disease (CVD) reflects a highly coordinated complex of traits. Although genome-wide association studies have reported numerous single nucleotide polymorphisms (SNPs) to be associated with CVD, the role of most of these variants in disease processes remains unknown. METHODS AND RESULTS: We built a CVD network using 1512 SNPs associated with 21 CVD traits in genome-wide association studies (at P ≤ 5×10^-8) and cross-linked different traits by virtue of their shared SNP associations. We then explored whole blood gene expression in relation to these SNPs in 5257 participants in the Framingham Heart Study. At a false discovery rate <0.05, we identified 370 cis-expression quantitative trait loci (eQTLs; SNPs associated with altered expression of nearby genes) and 44 trans-eQTLs (SNPs associated with altered expression of remote genes). The eQTL network revealed 13 CVD-related modules. Searching for association of eQTL genes with CVD risk factors (lipids, blood pressure, fasting blood glucose, and body mass index) in the same individuals, we found examples in which the expression of eQTL genes was significantly associated with these CVD phenotypes. In addition, mediation tests suggested that a subset of SNPs previously associated with CVD phenotypes in genome-wide association studies may exert their function by altering expression of eQTL genes (eg, LDLR and PCSK7), which in turn may promote interindividual variation in phenotypes. CONCLUSIONS: Using a network approach to analyze CVD traits, we identified complex networks of SNP-phenotype and SNP-transcript connections. Integrating the CVD network with phenotypic data, we identified biological pathways that may provide insights into potential drug targets for treatment or prevention of CVD.

Subject(s)

Cardiovascular Diseases/genetics ; Gene Regulatory Networks/genetics ; Polymorphism, Single Nucleotide/genetics ; Quantitative Trait Loci/genetics ; Adult ; Chromosome Mapping ; Coronary Artery Disease/genetics ; Diabetes Mellitus, Type 1/genetics ; Female ; Gene Expression ; Genetic Variation ; Genome-Wide Association Study ; Humans ; LDL-Receptor Related Protein-Associated Protein/genetics ; Lipoproteins, HDL/genetics ; Lipoproteins, LDL/genetics ; Male ; Phenotype ; Risk Factors ; Smoking/genetics

17.

Retinal proteins as model systems for membrane protein folding.

Tastan, Oznur; Dutta, Arpana; Booth, Paula; Klein-Seetharaman, Judith.

Biochim Biophys Acta ; 1837(5): 656-63, 2014 May.

Article in English | MEDLINE | ID: mdl-24333783

ABSTRACT

Experimental folding studies of membrane proteins are more challenging than those of water-soluble proteins because of the higher hydrophobicity of membrane-embedded sequences and the need to provide a hydrophobic milieu for the transmembrane regions. The first challenge is their denaturation: due to the thermodynamic instability of polar groups in the membrane, secondary structures in membrane proteins are more difficult to disrupt than in soluble proteins. The second challenge is to refold from the denatured states. Successful refolding of membrane proteins has almost always been from very subtly denatured states. Therefore, it can be useful to analyze membrane protein folding using computational methods, and we will provide results obtained with simulated unfolding of membrane protein structures using the Floppy Inclusions and Rigid Substructure Topography (FIRST) method. Computational methods have the advantage that they allow a direct comparison between diverse membrane proteins. We will review here both experimental and FIRST studies of the retinal-binding proteins bacteriorhodopsin and mammalian rhodopsin, and discuss the extension of the findings to deriving hypotheses on the mechanisms of folding of membrane proteins in general. This article is part of a Special Issue entitled: Retinal Proteins-You can teach an old dog new tricks.

Subject(s)

Bacteriorhodopsins/chemistry , Molecular Dynamics Simulation , Retinaldehyde/chemistry , Rhodopsin/chemistry , Bacteriorhodopsins/metabolism , Euryarchaeota/chemistry , Euryarchaeota/physiology , Humans , Hydrophobic and Hydrophilic Interactions , Kinetics , Protein Denaturation , Protein Folding , Protein Refolding , Protein Structure, Secondary , Retinaldehyde/metabolism , Rhodopsin/metabolism , Structural Homology, Protein , Thermodynamics

18.

Tuning Privacy-Utility Tradeoff in Genomic Studies Using Selective SNP Hiding.

Alserr, Nour Almadhoun; Kale, Gulce; Mutlu, Onur; Tastan, Oznur; Ayday, Erman.

Proc Asia Pac Bioinform Conf ; 2023, 2023 Apr.

Article in English | MEDLINE | ID: mdl-37383349

ABSTRACT

Researchers need a rich trove of genomic datasets that they can leverage to gain a better understanding of the genetic basis of the human genome and identify associations between phenotypes and specific parts of DNA. However, sharing genomic datasets that include sensitive genetic or medical information of individuals can lead to serious privacy-related consequences if the data lands in the wrong hands. Restricting access to genomic datasets is one solution, but this greatly reduces their usefulness for research purposes. To allow sharing of genomic datasets while addressing these privacy concerns, several studies propose privacy-preserving mechanisms for data sharing. Differential privacy (DP) is one such mechanism; it formalizes rigorous mathematical foundations to provide privacy guarantees while sharing aggregated statistical information about a dataset. Nevertheless, it has been shown that the original privacy guarantees of DP-based solutions degrade when there are dependent tuples in the dataset, which is a common scenario for genomic datasets (due to the existence of family members). In this work, we introduce a new mechanism to mitigate the vulnerability to inference attacks of differentially private query results from genomic datasets that include dependent tuples. We propose a utility-maximizing and privacy-preserving approach for sharing statistics by hiding selected SNPs of the family members as they participate in a genomic dataset. By evaluating our mechanism on a real-world genomic dataset, we empirically demonstrate that our proposed mechanism can achieve up to 40% better privacy than state-of-the-art DP-based solutions, while near-optimally minimizing utility loss.
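
For readers unfamiliar with differentially private query results, the following is a minimal sketch of the standard Laplace mechanism applied to a count query over one SNP. It is not the selective SNP hiding mechanism proposed in the abstract above; the function names, the toy genotype list, and the choice of epsilon are illustrative assumptions. The sketch also assumes independent individuals, which is exactly the assumption the abstract says breaks down when family members are present.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(genotypes, epsilon, rng):
    """Return a noisy count of individuals carrying the minor allele.

    Adding or removing one individual changes the count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for g in genotypes if g > 0)
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: minor-allele counts (0, 1, or 2) at one SNP.
genotypes = [0, 1, 2, 0, 0, 1, 0, 2, 1, 0]
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
noisy = dp_count(genotypes, epsilon=1.0, rng=rng)
print(f"true carriers: 5, released value: {noisy:.2f}")
```

A smaller epsilon means a larger noise scale, so privacy improves while the released statistic becomes less useful; this is the privacy-utility tradeoff named in the title of the entry.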

19.

DeepSide: A Deep Learning Approach for Drug Side Effect Prediction.

Uner, Onur Can; Kuru, Halil Ibrahim; Cinbis, R Gokberk; Tastan, Oznur; Cicek, A Ercument.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 330-339, 2023.

Article in English | MEDLINE | ID: mdl-34995191

ABSTRACT

Drug failures due to unforeseen adverse effects in clinical trials pose health risks for the participants and lead to substantial financial losses. Side effect prediction algorithms have the potential to guide the drug design process. The LINCS L1000 dataset provides a vast resource of cell line gene expression data perturbed by different drugs and creates a knowledge base for context-specific features. The state-of-the-art approach that aims at using context-specific information relies only on the high-quality experiments in LINCS L1000 and discards a large portion of the experiments. In this study, our goal is to boost the prediction performance by utilizing this data to its full extent. We experiment with 5 deep learning architectures. We find that a multi-modal architecture produces the best predictive performance among multi-layer perceptron-based architectures when drug chemical structure (CS) and the full set of drug-perturbed gene expression profiles (GEX) are used as modalities. Overall, we observe that the CS is more informative than the GEX. A convolutional neural network-based model that uses only the SMILES string representation of the drugs achieves the best results and provides 13.0% macro-AUC and 3.1% micro-AUC improvements over the state-of-the-art. We also show that the model is able to predict side effect-drug pairs that are reported in the literature but were missing in the ground truth side effect dataset. DeepSide is available at http://github.com/OnurUner/DeepSide.
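
The abstract above reports both macro-AUC and micro-AUC. As a minimal sketch, assuming the usual multi-label definitions (macro: the mean of per-side-effect AUCs; micro: one AUC over all pooled drug/side-effect pairs), the difference can be illustrated as follows. The function names and the toy labels and scores are hypothetical and are not taken from DeepSide.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, y_score):
    """Average the AUC computed separately for each side effect (column)."""
    n_labels = len(y_true[0])
    per_label = [
        auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


def micro_auc(y_true, y_score):
    """Pool every (drug, side effect) pair and compute a single AUC."""
    flat_y = [y for row in y_true for y in row]
    flat_s = [s for row in y_score for s in row]
    return auc(flat_y, flat_s)


# Hypothetical toy data: 4 drugs (rows) x 2 side effects (columns).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]

print(macro_auc(y_true, y_score))  # 0.875
print(micro_auc(y_true, y_score))  # 0.9375
```

Macro averaging weights every side effect equally, so rare side effects count as much as common ones, whereas micro averaging is dominated by the side effects with the most drug pairs.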

Subject(s)

Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Neural Networks, Computer , Algorithms , Drug-Related Side Effects and Adverse Reactions/genetics , Cell Line

20.

Deep Phenotyping of Sleep in Drosophila.

Keles, Mehmet F; Sapci, Ali; Brody, Casey; Palmer, Isabelle; Le, Christin; Tastan, Oznur; Keles, Sunduz; Wu, Mark N.

bioRxiv ; 2023 Nov 02.

Article in English | MEDLINE | ID: mdl-37961473

ABSTRACT

Sleep is an evolutionarily conserved behavior, whose function is unknown. Here, we present a method for deep phenotyping of sleep in Drosophila, consisting of a high-resolution video imaging system, coupled with closed-loop laser perturbation to measure arousal threshold. To quantify sleep-associated microbehaviors, we trained a deep-learning network to annotate body parts in freely moving flies and developed a semi-supervised computational pipeline to classify behaviors. Quiescent flies exhibit a rich repertoire of microbehaviors, including proboscis pumping (PP) and haltere switches, which vary dynamically across the night. Using this system, we characterized the effects of optogenetically activating two putative sleep circuits. These data reveal that activating dFB neurons produces micromovements, inconsistent with sleep, while activating R5 neurons triggers PP followed by behavioral quiescence. Our findings suggest that sleep in Drosophila is polyphasic with different stages and set the stage for a rigorous analysis of sleep and other behaviors in this species.
