1.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37961473

ABSTRACT

Sleep is an evolutionarily conserved behavior, whose function is unknown. Here, we present a method for deep phenotyping of sleep in Drosophila, consisting of a high-resolution video imaging system, coupled with closed-loop laser perturbation to measure arousal threshold. To quantify sleep-associated microbehaviors, we trained a deep-learning network to annotate body parts in freely moving flies and developed a semi-supervised computational pipeline to classify behaviors. Quiescent flies exhibit a rich repertoire of microbehaviors, including proboscis pumping (PP) and haltere switches, which vary dynamically across the night. Using this system, we characterized the effects of optogenetically activating two putative sleep circuits. These data reveal that activating dFB neurons produces micromovements, inconsistent with sleep, while activating R5 neurons triggers PP followed by behavioral quiescence. Our findings suggest that sleep in Drosophila is polyphasic with different stages and set the stage for a rigorous analysis of sleep and other behaviors in this species.

2.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37740957

ABSTRACT

MOTIVATION: With the wide availability of single-cell RNA-seq (scRNA-seq) technology, population-scale scRNA-seq datasets across multiple individuals and time points are emerging. While the initial investigations of these datasets tend to focus on standard analysis of clustering and differential expression, leveraging the power of scRNA-seq data at the personalized dynamic gene co-expression network level has the potential to unlock subject and/or time-specific network-level variation, which is critical for understanding phenotypic differences. Community detection from co-expression networks of multiple time points or conditions has been well-studied; however, none of the existing settings included networks from multiple subjects and multiple time points simultaneously. To address this, we develop Multi-subject Dynamic Community Detection (MuDCoD) for multi-subject community detection in personalized dynamic gene networks from scRNA-seq. MuDCoD builds on the spectral clustering framework and promotes information sharing among the networks of the subjects as well as networks at different time points. It clusters genes in the personalized dynamic gene networks and reveals gene communities that are variable or shared not only across time but also among subjects. RESULTS: Evaluation and benchmarking of MuDCoD against existing approaches reveal that MuDCoD effectively leverages apparent shared signals among networks of the subjects at individual time points, and performs robustly when there is no or little information sharing among the networks. Applications to population-scale scRNA-seq datasets of human-induced pluripotent stem cells during dopaminergic neuron differentiation and CD4+ T cell activation indicate that MuDCoD enables robust inference for identifying time-varying personalized gene modules. Our results illustrate how personalized dynamic community detection can aid in the exploration of subject-specific biological processes that vary across time. 
AVAILABILITY AND IMPLEMENTATION: MuDCoD is publicly available at https://github.com/bo1929/MuDCoD as a Python package. Implementation includes simulation and real-data experiments together with extensive documentation.


Subjects
Gene Expression Profiling , Gene Regulatory Networks , Humans , Gene Expression Profiling/methods , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis
3.
Article in English | MEDLINE | ID: mdl-37383349

ABSTRACT

Researchers need a rich trove of genomic datasets that they can leverage to gain a better understanding of the genetic basis of the human genome and identify associations between phenotypes and specific parts of DNA. However, sharing genomic datasets that include sensitive genetic or medical information of individuals can lead to serious privacy-related consequences if data lands in the wrong hands. Restricting access to genomic datasets is one solution, but this greatly reduces their usefulness for research purposes. To allow sharing of genomic datasets while addressing these privacy concerns, several studies propose privacy-preserving mechanisms for data sharing. Differential privacy (DP) is one such mechanism: it provides a rigorous mathematical foundation for privacy guarantees when sharing aggregated statistical information about a dataset. Nevertheless, it has been shown that the original privacy guarantees of DP-based solutions degrade when there are dependent tuples in the dataset, which is a common scenario for genomic datasets (due to the existence of family members). In this work, we introduce a new mechanism to mitigate the vulnerability to inference attacks of differentially private query results from genomic datasets that include dependent tuples. We propose a utility-maximizing and privacy-preserving approach for sharing statistics by selectively hiding SNPs of family members as they participate in a genomic dataset. By evaluating our mechanism on a real-world genomic dataset, we empirically demonstrate that our proposed mechanism can achieve up to 40% better privacy than state-of-the-art DP-based solutions, while near-optimally minimizing utility loss.
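
For readers unfamiliar with differential privacy, the baseline idea that this abstract builds on can be sketched with the standard Laplace mechanism for a count query. This is a minimal illustration, not the mechanism proposed in the paper; the function names and the toy genotype list are assumptions made for the example.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponential variables is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy predicate."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical toy data: 1 = carries the minor allele at some SNP, 0 = does not.
genotypes = [1, 0, 1, 1, 0, 0, 1, 0]
noisy = dp_count(genotypes, lambda g: g == 1, epsilon=0.5, rng=random.Random(0))
```

The epsilon guarantee of this sketch holds when records are independent. The abstract's point is that relatives in a genomic dataset violate that assumption, so the effective guarantee is weaker than the nominal one.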

4.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36571493

ABSTRACT

MOTIVATION: Recent experimental evidence has shown that some long non-coding RNAs (lncRNAs) contain small open reading frames (sORFs) that are translated into functional micropeptides, suggesting that these lncRNAs are misannotated as non-coding. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (Ribo-Seq) and mass-spectrometry experiments, which are cell-type dependent and expensive. RESULTS: Here, we propose a computational method to identify possible misannotated lncRNAs from sequence information alone. Our approach first builds deep learning models to discriminate coding and non-coding transcripts and leverages these models' training dynamics to identify misannotated lncRNAs, i.e., lncRNAs with coding potential. The set of misannotated lncRNAs we identified overlaps significantly with experimentally validated ones and closely resembles coding protein sequences, as evidenced by significant BLAST hits. Our analysis on a subset of misannotated lncRNA candidates also shows that some ORFs they contain yield high-confidence folded structures as predicted by AlphaFold2. This methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/nabiafshan/DetectingMisannotatedLncRNAs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , RNA, Long Noncoding , RNA, Long Noncoding/genetics , Amino Acid Sequence , Proteome/genetics , Open Reading Frames , Micropeptides
5.
Article in English | MEDLINE | ID: mdl-34995191

ABSTRACT

Drug failures due to unforeseen adverse effects at clinical trials pose health risks for the participants and lead to substantial financial losses. Side effect prediction algorithms have the potential to guide the drug design process. The LINCS L1000 dataset provides a vast resource of cell line gene expression data perturbed by different drugs and creates a knowledge base for context-specific features. The state-of-the-art approach that aims at using context-specific information relies on only the high-quality experiments in LINCS L1000 and discards a large portion of the experiments. In this study, our goal is to boost the prediction performance by utilizing this data to its full extent. We experiment with 5 deep learning architectures. We find that a multi-modal architecture produces the best predictive performance among multi-layer perceptron-based architectures when drug chemical structure (CS) and the full set of drug-perturbed gene expression profiles (GEX) are used as modalities. Overall, we observe that the CS is more informative than the GEX. A convolutional neural network-based model that uses only the SMILES string representation of the drugs achieves the best results and provides 13.0% macro-AUC and 3.1% micro-AUC improvements over the state-of-the-art. We also show that the model is able to predict side effect-drug pairs that are reported in the literature but were missing from the ground truth side effect dataset. DeepSide is available at http://github.com/OnurUner/DeepSide.


Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Neural Networks, Computer , Algorithms , Drug-Related Side Effects and Adverse Reactions/genetics , Cell Line
6.
Mol Biol Evol ; 39(6)2022 06 02.
Article in English | MEDLINE | ID: mdl-35639618

ABSTRACT

Evolutionary conservation is a fundamental resource for predicting the substitutability of amino acids and the loss of function in proteins. The use of multiple sequence alignment alone, without considering the evolutionary relationships among sequences, results in the redundant counting of evolutionarily related alteration events, as if they were independent. Here, we propose a new method, PHACT, that predicts the pathogenicity of missense mutations directly from the phylogenetic tree of proteins. PHACT travels through the nodes of the phylogenetic tree and evaluates the deleteriousness of a substitution based on the probability differences of ancestral amino acids between neighboring nodes in the tree. Moreover, PHACT assigns weights to each node in the tree based on their distance to the query organism. For each potential amino acid substitution, the algorithm generates a score that is used to calculate the effect of substitution on protein function. To analyze the predictive performance of PHACT, we performed various experiments over the subsets of two datasets that include 3,023 proteins and 61,662 variants in total. The experiments demonstrated that our method outperformed the widely used pathogenicity prediction tools (i.e., SIFT and PolyPhen-2) and achieved a better predictive performance than other conventional statistical approaches presented in dbNSFP. The PHACT source code is available at https://github.com/CompGenomeLab/PHACT.


Subjects
Mutation, Missense , Software , Amino Acids , Phylogeny , Proteins/chemistry , Proteins/genetics , Sequence Alignment
7.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2334-2344, 2022.
Article in English | MEDLINE | ID: mdl-34086576

ABSTRACT

Drug combination therapies have been a viable strategy for the treatment of complex diseases such as cancer due to increased efficacy and reduced side effects. However, experimentally validating all possible combinations for synergistic interaction, even with high-throughput screens, is intractable due to the vast combinatorial search space. Computational techniques can reduce the number of combinations to be evaluated experimentally by prioritizing promising candidates. We present MatchMaker, which predicts drug synergy scores using drug chemical structure information and gene expression profiles of cell lines in a deep learning framework. For the first time, our model utilizes the largest known drug combination dataset to date, DrugComb. We compare the performance of MatchMaker with the state-of-the-art models and observe up to ∼15% correlation and ∼33% mean squared error (MSE) improvements over the next best method. We investigate the cell types and drug pairs that are relatively harder to predict and present novel candidate pairs. MatchMaker is built and available at https://github.com/tastanlab/matchmaker.


Subjects
Deep Learning , Neoplasms , Computational Biology/methods , Drug Combinations , Drug Synergism , Humans , Neoplasms/genetics
8.
IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1760-1771, 2022.
Article in English | MEDLINE | ID: mdl-33382660

ABSTRACT

Although miRNAs can cause widespread changes in expression programs, single miRNAs typically induce mild repression on their targets. Cooperativity among miRNAs is reported as one strategy to overcome this constraint. Expanding the catalog of synergistic miRNAs is critical for understanding gene regulation and for developing miRNA-based therapeutics. In this study, we develop miRCoop to identify synergistic miRNA pairs that have weak or no repression on the target mRNA individually but, when acting together, induce strong repression. miRCoop uses kernel-based statistical interaction tests, together with miRNA and mRNA target information. We apply our approach to patient data of two different cancer types. In kidney cancer, we identify 66 putative triplets. For 64 of these triplets, there is at least one common transcription factor that potentially regulates all participating RNAs of the triplet, supporting a functional association among them. Furthermore, we find that identified triplets are enriched for certain biological processes that are relevant to kidney cancer. Some of the synergistic miRNAs are very closely encoded in the genome, hinting at a functional association among them. In applying the method on tumor data with the primary liver site, we find 3105 potential triplet interactions. We believe miRCoop can aid our understanding of the complex regulatory interactions in different health and disease states of the cell and can help in designing miRNA-based therapies. Matlab code for the methodology is provided in https://github.com/guldenolgun/miRCoop.


Subjects
Kidney Neoplasms , MicroRNAs , Software , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks , Humans , Kidney Neoplasms/genetics , MicroRNAs/metabolism , RNA, Messenger/genetics , Transcription Factors
9.
Bioinformatics ; 38(4): 908-917, 2022 01 27.
Article in English | MEDLINE | ID: mdl-34864867

ABSTRACT

MOTIVATION: Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning. RESULTS: We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are likely to be in linkage disequilibrium. Our method features two interpretable parameters that control the time/performance trade-off without requiring parameter tuning. In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least two orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with ∼10^7 variants in a matter of minutes while taking the dependencies between the variants into account. AVAILABILITY AND IMPLEMENTATION: Macarons is available in Matlab and Python at https://github.com/serhan-yilmaz/macarons. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Phenotype , Linkage Disequilibrium , Genome, Human , Polymorphism, Single Nucleotide
10.
Bioinformatics ; 40(5)2022 Jan 01.
Article in English | MEDLINE | ID: mdl-38718189

ABSTRACT

MOTIVATION: Combination drug therapies are effective treatments for cancer. However, the genetic heterogeneity of the patients and exponentially large space of drug pairings pose significant challenges for finding the right combination for a specific patient. Current in silico prediction methods can be instrumental in reducing the vast number of candidate drug combinations. However, existing powerful methods are trained with cancer cell line gene expression data, which limits their applicability in clinical settings. While synergy measurements on cell line models are available at large scale, patient-derived samples are too few to train a complex model. On the other hand, patient-specific single-drug response data are relatively more available. RESULTS: In this work, we propose a deep learning framework, Personalized Deep Synergy Predictor (PDSP), that enables us to use the patient-specific single drug response data for customizing patient drug synergy predictions. PDSP is first trained to learn synergy scores of drug pairs and their single drug responses for a given cell line using drug structures and large scale cell line gene expression data. Then, the model is fine-tuned for patients with their patient gene expression data and associated single drug response measured on the patient ex vivo samples. In this study, we evaluate PDSP on data from three leukemia patients and observe that it improves the prediction accuracy by 27% compared to models trained on cancer cell line data. AVAILABILITY AND IMPLEMENTATION: PDSP is available at https://github.com/hikuru/PDSP.

11.
12.
BMC Bioinformatics ; 22(1): 294, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34078267

ABSTRACT

BACKGROUND: While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close-by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. RESULTS: We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding & ncRNAs specific to cancer. Additionally, the users can utilize custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast. CONCLUSIONS: NoRCE is a platform-independent, user-friendly, comprehensive R package that can be used to gain insight into the functional importance of a list of ncRNAs of any type. The tool offers flexibility to conduct the users' preferred set of analyses by designing their own pipeline of analysis. NoRCE is available in Bioconductor and https://github.com/guldenolgun/NoRCE .


Subjects
MicroRNAs , Zebrafish , Animals , Genome , Mice , RNA, Untranslated/genetics , Rats , Zebrafish/genetics
13.
PLoS Comput Biol ; 17(5): e1008998, 2021 05.
Article in English | MEDLINE | ID: mdl-34038408

ABSTRACT

Changes in protein and gene expression levels are often used as features in predictive modeling such as survival prediction. A common strategy to aggregate information contained in individual proteins is to integrate the expression levels with the biological networks. In this work, we propose a novel patient representation where we integrate proteins' expression levels with the protein-protein interaction (PPI) networks: Patient representation with PRER (Pairwise Relative Expressions with Random walks). PRER captures the dysregulation patterns of proteins based on the neighborhood of a protein in the PPI network. Specifically, PRER computes a feature vector for a patient by comparing the source protein's expression level with the levels of other proteins within its neighborhood. The neighborhood of the source protein is derived by a biased random-walk strategy on the network. We test PRER's performance in a survival prediction task in 10 different cancers using random forest survival models. PRER yields a statistically significant improvement in predictive performance in 9 out of 10 cancers when compared to the same model trained with features based on individual protein expressions. Furthermore, we identified pairs of proteins whose interactions are predictive of patient survival even though their individual expression levels are not. The set of identified relations provides a valuable collection of protein biomarkers with high prognostic value. PRER can be used for other complex diseases and prediction tasks that use molecular expression profiles as input. PRER is freely available at: https://github.com/hikuru/PRER.
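
The idea of pairwise relative expression over random-walk neighborhoods can be sketched as follows. This is a simplified illustration and not the PRER implementation: the walks here are uniform rather than biased, the binary "higher than" encoding is an assumed choice, and the toy network, protein names, and function names are hypothetical.

```python
import random

def walk_neighborhood(graph, source, walk_length, num_walks, rng):
    """Collect the nodes visited by short random walks started at source."""
    visited = set()
    for _ in range(num_walks):
        node = source
        for _ in range(walk_length):
            neighbors = graph.get(node, [])
            if not neighbors:
                break
            node = rng.choice(neighbors)  # uniform step (assumption: no bias terms)
            if node != source:
                visited.add(node)
    return sorted(visited)

def pairwise_features(graph, expression, walk_length=2, num_walks=20, seed=0):
    """Map (source, neighbor) pairs to 1 if the source is more highly expressed, else 0."""
    rng = random.Random(seed)
    features = {}
    for source in sorted(graph):
        for other in walk_neighborhood(graph, source, walk_length, num_walks, rng):
            features[(source, other)] = 1 if expression[source] > expression[other] else 0
    return features

# Hypothetical three-protein chain A - B - C and one patient's expression levels.
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
patient = {"A": 2.0, "B": 0.5, "C": 1.2}
features = pairwise_features(ppi, patient)
```

Each patient then gets one such dictionary, and the values in a fixed pair order form the feature vector passed to a downstream model such as a survival forest.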


Subjects
Computational Biology/methods , Proteins/metabolism , Biomarkers/metabolism , Prognosis , Protein Interaction Maps
14.
IEEE/ACM Trans Comput Biol Bioinform ; 18(3): 1208-1216, 2021.
Article in English | MEDLINE | ID: mdl-31443041

ABSTRACT

Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants identified in genome-wide association studies (GWAS). Many methods have been developed to select a subset of variant loci, which are associated with or predictive of the phenotype. Selecting connected SNPs on SNP-SNP networks has been proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. In this paper, we propose a novel method called SPADIS that favors the selection of remotely located SNPs in order to account for their complementary effects in explaining a phenotype. SPADIS selects a diverse set of loci on a SNP-SNP network. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana with continuous flowering time phenotypes. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs are selected and provides consistent improvements across multiple networks and settings on average. Moreover, it identifies more candidate genes and runs faster.
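
The greedy strategy mentioned in the abstract can be illustrated on a generic monotone submodular function. The sketch below uses set coverage as a stand-in objective, which is an assumption for illustration and not the SPADIS scoring function; the SNP and region names are hypothetical. For monotone submodular functions under a cardinality constraint, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
def greedy_max_coverage(candidates, k):
    """Pick up to k candidates, each time taking the largest marginal coverage gain."""
    selected, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for name in sorted(candidates):
            if name in selected:
                continue
            # Marginal gain: regions this candidate adds beyond what is already covered.
            gain = len(candidates[name] - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining candidate adds anything new
            break
        selected.append(best)
        covered |= candidates[best]
    return selected, covered

# Hypothetical SNPs and the genomic regions each one is taken to represent.
snp_regions = {
    "snp1": {"r1", "r2", "r3"},
    "snp2": {"r2", "r3"},
    "snp3": {"r4", "r5"},
    "snp4": {"r5"},
}
chosen, covered = greedy_max_coverage(snp_regions, k=2)
```

Note how the second pick skips `snp2`, which overlaps heavily with the first pick, in favor of `snp3`, which covers new regions. This is the diversity effect that a submodular objective rewards.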


Subjects
Algorithms , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Arabidopsis/genetics , Genes, Plant/genetics , Genomics/methods , Sequence Analysis, DNA/methods
15.
Bioinformatics ; 36(21): 5237-5246, 2021 01 29.
Article in English | MEDLINE | ID: mdl-32730565

ABSTRACT

MOTIVATION: Accurate classification of patients into molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these subgroups to cancer. The availability of multiomics data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. RESULTS: We develop Pathway-based MultiOmic Graph Kernel clustering (PAMOGK) that integrates multiomics patient data with existing biological knowledge on pathways. We develop a novel graph kernel that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, we use multiview kernel clustering. Applying PAMOGK to kidney renal clear cell carcinoma (KIRC) patients results in four clusters with significantly different survival times (P-value =1.24e-11). When we compare PAMOGK to eight other state-of-the-art multiomics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition KIRC patients into groups with different survival distributions. The discovered patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. The pathways identified as important are highly relevant to KIRC. AVAILABILITY AND IMPLEMENTATION: github.com/tastanlab/pamogk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Neoplasms , Cluster Analysis , Humans , Neoplasms/genetics
16.
J Comput Biol ; 28(4): 378-380, 2021 04.
Article in English | MEDLINE | ID: mdl-33325775

ABSTRACT

Detecting interacting loci pairs has been instrumental to understand disease etiology when single locus associations do not fully account for the underlying heritability. However, the number of loci to test is prohibitively large. Epistasis test prioritization algorithms rank likely epistatic single nucleotide polymorphism (SNP) pairs to limit the number of statistical tests. Potpourri detects epistatic SNP pairs by diversifying the selected SNPs' genomic regions and investigating their co-occurrence patterns over the case cohort. It can also input and further prioritize SNPs in regulatory or coding regions. The program identifies and returns a list of prioritized SNP pairs for epistasis testing. This article describes how to use the program and the details of the input and output data.


Subjects
Epistasis, Genetic/genetics , Genome-Wide Association Study/statistics & numerical data , Polymorphism, Single Nucleotide/genetics , Software , Algorithms , Genome/genetics , Humans
17.
J Comput Biol ; 28(4): 365-377, 2021 04.
Article in English | MEDLINE | ID: mdl-33275856

ABSTRACT

Genome-wide association studies (GWAS) explain a fraction of the underlying heritability of genetic diseases. Investigating epistatic interactions between two or more loci helps to close this gap. Unfortunately, the sheer number of loci combinations to process and hypotheses to test makes this prohibitive both computationally and statistically. Epistasis test prioritization algorithms rank likely epistatic single nucleotide polymorphism (SNP) pairs to limit the number of tests. However, they still suffer from very low precision. It was shown in the literature that selecting SNPs that are individually correlated with the phenotype and also diverse with respect to genomic location leads to better phenotype prediction due to genetic complementation. Here, we propose that an algorithm that pairs SNPs from such diverse regions and ranks them can improve prediction power. We propose an epistasis test prioritization algorithm that optimizes a submodular set function to select a diverse and complementary set of genomic regions that span the underlying genome. The SNP pairs from these regions are then further ranked w.r.t. their co-coverage of the case cohort. We compare our algorithm with the state of the art on three GWAS and show that we (1) substantially improve precision (from 0.003 to 0.652) while maintaining the significance of selected pairs, (2) decrease the number of tests by 25-fold, and (3) decrease the runtime by 4-fold. We also show that promoting SNPs from regulatory/coding regions improves the performance (up to 0.8). Potpourri is available at http://ciceklab.cs.bilkent.edu.tr/potpourri.


Subjects
Epistasis, Genetic/genetics , Genome-Wide Association Study/statistics & numerical data , Polymorphism, Single Nucleotide/genetics , Software , Algorithms , Genomics/statistics & numerical data , Humans , Quantitative Trait Loci/genetics
18.
Hum Mutat ; 41(8): e7-e45, 2020 08.
Article in English | MEDLINE | ID: mdl-32579787

ABSTRACT

The last decade has proven that amyotrophic lateral sclerosis (ALS) is clinically and genetically heterogeneous, and that the genetic component in sporadic cases might be stronger than expected. This study investigates 1,200 patients to revisit ALS in the ethnically heterogeneous yet inbred Turkish population. Familial ALS (fALS) accounts for 20% of our cases. The rates of consanguinity are 30% in fALS and 23% in sporadic ALS (sALS). Major ALS genes explained the disease cause in only 35% of fALS, as compared with ~70% in Europe and North America. Whole exome sequencing resulted in a discovery rate of 42% (53/127). Whole genome analyses in 623 sALS cases and 142 population controls, sequenced within Project MinE, revealed well-established fALS gene variants, solidifying the concept of incomplete penetrance in ALS. Genome-wide association studies (GWAS) with whole genome sequencing data did not indicate a new risk locus. Coupling GWAS with a coexpression network of disease-associated candidates points to a significant enrichment for cell cycle- and division-related genes. Within this network, literature text-mining highlights DECR1, ATL1, HDAC2, GEMIN4, and HNRNPA3 as important genes. Finally, information on ALS-related gene variants in the Turkish cohort sequenced within Project MinE was compiled in the GeNDAL variant browser (www.gendal.org).


Subjects
Amyotrophic Lateral Sclerosis/genetics , Databases, Genetic , Genome-Wide Association Study , Genotype , Humans , Internet , Phenotype , Turkey , Whole Genome Sequencing
19.
Bioinformatics ; 36(12): 3652-3661, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32044914

ABSTRACT

MOTIVATION: Protein phosphorylation is a key regulator of protein function in signal transduction pathways. Kinases are the enzymes that catalyze the phosphorylation of other proteins in a target-specific manner. The dysregulation of phosphorylation is associated with many diseases including cancer. Although the advances in phosphoproteomics enable the identification of phosphosites at the proteome level, most of the phosphoproteome is still in the dark: more than 95% of the reported human phosphosites have no known kinases. Determining which kinase is responsible for phosphorylating a site remains an experimental challenge. Existing computational methods require several examples of known targets of a kinase to make accurate kinase-specific predictions, yet for a large body of kinases, only a few or no target sites are reported. RESULTS: We present DeepKinZero, the first zero-shot learning approach to predict the kinase acting on a phosphosite for kinases with no known phosphosite information. DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model. The kinase-specific positional amino acid preferences are learned using a bidirectional recurrent neural network. We show that DeepKinZero achieves significant improvement in accuracy for kinases with no known phosphosites in comparison to the baseline model and other methods available. By expanding our knowledge on understudied kinases, DeepKinZero can help to chart the phosphoproteome atlas. AVAILABILITY AND IMPLEMENTATION: The source codes are available at https://github.com/Tastanlab/DeepKinZero. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Phosphoproteins , Phosphotransferases , Humans , Phosphoproteins/metabolism , Phosphorylation , Proteome , Software
20.
BMC Genomics ; 19(1): 650, 2018 Sep 04.
Article in English | MEDLINE | ID: mdl-30180792

ABSTRACT

BACKGROUND: Long non-coding RNAs (lncRNAs) can indirectly regulate mRNA expression levels by sequestering microRNAs (miRNAs), and act as competing endogenous RNAs (ceRNAs) or as sponges. Previous studies identified lncRNA-mediated sponge interactions in various cancers including breast cancer. However, breast cancer subtypes are quite distinct in terms of their molecular profiles; therefore, ceRNAs are expected to be subtype-specific as well. RESULTS: To find lncRNA-mediated ceRNA interactions in breast cancer subtypes, we develop an integrative approach. We conduct partial correlation analysis and kernel independence tests on patient gene expression profiles and further refine the candidate interactions with miRNA target information. We find that although there are sponges common to multiple subtypes, there are also distinct subtype-specific interactions. Functional enrichment of mRNAs that participate in these interactions highlights distinct biological processes for different subtypes. Interestingly, some of the ceRNAs also reside in close proximity in the genome; for example, those involving HOX genes, HOTAIR, miR-196a-1 and miR-196a-2. We also discover subtype-specific sponge interactions with high prognostic potential. We found that patients differ significantly in their survival distributions if they are grouped based on the expression patterns of specific ceRNA interactions. However, this is not the case if the expression of individual RNAs participating in a ceRNA interaction is used. CONCLUSION: These results can help shed light on subtype-specific mechanisms of breast cancer, and the methodology developed herein can help uncover sponges in other diseases.
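
The partial correlation step named in the abstract can be illustrated with the standard first-order formula, which measures how strongly two variables co-vary after the linear effect of a third is removed. The sketch below is a generic illustration and not the authors' pipeline; the expression values are made up, with a hypothetical lncRNA and mRNA that both decrease as a shared miRNA increases.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up expression values across six samples.
mirna = [1, 2, 3, 4, 5, 6]
lncrna = [-m + e for m, e in zip(mirna, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
mrna = [-m + e for m, e in zip(mirna, [0.5, 0.5, -0.5, -0.5, -0.5, 0.5])]

plain = pearson(lncrna, mrna)
conditioned = partial_correlation(lncrna, mrna, mirna)
```

In this toy data the lncRNA and mRNA are strongly correlated, but the association is much weaker once the miRNA is conditioned on. A drop of this kind is the sort of signal that suggests the miRNA mediates the relationship between the two transcripts.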


Subjects
Breast Neoplasms/genetics , Carcinoma, Basal Cell/genetics , Gene Regulatory Networks , MicroRNAs/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , Receptor, ErbB-2/metabolism , Breast Neoplasms/classification , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Carcinoma, Basal Cell/classification , Carcinoma, Basal Cell/metabolism , Carcinoma, Basal Cell/pathology , Computational Biology , Female , Gene Expression Regulation, Neoplastic , Humans , Prognosis , Survival Rate