ABSTRACT
Copy number aberrations (CNAs) are known to strongly affect oncogenes and tumour suppressor genes. Given the critical role CNAs play in cancer research, it is essential to accurately identify CNAs from tumour genomes. One particular challenge in finding CNAs is the effect of confounding variables. To address this issue, we assessed how commonly used CNA identification algorithms perform on SNP 6.0 genotyping data in the presence of confounding variables. We simulated realistic synthetic data with varying levels of three confounding variables: the tumour purity, the length of a copy number region and the CNA burden (the percentage of CNAs present in a profiled genome). On these data we evaluated the performance of OncoSNP, ASCAT, GenoCNA, GISTIC and CGHcall. Furthermore, we implemented and assessed CGHcall*, an adjusted version of CGHcall accounting for high CNA burden. Our analysis on synthetic data indicates that tumour purity and the CNA burden strongly influence the performance of all the algorithms. No algorithm can correctly find lost and gained genomic regions across all tumour purities. The length of CNA regions influenced the performance of ASCAT, CGHcall and GISTIC, whereas OncoSNP, GenoCNA and CGHcall* showed little sensitivity to it. Overall, CGHcall* and OncoSNP showed reasonable performance, particularly in samples with high tumour purity. Our analysis on the HapMap data revealed a good overlap between CGHcall, CGHcall* and GenoCNA results and experimentally validated data. Our exploratory analysis on the TCGA HNSCC data revealed plausible results of CGHcall, CGHcall* and GISTIC in consensus HNSCC CNA regions. Code is available at https://github.com/adspit/PASCAL.
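To illustrate why tumour purity confounds CNA calling, the following minimal sketch computes the expected log2 ratio of a region in a mixed tumour/normal sample. It rests on a simple mixture assumption that is not taken from the abstract: the signal is proportional to the average copy number across cells, normal cells carry two copies, and ploidy normalisation is ignored. The function name and parameters are illustrative only.

```python
import math


def expected_log_ratio(tumour_cn, purity, normal_cn=2):
    """Expected log2 ratio of a region in a tumour/normal cell mixture.

    Assumption (not from the abstract): signal is proportional to the
    average copy number across cells; ploidy normalisation is ignored.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed_cn = purity * tumour_cn + (1.0 - purity) * normal_cn
    if mixed_cn <= 0:
        raise ValueError("mixture has no copies; log ratio is undefined")
    return math.log2(mixed_cn / normal_cn)


# A single-copy loss (tumour_cn = 1) moves towards 0 as purity drops,
# which makes it harder to separate from a copy-neutral region.
for purity in (1.0, 0.6, 0.2):
    print(purity, round(expected_log_ratio(1, purity), 3))
```

Under this assumption the signal of a loss or gain shrinks monotonically with decreasing purity, which is consistent with the reported purity dependence of all evaluated algorithms.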
ABSTRACT
Genome-wide association studies (GWAS) identify genetic variants associated with traits or diseases. GWAS never directly link variants to regulatory mechanisms. Instead, the functional annotation of variants is typically inferred by post hoc analyses. A specific class of deep learning-based methods allows for the prediction of regulatory effects per variant on several cell type-specific chromatin features. Here we describe "DeepWAS", a new approach that integrates these regulatory effect predictions of single variants into a multivariate GWAS setting. Thereby, single variants associated with a trait or disease are directly coupled to their impact on a chromatin feature in a cell type. Up to 61 regulatory SNPs, called dSNPs, were associated with multiple sclerosis (MS, 4,888 cases and 10,395 controls), major depressive disorder (MDD, 1,475 cases and 2,144 controls), and height (5,974 individuals). These variants were mainly non-coding and reached at least nominal significance in classical GWAS. The prediction accuracy was higher for DeepWAS than for classical GWAS models for 91% of the genome-wide significant, MS-specific dSNPs. dSNPs were enriched in public or cohort-matched expression and methylation quantitative trait loci, and we demonstrated the potential of DeepWAS to generate testable functional hypotheses based on genotype data alone. DeepWAS is available at https://github.com/cellmapslab/DeepWAS.
Subject(s)
Deep Learning , Genetic Association Studies , Multivariate Analysis , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide , Quantitative Trait Loci
ABSTRACT
Summary: Modelling biological associations or dependencies using linear regression is often complicated when the analyzed data-sets are high-dimensional and fewer observations than variables are available (n ≪ p). For genomic data-sets, penalized regression methods have been applied to settle this issue. Recently proposed regression models utilize prior knowledge on dependencies, e.g. in the form of graphs, arguing that this information will lead to more reliable estimates for regression coefficients. However, none of the proposed models for multivariate genomic response variables have been implemented as a computationally efficient, freely available library. In this paper we propose netReg, a package for graph-penalized regression models that use large networks and thousands of variables. netReg incorporates a priori generated biological graph information into linear models yielding sparse or smooth solutions for regression coefficients. Availability and implementation: netReg is implemented as both an R package and a C++ command-line tool. The main computations are done in C++, where we use Armadillo for fast matrix calculations and Dlib for optimization. The R package is freely available on Bioconductor at https://bioconductor.org/packages/netReg. The command-line tool can be installed using the conda channel Bioconda. Installation details, issue reports, development versions, documentation and tutorials for the R and C++ versions and the R package vignette can be found on GitHub at https://dirmeier.github.io/netReg/. The GitHub page also contains code for benchmarking and example datasets used in this paper. Contact: simon.dirmeier@bsse.ethz.ch.
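The idea of a graph penalty yielding smooth coefficients can be sketched in a few lines. The example below is a simplified stand-in, not the netReg implementation or its API: it assumes a single response, an unweighted graph over the predictors, only the smoothness (Laplacian) penalty without a sparsity term, and plain gradient descent. All function names and parameter values are hypothetical.

```python
def laplacian(n_features, edges):
    """Unweighted graph Laplacian L = D - A as a nested list."""
    lap = [[0.0] * n_features for _ in range(n_features)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def fit_graph_penalized(X, y, edges, lam=1.0, step=0.01, n_iter=5000):
    """Minimise (1/2n)*||y - X b||^2 + (lam/2) * b' L b by gradient descent.

    Assumption: `step` is small enough for the given data and `lam`;
    no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    lap = laplacian(p, edges)
    beta = [0.0] * p
    for _ in range(n_iter):
        residual = [sum(X[i][k] * beta[k] for k in range(p)) - y[i]
                    for i in range(n)]
        grad = []
        for k in range(p):
            data_term = sum(X[i][k] * residual[i] for i in range(n)) / n
            graph_term = lam * sum(lap[k][m] * beta[m] for m in range(p))
            grad.append(data_term + graph_term)
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta


# Toy data: the response depends on predictor 0 only, but predictors
# 0 and 1 are connected in the prior graph.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 0.0, 1.0, 2.0]
print(fit_graph_penalized(X, y, edges=[(0, 1)], lam=0.0))
print(fit_graph_penalized(X, y, edges=[(0, 1)], lam=10.0))
```

With `lam=0.0` the fit reduces to ordinary least squares; a large `lam` pulls the coefficients of connected predictors towards each other, which is the "smooth solution" behaviour described above.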
Subject(s)
Computational Biology/methods , Models, Biological , Regression Analysis , Software , Gene Expression Profiling , Protein Interaction Maps , Yeasts/metabolism
ABSTRACT
BACKGROUND: A standardized human model to study early pathogenic events in patients with psoriasis is missing. Activation of Toll-like receptor 7/8 by means of topical application of imiquimod is the most commonly used mouse model of psoriasis. OBJECTIVE: We sought to investigate the potential of a human imiquimod patch test model to resemble human psoriasis. METHODS: Imiquimod (Aldara 5% cream; 3M Pharmaceuticals, St Paul, Minn) was applied twice a week to the backs of volunteers (n = 18), and development of skin lesions was monitored over a period of 4 weeks. Consecutive biopsy specimens were taken for whole-genome expression analysis, histology, and T-cell isolation. Plasmacytoid dendritic cells (pDCs) were isolated from whole blood, stimulated with Toll-like receptor 7 agonist, and analyzed by means of extracellular flux analysis and real-time PCR. RESULTS: We demonstrate that imiquimod induces a monomorphic and self-limited inflammatory response in healthy subjects, as well as patients with psoriasis or eczema. The clinical and histologic phenotype, as well as the transcriptome, of imiquimod-induced inflammation in human skin resembles acute contact dermatitis rather than psoriasis. Nevertheless, the imiquimod model mimics the hallmarks of psoriasis. In contrast to classical contact dermatitis, in which myeloid dendritic cells sense haptens, pDCs are primary sensors of imiquimod. They respond with production of proinflammatory and TH17-skewing cytokines, resulting in a TH17 immune response with IL-23 as a key driver. In a proof-of-concept setting systemic treatment with ustekinumab diminished imiquimod-induced inflammation. CONCLUSION: In human subjects imiquimod induces contact dermatitis with the distinctive feature that pDCs are the primary sensors, leading to an IL-23/TH17 deviation. Despite these shortcomings, the human imiquimod model might be useful to investigate early pathogenic events and prove molecular concepts in patients with psoriasis.
Subject(s)
Dendritic Cells/metabolism , Dermatitis, Contact/metabolism , Imiquimod/adverse effects , Models, Biological , Psoriasis/metabolism , Th17 Cells/metabolism , Toll-Like Receptor 7/agonists , Administration, Cutaneous , Adult , Aged , Biomarkers/metabolism , Case-Control Studies , Dermatitis, Contact/pathology , Female , Flow Cytometry , Humans , Imiquimod/administration & dosage , Immunohistochemistry , Male , Middle Aged , Psoriasis/pathology , Real-Time Polymerase Chain Reaction , Toll-Like Receptor 8/agonists
ABSTRACT
OBJECTIVE: The initial steps of pancreatic regeneration versus carcinogenesis are insufficiently understood. Although a combination of oncogenic Kras and inflammation has been shown to induce malignancy, molecular networks of early carcinogenesis remain poorly defined. DESIGN: We compared early events during inflammation, regeneration and carcinogenesis on histological and transcriptional levels with a high temporal resolution using a well-established mouse model of pancreatitis and of inflammation-accelerated KrasG12D-driven pancreatic ductal adenocarcinoma. Quantitative expression data were analysed and extensively modelled in silico. RESULTS: We defined three distinctive phases, termed inflammation, regeneration and refinement, following induction of moderate acute pancreatitis in wild-type mice. These corresponded to different waves of proliferation of mesenchymal, progenitor-like and acinar cells. Pancreas regeneration required a coordinated transition of proliferation between progenitor-like and acinar cells. In mice harbouring an oncogenic Kras mutation and challenged with pancreatitis, there was an extended inflammatory phase and a parallel, continuous proliferation of mesenchymal, progenitor-like and acinar cells. Analysis of high-resolution transcriptional data from wild-type animals revealed that organ regeneration relied on a complex interaction of a gene network that normally governs acinar cell homeostasis, exocrine specification and intercellular signalling. In mice with oncogenic Kras, a specific carcinogenic signature was found, which was preserved in full-blown mouse pancreas cancer. CONCLUSIONS: These data define a transcriptional signature of early pancreatic carcinogenesis and a molecular network driving formation of preneoplastic lesions, which allows for more targeted biomarker development in order to detect cancer earlier in patients with pancreatitis.
Subject(s)
Carcinogenesis/genetics , Carcinoma, Pancreatic Ductal/genetics , Pancreatic Neoplasms/genetics , Acinar Cells/pathology , Acute Disease , Animals , Carcinogenesis/pathology , Carcinoma, Pancreatic Ductal/pathology , Cell Proliferation/genetics , Disease Models, Animal , Disease Progression , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Mesenchymal Stem Cells/pathology , Mice, Transgenic , Pancreas/physiology , Pancreatic Neoplasms/pathology , Pancreatitis/genetics , Pancreatitis/pathology , Precancerous Conditions/genetics , Precancerous Conditions/pathology , Proto-Oncogene Proteins p21(ras)/genetics , Regeneration/genetics
ABSTRACT
In many human diseases, associated genetic changes tend to occur within noncoding regions, whose effect might be related to transcriptional control. A central goal in human genetics is to understand the function of such noncoding regions: given a region that is statistically associated with changes in gene expression (expression quantitative trait locus [eQTL]), does it in fact play a regulatory role? And if so, how is this role "coded" in its sequence? These questions were the subject of the Critical Assessment of Genome Interpretation eQTL challenge. Participants were given a set of sequences that flank eQTLs in humans and were asked to predict whether these are capable of regulating transcription (as evaluated by massively parallel reporter assays), and whether this capability changes between alternative alleles. Here, we report lessons learned from this community effort. By inspecting predictive properties in isolation, and conducting meta-analysis over the competing methods, we find that using chromatin accessibility and transcription factor binding as features in an ensemble of classifiers or regression models leads to the most accurate results. We then characterize the loci that are harder to predict, putting the spotlight on areas of weakness, which we expect to be the subject of future studies.
Subject(s)
Computational Biology/methods , Gene Expression , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Quantitative Trait Loci
ABSTRACT
SUMMARY: Decreasing costs of modern high-throughput experiments allow for the simultaneous analysis of altered gene activity on various molecular levels. However, these multi-omics approaches lead to a large amount of data, which is hard to interpret for a non-bioinformatician. Here, we present the remotely accessible multilevel ontology analysis (RAMONA). It offers an easy-to-use interface for the simultaneous gene set analysis of combined omics datasets and is an extension of the previously introduced MONA approach. RAMONA is based on a Bayesian enrichment method for the inference of overrepresented biological processes among given gene sets. Overrepresentation is quantified by interpretable term probabilities. It is able to handle data from various molecular levels, while in parallel coping with redundancies arising from gene set overlaps and related multiple testing problems. The comprehensive output of RAMONA is easy to interpret and thus allows for functional insight into the affected biological processes. With RAMONA, we provide an efficient implementation of the Bayesian inference problem such that ontologies consisting of thousands of terms can be processed in the order of seconds. AVAILABILITY AND IMPLEMENTATION: RAMONA is implemented as an ASP.NET web application and is publicly available at http://icb.helmholtz-muenchen.de/ramona.
Subject(s)
Genes/genetics , MicroRNAs/genetics , Proteins/metabolism , Software , Bayes Theorem , DNA Methylation , Gene Expression Profiling , Humans
ABSTRACT
MOTIVATION: High-dimensional single-cell snapshot data are becoming widespread in the systems biology community, as a means to understand biological processes at the cellular level. However, as temporal information is lost with such data, mathematical models have been limited to capturing only static features of the underlying cellular mechanisms. RESULTS: Here, we present a modular framework that recovers the temporal behaviour from single-cell snapshot data and reverse engineers the dynamics of gene expression. The framework combines a dimensionality reduction method with a cell time-ordering algorithm to generate pseudo time-series observations. These are in turn used to learn transcriptional ODE models and to perform model selection on structural network features. We apply it to synthetic data and then to real hematopoietic stem cell data, to reconstruct gene expression dynamics during differentiation pathways and infer the structure of a key gene regulatory network. AVAILABILITY AND IMPLEMENTATION: C++ and Matlab code available at https://www.helmholtz-muenchen.de/fileadmin/ICB/software/inferenceSnapshot.zip.
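The combination of dimensionality reduction and cell ordering can be illustrated with a deliberately simple stand-in. The abstract does not name the specific methods, so the sketch below assumes the crudest possible choice: project cells onto the first principal axis (found by power iteration) and sort them by that score. Function names are hypothetical, and the direction of the resulting ordering is arbitrary because the sign of a principal axis is not defined.

```python
import math


def first_principal_axis(rows, n_iter=200):
    """Return centred data and the leading principal axis (power iteration)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Deterministic, non-uniform start vector (an arbitrary choice).
    axis = [1.0 / (j + 1) for j in range(p)]
    for _ in range(n_iter):
        scores = [sum(c[j] * axis[j] for j in range(p)) for c in centred]
        new_axis = [sum(centred[i][j] * scores[i] for i in range(n))
                    for j in range(p)]
        norm = math.sqrt(sum(v * v for v in new_axis))
        if norm == 0.0:
            break  # start vector is orthogonal to the data; keep old axis
        axis = [v / norm for v in new_axis]
    return centred, axis


def pseudotime_order(rows):
    """Order cells (rows) by their score on the first principal axis."""
    centred, axis = first_principal_axis(rows)
    scores = [sum(c[j] * axis[j] for j in range(len(axis))) for c in centred]
    return sorted(range(len(rows)), key=lambda i: scores[i])


# Five cells, two genes: gene 0 rises while gene 1 falls along the process.
cells = [[3.0, 1.0], [0.0, 4.0], [2.0, 2.0], [1.0, 3.0], [4.0, 0.0]]
print(pseudotime_order(cells))
```

The ordered cells would then serve as pseudo time-series observations for fitting a dynamic model; that fitting step is not sketched here.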
Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Algorithms , Hematopoiesis/genetics , Hematopoietic Stem Cells/metabolism , Kinetics , Models, Genetic , Single-Cell Analysis , Systems Biology/methods
ABSTRACT
Novel specific therapies for psoriasis and eczema have been developed, and they mark a new era in the treatment of these complex inflammatory skin diseases. However, within their broad clinical spectrum, psoriasis and eczema phenotypes overlap, making an accurate diagnosis impossible in special cases, let alone predicting the clinical outcome of an individual patient. Here, we present a novel robust molecular classifier (MC) consisting of the NOS2 and CCL27 genes that diagnosed psoriasis and eczema with a sensitivity and specificity of >95% in a cohort of 129 patients suffering from (i) classical forms; (ii) subtypes; and (iii) clinically and histologically indistinct variants of psoriasis and eczema. NOS2 and CCL27 correlated with clinical and histological hallmarks of psoriasis and eczema in a mutually antagonistic way, thus highlighting their biological relevance. In line with this, the MC could be transferred to the level of immunofluorescence stainings for iNOS and CCL27 protein on paraffin-embedded sections, where patients were diagnosed with sensitivity and specificity >88%. Our MC proved superior to current gold standard methods for distinguishing psoriasis and eczema and may therefore form the basis for molecular diagnosis of chronic inflammatory skin diseases required to establish personalized medicine in the field.
Subject(s)
Chemokine CCL27/metabolism , Eczema/diagnosis , Nitric Oxide Synthase Type II/metabolism , Psoriasis/diagnosis , Adult , Aged , Cohort Studies , Eczema/classification , Eczema/metabolism , Female , Fluorescent Antibody Technique , Humans , Male , Middle Aged , Psoriasis/classification , Psoriasis/metabolism
ABSTRACT
Modern high-throughput methods allow the investigation of biological functions across multiple 'omics' levels. Levels include mRNA and protein expression profiling as well as additional knowledge on, for example, DNA methylation and microRNA regulation. The reason for this interest in multi-omics is that actual cellular responses to different conditions are best explained mechanistically when taking all omics levels into account. To map gene products to their biological functions, public ontologies like Gene Ontology are commonly used. Many methods have been developed to identify ontology terms that are overrepresented within a set of genes. However, these methods are not able to appropriately deal with any combination of several data types. Here, we propose a new method to analyse integrated data across multiple omics levels to simultaneously assess their biological meaning. We developed a model-based Bayesian method for inferring interpretable term probabilities in a modular framework. Our Multi-level ONtology Analysis (MONA) algorithm performed significantly better than conventional analyses of individual levels and yielded the best results even for sophisticated models including mRNA fine-tuning by microRNAs. The MONA framework is flexible enough to allow for different underlying regulatory motifs or ontologies. It is ready-to-use for applied researchers and is available as a standalone application from http://icb.helmholtz-muenchen.de/mona.
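For orientation, the conventional single-level analysis that such methods are compared against is often a one-sided hypergeometric (Fisher-type) over-representation test per ontology term. The sketch below shows that baseline only; it is not the MONA model, and the function name and arguments are illustrative.

```python
from math import comb


def enrichment_p_value(n_universe, n_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when drawing n_selected genes at random
    from n_universe genes, of which n_term are annotated to the term."""
    if not (0 <= n_term <= n_universe and 0 <= n_selected <= n_universe):
        raise ValueError("set sizes must lie within the gene universe")
    upper = min(n_term, n_selected)
    total = comb(n_universe, n_selected)
    # math.comb returns 0 when k > n, so impossible overlaps contribute 0.
    tail = sum(comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


# 40 of 1000 genes carry the term; 12 of 100 selected genes carry it.
print(enrichment_p_value(1000, 40, 100, 12))
```

Because each term and each omics level is tested separately, this baseline ignores overlaps between gene sets, which is the limitation the Bayesian multi-level model is meant to address.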
Subject(s)
Genes , Models, Genetic , Algorithms , Bayes Theorem , DNA Methylation , Gene Expression Profiling , MicroRNAs/metabolism , Proteins/metabolism , RNA, Messenger/metabolism
ABSTRACT
MicroRNAs represent ~22 nt long endogenous small RNA molecules that have been experimentally shown to regulate gene expression post-transcriptionally. One main interest in miRNA research is the investigation of their functional roles, which can typically be accomplished by identification of mi-/mRNA interactions and functional annotation of target gene sets. Here we present a novel method, "miRlastic", which infers miRNA-target interactions using transcriptomic data as well as prior knowledge and performs functional annotation of target genes by exploiting the local structure of the inferred network. For the network inference, we applied linear regression modeling with elastic net regularization on matched microRNA and messenger RNA expression profiling data to perform feature selection on prior knowledge from sequence-based target prediction resources. The novelty of miRlastic inference originates in predicting data-driven intra-transcriptome regulatory relationships through feature selection. With synthetic data, we showed that miRlastic outperformed commonly used methods and was suitable even for low sample sizes. To gain insight into the functional role of miRNAs and to determine joint functional properties of miRNA clusters, we introduced a local enrichment analysis procedure. The principle of this procedure lies in identifying regions of high functional similarity by evaluating the shortest paths between genes in the network. We can finally assign functional roles to the miRNAs by taking their regulatory relationships into account. We thoroughly evaluated miRlastic on a cohort of head and neck cancer (HNSCC) patients provided by The Cancer Genome Atlas. We inferred an mi-/mRNA regulatory network for human papilloma virus (HPV)-associated miRNAs in HNSCC. Compared to common methods, the resulting network showed the strongest enrichment for experimentally validated miRNA-target interactions. 
Finally, the local enrichment step identified two functional clusters of miRNAs that were predicted to mediate HPV-associated dysregulation in HNSCC. Our novel approach was able to characterize distinct pathway regulations from matched miRNA and mRNA data. An R package of miRlastic was made available through: http://icb.helmholtz-muenchen.de/mirlastic.
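The inference step described above, elastic net regression restricted to candidate regulators from prior target predictions, can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the miRlastic package: expression vectors are assumed to be centred (no intercept is fitted), the penalty follows the common form lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2), and all names, including the miRNA identifiers, are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - X b||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            col_sq = sum(v * v for v in col) / n
            if col_sq == 0.0:
                continue  # constant-zero predictor carries no information
            rho = sum(col[i] * (residual[i] + col[i] * beta[j])
                      for i in range(n)) / n
            new_b = soft_threshold(rho, lam * alpha) / (col_sq + lam * (1 - alpha))
            delta = new_b - beta[j]
            if delta != 0.0:
                residual = [residual[i] - col[i] * delta for i in range(n)]
                beta[j] = new_b
    return beta


def infer_regulators(mirna_expr, mrna_expr, prior_targets, **kwargs):
    """Fit one mRNA on the miRNAs that the prior predicts to target it.

    mirna_expr: dict miRNA name -> per-sample expression (centred)
    mrna_expr: per-sample expression of one mRNA (centred)
    prior_targets: set of miRNA names from sequence-based predictions
    Returns only miRNAs with a non-zero coefficient.
    """
    candidates = sorted(name for name in prior_targets if name in mirna_expr)
    if not candidates:
        return {}
    n = len(mrna_expr)
    X = [[mirna_expr[name][i] for name in candidates] for i in range(n)]
    beta = elastic_net(X, mrna_expr, **kwargs)
    return {name: b for name, b in zip(candidates, beta) if b != 0.0}


mirnas = {
    "miR-a": [-2.0, -1.0, 0.0, 1.0, 2.0],   # anti-correlated with the mRNA
    "miR-b": [1.0, -1.0, 0.0, -1.0, 1.0],   # unrelated to the mRNA
    "miR-c": [2.0, 1.0, 0.0, -1.0, -2.0],   # correlated but not in the prior
}
mrna = [2.0, 1.0, 0.0, -1.0, -2.0]
print(infer_regulators(mirnas, mrna, prior_targets={"miR-a", "miR-b"}))
```

Restricting the design matrix to prior-predicted regulators is what turns the regression into feature selection on prior knowledge: a miRNA outside the prior is never considered, however well it correlates with the target.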
Subject(s)
Carcinoma, Squamous Cell/genetics , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Head and Neck Neoplasms/genetics , MicroRNAs/metabolism , Cluster Analysis , Humans , MicroRNAs/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sample Size , Squamous Cell Carcinoma of Head and Neck
ABSTRACT
Biological membranes encompass and compartmentalize cells and organelles and are a prerequisite to life as we know it. One defining feature of membranes is an astonishing diversity of building blocks. The mechanisms and principles organizing the thousands of proteins and lipids that make up membrane bilayers in cells are still under debate. Many terms and mechanisms have been introduced over the years to account for certain phenomena and aspects of membrane organization and function. Recently, the different viewpoints - focusing on lipids vs. proteins or physical vs. molecular driving forces for membrane organization - are increasingly converging. Here we review the basic properties of biological membranes and the most common theories for lateral segregation of membrane components before discussing an emerging model of a self-organized, multi-domain membrane or 'patchwork membrane'.
Subject(s)
Cell Membrane/metabolism , Membrane Lipids/metabolism , Membrane Proteins/metabolism , Cell Membrane/chemistry , Membrane Lipids/chemistry , Membrane Proteins/chemistry , Models, Biological
ABSTRACT
RNA-binding proteins (RBPs) are critical host factors for viral infection; however, large-scale experimental investigation of the binding landscape of human RBPs to viral RNAs is costly and further complicated by sequence variation between viral strains. To fill this gap, we investigated the role of RBPs in the context of SARS-CoV-2 by constructing the first in silico map of human RBP-viral RNA interactions at nucleotide resolution using two deep learning methods (pysster and DeepRiPe) trained on data from CLIP-seq experiments on more than 100 human RBPs. We evaluated conservation of RBP binding between six other human pathogenic coronaviruses and identified sites of conserved and differential binding in the UTRs of SARS-CoV-1, SARS-CoV-2 and MERS. We scored the impact of mutations from 11 variants of concern on protein-RNA interaction, identifying a set of gain- and loss-of-binding events, and predicted the regulatory impact of putative future mutations. Lastly, we linked RBPs to functional, OMICs and COVID-19 patient data from other studies, and identified MBNL1, FTO and FXR2 RBPs as potential clinical biomarkers. Our results contribute towards a deeper understanding of how viruses hijack host cellular pathways and open new avenues for therapeutic intervention.
ABSTRACT
The human liver has a remarkable capacity to regenerate and thus compensate over decades for fibrosis caused by toxic chemicals, drugs, alcohol, or malnutrition. To date, no protective mechanisms have been identified that help the liver tolerate these repeated injuries. In this study, we revealed dysregulation of lipid metabolism and mild inflammation as protective mechanisms by studying longitudinal multi-omic measurements of liver fibrosis induced by repeated CCl4 injections in mice (n = 45). Based on comprehensive proteomics, transcriptomics, blood- and tissue-level profiling, we uncovered three phases of early disease development: initiation, progression, and tolerance. Using novel multi-omic network analysis, we identified multi-level mechanisms that are significantly dysregulated in the injury-tolerant response. Public data analysis shows that these profiles are altered in human liver diseases, including fibrosis and early cirrhosis stages. Our findings mark the beginning of the tolerance phase as the critical switching point in liver response to repetitive toxic doses. Following extracellular matrix accumulation as an acute response, we observe a deposition of tiny lipid droplets in hepatocytes only in the tolerance phase. Our comprehensive study shows that lipid metabolism and mild inflammation may serve as biomarkers and are putative functional requirements to resist further disease progression.
Subject(s)
Fatty Liver , Reinjuries , Humans , Animals , Mice , Inflammation , Liver Cirrhosis/chemically induced
Subject(s)
Blood Proteins/metabolism , Dermatitis, Atopic/blood , Adolescent , Adult , Biomarkers/blood , Blood Proteins/immunology , Cytokines/blood , Dermatitis, Atopic/immunology , Female , Humans , Least-Squares Analysis , Linear Models , Male , Middle Aged , Models, Biological , Models, Statistical , Severity of Illness Index , Young Adult
ABSTRACT
COVID-19 is a heterogeneous disease caused by SARS-CoV-2. Aside from infections of the lungs, the disease can spread throughout the body and damage many other tissues, leading to multiorgan failure in severe cases. The highly variable symptom severity is influenced by genetic predispositions and preexisting diseases which have not been investigated in a large-scale multimodal manner. We present a holistic analysis framework, setting previously reported COVID-19 genes in context with prepandemic data, such as gene expression patterns across multiple tissues, polygenic predispositions, and patient diseases, which are putative comorbidities of COVID-19. First, we generate a multimodal network using the prior-based network inference method KiMONo. We then embed the network to generate a meaningful lower-dimensional representation of the data. The input data are obtained via the Genotype-Tissue Expression project (GTEx), containing expression data from a range of tissues with genomic and phenotypic information of over 900 patients and 50 tissues. The generated network consists of nodes, that is, genes and polygenic risk scores (PRS) for several diseases/phenotypes, as well as for COVID-19 severity and hospitalization, and links between them if they are statistically associated in a regularized linear model by feature selection. Applying network embedding on the generated multimodal network allows us to perform efficient network analysis by identifying nodes close by in a lower-dimensional space that correspond to entities which are statistically linked. By determining the similarity between COVID-19 genes and other nodes through embedding, we identify disease associations to tissues, like the brain and gut. We also find strong associations between COVID-19 genes and various diseases such as ischemic heart disease, cerebrovascular disease, and hypertension. 
Moreover, we find evidence linking PTPN6 to a range of comorbidities along with the genetic predisposition of COVID-19, suggesting that this phosphatase is a central player in severe cases of COVID-19. In conclusion, our holistic network inference coupled with network embedding of multimodal data enables the contextualization of COVID-19-associated genes with respect to tissues, disease states, and genetic risk factors. Such contextualization can be exploited to further elucidate the biological importance of known and novel genes for severity of the disease in patients.
ABSTRACT
Breakdown of synthesis, excretion and detoxification defines liver failure. Post-hepatectomy liver failure (PHLF) is specific for liver resection and a rightfully feared complication due to high lethality and limited therapeutic success. Individual cytokine and growth factor profiles may represent potent predictive markers for recovery of liver function. We aimed to investigate these profiles in post-hepatectomy regeneration. This study combined a time-dependent cytokine and growth factor profiling dataset of a training cohort (30 patients) and a validation cohort (14 patients) undergoing major liver resection with statistical and predictive models identifying individual pathway signatures. 2319 associations were tested. Primary hepatocytes isolated from patient tissue samples were stimulated and their proliferation was analysed through DNA content assay. Common expression trajectories of cytokines and growth factors with strong correlation to PHLF, morbidity and mortality were identified despite highly individual perioperative dynamics. In particular, the dynamics of EGF, HGF, and PLGF were associated with mortality. PLGF was additionally associated with PHLF and complications. A global association network was calculated and validated to investigate interdependence of cytokines and growth factors with clinical attributes. Preoperative cytokine and growth factor signatures were identified allowing prediction of mortality following major liver resection by regression modelling. Proliferation analysis of corresponding primary human hepatocytes showed associations of individual regenerative potential with clinical outcome. Prediction of PHLF was possible as early as the first postoperative day (POD1) with AUC above 0.75. Prediction of PHLF and mortality is possible on POD1 with liquid-biopsy based risk profiling. Further utilization of these models would allow tailoring of interventional strategies according to individual profiles.
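As a reminder of what an AUC threshold such as 0.75 measures, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case (ties count one half). The labels and scores are made-up illustration values, not study data, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative cases.

    labels: 1 for the event (e.g. a complication), 0 otherwise
    scores: predicted risk, higher means more likely to be positive
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: three of the four positive/negative pairs are
# ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.75 means that at least three out of four such pairs are ordered correctly on average.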
Subject(s)
Liver Failure , Liver Neoplasms , Cytokines , Hepatectomy/adverse effects , Humans , Liver Failure/etiology , Liver Function Tests , Liver Neoplasms/surgery , Liver Regeneration , Postoperative Complications , Retrospective Studies
ABSTRACT
Biological research and clinical management in psychiatry face two major impediments: the high degree of overlap in psychopathology between diagnoses and the inherent heterogeneity with regard to severity. Here, we aim to stratify cases into homogeneous transdiagnostic subgroups using psychometric information with the ultimate aim of identifying individuals with higher risk for severe illness. 397 participants of the PsyCourse study with schizophrenia- or bipolar-spectrum diagnoses were prospectively phenotyped over 18 months. Factor analysis of mixed data of different rating scales and subsequent longitudinal clustering were used to cluster disease trajectories. Five clusters of longitudinal trajectories were identified in the psychopathologic dimensions. Clusters differed significantly with regard to Global Assessment of Functioning, disease course and, in some cases, diagnosis, whereas there were no significant differences regarding sex, age at baseline or onset, duration of illness, or polygenic burden for schizophrenia. Longitudinal clustering may aid in identifying transdiagnostic homogeneous subgroups of individuals with severe psychiatric disease.
Subject(s)
Bipolar Disorder , Mental Disorders , Bipolar Disorder/diagnosis , Bipolar Disorder/epidemiology , Bipolar Disorder/psychology , Cluster Analysis , Hospitals , Humans , Mental Disorders/diagnosis , Mental Disorders/epidemiology , Psychopathology
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a negative binomial noise model with or without zero-inflation, and nonlinear gene-gene dependencies are captured. Our method scales linearly with the number of cells and can, therefore, be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
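The noise model named above can be written down compactly. The following sketch evaluates the log-probability of a count under a negative binomial with mean mu and dispersion theta, optionally mixed with a point mass at zero of weight pi (zero inflation). It shows only the per-count likelihood term in the standard mean/dispersion parameterisation; the autoencoder that would predict mu, theta and pi per gene and cell is not sketched, and the function names are illustrative rather than taken from the DCA code.

```python
import math


def nb_log_pmf(x, mu, theta):
    """log P(X = x) for a negative binomial with mean mu > 0 and
    dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_log_pmf(x, mu, theta, pi):
    """log P(X = x) for a zero-inflated negative binomial.

    pi in [0, 1) is the probability of an extra (dropout) zero.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def zinb_negative_log_likelihood(counts, mu, theta, pi):
    """Loss to minimise: summed negative log-likelihood of the counts."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)


counts = [0, 0, 3, 1, 0, 5]
print(zinb_negative_log_likelihood(counts, mu=2.0, theta=1.5, pi=0.3))
```

Setting `pi=0.0` recovers the plain negative binomial, which corresponds to the "with or without zero-inflation" choice mentioned in the abstract.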