Results 1 - 20 of 146
1.
Article in English | MEDLINE | ID: mdl-38594933

ABSTRACT

Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the development of sequence-based deep learning models that link patterns embedded in DNA to the biochemical and regulatory properties contributing to transcriptional regulation, including modeling epigenetic marks, 3D genome organization, and gene expression, with tissue and cell-type specificity. Such methods can predict the functional consequences of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterize their consequences beyond what is tractable from experiments or quantitative genetics studies alone. Recently, the development and application of interpretability approaches have led to the identification of key sequence patterns contributing to the predicted tasks, providing insights into the underlying biological mechanisms learned and revealing opportunities for improvement in future models.

2.
Nat Methods ; 21(3): 488-500, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38361019

ABSTRACT

Protein-protein interactions (PPIs) drive cellular processes and responses to environmental cues, reflecting the cellular state. Here we develop Tapioca, an ensemble machine learning framework for studying global PPIs in dynamic contexts. Tapioca predicts de novo interactions by integrating mass spectrometry interactome data from thermal/ion denaturation or cofractionation workflows with protein properties and tissue-specific functional networks. Focusing on the thermal proximity coaggregation method, we improved the experimental workflow. Finely tuned thermal denaturation afforded increased throughput, while cell lysis optimization enhanced protein detection from different subcellular compartments. The Tapioca workflow was next leveraged to investigate viral infection dynamics. Temporal PPIs were characterized during the reactivation from latency of the oncogenic Kaposi's sarcoma-associated herpesvirus. Together with functional assays, NUCKS was identified as a proviral hub protein, and a broader role was uncovered by integrating PPI networks from alpha- and betaherpesvirus infections. Altogether, Tapioca provides a web-accessible platform for predicting PPIs in dynamic contexts.


Subject(s)
Herpesvirus 8, Human , Manihot , Sarcoma, Kaposi , Sarcoma, Kaposi/metabolism , Viral Proteins/metabolism , Manihot/metabolism , Virus Latency , Herpesvirus 8, Human/metabolism
3.
Nucleic Acids Res ; 52(2): 572-582, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38084892

ABSTRACT

Single same cell RNAseq/ATACseq multiome data provide unparalleled potential to develop high resolution maps of the cell-type specific transcriptional regulatory circuitry underlying gene expression. We present CREMA, a framework that recovers the full cis-regulatory circuitry by modeling gene expression and chromatin activity in individual cells without peak-calling or cell type labeling constraints. We demonstrate that CREMA overcomes the limitations of existing methods that fail to identify about half of functional regulatory elements which are outside the called chromatin 'peaks'. These circuit sites outside called peaks are shown to be important cell type specific functional regulatory loci, sufficient to distinguish individual cell types. Analysis of mouse pituitary data identifies a Gata2-circuit for the gonadotrope-enriched disease-associated Pcsk1 gene, which is experimentally validated by reduced gonadotrope expression in a gonadotrope conditional Gata2-knockout model. We present a web accessible human immune cell regulatory circuit resource, and provide CREMA as an R package.


Subject(s)
Gonadotrophs , Pituitary Gland , Mice , Humans , Animals , Pituitary Gland/metabolism , Gonadotrophs/metabolism , Chromatin/genetics , Chromatin/metabolism , Regulatory Sequences, Nucleic Acid
4.
bioRxiv ; 2023 Nov 04.
Article in English | MEDLINE | ID: mdl-37961197

ABSTRACT

To facilitate single cell multi-omics analysis and improve reproducibility, we present SPEEDI (Single-cell Pipeline for End to End Data Integration), a fully automated end-to-end framework for batch inference, data integration, and cell type labeling. SPEEDI introduces data-driven batch inference and transforms the often heterogeneous data matrices obtained from different samples into a uniformly annotated and integrated dataset. Without requiring user input, it automatically selects parameters and executes pre-processing, sample integration, and cell type mapping. It can also perform downstream analyses of differential signals between treatment conditions and gene functional modules. SPEEDI's data-driven batch inference method works with widely used integration and cell-typing tools. By developing data-driven batch inference, providing full end-to-end automation, and eliminating parameter selection, SPEEDI improves reproducibility and lowers the barrier to obtaining biological insight from these valuable single-cell datasets. The SPEEDI interactive web application can be accessed at https://speedi.princeton.edu/.

5.
Nat Comput Sci ; 3(7): 644-657, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37974651

ABSTRACT

Resolving chromatin-remodeling-linked gene expression changes at cell-type resolution is important for understanding disease states. Here we describe MAGICAL (Multiome Accessibility Gene Integration Calling and Looping), a hierarchical Bayesian approach that leverages paired single-cell RNA sequencing and single-cell transposase-accessible chromatin sequencing from different conditions to map disease-associated transcription factors, chromatin sites, and genes as regulatory circuits. By simultaneously modeling signal variation across cells and conditions in both omics data types, MAGICAL achieved high accuracy on circuit inference. We applied MAGICAL to study Staphylococcus aureus sepsis from peripheral blood mononuclear single-cell data that we generated from subjects with bloodstream infection and uninfected controls. MAGICAL identified sepsis-associated regulatory circuits predominantly in CD14 monocytes, known to be activated by bacterial sepsis. We addressed the challenging problem of distinguishing host regulatory circuit responses to methicillin-resistant and methicillin-susceptible S. aureus infections. Although differential expression analysis failed to show predictive value, MAGICAL identified epigenetic circuit biomarkers that distinguished methicillin-resistant from methicillin-susceptible S. aureus infections.

6.
bioRxiv ; 2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37808658

ABSTRACT

Endurance exercise is an important health modifier. We studied cell-type specific adaptations of human skeletal muscle to acute endurance exercise using single-nucleus (sn) multiome sequencing in human vastus lateralis samples collected before and 3.5 hours after 40 min exercise at 70% VO2max in four subjects, as well as in matched time of day samples from two supine resting circadian controls. High quality same-cell RNA-seq and ATAC-seq data were obtained from 37,154 nuclei comprising 14 cell types. Among muscle fiber types, both shared and fiber-type specific regulatory programs were identified. Single-cell circuit analysis identified distinct adaptations in fast, slow and intermediate fibers as well as LUM-expressing FAP cells, involving a total of 328 transcription factors (TFs) acting at altered accessibility sites regulating 2,025 genes. These data and circuit mapping provide single-cell insight into the processes underlying tissue and metabolic remodeling responses to exercise.

7.
Cell Rep Methods ; 3(9): 100580, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37703883

ABSTRACT

Human biology is rooted in highly specialized cell types programmed by a common genome, 98% of which is outside of genes. Genetic variation in the enormous noncoding space is linked to the majority of disease risk. To address the problem of linking these variants to expression changes in primary human cells, we introduce ExPectoSC, an atlas of modular deep-learning-based models for predicting cell-type-specific gene expression directly from sequence. We provide models for 105 primary human cell types covering 7 organ systems, demonstrate their accuracy, and then apply them to prioritize relevant cell types for complex human diseases. The resulting atlas of sequence-based gene expression and variant effects is publicly available in a user-friendly interface and readily extensible to any primary cell types. We demonstrate the accuracy of our approach through systematic evaluations and apply the models to prioritize ClinVar clinical variants of uncertain significance, verifying our top predictions experimentally.


Subject(s)
Ascomycota , Humans , Gene Expression/genetics
8.
Cell Rep Methods ; 3(2): 100395, 2023 02 27.
Article in English | MEDLINE | ID: mdl-36936082

ABSTRACT

Assays detecting blood transcriptome changes are studied for infectious disease diagnosis. Blood-based RNA alternative splicing (AS) events, which have not been well characterized in pathogen infection, have potential normalization and assay platform stability advantages over gene expression for diagnosis. Here, we present a computational framework for developing AS diagnostic biomarkers. Leveraging a large prospective cohort of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection and whole-blood RNA sequencing (RNA-seq) data, we identify a major functional AS program switch upon viral infection. Using an independent cohort, we demonstrate the improved accuracy of AS biomarkers for SARS-CoV-2 diagnosis compared with six reported transcriptome signatures. We then optimize a subset of AS-based biomarkers to develop microfluidic PCR diagnostic assays. This assay achieves nearly perfect test accuracy (61/62 = 98.4%) using a naive principal component classifier, significantly more accurate than a gene expression PCR assay in the same cohort. Therefore, our RNA splicing computational framework enables a promising avenue for host-response diagnosis of infection.


Subject(s)
COVID-19 , Communicable Diseases , Humans , SARS-CoV-2/genetics , COVID-19/diagnosis , Alternative Splicing/genetics , COVID-19 Testing , RNA , Prospective Studies , Biomarkers/analysis
9.
Mol Syst Biol ; 19(5): e11361, 2023 05 09.
Article in English | MEDLINE | ID: mdl-36919946

ABSTRACT

DNA methylation comprises a cumulative record of lifetime exposures superimposed on genetically determined markers. Little is known about methylation dynamics in humans following an acute perturbation, such as infection. We characterized the temporal trajectory of blood epigenetic remodeling in 133 participants in a prospective study of young adults before, during, and after asymptomatic and mildly symptomatic SARS-CoV-2 infection. The differential methylation caused by asymptomatic or mildly symptomatic infections was indistinguishable. While differential gene expression largely returned to baseline levels after the virus became undetectable, some differentially methylated sites persisted for months of follow-up, with a pattern resembling autoimmune or inflammatory disease. We leveraged these responses to construct methylation-based machine learning models that distinguished samples from pre-, during-, and postinfection time periods, and quantitatively predicted the time since infection. The clinical trajectory in the young adults and in a diverse cohort with more severe outcomes was predicted by the similarity of methylation before or early after SARS-CoV-2 infection to the model-defined postinfection state. Unlike the phenomenon of trained immunity, the postacute SARS-CoV-2 epigenetic landscape we identify is antiprotective.


Subject(s)
COVID-19 , Young Adult , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Prospective Studies , DNA Methylation/genetics , Protein Processing, Post-Translational
11.
Cell Syst ; 13(12): 989-1001.e8, 2022 12 21.
Article in English | MEDLINE | ID: mdl-36549275

ABSTRACT

The identification of a COVID-19 host response signature in blood can increase the understanding of SARS-CoV-2 pathogenesis and improve diagnostic tools. Applying a multi-objective optimization framework to both massive public and new multi-omics data, we identified a COVID-19 signature regulated at both transcriptional and epigenetic levels. We validated the signature's robustness in multiple independent COVID-19 cohorts. Using public data from 8,630 subjects and 53 conditions, we demonstrated no cross-reactivity with other viral and bacterial infections, COVID-19 comorbidities, or confounders. In contrast, previously reported COVID-19 signatures were associated with significant cross-reactivity. The signature's interpretation, based on cell-type deconvolution and single-cell data analysis, revealed prominent yet complementary roles for plasmablasts and memory T cells. Although the signal from plasmablasts mediated COVID-19 detection, the signal from memory T cells controlled against cross-reactivity with other viral infections. This framework identified a robust, interpretable COVID-19 signature and is broadly applicable in other disease contexts. A record of this paper's transparent peer review process is included in the supplemental information.


Subject(s)
COVID-19 , Virus Diseases , Humans , SARS-CoV-2
12.
Cell Syst ; 13(11): 924-931.e4, 2022 11 16.
Article in English | MEDLINE | ID: mdl-36323307

ABSTRACT

Male sex is a major risk factor for SARS-CoV-2 infection severity. To understand the basis for this sex difference, we studied SARS-CoV-2 infection in a young adult cohort of United States Marine recruits. Among 2,641 male and 244 female unvaccinated and seronegative recruits studied longitudinally, SARS-CoV-2 infections occurred in 1,033 males and 137 females. We identified sex differences in symptoms, viral load, blood transcriptome, RNA splicing, and proteomic signatures. Females had higher pre-infection expression of antiviral interferon-stimulated gene (ISG) programs. Causal mediation analysis implicated ISG differences in number of symptoms, levels of ISGs, and differential splicing of CD45 lymphocyte phosphatase during infection. Our results indicate that the antiviral innate immunity set point causally contributes to sex differences in response to SARS-CoV-2 infection. A record of this paper's transparent peer review process is included in the supplemental information.


Subject(s)
COVID-19 , Immunity, Innate , Sex Characteristics , Female , Humans , Male , Young Adult , COVID-19/immunology , Interferons , Proteomics , SARS-CoV-2
13.
Epidemiology ; 33(6): 797-807, 2022 11 01.
Article in English | MEDLINE | ID: mdl-35944149

ABSTRACT

BACKGROUND: Marine recruits training at Parris Island experienced an unexpectedly high rate of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, despite preventive measures including a supervised, 2-week, pre-entry quarantine. We characterize SARS-CoV-2 transmission in this cohort. METHODS: Between May and November 2020, we monitored 2,469 unvaccinated, mostly male, Marine recruits prospectively during basic training. If participants tested negative for SARS-CoV-2 by quantitative polymerase chain reaction (qPCR) at the end of quarantine, they were transferred to the training site in segregated companies and underwent biweekly testing for 6 weeks. We assessed the effects of coronavirus disease 2019 (COVID-19) prevention measures on other respiratory infections with passive surveillance data, performed phylogenetic analysis, and modeled transmission dynamics and testing regimens. RESULTS: Preventive measures were associated with drastically lower rates of other respiratory illnesses. However, among the trainees, 1,107 (44.8%) tested SARS-CoV-2-positive, with either mild or no symptoms. Phylogenetic analysis of viral genomes from 580 participants revealed that all cases but one were linked to five independent introductions, each characterized by accumulation of mutations across and within companies, and similar viral isolates in individuals from the same company. Variation in company transmission rates (mean reproduction number R0, 5.5; 95% confidence interval [CI]: 5.0, 6.1) could be accounted for by multiple initial cases within a company and superspreader events. Simulations indicate that frequent rapid-report testing with case isolation may minimize outbreaks. CONCLUSIONS: Transmission of wild-type SARS-CoV-2 among Marine recruits was approximately twice that seen in the community. Insights from SARS-CoV-2 outbreak dynamics and mutations spread in a remote, congregate setting may inform effective mitigation strategies.


Subject(s)
COVID-19 , Disease Outbreaks , Military Personnel , COVID-19/epidemiology , COVID-19/prevention & control , Disease Outbreaks/prevention & control , Female , Humans , Male , Military Personnel/statistics & numerical data , Phylogeny , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification , United States/epidemiology
14.
Nat Genet ; 54(7): 940-949, 2022 07.
Article in English | MEDLINE | ID: mdl-35817977

ABSTRACT

Epigenomic profiling has enabled large-scale identification of regulatory elements, yet we still lack a systematic mapping from any sequence or variant to regulatory activities. We address this challenge with Sei, a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a deep learning model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequence and variant effects based on diverse regulatory activities, such as cell type-specific enhancer functions. These predictions are supported by tissue-specific expression, expression quantitative trait loci and evolutionary constraint data. Furthermore, sequence classes enable characterization of the tissue-specific, regulatory architecture of complex traits and generate mechanistic hypotheses for individual regulatory pathogenic mutations. We provide Sei as a resource to elucidate the regulatory basis of human health and disease.


Subject(s)
Quantitative Trait Loci , Regulatory Sequences, Nucleic Acid , Chromatin/genetics , Epigenomics , Human Genetics , Humans , Quantitative Trait Loci/genetics , Regulatory Sequences, Nucleic Acid/genetics
15.
Nucleic Acids Res ; 50(14): 8168-8192, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35871289

ABSTRACT

Nucleocapsid protein (N-protein) is required for multiple steps in betacoronavirus replication. SARS-CoV-2 N-protein condenses with specific viral RNAs at particular temperatures, making it a powerful model for deciphering RNA sequence specificity in condensates. We identify two separate and distinct double-stranded RNA motifs (dsRNA stickers) that promote N-protein condensation. These dsRNA stickers are separately recognized by N-protein's two RNA binding domains (RBDs). RBD1 prefers structured RNA with sequences like the transcription-regulatory sequence (TRS). RBD2 prefers long stretches of dsRNA, independent of sequence. Thus, the two N-protein RBDs interact with distinct dsRNA stickers, and these interactions impart specific droplet physical properties that could support varied viral functions. Specifically, we find that addition of dsRNA lowers the condensation temperature dependent on RBD2 interactions and tunes translational repression. In contrast, RBD1 sites are sequences critical for sub-genomic (sg) RNA generation and promote gRNA compression. The density of RBD1 binding motifs in proximity to TRS-L/B sequences is associated with levels of sub-genomic RNA generation. The switch to packaging is likely mediated by RBD1 interactions, which generate particles that recapitulate the packaging unit of the virion. Thus, SARS-CoV-2 can achieve biochemical complexity, performing multiple functions in the same cytoplasm, with minimal protein components based on utilizing multiple distinct RNA motifs that control N-protein interactions.


Subject(s)
Coronavirus Nucleocapsid Proteins , RNA, Double-Stranded , SARS-CoV-2 , Binding Sites , Coronavirus Nucleocapsid Proteins/chemistry , Phosphoproteins/chemistry , RNA, Double-Stranded/genetics , RNA, Viral/genetics , RNA-Binding Proteins/metabolism , SARS-CoV-2/genetics , Temperature
16.
Sci Adv ; 8(23): eabn4965, 2022 06 10.
Article in English | MEDLINE | ID: mdl-35675394

ABSTRACT

The Kidney Precision Medicine Project (KPMP) is building a spatially specified human kidney tissue atlas in health and disease with single-cell resolution. Here, we describe the construction of an integrated reference map of cells, pathways, and genes using unaffected regions of nephrectomy tissues and undiseased human biopsies from 56 adult subjects. We use single-cell/nucleus transcriptomics, subsegmental laser microdissection transcriptomics and proteomics, near-single-cell proteomics, 3D and CODEX imaging, and spatial metabolomics to hierarchically identify genes, pathways, and cells. Integrated data from these different technologies coherently identify cell types/subtypes within different nephron segments and the interstitium. These profiles describe cell-level functional organization of the kidney following its physiological functions and link cell subtypes to genes, proteins, metabolites, and pathways. They further show that messenger RNA levels along the nephron are congruent with the subsegmental physiological activity. This reference atlas provides a framework for the classification of kidney disease when multiple molecular mechanisms underlie convergent clinical phenotypes.


Subject(s)
Kidney Diseases , Kidney , Humans , Kidney/pathology , Kidney Diseases/metabolism , Metabolomics/methods , Proteomics/methods , Transcriptome
17.
Front Immunol ; 13: 821730, 2022.
Article in English | MEDLINE | ID: mdl-35479098

ABSTRACT

Young adults infected with SARS-CoV-2 are frequently asymptomatic or develop only mild disease. Because capturing representative mild and asymptomatic cases requires active surveillance, they are less characterized than moderate or severe cases of COVID-19. However, a better understanding of SARS-CoV-2 asymptomatic infections might shed light on the immune mechanisms associated with the control of symptoms and protection. To this aim, we have determined the temporal dynamics of the humoral immune response, as well as the serum inflammatory profile, of mild and asymptomatic SARS-CoV-2 infections in a cohort of 172 initially seronegative, prospectively studied United States Marine recruits, 149 of whom were subsequently found to be SARS-CoV-2 infected. The participants had blood samples taken, symptoms surveyed and PCR tests for SARS-CoV-2 performed periodically for up to 105 days. We found similar dynamics in the profiles of viral load and in the generation of specific antibody responses in asymptomatic and mild symptomatic participants. A proteomic analysis using an inflammatory panel including 92 analytes revealed a pattern of three temporal waves of inflammatory and immunoregulatory mediators, and a return to baseline for most of the inflammatory markers by 35 days post-infection. We found that 23 analytes were significantly higher in those participants that reported symptoms at the time of the first positive SARS-CoV-2 PCR compared with asymptomatic participants, including mostly chemokines and cytokines associated with inflammatory response or immune activation (e.g., TNF-α, TNF-β, CXCL10, IL-8). Notably, we detected 7 analytes (IL-17C, MMP-10, FGF-19, FGF-21, FGF-23, CXCL5 and CCL23) that were higher in asymptomatic participants than in participants with symptoms; these are known to be involved in tissue repair and may be related to the control of symptoms.
Overall, we found a serum proteomic signature that differentiates asymptomatic and mild symptomatic infections in young adults, including potential targets for developing new therapies and prognostic tests.


Subject(s)
COVID-19 , Fibroblast Growth Factors , Humans , Interleukin-17 , Matrix Metalloproteinase 10 , Proteomics , SARS-CoV-2
18.
Cell Rep ; 38(10): 110467, 2022 03 08.
Article in English | MEDLINE | ID: mdl-35263594

ABSTRACT

Despite their importance in tissue homeostasis and renewal, human pituitary stem cells (PSCs) are incompletely characterized. We describe a human single nucleus RNA-seq and ATAC-seq resource from pediatric, adult, and aged postmortem pituitaries (snpituitaryatlas.princeton.edu) and characterize cell-type-specific gene expression and chromatin accessibility programs for all major pituitary cell lineages. We identify uncommitted PSCs, committing progenitor cells, and sex differences. Pseudotime trajectory analysis indicates that early-life PSCs are distinct from the other age groups. Linear modeling of same-cell multiome data identifies regulatory domain accessibility sites and transcription factors that are significantly associated with gene expression in PSCs compared with other cell types and within PSCs. We identify distinct deterministic mechanisms that contribute to heterogeneous marker expression within PSCs. These findings characterize human stem cell lineages and reveal diverse mechanisms regulating key PSC genes and cell type identity.


Subject(s)
Chromatin , Transcriptome , Aged , Child , Chromatin Immunoprecipitation Sequencing , Female , Humans , Male , Stem Cells/metabolism , Transcription Factors/metabolism , Transcriptome/genetics
20.
Nat Methods ; 18(11): 1317-1321, 2021 11.
Article in English | MEDLINE | ID: mdl-34725480

ABSTRACT

The scaling of single-cell data exploratory analysis with the rapidly growing diversity and quantity of single-cell omics datasets demands more interpretable and robust data representation that is generalizable across datasets. Here, we have developed a 'linearly interpretable' framework that combines the interpretability and transferability of linear methods with the representational power of non-linear methods. Within this framework we introduce a data representation and visualization method, GraphDR, and a structure discovery method, StructDR, that unifies cluster, trajectory and surface estimation and enables their confidence set inference.


Subject(s)
Algorithms , Computational Biology/methods , Computer Graphics/statistics & numerical data , Datasets as Topic , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software , Animals , Humans , RNA-Seq