Results 1 - 20 of 26,462
1.
Cell ; 186(11): 2345-2360.e16, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37167971

ABSTRACT

A functional network of blood vessels is essential for organ growth and homeostasis, yet how the vasculature matures and maintains homeostasis remains elusive in live mice. By longitudinally tracking the same neonatal endothelial cells (ECs) over days to weeks, we found that capillary plexus expansion is driven by vessel regression to optimize network perfusion. Neonatal ECs rearrange positions to evenly distribute throughout the developing plexus and become positionally stable in adulthood. Upon local ablation, adult ECs survive through a plasmalemmal self-repair response, while neonatal ECs are predisposed to die. Furthermore, adult ECs reactivate migration to assist vessel repair. Global ablation reveals coordinated maintenance of the adult vascular architecture that allows for eventual network recovery. Lastly, neonatal remodeling and adult maintenance of the skin vascular plexus are orchestrated by temporally restricted, neonatal VEGFR2 signaling. Our work sheds light on fundamental mechanisms that underlie both vascular maturation and adult homeostasis in vivo.


Subject(s)
Endothelial Cells , Neovascularization, Physiologic , Animals , Mice , Endothelial Cells/physiology , Neovascularization, Physiologic/physiology , Skin , Cell Membrane
2.
Cell ; 173(2): 400-416.e11, 2018 04 05.
Article in English | MEDLINE | ID: mdl-29625055

ABSTRACT

For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale.


Subject(s)
Neoplasms/pathology , Databases, Genetic , Genomics , Humans , Kaplan-Meier Estimate , Neoplasms/genetics , Neoplasms/mortality , Proportional Hazards Models
3.
Immunity ; 55(2): 308-323.e9, 2022 02 08.
Article in English | MEDLINE | ID: mdl-34800368

ABSTRACT

Tumor-infiltrating dendritic cells (DCs) assume varied functional states that impact anti-tumor immunity. To delineate the DC states associated with productive anti-tumor T cell immunity, we compared spontaneously regressing and progressing tumors. Tumor-reactive CD8+ T cell responses in Batf3-/- mice lacking type 1 DCs (DC1s) were lost in progressor tumors but preserved in regressor tumors. Transcriptional profiling of intra-tumoral DCs within regressor tumors revealed an activation state of CD11b+ conventional DCs (DC2s) characterized by expression of interferon (IFN)-stimulated genes (ISGs) (ISG+ DCs). ISG+ DCs activated CD8+ T cells ex vivo comparably to DC1. Unlike cross-presenting DC1, ISG+ DCs acquired and presented intact tumor-derived peptide-major histocompatibility complex class I (MHC class I) complexes. Constitutive type I IFN production by regressor tumors drove the ISG+ DC state, and activation of MHC class I-dressed ISG+ DCs by exogenous IFN-β rescued anti-tumor immunity against progressor tumors in Batf3-/- mice. The ISG+ DC gene signature is detectable in human tumors. Engaging this functional DC state may present an approach for the treatment of human disease.


Subject(s)
CD8-Positive T-Lymphocytes/immunology , Dendritic Cells/immunology , Histocompatibility Antigens Class I/immunology , Interferon Type I/immunology , Lymphocytes, Tumor-Infiltrating/immunology , Animals , Antigens, Neoplasm/immunology , CD11b Antigen/immunology , Cross-Priming , Dendritic Cells/drug effects , Interferon-beta/administration & dosage , Interferon-beta/pharmacology , Mice , Neoplasms/immunology , Receptors, Interferon/immunology , Signal Transduction/immunology , Tumor Microenvironment/immunology
4.
Cell ; 161(7): 1539-1552, 2015 Jun 18.
Article in English | MEDLINE | ID: mdl-26091037

ABSTRACT

The adenomatous polyposis coli (APC) tumor suppressor is mutated in the vast majority of human colorectal cancers (CRC) and leads to deregulated Wnt signaling. To determine whether Apc disruption is required for tumor maintenance, we developed a mouse model of CRC whereby Apc can be conditionally suppressed using a doxycycline-regulated shRNA. Apc suppression produces adenomas in both the small intestine and colon that, in the presence of Kras and p53 mutations, can progress to invasive carcinoma. In established tumors, Apc restoration drives rapid and widespread tumor-cell differentiation and sustained regression without relapse. Tumor regression is accompanied by the re-establishment of normal crypt-villus homeostasis, such that once aberrantly proliferating cells reacquire self-renewal and multi-lineage differentiation capability. Our study reveals that CRC cells can revert to functioning normal cells given appropriate signals and provides compelling in vivo validation of the Wnt pathway as a therapeutic target for treatment of CRC.


Subject(s)
Adenomatous Polyposis Coli Protein/metabolism , Colorectal Neoplasms/genetics , Disease Models, Animal , Intestine, Large/pathology , Intestine, Small/pathology , Adenomatous Polyposis Coli Protein/genetics , Animals , Cell Proliferation , Colorectal Neoplasms/pathology , Doxycycline/administration & dosage , Genes, p53 , Intestinal Polyps/metabolism , Intestinal Polyps/pathology , Intestine, Large/metabolism , Intestine, Small/metabolism , Mice , Mice, Transgenic , Proto-Oncogene Proteins p21(ras)/genetics , RNA Interference , Wnt Signaling Pathway
5.
Mol Cell ; 75(3): 605-619.e6, 2019 08 08.
Article in English | MEDLINE | ID: mdl-31255466

ABSTRACT

Accurate DNA replication is essential to preserve genomic integrity and prevent chromosomal instability-associated diseases including cancer. Key to this process is the cells' ability to stabilize and restart stalled replication forks. Here, we show that the EXD2 nuclease is essential to this process. EXD2 recruitment to stressed forks suppresses their degradation by restraining excessive fork regression. Accordingly, EXD2 deficiency leads to fork collapse, hypersensitivity to replication inhibitors, and genomic instability. Impeding fork regression by inactivation of SMARCAL1 or removal of RECQ1's inhibition in EXD2-/- cells restores efficient fork restart and genome stability. Moreover, purified EXD2 efficiently processes substrates mimicking regressed forks. Thus, this work identifies a mechanism underpinned by EXD2's nuclease activity, by which cells balance fork regression with fork restoration to maintain genome stability. Interestingly, from a clinical perspective, we discover that EXD2's depletion is synthetic lethal with mutations in BRCA1/2, implying a non-redundant role in replication fork protection.


Subject(s)
DNA Helicases/genetics , DNA Replication/genetics , Exodeoxyribonucleases/genetics , RecQ Helicases/genetics , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Genomic Instability/genetics , HeLa Cells , Humans , Neoplasms/genetics , Synthetic Lethal Mutations/genetics
6.
Proc Natl Acad Sci U S A ; 121(10): e2307876121, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38422017

ABSTRACT

During real-time language comprehension, our minds rapidly decode complex meanings from sequences of words. The difficulty of doing so is known to be related to words' contextual predictability, but what cognitive processes do these predictability effects reflect? In one view, predictability effects reflect facilitation due to anticipatory processing of words that are predictable from context. This view predicts a linear effect of predictability on processing demand. In another view, predictability effects reflect the costs of probabilistic inference over sentence interpretations. This view predicts either a logarithmic or a superlogarithmic effect of predictability on processing demand, depending on whether it assumes pressures toward a uniform distribution of information over time. The empirical record is currently mixed. Here, we revisit this question at scale: We analyze six reading datasets, estimate next-word probabilities with diverse statistical language models, and model reading times using recent advances in nonlinear regression. Results support a logarithmic effect of word predictability on processing difficulty, which favors probabilistic inference as a key component of human language processing.


Subject(s)
Comprehension , Language , Humans , Models, Statistical
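
The two competing predictions described in the abstract above can be illustrated with a minimal Python sketch. It assumes the standard definition of surprisal as negative log probability; the function names and the toy probabilities are illustrative assumptions and are not taken from the study.

```python
import math

def linear_cost(p: float) -> float:
    """Hypothetical linear linking function: cost falls in proportion to probability."""
    return 1.0 - p

def surprisal_bits(p: float) -> float:
    """Logarithmic linking function: surprisal in bits, i.e. -log2(p)."""
    return -math.log2(p)

# Toy next-word probabilities, made up for illustration only.
for p in (0.5, 0.25, 0.125):
    print(f"p={p:<6} linear={linear_cost(p):.3f}  surprisal={surprisal_bits(p):.1f} bits")
```

Under the logarithmic view, each halving of a word's probability adds a constant one bit of cost, whereas under the linear view the added cost shrinks as the word becomes less probable.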
7.
Proc Natl Acad Sci U S A ; 121(33): e2403210121, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39110727

ABSTRACT

Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods face several limitations, encompassing issues related to computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2-penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, with 15 times faster computation and half the memory than the current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Machine Learning , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide
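
As a rough illustration of what an L0L2 penalty does, the sketch below evaluates a generic penalized least-squares objective in Python. It is written for individual-level data with a 1/2 scaling on the loss, both of which are assumptions made here for simplicity; ALL-Sum itself works from GWAS summary statistics, and that formulation is not reproduced. The function name `l0l2_objective` and the toy data are hypothetical.

```python
def l0l2_objective(X, y, beta, lam0, lam2):
    """Squared-error loss plus an L0 penalty (number of nonzero
    coefficients) and a squared-L2 (ridge) penalty."""
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, beta))
        for row, yi in zip(X, y)
    ]
    loss = 0.5 * sum(r * r for r in residuals)
    n_nonzero = sum(1 for b in beta if b != 0.0)   # L0 term: counts active predictors
    ridge = sum(b * b for b in beta)               # L2 term: shrinks coefficient size
    return loss + lam0 * n_nonzero + lam2 * ridge

# Toy data, made up for illustration: only the first predictor matters.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 0.0]
sparse_fit = l0l2_objective(X, y, [1.0, 0.0], lam0=0.5, lam2=0.1)
dense_fit = l0l2_objective(X, y, [1.0, 0.2], lam0=0.5, lam2=0.1)
print(sparse_fit, dense_fit)
```

The L0 term charges a fixed price for every predictor kept in the model, which favors sparse solutions, while the L2 term shrinks the coefficients that remain.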
8.
Hum Mol Genet ; 33(4): 342-354, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37944069

ABSTRACT

Peripheral blood mononuclear cells (PBMCs) reflect systemic immune response during cancer progression. However, a comprehensive understanding of the composition and function of PBMCs in cancer patients is lacking, and the potential of these features to assist cancer diagnosis is also unclear. Here, the compositional and status differences between cancer patients and healthy donors in PBMCs were investigated by single-cell RNA sequencing (scRNA-seq), involving 262,025 PBMCs from 68 cancer samples and 14 healthy samples. We observed an enhanced activation and differentiation of most immune subsets in cancer patients, along with reduction of naïve T cells, expansion of macrophages, impairment of NK cells and myeloid cells, as well as tumor promotion and immunosuppression. Based on characteristics including differential cell type abundances and/or hub genes identified from weighted gene co-expression network analysis (WGCNA) modules of each major cell type, we applied logistic regression to construct cancer diagnosis models. Furthermore, we found that the above models can distinguish cancer patients and healthy donors with high sensitivity. Our study provided new insights into using the features of PBMCs in non-invasive cancer diagnosis.


Subject(s)
Leukocytes, Mononuclear , Neoplasms , Humans , Single-Cell Gene Expression Analysis , Neoplasms/diagnosis , Neoplasms/genetics , Cell Differentiation , Cell Transformation, Neoplastic
9.
Hum Mol Genet ; 33(8): 724-732, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38271184

ABSTRACT

Since first publication of the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) variant classification guidelines, additional recommendations for application of certain criteria have been released (https://clinicalgenome.org/docs/), to improve their application in the diagnostic setting. However, none have addressed use of the PS4 and PP4 criteria, capturing patient presentation as evidence towards pathogenicity. Application of PS4 can be done through traditional case-control studies, or "proband counting" within or across clinical testing cohorts. Review of the existing PS4 and PP4 specifications for Hereditary Cancer Gene Variant Curation Expert Panels revealed substantial differences in the approach to defining specifications. Using BRCA1, BRCA2 and TP53 as exemplar genes, we calibrated different methods proposed for applying the "PS4 proband counting" criterion. For each approach, we considered limitations, non-independence with other ACMG/AMP criteria, broader applicability, and variability in results for different datasets. Our findings highlight inherent overlap of proband-counting methods with ACMG/AMP frequency codes, and the importance of calibration to derive dataset-specific code weights that can account for potential between-dataset differences in ascertainment and other factors. Our work emphasizes the advantages and generalizability of logistic regression analysis over simple proband-counting approaches to empirically determine the relative predictive capacity and weight of various personal clinical features in the context of multigene panel testing, for improved variant interpretation. We also provide a general protocol, including instructions for data formatting and a web-server for analysis of personal history parameters, to facilitate dataset-specific calibration analyses required to use such data for germline variant classification.


Subject(s)
Genetic Variation , Neoplasms , Humans , Genetic Variation/genetics , Genetic Testing/methods , Genome, Human , Phenotype , Genes, Neoplasm , Neoplasms/genetics
10.
Am J Hum Genet ; 110(7): 1177-1199, 2023 07 06.
Article in English | MEDLINE | ID: mdl-37419091

ABSTRACT

The existing framework of Mendelian randomization (MR) infers the causal effect of one or multiple exposures on one single outcome. It is not designed to jointly model multiple outcomes, as would be necessary to detect causes of more than one outcome and would be relevant to model multimorbidity or other related disease outcomes. Here, we introduce multi-response Mendelian randomization (MR2), an MR method specifically designed for multiple outcomes to identify exposures that cause more than one outcome or, conversely, exposures that exert their effect on distinct responses. MR2 uses a sparse Bayesian Gaussian copula regression framework to detect causal effects while estimating the residual correlation between summary-level outcomes, i.e., the correlation that cannot be explained by the exposures, and vice versa. We show both theoretically and in a comprehensive simulation study how unmeasured shared pleiotropy induces residual correlation between outcomes irrespective of sample overlap. We also reveal how non-genetic factors that affect more than one outcome contribute to their correlation. We demonstrate that by accounting for residual correlation, MR2 has higher power to detect shared exposures causing more than one outcome. It also provides more accurate causal effect estimates than existing methods that ignore the dependence between related responses. Finally, we illustrate how MR2 detects shared and distinct causal exposures for five cardiovascular diseases in two applications considering cardiometabolic and lipidomic exposures and uncovers residual correlation between summary-level outcomes reflecting known relationships between cardiovascular diseases.


Subject(s)
Cardiovascular Diseases , Humans , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics , Bayes Theorem , Multimorbidity , Mendelian Randomization Analysis/methods , Causality , Genome-Wide Association Study
11.
Development ; 150(14)2023 07 15.
Article in English | MEDLINE | ID: mdl-37390294

ABSTRACT

Caudal developmental defects, including caudal regression, caudal dysgenesis and sirenomelia, are devastating conditions affecting the skeletal, nervous, digestive, reproductive and excretory systems. Defects in mesodermal migration and blood supply to the caudal region have been identified as possible causes of caudal developmental defects, but neither satisfactorily explains the structural malformations in all three germ layers. Here, we describe caudal developmental defects in transmembrane protein 132a (Tmem132a) mutant mice, including skeletal, posterior neural tube closure, genitourinary tract and hindgut defects. We show that, in Tmem132a mutant embryos, visceral endoderm fails to be excluded from the medial region of early hindgut, leading directly to the loss or malformation of cloaca-derived genitourinary and gastrointestinal structures, and indirectly to the neural tube and kidney/ureter defects. We find that TMEM132A mediates intercellular interaction, and physically interacts with planar cell polarity (PCP) regulators CELSR1 and FZD6. Genetically, Tmem132a regulates neural tube closure synergistically with another PCP regulator Vangl2. In summary, we have identified Tmem132a as a new regulator of PCP, and hindgut malformation as the underlying cause of developmental defects in multiple caudal structures.


Subject(s)
Neural Tube Defects , Mice , Animals , Neural Tube Defects/metabolism , Neural Tube/metabolism , Neurulation , Germ Layers/metabolism , Cell Polarity/physiology , Membrane Proteins/genetics , Membrane Proteins/metabolism
12.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38836403

ABSTRACT

In precision medicine, both predicting the disease susceptibility of an individual and forecasting their disease-free survival are areas of key research. Besides the classical epidemiological predictor variables, data from multiple (omic) platforms are increasingly available. To integrate this wealth of information, we propose new methodology to combine both cooperative learning, a recent approach to leverage the predictive power of several datasets, and polygenic hazard score models. Polygenic hazard score models provide a practitioner with a more differentiated view of the predicted disease-free survival than the one given by merely a point estimate, for instance computed with a polygenic risk score. Our aim is to leverage the advantages of cooperative learning for the computation of polygenic hazard score models via Cox's proportional hazard model, thereby improving the prediction of the disease-free survival. In our experimental study, we apply our methodology to forecast the disease-free survival for Alzheimer's disease (AD) using three layers of data. One layer contains epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status and 10 leading principal components. Another layer contains selected genomic loci, and the last layer contains methylation data for selected CpG sites. We demonstrate that the survival curves computed via cooperative learning yield an AUC of around 0.7, above the state-of-the-art performance of its competitors. Importantly, the proposed methodology returns (1) a linear score that can be easily interpreted (in contrast to machine learning approaches), and (2) a weighting of the predictive power of the involved data layers, allowing for an assessment of the importance of each omic (or other) platform. Similarly to polygenic hazard score models, our methodology also allows one to compute individual survival curves for each patient.


Subject(s)
Alzheimer Disease , Precision Medicine , Humans , Precision Medicine/methods , Alzheimer Disease/genetics , Alzheimer Disease/mortality , Disease-Free Survival , Machine Learning , Proportional Hazards Models , Multifactorial Inheritance , Male , Female , Multiomics
13.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38886006

ABSTRACT

Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregate to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topological structure, as traditional methods often do, a set of promising Boolean formulas for each target gene is evolved first, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results show that LogicGep infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.


Subject(s)
Algorithms , Gene Expression Profiling , Gene Regulatory Networks , Gene Expression Profiling/methods , Humans , Transcriptome , Software , Computational Biology/methods , Neural Networks, Computer
14.
Proc Natl Acad Sci U S A ; 120(23): e2212154120, 2023 06 06.
Article in English | MEDLINE | ID: mdl-37253012

ABSTRACT

The personality trait neuroticism is tightly linked to mental health, and neurotic people experience stronger negative emotions in everyday life. But, do their negative emotions also show greater fluctuation? This commonsensical notion was recently questioned by [Kalokerinos et al. Proc Natl Acad Sci USA 112, 15838-15843 (2020)], who suggested that the associations found in previous studies were spurious. Less neurotic people often report very low levels of negative emotion, which is usually measured with bounded rating scales. Therefore, they often pick the lowest possible response option, which severely constrains the amount of emotional variability that can be observed in principle. Applying a multistep statistical procedure that is supposed to correct for this dependency, [Kalokerinos et al. Proc Natl Acad Sci USA 112, 15838-15843 (2020)] no longer found an association between neuroticism and emotional variability. However, like other common approaches for controlling for undesirable effects due to bounded scales, this method is opaque with respect to the assumed mechanism of data generation and might not result in a successful correction. We thus suggest an alternative approach that a) takes into account that emotional states outside of the scale bounds can occur and b) models associations between neuroticism and both the mean and variability of emotion in a single step with the help of Bayesian censored location-scale models. Simulations supported this model over alternative approaches. We analyzed 13 longitudinal datasets (2,518 individuals and 11,170 measurements in total) and found clear evidence that more neurotic people experience greater variability in negative emotion.


Subject(s)
Emotions , Mental Health , Humans , Neuroticism/physiology , Bayes Theorem , Emotions/physiology
15.
Proc Natl Acad Sci U S A ; 120(7): e2206994120, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36763535

ABSTRACT

Recent advances in high-resolution imaging techniques and particle-based simulation methods have enabled the precise microscopic characterization of collective dynamics in various biological and engineered active matter systems. In parallel, data-driven algorithms for learning interpretable continuum models have shown promising potential for the recovery of underlying partial differential equations (PDEs) from continuum simulation data. By contrast, learning macroscopic hydrodynamic equations for active matter directly from experiments or particle simulations remains a major challenge, especially when continuum models are not known a priori or analytic coarse graining fails, as often is the case for nondilute and heterogeneous systems. Here, we present a framework that leverages spectral basis representations and sparse regression algorithms to discover PDE models from microscopic simulation and experimental data, while incorporating the relevant physical symmetries. We illustrate the practical potential through a range of applications, from a chiral active particle model mimicking nonidentical swimming cells to recent microroller experiments and schooling fish. In all these cases, our scheme learns hydrodynamic equations that reproduce the self-organized collective dynamics observed in the simulations and experiments. This inference framework makes it possible to measure a large number of hydrodynamic parameters in parallel and directly from video data.

16.
Proc Natl Acad Sci U S A ; 120(9): e2218375120, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36821583

ABSTRACT

The recent increase in openly available ancient human DNA samples allows for large-scale meta-analysis applications. Trans-generational past human mobility is one of the key aspects that ancient genomics can contribute to, since changes in genetic ancestry, unlike cultural changes seen in the archaeological record, necessarily reflect movements of people. Here, we present an algorithm for spatiotemporal mapping of genetic profiles, which allows for direct estimates of past human mobility from large ancient genomic datasets. The key idea of the method is to derive a spatial probability surface of genetic similarity for each individual in its respective past. This is achieved by first creating an interpolated ancestry field through space and time based on multivariate statistics and Gaussian process regression and then using this field to map the ancient individuals into space according to their genetic profile. We apply this algorithm to a dataset of 3138 aDNA samples with genome-wide data from Western Eurasia in the last 10,000 years. Finally, we condense this sample-wise record with a simple summary statistic into a diachronic measure of mobility for subregions in Western, Central, and Southern Europe. For regions and periods with sufficient data coverage, our similarity surfaces and mobility estimates show general concordance with previous results and provide a meta-perspective of genetic changes and human mobility.


Subject(s)
DNA, Ancient , Genomics , Humans , History, Ancient , DNA, Ancient/analysis , Europe
17.
Proc Natl Acad Sci U S A ; 120(48): e2306275120, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-37983488

ABSTRACT

Big data and large-scale machine learning have had a profound impact on science and engineering, particularly in fields focused on forecasting and prediction. Yet, it is still not clear how we can use the superior pattern-matching abilities of machine learning models for scientific discovery. This is because the goals of machine learning and science are generally not aligned. In addition to being accurate, scientific theories must also be causally consistent with the underlying physical process and allow for human analysis, reasoning, and manipulation to advance the field. In this paper, we present a case study on discovering a symbolic model for oceanic rogue waves from data using causal analysis, deep learning, parsimony-guided model selection, and symbolic regression. We train an artificial neural network on causal features from an extensive dataset of observations from wave buoys, while selecting for predictive performance and causal invariance. We apply symbolic regression to distill this black-box model into a mathematical equation that retains the neural network's predictive capabilities, while allowing for interpretation in the context of existing wave theory. The resulting model reproduces known behavior, generates well-calibrated probabilities, and achieves better predictive scores on unseen data than current theory. This showcases how machine learning can facilitate inductive scientific discovery and paves the way for more accurate rogue wave forecasting.

18.
Proc Natl Acad Sci U S A ; 120(13): e2221311120, 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36940328

ABSTRACT

Leveraging a scientific infrastructure for exploring how students learn, we have developed cognitive and statistical models of skill acquisition and used them to understand fundamental similarities and differences across learners. Our primary question was why do some students learn faster than others? Or, do they? We model data from student performance on groups of tasks that assess the same skill component and that provide follow-up instruction on student errors. Our models estimate, for both students and skills, initial correctness and learning rate, that is, the increase in correctness after each practice opportunity. We applied our models to 1.3 million observations across 27 datasets of student interactions with online practice systems in the context of elementary to college courses in math, science, and language. Despite the availability of up-front verbal instruction, like lectures and readings, students demonstrate modest initial prepractice performance, at about 65% accuracy. Despite being in the same course, students' initial performance varies substantially from about 55% correct for those in the lower half to 75% for those in the upper half. In contrast, and much to our surprise, we found students to be astonishingly similar in estimated learning rate, typically increasing by about 0.1 log odds or 2.5% in accuracy per opportunity. These findings pose a challenge for theories of learning to explain the odd combination of large variation in student initial performance and striking regularity in student learning rate.
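
The relationship between a learning rate in log odds and a gain in accuracy, as quoted in the abstract above, can be checked with a short Python sketch. It assumes a plain logistic link between log odds and accuracy; the helper names are illustrative assumptions, and the models in the study may include additional terms.

```python
import math

def logit(p: float) -> float:
    """Convert a probability of a correct answer to log odds."""
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def accuracy_after(p0: float, rate: float, opportunities: int) -> float:
    """Accuracy after a number of practice opportunities, assuming each
    opportunity adds `rate` log odds to the starting accuracy `p0`."""
    return inv_logit(logit(p0) + rate * opportunities)

# Gain from a single opportunity at a rate of 0.1 log odds.
gain_at_50 = accuracy_after(0.50, 0.1, 1) - 0.50
gain_at_65 = accuracy_after(0.65, 0.1, 1) - 0.65
print(f"{gain_at_50:.4f} {gain_at_65:.4f}")
```

Because the logistic curve is steepest at 50% accuracy, a step of 0.1 log odds corresponds to roughly 2.5 percentage points there and slightly less (about 2.2 points) at the 65% starting accuracy reported in the abstract.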

19.
J Neurosci ; 44(12)2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38199865

ABSTRACT

Regression is a key feature of neurodevelopmental disorders such as autism spectrum disorder, Fragile X syndrome, and Rett syndrome (RTT). RTT is caused by mutations in the X-linked gene methyl-CpG-binding protein 2 (MECP2). It is characterized by an early period of typical development with subsequent regression of previously acquired motor and speech skills in girls. The syndromic phenotypes are individualistic and dynamic over time. Thus far, it has been difficult to capture these dynamics and syndromic heterogeneity in the preclinical Mecp2-heterozygous female mouse model (Het). The emergence of computational neuroethology tools allows for robust analysis of complex and dynamic behaviors to model endophenotypes in preclinical models. Toward this first step, we utilized DeepLabCut, a marker-less pose estimation software to quantify trajectory kinematics and multidimensional analysis to characterize behavioral heterogeneity in Het in the previously benchmarked, ethologically relevant social cognition task of pup retrieval. We report the identification of two distinct phenotypes of adult Het: Het that display a delay in efficiency in early days and then improve over days like wild-type mice and Het that regress and perform worse in later days. Furthermore, regression is dependent on age and behavioral context and can be detected in the initial days of retrieval. Together, the novel identification of two populations of Het suggests differential effects on neural circuitry, opens new avenues to investigate the underlying molecular and cellular mechanisms of heterogeneity, and informs the design of better studies for stratifying therapeutics.


Subject(s)
Autism Spectrum Disorder , Rett Syndrome , Humans , Female , Animals , Mice , Rett Syndrome/genetics , Rett Syndrome/metabolism , Methyl-CpG-Binding Protein 2/genetics , Methyl-CpG-Binding Protein 2/metabolism , Phenotype , Mutation/genetics , Social Behavior , Disease Models, Animal
20.
Genet Epidemiol ; 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982682

ABSTRACT

The prediction of the susceptibility of an individual to a certain disease is an important and timely research area. An established technique is to estimate the risk of an individual with the help of an integrated risk model, that is, a polygenic risk score with added epidemiological covariates. However, integrated risk models do not capture any time dependence, and may provide only a point estimate of the relative risk with respect to a reference population. The aim of this work is twofold. First, we explore and advocate the idea of predicting the time-dependent hazard and survival (defined as disease-free time) of an individual for the onset of a disease. This provides a practitioner with a much more differentiated view of absolute survival as a function of time. Second, to compute the time-dependent risk of an individual, we use published methodology to fit a Cox's proportional hazard model to data from a genetic SNP study of time to Alzheimer's disease (AD) onset, using the lasso to incorporate further epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status, 10 leading principal components, and selected genomic loci. We apply the lasso for Cox's proportional hazards to a data set of 6792 AD patients (composed of 4102 cases and 2690 controls) and 87 covariates. We demonstrate that fitting a lasso model for Cox's proportional hazards allows one to obtain more accurate survival curves than with state-of-the-art (likelihood-based) methods. Moreover, the methodology allows one to obtain personalized survival curves for a patient, thus giving a much more differentiated view of the expected progression of a disease than the view offered by integrated risk models. The runtime to compute personalized survival curves is under a minute for the entire data set of AD patients, thus enabling it to handle datasets with 60,000-100,000 subjects in less than 1 h.
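
How a fitted Cox proportional hazards model turns into a personalized survival curve can be sketched in a few lines of Python. The sketch uses the standard Cox relationship S(t | x) = S0(t)^exp(beta · x); the baseline curve, coefficients, and covariate values are hypothetical, and the lasso fitting step described in the abstract is not shown.

```python
import math

def personalized_survival(baseline_survival, beta, x):
    """Scale a baseline survival curve S0(t) to one individual under the
    Cox model: S(t | x) = S0(t) ** exp(beta . x)."""
    relative_risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return [s0 ** relative_risk for s0 in baseline_survival]

# Hypothetical inputs for illustration only.
baseline = [1.0, 0.9, 0.8, 0.6]   # S0 at four successive time points
beta = [0.5, -0.2]                # assumed (already fitted) coefficients

reference = personalized_survival(baseline, beta, [0.0, 0.0])
high_risk = personalized_survival(baseline, beta, [1.0, 0.0])
print(reference)
print(high_risk)
```

An individual whose covariates are all zero recovers the baseline curve, while a positive linear score raises the hazard and pulls the whole curve downward, which is the time-dependent view that a single relative-risk estimate cannot give.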
