ABSTRACT
A functional network of blood vessels is essential for organ growth and homeostasis, yet how the vasculature matures and maintains homeostasis remains elusive in live mice. By longitudinally tracking the same neonatal endothelial cells (ECs) over days to weeks, we found that capillary plexus expansion is driven by vessel regression to optimize network perfusion. Neonatal ECs rearrange positions to evenly distribute throughout the developing plexus and become positionally stable in adulthood. Upon local ablation, adult ECs survive through a plasmalemmal self-repair response, while neonatal ECs are predisposed to die. Furthermore, adult ECs reactivate migration to assist vessel repair. Global ablation reveals coordinated maintenance of the adult vascular architecture that allows for eventual network recovery. Lastly, neonatal remodeling and adult maintenance of the skin vascular plexus are orchestrated by temporally restricted, neonatal VEGFR2 signaling. Our work sheds light on fundamental mechanisms that underlie both vascular maturation and adult homeostasis in vivo.
Subjects
Endothelial Cells , Neovascularization, Physiologic , Animals , Mice , Endothelial Cells/physiology , Neovascularization, Physiologic/physiology , Skin , Cell Membrane
ABSTRACT
For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale.
Subjects
Neoplasms/pathology , Databases, Genetic , Genomics , Humans , Kaplan-Meier Estimate , Neoplasms/genetics , Neoplasms/mortality , Proportional Hazards Models
ABSTRACT
Tumor-infiltrating dendritic cells (DCs) assume varied functional states that impact anti-tumor immunity. To delineate the DC states associated with productive anti-tumor T cell immunity, we compared spontaneously regressing and progressing tumors. Tumor-reactive CD8+ T cell responses in Batf3-/- mice lacking type 1 DCs (DC1s) were lost in progressor tumors but preserved in regressor tumors. Transcriptional profiling of intra-tumoral DCs within regressor tumors revealed an activation state of CD11b+ conventional DCs (DC2s) characterized by expression of interferon (IFN)-stimulated genes (ISGs) (ISG+ DCs). ISG+ DCs activated CD8+ T cells ex vivo comparably to DC1s. Unlike cross-presenting DC1s, ISG+ DCs acquired and presented intact tumor-derived peptide-major histocompatibility complex class I (MHC class I) complexes. Constitutive type I IFN production by regressor tumors drove the ISG+ DC state, and activation of MHC class I-dressed ISG+ DCs by exogenous IFN-β rescued anti-tumor immunity against progressor tumors in Batf3-/- mice. The ISG+ DC gene signature is detectable in human tumors. Engaging this functional DC state may present an approach for the treatment of human disease.
Subjects
CD8-Positive T-Lymphocytes/immunology , Dendritic Cells/immunology , Histocompatibility Antigens Class I/immunology , Interferon Type I/immunology , Lymphocytes, Tumor-Infiltrating/immunology , Animals , Antigens, Neoplasm/immunology , CD11b Antigen/immunology , Cross-Priming , Dendritic Cells/drug effects , Interferon-beta/administration & dosage , Interferon-beta/pharmacology , Mice , Neoplasms/immunology , Receptors, Interferon/immunology , Signal Transduction/immunology , Tumor Microenvironment/immunology
ABSTRACT
The adenomatous polyposis coli (APC) tumor suppressor is mutated in the vast majority of human colorectal cancers (CRC) and leads to deregulated Wnt signaling. To determine whether Apc disruption is required for tumor maintenance, we developed a mouse model of CRC whereby Apc can be conditionally suppressed using a doxycycline-regulated shRNA. Apc suppression produces adenomas in both the small intestine and colon that, in the presence of Kras and p53 mutations, can progress to invasive carcinoma. In established tumors, Apc restoration drives rapid and widespread tumor-cell differentiation and sustained regression without relapse. Tumor regression is accompanied by the re-establishment of normal crypt-villus homeostasis, such that formerly aberrantly proliferating cells reacquire self-renewal and multi-lineage differentiation capability. Our study reveals that CRC cells can revert to functioning normal cells given appropriate signals and provides compelling in vivo validation of the Wnt pathway as a therapeutic target for treatment of CRC.
Subjects
Adenomatous Polyposis Coli Protein/metabolism , Colorectal Neoplasms/genetics , Disease Models, Animal , Intestine, Large/pathology , Intestine, Small/pathology , Adenomatous Polyposis Coli Protein/genetics , Animals , Cell Proliferation , Colorectal Neoplasms/pathology , Doxycycline/administration & dosage , Genes, p53 , Intestinal Polyps/metabolism , Intestinal Polyps/pathology , Intestine, Large/metabolism , Intestine, Small/metabolism , Mice , Mice, Transgenic , Proto-Oncogene Proteins p21(ras)/genetics , RNA Interference , Wnt Signaling Pathway
ABSTRACT
Accurate DNA replication is essential to preserve genomic integrity and prevent chromosomal instability-associated diseases including cancer. Key to this process is the ability of cells to stabilize and restart stalled replication forks. Here, we show that the EXD2 nuclease is essential to this process. EXD2 recruitment to stressed forks suppresses their degradation by restraining excessive fork regression. Accordingly, EXD2 deficiency leads to fork collapse, hypersensitivity to replication inhibitors, and genomic instability. Impeding fork regression by inactivation of SMARCAL1 or removal of RECQ1's inhibition in EXD2-/- cells restores efficient fork restart and genome stability. Moreover, purified EXD2 efficiently processes substrates mimicking regressed forks. Thus, this work identifies a mechanism underpinned by EXD2's nuclease activity, by which cells balance fork regression with fork restoration to maintain genome stability. Interestingly, from a clinical perspective, we discover that EXD2 depletion is synthetic lethal with mutations in BRCA1/2, implying a non-redundant role in replication fork protection.
Subjects
DNA Helicases/genetics , DNA Replication/genetics , Exodeoxyribonucleases/genetics , RecQ Helicases/genetics , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Genomic Instability/genetics , HeLa Cells , Humans , Neoplasms/genetics , Synthetic Lethal Mutations/genetics
ABSTRACT
Atherosclerosis results from lipid-driven inflammation of the arterial wall that fails to resolve. Imbalances in macrophage accumulation and function, including diminished migratory capacity and defective efferocytosis, fuel maladaptive inflammation and plaque progression. The neuroimmune guidance cue netrin-1 has dichotomous roles in inflammation partly due to its multiple receptors; in atherosclerosis, netrin-1 promotes macrophage survival and retention via its receptor Unc5b. To minimize the pleiotropic effects of targeting netrin-1, we tested the therapeutic potential of deleting Unc5b in mice with advanced atherosclerosis. We generated Unc5bfl/flCx3cr1creERT2/WT mice, which allowed conditional deletion of Unc5b (∆Unc5bMØ) in monocytes and macrophages by tamoxifen injection. After inducing advanced atherosclerosis by hepatic PCSK9 overexpression and western diet feeding for 20 wk, Unc5b was deleted and hypercholesterolemia was normalized to simulate clinical lipid management. Deletion of myeloid Unc5b led to a 40% decrease in atherosclerotic plaque burden and reduced plaque complexity compared to Unc5bfl/flCx3cr1WT/WT littermate controls (CtrlMØ). Consistently, plaque macrophage content was reduced by 50% in ∆Unc5bMØ mice due to reduced plaque Ly6Chi monocyte recruitment and macrophage retention. Compared to CtrlMØ mice, plaques in ∆Unc5bMØ mice had reduced necrotic area and fewer apoptotic cells, which correlated with improved efferocytotic capacity by Unc5b-deficient macrophages in vivo and in vitro. Beneficial changes in macrophage dynamics in the plaque upon Unc5b deletion were accompanied by an increase in atheroprotective T cell populations, including T-regulatory and Th2 cells. Our data identify Unc5b in advanced atherosclerosis as a therapeutic target to induce pro-resolving restructuring of the plaque immune cells and to promote atherosclerosis regression.
Subjects
Atherosclerosis , Macrophages , Netrin Receptors , Plaque, Atherosclerotic , Animals , Male , Mice , Atherosclerosis/immunology , Atherosclerosis/pathology , Atherosclerosis/metabolism , Inflammation/pathology , Inflammation/metabolism , Inflammation/immunology , Macrophages/immunology , Macrophages/metabolism , Mice, Knockout , Monocytes/immunology , Monocytes/metabolism , Netrin Receptors/metabolism , Netrin-1/metabolism , Netrin-1/genetics , Plaque, Atherosclerotic/pathology , Plaque, Atherosclerotic/metabolism , Receptors, Cell Surface/metabolism , Receptors, Cell Surface/genetics
ABSTRACT
During real-time language comprehension, our minds rapidly decode complex meanings from sequences of words. The difficulty of doing so is known to be related to words' contextual predictability, but what cognitive processes do these predictability effects reflect? In one view, predictability effects reflect facilitation due to anticipatory processing of words that are predictable from context. This view predicts a linear effect of predictability on processing demand. In another view, predictability effects reflect the costs of probabilistic inference over sentence interpretations. This view predicts either a logarithmic or a superlogarithmic effect of predictability on processing demand, depending on whether it assumes pressures toward a uniform distribution of information over time. The empirical record is currently mixed. Here, we revisit this question at scale: We analyze six reading datasets, estimate next-word probabilities with diverse statistical language models, and model reading times using recent advances in nonlinear regression. Results support a logarithmic effect of word predictability on processing difficulty, which favors probabilistic inference as a key component of human language processing.
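The competing hypotheses in this abstract differ only in how they map a word's contextual probability p onto processing cost. The sketch below writes out the three shapes being compared: linear in p, logarithmic (surprisal, -log2 p), and superlogarithmic. The exponent `k` for the superlogarithmic form and all function names are illustrative assumptions, not the authors' fitted models.

```python
import math

def linear_cost(p: float) -> float:
    """Anticipatory-facilitation view: cost falls linearly as predictability rises."""
    return 1.0 - p

def log_cost(p: float) -> float:
    """Probabilistic-inference view: cost equals surprisal, -log2 p (in bits)."""
    return -math.log2(p)

def superlog_cost(p: float, k: float = 1.5) -> float:
    """Hypothetical superlogarithmic form: surprisal raised to a power k > 1."""
    return log_cost(p) ** k

for p in (0.5, 0.25, 0.01):
    print(f"p={p:<5} linear={linear_cost(p):.3f} "
          f"log={log_cost(p):.3f} superlog={superlog_cost(p):.3f}")
```

Under the logarithmic view, halving a word's probability always adds one bit of cost, however unpredictable the word already is; under the linear view, the same halving adds less and less as p approaches 0. That difference in the low-probability tail is what large reading-time datasets can discriminate.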
Subjects
Comprehension , Language , Humans , Models, Statistical
ABSTRACT
As ambush-hunting predators that consume large prey after long intervals of fasting, Burmese pythons evolved with unique adaptations for modulating organ structure and function. Among these is cardiac hypertrophy that develops within three days following a meal (Andersen et al., 2005, Secor, 2008), which we previously showed was initiated by circulating growth factors (Riquelme et al., 2011). Postprandial cardiac hypertrophy in pythons also rapidly regresses with subsequent fasting (Secor, 2008); however, the molecular mechanisms that regulate the dynamic cardiac remodeling in pythons during digestion are largely unknown. In this study, we employed a multiomics approach coupled with targeted molecular analyses to examine remodeling of the python ventricular transcriptome and proteome throughout digestion. We found that forkhead box protein O1 (FoxO1) signaling was suppressed prior to hypertrophy development and then activated during regression, which coincided with decreased and then increased expression, respectively, of FoxO1 transcriptional targets involved in proteolysis. To define the molecular mechanistic role of FoxO1 in hypertrophy regression, we used cultured mammalian cardiomyocytes treated with postfed python plasma. Hypertrophy regression both in pythons and in vitro coincided with activation of FoxO1-dependent autophagy; however, the introduction of a FoxO1-specific inhibitor prevented both regression of cell size and autophagy activation. Finally, to determine whether FoxO1 activation could induce regression, we generated an adenovirus expressing a constitutively active FoxO1. FoxO1 activation was sufficient to prevent and reverse postfed plasma-induced hypertrophy, which was partially prevented by autophagy inhibition. Our results indicate that modulation of FoxO1 activity contributes to the dynamic ventricular remodeling in postprandial Burmese pythons.
Subjects
Boidae , Forkhead Box Protein O1 , Heart , Animals , Autophagy , Forkhead Box Protein O1/metabolism , Forkhead Box Protein O1/genetics , Myocytes, Cardiac/metabolism , Postprandial Period , Signal Transduction , Transcriptome , Heart/physiology
ABSTRACT
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods face several limitations, encompassing issues related to computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, 15 times faster computation, and half the memory of the current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
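ALL-Sum itself is distributed as an R package; the Python sketch below only illustrates the kind of objective the abstract names, an L0L2-penalized regression fitted from summary statistics, minimizing 0.5·βᵀRβ − βᵀr + λ2‖β‖² + λ0‖β‖0 by cyclic coordinate descent with a hard-threshold update. It assumes standardized genotypes, so the LD matrix `R` has a unit diagonal, and that `r` holds marginal GWAS effect estimates. The function name, fixed iteration count, and the omission of the ensemble step across tuning parameters are our simplifications.

```python
import numpy as np

def l0l2_summary_cd(R, r, lam0, lam2, n_iter=100):
    """Coordinate descent for
        0.5 * b'Rb - b'r + lam2 * ||b||_2^2 + lam0 * ||b||_0
    given an LD matrix R (unit diagonal assumed) and marginal effects r."""
    p = len(r)
    beta = np.zeros(p)
    a = 1.0 + 2.0 * lam2              # curvature of each one-SNP subproblem
    thresh = np.sqrt(2.0 * a * lam0)  # nonzero only if the gain beats the L0 cost
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: marginal effect minus LD-weighted effects of the other SNPs.
            b_j = r[j] - R[j] @ beta + R[j, j] * beta[j]
            beta[j] = b_j / a if abs(b_j) > thresh else 0.0
    return beta

# Toy usage: with no LD (identity R) the update reduces to hard thresholding of r.
R = np.eye(3)
r = np.array([0.5, 0.05, -0.3])
print(l0l2_summary_cd(R, r, lam0=0.01, lam2=0.0))
```

Each coordinate step is solved exactly: keeping SNP j lowers the smooth part by b_j²/(2a), so it is kept only when that exceeds λ0. The L2 term then shrinks the kept effects by the factor 1/a. An ensemble in the spirit of the abstract would repeat this over a grid of (λ0, λ2) and combine the resulting scores.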
Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Machine Learning , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide
ABSTRACT
Peripheral blood mononuclear cells (PBMCs) reflect the systemic immune response during cancer progression. However, a comprehensive understanding of the composition and function of PBMCs in cancer patients is lacking, and the potential of these features to assist cancer diagnosis is also unclear. Here, differences in PBMC composition and cell states between cancer patients and healthy donors were investigated by single-cell RNA sequencing (scRNA-seq), involving 262,025 PBMCs from 68 cancer samples and 14 healthy samples. We observed enhanced activation and differentiation of most immune subsets in cancer patients, along with a reduction of naïve T cells, expansion of macrophages, impairment of NK cells and myeloid cells, as well as tumor promotion and immunosuppression. Based on characteristics including differential cell type abundances and/or hub genes identified from weighted gene co-expression network analysis (WGCNA) modules of each major cell type, we applied logistic regression to construct cancer diagnosis models. Furthermore, we found that these models can distinguish cancer patients from healthy donors with high sensitivity. Our study provides new insights into using the features of PBMCs in non-invasive cancer diagnosis.
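As a rough illustration of the final modeling step, the sketch below fits a logistic regression classifier by plain gradient descent on a per-sample feature matrix (for example, standardized cell-type proportions) and reports sensitivity. The feature, its values, the learning rate, and the iteration count are all hypothetical; the authors' feature selection from differential abundance and WGCNA hub genes is not reproduced.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Gradient-descent logistic regression; X should hold standardized features."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # predicted probability of cancer
        w -= lr * Xb.T @ (p - y) / len(y)        # step along gradient of mean log-loss
    return w

def predict(w, X, cutoff=0.5):
    Xb = np.column_stack([np.ones(len(X)), X])
    return (1.0 / (1.0 + np.exp(-Xb @ w)) >= cutoff).astype(int)

def sensitivity(y_true, y_pred):
    """Fraction of true cancer cases (label 1) that the model flags."""
    return float((y_pred[y_true == 1] == 1).mean())

# Hypothetical standardized feature, e.g. naive T cell fraction (lower in patients).
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([1, 1, 1, 0, 0, 0])                # 1 = cancer patient, 0 = healthy donor
w = fit_logistic(X, y)
print("training sensitivity:", sensitivity(y, predict(w, X)))
```

Scoring on the training samples, as here, only checks that the pieces fit together; a diagnostic sensitivity claim would need held-out or cross-validated samples.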
Subjects
Leukocytes, Mononuclear , Neoplasms , Humans , Single-Cell Gene Expression Analysis , Neoplasms/diagnosis , Neoplasms/genetics , Cell Differentiation , Cell Transformation, Neoplastic
ABSTRACT
Since first publication of the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) variant classification guidelines, additional recommendations for application of certain criteria have been released (https://clinicalgenome.org/docs/) to improve their application in the diagnostic setting. However, none have addressed use of the PS4 and PP4 criteria, which capture patient presentation as evidence towards pathogenicity. PS4 can be applied through traditional case-control studies, or through "proband counting" within or across clinical testing cohorts. Review of the existing PS4 and PP4 specifications for Hereditary Cancer Gene Variant Curation Expert Panels revealed substantial differences in the approach to defining specifications. Using BRCA1, BRCA2 and TP53 as exemplar genes, we calibrated different methods proposed for applying the "PS4 proband counting" criterion. For each approach, we considered limitations, non-independence with other ACMG/AMP criteria, broader applicability, and variability in results for different datasets. Our findings highlight inherent overlap of proband-counting methods with ACMG/AMP frequency codes, and the importance of calibration to derive dataset-specific code weights that can account for potential between-dataset differences in ascertainment and other factors. Our work emphasizes the advantages and generalizability of logistic regression analysis over simple proband-counting approaches to empirically determine the relative predictive capacity and weight of various personal clinical features in the context of multigene panel testing, for improved variant interpretation. We also provide a general protocol, including instructions for data formatting and a web server for analysis of personal history parameters, to facilitate the dataset-specific calibration analyses required to use such data for germline variant classification.
Subjects
Genetic Variation , Neoplasms , Humans , Genetic Variation/genetics , Genetic Testing/methods , Genome, Human , Phenotype , Genes, Neoplasm , Neoplasms/genetics
ABSTRACT
The existing framework of Mendelian randomization (MR) infers the causal effect of one or multiple exposures on one single outcome. It is not designed to jointly model multiple outcomes, as would be necessary to detect causes of more than one outcome and would be relevant to model multimorbidity or other related disease outcomes. Here, we introduce multi-response Mendelian randomization (MR2), an MR method specifically designed for multiple outcomes to identify exposures that cause more than one outcome or, conversely, exposures that exert their effect on distinct responses. MR2 uses a sparse Bayesian Gaussian copula regression framework to detect causal effects while estimating the residual correlation between summary-level outcomes, i.e., the correlation that cannot be explained by the exposures, and vice versa. We show both theoretically and in a comprehensive simulation study how unmeasured shared pleiotropy induces residual correlation between outcomes irrespective of sample overlap. We also reveal how non-genetic factors that affect more than one outcome contribute to their correlation. We demonstrate that by accounting for residual correlation, MR2 has higher power to detect shared exposures causing more than one outcome. It also provides more accurate causal effect estimates than existing methods that ignore the dependence between related responses. Finally, we illustrate how MR2 detects shared and distinct causal exposures for five cardiovascular diseases in two applications considering cardiometabolic and lipidomic exposures and uncovers residual correlation between summary-level outcomes reflecting known relationships between cardiovascular diseases.
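MR2 is a sparse Bayesian Gaussian copula model; the NumPy sketch below swaps in ordinary, unweighted multivariable least squares purely to show what "residual correlation between summary-level outcomes" means. Each outcome's SNP associations are regressed on the exposures' SNP associations, and the correlation of the leftover residuals across outcomes is the part the exposures cannot explain. The data are simulated, the function name is hypothetical, and a real analysis would weight SNPs by their standard errors.

```python
import numpy as np

def mr_residual_correlation(bx, by):
    """bx: (n_snps, n_exposures) SNP-exposure effects.
    by: (n_snps, n_outcomes) SNP-outcome effects.
    Returns least-squares causal effects (n_exposures, n_outcomes)
    and the residual correlation matrix between outcomes."""
    theta, *_ = np.linalg.lstsq(bx, by, rcond=None)  # one no-intercept fit per outcome
    resid = by - bx @ theta
    return theta, np.corrcoef(resid, rowvar=False)

rng = np.random.default_rng(0)
n_snps = 500
bx = rng.normal(size=(n_snps, 2))                 # two simulated exposures
theta_true = np.array([[0.3, 0.0],                # exposure 1 -> outcome 1 only
                       [0.2, 0.4]])               # exposure 2 -> both outcomes
shared = rng.normal(scale=0.05, size=n_snps)      # unmeasured pleiotropy hitting both outcomes
noise = rng.normal(scale=0.05, size=(n_snps, 2))
by = bx @ theta_true + shared[:, None] + noise
theta_hat, rho = mr_residual_correlation(bx, by)
print(np.round(theta_hat, 3))
print("residual correlation:", round(rho[0, 1], 2))
```

In this simulation the shared pleiotropic term has the same variance as each outcome's independent noise, so the residual correlation lands near 0.5 even though there is no sample overlap. This mirrors the abstract's point that unmeasured shared pleiotropy alone induces residual correlation.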
Subjects
Cardiovascular Diseases , Humans , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics , Bayes Theorem , Multimorbidity , Mendelian Randomization Analysis/methods , Causality , Genome-Wide Association Study
ABSTRACT
Caudal developmental defects, including caudal regression, caudal dysgenesis and sirenomelia, are devastating conditions affecting the skeletal, nervous, digestive, reproductive and excretory systems. Defects in mesodermal migration and blood supply to the caudal region have been identified as possible causes of caudal developmental defects, but neither satisfactorily explains the structural malformations in all three germ layers. Here, we describe caudal developmental defects in transmembrane protein 132a (Tmem132a) mutant mice, including skeletal, posterior neural tube closure, genitourinary tract and hindgut defects. We show that, in Tmem132a mutant embryos, visceral endoderm fails to be excluded from the medial region of early hindgut, leading directly to the loss or malformation of cloaca-derived genitourinary and gastrointestinal structures, and indirectly to the neural tube and kidney/ureter defects. We find that TMEM132A mediates intercellular interaction, and physically interacts with planar cell polarity (PCP) regulators CELSR1 and FZD6. Genetically, Tmem132a regulates neural tube closure synergistically with another PCP regulator Vangl2. In summary, we have identified Tmem132a as a new regulator of PCP, and hindgut malformation as the underlying cause of developmental defects in multiple caudal structures.
Subjects
Neural Tube Defects , Mice , Animals , Neural Tube Defects/metabolism , Neural Tube/metabolism , Neurulation , Germ Layers/metabolism , Cell Polarity/physiology , Membrane Proteins/genetics , Membrane Proteins/metabolism
ABSTRACT
Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregate to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem, since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression, and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overemphasizing dynamic characteristics at the expense of topological structure, as traditional methods often do, LogicGep first evolves a set of promising Boolean formulas for each target gene and then employs a feed-forward neural network trained with continuous expression data to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets, including both synthetic and real-world experimental data. The results show that LogicGep infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, LogicGep runs hundreds of times faster than other methods, especially for large network inference.
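LogicGep evolves candidate Boolean formulas with gene expression programming and then selects among them with a neural network; neither step is reproduced here. The sketch below shows only the basic ingredient such a search needs: scoring a candidate update rule for a target gene against a binarized time series, where the target's state at time t+1 should equal the rule applied to its regulators at time t. The gene names, time series, and candidate rules are made up.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]

def rule_agreement(series: List[State], target: str,
                   rule: Callable[[State], bool]) -> float:
    """Fraction of transitions t -> t+1 where rule(state_t) matches the target at t+1."""
    transitions = list(zip(series[:-1], series[1:]))
    hits = sum(rule(prev) == nxt[target] for prev, nxt in transitions)
    return hits / len(transitions)

# Hypothetical binarized time series for three genes A, B, C.
series = [
    {"A": True,  "B": False, "C": False},
    {"A": True,  "B": True,  "C": False},
    {"A": False, "B": True,  "C": True},
    {"A": False, "B": False, "C": False},
    {"A": True,  "B": False, "C": False},
]
candidates = {
    "C = A AND B":     lambda s: s["A"] and s["B"],
    "C = A OR B":      lambda s: s["A"] or s["B"],
    "C = B AND NOT A": lambda s: s["B"] and not s["A"],
}
scores = {name: rule_agreement(series, "C", f) for name, f in candidates.items()}
for name, score in scores.items():
    print(f"{name:<16} {score:.2f}")
```

In this toy series only `A AND B` reproduces every transition of C. Agreement on binarized data is exactly where discretization loses information, which is why the abstract describes a second stage that uses continuous expression values to choose among formulas.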
Subjects
Algorithms , Gene Expression Profiling , Gene Regulatory Networks , Gene Expression Profiling/methods , Humans , Transcriptome , Software , Computational Biology/methods , Neural Networks, Computer
ABSTRACT
In precision medicine, both predicting an individual's disease susceptibility and forecasting their disease-free survival are key areas of research. Besides the classical epidemiological predictor variables, data from multiple (omic) platforms are increasingly available. To integrate this wealth of information, we propose new methodology that combines cooperative learning, a recent approach to leverage the predictive power of several datasets, with polygenic hazard score models. Polygenic hazard score models provide a practitioner with a more differentiated view of the predicted disease-free survival than a mere point estimate, for instance one computed with a polygenic risk score. Our aim is to leverage the advantages of cooperative learning for the computation of polygenic hazard score models via Cox's proportional hazards model, thereby improving the prediction of disease-free survival. In our experimental study, we apply our methodology to forecast disease-free survival for Alzheimer's disease (AD) using three layers of data. One layer contains epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status and 10 leading principal components. Another layer contains selected genomic loci, and the last layer contains methylation data for selected CpG sites. We demonstrate that the survival curves computed via cooperative learning yield an AUC of around 0.7, above the state-of-the-art performance of its competitors. Importantly, the proposed methodology returns (1) a linear score that can be easily interpreted (in contrast to machine learning approaches), and (2) a weighting of the predictive power of the involved data layers, allowing for an assessment of the importance of each omic (or other) platform. Similar to polygenic hazard score models, our methodology also allows one to compute individual survival curves for each patient.
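The individual survival curves mentioned at the end follow from the proportional hazards assumption. With baseline survival S0(t) and a linear score η = xᵀβ, an individual's curve is S(t | x) = S0(t)^exp(η). The sketch below applies that formula. The baseline curve, coefficients, and covariates are invented numbers, and it does not show how the cooperative-learning fit produces β from the separate data layers.

```python
import numpy as np

def individual_survival(baseline_surv, x, beta):
    """Cox model: S(t|x) = S0(t) ** exp(x . beta), on the baseline's time grid."""
    eta = float(np.dot(x, beta))  # the interpretable linear score
    return np.asarray(baseline_surv) ** np.exp(eta)

# Hypothetical baseline disease-free survival at five follow-up times.
s0 = np.array([1.0, 0.95, 0.90, 0.80, 0.70])
beta = np.array([np.log(2.0), 0.0])  # first covariate doubles the hazard

print(individual_survival(s0, [0.0, 1.0], beta))  # score 0: identical to the baseline
print(individual_survival(s0, [1.0, 0.0], beta))  # doubled hazard: baseline squared (up to rounding)
```

Because the whole curve is a power of the baseline, a single linear score both ranks patients and yields a full survival curve for each of them. This is the interpretability advantage the abstract attributes to the linear score.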
Subjects
Alzheimer Disease , Precision Medicine , Humans , Precision Medicine/methods , Alzheimer Disease/genetics , Alzheimer Disease/mortality , Disease-Free Survival , Machine Learning , Proportional Hazards Models , Multifactorial Inheritance , Male , Female , Multiomics
ABSTRACT
The personality trait neuroticism is tightly linked to mental health, and neurotic people experience stronger negative emotions in everyday life. But do their negative emotions also show greater fluctuation? This commonsensical notion was recently questioned by [Kalokerinos et al. Proc Natl Acad Sci USA 117, 15838-15843 (2020)], who suggested that the associations found in previous studies were spurious. Less neurotic people often report very low levels of negative emotion, which is usually measured with bounded rating scales. Therefore, they often pick the lowest possible response option, which severely constrains the amount of emotional variability that can be observed in principle. Applying a multistep statistical procedure that is supposed to correct for this dependency, [Kalokerinos et al. Proc Natl Acad Sci USA 117, 15838-15843 (2020)] no longer found an association between neuroticism and emotional variability. However, like other common approaches for controlling for undesirable effects due to bounded scales, this method is opaque with respect to the assumed mechanism of data generation and might not result in a successful correction. We thus suggest an alternative approach that a) takes into account that emotional states outside of the scale bounds can occur and b) models associations between neuroticism and both the mean and variability of emotion in a single step with the help of Bayesian censored location-scale models. Simulations supported this model over alternative approaches. We analyzed 13 longitudinal datasets (2,518 individuals and 11,170 measurements in total) and found clear evidence that more neurotic people experience greater variability in negative emotion.
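The core of a censored location-scale model is a likelihood in which each person has their own mean μ and standard deviation σ of a latent negative-emotion state, both of which may depend on neuroticism. Ratings at the scale floor or ceiling are treated as censored ("at or beyond the bound") rather than exact. The sketch below writes that per-observation log-likelihood for a normal latent variable using only the standard library. The linear links for μ and log σ, the parameter names and values, and the 1 to 5 scale are assumptions, and the Bayesian priors and sampling the authors used are omitted.

```python
import math

def norm_logpdf(z: float) -> float:
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def norm_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def norm_sf(z: float) -> float:
    """Upper tail 1 - Phi(z), computed via erfc for numerical stability."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def censored_loglik(y, mu, sigma, lower=1.0, upper=5.0):
    """Log-likelihood of one rating y under a latent N(mu, sigma^2) censored to [lower, upper]."""
    if y <= lower:   # floor response: the true state may lie anywhere below the bound
        return math.log(norm_cdf((lower - mu) / sigma))
    if y >= upper:   # ceiling response: the true state may lie anywhere above the bound
        return math.log(norm_sf((upper - mu) / sigma))
    return norm_logpdf((y - mu) / sigma) - math.log(sigma)

def person_params(neuroticism, b0=1.5, b1=0.5, g0=-0.5, g1=0.3):
    """Hypothetical links: mean and log-SD of negative emotion both rise with neuroticism."""
    return b0 + b1 * neuroticism, math.exp(g0 + g1 * neuroticism)

# Usage: total log-likelihood of one (hypothetical) person's repeated ratings.
ratings = [1.0, 1.0, 2.0, 3.5]
mu, sigma = person_params(neuroticism=1.0)
print(sum(censored_loglik(y, mu, sigma) for y in ratings))
```

Because floor ratings contribute a probability mass rather than a density, a low-neuroticism person who keeps choosing the lowest option is no longer forced to look low in variability. Their latent σ is estimated jointly with μ, which is the single-step correction the abstract argues for.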
Subjects
Emotions , Mental Health , Humans , Neuroticism/physiology , Bayes Theorem , Emotions/physiology
ABSTRACT
The recent increase in openly available ancient human DNA samples allows for large-scale meta-analysis applications. Trans-generational past human mobility is one of the key aspects that ancient genomics can contribute to, since changes in genetic ancestry, unlike cultural changes seen in the archaeological record, necessarily reflect movements of people. Here, we present an algorithm for spatiotemporal mapping of genetic profiles, which allows for direct estimates of past human mobility from large ancient genomic datasets. The key idea of the method is to derive a spatial probability surface of genetic similarity for each individual in its respective past. This is achieved by first creating an interpolated ancestry field through space and time based on multivariate statistics and Gaussian process regression, and then using this field to map the ancient individuals into space according to their genetic profile. We apply this algorithm to a dataset of 3,138 aDNA samples with genome-wide data from Western Eurasia in the last 10,000 y. Finally, we condense this sample-wise record with a simple summary statistic into a diachronic measure of mobility for subregions in Western, Central, and Southern Europe. For regions and periods with sufficient data coverage, our similarity surfaces and mobility estimates show general concordance with previous results and provide a meta-perspective of genetic changes and human mobility.
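Gaussian process regression, the interpolation step named above, predicts a value at a new point as a kernel-weighted combination of observed values. The NumPy sketch below computes the GP posterior mean for a single ancestry component over (longitude, latitude, time) using a squared-exponential kernel with separate length scales for space and time. The coordinates, length scales, and noise level are invented, and the multivariate step that derives ancestry components from genotypes beforehand is not shown.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scales):
    """Squared-exponential kernel with one length scale per input dimension."""
    diff = (A[:, None, :] - B[None, :, :]) / length_scales
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def gp_posterior_mean(X_train, y_train, X_new, length_scales, noise=1e-2):
    """Posterior mean of a zero-mean GP fitted to mean-centered training targets."""
    mean_y = y_train.mean()
    K = sq_exp_kernel(X_train, X_train, length_scales) + noise ** 2 * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train - mean_y)
    return mean_y + sq_exp_kernel(X_new, X_train, length_scales) @ alpha

# Hypothetical samples: (longitude, latitude, years BP) -> value of one ancestry component.
X = np.array([[10.0, 50.0, 5000.0], [12.0, 48.0, 4500.0], [20.0, 45.0, 3000.0]])
y = np.array([0.2, 0.3, 0.8])
ls = np.array([5.0, 5.0, 1000.0])  # degrees for space, years for time
pred = gp_posterior_mean(X, y, X, ls, noise=1e-3)
print(np.round(pred, 3))
```

Far from any sample, the kernel weights vanish and the prediction falls back to the overall mean. This is one reason the abstract restricts its conclusions to regions and periods with sufficient data coverage.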
Subjects
DNA, Ancient , Genomics , Humans , History, Ancient , DNA, Ancient/analysis , Europe
ABSTRACT
Recent advances in high-resolution imaging techniques and particle-based simulation methods have enabled the precise microscopic characterization of collective dynamics in various biological and engineered active matter systems. In parallel, data-driven algorithms for learning interpretable continuum models have shown promising potential for the recovery of underlying partial differential equations (PDEs) from continuum simulation data. By contrast, learning macroscopic hydrodynamic equations for active matter directly from experiments or particle simulations remains a major challenge, especially when continuum models are not known a priori or analytic coarse graining fails, as is often the case for nondilute and heterogeneous systems. Here, we present a framework that leverages spectral basis representations and sparse regression algorithms to discover PDE models from microscopic simulation and experimental data, while incorporating the relevant physical symmetries. We illustrate the practical potential through a range of applications, from a chiral active particle model mimicking nonidentical swimming cells to recent microroller experiments and schooling fish. In all these cases, our scheme learns hydrodynamic equations that reproduce the self-organized collective dynamics observed in the simulations and experiments. This inference framework makes it possible to measure a large number of hydrodynamic parameters in parallel and directly from video data.
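The sparse regression step can be illustrated with sequentially thresholded least squares (STLSQ), a common choice in equation-discovery work. The method regresses the time derivative onto a library of candidate terms, zeroes out small coefficients, and refits on the survivors. The sketch below applies that generic algorithm to a synthetic library. It is not the authors' spectral-basis framework, and the threshold value and the labels attached to library columns are assumptions.

```python
import numpy as np

def stlsq(Theta, dudt, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares: sparse xi with dudt ≈ Theta @ xi."""
    xi, *_ = np.linalg.lstsq(Theta, dudt, rcond=None)
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune negligible terms
        active = ~small
        if not active.any():
            break
        xi[active], *_ = np.linalg.lstsq(Theta[:, active], dudt, rcond=None)  # refit survivors
    return xi

# Synthetic check: columns stand in for candidate terms such as u, u_x, u_xx, u*u_x.
rng = np.random.default_rng(1)
Theta = rng.normal(size=(200, 4))
xi_true = np.array([0.0, -1.0, 0.1, 0.0])   # e.g. u_t = -u_x + 0.1 u_xx
dudt = Theta @ xi_true + rng.normal(scale=1e-3, size=200)
xi_hat = stlsq(Theta, dudt)
print(np.round(xi_hat, 4))
```

With real particle or video data, the hard parts are building a library whose columns respect the system's symmetries and estimating derivatives robustly from noisy fields. The spectral representations in the abstract target that problem, while the regression itself can stay this simple.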
ABSTRACT
Big data and large-scale machine learning have had a profound impact on science and engineering, particularly in fields focused on forecasting and prediction. Yet, it is still not clear how we can use the superior pattern-matching abilities of machine learning models for scientific discovery. This is because the goals of machine learning and science are generally not aligned. In addition to being accurate, scientific theories must also be causally consistent with the underlying physical process and allow for human analysis, reasoning, and manipulation to advance the field. In this paper, we present a case study on discovering a symbolic model for oceanic rogue waves from data using causal analysis, deep learning, parsimony-guided model selection, and symbolic regression. We train an artificial neural network on causal features from an extensive dataset of observations from wave buoys, while selecting for predictive performance and causal invariance. We apply symbolic regression to distill this black-box model into a mathematical equation that retains the neural network's predictive capabilities, while allowing for interpretation in the context of existing wave theory. The resulting model reproduces known behavior, generates well-calibrated probabilities, and achieves better predictive scores on unseen data than current theory. This showcases how machine learning can facilitate inductive scientific discovery and paves the way for more accurate rogue wave forecasting.
ABSTRACT
Leveraging a scientific infrastructure for exploring how students learn, we have developed cognitive and statistical models of skill acquisition and used them to understand fundamental similarities and differences across learners. Our primary question was: why do some students learn faster than others? Or do they? We model data from student performance on groups of tasks that assess the same skill component and that provide follow-up instruction on student errors. Our models estimate, for both students and skills, initial correctness and learning rate, that is, the increase in correctness after each practice opportunity. We applied our models to 1.3 million observations across 27 datasets of student interactions with online practice systems in the context of elementary to college courses in math, science, and language. Despite the availability of up-front verbal instruction, like lectures and readings, students demonstrate modest initial prepractice performance, at about 65% accuracy. Despite being in the same course, students' initial performance varies substantially, from about 55% correct for those in the lower half to 75% for those in the upper half. In contrast, and much to our surprise, we found students to be astonishingly similar in estimated learning rate, typically increasing by about 0.1 log odds or 2.5% in accuracy per opportunity. These findings pose a challenge for theories of learning to explain the odd combination of large variation in student initial performance and striking regularity in student learning rate.
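The two learning-rate figures quoted above are consistent with each other. In a logistic model, a gain of Δ log odds changes accuracy by roughly p(1 − p)Δ, which is 0.025 at p = 0.5 and slightly less (about 0.023) at the reported 65% starting accuracy. The sketch below traces accuracy over practice opportunities under a constant 0.1 log-odds gain per opportunity. Treating the gain as constant across opportunities follows the description above, and the function names are ours.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def accuracy_after(initial_acc: float, opportunities: int, rate: float = 0.1) -> float:
    """Accuracy after n practice opportunities when each adds `rate` log odds."""
    return inv_logit(logit(initial_acc) + rate * opportunities)

# Compare a lower-half (55%) and an upper-half (75%) starting point at the same rate.
for start in (0.55, 0.65, 0.75):
    trajectory = [round(accuracy_after(start, n), 3) for n in (0, 1, 5, 10)]
    print(start, trajectory)
```

With identical rates, the 20-point gap between lower-half and upper-half students in initial accuracy narrows only slowly in probability terms. This matches the abstract's picture of large initial differences combined with a shared learning rate.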