ABSTRACT
A central principle in neuroscience is that neurons within the brain act in concert to produce perception, cognition, and adaptive behavior. Neurons are organized into specialized brain areas, dedicated to different functions to varying extents, and their function relies on distributed circuits to continuously encode relevant environmental and body-state features, enabling other areas to decode (interpret) these representations for computing meaningful decisions and executing precise movements. Thus, the distributed brain can be thought of as a series of computations that act to encode and decode information. In this perspective, we detail important concepts of neural encoding and decoding and highlight the mathematical tools used to measure them, including deep learning methods. We provide case studies where decoding concepts enable foundational and translational science in motor, visual, and language processing.
Subjects
Brain , Models, Neurological , Neurons , Brain/physiology , Humans , Neurons/physiology , Animals
ABSTRACT
The construction of a predictive model of an entire eukaryotic cell that describes its dynamic structure from atomic to cellular scales is a grand challenge at the intersection of biology, chemistry, physics, and computer science. Having such a model will open new dimensions in biological research and accelerate healthcare advancements. Developing the necessary experimental and modeling methods presents abundant opportunities for a community effort to realize this goal. Here, we present a vision for the creation of a spatiotemporal multi-scale model of the pancreatic β-cell, a relevant target for understanding and modulating the pathogenesis of diabetes.
Subjects
Insulin-Secreting Cells/metabolism , Models, Biological , Computational Biology , Drug Discovery , Humans , Insulin-Secreting Cells/cytology , Proteins/chemistry , Proteins/metabolism
ABSTRACT
Recent developments in synthetic biology, next-generation sequencing, and machine learning provide an unprecedented opportunity to rationally design new disease treatments that reprogram cells, based on measured responses to gene perturbations and drugs. The main challenges to seizing this opportunity are the incomplete knowledge of the cellular network and the combinatorial explosion of possible interventions, neither of which can be overcome by experiments alone. To address these challenges, we develop a transfer learning approach to control cell behavior that is pre-trained on transcriptomic data associated with human cell fates, thereby generating a model of the network dynamics that can be transferred to specific reprogramming goals. The approach combines transcriptional responses to gene perturbations to minimize the difference between a given pair of initial and target transcriptional states. We demonstrate the approach's versatility by applying it to a microarray dataset comprising >9,000 microarrays across 54 cell types and 227 unique perturbations, and to an RNA-seq dataset of >10,000 sequencing runs across 36 cell types and 138 perturbations. Our approach reproduces known reprogramming protocols with an AUROC of 0.91 and, unlike existing methods, pre-trains an adaptable model that can be tailored to specific reprogramming transitions. We show that the number of gene perturbations required to steer from one fate to another increases as developmental relatedness decreases, and that fewer genes are needed to progress along developmental paths than to regress. These findings establish a proof of concept for computationally designing control strategies and provide insights into how gene regulatory networks govern phenotype.
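The idea of combining perturbation responses to move an initial transcriptional state toward a target state can be illustrated with a deliberately simplified sketch. It is not the authors' transfer learning model. It assumes that each perturbation shifts expression additively by a known response vector, uses Euclidean distance as the mismatch, and searches greedily. The gene names and response values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def apply_response(state, delta):
    """Add a perturbation's expression change to the state (additivity is an assumption)."""
    return [s + d for s, d in zip(state, delta)]


def greedy_perturbation_set(initial, target, responses, max_steps=3):
    """Greedily pick perturbations that bring the state closest to the target."""
    state = list(initial)
    chosen = []
    for _ in range(max_steps):
        best_name, best_dist = None, euclidean(state, target)
        for name, delta in responses.items():
            if name in chosen:
                continue
            dist = euclidean(apply_response(state, delta), target)
            if dist < best_dist:  # accept only strict improvements
                best_name, best_dist = name, dist
        if best_name is None:  # no remaining perturbation helps, so stop early
            break
        chosen.append(best_name)
        state = apply_response(state, responses[best_name])
    return chosen, state


if __name__ == "__main__":
    # Hypothetical three-gene example: two overexpressions (OE) and one knockdown (KD).
    responses = {
        "geneA_OE": [1.0, 0.0, 0.0],
        "geneB_OE": [0.0, 1.0, 0.0],
        "geneC_KD": [0.0, 0.0, -1.0],
    }
    print(greedy_perturbation_set([0.0, 0.0, 0.0], [1.0, 1.0, 0.0], responses))
    # prints (['geneA_OE', 'geneB_OE'], [1.0, 1.0, 0.0])
```

A greedy search keeps the sketch short. Because of the combinatorial explosion the abstract mentions, exhaustive search over perturbation sets quickly becomes infeasible.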
Subjects
Cellular Reprogramming , Gene Regulatory Networks , Humans , Cellular Reprogramming/genetics , Cell Differentiation , Behavior Control , Machine Learning
ABSTRACT
Predicting the temporal and spatial patterns of South Asian monsoon rainfall within a season is of critical importance due to its impact on agriculture, water availability, and flooding. The monsoon intraseasonal oscillation (MISO) is a robust northward-propagating mode that determines the active and break phases of the monsoon and much of the regional distribution of rainfall. However, dynamical atmospheric forecast models predict this mode poorly. Data-driven methods for MISO prediction have shown more skill, but only predict the portion of the rainfall corresponding to MISO rather than the full rainfall signal. Here, we combine state-of-the-art ensemble precipitation forecasts from a high-resolution atmospheric model with data-driven forecasts of MISO. The ensemble members of the detailed atmospheric model are projected onto a lower-dimensional subspace corresponding to the MISO dynamics and are then weighted according to their distance from the data-driven MISO forecast in this subspace. We thereby achieve improvements in rainfall forecasts over India, as well as the broader monsoon region, at 10- to 30-d lead times, an interval that is generally considered to be a predictability gap. The temporal correlation of rainfall forecasts is improved by up to 0.28 in this time range. Our results demonstrate the potential of leveraging the predictability of intraseasonal oscillations to improve extended-range forecasts; more generally, they point toward a future of combining dynamical and data-driven forecasts for Earth system prediction.
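The member-weighting step can be sketched as follows. The sketch assumes an orthonormal basis for the MISO subspace and a Gaussian kernel of hypothetical width `sigma` that turns subspace distances into weights. The kernel choice is an illustrative assumption and not necessarily the weighting used in the study.

```python
import math


def project(field, basis):
    """Coordinates of a flattened field on each basis vector (basis assumed orthonormal)."""
    return [sum(f * b for f, b in zip(field, vec)) for vec in basis]


def weight_members(members, basis, miso_forecast, sigma=1.0):
    """Gaussian weights from each member's distance to the data-driven forecast in the subspace."""
    raw = []
    for member in members:
        coords = project(member, basis)
        d2 = sum((c - m) ** 2 for c, m in zip(coords, miso_forecast))
        raw.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(raw)
    if total == 0.0:  # every member is extremely far away, so fall back to equal weights
        return [1.0 / len(members)] * len(members)
    return [w / total for w in raw]


def weighted_forecast(members, weights):
    """Weighted ensemble mean of the full (unprojected) rainfall fields."""
    n = len(members[0])
    return [sum(w * m[i] for w, m in zip(weights, members)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical two-member ensemble on a two-point grid with a one-mode subspace.
    members = [[1.0, 0.0], [0.0, 1.0]]
    weights = weight_members(members, basis=[[1.0, 0.0]], miso_forecast=[1.0])
    print(weights, weighted_forecast(members, weights))
```

The weights are computed in the low-dimensional subspace, but they are applied to the full fields. This matches the abstract's point that the combined forecast covers the full rainfall signal rather than only its MISO component.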
ABSTRACT
The ability to concisely describe the dynamical behavior of soft materials through closed-form constitutive relations holds the key to accelerated and informed design of materials and processes. The conventional approach is to construct constitutive relations by making simplifying assumptions and approximating the time- and rate-dependent stress response of a complex fluid to an imposed deformation. While traditional frameworks have been foundational to our current understanding of soft materials, they often face a twofold limitation: (i) because they are built on idealized, generalized assumptions, precise recovery of material-specific details is usually a matter of chance, if it is possible at all, and (ii) the biases inherent in those assumptions commonly come at the cost of new physical insight. This work introduces an approach that leverages recent advances in scientific machine learning to discover the governing constitutive equation of a complex fluid from experimental data. Our rheology-informed neural network framework is able to learn the hidden rheology of a complex fluid from a limited number of experiments. This is followed by the construction of an unbiased, material-specific constitutive relation that accurately describes a wide range of the material's bulk dynamical behavior. Besides being highly efficient at closed-form model discovery for a real-world complex system, the model also provides insight into the physics underpinning the material.
ABSTRACT
Fluorescence microscopy is essential for studying living cells, tissues and organisms. However, the light used to switch on fluorescent molecules also harms the samples, jeopardizing the validity of results, particularly in techniques such as super-resolution microscopy, which demands extended illumination. Artificial intelligence (AI)-enabled software capable of denoising, image restoration, temporal interpolation or cross-modal style transfer has great potential to rescue live imaging data and limit photodamage. Yet we believe the focus should be on keeping light-induced damage at levels that preserve natural cell behaviour. In this Opinion piece, we argue that the role of AI needs to shift: AI should be used to extract rich insights from gentle imaging rather than to recover compromised data from harsh illumination. Although AI can enhance imaging, our ultimate goal should be to uncover biological truths, not just to retrieve data. It is essential to prioritize minimizing photodamage over merely pushing technical limits. Our approach is aimed at the gentle acquisition and observation of undisturbed living systems, in keeping with the essence of live-cell fluorescence microscopy.
Subjects
Artificial Intelligence , Software , Microscopy, Fluorescence
ABSTRACT
Advances in imaging, segmentation and tracking have led to the routine generation of large and complex microscopy datasets. New tools are required to process this 'phenomics' type of data. Here, we present 'Cell PLasticity Analysis Tool' (cellPLATO), a Python-based analysis software designed for measuring and classifying cell behaviours by clustering features of cell morphology and motility. Used after segmentation and tracking, the tool extracts features from each cell at each timepoint and uses them to segregate cells into behavioural subtypes in a dimensionally reduced space. The resulting cell tracks carry a 'behavioural ID' at each timepoint, and similarity analysis allows behavioural sequences to be grouped into discrete trajectories with assigned IDs. We use cellPLATO to investigate the role of IL-15 in modulating human natural killer (NK) cell migration on ICAM-1 or VCAM-1. We find eight behavioural subsets of NK cells based on their shape and migration dynamics between single timepoints, and four trajectories based on sequences of these behaviours over time. Using cellPLATO, we thus show that IL-15 increases plasticity between cell migration behaviours and that different integrin ligands induce different forms of NK cell migration.
Subjects
Cell Movement , Interleukin-15 , Killer Cells, Natural , Humans , Killer Cells, Natural/cytology , Killer Cells, Natural/metabolism , Killer Cells, Natural/immunology , Interleukin-15/metabolism , Software , Intercellular Adhesion Molecule-1/metabolism , Vascular Cell Adhesion Molecule-1/metabolism
ABSTRACT
Across many scientific disciplines, the development of computational models and algorithms for generating artificial or synthetic data is gaining momentum. In biology, there is a great opportunity to explore this further, as ever larger multi-omics datasets are being generated. In this opinion, we discuss the latest trends in biological applications from both process-driven and data-driven perspectives. Looking ahead, we believe these methodologies can help shape novel cellular inferences at the multi-omics scale.
Subjects
Algorithms , Computational Biology , Computational Biology/methods , Genomics/methods , Humans , Big Data , Proteomics/methods , Multiomics
ABSTRACT
Rescoring peptide spectrum matches from database search engines with the help of peptide property predictors now outperforms peptide identification by traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring generates scores by comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These new scores discriminate more efficiently between correct and incorrect peptide spectrum matches. This approach has been shown to substantially increase the number of confidently identified peptides, facilitating the analysis of challenging datasets in fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key developments that led to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight the limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
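The comparison of observed and predicted properties can be made concrete with a minimal sketch of two rescoring features. The first is the normalized spectral contrast angle between observed and predicted fragment ion intensities, one commonly used similarity score. The second is the absolute retention time error. The sketch assumes the intensity vectors are already matched ion by ion and that predicted retention times are calibrated to the observed scale. The function names are hypothetical.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral contrast angle in [0, 1]; 1 means identical relative intensities."""
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0
    cos = max(-1.0, min(1.0, dot / (norm_o * norm_p)))  # clamp rounding error before acos
    return 1.0 - 2.0 * math.acos(cos) / math.pi


def rescoring_features(obs_intensities, pred_intensities, obs_rt, pred_rt):
    """Features for one peptide spectrum match (predicted RT assumed already calibrated)."""
    return {
        "spectral_angle": spectral_angle(obs_intensities, pred_intensities),
        "abs_rt_error": abs(obs_rt - pred_rt),
    }


if __name__ == "__main__":
    # Hypothetical match: intensities proportional to the prediction, RT off by 2.5 units.
    print(rescoring_features([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 10.0, 12.5))
```

The spectral angle depends only on relative intensities, so an observed spectrum that is a scaled copy of the prediction scores close to 1. In a full pipeline, features like these would be passed to a post-processing rescoring tool together with the search engine's own scores.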
Subjects
Peptides , Proteomics , Proteomics/methods , Peptides/metabolism , Peptides/chemistry , Humans , Databases, Protein , Mass Spectrometry/methods , Search Engine
ABSTRACT
The interplay between bacterial chromosome organization and functions such as transcription and replication can be studied in increasing detail using novel experimental techniques. Interpreting the resulting quantitative data, however, can be theoretically challenging. In this minireview, we discuss how connecting experimental observations to biophysical theory and modeling can yield new insights into bacterial chromosome organization. We consider three flavors of models of increasing complexity: simple polymer models that explore how physical constraints, such as confinement or plectoneme branching, can affect bacterial chromosome organization; bottom-up mechanistic models that connect these constraints to their underlying causes, for instance, chromosome compaction to macromolecular crowding, or supercoiling to transcription; and finally, data-driven methods for inferring interpretable and quantitative models directly from complex experimental data. Using recent examples, we discuss how biophysical models can both deepen our understanding of how bacterial chromosomes are structured and generate novel predictions about their organization.
ABSTRACT
Understanding complex living systems, which are fundamentally constrained by physical phenomena, requires combining experimental data with theoretical physical and mathematical models. To develop such models, collaborations between experimental cell biologists and theoreticians are increasingly important, but these two groups often struggle to achieve mutual understanding. To help navigate these challenges, this Perspective discusses different modelling approaches, including bottom-up hypothesis-driven and top-down data-driven models, and highlights their strengths and applications. Using cell mechanics as an example, we explore the integration of specific physical models with experimental data at the molecular, cellular and tissue levels, up to multiscale input. We also emphasize the importance of constraining model complexity and outline strategies for crosstalk between experimental design and model development. Furthermore, we highlight how physical models can provide conceptual insights and produce unifying, generalizable frameworks for biological phenomena. Overall, this Perspective aims to promote fruitful collaborations that advance our understanding of complex biological systems.
Subjects
Models, Biological , Models, Theoretical
ABSTRACT
The dynamical properties of many complex physical and biological systems can be quantified using energy landscape theory. Previous approaches focused on estimating transition rates from landscapes reconstructed from data. However, for general non-equilibrium systems (such as gene regulatory systems), both the energy landscape and the probability flux are important in determining the transition rate between attractors. In this work, we propose a data-driven approach to estimate non-equilibrium transition rates that combines kernel density estimation with non-equilibrium transition rate theory. Our approach performs better than previous methods at estimating transition rates from data, owing to its use of nonparametric density estimation and a new saddle point that accounts for the effects of flux. We demonstrate the practical validity of our approach by applying it to a simplified cell fate decision model and a high-dimensional stem cell differentiation model. Our approach can also be applied to other biological and physical systems.
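The density-estimation half of this approach can be sketched in one dimension. A Gaussian kernel density estimate of sampled states gives a steady-state probability P(x), and the landscape is taken as U(x) = -ln P(x). This sketch assumes a fixed, hypothetical bandwidth. It omits the flux term and the flux-corrected saddle point that the rate theory requires, and only shows how basins and barriers emerge from data.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a 1D Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def quasi_potential(samples, grid, bandwidth=0.3):
    """Landscape U(x) = -ln P(x) on a grid; a tiny floor avoids log(0) far from the data."""
    p = gaussian_kde(samples, bandwidth)
    return [-math.log(max(p(x), 1e-300)) for x in grid]


if __name__ == "__main__":
    # Hypothetical bistable data: two clusters of states around -1 and +1.
    samples = [-1.0] * 50 + [1.0] * 50
    print(quasi_potential(samples, [-1.0, 0.0, 1.0]))
```

The two attractors appear as minima of U, with a barrier between them at x = 0. In the non-equilibrium setting described above, the relevant saddle point is shifted by the probability flux, which this landscape-only sketch cannot capture.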
Subjects
Thermodynamics , Cell Differentiation , Gene Expression
ABSTRACT
BACKGROUND: Portal vein thrombosis (PVT) is a significant complication in cirrhotic patients, making early detection necessary. This study aims to develop a data-driven predictive model for PVT diagnosis in patients with chronic hepatitis and liver cirrhosis. METHODS: We used data from a total of 816 chronic cirrhosis patients with PVT, divided into the Lanzhou cohort (n = 468) for training and the Jilin cohort (n = 348) for validation. The dataset covered a wide range of variables, including general characteristics, blood parameters, ultrasonography findings and cirrhosis grading. To build our predictive model, we used a stacking approach that combined Support Vector Machine (SVM), Naïve Bayes and Quadratic Discriminant Analysis (QDA) classifiers. RESULTS: In the Lanzhou cohort, the SVM and Naïve Bayes classifiers effectively distinguished PVT from non-PVT cases, and seven of their top features were shared: Portal Velocity (PV), Prothrombin Time (PT), Portal Vein Diameter (PVD), Prothrombin Time Activity (PTA), Activated Partial Thromboplastin Time (APTT), age and Child-Pugh score (CPS). The QDA model, trained on these seven shared features in the Lanzhou cohort and validated in the Jilin cohort, discriminated significantly between PVT and non-PVT cases (AUROC = 0.73 and 0.86, respectively). A subsequent comparative analysis showed that our QDA model outperformed several other machine learning methods. CONCLUSION: Our study presents a comprehensive data-driven model for PVT diagnosis in cirrhotic patients, enhancing clinical decision-making. The SVM-Naïve Bayes-QDA model offers a precise approach to managing PVT in this population.
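AUROC values are the headline results here, so a short sketch of how the metric is computed may help. AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This quadratic pairwise version is for illustration only, and the scores in the usage example are made up.

```python
def auroc(scores, labels):
    """AUROC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical classifier scores for two PVT (1) and two non-PVT (0) cases.
    print(auroc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))  # prints 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The reported values of 0.73 (training) and 0.86 (validation) sit between these two extremes.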
Subjects
Portal Vein , Venous Thrombosis , Humans , Portal Vein/pathology , Risk Factors , Bayes Theorem , Precision Medicine , Liver Cirrhosis/complications , Liver Cirrhosis/diagnosis , Fibrosis , Venous Thrombosis/complications , Venous Thrombosis/diagnosis
ABSTRACT
Neurodegenerative dementia syndromes, such as primary progressive aphasias (PPA), have traditionally been diagnosed based, in part, on verbal and non-verbal cognitive profiles. Debate continues about whether PPA is best divided into three variants and about which linguistic features are most distinctive for classifying PPA variants. In this cross-sectional study, we first harnessed artificial intelligence and natural language processing to perform unsupervised classification of short, connected speech samples from 78 patients with PPA. We then used natural language processing to identify the linguistic features that best dissociate the three PPA variants. Large language models discerned three distinct PPA clusters, with 88.5% agreement with independent clinical diagnoses. The patterns of cortical atrophy in the three data-driven clusters corresponded to the localization in the clinical diagnostic criteria. In the subsequent supervised classification, 17 distinctive features emerged, including the observation that separating verbs into high- and low-frequency types significantly improved classification accuracy. Using these linguistic features derived from short, connected speech samples, we developed a classifier that achieved 97.9% accuracy in classifying the four groups (three PPA variants and healthy controls). The data-driven part of this study showcases the ability of large language models to find natural partitions in the speech of patients with PPA that are consistent with the conventional variants. In addition, the work identifies a robust set of language features indicative of each PPA variant, emphasizing the significance of dividing verbs into high- and low-frequency categories. Beyond improving diagnostic accuracy, these findings enhance our understanding of the neurobiology of language processing.
Subjects
Aphasia, Primary Progressive , Artificial Intelligence , Speech , Humans , Aphasia, Primary Progressive/diagnosis , Aphasia, Primary Progressive/classification , Male , Aged , Female , Middle Aged , Speech/physiology , Cross-Sectional Studies , Atrophy/pathology , Natural Language Processing
ABSTRACT
Significance
An invisibility cloak to conceal objects from an outside observer has long been a subject of interest in metamaterial design. While cloaks have been manufactured for optical, thermal, and electric fields, limited progress has been made for mechanical cloaks. Most existing designs rely on mapping-based methods, which have so far been limited to special base cells and a narrow selection of voids with simple shapes. In this study, we develop a fundamentally different approach by exploiting data-driven designs to offer timely, customized solutions to mechanical cloaking that were previously difficult to obtain. Through simulations and experimental validations, we show that excellent cloaking performance can be achieved for various boundary conditions, shapes of voids, base cells, and even multiple voids.
ABSTRACT
The emergence of new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a major concern given their potential impact on the transmissibility and pathogenicity of the virus as well as the efficacy of therapeutic interventions. Here, we predict the mutability of all positions in SARS-CoV-2 protein domains to forecast the appearance of unseen variants. Using sequence data from other coronaviruses that predate SARS-CoV-2, we build statistical models that capture not only amino acid conservation but also more complex patterns resulting from epistasis. We show that these models are notably superior to conservation profiles in estimating the already observable SARS-CoV-2 variability. In the receptor binding domain of the spike protein, we observe that the predicted mutability correlates well with experimental measures of protein stability and that both are reliable mutability predictors (areas under the receiver operating characteristic curve ∼0.8). Most interestingly, we observe increasing agreement between our model and the observed variability as more data become available over time, demonstrating the anticipatory capacity of our model. When combined with data concerning the immune response, our approach identifies positions where current variants of concern are highly overrepresented. These results could assist studies on viral evolution and future viral outbreaks and, in particular, guide the exploration and anticipation of potentially harmful future SARS-CoV-2 variants.
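Conservation profiles are the baseline against which the epistatic models are compared. They can be sketched as per-position Shannon entropy over an alignment of related sequences, where higher entropy suggests a more variable, and plausibly more mutable, position. This sketch ignores gaps and sequence weighting. It does not capture the pairwise (epistatic) couplings that the study's models add, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    counts = Counter(aa for aa in column if aa != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conservation_profile(alignment):
    """Entropy per position for equal-length aligned sequences."""
    length = len(alignment[0])
    return [column_entropy([seq[i] for seq in alignment]) for i in range(length)]


if __name__ == "__main__":
    # Hypothetical four-sequence alignment of a three-residue segment.
    print(conservation_profile(["ACD", "ACE", "AGD", "ACE"]))
```

A fully conserved column scores 0, and a column split evenly between two residues scores 1 bit. Models with pairwise couplings can additionally flag positions whose variability depends on the residues at other positions, which a single-column entropy cannot do.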
Subjects
COVID-19/virology , Epistasis, Genetic , Epitopes , Mutation , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Viral Proteins/chemistry , Algorithms , Area Under Curve , Computational Biology/methods , DNA Mutational Analysis , Databases, Protein , Deep Learning , Epitopes/chemistry , Genome, Viral , Humans , Models, Statistical , Mutagenesis , Probability , Protein Domains , ROC Curve
ABSTRACT
Significance
Science-based data-driven methods that can describe the rheological behavior of complex fluids can be transformative across many disciplines. Digital rheometer twins, which are developed here, can significantly reduce the cost, time, and energy required to characterize complex fluids and predict their future behavior. This is made possible by combining two different methods of informing neural networks with the rheological underpinnings of a system, resulting in quantitative recovery of a gel's response to different flow protocols. The platform developed here is general enough that it can be extended to areas well beyond complex fluids modeling.
ABSTRACT
Significance
The analysis of complex systems with many degrees of freedom generally involves the definition of low-dimensional collective variables more amenable to physical understanding. Their dynamics can be modeled by generalized Langevin equations, whose coefficients have to be estimated from simulations of the initial high-dimensional system. These equations feature a memory kernel describing the mutual influence of the low-dimensional variables and their environment. We introduce and implement an approach where the generalized Langevin equation is designed to maximize the statistical likelihood of the observed data. This provides an efficient way to generate reduced models to study dynamical properties of complex processes such as chemical reactions in solution, conformational changes in biomolecules, or phase transitions in condensed matter systems.
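The maximum-likelihood idea can be sketched for the simplest memoryless case: an overdamped Langevin (Ornstein-Uhlenbeck) process dx = -k x dt + sqrt(2D) dW, discretized with the Euler-Maruyama scheme. Unlike the generalized Langevin equation described above, this sketch has no memory kernel. It only shows how drift and diffusion coefficients follow from maximizing the Gaussian transition likelihood of an observed trajectory. The parameter values are hypothetical.

```python
import math
import random


def simulate_ou(k, D, dt, n_steps, x0=0.0, seed=0):
    """Euler-Maruyama trajectory of dx = -k x dt + sqrt(2 D) dW (no memory kernel)."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - k * x * dt + math.sqrt(2.0 * D * dt) * rng.gauss(0.0, 1.0))
    return xs


def fit_ou_max_likelihood(xs, dt):
    """Maximize the Gaussian transition likelihood of x[n+1] = a * x[n] + noise.

    For this linear-Gaussian model the maximizer is closed form: a is the
    least-squares slope and the noise variance is the mean squared residual.
    Returns the drift coefficient k = (1 - a) / dt and diffusion D = var / (2 dt).
    """
    prev, nxt = xs[:-1], xs[1:]
    a = sum(x1 * x0 for x0, x1 in zip(prev, nxt)) / sum(x0 * x0 for x0 in prev)
    residuals = [x1 - a * x0 for x0, x1 in zip(prev, nxt)]
    noise_var = sum(r * r for r in residuals) / len(residuals)
    return (1.0 - a) / dt, noise_var / (2.0 * dt)


if __name__ == "__main__":
    # Hypothetical "true" parameters k = 2.0 and D = 0.5; the estimates should land nearby.
    trajectory = simulate_ou(2.0, 0.5, 0.01, 20000, seed=1)
    print(fit_ou_max_likelihood(trajectory, 0.01))
```

For a memory kernel, the transition likelihood depends on the trajectory's history. The maximization then generally has no closed form and has to be done numerically, which is the harder problem the approach above addresses.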
Subjects
Molecular Dynamics Simulation , Likelihood Functions
ABSTRACT
Porous membranes, made of either polymers or two-dimensional materials, have been studied extensively because of their outstanding performance in many applications such as water filtration. Recently, inspired by the significant success of machine learning (ML) in many areas of scientific discovery, researchers have started to apply data-driven ML tools to membrane design. In this Mini Review, we summarize research efforts on three types of applications of ML in membrane design: (1) membrane property prediction using ML, (2) gaining physical insight and drawing quantitative relationships between membrane properties and performance using explainable artificial intelligence, and (3) ML-guided design, optimization, or virtual screening of membranes. Beyond reviewing previous research, we discuss the challenges associated with applying ML to membrane design and potential future directions.
ABSTRACT
Controlling the magnetic state of two-dimensional (2D) materials is crucial for spintronics. By employing data-mining and autonomous density functional theory calculations, we demonstrate the switching of magnetic properties of 2D non-van der Waals materials upon hydrogen passivation. The magnetic configurations are tuned to states with flipped and enhanced moments. For 2D CdTiO3, a diamagnetic compound in the pristine case, we observe an onset of ferromagnetism upon hydrogenation. Further investigation of the magnetization density of the pristine and passivated systems provides a detailed analysis of modified local spin symmetries and the emergence of ferromagnetism. Our results indicate that selective surface passivation is a powerful tool for tailoring magnetic properties of nanomaterials, such as non-vdW 2D compounds.